So it appears the main barrier between me and Russian fluency is…

that I haven’t been told that google translate is a somewhat unreliable translator.

I’ve posted a few questions on the Russian StackExchange. I get two things: pretty good answers from experts and cheap potshots from the commenters. The experts sound like they might have experience teaching a foreign language. The commenters are worse than useless and the moderators side with them.

Beginning language learners start writing by a variety of means. I know, I’ve tried them.

Initially, you use whatever form of a word you remember. Take “polu”: it means floor, but what gender, what case, what number? Not sure, you heard it in the phrase “it fell on the floor.” So you use the word as if it has only one form. A non-teacher will try to teach you that each word has a half dozen forms. I already know that. Thanks, the sole thing between me and fluency was that I didn’t already know that Russian has morphology.

Then you have words you have never heard, so you look them up. Then people criticize you for using the dictionary form too often. Thanks again, the sole thing between me and Russian fluency is that I didn’t know that Russian has morphology.

So I try google translate. I can gauge the quality of the translation by looking at how well it does from Russian to English, and I can generally see what is certainly wrong and what is dodgy. But that doesn’t mean I know how to write it better. So I go ask a question and all I get is “Oh! Don’t use google translate, it is worthless.” (And instead they suggested I just use google! Whee, now I have a word used in a completely different context, wrong case, wrong number, and you need to know what word to search for to google it.) If you are a fluent bilingual who learned English from Mom and Russian from Dad and can decide the case of a noun by asking, “What word answers to Kovo?”, well, for you google translate is useless.

To you dictionary, google translate and learner haters out there: I’m happy you are smart and bilingual, but I wish you would just bugger off. You have an idealized idea of how to learn a language, where everyone does it just the same way you did: they memorized all 600 pages of Dr. Smith’s Grammar, they can rattle off all 2000 slots in the tables of word endings, in both Cyrillic and English alphabetical order, and they have already memorized the entire dictionary and can recite any page on demand when someone names the page number.

Good for you. Leave me alone.

And as for the Russian StackExchange, I will leave you alone, the same as it seems the rest of the internet is doing. The Master Russian forum (the main competitor in this space) lets people post translation requests– people post what google translate thought and they get… translations & help. (I’ll update later if I’m wrong about the general level of civility and acceptance of how language learners really are on the Master Russian forum.)

Posted in Learning Any Language | Comments Off


Someone said, “3 people are fluent in toki pona” What does that even mean? It means squat.

The gold standard of fluency is native fluency, which kids get for free. It is not so free that a language learned as a child will be excellent– for example, if you learn Spanish in the US from your grandmother, you will not speak it as well as someone who learned it in Mexico and took 12 years of schooling in Spanish.

There are about 5 cases of kids becoming fluent in a conlang, ghostlang or other similar language that previously was spoken by no one and is now spoken by them and their mom or dad: Living Latin (1), Hebrew (many), Esperanto (many), Klingon (1), Volapük (1) and a lady on FB who is teaching her diary language to her kids. In the cases where there was only one child, obviously they stopped speaking it as adults and probably don’t speak it all that well now. Still, these cases are what linguists prefer when looking for fluent speakers, and if warmed up, they might speak it better than anyone who knows that language only as an L2.

Near native fluency is fairly hard to gain as an adult– it happens, but usually there will be accents and grammatical peculiarities in the speech of someone who has learned a language as a 2nd language, even if they are crazy smart. Typically it takes 10 years of living in a country, speaking your 2nd language all the time before you hit near native fluency.

Below that is a huge range of degrees of fluency. I can converse in Russian, but can’t write it to save my life. I can read Icelandic, often better than I can read Russian. I can pick out a few words here and there in French and Spanish and my translation score would be better than a machine that chose words at random [secret: that's how google translate works;-)]. There are people who can write or read just fine, so long as they have a reference grammar and lots and lots of time. There are people who can spit out a steady stream of words as fast as you like and are generally intelligible, but they make grammatical errors and have a thick accent. And so on.

When I used to organize study groups, ideally everyone would be at a similar fluency level. In practice, it turned out that there are 100s of distinct levels of fluency. People who know 100 words, or 1000 words, or 10000 words: each case is an entirely different situation. This is why people who each speak English as a second language understand each other better than if they have to speak to a native English speaker. If everyone is drawing on the same 5000 words and the same few dozen grammatical constructions, communication is simple and fluid.

People who have never studied a foreign language, nor a conlang, have no clue about any of this. So they hear that toki pona has 3 fluent speakers and they don’t realize that that factoid is bullshit. It is accurate to say that there are zero native speakers, zero near-native speakers, there are 50 to 100 people who have ever written a paragraph or more of toki pona and probably about 10 or 20 who can do it without warm up *right now* (everyone else would probably have to review and re-remember it all).

Also, another important point is that conlangs are not fully defined. You can learn all there is possible to learn about toki pona (and, with more effort, about Klingon). And you’ll hit a wall, after which there are statements whose grammaticality can’t really be judged for lack of native speakers, except maybe by getting a ruling from the creator or the relevant “language board”.

**NB technical note: there is at least one theory about Tok Pisin, that it was a language of mostly adults, developed its unique grammar among adults, and then kids copied their parents. This contrasts with the other story, that adults speaking new languages (creoles and conlangs) are in fact mostly using their L1 grammar and L2 vocabulary while the natively fluent children speak with the vocab and grammar of the new language.

Posted in conlang, conlang learning | 1 Comment

Syntax Coloring and Highlighting and Autocompletion

I wish there were syntax highlighting for English. When it is there, you see errors faster. I like autocompletion too, where you type a word and get a list of possible next words, sort of like what cell phone keyboards do.

Branching Direction.
If I type “the”, the next word could be any of a long list. If this were Icelandic and I started a noun, I would have a short list of ways to end it (some including “the”). So I think in general, right branching will lead to a decision tree where you pick a word from the infinite possibilities and the next word is somewhat predictable. Another example: if I say “very”, the next word could be anything. But if the order were reversed and I said “hot”, the next word would come from a short list of possibilities, probably including “very”.

Take the example of toki pona. There are only 125 or so words, so we should be able to predict what is next, but its branching is mixed. Purely left branching would be better.

noun + modifier => maybe 50 choices are most likely.
start of sentence + pronoun => it’s going to be only a few possible things.
Conditionals are backwards: they are tagged at the end of a phrase. So a text editor wouldn’t know a conditional is starting. Vocatives are backwards, being tagged at the end (jan o!). But imperatives are right branching (o moku!).
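The "list of possible next words" idea can be sketched with a table of word pairs. This is only a rough illustration, not how any real keyboard works. The function names `buildBigrams` and `suggest`, the `<s>` start-of-sentence marker and the five sample sentences are all made up for the example.

```javascript
// Hypothetical sketch: count which word follows which in a tiny corpus,
// then offer the most frequent followers as completions.
function buildBigrams(sentences) {
  const table = new Map();
  for (const sentence of sentences) {
    // "<s>" is an invented marker for the start of a sentence.
    const words = ["<s>"].concat(sentence.split(/\s+/).filter(Boolean));
    for (let i = 0; i < words.length - 1; i++) {
      const prev = words[i];
      const next = words[i + 1];
      if (!table.has(prev)) table.set(prev, new Map());
      const counts = table.get(prev);
      counts.set(next, (counts.get(next) || 0) + 1);
    }
  }
  return table;
}

// Followers of `prev`, most frequent first; ties are broken alphabetically.
function suggest(table, prev) {
  const counts = table.get(prev);
  if (!counts) return [];
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(entry => entry[0]);
}

// Made-up sample sentences, far too few to say anything about real usage.
const corpus = ["mi moku", "mi lape", "sina moku", "jan li moku", "mi moku"];
const table = buildBigrams(corpus);
console.log(suggest(table, "mi"));  // [ 'moku', 'lape' ]
console.log(suggest(table, "<s>")); // [ 'mi', 'jan', 'sina' ]
```

With a real corpus, the length of each suggestion list would be one way to measure how predictable the next word is at each point in a sentence.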

Part of Speech Systems vs Content/Function word systems
The distinction here is obvious to me. Think of Esperanto. There is a very strict system of words being adjectives, nouns, adverbs or verbs, and to convert one to the other you must change the word ending. In English (and toki pona), the system is more like content words, which can be converted to any part of speech depending on their place in the sentence, and function words, which glue phrases together and generally resist changing into nouns, verbs or adjectives. Function words would include things like “a”, “the”, “for”, “will” and so on.

When there is a strong part of speech system, a parser can see that a word is, say, a noun and infer that some sort of adjective is next. With a Content/Function word system, you can’t tell the part of speech until you parse the sentence. In English and toki pona, there is often more than one way to parse a sentence, so a word can be one part of speech or another depending on how you want to understand it.

Syntactic Ambiguity
I’m no expert but I already suspect that Lojban doesn’t deliver on ambiguity-free sentences. I suspect that all sentences can have 2 parses: the one people mean and the one they said. And there can be more meaning semantically. Two people read a syntactically unambiguous sentence, they deserialize it to a data structure in their brain and they get…

Posted in conlang design | Comments Off

An unimplemented idea for conlang phonotactics

Phonotactics- the recipes for building new words.

So someone created a fake language. But they died, or got a real job or otherwise abandoned it. How to move it forward if they didn’t document the phonotactics?

There are many word generators and they mostly use very small domain specific languages (DSLs). For example

Valid Patterns: CV, CVCV, CCVV, VCV

And usually there is additional notation to reduce the total number of rules, e.g. (C)VCV means VCV can optionally start with a consonant.
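A pattern like (C)VCV is simple enough to turn into a regular expression. The sketch below is one possible reading of that mini DSL, not the syntax of any particular word generator. The letter classes in `CLASSES`, the helper names and the word list are assumptions for the example.

```javascript
// Assumed letter classes; a real conlang would supply its own.
const CLASSES = { C: "ptkmnslwj", V: "aeiou" };

// Turn a pattern such as "(C)VCV" into a regular expression.
// Parentheses mark an optional part; C and V stand for a letter class.
function patternToRegExp(pattern, classes) {
  let source = "";
  for (const ch of pattern) {
    if (ch === "(") source += "(?:";
    else if (ch === ")") source += ")?";
    else if (classes[ch]) source += "[" + classes[ch] + "]";
    else throw new Error("Unknown symbol in pattern: " + ch);
  }
  return new RegExp("^" + source + "$");
}

// Fraction of the known words that at least one pattern accepts.
function coverage(patterns, classes, words) {
  const regexes = patterns.map(p => patternToRegExp(p, classes));
  const matched = words.filter(w => regexes.some(r => r.test(w)));
  return matched.length / words.length;
}

const words = ["toki", "pona", "ale", "mi", "jan"];
console.log(coverage(["CV", "(C)VCV"], CLASSES, words)); // 0.8 ("jan" is CVC, so it is not covered)
```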

Now the values of C, V and “Valid Patterns” are all sort of simple. So why not generate rule sets at random and then score how often they are able to account for the existing words? And to further optimize the algorithm, mutate the best sets or genetically cross them (take half of the rules of each highly performing rule set and check to see how suitable the new merged rule set is).

This would allow for providing a list of sample words, generating a rule set and then generating a list of potential new words.

What this won’t do: it won’t account for things like in CVCV, the two vowels will be similar to each other because people have lazy tongues, so the vowels sometimes become similar or identical. But with enough computations, defects like these might become unimportant.
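Since the idea is unimplemented, here is only a toy sketch of what the random generation, scoring, crossing and mutating could look like. Everything in it is an assumption made for illustration: the letter classes, the penalty of 0.01 per rule, the population of 30, the seeded random number generator (used so runs are repeatable) and the sample words. It does not try to learn the C and V classes themselves.

```javascript
// Assumed letter classes; learning these is out of scope for the sketch.
const CLASSES = { C: "ptkmnslwj", V: "aeiou" };

// Small seeded generator so that runs are repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// A pattern here is a plain string of C and V, e.g. "CVCV".
function matches(pattern, word) {
  if (pattern.length !== word.length) return false;
  for (let i = 0; i < pattern.length; i++) {
    if (!CLASSES[pattern[i]].includes(word[i])) return false;
  }
  return true;
}

// Reward covered words, charge a small cost per rule so sets stay compact.
function score(ruleSet, words) {
  const covered = words.filter(w => ruleSet.some(p => matches(p, w))).length;
  return covered / words.length - 0.01 * ruleSet.length;
}

function randomPattern(random) {
  const length = 1 + Math.floor(random() * 4);
  let pattern = "";
  for (let i = 0; i < length; i++) pattern += random() < 0.5 ? "C" : "V";
  return pattern;
}

function randomRuleSet(random) {
  const size = 1 + Math.floor(random() * 4);
  return Array.from({ length: size }, () => randomPattern(random));
}

// Cross: the first half of one parent plus the second half of the other.
function cross(a, b) {
  return a.slice(0, Math.ceil(a.length / 2)).concat(b.slice(Math.ceil(b.length / 2)));
}

// Mutate: replace one randomly chosen rule with a fresh random pattern.
function mutate(ruleSet, random) {
  const copy = ruleSet.slice();
  copy[Math.floor(random() * copy.length)] = randomPattern(random);
  return copy;
}

function evolve(words, generations, random) {
  let population = Array.from({ length: 30 }, () => randomRuleSet(random));
  const byScore = (a, b) => score(b, words) - score(a, words);
  for (let g = 0; g < generations; g++) {
    population.sort(byScore);
    const best = population.slice(0, 10); // keep the top third unchanged
    const children = [];
    while (best.length + children.length < 30) {
      const mom = best[Math.floor(random() * best.length)];
      const dad = best[Math.floor(random() * best.length)];
      children.push(mutate(cross(mom, dad), random));
    }
    population = best.concat(children);
  }
  population.sort(byScore);
  return population[0];
}

const words = ["toki", "pona", "ale", "mi", "jan", "sina", "moku"];
const best = evolve(words, 40, makeRandom(42));
console.log(best, score(best, words));
```

The winning rule set could then be fed to an ordinary word generator to propose new words in the same style.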

Posted in machine assisted conlanging | 1 Comment

Kickstarter and fake languages

Finishing a conlang is a lot of work. So can fans provide an incentive to new language creators to finish it, say by pledging $ in return for a variety of prizes to be given to the fans?

Possible rewards:
The foundational documents in paper or ebook format- dictionary, reference grammar, canonical corpus.
Educational materials. Graded reader, workbooks, flash cards.
Educational services, like tutoring, classes, in person or online.
Artwork. (Posters with script)
Symbolic gestures: all the things that are not *really* related to fake languages, such as your name on the list of contributors, or a tote bag with the language name or logo.
Inclusion in the creation process: e.g. your name will become an eponym and it will mean “chunk style”
…. something else.

Possible challenges
A reference grammar isn’t too exciting for fans. Fans, for the most part, are there for the community.
Scale- If you get 2 sales, that’s better than zero, but doesn’t cover your fixed costs of doing anything. If you get 2 million sales, you probably don’t have access to the loans, staff and whatnot you need to actually fulfill whatever you were promising. There is some sweet spot for sales; above or below that, this kickstarter is just a headache.
Not actually providing any motivation. A successful kickstarter might promise $9000– enough to motivate someone to ship a few hundred copies of an already written book, not sure if it is enough to motivate me (as a language creator) to write a reference grammar, dictionary, invest the time to become competent in a new language, etc. This would lead to fans being upset about unmet promises.

Oh. Intrinsic rewards. I’m reading a book right now that implies that if money gets involved in an activity you used to like intrinsically, then you become less motivated, especially when the money goes away. I.e. a lot of people do this conlang thing for free. If we paid them, they would probably get a burst of energy while being paid, but after the money goes away, they would be less likely to continue to work on it. I wonder if this has any application to movie languages: did Okrand, Peterson or Frommer work less on their languages when between movies? The guy that did Loglan worked on it till he died. He hoped to make money on it, but AFAIK, never did. Maybe that accounts for the lifetime, intrinsic motivation.

Posted in Uncategorized | Comments Off

Bresenish. It should work like a programming language

So I’m continuing to think about the syntax of bresenish, but in the background I’m reading a lot about programming languages. I’m thinking that there should be a way to speak a programming language. Except it would execute in your mind, not on a machine. And it wouldn’t be neurolinguistic programming, but hey, if it inspires someone to write a NLP sci-fi book with less handwaving, so much the better.

Okay, syntax.

Originally I thought I could get by with just elements, sets and relations (i.e. operators), because a simple sentence looks like that: “Jack and Jill ran up the hill” = “The set of the elements jack, jill is related to the set of the element hill, via the relationship of running up.” But my first attempt to translate the Tower of Babel into gloss showed that I needed assignment and variables. I need something that works like a pronoun but has set theoretic typing (the set, the element), as in “The set of men working on building a tower, which I shall henceforth refer to as the A team.” And I needed to express equivalence and assignment distinctly: “There is a set of men who worked all night, we’ll call them the B team. The A team and the B team are the same.” So I assign that list of men to the B team. And then I declare that A and B are the same, i.e. I have two different ways to describe them. Another example: “There is a pretty girl. There is a young girl. There is a school girl. These elements are all the same element.”

Bresenish would be imperative. So it would work like this:

Imagine a stage. Put on that stage these elements: jane, jack, a baseball bat. The bat moves repeatedly to jack’s head. The bat is in the hands of jane.

Imagine a stage. Put on that stage these elements: jack, a cake. Imagine another stage with these elements: jack. The former stage became the latter stage.

More on the consequences of variables. Variables work a bit like proper nouns and pronouns. They mean something specific, anyone can make one up, they can last a long time or a short time. Like pronouns, they refer to something else. Natural pronouns tend to be few in number and rely on a “type”, e.g. person, number, social rank, animacy. In a set inspired language, there would be a need for a lot of different variables, and having too many “its” would be a pain. Having too many ad hoc words would be a pain too. Maybe there could be some conventional variable names, like the way programmers often use “foo”, “bar”, “baz” as common variables, or mathematicians use x, y, z.

Bresenish on a machine…
I don’t actually have the skills to write this, but having worked with programming languages I think I can imagine what is currently possible.

An executable Bresenish would start out with a fixed number of elements, relations, attributes (adjectives). And those would be the elements of possible discourse. You could create as many variables as you’d like, but all those variables would refer back to already defined elements. In human instructions, we elide the obvious. So in a machine oriented language, we’d probably be very verbose and need loops. Human oriented would probably use an attribute on a relation to indicate an action was repeated.

So something like

Imagine a stage. (Computer draws a stage). On the stage are these elements (list elements, draw those on stage. If abstract, just list them on side bar). The elements are related this way (locative relations would be the easiest to illustrate, abstract ones would just be represented with arrows, i.e. A loves B). The elements, X, Y, Z are a set, call ‘em foo. They have the attribute of “red”. (Computer draws circle around X, Y, Z and paints them red).

And so on. I think with some effort, Harry Potter could be translated into such a thing and a computer could hypothetically draw the resulting movie.

So to re-cap:

Stage = A universe (that you hold in your mind) being described. This is what you are imperatively telling your listener to manipulate.
Elements = vocab, both concrete and abstract
Sets = Things that relate to each other, might only have 1 element
Relations = Roughly verbs. They relate sets.
Attributes = Roughly adverbs and adjectives, modifiers of relations and elements.
Variables = Temporary names for an element or set, either drawn from a conventional list or made up on the spot.
Assignment = Saying an unbound variable is bound to another one. And lo, so it is.
Equivalence = Declaring two things are referring to the same thing. A machine, which is keeping track of all the elements and the relations defined so far, would merge the two views of a stage after hearing that they are equivalent. “Stage one- The man is running” “Stage two- The man is sweating” “Stage one == Stage two” (The listener merges these two views and now we have a sweating man, running on the stage)
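The merge in the equivalence example can be sketched in a few lines. This is just one guess at how a machine might keep track of a stage. The names `makeStage`, `relate` and `mergeStages` and the shape of the stage object are invented for the illustration and are not part of Bresenish.

```javascript
// A stage holds the elements on it and the facts (relations) stated about them.
function makeStage(name) {
  return { name: name, elements: new Set(), facts: [] };
}

// State a relation on a stage; the object is optional ("the man is running").
function relate(stage, relation, subject, object) {
  stage.elements.add(subject);
  if (object !== undefined) stage.elements.add(object);
  stage.facts.push({ relation: relation, subject: subject, object: object });
}

// Equivalence: both stages describe the same scene, so pool what they say.
function mergeStages(a, b) {
  const merged = makeStage(a.name + " == " + b.name);
  for (const stage of [a, b]) {
    for (const e of stage.elements) merged.elements.add(e);
    for (const f of stage.facts) merged.facts.push(f);
  }
  return merged;
}

const stageOne = makeStage("Stage one");
relate(stageOne, "is running", "the man");
const stageTwo = makeStage("Stage two");
relate(stageTwo, "is sweating", "the man");

const scene = mergeStages(stageOne, stageTwo);
console.log(Array.from(scene.elements)); // [ 'the man' ]
console.log(scene.facts.map(f => f.subject + " " + f.relation));
// [ 'the man is running', 'the man is sweating' ]
```

Because the elements live in a Set, the man appears once on the merged stage even though both stages mention him.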

And I think that is enough moving parts for a workable grammar. Now I just have to work out a morphology and my list of elements. Which I think will still be Icelandic sounding words.

Posted in Bresenish | Comments Off

Bresenish. This is That.

So another post on a language I’ve provisionally called Bresenish. The goal is to create something inspired by set theory and computer programming languages.

So one book I read said language is kind of like describing a stage where a play is happening. There are actors, elements on the stage, things happen across a time line. We then use words to turn all of this into discrete models of reality. These models are serialized into a linear string of sounds. Someone hears it and frickin’ magic happens and they deserialize it back into a representation of reality in their brain. Unlike lojban, which tries to solve the problem of deserializing the sound back into an intermediate model, I’m content to leave that up to frickin’ magic.

So far I have the idea of the stage being described by a set of typed elements (the items on the stage, each of which can be part of a type, such as animate, inanimate, blue, not blue, whatever), a set of relationships (sees, is-sitting, is-tired), and a set of compound pronouns for disambiguation, such as it-him, they-her. To parse such a sentence, you’d imagine all the possible relationships (the Cartesian product, which is all possible links) and then filter down the list using the compound pronouns.

Seems like a good idea, but I have a hard time thinking of occasions when there are more than two elements on the stage and more than one relationship, so there is usually only one compound pronoun. This maps down to just S-V-O: some subject is defined in relationship to some object. An awkward English gloss might be “Cat, Dog”, “Sees”, “him-him”: the cat and the dog see each other (or maybe just the dog sees the cat, or vice versa).
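The parsing step (all possible links, then filter by compound pronoun) can be sketched like this. It is an illustration under assumptions: a compound pronoun is modelled as a pair of types, subject first, the element list and type names are made up, and pairs of an element with itself are left out even though a strict Cartesian product would include them.

```javascript
// Made-up typed elements on the stage.
const elements = [
  { name: "cat", type: "anim" },
  { name: "dog", type: "anim" },
  { name: "ball", type: "inanim" }
];

// All ordered pairs of distinct elements: every possible "X relation Y".
function allLinks(items) {
  const links = [];
  for (const subject of items) {
    for (const object of items) {
      if (subject !== object) links.push({ subject: subject, object: object });
    }
  }
  return links;
}

// A compound pronoun such as "him-it" is modelled as [subjectType, objectType].
function filterByPronoun(links, pronoun) {
  return links.filter(
    l => l.subject.type === pronoun[0] && l.object.type === pronoun[1]
  );
}

function readings(relation, pronoun) {
  return filterByPronoun(allLinks(elements), pronoun)
    .map(l => l.subject.name + " " + relation + " " + l.object.name);
}

console.log(readings("sees", ["anim", "inanim"])); // [ 'cat sees ball', 'dog sees ball' ]
console.log(readings("sees", ["anim", "anim"]));   // [ 'cat sees dog', 'dog sees cat' ]
```

The second call shows the same leftover ambiguity as the “him-him” gloss: the pronoun narrows six possible links down to two, and the listener has to pick.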

Also, I needed another structure for dealing with equality, such as “The animal in the other room is a cat”. So I got the idea of adding a bunch of imperatives.

“Imagine there is a stage. On the stage is a barrel, some fish, a smoking gun.”

Now I’d like to say that this is the set up for a joke. Or it’s a recipe. Or it’s a crime scene.

“Imagine another stage. On the stage is a recipe. These stages are the same”

In pseudo code, it would look like

var stage1 = ["barrel", "fish", "smoking gun"]; //I’m telling you to imagine that these exist, they are on stage.
var stage2 = ["recipe"]; //I’m telling you to imagine that this exists. This time, it’s an abstraction you need to imagine.
assertEqual(stage1, stage2); //Where I said, this is true and you can challenge me if it isn’t.

I can’t think of any human language that uses an open class of arbitrary names, like proper modifiers, as pronouns. But why not, it’s convenient. If I’d chosen the names “this” and “that” then it would be pretty readable.

Another problem I ran into was how to do set operations. Set operations need to be able to yield a new set, and then discourse needs to be able to refer to that new set without repeating the recipe.

var stage1 = [All the barbers who shave.]
var set2= stage1 – [barbers who shave themselves]
var stage2 = [All independent people] intersect set2
var stage3 = [poor people]
assertEqual(stage2, stage3)
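The pseudo code above can be made runnable with JavaScript's built-in Set. This is a sketch only. The helper names `difference`, `intersect` and `sameSet` are written out by hand for the example, and the people in the sets are invented so that the final check comes out true.

```javascript
// Elements of a that are not in b.
function difference(a, b) {
  return new Set([...a].filter(x => !b.has(x)));
}

// Elements that are in both a and b.
function intersect(a, b) {
  return new Set([...a].filter(x => b.has(x)));
}

// True when both sets hold exactly the same elements.
function sameSet(a, b) {
  return a.size === b.size && [...a].every(x => b.has(x));
}

// Made-up people, chosen so the assertion holds.
const barbersWhoShave = new Set(["ann", "bob", "cy"]);
const shaveThemselves = new Set(["bob"]);
const independentPeople = new Set(["ann", "dee"]);
const poorPeople = new Set(["ann"]);

// Each operation yields a new set with a name, so later discourse
// can refer to it without repeating the recipe.
const set2 = difference(barbersWhoShave, shaveThemselves); // ann, cy
const stage2 = intersect(independentPeople, set2);         // ann
console.log(sameSet(stage2, poorPeople)); // true
```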

Posted in Bresenish | 1 Comment

The Sound of Spoken Dothraki is the Sound of 1000 Dying Metaphors

Sorry, I just thought that would be a nice title. I subscribe to Pinker’s idea that those pretty metaphors in our language (MONEY IS WATER) are by and large dead metaphors, since the context where they were made passed long ago. The same goes for the sheep metaphors in Icelandic (Nej, SHEEP er KRONA!). And for horse metaphors, were I to hypothetically write/speak Dothraki as a non-horse owning modern amateur language hobbyist. [Wait, are there horse metaphors in Dothraki? I don't really know, I don't have a 300 page ref guide to check.] So back to what I really wanted to write about:

As I've said before, learning a language poorly takes a long time. So as an indiscriminate learner of languages, very rapidly I have to get picky about what I bother to study. I like languages that are small. Oddly, Klingon is small. The active community made the vocabulary closed and "de facto" capped at about 3000 words, which grows by about 1 every few years. I think one of the more recent new words was "monk". Dothraki is scheduled to be about 7000+ words, which is about the vocabulary at which people begin to report they feel fluent in, say, French or Spanish. So by that measure it isn't small. [By published vocab it is small, but texts written constrained by 500 words are not going to be compat with future texts unconstrained by 500 words.] So I ignored it for a few years, until I finished season 1 of GoT. I was annoyed that I had to switch the subtitles back from Swedish to English for the Dothraki sections.

It appears that the reference grammar of Dothraki is at 300+ pages now. Where is the book? I’m wondering if, like Na’vi, Dothraki is a prisoner of the needs and concerns (or more likely, the lack of needs and concerns) of the movie makers.

Movie makers want a language that isn’t constrained by backwards compat issues (this became a nuisance in the history of Klingon), that is reasonably easy to pronounce by actors, makes the movie sound better than ook-gook-poop nonsense, etc.

What fans want: A published lexicon, enough words in enough semantic domains to get by (but not too many, this is a commercially useless skill you know?), stagnation in the development of grammar, pedagogical grammars (lessons rather than a phonetics/morphology/syntax/discourse tome), a stock of canonical corpus texts, a safe harbor for using the language without risk of cease and desist letters.

Let’s score Dothraki so far:
Lexicon – cobbled together by fans by reverse engineering leaked & in movie material
Grammar – Unpublished, so I presume that stuff beyond the current movie text is subject to change
Pedagogical grammars – Again, unpublished, lessons cobbled together by the reverse engineering of fans.
Canonical texts – Laboriously copied down by fans
Safe harbor – Not sure (I know for Klingon, at least the KLI can publish books & make money. I couldn’t find the current status for Na’vi. Initially Frommer acted as if he was pretty constrained about publishing much & I’d assume that meant fans were similarly constrained) (And yeah, what language we whisper behind closed doors, Hollywood couldn’t give a flying f*k, and for usage up to but not including publishing for significant money, I can’t imagine a lawyer could be motivated to squash it. But take down notices are always a risk when posting fan-fic or stuff that borders on that)
And then there is what I keep thinking fans want, but don’t know it yet. They want governance… not a great Khal to crush them under the heel of his boot (er, horse’s shoes?), but more like the Tiddly Wink Tournament official rules. Are loan words from English okay? (Like EO) Can words change part of speech to fill a lexical gap? Can new words be coined? Which community proposal for weights, measures, calendars and the Mohs scale should be used?

Often I’ve been criticized for not understanding conlanging, that one is supposed to enjoy it not by actually using a language, but by sitting in a large chair and reading the reference grammar and dictionary and being amused by that. Well, follow your joy, I’ll follow mine. I’m always shopping for languages to use, even if the number of years left in my life doesn’t really warrant learning many more.

Anyhow, I’ve got 45 minutes until the end of my holiday, maybe I can make some progress on that red Lojban book. At least it has been published.

Posted in Uncategorized | 3 Comments

Baby’s First Language

So, some things my baby likes to say (he’s 2 months old, so according to most charts he’s not supposed to express any communicative skills for 2 more months).

At about 3 weeks he said “no, ba-gwa, no, ba-gwa, no” (no = English “now” or Japanese “nao”). Google translate said it was probably Welsh but failed to translate it to anything sensible.

Agoo. No kidding, this means “I sneezed.” It sounds related to achoo.
Mam. No kidding, he says this when he is upset and crying, and in context he appears to want Mom.
Owh. I’m going to cry in a moment. (or possibly, it means, “Can’t you just give in to my demands already?”)
And he said something once that sounded like “Hello”, but that appears to have been a one-off production for the moment.

We’re keen to teach baby real ASL. The boy has spent the first two months of his life with his hands in tight fists. Not very conducive to ASL signing. The first age of signing would drop if the system worked with clenched fists. One book I read said the lower bound for first sign is 4 months.

Recently, often when he’s lying on his back waiting to be picked up, he will raise his index finger while the rest of the hand is still a fist. I don’t know if it means “pick me up”, but we try to react to it as if it did. Pointing is a pretty early expression anyhow. So we likely have his first protogesture.

He also makes a fist with pinkie up. This is close to the sign for play/fun in ASL, something I sign to him a lot when he is listening to music. Again, I can’t tell whether these gestures really match up to that thought in his head, but I try to react to pinkie-up as if it means “play/fun” or music. The ASL sign for music requires making a two arm sweeping gesture, which is beyond what the boy can do right now. His arms are sort of jerky and all over the place.

He doesn’t do the up sign when already picked up, nor when he’s nursing.

Symbol Boards
In the disabled community, the idea of communication boards, where you communicate by pointing at successive symbols, has been bouncing around. We have the white on black baby book. It is a bunch of monochrome black and white (no grey scale) pictures. The boy finds it very interesting. This is odd because some days it’s hard to get him to look at other books at all. So if I were to teach a toddler a pointing board, I would use a black and white or white on black symbol set: no colors, nothing intricate.

Posted in Uncategorized | Comments Off

Orphaned projects in the world of toki pona, aka conlanging (err..conlexing when you aren’t the inventor)

This is about conlexing, not conlanging. In a conlex (article on definition forthcoming), there is a community of users and they don’t take well to the deep reforms that you might see in independent conlanging or collaborative conlanging. In a collaborative conlang, people might collectively decide to switch from nominative-accusative to ergative-absolutive and that would be fine, because no one was speaking it off the cuff anyhow and no one had invested 100 hours into memorizing flashcards. For a conlex, you just aren’t going to convince people to re-do that 100 hours of flash card memorizing and habit-undoing.

So back to toki pona, a project that I have been planning to take a break from in favor of spiffying up my Russian.

Big Projects.
A field linguist’s reference grammar. The hard part here is tying everything back to community corpus text.
The phrasal dictionary. Writing a dictionary that tracks down an example community corpus usage is hard.

Medium Projects
- Watching the net for toki pona activity. I use RSS, twitter, and a google search alert to find toki pona stuff as it pops up.
- Reposting new tp content on toki lili & twitter (or the main forum)
- Welcome committee. Let people know that they will be corrected a lot. Correct people a lot. Respond to questions.
- Self appointed tutor. Lesson plans need to be written.
- Translate the foundational docs.
- News contriver. The inventor used to contrive news by adding a new word to the lexicon every other year or so. It’s a big deal if the lexicon is all closed classes and there are only 125 or so words. The threshold for what counts as news is pretty low, but if someone isn’t out there contriving news, the conversation dies out.

Small Projects
Systems. Especially ones that don’t call on lots of proper modifiers.
- Number systems. Can anyone make a truly quick to say number system?
- Spelling systems.
- Calendar systems.
- Writing systems.
- Science systems. From the Mohs scale (geology) to biological taxonomy systems to systems for describing the life cycle of a star. It’s an endless pit of small projects.
- Monitor the wikipedia article. These get inaccurate and fixing them is hard because of original research restrictions. So if I do a survey and figure out the true number of “fluent speakers”, I’m not supposed to go fix the article and post a link to my blog. Someone else is supposed to do it.

Posted in toki pona | 1 Comment

A very, very verbose Tower of Babel conlex implemented in JavaScript

This is my conlex called “Bresenish”. It’s set inspired. The idea is to do for programming data structures what lojban did for propositional logic. I tried this with C# and that was pretty verbose. JS is just as verbose but for different reasons (no built in foreach loops for one, many string manipulation functions missing, etc- JS did have fewer type declarations).

I wish I could write the data structures of a message in an IDE (integrated development environment), then have a computer serialize it to a speakable conlex. I’m not too concerned about deserialization, since humans are smart, but I notice that while writing in the IDE, the computer could count the potential parses to see if the message is too ambiguous. Unlike lojban, the goal isn’t to have a single possible parse tree, since I am assuming that the deserializer is a human brain, which has other knowledge available to aid in deserialization. Imagine if you had the 3 words “mouse, cat, chased”: no matter the order, you can use your knowledge of small mammals to figure out what scenario is most likely.

You can copy the code below to jsFiddle to see the unimpressive results.


//Write two sentences of the Tower of Babel story as an English-to-English translation.
//For the conlex version, imagine foreign text in all the strings.
function writeTowerOfBabelStory() {
    var elements = new Object();
    elements.Clay = "Clay.INANIM";
    elements.Tower = "Tower.INANIM";
    elements.Bricks = "Bricks.INANIM";
    elements.Men = "Men.ANIM";

    var relations = new Object();
    relations.Bake = "Bake"; //Yields a new element
    relations.Build = "Build"; //Yields a new element
    relations.Say = "Say"; //Yields a new element (that which was said)
    relations.Intersect = "Intersect"; //Yields a new set
    relations.Union = "Union"; //Yields a new set
    //Each element in one set is equal to an element in the other. Yields a truth value.
    relations.IdenticalTo = "Identical to";
    //Two sets describe the same reality, but with non-matching elements. Yields a truth value.
    relations.DescriptionFor = "Is another description for";

    var pronouns = new Object();
    pronouns.anim = "the animate";
    pronouns.inanim = "the inanimate";
    pronouns.element = "an element";
    pronouns.actsOn = function (a, b) {
        return a + " acts upon " + b;
    };

    var anaphora = new Object();
    anaphora.ThisSet = "this set";
    anaphora.ThatSet = "that set";
    anaphora.TheOtherSet = "the other set";
    anaphora.YeildsAnNewElement = "previously yielded new element";

    //Story- The men made bricks out of clay.
    var bricksTopics = new Array();
    bricksTopics.Name = anaphora.YeildsAnNewElement;
    bricksTopics.push(elements.Clay, elements.Bricks, elements.Men);
    var brickRelations = new Array();
    brickRelations.push(relations.Bake, relations.Build);
    var obviationBricks = new Array();
    obviationBricks.push(
        pronouns.actsOn(pronouns.anim, pronouns.inanim)
    );

    //Story- The men made a tower out of bricks.
    var towerTopics = new Array();
    towerTopics.Name = anaphora.YeildsAnNewElement;
    towerTopics.push(elements.Tower, bricksTopics.Name, elements.Men);
    var towerRelations = new Array();
    var obviationTower = new Array();
    obviationTower.push(
        pronouns.actsOn(pronouns.anim, pronouns.inanim)
    );

    var babel = serialize(bricksTopics, brickRelations, obviationBricks);
    babel += "<br/>";
    babel += serialize(towerTopics, towerRelations, obviationTower);

    //Optional, list # of possible meanings (stub, not implemented)
    function possibleMeanings(topics, topicRelations, obviations) {
        //Each pair of elements related by topicRelations.
        //Selecting only the possible using the obviations
    }

    //Turn one sentence's topics, relations and obviations into a speakable string.
    function serialize(topics, topicRelations, obviations) {
        var utterance = "";
        utterance += "<br/>So there were these ";
        for (var i = 0; i < topics.length; i++) {
            utterance += topics[i] + ", ";
        }

        utterance += "<br/>some of which were related to the others via the relations of ";
        for (var j = 0; j < topicRelations.length; j++) {
            utterance += topicRelations[j] + ", ";
        }

        utterance += "Who acted on who?<br/>";
        for (var k = 0; k < obviations.length; k++) {
            utterance += obviations[k] + ", ";
        }

        return utterance;
    }

    return babel;
}

//Assumed display step (the listing has none): show the story in the page.
document.body.innerHTML = writeTowerOfBabelStory();
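The possibleMeanings stub in the listing is empty, so here is a minimal sketch of how the ambiguity check might work. It assumes a "meaning" is just an ordered (agent, relation, patient) triple and that the only obviation rule is "the animate acts upon the inanimate". The names enumerateMeanings, animacyOf and animateActsOnInanimate are made up for this illustration and are not part of the original listing.

```javascript
// Hypothetical sketch: list every candidate reading of an utterance.
// A reading is an ordered (agent, relation, patient) triple.
function enumerateMeanings(topics, topicRelations, canActOn) {
    var meanings = [];
    topics.forEach(function (agent) {
        topics.forEach(function (patient) {
            // Skip self-pairs and pairs the obviation rule forbids.
            if (agent === patient || !canActOn(agent, patient)) {
                return;
            }
            topicRelations.forEach(function (relation) {
                meanings.push(agent + " " + relation + " " + patient);
            });
        });
    });
    return meanings;
}

// Elements are tagged like "Men.ANIM"; the tag after the dot is the animacy.
function animacyOf(element) {
    return element.split(".")[1];
}

// Assumed obviation rule: only the animate may act upon the inanimate.
function animateActsOnInanimate(agent, patient) {
    return animacyOf(agent) === "ANIM" && animacyOf(patient) === "INANIM";
}

var sketchTopics = ["Clay.INANIM", "Bricks.INANIM", "Men.ANIM"];
var sketchRelations = ["Bake", "Build"];

// With no world knowledge at all, any topic may act on any other topic.
var unconstrained = enumerateMeanings(sketchTopics, sketchRelations, function () {
    return true;
});
// With the animacy rule, only the men can be the agent.
var constrained = enumerateMeanings(sketchTopics, sketchRelations, animateActsOnInanimate);

console.log(unconstrained.length + " readings without obviation");
console.log(constrained.length + " readings with obviation");
```

The point of comparing the two counts is that the serializer could warn the writer when the number of surviving readings is too high for a human listener to sort out, without ever demanding that it drop to exactly one.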

toki pona and Orwell’s Newspeak

Wikipedia has for a long time said this nonsense:

This goal, together with Toki Pona’s deliberately restricted vocabulary, has led some to feel that the language, whose name literally means “simple language”, “good language”, or “goodspeak”, resembles George Orwell’s fictional language Newspeak.[6]

First off, what some people feel is about as encyclopedic as a fart. What matters is what is defensible on some standard of truth, be it sociological (conlangs are things that people do, so let’s study them) or linguistic (conlangs might even be languages the way French is, so let’s study them).

The “They don’t have a word for it” trope.
Newspeak, in the fictional world, was for expressing the fictional English Socialism and served as a means of totalitarian mind control. Vocabulary was restricted on the probably defective idea that if you don’t have a word for something (e.g. revolution, resistance, protest), you can’t think about it or do it.

toki pona isn’t primarily, or even tertiarily, trying to control thought or to prevent unhappiness by taking away the words for unhappiness. In fact, the language’s lexicon isn’t all that happy, with words for death but not life, and other oddities. The only way you could think that toki pona was created by a Newspeak-like process of removing words incompatible with a philosophy is by not actually looking at the lexicon. toki pona’s lexicon comes from a selection process more akin to Basic English, where high-frequency, highly polysemous words are chosen and narrow ones are left out.

toki pona and the philosophy of simplicity
Something can express a philosophy and something can be inspired by a philosophy. The Wizard of Oz is often said to have been inspired by the monetary politics of the gold standard. If so, it does a lousy job of expressing it; most people don’t get the allegory. At a museum you might see a piece of art inspired by some philosophy. Without a cheat sheet, I bet you’d be hard pressed to figure out what philosophy generated what art (short of obvious hints like crucifixes). toki pona’s design and recommended use were inspired by a philosophy of simplicity. It doesn’t express any philosophy in use. You can write any message you want in toki pona, and they will all be equally difficult to read and equally verbose.

Which one resembles a language, which resembles an idea for a language
Newspeak is an artlang that, in the real world, isn’t defined enough to do squat with, so a linguist couldn’t really do much with it. toki pona, by dint of the effort of its fans, is some percent of the way to being a language: people use it online for communication. No one uses Newspeak for anything, except as a rhetorical device for criticizing the way people choose words to encourage listeners to agree with their political views.

toki pona is not an artlang. It is not embedded in any fictional work and does not have a conculture associated with it. Also, unlike with a typical artlang, fans are expected to memorize the words and practice the grammar, and they have done so, to the point that they can read and write texts for consumption by other people on the internet. toki pona’s foundational documents are pedagogical, not reference grammars and dictionaries written primarily to entertain, although I suppose anyone could potentially find anything amusing.

Structural Differences
toki pona’s lexicon is closed except for proper nouns (aka proper modifiers). Newspeak’s vocabulary is open to all technical and scientific words. I suspect the effect of this would be to impoverish people’s basic vocabulary while leaving them a huge vocabulary of scientific words. But we don’t know: there isn’t a complete spec for Newspeak and there isn’t a community of people trying to speak Newspeak into existence.

Newspeak is derivationally agglutinative. toki pona is isolating.

Newspeak is a condialect of English. toki pona might as well be a priori: it borrows little syntactically, and the lexical borrowings might as well have been a priori too, since the semantic range of a word in its source language is irrelevant in use and generally doesn’t help you recognize what a word means, even if you speak the source language.

Phonetically, Newspeak is English. toki pona’s phonology was designed the way an a priori auxlang designer might design it, by picking sounds that are globally common.

I recommend this fix to the Wikipedia article:

This goal, together with Toki Pona’s deliberately restricted vocabulary, triggered some gaseous, flatulent airbag to fart that the language, whose name literally means “simple language”, “good language”, or “goodspeak”, resembles George Orwell’s fictional language Newspeak. Meanwhile, other people know that this non sequitur is irrelevant, unrelated nonsense, detached from any serious analysis of toki pona as a small social movement or as a spec for a small language.[6]

(Why don’t we instead compare tp to Klingon, because Klingon’s vocab is in practice fixed since Marc Okrand only coins a few words every once in a while? Or compare it to Lingua Ignota, because the inventor had religion/philosophy on their mind at the time? Or gah! Why bother? There isn’t a tidy pair like “Esperanto is like Interlingua”. Actually, one fake language it is kind of similar to is Sona, but that isn’t a very parallel fit either, and since no one knows anything about Sona, it wouldn’t be a very enlightening comparison in an encyclopedia article.)


Toddler conlangs- aka idiolects, plans to teach toki pona to baby

I read about this a long time ago: the story was that twins would speak to each other in their own language. But while I was reading Brain Rules for Baby (which happens to be on sale on Amazon, cheap, $3), the author mentioned that his own son used “dah” to mean vehicle, with modified versions for car, plane and boat. A boat was a “wet-dah”. What is amazing is that the author related this story as just an amusing anecdote, unaware of what a big deal this is for research on creoles, the origin of language and the “in-built-ness” of language. Elsewhere, I’ve read that kids creating language is a rare thing, the result of contact situations, twins, or parents incompetent in their own language (e.g. immigrants who refuse to speak their mother tongue but can’t speak the local one very well either). It might be that no one is paying attention and people just attribute toddlers’ language innovations to nonsense or language errors.

Teaching toki pona to baby
Ha, not like you think. I’ll be speaking Russian to baby, mom will speak English (one-parent-one-language) and when the baby is near his vocab spurt, I plan to do 30 hours of imaginative play using sock puppets and those sock puppets speak toki pona. It’s important for babies to work out who speaks what language, so I can’t be seen as the one that speaks toki pona, but it’s fine if sock puppets do.


How not to put philosophy into a language

This is a follow-up to my last post, “Conlangs for expressing a philosophy”.

I suppose you can use any vessel you’d like for expressing a philosophy: a prose book, fortune cookies, songs, or even a refrigerator manual or a dictionary. In the case of the refrigerator manual, the fridge stuff is just a distraction from your main message. (And hey, there is a real book, Zen and the Art of Motorcycle Maintenance.)

Editorializing Definitions
“A banker is someone who offers an umbrella only when it isn’t raining”

“fun: fun is going to church, does not apply to drunkenness, dancing or gossiping”

“There is no word for traitor”

These aren’t definitions that a real lexicographer would accept. Words are as words are used. In a conlang with users, the words are as the fans use them; the definitions in the foundational documents are just a starting point. As soon as the language is born, there will be lexical gaps, and those gaps will be filled by a variety of strategies: expanding the definitions of existing words, wordy circumlocutions, and so on.

Words don’t work this way. Language is very declarative: it says what is, not what should be. Language describes the inner reality of real people and you can’t dictate their inner reality. You can just provide some words and a recommended initial usage that spans the gap between your inner reality and someone else’s. As you pair any lexicon with a given inner reality, you will see different usage patterns. For example, in modern Russia, people routinely call businessmen bandits and criminals, having grown up hearing that sort of thing in school and just about everywhere else. In the West, where the rule of law works well enough, the local baker would be a businessman, not a bandit, and a Westerner would make the same distinction whether he were speaking Russian or English. And the Russian may fail to make the distinction whether he were speaking English, Russian, or a fake language with pro- (or anti-) capitalist editorializing in the official definitions.

Salvaging a bad idea
Let’s imagine a communication system really did have the above three rules regarding bankers, fun and traitors. To keep fans from just ignoring them, you’d need to provide words (or suitable lexicalized phrases) for “banker, as in a person who lends money or holds money for safekeeping”, “fun, as in dancing, drunkenness and gossiping”, and a compound word or phrase that means the same thing as “traitor”. The lack of a short word for traitor will likely be temporary: useful phrases get lexicalized, shortened and turned into words through time and use. The attempt to redefine banker and fun failed; instead we created two jargon words.

Anyhow, an entire language is an awfully big and cumbersome vehicle for expounding on a philosophy. If you are making a language to express a philosophy and all you have are jargon and editorializing definitions, then you might as well extract that into a discussion of philosophy and “how-things-should-be”.

(And if you haven’t read my other post, please do. In that one I’m more optimistic about getting some philosophy into a language via grammaticalizations, pronoun systems, etc.)


Content Management for Conlangs

I’m considering my options for posting the evolving definition and learning materials for a small conlang.

Plain HTML. Evolves poorly. Easy to set up for a single document.

Wiki. Evolves well, but at the end it will look like a bunch of disconnected pages. Some versions, like MediaWiki, have such poor security they get overrun by spam. Easy to turn into a bunch of stubs if you aren’t careful.

Blog. Well, this blog isn’t a good option. I hate mixing streams of content with different audience. This blog is for just anyone interested in fake languages. Inserting a bunch of lessons on a specific language is just noise.

A dedicated blog isn’t a very good option either because, by default, a blog makes recent content very visible and old content less visible. A conlang has about three levels of content. The first is the expository description for tourists and people deciding if they want to try it out. The second is a lesson plan, workbook and flash cards, and the third is a forum for posting texts, questions, etc. Comments on a blog do a poor job of allowing the community to initiate a discussion.

The order in which content is created has nothing to do with the order in which content is most digestible.

Email lists. These are easy to join but scale poorly: at more than a dozen emails a day, people start to ignore the list, send everything to a folder that they ignore, or use only the web interface, which makes a list behave more like a forum. I have no idea what the etiquette is for mailing lists where it is just the creator trying to create most of the content, but I, for one, wouldn’t want to be constantly trying to create conversation on a mailing list that didn’t have critical mass, whereas I can post daily to a blog and there isn’t a problem if no one is answering (at the moment).

Mailing lists assume there is someone ready to respond *right now*. Blogs allow people to respond years later.

Forum. Forums have a high entry cost, but work well for core community participants. Forums work well for high volumes of messages and poorly for low volumes. Some people just don’t like forums. Forums also have rules of their own and can attract people who enjoy other forums, which may have a culture that you’d rather not import into your life.

Miniblogging. This is a good place for learners’ discussion, and maybe a way to teach: breaking the lessons and vocab into a tweet or two a day is a creative way to drip-feed the world the knowledge to use a language when people otherwise might not have the time to devote a 30-hour block to it.

Moodle and the like. Moodle is an online lesson thing. Most computer-based training is multiple-choice-quiz oriented, e.g. you read some text, answer a multiple-choice question and repeat. It sounds like a lot of work to create one well, and it isn’t clear how many people would be comfortable with that sort of training. Let’s say that 500 people try to learn a language: will their budgeted attention span expire before they get used to the user interface? Anyhow, I’ve got a Moodle up, and I’m starting to think the real burden is going to be on the course designer and less so on the course user.

Chat. There isn’t a single dominant chat technology: people use IRC, Skype, and many others. For conlang projects, a key feature is the ability to get consent to record logs and to record the logs for corpus research.
