30DayConlang – Day 8

Since I’m making remarkably little progress, I decided to work on defining my goal, the thing that tells me, “Yes, you wrote a language and you wrote it in 30 days.” And if I could make that goal quantifiable, so that someone could visit a webpage and upload documents or fill out a survey, then one could take the next step of saying, “These people completed a language, these people didn’t quite finish in time.”

I submitted this question to the Conlang mailing list and will incorporate some of the ideas from there.

Done as in perfect vs done as in objectively somewhere along the line starting at [no effort] and ending at ["like a natural language"]
I very much have the philosophy of NaNoWriMo in mind. In NaNoWriMo, contestants write a 50,000-word novel in 30 days. At the end of that, they only really know for sure that they have 50,000 words. It could be 50,000 words of “Ni!” or it could be a rather readable story. The point isn’t that 50,000 words is magic; the point is that a quantifiable goal will motivate some people to do something rather than nothing.

While it feels good to say that language development will never be done, that it’s done only when it is perfect, that it must take an infinite amount of time, etc., it isn’t really motivating to contemplate that. In fact, even if conlanging really is like trying to drain the sea with a teaspoon, let’s just ignore that and work on something quantifiable. People need new languages; they need to be written and shipped. And language learners need to be able to decide which languages are ready to learn and which aren’t. If some conlangers don’t plan to ever reach that point, that’s fine: follow your joy. A 30 day conlang project is probably not going to fit your design agenda anyhow.

Morally better vs more complete.
I think I’ve made it clear here that quantification isn’t about writing a better language. A paragraph describing a really great idea for a conlang might be morally better than a language that has a dictionary larger than the OED, a corpus larger than the combined published works of all human languages to date, etc. But at the end of a month, I’d feel more pleased by accomplishing the latter, or something close to it, than by accomplishing the former. And I think numbers can be put on that.

Multidimensional measures.
Whatever measure turns out to be pragmatic, I expect it to be multidimensional. CALS is already chock full of conlang descriptions that exist only as a checklist of features, and there are plenty of conlangs that exist only as a long list of words with single-word English translations. With a little of each, it is possible to write something in a conlang; with only one of the two, almost nothing can be written. So these two measures are multidimensional and sort of multiplicative.

Some measures are additive, or at least sometimes additive. For example, a signed language will be complete without anything being written about the phonetic inventory or phonotactics.

Dictionary Size
Off the top of my head I thought of dictionary counting, which has some technical problems, especially with how you lemmatize your words. In a language with lots of morphology, it can be tricky to say what is a word and what is a part of a word. Word counting works best for isolating languages and worst for polysynthetic languages. Word count also can’t take into consideration the quality of the words: for example, a dictionary of commonly used words represents a more complete description of a language than a dictionary with a similar number of scientific and technical words.

A potential fix for the lemmatization problem, suggested by And Rosta
Count listemes. These are words with meanings that can’t be inferred from their parts. So write, do, work, rewrite, redo and rework would be about four listemes: re-, do, work, write (or maybe three, if derivational morphology goes in the grammar, not the dictionary).
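
As a minimal sketch of the listeme count, assume every dictionary entry has already been segmented into morphemes by hand. The `segmented` table and the `count_listemes` helper below are hypothetical, and the sketch assumes every derived word is fully compositional; an idiom or a derived word with an unpredictable meaning would have to be listed as a listeme of its own.

```python
# Hypothetical hand-made segmentation: each dictionary entry maps to its morphemes.
segmented = {
    "write": ["write"],
    "do": ["do"],
    "work": ["work"],
    "rewrite": ["re-", "write"],
    "redo": ["re-", "do"],
    "rework": ["re-", "work"],
}

def count_listemes(entries, grammar_morphemes=frozenset()):
    """Count distinct morphemes, skipping any that are assigned to the grammar."""
    listemes = set()
    for morphemes in entries.values():
        listemes.update(m for m in morphemes if m not in grammar_morphemes)
    return len(listemes)

print(count_listemes(segmented))           # 4: re-, do, work, write
print(count_listemes(segmented, {"re-"}))  # 3: re- is handled by the grammar
```

The segmentation itself is the hard part, and the code does nothing to solve it; it only shows that six dictionary entries can collapse to four (or three) countable items.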

Corpus Size
Words here can be counted, with some of the same caveats as with dictionaries. The word counts will be most comparable between similar languages (i.e. analytic compares to analytic, agglutinating compares to other agglutinating). To do interlanguage comparisons, you’d need similar texts. Unfortunately, there are few texts that people commonly translate, e.g. the Lord’s Prayer, the Babel Story, the UN Declaration of Human Rights. These are short and cover pretty specific semantic domains. Even a large text, like the entire Bible, doesn’t cover enough scenarios to say that the language is usable in modern contexts, as demonstrated by Modern Hebrew, which had to invent numerous neologisms to cope with modern situations.

Grammar Size
There are templates and lists of questions that attempt to cover a broad range of grammatical topics that typically occur in languages. WALS/CALS is one; the outline of “Describing Morphosyntax” is another. If you stick to these too closely, they will influence the design of your language. Imagine if you used your high school French textbook as a model: you’d end up with a French-like conlang. Many of these question sets can yield “Does not apply” if the language in question is too different from what the question set was expecting. For example, toki pona uses almost no morphology at all, so any morphology sections would be “skipped”. Should these skipped sections count as complete or incomplete?

Merely counting the words in the supporting documents works only if the general quality and depth are similar, and those criteria are hard to measure. On the other hand, this is the exact same problem that NaNoWriMo has when comparing a good 50,000-word novel and 50,000 words of “Ni!”

Maybe a compromise could be found by requiring that a description follow a series of sections and that a certain minimum word count be in each section, or in some percent of sections (to take into account that some languages have more going on in morphology than syntax or vice versa).
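
The section-based compromise could be checked mechanically. The following is only a sketch: the function name, the 200-word minimum, and the 75% threshold are all made-up assumptions, and a real site would have to pick its own section list and numbers.

```python
def sections_complete(section_texts, min_words=200, required_fraction=0.75):
    """Return (passed, fraction), where fraction is the share of sections
    that contain at least min_words whitespace-separated words."""
    if not section_texts:
        return False, 0.0
    met = sum(1 for text in section_texts.values()
              if len(text.split()) >= min_words)
    fraction = met / len(section_texts)
    return fraction >= required_fraction, fraction

# Toy description: the morphology section is empty, as it might be for an
# isolating language; the other three sections clear the minimum.
description = {
    "phonology": "word " * 250,
    "morphology": "",
    "syntax": "word " * 300,
    "lexicon": "word " * 220,
}
print(sections_complete(description))  # (True, 0.75)
```

Allowing a fraction of the sections rather than all of them is what lets a language with almost no morphology still count as described.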

(Some clarifications on the potential of this strategy came from Jim Henry; the phrasing is all my own.)

Highest Assessable Competency
This idea gets its inspiration from the competency exams people take to assess their skills in English, French, etc., such as the TOEFL and many others like it.

Let’s take the Black Speech of Tolkien. There is one sentence written in it. If someone were to write a competency test, at best a test-taker could test out as incompetent. On the other hand, Esperanto is so complete in this sense that you could test out as level 1, 2, 3, etc. up to “passes as a native,” since there are native speakers of Esperanto.

I like this idea, and I suppose in the 30DayConlang context, this could be implemented as writing a series of exams similar to the real-life exams, starting at the easiest and working up to the most advanced. The acid test of course would be to take these tests oneself and pass. But that certainly would take more than a month. It’s within the range of imagination to write a language in a month, but learning a language in a month is really a pipe dream, unless there isn’t much to it.

(General idea suggested by Sam Stutter, rephrasing all my own.)

Achievements are a common videogame system in which you can rack up points in a variety of domains. In the conlang sense, you might gain achievement points for translating certain texts, for completing a 1000 word dictionary, for writing a lesson. These achievements can be earned multiple times, so you can earn 20 lesson achievements, earn the 1000 words in the dictionary achievement five times, etc. Mathematically, this allows for certain interlanguage comparisons.

Language A: 5 achievements of type Q, 6 of R, 10 of S
Language B: 12 of Q, 18 of R, 22 of S
Language C: 1 of Q, 44 of R, 15 of S

B is clearly more complete than A, since it scores higher on every type, but C can’t be ranked against either without making some potentially arbitrary decisions about the weighting of scores. The nice thing about a website is that if I feel ambitious enough to do the programming, I could allow people to set their own weighting and sort the languages themselves.

An advantage of achievement scores is that people can compare languages based on criteria that they believe in. For example, if you think the grammar is a bunch of nonsense, then you could sort a list by quantity of corpus to decide who has a complete language and who hasn’t really been trying.
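
The comparison of languages A, B and C can be sketched in a few lines. This is only an illustration under my own assumptions: the `dominates` and `rank` helpers are hypothetical names, and the weights are whatever a visitor chooses to believe in.

```python
languages = {
    "A": {"Q": 5, "R": 6, "S": 10},
    "B": {"Q": 12, "R": 18, "S": 22},
    "C": {"Q": 1, "R": 44, "S": 15},
}

def dominates(x, y):
    """True if x scores at least as high as y on every achievement type
    and strictly higher on at least one (both must use the same types)."""
    return all(x[k] >= y[k] for k in x) and any(x[k] > y[k] for k in x)

def rank(langs, weights):
    """Sort language names by weighted achievement total, highest first.
    Achievement types missing from weights count as weight 0."""
    def score(name):
        return sum(weights.get(k, 0) * v for k, v in langs[name].items())
    return sorted(langs, key=score, reverse=True)

print(dominates(languages["B"], languages["A"]))  # True: B beats A everywhere
print(dominates(languages["B"], languages["C"]))  # False: C has more R
print(dominates(languages["C"], languages["B"]))  # False: B has more Q and S
print(rank(languages, {"Q": 1, "R": 1, "S": 1}))  # ['C', 'B', 'A']
print(rank(languages, {"S": 1}))                  # ['B', 'C', 'A']
```

With equal weights C comes out on top, while someone who only cares about achievement S puts B first, which is exactly the arbitrariness that user-chosen weights hand back to the reader.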

(General idea suggested by David Peterson, rephrasing all my own.)

Completeness by Translation Challenge
So the idea here is to take a large corpus in, say, English, and pick sentences at random. If the maximally competent speaker/user of the language at the moment can translate that sentence, and many more, then it is more complete than another language where the best speaker says, “There just isn’t a way to say that yet.”

Now this gets tricky with extendable languages and with deciding how inconvenient a translation can be before one stops accepting it as a translation. English can be translated into toki pona, but whilst doing it, one feels like they are inventing a lot and the result is sometimes clumsy and wordy. If one can find a way to say most things using toki pona, a language impoverished in most mechanisms lexical and syntactical, then I suspect many other languages that a man in the street might call “incomplete” might have some clumsy way to say just about anything.

All that said, this certainly could be done by survey. A reasonably honest and cooperative conlanger could take a survey, answer yes/no to “This sentence was translatable” and “I didn’t have to innovate to translate this sentence” for each sentence, and at the end count up the yes answers. In a long enough survey and across enough different people, the individual peculiarities of how people answer should disappear, leaving a rough statistic of completeness.
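
Tallying such a survey is simple arithmetic. As a sketch, assume each sentence yields a pair of yes/no answers; the `completeness` function and the toy responses are hypothetical, not data from any real survey.

```python
def completeness(responses):
    """responses: list of (translatable, no_innovation) booleans, one pair
    per surveyed sentence. Returns the share of sentences that were
    translatable, and the share translatable without innovating."""
    n = len(responses)
    if n == 0:
        raise ValueError("no responses to tally")
    translatable = sum(1 for t, _ in responses if t)
    without_innovation = sum(1 for t, ok in responses if t and ok)
    return translatable / n, without_innovation / n

# Four sentences: three translatable, two of those without inventing anything.
answers = [(True, True), (True, False), (False, False), (True, True)]
print(completeness(answers))  # (0.75, 0.5)
```

Keeping the two shares separate, rather than merging them into one score, preserves the distinction between “can be said somehow” and “can be said without coining anything new.”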

(General idea suggested by And Rosta, rephrasing all my own.)


One Response to 30DayConlang – Day 8

  1. Brian Barker says:

    Hi Matthew

    I don’t know if you’re interested but the Esperanto-Asocio de Britio will have an Esperanto stand at the London Language Show at the end of October.
    If you know of any Esperanto beginners there’s a taster course on Saturday afternoon as well.
    Tickets to the show are free, but you need to book using this link http://www.thelanguageshow.co.uk/page.cfm/link=7
    Amike salutas