Readability Scores for Conlangs

I was making good progress on my 30 day conlang as long as I had tools that matched the task of the moment (word generation, dictionary creation), and then I ran out of tools. I needed a random sentence generator, so I wrote one for a language I'm more familiar with, toki pona. Then I noticed that the generated sentences were really, really complex, but I didn't have a good way to quantify that. I wrote a first approximation anyhow.

The Flesch-Kincaid and Gunning Fog indexes both use sentence length. The latter also uses something reminiscent of the index of synthesis (number of morphemes per word), and the former uses syllable count, which is probably a proxy for an index-of-synthesis-type measure. Both use scalings and weights that aren't explained and were probably tuned by asking fluent speakers what was harder to read.
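For reference, the published English formulas are: Flesch-Kincaid grade level = 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59, and Gunning Fog = 0.4 × (words per sentence + 100 × complex words per word), where "complex" means three or more syllables. Here is a minimal Python sketch of both; the syllable counter is a crude vowel-group heuristic, not a dictionary lookup, so the counts are approximate.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def english_readability(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    words_per_sentence = len(words) / len(sentences)
    flesch_kincaid = (0.39 * words_per_sentence
                      + 11.8 * (syllables / len(words))
                      - 15.59)
    gunning_fog = 0.4 * (words_per_sentence
                         + 100 * complex_words / len(words))
    return flesch_kincaid, gunning_fog

print(english_readability(
    "The cat sat on the mat. It was not especially amused by the weather."))
```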

Dr. Frommer, on the Conlang podcast, also mentioned in passing that one of the observations about conlangs and natural languages is how much text expands in word count when translated from English to the conlang. I guess a conlang readability score would have to account for that somehow. (Obviously not all languages experience the same word, letter, or morpheme count expansion; some conlangs are explicitly designed to be shorthands.)

If every kind of sentence takes 2 times as many morphemes/letters/words/etc. in the conlang, then the scaling factor applied to average sentence length should be correspondingly smaller (here, half) of the factor used in the English readability scores for the results to remain comparable.

Mathematically, it is something like this:

conlang score = (English factor for sentence length / % growth or shrinkage when translating) × average sentence length
              + (English factor for word complexity / % difference between typical English and conlang words) × measure of word complexity
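As a sketch under those assumptions, here is one way that adjustment could look in code, reusing the Flesch-Kincaid weights from the English sketch above and dividing each weight by the measured expansion ratio. The expansion ratios and the example numbers are hypothetical placeholders; for a real conlang you would measure them from a parallel corpus of translations.

```python
def conlang_readability(avg_sentence_length,     # words per sentence in the conlang
                        avg_word_complexity,     # e.g. syllables or morphemes per word
                        sentence_expansion=2.0,  # hypothetical: conlang / English sentence length
                        word_expansion=1.0):     # hypothetical: conlang / English word complexity
    # Flesch-Kincaid grade-level weights for English.
    SENTENCE_FACTOR = 0.39
    WORD_FACTOR = 11.8
    BASELINE = -15.59

    # Divide each English weight by the corresponding expansion ratio so a
    # text that merely doubles in length doesn't double its score.
    return ((SENTENCE_FACTOR / sentence_expansion) * avg_sentence_length
            + (WORD_FACTOR / word_expansion) * avg_word_complexity
            + BASELINE)

# Example with made-up numbers: 14 words per sentence, 1.1 syllables per word,
# sentences assumed to run about twice as long as their English translations,
# words assumed to be about 70% as complex as typical English words.
print(conlang_readability(14, 1.1, sentence_expansion=2.0, word_expansion=0.7))
```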
