An unimplemented idea for conlang phonotactics

Phonotactics: the recipes for building new words.

So someone created a fake language. But they died, got a real job, or otherwise abandoned it. How do you move it forward if they didn't document the phonotactics?

There are many word generators, and most of them use very small domain-specific languages (DSLs). For example:

Valid Patterns: CV, CVCV, CCVV, VCV

There is usually also some shorthand to reduce the total number of rules. For example, (C)VCV means the pattern may optionally start with a consonant, so it covers both VCV and CVCV.
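To make the notation concrete, here is a minimal sketch of how such a DSL could be checked against a word. It assumes one letter per sound, and the consonant and vowel inventories, as well as the names `expand` and `matches`, are hypothetical illustrations rather than any particular generator's syntax.

```python
import itertools
import re

# Hypothetical inventories; a real language would supply its own C and V sets.
CONSONANTS = set("ptkmnslr")
VOWELS = set("aeiou")

def expand(pattern: str) -> list[str]:
    """Expand optional (...) groups, e.g. '(C)VCV' -> ['VCV', 'CVCV']."""
    # The capturing group makes re.split keep the optional chunks in the result.
    parts = re.split(r"(\([^)]*\))", pattern)
    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            choices.append(["", part[1:-1]])  # either omit or include the group
        elif part:
            choices.append([part])
    return ["".join(combo) for combo in itertools.product(*choices)]

def matches(word: str, pattern: str) -> bool:
    """True if the word fits any expansion of the pattern, letter by letter."""
    classes = {"C": CONSONANTS, "V": VOWELS}
    return any(
        len(word) == len(p) and all(ch in classes[sym] for ch, sym in zip(word, p))
        for p in expand(pattern)
    )
```

With these inventories, `matches("ama", "(C)VCV")` and `matches("tama", "(C)VCV")` both succeed, while `matches("tma", "CV")` fails on length.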

Now the definitions of C, V and the list of valid patterns are all fairly simple data. So why not generate rule sets at random and then score how many of the existing words each one can account for? To further optimize the search, mutate the best sets or genetically cross them: take half of the rules from each of two high-scoring rule sets and check how well the merged rule set performs.
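A minimal sketch of that search, under several assumptions that are not part of the idea above: one letter per sound, a rule set reduced to "which letters are vowels" plus a set of plain C/V patterns (no optional groups), and a small per-pattern penalty in the score. Without some penalty, a rule set that simply lists every observed shape would always win. All function names and parameter values here are hypothetical.

```python
import random

# A rule set is (vowels, patterns); every letter not in `vowels` counts as C.

def shape(word, vowels):
    """Map a word to its C/V skeleton, e.g. 'tama' -> 'CVCV'."""
    return "".join("V" if ch in vowels else "C" for ch in word)

def fitness(rules, words, penalty=0.02):
    vowels, patterns = rules
    covered = sum(shape(w, vowels) in patterns for w in words)
    # Assumed penalty per pattern, so "list every shape" does not trivially win.
    return covered / len(words) - penalty * len(patterns)

def random_pattern(max_len, rng):
    return "".join(rng.choice("CV") for _ in range(rng.randint(1, max_len)))

def random_rules(alphabet, max_len, rng):
    vowels = frozenset(ch for ch in alphabet if rng.random() < 0.5)
    patterns = frozenset(random_pattern(max_len, rng) for _ in range(rng.randint(1, 6)))
    return vowels, patterns

def mutate(rules, alphabet, max_len, rng):
    vowels, patterns = set(rules[0]), set(rules[1])
    if rng.random() < 0.5:
        vowels ^= {rng.choice(alphabet)}  # move one letter between C and V
    elif patterns and rng.random() < 0.5:
        patterns.discard(rng.choice(sorted(patterns)))  # drop a pattern
    else:
        patterns.add(random_pattern(max_len, rng))  # add a pattern
    return frozenset(vowels), frozenset(patterns)

def crossover(a, b, rng):
    # Take roughly half of each parent's patterns and one parent's vowel set.
    pa, pb = sorted(a[1]), sorted(b[1])
    child = set(rng.sample(pa, (len(pa) + 1) // 2)) | set(rng.sample(pb, (len(pb) + 1) // 2))
    vowels = a[0] if rng.random() < 0.5 else b[0]
    return vowels, frozenset(child)

def evolve(words, generations=300, pop_size=60, seed=0):
    rng = random.Random(seed)
    alphabet = sorted(set("".join(words)))
    max_len = max(len(w) for w in words)
    pop = [random_rules(alphabet, max_len, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: fitness(r, words), reverse=True)
        elite = pop[: pop_size // 4]  # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), alphabet, max_len, rng))
        pop = elite + children
    return max(pop, key=lambda r: fitness(r, words))
```

For example, `evolve(["tama", "kalu", "ami", "sot", "pe"])` returns the best (vowels, patterns) pair found. The penalty weight is the main knob: raise it and the search prefers fewer, broader patterns; lower it and it tolerates a longer list in exchange for coverage.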

This would make it possible to supply a list of sample words, derive a rule set from them, and then generate a list of potential new words.
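Once a rule set exists, the last step is straightforward. The sketch below assumes a rule set given as consonant and vowel inventories plus plain C/V patterns; the function name `generate_words` and its parameters are hypothetical. It fills randomly chosen patterns with random sounds and skips words that already exist in the sample.

```python
import random

def generate_words(consonants, vowels, patterns, n, known=(), seed=None):
    """Fill random patterns with random sounds, skipping known words and repeats."""
    rng = random.Random(seed)
    pools = {"C": sorted(consonants), "V": sorted(vowels)}
    seen, out = set(known), []
    attempts = 0
    # Cap the attempts so a tiny rule set cannot loop forever looking for new words.
    while len(out) < n and attempts < n * 100:
        attempts += 1
        pattern = rng.choice(sorted(patterns))
        word = "".join(rng.choice(pools[sym]) for sym in pattern)
        if word not in seen:
            seen.add(word)
            out.append(word)
    return out
```

For instance, `generate_words("ptk", "ai", {"CV", "CVCV"}, 10, known=["pa"], seed=1)` yields ten distinct candidates such as CV and CVCV shapes, none of which is "pa".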

What this won't do: it won't capture dependencies between positions in a word. For example, in CVCV the two vowels may tend to be similar, because people have lazy tongues and the vowels drift toward each other or become identical. But with enough computation, defects like these might become unimportant.

This entry was posted in machine assisted conlanging.

One Response to An unimplemented idea for conlang phonotactics

  1. Larry Smith says:

    I would suggest a Markov chain algorithm initialized with the available words. That will then give you a potential word list using the probabilities of certain letter combinations in the sample, achieving the above with much less work.
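The commenter's suggestion can be sketched as a character-level Markov chain. Assumptions not stated in the comment: a context of two letters (`order=2`), "^" as start padding and "$" as an end marker (so neither may appear in the sample words), and the hypothetical names `train` and `markov_word`. Storing every observed follower in a list means `random.choice` picks each next letter in proportion to how often it followed that context in the sample.

```python
import random
from collections import defaultdict

def train(words, order=2):
    """Record every letter that follows each `order`-letter context in the sample."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def markov_word(model, order=2, rng=random, max_len=12):
    """Walk the chain from the start padding until the end marker or max_len."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])  # frequency-weighted, since repeats are stored
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)
```

Trained on a sample such as `["tama", "kalu", "ami"]`, every generated word uses only letter sequences seen in the sample. That makes it cheap, but unlike an explicit rule set it yields no readable description of the phonotactics.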