Dictionaries for toki pona

I read about dictionary making for Algonquin, a highly synthetic language with few unbound morphemes. Everything of interest is a bound morpheme. Full words necessarily drag a lot of other cruft along with them, as if a dictionary had a definition for “unsympathetically” but “sympathetic” wasn’t allowed to be a standalone word.

Surprisingly, toki pona is like that. toki pona has compound words. If you are a grumpy cat, you can call them collocations (words that commonly appear together), or you can just call them compound words, because they behave rather like two-stem words in languages with bound morphemes. Beyond that, we have “templates”.

Noun Phrases (content phrase)
jan pona. This is a perfect compound word. It takes modifiers, resists splitting, and has two “slots”: stuff goes before it and after it.

These phrases have little internal structure, which makes them useful for machine parsing: the traditional dictionary just works. You can look up words by their head word and life is beautiful.
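
A minimal sketch of that head-word lookup, in Python. The entries, the informal glosses and the `index_by_head` helper are my own illustrative assumptions, not part of any existing tool; the only rule it encodes is that the first word of a compound is its head.

```python
from collections import defaultdict

# Illustrative entries only; the glosses are informal.
COMPOUNDS = {
    "jan pona": "friend",
    "jan lawa": "leader",
    "tomo tawa": "vehicle",
}

def index_by_head(compounds):
    """Group compound words under their head word (the first word)."""
    index = defaultdict(list)
    for phrase, gloss in compounds.items():
        head = phrase.split()[0]
        index[head].append((phrase, gloss))
    return dict(index)

HEAD_INDEX = index_by_head(COMPOUNDS)
```

Looking up `HEAD_INDEX["jan"]` then gives every compound filed under jan, which is all a traditional alphabetical dictionary needs.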

kin la == really. Also a good compound word. It has two slots: you can put more la phrases before it and a sentence after it, and that is it.

Verb Phrase
Verb phrases are closer to templates because the head verb is one word.
kama sona. This isn’t a perfect compound word; it has three slots: [0] kama [1] sona [2]. The head verb is still kama, and you can add modals before it, and negation, intensification and adverbs after it. Stuff after sona describes sona, not the whole kama sona phrase.
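
To make the three slots concrete, here is a small Python sketch. The function name `fill_kama_sona` and its parameter names are hypothetical; it only joins whatever you put in each slot and does not check that the result is good toki pona.

```python
def fill_kama_sona(before="", after_kama="", after_sona=""):
    """Fill the three slots of the template: [0] kama [1] sona [2].

    before     -> slot [0], e.g. a modal
    after_kama -> slot [1], e.g. negation or an adverb on kama
    after_sona -> slot [2], modifiers that describe sona only
    """
    parts = [before, "kama", after_kama, "sona", after_sona]
    # Empty slots are simply skipped.
    return " ".join(p for p in parts if p)
```

For example, `fill_kama_sona(before="wile")` gives "wile kama sona", and `fill_kama_sona(after_kama="ala")` gives "kama ala sona". Note that the fixed words are not adjacent in the second result, which is exactly what a plain phrase lookup would miss.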

Templates are a lousy fit for a traditional dictionary. The head word could be in a variety of places. Sometimes the template doesn’t rely on any specific word, e.g.

x li y tawa z. x feels y towards z. (where y is usually pona or ike, x is usually an agent.)

I don’t even know where to put that in dictionary alphabetical order. I feel like I’m back in Algonquin again.

mun li pimeje e suno. eclipse. This almost doesn’t feel like a template anymore. To use it in a sentence requires extensive rework. It has at least 4 template points, not counting all the optional things available in a maximal sentence.

Other patterns.
kule lon palisa luka. Fingernail polish. This is also a template with significant internal structure.

Keep the templates separate from untemplated definitions.
Be explicit about the slots in templates.
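
One way these two pieces of advice could look in practice, sketched in Python. Everything here is an assumption for illustration: the two table names, the `{name}` notation for slots, and the helper functions. It uses the standard library's `string.Formatter` only to pull the slot names out of a pattern.

```python
from string import Formatter

# Untemplated definitions: an ordinary phrase -> gloss mapping.
PLAIN_ENTRIES = {
    "kin la": "really",
}

# Templates live in their own table, with every slot written out as {name}.
TEMPLATE_ENTRIES = {
    "{x} li {y} tawa {z}": "x feels y towards z",
    "{pre} kama {mid} sona {post}": "to learn",
}

def slots(pattern):
    """Return the slot names of a template, in order."""
    return [name for _, name, _, _ in Formatter().parse(pattern) if name]

def fixed_words(pattern):
    """Return the words of a template that are not slots."""
    words = []
    for literal, _, _, _ in Formatter().parse(pattern):
        words.extend(literal.split())
    return words
```

With the slots explicit, `fixed_words("{x} li {y} tawa {z}")` returns only li and tawa, so a tool could at least file the template under those words instead of guessing an alphabetical position for it.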

Unrelated advice:
Be wary of unwarranted glosses and translations.
jan Sonja said telo is sauce, so I guess it is.
If I say telo means rocket fuel, it’s an unwarranted translation unless there is some text to set that up.
