Soooo, this toki pona project of mine. I’m parsing community-generated texts, and I’ve got incompatible goals. On one hand, I want a parser that works: if you type English, it should blow up, and if you type beautiful, conservative toki pona, it should parse. On the other hand, if it is sloppy toki pona, I don’t really want to make a big deal of it. So you forgot to capitalize, forgot a period, forgot a closing quote, forgot the li, added a period instead of a comma before a li, and so on. I’m not going to add new grammar rules to try to deal with these. So at the moment, I normalize them. I just fix them.
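To make the "just fix them" step concrete, here is a rough Python sketch of that kind of normalizer. It is not the project's actual code: the function name `normalize` and the three repairs it performs (period before li, missing final period, missing closing quote) are picked for illustration. Capitalization and a missing li are left out because they need more context than a few string rules can give.

```python
import re


def normalize(text: str) -> str:
    """Repair a few common slips before a strict parser sees the text.

    Hypothetical helper for illustration only.
    """
    text = text.strip()

    # A period directly before "li" was almost certainly meant as a comma.
    text = re.sub(r"\.\s+(?=li\b)", ", ", text)

    # Add a sentence-final period when the text just stops.
    # Text that already ends in punctuation or a quote is left alone.
    if text and text[-1] not in '.!?:"':
        text += "."

    # Close a double quote that was opened but never closed.
    if text.count('"') % 2 == 1:
        text += '"'

    return text
```

The point of doing this before parsing is that the grammar itself stays small and strict, and all the forgiveness lives in one place that can be switched off.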
Then there is this:
meli li tawa en tan lon palisa.
Ignoring what it might mean, it’s a compound prepositional phrase, just like the English “No smoking in or around the school.” It could have been written:
meli li tawa lon palisa li tan lon palisa kin.
But that would just sound pedantic. Some stuff is sort of borderline.
How do you deal with noises?
jan li owi. => The guy said ouch!
I could fix it to:
jan li mu owi. => The guy made a noise like ouch.
But that looks pedantic and I don’t like the word mu all that much. It sounds too much like a cow and you have to add another word to indicate the actual sound.
How do you deal with defective names?
nimi mi li nimi ‘jan Laowi’
Fixing them would make parsing easier, but would get in the way of communication, especially if a name is already well known.
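One way to leave well-known names alone without loosening the parser is to check each name against toki pona's sound rules and fall back on a list of known exceptions. The sketch below is only an illustration in Python, not the project's code: `classify_name` and the `known_names` set are made up for this example. The sound rules it encodes are the usual ones: syllables are (consonant) + vowel + optional n, only the first syllable may drop its consonant, the syllables ji, wu, wo and ti are not allowed, and a final n cannot come before m or n.

```python
import re

# One syllable with its consonant: C + V + optional final n.
_SYLLABLE = r"(?!ji|wu|wo|ti)[jklmnpstw][aeiou](?:n(?![mn]))?"

# A whole name: the first syllable may start with a vowel,
# every later syllable must start with a consonant.
_WELL_FORMED = re.compile(
    r"^(?:" + _SYLLABLE + r"|[aeiou](?:n(?![mn]))?)(?:" + _SYLLABLE + r")*$"
)


def classify_name(name: str, known_names=frozenset()) -> str:
    """Return 'well-formed', 'known', or 'defective' for a proper name.

    Hypothetical helper; `known_names` is an assumed list of names that
    break the sound rules but are too established to rewrite.
    """
    if _WELL_FORMED.match(name.lower()):
        return "well-formed"
    if name in known_names:
        return "known"
    return "defective"
```

With this, Laowi comes out as defective (the o has no consonant in front of it) unless it has been added to `known_names`, in which case the parser can accept it as written.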