Object Oriented Conlanging

If you don’t write code, you may feel like you’re the wrong audience.

So recently I’ve been on a kick of learning about object oriented library writing, especially for the most basic types. I’ve noticed that MSDN’s advice on designing basic types, combined with the exercise of actually writing a library, acts as a kind of model exploration exercise that leads to surprising discoveries.

Why a Conlang?
Sure, natural language processing libraries already exist for real languages like French and English, right? They do, but French and English are so mind-bogglingly complex that writing code to parse them has befuddled developers for decades. Established conlangs are much smaller, including those that weren’t initially designed to be small or limited in complexity.

Also, a conlang still being created is necessarily small, if only because the vast majority of it hasn’t been imagined yet.

One more reason: you may overly restrict yourself by considering only data structures that are peculiar to English and French, when one possible goal of a conlang is to create a human communication system that goes beyond what can be done in well known natural languages. (Compare this to the naturalistic-conlang goal of creating a reference grammar that fools professional linguists, a goal that puts limits on breaking new ground with new communication strategies.)

Basic Types in a Conlang
Words. The basic type should be a token, not a string. Strings are not atomic enough; a token can carry lexical facts a raw string cannot (see the first sketch after this list).
Morphology. The language I am using as my model (toki pona) doesn’t have any morphology to speak of.
Phrases. Here is where I discovered the most. It turned out to be helpful to have many specialized classes that can exploit information beyond what a formal grammar knows. For example, some words are likely agents, and agents can own things, serve as the subject of a transitive verb, and so on (see the second sketch after this list).
Forgotten Structures. We almost never formally deal with punctuation, quoted text, diglossia, numbers, dates, animal sounds, and so on. When you work with real sample texts, you encounter these issues immediately and realize that they can’t be ignored; they deserve to be treated as concepts on par in importance with subject, object, and prepositional phrase.
Alternative parsing. A machine parser yields exactly one parse; you get that behavior for free because deterministic code has no other choice. The formal grammar, however, will admit that many parsings of a given text are legal. I think letting the machine pretend there is only one parsing is pragmatic: it forces the language creator to decide how much ambiguity is tolerable.
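
Here is a minimal sketch of the token-not-string idea in Python. The class and field names (Token, pos, is_agent) are my illustration, not the actual library’s API; the two sample words, jan (person) and kili (fruit), are real toki pona vocabulary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """An atomic word of the conlang.

    Wrapping the raw string lets the type carry lexical facts
    (part of speech, agent-hood) that a plain str cannot.
    """
    text: str
    pos: str                 # e.g. "noun", "verb", "particle"
    is_agent: bool = False   # can this word own things, act transitively?

    def __str__(self) -> str:
        return self.text

# "jan" (person) is a plausible agent; "kili" (fruit) is not.
JAN = Token("jan", pos="noun", is_agent=True)
KILI = Token("kili", pos="noun", is_agent=False)
```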
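And a second sketch, building on the Token class above, of what the “many specialized phrase classes” might look like. Again the class names are hypothetical; the point is that an AgentPhrase knows semantic facts a formal grammar does not, and a parser that stops at the first class that accepts the tokens is exactly the one-parse pragmatism described in the last item.

```python
from typing import List

class Phrase:
    """Base class for any parsed span of tokens."""
    def __init__(self, tokens: List[Token]):
        self.tokens = tokens

    def __repr__(self) -> str:
        words = " ".join(t.text for t in self.tokens)
        return f"{type(self).__name__}({words!r})"

class AgentPhrase(Phrase):
    """A phrase headed by a likely agent, e.g. 'jan'.

    A formal grammar only knows this is a noun phrase; this class
    also knows agents can own things and head transitive clauses.
    """

class PossessivePhrase(Phrase):
    """Owner + owned, constructed only when the owner is an agent."""
    def __init__(self, owner: AgentPhrase, owned: Phrase):
        # toki pona puts the possessor after the head:
        # "kili jan" ~ "a person's fruit".
        super().__init__(owned.tokens + owner.tokens)
        self.owner, self.owned = owner, owned

print(PossessivePhrase(AgentPhrase([JAN]), Phrase([KILI])))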

Basic Applications of a Conlang Library
Word Processing. If you can parse it, it is potentially syntactically valid. A prerequisite to parsing is dictionary lookup, so if you can parse, you can also support spell-check.
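
For instance, spell-check falls out of dictionary lookup almost for free. This sketch assumes a LEXICON set standing in for the full toki pona word list:

```python
# Spell-check as a by-product of dictionary lookup. The word list here
# is a tiny stand-in for the full toki pona lexicon.
LEXICON = {"jan", "li", "moku", "e", "kili", "mi", "sina", "pona"}

def misspelled(text):
    """Return the words in `text` that fail dictionary lookup."""
    return [w for w in text.lower().split() if w not in LEXICON]

print(misspelled("jan li mokku e kili"))  # -> ['mokku']
```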

Lorem Ipsum Generation. If you can turn text into a data structure, you can turn data structures into text, and you can do it randomly. This will illustrate which sorts of structures are possible, validating both the language design and the library itself.
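
A toy version of random generation might look like the following. The subject–li–verb–e–object pattern is simplified toki pona, and the tiny vocabulary lists are stand-ins for the full lexicon:

```python
import random

SUBJECTS = ["jan", "soweli", "mi"]
VERBS = ["moku", "lukin", "jo"]
OBJECTS = ["kili", "telo", "tomo"]

def random_sentence():
    s = random.choice(SUBJECTS)
    v = random.choice(VERBS)
    o = random.choice(OBJECTS)
    li = "" if s == "mi" else "li "   # toki pona drops "li" after "mi"
    return f"{s} {li}{v} e {o}."

for _ in range(3):
    print(random_sentence())
```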

Knowledge Representation. I wrote about this recently, but in short: if you can generate random sentences, you can take a sentence template and generate complete sentences from a data table, say a phone book. And with a parser, you can turn natural language questions into queries and commands that get data back out of the phone book.
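
A sketch of both halves, using the phone book example. The names and the toki pona phrasing here are illustrative guesses, not the post’s actual templates:

```python
PHONE_BOOK = [
    {"name": "jan Ana", "number": "555-0100"},
    {"name": "jan Pite", "number": "555-0199"},
]

def facts():
    """Generation half: one declarative sentence per table row."""
    # "nanpa pi X li Y" ~ "X's number is Y" in rough toki pona.
    return [f"nanpa pi {row['name']} li {row['number']}" for row in PHONE_BOOK]

def answer(name):
    """Query half: a parsed question reduces to a table lookup."""
    for row in PHONE_BOOK:
        if row["name"] == name:
            return row["number"]
    return None

print(facts()[0])           # nanpa pi jan Ana li 555-0100
print(answer("jan Pite"))   # 555-0199
```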

Concordance. And concordance, although I would almost suggest that existing concordance tools are good enough and don’t need to be language specific.
