Draft of tp++, a toki pona superset, for a toki pona cross compiler

Please read the article on what a cross compiler is in the context of conlangs before reading this.

The recommended way to use this, should I (or someone!) complete it and write a compiler, is that certain advanced toki pona users would write tp++ and compile it to ordinary, human-readable toki pona for posting long documents on forums. I suppose an on-the-fly compiler could be written for chatting. Anyone who doesn't want to think about tp++ doesn't have to; they can just read the compiled version. Could someone take tp++ source code and post it uncompiled to forums and mailing lists? Sure, but that would violate the spirit of this mini-project.

tp++ is a superset of tp
Almost all existing valid toki pona is valid tp++. A compiler may run in strict mode to make certain rules obligatory.
The compiled output of all tp++ is ordinary, valid toki pona.

Type Annotation
Numbers are annotated with a #. Depending on compiler settings, the compiler will compile numbers to stupid, half-stupid, advanced or poman numbers, since at the moment those are the only systems you see people use.
Dates can be proper modifier dates or any of the existing community proposals for dates. Dates would compile to suno/mun/sike suno with numbers as above.
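As a sketch of how # annotations might lower to words, here is one plausible pass in Python. The additive inventory below (ale=100, mute=20, luka=5, tu=2, wan=1) is an assumption standing in for one of the systems above; the function names are hypothetical, not part of the draft.

```python
import re

# Greedy additive rendering. The inventory is an assumption:
# ale=100, mute=20, luka=5, tu=2, wan=1. The other modes (stupid,
# half-stupid, poman) would swap in different tables.
ADDITIVE = [(100, "ale"), (20, "mute"), (5, "luka"), (2, "tu"), (1, "wan")]

def number_to_tp(n: int) -> str:
    if n == 0:
        return "ala"
    words = []
    for value, word in ADDITIVE:
        while n >= value:
            words.append(word)
            n -= value
    return " ".join(words)

def compile_numbers(text: str) -> str:
    # Replace each #-annotated numeral with its spelled-out form.
    return re.sub(r"#(\d+)", lambda m: number_to_tp(int(m.group(1))), text)
```

So `mi jo e soweli #3.` would come out as `mi jo e soweli tu wan.` under this inventory.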

Part of Speech
Prepositions obligatorily require a comma before them.
Numbers obligatorily require # before them.
Ordinals obligatorily require the terminal particle nin.
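A strict-mode compiler could lint the comma-before-preposition rule with something like the sketch below. Treating these five words as the preposition set is an assumption on my part; in real toki pona they double as content words, so an actual compiler would need syntactic context, not a bare word list.

```python
# Strict-mode check: flag prepositions not preceded by a comma.
# The five-word preposition set is an assumption; a real compiler
# would need syntactic context since these also act as content words.
PREPOSITIONS = {"lon", "tawa", "tan", "kepeken", "sama"}

def missing_preposition_commas(sentence: str) -> list[str]:
    words = sentence.rstrip(".!?").split()
    flagged = []
    for i, word in enumerate(words[1:], start=1):
        if word in PREPOSITIONS and not words[i - 1].endswith(","):
            flagged.append(word)
    return flagged
```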

As a computer language, tp++ needs a sense of formatting expressions. tp++ will use Markdown as the formatting syntax; HTML is too complex. Markdown has the advantage of compiling down either to nothing or to HTML.

Particle Innovations
There will be a set of particles with certain strong meanings that compile down to their weaker meanings.
mon compiles to pi when pi indicates personal ownership. soweli mon jan Mato. Matt’s cat.
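Lowering a strong particle to its weak counterpart is plain word substitution. A minimal sketch, with the table shape anticipating more entries than the one particle defined so far:

```python
# Particle lowering: each strong tp++ particle maps to its weaker
# toki pona counterpart. Only mon -> pi is defined in the draft.
STRONG_TO_WEAK = {"mon": "pi"}

def lower_particles(text: str) -> str:
    return " ".join(STRONG_TO_WEAK.get(w, w) for w in text.split())
```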

Sentence Busting
Currently you can't easily put certain kinds of sentences together. You can have multiple subjects, multiple actions, multiple objects, but if you have a notion that requires a full sentence, you can't embed it into one sentence; you have to split it into many sentences and rely on the reader to coordinate the ni's.

jan Mato li jo e soweli tanen soweli li ken moku e soweli lili ike.

compiles to

jan Mato li jo e soweli tan ni: soweli li ken moku e soweli lili ike.
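The lowering here is a textual rewrite: the fused conjunction splits back into tan ni:. A sketch in Python, assuming tanen is the only sentence-busting conjunction so far:

```python
import re

# Sentence-busting conjunctions and their toki pona lowerings.
# Only tanen (fused tan + ni) appears in the draft so far.
BUSTERS = {"tanen": "tan ni:"}

def bust_sentences(text: str) -> str:
    for strong, weak in BUSTERS.items():
        text = re.sub(rf"\b{strong}\b", weak, text)
    return text
```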

Arbitrary Phrase Ordering
Currently, we can front phrases with la. In tp++ any phrase can be fronted by adding la to its head particle. en marks the start of a subject phrase.

jan Mato li jo e soweli lon tomo.
lalon tomo en jan Mato li jo e soweli.
laje soweli en jan Mato li jo.

Phrases that are not in their natural location and lack a "warning" marker create unnecessary garden-path parses.

lani telo li anpa la mi li tawa ala.

compiles to

telo li anpa la mi tawa ala.
mi li tawa ala lani telo li anpa.

compiles to

telo li anpa la mi tawa ala.
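The trailing-conditional case can be sketched as a text transform: find a lani clause at the end of the sentence and move it to the front as an ordinary la clause. This handles only the trailing case; the separate mi li / sina li regularization pass described later would then produce the final surface form. Function name and regex are my own illustrative assumptions.

```python
import re

# Move a trailing "lani <clause>" conditional to the front as
# "<clause> la". The separate mi li / sina li regularization pass
# then yields the final compiled form.
def front_lani(sentence: str) -> str:
    m = re.match(r"^(.*\S)\s+lani\s+(.*?)([.!?]+)$", sentence)
    if not m:
        return sentence
    main, condition, end = m.groups()
    return f"{condition} la {main}{end}"
```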

Scope and Sentence Ends
Sentence ends are obligatory and can be a period, semicolon, question mark, or exclamation mark.
! is emphatic.
?! or !? is surprising.
!! is a command. If the target is a machine, it should execute it.
? is a query. If the target is a machine, it should respond with matching facts.
?? is a rhetorical question and expects no response.
. is a fact. If the target is a machine it should be inserted into the current knowledge base.
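A machine target would dispatch on the terminator, checking two-character marks before one-character marks so that ?! is not misread as a bare ?. A sketch; the semicolon's speech act is not defined in the draft, so treating it like . is my assumption:

```python
# Map each sentence terminator to its speech-act type. Longest
# match first, so "?!" is not mistaken for "?" followed by "!".
# Treating ";" like "." (a fact) is an assumption; the draft lists
# the semicolon as a terminator without assigning it semantics.
TERMINATORS = [
    ("?!", "surprise"), ("!?", "surprise"), ("!!", "command"),
    ("??", "rhetorical"), ("!", "emphatic"), ("?", "query"),
    (".", "fact"), (";", "fact"),
]

def speech_act(sentence: str) -> str:
    for mark, act in TERMINATORS:
        if sentence.endswith(mark):
            return act
    raise ValueError("missing sentence terminator")
```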

Scopes begin with //[ and end with ]//. They compile down to nothing. Declarations exist only in a given scope.

Proper nouns would be written in any supported natural language, at least including English. They would compile using a lookup dictionary for well-known proper nouns, and machine transliteration for unknown words.

Neologisms are words that the compiler hasn't seen before and that aren't expansions. For example, if official toki pona got the word apeja, it would be treated as a neologism by compilers written today. Since we want the compiler to continue to work, we need to let users specify that a word is a neologism and should be output as is.

Expansions are closely related to neologisms. An expansion is a word you invent and it is automatically expanded into a valid toki pona fragment.

An expansion is a valid toki pona word that expands into a toki pona noun, verb or modifier phrase.
An expansion can either be contingent or noncontingent.
Contingent expansions can only occur in certain locations, such as verbs, or are different depending on whether they are a verb or noun.
Noncontingent expansions always expand to the same phrase, no matter where they appear. This may cause problems when a phrase takes on an unexpected meaning when in verb positions, or when it is a modifier.

pasin li pona tawa mi.
//I like grain spirits.

compiles to

telo nasa pi kiwen lili pan li pona tawa mi.
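For the noncontingent case, expansion is a straight dictionary substitution. A sketch using the draft's own pasin example; in practice the table would be populated from /// declaration lines or an @@ import rather than hard-coded:

```python
import re

# Noncontingent expansion: the same phrase regardless of position.
# pasin -> "telo nasa pi kiwen lili pan" is the draft's example;
# real entries would come from /// declarations or an @@ import.
EXPANSIONS = {"pasin": "telo nasa pi kiwen lili pan"}

def expand(text: str) -> str:
    for word, phrase in EXPANSIONS.items():
        text = re.sub(rf"\b{word}\b", phrase, text)
    return text
```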

@@ imports a text file, for example a collection of declarations.


Fragments are permissible utterances so long as they start with a particle and end with a sentence terminator. They are included in the output as is.

Comment fragments start with, include, or are terminated by … These fragments are not parsed, because the ellipsis means the sentence is missing a word; for example, the transmission was interrupted. As such, a compiler can't be expected to make sense of it, but possibly a human could. So it behaves like a comment, except that it is included in compiled output.

Anything that starts with // is a comment. All comments are stripped from compiled text.
Block comments are between /* and */. These are also removed from compiled text.
" delineates foreign text. Foreign text is preserved in output, but not parsed.
/// is toki pona that is a declaration or other toki pona that will not be a part of the final document.
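Stripping the comment forms is conventional lexing work. A rough sketch; note it does not handle the //[ and ]// scope markers, which also start with slashes, so a real compiler would have to tokenize those before this pass runs:

```python
import re

def strip_comments(source: str) -> str:
    # Remove /* ... */ block comments first, then // line comments.
    # /// declaration lines also start with //, so they are removed
    # from the final document, as the draft specifies.
    # Caveat: the //[ and ]// scope markers would be mangled by this
    # naive pass and must be tokenized out beforehand.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    return source
```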

Unnecessary Irregularity
mi li and sina li are obligatory in tp++ and automatically compile to bare mi and sina.

mi li jan. //source code
mi jan. //result
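This regularization pass is a one-line rewrite. A sketch; the word-boundary anchors keep it from touching li chains after the first:

```python
import re

# Regularize: "mi li" / "sina li" in tp++ compile to bare mi / sina.
# Only the li directly after the pronoun is dropped, so a chain like
# "mi li moku li lape" still keeps its second li.
def drop_li(text: str) -> str:
    return re.sub(r"\b(mi|sina) li\b", r"\1", text)
```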

Coordinated Values
A programming language does a lot of variable-to-value binding. For example, x=1+1 is evaluated and x binds to the value 2. In toki pona, there are not enough clues to allow for either variable declaration or binding. The closest thing we have are pronouns.

Declared Variables
Here is a declaration for the variable jan Mato for the scope of the entire application.

// (male, single-not plural, animate)
/// jan Mato:(jm) li mije li wan li moli ala.

jan Mato: is the declaration, some noun phrase.
(jm) is the annotation. It has to be attached to each jan Mato or ona that refers to jan Mato, or else it refers to a different jan Mato.
The li chain is used for validating the pronoun. jan Mato can be represented by ona mije or ona wan. The animate/inanimate marker could be used for improving machine translation but would not necessarily change the toki pona output.

In the following scope

//Compile error, wrong number.
jan Mato(jm) li jo e soweli. ona(jm) tu li pona kin.

 //Compiler warning, unbindable pronoun.
jan Mato(jm) li jo e soweli. ona tu li pona kin.

//Compiler warning, undeclared soweli(s)
jan Mato(jm) li jo e soweli(s). ona(s) tu li pona kin.

Stupid Obviates. Real-world obviates are too difficult to explain. tp++ stupid obviates are declared annotations that strongly match up referents to their pronouns. For example:

//Lots of hard to bind pronouns.
jan Mato li jo e waso. ona li ken tawa sama waso.
ona li pilin pona tawa ona.

//All pronouns bind explicitly.
jan Mato(m) li jo e waso(w). ona(w) li ken tawa sama waso.
ona(m) li pilin pona tawa ona(w).
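The compiler checks above reduce to scanning for annotation tags and comparing them against the declared set. A minimal sketch covering only the unbound-tag warning; the number and gender agreement checks from the li chain are left out:

```python
import re

# Check that every annotated referent, e.g. ona(m), carries a
# declared tag. Number/gender agreement against the declaration's
# li chain is omitted from this sketch.
def unbound_tags(text: str, declared: set[str]) -> list[str]:
    tags = re.findall(r"\((\w+)\)", text)
    return [t for t in tags if t not in declared]
```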
This entry was posted in machine assisted conlanging, toki pona.
