Cherokee Word Deconstruction

ᎣᏏᏲ. ᏙᎯᏧ? Hey, welcome back!

I’ve been working on a word deconstructor. This was originally called an AffixSplitter or Lemmatizer depending on the project you look at. The original projects can be found here and here and here.

It’s easy to determine affixes when you know what type of word you’re dealing with (verb, noun, adjective, etc.). When you don’t have that information, it’s a little more difficult. Ideally, the conjugation engine would be complete and the dictionary would be fully relational with a lookup table. Then pretty much every form could be looked up and returned as a list of options from the database. We’re not there yet. So, how do you proceed?

I had to start with known variables. We know there are definitive final suffixes, non-final suffixes, and pronominal prefixes. For verbs, this breaks down fairly easily. Whether you realize it or not, translation begins with the third person present tense form of the verb. Then you break it down to the root and add affixes. For example, /gawoniha/ breaks down to: pronominal prefix = ga; root = woni; root ending = h; and tense = present (a). All Cherokee verbs can be conjugated from the third person present tense. Conjugating is an “explosion”: the epicenter is the root, and you build outward from there, so that the primary, secondary, and tertiary portions of the blast become the conjugation itself. Breaking down a verb is an “implosion”: you start at the furthest points (the beginning and end of the word) and move inward, stripping off affixes until you get to the root. You do this subconsciously once you’ve learned a language. Your brain stores these lookup tables naturally.

To break down a word we have to start at the ends of the word (remember, implosion). Take ᎦᏬᏂᎭ, transliterated as /gawoniha/. From a lookup table we know that the final ending marks the present tense and that the pronominal prefix marks the third person. From here, if you want to make the word past tense, you add the appropriate pronominal prefix, maybe change the root ending, and add the past tense suffix. So the third person present tense /gawoniha/ would become /uwonisvi/: pronominal prefix = u + root = woni + root ending = s + past tense = vi. There are rules and patterns for almost everything in conjugations.
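As a minimal sketch of both directions, the snippet below builds and splits the two forms discussed above. The function names (buildVerb, splitVerb) and the tiny affix lists are hypothetical and cover only /gawoniha/ and /uwonisvi/. It also assumes the root ending is the single letter in front of the tense suffix, which will not hold for every verb.

```javascript
// Hypothetical helpers; the affix lists cover only the two forms in this post.
const PRONOMINAL_PREFIXES = ["ga", "u"];
const TENSE_SUFFIXES = { vi: "past", a: "present" };

// "Explosion": start at the root and attach the parts outward.
function buildVerb({ prefix, root, rootEnding, tenseSuffix }) {
  return prefix + root + rootEnding + tenseSuffix;
}

// "Implosion": strip the outermost affixes first, then work inward.
// Assumption: the root ending is the one letter left before the tense suffix.
function splitVerb(word) {
  const prefix = PRONOMINAL_PREFIXES.find((p) => word.startsWith(p));
  const tenseSuffix = Object.keys(TENSE_SUFFIXES).find((s) => word.endsWith(s));
  if (prefix === undefined || tenseSuffix === undefined) return null;
  const stem = word.slice(prefix.length, word.length - tenseSuffix.length);
  return {
    prefix,
    root: stem.slice(0, -1),
    rootEnding: stem.slice(-1),
    tenseSuffix,
    tense: TENSE_SUFFIXES[tenseSuffix],
  };
}

const parts = splitVerb("gawoniha");
console.log(parts.root, parts.rootEnding, parts.tense); // woni h present

// Swap the outer parts to move from third person present to third person past.
console.log(buildVerb({ prefix: "u", root: parts.root, rootEnding: "s", tenseSuffix: "vi" })); // uwonisvi
```

A real implementation would need the full affix tables and the rules for when the root ending changes; this only shows the shape of the two operations.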

To further demonstrate the conjugation and deconstruction of verbs here are two graphics I modified.

Word deconstruction “implosion”: work from the outside to the center.
Word conjugation “explosion”: work from the center to the outside.

The conjugation engine used on the CED online site employs this methodology: it breaks the word down to the root, then adds affixes to form the new word. This is easier because we know all of the stems of a verb. Forms are created to represent what we want to conjugate, and then we conjugate the verb.

When breaking down a word there are two approaches. The first is to know what the verb (or noun or adjective) is, then remove the affixes and list them in some user-friendly manner. /gawoniha/ would become something like:

ga      3rd person singular animated
woni    to speak
h       root ending
a       present tense

Without knowing what the word is, we have to employ a somewhat brute-force approach. I’m going to outline the approach, then discuss it.

process() -- method to start code
    lookup word in dictionary
        if word exists
            setup wholeWord with dictionary entries and return to user
        else
            deconstruct() -- method to continue breaking the word down
                strip final suffixes
                lookup word in dictionary
                    if exists
                        setup wholeWord with dictionary entries and final suffix list and return to user
                    else
                        remove affixes and add to wholeWord object

                    put together a simple word that might match the database - this is the uwonisvi to gawoniha example above
                    lookup this simple word in the dictionary
                        if exists
                            setup wholeWord with dictionary entries and complete breakdowns and return to user
                        else
                            we didn't find a match so no definition - this can still occur, however, I've not seen it much.
                            return wholeWord

Basically, I take the word and check the dictionary for it. Then I remove final suffixes and do another check in the dictionary. If there’s still no definition, I remove all of the affixes, use the pronominal prefixes to determine what I should look up, and try that to get a definition. That makes three calls to check the word. Since there are rules for this with verbs, I could run every word through the entire deconstruct method. However, I don’t know what part of speech the word could be. This approach isn’t foolproof: something could be missing, leaving the word without a definition. This is where testing comes into play. Lots of tests are written to verify the code is doing what it’s supposed to, and after each change I verify that all of the tests still pass.
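The three lookups can be sketched roughly as follows. This is a toy version, not the project’s actual code. The in-memory DICTIONARY, the affix lists, and the PRESENT_ROOT_ENDING table are all assumptions that cover only the examples in this post. The entry point is named processWord here (rather than process) only to avoid shadowing Node’s global process object.

```javascript
// Toy stand-ins for the real database and affix tables (assumed data).
const DICTIONARY = new Map([
  ["gawoniha", ["to speak"]],
  ["osigwu", ["all right, fine"]],
]);
const FINAL_SUFFIXES = ["ju"]; // interrogative suffix
const TENSE_ENDINGS = [
  { ending: "vi", tense: "past" },
  { ending: "a", tense: "present" },
];
const PRONOUN_PREFIXES = ["ga", "u"];
// Hypothetical mapping from a root ending to its present tense form.
const PRESENT_ROOT_ENDING = { s: "h" };

function lookup(word) {
  return DICTIONARY.get(word) ?? [];
}

function processWord(phonetic) {
  const wholeWord = {
    phonetic, definitions: [], root_phonetic: "", root_ending: "",
    constructedVerbToLookup: "", verbTense: { tense: "", ending: "" },
    pronounPrefixes: [], finalSuffixes: [],
  };
  // Lookup 1: the word exactly as given.
  wholeWord.definitions = lookup(phonetic);
  return wholeWord.definitions.length > 0 ? wholeWord : deconstruct(wholeWord);
}

function deconstruct(wholeWord) {
  let rest = wholeWord.phonetic;
  const finalSuffix = FINAL_SUFFIXES.find((s) => rest.endsWith(s));
  if (finalSuffix !== undefined) {
    wholeWord.finalSuffixes.push(finalSuffix);
    rest = rest.slice(0, -finalSuffix.length);
  }
  // Lookup 2: the word without its final suffixes.
  wholeWord.definitions = lookup(rest);
  if (wholeWord.definitions.length > 0) return wholeWord;

  // Remove the remaining affixes and record them.
  const prefix = PRONOUN_PREFIXES.find((p) => rest.startsWith(p));
  const tense = TENSE_ENDINGS.find((t) => rest.endsWith(t.ending));
  if (prefix === undefined || tense === undefined) return wholeWord;
  const stem = rest.slice(prefix.length, rest.length - tense.ending.length);
  wholeWord.pronounPrefixes.push(prefix);
  wholeWord.verbTense = { ...tense };
  wholeWord.root_phonetic = stem.slice(0, -1);
  wholeWord.root_ending = stem.slice(-1);

  // Lookup 3: rebuild a third person present form and try that.
  const presentEnding = PRESENT_ROOT_ENDING[wholeWord.root_ending] ?? wholeWord.root_ending;
  wholeWord.constructedVerbToLookup = "ga" + wholeWord.root_phonetic + presentEnding + "a";
  wholeWord.definitions = lookup(wholeWord.constructedVerbToLookup);
  return wholeWord;
}

console.log(processWord("uwonisvi").constructedVerbToLookup); // gawoniha
console.log(processWord("osigwuju").finalSuffixes); // [ 'ju' ]
```

Because every path returns the same wholeWord object, a word with no match still comes back with whatever affixes were found and an empty definitions list.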

The object I’ve come up with to return to the user looks like this:

wholeWord = {
    phonetic: "", // phonetic transliteration of original syllabary lookup
    syllabary: "", // original word to look up
    root_phonetic: "", // phonetic root of the word as broken down
    root_syllabary: "", // syllabary root of the word as broken down - if the root ending contains a phonetic "letter" then it will appear as syllabary+phonetic letter
    definitions: [], // all definitions found in the database
    root_ending: "", // what the root ending is - phonetic only
    constructedVerbToLookup: "", // if this word is a verb then this takes the third person prefix type (A or B), root, root ending, present tense to give a lookup so uwonisvi (third past) would become gawoniha (third present) and is then
    verbTense: {tense: "", ending: ""}, // what is the tense of the current word
    initialPrefixes: [], // what are the initial prefixes found
    pronounPrefixes: [], // what are the pronoun prefixes found
    reflexive: false, // was a reflexive prefix found
    nonFinalSuffixes: [], // what are the non final suffixes found
    finalSuffixes: [] // what are the final suffixes found
}

Each property has a comment describing what it is used for. Properties initialized with square brackets [ ] can hold more than one result.

Another element to this is that the verb could be broken down using syllabary, but computers don’t know the nuances. An example of this, using gawoniha, is ᎦᏬᏂᎭ. Let’s see what that would look like using the breakdown of the word above.

ᎦᏬᏂᎭ

Ꭶ (ga)      3rd person singular animated
ᏬᏂ (woni)   to speak
- (h)       root ending
Ꭰ (a)       present tense

The syllabary broken down doesn’t match the phonetic transliteration. That’s because, apart from Ꮝ (s), the only syllabary characters that stand for a single sound are the vowels. /h/ is not a single sound in the syllabary. It must be paired with a vowel or, in the case of the syllable hna, written as Ꮏ.

In order to deal with this, I first transliterate ᎦᏬᏂᎭ to /gawoniha/, then I break it down. To do this with syllabary you would have to make /ha/ its own ending, but /h/ isn’t the only root ending, so you’d have to check for all of those. In addition, there is an initial prefix /ga/ which means “since”, so you also have to check whether the word is /gagawoniha/. Thus the real breakdown of the word becomes:
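The transliteration step itself can be a plain lookup table. The sketch below is only an illustration: the table holds just the characters needed for this one word (a real table would cover the whole syllabary), and the transliterate function name is hypothetical.

```javascript
// Partial table: only the syllabary characters used in this example.
const SYLLABARY_TO_PHONETIC = {
  "Ꭶ": "ga",
  "Ꮼ": "wo",
  "Ꮒ": "ni",
  "Ꭽ": "ha",
  "Ꭰ": "a",
};

// Convert a syllabary string to its phonetic form, one character at a time.
function transliterate(syllabary) {
  return Array.from(syllabary, (ch) => {
    const sound = SYLLABARY_TO_PHONETIC[ch];
    if (sound === undefined) {
      throw new Error("No transliteration for " + ch);
    }
    return sound;
  }).join("");
}

const phonetic = transliterate("ᎦᏬᏂᎭ");
console.log(phonetic); // gawoniha

// Once the word is phonetic, the final syllable can be split into
// the root ending and the tense suffix, which syllabary alone cannot show.
console.log(phonetic.slice(-2, -1), phonetic.slice(-1)); // h a
```

Throwing on an unknown character is a design choice for the sketch, so that gaps in the table surface immediately instead of producing a silently wrong transliteration.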

Ꭶ (ga)      3rd person singular animated
ᏬᏂ (woni)   to speak
h (h)       root ending
Ꭰ (a)       present tense

Here the root ending is its own phonetic value instead of a syllabary value. To add further complication to this, the 3rd person singular animated prefix is g- before vowels and ga- before consonants. There are many other rules for prefixes. For example, ji- cannot occur before the initial prefix ga- (since), so this becomes ga + verb root + ending + (space) + jigi. There are many such morpheme rules, such as y- before a vowel and yi- before a consonant, but yu- before a /w/. On top of that, some prefixes disappear when they’re placed before other prefixes.
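Two of these rules are simple enough to sketch as functions that look at the first sound of whatever follows the prefix. The function names are hypothetical, only the two rules stated above are covered, and the single-letter inputs in the usage lines are placeholders rather than real stems.

```javascript
// Cherokee vowels in phonetic transliteration.
const VOWELS = new Set(["a", "e", "i", "o", "u", "v"]);

// 3rd person singular animated: g- before vowels, ga- before consonants.
function thirdPersonPrefix(following) {
  return VOWELS.has(following[0]) ? "g" : "ga";
}

// y- before a vowel, yi- before a consonant, but yu- before /w/.
function yPrefix(following) {
  if (following[0] === "w") return "yu";
  return VOWELS.has(following[0]) ? "y" : "yi";
}

console.log(thirdPersonPrefix("woni")); // ga
console.log(thirdPersonPrefix("a"));    // g  (placeholder vowel-initial stem)
console.log(yPrefix("woni"));           // yu
console.log(yPrefix("g"));              // yi (placeholder consonant-initial stem)
```

When deconstructing, the same rules run in reverse: the stripper has to accept every variant of a prefix and check that the sound following it is one the variant is allowed to precede.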

All of these have to be considered when breaking down a word. Nouns and adjectives can become verbs by adding suffixes. A word such as /osigwu/ means “all right” or “fine”. However, adding -ju (one of the interrogative suffixes) to /osigwu/ gives /osigwuju/: “are you alright?” or “how are you doing?”

I hope I’ve not missed something I should’ve explained. Deconstructing words programmatically is a complicated process, and attempting to simplify it can sometimes damage one’s calm.

If you have any questions or comments, especially if I’ve missed something, please let me know.

Until next time. Dodadagohvi. ᏙᏓᏓᎪᎲᎢ.

Bibliography
“Target Bullseye” by Alan O’Rourke is licensed with CC BY 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by/2.0/
