Practice with Nested Data Structures

For your current assignment, you'll be building and using Markov chains, which we are representing with dictionaries that have keys that are tuples of strings and values that are also dictionaries. Those inner dictionaries, in turn, have keys that are strings, and values that are percentages. As examples, we posted the 1^st- and 2^nd-order Markov chains built from Johnny B. Goode and Eight Days a Week.

Before, we have also seen lists of tuples, and COMP 130 students have used lists of lists.

But, a Markov chain is the most complicated data structure we've used so far in this course. It is easy to get overwhelmed working with complicated data structures, so today we will get some practice working with them.

While these exercises introduce very little that is brand new, you will likely find them challenging. They work with more complicated data, and require some thought to break into smaller pieces.

Today's Example Data Structure

As a similar example, we will model dictionaries. No, not what Python calls a dictionary — something that gives meanings of English words.

We could use just a Python dictionary. Each key would be a word, and the associated value would be the corresponding meaning. That intuition is the idea behind the name of Python dictionaries. But, it misses out on two important ideas that we want to capture. First, a word can be used as different parts of speech. For example, “bug” can be a noun or a verb. It's meaning naturally differs for each part of speech. Second, a word can have multiple meanings, even for the same part of speech. For example, “bug” as a noun can be a kind of animal, or a mistake.

So, a better representation would be a dictionary that maps words (strings) to dictionaries that map parts of speech (strings) to a list of meanings (strings). For example

{"bug": {"noun": ["A small insect, especially a hemipteran",
                  "A defect or mistake, especially in computer code",
                  "A listening device",
                  "A microorganism, especially a virus",
                  "A craze or obsession"],
         "verb": ["To bother or annoy",
                  "To install a listening device"]},
 "bugle": {"noun": ["A brass instrument like a small trumpet",
                    "A plant of the Ajuga family"],
           "verb": ["To play a bugle instrument"]},
 …}

To avoid confusion in using the word “dictionary”, we'll introduce the following terminology.

A meaning is a string such as "To both or annoy".
Meanings are a list of these meanings.
A definition is a dictionary mapping a part of speech (a string) to meanings. I.e., it is one of the inner dictionaries.
A lexicon is a dictionary mapping a word (a string) to definitions. I.e., it is the whole outer dictionary.

While the above will be a suitable example for today, we could get even more elaborate with our representation. We could include each word's pronunciation, etymology, example uses of each definition, alternate spellings, derived forms, and similar information.

Observe that this data structure is similar to a Markov chain, so you should be able to adapt the lessons learned here.

Exercises

We want to be able to create a lexicon. Simply typing in something like

lexicon = {"bug": {"noun": ["A small insect, especially a hemipteran", "A defect or mistake, especially in computer code", "A listening device", "A microorganism, especially a virus", "A craze or obsession"], "verb": ["To bother or annoy", "To install a listening device"]}}

is possible, but tedious and annoying. Instead, we want functions to create an empty lexicon and add a meaning to a lexicon.

Define a function that creates and returns a brand-new empty lexicon. The function has no arguments. The underlying question you should first ask yourself is what does an empty lexicon look like?

# This is our first, simplest version.  We'll come back to this question later.
#
# An empty lexicon is just an empty dictionary.
def createLexicon():
    """Returns an empty lexicon."""
    return {}

Now, define a function that adds one meaning to a lexicon. The function takes a lexicon, a word, a part of speech, and a meaning. It returs nothing, but it modifies the given lexicon. Note that the word might or might not be in the lexicon already. Similarly, the word might or might not already have a meaning for this part of speech.

To get you started, we will provide an outline of one way of structuring the code. Rather than having one function that manipulates the whole data structure, i.e., the outer and inner dictionaries and the list, we can write multiple functions, where each only manipulates one level of the data structure. This is an example of problem decomposition. It allows us to only think about one piece of the problem at a time.

def createMeanings():
    """Returns an empty group of meanings."""
    ...

def addToMeanings(meanings,meaning):
    """Adds a meaning to the group of meanings."""
    ...


def createDefinition():
    """Returns an empty definition."""
    ...

def addToDefinition(definition,partOfSpeech,meaning):
    """Adds a meaning to the definition."""

    # Initialize the group of meanings, if necessary.
    ...
    # Add the meaning.
    ...


# Repeated for convenience.
def createLexicon():
    """Returns an empty lexicon."""
    return {}

def addToLexicon(lexicon,word,partOfSpeech,meaning):
    """Adds a meaning to the lexicon."""

    # Initialize the definition, if necessary.
    ...
    # Add the meaning.
    ...

The idea is that we would use the code in the following way:

lexicon = createLexicon()
addToLexicon(lexicon,"bug","verb","To bother or annoy")
addToLexicon(lexicon,"bugle","noun","A brass instrument like a small trumpet")
addToLexicon(lexicon,"bugle","noun","A plant of the Ajuga family")
addToLexicon(lexicon,"bug","noun","A small insect, especially a hemipteran")
...

Can you finish the above code?

Here is the completed version that follows the previous outline.

def createMeanings():
    """Returns an empty group of meanings."""
    return []

def addToMeanings(meanings,meaning):
    """Adds a meaning to the group of meanings."""
    meanings.append(meaning)


def createDefinition():
    """Returns an empty definition."""
    return {}

def addToDefinition(definition,partOfSpeech,meaning):
    """Adds a meaning to the definition."""

    if partOfSpeech not in definition:
        definition[partOfSpeech] = createMeanings()

    addToMeanings(definition[partOfSpeech],meaning)


def createLexicon():
    """Returns an empty lexicon."""
    return {}

def addToLexicon(lexicon,word,partOfSpeech,meaning):
    """Adds a meaning to the lexicon."""

    if word not in lexicon:
        lexicon[word] = createDefinition()

    addToDefinition(lexicon[word],partOfSpeech,meaning)

Again, we have a separate function for each nested level of the data structure. When writing a function, you only need to think about the data representation of that one level. If we changed the representation of the inner dictionary (definition), for example, we wouldn't have to change the functions that manipualted the outer dictionary (lexicon).

For comparison, here is a monolithic version that does everything in one function. While shorter, if you ignore the comments, it is potentially very confusing. The comments here are almost necessary to understand the reasoning.

def createLexicon():
    """Returns an empty lexicon."""
    return {}

def addToLexicon(lexicon,word,partOfSpeech,meaning):
    """Adds a meaning to the lexicon."""
    if word not in lexicon:
        # The word is not in the lexicon.
        # Add the word to the lexicon (outer dictionary),
        # along with its part of speech and definition.
        
        lexicon[word] = {partOfSpeech : [meaning]}

    elsif partOfSpeech not in lexicon[word]:
        # The word is in the lexicon, but the part of speech is not.
        # Use the existing word's definition (the inner dictionary),
        # but add a new part of speech to it.

        lexicon[word][partOfSpeech] = [meaning]

    else:
        # The word is in the lexicon, and the part of speech is, too.
        # Use the existing word's definition (the inner dictionary),
        # and use the definition's existing part of speech.
        # Add a new meaning to the list of meanings.

        lexicon[word][partOfSpeech].append(meaning)

However, we can simplify both versions of the above code. Recall our previous use of defaultdict. It's just like a regular dictionary, but it allows us to simplify the initialization, by initializing all values once with a default value. That allows us to not check whether or not a key is already in the dictionary.

from collections import defaultdict

# Repeated for convenience.
def createMeanings():
    """Returns an empty group of meanings."""
    return []

# Repeated for convenience.
def addToMeanings(meanings,meaning):
    """Adds a meaning to the group of meanings."""
    meanings.append(meaning)


# This now creates a defaultdict.  The default is whatever createMeanings returns.
def createDefinition():
    """Returns an empty definition."""
    return defaultdict(createMeanings)

# This no longer has to initialize.
def addToDefinition(definition,partOfSpeech,meaning):
    """Adds a meaning to the definition."""
    addToMeanings(definition[partOfSpeech],meaning)


# This now creates a defaultdict.  The default is whatever createDefinition returns.
def createLexicon():
    """Returns an empty lexicon."""
    return defaultdict(createDefinition)

# This no longer has to initialize.
def addToLexicon(lexicon,word,partOfSpeech,meaning):
    """Adds a meaning to the lexicon."""
    addToDefinition(lexicon[word],partOfSpeech,meaning)

Whew! Now we have several ways to create our data structure. We also want to use it. We'll have one exercise for finding things in a lexicon.

Define a function that takes a lexicon and a string. It returns a list of all the lexicon words that have a meaning containing the string. For example, searching for "annoy" in our example lexicon would return ["bug"]. Note that the returned list should contain any word at most once.

As in the previous examples, we strongly recommend decomposing the problem by writing helper functions for each level of the data structure.

We'll show the code one piece at a time, in case you need a hint to get started.

def wordsInLexicon(lexicon,string):
    """Returns a list of words with the string in a meaning."""
    words = []
    for word,definition in lexicon.items():
        if occursInDefinition(definition,string):
            words.append(word)
    return words

So, how can you write the desired helper function?

def occursInDefinition(definition,string)
    """Returns whether the string occurs in any of the definition's meanings."""
    for partOfSpeech,meanings in definition.items():
        if occursInMeanings(meanings,string):
            return True
    return False

And how can you write the next desired helper function?

def occursInMeanings(meanings,string):
    """Returns whether the string occurs in any of the meanings."""
    for meaning in meanings:
        if string in meaning:
            return True
    return False

Again, each of these functions “knows” about one level of the data structure. If we were to change the representation of only one level, we would need to change only that one corresponding function.

For comparison, here's one way to define this with everything in one function. How understandable is this?

def wordsInLexicon(lexicon,string):
    """Returns a list of words with the string in a meaning."""
    words = []
    for word,definition in lexicon.items():
        occurs = False
        for partOfSpeech,meanings in definition.items():
            for meaning in meanings:
                if string in meaning:
                    occurs = True
                    words.append(word)
                    break      # break from "for meaning" loop
            if occurs:
                break          # break from "for partOfSpeech,meanings" loop
    return words

Try it out on examples like the following. As always, if you are having trouble understanding the behavior of the code, add some print statements to the functions, so that you can observe what is happening.

lexicon = createLexicon()
addToLexicon(lexicon,"bug","verb","To bother or annoy")
addToLexicon(lexicon,"bugle","noun","A brass instrument like a small trumpet")
addToLexicon(lexicon,"bugle","noun","A plant of the Ajuga family")
addToLexicon(lexicon,"bug","noun","A small insect, especially a hemipteran")
...
print wordsInLexicon(lexicon,"annoy")
print wordsInLexicon(lexicon,"small")

COMP 200 Elements of Computer Science & COMP 130 Elements of Algorithms and Computation Spring 2012

Practice with Nested Data Structures

Today's Example Data Structure

Exercises

COMP 200 Elements of Computer Science &
COMP 130 Elements of Algorithms and Computation
Spring 2012