COMP 200 Elements of Computer Science &
COMP 130 Elements of Algorithms and Computation
Spring 2012

Practice with Nested Data Structures

For your current assignment, you'll be building and using Markov chains, which we are representing with dictionaries that have keys that are tuples of strings and values that are also dictionaries. Those inner dictionaries, in turn, have keys that are strings, and values that are percentages. As examples, we posted the 1st- and 2nd-order Markov chains built from Johnny B. Goode and Eight Days a Week.

Before, we have also seen lists of tuples, and COMP 130 students have used lists of lists.

But, a Markov chain is the most complicated data structure we've used so far in this course. It is easy to get overwhelmed working with complicated data structures, so today we will get some practice working with them.

While these exercises introduce very little that is brand new, you will likely find them challenging. They work with more complicated data, and require some thought to break into smaller pieces.

Today's Example Data Structure

As a similar example, we will model dictionaries. No, not what Python calls a dictionary — something that gives meanings of English words.

We could use just a Python dictionary. Each key would be a word, and the associated value would be the corresponding meaning. That intuition is the idea behind the name of Python dictionaries. But, it misses out on two important ideas that we want to capture. First, a word can be used as different parts of speech. For example, “bug” can be a noun or a verb. It's meaning naturally differs for each part of speech. Second, a word can have multiple meanings, even for the same part of speech. For example, “bug” as a noun can be a kind of animal, or a mistake.

So, a better representation would be a dictionary that maps words (strings) to dictionaries that map parts of speech (strings) to a list of meanings (strings). For example

{"bug": {"noun": ["A small insect, especially a hemipteran",
                  "A defect or mistake, especially in computer code",
                  "A listening device",
                  "A microorganism, especially a virus",
                  "A craze or obsession"],
         "verb": ["To bother or annoy",
                  "To install a listening device"]},
 "bugle": {"noun": ["A brass instrument like a small trumpet",
                    "A plant of the Ajuga family"],
           "verb": ["To play a bugle instrument"]},
 …}

To avoid confusion in using the word “dictionary”, we'll introduce the following terminology.

While the above will be a suitable example for today, we could get even more elaborate with our representation. We could include each word's pronunciation, etymology, example uses of each definition, alternate spellings, derived forms, and similar information.

Observe that this data structure is similar to a Markov chain, so you should be able to adapt the lessons learned here.

Exercises

We want to be able to create a lexicon. Simply typing in something like

is possible, but tedious and annoying. Instead, we want functions to create an empty lexicon and add a meaning to a lexicon.

Define a function that creates and returns a brand-new empty lexicon. The function has no arguments. The underlying question you should first ask yourself is what does an empty lexicon look like?

Now, define a function that adds one meaning to a lexicon. The function takes a lexicon, a word, a part of speech, and a meaning. It returs nothing, but it modifies the given lexicon. Note that the word might or might not be in the lexicon already. Similarly, the word might or might not already have a meaning for this part of speech.

To get you started, we will provide an outline of one way of structuring the code. Rather than having one function that manipulates the whole data structure, i.e., the outer and inner dictionaries and the list, we can write multiple functions, where each only manipulates one level of the data structure. This is an example of problem decomposition. It allows us to only think about one piece of the problem at a time.

The idea is that we would use the code in the following way:

Can you finish the above code?

However, we can simplify both versions of the above code. Recall our previous use of defaultdict. It's just like a regular dictionary, but it allows us to simplify the initialization, by initializing all values once with a default value. That allows us to not check whether or not a key is already in the dictionary.

Whew! Now we have several ways to create our data structure. We also want to use it. We'll have one exercise for finding things in a lexicon.

Define a function that takes a lexicon and a string. It returns a list of all the lexicon words that have a meaning containing the string. For example, searching for "annoy" in our example lexicon would return ["bug"]. Note that the returned list should contain any word at most once.

As in the previous examples, we strongly recommend decomposing the problem by writing helper functions for each level of the data structure.

Try it out on examples like the following. As always, if you are having trouble understanding the behavior of the code, add some print statements to the functions, so that you can observe what is happening.