COMP 200: Elements of Computer Science
Spring 2013

Assignment 5

Put all your answers in one CodeSkulptor file. Put text answers in Python comments. Use Python comments to clearly mark each problem number.

Work in assigned pairs on this assignment. Submit it on OWL-Space, under Assignment 5. Only the first person listed in the pair should submit the Codeskulptor URL. Put the pair designation as posted, (NetID1, NetID2), at the top of your submission as well as in Python comments at the top of your CodeSkulptor code. Including the person's actual names too would be nice.

As with the previous assignments, your grade will partly depend on your code style.

Be sure to read the course policies.

Text Riffing (40 points total)

Given a Markov chain, we can generate text based upon it with the following general approach.

  1. Pick one of the chain's/dictionary's n-word keys.
  2. Initialize a list of strings with these words.
  3. Loop as long as desired:
    1. Use the previous n-word sequence to look up possible successors in the chain/dictionary. If there are no successors, exit from the loop.
    2. Pick one of these, based upon the probabilities.
    3. Add it to the list of strings.
  4. Convert that list of strings into a single string, and return it.

For example, consider the 2nd-order Markov chain for Eight Days a Week. Assume we pick ('I', 'need') as our starting sequence. We add 'I' and 'need' to an empty list. We see that ('I', 'need') can be followed by 'you' or 'your', each with 1/2 probability. We randomly pick 'you' and add that to our list. We now see that ('need', 'you') must be followed by a comma, so we add that to our list. We now see that ('you', ',') can be followed by 'Hold' or 'Eight', each with 1/2 probability. We randomly pick 'Hold' and add that to our list. Let's assume we only want to generate 5 words. We have ["I", "need", "you", ",", "Eight"], and we can use string.join() to get "I need you , Eight".

There's potentially confusing detail. Each word sequence we pick must be in the original text, and thus in the dictionary, with one possible exception. If the last n words in the original text only appear as that sequence once at the end, then that sequence has no possible successors. So, when generating our text, if we are working with that sequence, we can't possibly continue, so we just exit from the loop.

  1. (20 points)

    Define the function random_choice_weighted(choices), which takes a dictionary, where each possible choice is mapped to its probability. You may assume without checking that the probabilities are non-negative and sum to 1. The function returns one of the choices picked randomly with respect to the weighted probabilities.

    Hint: Structure your function so it works like the following example. Assume your choices are {"a" : .25, "b" : .5, "c" : .25"}. We randomly pick the number 0.8. Our number 0.8 is not less than or equal to the first probability 0.25, so we skip the first option and adjust our number to 0.55 = 0.8 - 0.25. Our number 0.55 is not less than or equal to the next probability 0.5, so we skip that second option and adjust our number to 0.05 = 0.55 - 0.5. Our number 0.05 is less than or equal to the next probability 0.25, so we pick that third option and return "c".

  2. (20 points)

    Define the function generate_text(chain, num_words). Given a Markov chain, it generates a random sequence of the given number of words. It returns a string of the resulting text.

    The Markov chain input should be created by the code from Assignment 4's last problem. You may use your code or our sample solution.

    As previously described, the result can be less than the specified number of words, if the generation routine happens to comes across a word sequence with no successor.

    When picking the starting sequence, you can simply pick any of those that occur in the Markov chain, or you can be slightly more sophisticated and do something like pick one that starts with a capital letter.

Sets (30 points total)

You'll probably want to review the CodeSkulptor documentation to see the available operations on sets.

  1. (10 points)

    Define a function all_pairs(input_set). It returns a set of all pairs of elements in the input set — such that order matterns and including self-pairs. Mathematically, this is called the Cartesian product of a set with itself.

    For example, all_pairs(set([1, 2, 3])) should return set([(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)]).

    Style note: Your code should only use elements, pairs, and sets. In particular, you should not convert the input set into a list or other data structure.

  2. (20 points)

    Define a function arbitrary_pairing(input_set). If the input set has an even number of elements, it returns a set of pairs of elements such that each element occurs once. If the input set has an odd number of elements, it has the same behavior, except that the resulting set also contains one lone element.

    For example arbitrary_pairing(set([1, 2, 3, 4, 5, 6])) could return set([(1, 4), (6, 2), (3, 5)]), while arbitrary_pairing(set([1, 2, 3, 4, 5])) could return set([(1, 4), (3, 5), 2]). More usefully, given a class roster, this generates pairings of assignment partners!

    Hint: Don't use the random module. The specifications do not indicate that the pairing needs to be random or non-deterministic. I.e., calling the function repeatedly with the same arguments can always lead to the same results.

    Style note: Your code should only use elements, pairs, and sets. In particular, you should not convert the input set into a list or other data structure.

Graphs (30 points total)

  1. (10 points)

    Give the encodings for the directed and undirected graphs on slide 13 of the Graphs/Networks overview. Your answers should be similar to those illustrated in the Representing Graphs notes.

  2. (20 points)

    Define the function random_undirected_graph(num_nodes, prob). It returns an undirected graph with the specified number of nodes named with integers 0…num_nodes-1. The given probability value is the probability that the edge between each arbitrary pair of nodes exists. In other words, for each potential edge, we roll an imaginary die and, with the given probability, we add the edge to the graph.

    Note: We would expect the graph to have approximately prob × num_nodes × (num_nodes - 1)/2 edges.