COMP 200 Elements of Computer Science &
COMP 130 Elements of Algorithms and Computation
Spring 2012

Practice for Exam 2: Comp130-specific topics

The following problems are typical of those that will be on the exam as Comp130-specific problems.  For non-Comp130-specific problem examples, see the other practice problem set.

In addition to answering these questions, you should review the course's online notes and the solutions to the two assignments. We have also made a study guide.

Problems

Explain why you might expect a PCA analysis of word/tuple occurrences in a specific author's text corpus to form a cluster in a many-dimensional space.

Solution (No peeking!):

""" A cluster in the PCA analyzed data indicates that certain combinations of words/tuples
are particularly common in the text corpus.    Since there is naturally a statistical spread
in the occurences of the word/tuple combinations, a cluster is formed as each multi-dimensional 
data point representing the word/tuple counts for a given text chunk tend to clump around a 
central point in a high dimensional space.   These combinations may possibily involving 
many words/tuples of words, thus the cluster may exist in a the high dimensional space 
represented by the counts of the words/tuples involved.   It should be noted that hte cluster
may however exist in only a small subspace of the even larger dimensional space represented 
by all of the word/tuple counts used to measure the text corpus. 
"""

 

Rewrite the countTextChunks() function from the PCA analysis code as a MapReduce operation.

Solution (no peeking!):  

import text_analysis as ta
import MapReduce as mr
from collections import defaultdict


def list2dict(aList):
    """ Convert the given list to a dictionary where each key
    is the index of the associated value in the given list.
    """
    aDict = {}
    for i in range(len(aList)):
        aDict[i] = aList[i]
    return aDict
    
    
def countTextChunks(chunks, order):
    """ Count the word occurences in each text chunk in chunks, a list of lists of words.
    Returns a list of defaultDict(int) dictionaries that relates word->count for each text chunk.
    """
    chunkDict = list2dict(chunks)   # convert list of text chunks into a dictionary of text chunks
    
    def mapCountChunk(k, v, outputFunc):
        """ Calls the given outputFunc(aKey, aValue) where aKey = k and aValue 
        is the dictionary of counts of length-order tuples in the given text chunk, v.
        """
        outputFunc(k, ta.countSequences(v, order))   # notice how the order is curried in.
        
    def reduceCountChunk(k, v, outputFunc):
        """  Calls the outputFunc(aKey, aValue) where aKey = k and
        aValue = makeNormalizedDict(v)).
        """
        outputFunc(k, makeNormalizedDict(v[0]))
    
    resultDict = mr.mapReduce(chunkDict, mapCountChunk, reduceCountChunk)
    return resultDict.values()

""" Note: If one was really going to be using MapReduce, one would probably read the chunks directly into 
a dictionary rather than convert the list of chunks into a dictionary as done here.  The choice to insert 
the conversion was to maintain a "drop-in" compatibility with the original countTextChunks() function.

One could argue that the normalization process could have taken place in the map function instead
of the reduce function.

By that same token, one could argue that map could be a no-op, simply passing k and v onto its 
outputFunc and that all the counting work is done in the reduce function.
"""

# The following code is just support code to make a fully operational file.  
# MapReduce.py and text_analysis.py and text files are required as well.
  
def readChunks(filename, chunkSize):
    """ Reads filename, assumed to be in the texts subfolder, and breaks the parsed texted into 
    a list of chunks (lists of words) where each chunk has chunkSize elements.
    The list of chunks is returned.
    """
    words = ta.inputText('texts/'+filename)
    return makeChunks(words, chunkSize)
 
 
def makeChunks(aList, nElts):
    """ Break a given list into multiple smaller lists each of the given number of elements.
    Any leftover elements are ignored.
    A list of the resultant lists is returned.
    """
    return [aList[i:i+nElts] for i in range(0, len(aList), nElts)]

def getTotalValues(aDict):
    """Returns the sum of all the values in the given dictionary.
    """
    return sum(aDict.values())
    
def makeNormalizedDict(aDict):
    """Return a defaultdict(int) whose values are those of the given dictionary
    divided by the total of all its values, so that the resulting values sum to 1.
    """
    total = getTotalValues(aDict)
    result = defaultdict(int)
    for k, v in aDict.items():
        result[k] = float(v)/total
    return result    
 
chunks = readChunks("Shakespeare Complete Works.txt", 1000)
countDicts = countTextChunks(chunks, 3)

	

Explain the following functions in terms of currying and closures:

def makeAddToFunc(x):
    def aFunc(y):
        return x+y
    return aFunc

def makeMultByFunc(x):
    def aFunc(y):
        return x*y
    return aFunc
    
def makePowByFunc(x):
    def aFunc(y):
        return pow(y,x)
    return aFunc

def makeFuncCombo(f1, f2):
    def aFunc(x):
        return f1(f2(x))
    return aFunc


f1 = makeFuncCombo(makeMultByFunc(2), makeAddToFunc(5))
f2 = makeFuncCombo(makePowByFunc(2), makeAddToFunc(1))
  
print f1(3), f2(3)
  

Solution (no peeking!):

"""All 4 functions curry their input parameters into their returned functions.   
The returned functions only access the input parameter value that was used to 
create the returned function because that is the value in the closure of that 
particualr function.

makeAddToFunc(x) returns a function that will return the sum of its input parameter 
the "x" value used when the returned function was made.

makeMultByFunc(x) returns a function that will return the product of its input parameter 
the "x" value used when the returned function was made.

makePowByFunc(x) returns a function that will return the its input parameter raised to  
the power of the "x" value used when the returned function was made.

makeFuncCombo(f1, f2) returns a function that will return the result of the compound 
function formed by f1 and f2 applied to its input parameter, i.e. f1(f2(y)) 


""" 

Explain the behavior of the two returned functions of make2Funcs() in terms of currying and closures:


def make2Funcs(a0):
    a = [a0]   # This has to be a list because of the way that Python creates closures
    def aFunc1(x):
        a[0] += x
        return a[0]
    def aFunc2(x):
        return a[0]*x
    return aFunc1, aFunc2
    
g1, h1 = make2Funcs(10)
g2, h2 = make2Funcs(-10)

print g1(2), h1(5), g2(2), h2(5)
print g1(2), h1(5), g2(2), h2(5)
print g1(2), h1(5), g2(2), h2(5)
print g1(2), h1(5), g2(2), h2(5)

Solution (no peeking!):

""" make2Funcs() curries its input into the first element of the local list variable, a, 
which is part of the closures of both the aFunc1() and aFunc2() functions.   The input 
parameter is thus the intial value for a[0], which differs from one call to the next of make2Funcs().   
Since aFunc1 and aFunc2 share the same closure, they share the same list variable, a.  Thus, 
when an aFunc1 function, e.g. g1() or g2(), changes the value of the first element of a, 
then the associated aFunc2 function, e.g. h1() or h2() respectively, will access that same
variable and return a value based on it.  g1() and h1() are linked together while g2() and 
h2() are separatedly linked because the closure shared by g1() and h1() is not the same
closure as g2() and h2().

Technical note:  Python does not allow a curried value (local variable) in a closure to 
be mutated, so the classic "hack" is to use a variable that refers to a list, not a 
primitive value such as an integer or float.   This way, the variable itself, "a" here, is 
never mutated because it always refers to the same list.   On the other hand, the *contents*
of that list are mutated.   Java shares this restriction with Python so the same hack works
in Java.  C# on the other hand, allows mutation of curried variables, so the hack is 
unnecessary in C#. 
"""