1. Objects and References

In Python, everything is an object, that is, every entity, be it a simple integer, a list, a string, a dictionary or a URL processing object, is an encapsulated entity with associated functions (methods). We like to say that objects are “intelligent” in that they have their own behaviors and thus “know” how to take care of themselves. Here's something cute to try, showing that even integers are objects:

>>> x = -42          # -42 is an integer object
>>> x.__abs__()      # call the absolute value method of the integer object

Because everything in Python is an object, that means that variables are actually references to those objects. This is analogous to the variable being a name tag associated with a person. Other people identify that person by the name on the name tag. The name tag is not the person, it's only a name that refers to that person. You can do a number of things with name tags. If you give the name tag to someone else, the name tag now refers to the new person, not you any more. You can have more than one name tag and thus answer to multiple names. If you had two name tags, “Bob” and “Robert”, then if someone hits “Bob” in the face, “Robert” will also have a black eye.

For example, if we type x = ("a", "b"), then Python will dutifully create a tuple. But the net result, as far as the rest of the program is concerned, is that the Python system creates a reference to that tuple. In computer hardware-speak, the reference is the memory location (address) in the computer where the data for the tuple is actually stored. Thus, in reality, the code ("a", "b") actually corresponds to this memory reference, not to the actual tuple. Thus the equals sign simply makes the x variable contain a new reference. The value of x is thus the reference to the tuple, not the actual tuple.

When one variable is assigned to another, the value of whatever the first variable is referencing is not copied. Instead, it is the reference that is copied over to the new variable. The original object is untouched.

Consider the following code:

mylist = ["a", "b", "c"]
yourlist = mylist
mylist[1] = 42
mylist = [5, 4, 3, 2, 1]

Let's run it here.

It's a lot messier, but you can see a more accurate visualization of the execution of this code here.

Because all variables are really references to objects and not the objects themselves, there are unintended side effects that can happen when references to mutable objects are passed around in and out of functions and not treated with respect.

The input parameter of a function is a local variable and as such, is a reference to an object, the same as any other variable. When a function is called with a given input value, the input parameter variable is set to reference that given input value.

For example, consider the following code:

def func(alist):
    alist[2] = 100
    return alist

mylist = ["a", "b", "c"]
yourlist = func(mylist)

Let's run it here.

The messier, but more accurate, version is here.

It is important to understand how these references operate and what that means for your program. There is often confusion about the visibility and scope of particular variables. The terms local and global are used to describe the relative visibility and thus usability of variables in different parts of your program. This is part of the overall notion of visibility called scoping.

Global variables can be seen and used by any code with equal or greater indentation with respect to the indentation where it was defined. But that same variable is local to its own indentation and thus cannot be seen or used by anything that is separated from it by a lesser level of indentation. In Python, a variable's scope is the level of indentation at which it was defined. A variable is global to everything that is indented from where it was defined and is local to anything of lesser indentation. In common usage, a global variable is generally meant to be one that is global to the entire program file and a local variable is one that is only usable with the scope of a function or other greater indentation.

In the event that a local variable has the same name as a global variable, the local variable always has precedence. This phenomenon is called masking.

This badly written program illustrates some of these ideas:

myGlobal = 42                    # Defines a global variable

def myFunc1():
    myLocal = "abc"              # Defines a local variable

    print "myFunc1: myGlobal = ", myGlobal  # myGlobal is the global variable.
    print "myFunc1: myLocal = ", myLocal    # myLocal is the local variable.

def myFunc2():
    myLocal = "ABC"              # Defines a local variable
    myGlobal = 200               # Defines another local variable
	
    print "myFunc2: myGlobal = ", myGlobal  # myGlobal is a local variable.
    print "myFunc2: myLocal = ", myLocal    # myLocal is a local variable.

def myFunc3(myGlobal):           # The input parameter is a local variable.
    myLocal = "XYZ"              # Defines a local variable
    myGlobal = 300               # Reassigns the local variable

    print "myFunc3: myGlobal = ", myGlobal  # myGlobal is the local variable.
    print "myFunc3: myLocal = ", myLocal    # myLocal is the local variable.


print "myGlobal = ", myGlobal
myFunc1()           # Prints global variable because there's no name conflict
print "myGlobal = ", myGlobal

# The following line will fail when uncommented:
#print "myLocal = ", myLocal

myFunc2()           # Name conflict between local and global variable names. Local variable has precendence.
print "myGlobal = ", myGlobal

myFunc3(myGlobal)   # Name conflict between local and global variable names.  Local variables, such as the input parameter, have precendence.
print "myGlobal = ", myGlobal

You can see how this program executes here.

Exercise 1.1 - Pass by Reference

What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.

def mystery_function(bool_list):
    """
    What do I do?

    Arguments:
    bool_list - a list of booleans
    """
    for index in xrange(len(bool_list)):
        bool_list[index] = not bool_list[index]
>>> my_list = [True, False]
>>> mystery_function(my_list)
>>> my_list
"what shows up here???"

Exercise 1.2 - ???

What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.

d = {}
s = set([1, 2, 3])
d[1] = s
d[2] = s
print d
s.add(4)
print d

What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.

list1 = [[1, 2], [3, 4], [5, 6]]
list2 = list1
list2[1].append(5)
print list1
print list2

Exercise 1.3 - Graph Functions

Write two versions of the graph functions add_node(g) and remove_node(g). In the first version, modify the given graph with respect to the operation and in the second return a new graph that realizes the operation while leaving the given graph unchanged.

2. Matrices in NumPy

In this course, you will sometimes be asked to write code using matrices. We could represent matrices in Python as a list of lists. A couple disadvantages of that are that we have to ensure that all the sub-lists are of the same length and that we have to write lots of basic functions such as adding and multiplying matrices. Instead, we'll use the matrices and matrix routines already defined in the NumPy library. (Technically, we'll be using the 2-dimensional arrays in NumPy. NumPy also has a separate type for matrices that we won't be using.)

For the assignment, you need to be able to create a matrix and multiply matrices.

>>> import numpy
>>> a = numpy.array([2,3,4])            # a 1x3 matrix (a vector)
>>> a
array([2,3,4])
>>> b = numpy.array([(1,2,3),(4,5,6)])  # a 2x3 matrix
>>> b                                   # Note the different output form.
array([[1,2,3],
       [4,5,6]])
>>> b.shape
(2,3)
>>> numpy.zeros((2,3))
array([[0,0,0],
       [0,0,0]])
>>> numpy.ones((2,3))
array([[1,1,1],
       [1,1,1]])
>>> numpy.empty((2,3))
array([[?,?,?],
       [?,?,?]])                        # The actual values are essentially random.
>>> b + b
array([[2,4,6],
       [8,10,12]])
>>> b - numpy.ones((2,3))
array([[0,1,2],
       [3,4,5]])
>>> b[1]
[4,5,6]
>>> b[1][1]
5
>>> b[1][1] = 500
>>> for row in b:
         print row
[1,2,3]
[4,500,6]

For additional examples, refer to the NumPy Tutorial. Also, the NumPy Example List individually lists each function available.

2.1 Matrix Practice

Look at the NumPy Tutorial and find out how to do the following.

  1. From the above examples, b is a 2 x 3 matrix. Create a 3 x 4 matrix, and multiply them with numpy.dot(). The result should be a 2 x 4 matrix.
  2. Find a function to easily produce the n x n identity matrix.
  3. Write a function that takes a matrix and determines whether it is an (0,1)-matrix or not. I.e., are all of its values 0 or 1?
  4. Write a function adjlist2adjmatrix(g) that converts a dictionary representation of a graph to a matrix representation.
  5. Write a function edge_count(g) that takes a graph in matrix form and returns the number of edges.
  6. Write a function find_popular_nodes(g) that takes a graph in matrix form and returns a set of nodes with above average degree. Note that the pseudo-code for this is identical to the psuedo-code we wrote in lab 2!

3. Testing Strategies

While proving that a non-trivial program is correct under all circumstances is practically impossible, testing is still a critical aspect of program construction. By testing our programs we gain confidence in their ability to achieve our desired goals while also providing an opportunity to identify and correct defects. For many of the functions we have asked you to write in lab so far we have also asked you to verify those functions' output against an expected output when given a particular input. Defining the expected output of a function relative to a given input is the cornerstone of a testing strategy called unit testing. Unit testing as a whole is a deep subject beyond the scope of this course; however, it is worthwhile to examine some of its basic principles in order to aid our efforts.

Imagine we wanted to test Python's list sorting method list.sort(). We could easily do so by creating a list of elements, invoking sort and then visually inspecting the output to ensure the expected output. However, to ensure that sort works on larger lists we must write code to try sorting such larger lists which in turn could be tedious and error prone to verify visually. Instead, we could write a simple function to aid in our testing. Consider the following code:

>>> some_list = [5, 2, 1, 9, 4, 5, 9, 8]
>>> some_list.sort()
>>> assert_lists_identical([1,2,4,5,5,8,9,9], some_list)
True

Here the assert_lists_identical function checks to ensure that the two list parameters are of equal length and contain equivalent elements in the same order. By returning True this function asserts that the Python sort method has produced the expected effect upon some_list. The advantage of using a function such as assert_lists_identical instead of simply testing for logical list equality via the == operator is that, should a difference be found, our function can print useful information about the differences such as the first index a difference was detected and what the difference is.

One of the chief advantages of creating such tests is that they are repeatable. Not only can they characterize the robustness of our functions (e.g. set) the first time we execute them but also at any point in the future. Imagine creating a separate Python file filled with dozens of tests for a function you have created. Should you ever tweak your function, perhaps for performance improvements or bug fixes, you can easily go back to your testing file and re-execute the archived test code to ensure your modifications have not compromised any of your test cases. This process, in testing parlance, is referred to as regression testing. Such testing is a mission critical activity for software quality assurance. In fact, it is not uncommon for companies to dedicate whole servers to perpetually check out, compile and regression test a project's source code in an unending and everlasting cycle.

Exercise 3.1 assert_lists_identical

Create a function assert_lists_identical that takes two lists and returns True if they are logically identical and False otherwise. If the lists are not logically identical, your method should print out useful testing information to help diagnose where and how the lists differ.

Exercise 3.2 assert_graphs_identical

Create a function assert_graphs_identical that takes two graphs and returns True if they are logically identical and False otherwise. If the graphs are not logically identical, your method should print out useful testing information to help diagnose where and how the graphs differ.


Submit your code

At the end of lab, submit your group's Python code here.