1. Objects and References
In Python, everything is an object, that is, every entity, be it a simple integer, a list, a string, a dictionary or a URL processing object, is an encapsulated entity with associated functions (methods). We like to say that objects are “intelligent” in that they have their own behaviors and thus “know” how to take care of themselves. Here's something cute to try, showing that even integers are objects:
>>> x = -42 # -42 is an integer object
>>> x.__abs__() # call the absolute value method of the integer object
Because everything in Python is an object, that means that variables are actually references to those objects. This is analogous to the variable being a name tag associated with a person. Other people identify that person by the name on the name tag. The name tag is not the person, it's only a name that refers to that person. You can do a number of things with name tags. If you give the name tag to someone else, the name tag now refers to the new person, not you any more. You can have more than one name tag and thus answer to multiple names. If you had two name tags, “Bob” and “Robert”, then if someone hits “Bob” in the face, “Robert” will also have a black eye.
For example, if we type x = ("a", "b")
, then
Python will dutifully create a tuple. But the net result, as
far as the rest of the program is concerned, is that the
Python system creates a reference to that tuple. In computer
hardware-speak, the reference is the memory location
(address) in the computer where the data for the tuple is
actually stored. Thus, in reality, the code ("a",
"b")
actually corresponds to this memory reference,
not to the actual tuple. Thus the equals sign simply makes
the x
variable contain a new reference. The
value of x
is thus the reference to the tuple,
not the actual tuple.
When one variable is assigned to another, the value of whatever the first variable is referencing is not copied. Instead, it is the reference that is copied over to the new variable. The original object is untouched.
Consider the following code:
mylist = ["a", "b", "c"]
yourlist = mylist
mylist[1] = 42
mylist = [5, 4, 3, 2, 1]
Let's run it here.
It's a lot messier, but you can see a more accurate visualization of the execution of this code here.
Because all variables are really references to objects and not the objects themselves, there are unintended side effects that can happen when references to mutable objects are passed around in and out of functions and not treated with respect.
The input parameter of a function is a local variable and as such, is a reference to an object, the same as any other variable. When a function is called with a given input value, the input parameter variable is set to reference that given input value.
For example, consider the following code:
def func(alist):
alist[2] = 100
return alist
mylist = ["a", "b", "c"]
yourlist = func(mylist)
Let's run it here.
The messier, but more accurate, version is here.
It is important to understand how these references operate and what that means for your program. There is often confusion about the visibility and scope of particular variables. The terms local and global are used to describe the relative visibility and thus usability of variables in different parts of your program. This is part of the overall notion of visibility called scoping.
Global variables can be seen and used by any code with equal or greater indentation with respect to the indentation where it was defined. But that same variable is local to its own indentation and thus cannot be seen or used by anything that is separated from it by a lesser level of indentation. In Python, a variable's scope is the level of indentation at which it was defined. A variable is global to everything that is indented from where it was defined and is local to anything of lesser indentation. In common usage, a global variable is generally meant to be one that is global to the entire program file and a local variable is one that is only usable with the scope of a function or other greater indentation.
In the event that a local variable has the same name as a global variable, the local variable always has precedence. This phenomenon is called masking.
This badly written program illustrates some of these ideas:
myGlobal = 42 # Defines a global variable
def myFunc1():
myLocal = "abc" # Defines a local variable
print "myFunc1: myGlobal = ", myGlobal # myGlobal is the global variable.
print "myFunc1: myLocal = ", myLocal # myLocal is the local variable.
def myFunc2():
myLocal = "ABC" # Defines a local variable
myGlobal = 200 # Defines another local variable
print "myFunc2: myGlobal = ", myGlobal # myGlobal is a local variable.
print "myFunc2: myLocal = ", myLocal # myLocal is a local variable.
def myFunc3(myGlobal): # The input parameter is a local variable.
myLocal = "XYZ" # Defines a local variable
myGlobal = 300 # Reassigns the local variable
print "myFunc3: myGlobal = ", myGlobal # myGlobal is the local variable.
print "myFunc3: myLocal = ", myLocal # myLocal is the local variable.
print "myGlobal = ", myGlobal
myFunc1() # Prints global variable because there's no name conflict
print "myGlobal = ", myGlobal
# The following line will fail when uncommented:
#print "myLocal = ", myLocal
myFunc2() # Name conflict between local and global variable names. Local variable has precendence.
print "myGlobal = ", myGlobal
myFunc3(myGlobal) # Name conflict between local and global variable names. Local variables, such as the input parameter, have precendence.
print "myGlobal = ", myGlobal
You can see how this program executes here.
Exercise 1.1 - Pass by Reference
What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.
def mystery_function(bool_list):
"""
What do I do?
Arguments:
bool_list - a list of booleans
"""
for index in xrange(len(bool_list)):
bool_list[index] = not bool_list[index]
>>> my_list = [True, False]
>>> mystery_function(my_list)
>>> my_list
"what shows up here???"
Exercise 1.2 - ???
What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.
d = {}
s = set([1, 2, 3])
d[1] = s
d[2] = s
print d
s.add(4)
print d
What is the output of this code? Do not run it. Instead, generate a hypothesis and explain it to your partners. After you all agree, run the code to confirm your answer.
list1 = [[1, 2], [3, 4], [5, 6]]
list2 = list1
list2[1].append(5)
print list1
print list2
Exercise 1.3 - Graph Functions
Write two versions of the graph functions
add_node(g)
and remove_node(g)
.
In the first version, modify the given graph with respect
to the operation and in the second return a new graph
that realizes the operation while leaving the given graph
unchanged.
2. Matrices in NumPy
In this course, you will sometimes be asked to write code using matrices. We could represent matrices in Python as a list of lists. A couple disadvantages of that are that we have to ensure that all the sub-lists are of the same length and that we have to write lots of basic functions such as adding and multiplying matrices. Instead, we'll use the matrices and matrix routines already defined in the NumPy library. (Technically, we'll be using the 2-dimensional arrays in NumPy. NumPy also has a separate type for matrices that we won't be using.)
For the assignment, you need to be able to create a matrix and multiply matrices.
>>> import numpy
>>> a = numpy.array([2,3,4]) # a 1x3 matrix (a vector)
>>> a
array([2,3,4])
>>> b = numpy.array([(1,2,3),(4,5,6)]) # a 2x3 matrix
>>> b # Note the different output form.
array([[1,2,3],
[4,5,6]])
>>> b.shape
(2,3)
>>> numpy.zeros((2,3))
array([[0,0,0],
[0,0,0]])
>>> numpy.ones((2,3))
array([[1,1,1],
[1,1,1]])
>>> numpy.empty((2,3))
array([[?,?,?],
[?,?,?]]) # The actual values are essentially random.
>>> b + b
array([[2,4,6],
[8,10,12]])
>>> b - numpy.ones((2,3))
array([[0,1,2],
[3,4,5]])
>>> b[1]
[4,5,6]
>>> b[1][1]
5
>>> b[1][1] = 500
>>> for row in b:
print row
[1,2,3]
[4,500,6]
For additional examples, refer to the NumPy Tutorial. Also, the NumPy Example List individually lists each function available.
2.1 Matrix Practice
Look at the NumPy Tutorial and find out how to do the following.
- From the above examples,
b
is a2 x 3
matrix. Create a3 x 4
matrix, and multiply them withnumpy.dot()
. The result should be a2 x 4
matrix. - Find a function to easily produce the
n x n
identity matrix. - Write a function that takes a matrix and determines whether it is an (0,1)-matrix or not. I.e., are all of its values 0 or 1?
- Write a function
adjlist2adjmatrix(g)
that converts a dictionary representation of a graph to a matrix representation. - Write a function
edge_count(g)
that takes a graph in matrix form and returns the number of edges. - Write a function
find_popular_nodes(g)
that takes a graph in matrix form and returns a set of nodes with above average degree. Note that the pseudo-code for this is identical to the psuedo-code we wrote in lab 2!
3. Testing Strategies
While proving that a non-trivial program is correct under all circumstances is practically impossible, testing is still a critical aspect of program construction. By testing our programs we gain confidence in their ability to achieve our desired goals while also providing an opportunity to identify and correct defects. For many of the functions we have asked you to write in lab so far we have also asked you to verify those functions' output against an expected output when given a particular input. Defining the expected output of a function relative to a given input is the cornerstone of a testing strategy called unit testing. Unit testing as a whole is a deep subject beyond the scope of this course; however, it is worthwhile to examine some of its basic principles in order to aid our efforts.
Imagine we wanted to test Python's list sorting method
list.sort()
. We could easily do so by
creating a list of elements, invoking sort
and then visually inspecting the output to ensure the
expected output. However, to ensure that
sort
works on larger lists we must write
code to try sorting such larger lists which in turn could
be tedious and error prone to verify visually. Instead,
we could write a simple function to aid in our testing.
Consider the following code:
>>> some_list = [5, 2, 1, 9, 4, 5, 9, 8]
>>> some_list.sort()
>>> assert_lists_identical([1,2,4,5,5,8,9,9], some_list)
True
Here the assert_lists_identical
function
checks to ensure that the two list parameters are of
equal length and contain equivalent elements in the same
order. By returning True
this function
asserts that the Python sort
method has
produced the expected effect upon some_list
.
The advantage of using a function such as
assert_lists_identical
instead of simply
testing for logical list equality via the ==
operator is that, should a difference be found, our
function can print useful information about the
differences such as the first index a difference was
detected and what the difference is.
One of the chief advantages of creating such tests is
that they are repeatable. Not only can they characterize
the robustness of our functions (e.g. set
)
the first time we execute them but also at any point in
the future. Imagine creating a separate Python file
filled with dozens of tests for a function you have
created. Should you ever tweak your function, perhaps for
performance improvements or bug fixes, you can easily go
back to your testing file and re-execute the archived
test code to ensure your modifications have not
compromised any of your test cases. This process, in
testing parlance, is referred to as regression
testing. Such testing is a mission critical activity
for software quality assurance. In fact, it is not
uncommon for companies to dedicate whole servers to
perpetually check out, compile and regression test a
project's source code in an unending and everlasting
cycle.
Exercise 3.1 assert_lists_identical
Create a function assert_lists_identical
that takes two lists and returns True
if
they are logically identical and False
otherwise. If the lists are not logically identical, your
method should print out useful testing information to
help diagnose where and how the lists differ.
Exercise
3.2 assert_graphs_identical
Create a function assert_graphs_identical
that takes two graphs and returns True
if
they are logically identical and False
otherwise. If the graphs are not logically identical,
your method should print out useful testing information
to help diagnose where and how the graphs differ.
Submit your code
At the end of lab, submit your group's Python code here.