Thanks to Prof. Devika Subramanian for the following discussion from the Fall 2011 COMP 140 class:

Lab 04: Graphs and networkx

A graph is a set of nodes or vertices, connected together by edges. A good example of a graph is an airline route map, where the vertices are the airports and the edges are the flights that go from one airport to another. You can see the route map graphs for the major US airlines here. In today’s lab, we will be working with undirected graphs using the Python module networkx.

Working with graphs using `networkx`

The networkx module in Python makes it easy to construct and analyze graphs.

First you need to import the networkx module.

import networkx as nx

This does the usual import for us, but instead of arduously typing the 8 characters of networkx in front of all the functions from the module, all we have to type is the 2 characters: nx. Computer scientists will go through a lot of effort to type a couple fewer characters!

Creating a Graph

In networkx, a graph is an object, i.e., an encapsulated entity that possesses both information (data values) and behavior (methods/functions). Our first step is to create an empty graph:

g = nx.Graph()

The nx.Graph() function constructs a new undirected graph. We now have an empty graph with no nodes and no edges. Before we add nodes and edges to our graph, let us give it a great name.

g.name = "The world's most awesome graph"

This is useful for identifying a graph for plotting purposes. You could name the graph during the graph’s construction as shown below.

g = nx.Graph(name = "The world's most awesome graph")

To add a node to the graph, we use the add_node method associated with our graph object g. We are representing people in a company as nodes in a graph, and we place an edge between a pair of nodes if the corresponding people are working on a project together.

g.add_node("John")
g.add_node("Maria")
g.add_node("Alex")

The input to the add_node method can be any immutable entity that is useful for identifying the node. Typically, numbers and strings are used, but tuples will work too. Edges can be added between nodes using the add_edge method.

g.add_edge("John", "Maria")
g.add_edge("Maria","Alex")

The add_edge method adds an undirected edge between the two specified nodes. A node is specified using whatever identifier was used to create the node using add_node. An added feature of this method is that if one or both of the nodes do not yet exist, add_edge will automatically create them, using the given identifiers. While convenient, this feature can make keeping track of what nodes exist and what nodes do not, quite difficult at times. In general, it is much safer to explicitly create the nodes and then separately connect them, as shown in this example. Note that it is possible to create a self-loop, which is to join a node to itself with an edge; thought it has to make sense in the context of the underlying phenomenon that the graph represents.

Examining a graph’s nodes and edges

To retrieve all nodes in the graph, use the nodes method associated with the graph.

>>> g.nodes()
['John','Maria','Alex']

The nodes method returns a list of all the nodes in the graph. For our example graph, the result I got is shown above. You may observe a different ordering than the one shown here. To get all the edges associated with a graph, use the edges method associated with the graph.

>>> g.edges()
[('John', 'Maria'), ('Alex', 'Maria')]

The edges method returns a list of tuples where each tuple is a pair of nodes that are joined by an edge.

To visualize the graph, download graphplot.py.

import graphplot
graphplot.save_graph(g,"aGraph.png")

The file “aGraph.png” will be created in your directory and you can double-click and view it on your computer, or attach it to a file to submit as part of your homework.

Computing properties of a graph

There are a number of characteristics of graphs that are useful in helping us understand the properties of the underlying phenomenon it represents. Here are a couple of examples:

Degree of a node in a graph and degree histograms

The degree of a node is the number of edges that are incident on it. It helps us understand the local connectivity of a graph. For our example graph, the degree of the node “Maria” is 2. To get the degree of a single node, use the degree method associated with the graph.

>>> g.degree("Maria")
2

To get a dictionary of the degrees of all the nodes in a graph, use the degree method with no input parameter:

>>> g.degree()
{'Alex': 1, 'John': 1, 'Maria': 2}

To get a histogram of the degrees of the nodes in a graph:

>>> nx.degree_histogram(g)
[0,2,1]

nx.degree_histogram returns a list containing the number of nodes in the graph that have particular degree values. The entry at position i (counting from 0) in the list is the number of nodes in the graph that have degree i. In our example graph, no nodes have degree zero, two nodes have degree 1, and one node has degree 2.

Diameter of graphs

The length of a path from one node to another in a graph is typically measured in the number of edges traversed. The shortest path between two nodes is a graph is a path between two nodes with the smallest length. Then the diameter of a graph is the longest of all possible shortest paths in the graph. In our example, the diameter is 2 because no two nodes are separated by a path that traverses more than two edges.

>>> nx.diameter(g)
2

The diameter helps us understand how “wide” the graph is.

Graph centers

A center of a graph is a node where the maximum distance (number of edges) traversed to any other node is a minimum, compared to any other node. Essentially, the center of a graph is the most centrally located node. Note that there may be more than one node that qualifies as the center of the graph.

    >>> nx.center(g)
    ["Maria"]

The networkx function center returns a list of all the node identifiers that qualify as the center of the graph.

Connected components of a graph

A connected component of a graph is a subgraph where every node can be reached from every other node. There is a networkx function to find all the connected components of a graph.

>> nx.connected_component_subgraphs(g)
[<networkx.classes.graph.Graph at 0x101da9a50>]

The networkx function connected_component_subgraphs returns a list of subgraphs, each of which is a connected component of the original graph. Our example graph is connected, and therefore has only one component. If the list had more than one connected component, you can examine them and their properties using the loop below.

for cc in nx.connected_component_subgraphs(g):
    print cc, " has ", len(cc.nodes()), " nodes."

For more information

The above is only a small sampling of the methods of networkx and the Graph object. For instance, there are methods for initializing the graph using lists of vertices and edges and many more types of analysis that can be done on the graph. For more information, look at the following information sources:

Networkx documentation: http://networkx.lanl.gov/contents.html
Python’s built-in help function, e.g. help(nx.Graph)

Graph making exercises

In the seven exercises below, you will gain practice in using networkx to construct graphs and compute properties of graphs.

Exercise 1: making a graph

Write a function called make_simple_graph that returns the graph object that represents the following diagram:

exercise 1

Exercise 2: computing diameter and node degrees of graphs

For the graph made by your function in the previous exercise, calculate the diameter and the degrees of each node, and visually confirm those values by inspecting the above graph. Write a function compute_diameter_and_degrees which takes a networkx graph object as input, and returns the diameter and the degrees of all the nodes in the graph.

Exercise 3: computing average degree of nodes in a graph

Write a function called avg_degree which takes a networkx graph object as input and returns the average degree of the nodes in a graph. Check your function carefully to make sure that it returns a real value instead of truncating it as an integer. (Hint: The float function will turn an integer into a floating point value.). Use your function to compute the average degree of nodes for the graph constructed in Exercise 1.

Exercise 4: computing the center(s) of the graph

Write a function get_centers which takes a networkx graph object and returns its centers. Use the function to compute the centers of the graph constructed in Exercise 1.

Exercise 5a: Graph construction: graph with the greatest diameter

Write a function called get_widest_graph that returns a graph with 5 nodes with the largest possible diameter. You can name your nodes 0,1,2,3, and 4. Use the functions compute_diameter_and_degrees and get_centers that you have built in exercises 2 and 4 to get the diameter of the graph and the center(s) of the graph.

Exercise 5b: Graph construction: graph with the smallest diameter

Write a function called get_narrowest_graph that returns a graph with 5 nodes with smallest possible diameter. You can name your nodes 0,1,2,3, and 4. Use the functions compute_diameter_and_degrees and get_centers that you have built in exercises 2 and 4 to get the diameter of the graph and the center(s) of the graph.

Exercise 5c: Graph construction: graphs with a simple cycle

Write a function called make_simple_cycle that takes a single input parameter N, the number of nodes, and returns a graph with nodes whose identifiers are the integers {0,…,N-1}. This graph has edges from node 0 to node 1, node 1 to node 2, …, node N-2 to node N-1, and finally from node N-1 to node 0. Use the functions compute_diameter_and_degrees and get_centers that you have built in exercises 2 and 4 to get the diameter of the graph and the center(s) of the graph, for the case that N = 5.

Exercise 5d: Graph construction: complete graphs

Write a function called make_complete_graph that takes a single input parameter, the number of nodes N, and returns a graph with nodes whose identifiers are the integers {0,…,N-1}. A complete graph has an edge between every pair of nodes (but no self-loops). Use the functions compute_diameter_and_degrees and get_centers that you have built in exercises 2 and 4 to get the diameter of the graph and the center(s) of the graph, for the case that N = 5.

Exercise 5e: Graph construction: binomial graphs

Write a function called make_binomial_graph that takes two input parameters, the number of nodes, N, and a probability prob, and returns a graph where the probability of any node being linked to any other node is prob. Assume that 0.0 <= prob < 1.0. A graph constructed using this recipe is called a binomial graph (or an “Erdos-Renyi graph”) and has some interesting properties which we will study in class.

Observe the change in the number of connected components of the binomial graph for N = 100 as you vary prob = .005 to prob = .05 in steps of 0.001.

def show_binomial_graph_components(N,prob):  
    bgraph = make_binomial_graph(N,prob)
    i = 0
    for g in nx.connected_component_subgraphs(bgraph):
        g.name = "Component " + str(i)
        graphplot.save_graph(g)
        i = i + 1

show_binomial_graph_components(100,0.05)

To see the individual connected components, use the function above.

Exercise 6: An airline route map graph

NoFrills Airlines has the following route map for the cities that it services, where every pair of cities has service in both directions between the cities:

routemap =  [('St. Louis', 'Miami'), ('St. Louis', 'San Diego'), ('St. Louis', 'Chicago'), ('San Diego', 'Chicago'), ('San Diego', 'San Francisco'), ('San Diego', 'Minneapolis'), ('San Diego', 'Boston'), ('San Diego', 'Portland'), ('San Diego', 'Seattle'), ('Tulsa', 'New York'), ('Tulsa', 'Dallas'), ('Phoenix', 'Cleveland'), ('Phoenix', 'Denver'), ('Phoenix', 'Dallas'), ('Chicago', 'New York'), ('Chicago', 'Los Angeles'), ('Miami', 'New York'), ('Miami', 'Philadelphia'), ('Miami', 'Denver'), ('Boston', 'Atlanta'), ('Dallas', 'Cleveland'), ('Dallas', 'Albuquerque'), ('Philadelphia', 'Atlanta'), ('Denver', 'Minneapolis'), ('Denver', 'Cleveland'), ('Albuquerque', 'Atlanta'), ('Minneapolis', 'Portland'), ('Los Angeles', 'Seattle'), ('San Francisco', 'Portland'), ('San Francisco', 'Seattle'), ('San Francisco', 'Cleveland'), ('Seattle', 'Portland')]

What is the maximum number of hops that would ever be taken by a passenger on a single trip between any two serviced cities?
If you were a rich jet-setter traveling everywhere in across the country and were constrained to fly NoFrills (probably because of lucrative endorsement deals), which city would be the most optimal place for you to live, to minimize the number of hops you would have to make on average as you jet from home to your latest vacation spot?

Write Python functions to calculate the answers to these two questions.

Displaying Graphs

In this section, I show you how graphs are visualized by walking you through the code in graphutils.py

Creating graphs is all fine and good, but it’s hard to visualize what a graph looks like from just a printout of numbers. It would be much nicer to plot the graph in a nice graphical format. Luckily (surprise, surprise!), Python has a module for just that!

The module is called pylab and is part of the matplotlib package.

Pylab‘s basic operational metaphor is that of an “artist” working on some “current” canvas.

Create a figure (canvas) to work on.
Do various plotting operations that by default will apply to that figure.
Show the figure so that the user can see it.

It will be easier to analyze some code, so consider the following function to plot a graph in its own window (a “figure”):

import pylab
import networkx as nx

def make_graph_fig(aGraph):
    fig = pylab.figure()
    pylab.title(aGraph.name)
    pos = nx.spring_layout(aGraph)  
    nx.draw(aGraph, pos)
    return fig

def save_graph(aGraph,filename):
    fig = make_graph_fig(aGraph)
    fig.savefig(filename)

def show_graph(aGraph):
    fig = make_graph_fig(aGraph)
    fig.show()

When you call the show_graph function, passing it a Graph object you have made, a window like this should pop up:

Let’s take a look at the code, line by line:

pylab.figure() creates a matplotlib.figure.Figure object that will be the default plotting figure for all subsequent plotting operations.
pylab.title(aGraph.name) sets the title of the figure to the name of the graph.
nx.spring_layout(aGraph) creates a dictionary that keys the nodes in the graph to X-Y positions on the plot. This “spring” layout determines the positions of the nodes by imagining that every node is connected to every other node by springs, so that the final positions are a balance between being too far apart and too close together. This is also known as a “minimum energy” configuration. There are other layout techniques available in networkx, e.g. random_layout, but for most purposes, the spring_layout gives the most reasonable appearance.
nx.draw(aGraph, pos) draws the graph onto the figure using the node position layout determined by the spring layout function and labels each node with its identifier. If you wanted more control over the plotting process, you could use three separate operations:
- nx.draw_networkx_nodes(aGraph,pos)
- nx.draw_networkx_edges(aGraph,pos)
- nx.draw_networkx_labels(aGraph,pos)

If you take a graph that you have created and call show_graph with it, then a separate window should pop up onto your screen, displaying your graph with its name as the plot’s title.

It should be noted that all of the above pylab and networkx graph plotting functions have a myriad of optional parameters plus there are many, many more plotting functions available that we don’t have time to explore today.

Warning: pylab quirk with IDLE

If you had started IDLE first and opened your Python program from inside IDLE, the above function will only show a blank window on the screen. For some reason, pylab is highly dependent on exactly how the code is run and IDLE runs code differently depending on how the Python files are opened. You will need to write the following function:

def show_plots():
    pylab.show()

Call this function after the last of your show_graph calls to make all the graph plots visible. There can only be one call to this function, so be sure it is after any plotting calls you wish to make.

The reason for making this odd function that calls another function that you could have called just as easily will become apparent later in this lab.

Warning: pylab quirk with IPython

With IPython you get a different quirk. If you close windows generated by pylab, IPython freezes and segment faults. One option is to close all figure windows at only at the end when you are done running functions. The other is to use the save_graph function as indicated at the top of the lab, and get pylab to save the graph in .png format in your directory. You can view the image using a standard image viewer.

Anti-pylab rant

The thing to note about the above plotting code is that the setting of the figure’s title and the drawing of the graph on the figure are done with no explicit reference to the figure itself. There’s only one way in which this could have been done….global variables (cue scary music). This is what is called “pure side-effect programming” because the effects of what your code does are strictly side-effects on something else that is hidden from view. This is bad for many reasons, such as

It is not clear what is being affected by your code. For instance, what if you create multiple figures, which figure is being affected?
The resulting code is difficult to understand because it involves entities that are not part of the code.
The resulting code is dangerous and unstable. If you don’t know what exactly is being affected by your code, how do you know what anyone else’s code is doing, or even other parts of your own code? The above mentioned “quirk” is most likely caused by the use of global variables.
The system is not scalable. What unnecessary changes will need to be made just to enable someone to create multiple plots?

Bottom line: The world is full of bad programmers. Don’t be one of them. Don’t use global variables.

Modules

The functions in graphplot.py can be used with any graph-manipulating program we wish to write. This is why we put these functions in its own module so we do not have to write the same functions over and over again for every graph application that we write. We can then simply reuse the module by importing it whenever we need to use it:

One little gotcha…

Since Python assumes that code in imported modules represents invariant code, the Python system assumes that the imported code never changes. It is arguable whether this is a reasonable assumption or not, but we’re stuck with the fact that when a module is imported the first time, Python loads it into memory and then if that module is imported again, Python doesn’t actually do anything because it already has the module. This is called “caching” and it is useful for making systems significantly faster.

The effect of this caching is that if you are in fact, still working on the imported module’s code, as we are today, it is difficult to see the effects of any changes we make because Python will not reload the module unless we completely exit out of Python and restart the whole system.

The workaround for this is to add the following line just after the import graphplot statement:

reload(graphplot)

This will forcibly reload the graphplot module and enable us to immediately see any changes we make in it.

Plotting the Degree Histogram of a Graph

To plot the degree histogram of a graph we will use the nx.degree_histogram function and the bar-chart plotting capabilities of pylab.

From above, we see that a call to nx.degree_histogram(aGraph) will produce a list that maps a degree value, an index of the list, to the frequency of that degree. So, suppose that

hist = nx.degree_histogram(aGraph)

Once again, it is easier to look directly at the code in the graphplot module.

def make_histogram(aGraph):     
    fig = pylab.figure()
    pylab.title(aGraph.name)
    hist = nx.degree_histogram(aGraph)
    pylab.bar(range(len(hist)), hist, align = 'center')
    pylab.xlim((0, len(hist)))
    pylab.xlabel("Degree of node")
    pylab.ylabel("Number of nodes")
    return fig
def save_histogram(aGraph,filename):
    fig = make_histogram(aGraph)
    fig.savefig(filename)

Calling the function save_histogram with a networkx graph object and a filename with a .png extension, will save the node degree histogram in a .png file in your directory.

Much of the above code is the same as in show_graph and is for the same purposes.

pylab.bar(range(len(hist)), hist, align = 'center') takes the x and y values given by range(len(hist)) and hist respectively and creates a vertical bar chart from them. The option to align the vertical bars over the center of the x-values has also been specified to make the graph look nicer.

pylab.xlim((0, len(hist))) ensures that the bar chart’s x-axis goes from zero to one past the last possible degree value (to ensure that the last bar is visible).

pylab.xlabel and pylab.ylabel set the text labels for the x and y axis of the chart so that we know what is going on in it.