COMP 130 Elements of Algorithms and Computation
Spring 2012

Priority Queues

Priority queues are a type Restricted Access Containers, i.e. "black-box" data stores that simply let the user "put" and "get" individual data elements without detailed knowledge of how the data is being stored or retrieved.   In a priority queue, the order in which data elements are retrieved depends on a "priority" value associated with each data element.

Download all the code from this discussion:  priority.py   (includes code to do put() and get() speed comparisons)

 First, let's consider the RAC class heirarchy that we already developed:

class RAC:
    """ Restricted Access Container.
    A container with an internal data storage that allows you to add items to that data store
    in an encapsulated manner.
    Retrieval of data items is not defined at this level.
    """
    def __init__(self):
        """ Constructor for the class.   Initializes the internal data store.
        """
        self._data = []

    def put(self, x):
        """ Add the given data item, x, to the internal data store.
        For convenience, returns the RAC itself.
        """
        self._data.append(x)
        return self
    
    def putList(self, aList):
        """ Convenience method to add a list if items to the internal data store.  
        Items are added in order starting from the front of the given list.
        For convenience, returns the RAC itself.
        """
        for x in aList:
            self.put(x)
        return self
    
    def isEmpty(self):
        """ REturns true if the internal data store is empty.  Returns false otherwise.
        """
        return 0 == len(self._data)
    
    def __str__(self):
        """ Returns the string representation of the internal data store, in order from oldest to the newest elements.
        """
        return str(self._data)
        
class Stack(RAC):
    """ Stack subclass of RAC where the data is retrieved in a 
    First In/Last Out (FILO) order.
    Inherits put(), putList(), isEmpty() and __str__() from RAC.
    """
   
    def __init__(self):
        """ Class constructor.  Simply calls the superclass constructor.
        """
        RAC.__init__(self)
        
    
    def get(self):
        """ Returns that latest element that was put into the Stack.
        Throws an exception if the Stack is empty.
        """
        return self._data.pop()
        

    
class Queue(RAC):
    """ Queue subclass of RAC where the data is retrieved in a 
    First In/First Out (FIFO) order.
    Inherits put(), putList(), isEmpty() and __str__() from RAC.
    """
   
    def __init__(self):
        """ Class constructor.  Simply calls the superclass constructor.
        """
        RAC.__init__(self)
        
    def get(self):
        """ Returns the oldest element currently in the Queue.
        Throws an exception if the Queue is empty.
        """
        return self._data.pop(0)
      

There are many ways in which to implement priority queues.   One way is to always hold sorted data internally.   A "put" operation therefore does an ordered insertion into the internal data store while the "get" operation simply pops off the minimum (or maximum) value, which will always be at one end of the data store.

In the code below, for simplicity's sake, we are assuming that the priority queue holds 2-element tuples where the first element is a "priority" measure, a number used to indicate relative priority, and whose second element is a "value" associated with that priority.

class OrderedPutPQ(Queue):
    """ A priority queue subclass of RAC where the put() method performs an ordered 
    insert into the internal data store.  get() is an O(1) retrieval from the end of the list.
    The queue is assumed to hold either priority values directly or tuples of the form (priority, value)
    """
        
    def __init__(self):
        """ Class constructor.  Simply calls the superclass constructor.
        """
        Queue.__init__(self)
        
    def put(self, x):
        """ Add the given data item, x, to the internal data store.
        x is a priority or a tuple of the form (priority, value)
        Overrides the put() inherited from RAC.
        For convenience, returns the RAC itself.
        """
        # self._data is assumed to be sorted such that the min priority is at the end of the list
        # thus enabling get() to be an O(1) operation that returns the min priority entry.
        for i in range(len(self._data)):
            if x > self._data[i]:  # comparing tuples only compares the first element
                self._data.insert(i, x)
                return self
        self._data.append(x)        
        return self
        
        def get(self):
            """ O(1) retrieval of smallest data element from end of internal data store.
            """
            return self._data.pop()

By the same token, the data could be stored internally without regards for its priority and the "get" operation simply removes the minimum (maximum) priority value:

class OrderedGetPQ(RAC):
    """ A priority queue subclass of RAC where the get() method removes the minimum
    priority entry in the internal data store.  put() is the O(1) append onto the internal data store
    inherited from RAC.
    The queue is assumed to hold either priority values directly or tuples of the form (priority, value)
    """"
    
    def __init__(self):
        """ Class constructor.  Simply calls the superclass constructor.
        """
        RAC.__init__(self)
        
    def get(self):
        """ Returns the minimum priority element in the queue.
        Throws an exception if the queue is empty.
        """
        # internal data store is assumed to be unsorted.  This enables put() 
        # to be an O(1) operation.
        minIdx = 0   # index of minimum priority entry
        for i in  range(1,len(self._data)):
            if self._data[i] < self._data[minIdx]:  # comparing tuples only compares the first element
                minIdx = i
        return self._data.pop(minIdx)

Class Exercise:  What are the pro's and con's of the two above approaches for implementing a priority queue?

Big-O in a Nutshell

Computers are fast enough these days that for small data sets, it almost doesn't matter what algorithm or implementation one uses to process it--the results are returned so fast that slowest technique gives essentially instantaneous results.  But the problem is that in our "Information Age", data sets are often huge, taking even the fastest of computers significant amounts of time to process.   So how does one evaluate how well an algorithm or implementation will fare on such large data sets?     Computer scientists often focus on the notion of "scaling", where rather than look at the computational time for a data set of a particular size, one asks the question, "How does the computational time change as the data size changes?"

That is, if a data set has n elements, how does the computational time relate to n?       For instance, is it independent of n, i.e. O(1)?   Is it linear in n, that is, if n doubles, the computational time doubles, i.e. O(n)?    Is is quadratic in n, taht is,  if n doubles, the computational time quadruples, i.e. O(n2)?    Another possibility is that the computational time is logarithmic in n, thyat is, if n doubles, the computational time only goes up by an additive amount proportional to the logarithm of 2,  i.e. O(log(n)).

Examples:

Retrieve an indexed value from an array (Python list) O(1)
Retrieve a dictionary entry with a key O(1)
Find an element in a list O(n)
Ordered insert into a list O(n)
DFS or BFS search through a graph O(n)
Insertion or selection soft of a list O(n2)
Search through a balanced binary tree O(log(n))
Merge sort of a list O(n*log(n))
   

Why the big deal?   Consider numbers like a billion data points.   For argument's sake, let's say that the processing algorithm takes an hour to process those billion data points.    If we up the number data points to 10 billion data points, an O(1) algorithm will not be affected at all--it still takes 1 hour   An O(n) algorithm will take ten times as long or 10 hours   An O(n2) algorithm will take 100 times as long, or 100 hours.   An O(log(n)) algorithm will only take 1/9'th longer, or only about an additonal 7 minutes!   

 For more information:

 

A Faster Priority Queue?

Now that we "feel the need for speed", how can we speed up the O(n) behavior of our priority queues above?  The trick here is to recognize that the priority queue doesn't really need to have the entire internal data store sorted.   All it really needs is a fast way to retrieve the one minimum (maximum/extrema) priority.

Consider the following data structure called a "heap".   A heap, conceptually, is a binary tree with the following properties:

One of the curious things about heaps is that they can easily be represented using arrays (Python lists).    Unfortunately, the internal mechanics of a heap and its associated algorithms are beyond the time constraints of this course, but suffice it to say at this point that since a heap is an optimally balanced tree, the insertion and deletion operations run in O(log(n)) time.   Thus a priority queue that uses a heap internally will run much faster than the above implementations, which may use more sophisticated heap implementations to achieve even better performance (see references below).

import heapq   # for heap functions

class HeapPQ(RAC):
    """ A priority queue subclass of RAC where the internal data store is a heap.  
    put() and get() are heap insertion and removal operations.
    The queue is assumed to hold either priority values directly or tuples of the form (priority, value)
    """
    
    def __init__(self):
        """ Class constructor.  Simply calls the superclass constructor.
        """
        RAC.__init__(self)
        
    def put(self, x):
        """ Add the given data item, x, to the internal heap data store.
        x is a priority or a tuple of the form (priority, value)
        Overrides the put() inherited from RAC.
        For convenience, returns the RAC itself.
        """
        heapq.heappush(self._data, x)
        return self
    
    def get(self):
        """ Retrieval of smallest data element from internal heap data store.
        """
        return heapq.heappop(self._data)
   

 Luckily for us, Python supplies us with a ready-made, heap implementation of a priority queue, called, amazingly,  PriorityQueue, which is in the Queue package.   (Warning:  use something like "import Queue as qu" to avoid a name clash with the above Queue class!)   The usage of PriorityQueue is exactly the same as the above implementations.

For more information:

Class Discussion:  Can a Queue.PriorityQueue type queue be used with non-static priorities, that is, priorities that can change over time (think airplanes that run out of fuel as they wait to land at an airport)?

Objects in Priority Queues

One can always use an object as the value in the (priority, value) tuple described in the above priority queue implementations.   But sometimes, the priority value is inextricably linked to the object itself, so we'd prefer to have the object determine its own priority rather than use an externally applied value.  

There is a special method of objects called __cmp__(self, other) that is used to compare two objects.     This method should return a negative integer if this object (self) is less than the other object, zero if they are equal and a positive integer if self is greater than other.

Implementing this method iin a class definition will allow instances of that class to be used directly as the priority entry in all the priority queues above.

Class discussion:   In what situation are the above OrderedPutPQ, HeapPQ and PriorityQueue implementation completely unusable as a priority queue?     Why is one forced to use a OrderedGetPQ queue with the objects themselves being used directly as the priority value?