Data Structures III


  1. Introduction

    This is the third in a series of articles dealing with the use of data structures in game programming. Part I explained why this is an interesting topic, and explained Big O and generics, which are important to understand in order to get the most from this article. Part II began discussing actual data structures, covering arrays, lists, and collections. This article continues that discussion, and covers linked lists, stacks, and queues.

  2. LinkedLists

    All of the data structures that we have examined so far have not been ideally suited to insert or add operations. Lists were slightly better than arrays, but they still required linear time copies in some cases. What we need is a data structure that lets us insert in constant time, all the time. The LinkedList structure provides this functionality.

    The core of the LinkedList can be found in the LinkedListNode, the class that makes up LinkedList. Every LinkedListNode is made up of two things: a bit of data, and references to the node's neighbors in the list.

    public sealed class LinkedListNode<T>
    {
        public LinkedListNode<T> Next { get; internal set; }
        public LinkedListNode<T> Previous { get; internal set; }
        public T Value { get; set; }
    }

    Every operation performed on LinkedList is based around “walking the links,” moving from one LinkedListNode to the next. LinkedList itself keeps a reference to the first node in the list, so the operations know where to begin. Inserting items into a LinkedList, then, will be much faster than doing so with an array or list. The implementation probably looks something like this:

    public void InsertAfter( LinkedListNode<T> after,
        LinkedListNode<T> newNode )
    {
        newNode.Previous = after;
        newNode.Next = after.Next;
        if (after.Next != null)
        {
            // "after" may be the last node in the list, in which
            // case there is no old neighbor to fix up.
            after.Next.Previous = newNode;
        }
        after.Next = newNode;
    }

    Notice that the insertion does not depend in any way on how many elements are in the LinkedList: we’ve got a constant-time insertion. Removing is a similar operation, and will also take constant time. Both of these operations are much faster than with arrays or lists, which take linear time.
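    To make that concrete, here is a minimal, self-contained sketch of constant-time removal. The Node class and Remove method here are hypothetical scaffolding for illustration (the real .NET LinkedListNode does not expose settable links), but the splicing logic is the same idea:

    ```csharp
    using System;

    // A minimal doubly linked node, mirroring the sketch above.
    // (Hypothetical scaffolding; not the real .NET LinkedListNode<T>.)
    sealed class Node<T>
    {
        public Node<T> Previous;
        public Node<T> Next;
        public T Value;
        public Node( T value ) { Value = value; }
    }

    static class LinkedListOps
    {
        // Splice a node out of the list in constant time: no matter
        // how long the list is, only the two neighbors are touched.
        public static void Remove<T>( Node<T> node )
        {
            if (node.Previous != null) node.Previous.Next = node.Next;
            if (node.Next != null) node.Next.Previous = node.Previous;

            // Clear the removed node's links so it can no longer be
            // used to walk back into the list.
            node.Previous = null;
            node.Next = null;
        }
    }

    class Program
    {
        static void Main()
        {
            // Build the three-node list 1 <-> 2 <-> 3 by hand.
            var a = new Node<int>(1);
            var b = new Node<int>(2);
            var c = new Node<int>(3);
            a.Next = b; b.Previous = a;
            b.Next = c; c.Previous = b;

            LinkedListOps.Remove(b);

            // The neighbors now link straight to each other.
            Console.WriteLine(a.Next.Value);      // 3
            Console.WriteLine(c.Previous.Value);  // 1
        }
    }
    ```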

    There is, of course, a tradeoff: unlike an array, a LinkedList does not have quick constant-time access to a random element in the structure. Whereas with an array you can simply use [] to jump to any element, a LinkedList must walk the links, starting from the beginning, until it reaches the desired index. The code probably looks something like this:

    public LinkedListNode<T> Get( LinkedListNode<T> head, int n )
    {
        LinkedListNode<T> toReturn = head;
        for (int i = 0; i < n; i++)
        {
            toReturn = toReturn.Next;
        }
        return toReturn;
    }

    Notice that for loop? We’re back in the slow realm of linear time, O(n).

    Although random-order access is slower than with lists and arrays, it’s important to note that the speed of in-order access is still fast. To see why, look at the following code, which calculates the sum of all of the members of an array and of a linked list:

    // Add every element of an int array together: this takes linear time
    int[] someArray = new int[10];

    // Insert array filling code here

    int sum = 0;
    for (int i = 0; i < someArray.Length; i++)
    {
        sum += someArray[i];
    }

    // Add every element of a LinkedList together:
    // This also takes linear time

    LinkedList<int> someLinkedList = new LinkedList<int>();

    // Insert list filling code here

    sum = 0;
    LinkedListNode<int> node = someLinkedList.First;
    while (node != null)
    {
        sum += node.Value;
        node = node.Next;
    }

    The contents of both loops are executed n times, meaning both loops take O(n). In other words, if all your algorithm needs to do is operate on your entire data structure from one end to another, it'll be no different whether your structure is an array or a LinkedList—both will take linear time.

    So, at a high level, in-order access is just as fast as with arrays and lists. It is worth noting, however, that in-order access may be slowed down in a LinkedList that has had many items inserted or deleted. Even though the insertion and removal operations themselves are fast, a LinkedList that has had many items removed and inserted in different locations typically ends up with its data scattered across memory. This means that as you walk the list, there will be many more cache misses as data is fetched from those scattered locations. A full discussion of cache misses is out of scope for this article, but a quick internet search will yield plenty of information. It's a very important topic, but to callously gloss over the whole subject in a few words: cache misses are slow.

  3. Stacks

    Stacks are common data structures, and are commonly used in many different algorithms. They are interesting primarily for their functionality, not necessarily because of performance benefits. You can visualize a stack, just as you might guess from the name, as a stack of data. Just like a real world stack, when you add a new item, it goes on the top of a stack, and when you remove an item, you remove it from the top. This behavior is called "last in, first out," or LIFO, and makes many algorithms and data collections much easier to work with.

    The .NET Framework Stack implementation provides three main functions: Push, which puts a new item on the top of the stack; Pop, which removes the item from the top of the stack; and Peek, which examines the element on the top of the stack without removing it. Like List, Stack is implemented using a dynamically growing array, so we know that adding new elements via Push will be O(1) if there is enough space in the array, and O(n) if the array must be resized. The implementations of Pop and Peek are trivial, and take O(1).

    The performance characteristics of a Stack, then, are very similar to those of a List. For this reason, don't choose a Stack for performance reasons. Rather, choose a Stack when the algorithm lends itself to LIFO behavior.
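    The LIFO behavior described above can be sketched with the Stack class from System.Collections.Generic:

    ```csharp
    using System;
    using System.Collections.Generic;

    class Program
    {
        static void Main()
        {
            var stack = new Stack<int>();

            // Push puts each new item on the top of the stack.
            stack.Push(1);
            stack.Push(2);
            stack.Push(3);

            // Peek examines the top item without removing it.
            Console.WriteLine(stack.Peek()); // 3

            // Pop removes items in last in, first out order.
            Console.WriteLine(stack.Pop()); // 3
            Console.WriteLine(stack.Pop()); // 2
            Console.WriteLine(stack.Pop()); // 1
        }
    }
    ```

    Note that the last item pushed is always the first one popped, regardless of how many items are on the stack.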

  4. Queues

    Queues exist for the same reason as stacks: not for performance reasons, but because the functionality they offer is well suited to different types of algorithms. The difference is that stacks provide last-in, first-out data storage, whereas queues are first-in, first-out (FIFO).

    The .NET Framework implementation uses the functions Enqueue, Dequeue, and Peek. Enqueue adds an element to the end of the queue, Dequeue removes an element from the front of the queue, and Peek examines the next element to be de-queued. The internal implementation of Queue is similar to that of Stack, using an array to store data, which is resized as necessary as you add more elements. Again, adding new elements will be O(1) if there is enough space, O(n) if not. Dequeue and Peek are O(1).
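    The FIFO behavior can be sketched with the Queue class from System.Collections.Generic, mirroring the stack example above:

    ```csharp
    using System;
    using System.Collections.Generic;

    class Program
    {
        static void Main()
        {
            var queue = new Queue<string>();

            // Enqueue adds each new item to the end of the queue.
            queue.Enqueue("first");
            queue.Enqueue("second");
            queue.Enqueue("third");

            // Peek examines the next item to be dequeued.
            Console.WriteLine(queue.Peek()); // first

            // Dequeue removes items in first in, first out order.
            Console.WriteLine(queue.Dequeue()); // first
            Console.WriteLine(queue.Dequeue()); // second
            Console.WriteLine(queue.Dequeue()); // third
        }
    }
    ```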

  5. Conclusion

    That concludes Part III. Part IV finishes our discussion of data structures, and covers dictionaries and trees, both of which are extremely useful in game programming.