[an error occurred while processing this directive] [an error occurred while processing this directive] List Processing
Rice University
COMP 200
Elements of Computer Science
 
List Processing

The List Structure

A list is a pre-defined structure that comes with Scheme.  It is designed to hold an arbitrary number of data elements.  Here is the definition of a list.

A list of data elements is either:

  1. empty or
  2. (cons elt loelt), where elt is some data element and loept is another list of data elements.  Here elt is called the first element of the list and loelt is called the rest of the list.

In the above definition, note that when a list is not empty, it is defined in terms of another list.  This kind of definition is said to be recursive.  It is based on an abstract mathematical concept called recursion.  The case when the list is empty (case 1) is called the base case.  The case when the list is not empty and is defined in terms of another list (case 2) is called the recursive case.  Recursive decomposition, that is decomposing a problem into smaller problems that are of the same structure as the original one, is a paradigm of fundamental importance in computing.

Examples

  1. Is empty a list?
    Yes, by case 1 of the definition of a list.
     
  2. Is (cons 'a empty) a list?
    Since empty is a list by case 1, by case 2 of the definition, (cons 'a empty) is a list.
     
  3. Is (cons 5 (cons 'a empty)) a list?
    From example b/,  (cons 'a empty) is a list; thus by case 2 of the definition, (cons 5 (cons 'a empty)) is a list.
     
  4. Is (cons 5) a list?
    No, it is not.  Why?

List Construction

To construct a list, we use empty to get an empty list and cons to get a non-empty list with a given first and a given rest.  For examples,

In DrScheme, language levels "Beginning Student with List Abbreviation" and above provide a short hand notation for constructing list using the "list" function.  For examples,

List Selectors

When the list is not empty, we use the function first to select the first element and the function rest to select the remaining sublist.  For examples,

Remember while first returns a data element, rest returns a list, not a data element.

NOTE: Using first and rest on an empty list will result in an error.  Try to call first and rest on empty in the Interactions pane of DrScheme and see what happens.

List Processing Template

Because the defining structure of a list is recursive, the template for functions that process lists is also recursive and mirrors the structure of the list that is being processed.

(define (fun-for-list a-list)
  (cond
    [(empty? a-list) ...] ;; base case
    [else                 ;; recursive case
      ... (first a-list) ... 
      ... (fun-for-list (rest a-list)) ...]))  ;; recursive function call

Note that the recursive function call is made on the rest of the input list and NOT on the input list.  This is the most basic form of recursion.  It is called natural recursion.

We now write a few functions to process lists.

The Many Ways to Sum a List of numbers

Here is a fine way to add up all the numbers in a list of numbers:

(define myList1 (list 5 4 3 2 1)) ;; this is an abbreviation of (cons 5 (cons 4 (cons 3 (cons 2 (cons 1 empty))))
(define myList2 (list -2 -1 0 1 2 ))

;; contract:
;; fine_add: list-of-num -> num
;; purpose:
;; (fine-add a-lon) adds up all the numbers in a-lon
;; examples/tests
(check-expect (fine_add empty) 0)
(check-expect (fine_add myList1) 15)
(check-expect (fine_add myList2) 0)
;; template
;; (define (fine_add a-lon)
;;  (cond
;;    [(empty? a-lon) ...] 
;;    [else                 
;;      ... (first a-list) ... 
;;      ... (fine_add (rest a-lon)) ...]))
;; definition:  
(define (fine_add a-lon)
   (cond
      [(empty? a-lon) 0]
      [(else (+ (first a-lon) (fine_add (rest a-lon)))]))

  But is there any other way to do this?

Well, there's always a bad way...

;; bad_add: list-of-num -> num
;; adds up all the numbers in a list of num
;; examples/tests
;; ...
;; definition
(define (bad_add a-lon)
   (cond
      [(empty? a-lon) 0]
      [else 
         (cond
            [(empty? (rest a-lon)) (first a-lon)]
            [else (+ (first a-lon) (bad_add (rest a-lon)))])]))

Why is this bad?

Answer: By checking the emptiness of (rest a-lon), it violates the encapsulation of (rest a-lon).

Let's think about what we are trying to do... We're trying to find another way to add up the numbers in a list of numbers (lon). So far, we've used "natural recursion" to add up the values on the way out of the recursion. I'll call this "reverse accumulation" (note that this is not an industry standard term). Reverse accumulation works great for many processes, but it doesn't work for everything.

Another way we can add up the numbers is to add them up on the way into the recursion. I'll call this "forward accumulation". Let's see how we might do this:

"The sum of all the numbers in a list is first plus the the 'running total' which is the sum of all the previous numbers."

The problem is that the first number is treated slightly differently since there is no running total yet. Let's deal with this case first, since we are always fundamentally dealing with the beginning of a list. Since the processing of the rest of the list is different than that of the first, we simply delegate that responsibility to another function:

;; good_add: lon -> num
;; adds up all the numbers in a lon
;; ...
;; ...
;; definition
(define (good_add a-lon)
   (cond
      [(empty? a-lon) 0]    ;; The sum of an empty list is still 0!
      [else (good_add_help (rest a-lon) (first a-lon))]))  ;; first is the running total so far.

Notice that the function used in the else part is not the original function "good_add" because the function that is needed requires an additional input parameter, namely the running total. good_add is not a recursive function!

So let's see what good_add_help, the "helper function", does:

;; good_add_help: a-lon num -> num
;; adds all the numbers in a lon to the supplied accumulator
;; and returns the result.
;; ...
;; ...
(define (good_add_help a-lon acc)
   (cond
      [(empty? a-lon) acc]   ;; the sum is the running total 
      [else (good_add_help (rest a-lon) (+ (first a-lon) acc))]))   ;; add first to accumulator and recurse.

The running total is called the "accumulator". Functions of this style are called "accumulator style algorithms".

So why use accumulator style algorithms? It looks from the simple example above that in only means more code.

Possible reasons for using accumulator style algorithms:

  1. It preserves the encapsulation.
  2. It follows the data definition-derived template.
  3. It creates "tail recursive" algorithms -- algorithms where the unprocessed recursive result is returned, which can be optimized for space and speed.
  4. It may not be possible using reverse accumulation -- at least not without violating various precepts of good programming style.

Things to consider when writing (forward) accumulator-style algorithms

  1. Which way is the accumulated data moving?
    1. Into the list from the front of the list towards the empty list: This is forward accumulation.
    2. Out of the list from the rear of the list (the empty list) towards the front.: This is reverse accumulation ("natural recursion")
  2. The main ("outer") function's job is to set up the call to the helper function by calculating the initial value of the accumulator(s).
  3. The main function is not recursive.
  4. The helper ("inner") function is usually recursive.
  5. The helper function has at least one more input parameter than the main outer function--it needs to take the accumulator.
    1. There may be more than one accumulator
    2. Sometimes, accumulator values are simply passed onto the recursive call without being processed, until the right situation arises for them to be used. See for instance an "find minimum" algorithm.
    3. Sometimes, the new accumulator value is simply a replacement for the old accumulator value. See for instance, the "get last element" algorithm.
  6. The inductive case of the helper is never the same as the inductive case of the main function. However, the base cases may or may not be the same.

 

There's a second way to write the main function to sum the list of numbers:
;; good_add2: lon -> num
;; adds up all the numbers in a lon
;; ...
;; ...
(define (good_add2 a-lon)
   (good_add_help a-lon 0))

Pros:
  1. Shorter code
  2. Treats first element the same as the second.
  3. Easier to understand for some people.
Cons:
  1. Doesn't follow our original template exactly (it fits a new view on the problem--see below).
  2. Harder to understand for some people.
  3. Requires that the helper function have a well defined initial value that is not dependent on any other input.

Often this style of accumulator algorithm comes around because one initially conceived the processing of the list in terms of the what is now the helper function. The main function is really just a "wrapping" -- an encapsulation -- of the helper function that hides the initialization of the accumulator from the user. A good way to think of this (Thanks to Dr. Greiner for pointing this out) is that the good_add2 function is not a recursive function but rather just simply a function on an encapsulated type, list-of-numbers. good_add2 just processes the list without breaking its encapsulation at all, just like a function on an number would process the number.

Here's a third type of implementation that is halfway between the first two:

;; good_add3: lon -> num
;; adds up all the numbers in a lon
;; ...
;; ...
(define (good_add3 a-lon)
   (cond
      [(empty? a-lon) 0]
      [else (good_add_help a-lon 0)]))

This technique preserves the invariant template, but yet allows a clear initialization of the accumulator.

One requirement for using the above two methods is that the accumulator must have a well-defined initial value that can be used. This is not always true, forcing you to use the longer first implementation.

We find that we use all three styles in my own work. Because of structural enforcements of invariants plus more subtle theoretical reasons, the third method here is chosen over the first or second.

The first technique shown above will always work because it strictly follows the invariant template.

Our advice: Apply the First Law of Programming, which is: Your code should reflect the way you think about how you solve problem.

All 3 types of implementations will be accepted, but the first and third methods are recommended.

Let's try another algorithm: Let's try to find the last element in a non empty list.

First, the data definition:

;; a NElon is a non-empty list of numbers
;; note that a NElon is a lon.
;; (cons first rest)
;; where first is a number and rest is a lon
Here is one way to solve the problem:  (this is to be "acted out" in class)
#| Template:

(define (f-NElon a-nelon...)
   ... (first a-nelon)...(rest a-nelon)...)
|#

;; note that the empty? clause is missing in the template
;;
;; find_last: NElon -> num 
;; returns the last element of a NElon
;; examples/tests:
(check-expect (find_last myList1) 1)
(check-expect (find_last myList2) 2)
;; template
;; ... student to complete ...
;; ....
;; definition
(define (find_last a-nelon)
   (cond
      [(empty? (rest a-nelon)) ...]  ;; student to complete
      [else ... (rest a-nelon) ...]));; student to complete

One thing we do not like about the above solution is that we have to check for the emptiness of (rest a-nelon).  In programming parlance, we say we are "breaking the encapsulation" of (rest a-nelon).  Is there a way to find the last element without checking for the emptiness of (rest a-nelon)?  The idea here is to pass the first and rest of the original to a helper function and let it figure out the last element based on the information it receives.

Let's try that again:

;; good_last: NElon -> num or sym
;; returns the last element of a NElon
;; ...
;; ...
(define (good_last a-nelon)
    (good_last_help (rest a-nelon) (first a-nelon))

;; good_last_help: lon, num -> num
;; returns the last element of a lon 
;; or the accumulator if the lon is empty.
;;
;; examples/tests
(check-expect (good_last_help empty 5) 5)
(check-expect (good_last_help myList1 7) 1)
(check-expect (good_last_help myList2 99) 2)
;;
;; template
;; (define (good_last_help a-lon acc)
;;  (cond
;;    [(empty? a-lon) ...] 
;;    [else                 
;;      ... (first a-list) ... 
;;      ... (good_last_help (rest a-lon) ...) ...]))
;;
;;definition
(define (good_last_help a-lon acc)
   (cond
      [(empty? a-lon) ...] ;; student to complete
      [else
         ... (first a-lon) ...
         ... ( good_last_help (rest a-lon) ....)])) ;; student to complete

We often call this sort of algorithm a "pass-ahead" algorithm because of how information is passed from forward in the recursion.

Notice that the helper takes a lon, not a NElon. Why?

If a-nelon is a NElon, then what can you say about (rest a-nelon)?

Class exercise: Write a function, reverse_list, that will take a list and return a list with the elements in reverse order:

"reverse_list test cases:"
(check-expect (reverse_list empty)empty )
(check-expect  (reverse_list (list 1 2 3 4 5))(list 5 4 3 2 1))
(check-expect  (reverse_list (list 'e 'd 'c 'b 'a))(list 'a  'b  'c 'd  'e))

Note that there are at least three ways to write this function.

Here's a classic example of when you absolutely need an accumulator algorithm:

Consider the way that we tend to print lists so they are easily readable ("pretty print"):

empty --> "()"

(cons 'a empty) --> "(a)"

(cons 'a (cons 'b empty)) --> "(a b)"

(cons 'a (cons 'b (cons 'c empty))) --> "(a b c)"

The problem here is that while the list of symbols above are recursive, the pretty printed list is not recursive. Notice that the first element does not have a space in front of it while the second element does.

Thus we are forced, on a cons list, to treat the first element differently than the second. From the second element on, not including the ending paranthesis, the output is recursive, so we can use a recursive function.

That is, consider that

"(a b c)" is really "(a[the rest of the list])"

and [the rest of the list] is " b c" (note the spaces).

What is [the rest of the list] when we start with "(a)"?

This function can only be written using the first technique above! -- Why?

Try writing the pretty-printing function without peeking at the solution. Advice: use the built-in string-append and symbol->string functions.

The code from today, including the class exercise, can be downloaded here.

Important Exercise:

Write the function:

;;lastbiggerthanfirst?: lon --> boolean
;; returns true if the last element is bigger than the first
;; returns false if list is empty.
(define (lastbiggerthanfirst lon)
   .....)

Answer: lbtf.scm

 

 


© Dung X. Nguyen & Stephen Wong

Last revised 11/17/2009 11:12 AM

Maintained by the professor; see contact information on the course home page