Comp200 Symonds Day 01.feb.28

Today we'll be comparing different algorithms for the same task, by looking at the running-time differences. You'll want to start up DrScheme, so you can actually try the examples.

In class, we've seen two different ways of sorting numbers: insertion sort and mergesort. For a list of n numbers, we argued that insertion sort ("isort") takes a number of steps roughly proportional to n² (worst case), while mergesort ("msort") takes roughly n log(n) steps. In today's lab we put on our lab jackets and safety goggles and actually run some timing experiments, checking whether the results jibe with our theoretical analysis.

Settings

After firing up DrScheme, a couple of small things:

Select Language:Set Teachpack..., and type in the filename
\\gh.owlnet.rice.edu\comp200\sorts.ss
This gives us pre-written code for mergesort, etc., described more below. (gh is a nickname for great-horned.)
Select Language:Language Level..., and choose "intermediate student". (This will give us the keyword time, described below.)
(Optional) Open a browser to follow along on this page,
http://www.owlnet.rice.edu/~comp200/symonds.html

Press Execute, to make sure these selections are enabled.

Making Test Lists

In order to compare these different sorting algorithms, let's create some large lists of numbers to try them on. You are provided with functions nums-down, nums-up, nums-rand, which each take a single number and return a list of that length. (First try some small examples, and then create some large examples, giving names to those lists -- e.g., (define up100 (nums-up 100)).)
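The teachpack's own definitions of these three functions are not shown on this page. As a rough idea of what such list builders could look like, here is a minimal sketch; the bodies, the starting value 1, and the range 0 to 999 for random numbers are all assumptions, and the real sorts.ss may differ.

```scheme
;; Hypothetical stand-ins for the teachpack's list builders.
;; The real definitions in sorts.ss may differ (e.g. in the range of values).

;; nums-down : natural -> list of numbers
;; Builds the list n, n-1, ..., 1.
(define (nums-down n)
  (cond [(zero? n) empty]
        [else (cons n (nums-down (- n 1)))]))

;; nums-up : natural -> list of numbers
;; Builds the list 1, 2, ..., n by reversing the descending list.
(define (nums-up n)
  (reverse (nums-down n)))

;; nums-rand : natural -> list of numbers
;; Builds a list of n pseudo-random integers between 0 and 999.
(define (nums-rand n)
  (cond [(zero? n) empty]
        [else (cons (random 1000) (nums-rand (- n 1)))]))
```

With these stand-ins, (nums-up 5) gives (list 1 2 3 4 5) and (nums-down 3) gives (list 3 2 1); (nums-rand 4) gives a different four-element list on most runs.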

Timing

You can time a function call, say, (mSort (list 8 1 3 2)), by writing (time (mSort (list 8 1 3 2))). Of course, if you have a placeholder like up100, you could write (time (mSort up100)). (Why do we really want to make the list in advance, and when doing the timing just use the placeholder?) The time form reports three numbers; the interesting one is the "cpu time", which is in milliseconds.

There is a slight inconvenience here, in that for our purposes we don't really care about the result of mergesort; we only care about the time information. Yet the result spews forth, taking up many lines of the screen. Instead, you can (time (empty? (mergesort up100))); presuming that empty? takes a negligible amount of time to run, this helps us keep our display readable.

Try running (time (empty? (mergesort up100))) several times in a row. Do you always get the same answer? If not, how do we come up with a "right" answer?
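One possible way to handle run-to-run variation is to repeat the measurement and summarize the results. The sketch below assumes the full PLT Scheme language rather than the "intermediate student" level, since it relies on time-apply and lambda; the helper names cpu-ms, repeat-cpu-ms, and average are made up for this illustration and are not part of the teachpack.

```scheme
;; cpu-ms : (-> any) -> number
;; Runs the thunk once and returns only the cpu time in milliseconds.
;; time-apply returns four values: results, cpu ms, real ms, gc ms.
(define (cpu-ms thunk)
  (call-with-values
    (lambda () (time-apply thunk empty))
    (lambda (results cpu real gc) cpu)))

;; repeat-cpu-ms : (-> any) natural -> list of numbers
;; Runs the thunk k times and collects the cpu time of each run.
(define (repeat-cpu-ms thunk k)
  (cond [(zero? k) empty]
        [else (cons (cpu-ms thunk) (repeat-cpu-ms thunk (- k 1)))]))

;; average : non-empty list of numbers -> number
(define (average lon)
  (/ (apply + lon) (length lon)))
```

For example, (average (repeat-cpu-ms (lambda () (empty? (mergesort up100))) 5)) would average five runs. Whether to report the average, the smallest value, or something else is a judgment call: the smallest run is the one least disturbed by other activity on the machine, while the average smooths out the noise.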

Today's Task

We will complete the following table with actual data. You will be divided into three teams, one for each row. Within a team you will be responsible for organizing yourselves to measure each cell. When you have an answer, bring it to the instructor, who will fill in the table.

Sort	Input order	Size 400	Size 800	Size 1600
Insertion sort	up	3	6.6	9
Insertion sort	down	324	1655	7506
Insertion sort	rand	160	148	3237
Mergesort	up	28	86	145
Mergesort	down	32	76	160
Mergesort	rand	30	68	175
Quicksort	up	782	3451	12445
Quicksort	down	921	3599	13798
Quicksort	rand	40	93	215
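Because each column doubles the input size of the one before it, one way to read a row is to divide each time by the time to its left. The helper below is only a sketch for doing that arithmetic; the name doubling-ratios is invented here, and the two sample rows are copied from the table.

```scheme
;; doubling-ratios : list of numbers -> list of numbers
;; For times measured at sizes n, 2n, 4n, ..., returns each time
;; divided by the time before it. Lists shorter than two give empty.
(define (doubling-ratios times)
  (cond [(or (empty? times) (empty? (rest times))) empty]
        [else (cons (/ (second times) (first times))
                    (doubling-ratios (rest times)))]))

;; Rows copied from the table (sizes 400, 800, 1600).
(define isort-down (list 324 1655 7506))
(define msort-rand (list 30 68 175))
```

(doubling-ratios isort-down) returns the exact fractions 1655/324 and 7506/1655, which are roughly 5.1 and 4.5; (doubling-ratios msort-rand) comes out to roughly 2.3 and 2.6. Ratios near 4 are what quadratic growth would predict, and ratios a little above 2 are what n log(n) growth would predict.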

Natural questions to ask

Do these times agree with our analysis from class? For instance, does the time for insertion sort really follow an O(n²) pattern?
Hint: When we double the size of the input, how much does k n² change (where k represents the constant factor hidden in the big-Oh notation)? That is, how much bigger is (2n)²k than n²k?
What about merge sort -- does it follow an O(n log(n)) pattern? Why or why not?
What about this "quicksort"? What behavior does it seem to give?
What are the constant factors for these different sorts (hidden in the big-Oh), and how do they compare? (Note: the constant factor also depends on the actual machine, since it relates size-of-inputs to number-of-actual-seconds.)
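For reference, the doubling comparison in the hint can be worked out as follows. This is just the arithmetic, treating the running time as exactly k n² or k n log(n), which real measurements will only approximate; writing the logarithm in base 2 is a convenience chosen here.

```latex
\frac{k\,(2n)^2}{k\,n^2} = 4,
\qquad
\frac{k\,(2n)\log_2(2n)}{k\,n\log_2 n}
  = \frac{2\,(\log_2 n + 1)}{\log_2 n}
  = 2\left(1 + \frac{1}{\log_2 n}\right)
```

So a quadratic sort should take about four times as long when the input doubles, whatever k is, while an n log(n) sort should take a little more than twice as long (about 2.2 times when n = 400, since log₂ 400 is about 8.6).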

As a final exercise, recall the functions doublerA, doublerB. How many steps did each of those need to compute their answer? Try timing them on various inputs. What is the largest value of n for which (doublerB n) finishes in less than 30 seconds? A minute? What is the largest input, do you predict, for which doublerB could finish in less than a year (which happens to be 525960 min, or about 2¹⁹ min)?
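One way to turn a measurement into a prediction is sketched below. It rests on an assumption that this page does not state and that your timings should confirm or refute: that the running time T(n) of doublerB roughly doubles each time n grows by 1. Here n₀ stands for whatever input you find takes about one minute.

```latex
T(n) \approx c \cdot 2^{n}
\quad\Longrightarrow\quad
T(n_0 + 19) \approx 2^{19}\, T(n_0) = 524288 \cdot T(n_0)
```

Under that assumption, if (doublerB n₀) takes about a minute, then an input of n₀ + 19 would take about 2¹⁹ minutes, which is close to a year. If your measurements show a different growth pattern, the same reasoning applies with that pattern substituted.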

`n`	`(time (doublerA n))`	`(time (doublerB n))`	`(time (max-of-list (nums-up n)))`	`(time (max-of-list (nums-down n)))`	`(time (max-of-list (nums-rand n)))`
10
12
14
16
18
20

Examining the code for these functions, can you explain the timing results?
