Comp 200 Homework 4

Due in class Wed 27 Feb 2002
OR
in my CS Department mailbox by 5:00PM Fri 01 Mar 2002

For all problems where you write a function, be sure to follow the Program Design Guidelines. This means you must have test cases. You need not rewrite a data definition that was used in a previous function. Turn in printouts of your functions from DrScheme; definitons window only, please.
  1. Recall the function find-number, a linear search that we saw in class. You will write a function binary-search: num list-of-num -> boolean, that performs the binary search algorithm we discussed in class, which is also described on pp 131-134 of the Harel book. You will need to use the functions first-half and second-half as discussed in class. You may also use middle, if desired. To use these functions, you should download the file halves.ss and load it as a Teachpack. (Notice that first-half and second-half can be used on both empty and odd-length lists. To understand their exact behavior, test them out.) Use find-number to compare your results with binary-search in your test cases. The results should be identical for all identical input. Afterwards, answer these questions:
    1. What is the worst-case running time, in big-O notation, for find-number? In your own words, explain why.
    2. Now do the same for binary-search.
    3. Which is faster?
  2. Recall the function insert-sort, a number sorting algorithm that we saw in class. You will write a function merge-sort: list-of-num -> list-of-num, that performs the merge sort algorithm we discussed in class, which is also described on pp 83-85 of the Harel book. You will need to use the functions first-half and second-half again. (Note: Don't forget! The function merge-sort requires a helper function merge. You will write merge as well.) Use insert-sort to compare your results with merge-sort in your test cases. The results should be identical for all identical input. Afterwards, answer these questions:
    1. What is the worst-case running time, in big-O notation, for insert-sort? In your own words, explain why.
    2. Now do the same for merge-sort.
    3. Which is faster?
  3. If I have an unsorted list of numbers, and I want to search for a number in that list, is it faster to (A) search for it with find-number or (B) to sort it first with merge-sort, then use binary-search? What if I had to do N searches rather than a single search, where N is the number of elements in the list? Which option would be faster then? Justify your answer in both cases.
  4. Suppose you have a personal computer which runs at 100MHz -- that is, it computes 100 million instructions per second¹. If you have an algorithm which solves a problem of size n in f (n) instructions, how long does it take to compute problems for various functions f and various input-sizes n?

    f
    log10(n) n n log10(n) n2 n3 2n n!
    n 102              
    103              
    106              
    109              

    Give your answer with only two significant figures (that is, the first non-zero digit and the next one.) Answer in time units that are meaningful to you -- e.g. "2.0 days" is better than "170,000 seconds". For fractions of a second ("s") or many many years ("yr"), it is standard to use metric prefixes (here are some distilled references).

    You may also use drScheme and some helper functions for unit conversions, if you like. For the last few columns, some values may exceed your calculator; you can ignore any numbers larger than 10100 ("one googol"). Note that in scheme, (expt 2 10) = 1024 (2 to the 10th power). log is built-in, but it is the natural log (base e); we can define log-base-10 though:

    (define (log10 x)
      (/ (log x) (log 10)))
    
    (log10 100) = 2
    (log10 100000) = 5
    

    Compute the 2100 entry by hand². How many times larger is the 21000 entry? (Is 2100 more than one trillionth of one percent of 21000?)

    For related fun, try this tour of 10+23m down to 10-15m. Have any of you seen the Hayden Sphere "Scales of the Universe" exhibit in the Rose Center at the Museum of Natural History in NYC?

  5. Harel, problem 6.3 (analyze Pwr).
    You only need to look at only one of Pwr1,Pwr2, and only one of Pwr3,Pwr4. (The members of each pair are similar -- the first is iterative and the second is recursive.) Before computing the answer, run an example by hand -- say 35. (For Pwr3,Pwr4 keep in mind how 5 is written in binary numerals; this provides an insight as to how the function works.)

  6. Harel, problem 6.4 (Binary search, worst cases).
    Observe that part (a) has several possible answer; part (b) is asking you how many answers part (a) had! Note: where he asks "describe an input list which ...", it should read "describe a name-to-search-for which ...". You can imagine always using the Houston white pages (which indeed has ~ 1million entries).

    Hint for (b): Instead of a phonebook with million (~ 220) names, first think about 15 = 24-1; Of the 15 possible names-to-search-for, how many of them are worst-cases? (If it helps, for each of the 15 names, see how many comparisons each one requires; then you can see how many are worst cases, and if there is a pattern to their arrangments.) (Perhaps think in terms of decision trees, or in terms of playing hi-lo from 1..15 and the binary representations [0001]2...[1111]2. Each guess involves learning whether the next digit is 0 or 1, and a solution is found when all the remaining digits are all 0.)


¹ This is an oversimplification; there turn out to be quite a few subtleties in correlating a computer's clock speed (e.g. 400MHz) with how many instructions per second it computes. Moreover, different families of computers (e.g. Pentiums, PowerMacs, Sparcstations, ...) have different instructions, making exact comparisons between these families more difficult. However, the idea of one clock cycle being one instruction is a handy first approximation.

² Your answer can include powers of 2 or 10, but should be in centuries (or larger), since it's not intuitive to me how (say) 290 seconds relates to centuries.

Note that for any number m, 2m ~ 10(m/3.322). Why, you ask? Remember that it takes m digits to write 2m in base 2. How many binary digits are needed for every decimal digit? That is the same as asking how many factors of two are needed to make a single factor of 10. By definition of log, the answer is log2(10) ~ 3.322 -- that is, in binary, 10 requires a smidgen more than 3 binary digits after the leading 1.

If you like, to get a rough approximation of the factorial function, you can use Stirling's formula: n! = (n/e)n sqrt(2 pi n) (1 + 1/(12n) + 1/(288n²) + O(1/n³)). [Note the use of big-O to let us use an equal-sign, instead of approximately-equal. You can ignore all the small terms (1/12n + 1/288n² + O(1/n³)).]