Lecture #14                                             Fri 15 Feb 2002

Administrative notes:
 - NEC Homework: average ~ 8.5
 - CONTRACTS AND DATA DEFINITIONS!
 - Problem 5c: == first; fib, fib-help
 - Solutions posted this afternoon!
 - More reading assignments as well, most likely!

Today:
 0. Tree properties, review
 1. Decision Trees
 2. Comparison of Algorithms: doublerA and doublerB
 3. Hi-Lo: Algorithmic Efficiency
 4. Searching, Linear and Binary

0. Tree properties, review

So, as we recall from last time, with binary trees:

    size <= 2^(height+1) - 1

Equivalently:

    height >= log2(size+1) - 1

The important thing to remember is that the height of a full binary
tree is about log2(size). So, roughly, for any given binary tree:

    size   <=~ 2^height
    height >=~ log2(size)

1. Decision Trees

Trees aren't only useful to us as data structures; we can also use
them to analyze algorithms. Consider the decision tree for max3.
In a decision tree:
 - leaves correspond to answers;
 - a root-to-leaf path corresponds to an individual computation;
 - the length of that path is the length of that computation;
 - the height of the tree is the worst-case length of computation.

              (max3 a b c)
                 a ? b
              > /     \ <
           a ? c       b ? c
          > / \ <     > / \ <
          a    c      b    c

Consider the puzzle: you are given three coins, one of them
counterfeit (either heavier or lighter). Your only way of testing
them is a balance scale, but it costs you a buck per weighing.
...This is also a decision tree. What if there are 7 coins?

Note that different trees correspond to different algorithms. Are
some better than others? Can we look for even better algorithms?
We can argue that *all* trees must be at least some height: the tree
needs at least one leaf per possible answer, and our relations
between the size of a tree and its height then force a minimum
height.

2. Computation Trees

Another way of determining the running time of a program: make a
tree for the execution of doublerA and doublerB. These functions
always produce the same result, right? Recall the functions:

(define (doublerA n)
  (if (zero? n)
      1
      (* 2 (doublerA (sub1 n)))))

(define (doublerB n)
  (if (zero?
       n)
      1
      (+ (doublerB (sub1 n))
         (doublerB (sub1 n)))))

Let's say we're charged a cost for each arithmetic operation we
perform, be it +, -, *, or /, and they all cost the same . . . which
algorithm costs more?

    (doublerA 3)
         | *
    (doublerA 2)
         | *
    (doublerA 1)
         | *
    (doublerA 0)

    Cost = 3 = height = n

                        (dB 3)
                     /    +    \
               (dB 2)           (dB 2)
              /   +  \         /   +  \
         (dB 1)    (dB 1)  (dB 1)    (dB 1)
         / + \     / + \   / + \     / + \
     (dB 0)(dB 0)(dB 0)(dB 0)(dB 0)(dB 0)(dB 0)(dB 0)

    Cost = 1 + 2 + 4 = 7 = 2^height - 1 =~ 2^height = 2^n

These are not decision trees; here the computation is not just a
root-to-leaf path but the entire tree. (We can call them
"computation trees" perhaps; the name isn't important.) Branches
combine with AND, not OR.

doublerB took about 2^n operations! doublerA takes only n. How does
n compare with 2^n?

      n    2^n
    ---    ----------------------------------------
     10    ~ a thousand   (1024)
     20    ~ a million    (1048576)
     30    ~ a billion    (1073741824)
    250    more than the number of particles in the
           universe . . . you don't want to know

So, some algorithms are better than others.

3. Hi-Lo: Algorithmic Efficiency

Let's look at one more: hi-lo. I choose a number in 0..31 (suppose,
10). You ask "is it X?" and I respond "higher" or "lower" . . .
repeat. How many steps? Yes, about log2(n), or nod(n) (the number
of digits of n in binary), right? Look at the binary representation.
What are you *really* asking? "Is the highest bit 0 or 1? Is the
next bit 0 or 1? ..." How many steps? There are 32 possible numbers,
so nod(n) = log2(32) = 5.

Aside: consider an alternate question somebody asked: "Is it even or
odd?" That's really asking about the low bits.

4. Searching

So, as you recall, we had our Database in the previous homework
assignment . . . Let's say it was a huge employee database and I
wanted to search efficiently through it, finding employee numbers
quickly. Let's not worry about the structures; let's just deal with
numbers, because the problem REDUCES to the same one.

Task: Given a list of numbers, find out if a particular number is
in it.
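Aside: the operation counts we read off the two computation trees in
section 2 (n for doublerA, 2^n - 1 for doublerB) can be double-checked
with a short sketch. This is a rough Python translation of the Scheme
above (Python rather than Scheme just for illustration here), with
cost-counting bolted on; following the lecture, we charge one unit per
* or + that combines results, and don't charge for sub1:

```python
def doubler_a(n):
    """Return (value, cost): 2^n and the number of multiplications."""
    if n == 0:
        return 1, 0
    v, c = doubler_a(n - 1)
    return 2 * v, c + 1           # one multiplication per level

def doubler_b(n):
    """Return (value, cost): 2^n and the number of additions."""
    if n == 0:
        return 1, 0
    v1, c1 = doubler_b(n - 1)
    v2, c2 = doubler_b(n - 1)     # the whole subtree is computed twice!
    return v1 + v2, c1 + c2 + 1   # one addition per internal node

print(doubler_a(3))   # (8, 3): cost = n = height of the chain
print(doubler_b(3))   # (8, 7): cost = 1 + 2 + 4 = 2^n - 1
```

Same answer, wildly different cost: doubler_b's cost satisfies
c(n) = 2*c(n-1) + 1, which solves to 2^n - 1, matching the tree.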
We've already done this type of problem, with a function like this:

;; find-number : number list-of-numbers -> boolean
;; Returns true if the number is in the list and false otherwise.
(define (find-number n l)
  (if (empty? l)
      false
      (if (= n (first l))
          true
          (find-number n (rest l)))))

What can we do to help us out? Well, what if we ensured that the
numbers in the list were in order, from least to greatest; would
that help? Yes it would. Why? Well, it wouldn't necessarily help so
much if the number WAS there, but what if it WASN'T there? We could
quit searching sooner and save ourselves some time, right?

(list 2 4 5 6 8 9 10 11 15 18 19 20 21 24 25 26)

Is 11 there? Yes, and we had to call our function 8 times, making
8 comparisons. (Let's say this "costs" us 8 units of computing time,
and that 1 unit of computing time is one complete ITERATION of the
function.)

Is 3 there? No, and, why, that took us 16 calls! We had to go
through the whole list! But what if we modified our function?

;; find-number : number list-of-numbers -> boolean
;; Returns true if the number is in the list and false otherwise.
;; REQUIREMENT: the list of numbers must be in ascending order!!!
(define (find-number n l)
  (if (or (empty? l) (< n (first l)))  ;; remember the "or" evaluation rules!!!
      false
      (if (= n (first l))
          true
          (find-number n (rest l)))))

Now, using our new find-number, is 3 there? No, and it only took us
2 calls to find that out! We were efficient!!!

So, on average, for a list of length N, how many calls will it take
us to find out if a given number is there or not? It takes us about
N/2 calls on average, right? This formula, N/2, describes the
efficiency of our algorithm!

More next time . . . we'll continue this topic . . .
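Postscript: the call counts claimed above (8 calls for 11, only 2
calls for 3 once the list is sorted) can be checked by instrumenting
the sorted-list version to report its own cost. Here is a rough
Python translation of the improved find-number (Python rather than
the course's Scheme, just as a cross-check), returning both the
answer and the number of calls, which is the lecture's unit of cost:

```python
def find_number(n, lst):
    """Search lst, which must be sorted in ascending order.
    Returns (found, calls): whether n is present, and how many
    calls (complete iterations) the search took."""
    if not lst or n < lst[0]:    # empty, or we've passed where n could be
        return False, 1
    if n == lst[0]:
        return True, 1
    found, calls = find_number(n, lst[1:])
    return found, calls + 1

nums = [2, 4, 5, 6, 8, 9, 10, 11, 15, 18, 19, 20, 21, 24, 25, 26]
print(find_number(11, nums))   # (True, 8)  -- 8 calls, as in the lecture
print(find_number(3, nums))    # (False, 2) -- early exit after 2 calls
```

Without the sortedness requirement and the (< n (first l)) test, the
search for 3 would have to walk the entire list before giving up.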