Intro

NP-complete problems require exponential time, as far as we know, but many are interesting/useful. So, what can we do?

Easy options:
- Use slow algorithms, but only for smaller inputs.
- Use slow algorithms and lots of time.

More creative solutions:
- Find algorithms that are good on most/common inputs.
  E.g., SAT solvers are now in common use.
- Use probabilistic algorithms.
  E.g., for decision problems, return a correct yes/no answer with some probability, and a "maybe" otherwise.
- Use approximation.
  For optimization problems, return a solution that is within some bound of being optimal. This will be our focus.

Hardware solutions:
- Use parallelism.
  Theoretically, we could use an exponential number of processors, but in reality the benefit is at most a constant (#processors) factor.
- Use nondeterminism.
  There is some work on building quantum computers, which hold the promise of something like nondeterministic computing.

NPC Approximation

E.g., for independent sets, we want to find an independent set that is close to maximum in size. E.g., for graph coloring, we want to color the graph with a near-minimal number of colors.

While we don't know what the optimum value is, we can often prove that an algorithm is near-optimal.
- Only interesting if the algorithm is in P.
- More precisely, we want a provable bound on how near-optimal it is.

Approximation ratio:
- C_opt / C_approx for maximization
- C_approx / C_opt for minimization
- By definition, the ratio is >= 1; a polynomial-time algorithm achieving ratio exactly 1 on an NPC optimization problem would imply P=NP.
- In general the ratio is a function of input size, rho(n), and the algorithm is called a "rho(n)-approximation algorithm". Most interesting when rho(n) is a constant.

Note: Even though NPC problems are all polynomially reducible to one another, the same does NOT apply to their approximations. E.g., we will see a 2-approximation algorithm for vertex cover, while there exists epsilon > 0 such that independent set is not approximable within a factor of n^epsilon unless P=NP.

Even better than a constant-factor approximation algorithm is the ability to have a "dial" to choose the approximation ratio. More precisely, we can sometimes obtain a (1+epsilon)-approximation for every positive epsilon.
- Called an "approximation scheme".
- Better yet is an approximation scheme whose running time is also polynomial in 1/epsilon. Called a "fully polynomial approximation scheme". Rare, and mainly known for knapsack-like problems.
- Best of all, rather than an approximation ratio (a multiplicative factor), we can sometimes approximate within an additive factor.

Also, there are negative results of the form: problem P can't be approximated within some margin unless something unlikely holds (like P=NP).

An active research area, with most results in the last 20 years.

Vertex cover -- An approximation algorithm

A vertex cover is a set of vertices that includes at least one endpoint of each edge.
- We want to minimize its size. (Maximizing is uninteresting -- just pick all vertices.)
- Example -- an arbitrary graph
- Example -- a star
- See [CLRS] for the NPC proof.

Most approximation algorithms are greedy in some sense.

A greedy, but poor, approximation algorithm:

    Cover = empty set
    Edges = E
    while Edges not empty
        (u,v) = arbitrary edge in Edges
        add u to Cover
        remove all edges incident to u from Edges
    return Cover

- Polynomial time -- clearly.
- Correctness? Run on examples.
- Example of when it approximates poorly?

A greedy and good approximation algorithm:

    Cover = empty set
    Edges = E
    while Edges not empty
        (u,v) = arbitrary edge in Edges
        add u and v to Cover
        remove all edges incident to u or v from Edges
    return Cover

- Polynomial time -- clearly.
- Correctness? Run on examples.
- The algorithm puts vertices into the cover in pairs, and each pair is the endpoints of an edge disjoint from all other chosen edges. An optimal cover must include at least one vertex of each pair in order to cover the corresponding edge, so |OptCover| >= 1/2 |Cover|. A 2-approximation.

Traveling salesman -- An approximation algorithm

Given a set of vertices ("cities") and weighted edges ("distances"), find a minimum-cost tour visiting each city exactly once.
- See [CLRS] for the NPC proof.
- A restriction: assume the distance function satisfies the triangle inequality. E.g., Euclidean or Manhattan distance. Still NPC.
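The pair-picking vertex-cover algorithm above can be sketched as follows (a minimal sketch; representing the graph as a list of edge pairs is an assumption of this example):

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation for minimum vertex cover.

    Repeatedly pick an arbitrary uncovered edge, add BOTH of its
    endpoints to the cover, and discard every edge touching either
    endpoint. The chosen edges are pairwise disjoint, so any cover
    needs at least one vertex per chosen edge: |Cover| <= 2 |OptCover|.
    """
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]                  # arbitrary uncovered edge
        cover.add(u)
        cover.add(v)
        # remove all edges incident to u or v
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover
```

On a star (edges (0,1), (0,2), (0,3)) this returns a cover of size 2 while the optimum is 1, illustrating that the factor-2 bound can be nearly tight even on easy instances.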
A greedy approximation algorithm:
- Construct a minimum spanning tree (MST).
- Construct a walk/traversal of the MST (returning to the MST root).
- Construct a tour from the walk by deleting duplicate visits. (This can easily be integrated into the previous step.)
- Polynomial time.

How to bound the approximation?
- cost(Tour) <= cost(Walk), by the triangle inequality (shortcutting past duplicate visits never increases cost).
- cost(Walk) = 2 cost(MST), since the walk traverses each MST edge twice.
- cost(MST) <= cost(OptTour), since deleting any edge from OptTour yields a spanning tree, which costs at least as much as the MST.
- Therefore cost(Tour) <= 2 cost(OptTour). A 2-approximation.

See [CLRS] about (non-)approximability without the triangle-inequality restriction.

Subset sum -- A fully polynomial approximation scheme

Given a set S of positive integers and a positive integer t, find a subset of S that adds up to the largest value <= t. (Or, still NPC, simply find this largest sum.)

To come up with an approximation algorithm (actually an FPAS), it is often useful to start with an exact algorithm.

Exact algorithm. L will be a sorted list of ALL sums of members of S that are <= t:

    L = [0]    // sum of the empty set
    for i = 1 to |S|    // we can either add S_i or not
        L = delete_if_greater(merge(L, L+S_i), t)
    sum_opt = last(L)
    return sum_opt

- Run example: S = {1,3,4,7}.
- In general, what is |L| at the bottom of the loop? <= 2^i.

Approximation algorithm. Idea: throw away some elements of L on each iteration, so that |L| stays polynomial in i.
- To make the approximation "good", we don't want to throw arbitrary elements out. Rather, throw away elements that are close to others that are kept.
- If y and z are close, and z < y, which should we keep, and why? Keep z to be safe, since our constraint is an upper bound.
- delta = the quality of our approximation; smaller is better; 0 < delta < 1.
- Delete y iff we keep a z such that y/(1+delta) <= z <= y, i.e., z is less than y and "close enough".

trim(L, delta). Since L is sorted, we just have to check y/(1+delta) <= z, i.e., y <= z(1+delta):

    L' = [L_1]
    z = L_1
    for i = 2 to |L|
        if L_i > z(1+delta) then
            add L_i to end of L'
            z = L_i
    return L'

Run examples with the same L and different delta.
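The exact list-merging algorithm and trim() above can be sketched in Python (a minimal sketch; the list-comprehension style and set-based merge are implementation choices of this example, not part of the pseudocode):

```python
def exact_subset_sum(S, t):
    """Exact algorithm: L holds every achievable subset sum <= t, sorted.
    Returns the largest such sum. |L| can grow to 2^i, hence exponential."""
    L = [0]                                   # sum of the empty set
    for x in S:                               # either add x or not
        merged = sorted(set(L) | {y + x for y in L})   # merge(L, L+x)
        L = [y for y in merged if y <= t]     # delete_if_greater(..., t)
    return L[-1]

def trim(L, delta):
    """Keep an element only if it is more than a (1+delta) factor above
    the last kept element; L must be sorted ascending. Every deleted y
    has a kept z with y/(1+delta) <= z <= y."""
    kept = [L[0]]
    z = L[0]
    for y in L[1:]:
        if y > z * (1 + delta):               # y not "close enough" to z
            kept.append(y)
            z = y
    return kept
```

For the running example S = {1,3,4,7}, `exact_subset_sum([1, 3, 4, 7], 10)` returns 10 (via 3+7); and trimming a sorted list with a larger delta discards more of it.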
Approximation for 0 < epsilon < 1:

    approx(S, epsilon)
        // return value should be within a (1+epsilon) factor of optimal
        // L will be a sorted list of SOME sums of members of S that are <= t
        delta = ???
        L = [0]    // sum of the empty set
        for i = 1 to |S|    // we can either add S_i or not
            L = delete_if_greater(trim(merge(L, L+S_i), delta), t)
        sum_approx = last(L)
        return sum_approx

- Run example.
- Implementation note: each iteration's operations on L can be combined into one pass.
- delta = epsilon / (2|S|). delta should shrink as the input size |S| grows; that this particular value is appropriate only follows from the detailed analysis.

Analysis
(1) Need to show sum_opt/sum_approx <= 1+epsilon.
(2) Need to show the running time is polynomial in |S|, log t, and 1/epsilon.

(1)
- By induction and the definition of trim(), for all y in the ith exact L, there exists z in the ith approximate L with
      y/(1+eps/(2|S|))^i <= z <= y.
- Thus there exists z in the last approximate L with
      sum_opt/(1+eps/(2|S|))^|S| <= z <= sum_opt.
- Since z is in the last approximate L, z <= sum_approx.
- By algebra, sum_opt/sum_approx <= (1+eps/(2|S|))^|S|.
- Useful results about e: (1+eps/(2|S|))^|S| increases to e^(eps/2) as |S| -> infinity, and e^(eps/2) <= 1 + eps/2 + (eps/2)^2.
- Since 0 < eps < 1, we have the tighter bound ... <= 1 + eps.
- By transitivity, sum_opt/sum_approx <= 1 + eps.

(2)
- In the ith approximate L, by the definition of trim(), L_(j+1)/L_j > 1 + eps/(2|S|), i.e., consecutive elements differ by more than a factor of 1 + eps/(2|S|).
- Within the range (1, t], the approximate L therefore contains at most floor(log_(1+eps/(2|S|)) t) elements. (Note that this bound doesn't depend on i.)
- Since the approximate L may also include 0 and 1, it contains at most floor(log_(1+eps/(2|S|)) t) + 2 elements.
- By algebra and the useful results about e (using ln(1+x) >= x/(1+x)):
      log_(1+eps/(2|S|)) t + 2
        = (ln t) / ln(1+eps/(2|S|)) + 2
        <= (2|S| (1+eps/(2|S|)) ln t)/eps + 2
        <= (4|S| ln t)/eps + 2
- This is polynomial in |S|, log t, and 1/eps, as needed.

Within one unit of optimal

- Deciding 2-colorability is easy with a greedy algorithm.
- 3-coloring even a planar graph is NPC. (Proof omitted.)
- 4-coloring a planar graph is in P, as previously mentioned.
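Putting the pieces together, the full approximation scheme can be sketched as follows (a minimal sketch of approx(S, epsilon) with delta = epsilon/(2|S|) substituted in; the test values below are the standard worked example from [CLRS]):

```python
def approx_subset_sum(S, t, eps):
    """FPAS sketch: the exact list-merging algorithm, but with L trimmed
    each iteration so its size stays polynomial. Returns a sum <= t that
    is within a (1+eps) factor of the optimum, for 0 < eps < 1."""
    def trim(L, delta):
        # keep elements more than a (1+delta) factor apart; L is sorted
        kept, z = [L[0]], L[0]
        for y in L[1:]:
            if y > z * (1 + delta):
                kept.append(y)
                z = y
        return kept

    delta = eps / (2 * len(S))                # value justified by analysis
    L = [0]                                   # sum of the empty set
    for x in S:                               # either add x or not
        merged = sorted(set(L) | {y + x for y in L})
        L = [y for y in trim(merged, delta) if y <= t]
    return L[-1]
```

On S = {104, 102, 201, 101} with t = 308 and eps = 0.40, the scheme returns 302 while the optimum is 307; the ratio 307/302 is about 1.017, comfortably within the promised 1.4.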
So, the 4-coloring algorithm (preceded by a greedy 2-coloring check) colors a planar graph within one unit of the optimal number of colors.