Comp 200 Lecture
Hard Problems (NP)

Recap

Last time, before that long spring break:
We saw a very nice, complete picture for sorting: Mergesort requires n log(n) comparisons; AND we saw that ANY (comparison-based) sort requires 1/3 n log(n) comparisons. So nobody will ever do better than a factor of 3 over mergesort. [Note: "1/3 n log(n)" was a weak estimate; using Stirling's approximation we could have gotten that any sort requires n (log(n) - 1.5) comparisons.]

And we saw other problems: "Well-behaved" jigsaw puzzles with n pieces can be solved with attempted pieces (though in practice you like to do even better). As compared to the monkey-puzzle, which our only algorithm is not much more than trying-all-possibilites (in some organized way), which might require up to n! attempts, for all we know.
Are ther other clever people who know a better way to do the monkey puzzle? Consider the fact that you buy (well-behaved) jigsaws with several thousand pieces, while monkey-puzzles come with about 15 pieces. That's evidence that other people don't know a better approach!

Looking for boundaries

The question becomes for the monkey puzzle, the same question we had for sorting: Can we possibly do better, or is trying-all-possiblities about the best that can be done? (That is, can we have some argument to show that any algorithm requires proportional to n! steps?
Answer: Nobody knows. (If you can find a better method, or if you can prove that all algorithms indeed requrie approx. n! steps, you'll be a famous computer scientist!)

Some other difficult problems:

Reducing Problems

Is one of these problems (TSP, HamPath) easier than the other? Yes -- we'll show that:
HamPath <= TSP.
What do we mean when we write one problem as less than another?

Okay, details: Exactly how do we reduce HamPath to TSP?
That is, I set up Ye Olde HamPath Solving Shoppe. Somebody brings me a tough instance of a HamPath problem; i scratch my head, but i remember that i can run out the back door to my friend's Ye Olde TSP Solving Shoppe.

Here's my algorithm: I take the map I was given (which just showed whether certain cities had direct flights to another, but didn't mention any costs), and i convert it to a TSP instance:

Classifying Problems

Computer scientists see how some problems are feasible to solve (sorting, jigsaws), and others don't seem to be feasible: Monkey-puzzle, TSP, HamPath. Two broad categories:

Note that P is necessarily contained in NP (What does that mean, in terms of "...a problem efficiently solvable" and "...a problem with efficiently checkable solutions"? Why is the statement true?).

[Monday's lecture ended here.]


Monkeys vs Hams

What about the monkey puzzle: is that easier or harder than HamPath? If we're clever, we can indeed reduce HamPath to Monkey, meaning that HamPath is really no more difficult than Monkey is.

So what does "reduce HamPath to MonkeyPuzzle" mean, again? We want to go into business solving the HamPath problem, but we're assuming that we have a neighboring shop which solves the Monkey Puzzles, to whom we can subcontract out. That is, given a HamPath problem, let's create monkey-puzzle which is somehow "equivalent" (a solution to one puzzle can be viewed as a solution to the other).

So, a customer gives us a HamPath map with n cities (say, A, B, C, K, T, W; n=6). we'll create a monkey puzzle which uses one color per city (think "chicago orange", "new york orange", "houston orange", etc.), plus the colors blue and pink. It will be a long skinny monkey puzzle: 1-by-n (or maybe 1-by-3n or such).

What monkey-tiles do we make? Ones with adjacent city colors on left and right, and blue on the top and bottom. That is, if A and W are connected, then we'll make a tile "AW" on the left/right (and blue on top/bottom). If A is also connected to C, then we'll also make a similar tile "AC". Notice -- a solved puzzle will then just be a sequence reading a HamPath from left to right, perhaps:

        AW, WC, CB, BK, KT, TA.
where A is adjacent to W, W to C, etc.
A number of problems to iron out: We take this monkey puzzle to our friend's shop, have them solve it, and when they hand back the solved problem we just read off the solution. Note that to be sure this scheme works correctly, we'd have to prove two things:
  1. A solution to the monkey puzzle really does correspond to a meaningful Hamiltonian Path in the original map,
  2. and
  3. A Hamiltionian Path really does give rise to a legal solution to the monkey-puzzle we created.

Sidenote: The above reduction creates monkey-puzzles with large numbers of colors -- n+3 or so. Is the many-colored-monkey-puzzle problem harder or easier than a stricter version, where you are only allowed to use, say, five colors? How would you prove this? What is the minimum number of colors needed? [Answering these questions could be the bulk of your class-project.]

Turnabout is Fair Play

Thus, even though they seemed like unrelated problems, we have reduced HamPath to MonkeyPuzzle: any HamPath problem is really just type of MonkeyPuzzles in disguise.

How about the other way? As it turns out, Yes!, we can actually reduce MonkeyPuzzle to HamPath.

This reduction is more difficult than the one just shown: We're given a monkey-puzzle, and we need to create a map, where a hamiltionian circuit corresponds to a solution to our original monkey-puzzle. It involves:

  1. Creating a city for each tile-in-a-certain-location. That is, there will be a city with the name "Tile-G at location 3,2", and another city with the name "Tile-G at location 4,7", etc. (How many cities? n² of them, so far!)
  2. Note that a hamiltonian path will go through all of the above cities; we need to concoct some way to figure out how to make a path correspond to an legal monkey-solution. To that end, we'll make further cities which say "the city we visit next corresponds to a real tile location", and more which say "the city we visit next does not correspond to a real tile location".
  3. The hard part is connecting these cities in a clever way, so that any hamiltonian path ends up indicating that exactly one of the "Tile-G at location ..." cities is indicated as true, and all the other "Tile-G location ..." cities are indicated as not true.
Again, it has to be argued that a HamPath corresponds to a legal solution to the corresponding monkey puzzle, and vice versa. [Working through the details of this would make a fine class project (I'd specify the reduction, but you show how it's correct; I'd provide help with you on this.)]

Equally Difficult Problems (NPC)

Hey, what does it mean that HamPath and Monkey both reduce to each other? It means they're essentially equivalently difficult -- or put differently, the two problems are really just disguised versions of each other! As it turns out, nobody's been able to find a good alg for solving either.

Hold on to your seat, we're about to get more abstract:
Fact: Any problem with efficient-checking-of-proposed-solution can be reduced to MonkeyPuzzle!
Wow, this is a very strong statement: any problem with that property, even problems which haven't been thought up yet. (This was proved by Steve Cook, 1972. Possible project: understand this result. Quick insight: use monkey-tiles to simulate scheme expressions, with tiles like "apply function" and "the placeholder sum"; the bottom row of the monkey puzzle is the initial problem, and every row above it, in a valid monkey puzzle, would correspond to a single step of the stepper.)

We say that Monkey-puzzle is NP-complete ("NPC"), since any problem in NP reduces to it. (It's as tough as any problem in NP.) Note that finding an efficient solution to MonkeyPuzzle means finding an efficient solution to any problem in NP!

What happens when you put the above fact together with MonkeyPuzzle <= HamPath? Yes -- Any problem with efficient-checking-of-proposed-solution can also be reduced to HamPath. (The notation "<=" is suggestive of this.) There are many NP-complete problems -- including these two and TSP; finding a way to efficiently solve any of these problems will mean an efficient way to solve all of them!
[Ian: Pass around Garey&Johnson, with its 90 pages of terse problem descriptions.]

P vs NP

Is P = NP? That is, checking a proposed soln is easier than finding soln? We can't prove it's more difficult (or, that it's not more difficult)! Cf. Simpson's halloween episode, where he enters 3-d world; one of the floating statement is "P=NP", as well as a purported counterexample to fermat's last th'm.

It is not known whether P = NP -- that is, if any problem where a solution is easily checkable also means it's easy to solve. (Seems like it should be harder to solve a problem than to just verify a proposed solution, but we can't prove this.) A very embarrassing lack of knowledge.

Note that there is one big difference between Monkey-Puzzle, HamPath, etc, and our discussion of insert-sort, merge-sort, etc.: Here we are talking about the problem, and not a particular algorithm for the problem!

Beyond NP

You might be wondering, what is a problem which isn't in NP? Can't all purported answers always be efficiently checked? As it turns out, no: