(Weak) (Mathematical) Induction

Introductory example

f(n) = 0, if n=0; n+f(n), if n>0

Can recognize that f(n) = ∑_i=0…n i
- Can then look up closed form solution, n(n+1)/2
- Problem: Relying on other authorities doesn't teach you how to answer the question yourself. Most problem's answers not in books.
Can solve by realizing…
- f(n) = 0 + 1 + 2 + … + (n-2) + (n-1) + n
- f(n) = n + (n-1) + (n-2) + … + 2 + 1 + 0
- 2f(n) = n + n + n + … + n + n + n = n(n+1)
- Problem: Pattern identification is implicit. Pattern correctness is assumed. Both simple in this case, but they aren't always.
- Problem: Doesn't tell us anything about how to solve other problems.
Let P(n) = "f(n) = n(n+1)/2"
- Start with explicit identification of pattern.
- Proof by induction of ∀n∈ℕ.P(n):
  - Base case: P(0)
  - Inductive case: P(i)→P(i+1) — ∀i∈ℕ
- Inductive case structure corresponds exactly with recursive definition case structure

More examples

We probably won't have time for all of these. There are more examples in the book.

fact(n) ≤ nⁿ
- 0⁰ undefined — Need to revise statement to only hold for n≥1.
∀ n∈ℕ, n³-n is divisible by 3
∀ n≥5 and n∈ℕ, 2ⁿ>n²
- For motivation, first compare 2ⁿ and n² for small values.
- Compare to equivalent: &forall n∈ℕ, (n≥5 → 2ⁿ>n²)
- Compare to equivalent: &forall n∈ℕ, 2ⁿ⁺⁵>(n+5)²
Suppose R is a partial order on a set A. Prove that every finite, nonempty set B⊆A has an R-minimal element.
- Intuitively, what is this saying?
  - Since R is a partial order, not every element in B is necessarily related to every other element. But this claims there is some element x that is "less-than-or-equal-to" any other element that it is related to by R. I.e., for each element y, either R(x,y) or neither R(x,y) nor R(y,x). Since R is a partial order, there might be multiple minima, but the claim is that there is at least one. This is also effectively saying that we can't have a "loop" in the ordering, such as R(a,b), R(b,c), R(c,a).
- How to reword as ∀ n, P(n)?
  - ∀ n≥1, ∀ B⊆A, (B has n elements ⇒ B has an R-minimal element)
- We prove this by induction on the size of set B. Let n=|B|.
  - When n=1, its single element is clearly R-minimal.
  - Now consider when n>1. Let B={b₁, …, b_n}, for some arbitrary ordering of elements. Let C={b₁, …, b_n-1}. By induction, C has an R-minimal element, and let i be its index. In B, b_i is still "less-than-or-equal-to" or unrelated to all other elements, except for possibly b_n, so lets consider how those two elements might be related.
    - If they are not related by R, then b_i is R-minimal in set B. (b_n might also be R-minimal in this case, but that is irrelevant.)
    - If R(b_i,b_n), then b_i is R-minimal in set B. (b_n is definitely not R-minimal, but again that is irrelevant.)
    - If R(b_n,b_i), then we claim that b_n is R-minimal in set B. To show this, consider an arbitrary b_j, such that j is neither i nor n. Since b_i was R-minimal in C, for any b_j such that R(b_i,b_j), then R(b_n,b_j) by transitivity. And for any b_j such that b_i and b_j are unrelated, then either b_n and b_j are unrelated, or R(b_n,b_j). In particular, R(b_j,b_n) is impossible, since transitivity would imply that R(b_j,b_i), which would be a contradiction.
  - These cases are exhaustive, and in each we have identified an R-minimal element.
∀ n≥3, if n distinct points on a circle are connected in consecutive (i.e., consistently clockwise or consistently counter-clockwise) order with straight lines, then the interior angles of the resulting polygon add up to (n-2)×180°.

Note that we've generalized induction to the form: To prove ∀n∈{k,k+1,k+2,…}.P(n)

Base case: P(k)
Inductive case: P(i)→P(i+1) — ∀i∈{k,k+1,k+2,…}

Incorrect uses of induction

Example:

What's wrong with this inductive proof:
P(n) = "n > 1000".
Proof: … (After all, it's easy to show P(k) → P(k+1).)

Okay, we'll patch that problem for this example:

What's wrong with this inductive proof:
P(n) = "n>1000 or n=0".
Proof: …

Example:

False claim: All Rice students are in the same college. P(n) = "for all sets of n students, those students are all in the same college". Prove ∀n≥1.P(n). We know this conclusion is wrong, so what's wrong with the following?
P(1): Yep, in a singleton set {s₁}, every student is in s₁'s college.
P(i) → P(i+1): We can assume that P(i) holds: for all sets of i students, those students are all in the same college. We need to show P(i+1). Start with an arbitrary set of k+1 students, S = {s₁, …, s_k, s_k+1}, and we'll show they all are in the same college as s₁.

By inductive assumption, the set {s₁, …, s_k} are all in the same college as s₁. Similarly, so are {s₁, … s_k-1, s_k+1}. Since "is in the same college as" is transitive, all elements of S are in the same college as s₁. Q.E.D.

Strong (mathematical) induction

Recall weak induction — to prove ∀n∈ℕ, P(n), show
- Base case: P(0)
- Inductive case: P(i) → P(i+1), for arbitrary i∈ℕ
Strong induction allows you to assume more in the inductive case —
- Base case: P(0)
- Inductive case: (∀j≤i, P(j)) → P(i+1), for arbitrary i∈ℕ
- Alternate presentation — don't actually need the base case, because the inductive case subsumes it.

Examples

Fibonacci number closed form:
- Start with recursive definition:
  - F₀ = 0
  - F₁ = 1
  - F_n = F_n-2 + F_n-1, if n≥2
- Prove closed form is F_n = ((aⁿ-(bⁿ)/√5, where
  - a = (1+√5)/2 — golden ratio
  - b = (1-√5)/2 — golden ratio conjugate or silver ratio
- Note that inductive form of the proof needs to be reworded with two base cases, like the definition.
- In the proof, the following lemmas are helpful:
  - a² = 1 + a
  - b² = 1 + b
Division: ∀ natural numbers n,m, where m>0, there are natural numbers q,r such that n=mq+r and r<m.
- Aside: A non-constructive definition of quotient and remainder. Also, doesn't even claim uniqueness.
- Induction on n or m?
- for arbitrary positive m, prove ∀n.P(n), where P(n) = "∃q.∃r.(n=mq+r ∧ r<m)"

Relation to Weak Induction

As name implies, stronger than weak induction, i.e., subsumes it.
- Because allowed to assume more in inductive case.
- Every weakly inductive proof is also a strongly inductive proof.
- Thus, given strong induction, don't need weak induction. (Although weak induction might lead to a clearer presentation.)
To ponder: Given weak induction, do you really need strong induction?
- I.e., given a strongly inductive proof, can it be (in general) reworded into a weakly inductive proof?

Structural induction

Introductory example

Define: a binary tree is either empty, called a leaf, or (make-node datum t₁ t₂), where t₁,t₂ are binary trees.

Prove: for any binary tree t, #leaves(t) = #nodes(t) + 1.

First, define #nodes(), #leaves() recursively.

How to prove?

(Strong or weak) induction on #nodes or #leaves.
(Weak) induction on height.
Somehow trying to pair up leaves and nodes, with one leaf unpaired. How in general, for arbitrary binary tree?
Structural induction.

Example

Define: for an arbitrary n, an n-ary tree is either empty, or (make-node datum ts), where ts is an n-tuple of n-ary trees.

Prove: For any n-ary tree t, #nodes(t) ≤ n^height(t)-1

Again, first define #nodes(), height() recursively.

Semi-aside: other ways to prove same thing?

As before, strong induction on #nodes or height.
Note: induction on n not helpful.
Prove inductively for complete n-ary trees, then argue arbitrary tree has no more nodes that complete tree.
Prove as a sum of #nodes per level for either complete or general tree.

Example

Prove: For any propositional WFF f, #connectives(f) ≥ #propositions(f) - 1.

Proof has same number of base and inductive cases as the data definition.

Example

Define strings: For a set Σ (the alphabet), the set Σ^* (strings over Σ) contains:

λ (empty string)
wa, where a ∈ Σ, and w ∈ Σ^*.

Define length.

Define concatenation: For v,w ∈ Σ^*, define vw ∈ Σ^* as follows:

If w=λ, vw = v.
if w=ua (for a letter a and a string u), vw = (vu)a.

Prove: length(vw) = length(v) + length(w).

Use induction on what? v or w? Why?
Induction on w matches the definition of concatenation. P(w) = "length(vw) = …" (for arbitrary string v).

Example

; append: list, list -> list
;
(define (append x y)
   (cond [(empty? x) y]
         [(cons?  x) (cons (first x)
                           (append (rest x) y))]))

Prove:

For any list a, (append a (append a a)) = (append (append a a) a)

Hmm, proof gets stuck.

Ironically, by proving a stronger statement, your life actually becomes easier! Prove:

For any lists a,b,c, (append a (append b c)) = (append (append a b) c).

First, must decide what to do induction on: a, b, or c?

Induction on a: let P(a) = "For all lists b,c, (append a (append b c)) = (append (append a b) c).

Known as generalizing, strengthening, or loading the inductive hypothesis.

Induction Overview

Why all these different forms of induction?

To ponder: Is weak induction on numbers enough?

Induction can be used on any data defined recursively (or inductively), i.e., anything tree-like.

Note: Other forms of induction, e.g., co-induction and transfinite induction, allow you to do different kinds of things. They are more advanced, but still sometimes useful in CS — consider streams (e.g., I/O) and infinite trees.

Structuring proofs correctly

What's wrong with the following?

Let our domain be 2-3 trees. Have some P(t), but for this example, don't care what it is.

Base case: P(empty) holds because …
Inductive case: Inductively assume P(t₁), P(t₂), and P(t₃), for arbitrary 2-3 trees t₁, t₂, t₃. Also assume k₁ < k₂, for arbitrary key values k₁, k₂. Then P((make-3-node t₁ k₁ t₂ k₂ t₃)) holds because …

Biggest problem: Never reasons about 2-nodes. Structuring this way obscures the fact that the case split is incomplete.

Base case: P(empty) holds because …
Inductive case: Inductively assume P(t₁), P(t₂), and P(t₃), for arbitrary 2-3 trees t₁, t₂, t₃. Also assume k₁ < k₂, for arbitrary key values k₁, k₂. Then P((make-3-node t₁ k₁ t₂ k₂ t₃)) holds because … And P((make-2-node t₁ k₁ t₂)) holds because …

Fixes the previous problem, although we still can't just look at the beginning of each case to verify this.

Other problems: It doesn't state much of what we should know about the ordering relationships among the keys and trees. But, it does overstate one ordering relationship — for 2-nodes, it doesn't allow k₁ to be the maximal key.

All of these problems are more easily seen with a better structure.

Base case (t=empty): P(t) holds because …
Inductive case (t not empty):
- Case t=(make-2-node t₁ k₁ t₂): We know the following about the fields: … By the inductive hypothesis, we can assume P(t₁) and P(t₂). Thus we know P(t) because …
- Case t=(make-3-node t₁ k₁ t₂ k₂ t₃): We know the following about the fields: … By the inductive hypothesis, we can assume P(t₁), P(t₂), and P(t₃). Thus we know P(t) because …

Or, rather than having two subcases of the inductive case (as most textbooks suggest), have two inductive cases (following the data definition, as in COMP 210/212.)