Where we're headed…

The next section focuses on mathematical tools for describing data: sets, relations, functions, …. These concepts are familiar and closely interrelated, and we'll examine them in a bit more detail.

Sets

Defining

Set and element-of are undefined (primitive) notions, just as point, line, and plane are in geometry; in fact, "incident on a line" is really the same as "element-of". It all comes down to being able to answer: is a particular element in the set?

How do we describe sets in math and code?

By name
- E.g., ∅, ℕ, ℤ, ℚ, ℜ or ℝ, Σ
- Names are convenient, but they raise the question of how the name was defined
Enumeration, i.e., listing all the elements
- E.g., { black, silver, gray, white, maroon, red, purple, fuchsia, green, lime, olive, yellow, navy, blue, teal, aqua }
- E.g., { 1, 7, 10, 29 }
- E.g., { 0, 1, 2, 3, …}
Set-builder (or set comprehension) notation, i.e., defining a membership, indicator, or characteristic function
- E.g., { c | c's RGB color components are each in {x00, x80, xFF}, each non-zero component is the same, or c is xC0C0C0 }
  - Note: xC0C0C0 is silver. See standard HTML color names.
- E.g., { x | x is even }
Any of the previous, combined with set operators
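To connect these descriptions to code, here is a minimal Python sketch of two representations. The notes' own snippets are Racket-style; Python is used here only for illustration. Enumeration is stored as a `frozenset`, and set-builder notation as a characteristic (indicator) function. The names `example`, `is_even`, `member_enum`, `member_pred`, and `enum_to_pred` are hypothetical.

```python
from typing import Callable

# Enumeration: list every element explicitly (only possible for finite sets).
example = frozenset({1, 7, 10, 29})

# Set-builder notation: an indicator function answering "is elt in the set?".
# The set it describes may be infinite.
Indicator = Callable[[object], bool]


def is_even(elt: object) -> bool:
    """Indicator function for { x | x is an even integer }."""
    return isinstance(elt, int) and elt % 2 == 0


def member_enum(elt: object, s: frozenset) -> bool:
    # Membership in an enumerated set: look the element up.
    return elt in s


def member_pred(elt: object, s: Indicator) -> bool:
    # Membership in a predicate-defined set: just ask the indicator function.
    return s(elt)


def enum_to_pred(s: frozenset) -> Indicator:
    # Any enumeration can be turned into an indicator function; the reverse
    # direction needs some finite universe to search through.
    return lambda elt: elt in s


print(member_enum(7, example), member_pred(7, is_even))  # True False
```

Going from an enumeration to a predicate always works. Going the other way does not, which is one face of the enumeration vs. membership trade-off.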

Problems with "…"

E.g., is {0,1,2,3,…} referring to the natural numbers, the non-composites, or the odd numbers plus 0 and 2? (OEIS lists 2609 hits!)
What is {1,3,6,10,15,21,28,36,45,55,…}? Presumably the triangular numbers (although OEIS lists 34 hits).
What is {0,1,2,6,15,40,104, …}? The Golden Rectangle numbers: products of adjacent Fibonacci numbers.
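A "…" silently stands for some generating rule, and different rules can agree on any finite prefix. This Python sketch writes the candidate readings above as generators; all names are hypothetical.

```python
from itertools import count, islice


def naturals():
    """0, 1, 2, 3, 4, 5, ..."""
    yield from count(0)


def odds_plus_0_and_2():
    """0, 1, 2, 3, 5, 7, ...: the odd numbers together with 0 and 2."""
    yield from (0, 1, 2)
    yield from count(3, 2)


def triangular():
    """1, 3, 6, 10, ...: T(n) = n(n+1)/2 for n >= 1."""
    for n in count(1):
        yield n * (n + 1) // 2


def golden_rectangle():
    """0, 1, 2, 6, 15, ...: products of adjacent Fibonacci numbers."""
    a, b = 0, 1  # F(0), F(1)
    while True:
        yield a * b
        a, b = b, a + b


def take(seq, n):
    """First n items of a (possibly infinite) iterator."""
    return list(islice(seq, n))


print(take(naturals(), 4) == take(odds_plus_0_and_2(), 4))  # True: same prefix
print(take(naturals(), 6))           # [0, 1, 2, 3, 4, 5]
print(take(odds_plus_0_and_2(), 6))  # [0, 1, 2, 3, 5, 7]
```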

(Aside: The On-Line Encyclopedia of Integer Sequences)

Code considerations

When writing a library for sets, what are the options for representing the data, and how do they correspond to the previous ideas?

Some ideas to touch on:

various data structures, including bit vectors
representing infinite sets
type predicates
time and efficiency, esp. of enumeration vs. membership
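Here is one possible take on the "bit vectors" idea. When every set of interest is drawn from a small, fixed universe, such as the 16 standard HTML color names enumerated earlier, a set can be stored as a 16-bit integer with one bit per color. This Python sketch assumes the bit positions follow the order in which the colors were listed. `COLORS`, `to_bits`, `member`, `to_names`, and `complement` are hypothetical names. Membership becomes a constant-time bit test, while enumeration must scan the whole universe.

```python
# Universe: the 16 color names, in the order listed earlier (an assumed encoding).
COLORS = ["black", "silver", "gray", "white", "maroon", "red", "purple",
          "fuchsia", "green", "lime", "olive", "yellow", "navy", "blue",
          "teal", "aqua"]
INDEX = {name: i for i, name in enumerate(COLORS)}
FULL = (1 << len(COLORS)) - 1  # all 16 bits set: the whole universe


def to_bits(names):
    """Encode a collection of color names as an int with one bit per color."""
    bits = 0
    for name in names:
        bits |= 1 << INDEX[name]
    return bits


def member(name, bits):
    # Membership is a single shift-and-mask.
    return bool((bits >> INDEX[name]) & 1)


def to_names(bits):
    # Enumeration has to scan every position in the universe.
    return [name for i, name in enumerate(COLORS) if (bits >> i) & 1]


def complement(bits):
    # Relative to the fixed universe: bitwise NOT, masked back to 16 bits.
    return ~bits & FULL


warm = to_bits(["red", "maroon", "yellow"])
greys = to_bits(["black", "gray", "silver", "white"])
# Union and intersection are bitwise OR and AND:
print(to_names(warm | greys))  # ['black', 'silver', 'gray', 'white', 'maroon', 'red', 'yellow']
print(to_names(warm & greys))  # []
```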

Some important relations on sets

Start with some (mostly?) familiar definitions…

A is a subset of B, A ⊆ B

for all x, x ∈ A implies x ∈ B

A is equal to B, A=B

(1) x ∈ A iff x ∈ B
(2) A ⊆ B, and B ⊆ A

A is a proper subset of B, A ⊂ B

A ⊆ B, AND
not A=B

A is a non-trivial subset of B

A ⊂ B, AND
not A=∅

Comparing equality definitions

Definition (1) is clearly correct: the sets must have the same members. Definition (2) seems correct (especially if you're already familiar with it).

Which one translates better into usable code?

How do we prove that (1) and (2) are equivalent? …

Proof: We have to show two things:

Suppose def'n(1) holds, then show def'n(2) holds.

Recall def'n(1): x ∈ A iff x ∈ B. To show def'n(2), we need to show two things:
- Show that A ⊆ B.
  
  By def'n of ⊆, this means showing x ∈ A implies x ∈ B. This is indeed subsumed by the "only if" of (1).
- Show that B ⊆ A.
  
  By def'n of ⊆, this means showing x ∈ B implies x ∈ A. This is indeed subsumed by the "if" of (1).
Now, suppose def'n(2) holds, then show def'n(1) holds.

Recall def'n(2): A ⊆ B, and B ⊆ A. To show def'n(1), we need to show two things:
- Show that "x ∈ A if x ∈ B", equivalently "x ∈ B implies x ∈ A".
  
  We are given that B ⊆ A, which (by def'n of ⊆) means that x ∈ B implies x ∈ A — yay!
- Show that "x ∈ A only if x ∈ B", equivalently "x ∈ A implies x ∈ B".
  
  We are given that A ⊆ B, which (by def'n of ⊆) means that x ∈ A implies x ∈ B — yippee!

Okay, this was written out in great detail, with constant reminders of what was being assumed and what was to be shown. But step back and observe the structure of the proof: it shows the two parts of "(1) is equivalent to (2)", and each of those parts in turn required an "if" and an "only if" direction.
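Exhaustive testing is no substitute for the proof above, but it is a cheap sanity check. This Python sketch assumes a small finite universe, the hypothetical `UNIVERSE = [1, 2, 3]`. It generates every pair of subsets and confirms that definitions (1) and (2) give the same verdict on each pair. It relies on Python's built-in `<=` on `frozenset`, which tests ⊆.

```python
from itertools import combinations

UNIVERSE = [1, 2, 3]  # hypothetical small universe; (1) quantifies over it


def all_subsets(universe):
    """Every subset of a finite universe, as frozensets."""
    return [frozenset(c)
            for r in range(len(universe) + 1)
            for c in combinations(universe, r)]


def equal_def1(a, b, universe):
    # (1): for every x, x ∈ a iff x ∈ b
    return all((x in a) == (x in b) for x in universe)


def equal_def2(a, b):
    # (2): a ⊆ b and b ⊆ a  (frozenset's <= is the subset test)
    return a <= b and b <= a


def definitions_agree(universe):
    """True if (1) and (2) agree on every pair of subsets of the universe."""
    subsets = all_subsets(universe)
    return all(equal_def1(a, b, universe) == equal_def2(a, b)
               for a in subsets for b in subsets)


print(definitions_agree(UNIVERSE))  # True
```

This only checks one finite universe; the proof is what covers every set.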

Code

For our two representation approaches, how would we define the following?

set=?: Already discussed.
empty-set?: Easy, given set=?.
subset?: ???
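Here is one possible answer, sketched in Python under two assumptions. First, enumerated sets are plain lists, with order and duplicates ignored. Second, indicator-function sets come with a finite universe to check against, since "for all x" cannot be tested by brute force over an infinite domain. All function names here are hypothetical.

```python
# Representation 1: an enumerated set as a plain list (order and duplicates ignored).
def subset_list(a, b):
    # A ⊆ B: every listed element of A also appears in B.
    return all(x in b for x in a)


def set_equal_list(a, b):
    # Def'n (2) of equality: A ⊆ B and B ⊆ A.
    return subset_list(a, b) and subset_list(b, a)


def empty_list(a):
    # "Easy, given set=?": compare against a representation of ∅.
    return set_equal_list(a, [])


# Representation 2: an indicator function, plus an assumed finite universe,
# because "for all x" cannot be checked by brute force over infinitely many x.
def subset_pred(a, b, universe):
    return all(b(x) for x in universe if a(x))


def set_equal_pred(a, b, universe):
    return subset_pred(a, b, universe) and subset_pred(b, a, universe)


def empty_pred(a, universe):
    return set_equal_pred(a, lambda x: False, universe)


print(set_equal_list([1, 2, 2, 3], [3, 1, 2]))  # True
print(subset_pred(lambda x: x % 4 == 0, lambda x: x % 2 == 0, range(100)))  # True
```

With lists, each `x in b` is a linear scan, so `subset_list` costs roughly |A|·|B| comparisons. The predicate version instead costs a call or two per universe element, whatever the sizes of A and B.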

Set operations

Many of these should be familiar already.

union, A∪B
- x ∈ A∪B iff (x ∈ A ∨ x ∈ B)
intersection, A∩B
- x ∈ A∩B iff (x ∈ A ∧ x ∈ B)
complementation
- x ∈ overbar(A) iff x ∉ A
- Danger -- hidden assumption! This is often relative to some "universe" — e.g. the complement of the odd integers relative to the integers is the even integers, but relative to all real numbers, the complement is the number line with some points removed.
- See below — discussion of "SuperU", the set that contains everything.
subtraction, A-B
- x ∈ A-B iff (x ∈ A ∧ x ∉ B)
- A-B = A ∩ overbar(B)
- Avoids the hidden assumption of complementation, as A acts as the universe.
Cartesian product, A×B
- E.g., {vw, saab, bmw, audi} × {red, orange, yellow}
- E.g., {vw, saab, bmw, audi} × {}
- Notation: A = A¹, A×A = A², A×A×A = A³, …
- "×" suggests multiplication — What is |A×B|?
Power set, P(A), Pow(A), ℘(A), 2^A.
- E.g., P({1,2,3})
- E.g., P(∅)
- E.g., P(ℕ) or P(ℜ) -- we'll talk about these soon
- |P(A)| = ?

Note that union/or/addition all look similar, and likewise intersection/and/multiplication. Be careful not to take this analogy too far, though: for example, (A-B)∪B = A does not hold in general, since (A-B)∪B is actually A∪B!
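A two-line counterexample makes the warning concrete. In this Python sketch the values are chosen arbitrarily, and `-` and `|` on `frozenset` are set difference and union.

```python
A = frozenset({1, 2})
B = frozenset({2, 3})

lhs = (A - B) | B      # {1} ∪ {2, 3}
print(lhs == A)        # False: 3 sneaks in from B
print(lhs == A | B)    # True: (A-B)∪B is always A∪B

# The identity does hold when B ⊆ A:
B_inside = frozenset({2})
print(((A - B_inside) | B_inside) == A)  # True
```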

Coding

On your own, consider how to code these for each of the set representations.
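To compare with your own attempt, here is one minimal Python sketch of the operations for both representations. Finite sets are `frozenset`s, where union, intersection, and subtraction are just the built-in `|`, `&`, and `-`. Sets that may be infinite are indicator functions. The `_p` suffix and all function names are hypothetical. Note that `complement_p` takes the universe as an explicit argument, which makes complementation's hidden assumption visible.

```python
from itertools import combinations, product


# --- Enumerated (finite) sets as frozensets ---
def cartesian(a, b):
    """A×B: all ordered pairs (x, y) with x ∈ A and y ∈ B."""
    return frozenset(product(a, b))


def power_set(a):
    """P(A): every subset of A, each as a frozenset."""
    elems = list(a)
    return frozenset(frozenset(c)
                     for r in range(len(elems) + 1)
                     for c in combinations(elems, r))


# --- Sets as indicator functions ---
def union_p(a, b):
    return lambda x: a(x) or b(x)


def intersection_p(a, b):
    return lambda x: a(x) and b(x)


def difference_p(a, b):
    return lambda x: a(x) and not b(x)


def complement_p(a, universe):
    # Complement only makes sense relative to a universe, so require one.
    return difference_p(universe, a)


def cartesian_p(a, b):
    # A pair is in A×B iff its first component is in A and its second is in B.
    return lambda p: (isinstance(p, tuple) and len(p) == 2
                      and a(p[0]) and b(p[1]))


def power_set_p(a):
    # S ∈ P(A) iff S ⊆ A; checkable here only because S is a finite frozenset.
    return lambda s: isinstance(s, frozenset) and all(a(x) for x in s)


def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)


def odd(x):
    return is_int(x) and x % 2 == 1


even = complement_p(odd, is_int)  # complement relative to the integers
print(even(4), even(3), even(2.5))  # True False False
```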

Naive Set Theory

Consider the set with the constant-true indicator function: (define (SuperU elt) true). What set is this? Note that not only are 4 and my car elements of SuperU, but so are the entire set ℜ and the three-item set { ∅, black, ℜ }. (It's fine to have sets that contain sets, just as lists can contain lists.) Somewhat disturbingly, though, SuperU is itself contained in SuperU! A set containing itself?! A bit dizzying, but we'll let it in the door.

Uh-oh, logician/philosopher Bertrand Russell (around 1900) sees SuperU, and senses he can do something wicked.

First, he suggests SuperB ("Bertrand"), the set of all sets which contain themselves. The indicator function is even easy to code up: (define (SuperB elt) (element-of? elt elt)). For instance, SuperU is one of the elements of SuperB. (Think of "a book which lists all the books which mention their own title".) This discomfits us, but we don't kick Bertrand out; maybe if we just ignore him he'll go away and stop giving us a headache. [Interesting: does SuperB contain itself? Either answer is consistent, but it tips us off that our set's "easy condition" doesn't seem to fully specify the set.]

Not so lucky: encouraged by this success, Bertie cranks it up a notch, and suggests the set SuperR, of all sets which don't contain themselves. Again the indicator function is easy: (define (SuperR elt) (not (element-of? elt elt))) (Think of "a book which lists all the books which don't mention their own title".) SuperR seems to include relatively normal elements and sets, but our head is pounding more than ever. It's too late, for Bertrand jumps up and cackles as he asks:

Does SuperR contain itself?

"Well", we say, flustered: "Either it contains itself or it doesn't — after all that's all there is to sets: they either contain a given item or not. So, let's see…"

"Suppose SuperR does contain itself. But if (element-of? SuperR SuperR), then by the def'n of SuperR, it wouldn't contain SuperR."
"Suppose SuperR doesn't contain itself. But, if (not (element-of? SuperR SuperR)) is true, then by the def'n of SuperR, it would contain SuperR."

"Aaaahhh!" And we are carted away to the asylum, as Bertrand sits in our favorite easy chair, drumming his fingers together, saying "Excellent…"
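The story's indicator functions translate directly into code. This Python sketch mirrors the notes' `(define (SuperU elt) true)` style. It assumes, as the story does, that a set is represented by its indicator function, so `element_of` is just function application. `element_of`, `is_even`, and `answer_or_none` are hypothetical helpers. Asking whether SuperB or SuperR contains itself never yields an answer, because evaluation keeps asking the same question. In Python, that infinite regress surfaces as a `RecursionError`.

```python
def element_of(elt, s):
    # A set *is* its indicator function, so membership is application.
    return s(elt)


def super_u(elt):
    return True                      # (define (SuperU elt) true)


def super_b(elt):
    return element_of(elt, elt)      # sets that contain themselves


def super_r(elt):
    return not element_of(elt, elt)  # sets that don't contain themselves


def is_even(elt):
    # A "relatively normal" set: it does not contain itself.
    return isinstance(elt, int) and elt % 2 == 0


def answer_or_none(question):
    """Evaluate a zero-argument question, or return None if it recurses without end."""
    try:
        return question()
    except RecursionError:
        return None


print(element_of(super_u, super_u))              # True: SuperU contains itself
print(element_of(super_u, super_b))              # True: SuperU is in SuperB
print(element_of(is_even, super_r))              # True: a normal set is in SuperR
print(answer_or_none(lambda: super_b(super_b)))  # None: the definition never settles it
print(answer_or_none(lambda: super_r(super_r)))  # None: no consistent answer exists
```

Here `None` means "the evaluation never bottomed out", not "false".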

Indeed, this was very disturbing news to mathematicians around 1900: the concept of a set, which underlies all of mathematics, harbors a paradox! Within mathematics, there could be no worse catastrophe. This launched a program to revamp set theory.

Russell's own approach: a "tiered" classification of sets:
sets of atomic elements, sets containing sets of elements, …
It can be finagled to work, but it is very unsatisfying.

What's used today, when a rigorous foundation is needed, is a formalization called "axiomatic set theory". For our purposes, though, we'll just use "naive set theory":

We'll take certain sets for granted (integers; the set of all functions from integers to booleans).
We'll also allow ourselves to build new sets out of old ones (through union, cartesian product, etc).
And, when using set-builder notation, we'll always specify a universe —
- rather than { x | some-condition-on-x },
- we'll write { x ∈ U | some-condition-on-x }, where U is some set we have already constructed (which is how they all did it pre-1900).

Thus we disallow SuperU, even though it has the easiest indicator function of all.
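Python's own set comprehensions happen to enforce this discipline: `{x for x in U if ...}` must draw its candidates from an existing iterable `U`. For sets given as indicator functions, a hypothetical `restrict` helper plays the same role by checking membership in the universe before the condition. This is a sketch that assumes a set is either a finite iterable or an indicator function.

```python
# Finite universe: the comprehension can only range over a set we already have.
U = range(20)
evens_in_u = {x for x in U if x % 2 == 0}  # { x ∈ U | x is even }


def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)


def restrict(universe, condition):
    """{ x ∈ universe | condition(x) } for indicator-function sets.

    `and` short-circuits, so `condition` is only ever asked about
    elements of the universe.
    """
    return lambda x: universe(x) and condition(x)


even_ints = restrict(is_int, lambda x: x % 2 == 0)
print(even_ints(6), even_ints(7), even_ints("six"))  # True False False
```

The short-circuit matters: `"six" % 2` would raise a `TypeError`, but the condition is never reached because `"six"` is not in the universe.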

Yes, this is deeply connected to the Halting Problem, discussed at the end of the course (though the two were discovered 30 years apart).