COMP 130 Elements of Algorithms and Computation
Spring 2012

Computing with Probabilities 

Demo code for this lecture:  cards.py

So far, everything we've been doing has been deterministic in the sense that the results of our computations were the result of well-defined processes that transformed input values into output values.   We will now look at a completely different way of viewing computation, one that is instead based on probabilities, where outputs are inferred from their likelihoods rather than being the inexorable result of a deterministic process. 

Mars Pathfinder robot
The probabilistic approach has revolutionized computing in many areas, from facial recognition to DNA analysis to autonomous navigation of robots.    Probabilistic techniques are particularly useful for recognition tasks, such as detecting whether an incoming e-mail is spam or what ZIP code is written on a letter as it zips through an automatic sorting machine at the post office.  Hidden Markov Models, another common statistical modeling approach, are based upon the Markov chains we used for text analysis.
Monte Carlo simulations, so named because of their reliance on random chance, like the famous casinos, are used for a wide variety of tasks, from simulating nuclear warhead explosions to galactic models to guiding investment portfolios.   Statistically-driven simulations such as these are often characterized by a faster convergence to the final results than more "brute force" methods that evaluate every possibility in a distinct order.   The problem on Exam 2 to calculate the value of π was a simple Monte Carlo simulation.
Monte Carlo simulation of a pollution plume
Bayesian network
Bayesian inference, which we will cover in more detail later, uses newly measured probabilities to transform previously measured outcomes into inferences about future outcomes.    Its uses range from handwriting and facial recognition to clinical drug trials to models of human reasoning.
"Fuzzy logic", where "truth" is represented by degrees of probabilistic values, is used everywhere from esoteric artificial intelligence and machine learning down to mundane kitchen-counter rice cookers.  Artificial Neural Networks are a particular approach to building up fuzzy logic expressions, based loosely on natural neural networks.
Fuzzy logic rice cooker
 

Basic Probability Primer

The probability of something occurring (or of being something) is the likelihood that such is true.  It doesn't guarantee that something will happen or that it is.   We can say that the probability of pulling a spade from a shuffled deck of cards is 25%, but that does not mean that every fourth card we pull will be a spade.   Instead, it simply says that if we repeat the experiment (pulling one card from a shuffled deck of 52 cards) many, many, many times, approximately one quarter of the cards pulled will be spades.    It is possible to show that the more times the experiment is repeated, the closer to 25% the ratio will become. 
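We can watch this convergence happen by simulating the experiment. The sketch below is a minimal version in Python (the lecture's cards.py demo presumably does something similar, but the function name `spade_fraction` and the deck representation here are our own choices, not taken from it):

```python
import random

# Build a 52-card deck as (rank, suit) pairs.
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
deck = [(rank, suit) for suit in suits for rank in ranks]

def spade_fraction(trials):
    """Draw one card from a freshly shuffled deck, `trials` times;
    return the fraction of draws that were spades."""
    spades = 0
    for _ in range(trials):
        rank, suit = random.choice(deck)  # equivalent to shuffling and taking the top card
        if suit == "spades":
            spades += 1
    return spades / trials

# The fraction should creep toward 0.25 as the number of trials grows.
for n in (100, 10_000, 100_000):
    print(n, spade_fraction(n))
```

Each run will print slightly different numbers, but the larger trial counts will reliably land closer to 0.25 than the small ones.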

Some axioms of probability:

0 ≤ P(A) ≤ 1     --  The probability of something ("A" here) is always between zero and one, inclusive.    Zero means that A will never happen, while 1 means that it will always happen.

1 = P(True)    -- The probability of something, whatever it is, happening is 1, i.e. something will definitely happen.   Note:  Don't confuse "nothing" as a possible outcome with "none of the possible outcomes".  In that case, "nothing" is "something"!

0 = P(False)  -- The probability of something other than one of the possible outcomes is zero, that is, the impossible won't happen (by definition).   Note:  Don't confuse "impossible" with "unaccounted possibility".  The former has zero probability while the latter has a non-zero probability.

1 = P(A) + P(B) + P(C) + ...   --- if A, B, C, etc are all the possible disjoint (non-overlapping or distinct) outcomes of a measurement, the sum of their probabilities must be 1.    This is technically equivalent to the 1 = P(True) axiom.   One must be a little careful here, for instance, P(spade) is not disjoint from P(king).    An example of disjoint outcomes is spades, hearts, diamonds and clubs because no card can be a combination of suits.   That is, 1 = P(spades) + P(hearts) + P(diamonds) + P(clubs).

P(A or B) = P(A) + P(B) - P(A and B)  -- if A and B are not necessarily disjoint outcomes, then the probability of one or the other occurring is related not only to their own probabilities, but also to their overlap.  More on this below.

Note that the above statements are inter-related, and arguably, in some ways equivalent statements at times.   It is also arguable whether or not these form the true base axioms of probability.   A mathematician would probably start at a more basic definition of probability, though one could derive the more classical axioms given the above statements.    For our purposes, starting here simplifies our lives with no loss of generality.     

Venn Diagram Visualizations of Probabilities

It's very easy to get lost in the algebra of probabilities.   For many people, it's a lot easier to visualize probabilistic relationships using Venn Diagrams, which use overlapping shapes to represent sets of possible outcomes and their inter-relationships.    (Note:  While it is common to represent sets of outcomes or possibilities as filled circles in Venn diagrams, there is no requirement to do so -- use whatever shape makes the cleanest, clearest and simplest diagram.)

In the following Venn diagram, the entire square box represents the set of all possible outcomes of a measurement.   In terms of probabilities, the entire square box thus has an area of 1.     In that space of possible outcomes, we illustrate two possible, partially overlapping (non-disjoint) outcomes, A and B.    Here, the green area with left-to-right upward slanting lines represents all the possible outcomes that qualify as being an "A" outcome.    Likewise, the yellow area with right-to-left upward slanting lines represents all the possible outcomes that qualify as being a "B" outcome.     The overlap area between the green and yellow, with its net diagonal cross-hatching, contains the outcomes that qualify as both "A" and "B". 

Non-distinct probabilities

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Remember that here, we are talking about a single measurement and asking what its outcome is likely to be.   Think of randomly throwing a dart at the above Venn diagram dart board.    Assume that the dart will land somewhere on the board and that where it lands is the outcome of the experiment.   The larger the area representing a particular type of outcome, the more likely the dart will be to land within that area.   If we repeat our dart throw many times, we expect the fraction of throws landing in each outcome area to approach that area's relative size with respect to the (unit-sized) total area.    This is exactly how we estimated the value of π in Exam 2.
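The dart-board idea can be sketched in a few lines of Python. The exact setup of the Exam 2 problem isn't reproduced here; this sketch assumes the standard version, where darts land uniformly in the unit square and we count the fraction falling inside the quarter circle of radius 1 (that fraction approaches π/4):

```python
import random

def estimate_pi(trials):
    """Throw `trials` darts uniformly at the unit square.
    The quarter circle x**2 + y**2 <= 1 has area pi/4, so the
    fraction of darts landing inside it estimates pi/4."""
    inside = 0
    for _ in range(trials):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / trials

print(estimate_pi(200_000))  # typically within a few hundredths of 3.14159
```

As with the card draws, more darts means a tighter estimate, though the convergence is slow (the error shrinks roughly like 1/√trials).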

We will adopt the standard mathematical notation for the following concepts:

A ∪ B   -- all possible outcomes that qualify for being either A or B.    That is, this is the set of outcomes that is the union of the A and B sets.

A ∩ B   -- all possible outcomes that qualify for being both A and B.   That is, this is the set of outcomes that is the intersection of the A and B sets.  

P(A or B) = P(A) + P(B)  - P(A and B) can thus be written as  P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

We can easily see from the Venn diagram above that the last axiom above simply states that the size of the combined (union of) green and yellow areas is the area of the green plus the area of the yellow minus the size of the double-counted overlap area (the intersection).

Example:   The probability of selecting a spade from a shuffled deck is 1/4 since spades represent one of 4 suits in the deck, and the probability of selecting a king is 1/13 since it is one of the 13 cards in each suit.   The probability of a card being both a spade and a king is 1/52 because there is only one such card in the deck.    Thus the probability of selecting a card that is either a spade or a king is 1/4 + 1/13 - 1/52 = 13/52 + 4/52 - 1/52 = (13+4-1)/52 = 16/52, which is what we expect since there are 13 spades (one of which is a king) plus 1 heart king, 1 club king and 1 diamond king, for a total of 16 possibilities that satisfy being a spade or a king out of 52 total possible outcomes.
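Because a deck is small, we can verify the inclusion-exclusion formula exactly by enumerating all 52 outcomes with Python sets, rather than simulating. This is a sketch of our own, not code from the lecture's cards.py:

```python
# Enumerate the full 52-card deck and count outcomes directly.
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
deck = [(rank, suit) for suit in suits for rank in ranks]

spades = {card for card in deck if card[1] == "spades"}
kings  = {card for card in deck if card[0] == "king"}

# Set union (|) and intersection (&) mirror the A ∪ B and A ∩ B notation.
# Inclusion-exclusion holds exactly at the level of counts:
assert len(spades | kings) == len(spades) + len(kings) - len(spades & kings)

p_either = len(spades | kings) / len(deck)
print(p_either)  # 16/52 ≈ 0.3077
```

The double-counted card is the king of spades: it appears once in `spades` and once in `kings`, but only once in their union, which is exactly what the subtracted P(A ∩ B) term accounts for.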

For a much more complete treatment of this subject, please see Prof. Subramanian's notes for Comp140:

Comp200 students: Part 1

Comp130 students: Part 1

Conditional Probabilities

The above discussion only concerns itself with a single experiment, though perhaps repeated many times.   But consider this alternative situation:   suppose we perform one experiment and then, taking its result into account, perform another.    That is, the second experiment is conditional on the first experiment.   How does this change what we see?

For example, if we have already drawn a king, what is the probability that the card is also a spade?  

This is called a conditional probability and is denoted as P(A|B), the probability of A given B.

The value of a conditional probability is easy to see by looking at the above Venn diagram.   Suppose B is the outcome of drawing a king from the full deck (ignore the fact that the diagram is not to proper scale).    And suppose that A is the outcome of drawing a spade from the full deck (once again, ignoring the out-of-scale-ness of the drawing).  But once we have drawn the king, our world is not the whole grey square anymore.   Since we are now holding a king in our hand, our world is just the yellow circle. In that reduced world, the spades are just the cross-hatched intersection area.    The probability of drawing a spade given that we are holding a king already is thus the ratio of the size of the cross-hatched intersection area to the total yellow area.     That is

P(spade|king) = P(spade ∩ king)/ P(king)  = (1/52) /(1/13)  = 13/52 = 1/4.  

This makes intuitive sense because there are only 4 kings, only one of which is a spade.  Thus, if we are holding a king, the chance is one in four that it is the king of spades.
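We can also check this by simulation: draw a card over and over, and among only those draws that happened to be kings, count how often the card was also a spade. The function name `spade_given_king` below is our own; this is a sketch, not the lecture's demo code:

```python
import random

suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
deck = [(rank, suit) for suit in suits for rank in ranks]

def spade_given_king(trials):
    """Draw one card `trials` times.  Among only the draws that turned
    out to be kings (the reduced world), return the fraction that
    were also spades."""
    kings = spade_kings = 0
    for _ in range(trials):
        rank, suit = random.choice(deck)
        if rank == "king":          # condition on B: keep only king draws
            kings += 1
            if suit == "spades":    # of those, count the A outcomes
                spade_kings += 1
    return spade_kings / kings

print(spade_given_king(200_000))  # should hover near 0.25
```

Note that we never look at the non-king draws at all: conditioning on B means discarding every outcome outside the yellow circle before counting.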

Generalizing:

P(A|B) = P(A ∩ B)/P(B)
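When all outcomes are equally likely, this formula reduces to counting: the number of outcomes in both A and B, divided by the number in B. A small generic helper makes that concrete (the function name `conditional_probability` and its predicate-based interface are our own illustrative choices):

```python
def conditional_probability(is_a, is_b, outcomes):
    """P(A|B) = P(A ∩ B) / P(B), computed by counting equally likely
    outcomes.  `is_a` and `is_b` are predicates saying whether an
    outcome belongs to the A and B sets, respectively."""
    b_count = sum(1 for o in outcomes if is_b(o))
    both_count = sum(1 for o in outcomes if is_a(o) and is_b(o))
    return both_count / b_count   # the 1/len(outcomes) factors cancel

# Reproduce the spade-given-king example as a check:
suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
deck = [(rank, suit) for suit in suits for rank in ranks]

p = conditional_probability(lambda c: c[1] == "spades",
                            lambda c: c[0] == "king",
                            deck)
print(p)  # 0.25
```

Notice that dividing both counts by 52 would give P(A ∩ B) and P(B) separately; since that common factor cancels, the helper can skip it entirely.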

This is all fine and dandy, but what if we don't know P(A ∩ B)?