COMP 200 Elements of Computer Science &
COMP 130 Elements of Algorithms and Computation
Spring 2012

Practice with Dictionaries — Substitution Ciphers

Review of Dictionary Generalities

A dictionary is a “mapping” that captures the association of a key to a value. For every key, there is an associated value, so if one where given a particular key, one could find the associated value. Do note that a dictionary can only represent the one-way relationship of a key to a value and not the other way around.

A canonical example of a dictionary is its namesake, which maps each word (the key) to its lists of meanings (the value).

We're already familiar with lists, which are essentially a special case of dictionaries. Consider the list ["apple","pear","orange"]. The list index 0 maps to "apple", the index 1 maps to "pear", and the index 2 maps to "orange". The difference is that a list's keys are consecutive integers from zero, i.e., 0, 1, 2, …, whereas the keys of a dictionary can be (almost) any value whatsoever.

Dictionaries are found in many programming languages under a variety of names, including associative arrays, hash maps, and hash tables.

Simple Substitution Ciphers

When sending a private message to a friend, one way to keep it secret from prying eyes is to encrypt it into an unreadable form, then have your friend decrypt it back to its original form. Later in the class, we'll discuss this idea more. For now, we'll look at one simple method for doing this. A substitution cipher simply replaces each letter or symbol with a different one. This is a classic method for encryption that was used by Julius Caesar.

Here's an example of a simple substitution cipher. For brevity, we've only listed the part that we'll actually use.

d a
e m
H o
l ?
o x
r e
w d
! H
[space] g

So we can use these relationships to encode a string by substituting each letter one at a time:

Hello world! om??xgdxe?aH

To decode the message, one needs the inverse of the above relationships (note that this says that substitution ciphers must be bijections):

a d
d w
e r
g [space]
H !
m e
o H
x o
? l

With this relationship we can do the same substitution process as before:

om??xgdxe?aH Hello world!

Ta da!

The thing to notice here is that the substitution relations above are exactly the same type of one-way key-value relationships we discussed at the top of this lab. That means that dictionaries are the perfect data structure to use to represent the substitution relationships.

As a variation on this idea, you can encrypt the original letters to an entirely different set of symbols. It doesn't change the process. It only changes what the encrypted message looks like.

In the rest of this module, we'll create a sample substitution cipher, develop encryption and decryption unctions, and develop a function to randomly generate a substitution cipher.

A cipher

In the dictionary finger exercises, we gave an incomplete example of a dictionary representing a substitution cipher, repeated below:

Create a dictionary that maps the entire alphabet and space. You'll use this in the next steps. For brevity, I suggest not including punctuation or upper-case letters.

Encoding

The idea of encoding was described above. Give a pseudo-code algorithm for an encoding function.

Define a Python function based upon the algorithm.

Encode some strings on your sample cipher, e.g.,

Decoding

As mentioned above, a substitution cipher must be invertible. In other words, each character occurs exactly once on the “left-hand side” of the mapping and exactly once on the ”right-hand side”. Thus, if you “reverse” the direction of the encoding cipher, you get another cipher, the one you need for decoding. So, one way to write the decode function would be

What we need is the function invert to create a dictionary mapping in the opposite direction. Review the dictionary finger exercises for how to loop over a dictionary.

Now decode an encoded string, and you should get back what you started with. Try some samples.

Generating a random cipher

Creating a cipher by hand is tedious because it has lots of entries. It is also error-prone, because you have to check that it is invertible. We'd prefer to have a function to create one for us. One way to do that is to generate it randomly.

Here is the approach we'll take.

  1. Create a list (or string) of all the input characters we will be encoding. E.g., ["a","b","c"].
  2. Create a list of all the chracters we will be encoding to, i.e., all the possible output characters in the encoding. We must have the same number of input and output characters. It could be the same characters as in the input. E.g., ["X","Y","Z"].
  3. Randomly re-order the list of output characters. E.g., ["Y","Z","X"]. We're using a list of output characters because Python has a function random.shuffle(aList) exactly for this.
  4. Build a dictionary of mapping the input characters to the corresponding output characters. E.g., {"a": "Y", "b": "Z", "c": "X"}

Write a function that follows the given algorithm.

Try your function out, e.g.,