[an error occurred while processing this directive] [an error occurred while processing this directive]
![]() |
|
A dictionary is a data structure used to represent the one-way relationship between entities in one set and entities in another set.
A dictionary is a "mapping" that enables captures the association of a key to a value. For every key, there is an associated value, so if one where given a particular key, one could find the associated value. Do note that a dictionary can only represent the one-way relationship of a key to a value and not the other way around.
Conversely, for purposes of designing a computer program, if our problem contains entities that can be related by a one-way relationship then a dictionary could be very useful for representing that relationship in our program. Note that two-way relationships can sometimes be represented by two one-way relationships.
There are many different ways in which one could implement a dictionary. In Scheme, a dictionary is implemented using a specific technique called a "hash table" which is very fast and memory efficient. Unfortunately, the internal algorithms of a hash table are beyond the scope of this course. For more information, you can see, for instance, the Wikipedia article on hash tables or simply ask the instructors, who will be happy to chew your ear off on everything you'd ever want to know about hash tables!
For the remainder of this discussion, since we will be limited to Scheme code, the terms "dictionary" and "hash table" will be synonymous, but bear in mind that a hash table is a dictionary, but not necessarily the other way around.
IMPORTANT CODING INFORMATION: To use hash tables in DrScheme, you must force DrScheme into a special "Module" mode by including the following lines at the top of your definitions: #lang Scheme Unfortunately, you will lose the niceties of the student language levels of DrScheme, such as testing with check-expect, but hey, this is the big leagues now! |
Creating an empty hash table:
(make-hash) -- returns an empty hash table
Adding a NEW key-value pair to a hash table:
(hash-set! ht key value) -- sets the given key and value in the given hash table, ht. If the key already exists in the hash table, the current value is replaced with the new value.
Retrieving the value for a given key in a hash table:
(hash-ref ht key) -- returns the value associated with the given key in the given hash table, ht. An exception is raised if the key does not exist in the hash table.
Removing a key-value pair:
(hash-remove! ht key) -- removes the key and its associated value from the given hash table, ht.
Update the value for a given key in a hash table:
(hash-update! ht key existing_key_fn nonexisting_key_fn) -- For the given hash table, ht,
If the given key already exists in the hash table: The new value for that key is the return value of the function existing_key_fn when given the input parameter of the currently associated value for that key. That is, new_value = (existing_key_fn old_value).
If the given key does not yet exist in the hash table: A new key-value pair is created where the value for that key is set to the return value of the function nonexisting_key_fn and then the value is updated as if it already existing (see above). That is, new_value = (existing_key_fn (nonexisting_key_fn)).
Either named functions or lambdas can be used for the existing_key_fn and nonexisting_key_fn functions, so long as they satisfy the following contracts:
existing_key_fn: any --> any
nonexisting_key_fn: --> any
Processing every element of a hash table:
(hash-map ht aFunc) --> (list (aFunc key1 value1) (aFunc key2 value2) (aFunc key3 value3) ...) -- returns a list of values resulting from executing the aFunc for every every key-value pair in the hash table, with the key and value as input parameters. IMPORTANT: The order of the key-value pairs processed by hash-map is NOT guaranteed!
The contract for aFunc is thus
aFunc: any any --> any
(hash-for-each ht aFunc) -- Does the same thing as hash-map, but does not create an output list of results. There is no return value for this hash-for-each, as this function is used exclusively for "side-effects" of aFunc, such as updating or setting the values in the hash table. In such, the contract for aFunc does not require that it return a value. IMPORTANT: The order of the key-value pairs processed by hash-map is NOT guaranteed!
> (define ht (make-hash))
; make an empty hash table
> (hash-set! ht "a" 1) ;
add the key-value pair ("a" 1)
> (hash-set! ht "b" 10) ; add
the key-value pair ("b" 10)
> (hash-ref ht "a") ;
retrieve the value associated with the key "a".
1
> (hash-ref ht "b") ; retrieve
the value associated with the key "b".
10
> (hash-set! ht "d" 100) ; Creates a valid key,
"d", associated with the value, 100.
> (hash-remove! ht "d") ; Remove the key, "d" and
its associated value.
> (hash-ref ht "d") ; Double check to make sure key
is gone.
. . hash-ref: no value found for key: "d"
> (hash-update! ht "a" (lambda (v) (add1 v)) (lambda () 42))
; add one to the value associated with the key "a".
> (hash-ref ht "a") ; check the
value for the key "a".
2
> (hash-update! ht "b" (lambda (v) (add1 v)) (lambda () 42))
; add one to the value associated with the key "b".
> (hash-ref ht "b") ; check the
value for the key "b".
11
> (hash-ref ht "a") ; verify
that the value for the key "a" is unaffected.
2
> (hash-ref ht "c") ; attempt to
get the value for the non-existent key "c".
. . hash-ref: no value found for key: "c"
> (hash-update! ht "c" (lambda (v) (add1 v)) (lambda () 42))
; update the value non-existing key, "c", by first setting the
value to 42 and then adding 1.
> (hash-ref ht "c") ; check the value
for the key "c".
43
> (hash-map ht (lambda (k v) k)) ; get a list of
all the keys in the hash table.
("b" "a" "c")
> (hash-map ht (lambda (k v) v)) ; get a list of
all the values in a hash table.
(11 2 43)
> (hash-map ht (lambda (k v) (list k v))) ;
get a list of all the key-value pairs (2 element lists) in the hash table.
(("b" 11) ("a" 2) ("c" 43))
> (hash-map ht (lambda (k v) ; replace every value
less than 20 with 20
(cond
[(< v 20) (begin
(hash-set! ht k 20)
(list k 20))]
[else (list k v)])))
(("b" 20) ("a" 20) ("c" 43))
> (hash-map ht (lambda (k v) (list k v))) ; check
the key-value pairs in the hash table.
(("b" 20) ("a" 20) ("c" 43))
> (hash-for-each ht (lambda (k v)
(hash-set!
ht k (+ v 10)))) ; add 10 to every
value in the hash table
>(hash-map ht (lambda (k v) (list k v))) ; check
the key-value pairs in the hash table.
(("b" 30) ("a" 30) ("c" 53))
Turn a hash table into a list of key-value pairs (we did this above--it's so useful, it's nice to have it as a function):
;;
hash-items: hashtable(any-->any) -->
list of (any, any)
;; Returns a list of all the key-value pairs in a hashtable.
(define (hash-items ht)
(hash-map ht (lambda (k v)
(list k v))))
In this part of the lab, we will develop some functions that will work as a simple "substitution cipher", which is a method of encrypting text messages by substituting each letter with another one, very much like the "secret decoder ring" in your favorite breakfast cereal. (e.g. http://www.exploratorium.edu/ronh/secret/secret.html and http://en.wikipedia.org/wiki/Secret_decoder_ring)
Here's an example of a simple substitution cipher: (for simplicity's sake, I've only shown the letters that I will need)
d --> a
e --> m
H --> o
l --> ?
o --> x
r --> e
w --> d
! --> H
[space] --> g
So we can use these relationships to encode a string by substituting each letter one at a time:
Hello world! --> om??xgdxe?aH
To decode the message, one needs the inverse of the above relationships (note what this says mathematically about the nature of substitution ciphers):
a --> d
d --> w
e --> r
g --> [space]
H --> !
m --> e
o --> H
x --> o
? --> l
With this relationship we can do the same substitution process as before:
om??xgdxe?aH --> Hello world!
Ta da!
The thing to notice here is that the substitution relations above are exactly the same type of one-way key-value relationships we discussed at the top of this lecture. That means that dictionaries are the perfect data structure to use to represent the substitution relationships.
The first thing we need to do is to create the dictionary that maps the letters and characters in a message. We will create a random mapping for maximum security.
One of the crucial issues however is that any mapping we create must be reversible, that is, each output character must correspond to one and only one input character. For instance, suppoose we have d --> a, then we cannot also have p --> a. Why do you think that this restriction is necessary? Is it also necessary to impose the restriction the other way, i.e. one output character for any input character?
Fundamentally, this is how we will accomplish this goal:
Let's call our function makeEncoderDict which will take one parameter, a list of available characters, and return a dictionary where the keys are the original characters and the values are the encoded characters.
Here's what we need to do:
Make a list of all the available characters for encoding:
It's easier to type characters as part of a string, so we
can type out a long string with all the desired characters and
then use the built-in string->list
function to convert the string into a list of individual
characters:
(define all_chars (string->list"abcdefghijklmnopqrstuvwxyz
.,!")) -- you'll probably want more, such as
capital letters, more punctuation marks, etc.
For an even more complete list of characters, use the technique described below in the "Advanced encoding with ASCII characters" section.
Thinking ahead, we need a utility function to remove the n'th element from a list:
We write a simple recursive function to "count down" from n to zero as we recur through the list and return a list without that element by not cons'ing first onto the recursive result from rest. This gives us 3 distinct cases to contend with:
This generates the following code:
;; remove-nth: list-of-any integer -->
list-of-any
;; (remove-nth loa n) returns a list where the n'th element (starting at zero)
from
;; loa has been removed.
;; Throws an exception if the n'th element doesn't exist.
(define (remove-nth loa n)
(cond
[(empty? loa) (raise "remove-nth: Too
few elements in list!")] ; no elements to remove here!
[(zero? n) (rest loa)]
; found the n'th element, return the rest of the list without first
[else (cons (first loa) (remove-nth
(rest loa) (sub1 n)))])) ; not here. Count down into rest.
Make the dictionary to encode the original text string:
The process to randomly associate characters with another character goes something like this:
The process might look like something like this, for a srcChars list of ('a 'b 'c 'd):
Step # | remainingChars | nAvail | valueIdx | key -value pair |
1 | ('a 'b 'c 'd) | 4 | 2 | 'd --> 'c |
2 | ('a 'b 'd) | 3 | 0 | 'c --> 'a |
3 | ('b 'd) | 2 | 1 | 'b --> 'd |
4 | ('b) | 1 | 0 | 'a --> 'b |
5 | () | 0 | Done! | Done! |
The above process can be captured in a function called makeEncoderDict by utilizing some local definitions and a helper function:
;; makeEncoderDict: list-of-chars -->
hashtable(char-->char)
;; (makeEncoderDict srcChars) returns a hash table who's keys are the characters
;; in srcChars where the associated value is another, randomly chosen character
;; from the same set.
(define (makeEncoderDict srcChars)
(local
[(define dict (make-hash))
; The dictionary that will be returned.
;;
fillDict: list-of-chars --> hashtable(char-->char)
;; (fillDict srcChars) is a
recursive helper function that returns the above dict hash table
;; who's keys are the
characters in srcChars where the associated value is another,
;; randomly chosen character
from the same set.
(define (fillDict
remainingChars)
(local
[(define nAvail (length remainingChars))] ; this value is
needed several times below
(cond
[(zero? nAvail) dict] ; done! return the filled dictionary
[else (local
[(define valueIdx (random nAvail))] ; pick a random number
0<= valueIdx < n
(begin
; associate the n-1'th element of the srcChars list with
; the i'th member of the remaining characters
(hash-set! dict (list-ref srcChars (sub1 nAvail)) (list-ref remainingChars
valueIdx))
; recurse onto the remaining characters with the i'th char
removed.
(fillDict (remove-nth remainingChars valueIdx))))])))]
(fillDict srcChars)))
; Start the process off with the full list of characters
To use makeEncoderDict, simply call it with a list with has all the available characters for encoding, such as all_chars, e.g.
(define myEncoderDict (makeEncoderDict all_chars))
Encoding is a trivial matter of simply walking through every element of a string and building up a new string by using the encoding dictionary to retrieve the corresponding encoded character. The "walking through the string" can be easily accomplished by mapping a lambda over a list of characters formed from the string. This will produce a list of encoded characters, which can be easily transformed back into a string.
The encoder function, called encode, will take a string and an encoder dictionary (hash table) and return an encoded string.
All three steps above can be accomplished in a single line of code. Take a shot at it:
;; encode: string
hashtable(char-->char) --> string
;; (encode str dict) returns a string where every character
;; in str has been replaced by the corresponding value in dict.
(define (encode str dict)
...)
You should now check your encoder function:
In order to decode the encoded string, we need to represent the opposite relationship as the encoder dictionary. To do this, we simply have to walk through the encoder dictionary and in the process, fill a new dictionary with key-value pairs that are exactly the opposite of what came from the encoder dictionary. That is, the key becomes the value and the value becomes the key.
Make a function called makeDecoderDict that will take an encoder dictionary and return a decoder dictionary made from it.
There are two similar ways of doing this:
In a nutshell, the process is thus:
Give it a shot:
;; makeDecoderDict: hashtable(any-->any)
--> hashtable(any-->any)
;; (makeDecoderDict inputDict) inverts the input hashtable, inputDict,
;; returning a new hashtable where the keys and values from inputDict been
;; swapped. For instance if in inputDict, key x --> value y, then in the
;; returned hashtable, key y --> value x.
(define (makeDecoderDict inputDict)
...)
A call to makeDecoder should look something like this:
(define myDecoderDict (makeDecoder myEncoderDict))
Do we need to write a special decoding function, like we had for the encoding? Why or why not?
After you have obtained a solution for the above exercises, you may download the full code here to compare your algorithms with those from the instructors.
One of the problems with a single substitution cipher as we just created is the fact that they are relatively easily cracked. Even though each letter has been replaced by another, since each letter is represented by a single letter in the encrypted message, the relative frequency that each letter occurs is unchanged. Since the frequency of each letter in general texts is well known (see, for example http://en.wikipedia.org/wiki/Letter_frequency), then one can easily tell which letter each character in the encrypted message really represents.
One possible way to make a substitution cipher much more difficult to crack is to have multiple possibilities for each letter. For instance,
d --> a or "
e --> m or z or 1
H --> o or :
l --> ? or 8 or ]
o --> x or t or \
r --> e or %
w --> d or q
! --> H or >
[space] --> g or =
Note that the number of possibilities for each letter does not have to be the same. The restriction however is that the number of output characters is much greater than the number of input characters. Using this sort of mapping, randomly choosing what letter to substitute from the list of choices, we get the following sort of encoding. Note how repeated letters such as the "l", do not appear to be repeating any more.
Hello world! --> oz?8tgdxe]a>
The decoding relationship is essentially the same as before, but just with more entries, where some of the entries map back to the same value:
a --> d
d --> w
e --> r
g --> [space]
H --> !
m --> e
o --> H
q --> w
t --> o
x --> o
z --> e
? --> l
" --> d
1 --> e
8 --> l
] --> l
: --> H
\ --> o
% --> r
> --> h
= --> [space]
With this decoding mapping, the encoded message is easily decoded:
oz?8tgdxe]a> --> Hello world!
So your challenge is, can you modify your simple, single substitution code to be able to perform a multiple substitution cipher as described above?
What if you wanted to be able to encode some non-printing characters such as linefeed, carriage return, backspace and tab? These are very difficult to add to your string of available characters.
One way is to represent the characters not as regular printable entities, but as integers. This is done by the use of ASCII characters. Please take a moment to read about them in the addendum below.
By using the escape sequences for the non-printing characters, you could painstakingly add them all to your string of available characters, but there's an easier way: simply define a function that will generate the string of available characters.
Define a function called makeCharList that takes two integers as inputs, a beginning ASCII integer value and an ending ASCII integer value.
The function should simply recurse through all the ASCII integer values, starting at the given start value up to but not including the given end value.
At each iteration of the loop, by using the integer->char function, simply append the character equivalent of the ASCII integer value to a result string (initialized before the loop).
After the loop is done, simply return the result string.
Define two integer variables called ASCII_START and ASCII_END that define the range of ACII characters you are willing to encode. For basic type-able characters, starting at 32 (space character) and ending at 127 (126 is the is the tilde, ~, character) is sufficient. This will insure that your encoded message is easily printed. On the other hand, if you want to encode anything, you could use something like 0 through 128.
(define SPACE_CHAR 32) ;; ASCII code
for space character
(define DEL_CHAR 127) ;; ASCII code for del character
;; makeCharList:
int int --> list-of-chars
;; (makeCharList startCharInt stopCharInt) Returns a list of characters with
consecutive ASCII numerical values,
;; starting with the integer value, startCharInt, inclusive up to stopCharInt,
exclusive.
(define (makeCharList startCharInt stopCharInt)
(cond
[(>= startCharInt stopCharInt) empty]
[else (cons (integer->char startCharInt) (makeCharList (add1 startCharInt)
stopCharInt))]))
A call to makeCharList should look something like the following, which will create a list of all the printable ASCII characters, except for those that control multiline text, such as newline and carriage return:
(define all_chars (makeCharList ASCII_START, ASCII_END))
Now you can easily make a list of available characters, including non-printing ones, without having to type them all out!
Before we start working on our substitution cipher, we need to understand a little more about how characters are represented in a computer. The standard binary representation of characters in computers is called the ASCII (American Standard Code for Information Interchange) character set. The ASCII character set associates every character with a specific integer number between zero and 127 (inclusive). The characters represented included upper and lower case letters, numbers, various symbols such as exclamation points and question marks, as well as non-printing characters used to control the display device, such as tabs, line feeds and carriage returns. With 128 total characters, a 7 bit binary number can be used to represent them all.
With the advent of the IBM PC in 1981, with its natural ability to handle multiples of 8-bits, the ASCII character set was extended to 8-bits or 256 possible values. These new "extended ASCII codes" included some Greek letters, letters with diacriticals and various lines and corners used to create "ASCII art" borders and drawings on primitive text screens and printouts. There is not a true standard for an 8-bit ASCII code however, so it may vary from program to program.
Here are some links to ASCII tables on the web:
Python provides convenient functions to convert a character to and from its respective ASCII numerical values:
(integer->char anInteger) --> the character represented by the given integer.
(char->integer aCharacter) --> the integer that the ASCII code says represents the given character.
Non-printing characters
Consider the phenomena when trying to display a 3 element list:
> (define s1 (list->string (list
(integer->char 65) (integer->char 66) (integer->char 67))))
> s1
"ABC"
> (display s1)
ABC
> (define s2 (list->string (list (integer->char 65) (integer->char 10)
(integer->char 67))))
> s2
"A\nC"
> (display s2)
A
B
Note that print(hr(i)) may show what appears to be a different output than simply typing chr(i) at the interactive Python prompt. In particular, this issue will show up for characters whose integer values are less than 32 or greater than 127. For numbers less than 128, this is due to the fact that the characters represented by those integer values do not have nice screen representations. Originally these were used control the primitive teletypes that were in use when the ASCII code was developed. Originally, there were no character codes above 128, but later they were used for international symbols and "ASCII art" symbols for making nicer looking layouts on text-only screens. These control codes do not have representations that are easily rendered on the screen.
MMost of the ASCII control codes are useless these days, except for a few notable exceptions:
So how does one specify a character that doesn't print?
Scheme, as many languages do, uses a special "escape sequence" of characters to specify non-printing characters. The escape sequence always starts with a backslash, "\", and is followed by a single letter or numbers. The single letter is a shorthand for common characters and the numbers are the ASCII integer value in either hexadecimal (base 16, preceded by an "x") or octal (base 8). For instance:
More information on Python escape codes can be found here: http://python.org/doc/2.5.2/ref/strings.html
In our modern world with non-English letters, there was a need for a much richer character system than the old ASCII. Unicode is a multi-byte character set that can represent most of the characters and symbols used in the world today. Any program that expects to be used in an international setting should use Unicode. For more information, see
© Dung X. Nguyen & Stephen Wong
Last revised 11/23/2009 10:28 AM