- Recall from Lecture 28 that a hash function,
*h*, maps the set U of keys into the slots of a**hash table**T[0..m-1],*i.e.*,*h*:U -> {0, 1, ..., m - 1}. - There are many functions that satisfy this requirement. Choosing a good function is critical to the performance of a hash table.
- A good hash function distributes the keys over the range as uniformly as possible, minimizing collisions.
- A bad hash function is exemplified by h(k)=0.
- Typically, a hash function is implemented in two steps:
- Map the key to a natural number. Often, the domain of this mapping is larger than the range, for example, a function mapping arbitrary length strings to a subset of the natural numbers, such as [0, 2^31-1].
- The Java Object class's method hashCode() is such a mapping. The key being the underlying object.
- Any class can override the Object class's implementation of
this method with a more appropriate implementation. For example,
we may want our class's hashCode() to use a single field, such as a
string, as the key. (Thus, two objects that otherwise differ, but
contain the same string, would have the same hashCode() value.)

- Select a slot in the hash table based upon the natural number computed in the previous step. There are two common methods:
- The division method:
- h(k) = k.hashCode() % m
- The conventional wisdom says to beware of values of m
that are powers of two. The rationale for this is based on better
performance, i.e., fewer
collisions, for a poorly chosen hashCode() mapping.

- The multiplicative method, which has three steps:
- Multiply k.hashCode() by a constant
**A**that is 0 <**A**< 1. The golden section number 0.61803... is often selected as the value of A. Here is a good URL discussing the golden section: http://www.mcs.surrey.ac.uk/Personal/R.Knott/Fibonacci/fib.html.

The effect of using the golden section is to disperse keys that are adjacent in the hash functions domain to widely separated values in the range. - Extract the fractional part of this product.
- Multiply the fractional part by m, keeping only the integral part.
- Although this method appears complex, there are very efficient ways of implementing it if m is a power of two.
- Ultimately, you should test your hash function on representative
inputs. For example, this
study compares numerous hashCode()
implementations using the division method. (Note that m is a
power of two in their study.)

Expected-case Cost | Worst-case Cost | ||

lookup() | O(n) | O(n) | |

DictLRS | insert() | O(n) | O(n) |

remove() | O(n) | O(n) | |

lookup() | O(log n) | O(n) | |

DictBT | insert() | O(log n) | O(n) |

remove() | O(log n) | O(n) | |

lookup() | O(log n) | O(log n) | |

DictArray | insert() | O(n) | O(n) |

remove() | O(n) | O(n) | |

lookup() | O(1) | O(n) | |

DictHash | insert() | O(1)* | O(n) |

remove() | O(1) | O(n) |

- DictHash is an excellent implementation of IDictionary.
- Hash tables are, however, a poor choice for ranged operations, such as removeRange().