L-41 MCS 360 Friday 25 April 2003
Below is a very brief summary of what we talked about in class.
If you missed the lecture, then this summary may guide your
reading of the textbook.
Hashing
A hash function maps a key to a position in the table.
1. Desirable Properties
We identified 4 properties we desire from a hash function:
- fast to evaluate, in uniform time for every key;
(the main goal is to have fast uniform access to the data)
- a hash function h is said to be perfect if
for all keys i /= j, we have h(i) /= h(j);
(we want to avoid collisions or hash clashes)
- ideally, we want a hash function to be minimal:
a hash function that maps #K keys is minimal if its range
has exactly #K values, equal to the size of the table;
(we wish to avoid wasting memory)
- order preserving: for i < j, we have h(i) < h(j);
(in case we want to traverse the table in the order of the keys)
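The last three properties can be checked mechanically on a small key set.
The sketch below is only an illustration: the function names, the key set
and the two sample hash functions are made up for this summary, and the
keys are assumed to be distinct integers.

```python
def is_perfect(h, keys):
    """True if h maps distinct keys to distinct positions (no collisions)."""
    positions = [h(k) for k in keys]
    return len(set(positions)) == len(positions)

def is_minimal_perfect(h, keys):
    """True if the values of h are exactly the positions 0, 1, ..., #K-1."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))

def is_order_preserving(h, keys):
    """True if i < j implies h(i) < h(j); checking neighbors in sorted order suffices."""
    ordered = sorted(keys)
    return all(h(a) < h(b) for a, b in zip(ordered, ordered[1:]))

keys = [10, 20, 30, 40]            # illustrative key set

def h_good(k):
    return k // 10 - 1             # positions 0, 1, 2, 3

def h_mod3(k):
    return k % 3                   # 10 and 40 both land on position 1

print(is_perfect(h_good, keys), is_minimal_perfect(h_good, keys),
      is_order_preserving(h_good, keys))
print(is_perfect(h_mod3, keys), is_order_preserving(h_mod3, keys))
```

The first property (fast, uniform evaluation time) is a matter of cost
rather than of values, so it is not checked here.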
2. Techniques to create hash functions
We survey three techniques to build hash functions:
- Selecting digits from the keys
- Folding, e.g.: adding all digits in the key
- Modular arithmetic, e.g.: h(k) = k modulo size(table)
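A minimal sketch of the three techniques, assuming keys are nonnegative
integers written in decimal. The function names, the sample key and the
table sizes are chosen only for illustration.

```python
def select_digits(key, positions):
    """Digit selection: keep the decimal digits at the given positions (0 = leftmost)."""
    digits = str(key)
    return int("".join(digits[p] for p in positions))

def fold(key):
    """Folding: add all decimal digits of the key."""
    return sum(int(d) for d in str(key))

def mod_hash(key, table_size):
    """Modular arithmetic: h(k) = k modulo size(table)."""
    return key % table_size

key = 360425
print(select_digits(key, [0, 2, 4]))   # digits 3, 0, 2 give 302
print(fold(key))                       # 3+6+0+4+2+5 = 20
print(mod_hash(key, 101))              # 360425 = 101*3568 + 57, so 57
print(mod_hash(fold(key), 7))          # combination: 20 modulo 7 = 6
```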
Usually a hash function will use a combination of these
three techniques. An interesting one uses a random vector
of doubles: we first choose one random vector, once and for all,
and keep it fixed during hashing. The hash function takes the
inner product of this random vector with the digits in the key.
With probability one, all values will be different (in exact
arithmetic; with doubles, equal values are extremely unlikely).
Selecting the leading digits of this value and reducing them
modulo size(table) yields the hash function. It is a very
interesting exercise to experiment with this hash function.
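One possible way to experiment with the random vector hash function is
sketched below. Several choices here are assumptions, not part of the
lecture: keys are nonnegative integers padded to a fixed number of decimal
digits, the random doubles are uniform in [0,1), and "selecting the leading
digits" is read as taking the first few significant digits of the inner
product. The name make_random_vector_hash and its parameters are hypothetical.

```python
import random

def make_random_vector_hash(num_digits, table_size, seed=None, leading=6):
    """Return a hash function built on one fixed random vector of doubles."""
    rng = random.Random(seed)
    vector = [rng.random() for _ in range(num_digits)]   # chosen once, kept fixed

    def h(key):
        text = str(key).zfill(num_digits)                # pad key with leading zeros
        if len(text) > num_digits:
            raise ValueError("key has more digits than the random vector")
        digits = [int(d) for d in text]
        value = sum(v * d for v, d in zip(vector, digits))   # inner product
        # assumed reading of "leading digits": first `leading` significant digits
        mantissa = f"{value:.{leading - 1}e}".split("e")[0].replace(".", "")
        return int(mantissa) % table_size

    return h

h = make_random_vector_hash(num_digits=6, table_size=101, seed=360)
for key in [360425, 360426, 524063, 7]:
    print(key, h(key))
```

Counting how many keys of a test set share a position, for several seeds
and table sizes, is one way to carry out the experiment.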
3. Dealing with collisions
Collisions or hash clashes are usually unavoidable (unless
we have studied our set of keys really well). There are
two methods for dealing with collisions:
- having buckets of records (see radix sort and address
calculation); the book shows techniques to deal with
overflowing buckets
- linear probing or rehashing, one simple example is
rh(k,i) = (h(k) + i) modulo size(table), where i counts
the rehashing attempts (beware of infinite loops
when the table is full!)
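Both methods can be sketched in a few lines. This is an illustration under
assumptions, not the code from the book: keys are integers, an empty slot is
marked with None, and the function names are made up. The rehashing loops
stop after size(table) attempts, which is one way to rule out the infinite
loop on a full table.

```python
def insert_bucket(table, key, h):
    """Buckets: every table position holds a list of the keys hashed to it."""
    table[h(key)].append(key)

def insert_probe(table, key, h):
    """Rehashing with rh(k,i) = (h(k) + i) modulo size(table)."""
    size = len(table)
    for i in range(size):              # at most size attempts, so no infinite loop
        pos = (h(key) + i) % size
        if table[pos] is None or table[pos] == key:
            table[pos] = key
            return pos
    raise OverflowError("table is full")

def find_probe(table, key, h):
    """Return the position of key, or -1 if key is not in the table."""
    size = len(table)
    for i in range(size):
        pos = (h(key) + i) % size
        if table[pos] is None:         # an empty slot ends the search
            return -1
        if table[pos] == key:
            return pos
    return -1                          # every slot was visited without success

def h(k):
    return k % 5                       # illustrative hash function, table size 5

buckets = [[] for _ in range(5)]
for k in [12, 22, 32]:
    insert_bucket(buckets, k, h)       # all three keys end up in bucket 2

table = [None] * 5
for k in [12, 22, 32, 7]:
    print(k, insert_probe(table, k, h))    # positions 2, 3, 4, 0
print(find_probe(table, 7, h), find_probe(table, 17, h))
```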