Structure | Search | Insert | Delete | Space |
---|---|---|---|---|
Linked List | O(n) | O(1) | O(n) | O(n) |
Sorted Array | O(log n) | O(n) | O(n) | O(n) |
Balanced BST | O(log n) | O(log n) | O(log n) | O(n) |
Array[KeyRange] of EltType | O(1) | O(1) | O(1) | KeyRange |
Other possibilities include unordered array, ordered linked list, unbalanced BST.
We can get slightly more efficient algorithms with Sorted Arrays if we use an interpolation search (as long as we know the distribution of keys). But it is still O(log n).
This implementation assumes that the data has a key which is of a restricted type (some enumerated type in Pascal, integers in Java), which is not always the case.
Note also that the size requirements for this implementation could be prohibitive.
Ex. If the array held 2000 student records indexed by social security number, it would be declared as ARRAY[0..999,999,999].
What if most of the entries are empty? If we use a smaller array, then all elements will still fit.
Suppose we have a lot of data elements of type EltType and a set of locations in which we could store data elements.
Consider a function H: EltType -> Location with the properties
Instead we use something that behaves well, but not necessarily perfectly.
The goal is to scatter elements through the array randomly so that they won't bump into each other.
Define a function H: Keys -> Addresses, and call H(element.key) the home address of element.
Of course now we can't list elements easily in any kind of order, but hopefully we can find them in time O(1).
Note that each entry in the table will need to include the actual key, since several different keys will likely get mapped to the same subscript.
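As a rough illustration of why each slot has to keep the actual key, here is a minimal Java sketch. The class `HomeAddressSketch`, its `Entry` type, the table size of 11, and the remainder-based `home` function are all assumptions made for this example, and a collision is only reported, not resolved.

```java
class HomeAddressSketch {
    // Each slot stores the full key next to the data, so a lookup can tell
    // whether the element at a home address is really the one requested.
    static final class Entry {
        final int key;
        final String data;
        Entry(int key, String data) { this.key = key; this.data = data; }
    }

    private final Entry[] table;

    HomeAddressSketch(int tableSize) { table = new Entry[tableSize]; }

    // Hypothetical H: Keys -> Addresses, here just a non-negative remainder.
    int home(int key) { return Math.floorMod(key, table.length); }

    // Stores the element at its home address. Returns false when a different
    // key already occupies that slot (a collision, not handled in this sketch).
    boolean put(int key, String data) {
        int h = home(key);
        if (table[h] != null && table[h].key != key) return false;
        table[h] = new Entry(key, data);
        return true;
    }

    // Returns the data for key, or null if the slot is empty or holds another key.
    String get(int key) {
        Entry e = table[home(key)];
        return (e != null && e.key == key) ? e.data : null;
    }

    public static void main(String[] args) {
        HomeAddressSketch t = new HomeAddressSketch(11);
        System.out.println(t.put(15, "Ann"));  // home address is 15 mod 11 = 4
        System.out.println(t.put(26, "Bob"));  // 26 mod 11 is also 4: collision
        System.out.println(t.get(15));
        System.out.println(t.get(26));         // slot 4 holds key 15, so no match
    }
}
```

Without the stored key, a lookup for 26 would wrongly return the record for 15, since both share home address 4.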
There are two problems to look at:
Here are some sample Hashing functions.
Presume for the moment that the keys are numbers.
Unfortunately it is easy to get a biased sample. We can carefully analyze keys to see which will work best. We must watch out for patterns - they should generate all possible table positions. (For example the first digits of SS#'s reflect the region in which they were assigned and hence usually would work poorly as a hashing function.)
This is very efficient and often gives good results if the TableSize is chosen properly.
If it is chosen poorly then you can get very poor results. If TableSize = 2^8 = 256 and the keys are the integer ASCII equivalents of two-letter pairs, i.e. Key(xy) = 2^8 * ORD(x) + ORD(y), then all pairs ending with the same letter get mapped to the same address. Similar problems arise with any power of 2.
The best bet seems to be to let the TableSize be a prime number.
In practice if no divisor of the TableSize is less than 20, the hash function seems to be OK. (Text uses 997 in the sample program)
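The effect of the table size can be seen in a small Java sketch of hashing by remainder. The class name `DivisionHashSketch` and the three sample letter pairs are assumptions for illustration; the key formula is the two-letter one given above, and 997 is the prime mentioned for the sample program.

```java
class DivisionHashSketch {
    // Key(xy) = 2^8 * ORD(x) + ORD(y), as in the two-letter example.
    static int key(char x, char y) { return 256 * x + y; }

    // Hash by division: the remainder of the key modulo the table size.
    static int hash(int key, int tableSize) { return key % tableSize; }

    public static void main(String[] args) {
        String[] pairs = {"at", "it", "ot"};
        for (String p : pairs) {
            int k = key(p.charAt(0), p.charAt(1));
            System.out.println(p + ": mod 256 -> " + hash(k, 256)
                    + ", mod 997 -> " + hash(k, 997));
        }
    }
}
```

With a table size of 256 every pair ending in `t` lands on address 116, the code for `t`, because the first letter is multiplied away. With 997 the same three keys land on different addresses.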
Example: Let the keys range between 1 and 32000 and let the TableSize be 2048 = 2^11.
Square the Key and extract the middle 11 bits. (Grabbing certain bits of a word is easy to do using shift operators in assembly language or can be done using the div and mod operators using powers of two.)
In general r bits gives a table of size 2^r.
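The squaring example can be sketched in Java with a shift and a mask. Exactly which bits count as the "middle" is an assumption here: since 32000 squared fits in 30 bits, this sketch discards the low 9 bits and keeps the next 11. The class name `MidSquareSketch` and the constants `R` and `LOW_BITS` are illustrative only.

```java
class MidSquareSketch {
    static final int R = 11;        // bits kept, so the table size is 2^11 = 2048
    static final int LOW_BITS = 9;  // assumed: low-order bits dropped before taking R bits

    // Squares the key and extracts R bits from the middle of the result.
    static int hash(int key) {
        long squared = (long) key * key;   // long avoids overflow for larger keys
        return (int) ((squared >>> LOW_BITS) & ((1 << R) - 1));
    }

    public static void main(String[] args) {
        int[] keys = {1000, 1001, 32000};
        for (int k : keys) {
            System.out.println(k + " -> " + hash(k));
        }
    }
}
```

The same extraction could be written with div and mod by powers of two, as `(squared / 512) % 2048`.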
This is often used if the key is too big. E.g., if the keys are Social Security numbers, the 9 digits may not fit into an integer (they overflow a 16-bit integer, for instance). Break the number up into three pieces: the 1st digit, the next 4, and then the last 4. Then add them together.
Now you can do arithmetic on them.
This technique is often used in conjunction with other methods (e.g. division)
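A minimal Java sketch of this splitting-and-adding step, followed by division, might look like the following. It assumes the number arrives as a 9-character digit string; the class name `FoldingSketch`, the sample number, and the table size 997 are chosen for illustration only.

```java
class FoldingSketch {
    // Splits a 9-digit number into the 1st digit, the next 4, and the last 4,
    // then adds the three pieces together.
    static int fold(String ssn) {
        if (ssn.length() != 9) {
            throw new IllegalArgumentException("expected exactly 9 digits");
        }
        int first = Integer.parseInt(ssn.substring(0, 1));
        int middle = Integer.parseInt(ssn.substring(1, 5));
        int last = Integer.parseInt(ssn.substring(5, 9));
        return first + middle + last;
    }

    // Combines the folded value with division to get a table address.
    static int hash(String ssn, int tableSize) {
        return fold(ssn) % tableSize;
    }

    public static void main(String[] args) {
        String sample = "123456789";             // made-up number
        System.out.println(fold(sample));        // 1 + 2345 + 6789
        System.out.println(hash(sample, 997));
    }
}
```

The folded sum is at most 9 + 9999 + 9999 = 20007, so it fits comfortably even in a small integer type.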
Here is a very simple-minded hash code for strings: Add together the ordinal equivalent of all letters and take the remainder mod tableSize.
Problem: Words with the same letters get mapped to the same place:
miles, slime, smile
This would be much improved if you took the letters in pairs before division.
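To compare the two ideas, here is a hedged Java sketch. The pair scheme is one possible reading of "letters in pairs": each pair xy contributes 256 * ord(x) + ord(y), and a trailing single letter contributes its own code. The class name `LetterPairHashSketch` and the table size 997 are assumptions.

```java
class LetterPairHashSketch {
    // The simple-minded hash: sum of the character codes, mod tableSize.
    static int simpleHash(String word, int tableSize) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++) {
            sum += word.charAt(i);
        }
        return sum % tableSize;
    }

    // Pair variant (assumed scheme): each pair xy adds 256 * ord(x) + ord(y);
    // a leftover final letter adds just its own code.
    static int pairHash(String word, int tableSize) {
        int sum = 0;
        for (int i = 0; i < word.length(); i += 2) {
            int value = word.charAt(i);
            if (i + 1 < word.length()) {
                value = 256 * value + word.charAt(i + 1);
            }
            sum += value;
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        String[] words = {"miles", "slime", "smile"};
        for (String w : words) {
            System.out.println(w + ": simple " + simpleHash(w, 997)
                    + ", pairs " + pairHash(w, 997));
        }
    }
}
```

Under these assumptions the simple hash sends all three words to the same address, while the pair hash separates miles from the other two. The words slime and smile still coincide, because they differ only by swapping two letters that both sit in the second position of a pair, so pairing reduces this kind of collision without eliminating it.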
Nevertheless, for simplicity we adopt this simple-minded (and thus relatively useless) hash function for the following discussion.
Here is a function which adds up the ord of the letters and then takes the result mod tableSize:

    hash = 0;
    // Sum the character codes of every letter in the word.
    for (int charNo = 0; charNo < word.length(); charNo++) {
        hash = hash + (int) word.charAt(charNo);
    }
    hash = hash % tableSize; /* gives 0 <= hash < tableSize */