|
So on the video after the bad hash quiz our instructor says that looking at the first letter is not the best method. However for the hash quiz answer he used 12 buckets instead of 26. Shouldn't he be using 26 buckets since there are 26 letters in the alphabet. Also I am still kind of confused at how the modulus can be accurate. Why doesn't he write a code that directly checks for the first letter in the string input and then searches for this letter in the alphabetical bucket list. |
|
Hi Justin, If you assign keys to 26 buckets, according to the first letter of the word, you will find that the buckets that correspond to letters that frequently start a word, like t, r, or s will have tons of keys, while buckets that correspond to letters like u, q, z or x have almost no keys at all. Hence, using solely the first letter of the word as the key to the bucket defeats the goal of having all the keys evenly spread across all the buckets. Your goal is to have all the buckets equally full to make the lookup time independent of the key (i.e., the lookup time should not depend on whether your word is frequently used or not) |
|
The goal for a hash function is to spread the keys around in several buckets. It is not intended to have a bucket for each key or key prefix. A bucket can [and usually does] end up with multiple entries. That is many different keys can 'calculate' to the same bucket number. One of the purposes of using hashes is to speed up lookup WITHOUT needING to sort the data before adding to lists [or buckets in lists]. Sorting is an expensive operation, especially when the number of items to be sortted starts getting large. |
|
The point of the hash function is to produce a number to index into the hash table so that you don't have to search in the hash table at that level (indexing into an array is many times faster than searching through the array). Because the array in each bucket is smaller than the whole, the searching is greatly reduced. David deliberately chose 12 as a bad example of the number of buckets. |
|
Modulus is a nice method to convert any number into a value in the range 0…N−1. So for any hash value and any number of buckets it provides a bucket index. |