|
In unit X we have learned how to sort keys into buckets based on the sum of the value of their characters (for example "abd" has the same value as "acc", as "b"+1="d"-1="c"). That's it in a nutshell, not too precise I know. Let's suppose I want the search engine to be able to deal with queries such as "U?acit?" (which will search the keys Udacity, Ubacity, Udacita, etc...). Obviously without buckets it wouldn't be much of a problem with what we've learned so far in Python, but as the intro vid for the unit suggests, we want to "make things faster", which is harder without a hashtable. So to conclude, my question is, do you have a suggestion as to what solution could be applied here? The bucketing we've learned so far won't work since we don't know the value of some characters, and without a hashtable this can get pretty slow with giant indexes. Is there a hashtable that will work for the situation? (or perhaps a totally different solution) note: This is just a hypothetical question. If this is way over what we have learned so far, please tell me. |
|
There are some pretty smart guys/girls on this forum that will probably have better answers. What makes google so fast is that they cache a lot of searches. Meaning they don't have to query their database very often they just return the cached result. Your question regarding wildcards may not be a huge deal. If google doesn't have your search request in their cache then maybe they will have to actually go through and find the results manually. I believe your question is how to make this manual search as fast as possible. Here is one man's solution to this problem: |
|
That is actually no problem in Python's built-in dictionary type because it uses the id() function on the objects to compute the hash, so the computed hash is different from what we know, e.g.:
And actually Python doesn't use any buckets. Buckets are a collision resolution strategy (there are many different). Python uses another one, which is called open addressing, that takes O(1) time for lookup, but the input may take more time (it essentially hops several fields further). |