In unit X we have learned how to sort keys into buckets based on the sum of the values of their characters (for example, "abd" lands in the same bucket as "acc", since "b"+1 = "c" = "d"-1, so the character sums are equal). That's it in a nutshell; not too precise, I know.
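To make the setup concrete, here is a minimal sketch of that character-sum bucketing; the bucket count and function names are my own choices, not anything from the unit:

```python
# Character-sum bucketing: keys whose character codes sum to the same
# value collide into the same bucket.

NUM_BUCKETS = 16  # arbitrary choice for this sketch

def char_sum_hash(key):
    """Sum the character codes of the key."""
    return sum(ord(c) for c in key)

def bucket_for(key):
    return char_sum_hash(key) % NUM_BUCKETS

# "abd" and "acc" collide because their character sums are equal:
# ord('b') + ord('d') == ord('c') + ord('c')
assert char_sum_hash("abd") == char_sum_hash("acc")
assert bucket_for("abd") == bucket_for("acc")
```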
Let's suppose I want the search engine to be able to handle queries such as "U?acit?" (which would match keys like Udacity, Ubacity, Udacita, etc.). Obviously, without buckets this wouldn't be much of a problem with what we've learned so far in Python, but as the intro vid for the unit suggests, we want to "make things faster", which is harder without a hashtable.
So to conclude, my question is: do you have a suggestion for what solution could be applied here? The bucketing we've learned so far won't work, since we don't know the values of some characters, and without a hashtable this can get pretty slow with giant indexes. Is there a hashtable that would work in this situation (or perhaps a totally different solution)?
note: This is just a hypothetical question. If this is way over what we have learned so far, please tell me.
asked 24 Mar '12, 18:01
There are some pretty smart guys/girls on this forum who will probably have better answers. What makes Google so fast is that they cache a lot of searches, meaning they don't have to query their database very often; they just return the cached result. Your question regarding wildcards may not be a huge deal: if Google doesn't have your search request in its cache, then it may have to actually go through and find the results manually. I believe your question is really how to make this manual search as fast as possible.
Here is one man's solution to this problem:
Suppose you had an index that could return a list of all keywords starting with 'U', another that could list all keywords with 'a' in the third position, another that could list all keywords with 'c' in the fourth position, etc. Then the problem reduces to finding the intersection of all the returned lists. This is basically how bibliographic retrieval systems work: intersect the works by a specified author, the works with certain title words, the works with specified words in the text, etc. This is also how Google works, with lists of pages that contain specified words (or specified word roots). If you ask for a quoted phrase, Google then scans the retrieved pages for ones that contain the keywords in exactly the required order. (And, I assume, they don't compute beyond the first page of answers until you request a second page.)
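That positional-index idea can be sketched in a few lines; all the names here are my own, and this is just one way to set it up. Each (position, character) pair maps to the set of keywords with that character in that position, and wildcard positions are simply skipped during the intersection:

```python
from collections import defaultdict

def build_positional_index(keywords):
    """Map (position, character) -> set of keywords with that character there."""
    index = defaultdict(set)
    for word in keywords:
        for pos, ch in enumerate(word):
            index[(pos, ch)].add(word)
    return index

def wildcard_lookup(index, pattern):
    """Intersect the keyword sets for every non-wildcard position."""
    result = None
    for pos, ch in enumerate(pattern):
        if ch == '?':
            continue  # wildcard: no constraint at this position
        matches = index.get((pos, ch), set())
        result = matches if result is None else result & matches
    # Keep only words of the right length (the index ignores length).
    return {w for w in (result or set()) if len(w) == len(pattern)}

words = ["Udacity", "Ubacity", "Udacita", "Unicorn"]
index = build_positional_index(words)
print(wildcard_lookup(index, "U?acit?"))  # the three *acit* words
```

Each lookup touches only the small per-position sets, so it stays fast even when the full keyword list is large.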
For your hypothetical alphabetic search, however, it would be more common to set up a regular expression function that can be applied quickly to each keyword in the dictionary. One pass through a list of a million words doesn't take very long. I do such searches on my home computer to answer word puzzles and (if absolutely necessary) to finish Saturday Stumper crossword puzzles.
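A quick sketch of that one-pass approach (the function names and sample list are mine): translate each '?' into the regex "any character" dot and scan the whole word list.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a 'U?acit?' style pattern into a compiled regex."""
    return re.compile(re.escape(pattern).replace(r'\?', '.'))

def scan(words, pattern):
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

words = ["Udacity", "Ubacity", "Udacita", "Unicorn"]
print(scan(words, "U?acit?"))  # → ['Udacity', 'Ubacity', 'Udacita']
```

This is O(n) in the size of the word list, but as the answer says, a single pass over even a million short words is quick in practice.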
answered 24 Mar '12, 18:33
Kenneth I. L...
That is actually no problem for Python's built-in dictionary type, because it calls the hash() function on the key objects to compute the hash. For strings, hash() mixes all the characters rather than just summing them, so the computed hash is different from the character-sum scheme we know, and "abd" and "acc" will almost certainly land in different slots.
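To illustrate the difference (hash() values vary between Python runs and versions, so only equality is shown, not the raw numbers):

```python
# The character-sum "hash" from the unit collides for "abd" and "acc",
# but Python's built-in hash() distinguishes them.

def char_sum(key):
    return sum(ord(c) for c in key)

print(char_sum("abd") == char_sum("acc"))  # → True  (collision)
print(hash("abd") == hash("acc"))          # → False (almost certainly)
```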
And actually Python's dictionaries don't use buckets at all. Chaining into buckets is just one collision-resolution strategy (there are many different ones). Python uses another, called open addressing: lookup takes O(1) time on average, but an insert may take a little longer when a slot is already occupied (it essentially probes several slots further along until it finds a free one).
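Here is a toy sketch of open addressing with linear probing. This is just to show the idea; CPython's real implementation uses a more elaborate probe sequence and resizes the table as it fills:

```python
class OpenAddressingTable:
    def __init__(self, size=8):
        self.slots = [None] * size  # each slot holds (key, value) or None

    def _probe(self, key):
        """Yield slot indices, starting at the hashed position."""
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full")

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                raise KeyError(key)  # an empty slot ends the probe chain
            if self.slots[i][0] == key:
                return self.slots[i][1]
        raise KeyError(key)

t = OpenAddressingTable()
t.put("abd", 1)
t.put("acc", 2)  # may land in a later slot if the first probe collides
print(t.get("abd"), t.get("acc"))  # → 1 2
```

The "hops several fields further" in the answer above is exactly the probing loop: on a collision, the insert walks forward until it finds a free slot, and lookup retraces the same path.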
answered 24 Mar '12, 18:38