|
I tried to write a recursive implementation for edit_distance, but kept getting errors, so I ended up submitting a non-recursive solution. At least, I think it's a solution, inasmuch as it was labeled "Correct" when I resubmitted it after the deadline. (I had a stupid typo in my recursive implementation that I finally caught -- but only after the submission deadline.) Anyway, my code for the non-recursive is below. I used a dictionary to keep track of all the distances as I went through the strings. What has me confused is that, at the end, if I print out my entire dictionary (as opposed to returning the one final result), all of my keys are two-value pairs enclosed in parentheses -- such as: (2,1) or (0,3). I expected the keys to be lists of two values. But a list would be enclosed in square brackets. We didn't see anything in this course that looked like those pairs in parentheses. But, I submitted this since it worked -- and the function wasn't returning anything other than the dictionary value, so the keys didn't show up anyway. My question is: What on earth did I do? Are these pairs the expected output of dictionary assignments that I did this way? Or is it the merest coincidence that this works at all? (I tested it with the long list of edit_distance test strings that was posted in another comment, and they all worked.) Of course, if I'd caught that stupid typo earlier, none of this would be an issue for me! Thanks in advance to anyone who can offer some insight. def edit_distance(s,t):
|
|
You accidentally used tuple as your key, f.e. in this line:
, where key consists of number pair (i,0). Tuple is data type specific to Python, somewhat similar to an ordered list. See also Python docs. The code works because ordered list of two elements from 0 to n-1 maps exactly to the elements of two-dimensional n*n matrix. |
|
You asked "Or is it the merest coincidence that this works at all? " I ask "Is it the merest coincidence that this is a Pythonised version of the code at http://en.wikipedia.org/wiki/Levenshtein_distance?" You are using a dictionary with tuples (i,j) as the index to store the matrix values. Many years ago I did a little Fortran programming. (It was Fortran 4 in those days.) Within what we then called "do loops," you could use double-subscripting for the loop indices. I was trying to do something like that in Python but we weren't introduced to any sort of multiply-dimensioned subscripted variable. Never heard of a "tuple" in Python; I don't think it was covered in lecture. I thought I would be using a two-member list as my dictionary key. While that idea wasn't introduced, we weren't told that it was illegal, either. In terms of the Wikipedia entry, no, I had not seen it. I was familiar with the Levenshtein distance computation from the paper-and-pencil approach, and that's why I tried to use the dictionary to replace the two-dimensional grid that you use with paper and pencil. I think the next commenter has it right: I accidentally discovered "tuples" and they worked the same way that a two-member list would have worked as a dictionary key. I have no idea of anything else I might be able to do with a tuple. I'll investigate the Python docs. So it wasn't a mere coincidence that your code implemented the algorithm on the wikipedia page even though you had not seen that particular page. That, plus your accidental discovery of using tuples as indexes is why your solution worked. Which is the question you asked. :-) 1
I first encountered the algorithm in connection with peptide sequences, where the "letters" are the 20 amino acids. The computation is a (first-pass) measure of how similar two peptides are. I was comparing the sequences of the thrombin-cleaved fibrinogen peptides in a wide variety of species. |