thinking about data structures

Due to an auto-grading error, Wobbly has graduated CS101 top of the class, his details have been passed on to BitBank.com and he arrives for his first day at work. His first assignment is to review the method by which PINs and card numbers are checked when they arrive 'down the wire' as PIN, card# pairs. The return value is either 'Incorrect PIN' or 'PIN accepted'. The current system is a list of lists, length 9999. In each sub-list are the 16 digit card numbers that match that PIN. Example, in element 3245 (Wobbly's PIN) is his card number 0890 1281 9090 2177 (plus lots of others also associated with PIN 3245). There are about 10 million card numbers. When a pair arrives, the current program goes to the PINth element of the main list and does something like,

if 0890 1281 9090 2177 in biglist[3245]:
    return 'PIN accepted'
else:
    return 'Incorrect PIN'

Any new card number are placed in their PINth element. Wobbly has no idea whether or not this is an efficient method and is left sucking his pencil at his desk. Can you help?

asked 25 Mar '12, 10:42

wobbly-1's gravatar image

wobbly-1
30241326
accept rate: 35%


3 Answers:

It's not efficient, you would use a database.

Also grouping by PIN would be a very poor idea. Think of all the people that would use 1234 (same as my luggage :O ) or repeating numbers like 2222 or 5555.

link

answered 25 Mar '12, 11:21

mhurron's gravatar image

mhurron
2.6k11135

It is not efficient, you would use a set.

All you care about is whether a certain credit card has a certain pin or not. You don't care about the order, and you don't keep any extra information; you just want fast membership tests.

Alternatively, you can use a dictionary and have the credit card number be the key, and the pin number the value. This means you can (quickly) check whether some card exists, and not only whether it matches a given pin.

Which is better? Profile. This choice also largely depends on what you want to do with the data (except for what you've already stated).

That's in-memory; out of memory you'd probably use a database as SMB says.

link

answered 25 Mar '12, 11:37

Anton%20Golov's gravatar image

Anton Golov ♦
13.3k2174174

jesyspa: what is a set?

Nonetheless, I took wobbly's question as a little extra homework (thanks) and measured the time between list and dictionary (with 10 million cardnumbers as suggested).
The dictionary fills 15 - 20% faster and the lookup in the dictionary is faster by a factor of 10:

Time for filling list: 3.60431591166
Time for filling dictionary: 3.03564838548
Time for lookup in list: 4.55365137189e-05
PIN accepted
Time for lookup in dictionary: 5.30793718134e-06
PIN accepted

Don't know if I should post the whole code here. Nevermind.

link

answered 25 Mar '12, 12:26

Andre%20Luecke's gravatar image

Andre Luecke
5993823

why is the dictionary faster?

(25 Mar '12, 13:38) wobbly-1 wobbly-1's gravatar image

If you have 10 million card numbers and 10000 pins and the pins are evenly distributed then you have 1000 card numbers for each pin ( -> a list with 1000 entries for each pin).
So even if you access the biglist at the right index you still have to check 1000 card numbers.
The dictionary is a hashtable as we have learned. Therefore the lookup in a dictionary accesses the searched key-value-pair right away.

(25 Mar '12, 13:47) Andre Luecke Andre%20Luecke's gravatar image
Your answer
Question text:

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×15,335
×33
×21
×9
×6
×1

Asked: 25 Mar '12, 10:42

Seen: 282 times

Last updated: 25 Mar '12, 13:47