Efficiency of code for Sudoku question

Edit: I originally asked about the performance of my code to go along with the topic, but it seems it doesn't even fully work for all cases. I'll leave the topic, since it still stands as something worthwhile. My code is listed below where others have commented on it.

Peter's answer is long, and maybe hard to understand, but it goes through the rows and columns at the same time. I'm assuming that makes it faster and more efficient, since it loops through rows and columns separately.

It would be great if someone was willing to benchmark all the code posted by everyone so far to see whose was most efficient.

Kilby Sanchez
4921916

accept rate: 0%

Does your code work? It looks like it fails on cases such as

[[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1], [4, 3, 2, 1]]


and

[[1, 2, -1, 4], [2, 3, 4, 1], [3, 4, 1, 2], [4, 1, 2, 3]]

(15 Mar '12, 22:00)

My code, which is what fnenu is referring to:

def check_sudoku(s_list):
count = 0
for i in s_list:
for k in i:
if i[count] == k and i.index(k) != count or k > len(s_list):
return False
count += 1
count = 0
for i in s_list:
for k in s_list:
if i[count] == k[count] and i != k:
return False
count += 1
return True

(16 Mar '12, 00:48)

One way to measure the "efficiency" of Peter's solution is to count how many times each Sudoku square is visited. His solution starts with digit at 1 and counts how often it appears in column 1, then row 1 (it should be once each), then in column 2 and row 2 (again, once each). However, it should be just as efficient (though slightly longer in terms of writing code) if he visited each row first, then each column afterwards. Interleaving rows and columns doesn't make it more efficient.

To see this, imagine if you have a class of 25 students, seated in 5 rows with 5 columns. You say the name of each student, row by row, then column by column. Each student's name is called out twice, once as everyone's name is read row by row, and once again as it's being read column by column. Even if you alternated reading row 1, then column 1, then row 2, and column 2, you'd agree that each row is read once, and each column is read once, and so each name is read twice.

Since there are N * N squares and each are read twice, then there's 2 * N * N squares looked at for each digit. There are N digits, so the efficiency is 2 * N * N * N. In algorithms, this is said to be O(N^3). Constants like 2 are generally ignored.

Your code take a different approach. You are essentially checking for duplicates. In your first loop, you take a row and set count to the 0th element of that row, then search through the rest of the list to see if there's a match. However, you know k might be the same position of as row[count] (you call it i[count]), so you exclude it, and you double check that the number is not bigger than N. You should also check that the number is not less than 1.

Why? Take any valid solution of Sudoku. Negate all the numbers. This is no longer valid (since numbers must be between 1 and N). Yours would (in theory) say it's valid because all you're doing is checking for duplicates and there won't be any.

Now, it turns out your first loop doesn't work. What it's doing is looking at row 0 (which is a list) and at the first element of the list (when count is set to 0) and checking to see if the rest of the list has a number that matches the first element. What if the 3rd and 4th element are the same? You don't compare those two numbers to each other. Then, in the next iteration, you are looking at row 1 (the next row). count is 1, so you are looking at the second element of the list, and comparing every other number to the second number. Indeed, for each row, i, you are only check if the i_th element of the i_th row is duplicated, instead of comparing every pair of numbers within per row.

Your second loop does the same thing. Essentially, you compare the row 0, col 0 value to ever other value in that the column, but you don't make a check to see if, say, row 3, col 0 is equal to row 2, col 0. Essentially for each column, you're picking a specific number in that column and asking if the other numbers are equal to that number, but you are not checking every possible pair of numbers in the column.

So you weren't more inefficient, but being correct is the first priority, then efficient (one can always write an efficient incorrect solution). You had a good idea, but the code doesn't quite do what you thought it would do.

Charles Lin
9.5k4295136

My code that charlzz is referring to:

def check_sudoku(s_list):
count = 0
for i in s_list:
for k in i:
if i[count] == k and i.index(k) != count or k > len(s_list):
return False
count += 1
count = 0
for i in s_list:
for k in s_list:
if i[count] == k[count] and i != k:
return False
count += 1
return True


Funny enough, I actually got this marked right. The test cases weren't that extensive, I suppose. To fix my code would require adding an extra loop in each of the columns and rows sections, along with more count variables, which should make the entire thing very inefficient, I figure.

(16 Mar '12, 00:49)

All of the code examples I've looked at so far seem to be running in time O(n^3), or in a few cases O(n^3 log n). It's not always obvious since Python can wrap a lot of processing into one line of code. Has anyone come up with an algorithm that does better than O(n^3)?

EDIT: Now that I think about it, I see that it can be done in only O(n^2 log n). A nice example of this is the beautiful one-line procedure by @EnTerr. It starts by appending the original Sudoku square (a list of row-lists) with its transpose (a list of column-lists), so there are 2n lists of n elements each. This bit of overhead should take O(n^2) operations, I think. Then each of these 2n lists is sorted (which should take O(n log n) steps) and then compared to the range 1,...,n (which will take O(n) steps). Altogether that's O(n^2) + 2n * [O(n log n) + O(n)]. The dominant term there is 2n*O(n log n), or in other words O(n^2 log n).

Kurt Van Etten
6.6k1833

Actually, I wouldn't be surprised if creating the transposed was an O(1) operation. (I think zip is implemented as a generator, although I'm not sure.) Accesses are more expensive, of course.

(16 Mar '12, 10:59)

Two nested loops so O(n^2) but I don't know about the complexity of append. Any ideas? Anything else expensive going on in there?

def check_sudoku(square):
n=len(square)
if n==0:
return False
colnum=0
for e in square:   # n elements
if len(e) != n:
return False
checker=1
rownum=0
col=[]
for nums in square: # n elements
col.append(square[rownum][colnum])  # complexity?
rownum+=1
colnum+=1
for nums in square:   # n elements
if checker not in e or checker not in col:  # I think this is probably O(1) -  hash table?
return False
checker+=1
return True
(16 Mar '12, 23:36)

@fnenu: I had a little bit of trouble understanding your code at first because of the two "for nums in square" loops, since you don't actually use nums anywhere. You'd be better off doing something like "for rownum in range(n)" and "for checker in range(1, n+1)". But that shouldn't affect the overall complexity of your code.

I'd assume that append() is an O(1) operation. However, I don't think checking membership in a list is O(1). I don't know anything about how Python implements lists internally but it seems unlikely that it would use hash tables for this. However, you raise a good point--you could rewrite this to use a dict instead of a list to store the rows and columns, and then the membership check could be done in O(1). I haven't thought it through carefully yet, but it seems like this would bring your algorithm down to O(n^2).

(17 Mar '12, 23:51)

range(n) wasn't taught in the course up to that point so I worked around it. I prefer to work within the bounds of the course instead of using other things in what I submit. I guess I should have tidied it up before posting.

I tried to find some info on 'in' but couldn't work out what to look for to find its complexity. Doesn't look like it is O(1) though as you say.

Edit: http://wiki.python.org/moin/TimeComplexity so it's O(n).

(18 Mar '12, 00:12)

After finding if ... in... is O(n), I tried again. This code runs more slowly on small squares than my other code, but it's faster on larger for 100x100 squares. So have I missed anything 'expensive' on this?

def check_sudoku(square):
n=len(square)
if n==0:
return False
counter=0
for row in square:   # n elements
# check the rows are the right length
if len(row) != n:
return False
checker=1

# create an empty column and row which will contain info as to whether
# the number has occurred before
newcol = []
newrow = []
for i in range(n):
newcol.append(False)
newrow.append(False)

# Change False to True if they're in the right range and aren't a duplicate.
# If it is a duplicate, return False
for i in range(n): # n elements
if square[i][counter] >0 and square[i][counter] <= n and not newcol[square[i][counter]-1]:
newcol[square[i][counter]-1] = True
else:
return False
if row[i] >0 and row[i] <= n and not newrow[row[i]-1]:
newrow[row[i]-1] = True
else:
return False
counter += 1

return True


fnenu-1 ♦♦
18.7k1981231

Hey, that looks really nice, and (unless I'm also missing something) it run in time O(n^2). And since you have to look at all n^2 entries at least once to check the Sudoku square, that's the best possible result one can get.

(20 Mar '12, 00:31)

Question text:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "Title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported

×29,746
×130
×118
×68
×63
×24