Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Best sudoku data structure and best way to validate numbers

 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku
View previous topic :: View next topic  
Author Message
ThemePark

Joined: 22 Mar 2009
Posts: 17
:

Items
PostPosted: Tue Feb 16, 2010 4:52 pm    Post subject: Best sudoku data structure and best way to validate numbers Reply with quote

I have decided to write a sudoku generator in C for a bigger hobby project I'm developing. And I would like the advice of people as to what is the best (fastest) data structure for storing a sudoku puzzle, and what in turn is the fastest way to check if an inserted number is row, column and box valid.

The idea is that I have a specific pattern, consisting of boolean 1s and 0s, and I want to generate all sudokus with that pattern, but only using a specific one of gsf's 416 bands (the way he has ordered the output of his sudoku program, when asked to generate all essentially different sudokus). So I will be running through each digit from 1 to 9 whereever there's a 1 in my pattern, and check if each digit is consistent with the puzzle so far.

Because of this, I want my generator to be as fast as possible, hence why I need advice on these choices. This is a hobby project, so readibility is not an issue for me, only the speed of the code and being able to understand what it does, obviously. However, the solver (JSolve) I use to check if a puzzle is valid, uses an array of chars, so 1 will be ASCII 49, 2 will be 50 and so on. Therefore I will need to convert each puzzle to the char format anyway, unless I choose to work with that from the beginning. But since my only concern is to get this to be as fast as possible, conversion is not an issue to me, unless it will make my program slower than other options.
Back to top
View user's profile Send private message
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Tue Feb 16, 2010 6:03 pm    Post subject: Reply with quote

Quote:
what is the best (fastest) data structure for storing a sudoku puzzle


Since you mention that you are using the JSolve solver for every digit, I would suggest that the data structure you are using will be the same as the internal data structure of JSolve (which is bit based, for every cell each bit represents the optional digit in the bit location), this will reduce the translation time, the characters format is translated to the internal bit based structure at solve time and translated back at the end

Quote:
what in turn is the fastest way to check if an inserted number is row, column and box valid


You should check after every new digit if there is still at least one solution, and if not backtrack and try the next option; I would recommend that you modify the JSolve (or any other solver you are using) so that instead of trying to solve from scratch after every digit change, you will keep the puzzle state before the new digit in the solver information format, and for wrong digit you backtrack to that state, after entering a new digit, another state should be stored
Back to top
View user's profile Send private message Send e-mail
JasonLion

Joined: 16 Nov 2008
Posts: 61
:
Location: Silver Spring, MD

Items
PostPosted: Tue Feb 16, 2010 6:05 pm    Post subject: Reply with quote

Do you really want "the fastest" approach if that means thousands of hours of programming? It is nearly always better to first get something that works running and then play around with different changes to see how you can improve the speed. While the data format does make a difference, changing the algorithm can make a much larger difference.

In almost all situations, simply using a character string with the current board in it and passing that to JSolve will be your best choice. However, if you want to do lots and lots of work you can store the board in the format that JSolve uses internally. That will allow you to remove the character string parser from JSolve and save a small amount of time. But be aware that there will be a fair amount of programming and it really won't gain much speed (perhaps a percent or two).

You can make much larger improvements by thinking about how you are going to choose which digit to try next and using some kind of backtracking search to find a combination of digits that work for the current pattern.

One simple step is to keep a bit map of which digits are still needed in each house. Then a simple logical and of the values for each house the current cell is in will give you a list of digits that might go into that cell.
Back to top
View user's profile Send private message
ThemePark

Joined: 22 Mar 2009
Posts: 17
:

Items
PostPosted: Wed Feb 17, 2010 12:02 am    Post subject: Reply with quote

Actually, I was originally only gonna call JSolve after I had placed a digit for every 1 in the pattern, but now I'm actually considering calling it every time I have placed a new digit. And this is what I mean by the best way, because what IS the best way to do it? If I only call JSolve after placing every digit, I reckon that I will have some puzzles which, even before placing the last few digits, won't be solvable (even if I make sure that I don't place a digit more than once in each row, column and box) but I am not sure of this. On the other hand, if I call JSolve after each digit placement, that would add a lot of extra time and code to run through.

I didn't know JSolve used a bit based data structure, I've just called the JSolve method with an array of chars, but using its own data structure is a good idea, so I will look into that.

Originally I was gonna use a for loop for each cell marked 1, regardless of whether the digit already existed and then use methods to check if the digit would fit in the row, column and digit. But I am now thinking of making an array of the 9 possible digits for each cell, and having a pointer for each cell, pointing to the last possible value in the array. Then I would swap the 2 corresponding values from each row, column and box when I'd place a new digit and move the pointer one back, and move the pointer one forward when I would backtrack. So something like this:

Code:
123456789
        ^
        End pointer

The digit 4 is chosen.
123956784
       ^
       End pointer


I will then use a pointer to iterate through all the possible digits, ending at the end pointer.

I do not want to spend thousands of hours on this, but it shouldn't be necessary at all. I can spend a few days or even weeks on it, but I want to improve the speed of this as much as I can, because since it's going to be going through millions of sudokus, speed will obviously matter, so I would like to make any improvement possible, which won't take months to implement. This is why I ask about the best way to check if a digit exists in a row, column or box, as I could use an int with bits for each digit, I could use a pointer, I could use a simple array with all the options as mentioned. What I'm interested in is what the best/fastest option would be.
Back to top
View user's profile Send private message
JasonLion

Joined: 16 Nov 2008
Posts: 61
:
Location: Silver Spring, MD

Items
PostPosted: Wed Feb 17, 2010 3:45 am    Post subject: Reply with quote

Based on what you just said, you really don't want to be thinking about things like using JSolve's internal data format for anything other than getting ideas from. You aren't anywhere near ready for that kind of optimization, and probably won't ever be. The gains possible from thinking that way are very small, while the gains possible from changing just about everything else you mentioned are huge.

If you place all of the digits first and then check, the odds of getting an acceptable board are extremely low. One the other hand, if you test after each individual digit is added you will be doing extra testing that you don't need to be doing.

The ideal approach, from my experience, is to place several digits without testing, then test, then switch to testing in small groups (say two to four at a time), finally after some # of additional candidates switch to testing each digit individually. The goal being to have about the same probability of having a valid board at each point where you check. At first, with only a couple of digits placed, the odds are a valid sudoku can be found. As the board starts to fill up, the odds of it being valid drop rapidly, till near the end it becomes quite unlikely to be correct.

Bit maps are much more efficient than lists of digits/swapping/and pointers. Spend a little time becoming fluent in thinking of an int/word as having one bit for each digit. The bit is set if that digit is allowed in the current context.

There are two basic approaches to keeping track of the remaining possibilities for each cell. One is to keep a bit map of possible digits in each house and a separate map of digits tried and rejected in specific cells. The other is to keep a bit map for each cell.

The "map" of digits rejected in specific cells can often be handled as part of a recursive function, one cell per recursion, rather than keeping a bit map for every cell. You can keep a bit map for each cell, which has some logical clarity advantages, but that tends to increase the amount of book keeping code required.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron