Canonical sudoku algorithm?

vidarino · Posted: Tue Mar 28, 2006 7:26 am Post subject:

Ah, OK, I think it's slowly dawning on me what you're after. You want to determine whether two puzzles (which may have the same solution) should be considered equivalent, despite having different sets of givens.

I'm far from sure that such a comparison is possible, though. One way to compare two puzzles would be to compile some kind of solving paths / graphs for the puzzles and compare those.

In this thread on the Players' Forum a group of us are trying to find the hardest possible (and recently, the easiest possible) puzzle that requires nothing but singles. In that regard, we're using a solving path "breadth" measurement which may be similar to what you're looking for. But it would have to be extended beyond just counting singles, of course.

But as I said, I'm not sure even this will show that two different puzzles are equivalent, although a solver (human or not) might find them to be fairly close. As far as I can see, the only way to know for sure whether they are equivalent is to canonicalize them by their solution grids and see if they match up. If they do, they are definitely equivalent. If not, good luck. ;)

daj95376 · Posted: Tue Mar 28, 2006 4:42 pm Post subject:

Lummox JR wrote:

Ruud · Site Admin

Lummox JR · Posted: Tue Mar 28, 2006 9:29 pm Post subject:

Moschopulus · Posted: Tue Mar 28, 2006 10:56 pm Post subject:

daj95376 · Posted: Tue Mar 28, 2006 11:06 pm Post subject:

Lummox JR · Posted: Wed Mar 29, 2006 12:06 am Post subject:

Well daj, there are just a few rules. Two puzzles (and therefore their solutions) are equivalent if one can be turned into the other by any combination of these operations:

- Transposing the puzzle (flipping it along a diagonal)
- Permuting the order of the 9x3 and/or 3x9 bands
- Permuting the columns or rows within a band
- Permuting the digits

None of those operations will change a puzzle in any significant way. The same solve path can still be followed to the solution, provided you also transform the steps of the path.
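As a rough illustration of those four operations, here is a minimal Python sketch for a standard 9x9 grid. The helper names (`transpose`, `reorder_rows`, `reorder_cols`, `relabel`, `is_valid`) and the pattern-generated sample grid are made up for this example; they are not taken from any solver mentioned in the thread. The reorder helpers do not check that the order you pass keeps rows and columns inside their bands, so that is up to the caller.

```python
def transpose(g):
    """Flip the grid along its main diagonal."""
    return [list(col) for col in zip(*g)]

def reorder_rows(g, order):
    """Return the grid with rows taken in the given order (must respect bands)."""
    return [list(g[r]) for r in order]

def reorder_cols(g, order):
    """Return the grid with columns taken in the given order (must respect bands)."""
    return [[row[c] for c in order] for row in g]

def relabel(g, mapping):
    """Permute the digits; mapping[d] is the new digit for d."""
    return [[mapping[v] for v in row] for row in g]

def is_valid(g):
    """Check that every row, column and 3x3 box holds the digits 1..9 once."""
    units = [list(row) for row in g] + [list(col) for col in zip(*g)]
    units += [[g[3 * b + i][3 * s + j] for i in range(3) for j in range(3)]
              for b in range(3) for s in range(3)]
    return all(sorted(u) == list(range(1, 10)) for u in units)

# Hypothetical sample grid built from a simple shifting pattern.
base = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

moved = transpose(base)
moved = reorder_rows(moved, [3, 4, 5, 0, 1, 2, 6, 7, 8])   # swap the first two 3x9 bands
moved = reorder_cols(moved, [1, 0, 2, 3, 4, 5, 6, 7, 8])   # swap two columns inside one band
moved = relabel(moved, {d: d % 9 + 1 for d in range(1, 10)})  # 1->2, 2->3, ..., 9->1

print(is_valid(base), is_valid(moved))  # True True
```

The transformed grid looks nothing like the original, yet it is still a valid solution. To apply `relabel` to a puzzle with blanks written as 0, the mapping would also need an entry sending 0 to 0.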

Lummox JR · Posted: Wed Mar 29, 2006 12:36 am Post subject:

I think I now understand gsf's algorithm from the source code well enough to explain it here.

Let's start with terminology. The sudoku is NxN in size, and PQ=N, where P>=Q, but P and Q are as close to each other as possible. If N is a square number, like in a standard 9x9 sudoku, P=Q. The boxes are arranged such that each is P columns by Q rows.

Each column or row of entire boxes we'll call a band. A P-band is P boxes stacked vertically (P columns wide, N rows tall), and a Q-band is Q boxes laid side by side (N columns wide, Q rows tall). If you prefer, think of these as towers and slabs, respectively.
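To make the P and Q terminology concrete, here is a small Python sketch that picks the box dimensions for a given N. The function name `box_dims` is invented for this example.

```python
import math

def box_dims(n):
    """Return (P, Q) with P * Q == n, P >= Q, and P and Q as close as possible."""
    # Q is the largest divisor of n that does not exceed sqrt(n).
    q = next(d for d in range(math.isqrt(n), 0, -1) if n % d == 0)
    return n // q, q

for n in (6, 9, 12, 16):
    p, q = box_dims(n)
    print(f"N={n}: boxes are {p} columns by {q} rows, "
          f"{q} P-bands (towers) and {p} Q-bands (slabs)")
```

For N=9 this gives P=Q=3, and for N=6 it gives P=3, Q=2, so a 6x6 puzzle has two towers and three slabs.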

Step 1: "Loop A". Loop through each box, and each orientation (original or transposed). This is 2N iterations if P=Q (that is, 18 for a 9x9 puzzle), or N iterations otherwise; ignore transposition if P and Q differ, since transposing would change the box shape. Move the chosen box to the box 1 position by transposing the puzzle if necessary, then swapping P- and Q-bands till they reach the desired position.
Step 2: "Loop B". Loop through each permutation of the columns and rows in box 1. This is P!Q! iterations, which in a 9x9 puzzle is 3!3!=36.
Step 3: Remap all digits in box 1 in the order 1, 2, 3, 4, etc. going left to right from the top line down.
Step 4: Find the smallest digit in row 1 that is not in box 1. Rearrange the P-bands so this goes in box 2, then rearrange the columns in this band so this part of row 1 is in ascending order. Repeat for all remaining P-bands.
Step 5: You can now read the top Q-band as an NQ-digit number (27 digits in a 9x9 puzzle), with r1c1 as the leftmost digit. Compare it with the current minimum; if your result is higher, bail out and move to the next iteration of loop B (step 2). If it is equal or lower, continue to the next steps.
Step 6: Find the smallest digit in column 1 that is not in box 1. Rearrange the Q-bands so this goes in box Q+1, then rearrange the rows in this band so this part of column 1 is in ascending order. Repeat for all remaining Q-bands.
Step 7: At this point, compare the whole grid as an N^2-digit number, as in step 5. If it is lower, it is the new minimum and the best candidate so far for canonical form. If it is equal, then two different transforms have resulted in the same candidate canonical form, and this algorithm cannot be used to canonicalize a set of givens.
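The steps above can be sketched in Python roughly as follows. This is only a reading of the description in this post for a complete 9x9 solution grid, not gsf's actual source code. The names `candidate` and `canonical` are made up, the step 5 early exit is left out for brevity, and instead of giving up on a tie the sketch simply counts how many transforms reach the minimum.

```python
from itertools import permutations

def transpose(g):
    return [list(col) for col in zip(*g)]

def candidate(g, band, stack, row_perm, col_perm):
    """Steps 1-6 for one choice of box 1 and one ordering of its rows and columns."""
    top = [3 * band + i for i in row_perm]     # rows of the chosen box, in order
    left = [3 * stack + j for j in col_perm]   # columns of the chosen box, in order
    # Step 3: relabel so the chosen box reads 1..9, left to right, top down.
    cells = [g[r][c] for r in top for c in left]
    relabel = {old: new for new, old in enumerate(cells, start=1)}
    h = [[relabel[v] for v in row] for row in g]

    def remaining(chosen, key):
        # Sort the lines inside each other band, then order the bands themselves
        # by their smallest digit (steps 4 and 6).
        groups = [sorted(range(3 * b, 3 * b + 3), key=key)
                  for b in range(3) if b != chosen]
        groups.sort(key=lambda grp: key(grp[0]))
        return [i for grp in groups for i in grp]

    cols = left + remaining(stack, lambda c: h[top[0]][c])   # step 4, uses row 1
    rows = top + remaining(band, lambda r: h[r][left[0]])    # step 6, uses column 1
    return [[h[r][c] for c in cols] for r in rows]

def canonical(g):
    """Return (smallest candidate, number of transforms that reach it)."""
    best, ties = None, 0
    for t in (g, transpose(g)):                            # loop A: orientation
        for band in range(3):                              # loop A: which box
            for stack in range(3):
                for row_perm in permutations(range(3)):    # loop B
                    for col_perm in permutations(range(3)):
                        cand = candidate(t, band, stack, row_perm, col_perm)
                        if best is None or cand < best:    # step 7, row-major compare
                            best, ties = cand, 1
                        elif cand == best:
                            ties += 1
    return best, ties

# Hypothetical sample grid built from a simple shifting pattern.
grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
form, ties = canonical(grid)
print("\n".join("".join(map(str, row)) for row in form))
print("transforms reaching the minimum:", ties)
```

Each call examines 2 x 9 x 36 = 648 candidates, matching the 18 iterations of loop A times the 36 of loop B. A tie count above 1 corresponds to the "equal" case in step 7.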

Now, consider the last part I added. If it is possible for two solution grids to have the same candidate canonical form, it means it could be possible for two puzzles to have the same actual canonical form, which means two different methods of transforming the puzzle will result in two equivalent puzzles with different givens but the same exact solution.

It is possible to test this, at least if N is known. I don't think it'd be possible to determine if this is true in general for all sudoku of any arbitrary size. The way I'd test this would be as follows:

Step 1: "Loop A". Set r1c1 to 1, and iterate through all permutations of box 1's digits where r1c1 through r1cP are in ascending order, and if P=Q then r1c2<r2c1.
Step 2: "Loop B". Iterate through all permutations of row 1 where each P-band segment of row 1 is in ascending order. Iterate further through the rest of the possibilities for the entire top Q-band.
Step 3: "Loop C". Iterate through all permutations of column 1 where each Q-band segment of column 1 is in ascending order. Iterate further through the possibilities for the rest of the puzzle.
Step 4: At this point we have a sort of organized form of solution grid, formed by rearranging only, not by changing digits. Reduce this to canonical form. If the same form can be reached more than one way by the above algorithm, then that algorithm will not suffice to find canonical versions of all puzzles.

Now, who wants to try it? Anyone?

daj95376 · Posted: Wed Mar 29, 2006 12:51 am Post subject:

Okay. Below are two puzzles. The first is the easy1.ss puzzle supplied with Simple Sudoku. The second is the puzzle after two naked cells have been filled.

Lummox JR · Posted: Wed Mar 29, 2006 1:30 am Post subject:

Correct, they are not equivalent. The second one has extra givens, therefore it is impossible to follow the same solve path.

Moschopulus · Posted: Wed Mar 29, 2006 9:20 am Post subject:

Here is a grid:

124567893
378294516
659831742
987123465
231456978
546789321
863972154
495618237
712345689

If you rotate clockwise by 90 degrees, and relabel the digits by 1->3->9->7->1 and 2->6->8->4->2, you get the same grid.

Here is a puzzle whose solution is the above grid:

104000890
378200010
050830700
987000000
000050070
000009300
063070000
400008000
000000000

If you rotate clockwise by 90 degrees, and relabel the digits by 1->3->9->7->1 and 2->6->8->4->2, you don't get the same puzzle.
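Both claims are easy to check mechanically. A minimal Python sketch, using the grid and puzzle exactly as posted above; the function names are made up for this example, and "1->3" is read as "digit 1 becomes 3":

```python
GRID = """124567893 378294516 659831742
987123465 231456978 546789321
863972154 495618237 712345689""".split()

PUZZLE = """104000890 378200010 050830700
987000000 000050070 000009300
063070000 400008000 000000000""".split()

# 1->3->9->7->1 and 2->6->8->4->2; 5 stays put and 0 (blank) stays blank.
RELABEL = {0: 0, 5: 5, 1: 3, 3: 9, 9: 7, 7: 1, 2: 6, 6: 8, 8: 4, 4: 2}

def parse(rows):
    return [[int(ch) for ch in row] for row in rows]

def rotate_cw(g):
    """Rotate 90 degrees clockwise: the old first column, read bottom to top,
    becomes the new first row."""
    return [list(col) for col in zip(*g[::-1])]

def rotate_and_relabel(g):
    return [[RELABEL[v] for v in row] for row in rotate_cw(g)]

grid, puzzle = parse(GRID), parse(PUZZLE)
print(rotate_and_relabel(grid) == grid)      # True: the solution maps onto itself
print(rotate_and_relabel(puzzle) == puzzle)  # False: the givens land elsewhere
```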

I must admit that I'm not sure if this is relevant to the discussion, but I hope so!

gsf · Posted: Wed Mar 29, 2006 11:36 am Post subject:

Soultaker · Posted: Wed Mar 29, 2006 2:52 pm Post subject:

This thread becomes more interesting by the day!