generating random sudokugrids

dukuso · Posted: Tue Jul 26, 2005 9:23 am Post subject: generating random sudokugrids

How do you generate a random sudoku grid?
For 9*9 sudokugrids we could just enumerate the equivalence classes,
choose one class with probability weighted according to
its size, and then choose a random member of that class.
For larger sudokus this is infeasible.

For latin squares we have the nice Jacobsen-Matthews algorithm.
Is there something similar for sudokugrids ?

jaap

dukuso · Posted: Tue Jul 26, 2005 11:53 am Post subject:

>dukuso wrote:
>For latin squares we have the nice Jacobsen-Matthews algorithm.

sorry, it's Jacobson Matthews with two "o"

>Is there something similar for sudokugrids ?
>
>
>I hadn't heard of that algorithm before, so I had to look it up.
>Sadly, I don't think it can be generalised to Sudoku in a
>straightforward manner.
>
>The algorithm considers a latin square as a function
>f(row,column,value), which is 1 if that value is in the cell
>at that row and column, and 0 otherwise. The constraints of
>the latin square (every value once in each column and row,
>every cell only one value) can then be given as conditions on f,
>namely that the sum of f() over each of its coordinates is exactly 1.
>
>The algorithm then gives a way of making a small
>change to the function such that it still satisfies
>the conditions. One example of such a change is replacing
>the value of 4 cells on a rectangle from this:
>
>1...2
>2...1
>
>to this:
>
>2...1
>1...2
>
>(Note: The algorithm also allows f to temporarily take on
>value -1 at one point, which is then usually rectified by
>the next change. This occurs if the rectangle doesn't contain
>only 2 distinct values. I'll ignore this for now as it is not important.)

Very good that you figured this out in such a short time.
I didn't find such a description on the web. I implemented
the algo some time ago, but didn't quite understand it.
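
For reference, here is a minimal Python sketch of the representation described in the quote: the 0/1 function f(row,column,value), the "sums to 1 along each coordinate" check, and the simple two-value rectangle swap. The names to_cube, is_latin and swap_rectangle are made up for this illustration and values are numbered 0..n-1; it is not taken from the Jacobson-Matthews paper.

```python
def to_cube(square):
    """f[r][c][v] = 1 if cell (r, c) holds value v, else 0 (values 0..n-1)."""
    n = len(square)
    return [[[1 if square[r][c] == v else 0 for v in range(n)]
             for c in range(n)] for r in range(n)]


def is_latin(f):
    """True if f sums to exactly 1 along each of its three coordinates."""
    idx = range(len(f))
    return (all(sum(f[r][c][v] for v in idx) == 1 for r in idx for c in idx) and
            all(sum(f[r][c][v] for c in idx) == 1 for r in idx for v in idx) and
            all(sum(f[r][c][v] for r in idx) == 1 for c in idx for v in idx))


def swap_rectangle(square, r1, r2, c1, c2):
    """Swap the two values on a rectangle whose corners read a b / b a."""
    a, b = square[r1][c1], square[r1][c2]
    if a == b or square[r2][c1] != b or square[r2][c2] != a:
        raise ValueError("rectangle does not hold exactly two distinct values")
    out = [row[:] for row in square]
    out[r1][c1], out[r1][c2] = b, a
    out[r2][c1], out[r2][c2] = a, b
    return out


# Cyclic 4x4 latin square; rows 0 and 2, columns 0 and 2 form a 0 2 / 2 0 rectangle.
square = [[(r + c) % 4 for c in range(4)] for r in range(4)]
swapped = swap_rectangle(square, 0, 2, 0, 2)
```

Both square and swapped pass is_latin(to_cube(...)), which is the point of the swap: the row and column sums of f are untouched.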

>All this relies on the fact however that the coordinates
>are orthogonal. If we were to add the Sudoku boxes as an
>extra condition, that condition is no longer guaranteed
>to hold after the change is made, so that won't work.
>
>In other words, if the four corners of the rectangle lie
>in 4 different boxes, then we break the box condition by such a swap.
>
>The only question that remains now is this:
>Suppose we start with a filled grid, and allow these rectangle
>swaps only when the corners do not lie in 4 boxes (but only in 2),
>but in all other respects follow the J-M algorithm, can we reach
>every other possible Sudoku grid?
>
>I suspect not.

That's right. You can determine the cycle structure for any pair
of symbols, that is, the connected components of the graph whose
vertices are the 18 cells holding the 2 symbols, with an edge between
two cells iff they are in the same row, column or block.
The cycle structure is not changed by your operation, and when I
tried 450 sudokus, 420 of them had different cycle structures.

But the same is true for latin squares, so I assume this is where the
-1 condition comes in, to change the cycle structure.
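
A rough Python sketch of that cycle-structure computation for one pair of symbols, assuming the grid is a 9x9 list of lists with digits 1..9. The function name and the example grid (a simple shifted pattern) are only illustrations.

```python
def cycle_structure(grid, a, b):
    """Sorted component sizes of the graph on the 18 cells holding a or b.

    Two cells are joined if they share a row, a column or a 3x3 block.
    """
    cells = [(r, c) for r in range(9) for c in range(9) if grid[r][c] in (a, b)]

    def linked(p, q):
        return (p[0] == q[0] or p[1] == q[1] or
                (p[0] // 3, p[1] // 3) == (q[0] // 3, q[1] // 3))

    seen, sizes = set(), []
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first walk over one component
            p = stack.pop()
            size += 1
            for q in cells:
                if q not in seen and linked(p, q):
                    seen.add(q)
                    stack.append(q)
        sizes.append(size)
    return sorted(sizes)


# A valid sudoku built from a shifted pattern, used only as example input.
grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
structure = cycle_structure(grid, 1, 2)
```

Comparing these sorted size lists over all 36 symbol pairs gives an invariant of the restricted rectangle swaps, which is how two grids can be shown to be unreachable from each other.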

Maybe we could still generate a random latin square and then
make some small changes so the block-constraint is met.
But that would probably no longer be random enough.

You could also allow the -1 as in the JM and then check
every k JM-iterations whether the LS is still a sudoku,
if not then backtrack to the last known sudoku.

jaap · Posted: Tue Jul 26, 2005 1:23 pm Post subject:

dukuso · Posted: Tue Jul 26, 2005 3:06 pm Post subject:

>As said before, the change the algo makes to a rectangle is
>something like this:
>3..5
>5..3
>becomes
>5..3
>3..5
>
>Think of this as removing the first numbers, and then placing
>the next ones. If this same change were done to this
>rectangle:
>3..5
>5..7
>then afterwards the bottom right cell still contains a 7, but
>also a 5 and a debt of 3:
>3..5
>5..{-3,5,7}

typo here ?

>The next step must use the multivalued cell as the first
>corner of the rectangle, and have in the adjacent corners
>threes. For example:
>
>4...3
>3...{-3,5,7}
>
>Swapping threes and for example sevens on these corners, we
>get:
>
>{-7,4,3}...7
>7.....{3,-3,5}
>
>The added 3 cancels the debt of 3 in the bottom right so that
>we get:
>
>{-7,4,3}...7
>7.............5
>
>Continue doing this until the fourth corner of the random
>rectangle happens to have a cancellation, and then we get a
>real latin square.

I have an idea of how it works, but still not exactly.
A sort of generalized chain.
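
To make the "debt" bookkeeping concrete, here is a hedged Python sketch of the +/-1 move as I understand it, working directly on the function f[row][column][value]. The cell holding -1 plays the role of the multi-valued cell with a debt. The names jm_move and random_latin_square, the cyclic starting square and the moves parameter are assumptions of this sketch, and it needs n >= 2.

```python
import random


def jm_move(f, n, improper, rng):
    """Apply one +/-1 move to the cube f; return the new -1 cell, or None."""
    if improper is None:
        # Proper square: start from any cell of the cube that holds 0.
        while True:
            r, c, s = (rng.randrange(n) for _ in range(3))
            if f[r][c][s] == 0:
                break
    else:
        # Improper square: the move must start at the cell holding -1.
        r, c, s = improper
    # Each line through (r, c, s) holds one 1 (proper) or two 1s (improper).
    r2 = rng.choice([i for i in range(n) if f[i][c][s] == 1])
    c2 = rng.choice([j for j in range(n) if f[r][j][s] == 1])
    s2 = rng.choice([k for k in range(n) if f[r][c][k] == 1])
    for i, j, k in ((r, c, s), (r, c2, s2), (r2, c, s2), (r2, c2, s)):
        f[i][j][k] += 1
    for i, j, k in ((r2, c, s), (r, c2, s), (r, c, s2), (r2, c2, s2)):
        f[i][j][k] -= 1
    # The opposite corner either cancels (proper again) or carries the debt.
    return (r2, c2, s2) if f[r2][c2][s2] == -1 else None


def random_latin_square(n, moves=1000, seed=None):
    """Run at least `moves` moves from the cyclic square, stopping when proper."""
    rng = random.Random(seed)
    f = [[[1 if (r + c) % n == s else 0 for s in range(n)]
          for c in range(n)] for r in range(n)]
    improper, done = None, 0
    while done < moves or improper is not None:
        improper = jm_move(f, n, improper, rng)
        done += 1
    return [[f[r][c].index(1) for c in range(n)] for r in range(n)]
```

Every move adds and subtracts 1 in pairs along each line, so all the sums of f stay at 1 throughout; only the sign condition is temporarily relaxed.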

>The problem is that you don't have much choice of rectangles
>when you have one of these multi-valued cells. You must use
>the rectangle that has the debt value at the adjacent corners.
>The only choice seems to be which of the two other values you
>will use in the swap.

maybe it can be extended to sudokus nevertheless ?
You have to do further adjustments for the blocks though.

>dukuso wrote:
>
>You could also allow the -1 as in the JM and then check
>every k JM-iterations whether the LS is still a sudoku,
>if not then backtrack to the last known sudoku.
>
>
>Maybe. I have my doubts though, because the number of latin
>squares is so much higher than the number of sudoku's. Once it
>is no longer a sudoku, there is too little chance of it
>becoming a sudoku again. Therefore, this method will probably
>get stuck in a loop for a very long time.

Just take care that the number of multi-valued cells doesn't increase,
and then hope that there will eventually be a chance to reduce them.
It doesn't matter if this takes longer than with latin squares.

Guenter.

jaap · Posted: Thu Jul 28, 2005 12:35 pm Post subject:

dukuso · Posted: Thu Jul 28, 2005 1:46 pm Post subject:

yes, the beauty of the JM-algo (I think I understand now how it works)
is, that they proved that
each LS is created with _exactly_ the same probability.

In practice you won't care, of course whether some grid
is a tiny bit more likely of being chosen than another one ;-)

You can always just start at some sudokugrid and then try to
find the lexicographically next one(s) with backtracking,
in whatever order. Or backtrack through all possibilities
to permute the entries containing, say, the digits 2,5,7 while keeping
the other entries. And then apply the 6^8*9!*2
sudoku-transformations etc.
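
A small Python sketch of applying one random transformation of that kind. The count works out as 6 orders of the three bands times 6^3 orders of the rows inside them, the same again for columns (6^8 in total), 9! relabelings of the digits, and 2 for transposing or not. The function names and the example grid are made up for this illustration.

```python
import random


def random_transform(grid, rng=random):
    """Apply a random validity-preserving transformation to a 9x9 grid."""
    def shuffled_lines():
        # Permute the 3 bands, then the 3 lines inside each band: 6 * 6^3 ways.
        bands = rng.sample(range(3), 3)
        return [3 * b + i for b in bands for i in rng.sample(range(3), 3)]

    rows, cols = shuffled_lines(), shuffled_lines()
    relabel = dict(zip(range(1, 10), rng.sample(range(1, 10), 9)))  # 9! ways
    out = [[relabel[grid[r][c]] for c in cols] for r in rows]
    if rng.random() < 0.5:  # transpose or not: the factor 2
        out = [list(col) for col in zip(*out)]
    return out


def is_valid(grid):
    """True if every row, column and 3x3 block holds the digits 1..9 once."""
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [{grid[3 * br + i][3 * bc + j] for i in range(3) for j in range(3)}
             for br in range(3) for bc in range(3)]
    return all(unit == full for unit in rows + cols + boxes)


# A valid grid built from a shifted pattern, used only as example input.
base = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
shuffled = random_transform(base, random.Random(0))
```

Note that this only moves around inside one equivalence class, so it has to be combined with something else (such as the backtracking steps above) to reach other classes.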

antony · Last edited by antony on Fri Jul 29, 2005 2:32 pm; edited 1 time in total

As far as I am concerned, I only experimented with 2 ways to generate random grids. In both cases, there is no guarantee that the puzzle can be solved without guessing, so a human-like solver should be used on the output grids.

The first way is mentioned on Wikipedia. It is quite fast with DLX, and produces a random (symmetric) pattern with a random number of hints, typically ranging from 21 to 27.

tilps

antony · Posted: Fri Jul 29, 2005 2:31 pm Post subject:

Oh, right! The last For loop does not need to be in a Repeat loop: if a pair can't be removed in the first pass, it can't be removed in the second either. I'm editing my previous message.

ethel · Posted: Fri Aug 26, 2005 7:09 pm Post subject: Re: generating random sudokugrids

dukuso · Posted: Fri Sep 09, 2005 4:31 am Post subject: Re: generating random sudokugrids

ethel · Posted: Sat Sep 10, 2005 7:21 pm Post subject: Re: generating random sudokugrids

mugnyte · Posted: Wed Jan 11, 2006 12:50 am Post subject: Generation Algorithm

Just to add to the survey, here's mine:

(1) Place 9 randomized unique starts on the board, one in each box. This guarantees a solvable board. Or place a randomized pattern on the board (input).

(2) Complete the puzzle according to basic rules, accepting the first solution (there are several with only 9 starts)

(3) Remove cells until either S starts (S>=17, input) are left, or no further cell can be removed without allowing more than one solution (count solutions using all logic possible for speed).

To ensure a specific difficulty level, I go a bit further:

(4) Solve the game left at (3) according to the logic modules chosen (input). Count the moves made; if >= the required number of moves (input), keep the puzzle. Otherwise, start over at (1).

The UI maps a simple slider to modules + moves, which is a "quick difficulty" chooser. Inputs: starts, modules, moves, [pattern], [start board]

This method is fairly efficient, except that step (4) can discard quite a few puzzles before attaining the required number of moves on tough puzzles. (max ~50)
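
A rough Python sketch of steps (1) to (3), under some assumptions: plain backtracking on the cell with the fewest candidates stands in for the logic modules, min_clues plays the role of S, and all function names are invented here. It retries step (1) if a start position turns out to have no completion rather than relying on that never happening. Step (4), the difficulty grading, is not covered.

```python
import random


def candidates(g, r, c):
    """Digits that can go in the empty cell (r, c) without a direct clash."""
    used = set(g[r]) | {g[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {g[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def search(g, limit, rng=None, found=None):
    """Collect up to `limit` solutions of g (0 = empty); g is restored on return."""
    if found is None:
        found = []
    best = None
    for r in range(9):
        for c in range(9):
            if g[r][c] == 0:
                cand = candidates(g, r, c)
                if best is None or len(cand) < len(best[2]):
                    best = (r, c, cand)
    if best is None:  # no empty cell left: g is a full solution
        found.append([row[:] for row in g])
        return found
    r, c, cand = best
    if rng is not None:
        rng.shuffle(cand)
    for d in cand:
        g[r][c] = d
        search(g, limit, rng, found)
        g[r][c] = 0
        if len(found) >= limit:
            break
    return found


def make_puzzle(min_clues=30, seed=None):
    rng = random.Random(seed)
    while True:
        # (1) nine different digits, one at a random spot in each box
        g = [[0] * 9 for _ in range(9)]
        for box, d in enumerate(rng.sample(range(1, 10), 9)):
            g[3 * (box // 3) + rng.randrange(3)][3 * (box % 3) + rng.randrange(3)] = d
        # (2) accept the first completion found
        solutions = search(g, 1, rng)
        if solutions:
            break
    solution = solutions[0]
    # (3) remove cells in random order while the solution stays unique
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(9) for c in range(9)]
    rng.shuffle(cells)
    clues = 81
    for r, c in cells:
        if clues <= min_clues:
            break
        keep, puzzle[r][c] = puzzle[r][c], 0
        if len(search(puzzle, 2)) == 2:
            puzzle[r][c] = keep  # removal allowed a second solution: undo
        else:
            clues -= 1
    return puzzle, solution
```

Calling make_puzzle(min_clues=30, seed=1) returns a (puzzle, solution) pair; lowering min_clues towards 17 makes step (3) run until no cell can be removed.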

Soultaker · Posted: Tue Feb 28, 2006 9:54 pm Post subject:

I 'invented' a different method of generating a random solution grid, which, I hope, does not suffer from the problem of not generating all classes of valid grids with equal probability.

What I do is initialize the grid with a random permutation of all symbols (so I start, for example, with nine 1's on the first row, nine 2's on the second, and so on, and then shuffle the entire grid). This, of course, is not a valid grid yet because there are many cases of two symbols (e.g. two 1's) on the same row, column or block. I count each extra symbol in a row, column or block as a conflict.

To reduce the number of conflicts I use a process that can be seen as a primitive form of simulated annealing: I pick two random positions in the grid and swap their symbols. If this increases the number of conflicts, I swap them back. If, however, the number of conflicts either decreases or stays the same, I leave it at that. This step is repeated until the number of conflicts is reduced to zero, at which point the grid is valid (by the definition of a conflict).

This method may sound a little flaky (like many probabilistic algorithms) but in practice it works very well. I would expect to run into problematic grids, where I need to increase the number of conflicts before I can ever reduce them again, but this situation never seems to arise in practice.

I can easily generate over a hundred 9x9 grids per second. I tried larger grids and although they are slower, I can still generate a valid grid as large as 64x64 (!) in less than five minutes. So, I think, it is a very practical algorithm, especially for grid sizes for which no equivalence classes are known (i.e. everything except 9x9).
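
For anyone who wants to try it, here is a minimal Python sketch of the procedure as described above. It recounts all conflicts after every swap, which is simple but much slower than the timings quoted, and it gives up after max_steps instead of looping forever. The names make_units, conflicts and descend, and the max_steps cutoff, are assumptions of this sketch and not part of the original program.

```python
import random


def make_units(b):
    """All rows, columns and b x b blocks of a grid of side n = b*b."""
    n = b * b
    units = [[(r, c) for c in range(n)] for r in range(n)]
    units += [[(r, c) for r in range(n)] for c in range(n)]
    units += [[(br * b + i, bc * b + j) for i in range(b) for j in range(b)]
              for br in range(b) for bc in range(b)]
    return units


def conflicts(grid, units):
    """Each extra copy of a symbol within a unit counts as one conflict."""
    return sum(len(u) - len({grid[r][c] for r, c in u}) for u in units)


def descend(b=3, max_steps=200_000, seed=None):
    """Shuffle n copies of each symbol, then swap cells while conflicts don't rise."""
    rng = random.Random(seed)
    n = b * b
    units = make_units(b)
    flat = [s for s in range(1, n + 1) for _ in range(n)]
    rng.shuffle(flat)
    grid = [flat[i * n:(i + 1) * n] for i in range(n)]
    cost = conflicts(grid, units)
    for _ in range(max_steps):
        if cost == 0:
            break
        r1, c1 = rng.randrange(n), rng.randrange(n)
        r2, c2 = rng.randrange(n), rng.randrange(n)
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        new_cost = conflicts(grid, units)
        if new_cost > cost:
            # Worse: swap back.
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        else:
            cost = new_cost
    return grid, cost
```

descend() returns the grid together with its remaining conflict count, so a result with cost 0 is a valid grid and anything else means the step limit was hit first.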

Now, I have two questions that I'd like to pose here (which I have not been able to solve myself):
1) Does this algorithm select every possible grid with equal probability? It is clear that it does not exclude any possible grids and I don't see why it would favor one class over the other, but that is not a complete proof. Does anyone have any thoughts about this?
2) Is it possible to arrive at a grid where it is impossible to reduce the number of conflicts, without first increasing the number of conflicts (which is, in my current algorithm, impossible)? If not, why not?