Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

generating random sudokugrids
Goto page 1, 2  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> The mathematics of sudoku
View previous topic :: View next topic  
Author Message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Tue Jul 26, 2005 9:23 am    Post subject: generating random sudokugrids Reply with quote

how do you generate a random sudoku-grid ?
For 9*9 sudokugrids we could just enumerate equivalence classes,
choose one class with weighted probability according to
its size and then choose a random member in the class.
For larger sudokus this is infeasable.

For latin squares we have the nice Jacobsen-Matthews algorithm.
Is there something similar for sudokugrids ?
Back to top
View user's profile Send private message Send e-mail Visit poster's website
jaap

Joined: 13 Jun 2005
Posts: 24
:
Location: NL

Items
PostPosted: Tue Jul 26, 2005 10:27 am    Post subject: Re: generating random sudokugrids Reply with quote

dukuso wrote:
For latin squares we have the nice Jacobsen-Matthews algorithm.
Is there something similar for sudokugrids ?


I hadn't heard of that algorithm before, so I had to look it up. Sadly, I don't think it can be generalised to Sudoku in a straightforward manner.

The algorithm considers a latin square as a function f(row,column,value), which is 1 if that value is in the cell at that row and column, and 0 otherwise. The constraints of the latin square (every value once in each column and row, every cell only one value) can then be given as conditions on f, namely that the sum of f() over each of its coordinates is exactly 1.

The algorithm then gives a way of making a small change to the function such that it still satisfies the conditions. One example of such a change is replacing the value of 4 cells on a rectangle from this:

1...2
2...1

to this:

2...1
1...2

(Note: The algorithm also allows f to temporarily take on value -1 at one point, which is then usually rectified by the next change. This occurs if the rectangle doesn't contain only 2 distinct values. I'll ignore this for now as it is not important.)

All this relies on the fact however that the coordinates are orthogonal. If we were to add the Sudoku boxes as an extra condition, that condition is no longer guaranteed to hold after the change is made, so that won't work.

In other words, if the four corners of the rectangle lie in 4 different boxes, then we break the box condition by such a swap.

The only question that remains now is this:
Suppose we start with a filled grid, and allow these rectangle swaps only when the corners do not lie in 4 boxes (but only in 2), but in all other respects follow the J-M algorithm, can we reach every other possible Sudoku grid?

I suspect not.
_________________
Jaap
--
Jaap's Puzzle Page:
http://www.geocities.com/jaapsch/puzzles
Back to top
View user's profile Send private message Visit poster's website
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Tue Jul 26, 2005 11:53 am    Post subject: Reply with quote

>dukuso wrote:
>For latin squares we have the nice Jacobsen-Matthews algorithm.

sorry, it's Jacobson Matthews with two "o"

>Is there something similar for sudokugrids ?
>
>
>I hadn't heard of that algorithm before, so I had to look it up.
>Sadly, I don't think it can be generalised to Sudoku in a
>straightforward manner.
>
>The algorithm considers a latin square as a function
>f(row,column,value), which is 1 if that value is in the cell
>at that row and column, and 0 otherwise. The constraints of
>the latin square (every value once in each column and row,
>every cell only one value) can then be given as conditions on f,
>namely that the sum of f() over each of its coordinates is exactly 1.
>
>The algorithm then gives a way of making a small
>change to the function such that it still satisfies
>the conditions. One example of such a change is replacing
>the value of 4 cells on a rectangle from this:
>
>1...2
>2...1
>
>to this:
>
>2...1
>1...2
>
>(Note: The algorithm also allows f to temporarily take on
>value -1 at one point, which is then usually rectified by
>the next change. This occurs if the rectangle doesn't contain
>only 2 distinct values. I'll ignore this for now as it is not important.)

very good, that you figured this out in this short time.
I didn't find such a description on the web. I implemented
the algo some time ago, but didn't quite understand it.

>All this relies on the fact however that the coordinates
>are orthogonal. If we were to add the Sudoku boxes as an
>extra condition, that condition is no longer guaranteed
>to hold after the change is made, so that won't work.
>
>In other words, if the four corners of the rectangle lie
>in 4 different boxes, then we break the box condition by such a swap.
>
>The only question that remains now is this:
>Suppose we start with a filled grid, and allow these rectangle
>swaps only when the corners do not lie in 4 boxes (but only in 2),
>but in all other respects follow the J-M algorithm, can we reach
>every other possible Sudoku grid?
>
>I suspect not.

that's right. You can determine the cycle-structure for any pair
of symbols, that is the connected components in the graph whose
vertices are the 18-cells holding the 2 symbols and there is an edge,
iff the cells are in the same row,column,block.
The cycle structure is not changed by your operation and when I
tried 450 sudokus, 420 of them had different cycle-structure.

But the same is true for latin squares, so I assume where this
-1 conditions comes in to change the cycle-structure .



Maybe we could still generate a random latin square and then
make some small changes so the block-constraint is met.
But that would probably no longer be random enough.

You could also allow the -1 as in the JM and then check
every k JM-iterations whether the LS is still a sudoku,
if not then backtrack to the last known sudoku.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
jaap

Joined: 13 Jun 2005
Posts: 24
:
Location: NL

Items
PostPosted: Tue Jul 26, 2005 1:23 pm    Post subject: Reply with quote

dukuso wrote:
>(Note: The algorithm also allows f to temporarily take on
>value -1 at one point, which is then usually rectified by
>the next change. This occurs if the rectangle doesn't contain
>only 2 distinct values. I'll ignore this for now as it is not important.)

very good, that you figured this out in this short time.
I didn't find such a description on the web. I implemented
the algo some time ago, but didn't quite understand it.


Obviously I haven't tried the algo yet, but I was wrong saying that the -1 gets rectified at the next change. The -1 does get fixed, but more often than not a -1 appears at another point. The larger the square, the longer it takes before the -1 dissappears. It is only then that you have reached another latin square.

dukuso wrote:

But the same is true for latin squares, so I assume where this
-1 conditions comes in to change the cycle-structure .


That is my understanding too. What is really happening with that -1 value for f is that you are allowed one cell containing several things, namely two normal values as well as a debt of another value.

As said before, the change the algo makes to a rectangle is something like this:
3..5
5..3
becomes
5..3
3..5

Think of this as removing the first numbers, and then placing the next ones. If this same change were done to this rectangle:
3..5
5..7
then afterwards the bottom right cell still contains a 7, but also a 5 and a debt of 3:
3..5
5..{-3,5,7}

The next step must use the multivalued cell as the first corner of the rectangle, and have in the adjacent corners threes. For example:

4...3
3...{-3,5,7}

Swapping threes and for example sevens on these corners, we get:

{-7,4,3}...7
7.....{3,-3,5}

The added 3 cancels the debt of 3 in the bottom right so that we get:

{-7,4,3}...7
7.............5

Continue doing this until the fourth corner of the random rectangle happens to have a cancellation, and then we get a real latin square.

The problem is that you don't have much choice of rectangles when you have one of these multi-valued cells. You must use the rectangle that has the debt value at the adjacent corners. The only choice seems to be which of the two other values you will use in the swap.

dukuso wrote:

You could also allow the -1 as in the JM and then check
every k JM-iterations whether the LS is still a sudoku,
if not then backtrack to the last known sudoku.


Maybe. I have my doubts though, because the number of latin squares is so much higher than the number of sudoku's. Once it is no longer a sudoku, there is too little chance of it becoming a sudoku again. Therefore, this method will probably get stuck in a loop for a very long time.
_________________
Jaap
--
Jaap's Puzzle Page:
http://www.geocities.com/jaapsch/puzzles
Back to top
View user's profile Send private message Visit poster's website
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Tue Jul 26, 2005 3:06 pm    Post subject: Reply with quote

>As said before, the change the algo makes to a rectangle is
>something like this:
>3..5
>5..3
>becomes
>5..3
>3..5
>
>Think of this as removing the first numbers, and then placing
>the next ones. If this same change were done to this
>rectangle:
>3..5
>5..7
>then afterwards the bottom right cell still contains a 7, but
>also a 5 and a debt of 3:
>3..5
>5..{-3,5,7}

typo here ?

>The next step must use the multivalued cell as the first
>corner of the rectangle, and have in the adjacent corners
>threes. For example:
>
>4...3
>3...{-3,5,7}
>
>Swapping threes and for example sevens on these corners, we
>get:
>
>{-7,4,3}...7
>7.....{3,-3,5}
>
>The added 3 cancels the debt of 3 in the bottom right so that
>we get:
>
>{-7,4,3}...7
>7.............5
>
>Continue doing this until the fourth corner of the random
>rectangle happens to have a cancellation, and then we get a
>real latin square.

I have an idea how it works, still not exactly.
Sort of generalized chain

>The problem is that you don't have much choice of rectangles
>when you have one of these multi-valued cells. You must use
>the rectangle that has the debt value at the adjacent corners.
>The only choice seems to be which of the two other values you
>will use in the swap.

maybe it can be extended to sudokus nevertheless ?
You have to do further adjustments for the blocks though.

>dukuso wrote:
>
>You could also allow the -1 as in the JM and then check
>every k JM-iterations whether the LS is still a sudoku,
>if not then backtrack to the last known sudoku.
>
>
>Maybe. I have my doubts though, because the number of latin
>squares is so much higher than the number of sudoku's. Once it
>is no longer a sudoku, there is too little chance of it
>becoming a sudoku again. Therefore, this method will probably
>get stuck in a loop for a very long time.

just take care that the number of multi-valued cells don't increase
and then hope that there will be a chance eventually to reduce them.
Doesn't matter, if this takes longer than with latin squares



Guenter.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
jaap

Joined: 13 Jun 2005
Posts: 24
:
Location: NL

Items
PostPosted: Thu Jul 28, 2005 12:35 pm    Post subject: Reply with quote

dukuso wrote:

>Think of this as removing the first numbers, and then placing
>the next ones. If this same change were done to this
>rectangle:
>3..5
>5..7
>then afterwards the bottom right cell still contains a 7, but
>also a 5 and a debt of 3:
>3..5
>5..{-3,5,7}

typo here ?


I meant
5..3
3..{-3,5,7}

Each row/column still contains the same set of values as before.

dukuso wrote:
maybe it can be extended to sudokus nevertheless ?
You have to do further adjustments for the blocks though.


This is all interesting theory, but probably not practical until it is thoroughly researched and tested.

I am of the opinion it would be easier in practice to generate a full sudoku grid using a normal brute force sudoku solver. Just adjust it so that when it is at a branching point (e.g. choosing which value to try in the next cell) it tries each possibility in a random order, instead of numerical order. Then give this solver an empty grid and have it return the first solution it finds.

The only drawback with this kind of method is that you don't know for certain whether each grid has an equal chance of being generated. At some point one choice may still leave many possible grids to choose from, and another choice may force just one possible grid. On such a path the probabilities for the grids are obviously not the same. However, for every grid there are many paths for generating it, so I would hope that on average these biases would cancel each other out.

On second thought, this would probably be a little biased against sudoku grids containing one of those 2-value rectangles we mentioned before, e.g.
1...3
3...1
because the choice between that and its opposite (with 1,3 swapped) can occur fairly late in the decision tree. The more minor variations a grid has, the less likely it is.
_________________
Jaap
--
Jaap's Puzzle Page:
http://www.geocities.com/jaapsch/puzzles
Back to top
View user's profile Send private message Visit poster's website
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Thu Jul 28, 2005 1:46 pm    Post subject: Reply with quote

yes, the beauty of the JM-algo (I think I understand now how it works)
is, that they proved that
each LS is created with _exactly_ the same probability.

In practice you won't care, of course whether some grid
is a tiny bit more likely of being chosen than another one ;-)


You can always just start at some sudokugrid and then just try to
find the lexicographically next one(s) with backtracking
in however order. Or backtrack through all possibilities
to permute entries containing digits -say- 2,5,7 while keeping
the other entries. And then apply the 6^8*9!*2 sudoku-
transformations etc.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
antony

Joined: 22 Jul 2005
Posts: 13
:

Items
PostPosted: Fri Jul 29, 2005 1:43 am    Post subject: 2 methods for generating random grids Reply with quote

As far as I am concerned, I only experimented with 2 ways to generate random grids. In both cases, the absence of need for guessing is not guaranted, so a human-like solver should be used on the output grids.

The first way is mentioned on Wikipedia. It is quite fast with DLX, and produces a random (symmetric) pattern with a random number of hints, typically ranging from 21 to 27.

Code:
Start with an empty grid.
Repeat
   If there are too many hints (30 for a 3x3), empty the grid.
   Add one (or two symmetric) random hint(s).
   Solve.
   If no solution, remove the hint(s).
until the solution is unique.
For every hint (or pair of hints), in a random order,
   Remove it.
   Solve.
   If multiple solutions, put the hint(s) back.


The second way can generate grids according to a given pattern. I used it quite successfully on an experimental samurai-sudoku generator. Of course, one can still feed the algorithm with a random pattern. The number of givens can be fixed (it depends on the pattern.) For symmetric 3x3 grids, it generates 20- or 21-hint grids much faster than the algorithm above, but is slower for 24- to 27-hint grids.

There are 2 steps. The first one produces a random full grid, by using the solver with a few hints only. Then I apply the mask. The grid remains solvable, but with multiple solutions. The second step tries and reduces the number of solutions, down to 1 (not always, though) by randomly changing the digits in the disclosed cells.

Code:
Repeat
   Start with an empty grid.
   Repeat
      Add one random hint.
      Solve.
      If no solution, remove the hint.
   until there are a few hints. (e.g N=10)
   (at this point, the grid has multiple solutions.)
   Take hints from a solution according to a predefined pattern (i.e mask the first solution you find).
   Until the solution is unique, repeat
      Change the value of a hint randomly.
      Solve.
      If there is no solution, or more solutions than before*, restore the hint.
      If too many iterations (e.g 1000) without improvement, break.
until the solution is unique.


(*) Here, I use my small DLX solver to count the number of solutions. Since it could take days on grids with few hints, it aborts after a fixed (high) number of iterations. This gives me a poor (but good enough) estimate for the number of solutions. But a complete search has priority over a partial search, even if the number of solutions is a bit worse.

There is certainly nothing new in all this. Just wanted to mention it as a starting point for some interesting discussion Laughing


Last edited by antony on Fri Jul 29, 2005 2:32 pm; edited 1 time in total
Back to top
View user's profile Send private message
tilps

Joined: 19 Jun 2005
Posts: 44
:

Items
PostPosted: Fri Jul 29, 2005 9:26 am    Post subject: Re: 2 methods for generating random grids Reply with quote

antony wrote:
As far as I am concerned, I only experimented with 2 ways to generate random grids. In both cases, the absence of need for guessing is not guaranted, so a human-like solver should be used on the output grids.

The first way is mentioned on Wikipedia. It is quite fast with DLX, and produces a random (symmetric) pattern with a random number of hints, typically ranging from 21 to 27.

Code:
Start with an empty grid.
Repeat
   If there are too many hints (30 for a 3x3), empty the grid.
   Add one (or two symmetric) random hint(s).
   Solve.
   If no solution, remove the hint(s).
until the solution is unique.
Repeat
   For every hint (or pair of hints), in any order,
      Remove it.
      Solve.
      If multiple solutions, put the hint(s) back.
until the pass removed no hint.



A couple of points, 'too many hints' is ofcourse arbitary. I have a loop counter which clears the board if the number of loops is > boardsize cubed.
In the final step, one can make this faster by not attempting to remove the same hint/pair of hints twice. There is no need to repeat the pass, as the removing of hints wont allow you to remove other hints.
Back to top
View user's profile Send private message
antony

Joined: 22 Jul 2005
Posts: 13
:

Items
PostPosted: Fri Jul 29, 2005 2:31 pm    Post subject: Reply with quote

Oh, right! The last For loop does not need to be in a Repeat loop: if a pair can't be removed in the first pass, it can't be either in the second. I'm editing my previous message.
Back to top
View user's profile Send private message
ethel

Joined: 26 Aug 2005
Posts: 2
:

Items
PostPosted: Fri Aug 26, 2005 7:09 pm    Post subject: Re: generating random sudokugrids Reply with quote

dukuso wrote:
how do you generate a random sudoku-grid ?

Easy. Use any backtracking algorithm you like to fill in an empty grid, but work through the list of choices at each level in random order.

For example: set a random 81-long array of digits 0-8, called the offset list. Now build up from a blank sudoku so that at level r in the recursion, when we're trying to fill in the r^th cell of the sudoku grid, our k^th choice digit is 1 + (OffsetList[r]+k)%9.

Note that if you want to generate many independently-random sudokus then you need to start from scratch (with a new offset list) each time.
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Fri Sep 09, 2005 4:31 am    Post subject: Re: generating random sudokugrids Reply with quote

ethel wrote:
dukuso wrote:
how do you generate a random sudoku-grid ?

Easy. Use any backtracking algorithm you like to fill in an empty grid, but work through the list of choices at each level in random order.

For example: set a random 81-long array of digits 0-8, called the offset list. Now build up from a blank sudoku so that at level r in the recursion, when we're trying to fill in the r^th cell of the sudoku grid, our k^th choice digit is 1 + (OffsetList[r]+k)%9.

Note that if you want to generate many independently-random sudokus then you need to start from scratch (with a new offset list) each time.




but it won't generate each grid with same probability
Back to top
View user's profile Send private message Send e-mail Visit poster's website
ethel

Joined: 26 Aug 2005
Posts: 2
:

Items
PostPosted: Sat Sep 10, 2005 7:21 pm    Post subject: Re: generating random sudokugrids Reply with quote

dukuso wrote:
ethel wrote:
dukuso wrote:
how do you generate a random sudoku-grid ?
Easy. Use any backtracking algorithm you like to fill in an empty grid, but work through the list of choices at each level in random order.
...


but it won't generate each grid with same probability


Yeah, okay, you got me. The algorithm assumes that each valid branch in the search tree has an equal number of grids below it, which might be rather far from the truth. I assume you're generating random grids in order to find 'nice' ones in some sense? In which case a fast biased algorithm might be better than a slow exact one. Confused
Back to top
View user's profile Send private message
mugnyte

Joined: 07 Dec 2005
Posts: 13
:
Location: portland, or

Items
PostPosted: Wed Jan 11, 2006 12:50 am    Post subject: Generation Algorithm Reply with quote

Just to add to the survey, here's mine:

(1) Place 9 randomized unique starts on the board, one in each box. This guarantees a solvable board. Or place a randomized pattern on the board (input).

(2) Complete the puzzle according to basic rules, accepting the first solution (there are several with only 9 starts)

(3) Remove cells until either S starts (S>=17, input) are left, or the board cannot remove any cells without making the solutions > 1 (count solutions using all logic possible for speed).

To ensure a specific difficulty level, I go a bit further:

(4) Solve the game left at (3) according to the logic modules chosen (input). Count the moves made, if >= required number of moves (input), keep. Otherwise, start over at (1).

The UI maps a simple slider to modules + moves, which is a "quick difficulty" chooser. Inputs: starts, modules, moves, [pattern], [start board]

This method is fairly efficient, except that step (4) can discard quite a few puzzles before attaining the required number of moves on tough puzzles. (max ~50)
_________________
thanks
Back to top
View user's profile Send private message Visit poster's website Yahoo Messenger MSN Messenger
Soultaker

Joined: 28 Feb 2006
Posts: 49
:

Items
PostPosted: Tue Feb 28, 2006 9:54 pm    Post subject: Reply with quote

I 'invented' a different method of generating a random solution grid, which, I hope, does not suffer from the problem of not generating all classes of valid grids with equal probability.

What I do is initialize the grid with a random permutation of all symbols (so I start, for example, with nine 1's on the first row, nine 2's on the second, and so on, and then shuffle the entire grid). This, ofcourse, is not a valid grid yet because there are many cases of two symbols (eg. two 1's) on the same row, column or block. I count each extra symbol in a row, column or block as a conflict.

To reduce the number of conflicts I use a process that can be seen as a primitive form of simulated annealing: I pick two random positions in the grid and swap their symbols. If this increases the number of conflicts, I swap them back. If however, the number of conflicts either decreases or stays the same I leave it at that. This step is repeated until the number of conflicts is reduced to zero, at which point the grid is valid (by the definition of a conflict).

This method may sound a little flaky (like many probabilistic algorithms) but in practice it works very well. I would expect to run into problematic grids, where I need to increase the number of conflicts before I can ever reduce them again, but this situation nevers seems to arise in practice.

I can easily generate over a hundred 9x9 grids per second. I tried larger grids and altough they are slower, I can still generate a valid grid as large as 64x64 (!) in less than five minutes. So, I think, it is a very practical algorithm, especially for grid sizes for which no equivalence classes are known (i.e. everything except 9x9).

Now, I have two questions that I'd like to pose here (which I have not been able to solve myself):
1) Does this algorithm select every possible grid with equal probability? It is clear that it does not exclude any possible grids and I don't see why it would favor one class over the other, but that is not a complete proof. Does anyone have any thoughts about this?
2) Is it possible to arrive at a grid where it is impossible to reduce the number of conflicts, without first increasing the number of conflicts (which is, in my current algorithm, impossible)? If not, why not?
Back to top
View user's profile Send private message MSN Messenger
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> The mathematics of sudoku All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron