Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Statistics of clue placement

 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> The mathematics of sudoku
View previous topic :: View next topic  
Author Message
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Thu Sep 08, 2005 11:38 pm    Post subject: Statistics of clue placement Reply with quote

I've been having a devil of a time trying to generate sudoku that are not either too easy (anything more complex than box-line interaction is rare) or too hard (requiring coloring, Nishio, etc.). I'm wondering now if there's some means of determining a good clue set based on frequency of digits and position. That is, perhaps there's some secret to positioning clues so as to develop a truly crafty puzzle.

It's been my observation that digits, at least, play a role in this. When several digits appear only once, and especially if one appears not at all, puzzles are harder to solve. And obviously a board will be non-unique if two digits are missing from the clues entirely.

Is there any statistical information that could point me in the direction of a better clue builder for a unique grid, with the aim of achieving a certain difficulty level?
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Fri Sep 09, 2005 4:23 am    Post subject: Reply with quote

you can fix the positions of the clues and just permute their values.
I think Nick70 uses this method and produced quite some hard puzzles with it in another thread here.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Nick70

Joined: 08 Jun 2005
Posts: 160
:

Items
PostPosted: Fri Sep 09, 2005 9:04 am    Post subject: Reply with quote

dukuso wrote:
you can fix the positions of the clues and just permute their values.
I think Nick70 uses this method and produced quite some hard puzzles with it in another thread here.

Yes I lock the position of the clues, choose an order to fill the clues, and then
place 1 in the first cell
place 2 or 1 in the second cell
place 3 or 2 or 1 (if 2 in second cell), 2 or 1 (if 1 in second cell) in the third cell

and so on

In theory one could find all possible puzzles with a given clue pattern, but it takes a very long time. The filling order of the clues is important to reduce the time, e.g. if the first 4 clues are all in the same unit they can only be 1234 and nothing else.
Back to top
View user's profile Send private message
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Sat Sep 10, 2005 5:22 am    Post subject: Reply with quote

To lock clue positions though you'd at least need a good idea where to put them, which brings me full circle back to the question of statistics.

Based on rudimentary observations so far it seems that the hardest puzzles will come from leaving out the clues in an entire box, column, row, and/or digit. In theory it should be possible to leave out up to 3 boxes in a 9x9 grid; in practice leaving out more than one box is probably not very workable without a large set of clues. For columns, rows, and digits, only one can be omitted entirely to still maintain uniqueness, and if all three omissions are used then the missing digit may not appear at the intersection of the missing column and row. I believe it's likely to turn out that any 2 of the positional constraints (box, column, row) may not intersect a missing digit. It is possible to leave out a box, column, and row such that they all intersect, and in fact may even be necessary when leaving out all three.

Based on the above, I believe it might be possible to generate clues for more difficult puzzles by first choosing 1-3 constraints in which to leave out information (e.g., show no 7's, no clues in box 2), then filling in one clue each where information is needed (row 1, column 5, box 3, digit 8, etc.) and only then adding more--still according to chosen constraints--to create a solvable grid. Another means of controlling difficulty would be to limit the number of clues for several boxes/columns/rows/digits to just 1.
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Sat Sep 10, 2005 6:15 am    Post subject: Reply with quote

I didn't yet observe, that leaving out a row,column,box is
characteristic for hard puzzles. Where do you base this upon ?
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Nick70

Joined: 08 Jun 2005
Posts: 160
:

Items
PostPosted: Sat Sep 10, 2005 9:17 am    Post subject: Reply with quote

Lummox JR wrote:
To lock clue positions though you'd at least need a good idea where to put them, which brings me full circle back to the question of statistics.

The difficulty isn't determined by the position of the clues alone. It depends on what you put in the clues. With a fixed clue arrangement you can have all difficulties ranging from super-easy to super-natural.
Back to top
View user's profile Send private message
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Sat Sep 10, 2005 4:06 pm    Post subject: Reply with quote

It's a superficial, but I think valid, observation of mine that leaving out a box/column/row/digit increases difficulty. In all cases you're left without a clue for one part of the puzzle, and the only way to gain a clue is to fill in something else. In the case of a missing column for example, you can only fill in a value for that column if a row is almost completely filled. If a box intersecting the column is full in its other columns (a difficult thing to do without one column to eliminate choices), you can use a pointing pairs/triples test to rule out some choices for the missing column, but that's about it. The same logic can be extended to other constraints, since obviously if you don't have a 7 on the board, you can't even hope to fill it until you have a naked single for it.

Most puzzles I've attempted which fit such a constraint have been considerably harder. Likewise sparseness of clues in boxes/columns/rows/digits is also a factor. I'm just wondering if anyone else has observed situations like this, and if they have any statistical info to help find better clue patterns. If clue frequency is represented by a 36-element array (clue in box, column, row, digit), I suspect that the array's mean and standard deviation are both strongly linked to puzzle difficulty. Is there a database of puzzles that this could be tested on?
Back to top
View user's profile Send private message
Nick70

Joined: 08 Jun 2005
Posts: 160
:

Items
PostPosted: Sat Sep 10, 2005 4:49 pm    Post subject: Reply with quote

Lummox JR wrote:
If clue frequency is represented by a 36-element array (clue in box, column, row, digit), I suspect that the array's mean and standard deviation are both strongly linked to puzzle difficulty.

That's just not true.

I have generated almost 100,000 puzzles, all using the same clue pattern. They run across the whole difficulty scale.
Back to top
View user's profile Send private message
Nick70

Joined: 08 Jun 2005
Posts: 160
:

Items
PostPosted: Sat Sep 10, 2005 4:57 pm    Post subject: Reply with quote

Here are two puzzles using the same clue pattern and having extremely different difficulty:

Code:
1...2...3
...5...7.
..9...8..
.3...9...
4...6...7
...1...3.
..1...5..
.4...3...
5...8...9


1...2...3
...6...4.
..9...5..
.6...8...
4...6...7
...3...8.
..3...2..
.8...4...
5...8...9
Back to top
View user's profile Send private message
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Sat Sep 10, 2005 7:07 pm    Post subject: Reply with quote

I don't doubt that other factors are involved, granted, but I still think there's some truth in the idea that difficulty is easier to seed with certain patterns vs. others--patterns not just of placement but also of digit choice.

One interesting statistic I'd like to see for your model is how puzzle difficulty is impacted by a limit on the available choices based on previous clue placements. I.e., is a puzzle more difficult when the next choice at each step is more random, or when there are only a few choices available for certain steps?

Also, can you achieve the same result (a wide variety of difficulty levels) with any randomly chosen pattern of a certain number of clues? And that being the case, knowing there are zillions of permutations possible, how quickly can you find such a variety?
Back to top
View user's profile Send private message
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Sat Sep 10, 2005 10:13 pm    Post subject: Reply with quote

After testing the method of leaving clue positions constant and just trying different numbers, I've come to the conclusion that it's no improvement over any other algorithm. It takes an inordinately long time to find a difficult problem, and the results are just as good as if I'd simply tried clues entirely at random over and over again.
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Sat Sep 10, 2005 11:45 pm    Post subject: Reply with quote

here is some statistics.
1e6 randomn locally minimal sudokus.

free rows, average nodes, sudokus
0,83.9,857233
1,81.2,137789
2,81.0,4945
3,90.0,33


free columns,average nodes, sudokus
0,83.8,855902
1,81.6,139046
2,80.5,5013
3,79.9,39

free blocks,average nodes,sudokus
0,83.5,884184
1,82.9,112100
2,86.8,3697
3,115.4,19

not much difference in hardness depending on free
rows,cols,blocks
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> The mathematics of sudoku All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron