Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Datasets for solver benchmarking

 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku
View previous topic :: View next topic  
Author Message
Red Ed

Joined: 04 Nov 2006
Posts: 21
:

Items
PostPosted: Sat Jun 23, 2007 7:49 pm    Post subject: Datasets for solver benchmarking Reply with quote

This thread discusses what puzzle sets to use when evaluating / comparing solvers.

A common mistake is to say that a solver can crack "99% of all puzzles", or to try to compare success rates of solver A on collection X with solver B on a different collection Y; or even to just compare two solvers on a single dataset and expect that to generalise somehow to "all puzzles" - whatever that means. (By "puzzle", I always mean one that has a unique solution.) This thread advocates moving away from talking about "all puzzles" towards benchmarks based on standard datasets and generators.

First, I'll argue that the set of all puzzles is not interesting. It's been estimated here that there are ~6e45 distinct sudoku puzzles. It is possible to sample from this space in a virtually unbiased manner, from which estimates (such as this) can be produced for the proportion of puzzles with a given number of clues, that are solvable with technique X, or whatever. My own simulations from the space of all puzzles shows that ~99.995% have at least 30 clues (which is rather more than you see in newspapers and in the popular online puzzle collections) and that ~99.8% are solvable with naked/hidden singles/pairs/triples/quads, line-box interactions and x-wings (so all but ~0.2% of puzzles are rather trivial). These statistics are not representative of the puzzles that people normally like to solve, so for that reason I would discourage sampling from the space of all puzzles.

So now I think we need to agree what constitutes an "interesting" puzzle. A minimum requirement should be that an interesting puzzle is one that people have an interest in solving, which generally means that the puzzle should exhibit one or more "nice" qualities such as:
  • few clues
  • symmetry
  • automorphism
  • minimality
  • some minimum difficulty requirement, e.g. superior
Now, a serious problem with "interesting" puzzles is that we don't know how to sample from them in an unbiased way. Whatever ways we do have of generating "nice" puzzles may, for all we know, have very low bias, but the point is that the bias, and any effect it has on solvers, is unquantified and cannot be assumed to be negligible. Thus, if we have a puzzle generator X that can spit out symmetric superior puzzles (say) then it is okay to say, "my solver cracks ~98% of puzzles generated using X", but not to generalise this to say "my solver cracks ~98% of all symmetric superior puzzles". That is, when quoting solver statistics, you must also describe the data source.

OK, so what data sources should we use? I think we need two things: (1) some standard lists of pre-prepared puzzles; and (2) a list of generators + command lines needed to generate "interesting" puzzles. The need for the first is clear enough: lists such as gfroyle's 17s and the top-N puzzles were produced due to the inherent "interestingness" of the puzzles within, and people will want to know how easily solvers cope with those puzzles. The need for the second is twofold: one is to be able to produce statistics on much larger datasets than are available online; and the other is to provide a fair "tie-breaker" when two solvers claim similar performance on one of the pre-prepared datasets (in that case, an impartial third party could suggest a starting seed for the generator and then ask the solvers to attack 1000000 (say) puzzles each).

That's all I have to say for now. It would be great if any interested readers could provide command lines and links to datasets for future reference. Anyone?
Back to top
View user's profile Send private message
humble_programmer

Joined: 27 Jun 2006
Posts: 69
:
Location: Colorado Springs, USA

Items
PostPosted: Sun Jun 24, 2007 12:34 am    Post subject: Reply with quote

Here's a thought: how about a database application designed specifically for storing / retrieving Sudoku puzzles? As a professional software developer who's been tinkering with Sudoku for a while, I have a home-brew database application that I am willing to clean up and open source through this community. To make it as useful as possible to as many as possible, I'll need some encouragement and some feedback on the following:

1. GUI or command-line? The current flavor is a user-hostile console app that runs under Windows, but could be easily ported to Linux. It could also be wrapped in a Windows GUI without too much more work, which might make it more usable.

2. Database Fields
[a] Givens (starting values)
[b] Givens Count
[c] Solution Count (0: none 1: unique 2+: multiple)
[d] Suexrat9 Rating
[e] Sudocoup Rating
[f] Bit Mask of solving techniques required

Your thoughts?
_________________
Cheers!
Humble Programmer
,,,^..^,,,
www.humble-programmer.com
Back to top
View user's profile Send private message Visit poster's website
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Sun Jun 24, 2007 5:58 am    Post subject: Re: Datasets for solver benchmarking Reply with quote

thanks Ed
here's some freshly generated fodder

my generator does what Carla Gomes (quasigroup with holes) calls punching holes
start with a valid (solution) grid and puch holes in it and start testing for
desired properties at/after some hole percentage

punching holes tends to converge on valid puzzles faster than starting from empty grids
the grid sequence is the one I mentioned in the unbiased grid thread, fast at the expense of some bias

three files were generated, each with 100,000 puzzles, each with different generation parameters

m1n -- when a puzzle is hit, minimize it, output the first minimal puzzle, and start with a new grid
command line wrote:

sudoku -gp -n100000 -m1 -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'

mxn -- when a puzzle is hit output all minimal puzzles derived from it
command line wrote:

sudoku -gp -n100000 -m -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'

mxs -- like mxn but only punch holes in symmetric patterns and output all possible symmetric-minimal puzzles
command line wrote:

sudoku -gp -n100000 -m -sg -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'

s17 -- a gfroyle 17 catalog snapshot @40,000 entries
command line wrote:

wget http://people.csse.uwa.edu.au/gordon/sudoku17


mxs repeats the grid+punch+minimize process for each of the 9 symmetry types
(some of which are equivalent, like diagonal ~ antidiagonal)
the m* have 100,000 entries, s17 counts below are scaled by multiplying by 2.5

none of the sample files have isomorphic dups -- this was not an overt attempt,
but serendipidous fallout from the implementation

some of the samples have isomorphic solution grid dups (mx* by design)
here are the dup counts

m1n
Code:

2


mxn
Code:

4 8 12 76 127 170 196 202 293 391 395 419 524 623 851 885 907 1249 1521 2364 2390 2711 3405 5072 5527 5848 6771 6852 9192 10722 13408 16885


mxs
Code:

61 62 71 73 83 86 87 91 92 98 100 104 104 105 106 106 109 111 111 111 112 112 113 114 114 116 117 119 120 120 121 135 135 136 137 143 144 151 152 156 159 159 160 161 162
165 166 166 170 171 173 176 176 178 178 181 181 182 182
184 185 194 195 195 196 196 199 200 201 202 202 203 205
206 208 211 212 213 215 217 218 220 222 222 223 224 225
226 228 229 234 236 236 239 239 239 239 242 242 245 247
251 258 258 259 261 262 266 266 268 273 273 274 275 275
277 277 279 281 283 284 287 288 290 293 301 303 304 305
307 308 314 314 320 321 325 328 330 330 330 331 339 339
342 344 346 349 350 350 351 352 359 359 361 364 364 367
369 372 373 373 374 375 379 380 382 382 387 395 396 399
407 419 422 433 433 440 441 448 452 461 462 466 471 472
472 473 482 489 501 515 520 521 526 533 537 540 542 542
550 564 567 608 612 613 621 636 636 639 641 652 666 687
697 698 744 770 791 795 821 829 846 865 959 959 965 977
1032 1047 1095 1145 1150 1200 1323 1326 1387 1486 1574
1618 1784 1842 2104 2718 2942


s17 -- many solution grid dups, so this list is the number of dups by count, e.g., there are 1599 duped pairs
these counts are actual, not scaled
Code:

1599 2
 236 3
  81 4
  18 5
  20 6
   7 7
   3 8
   1 9
   1 11
   1 12
   1 14
   1 20
   1 29


finally a table that counts the constraints/methods/techniques required in the solution
the default constraint order in my solver is FNBTHWXYKOG
Code:

F naked singles
N hidden singles
B box-line
T naked n-tuples
H hidden n-tuples
W n-wings (x-wing swordfish jellyfish)
X pure X cycles (type { 1 3 } edges)
Y XY cycles { type 1 2 3 } edges
K Y-knots -- invalid Y edges discovered by the Y method
O rookery template overlays
G guessing


the order is roughly easy to hard
my solver is coded to exhaust constraints from the left before proceeding to the next,
and restarts at the left after a constraint produces a placement/elimination
(the default constraint order can be changed by the documented -qORDER command line option)
active GOKYXW (constraints that produces placements/eliminations) were counted to give an idea of solution difficulty
(note -- I don't do uniqueness techniques)

Code:

         m1n        mxn        mxs        s17
G        115         39          9         17
O         29         14          7         15
K      11667      14668       4294       1740
Y      25771      34813      14858       6255
X      22993      30968      15324      10515
W       5030       6437       5174       1182


the data files (except s17) are posted at:
http://www.research.att.com/~gsf/sudoku/data/m1n-100000.dat.gz etc.

each contains:
Code:

puzzle # Cclues[.minimal] [Ssymmetry.order] active-constraints

where active constraints are constraints that produce at least one placement/elimination
and minimal is m for minimal and M for symmetric minimal

[edit] also, two of the m* catalogs (I forgot to do it for the other) have
first line wrote:

# <-r command line option seed to reproduce pseudo random puzzle sequence>

corresponding to the -Fi command line format option (initial format)
last line wrote:

# <time to generate> <#puzzles checked/solved> <#puzzles output>

corresponding to the -Ff command line format option (final format)

the actual first and last lines are (@3Ghz)
Code:

mxn # 35374674558495 #  3h32m 160633794 100000
mxs #  2252720389140 # 40m58s  59879122 100000


one of the things that jumps out is adding restrictions, like symmetry or small number of clues,
tends to lower the number of "harder" methods required to solve (i.e., "easier" puzzles) for my generator
I'll leave it to the stats mavens to determine if the difference is significant

another is that although some collections may have little or no isomorphic dup puzzles,
they may have plenty of isomorphic dup solution grids
so another question is what is the nature of the bias, if any,
of puzzles derived from only one solution grid vs. puzzles derived from another solution grid
Back to top
View user's profile Send private message Visit poster's website
Red Ed

Joined: 04 Nov 2006
Posts: 21
:

Items
PostPosted: Sun Jun 24, 2007 5:30 pm    Post subject: Reply with quote

humble_programmer wrote:
Here's a thought: how about a database application designed specifically for storing / retrieving Sudoku puzzles? As a professional software developer who's been tinkering with Sudoku for a while, I have a home-brew database application that I am willing to clean up and open source through this community.
That's a very kind offer, Humble Programmer, but I think that such a database would be more useful for personal research -- allowing you to quickly find/sort puzzles of various qualities -- than it would be to produce just a few datasets for benchmarking purposes. At this stage, all I'm after is a good set of benchmark puzzle lists.

I've searched around the forums a little bit and have found several puzzle lists. Perhaps I should've done my research before starting this thread! Oh well ...These, together with gsf's command lines given earlier (thanks gsf) and anything else people can suggest, I will place on a sudopedia web page. But not for another week or so, as I will be away from my computer for a bit.

Thanks gsf and Humble Programmer for your comments.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Sun Jun 24, 2007 9:34 pm    Post subject: Reply with quote

add to the list these puzzles scraped from the players forum hardest thread
combined with others generated by the -gH option of my solver,
all with SE, suexrate, and (my solver) -q1 ratings,
along with the time to rate, providing evidence of the current sad state of rating

q1-taxonomy-2007-06-11.dat

this file is in a CSV (character separated variable length line) form that can be input by my solver
the first line describes the fields
its also compatible with the std unix text tools
Back to top
View user's profile Send private message Visit poster's website
NewUrbanBlues

Joined: 22 Oct 2006
Posts: 36
:

Items
PostPosted: Sat Sep 01, 2007 2:02 pm    Post subject: Reply with quote

Hello

I jus wonder... Lot of reference about the top 1465 boards but few statistics about the percentage of resolution by the different software
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron