|
View previous topic :: View next topic |
Author |
Message |
| Red Ed
| Joined: 04 Nov 2006 | Posts: 21 | : | | Items |
|
Posted: Sat Jun 23, 2007 7:49 pm Post subject: Datasets for solver benchmarking |
|
|
This thread discusses what puzzle sets to use when evaluating / comparing solvers.
A common mistake is to say that a solver can crack "99% of all puzzles", or to try to compare success rates of solver A on collection X with solver B on a different collection Y; or even to just compare two solvers on a single dataset and expect that to generalise somehow to "all puzzles" - whatever that means. (By "puzzle", I always mean one that has a unique solution.) This thread advocates moving away from talking about "all puzzles" towards benchmarks based on standard datasets and generators.
First, I'll argue that the set of all puzzles is not interesting. It's been estimated here that there are ~6e45 distinct sudoku puzzles. It is possible to sample from this space in a virtually unbiased manner, from which estimates (such as this) can be produced for the proportion of puzzles with a given number of clues, that are solvable with technique X, or whatever. My own simulations from the space of all puzzles shows that ~99.995% have at least 30 clues (which is rather more than you see in newspapers and in the popular online puzzle collections) and that ~99.8% are solvable with naked/hidden singles/pairs/triples/quads, line-box interactions and x-wings (so all but ~0.2% of puzzles are rather trivial). These statistics are not representative of the puzzles that people normally like to solve, so for that reason I would discourage sampling from the space of all puzzles.
So now I think we need to agree what constitutes an "interesting" puzzle. A minimum requirement should be that an interesting puzzle is one that people have an interest in solving, which generally means that the puzzle should exhibit one or more "nice" qualities such as:- few clues
- symmetry
- automorphism
- minimality
- some minimum difficulty requirement, e.g. superior
Now, a serious problem with "interesting" puzzles is that we don't know how to sample from them in an unbiased way. Whatever ways we do have of generating "nice" puzzles may, for all we know, have very low bias, but the point is that the bias, and any effect it has on solvers, is unquantified and cannot be assumed to be negligible. Thus, if we have a puzzle generator X that can spit out symmetric superior puzzles (say) then it is okay to say, "my solver cracks ~98% of puzzles generated using X", but not to generalise this to say "my solver cracks ~98% of all symmetric superior puzzles". That is, when quoting solver statistics, you must also describe the data source.
OK, so what data sources should we use? I think we need two things: (1) some standard lists of pre-prepared puzzles; and (2) a list of generators + command lines needed to generate "interesting" puzzles. The need for the first is clear enough: lists such as gfroyle's 17s and the top-N puzzles were produced due to the inherent "interestingness" of the puzzles within, and people will want to know how easily solvers cope with those puzzles. The need for the second is twofold: one is to be able to produce statistics on much larger datasets than are available online; and the other is to provide a fair "tie-breaker" when two solvers claim similar performance on one of the pre-prepared datasets (in that case, an impartial third party could suggest a starting seed for the generator and then ask the solvers to attack 1000000 (say) puzzles each).
That's all I have to say for now. It would be great if any interested readers could provide command lines and links to datasets for future reference. Anyone? |
|
Back to top |
|
|
| humble_programmer
| Joined: 27 Jun 2006 | Posts: 69 | : | Location: Colorado Springs, USA | Items |
|
Posted: Sun Jun 24, 2007 12:34 am Post subject: |
|
|
Here's a thought: how about a database application designed specifically for storing / retrieving Sudoku puzzles? As a professional software developer who's been tinkering with Sudoku for a while, I have a home-brew database application that I am willing to clean up and open source through this community. To make it as useful as possible to as many as possible, I'll need some encouragement and some feedback on the following:
1. GUI or command-line? The current flavor is a user-hostile console app that runs under Windows, but could be easily ported to Linux. It could also be wrapped in a Windows GUI without too much more work, which might make it more usable.
2. Database Fields
[a] Givens (starting values)
[b] Givens Count
[c] Solution Count (0: none 1: unique 2+: multiple)
[d] Suexrat9 Rating
[e] Sudocoup Rating
[f] Bit Mask of solving techniques required
Your thoughts? _________________ Cheers!
Humble Programmer
,,,^..^,,,
www.humble-programmer.com |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Sun Jun 24, 2007 5:58 am Post subject: Re: Datasets for solver benchmarking |
|
|
thanks Ed
here's some freshly generated fodder
my generator does what Carla Gomes (quasigroup with holes) calls punching holes
start with a valid (solution) grid and puch holes in it and start testing for
desired properties at/after some hole percentage
punching holes tends to converge on valid puzzles faster than starting from empty grids
the grid sequence is the one I mentioned in the unbiased grid thread, fast at the expense of some bias
three files were generated, each with 100,000 puzzles, each with different generation parameters
m1n -- when a puzzle is hit, minimize it, output the first minimal puzzle, and start with a new grid
command line wrote: |
sudoku -gp -n100000 -m1 -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'
|
mxn -- when a puzzle is hit output all minimal puzzles derived from it
command line wrote: |
sudoku -gp -n100000 -m -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'
|
mxs -- like mxn but only punch holes in symmetric patterns and output all possible symmetric-minimal puzzles
command line wrote: |
sudoku -gp -n100000 -m -sg -f'%v # %(C)q %(S)q %q' -Fi'# %#RX' -Ff'# %#et %#sn %#an'
|
s17 -- a gfroyle 17 catalog snapshot @40,000 entries
mxs repeats the grid+punch+minimize process for each of the 9 symmetry types
(some of which are equivalent, like diagonal ~ antidiagonal)
the m* have 100,000 entries, s17 counts below are scaled by multiplying by 2.5
none of the sample files have isomorphic dups -- this was not an overt attempt,
but serendipidous fallout from the implementation
some of the samples have isomorphic solution grid dups (mx* by design)
here are the dup counts
m1n
mxn
Code: |
4 8 12 76 127 170 196 202 293 391 395 419 524 623 851 885 907 1249 1521 2364 2390 2711 3405 5072 5527 5848 6771 6852 9192 10722 13408 16885
|
mxs
Code: |
61 62 71 73 83 86 87 91 92 98 100 104 104 105 106 106 109 111 111 111 112 112 113 114 114 116 117 119 120 120 121 135 135 136 137 143 144 151 152 156 159 159 160 161 162
165 166 166 170 171 173 176 176 178 178 181 181 182 182
184 185 194 195 195 196 196 199 200 201 202 202 203 205
206 208 211 212 213 215 217 218 220 222 222 223 224 225
226 228 229 234 236 236 239 239 239 239 242 242 245 247
251 258 258 259 261 262 266 266 268 273 273 274 275 275
277 277 279 281 283 284 287 288 290 293 301 303 304 305
307 308 314 314 320 321 325 328 330 330 330 331 339 339
342 344 346 349 350 350 351 352 359 359 361 364 364 367
369 372 373 373 374 375 379 380 382 382 387 395 396 399
407 419 422 433 433 440 441 448 452 461 462 466 471 472
472 473 482 489 501 515 520 521 526 533 537 540 542 542
550 564 567 608 612 613 621 636 636 639 641 652 666 687
697 698 744 770 791 795 821 829 846 865 959 959 965 977
1032 1047 1095 1145 1150 1200 1323 1326 1387 1486 1574
1618 1784 1842 2104 2718 2942
|
s17 -- many solution grid dups, so this list is the number of dups by count, e.g., there are 1599 duped pairs
these counts are actual, not scaled
Code: |
1599 2
236 3
81 4
18 5
20 6
7 7
3 8
1 9
1 11
1 12
1 14
1 20
1 29
|
finally a table that counts the constraints/methods/techniques required in the solution
the default constraint order in my solver is FNBTHWXYKOG
Code: |
F naked singles
N hidden singles
B box-line
T naked n-tuples
H hidden n-tuples
W n-wings (x-wing swordfish jellyfish)
X pure X cycles (type { 1 3 } edges)
Y XY cycles { type 1 2 3 } edges
K Y-knots -- invalid Y edges discovered by the Y method
O rookery template overlays
G guessing
|
the order is roughly easy to hard
my solver is coded to exhaust constraints from the left before proceeding to the next,
and restarts at the left after a constraint produces a placement/elimination
(the default constraint order can be changed by the documented -qORDER command line option)
active GOKYXW (constraints that produces placements/eliminations) were counted to give an idea of solution difficulty
(note -- I don't do uniqueness techniques)
Code: |
m1n mxn mxs s17
G 115 39 9 17
O 29 14 7 15
K 11667 14668 4294 1740
Y 25771 34813 14858 6255
X 22993 30968 15324 10515
W 5030 6437 5174 1182
|
the data files (except s17) are posted at:
http://www.research.att.com/~gsf/sudoku/data/m1n-100000.dat.gz etc.
each contains:
Code: |
puzzle # Cclues[.minimal] [Ssymmetry.order] active-constraints
|
where active constraints are constraints that produce at least one placement/elimination
and minimal is m for minimal and M for symmetric minimal
[edit] also, two of the m* catalogs (I forgot to do it for the other) have
first line wrote: |
# <-r command line option seed to reproduce pseudo random puzzle sequence>
|
corresponding to the -Fi command line format option (initial format)
last line wrote: |
# <time to generate> <#puzzles checked/solved> <#puzzles output>
|
corresponding to the -Ff command line format option (final format)
the actual first and last lines are (@3Ghz)
Code: |
mxn # 35374674558495 # 3h32m 160633794 100000
mxs # 2252720389140 # 40m58s 59879122 100000
|
one of the things that jumps out is adding restrictions, like symmetry or small number of clues,
tends to lower the number of "harder" methods required to solve (i.e., "easier" puzzles) for my generator
I'll leave it to the stats mavens to determine if the difference is significant
another is that although some collections may have little or no isomorphic dup puzzles,
they may have plenty of isomorphic dup solution grids
so another question is what is the nature of the bias, if any,
of puzzles derived from only one solution grid vs. puzzles derived from another solution grid |
|
Back to top |
|
|
| Red Ed
| Joined: 04 Nov 2006 | Posts: 21 | : | | Items |
|
Posted: Sun Jun 24, 2007 5:30 pm Post subject: |
|
|
humble_programmer wrote: | Here's a thought: how about a database application designed specifically for storing / retrieving Sudoku puzzles? As a professional software developer who's been tinkering with Sudoku for a while, I have a home-brew database application that I am willing to clean up and open source through this community. | That's a very kind offer, Humble Programmer, but I think that such a database would be more useful for personal research -- allowing you to quickly find/sort puzzles of various qualities -- than it would be to produce just a few datasets for benchmarking purposes. At this stage, all I'm after is a good set of benchmark puzzle lists.
I've searched around the forums a little bit and have found several puzzle lists. Perhaps I should've done my research before starting this thread! Oh well ...These, together with gsf's command lines given earlier (thanks gsf) and anything else people can suggest, I will place on a sudopedia web page. But not for another week or so, as I will be away from my computer for a bit.
Thanks gsf and Humble Programmer for your comments. |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Sun Jun 24, 2007 9:34 pm Post subject: |
|
|
add to the list these puzzles scraped from the players forum hardest thread
combined with others generated by the -gH option of my solver,
all with SE, suexrate, and (my solver) -q1 ratings,
along with the time to rate, providing evidence of the current sad state of rating
q1-taxonomy-2007-06-11.dat
this file is in a CSV (character separated variable length line) form that can be input by my solver
the first line describes the fields
its also compatible with the std unix text tools |
|
Back to top |
|
|
| NewUrbanBlues
| Joined: 22 Oct 2006 | Posts: 36 | : | | Items |
|
Posted: Sat Sep 01, 2007 2:02 pm Post subject: |
|
|
Hello
I jus wonder... Lot of reference about the top 1465 boards but few statistics about the percentage of resolution by the different software |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
Igloo Theme Version 1.0 :: Created By: Andrew Charron
|