Datasets for solver benchmarking

Red Ed · Posted: Sat Jun 23, 2007 7:49 pm Post subject: Datasets for solver benchmarking

This thread discusses what puzzle sets to use when evaluating / comparing solvers.

A common mistake is to say that a solver can crack "99% of all puzzles", or to try to compare success rates of solver A on collection X with solver B on a different collection Y; or even to just compare two solvers on a single dataset and expect that to generalise somehow to "all puzzles" - whatever that means. (By "puzzle", I always mean one that has a unique solution.) This thread advocates moving away from talking about "all puzzles" towards benchmarks based on standard datasets and generators.

First, I'll argue that the set of all puzzles is not interesting. It's been estimated here that there are ~6e45 distinct sudoku puzzles. It is possible to sample from this space in a virtually unbiased manner, and from such samples one can produce estimates (such as this) of the proportion of puzzles that have a given number of clues, that are solvable with technique X, or whatever. My own simulations from the space of all puzzles show that ~99.995% have at least 30 clues (which is rather more than you see in newspapers and in the popular online puzzle collections) and that ~99.8% are solvable with naked/hidden singles/pairs/triples/quads, line-box interactions and x-wings (so all but ~0.2% of puzzles are rather trivial). These statistics are not representative of the puzzles that people normally like to solve, so for that reason I would discourage sampling from the space of all puzzles.
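Any proportion estimated by sampling, such as the ~99.8% figure above, carries sampling error that shrinks only as more puzzles are drawn. As a minimal sketch, the Wilson score interval below turns k successes out of n sampled puzzles into an approximate 95% confidence interval. The function name and the choice of the Wilson interval (rather than another binomial interval) are illustrative assumptions, not part of the original method, and the interval says nothing about generator bias.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a proportion estimated from n samples.

    z = 1.96 gives roughly 95% coverage. Only sampling error is covered;
    any bias in how the puzzles were generated is not.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

For example, `wilson_interval(998, 1000)` (hypothetical counts) gives the range of proportions consistent with seeing 998 "easy" puzzles in a sample of 1000; quoting that range is more honest than quoting 99.8% alone.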

So now I think we need to agree what constitutes an "interesting" puzzle. A minimum requirement should be that an interesting puzzle is one that people have an interest in solving, which generally means that the puzzle should exhibit one or more "nice" qualities such as:

few clues
symmetry
automorphism
minimality
some minimum difficulty requirement, e.g. superior

Now, a serious problem with "interesting" puzzles is that we don't know how to sample from them in an unbiased way. Whatever ways we do have of generating "nice" puzzles may, for all we know, have very low bias, but the point is that the bias, and any effect it has on solvers, is unquantified and cannot be assumed to be negligible. Thus, if we have a puzzle generator X that can spit out symmetric superior puzzles (say) then it is okay to say, "my solver cracks ~98% of puzzles generated using X", but not to generalise this to say "my solver cracks ~98% of all symmetric superior puzzles". That is, when quoting solver statistics, you must also describe the data source.

OK, so what data sources should we use? I think we need two things: (1) some standard lists of pre-prepared puzzles; and (2) a list of generators + command lines needed to generate "interesting" puzzles. The need for the first is clear enough: lists such as gfroyle's 17s and the top-N puzzles were produced due to the inherent "interestingness" of the puzzles within, and people will want to know how easily solvers cope with those puzzles. The need for the second is twofold: one is to be able to produce statistics on much larger datasets than are available online; and the other is to provide a fair "tie-breaker" when two solvers claim similar performance on one of the pre-prepared datasets (in that case, an impartial third party could suggest a starting seed for the generator and then ask the solvers to attack 1000000 (say) puzzles each).
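The seeded tie-breaker, together with the rule that every quoted statistic names its data source, can be sketched as a small harness. Everything here is hypothetical: `generate` stands for whatever generator is agreed on, each solver is any callable that returns True when it cracks a puzzle, and the function names are illustrative only. The puzzle list is built once from the shared seed so both solvers see identical inputs.

```python
import random
from typing import Callable, Dict, List, TypeVar

P = TypeVar("P")  # puzzle representation, left open on purpose


def make_puzzle_list(generate: Callable[[random.Random], P], seed: int, count: int) -> List[P]:
    """Materialise `count` puzzles from one seed so every solver gets the same list."""
    rng = random.Random(seed)
    return [generate(rng) for _ in range(count)]


def success_rates(solvers: Dict[str, Callable[[P], bool]], puzzles: List[P]) -> Dict[str, float]:
    """Fraction of the shared puzzle list that each solver cracks."""
    if not puzzles:
        raise ValueError("empty puzzle list")
    return {name: sum(1 for p in puzzles if solve(p)) / len(puzzles)
            for name, solve in solvers.items()}


def describe(name: str, rate: float, source: str) -> str:
    """Report a rate together with its data source, never as a claim about 'all puzzles'."""
    return f"{name} cracks {rate:.2%} of puzzles from {source}"
```

A design caveat: Python only guarantees that `random.Random(seed).random()` reproduces across versions, so a real generator should document its own RNG if a published seed is to reproduce the same puzzles everywhere.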

That's all I have to say for now. It would be great if any interested readers could provide command lines and links to datasets for future reference. Anyone?

humble_programmer · Posted: Sun Jun 24, 2007 12:34 am Post subject:

Here's a thought: how about a database application designed specifically for storing / retrieving Sudoku puzzles? As a professional software developer who's been tinkering with Sudoku for a while, I have a home-brew database application that I am willing to clean up and open source through this community. To make it as useful as possible to as many as possible, I'll need some encouragement and some feedback on the following:

1. GUI or command-line? The current flavor is a user-hostile console app that runs under Windows, but could be easily ported to Linux. It could also be wrapped in a Windows GUI without too much more work, which might make it more usable.

2. Database Fields
[a] Givens (starting values)
[b] Givens Count
[c] Solution Count (0: none, 1: unique, 2+: multiple)
[d] Suexrat9 Rating
[e] Sudocoup Rating
[f] Bit Mask of solving techniques required
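One way the proposed fields might map onto a table is sketched below using Python's built-in `sqlite3`. The column types, the 81-character givens string, and the technique bit assignments are all hypothetical; the post lists the fields but not how they are encoded. Storing the techniques as an integer bit mask lets a single SQL `&` test find every puzzle that needs a given technique.

```python
import sqlite3
from enum import IntFlag
from typing import List, Optional


class Technique(IntFlag):
    # Hypothetical bit assignments for field [f]; a real catalogue would be longer.
    NAKED_SINGLE = 1
    HIDDEN_SINGLE = 2
    LOCKED_CANDIDATES = 4
    NAKED_PAIR = 8
    X_WING = 16


SCHEMA = """
CREATE TABLE IF NOT EXISTS puzzle (
    givens          TEXT PRIMARY KEY CHECK (length(givens) = 81), -- [a] '.' or '0' = blank
    givens_count    INTEGER NOT NULL,                             -- [b]
    solution_count  INTEGER NOT NULL,                             -- [c] 2 means "2+"
    suexrat9_rating REAL,                                         -- [d]
    sudocoup_rating REAL,                                         -- [e]
    techniques      INTEGER NOT NULL DEFAULT 0                    -- [f] Technique bits
)
"""


def add_puzzle(conn: sqlite3.Connection, givens: str, solution_count: int,
               techniques: Technique, suexrat9: Optional[float] = None,
               sudocoup: Optional[float] = None) -> None:
    """Insert or replace one puzzle; the givens count is derived from the string."""
    givens_count = sum(ch not in ".0" for ch in givens)
    conn.execute("INSERT OR REPLACE INTO puzzle VALUES (?, ?, ?, ?, ?, ?)",
                 (givens, givens_count, min(solution_count, 2),
                  suexrat9, sudocoup, int(techniques)))


def puzzles_needing(conn: sqlite3.Connection, technique: Technique) -> List[str]:
    """Givens of every stored puzzle whose mask includes `technique`."""
    rows = conn.execute("SELECT givens FROM puzzle WHERE (techniques & ?) != 0",
                        (int(technique),))
    return [row[0] for row in rows]
```

Keying on the raw givens string only removes exact duplicates; spotting isomorphic copies of the same puzzle would need a canonical form, which this sketch does not attempt.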

Your thoughts?
Cheers!
Humble Programmer
www.humble-programmer.com

gsf

thanks Ed
here's some freshly generated fodder

my generator does what Carla Gomes (quasigroup with holes) calls punching holes:
start with a valid (solution) grid, punch holes in it, and start testing for
desired properties at/after some hole percentage

punching holes tends to converge on valid puzzles faster than starting from empty grids
the grid sequence is the one I mentioned in the unbiased grid thread, fast at the expense of some bias

three files were generated, each with 100,000 puzzles, each with different generation parameters

m1n -- when a puzzle is hit, minimize it, output the first minimal puzzle, and start with a new grid
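The hole-punching idea and the m1n mode can be illustrated with a short sketch. This is not gsf's generator: the names `count_solutions`, `punch_holes_m1n` and `start_holes` are hypothetical, the only "desired property" tested is a unique solution, and the solution counter is a plain backtracking search (fewest-candidates cell first) that stops once it has found two solutions. Minimization removes the remaining clues greedily in the same random order.

```python
import random
from typing import List, Optional

ALL_DIGITS = 0x3FE  # bits 1..9 set; bit d stands for digit d


def count_solutions(puzzle: List[int], limit: int = 2) -> int:
    """Count solutions of an 81-cell puzzle (0 = blank), stopping once `limit` is reached."""
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9
    empties = []
    for i, v in enumerate(puzzle):
        r, c = divmod(i, 9)
        b = (r // 3) * 3 + c // 3
        if v:
            bit = 1 << v
            if (rows[r] | cols[c] | boxes[b]) & bit:
                return 0  # clashing givens: no solution
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit
        else:
            empties.append((r, c, b))

    def search(remaining) -> int:
        if not remaining:
            return 1
        # Branch on the blank cell with the fewest candidates.
        best_k, best_mask, best_n = 0, 0, 10
        for k, (r, c, b) in enumerate(remaining):
            mask = ALL_DIGITS & ~(rows[r] | cols[c] | boxes[b])
            n = bin(mask).count("1")
            if n < best_n:
                best_k, best_mask, best_n = k, mask, n
                if n <= 1:
                    break
        if best_n == 0:
            return 0  # dead end
        r, c, b = remaining[best_k]
        rest = remaining[:best_k] + remaining[best_k + 1:]
        total = 0
        while best_mask:
            bit = best_mask & -best_mask  # lowest remaining candidate
            best_mask ^= bit
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit
            total += search(rest)
            rows[r] ^= bit  # undo the placement
            cols[c] ^= bit
            boxes[b] ^= bit
            if total >= limit:
                break
        return total

    return min(search(empties), limit)


def punch_holes_m1n(grid: List[int], rng: random.Random,
                    start_holes: int = 30) -> Optional[List[int]]:
    """Punch holes in a solution grid; on a hit, return the first minimal puzzle found."""
    cells = list(range(81))
    rng.shuffle(cells)
    puzzle = list(grid)
    # Punch the first batch of holes untested; the grid always remains a solution.
    for cell in cells[:start_holes]:
        puzzle[cell] = 0
    # Start testing once the hole threshold is reached.
    if count_solutions(puzzle) != 1:
        return None  # no hit: the caller starts again with a new grid
    # Minimize: keep each further hole only if the solution stays unique.
    # One pass suffices, because removing clues can never reduce the solution count.
    for cell in cells[start_holes:]:
        saved = puzzle[cell]
        puzzle[cell] = 0
        if count_solutions(puzzle) != 1:
            puzzle[cell] = saved
    return puzzle


# A valid solution grid from a simple shifting pattern (fine for a demo, far from random).
BASE_GRID = [(r * 3 + r // 3 + c) % 9 + 1 for r in range(9) for c in range(9)]
```

A real generator would feed in a fresh grid on every miss and test richer properties (symmetry, rating) than uniqueness alone.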

Red Ed · Posted: Sun Jun 24, 2007 5:30 pm Post subject:

gsf · Posted: Sun Jun 24, 2007 9:34 pm Post subject:

add to the list these puzzles scraped from the players forum hardest thread
combined with others generated by the -gH option of my solver,
all with SE, suexrate, and (my solver) -q1 ratings,
along with the time to rate, providing evidence of the current sad state of rating

q1-taxonomy-2007-06-11.dat

this file is in a CSV (character separated variable length line) form that can be input by my solver
the first line describes the fields
it's also compatible with the std unix text tools
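Because the first line names the fields, a file in this style can be loaded with Python's standard `csv.DictReader`. The sketch below assumes the separator is a single known character passed in by the caller; the post does not say which character the .dat file uses, and the field names in any example are hypothetical.

```python
import csv
from typing import Dict, List, TextIO


def read_rated_puzzles(stream: TextIO, delimiter: str = ",") -> List[Dict[str, str]]:
    """Read a header-first, character-separated file into one dict per puzzle.

    The delimiter is an assumption: pass whatever separator the file actually uses.
    Values stay as strings; convert rating columns to numbers as needed.
    """
    reader = csv.DictReader(stream, delimiter=delimiter)
    return [dict(row) for row in reader]
```

Usage, with the separator left for the reader to fill in: `with open("q1-taxonomy-2007-06-11.dat", newline="") as f: rows = read_rated_puzzles(f, delimiter=SEP)`.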

NewUrbanBlues · Posted: Sat Sep 01, 2007 2:02 pm Post subject:

Hello

I just wonder... There are a lot of references to the top 1465 puzzles, but few statistics on the percentage of them solved by the different programs.