Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Top95: Our common benchmark?
Goto page 1, 2, 3, 4  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku
View previous topic :: View next topic  
Author Message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Fri Oct 28, 2005 12:49 pm    Post subject: Top95: Our common benchmark? Reply with quote

To test our solvers, many of us use the Top95 collection as a benchmark.

Does anyone know the answer to the following 2 questions?

1. How many of the top95 can be solved without backtracking, only with "pure logic" techniques?

2. Are all currently known techniques (with variations) covered in this collection?

If the answer to the second question is "no", it would be nice to have a collection that can be used as a common testset.Cool
For new techniques, when discovered, test puzzles can then be added to this collection.

Ruud.
Back to top
View user's profile Send private message Visit poster's website
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Fri Oct 28, 2005 6:04 pm    Post subject: Reply with quote

these are just the hardest for my exact-cover solver, which
only propagates on naked/hidden singles and else
backtracks on the constraint with fewest candidates.
Average over 100-1000 randomized trials of the same sudoku.

You can use
http://magictour.free.fr/suexrate.exe
to verify the ratings or to rate other puzzles with it.

The top-series is occasionally updated when harder puzzles
are found.
Actually latest is
http://magictour.free.fr/top91
(waiting for gsf to filter his list of 2e8 sudokus ... )

For benchmarking it should be better to use more than 95 puzzles,
and add some, which are hard for _some_ solver, even if they
are easy for others. Just to get a greater variety of hard puzzles.


-Guenter.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Fri Oct 28, 2005 9:24 pm    Post subject: Reply with quote

Can you give an objective definition of "pure logic"?

You might say, "anything but backtracking", but most "logic" techniques beyond singles are implimented with some kind of depth first search, which IS backtracking.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Fri Oct 28, 2005 10:12 pm    Post subject: Reply with quote

Well, I did use quote-quote, acknowledging the fact that the definition may be tricky.

If I had to draw a line (and you now force me to do so), it would be at the point where a "pure logic" technique can still be explained to a human, either by text or by showing a diagram (equivalent of 1000 words).

I think it still comes down to "anything but backtracking"...

Whatever the definition, I'm mostly interested in a collection of puzzles, that can be solved without brute force, and where each known technique (and variety thereof) has at least one example. This list could then be used in 2 ways:

1. As a benchmark for rating puzzles.
2. As a testset for programmers implementing different solving techniques.

Ruud.
Back to top
View user's profile Send private message Visit poster's website
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Tue Nov 01, 2005 5:06 am    Post subject: Reply with quote

You know, I just updated my solver to analyze "top" lists like this, and it found something extraordinary: A significant number of the top95 "hardest" puzzles aren't hard at all. Many of them can be solved with nothing harder than box-line interactions, which makes them medium puzzles at best on a pedestrian scale. (Although I found none that are categorically easy, using only singles, so that's at least something.) Puzzle #1 is criminally simple.
Code:
4 . .|. . .|8 . 5
. 3 .|. . .|. . .
. . .|7 . .|. . .
-----------------
. 2 .|. . .|. 6 .
. . .|. 8 .|4 . .
. . .|. 1 .|. . .
-----------------
. . .|6 . 3|. 7 .
5 . .|2 . .|. . .
1 . 4|. . .|. . .

This puzzle needs no test even as complicated as subsets.

However, I did find that puzzle #75 from the original top95 is an extraordinarily difficult one. My solver rates this as harder than the infamous "toughest known" sudoku:
Code:
. 5 2|. . 6|8 . .
. . .|. . 7|. 2 .
. . .|. . .|6 . .
-----------------
. . 4|8 . .|9 . .
2 . .|4 1 .|. . .
. . 1|. . .|. . 8
-----------------
. . 6|1 . .|3 8 .
. . .|. 9 .|. . 6
3 . .|6 . .|1 . 9

According to my solver, bifurcating chains have to be used frequently and will often fail to find anything in 2- or 3-candidate cells, instead resorting to 4- or 5-candidate cells. It's optimized to go for the cells with the fewest candidates first, so that should tell you something. On a numerical scale based on the tests it calls and the number of times, always trying the easiest first, it rates this puzzle a 461,090, whereas the "toughest known" puzzle scores only 380,170.

Interestingly, puzzle #60 from the top95 is a variant of the TKP, having the same solution and mostly the same givens. Some of the first four singles my solver finds for #60 are givens from the TKP, so it appears that a few of the givens in one are just changed to easy singles in the other, and vice-versa. My solver rates both puzzles identically. Here's the TKP and #60 side-by-side:
Code:
top95 #60              "Toughest known"
. . .|. . .|9 4 .      . . .|. 7 .|9 4 .
. . .|. 9 .|. . 5      . . .|. 9 .|. . 5
3 . .|. . 5|. 7 .      3 . .|. . 5|. 7 .
-----------------      -----------------
. 8 .|4 . .|1 . .      . . 7|4 . .|1 . .
4 6 3|. . .|. . .      4 6 3|. . .|. . .
. . .|. . 7|. 8 .      . . .|. . 7|. 8 .
-----------------      -----------------
8 . .|7 . .|. . .      8 . .|. . .|. . .
7 . .|. . .|. 2 8      7 . .|. . .|. 2 8
. 5 .|2 6 .|. . .      . 5 .|2 6 .|. . .

Interestingly, #75 from top95 has no equivalent in that new top91, in spite of the fact that it appears to kick the crap out of anything else in the top95 set. At the very least it's still a more difficult puzzle than many in the top91, which still is riddled with mass-market mediums.

Truly these sudoku are being sifted on a fairly arbitrary basis. However to the top95's credit, the collection does contain a wealth of puzzles that my solver rates as 200,000+, which are insanely difficult, and 100,000+ which are almost as bad. For anything to get up that high in my solver, it has to require techniques akin to T&E such as bifurcating chains, or something laboriously complex like full tabling (which my solver does not implement). There are also a lot of lesser puzzles which could certainly qualify as hard, but at lower orders of magnitude.

All in all my analysis shows 10 of the top95 definitely don't belong there. Those are #1-3, #23-24, #27, #36, #43-44, and #84. A number of others are there only on the strength of subset tests, and if you exclude anything that uses at most subsets, you'll find even more puzzles worthy of elimination from the set.
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Tue Nov 01, 2005 5:33 am    Post subject: Reply with quote

I explained earlier in this thread, what are the criteria to be included
in these topxx-lists, so I won't remove sudokus from that list because they are easy for some solver.
Yes, some of these are easy with technics which are only slightly
above FN (singles), but require lots of steps when using FN only.

One day I'll maybe make another list which uses a solver
which includes more technics (as long as those provide
speed-increase in average)

The problem that some puzzles which are hard for one solver are
easy for another one is quite common,
so we better test this on a list of about 100 sudokus
and then take the average.

Please post other lists of hardest sudokus to the puzzles-section !

There is also a larger list at:
http://magictour.free.fr/top2365
you can also test the old lists : top100,top99,...,top91

-Guenter.
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Tue Nov 01, 2005 5:52 am    Post subject: Reply with quote

But this goes to show that hardness for DLX is quite unrelated to hardness by other means. Dancing links is bound to go for some constraints at times that are the long way around to an otherwise clear conclusion; it's not operating on any heuristic except to pick the most constrained column, and in the absence of a clear choice will just pick the first one. It's interesting that many truly astoundingly difficult puzzles are caught by this method, but then I also wonder how many it's missing.

Far more interesting would be a magic cell analysis. At each post-single stage, how many magic cells exist? How many stages on average does it take to reach a conclusion? Easier puzzles should, in theory, have more weak points and lead to quicker deductions from them.
Back to top
View user's profile Send private message
dukuso

Joined: 14 Jul 2005
Posts: 424
:
Location: germany

Items
PostPosted: Tue Nov 01, 2005 6:01 am    Post subject: Reply with quote

Lummox JR wrote:
But this goes to show that hardness for DLX is quite unrelated to hardness by other means. Dancing links is bound to go for some constraints at times that are the long way around to an otherwise clear conclusion; it's not operating on any heuristic except to pick the most constrained column, and in the absence of a clear choice will just pick the first one. It's interesting that many truly astoundingly difficult puzzles are caught by this method, but then I also wonder how many it's missing.

Far more interesting would be a magic cell analysis. At each post-single stage, how many magic cells exist? How many stages on average does it take to reach a conclusion? Easier puzzles should, in theory, have more weak points and lead to quicker deductions from them.



the same seems to be true for "hardness by other means"
vs. "hardness by still other means", so IMO it's not DLX-specific.
When some technique can crack some problem easily,
that doesn't still mean that it's useful(=fast) to check this
technique in average sudokus.

However, I _do_ expect that there are some techniques which
would be worth to be considered i.e.
"single unit candidate" in Nick70 notation.

Although I expect implementing this will decrease the rating
of most puzzles, it will probably still improve on average.

-Guenter
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Tue Nov 01, 2005 12:07 pm    Post subject: Reply with quote

Lummox JR wrote:
Interestingly, puzzle #60 from the top95 is a variant of the TKP

I already noticed this in this topic.

I've looked through old topics, here and on the players forum. Turns out that the first posting of TKP had 28 clues, but it isn't minimal. There are at least 3 minimal forms that can be made from it.

I also noticed that puzzle #75 in the top95 is a tough one. It remarkably slowed down my tabling implementation. I wonder how you would rate #14 in the list? It also required a lot of tabling to be solved.

Upto this point, the only puzzles I cannot solve without backtracking are #9, #60 and #69.

Ruud.
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Tue Nov 01, 2005 12:17 pm    Post subject: Reply with quote

dukuso wrote:

When some technique can crack some problem easily,
that doesn't still mean that it's useful(=fast) to check this
technique in average sudokus.


I suppose you could say, you rate the difficulty of a technique by how long it takes a computer to perform it. Which does seem a much more effective metric than just assigning arbitrary weights.
Back to top
View user's profile Send private message Visit poster's website
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Wed Nov 02, 2005 4:35 am    Post subject: Reply with quote

Ruud wrote:
I also noticed that puzzle #75 in the top95 is a tough one. It remarkably slowed down my tabling implementation. I wonder how you would rate #14 in the list? It also required a lot of tabling to be solved.

Code:
top95 #14:
3 . 6|. 7 .|. . .
. . .|. . .|. 5 1
8 . .|. . .|. . .
-----------------
. 1 .|4 . 5|. . .
7 . .|. . .|6 . .
. . .|2 . .|. . .
-----------------
. 2 .|. . .|. 4 .
. . .|. 8 .|3 . .
. . .|5 . .|. . .

Does your solver implement uniqueness tests? I don't know to what extent it makes the difference here, but it was present in my solver's solution, and I find uniqueness tests often greatly reduce the severity of a puzzle (except the B form, which often finds only trivial eliminations). The steps my solver took, in order, for anything harder than subsets were: Uniqueness 4, coloring by 8, coloring by 9, and advanced coloring--one round of each. Ultimate difficulty rating in my solver's metric: 21970. This is the sort of puzzle that, with some time to kill and Sudoku Susser loaded to help with some of the grunt work, I'd be willing to solve "manually". When it comes to advanced coloring I paste the candidate list into a text editor and assign colors myself.
Back to top
View user's profile Send private message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Wed Nov 02, 2005 12:16 pm    Post subject: Reply with quote

Currently, I only have implemented uniqueness test 1. I'm trying to get a complete picture (using your and MadOverlord's discussions) to implement the other types too.

My solver does solve this #14, but the progress was very slow, until I added a check for Magic Cells.

Now it solves it very quickly:

-basics (singles + line-box)
-Hidden triple (2,4,5) in row 5
-Coloring by 8 (2 eliminations)
-r2c1 Magic Cell for value 9
-complete with basics

I know that these Magic Cells are like cheat-codes, but it saves soooo much time...

Ruud.
Back to top
View user's profile Send private message Visit poster's website
Bob Hanson

Joined: 05 Oct 2005
Posts: 187
:
Location: St. Olaf College

Items
PostPosted: Thu Nov 03, 2005 5:26 am    Post subject: Reply with quote

The Sudoku Assistant finds this one pretty easy (37 steps):

http://www.stolaf.edu/people/hansonr/sudoku?-3,5,-6,8,-7,1,2,9,4,9,7,2,6,4,3,8,-5,-1,-8,4,1,9,5,2,7,3,6,2,-1,3,-4,6,-5,9,8,7,-7,9,4,3,1,8,-6,2,5,6,8,5,-2,9,7,4,1,3,1,-2,8,7,3,6,5,-4,9,5,6,9,1,-8,4,-3,7,2,4,3,7,-5,2,9,1,6,8

The only notable things I see there are:

Code:


Step 29 Strong Chains:  1  2  3  4  5  6  7  8  9  10

36 cells left to solve
Cross checking
Checking block ranges
Checking for subset elimination
Checking strong chains
11 strong chains
Checking for weak links
26 weak links
94 weak corners
r8c1 ISN'T 4: incompatible weak links to chain 2 with +(1) parity
r9c6 ISN'T 4: incompatible weak links to chain 2 with +(1) parity
r9c1 ISN'T 9: incompatible weak links to chain 2 with +(1) parity

Step 26 Strong Chains:  1  2  3  4  5  6  7  8  9  10  11  12  13  14

38 cells left to solve
Cross checking
Checking block ranges
Checking for subset elimination
Checking strong chains
15 strong chains
Checking for weak links
52 weak links
157 weak corners
Hypothesis and Proof

Logical analysis: Chain 2(0)
a: The hypothesis is this: Chain 2(0) can be eliminated.
d: found either state for r8c1#4 leads to the same conclusion
The hypothesis is proven. QED Table
r8c1 ISN'T 1: Chain 2(0) may be eliminated.
r7c1 ISN'T 9: Chain 2(0) may be eliminated.


Step 25 Strong Chains:  1  2  3  4  5  6  7  8  9  10  11  12  13  14

38 cells left to solve
Cross checking
Checking block ranges
Checking for subset elimination
Checking strong chains
15 strong chains
Checking for weak links
49 weak links
155 weak corners
r5c6 ISN'T 3: weak 0/1 corner between linked 1/0 strong chains 1 and 3
r5c5 ISN'T 3: weak 1/1 corner between linked 0/0 strong chains 3 and 12

Step 21 Strong Chains:  1  2  3  4  5  6  7  8  9  10  11  12  13  14

38 cells left to solve
Cross checking
Checking block ranges
Checking for subset elimination
Checking strong chains
15 strong chains
Checking for weak links
24 weak links
125 weak corners
r6c2 ISN'T 6: weak 1/1 corner between linked 0/0 strong chains 3 and 8



Step 17 Strong Chains:  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18

41 cells left to solve
Cross checking
Checking block ranges
Checking for subset elimination
Checking strong chains
19 strong chains
Checking for weak links
33 weak links
176 weak corners
r6c9 ISN'T 8: incompatible weak links to chain 14 with +(0) parity


The rest are just some block exclusions, hidden subsets, and the like.
A nice variety, I guess.
_________________
Bob Hanson
Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Thu Nov 03, 2005 5:37 am    Post subject: Reply with quote

Ruud wrote:
Currently, I only have implemented uniqueness test 1. I'm trying to get a complete picture (using your and MadOverlord's discussions) to implement the other types too.

Uniqueness 2 and 4 are pretty easy to implement; it's 3 that's the killer.

In 2, finding the floor cells is easy because you know they'll each have 3 identical candidates. In 4, the extra candidates don't matter and you can focus instead on finding whether one of the roof candidates is constrained to just the floor cells. Overall, 1 and 4 seem to be the most useful uniqueness tests.

For form 3 you basically need a good general-purpose subset solver and the ability to lie to it.
Back to top
View user's profile Send private message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Thu Nov 03, 2005 6:22 am    Post subject: Reply with quote

Uniqueness test 2 (A&B) are now implemented and tested.

For test 3 I implemented the naked subset part, had to struggle through your discussions to grasp the hidden subset part, but I think I got the concept now.

Test 4 initially looked a little vague, but I now see that I have to look for a conjugate pair in the roof cells for any of the 2 "deadly digits". This is indeed easy. Maybe I do that before the hidden subsets in type 3.

Problem with the Ultimate Guide is that many of the samples that MadOverlord provided are thwarted by lower level techniques.

This one (I filled it with the bound cells from the candidate grid, since the original grid was not included)

Code:
9 3 6|. . .|. 7 5
8 7 5|. . 3|. . 1
1 4 2|7 5 6|3 8 9
-----+-----+-----
. . 7|. . .|8 . .
5 1 4|8 . 2|7 . 6
. 8 9|. 7 .|. . .
-----+-----+-----
4 . .|. . 7|9 . 3
. . .|. 4 .|. . 7
7 9 .|2 . 5|. . 8


It should have been an example of type 2, but a hidden subset (1,5) in column 7 spoils the fun by reducing it to a type 1.

This one:

Code:
2 9 .|3 . .|. . .
. . .|5 2 9|. 8 .
3 5 .|. . 1|4 9 2
-----+-----+-----
1 . .|9 . 5|8 . .
8 7 .|. 1 2|. 3 .
9 . .|7 . 8|. . 6
-----+-----+-----
4 2 9|1 5 3|7 6 8
5 8 .|2 . 7|. . .
. . .|. . .|. . 5


It should be a type 2B example, but a (beautiful) jellyfish is swimming in the way.

This #82 from top95 is a better example for type 2:

Code:
. . .|. . .|. . .
. 5 7|2 4 .|. . 9
8 . .|. . 9|4 7 .
-----+-----+-----
. . 9|. . 3|. . .
5 . .|9 . .|1 2 .
. . 3|. 1 .|9 . .
-----+-----+-----
. 6 .|. . .|2 5 .
. . .|5 6 .|. . .
. 7 .|. . .|. . 6


So far, I've been able to find only one usable example for type 3-naked. Given the complexity, I should use more test samples before I can really say it works as required.

On a philosophical note:

When a uniqueness test can do eliminations both in a line and a box, and the line eliminations have a cascading effect that 'destroys' the deadly pattern, can you still do the box eliminations as if the pattern were still there?

Ruud.
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku All times are GMT
Goto page 1, 2, 3, 4  Next
Page 1 of 4

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron