what is the hardest known suduko ?

Doug · Posted: Thu Aug 11, 2005 9:23 am Post subject:

There are two basic approaches to computing difficulty of sudokus.

1) Try to gain some idea of the hardest logic required for solution.

2) Measure how fast a computer using some form of backtracking can solve the puzzle.

Now, when 2) has intelligent backtracking, with simple logic rules being used when possible, it gives results (on average) that are close to 1).

The problem with 2), of course, is that its non-canonical; different choices made by the backtracker result in different difficulty measures-- and there are almost always arbitrary choices in the backtracker methods.

One notes that backtrackers, especially intelligent ones, run very quickly.

One way to make the backtracker estimates more meaningful is for them to be run a number of times, randomizing when they have choices to take and averaging their numbers. This should give a good indication of puzzle difficulty. Since the backtrackers run so fast, this is pretty much invisible to a person and should result in pretty good measures.

Regarding approach 1):

It seems to me that what most people want to know is how hard it is for a person to reason out the solution. This comes down to asking what is the length of the longest chain of implications that a person would have to use at some point to solve a given puzzle.

But this problem is more subtle than it first appears.

Immagine that at "state n" of solving the puzzle a set of 4 implications is required (minimum) to make a deduction about the puzzle, say an elimination or actually determining the value of a cell. Also assume that at state n all other implications that give an advance in the puzzle require 4 or more implications. So it would appear that the best choice is to pick the logic that requires only 4 implications. But this might not be the case!

It might be that after doing the 4 implications we get to state n+1 where the shortest number of implications that will give progress is, say 10. But it might be the case that back at state n if we had chosen a set of implications of lenght 5, the next state would only require 2 implications for progress.

What I'm trying to say is that locally minimizing the logic during the solve doesn't always show the difficulty of a sudoku. In mathematical terms, the order in which you solve for information is not commutative.

I look at it this way, if God were solving the sudoku by logic, how would he do it? The goal should be that the step in the puzzle with the longest logic required should be minimal.

Why does one care about the solution with shortest logic? Simply because that tells you that any other solution by logic will require MORE work. That's why knowing the path with the minimal logic shows what happens in a best case scenerio. You can be sure the puzzle is "at least that hard!"

To attack this, you first need an algorithm that "finds all logic". silps original proposal, before he hamstrung it by removing advancement by contradiction, looked like a good start to me. MadOverlord's Tables, when fixed, should be pretty much the same. His solver is also publically available. (I mention this only because as a non-coder, its the only way I have of exploring hard sudokus: the other public solvers, that I'm aware of, are weaker.)

Given a solver that is very powerful in finding logic, the next step is to have it to search ahead over a number of logical steps, finding sequences that minimize the longest step over the search window. If you can do it for the whole puzzle, you have found its logical difficulty.

This brings up an important theoretical difference between measures of type 1) and 2). If you tried to minimize the "length of solutions" when backtracking, the answer would always be the number of unsolved squares at the start of the puzzle. (In the best case scenerio, you would make all the right choices while backtracking.) This would tell you nothing about the logical difficulty of the puzzle.

With measures of type 1) minimal logics for progress are well-defined and measure the amount of reasoning that will be needed to solve the puzzle. Type 2) at best only can give an approximation of this. How good of an approximation remains to be seen. I must say, though, that I've been impressed with dukuso's measures from his type 2) solver. They correspond very well to what I've gotten from the same puzzles with MadOverlord's Sudoku Susser.

Another idea which MadOverlord suggested to me (I'm paraphrasing), is to consider the ratio of logical paths that result in no progress to the total number of logical paths for each state in the puzzle. Still one would want to minimize this over the whole puzzle to get some measure of puzzle difficulty. The higher the ratio, the harder the puzzle.

That's my take.

Cheers!
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

MadOverlord · Posted: Thu Aug 11, 2005 9:42 am Post subject:

Actually, this may no longer be true. Doug Bowman and I had a breakthrough on the tables algorithm tonight. We can now crack all known puzzles without any guessing; purely by logical deduction.

As such, it may be possible to create a metric using the dimensions of the table size needed to solve the puzzle with the algorithm.

It's going to take a few days to get my code cleaned up and write up a full explanation of the algorithm, but it is human-executable [assuming you have a lot of time and a lot of pencils and scrap-paper]

dukuso · Posted: Thu Aug 11, 2005 10:36 am Post subject:

why do people try to disqualify "backtracking" as improper ?
Just because humans often have difficulties with it ?
(Although I also read messages here, where people did
backtracking of level >20 with paper and pencil )

Maybe better teach humans how to backtrack rather than
teach computers how to avoid it !

The best method is the one which is fastest.
Which needs the fewest steps. Anyone disagrees ?

When wily Coyote has finally examined all furcations
using Jellyfish etc. to find the right way , then
Roadrunner has already tried out all the forks !

When some of the sudoku rules are useful, then
backtrackers can also use them to make their
programs faster and this fastest program should
be used for measuring "hardness" more scientifically.
You could still use other measures in local newspapers,
according to what supposedly matches their readers
fellings better. But they should somehow mark it
as "non-universal" hardness-measuring.

Guenter.

Doug · Posted: Thu Aug 11, 2005 10:36 am Post subject:

@MadOverlord

What may no longer be true? Something I said, or someone else? What that I said? I said a lot of things....

Regarding tables: it will be interesting to see just how large they can get if you use them exclusively from the start of a puzzle. If they can fit into ram, say within a gigabyte or so, then super accurate measurements of logical difficulty can be given.
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

Doug · Posted: Thu Aug 11, 2005 10:41 am Post subject:

@dukuso

In fact I teach people to backtrack by hand at university. If you are refering to what I wrote, read it carefully. I said that your measures using backtracking are a) Fast and b) seem to give about the same ranking as pure logic measures. I'm just after the EXACT logical measure. That's a different thing and not something that backtracking gives. Call it a mathematical curiosity of mine. I hope it's ok for me to have it.
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

dukuso · Posted: Thu Aug 11, 2005 11:35 am Post subject:

Doug · Posted: Thu Aug 11, 2005 11:52 am Post subject:

Guenter,

Acutally, I love to see you so actively pushing your position here. It helps to ballance out those who think that "proof by contradiction" isn't logic.

You are sort of the opposite extreme! In fact, the people who demonized "proof by contradiction" did it, if I understand the history of this issue correctly, (and that's a big if since I've only been part of this community a little while), because they associated it with backtracking! How rediculous! What's worse, people seemed to just accept it in order to not make waves. Like silps removing the contradiction logic from his logic solver, statements others keep making....

I'm personally, somewhere in the middle.

I think there is absolutely nothing wrong with proof by contradiction.

I also think that backtracking is certainly logic. (D'uh-- it is folks!)

I think what the issue comes down to is this:

What is the purpose of the solver you are writing?

I see three main ones:

a) Just get the damn answer as fast as possible.

b) Get an answer and give steps that people can follow. Make the steps as easy as possible.

c) Investigate the mathematics of sudoku.

Now, backtracking can contribute to all three of the above!

But it doesn't rule out logic solvers from doing b) and c). And they can do things in b) and c) that your backtracker cannot do. Period. As a mathematician I am primariely interested in c), with some interest in b). (I'm a teacher too and I love to see applications that can train people in reasoning, ok?)

There doesn't need to be some stupid political battle about solving methods; they all have their place. Even backtracking.

Edit: Oh, regarding the thread here. My purpose has been to discuss difficulty metrics. There are many possible and refined backtrackers have been noted. But there are other possibilities that people are interested in. And they are all "scientific".
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

Merri · Posted: Thu Aug 11, 2005 2:04 pm Post subject:

Finally taking part in the actual discussion... I'm pretty simple with my mind and bad in theoretical level of mathematics (thus I have little or no interest in the mathematics of sudoku), but I did wonder what is the hardest. And after reading about the 17 starting numbers sudokus, I figured out some of these must be the hardest one can find. My simple reasoning: they cause the most work for the computers as there are 64 unknown cells in the beginning. The hardest of the minimal sudokus are also clearly the hardest sudokus I have found: my solver can solve an easy sudoku within 0.018 milliseconds, but the hardest minimal sudoku takes a whopping 10 ms.

Technically my solver is a backtracker with simple logic. The logic can be easily expanded if the need be. However, any additional logic must stay simple, because otherwise the speed gain given by the logic isn't very remarkable. The only thing any additional logic can do is to reduce the amount of required backtracking.

dukuso · Posted: Thu Aug 11, 2005 3:19 pm Post subject:

some further thoughts:

I see that there could be some difference in demonstrating a
solution-path in the easiest way, and actually finding it.

suppose you have the choice:
(1) one method is expected to find the solution in 1 ms
but needs two pages to explain it to the audience
(2) another method is expected to do it in 2 ms but
can explain the solution on one page.

which one is better ?
how should the puzzle be rated ? By (1) or by (2) ?

When it comes to the mathematics, then 9*9 sudoku-solvers
would probably not be very interesting.
Mathematicians are interested in the asymptotics,
what happens for larger n.
And then, I assume that the larger sudokus which can be governed
by the developed 9*9-rules are only a tiny fraction of all sudokus.
When it is just too unlikely that a puzzle falls into some
category, it doesn't make sense to examine it for that.
Even if you could solve it faster when you detect the property,
the total expected time for that subroutine could rise.

I'm thinking at the related and well examined
quasigroup completion problem here.
These can be solved for 36*36 and larger, I'd like to
see a comparison with sudokus here !

Guenter.

Doug · Posted: Thu Aug 11, 2005 4:24 pm Post subject:

dukuso said:

"Mathematicians are interested in the asymptotics,
what happens for larger n. "

That you make such categorical statements (and very false) about people makes you look a lot like a troll. Either that, or very ignorant, or someone with an axe to grind.

For the sake of others:

Mathematicians are interested in a lot more than asymptotocs. I'll take the field of combinatorics as an example. Combinatorialists are interested in existence of configurations of finite sets: "Do there exist 16 value starting configuration sudokus with unique solutions?. They are interested in optimal configurations of finites sets: What is the minimal number of starting numbers for a sudoku with a unique solution? or What is the sudoku with the most complex logic required for its solution? They are interested in enumeration of configurations of finite sets. Exactly how many 17 starting number 9x9 sudokus are there with unique solutions? or For large n, what is an exact formula for the number of unique (up to the natural group action) completed sudoku boards of size nxn are there. Failing an exact formula, give the asymptotics. And those are just the most obvious ones arising from combinatorics, one field of mathematics. Other fields ask yet other questions.

Now, on to a major reason why mathematicians are interested in sudoku.

You said:

"When it comes to the mathematics, then 9*9 sudoku-solvers
would probably not be very interesting.

...
... (excising your incorrect statement above)

And then, I assume that the larger sudokus which can be governed
by the developed 9*9-rules are only a tiny fraction of all sudokus.
When it is just too unlikely that a puzzle falls into some
category, it doesn't make sense to examine it for that.
Even if you could solve it faster when you detect the property,
the total expected time for that subroutine could rise. "

Actually this is precisely the reason why the logical complexity and the logical rules of sudoku should be investigated thoroughly at the 9x9 size.

As n increases, nxn sudoku is known to be NP-complete. But the P=NP problem is one of the most important ones in mathematics and computer science. If some sufficiently general and powerful logical approach can be found that extends to all n, some light might be shed on this issue. Conversely, understanding the logical difficulty at n=9 might also begin to show why the problem is so hard, or give some insight.

Saying "I think I'll give up and just do backtracking cause the problem is NP-complete" is just sticking your head in the sand. I prefer to swim in the water.

Enjoy the taste of sand!
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

Merri · Posted: Thu Aug 11, 2005 4:47 pm Post subject:

I think I just got an idea. Because for any backtracker looking through all the possible solutions takes a while, why not count up all the solutions a certain problem can give. Then use this to divide the used time or required loops and this will be the difficulty rating? Very easy sudoku doesn't take long to solve. Also, multiple solution sudokus are easier for a human to solve (and computer as well).

I guess you with more knowledge about sudokus prove this wrong. I'm too tired to see a big flaw in this idea...

Edit And right after posting I noticed a "real" sudoku doesn't have multiple solutions... oh well, I hope there is atleast some kind of idea in this.

MadOverlord · Posted: Thu Aug 11, 2005 4:52 pm Post subject:

Doug, we posted at the same time. I was replying to the message prior to you; what I was saying is that now that we have Tables working well, we don't need to worry about backtracking, since we can solve the puzzle by pure logic.

As for how large the tables get, for a typical puzzle, solving from the start, they seem to get up into the 10-15K implications range. In practice, I stop tabling when I've gotten several reduction or solve results, or 8 rounds after finding the first result, so that I can reduce the puzzle and use simpler techniques. My guess is that the toughest puzzles will require 25-30K inferences to crack at present.

However, there's cruft in the tables algorithm at present; we added some heuristics in the process of finding the important ones that probably aren't needed. So it may be possible to reduce the # of implications needed to solve a puzzle considerably.

My job over the next few days is to clean things up and find what stuff is not needed.

One approach that has occurred to me is that once we find a set of assertions and implications that crack a puzzle, we try and resolve it with individual assertions deleted and see if it still cracks. That might get us closer to the minimum number of implications needed to solve.

Merri · Posted: Thu Aug 11, 2005 4:58 pm Post subject:

The only question that arises to me in that is the speed. Knowing about speed optimization, it seems (at the moment) that methods that aren't very simple are slow to execute. Though, I guess I'd need to know more about the tables and actually understand how it works. Too limited time to get into it (I tried reading up on supercoloring, but I just couldn't follow it well... theories aren't my stuff).

Doug · Posted: Thu Aug 11, 2005 5:08 pm Post subject:

MadOverlord--

I think one needs to do more than the entire crack from start to finish as a single table in ram in order to accurately measure hardness. One probably wants to let tables run till it can no longer make any deductions. One can view the table as a directed graph with the vertices being statements and the edges being implications. Lots of different ways of measuring logical difficulty of the puzzle will be readable from the graph.

How large will the table be after it runs out of deductions, long after the solution has been found?
_________________
Doug Bowman
www.flickr.com/photos/bistrosavage

MadOverlord · Posted: Thu Aug 11, 2005 7:49 pm Post subject: