can this 16*16-puzzle be solved ?

dukuso · Posted: Fri Nov 11, 2005 7:30 am Post subject: can this 16*16-puzzle be solved ?

there can be multiple solutions, so please no
uniqueness tests.
With DLX it seems not possible to solve it in reasonable time.

xyzzy · Posted: Fri Nov 11, 2005 9:11 am Post subject:

No solution
Time to solve 976 us, required 0 guesses and 92 applies

dukuso · Posted: Fri Nov 11, 2005 9:34 am Post subject:

Moschopulus · Posted: Fri Nov 11, 2005 10:20 am Post subject:

dukuso · Posted: Fri Nov 11, 2005 1:30 pm Post subject:

OK, there can't be a solution because:

1) symbols 5,G,B in block 7 must be in cells 79,89,8B
2) symbols 4AEF in block 7 be in cells 5A,6A,7A,8A
3) symbols A,B,E,F,G in block 3 must be in cells 1B,3B,49,4B

but 4 cells can't hold 5 symbols

-----------------------

I converted this into a SAT-instance (4096 variables, 110939 clauses)
and "SATZ" solved it in 2sec.
"SATZRAND" couldn't solve it within some minutes, when I stopped it.

------------------------

what I wanted is constructing hard unsolvable 16*16 (25*25?) puzzles,
which almost any solver will fail to proof unsolvable.
Ideas ? Has it been done for QWHs ?

Lummox JR · Posted: Fri Nov 11, 2005 7:53 pm Post subject:

I'm wondering if it's DLX itself that can't solve that in a reasonable time, or just your implementation. I've been using an implementation similar to yours that, importantly, does not use the full method of actual double-linked lists as Knuth described. Instead it uses arrays to show if a column/row is covered or uncovered. I tried my solver on 16x16's and it took forever; the actual generating process was interminable because it called dancing links multiple times, and that in turn took seconds at a stretch.

However, a lot of time is spent skipping past columns and rows that have already been covered. In essence all steps take O(4N^2) time to find a column, and O(N) to traverse it. In Knuth's algorithm, this doesn't happen. Each column search takes O(4j^2), where j is the number of uncovered columns, and each row traversal is only O(i), i being the number of uncovered rows in that column. The difference is probably dramatic even at a 9x9 level, but because 9x9's are so fast we don't worry about it. At 16x16 the rules change, though, since the whole algorithm has a complexity of roughly O(k^N) (k being the number of uncovered rows per constraint, on average, at each step) and you've just made it much slower by picking a higher N.

By using linked lists, a full search of the columns or rows will never include those already covered. This could be a huge timesaver, particularly in later steps and when backtracking over singles. While O(k^N) still applies, the algorithm is not wasting its time skipping over columns or rows it knows do not apply. When you consider the time it spends, proportionally, at the wider end of the search tree where more of those covered columns/rows exist, it's clear how skipping them could be a huge timesaver.

Your method (and therefore mine as well), while representing sudoku in an exact cover format, isn't technically using dancing links because nothing is linked.

[edit]
Correction: I believe the complexity is actually O(k^(N^2)).

xyzzy · Posted: Fri Nov 11, 2005 10:23 pm Post subject:

dukuso · Posted: Sun Nov 13, 2005 6:41 am Post subject:

>I'm wondering if it's DLX itself that can't solve that in a
>reasonable time, or just your implementation.

I think, it's the algo. Too obvious the timing differences between
the instances.

>I've been using an implementation similar to yours that,
>importantly, does not use the full method of actual double-linked
>lists as Knuth described. Instead it uses arrays to show if a
>column/row is covered or uncovered.

yes, I'm doing this too

>I tried my solver on 16x16's and it took forever;
>the actual generating process was interminable because
>it called dancing links multiple times, and that in turn
>took seconds at a stretch.

the problem arises i.e. with some special underconstrained puzzles.
Solving unique-solution sudokus is not that slow typically.

>However, a lot of time is spent skipping past columns and rows
>that have already been covered. In essence all steps take O(4N^2)
>time to find a column, and O(N) to traverse it. In Knuth's algorithm,
>this doesn't happen. Each column search takes O(4j^2), where j is the
>number of uncovered columns, and each row traversal is only O(i),
>i being the number of uncovered rows in that column. The difference
>is probably dramatic even at a 9x9 level, but because 9x9's are so
>fast we don't worry about it. At 16x16 the rules change, though,
>since the whole algorithm has a complexity of roughly O(k^N)
>(k being the number of uncovered rows per constraint, on average,
>at each step) and you've just made it much slower by picking a higher N.

I partly managed this by remembering the columns with one or zero
entries. This gave a factor of 5 for 9*9 and I figured out that after
that improvement only 20% of time was spent with walking through columns
while skipping the marked ones.
But you're right, it's worse on 16*16 with 1024 columns,
I just tested it.
Also multisolution puzzles where most time is spent with few clues
and many marked columns are worse, see the "nodes per second" - thread.

I thought with all the links, albeit faster, it's not so easy
to include new ideas or make changes.
Instead of linking, we could maybe also keep an updated list of
columns with two entries

>By using linked lists, a full search of the columns or rows
>will never include those already covered. This could be a huge
>timesaver, particularly in later steps and when backtracking
>over singles. While O(k^N) still applies, the algorithm is not
>wasting its time skipping over columns or rows it knows do not
>apply. When you consider the time it spends, proportionally,
>at the wider end of the search tree where more of those covered
>columns/rows exist, it's clear how skipping them could be a huge timesaver.

yes, but once you reached that point then the algo is no longer important.
Any method will solve it then. So I don't care a lot, what happens.

>Your method (and therefore mine as well), while representing
>sudoku in an exact cover format, isn't technically using dancing
>links because nothing is linked.

yes. Let alone danced.
Or put it otherwise : "dancing links" isn't an algo , it's
only an implementation-trick to make the exact-cover-algo faster.

But the problem of this thread with the posted puzzle is
of another nature.
Normal puzzles are solved in a fraction of a second but this one
takes more than 20min, maybe hours or days or years.
The reason why there can't be a solution is just not recognized
and it walks through a large searchspace just to end with
similar contradictions after about 150 placements again and again.
A human would easily recognize what the problem is after
some failed attempts.

BTW. the hardest problem in my top95 list is easy for other solvers
or humans because they also consider columns with 3 entries
even when there are columns with 2 entries available.
DLX would _never_ do this.

Now, can this be resolved ?
Of course we can implement more rules and this would probably
solve the problem for most 16*16s in practice. But you can probably
always construct puzzles to fool any solver with things like this.
Do there exist 16*16-puzzles where xyzzy's or other's solvers takes hours ?

We know about the average solution-time behaviour of these
problems for random instances as a function
of the number of clues=givens.
We get a phase transition at about 1.7*n^1.55 clues for random QCP
which is 51 givens for 9*9 or 125 givens for 16*16.
Almost all puzzles with more clues than that are solvable,
while almost all problems with fewer clues are unsolvable
and puzzles with that peak number of clues take most time.

That is for random puzzles, but what about the worst case behaviour ?
Is this known ?

my idea to design hard puzzles
for sudoku or QCP with n^4 cells :

choose a n-rookery at random uniquely defined by x positions inside it
and y positions outside.
Put symbols from {1..n} at the inside-positions
until it has no solution. Solve it, measure the time needed
by the solver, repeat this several times and choose the hardest
unsolvable n-rookery configuration found.
Now assign values to the y outside clues such that the (n*n-n)-
outside rookery has lots of nonobvious solutions.
Try several times to make a hard but solvable outside-puzzle.
Take a larger clues-density for outside than for inside.
Finally permute the digits randomly.

Now, the solver can't go the same way since it doesn't know which
rookery was chosen and there is an exponential number of choices.
It places clues according to some rules but doesn't find the
impossibility proof inside the designed rookery, because it
was made hard to detect, anticipating the solver's strategy.
So the solver must walk through many of the (exponentially many)
outside-solutions before it detects the impossibility to
make the inside valid.

-Guenter.

Lummox JR · Posted: Tue Nov 15, 2005 5:39 am Post subject:

My concern isn't strictly with solving a unique-solution grid. Rather, I'm using dancing links as a tool for building the grid. Any good puzzle creation algorithm I've seen relies on a solver that can tell it if it has a unique solution. This involves a few iterations, and improving the solve time on a 16x16 with multiple solutions--which, actually, is just the time it takes to find two solutons and therefore reject it as a complete puzzle--would help that tremendously.