Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Large families of resolution rules

 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku
View previous topic :: View next topic  
Author Message
berthier

Joined: 13 Jun 2007
Posts: 43
:
Location: Paris, France

Items
PostPosted: Thu Jun 14, 2007 6:38 am    Post subject: Large families of resolution rules Reply with quote

In my book ("The Hidden Logic of Sudoku") and in the associated solver (SudoRules), I consider 4 and only 4 families of resolution rules.

A) First it may be useful to state formally what I call a resolution rule. It is a logical formula which:
1) is a logical consequence of the 4 constraints defining the game,
2) has the form of a production (or condition-action) rule; the action part can only be the addition of a value or the deletion of a candidate.
Rules of this form are "operational": they tell you what to do, whereas a constraint only specifies a desired final state.
All the rules debated in this forum have this form (or can be rewritten so).

This approach is motivated by my goal: modelling the rules used by human players.

Notice that the 4 constraints defining the game are obviously NOT resolution rules. In order to get a resolution theory (a set of resolution rules), they have to be repalced by some of their logical consequences. The problem is that no complete set of resolution rules equivalent to the initial problem is known. This is probably the reason why we are all talking of Sudoku.



B) Now for my four families. Trying to find a limited number of rule types was motivated by two reasons:
- one practical: if a player has to search a puzzle for hundreds of different patterns, this may make the game a little tedious (but I understand that this might also be part of the fun),
- one theoretical: many rules are proposed, but the relationships between them and their usefulness are far from being clear.

So I have:

1) the family of the elementary constraints propagation rules. The word "constraint" in the name may be misleading, but they are indeed resolution rules in the above sense (they are an obvious operational reformulation of the 4 initial constraints).

2) the family of the four well known interaction rules: block --> row, row --> block, column --> block and block --> column

3) the family of subset rules. Naked Single, … Naked Quadruplets and their Hidden and "Super Hidden" counterparts (known as X-Wing, Swordfish and Jellifish). This family is proven to be strongly closed under all the symmetries of the game: no other resolution rule can be obtained from them by any symmetry. Apart from this result, there is nothing new in these three families.


4) The family of chain rules. This is where things get interesting. This family can be split into three sub-families:
a) xy-chains and their extension xyt-chains,
b) xyzt-chains,
and c) c-chains,
each of these sub-families being completed with the hidden counterparts (hxy, hxyt, hxyzt chains - there are no hidden c-chains). There are no super-hidden chains (more exactly, hidden chains have a symmetry property that makes them equal to their super-hidden counterparts). Chains have a type and a length.


(Although I have some rules for uniqueness and T&E, I do not consider them here and in the following).



C) Families 3 and 4 are stratified according to the number of cells necessary to define the associated patterns (in the proper representation space). This gives the following definition for my levels. Every level is defined in such a way that it is closed under all the symmetries of the problem.

Level 1_0 is only family 1 + Naked Single and Hidden Single
Level 1 adds the interaction rules
Level 2 adds the subset rules for pairs
Level 3 adds the subset rules for triplets and the chain rules for chains of length 3 (equivalent to XY-Wing and XYZ-Wing)
Level 4_0 adds the subset rules for quadruplets
Level 4 adds all the chains rules for chains of length 4
For any n > 4, level n adds to the previous level the chains rules for chains of length n.



D) Finally, what seems interesting here is that with a very limited number of rule types I get the following results (with all the chains being limited to length 13):
- 99.68% of the Royle collection of 17-minimal puzzles are solved;
- 97% of the randomly generated puzzles are solved;
- also, as appears from my "online supplements" to the book, lots of puzzles designed to illustrate additional rules one can find in this forum can be solved using only my set of rules.
Notice that these results could probably still be improved if I allowed combinations of chains of various types (I have some theorems allowing to do this), which I have not (or not yet) implemented in SudoRules.

This is not to say that my set of rules is better than any other (until a complete resolution theory is found, no one can make such claims). On the contrary, I think that having a profusion of rules is very useful (and probably a source of fun - for both the inventor and the player). But I also think that my results prove that it is worth trying to organise these rules into large families.

Finally-finally, how do I analyse these results? The detailed classification results in chapter XXI leave no doubt on the power of the two general types of rules I have introduced: xyt-chains and hidden chains.
Back to top
View user's profile Send private message Visit poster's website
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jun 14, 2007 7:44 am    Post subject: Re: Large families of resolution rules Reply with quote

berthier wrote:

D) Finally, what seems interesting here is that with a very limited number of rule types I get the following results (with all the chains being limited to length 13):
- 99.68% of the Royle collection of 17-minimal puzzles are solved;
- 97% of the randomly generated puzzles are solved;
- also, as appears from my "online supplements" to the book, lots of puzzles designed to illustrate additional rules one can find in this forum can be solved using only my set of rules.

in general, for constraint satisfaction and graph theoretic domains, random input is a poor measure of algorithm performance and/or breadth, mainly because random input almost always lacks the structure required to really exercise the boundaries of the domain

that's why its much harder to generate 17s or minimal 37s than minimal 24s
and also why most random puzzles fall to the basic constraints (or rules)

look in the player's forum "hardest" threads for links to hardest puzzle catalogs
the hardest of the hardest are currently rarer (and harder to generate) than 17s or minimal 37s

also look at the "pencilmark only" thread where each cell has >1 candidate but still admits
only one solution -- some of those will put (machine) solvers in their place
Back to top
View user's profile Send private message Visit poster's website
berthier

Joined: 13 Jun 2007
Posts: 43
:
Location: Paris, France

Items
PostPosted: Thu Jun 14, 2007 2:25 pm    Post subject: Re: Large families of resolution rules Reply with quote

gsf wrote:
in general, for constraint satisfaction and graph theoretic domains, random input is a poor measure of algorithm performance and/or breadth, mainly because random input almost always lacks the structure required to really exercise the boundaries of the domain

Yes, but our problematics are very different. What appears to be the boundaries in one approach may not be boundaries in another and different insights can be gained from each approach.
I'm not dealing with Sudoku as a problem in constraints satisfaction or graph theory. And I'm not looking for the most efficient algorithm for machine solving (my experience is that having a good deal of search is much more efficient than having complex rules).
What I'm looking for is a set of resolution rules (as defined in my initial post) that:
- are individually meaningful for a Sudoku player,
- give rise to clear explanations of any solution path,
- are of a limited number of types,
- are "as complete as possible" (whatever this can mean).
From such basis, I consider it is very interesting to determine what must be added to get still closer to completeness.
I agree that randomly generated puzzles are not the only criterion for "proximity to completeness", but it is a criterion.
Back to top
View user's profile Send private message Visit poster's website
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jun 14, 2007 3:13 pm    Post subject: Re: Large families of resolution rules Reply with quote

berthier wrote:

From such basis, I consider it is very interesting to determine what must be added to get still closer to completeness.

its near that boundary (getting closer to completeness) that random input will skew the analysis
Back to top
View user's profile Send private message Visit poster's website
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jun 14, 2007 3:22 pm    Post subject: Re: Large families of resolution rules Reply with quote

berthier wrote:
gsf wrote:
in general, for constraint satisfaction and graph theoretic domains, random input is a poor measure of algorithm performance and/or breadth, mainly because random input almost always lacks the structure required to really exercise the boundaries of the domain

Yes, but our problematics are very different. What appears to be the boundaries in one approach may not be boundaries in another and different insights can be gained from each approach.
I'm not dealing with Sudoku as a problem in constraints satisfaction or graph theory. And I'm not looking for the most efficient algorithm for machine solving (my experience is that having a good deal of search is much more efficient than having complex rules).

going back to the early days of the forum, (properly coded) brute force backtracking
will almost always beat rule based solving

but that's not the point
fast solvers have already been done
the focus (at least based on forum traffic) is the bounday where rule-based solver fall into brute force backtracking
and using random input as a measure of success at that boundary will give a false sense of accomplishment
Back to top
View user's profile Send private message Visit poster's website
berthier

Joined: 13 Jun 2007
Posts: 43
:
Location: Paris, France

Items
PostPosted: Thu Jun 14, 2007 4:41 pm    Post subject: Re: Large families of resolution rules Reply with quote

gsf wrote:

its near that boundary (getting closer to completeness) that random input will skew the analysis

I like your image of a boundary. The boundary of our knowledge?

I understand random input leads to some limitations, e.g. if we are interested in worst case analysis. But why should we reject any statistical analysis? Don't random collections of puzzles provide approximate answers to questions such as: what is the size of the boundary?
There is a finite (huge) number of valid puzzles. If I draw a really random sufficiently large sample of puzzles from it and solve x% of them with some solver, it gives a precise indication as to the % of puzzles that can be solved with this solver and it gives (100-x)% as the approximate size of the beyond-the-boundary for this solver.

Am I right then if I interpret your statement that "random input will skew the analysis" along two lines:
- we don't have "really random" generators
- the interesting work is in the (100-x)%

On the first point, I have no opinion, because I know of no way of testing randomness in this case.
On the second point, I would agree, just adding that the simpler the set of rules for the x% the better we are for studying the remaining (100-x)%.
Back to top
View user's profile Send private message Visit poster's website
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jun 14, 2007 5:28 pm    Post subject: Re: Large families of resolution rules Reply with quote

berthier wrote:
gsf wrote:

its near that boundary (getting closer to completeness) that random input will skew the analysis

I like your image of a boundary. The boundary of our knowledge?

the boundary of effectiveness of constraint/rule solvers before they fall into backtracking
Quote:

There is a finite (huge) number of valid puzzles. If I draw a really random sufficiently large sample of puzzles from it and solve x% of them with some solver, it gives a precise indication as to the % of puzzles that can be solved with this solver and it gives (100-x)% as the approximate size of the beyond-the-boundary for this solver.

right, there are O(5e9) different solution grids
and each of those holds a large number of minimal puzzles

the problem isn't the randomness of the pseudo-random puzzle generators
its just that, in the universe of all puzzles, #random puzzles >>> #puzzle with interesting structure,
for whatever structure you chose

for example, one could search for 17 clue puzzles by filtering 17s from a run of the mill
sudoku puzzle generator -- but it could take our lifetime for a 17 to pop out
Quote:

Am I right then if I interpret your statement that "random input will skew the analysis" along two lines:
- we don't have "really random" generators
- the interesting work is in the (100-x)%

the latter
Back to top
View user's profile Send private message Visit poster's website
berthier

Joined: 13 Jun 2007
Posts: 43
:
Location: Paris, France

Items
PostPosted: Thu Jun 14, 2007 6:23 pm    Post subject: Re: Large families of resolution rules Reply with quote

gsf wrote:
the problem isn't the randomness of the pseudo-random puzzle generators
its just that, in the universe of all puzzles, #random puzzles >>> #puzzle with interesting structure,
for whatever structure you chose

The problem is that of two differnt points of view, the local vs the global:
- the "interesting structure" defines a (small) part of all the puzzles, on which one may decide to concentrate - a normal approach for a researcher,
- nevertheless, this (small) part belongs to a whole, of which it is a well defined percentage.
If the generator one uses is not biased, then this percentage can be approximated by standard statistical analysis on large random samples. Of course, at this global level, an "interesting structure" can become invisible, especially if one defines "interesting" as "extremely rare".
To my opinion, concentrating on this small part and being aware of the global point of view are not contradictory.
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron