|
View previous topic :: View next topic |
Author |
Message |
| Howard
| Joined: 31 Jul 2005 | Posts: 7 | : | Location: Keele, UK | Items |
|
Posted: Sun Jul 31, 2005 2:47 pm Post subject: Rating Difficulty |
|
|
Hi all,
I'm helping to write a handheld (Palm,Pocket PC) Sudoku player/helper/generator.
I've written an engine that's pretty good at generating puzzles, and can solve them using the majority of logical techniques. (Up to XW/Swordfish(n)/Forcing Chains, etc.) I've had to put in quite a bit of optimisation so that it will generate good Sudoku in a reasonable time on a handheld device, which are much less powerful than current PCs!
I'll discuss my puzzle generation methods elsewhere, but I wanted to share the method I've been using to determine difficulty. Since I'm only concerned with puzzles that don't require guessing, then each puzzle can be solved with a sequence of logical techniques.
As others have done, at each stage I try each of the techniques until one is successful, and then go through again and again until either the puzzle is complete, or none of the techniques I've implemented makes any further progress. Clearly the simple puzzles (requiring only single candidate or single position tests) should be rated much easier than those requiring harder techniques.
The newspapers seem to rank their puzzle difficulties mostly on how many seed cells there are. Given that they usually require only the trivial tests, this is reasonable.
What I did was to give a "cost" to each technique, and calculate a total based on how many times each technique was used. I also gave a weighting to the first time a technique was used, to give it more of a human feel. (If you had to use Disjoint Subsets once, that was hard, but using it several times isn't so much harder.)
I realise that the terms I use for the techniques may not be the same as others have used, but here's what I've been using for the cost for each technique, for its first use, and then each subsequent use.
Code: |
Method name, First Subsequent
cost cost
SINGLECANDIDATE 100, 100
SINGLEPOSITION 110, 110
CANDIDATELINE 350, 200
DOUBLEPAIR 500, 250
MULTILINE 700, 400
DJSS2 1000, 750 (naked pairs)
USS2 1400, 1000 (hidden pairs)
DJSS3 2000, 1400 (naked triples)
USS3 2400, 1600 (hidden triples)
XWING 2500, 1800
SWORDFISH(3) 5000, 2000
FORCINGCHAINS 4200, 2100
DJSS4 7000, 5000
USS4 8000, 6000
SWORDFISH(4) 8000, 6000
|
I decided on these numbers based on how hard I felt each of these techniques was, compared to "100" for a single candidate test (which is of course trivial if you're using a computer solver that shows pencilmarks!). I'm sure many of you would choose different numbers (What would you change?)
I ran through a good set of Times puzzles (a couple of books worth!) to look at what kind of ranges each of the puzzles came out with:
Easy: 4500-4800 (average 4600)
Mild: 5000-5500 (average 5200)
Difficult: 5300-6000 (average 5600)
Fiendish: 5800-9000 (average 6500)
Only a few of the Times puzzles required Disjoint Subsets or anything that wasn't simple to use, but this did at least give me a range to work with. Clearly there's a lot of overlap between the ranges.
From this, I decided on some ranges so that I could classify my own generated puzzles into ranges that broadly fell across these. I wanted so that if someone entered a Times puzzle into my solver, it would reasonably give it a difficulty rating which was similar to that it came designated as.
Using this technique to classify puzzles, it becomes pretty clear that the four difficulty bands commonly used really don't do justice to many of the harder puzzles that require the much harder techniques.
Personally, beyond Fiendish, I added "Diabolical" (8000-15000) and "Obscene" (15000+).
I'm finding some lovely puzzles being generated at some higher difficulty levels, that even though they have a relatively high number of seeds (26-29), they are remarkably challenging to solve. The ones that my system rates as the hardest are those that require many different techniques to be used to solve them.
I'll probably continue to tweak the technique ratings with feedback from users, and hence move some of the difficulty boundaries, but at least for me it seems to give a continuous rating which is reasonably consistent with how hard the puzzle feels to play. I'm fairly sure this isn't formal enough for some of the mathematicians on this board, but for the purposes I wanted (to be able to categorise puzzles for users, and for them to ask to generate a puzzle of appropriate rating), it seems to hold out reasonably well!
I'll appreciate any comments or thoughts any of you have!
Howard. _________________ Howard Tomlinson, Developer, Astraware. (howard at astraware dot com) |
|
Back to top |
|
|
| Simes
| Joined: 08 Apr 2005 | Posts: 71 | : | Location: North Yorkshire, UK | Items |
|
Posted: Sun Jul 31, 2005 5:11 pm Post subject: |
|
|
Howard wrote: | I also gave a weighting to the first time a technique was used, to give it more of a human feel. | Hey, good idea! Mind if I borrow it?
How do you generate good puzzles then? I find that most of mine are easy - which makes for a lot of discarded puzzles if you're looking for a hard one. _________________ Simes
www.sadmansoftware.com/sudoku |
|
Back to top |
|
|
| Howard
| Joined: 31 Jul 2005 | Posts: 7 | : | Location: Keele, UK | Items |
|
Posted: Tue Aug 02, 2005 8:55 pm Post subject: |
|
|
Simes wrote: | Hey, good idea! Mind if I borrow it? |
Feel free!
Turning abstract values into something meaningful for people - and back - is quite an interesting exercise. In this case coming up with something that rates puzzles with more of a human feel was the intention.
I'm actually seeing some limitations as I continue working - for instance, in that the naked triples {246} {246} {246} are not necessarily the same difficulty as the group {24} {26} {46}. Perhaps I should rate them differently?
Another factor of difficulty is: from each step along in the solution, how many "valid" moves are there?
Even if the system always chooses the simplest one available, I'm sure it ought to rate the difficulty as harder if that was the *only* solution, than if there were dozens of potential moves.
That way, the very hardest puzzles would not only contain many techniques, but also have very few possible solve-routes to completion.
Hope that makes sense also
As for generating "good" puzzles, I came up with a few ideas. I'll discuss these directly with you by email first. I certainly found that the shotgun approach of "make a grid, solve it, discard it, make a grid, solve it..." is quite slow - especially if you're trying to make a grid with particular properties.
Howard. _________________ Howard Tomlinson, Developer, Astraware. (howard at astraware dot com) |
|
Back to top |
|
|
| droid42
| Joined: 29 Jul 2005 | Posts: 20 | : | | Items |
|
Posted: Wed Aug 03, 2005 9:15 am Post subject: |
|
|
I've also been dabbling with generating puzzles over the last couple of days and have also been grappling with the challenge of deciding which puzzles are easy and which ones are hard.
To start off with, similar to your approach, I grouped solving methods together in difficulty groups. My arbitrary first attempt is as follows..
0 - Simple logic (e.g. only "possible" in cell/row/column/box etc.)
1 - Pointing pairs/triples
1 - Box/line reduction
2 - Naked pairs
2 - Hidden pairs
3 - Naked triplets
3 - Hidden triplets
3 - X-Wing
4 - Naked quads/pentuples/n-tuples
4 - Hidden quads/pentuples/n-tuples
4 - Swordfish
5 - Bigger fish
6 - Super-colouring
I generate the puzzle and then apply the solver to it using graded levels of difficult. For example, the first pass uses only simple logic. If this succeeds, the puzzle is graded 0. The next pass uses methods only in levels 0 + 1 and subsequent passes are 0 + 1 + 2 etc. etc. This (relatively) fast method gives a basic grading. For the very hardest puzzles I only let them through this first step if they need at least level 6.
For a deeper understanding of how hard a puzzle is, I run a test similar to the above but much more involved. Instead of running the 7 combinations of difficulty levels, i.e.
0
01
012
0123
01234
012345
0123456
01234567
I run every possible combination of levels, e.g.
0
01
02
03 etc.
012
013
014 etc.
056
2356
2456 etc.etc.etc.
Obviously this takes a while per puzzle (around 2-3 seconds) but is worthwhile in terms of the information you get back, namely:
- Most efficient combination of solving methods
- Least efficient combination of solving methods
- Essential solving methods
- Methods that are of no use for this puzzle
- Methods that can be used to solve it but aren't essential
Once you have all this information, it takes no more than a glance to see how interesting or hard a particular puzzle is.
As a quick aside, before I embarked on writing a sudoku generator, I always assumed that building the puzzle would be the hardest bit. In fact, it's exactly the above that's taken up most of my time.
Building the completed solution was easy (throw some random cells on the board ... does it solve? no? try again etc.) ... doesn't take long to find a completed grid.
Desiging the puzzle means taking a completed grid and using the following algorithm...
remove random cell -> will it solve? if yes, repeat by removing another cell ... if no, replace the cell and try another.
Repeat the above until the required number of cells have been removed. To my surprise, this approach spits out a very very very hard puzzle (i.e. requires at least super-colouring or jellyfish) every couple of seconds. I even found a puzzle that has a genuine naked sextuplet in it depending on the solving route taken.
Ian. |
|
Back to top |
|
|
| angusj Site Admin
| Joined: 18 Jun 2005 | Posts: 406 | : | | Items |
|
Posted: Wed Aug 03, 2005 10:04 am Post subject: |
|
|
droid42 wrote: | I even found a puzzle that has a genuine naked sextuplet in it depending on the solving route taken. |
Hi Ian.
Firstly, thanks for your interesting post.
Anyhow, there's never any need to look for quintuplets, sextuplet etc because if there is one there's also a hidden or naked triple or quad in the same group (unless you're solving puzzles that are larger than the standard 9x9 grid). |
|
Back to top |
|
|
| Simes
| Joined: 08 Apr 2005 | Posts: 71 | : | Location: North Yorkshire, UK | Items |
|
Posted: Wed Aug 03, 2005 10:05 am Post subject: |
|
|
droid42 wrote: | ...this approach spits out a very very very hard puzzle ... every couple of seconds. | Does it do this even if you have a constraint that the placement of the clues must exhibit some symmetry?
droid42 wrote: | I even found a puzzle that has a genuine naked sextuplet in it depending on the solving route taken. |
Isn't that the same as a hidden triplet? (Perhaps I'm wrong (again) but I though naked x and hidden y were opposite sides of the same coin.) _________________ Simes
www.sadmansoftware.com/sudoku |
|
Back to top |
|
|
| droid42
| Joined: 29 Jul 2005 | Posts: 20 | : | | Items |
|
Posted: Wed Aug 03, 2005 11:31 am Post subject: |
|
|
angusj wrote: | droid42 wrote: | I even found a puzzle that has a genuine naked sextuplet in it depending on the solving route taken. |
Hi Ian.
Firstly, thanks for your interesting post.
Anyhow, there's never any need to look for quintuplets, sextuplet etc because if there is one there's also a hidden or naked triple or quad in the same group (unless you're solving puzzles that are larger than the standard 9x9 grid). |
Yes, interesting point. I suspect the only reasons my program finds them are a) it can and b) it was probably running the full difficulty analysis as described above with solitary-pair/triple detection switched off.
I ran it against a 25x25 earlier on and it uncovered a solitary 14-tuplet, yikes.
Wouldn't 12 23 34 45 15 in a 3x3 box be a valid quintuplet? (or would such a combination of possibles be impossible for some other reason?)
Ian. |
|
Back to top |
|
|
| angusj Site Admin
| Joined: 18 Jun 2005 | Posts: 406 | : | | Items |
|
Posted: Wed Aug 03, 2005 12:07 pm Post subject: |
|
|
droid42 wrote: | Wouldn't 12 23 34 45 15 in a 3x3 box be a valid quintuplet? |
Yes, perfectly valid. However, that leaves 6,7,8 & 9 in the remaining cells, so worst case (assuming no cells assigned in that box) there'll be a naked or hidden quad. |
|
Back to top |
|
|
| droid42
| Joined: 29 Jul 2005 | Posts: 20 | : | | Items |
|
Posted: Wed Aug 03, 2005 3:57 pm Post subject: |
|
|
angusj wrote: | droid42 wrote: | Wouldn't 12 23 34 45 15 in a 3x3 box be a valid quintuplet? |
Yes, perfectly valid. However, that leaves 6,7,8 & 9 in the remaining cells, so worst case (assuming no cells assigned in that box) there'll be a naked or hidden quad. |
Doh! ... yes, of course...
Ian. |
|
Back to top |
|
|
| upsidedownface
| Joined: 27 May 2005 | Posts: 5 | : | Location: Leeds | Items |
|
Posted: Mon Sep 12, 2005 9:57 pm Post subject: Classifying Difficulty |
|
|
I have started trying to classify the difficulty, but I think I am only looking at what looks difficult to old low perceptionists like me.
At any "level" there is a number of candidates, often with more than one reason for being a candidate.
The first level (=0) has the given problem with the "seeds".
Having identified the candidates, I then add them all to the grid, increase the level by 1 and see how many new candidates there are.
When there are no candidates, if the number of unfilled cells is zero the problem is "solved" or if greater than zero the program is "stuck".
This is "Easy"
My program produces a log file of the steps it took, including the number of candidates at each level.
Guardian no. 108 Easy
level:- 1 2 3 4 5 6 7
count:- 2 3 5 8 12 14 11 "solved"
Guardian 095 Medium
level:- 1 2 3 "stuck"4 5 6 7 8 9 10 11 12
count:- 3 2 1 1 1 1 3 6 8 12 10 6 "solved"
For a "stuck" program/problem I display all the possibilities in the unfilled cells. My program then looks for disjoint sets, but it isn't yet very sophisticated, and I don't know what most of them look like, so I can't program it!
This is "Difficult"
As a last resort I choose a setting for some cell which has only 2 possibilities and start the solving process again. Sometimes I have to try this more than once, but I haven't yet had to do it more than 3 times.
This is "Cheated"
Les. |
|
Back to top |
|
|
| Agent Allen
| Joined: 01 Oct 2005 | Posts: 34 | : | | Items |
|
Posted: Sat Oct 01, 2005 4:44 pm Post subject: This is in the territory |
|
|
This topic is interesting.
Is there a really good catalogue defining all the methods Howard listed?
The first couple are of course very obvious, but the more complex the method the less illuminating the name.
The topic [Setting Sudoku for Human Solvers (Sudokorists)] I've started is in the same sort of area as Howard.
I'm proposing the standardise 'cost' on (notional) expert minutes and call them AESMs.
Actually the program I'm writing works in expert seconds but that seems too much precision to report in credibly.
I also suspect that care needs to be taken to go back and do it badly again to get the correct rating if you use a good guess.
I've noticed some listings where its possible that X says his tool solves problem A 10x faster than Ys for no reason other than Xs tool works well on problem A. It might be 10x slower on problem B, but how would we know? In the difficult problem topic someone points out his program thinks a problem is many times easy when rotated. He seems to realise that isn't sensible.
He's not rating the problem anymore but subjective luck of the draw in his method.
If you implement any of the 'manouvres' without rotational symmetry I certainly think your weightings won't be very objective. _________________ Agent A |
|
Back to top |
|
|
| Lummox JR
| Joined: 07 Sep 2005 | Posts: 202 | : | | Items |
|
Posted: Sat Oct 01, 2005 8:39 pm Post subject: |
|
|
I believe the DJSS stands for disjoint subsets, and I'm not sure what USS is but that also is a subset rule. Notice he also said those are hidden and naked pairs. The most commonly accepted term for each of these rules is naked and hidden subsets.
The X-wing and swordfish patterns are basically also subset rules, but positional rather than relating to the placements of different digits in a house (block/column/row). They're quite rare, and even astoundingly rarer is the 4-row/col swordfish, which is also sometimes called jellyfish.
Forcing chains is implemented in various ways, but the most common way is to find a chain of cells which each have only 2 candidates, such that one might be 18, another 38, another 36, then 16, etc. If you have an even number of such cells linked and the first and last ones intersect at a target cell that shares their common "endpoint candidate" (in this case the 1), you can remove the candidate from that cell. (The logic is that if the target cell is 1, the 18 must be 8, the 38 must be 3, the 36 must be 6, the 16 must be 1, and there's a contradiction.) I'm not fond of this approach, though, since it takes a long time to find anything and even then will miss things. Coloring is considerably easier, and advanced coloring while difficult will find anything that forcing chains can.
I like your idea of representing score in seconds, though that would be difficult to quantify, and I don't think it can be standardized. You're also quite right in noticing that some methods work better for some problems than others. My current approach to assessing difficulty is a lot like Howard's, but without the first-try rule (I'll probably implement something of the sort eventually) and different weights, and a lot more possible tests it can run. It always adds the score of the first test that finds anything. I'd get more accurate results, however, if I implemented this in a recursive algorithm that could try harder tests as well, or 2nd or 3rd possibilities for the same tests, to try to find techniques that will crack the puzzle faster. It might be that coloring on the 2's reveals something of little value while coloring the 8's will find a key insight to the solution. I'd have to somewhat revamp my solver to do that. |
|
Back to top |
|
|
| Agent Allen
| Joined: 01 Oct 2005 | Posts: 34 | : | | Items |
|
Posted: Sat Oct 01, 2005 9:38 pm Post subject: Yes we can standardise |
|
|
I'm saying we would like to measure how hard a puzzle is by telling someone how long we think an expert sudokorist would take on average.
The measure I'm proposing is at least it is defined and meaningful which is quite a lot. Difficult to measure accurately at this time, but that can be worked on.
Then you can start to ask questions about puzzles - how long does this one take?
Questions about sudokorists - how long do you take? What's your multiplier? If you take on average 0.95*, you rock.
Questions about methods - how long does it take and does it payoff?
Is it worth the fag? Is it worth trying early or only if stuck?
It seems to me that if you measure cost but cost isn't in a physical commodity its largely meaningless.
Howards numbers look about right relative to each other, but only because I'm rescaling by /200 to get minutes.
There in the ball-park anyway looking at the rankings from the Times. _________________ Agent A |
|
Back to top |
|
|
| Agent Allen
| Joined: 01 Oct 2005 | Posts: 34 | : | | Items |
|
Posted: Sat Oct 01, 2005 9:50 pm Post subject: Thanks also for the tips on methods |
|
|
Also thanks for the methods tips.
I found a great catalogue on
http://users.pandora.be/vandenberghe.jef/sudoku/index.html?how2solve
That lists nearly everything talked about in these forums.
Its also very well explained.
Which, err is better than some of the posts round here, err sorry guys, but err some of the posting and certainly most of the C programs are completely unreadable .
I'm an ex-C programmer and in my day we used local variables.
Why? So we could laugh at the COBOL progammers for one thing.
Well actually because structured programming showed that made programs both safer more widely usable and readable.
Can't be bad.
There's an old C saying "if was hard to write it should be hard to read".
It is a joke, is everyone clear about that?
---
Off rant, I agree that considering time cost, I'm not convinced all of the methods catalogued round the place deserve house room.
Some of them are awfully time consuming to do and only so successful.
A bit of brute force seems to be worth a try in preference to some.
That said, without hard and fast cost measures who can say... _________________ Agent A |
|
Back to top |
|
|
| Ruud Site Admin
| Joined: 17 Sep 2005 | Posts: 708 | : | Location: Netherlands | Items |
|
Posted: Sun Oct 02, 2005 12:12 am Post subject: |
|
|
Pump up the score for those @#$ naked singles.
It's easier to spot a locked pair than a naked single. |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
Igloo Theme Version 1.0 :: Created By: Andrew Charron
|