Rating Difficulty

Howard · Posted: Sun Jul 31, 2005 2:47 pm Post subject: Rating Difficulty

Hi all,

I'm helping to write a handheld (Palm,Pocket PC) Sudoku player/helper/generator.

I've written an engine that's pretty good at generating puzzles, and can solve them using the majority of logical techniques. (Up to XW/Swordfish(n)/Forcing Chains, etc.) I've had to put in quite a bit of optimisation so that it will generate good Sudoku in a reasonable time on a handheld device, which are much less powerful than current PCs!

I'll discuss my puzzle generation methods elsewhere, but I wanted to share the method I've been using to determine difficulty. Since I'm only concerned with puzzles that don't require guessing, then each puzzle can be solved with a sequence of logical techniques.

As others have done, at each stage I try each of the techniques until one is successful, and then go through again and again until either the puzzle is complete, or none of the techniques I've implemented makes any further progress. Clearly the simple puzzles (requiring only single candidate or single position tests) should be rated much easier than those requiring harder techniques.

The newspapers seem to rank their puzzle difficulties mostly on how many seed cells there are. Given that they usually require only the trivial tests, this is reasonable.

What I did was to give a "cost" to each technique, and calculate a total based on how many times each technique was used. I also gave a weighting to the first time a technique was used, to give it more of a human feel. (If you had to use Disjoint Subsets once, that was hard, but using it several times isn't so much harder.)

I realise that the terms I use for the techniques may not be the same as others have used, but here's what I've been using for the cost for each technique, for its first use, and then each subsequent use.

Simes · Posted: Sun Jul 31, 2005 5:11 pm Post subject:

Howard · Posted: Tue Aug 02, 2005 8:55 pm Post subject:

droid42 · Posted: Wed Aug 03, 2005 9:15 am Post subject:

I've also been dabbling with generating puzzles over the last couple of days and have also been grappling with the challenge of deciding which puzzles are easy and which ones are hard.

To start off with, similar to your approach, I grouped solving methods together in difficulty groups. My arbitrary first attempt is as follows..

0 - Simple logic (e.g. only "possible" in cell/row/column/box etc.)
1 - Pointing pairs/triples
1 - Box/line reduction
2 - Naked pairs
2 - Hidden pairs
3 - Naked triplets
3 - Hidden triplets
3 - X-Wing
4 - Naked quads/pentuples/n-tuples
4 - Hidden quads/pentuples/n-tuples
4 - Swordfish
5 - Bigger fish
6 - Super-colouring

I generate the puzzle and then apply the solver to it using graded levels of difficult. For example, the first pass uses only simple logic. If this succeeds, the puzzle is graded 0. The next pass uses methods only in levels 0 + 1 and subsequent passes are 0 + 1 + 2 etc. etc. This (relatively) fast method gives a basic grading. For the very hardest puzzles I only let them through this first step if they need at least level 6.

For a deeper understanding of how hard a puzzle is, I run a test similar to the above but much more involved. Instead of running the 7 combinations of difficulty levels, i.e.
0
01
012
0123
01234
012345
0123456
01234567

I run every possible combination of levels, e.g.
0
01
02
03 etc.
012
013
014 etc.
056
2356
2456 etc.etc.etc.

Obviously this takes a while per puzzle (around 2-3 seconds) but is worthwhile in terms of the information you get back, namely:

- Most efficient combination of solving methods
- Least efficient combination of solving methods
- Essential solving methods
- Methods that are of no use for this puzzle
- Methods that can be used to solve it but aren't essential

Once you have all this information, it takes no more than a glance to see how interesting or hard a particular puzzle is.

As a quick aside, before I embarked on writing a sudoku generator, I always assumed that building the puzzle would be the hardest bit. In fact, it's exactly the above that's taken up most of my time.

Building the completed solution was easy (throw some random cells on the board ... does it solve? no? try again etc.) ... doesn't take long to find a completed grid.

Desiging the puzzle means taking a completed grid and using the following algorithm...

remove random cell -> will it solve? if yes, repeat by removing another cell ... if no, replace the cell and try another.

Repeat the above until the required number of cells have been removed. To my surprise, this approach spits out a very very very hard puzzle (i.e. requires at least super-colouring or jellyfish) every couple of seconds. I even found a puzzle that has a genuine naked sextuplet in it depending on the solving route taken.

Ian.

angusj · Site Admin

Simes · Posted: Wed Aug 03, 2005 10:05 am Post subject:

droid42 · Posted: Wed Aug 03, 2005 11:31 am Post subject:

angusj · Site Admin

droid42 · Posted: Wed Aug 03, 2005 3:57 pm Post subject:

upsidedownface · Posted: Mon Sep 12, 2005 9:57 pm Post subject: Classifying Difficulty

I have started trying to classify the difficulty, but I think I am only looking at what looks difficult to old low perceptionists like me.

At any "level" there is a number of candidates, often with more than one reason for being a candidate.
The first level (=0) has the given problem with the "seeds".
Having identified the candidates, I then add them all to the grid, increase the level by 1 and see how many new candidates there are.

When there are no candidates, if the number of unfilled cells is zero the problem is "solved" or if greater than zero the program is "stuck".

This is "Easy"

My program produces a log file of the steps it took, including the number of candidates at each level.

Guardian no. 108 Easy
level:- 1 2 3 4 5 6 7
count:- 2 3 5 8 12 14 11 "solved"

Guardian 095 Medium
level:- 1 2 3 "stuck"4 5 6 7 8 9 10 11 12
count:- 3 2 1 1 1 1 3 6 8 12 10 6 "solved"

For a "stuck" program/problem I display all the possibilities in the unfilled cells. My program then looks for disjoint sets, but it isn't yet very sophisticated, and I don't know what most of them look like, so I can't program it!

This is "Difficult"

As a last resort I choose a setting for some cell which has only 2 possibilities and start the solving process again. Sometimes I have to try this more than once, but I haven't yet had to do it more than 3 times.

This is "Cheated"

Les.

Agent Allen · Posted: Sat Oct 01, 2005 4:44 pm Post subject: This is in the territory

This topic is interesting.
Is there a really good catalogue defining all the methods Howard listed?
The first couple are of course very obvious, but the more complex the method the less illuminating the name.

The topic [Setting Sudoku for Human Solvers (Sudokorists)] I've started is in the same sort of area as Howard.
I'm proposing the standardise 'cost' on (notional) expert minutes and call them AESMs.

Actually the program I'm writing works in expert seconds but that seems too much precision to report in credibly.
I also suspect that care needs to be taken to go back and do it badly again to get the correct rating if you use a good guess.
I've noticed some listings where its possible that X says his tool solves problem A 10x faster than Ys for no reason other than Xs tool works well on problem A. It might be 10x slower on problem B, but how would we know? In the difficult problem topic someone points out his program thinks a problem is many times easy when rotated. He seems to realise that isn't sensible.
He's not rating the problem anymore but subjective luck of the draw in his method.
If you implement any of the 'manouvres' without rotational symmetry I certainly think your weightings won't be very objective.
_________________
Agent A

Lummox JR · Posted: Sat Oct 01, 2005 8:39 pm Post subject:

I believe the DJSS stands for disjoint subsets, and I'm not sure what USS is but that also is a subset rule. Notice he also said those are hidden and naked pairs. The most commonly accepted term for each of these rules is naked and hidden subsets.

The X-wing and swordfish patterns are basically also subset rules, but positional rather than relating to the placements of different digits in a house (block/column/row). They're quite rare, and even astoundingly rarer is the 4-row/col swordfish, which is also sometimes called jellyfish.

Forcing chains is implemented in various ways, but the most common way is to find a chain of cells which each have only 2 candidates, such that one might be 18, another 38, another 36, then 16, etc. If you have an even number of such cells linked and the first and last ones intersect at a target cell that shares their common "endpoint candidate" (in this case the 1), you can remove the candidate from that cell. (The logic is that if the target cell is 1, the 18 must be 8, the 38 must be 3, the 36 must be 6, the 16 must be 1, and there's a contradiction.) I'm not fond of this approach, though, since it takes a long time to find anything and even then will miss things. Coloring is considerably easier, and advanced coloring while difficult will find anything that forcing chains can.

I like your idea of representing score in seconds, though that would be difficult to quantify, and I don't think it can be standardized. You're also quite right in noticing that some methods work better for some problems than others. My current approach to assessing difficulty is a lot like Howard's, but without the first-try rule (I'll probably implement something of the sort eventually) and different weights, and a lot more possible tests it can run. It always adds the score of the first test that finds anything. I'd get more accurate results, however, if I implemented this in a recursive algorithm that could try harder tests as well, or 2nd or 3rd possibilities for the same tests, to try to find techniques that will crack the puzzle faster. It might be that coloring on the 2's reveals something of little value while coloring the 8's will find a key insight to the solution. I'd have to somewhat revamp my solver to do that.

Agent Allen · Posted: Sat Oct 01, 2005 9:38 pm Post subject: Yes we can standardise

I'm saying we would like to measure how hard a puzzle is by telling someone how long we think an expert sudokorist would take on average.
The measure I'm proposing is at least it is defined and meaningful which is quite a lot. Difficult to measure accurately at this time, but that can be worked on.
Then you can start to ask questions about puzzles - how long does this one take?
Questions about sudokorists - how long do you take? What's your multiplier? If you take on average 0.95*, you rock.
Questions about methods - how long does it take and does it payoff?
Is it worth the fag? Is it worth trying early or only if stuck?

It seems to me that if you measure cost but cost isn't in a physical commodity its largely meaningless.
Howards numbers look about right relative to each other, but only because I'm rescaling by /200 to get minutes.
There in the ball-park anyway looking at the rankings from the Times.
_________________
Agent A

Agent Allen

Also thanks for the methods tips.
I found a great catalogue on

http://users.pandora.be/vandenberghe.jef/sudoku/index.html?how2solve
That lists nearly everything talked about in these forums.
Its also very well explained.
Which, err is better than some of the posts round here, err sorry guys, but err some of the posting and certainly most of the C programs are completely unreadable Embarassed

.
I'm an ex-C programmer and in my day we used local variables.
Why? So we could laugh at the COBOL progammers for one thing.
Well actually because structured programming showed that made programs both safer more widely usable and readable.
Can't be bad.

There's an old C saying "if was hard to write it should be hard to read".
It is a joke, is everyone clear about that?

---
Off rant, I agree that considering time cost, I'm not convinced all of the methods catalogued round the place deserve house room.
Some of them are awfully time consuming to do and only so successful.
A bit of brute force seems to be worth a try in preference to some.
That said, without hard and fast cost measures who can say...
_________________
Agent A

Ruud · Site Admin

Pump up the score for those @#$ naked singles.

It's easier to spot a locked pair than a naked single.