Optimizing the Search for Subsets

Ruud · Site Admin

Looking at ways to optimize my human-style solver, I came to the conclusion that the search for (hidden and naked) subsets can be limited. I don't know if any theory on this subject has been published before, but I haven't found it, so let me know if I re-invented something here.

Here comes the theory:

Given a House with F filled cells and P pinned cells, the number of Cells and Values within that House available for subset checking (N) = 9 - F - P.

There is no need to look for subsets of size N - 1, because the remaining Cell/Value would already be pinned.

Even more interesting, every Naked Subset with size S would be complemented with a Hidden Subset with size N - S. In reverse, every Hidden Subset with size H would be complemented with a Naked Subset with size N - H.

To optimize the code, generate all combinations of available (non-filled and non-pinned) Cells with a subset size between 2 and N - 2 and count the total number of distinct values allowed in the defining Cells. When the count equals the size of the subset, the defining Cells form a Naked Subset, and the remaining Cells form a Hidden Subset.
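A minimal Python sketch of the search described above, not the actual solver code. It assumes a house is a list of nine candidate sets and that filled and pinned cells are both represented by a set holding a single digit; the function name `find_naked_subsets` and the sample house are made up for illustration.

```python
from itertools import combinations

def find_naked_subsets(house):
    """Return (naked_cells, hidden_cells) pairs for one house.

    Assumption: `house` is a list of 9 sets of candidate digits, and a
    filled or pinned cell holds exactly one digit.
    """
    open_cells = [i for i, cands in enumerate(house) if len(cands) > 1]
    n = len(open_cells)
    found = []
    for size in range(2, n - 1):  # subset sizes 2 .. N-2
        for cells in combinations(open_cells, size):
            union = set().union(*(house[i] for i in cells))
            if len(union) == size:
                # The remaining open cells hold the complementary hidden subset.
                rest = tuple(i for i in open_cells if i not in cells)
                found.append((cells, rest))
    return found

# Hypothetical house with N = 5 open cells (0-based positions 0..4).
example = [{1, 2}, {1, 2}, {3, 4, 5}, {1, 3, 4, 5}, {2, 3, 5},
           {6}, {7}, {8}, {9}]
result = find_naked_subsets(example)
```

Here the naked pair in cells 0 and 1 is reported together with cells 2, 3 and 4, which hold the complementary hidden triple on the digits 3, 4 and 5.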

In communication to the user, one can decide to highlight the Naked Subset, unless the size of the complementing Hidden Subset is smaller.

Does this make sense?

xyzzy · Posted: Mon Oct 03, 2005 6:51 am

There was a thread about this on this forum a few weeks ago. There is also a post by Lummox JR about implementing subsets with bitwise math.

What do you mean exactly by filled and pinned cells? Do you mean cells with only one possibility, either initial givens or found with logic?

You can optimize your search a bit more. If N is the number of cells with more than one possibility, you only need to search for subsets of size N/2 or less. Furthermore, only cells with between 2 and N/2 possibilities need to be included in the search. If subsets of size > N/2 exist, you will find them when you search for hidden subsets. When you search for hidden subsets, you only need to look for subsets of size (N-1)/2 or less.

For example, if three of the nine cells have already been discovered, there are only N=6 cells left. You only need to search for naked subsets of size 2 or 3, since any larger naked subsets will have hidden complements that are smaller. Then search the "transpose" of the data for hidden subsets of size 2. You don't need to look for hidden triples, as any hidden triple would have a complementary naked triple that would already have been found.
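The two limits can be written down and checked in a few lines of Python. This is only a sketch: it assumes the divisions round down (which matches the N=6 example), and the helper names `search_bounds` and `covered` are invented here.

```python
def search_bounds(n):
    """Largest naked and hidden subset sizes to search for n open cells.

    Assumption: N/2 and (N-1)/2 are rounded down.
    """
    return n // 2, (n - 1) // 2

def covered(n):
    """True if every subset size 2..n-2 is reachable either directly as a
    naked subset or through its smaller hidden complement."""
    max_naked, max_hidden = search_bounds(n)
    return all(s <= max_naked or n - s <= max_hidden
               for s in range(2, n - 1))

bounds_for_six = search_bounds(6)   # naked up to 3, hidden up to 2
all_covered = all(covered(n) for n in range(4, 10))
```

For N=6 this gives naked sizes up to 3 and hidden sizes up to 2, as in the example, and the `covered` check confirms that no subset size between 2 and N-2 is missed for any N from 4 to 9.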

Ruud · Site Admin

Thanks for your response, xyzzy.

xyzzy · Posted: Mon Oct 03, 2005 6:08 pm

Lummox JR · Posted: Mon Oct 03, 2005 7:35 pm

You can actually limit subset solving to half the number of candidates/cells (or rows/columns, if you're using the method to find X-wings and swordfish) involved. That is, if you have 5 positions in a house where nothing has been placed, and therefore 5 digits to go in those places, you only have to look for a naked or hidden pair.

The reason behind this is that a naked subset for one set of digits/positions is equivalent to a hidden subset for all the other digits/positions. Try the logic out; it works quite well.

This also means that you can ignore the possibility of finding subsets in any case with 3 or fewer candidates. If you have 3 digits to place and there's a naked pair, a hidden single will also exist. So unless your solver works on a principle like Trebor's where some tests can be turned off, you don't have to worry about these cases coming up.

Ruud · Site Admin

xyzzy, your arguments are clear.

However, my method does not search the sizes one by one, but incrementally.

In your example, where N=6:

xyzzy · Posted: Tue Oct 04, 2005 2:02 am

Here's an example from a real sudoku where there are N=6 non-single sets:

{139} [7] [4] {1239} [8] {125} {12569} {2569} {1569}

I'll number the sets 1 at the left to 9 at the right. If you search for all sets <=4, you'll look at these combinations:
1|4, 1|4|6, 1|4|8, 1|4|9, 1|6, 1|8, 1|9, 4|6, 4|8, 4|9, 6|8, 6|9, 8|9

If you're looking for sets <=3, you'll just look at this combination:
1|6

Now you have to look for naked pairs in the transpose sets:
{14679} {4678} {14} [3] {6789} {789} [2] [5] {14789}

Since there is just a single set of size 2, there can't be any pairs and you don't even need to search!

So when you search for subsets <=4, in this case you have to look at 13 subsets. If you look for subsets <=3 and then <=2 in the transpose sets, you only have to look at one subset. So as you can see, doing a double search of N/2 and (N-1)/2 will result in far fewer combinations to look at than a single search of N-2.
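The transpose used in this example can be reproduced with a short Python sketch. The set representation and the helper names `transpose` and `eligible` are assumptions made for illustration; positions are numbered 1 to 9 as in the post.

```python
def transpose(house):
    """For each digit 1..9, the set of 1-based positions where it is a candidate."""
    return [{pos for pos, cands in enumerate(house, start=1) if d in cands}
            for d in range(1, 10)]

def eligible(sets, max_size):
    """1-based indices of the sets that have between 2 and max_size members."""
    return [i for i, s in enumerate(sets, start=1) if 2 <= len(s) <= max_size]

# The house from the example; placed digits are single-element sets.
house = [{1, 3, 9}, {7}, {4}, {1, 2, 3, 9}, {8},
         {1, 2, 5}, {1, 2, 5, 6, 9}, {2, 5, 6, 9}, {1, 5, 6, 9}]

digits = transpose(house)
naked_pool = eligible(house, 3)    # sets small enough for a naked search <=3
hidden_pool = eligible(digits, 2)  # transposed sets small enough for pairs
```

Only positions 1 and 6 qualify for the naked search, giving the single combination 1|6, and only the transposed set for digit 3 has size 2, so there is no pair left to test.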

Ruud · Site Admin

I've changed my code to look for subsets in a way that allowed switching between the N-2 and the N/2 & (N-1)/2 methods.

I tested both methods on 4 sudokus, counting the number of times the code inspected a cell or value.

In your example, it would still have to inspect those 6 transposed sets to determine that there is only 1 set of size 2. Those 6 inspections take time and have been counted.

Results:

The N/2 & (N-1)/2 method does 60% of the inspections of the N-2 method.

Not as spectacular as I expected, but significantly faster.

N-2 loses. My program is using the N/2 & (N-1)/2 method now.

Thanks for your explanation!

xyzzy · Posted: Thu Oct 06, 2005 2:58 am

Have you found a good solution to deal with useless subsets being found before useful ones? For instance, consider these sets:

{45} {456} {45} {1389} {27} {89} {123} {789} {2389}

My search algorithm, and I'd guess yours too, will first find the triple that sets 1|2|3 => {456}. But this triple is useless, as it doesn't allow any reductions to be made. If you were to ignore that triple and keep searching, you'd find the pair 1|3 => {45}, which is useful as it lets you eliminate {45} from set 2.

Ruud · Site Admin

When my code finds a subset, it scans the affected candidates. If any candidates were eliminated, the method ends and reports that a subset was found, allowing the solver to apply basic logic to handle the effects of the eliminations.

When the subset yields no results, the search continues and, in your example, the smaller subset will be detected.

Most of the advanced techniques that I implemented work this way. Because I write a solver log, I don't want it to report useless subsets, x-wings, etc.

kranser · Posted: Thu Oct 06, 2005 10:15 am

xyzzy · Posted: Thu Oct 06, 2005 11:30 pm

Now that I look at it again, my example was flawed. 6 is a hidden single in set 2 and so would have been removed when singles are removed. I don't think it is possible to construct a real example where a useful subset is hidden inside a larger useless subset.

I did find a fast way to tell if a subset is useless. Consider these sets:
{34} {34} {56} {56} {126} {127} {789} {789} {1289}
And their transpose:
{569} {569} {12} {12} {34} {345} {678} {789} {789}

First the search finds the useless pair 1|2 => {34}. If you look at the transpose sets, you'll see that 3|4 => {12}. Notice the symmetry? Every time you have a useless subset this will occur and this symmetry also proves the subset is useless.

If you look at the useful subset 3|4 => {56} and then take the transpose sets 5|6 => {345}, you can see that the subset isn't useless. In fact they tell you where the reductions will exist. {345} - {34} = {5}, so the reductions will consist of removing a 5 and/or 6 from set 5. For the pair 1|2 we would get {12} - {12} = empty set, so there are no reductions to make.

To restate the rule, I'll use the notation |X| to denote the number of elements in the set X. If the sets S have union U and |S| = |U|, then the elements of U can be removed from all the sets not in S. Let T denote the union of the transpose sets of the elements of U. No elements will be removed because of the subset S if and only if T = S. Furthermore, since S is always contained in T and |S| = |U|, if |T| = |U| then it follows that T = S. This way the subset S can be declared useless without even knowing S, just by knowing U.
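A small Python sketch of this test, using the example sets above. The names `cells_touching` and `is_useless` are hypothetical, and the check is only meaningful when U really is the union of a naked subset.

```python
def cells_touching(house, u):
    """The set T: 1-based positions whose candidates include any digit of u."""
    return {pos for pos, cands in enumerate(house, start=1) if cands & u}

def is_useless(house, u):
    """Assumes u is the union of a naked subset S with |S| = |u|.
    Then S is contained in T, so |T| = |u| means T = S and nothing is removed."""
    return len(cells_touching(house, u)) == len(u)

house = [{3, 4}, {3, 4}, {5, 6}, {5, 6}, {1, 2, 6},
         {1, 2, 7}, {7, 8, 9}, {7, 8, 9}, {1, 2, 8, 9}]

pair_34_useless = is_useless(house, {3, 4})        # T is {1, 2}
pair_56_useless = is_useless(house, {5, 6})        # T is {3, 4, 5}
reduction_cells = cells_touching(house, {5, 6}) - {3, 4}
```

For the pair on {56} the difference T - S leaves position 5, which is where the reduction takes place.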

Ruud · Site Admin

To take this one step further:

When you have established which cells have eliminations, you can use a bitwise operation to remove the subset values from their candidate bitmap. The remaining bitmap tells you which candidates to eliminate.
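One possible reading of this, sketched in Python with candidates stored as 9-bit masks (bit d-1 set for digit d). The bit layout and the helpers `to_mask` and `from_mask` are assumptions, not taken from the actual solver. The sketch uses the naked pair on {56} and the affected cell {126} from xyzzy's example.

```python
def to_mask(digits):
    """Pack a set of digits 1..9 into a bitmap (assumed layout: bit d-1)."""
    mask = 0
    for d in digits:
        mask |= 1 << (d - 1)
    return mask

def from_mask(mask):
    """Unpack a bitmap back into a set of digits."""
    return {d for d in range(1, 10) if mask & (1 << (d - 1))}

subset = to_mask({5, 6})     # values of the naked pair
cell = to_mask({1, 2, 6})    # a cell outside the pair that has eliminations

eliminated = cell & subset   # candidates the pair removes from this cell
remaining = cell & ~subset   # candidates the cell keeps
```

For a naked subset the overlap `cell & subset` is what gets eliminated. For a hidden subset the roles swap: inside the subset cells, `cell & ~subset` would be the candidates to eliminate.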

So far, I've put my effort in optimizing the search and not the eliminations, because that really does not have any impact compared to the search, but combining the detection of useful subsets and using the same data to avoid a loop through all combinations is very useful.

I'm glad I started this topic. I really appreciate your helpful contributions.

Ruud · Site Admin

This thought just came up...

My program keeps track of a lot of data on candidates (for cells) and possible locations for digits in a house. It must be possible to detect subsets when these bitmaps are updated, so we would not even have to search for them but detect them as they come into existence. That would really be a timesaver.

xyzzy · Posted: Wed Nov 02, 2005 4:02 pm