Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Optimizing the Search for Subsets
Goto page 1, 2  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku
View previous topic :: View next topic  
Author Message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Mon Oct 03, 2005 12:42 am    Post subject: Optimizing the Search for Subsets Reply with quote

Looking at ways to optimize my human-style solver, I came to the conclusion that the search for (hidden and naked) subsets can be limited. I don't know if any theory on this subject has been published before, but I haven't found it, so let me know if I re-invented something here.

Here comes the theory:

Given a House with F filled cells and P pinned cells, the number of Cells and Values within that House available for subset checking (N) = 9 - F - P.

There is no need to look for subsets of size N - 1, because the remaining Cell/Value would already be pinned.

Even more interesting, every Naked Subset with size S would be complemented with a Hidden Subset with size N - S. In reverse, every Hidden Subset with size H would be complemented with a Naked Subset with size N - H.

To optimize the code, generate all permutations of available (non filled and non-pinned) Cells with a subset size between 2 and N - 2 and count the total number of values allowed in the defining Cells. When the count equals the size of the subset, the defining Cells form a Naked Subset, and the remaining Cells form a Hidden Subset.

In communication to the user, one can decide to highlight the Naked Subset, unless the size of the complementing Hidden Subset is smaller.

Does this make sense?
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Mon Oct 03, 2005 6:51 am    Post subject: Reply with quote

There was a thread about this a few weeks ago on this forum, here . There is also a post by Lummox JR about implimenting subsets with bitwise math.

What do you mean exactly by filled and pinned cells? Do you mean cells with only one possibility, either initial givens or found with logic?

You can optimize your search a bit more. If N is the number of cells with more than one possibility, you only need to search for subsets of size N/2 or less. Furthermore, only cells with between 2 and N/2 possibilities need to be included in the search. If subsets of size > N/2 exist, you will find them when you search for hidden subsets. When you search for hidden subsets, you only need to look for subsets of size (N-1)/2 or less.

For example, if three of the nine cells have already been discovered, there are only N=6 cells left. You only need to search for naked subsets of size 2 or 3, since any larger naked subsets will have hidden compliments that are smaller. Then search the "transpose" of the data for hidden subsets of size 2. You don't need to look for hidden triples, as any hidden triples would have had a complement naked triple that would already have been found.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Mon Oct 03, 2005 11:30 am    Post subject: Reply with quote

Thanks for your response, xyzzy.
Quote:
What do you mean exactly by filled and pinned cells? Do you mean cells with only one possibility, either initial givens or found with logic?

Correct. The remaining count N contains only cells with more than one candidate.

Quote:
If subsets of size > N/2 exist, you will find them when you search for hidden subsets.

My point is, that due to the fact that naked and hidden subsets are complements, there is no need to have a separate search for hidden subsets. A seach for naked subsets upto size N-2 will find the complementing naked subset, therefore exposing the hidden subset in the remaining cells.

The thread you mention deals with X-wings, swordfish and such patterns, not really with the search for subsets. The post by Lummox JR is very useful, since it also recognizes the complementing nature of subsets. However, I usually lose track of all the i's and j's and k's in such code, so I prefer to discuss this on a conceptual level.
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Mon Oct 03, 2005 6:08 pm    Post subject: Reply with quote

Quote:
My point is, that due to the fact that naked and hidden subsets are complements, there is no need to have a separate search for hidden subsets.

That's true, but it should be faster to look for N/2 and then (N-1)/2 than to look for N-2. Example for N=6:

S=2: 15 combinations
S=3: 20 combinations
S=4: 15 combinations

So for the N/2 then (N-1)/2 you have to search 15+20 then 15 combinations, 50 total. For the N-2 menthod you have search 15+20+15, also 50 total. Now you might think, it makes no difference and you have to search the same number, but that's not the case. For the first method, you can ignore all cells with more than 3 possibilities for the first search and all cells with more than 2 on the second. If use used the N-2 method, you can only ignore cells with more then 4 possibilities. So unless every non-single cell has just two possibilities, it's a smaller search space with the first method.

Quote:
The thread you mention deals with X-wings, swordfish and such patterns, not really with the search for subsets

As Lummox JR explained in his post, x-wings, et al are the exact same thing as subsets. I use the same code for them both, it's just a matter of loading the sets to search in a different order.
Back to top
View user's profile Send private message Visit poster's website
Lummox JR

Joined: 07 Sep 2005
Posts: 202
:

Items
PostPosted: Mon Oct 03, 2005 7:35 pm    Post subject: Reply with quote

You can actually limit subset solving to half the number of candidates/cells (or rows/columns, if you're using the method to find X-wings and swordfish) involved. That is, if you have 5 positions in a house where nothing has been placed, and therefore 5 digits to go in those places, you only have to look for a naked or hidden pair.

The reason behind this is that a naked subset for one set of digits/positions is equivalent to a hidden subset for all the other digits/positions. Try the logic out; it works quite well.

This also means that you can ignore the possibility of finding subsets in any case with 3 or fewer candidates. If you have 3 digits to place and there's a naked pair, a hidden single will also exist. So unless your solver works on a principle like Trebor's where some tests can be turned off, you don't have to worry about these cases coming up.
Back to top
View user's profile Send private message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Mon Oct 03, 2005 9:02 pm    Post subject: Reply with quote

xyzzy, your arguments are clear.

However, my method does not search the sizes one by one, but incremental.

In your example, where N=6:

Code:
search all cells with 2-4 candidates
| search all remaining cells and combine (bitwise OR) candidates
|-when count = 2, report the naked pair
|-when count > 4, continue
|-otherwise, search remaining cells and combine candidates with 1 and 2
| |-when count = 3, report the naked triplet
| |-when count > 4, continue
| |-otherwise, search remaining cells and combine candidates with 1,2 and 3
| | |-when count = 4, report the hidden pair


This makes the code skip most of the 20 + 15 combinations, because either the 2 or 3 combined candidates add up to more than 4.

Having few clues and lots of candidates makes this method faster, not slower.

Lummox:
Quote:
if you have 5 positions in a house where nothing has been placed, and therefore 5 digits to go in those places, you only have to look for a naked or hidden pair.

When there are 5 positions, my method would test for naked subsets of 2 and 3 positions, initially skipping all cells with more than 3 candidates, then skipping the search for a 3rd candidate when the first 2 cells combined already have more than 3 candidates. When a naked triplet is found, it will report the hidden pair in the complementing cells.

Quote:
If you have 3 digits to place and there's a naked pair, a hidden single will also exist

This has been dealt with in the search method. That is why the method will only search for naked subsets with a size between 2 and N-2, where N is the number of cells in the house with multiple candidates.

The following is also true: If you have 3 digits to place and there's a hidden pair, a naked single will also exist.
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Tue Oct 04, 2005 2:02 am    Post subject: Reply with quote

Here's an example from a real sudoku where there are N=6 non-single sets:

{139} [7] [4] {1239} [8] {125} {12569} {2569} {1569}

I'll number the sets 1 at the left to 9 at the right. If you search for all sets <=4, you'll look at these combinations:
1|4, 1|4|6, 1|4|8, 1|4|9, 1|6, 1|8, 1|9, 4|6, 4|8, 4|9, 6|8, 6|9, 8|9

If you're looking for sets <=3, you'll just look at this combination:
1|6

Now you have to look naked pairs in the transpose sets:
{14679} {4678} {14} [3] {6789} {789} [2] [5] {14789}

Since there is just a single set of size 2, there can't be any pairs and you don't even need to search!

So when you search for subsets <=4, in this case you have to look at 13 subsets. If you look for subsets <=3 and then <=2 in the transpose sets, you only have to look at one subset. So as you can see, doing double search N/2 and (N-1)/2 will result in far fewer combinations to look at then a single search of N-2.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Tue Oct 04, 2005 3:42 pm    Post subject: Reply with quote

I've changed my code to look for subsets in a way that allowed switching between the N-2 and the N/2 & (N-1)/2 methods.

It tested both methods on 4 sudokus, counting the number of times it inspected a cell or value.

In your example, it would still have to inspect those 6 transposed sets to determine that there is only 1 set of size 2. Those 6 inspections take time and have been counted.

Results:

N/2 & (N-1)/2 method does 60% of the inspections of the N-1 method.

Not as spectacular as I expected, but significantly faster.

N-2 loses. My program is using the N/2 & (N-1)/2 method now.

Thanks for your explanation!
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Thu Oct 06, 2005 2:58 am    Post subject: Reply with quote

Have you found a good solution to deal with useless subsets being found before useful ones? For instance, consider these sets:

{45} {456} {45} {1389} {27} {89} {123} {789} {2389}

My search algorithm, and I'd guess yours too, will first find the triple that sets 1|2|3 => {456}. But this triple is useless, as it doesn't allow any reductions to be made. If you were to ignore that triple and keep searching, you'd find the pair 1|3 => {45}, which is useful as it lets you eliminate {45} from set 2.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Thu Oct 06, 2005 3:27 am    Post subject: Reply with quote

When my code finds a subset, it will do a scan on the affected candidates. When candidates were eliminated, the method will end reporting that a subset was found, allowing the solver to apply basic logic to handle the effects of the eliminations.

When the subset yields no results, the search continues and, in your example, the smaller subset will be detected.

Most of the advanced techniques that I implemented work this way. Because I write a solver log, I don't want it to report useless subsets, x-wings, etc.
Back to top
View user's profile Send private message Visit poster's website
kranser

Joined: 18 Aug 2005
Posts: 35
:

Items
PostPosted: Thu Oct 06, 2005 10:15 am    Post subject: Reply with quote

xyzzy wrote:
Have you found a good solution to deal with useless subsets being found before useful ones? For instance, consider these sets:

{45} {456} {45} {1389} {27} {89} {123} {789} {2389}

My search algorithm, and I'd guess yours too, will first find the triple that sets 1|2|3 => {456}. But this triple is useless, as it doesn't allow any reductions to be made. If you were to ignore that triple and keep searching, you'd find the pair 1|3 => {45}, which is useful as it lets you eliminate {45} from set 2.


My algorithm (which is very slow, BTW!) searches each row, column & box for twins first, and then triples (i.e. it has a for loop from 2 to 5, that determines how many items it will look for!).

This may help you, but then it's very slow starting looking for 2 matches then continuing up to a possible 5!

Kranser.
Back to top
View user's profile Send private message
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Thu Oct 06, 2005 11:30 pm    Post subject: Reply with quote

Now that I look at it again, my example was flawed. 6 is a hidden single in set 2 and so would have been removed when singles are removed. I don't think it is possible to construct a real example where a useful subset is hidden inside a larger useless subset.

I did find a fast way to tell if a subset is useless. Consider these sets:
{34} {34} {56} {56} {126} {127} {789} {789} {1289}
And their transpose:
{569} {569} {12} {12} {34} {345} {678} {789} {789}

First the search finds the useless pair 1|2 => {34}. If you look at the transpose sets, you'll see that 3|4 => {12}. Notice the symmetry? Every time you have a useless subset this will occur and this symmetry also proves the subset is useless.

If you look at the useful subset 3|4 => {56}, then take the transpose sets 5|6 => {345} you can see that the subset isn't useless. In fact they tell you where the reductions will exist. {345} - {34} = 5, so the reductions will consist of removing a 5 and/or 6 from set 5. For the pair 1|2 we would get {12} - {12} = empty set, so there are no reductions to make.

To restate the rule, I'll use the notation |X| to denote the number of elements in the set X. If the sets S have union U, and |S| = |U| then the elements of U can be remove from all the sets not in S. If the union of the transpose sets U is denoted T, then if and only if T = S will no elements be removed because of the subset S. Furthermore, if |T| = |U| then it follows that T = S. This way the subset S can be declared useless without even knowing S, just by knowing U.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Fri Oct 07, 2005 10:34 am    Post subject: Reply with quote

To take this one step further:

When you have established which cells have eliminations,
you can use a bitwise operation to remove the subset values from their candidate bitmap. The remaining bitmap tells you which candidates to eliminate.

So far, I've put my effort in optimizing the search and not the eliminations, because that really does not have any impact compared to the search, but combining the detection of useful subsets and using the same data to avoid a loop through all combinations is very useful.

I'm glad I started this topic. I really appreciate your helpful contributions.
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Fri Oct 07, 2005 11:13 am    Post subject: Reply with quote

This thought just came up... Idea

My program keeps track of a lot of data on candidates (for cells) and possible locations for digits in a house. It must be possible to detect subsets when these bitmaps are updated, so we would not even have to search for them but detect them as they come into existence. That would really be a timesaver.
Back to top
View user's profile Send private message Visit poster's website
xyzzy

Joined: 24 Aug 2005
Posts: 80
:

Items
PostPosted: Wed Nov 02, 2005 4:02 pm    Post subject: Reply with quote

Ruud wrote:

My program keeps track of a lot of data on candidates (for cells) and possible locations for digits in a house. It must be possible to detect subsets when these bitmaps are updated, so we would not even have to search for them but detect them as they come into existence. That would really be a timesaver.


It certainly is for singles!

I thought of this before I wrote my subset solver, but decided it wouldn't be faster, and so wrote it as a search instead of an on-the-fly detector.

I figured that you when you change a bitmap, any new subset that may have been created MUST involve the bitmap which was changed. So, you would need to check for subsets involving that bitmap in each grouping it's in.

For example, say you adjusted a bitmap which represents the possible digits for a given cell. You need to check for naked subsets in three places: the row, column, and box the cell is in. If you adjusted the bitmap for the possible locations of a given number in a given row, you'd need to check for new subsets in two places: hiddens subsets in the row that involve the number and row wing-fishes in the number that involve the row.

To start with, you could look at the number of bits left in the bitmap you updated, call it L. The look at the number of non-singles left in the group you are checking for subsets, call that N. If L > N/2, then you know that there are no subsets involving the bitmap in that group and no search at all is necessary there.

To do a search for subsets that involve a certain bitmap (I'll call it the key bitmap), you can do the following. It helps to have looked at the recursive subset solver I posted in another thread. In the list of bitmaps to search, make the key bitmap zero, as if it were a single to be ignored. My recursive search has three parameters: the bitmap to start on, the size of the subset being looked at, and the union of the bitmaps in the subset. Normally it starts with a call to search(0,1,0), which means start at bitmap 0, the current subset size is 1 bitmap, and the union so far is the empty set. To do the search involving the key set, it should be called search(0,2,key_set), which means start with size 2 subsets, and the initial union is the key set.

This search with a key bitmap is faster than a full subset search, but not all that much. It's sort of like doing a full search of N-1 bitmaps.

Doing these searches constantly, instead of only after faster techniques like singles are finished, seems like it's sure to be much slower.

Another way might be have a flag for each bitmap than indicates if the bitmap has changed. After a full subset search, all flags are cleared. When a bitmap is changed, the flag gets set. This isn't that expensive to keep track of.

After doing faster algorithms, the subset search can be done and make use of these flags. If all the bitmaps for a group have not been flagged, no new subset exist there and the search can be skipped. If flagged bitmaps do exist, they can be checked to see if they have more than "max subset size for the group" bits set, and if so they can be eliminated from the search (set to zero as if they were singles) and unflagged. If all the flagged bitmaps are eliminated this way, no search need be done. Otherwise any subset found must involve the flagged bitmaps. This search can be done by setting the first flagged bitmap to zero, unflagging it, and then starting the search with the bitmap 'preloaded' as I described before. If no subset is found, the next flagged bitmap is set to zero, unflagged, and 'preload' searched, and so on until no flagged bitmaps remain in the group.

I thought about doing this, but I figured that normally so many bitmaps changed between subset searchs, that this optimization would eliminate so few searches that it wouldn't pay for the cost of keeping track of all the flags.
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Solving sudoku All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron