Pseudocode for forcing chain/net discovery

Draco · Last edited by Draco on Thu Jul 24, 2008 4:43 pm; edited 1 time in total

Fulfilling my offer to Ronk to post this when the code stabilized…

The goal of this algorithm is to find forcing chains or nets that originate from a b/b value (either a square with exactly 2 possible values or a house with only 2 squares containing a given value). The code will create two puzzle copies, setting one b/b value in copy 1 and the second in copy 2.
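For readers who want something concrete, here is a minimal sketch of enumerating the b/b pairs. It is written in Python rather than pseudo-C, and the representation is an assumption made for illustration: `cands` is a hypothetical dict mapping `(row, col)` to the set of candidate values for that square, and `houses` and `find_bb_pairs` are made-up names, not routines from the solver discussed here.

```python
def houses():
    """Return the 27 houses (9 rows, 9 columns, 9 boxes) as lists of (row, col)."""
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    boxes = [[(br + r, bc + c) for r in range(3) for c in range(3)]
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return rows + cols + boxes


def find_bb_pairs(cands):
    """Return a list of b/b pairs.

    Each pair is two (cell, value) assignments of which exactly one is true:
    either the two values of a bivalue square, or the two squares that can
    still hold a given value in one house (bilocation).
    """
    pairs = []
    # Bivalue: a square with exactly two candidate values.
    for cell, vals in cands.items():
        if len(vals) == 2:
            a, b = sorted(vals)
            pairs.append(((cell, a), (cell, b)))
    # Bilocation: a value with exactly two possible squares in a house.
    for house in houses():
        for v in range(1, 10):
            spots = [cell for cell in house if v in cands[cell]]
            if len(spots) == 2:
                pair = ((spots[0], v), (spots[1], v))
                if pair not in pairs:  # the same pair can show up via row and box
                    pairs.append(pair)
    return pairs
```

Each returned pair can then be fed to the two puzzle copies, one assignment per copy.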

The code will then look for any changes to the PM’s that this assignment produces and record said changes. Then singles are found in multiple passes or iterations. After each iteration to find the available singles, record the changes to the PM’s. To be more precise, track which potential values from the original puzzle grid are removed from consideration as each singleton is assigned. I call these “cancellations.” We are looking for one of a few conditions:

1. Values in a square that are cancelled in both puzzle copies (common cancellation). Common cancellations must be false.
2. Contradiction Type 1: the path leads to 2 of the same value in one house. This means that the b/b value/location originating the path must be false (recorded in CommonCancelPoints).
3. Contradiction Type 2: the path leads to a square with no viable candidate values remaining. This means that the b/b value/location originating the path must be false (recorded in CommonCancelPoints).

As soon as any one of these 3 events is triggered (which is signaled by the appearance of one or more rows in CommonCancelPoints), the search stops and returns the chain/net that it found. The chain/net is returned for each puzzle copy as a list of the singles found in each iteration along with data to help in turning that into a path through the puzzle. The search also stops if there are no further singles to be found for either path. If no cancellations are found, the next b/b pair is evaluated. This continues until cancellations are found or all b/b pairs for the current puzzle grid state have been exhausted.
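The loop described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the solver's actual code: the state layout (a dict with `cands`, `solved`, `cancels`, `ok`) and the names `set_value`, `find_singles` and `try_pair` are hypothetical stand-ins for SetSquareValueAndTrackCancels, FindSinglesForIteration and the body of the outer FOR loop. It returns only the cancellation points and does not record the per-iteration list of singles that makes up the chain/net.

```python
HOUSES = ([[(r, c) for c in range(9)] for r in range(9)] +
          [[(r, c) for r in range(9)] for c in range(9)] +
          [[(br + r, bc + c) for r in range(3) for c in range(3)]
           for br in (0, 3, 6) for bc in (0, 3, 6)])
PEERS = {(r, c): {p for h in HOUSES if (r, c) in h for p in h} - {(r, c)}
         for r in range(9) for c in range(9)}


def set_value(st, cell, v):
    """Assign v to cell, record every cancelled (cell, value); False on contradiction."""
    cands = st["cands"]
    if v not in cands[cell]:
        return False  # v was already cancelled here, e.g. a peer now holds v
    st["cancels"] |= {(cell, other) for other in cands[cell] if other != v}
    cands[cell] = {v}
    st["solved"][cell] = v
    for p in PEERS[cell]:
        if v in cands[p]:
            cands[p].discard(v)
            st["cancels"].add((p, v))
            if not cands[p]:
                return False  # a square with no candidates left
    return True


def find_singles(st):
    """All open (naked) and hidden singles present in the current grid state."""
    cands, solved = st["cands"], st["solved"]
    found = {(cell, next(iter(vals))) for cell, vals in cands.items()
             if cell not in solved and len(vals) == 1}
    for house in HOUSES:
        placed = {solved[c] for c in house if c in solved}
        for v in range(1, 10):
            if v in placed:
                continue
            spots = [c for c in house if c not in solved and v in cands[c]]
            if len(spots) == 1:
                found.add((spots[0], v))
    return sorted(found)


def try_pair(cands, solved, pair, max_iters=30):
    """Return the (cell, value) points proven false by this b/b pair, or an empty set."""
    copies = []
    for cell, v in pair:  # one puzzle copy per b/b assignment
        st = {"cands": {k: set(s) for k, s in cands.items()},
              "solved": dict(solved), "cancels": set(), "ok": True}
        st["ok"] = set_value(st, cell, v)
        copies.append(st)
    for _ in range(max_iters + 1):
        dead = {pair[i] for i, st in enumerate(copies) if not st["ok"]}
        if dead:
            return dead  # contradiction: the originating assignment is false
        common = copies[0]["cancels"] & copies[1]["cancels"]
        if common:
            return common  # cancelled in both copies
        progressed = False
        for st in copies:
            for cell, v in find_singles(st):
                progressed = True
                if not set_value(st, cell, v):
                    st["ok"] = False
                    break
        if not progressed:
            break  # neither copy has further singles
    return set()
```

As a small usage example, on a grid where two squares in the same row and box both hold only {1, 2}, calling `try_pair` on the b/b pair of either square returns the usual naked-pair eliminations: 1 and 2 cancelled from every square that sees both.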

No attempt is made to apply any other logic during chain propagation (no looking for patterns, locked sets, ALS, etc.). In this regard, any single path is one which a human can follow (at least anybody who can solve puzzles that have been reduced to singles, both hidden and open) even though it utilizes all singletons, not just forces involving b/b locations. No claim is made that any person would be likely to try all of these variations. :)

There are optimization parameters that are applied that can limit the search based on:

1. Starting square
2. Cancellation of a value in a square
3. Depth of the chain (# of iterations)
4. Length of chain
5. # of values cancelled
6. Degree of puzzle simplification produced by the chain

Parameter 1 places limits on the b/b pair(s) to be considered in the outer FOR loop. Parameter 2 places limits on which common cancellation points will be accepted for termination of the search. Parameters 1 and/or 2 allow the user to focus the search on desired starting or ending points. Parameters 3 and 4 are used to look for simpler chains first (I call this throttling).

Parameters 5 and 6 are applied for “smart searching” which scores every chain found for the grid at different throttle settings, returning the chain with the best score. This implies that one has a good scoring algorithm. It can also take a long time. With proper tuning, however, its result is always at least as good as that of a simple, throttled search (which is pretty quick, usually under 100 ms). It is also always slower (sometimes 10 – 100x slower), though the chain it returns often cracks the puzzle faster.

I currently tune the “smart search” to take any solution that reduces the puzzle to singles over all other solutions, regardless of other considerations (such as length, depth, etc). I could just as easily tune it to do likewise for puzzles cracked to STSS or any other level of granularity offered by my scoring algorithm. That is more a matter of taste than method… though I should note that coming up with a good (i.e. relatively fast, repeatable, reliable/consistent in terms of how one chooses to rate puzzles) algorithm was not as easy a task as I first thought it might be.
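To illustrate only the throttling idea (parameter 3), here is a small Python sketch that tries every b/b pair at a shallow depth before allowing deeper searches. The function name `throttled_search`, the `depth_limits` values and the `try_pair(pair, max_iters)` callback signature are assumptions made for this example; the actual throttle settings and the scoring used by the smart search are not shown in this post.

```python
def throttled_search(pairs, try_pair, depth_limits=(1, 2, 4, 8)):
    """Look for the shallowest chain/net first.

    pairs        -- iterable of b/b pairs to evaluate
    try_pair     -- callable(pair, max_iters) returning a set of cancellations
                    (empty when nothing is found within max_iters iterations)
    depth_limits -- increasing iteration limits to try in turn (hypothetical values)

    Returns (depth, pair, cancellations) for the first hit, or None.
    """
    for depth in depth_limits:
        for pair in pairs:
            found = try_pair(pair, depth)
            if found:
                return depth, pair, found
    return None
```

The trade-off is that shallow passes repeat work the deeper passes will redo, in exchange for returning a short chain whenever one exists.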

This post does not show the throttling techniques, scoring techniques, finding b/b pairs, how to find singles (FindSinglesForIteration) or how to set a value and track the effect it has on PM's (SetSquareValueAndTrackCancels). I am happy to go into those details in a separate thread should anyone want to discuss things to that level of detail. My assumption is that readers already know how to perform these tasks. The particulars of their implementation would be very closely tied to the internal puzzle grid/PM representation used by the baseline solver (at least in my code they are).

This post also does not describe the work involved in turning the chain/net into a displayable form (GUI-based or text suitable for forum posts). If there is interest, I can describe the current state of my code for doing that as well. I found this to be a thornier problem in many ways than finding the chain/net itself (at least once I settled on the design below). That is, in part, due to the broad swath I cover for a single “iteration” (1 iteration includes all the open and hidden singles present for any single grid state… which means initial order depends heavily on the order you choose for calling out those singles). The code I have works well enough, but could be optimized in several edge conditions to draw more direct chains.

Finally, keep in mind that if the PM’s going in are bad, then the cancellations produced by this algorithm may be bad (as bad PM's often lead to an invalid solution). GIGO.

With that intro, here’s the pseudo-C code to find any single chain/net (before optimization is applied). Thanks go out to Danny for showing me how to post more readable code.

daj95376 · Posted: Thu Jul 24, 2008 3:02 pm

I will have to study your algorithm in more depth. A quick pass left me a couple of questions. As for making your code listing readable, please note the additional line (of continuous underscore characters) that I added below.

daj95376 · Posted: Fri Jul 25, 2008 1:31 am

A (forcing) chain is a single stream of linked relationships between candidates and cells in the original PM. It's tricky and inefficient to find chains by actually performing assignments and eliminations because you need to continually return the PM to its original state. I suspect that most solvers build a table of relationships and then use it to logically link entries together into a chain. By finding multiple singles and applying them across the PM in one operation, your results are based on multiple streams and are forcing nets except for the very rare case when there is only one single found for p1 (and p2) on each iteration through your FOR loop.

Note: I actually generate chains with my new solver by performing assignments and eliminations and carefully restoring the state of the original PM when necessary. I had plans to expand my logic that never materialized. Oh Well!

Contradiction Type 3: All occurrences of a value are eliminated from a house/unit.
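As a rough illustration of that check (a Python sketch, not code from either solver), assume a hypothetical `cands` dict mapping `(row, col)` to the set of candidates still possible there, with a solved square holding just its one value. The name `type3_contradictions` is made up for the example.

```python
def type3_contradictions(cands):
    """Return (house_index, value) for every value with no remaining place in a house.

    Houses are indexed 0-8 for rows, 9-17 for columns, 18-26 for boxes.
    """
    rows = [[(r, c) for c in range(9)] for r in range(9)]
    cols = [[(r, c) for r in range(9)] for c in range(9)]
    boxes = [[(br + r, bc + c) for r in range(3) for c in range(3)]
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    hits = []
    for i, house in enumerate(rows + cols + boxes):
        for v in range(1, 10):
            # No square in this house can hold v any more: a contradiction.
            if not any(v in cands[cell] for cell in house):
                hits.append((i, v))
    return hits
```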

Draco · Posted: Fri Jul 25, 2008 2:26 am

Yes, chains and nets are found without preference for one over the other. That was a design decision; I have nothing against nets, though I know some (perhaps most) people feel otherwise.

I understand that the strict definition of a chain requires that the PMs not be modified. Understanding that does not prohibit one from creating algorithms that don't conform to that restriction. And... I did say this code finds chains/nets. :)

What drove that decision was the realization that propagation of singles is something everyone (or at least everyone who would understand how forcing chains or nets work) knows how to do. And, at least for the search/find part, it yields a very simple algorithm. The optimizations once I got the first implementation working... now those are another matter! Still, the core algorithm remains.

Part of what you do not see in my post is the simplification that occurs during the creation of the user-viewable result (left that off as it is not part of actually finding the chain or net). Branches are trimmed from a net when they do not contribute to the cancellations; this often teases a chain out of a network (but not always). There is also UI implemented to allow removal of one or more cancellation points to further simplify the net (or chain) as desired.

FWIW, because of the internals of my solver, I already have a lot of the code in place for "backtracking" pretty easily (it is also easy to create copies of the current grid, though that was an enhancement I added once I decided on this chain/net implementation). Once one has the list of b/b pairs for a grid, it is not difficult to walk its paths to find chains. On that basis I started designing a strict chain finder earlier this year, then stopped. I decided I did not want to spend time creating a subset of the chain/net solutions I already have by restricting what my solver finds to the strict definition of chains. We could have a long, philosophical discussion about this decision, but in the end it is a matter of taste. Everyone decides which techniques they'd like to employ, and I would rather not open the "which techniques are acceptable" can of worms in a thread about an algorithm (not saying you tried to do so, just stating my desire to avoid said worms).

Not sure how your code for CT_3 fits into the rest of your post...?

Thanks for your feedback (and help)!

Cheers...

- drac