Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

BB Sudoku V1.0
Goto page Previous  1, 2, 3  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku
View previous topic :: View next topic  
Author Message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Fri Oct 23, 2009 2:44 am    Post subject: Reply with quote

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[*(ucp++)] = 1; ChangedGroup[*(ucp++)] = 1; ChangedGroup[*ucp] = Changed = 1; }

JasonLion wrote:
The order of execution of post-increments relative to the expression they are in is explicitly not defined and the compiler is allowed to do it in a number of different ways. The old Borland C compiler would delay all post increments until the very end of the expression (after the assignments).

the ';' forces the compiler to (effectively) complete all computations before proceeding to the next statement
so the macro is ok on comforming compilers

however, I think the optimizer for the original macro is not doing a good job
Code:
a[b][0] = a[b][1] = a[b][2] = 1

should not have to re-evaluate a[b] for each part
and the constant suffixes should be another optimizer win

there are plenty of examples across architectures where switching from arrays to pointers is a lose
moving 64 bit pointers vs small const indices might also slow down an inner loop
Back to top
View user's profile Send private message Visit poster's website
JasonLion

Joined: 16 Nov 2008
Posts: 62
:
Location: Silver Spring, MD

Items
PostPosted: Fri Oct 23, 2009 3:09 am    Post subject: Reply with quote

Just to be clear, I was commenting on the version without semi-colons between the assignments:
Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = Changed = 1; }


I wonder how this one does on the various compilers:
Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[ucp[0]] = ChangedGroup[[ucp[1]] = ChangedGroup[ucp[2]] = Changed = 1; }
Back to top
View user's profile Send private message
briturner

Joined: 14 Oct 2008
Posts: 75
:

Items
PostPosted: Fri Oct 23, 2009 3:13 am    Post subject: Reply with quote

Greetings,

I have tried a couple ideas people have posted here. This is what I saw:

1) Using a pointer to replace all G[PIdx]. This appears that it should save the evaluation of G[PIdx] in each instance. My testing showed a small speed up for (887 to 881 msec for the pearly 6000).

2) changing the MarkChanged macro. Again, the idea is for the compiler to not calculate the index. This actually caused my solver so slow considerably. I had to make each assignment a separate statement to make it work (rather than everything set to the final 1, I have 4 '= 1' statements). After that change, the timing was just slightly longer (just a fraction of a percent). Using [upc[0]]... was also in the noise.

Conclusion: These suggested changes can be helpful (10% speed up or more were seen by others), but the speedups are very dependent on the compiler and how well it optimizes. For me, they provided at best less than a 1% improvement.

One final note. On the discussion of the portability of the ++ statements in a series of assignments. For me, I found that the suggested macro did not work correctly (stats showed that is doubled much of steps, like number of naked singles). Reworking the macro to be separate assignments restored the portability.

Brit
Back to top
View user's profile Send private message
PIsaacson

Joined: 17 Jan 2006
Posts: 47
:
Location: Campbell, CA

Items
PostPosted: Fri Oct 23, 2009 7:23 am    Post subject: Reply with quote

Using gcc 4.4.1 optimize level 3 and objdump to decompile

#define MarkChanged(x) { ChangedGroup[C2Grp[x][0]] = ChangedGroup[C2Grp[x][1]] = ChangedGroup[C2Grp[x][2]] = Changed = 1; }
Code:
            MarkChanged(t2);                            // Mark groups as changed
     ad3: 8d 1c 5b              lea    (%ebx,%ebx,2),%ebx
     ad6: c7 85 54 ff ff ff 01  movl   $0x1,-0xac(%ebp)
     add: 00 00 00
     ae0: 0f b6 b3 40 00 00 00  movzbl 0x40(%ebx),%esi
     ae7: 0f b6 bb 41 00 00 00  movzbl 0x41(%ebx),%edi
     aee: 0f b6 9b 42 00 00 00  movzbl 0x42(%ebx),%ebx
     af5: c6 83 f0 d0 00 00 01  movb   $0x1,0xd0f0(%ebx)
     afc: c6 87 f0 d0 00 00 01  movb   $0x1,0xd0f0(%edi)
     b03: c6 86 f0 d0 00 00 01  movb   $0x1,0xd0f0(%esi)

#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[*(ucp++)] = 1; ChangedGroup[*(ucp++)] = 1; ChangedGroup[*ucp] = Changed = 1; }
Code:
            MarkChanged(t2);                            // Mark groups as changed
    5f7b: 8d 34 5b              lea    (%ebx,%ebx,2),%esi
    5f7e: 8d 9e 40 00 00 00     lea    0x40(%esi),%ebx
    5f84: 0f b6 b6 40 00 00 00  movzbl 0x40(%esi),%esi
    5f8b: c6 86 f0 d0 00 00 01  movb   $0x1,0xd0f0(%esi)
    5f92: 0f b6 73 01           movzbl 0x1(%ebx),%esi
    5f96: c6 86 f0 d0 00 00 01  movb   $0x1,0xd0f0(%esi)
    5f9d: 0f b6 5b 02           movzbl 0x2(%ebx),%ebx
    5fa1: c7 85 50 ff ff ff 01  movl   $0x1,-0xb0(%ebp)
    5fa8: 00 00 00
    5fab: c6 83 f0 d0 00 00 01  movb   $0x1,0xd0f0(%ebx)

I would say there is a slight performance advantage to the original (one less instruction), but the register usage is less in the pointer macro.

Using a sudoku grid generator program that makes numerous calls to Solver, the timings indicated that the original macro was somewhat faster.

Original:
Code:
41> time suexg 0 10000 > /dev/null

real    0m9.234s
user    0m0.015s
sys     0m0.015s
42> time suexg 0 10000 > /dev/null

real    0m9.235s
user    0m0.015s
sys     0m0.000s

Modified:
Code:
45> time suexg 0 10000 > /dev/null

real    0m9.531s
user    0m0.015s
sys     0m0.015s
45> time suexg 0 10000 > /dev/null

real    0m9.532s
user    0m0.015s
sys     0m0.015s
Back to top
View user's profile Send private message
daj95376

Joined: 05 Feb 2006
Posts: 349
:

Items
PostPosted: Fri Oct 23, 2009 8:28 am    Post subject: Reply with quote

I put the following code through my Borland C++ 5.5.1 compiler:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x];
                        ChangedGroup[*(ucp++)] =
                        ChangedGroup[*(ucp++)] =
                        ChangedGroup[*(ucp++)] = Changed = 1; }

    uint8   C2Grp[1][3] = {0,1,2};
    int     Changed, ChangedGroup[3];

    ChangedGroup[0] = ChangedGroup[1] = ChangedGroup[2] = 3;

    MarkChanged(0);

    printf( "ChangeGroup: %d %d %d\n",
             ChangedGroup[0], ChangedGroup[1], ChangedGroup[2] );

The output is as I (and Jason) expected:

Code:
ChangeGroup: 1 3 3

When I changed the macro to:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x];
                        ChangedGroup[*(ucp++)] = 1;
                        ChangedGroup[*(ucp++)] = 1;
                        ChangedGroup[* ucp   ] = Changed = 1; }

The output changed to:

Code:
ChangeGroup: 1 1 1

The following macro also worked:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x];
                        ChangedGroup[ucp[0]] =
                        ChangedGroup[ucp[1]] =
                        ChangedGroup[ucp[2]] = Changed = 1; }

FWIW: I believe operator precedence dictates that sudoking's altered macro, when compiled with some other compiler than mine, is actually equivalent to:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x];
                        ChangedGroup[ucp[2]] =
                        ChangedGroup[ucp[1]] =
                        ChangedGroup[ucp[0]] = Changed = 1; }

This doesn't make a difference in this instance, but it would if "=" was replaced by "+=" in the statement.
Back to top
View user's profile Send private message
sudoking

Joined: 20 Oct 2009
Posts: 40
:

Items
PostPosted: Fri Oct 23, 2009 8:48 am    Post subject: Reply with quote

JasonLion wrote:
The order of execution of post-increments relative to the expression they are in is explicitly not defined and the compiler is allowed to do it in a number of different ways. The old Borland C compiler would delay all post increments until the very end of the expression (after the assignments).


I wasn't aware of that. Thanx. I'll keep that in mind in the future.

JasonLion wrote:
I wonder how this one does on the various compilers:
Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[ucp[0]] = ChangedGroup[[ucp[1]] = ChangedGroup[ucp[2]] = Changed = 1; }


It would cause a complier error. Smile

But this:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[ucp[0]] = ChangedGroup[ucp[1]] = ChangedGroup[ucp[2]] = Changed = 1; }


gave identical results to the original. That's on a MSVC7.1 compiler.

briturner wrote:
Greetings,

I have tried a couple ideas people have posted here. This is what I saw:

1) Using a pointer to replace all G[PIdx]. This appears that it should save the evaluation of G[PIdx] in each instance. My testing showed a small speed up for (887 to 881 msec for the pearly 6000).

2) changing the MarkChanged macro. Again, the idea is for the compiler to not calculate the index. This actually caused my solver so slow considerably. I had to make each assignment a separate statement to make it work (rather than everything set to the final 1, I have 4 '= 1' statements). After that change, the timing was just slightly longer (just a fraction of a percent). Using [upc[0]]... was also in the noise.

Conclusion: These suggested changes can be helpful (10% speed up or more were seen by others), but the speedups are very dependent on the compiler and how well it optimizes. For me, they provided at best less than a 1% improvement.

One final note. On the discussion of the portability of the ++ statements in a series of assignments. For me, I found that the suggested macro did not work correctly (stats showed that is doubled much of steps, like number of naked singles). Reworking the macro to be separate assignments restored the portability.

Brit


What compiler were you using?
Back to top
View user's profile Send private message
briturner

Joined: 14 Oct 2008
Posts: 75
:

Items
PostPosted: Fri Oct 23, 2009 11:07 pm    Post subject: Reply with quote

I have recently upgraded to Win 7 RC, and MS Visual Studio 2008. For releases, I use the profiled optimization for final releases, since it does make the build a couple % faster.
Back to top
View user's profile Send private message
sudoking

Joined: 20 Oct 2009
Posts: 40
:

Items
PostPosted: Wed Oct 28, 2009 4:32 pm    Post subject: Reply with quote

Here's an interesting development. I checked the assembler that was compiled when I used the following:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = Changed = 1; }


On the debug compile, with no optimisation, it worked as I had expected, and was an exact replacement for the original. However, on the release compile, with optimisations on, the increments were all performed at the end of the statement, which was the equivalent of:

Code:
#define MarkChanged(x) { ChangedGroup[C2Grp[x][0]] = Changed = 1; }


The reason I hadn't notices this before, was because this code still produced the same results. In fact, I used this line to replace the original line, tested it against 20 million puzzles, and it gave the same results for each one. briturner, is it possible that setting the remaining 2 elements of ChangedGroup (i.e. those > 8) has no effect?

Also, a note on performance increases. I'm using a relatively slow processor with a reasonably fast hard drive, so any improvements in processing will be more noticeable. This may explain why I see higher performance increases than others. It could also be because I'm using an older compiler, that doesn't produce as optimal code as that of newer compilers.
Back to top
View user's profile Send private message
briturner

Joined: 14 Oct 2008
Posts: 75
:

Items
PostPosted: Thu Oct 29, 2009 12:48 am    Post subject: Reply with quote

I would suspect the optimizations are not as good on the older compiler. So, the one change hit the right pattern for your compiler. I found that I could get a couple percent difference by including code that did not run with an older compiler. It was a hunt for the random unneeded change that would provide the best optimization. So, I am trying to avoid these type of changes, which try to tweak the optimizations, as compared to more legitimate changes, like changing the order of the guesses, or reducing an assignment.

When I used the modified macro, the solutions were still found, but it took maybe 40% longer. checking the stats showed that there were almost double the checks for naked and hidden singles (and thus the longer run time).

brit
Back to top
View user's profile Send private message
ChPicard

Joined: 12 Mar 2008
Posts: 82
:
Location: Montreal, Canada

Items
PostPosted: Fri Oct 30, 2009 10:55 am    Post subject: End the buffer in DecodePuzzleString Reply with quote

Hi

Is it necessary to end the buffer each time the routine reads a new puzzle? I do it one time in main routine and it works.


Best regards


JPS
Back to top
View user's profile Send private message
dmhayden

Joined: 01 Aug 2007
Posts: 16
:
Location: Central New Jersey

Items
PostPosted: Sat Oct 31, 2009 12:35 pm    Post subject: Reply with quote

sudoking wrote:
Here's an interesting development. I checked the assembler that was compiled when I used the following:

Code:
#define MarkChanged(x) {register unsigned char const *ucp = C2Grp[x]; ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = ChangedGroup[*(ucp++)] = Changed = 1; }


On the debug compile, with no optimisation, it worked as I had expected, and was an exact replacement for the original. However, on the release compile, with optimisations on, the increments were all performed at the end of the statement, which was the equivalent of:

Code:
#define MarkChanged(x) { ChangedGroup[C2Grp[x][0]] = Changed = 1; }


You got different results because the behavior of the code is not well defined. In your example, the compiler can increment the ucp any time it wants, as long as it does so before the end of the statement, and since you're evaluating it several times, the only thing you're guaranteed is that one of the "ucp++" expressions will evaluate to the original value. You don't even know which one.
Back to top
View user's profile Send private message AIM Address
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Mon Nov 16, 2009 1:20 pm    Post subject: Reply with quote

This code is indeed fast
Back to top
View user's profile Send private message Send e-mail
dobrichev

Joined: 18 Dec 2009
Posts: 28
:

Items
PostPosted: Tue Jan 05, 2010 11:48 pm    Post subject: Reply with quote

Good Job!

There are really interesting pieces of code.

Let suggest some improvements.

1) Data definition.
Using unsigned short instead of unsigned int for Grid, Grp, and SingleVal improves performance.

Code:

...
  unsigned short Grid[81];   // Possibilities left for each cell
  unsigned short Grp[27];    // Values found in each group
...
unsigned short SingleVal[128];


2) Inlining.
This function definitions are working better for me along with compiling with option not automatically to inline all "suitable" functions.
Code:

       int  DecodePuzzleString (int ret_puz, char* buffer);
inline void InitGrid ();
//void ProcessInitSingles (void);
       void ProcessSingles (void);
inline void FindHiddenSingles (void);
       void FindLockedCandidates (void);
       void FindSubsets (void);
       void FindFishies (void);
       void DoStep (int doing_2_step, int use_methods);
inline void MakeGuess (void);


3) ProcessInitSingles function works slower than ProcessSingles.

Code:

...
      if (SingleCnt)
      { SCnt++;
        //if (SingleCnt > 2)               // If multiple singles
        //  ProcessInitSingles();          //   process them all at once
        //if (SingleCnt)                   // otherwise
        ProcessSingles();              //   process them one at a time
        if (!G[PIdx].CellsLeft)
...


4) ChangedGroup cache slows down the process.

Code:

//#define MarkChanged(x) { ChangedGroup[C2Grp[x][2]] = ChangedGroup[C2Grp[x][1]] = ChangedGroup[C2Grp[x][0]] = Changed = 1; }
#define MarkChanged(x) { ; }

...

void FindHiddenSingles (void)
{ unsigned short t1, t2, t3, b, t;
  int i, j;

  Changed = 0;
  for (i = 0; i < 27; i++)
    //if (ChangedGroup[i])
    { ChangedLC[i] = 1;
...
      //ChangedGroup[i] = 0;
    }
  //Changed = 0;
}


5) Improvements in the most critical code - ProcessSingles.

The group cache update is unconditional, but removing the possibility may raise error and return before completion of the whole loop. Moving (and unrolling) the group cache update to the bottom benefits from eventual earlier return.

The internal loop unrolling is critical for performance. For compilers that do not support unrolling, defining the body of the loop as a macro and calling him manually 20 times works fine.

6) In the main loop most advanced algorithms are executed even if some procedure found a discrepancy at earlier stages. Checking for No_Sol before calling each method solves this.

Code:

      if (Changed){
        if (!No_Sol)
          { HCnt++; FindHiddenSingles();      if (SingleCnt) continue;}
        if ((!No_Sol) && use_methods & USE_LOCK_CAND)
          { LCCnt++;  FindLockedCandidates(); if (Changed) continue; }
        if ((!No_Sol) && use_methods & USE_SUBSETS)
          { SSCnt++;  FindSubsets();          if (Changed) continue; }
        if ((!No_Sol) && use_methods & USE_FISHIES)
          { FCnt++;   FindFishies();          if (Changed) continue; }
        if ((!No_Sol) && use_methods & USE_1_STEP)
          { OneCnt++; DoStep(0, use_methods); if (Changed) continue; }
        if ((!No_Sol) && use_methods & USE_2_STEP)
          { TwoCnt++; DoStep(1, use_methods); if (Changed) continue; }
      }


7) random.h along with its initialization are unnecessary.

8) Printing statistics for totals instead for the latest puzzle are more informative Embarassed

Code:

  if (show_stats)
    fprintf (outfile, "\n  %9ld Singles Processed\n  %9ld Hidden Singles\n  %9ld Locked Candidates"
            "\n  %9ld SubSets Searched\n  %9ld Fishies\n  %9ld 1-Steps\n  %9ld 2-Steps"
            "\n  %9ld Guesses\n  %9d Max Depth\n",
            //SCnt, HCnt, LCCnt, SSCnt, FCnt, OneCnt, TwoCnt, GCnt, MaxDepth);
            TSCnt, THCnt, TLCCnt, TSSCnt, TFCnt, TOneCnt, TTwoCnt, TGCnt, MaxDepth);


9) In FindHiddenSingles I somehow decided to update the grid directly, not waiting the queue to be processed. Now I am not sure about the benefits, but probably this increases chances for earlier discrepancy finding.

Code:

void FindHiddenSingles (void)
{ unsigned short t1, t2, t3, b, t;
...
            if (!B2V[t]) { No_Sol = 1; SingleCnt = 0; Changed = 0; return; }
            G[PIdx].Grid[Group[i][j]] = t;
            PushSingle((char)Group[i][j], t);
...
}


Thank you Brian for this fast and elegant code.

I used the impressive piece of code
Code:
      t1 = t2 = t3 = 0;
      for (j = 0; j < 9; j++)
      { b  = G[PIdx].Grid[Group[i][j]];
        t2 = t2 | (t1 & b);
        t1 = t1 | b;
      }
...
      t3 = t2 ^ t1;
in my solver. This, along with copying the whole FindLockedCandidates improved the performance about 4 times.

Source in C++ and Windows binaries of my solver are available at (ah, these link limitations) "sites . google . com / site / dobrichev / fsss"
Back to top
View user's profile Send private message
briturner

Joined: 14 Oct 2008
Posts: 75
:

Items
PostPosted: Wed Jan 06, 2010 4:00 am    Post subject: Reply with quote

Greetings,

First off, congratulations, preliminary tests are showing around a 15% to 20% improvement with your program.

There is always a bit of a problem with tweaking code for efficiency, since it is very compiler and environment (os / processor / hardware) dependent. I compiled and tested the improvements you recommended (compiled directly from the file you emailed me), and for me, it actually slightly increased the amount of time needed.

On the otherhand, when I compiled your solver (fsss), it did run faster than my solver did, by some 15% to 20%. I will be looking through your code to see what some of the differences are. I do know I left some inefficiencies in my code as part of additional functionality (like handling different solving techniques and such), but that would not account for the difference.

bt
Back to top
View user's profile Send private message
briturner

Joined: 14 Oct 2008
Posts: 75
:

Items
PostPosted: Wed Jan 06, 2010 2:40 pm    Post subject: Reply with quote

Greetings again,

It is going to take a bit more for me to get an apples to apples timing comparison here. 2 things I noticed are:

a) you buffer the entire file IO before starting the timing, and

b) more importantly, it looks like you are attempting to run on multiple processors (I see things like "#pragma omp parallel for schedule(static, 1) reduction(+:nSolved)").

While these are legitimate speedups, it does make the timing of the actual solving algorithm more difficult.

I think I will try putting both solvers in the same timing stub, which will help give me better timings.

bt
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku All times are GMT
Goto page Previous  1, 2, 3  Next
Page 2 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron