
View previous topic :: View next topic 
Author 
Message 
 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Mon Nov 16, 2009 12:06 pm Post subject: Smallest size for storing a puzzle 


As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:
Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits? 

Back to top 


 sudoking
 Joined: 20 Oct 2009  Posts: 40  :   Items 

Posted: Mon Nov 16, 2009 3:46 pm Post subject: Re: Smallest size for storing a puzzle 


lkSudoku wrote:  As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:
Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits? 
Well, you could store the value of each cell in 4 bit, and add an extra bit to signify whether that cell is a given. So that would be 81 * 4 + 81 = 405 bits per puzzle.
Then you could shave a few bits off by combining cells. eg, the number of possible combinations for cells 1 and 2 is 9 * 8 * 7 = 504. 504 can be represented by only 9 bits (instead of 12 as above). So that would make it 3 bits per cell to store the solution value, and 1 bit to signify whether the cell is a given. 81 * 3 + 81 = 324 bits per puzzle.
But then there's the fact that not every combination of three cells will have 504 possibilities. eg. once the first 3 cell in a given line are known, then the possible combination of the next 3 cells is 6 * 5 * 4 (120), not 9 * 8 * 7 (504). 120 can be represented with only 7 bits. The final 3 cells in the line then only have 3 * 2 * 1 (6) combinations, which can of course be represented in only 3 bits. So, to represent a single solution line would require 9 + 7 + 3 = 19 bits (plus the standard 9 bits to determine if the cell is a given). So the whole puzzles could be represented with 19 * 9 + 81 = 252 bits per puzzle.
You could go even further. For example, the first 3 cells on the second line would also have only 6 * 5 * 4 (120) combinations, and each group of 3 cells in every 3rd row would only have 3 * 2 * 1 (6) combinations. The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations. The first 3 and 2nd 3 cells of row 5 would have only 5 * 4 * 3 (60) combinations, requiring only 6 bits.
Finally, you could exclude the last line in the puzzle, as this line could be easily recalculated once the rest of the puzzle had been reconstructed. so that would leave us with
Code: 
Bits (for each group of 3 cells).
Line 1: 9 7 3 Total: 19
Line 2: 7 7 3 Total: 17
Line 3: 3 3 3 Total: 9
Line 4: 7 7 3 Total: 17
Line 5: 6 6 3 Total: 15
Line 6: 3 3 3 Total: 9
Line 7: 3 3 3 Total: 9
Line 8: 3 3 3 Total: 9
Line 9: 0 0 0 Total: 0

Which is a total of 104 bits. Plus the 81 bits to determine the givens, is a total of 185 bits per puzzle. Of course, that would require quite a bit of complex code to put it all back together again.
PS: I'm operating on little sleep at the moment, so if someone else wouldn't mind double checking my logic for me. Thanx. 

Back to top 


 daj95376
 Joined: 05 Feb 2006  Posts: 349  :   Items 

Posted: Mon Nov 16, 2009 6:04 pm Post subject: Re: Smallest size for storing a puzzle 


lkSudoku wrote:  As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:
Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits? 
gsf has a compression algorithm that reduces a large collection of puzzles to an average of very few bits per puzzle. What you have to ask yourself: How much space will be consumed by the software to decompress the puzzles? 

Back to top 


 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Mon Nov 16, 2009 6:44 pm Post subject: 


The storage method idea seems interesting in the way it utilizes the logic of the puzzle, however the calculation has a flaw
Quote:  The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations 
That may not always be the case; the first cell of the 4th row has only 6 options, however if the first cell has the same value as one of the cells above the second cell of the 4th row, that second cell also has 6 options and not 5 options; for similar reasons the 3rd cell in the 4th row may also have 6 options and not 4, in such case the first 3 cells of row 4 have 6*6*6=216 combinations 

Back to top 


 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Mon Nov 16, 2009 6:48 pm Post subject: Re: Smallest size for storing a puzzle 


daj95376 wrote:  lkSudoku wrote:  As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:
Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits? 
gsf has a compression algorithm that reduces a large collection of puzzles to an average of very few bits per puzzle. What you have to ask yourself: How much space will be consumed by the software to decompress the puzzles? 
Can someone post the algorithm of gsf for puzzles compression?
How much memory does it require? 

Back to top 


 m_b_metcalf
 Joined: 13 Mar 2006  Posts: 210  :  Location: Berlin  Items 

Posted: Mon Nov 16, 2009 8:41 pm Post subject: Re: Smallest size for storing a puzzle 


lkSudoku wrote: 
Can someone post the algorithm of gsf for puzzles compression?
How much memory does it require? 
I'm sure gsf will reply himself as he described his method somewhere recently (search here or on the Players' Forum). But, in the meantime, please note that his method is used to store complete grids and not puzzles. It is based on storing differences between successive grids.
Note that for complete grids in a canonical form, for instance with the first row always 1 ... 9, you need store only rows 2 to 8 and columns 1 to 8.
Regards,
Mike Metcalf 

Back to top 


 daj95376
 Joined: 05 Feb 2006  Posts: 349  :   Items 

Posted: Mon Nov 16, 2009 9:24 pm Post subject: 


[Withdrawn: may add new reply later.]
Last edited by daj95376 on Fri Nov 20, 2009 6:49 pm; edited 1 time in total 

Back to top 


 wapati
 Joined: 12 Jun 2007  Posts: 622  :  Location: Canada  Items 

Posted: Mon Nov 16, 2009 11:24 pm Post subject: Re: Smallest size for storing a puzzle 


lkSudoku wrote: 
Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits? 
Seems silly to store the solutions, worse, guess how many nonminimal nonsymmetric puzzles fit any given valid matrix? ( You have a valid 20 given puzzle, add one correct cell, of the many, new puzzle! Oh, add 2 correct cells, etc.) 

Back to top 


 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Tue Nov 17, 2009 12:29 am Post subject: 


For the thoretical question regarding the minimal storage size, it does not matter if there are operational reasons
The reason for storing the solutions is that I do not want to run a solver algorithm on every machine that runs the online game, but I do want the machine to hint upon errors, you may consider this as a tradeoff between client machine time for loading the game, and storage size on the server
I also do not want the server to solve puzzles as it should act quickly in response to puzzle requests
The idea of generating different puzzles from a given minimal puzzle should not be applied in my case, as the puzzles have a difficulty grade, and once you add another given to the puzzle, the difficulty grade may change and you get a puzzle with unknown difficulty level 

Back to top 


 JasonLion
 Joined: 16 Nov 2008  Posts: 61  :  Location: Silver Spring, MD  Items 

Posted: Tue Nov 17, 2009 3:09 am Post subject: 


daj95376 wrote:  If you keep your puzzles to 32 clues or less, then you can reduce the second algorithm to 7 32bit words. 
If you are willing to do bit packing and division you can fit it in 24 bytes (6 32bit words) using a slight variation on this approach. 81 bits (ten bytes and a spare bit) to store the map of where the clues go, then you store 10 digits in a 32bit word (9 to the 10th power is less than 2 to the 32nd power). Three words gives 30 clues, and the remaining two clues fit in a byte along with the left over bit of the clue map). Using a similar approach you can get a solved grid into 8 words and a byte (33 bytes).
I find it is worth having a solver even on extremely minimal clients. You can get a solver down to a quite small bit of code without very much work, and solving a single grid using brute force can be done in a fraction of a second even on a fairly slow, by modern standards, microcontroller. Having the solver cuts your compressed size for puzzle and solution down dramatically, since you no longer store the solution.
You can also use a puzzle generator to cut the compressed size down very dramatically. You only store the seed used to seed the random number generator that drives the puzzle generator. Using 4 bytes of seeding is restrictive but will work in many situations. Using 8 to 12 bytes of seeding ought to cover most needs. The counterpoint is that you can't store arbitrary puzzles, only ones that the generator can generate given the restricted seed space. This approach is very practical on cell phones for example, but is too expensive for handheld Sudoku games. Since you mentioned networking, I assume you have at least a cell phone level processor.
There are other approaches to limiting the puzzles that you can represent, that don't take nearly as much CPU as a puzzle generator, which can still give you the same kind of dramatic compressed puzzle size savings that the generator gives. Nearly anything that creates "likely to be puzzles" using a seeded random process can be used so you only need to store the seeds of the ones that worked out appropriately. Generating the puzzles for initial grading needs to use the same process, which will be noticeably less efficient on initial puzzle creation, but does wonders for stored puzzle size. 

Back to top 


 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Tue Nov 17, 2009 11:23 am Post subject: 


JasonLion wrote:  Having the solver cuts your compressed size for puzzle and solution down dramatically, since you no longer store the solution. 
I doubt if that is the case; if you take the algorithm of sudoking that stores both solution and clues, for instance, you get a storage of about 185 bits (due to some assumptions that may not always apply, the size will be a little larger, but still close to that number); perhaps if you make more optimizations such as removing the first row, you can get even smaller size than 185 bits and that size may be smaller than any number of bits required for clues storage only.
JasonLion wrote:  If you are willing to do bit packing and division you can fit it in 24 bytes (6 32bit words) 
The 192 bits solution may be larger than the full solution storage
When you store the complete solution, you can take advantage of the sudoku rules logic in order to decrease size, while storing only clues cannot take advantage of these rules
I agree that if you have a generator on the client with hints from the server, the storage can be much smaller, but the generator creation time depends on the grading algorithm and varies both with the target machine and the complexity of the grading algorithm, and if you have a complicated generator that runs large loops for checking time consuming solving methods it may run too slow on some machines 

Back to top 


 JasonLion
 Joined: 16 Nov 2008  Posts: 61  :  Location: Silver Spring, MD  Items 

Posted: Tue Nov 17, 2009 2:31 pm Post subject: 


An ideal packing of arbitrary puzzles requires on the order of 154 bits (81 bit map of clue locations and around 73 bits to specify one of the 6.6e21 unique grids). Decoding such a packing requires a substantial amount of code and deep knowledge of Sudoku. (In practice this particular packing is not practical to do on a constrained client system, but it does give us a lower bound to compare to.)
The packing I suggested requires 192 bits, worse but not terrible, and hardly any code or deep knowledge of Sudoku. Various other strategies, such as sudoking's suggestions, are intermediate between these two.
Depending on the memory/CPU/programming time constraints of the project, one or another of the full possible range of solutions might be optimal. My suggestion is aimed at fairly low CPU/memory/programming time situations. For the most part, the more deep knowledge of Sudoku the packing takes advantage of, the more CPU/memory/programming time you will need to implement it. I did assume 32 bit division was available. There are other approaches if division is not available in the budget. (There are, of course, cases where deep Sudoku knowledge reduces CPU/memory requirements, but I haven't run across one that helps significantly for this particular issue.)
I am assuming that puzzle rating and selection are happening in advance on a more capable system, and the client simply "expands" a puzzle from a compressed bit string. This appears to be the case in lkSudoku's question which started this topic. In this situation, the generator used by the client for the seeded generator approach does not need to do puzzle grading. It can be very simple and efficient compared to the process used for initial selection of the puzzles. Only seeds which result in puzzles that have the desired properties would be stored in the database, so there is no need to verify those properties in the generator on the client. 

Back to top 


 gsf
 Joined: 18 Aug 2005  Posts: 408  :  Location: NJ USA  Items 

Posted: Tue Nov 17, 2009 3:27 pm Post subject: 


for puzzle compression just use zlib
for the 17 catalog with comments it gets ~137 bits per puzzle
for a high level description of sudz compression (for the catalog of all essentially different solution grids)
see sudz(1)
sudz compression averages ~8.37 bits per solution grid
the basic tactic to get good domain specific compression is to
preorder the data in a form that maximizes the compression rate of a back end compressor  for sudz that back end is bzlib, which does
really well when the input data can be organized in columns where the data does not change much rowbyrow 

Back to top 


 lkSudoku
 Joined: 16 May 2009  Posts: 60  :   Items 

Posted: Tue Nov 17, 2009 3:44 pm Post subject: 


JasonLion wrote:  An ideal packing of arbitrary puzzles requires on the order of 154 bits (81 bit map of clue locations and around 73 bits to specify one of the 6.6e21 unique grids) 
Yes, this seem correct when the number of clues is unknown.
When the number of clues is a fixed number, say x, the amount of bits for specifying the clue location cannot be less than log_2(choose(x, 81)), or in other words log_2(81!/(x!(81x!)))
When the number of clues ranges between x_1 and x_2, the amount of bits for clues location cannot be less than log_2(sum_{x=x_1, x<=x_2}choose(x, 81))
To that, another 73 bits are required for the solution description
Last edited by lkSudoku on Tue Nov 17, 2009 4:34 pm; edited 1 time in total 

Back to top 


 sudoking
 Joined: 20 Oct 2009  Posts: 40  :   Items 

Posted: Tue Nov 17, 2009 4:34 pm Post subject: 


lkSudoku wrote:  Quote:  The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations 
That may not always be the case; the first cell of the 4th row has only 6 options, however if the first cell has the same value as one of the cells above the second cell of the 4th row, that second cell also has 6 options and not 5 options; for similar reasons the 3rd cell in the 4th row may also have 6 options and not 4, in such case the first 3 cells of row 4 have 6*6*6=216 combinations 
You are correct. I knew there was something I'd overlooked when I posted it.
m_b_metcalf wrote:  It is based on storing differences between successive grids. 
Problem with that is, it make extraction of the last puzzle highly complex, as you'd have to calculate all the differences from the first puzzle. This could take a lot of I/O / processing for a large list of puzzles.
Quote:  Note that for complete grids in a canonical form, for instance with the first row always 1 ... 9, you need store only rows 2 to 8 and columns 1 to 8. 
Which isn't really the way you want puzzles stored if you intend for them to be posted on a website. The problem with canonical form is that it discards the original form.
lkSudoku wrote:  I also do not want the server to solve puzzles as it should act quickly in response to puzzle requests 
If you use the bb_sudoku method, and cut the algorithm to the bare bones, then it'll solve puzzles in a reasonable timeframe. 

Back to top 




You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum

Powered by phpBB © 2001, 2005 phpBB Group
Igloo Theme Version 1.0 :: Created By: Andrew Charron
