Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Smallest size for storing a puzzle
Goto page 1, 2  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku
View previous topic :: View next topic  
Author Message
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Mon Nov 16, 2009 12:06 pm    Post subject: Smallest size for storing a puzzle Reply with quote

As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:

Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits?
Back to top
View user's profile Send private message Send e-mail
sudoking

Joined: 20 Oct 2009
Posts: 40
:

Items
PostPosted: Mon Nov 16, 2009 3:46 pm    Post subject: Re: Smallest size for storing a puzzle Reply with quote

lkSudoku wrote:
As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:

Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits?


Well, you could store the value of each cell in 4 bit, and add an extra bit to signify whether that cell is a given. So that would be 81 * 4 + 81 = 405 bits per puzzle.

Then you could shave a few bits off by combining cells. eg, the number of possible combinations for cells 1 and 2 is 9 * 8 * 7 = 504. 504 can be represented by only 9 bits (instead of 12 as above). So that would make it 3 bits per cell to store the solution value, and 1 bit to signify whether the cell is a given. 81 * 3 + 81 = 324 bits per puzzle.

But then there's the fact that not every combination of three cells will have 504 possibilities. eg. once the first 3 cell in a given line are known, then the possible combination of the next 3 cells is 6 * 5 * 4 (120), not 9 * 8 * 7 (504). 120 can be represented with only 7 bits. The final 3 cells in the line then only have 3 * 2 * 1 (6) combinations, which can of course be represented in only 3 bits. So, to represent a single solution line would require 9 + 7 + 3 = 19 bits (plus the standard 9 bits to determine if the cell is a given). So the whole puzzles could be represented with 19 * 9 + 81 = 252 bits per puzzle.

You could go even further. For example, the first 3 cells on the second line would also have only 6 * 5 * 4 (120) combinations, and each group of 3 cells in every 3rd row would only have 3 * 2 * 1 (6) combinations. The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations. The first 3 and 2nd 3 cells of row 5 would have only 5 * 4 * 3 (60) combinations, requiring only 6 bits.

Finally, you could exclude the last line in the puzzle, as this line could be easily recalculated once the rest of the puzzle had been reconstructed. so that would leave us with
Code:

           Bits (for each group of 3 cells).
Line 1: 9 7 3 Total: 19
Line 2: 7 7 3 Total: 17
Line 3: 3 3 3 Total: 9
Line 4: 7 7 3 Total: 17
Line 5: 6 6 3 Total: 15
Line 6: 3 3 3 Total: 9
Line 7: 3 3 3 Total: 9
Line 8: 3 3 3 Total: 9
Line 9: 0 0 0 Total: 0


Which is a total of 104 bits. Plus the 81 bits to determine the givens, is a total of 185 bits per puzzle. Of course, that would require quite a bit of complex code to put it all back together again.

PS: I'm operating on little sleep at the moment, so if someone else wouldn't mind double checking my logic for me. Thanx.
Back to top
View user's profile Send private message
daj95376

Joined: 05 Feb 2006
Posts: 349
:

Items
PostPosted: Mon Nov 16, 2009 6:04 pm    Post subject: Re: Smallest size for storing a puzzle Reply with quote

lkSudoku wrote:
As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:

Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits?

gsf has a compression algorithm that reduces a large collection of puzzles to an average of very few bits per puzzle. What you have to ask yourself: How much space will be consumed by the software to decompress the puzzles?
Back to top
View user's profile Send private message
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Mon Nov 16, 2009 6:44 pm    Post subject: Reply with quote

The storage method idea seems interesting in the way it utilizes the logic of the puzzle, however the calculation has a flaw

Quote:
The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations


That may not always be the case; the first cell of the 4th row has only 6 options, however if the first cell has the same value as one of the cells above the second cell of the 4th row, that second cell also has 6 options and not 5 options; for similar reasons the 3rd cell in the 4th row may also have 6 options and not 4, in such case the first 3 cells of row 4 have 6*6*6=216 combinations
Back to top
View user's profile Send private message Send e-mail
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Mon Nov 16, 2009 6:48 pm    Post subject: Re: Smallest size for storing a puzzle Reply with quote

daj95376 wrote:
lkSudoku wrote:
As I was building my online sudoku website www.lksudoku.co.cc with puzzles prepared in advanced, I came across the following question:

Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits?

gsf has a compression algorithm that reduces a large collection of puzzles to an average of very few bits per puzzle. What you have to ask yourself: How much space will be consumed by the software to decompress the puzzles?


Can someone post the algorithm of gsf for puzzles compression?
How much memory does it require?
Back to top
View user's profile Send private message Send e-mail
m_b_metcalf

Joined: 13 Mar 2006
Posts: 210
:
Location: Berlin

Items
PostPosted: Mon Nov 16, 2009 8:41 pm    Post subject: Re: Smallest size for storing a puzzle Reply with quote

lkSudoku wrote:

Can someone post the algorithm of gsf for puzzles compression?
How much memory does it require?

I'm sure gsf will reply himself as he described his method somewhere recently (search here or on the Players' Forum). But, in the meantime, please note that his method is used to store complete grids and not puzzles. It is based on storing differences between successive grids.

Note that for complete grids in a canonical form, for instance with the first row always 1 ... 9, you need store only rows 2 to 8 and columns 1 to 8.

Regards,

Mike Metcalf
Back to top
View user's profile Send private message
daj95376

Joined: 05 Feb 2006
Posts: 349
:

Items
PostPosted: Mon Nov 16, 2009 9:24 pm    Post subject: Reply with quote

[Withdrawn: may add new reply later.]

Last edited by daj95376 on Fri Nov 20, 2009 6:49 pm; edited 1 time in total
Back to top
View user's profile Send private message
wapati

Joined: 12 Jun 2007
Posts: 622
:
Location: Canada

Items
PostPosted: Mon Nov 16, 2009 11:24 pm    Post subject: Re: Smallest size for storing a puzzle Reply with quote

lkSudoku wrote:

Suppose you have a large number, say P, of sudoku puzzles, each puzzle containing between 20 to 33 givens; What is the smallest number of bits that are required in order to store all the P puzzles and their solutions, and how they can be stored to achieve that number of bits?


Seems silly to store the solutions, worse, guess how many non-minimal non-symmetric puzzles fit any given valid matrix? ( You have a valid 20 given puzzle, add one correct cell, of the many, new puzzle! Oh, add 2 correct cells, etc.)
Back to top
View user's profile Send private message
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Tue Nov 17, 2009 12:29 am    Post subject: Reply with quote

For the thoretical question regarding the minimal storage size, it does not matter if there are operational reasons

The reason for storing the solutions is that I do not want to run a solver algorithm on every machine that runs the online game, but I do want the machine to hint upon errors, you may consider this as a tradeoff between client machine time for loading the game, and storage size on the server

I also do not want the server to solve puzzles as it should act quickly in response to puzzle requests

The idea of generating different puzzles from a given minimal puzzle should not be applied in my case, as the puzzles have a difficulty grade, and once you add another given to the puzzle, the difficulty grade may change and you get a puzzle with unknown difficulty level
Back to top
View user's profile Send private message Send e-mail
JasonLion

Joined: 16 Nov 2008
Posts: 61
:
Location: Silver Spring, MD

Items
PostPosted: Tue Nov 17, 2009 3:09 am    Post subject: Reply with quote

daj95376 wrote:
If you keep your puzzles to 32 clues or less, then you can reduce the second algorithm to 7 32-bit words.

If you are willing to do bit packing and division you can fit it in 24 bytes (6 32-bit words) using a slight variation on this approach. 81 bits (ten bytes and a spare bit) to store the map of where the clues go, then you store 10 digits in a 32-bit word (9 to the 10th power is less than 2 to the 32nd power). Three words gives 30 clues, and the remaining two clues fit in a byte along with the left over bit of the clue map). Using a similar approach you can get a solved grid into 8 words and a byte (33 bytes).

I find it is worth having a solver even on extremely minimal clients. You can get a solver down to a quite small bit of code without very much work, and solving a single grid using brute force can be done in a fraction of a second even on a fairly slow, by modern standards, micro-controller. Having the solver cuts your compressed size for puzzle and solution down dramatically, since you no longer store the solution.

You can also use a puzzle generator to cut the compressed size down very dramatically. You only store the seed used to seed the random number generator that drives the puzzle generator. Using 4 bytes of seeding is restrictive but will work in many situations. Using 8 to 12 bytes of seeding ought to cover most needs. The counterpoint is that you can't store arbitrary puzzles, only ones that the generator can generate given the restricted seed space. This approach is very practical on cell phones for example, but is too expensive for hand-held Sudoku games. Since you mentioned networking, I assume you have at least a cell phone level processor.

There are other approaches to limiting the puzzles that you can represent, that don't take nearly as much CPU as a puzzle generator, which can still give you the same kind of dramatic compressed puzzle size savings that the generator gives. Nearly anything that creates "likely to be puzzles" using a seeded random process can be used so you only need to store the seeds of the ones that worked out appropriately. Generating the puzzles for initial grading needs to use the same process, which will be noticeably less efficient on initial puzzle creation, but does wonders for stored puzzle size.
Back to top
View user's profile Send private message
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Tue Nov 17, 2009 11:23 am    Post subject: Reply with quote

JasonLion wrote:
Having the solver cuts your compressed size for puzzle and solution down dramatically, since you no longer store the solution.


I doubt if that is the case; if you take the algorithm of sudoking that stores both solution and clues, for instance, you get a storage of about 185 bits (due to some assumptions that may not always apply, the size will be a little larger, but still close to that number); perhaps if you make more optimizations such as removing the first row, you can get even smaller size than 185 bits and that size may be smaller than any number of bits required for clues storage only.

JasonLion wrote:
If you are willing to do bit packing and division you can fit it in 24 bytes (6 32-bit words)


The 192 bits solution may be larger than the full solution storage

When you store the complete solution, you can take advantage of the sudoku rules logic in order to decrease size, while storing only clues cannot take advantage of these rules

I agree that if you have a generator on the client with hints from the server, the storage can be much smaller, but the generator creation time depends on the grading algorithm and varies both with the target machine and the complexity of the grading algorithm, and if you have a complicated generator that runs large loops for checking time consuming solving methods it may run too slow on some machines
Back to top
View user's profile Send private message Send e-mail
JasonLion

Joined: 16 Nov 2008
Posts: 61
:
Location: Silver Spring, MD

Items
PostPosted: Tue Nov 17, 2009 2:31 pm    Post subject: Reply with quote

An ideal packing of arbitrary puzzles requires on the order of 154 bits (81 bit map of clue locations and around 73 bits to specify one of the 6.6e21 unique grids). Decoding such a packing requires a substantial amount of code and deep knowledge of Sudoku. (In practice this particular packing is not practical to do on a constrained client system, but it does give us a lower bound to compare to.)

The packing I suggested requires 192 bits, worse but not terrible, and hardly any code or deep knowledge of Sudoku. Various other strategies, such as sudoking's suggestions, are intermediate between these two.

Depending on the memory/CPU/programming time constraints of the project, one or another of the full possible range of solutions might be optimal. My suggestion is aimed at fairly low CPU/memory/programming time situations. For the most part, the more deep knowledge of Sudoku the packing takes advantage of, the more CPU/memory/programming time you will need to implement it. I did assume 32 bit division was available. There are other approaches if division is not available in the budget. (There are, of course, cases where deep Sudoku knowledge reduces CPU/memory requirements, but I haven't run across one that helps significantly for this particular issue.)

I am assuming that puzzle rating and selection are happening in advance on a more capable system, and the client simply "expands" a puzzle from a compressed bit string. This appears to be the case in lkSudoku's question which started this topic. In this situation, the generator used by the client for the seeded generator approach does not need to do puzzle grading. It can be very simple and efficient compared to the process used for initial selection of the puzzles. Only seeds which result in puzzles that have the desired properties would be stored in the database, so there is no need to verify those properties in the generator on the client.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 408
:
Location: NJ USA

Items
PostPosted: Tue Nov 17, 2009 3:27 pm    Post subject: Reply with quote

for puzzle compression just use zlib
for the 17 catalog with comments it gets ~137 bits per puzzle

for a high level description of sudz compression (for the catalog of all essentially different solution grids)
see sudz(1)
sudz compression averages ~8.37 bits per solution grid

the basic tactic to get good domain specific compression is to
pre-order the data in a form that maximizes the compression rate of a back end compressor -- for sudz that back end is bzlib, which does
really well when the input data can be organized in columns where the data does not change much row-by-row
Back to top
View user's profile Send private message Visit poster's website
lkSudoku

Joined: 16 May 2009
Posts: 60
:

Items
PostPosted: Tue Nov 17, 2009 3:44 pm    Post subject: Reply with quote

JasonLion wrote:
An ideal packing of arbitrary puzzles requires on the order of 154 bits (81 bit map of clue locations and around 73 bits to specify one of the 6.6e21 unique grids)


Yes, this seem correct when the number of clues is unknown.

When the number of clues is a fixed number, say x, the amount of bits for specifying the clue location cannot be less than log_2(choose(x, 81)), or in other words log_2(81!/(x!(81-x!)))

When the number of clues ranges between x_1 and x_2, the amount of bits for clues location cannot be less than log_2(sum_{x=x_1, x<=x_2}choose(x, 81))

To that, another 73 bits are required for the solution description


Last edited by lkSudoku on Tue Nov 17, 2009 4:34 pm; edited 1 time in total
Back to top
View user's profile Send private message Send e-mail
sudoking

Joined: 20 Oct 2009
Posts: 40
:

Items
PostPosted: Tue Nov 17, 2009 4:34 pm    Post subject: Reply with quote

lkSudoku wrote:
Quote:
The first 3 cells of row 4 would have only 6 * 5 * 4 (120) combinations


That may not always be the case; the first cell of the 4th row has only 6 options, however if the first cell has the same value as one of the cells above the second cell of the 4th row, that second cell also has 6 options and not 5 options; for similar reasons the 3rd cell in the 4th row may also have 6 options and not 4, in such case the first 3 cells of row 4 have 6*6*6=216 combinations


You are correct. I knew there was something I'd overlooked when I posted it. Smile

m_b_metcalf wrote:
It is based on storing differences between successive grids.


Problem with that is, it make extraction of the last puzzle highly complex, as you'd have to calculate all the differences from the first puzzle. This could take a lot of I/O / processing for a large list of puzzles.

Quote:
Note that for complete grids in a canonical form, for instance with the first row always 1 ... 9, you need store only rows 2 to 8 and columns 1 to 8.


Which isn't really the way you want puzzles stored if you intend for them to be posted on a website. The problem with canonical form is that it discards the original form.

lkSudoku wrote:
I also do not want the server to solve puzzles as it should act quickly in response to puzzle requests


If you use the bb_sudoku method, and cut the algorithm to the bare bones, then it'll solve puzzles in a reasonable time-frame.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Programming sudoku All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron