Sudoku file formats

barcode · Posted: Mon Dec 05, 2005 7:43 pm Post subject:

Ruud, could you tell me what license you are releasing that XSD under? I would like to use it, as it is the only documented XML Sudoku format I have seen. BSD would be good.

Ruud · Site Admin

barcode · Posted: Mon Jan 16, 2006 9:17 pm Post subject:

Are you using this schema in your program? Is anyone else using it?

Miles · Posted: Mon Jan 16, 2006 9:56 pm Post subject:

I use XML to store my work. For the moment, it looks like this:
<Sodoku size="3" >
<caseValues row="0" column="3" value="13" />
</Sodoku>

I'll add some nodes for possible values or more complicated games.
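A minimal sketch of reading the layout above back in Python. The element and attribute names are copied from the snippet as posted; treating every attribute as an integer and returning a dictionary keyed by (row, column) are assumptions made for illustration, and `load_cells` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Sample document, copied from the post above.
SAMPLE = """<Sodoku size="3" >
<caseValues row="0" column="3" value="13" />
</Sodoku>"""

def load_cells(xml_text):
    """Return (size, {(row, column): value}) for the posted layout."""
    root = ET.fromstring(xml_text)
    size = int(root.get("size"))
    cells = {
        (int(e.get("row")), int(e.get("column"))): int(e.get("value"))
        for e in root.iter("caseValues")
    }
    return size, cells

size, cells = load_cells(SAMPLE)
print(size, cells)  # 3 {(0, 3): 13}
```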

PSBlake · Posted: Wed Jan 18, 2006 6:51 pm Post subject:

Alternatively, if you're going for smallest size, and not a "human readable" format, then consider:

First byte: Size of the grid. We'll just deal with square grids, e.g., 9x9.

Next (N^2)/8 bytes (rounded up): The grid, in binary format, with 1s for clue boxes, and 0s for blanks. Excess bits at the end of the string are ignored.

Next, enough bytes to hold the clue values. Read the clues in order as a single base-nine number (the digits run 1 to 9, so you don't have to deal with zeros) and store that number in binary.

For a 17-clue, 9x9 grid sudoku, that's 1 byte for size, 4 bytes for "where are clues", and 7 bytes (ln[9^17]/ln[256]) for the actual clue string. I can name that Sudoku in 12 bytes.
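The layout described above can be sketched as follows. This is an illustration under stated assumptions, not PSBlake's own code: the bit order (most significant bit first), the big-endian byte order, storing digit d as d-1, and the function names `encode` and `decode` are all choices made here. The puzzle is an arbitrary 17-clue grid used only as test data. Note that by the post's own formula the clue mask for 81 cells takes ceil(81/8) = 11 bytes rather than 4, so this sketch produces 1 + 11 + 7 = 19 bytes.

```python
from math import isqrt

# Arbitrary 17-clue grid for illustration; '0' marks a blank cell.
PUZZLE = "".join([
    "000000010", "400000000", "020000000",
    "000050407", "008000300", "001090000",
    "300400200", "050100000", "000806000",
])

def encode(grid):
    """Size byte, then clue bitmask, then clue digits as one base-9 number."""
    n = isqrt(len(grid))
    assert n * n == len(grid)
    mask_len = (len(grid) + 7) // 8           # (N^2)/8 bytes, rounded up
    mask = 0
    for c in grid:
        mask = (mask << 1) | (c not in "0.")  # 1 = clue, 0 = blank
    mask <<= mask_len * 8 - len(grid)         # pad with ignored zero bits
    clues = [int(c) for c in grid if c not in "0."]
    value = 0
    for d in clues:
        value = value * 9 + (d - 1)           # digits 1..9 stored as 0..8
    value_len = ((9 ** len(clues) - 1).bit_length() + 7) // 8
    return (bytes([n]) + mask.to_bytes(mask_len, "big")
            + value.to_bytes(value_len, "big"))

def decode(data):
    """Inverse of encode(); blanks come back as '0'."""
    n = data[0]
    cells = n * n
    mask_len = (cells + 7) // 8
    mask = int.from_bytes(data[1:1 + mask_len], "big") >> (mask_len * 8 - cells)
    flags = [(mask >> (cells - 1 - i)) & 1 for i in range(cells)]
    value = int.from_bytes(data[1 + mask_len:], "big")
    digits = []
    for _ in range(sum(flags)):
        value, r = divmod(value, 9)
        digits.append(r + 1)
    digits.reverse()
    it = iter(digits)
    return "".join(str(next(it)) if f else "0" for f in flags)

packed = encode(PUZZLE)
print(len(packed))  # 19
```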

gsf · Posted: Wed Jan 18, 2006 7:54 pm Post subject:

PSBlake · Posted: Wed Jan 18, 2006 11:28 pm Post subject:

gsf · Posted: Thu Jan 19, 2006 1:06 am Post subject:

Ruud · Site Admin

I think the 24 byte version also has some room for compression.

However, I store my large collections (>500,000 puzzles) as 9 32-bit integers per puzzle. The upper limit of an int32 is high enough to hold 9 decimal digits (a single row). I am not compressing these. I value performance over disk space and memory, which are cheap.
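A rough sketch of the row-per-integer layout, assuming blanks are written as the digit 0 and the integers are stored little-endian; neither detail is stated in the post, and `pack_puzzle` and `unpack_puzzle` are hypothetical names. The largest possible row value, 999,999,999, is below the signed 32-bit maximum of 2,147,483,647, which is why nine digits fit.

```python
import struct

# Arbitrary 17-clue grid for illustration; '0' marks a blank cell.
PUZZLE = "".join([
    "000000010", "400000000", "020000000",
    "000050407", "008000300", "001090000",
    "300400200", "050100000", "000806000",
])

def pack_puzzle(grid):
    """Store each row of 9 digits as one signed 32-bit integer (36 bytes)."""
    rows = [int(grid[r * 9:(r + 1) * 9]) for r in range(9)]
    return struct.pack("<9i", *rows)

def unpack_puzzle(blob):
    """Rebuild the 81-character string, restoring leading zeros per row."""
    return "".join(f"{row:09d}" for row in struct.unpack("<9i", blob))

blob = pack_puzzle(PUZZLE)
print(len(blob))  # 36
```

Reading a row back is a single integer load followed by repeated division by 10, with no bit unpacking involved.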

Ruud.

PS. Talking about performance, did anyone notice the improvements for this forum?

_________________
Meet me at sudocue.net

Vikstar · Posted: Mon Jan 23, 2006 7:54 am Post subject: Smallest ASCII Format?

Purpose: to do away with all the little/big endian problems of binary format, and to make it *slightly* human readable, but to also make it small.

To be applied to very sparse 3x3 sudoku 17-hint puzzles, the best format I can come up with is the following.

Hints are written with two characters, the first is [1..9] representing the column and the second is [1..9] representing the actual hint number or value. All hints are written for a row and each row is separated with some arbitrary character, such as ':'.
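The rule above can be sketched in a few lines. The grid below is an arbitrary 17-hint puzzle chosen for illustration, not the one from this post, and the helper names `encode_ascii` and `decode_ascii` are made up here; '0' stands for an empty cell.

```python
# Arbitrary 17-hint grid for illustration; '0' marks an empty cell.
PUZZLE = "".join([
    "000000010", "400000000", "020000000",
    "000050407", "008000300", "001090000",
    "300400200", "050100000", "000806000",
])

def encode_ascii(grid, sep=":"):
    """Each hint becomes column digit + value digit; rows are joined by sep."""
    rows = [grid[r * 9:(r + 1) * 9] for r in range(9)]
    return sep.join(
        "".join(f"{col + 1}{val}" for col, val in enumerate(row) if val not in "0.")
        for row in rows
    )

def decode_ascii(text, sep=":"):
    """Inverse of encode_ascii(); empty cells come back as '0'."""
    out = []
    for segment in text.split(sep):
        row = ["0"] * 9
        for i in range(0, len(segment), 2):
            row[int(segment[i]) - 1] = segment[i + 1]
        out.append("".join(row))
    return "".join(out)

text = encode_ascii(PUZZLE)
print(text)       # 81:14:22:557497:3873:3159:134472:2541:4866
print(len(text))  # 42
```

With 17 hints the result is always 34 hint characters plus 8 separators, 42 characters in total.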

For example, the following puzzle:

gsf · Posted: Mon Jan 23, 2006 10:31 am Post subject:

PSBlake · Posted: Tue Jan 24, 2006 3:12 am Post subject:

gsf · Posted: Tue Jan 24, 2006 3:53 am Post subject:

PSBlake · Posted: Thu Jan 26, 2006 11:06 pm Post subject:

I don't think anyone was insisting on using special encoding. It was just a "minimal space" suggestion.

Consider: There are only 5,472,730,538 essentially different Sudoku solutions (for a 9x9 grid), that is, distinct once symmetries and digit relabelings are factored out. For each of those, between 17 and 80 boxes will be givens.

5,472,730,538 is representable in 5 bytes [actually, in 33 bits, but I'm rounding up to a whole byte]. I submit the following as a variation of gsf's method:

First 5 bytes: Enumeration of the sudoku grid.

Next byte: Size of the largest gap in the puzzle. Technically, the largest possible gap is 64, which resolves to 6 bits, but wasting two bits is trivial.

Next bytes: Enumeration of the gap sizes. If the largest gap size is X, then the list of gap sizes can be represented as a base-(X+1) string, converted to hex.

So the aforementioned Sudoku resolves to:

[00 00 00 00 00*] 0C [D3 3E DA 28 A3 DF E0 12**]

*These bytes are unknown, as I don't know the enumeration for distinct Sudoku.

**Assuming I did my math right converting from base 13 to base 16.

I can demonstrably name that Sudoku in 14 bytes.
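A sketch of the gap part of this scheme, under assumptions the post leaves open: a gap is taken to be the run of blanks immediately before each clue, trailing blanks are left implicit, and bytes are big-endian. The 5-byte grid index is not computed, since the enumeration itself is not given here. The puzzle is an arbitrary 17-clue grid, not the one discussed above, and `gap_sizes` and `encode_gaps` are hypothetical names.

```python
# Count quoted in the post; it needs 33 bits, hence 5 bytes.
GRID_COUNT = 5_472_730_538
print(GRID_COUNT.bit_length())  # 33

# Arbitrary 17-clue grid for illustration; '0' marks a blank cell.
PUZZLE = "".join([
    "000000010", "400000000", "020000000",
    "000050407", "008000300", "001090000",
    "300400200", "050100000", "000806000",
])

def gap_sizes(grid):
    """Number of blank cells immediately before each clue, in reading order."""
    gaps, run = [], 0
    for c in grid:
        if c in "0.":
            run += 1
        else:
            gaps.append(run)
            run = 0
    return gaps

def encode_gaps(grid):
    """One byte for the largest gap X, then the gaps as a base-(X+1) number."""
    gaps = gap_sizes(grid)
    largest = max(gaps)
    base = largest + 1
    value = 0
    for g in gaps:
        value = value * base + g
    n_bytes = ((base ** len(gaps) - 1).bit_length() + 7) // 8
    return bytes([largest]) + value.to_bytes(n_bytes, "big")

tail = encode_gaps(PUZZLE)
print(len(tail))      # 9
print(5 + len(tail))  # 14, including the 5-byte grid index
```

A decoder would also need the number of clues, because leading zero gaps vanish when the list is read back as a single number; how that count is conveyed is left open here.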

Additional space savings could be had by examining whether there were more adjacent clues than adjacent blanks, and using one of the two unused bits to represent which would provide better savings.

And I still think there's a smaller way.

gsf · Posted: Thu Jan 26, 2006 11:38 pm Post subject: