Sudoku Programmers Forum Index

 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister   ProfileProfile   Log inLog in          Games  Calendar

Log in to check your private messagesLog in to check your private messages   

Sudoku file formats
Goto page Previous  1, 2, 3  Next
 
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Software
View previous topic :: View next topic  
Author Message
barcode

Joined: 02 Nov 2005
Posts: 7
:

Items
PostPosted: Mon Dec 05, 2005 7:43 pm    Post subject: Reply with quote

Ruud could you tell me what license you are releasing that XSD under, I would like to use it as it is the only documented XML Sudoku format I have seen. BSD would be good.
Back to top
View user's profile Send private message
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Mon Dec 05, 2005 9:25 pm    Post subject: Reply with quote

barcode wrote:
Ruud could you tell me what license you are releasing that XSD under, I would like to use it as it is the only documented XML Sudoku format I have seen. BSD would be good.

I uploaded a new version with a BSD license included in the annotations.

Since we're drawing up standards here, I suggest we use a .sdx extension for these files, unless you have a better suggestion.

Ruud.
_________________
Meet me at sudocue.net
Back to top
View user's profile Send private message Visit poster's website
barcode

Joined: 02 Nov 2005
Posts: 7
:

Items
PostPosted: Mon Jan 16, 2006 9:17 pm    Post subject: Reply with quote

Are you using this schema in your program, is anyone else using this schema?
Back to top
View user's profile Send private message
Miles

Joined: 29 Dec 2005
Posts: 30
:

Items
PostPosted: Mon Jan 16, 2006 9:56 pm    Post subject: Reply with quote

I use XML to store my work, for the moment, it is like this :
<Sodoku size="3" >
<caseValues row="0" column="3" value="13" />
</Sodoku>

I'll add some nodes for possible values or more complicated games.
Back to top
View user's profile Send private message Visit poster's website
PSBlake

Joined: 18 Jan 2006
Posts: 8
:

Items
PostPosted: Wed Jan 18, 2006 6:51 pm    Post subject: Reply with quote

Alternatively, if you're going for smallest size, and not a "human readable" format, then consider:

First byte: Size of the grid. We'll just deal with square grids, ie, 9x9.

Next (N^2)/8 bytes (rounded up): The grid, in binary format, with 1s for clue boxes, and 0s for blanks. Excess bits at the end of the string are ignored.

Next, enough bytes to represent the number of clue boxes. Express in hex equivalent of base-nine value (you don't have to deal with zeros).

For a 17-clue, 9x9 grid sudoku, that's 1 byte for size, 4 bytes for "where are clues", and 7 bytes (ln[9^17]/ln[256]) for the actual clue string. I can name that Sudoku in 12 bytes.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Wed Jan 18, 2006 7:54 pm    Post subject: Reply with quote

PSBlake wrote:
Alternatively, if you're going for smallest size, and not a "human readable" format, then consider:

First byte: Size of the grid. We'll just deal with square grids, ie, 9x9.

Next (N^2)/8 bytes (rounded up): The grid, in binary format, with 1s for clue boxes, and 0s for blanks. Excess bits at the end of the string are ignored.

Next, enough bytes to represent the number of clue boxes. Express in hex equivalent of base-nine value (you don't have to deal with zeros).

For a 17-clue, 9x9 grid sudoku, that's 1 byte for size, 4 bytes for "where are clues", and 7 bytes (ln[9^17]/ln[256]) for the actual clue string. I can name that Sudoku in 12 bytes.

not so fast
for N=9 you'll need 11 bytes for "where are the clues"
Code:

(9*9+7)/8 = 11
Back to top
View user's profile Send private message Visit poster's website
PSBlake

Joined: 18 Jan 2006
Posts: 8
:

Items
PostPosted: Wed Jan 18, 2006 11:28 pm    Post subject: Reply with quote

gsf wrote:

not so fast
for N=9 you'll need 11 bytes for "where are the clues"
Code:

(9*9+7)/8 = 11


You're right. I made another mistake in a different post (referencing this post), counting on 17 or fewer clues in all Sudoku. All Nikoli Sudoku are 32 clues or fewer, making all Nikoli 9x9 Sudoku expressable as:
11 bytes for clues grid
(ln[9^32]/ln[256]) bytes (rounded up) for clue values = 13 bytes

For a grand total of 24 bytes per 9x9 Nikoli Sudoku.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jan 19, 2006 1:06 am    Post subject: Reply with quote

PSBlake wrote:

For a grand total of 24 bytes per 9x9 Nikoli Sudoku.

for a catalog of 100,000 puzzles
82 bytes per puzzle (81+newline), . for empty,
average # clues ~25, average compressed bytes per puzzle
Code:

24.2 gzip
21.5 bzip
19.8 run length encoding + huffman encoding
Back to top
View user's profile Send private message Visit poster's website
Ruud
Site Admin
Joined: 17 Sep 2005
Posts: 708
:
Location: Netherlands

Items
PostPosted: Thu Jan 19, 2006 1:54 am    Post subject: Reply with quote

I think the 24 byte version also has some room for compression.

However, I store my large collections (>500.000 puzzles) in 9 32-bit integers per puzzle. The upper limit of the int32 allows 9 digits (a single row) to be stored. I am not compressing these. I do value performance over diskspace and memory, which are cheap.

Ruud.

PS. Talking about performance, did anyone notice the improvements for this forum? Very Happy Cool
_________________
Meet me at sudocue.net
Back to top
View user's profile Send private message Visit poster's website
Vikstar

Joined: 23 Jan 2006
Posts: 1
:

Items
PostPosted: Mon Jan 23, 2006 7:54 am    Post subject: Smallest ASCII Format? Reply with quote

Purpose: to do away with all the little/big endian problems of binary format, and to make it *slightly* human readable, but to also make it small.

To be applied to very sparse 3x3 sudoku 17-hint puzzles, the best format I can come up with is the following.

Hints are written with two characters, the first is [1..9] representing the column and the second is [1..9] representing the actual hint number or value. All hints are written for a row and each row is separated with some arbitrary character, such as ':'.

For example, the following puzzle:
Code:

+---+---+---+
|...|...|.1.|
|4..|...|...|
|.2.|...|...|
+---+---+---+
|...|.5.|4.7|
|..8|...|3..|
|..1|.9.|...|
+---+---+---+
|3..|4..|2..|
|.5.|1..|...|
|...|8.6|...|
+---+---+---+

would be encoded as
Code:

81:14:22:557497:3873:3159:134472:2541:4866

Since there are 9 rows, it would require 8 ':' characters, and 17 hints would require 17 * 2 characters, which gives (8+17*2) = 42 bytes per puzzle. 43 if you want to include a newline between puzzles.
Any suggestions on how this can be improved, or any suggestions on completely different formats which are both ASCII and small?

Edit: Ofcourse, to make it even more human readable you could use [a..i] for the column names.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Mon Jan 23, 2006 10:31 am    Post subject: Reply with quote

Ruud wrote:
I think the 24 byte version also has some room for compression.

my point was that off the shelf compressors will usually beat specialized
encodings so why bother with another level of obfuscation
all of the solvers on this forum can parse text puzzles and with
a decompressor and a pipe they can all access an efficiently stored catalog

but if you still want a specialized encoding
here's an efficient one using 4 bit nibbles:
Code:

nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble

rounding up to whole bytes this encoding uses 18 bytes per puzzle for gordon's 17 catalog
and it seems to use ~average#clues+1 for puzzles scraped from the forums
Back to top
View user's profile Send private message Visit poster's website
PSBlake

Joined: 18 Jan 2006
Posts: 8
:

Items
PostPosted: Tue Jan 24, 2006 3:12 am    Post subject: Reply with quote

gsf wrote:

Code:

nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble


A) What's the deal with the last line there?
B) What about the following Sudoku:

Code:

...|386|..5
..2|1..|7..
59.|.2.|...
-----------
.81|...|...
.6.|...|...
...|..8|...
-----------
...|..5|...
6..|...|3.8
...|4..|.17


The gap between the r5c2 "6" and the r6c6 "8" is too big to represent in 3 bits. Same for the gap between the r6c6 "8" and the r7c6 "5".

I see what you're getting at. It's just that your implementation disallows some puzzles.

Anyway, the minimal space approach is useful for say, an embedded system with extremely limited memory and low processor power. Like a $10 handheld unit dedicated entirely to Sudoku (Although that is, itself, a flawed example: Such handhelds usually generate puzzles on the fly, rather than using pregenerated puzzles). Compression software isn't generally feasable, as it requires memory overhead for execution, and processor power for retrieval. Less CPU ticks are used for simply parsing a binary format, and I don't think we've yet hit upon the "minimal space" solution.[/code]
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Tue Jan 24, 2006 3:53 am    Post subject: Reply with quote

gsf wrote:

Code:

nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble

PSBlake wrote:

A) What's the deal with the last line there?

the encoding could be an odd number of nibbles
so to make it a whole byte count add 1 nibble of 0
PSBlake wrote:

B) What about the following Sudoku:
Code:

...|386|..5
..2|1..|7..
59.|.2.|...
-----------
.81|...|...
.6.|...|...
...|..8|...
-----------
...|..5|...
6..|...|3.8
...|4..|.17


The gap between the r5c2 "6" and the r6c6 "8" is too big to represent in 3 bits. Same for the gap between the r6c6 "8" and the r7c6 "5".

there's nothing in the spec that prohbits 2 or more nibbles to specify a run of holes > 7
a run of 8 holes would need 2 hex nibbles "f9"
a run of 16 holes would need 3 "ffa"
the empty grid is 12 nibbles, 6 bytes "ff ff ff ff ff fc"

this puzzle encodes in 19 bytes (listed in hex):
Code:
b2 75 a4 a1 0a 6a 48 a1 d7 0f 5f d7 f9 4b 5d 29 7b 3b 06

PSBlake wrote:

Anyway, the minimal space approach is useful for say, an embedded system with extremely limited memory and low processor power.

hey, I was just responding to those who insist on an encoding beyond
the snarfable Code: forms seen on the forums

I won't use this encoding in my solver/generator because others would
need a special decoder to read my catalogs

it is an interesting exercize -- beat 19 bytes for this puzzle
Back to top
View user's profile Send private message Visit poster's website
PSBlake

Joined: 18 Jan 2006
Posts: 8
:

Items
PostPosted: Thu Jan 26, 2006 11:06 pm    Post subject: Reply with quote

I don't think anyone was insisting on using special encoding. It was just a "minimal space" suggestion.

Consider: There are only 5,472,730,538 possible distinct Sudoku solutions (for a 9x9 grid). For each of those, between 17 and 80 boxes will be givens.

5,472,730,538 is representable in 5 bytes [actually, in 33 bits, but I'm rounding to the nearest byte]. I submit the following as a variation of gsf's method:

First 5 bytes: Enumeration of the sudoku grid.

Next byte: Size of the largest gap in the puzzle. Technically, the largest possible gap is 64, which resolves to 6 bits, but wasting two bits is trivial.

Next bytes: Enumeration of the gap sizes. If the largest gap size is X, then the list of gap sizes can be represented as a base-(x+1) string, converted to hex.

So the aforementioned Sudoku resolves to:

[00 00 00 00 00*] 0C [D3 3E DA 28 A3 DF E0 12**]

*These bits are unknown, as I don't know the enumeration for distinct Sudoku.

**Assuming I did my math right converting from base-13 to base 16.

I can demonstrably name that Sudoku in 14 bytes.
Cool

Additional space savings could be had by examining whether there were more adjacent clues than adjacent blanks, and using one of the two unused bits to represent which would provide better savings.

And I still think there's a smaller way.
Back to top
View user's profile Send private message
gsf

Joined: 18 Aug 2005
Posts: 411
:
Location: NJ USA

Items
PostPosted: Thu Jan 26, 2006 11:38 pm    Post subject: Reply with quote

PSBlake wrote:
I don't think anyone was insisting on using special encoding. It was just a "minimal space" suggestion.

insisting in the face of readily available methods like gzip and bzip2
PSBlake wrote:

Consider: There are only 5,472,730,538 possible distinct Sudoku solutions (for a 9x9 grid). For each of those, between 17 and 80 boxes will be givens.

5,472,730,538 is representable in 5 bytes [actually, in 33 bits, but I'm rounding to the nearest byte]. I submit the following as a variation of gsf's method:

First 5 bytes: Enumeration of the sudoku grid.

so to decode you would need a dictionary of 5,472,730,538 canonical puzzles
or a function that produces a canonical puzzle given its label

but that would only give you a puzzle equivalent to the original
to get the exact original you would also need the row/col permutations
and the cell label permutation
Back to top
View user's profile Send private message Visit poster's website
Display posts from previous:   
Post new topic   Reply to topic    Sudoku Programmers Forum Index -> Software All times are GMT
Goto page Previous  1, 2, 3  Next
Page 2 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Sudoku Programmers topic RSS feed 


Powered by phpBB © 2001, 2005 phpBB Group

Igloo Theme Version 1.0 :: Created By: Andrew Charron