|
View previous topic :: View next topic |
Author |
Message |
| barcode
| Joined: 02 Nov 2005 | Posts: 7 | : | | Items |
|
Posted: Mon Dec 05, 2005 7:43 pm Post subject: |
|
|
Ruud could you tell me what license you are releasing that XSD under, I would like to use it as it is the only documented XML Sudoku format I have seen. BSD would be good. |
|
Back to top |
|
|
| Ruud Site Admin
| Joined: 17 Sep 2005 | Posts: 708 | : | Location: Netherlands | Items |
|
Posted: Mon Dec 05, 2005 9:25 pm Post subject: |
|
|
barcode wrote: | Ruud could you tell me what license you are releasing that XSD under, I would like to use it as it is the only documented XML Sudoku format I have seen. BSD would be good. |
I uploaded a new version with a BSD license included in the annotations.
Since we're drawing up standards here, I suggest we use a .sdx extension for these files, unless you have a better suggestion.
Ruud. _________________ Meet me at sudocue.net |
|
Back to top |
|
|
| barcode
| Joined: 02 Nov 2005 | Posts: 7 | : | | Items |
|
Posted: Mon Jan 16, 2006 9:17 pm Post subject: |
|
|
Are you using this schema in your program, is anyone else using this schema? |
|
Back to top |
|
|
| Miles
| Joined: 29 Dec 2005 | Posts: 30 | : | | Items |
|
Posted: Mon Jan 16, 2006 9:56 pm Post subject: |
|
|
I use XML to store my work, for the moment, it is like this :
<Sodoku size="3" >
<caseValues row="0" column="3" value="13" />
</Sodoku>
I'll add some nodes for possible values or more complicated games. |
|
Back to top |
|
|
| PSBlake
| Joined: 18 Jan 2006 | Posts: 8 | : | | Items |
|
Posted: Wed Jan 18, 2006 6:51 pm Post subject: |
|
|
Alternatively, if you're going for smallest size, and not a "human readable" format, then consider:
First byte: Size of the grid. We'll just deal with square grids, ie, 9x9.
Next (N^2)/8 bytes (rounded up): The grid, in binary format, with 1s for clue boxes, and 0s for blanks. Excess bits at the end of the string are ignored.
Next, enough bytes to represent the number of clue boxes. Express in hex equivalent of base-nine value (you don't have to deal with zeros).
For a 17-clue, 9x9 grid sudoku, that's 1 byte for size, 4 bytes for "where are clues", and 7 bytes (ln[9^17]/ln[256]) for the actual clue string. I can name that Sudoku in 12 bytes. |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Wed Jan 18, 2006 7:54 pm Post subject: |
|
|
PSBlake wrote: | Alternatively, if you're going for smallest size, and not a "human readable" format, then consider:
First byte: Size of the grid. We'll just deal with square grids, ie, 9x9.
Next (N^2)/8 bytes (rounded up): The grid, in binary format, with 1s for clue boxes, and 0s for blanks. Excess bits at the end of the string are ignored.
Next, enough bytes to represent the number of clue boxes. Express in hex equivalent of base-nine value (you don't have to deal with zeros).
For a 17-clue, 9x9 grid sudoku, that's 1 byte for size, 4 bytes for "where are clues", and 7 bytes (ln[9^17]/ln[256]) for the actual clue string. I can name that Sudoku in 12 bytes. |
not so fast
for N=9 you'll need 11 bytes for "where are the clues"
|
|
Back to top |
|
|
| PSBlake
| Joined: 18 Jan 2006 | Posts: 8 | : | | Items |
|
Posted: Wed Jan 18, 2006 11:28 pm Post subject: |
|
|
gsf wrote: |
not so fast
for N=9 you'll need 11 bytes for "where are the clues"
|
You're right. I made another mistake in a different post (referencing this post), counting on 17 or fewer clues in all Sudoku. All Nikoli Sudoku are 32 clues or fewer, making all Nikoli 9x9 Sudoku expressable as:
11 bytes for clues grid
(ln[9^32]/ln[256]) bytes (rounded up) for clue values = 13 bytes
For a grand total of 24 bytes per 9x9 Nikoli Sudoku. |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Thu Jan 19, 2006 1:06 am Post subject: |
|
|
PSBlake wrote: |
For a grand total of 24 bytes per 9x9 Nikoli Sudoku. |
for a catalog of 100,000 puzzles
82 bytes per puzzle (81+newline), . for empty,
average # clues ~25, average compressed bytes per puzzle
Code: |
24.2 gzip
21.5 bzip
19.8 run length encoding + huffman encoding
|
|
|
Back to top |
|
|
| Ruud Site Admin
| Joined: 17 Sep 2005 | Posts: 708 | : | Location: Netherlands | Items |
|
Posted: Thu Jan 19, 2006 1:54 am Post subject: |
|
|
I think the 24 byte version also has some room for compression.
However, I store my large collections (>500.000 puzzles) in 9 32-bit integers per puzzle. The upper limit of the int32 allows 9 digits (a single row) to be stored. I am not compressing these. I do value performance over diskspace and memory, which are cheap.
Ruud.
PS. Talking about performance, did anyone notice the improvements for this forum? _________________ Meet me at sudocue.net |
|
Back to top |
|
|
| Vikstar
| Joined: 23 Jan 2006 | Posts: 1 | : | | Items |
|
Posted: Mon Jan 23, 2006 7:54 am Post subject: Smallest ASCII Format? |
|
|
Purpose: to do away with all the little/big endian problems of binary format, and to make it *slightly* human readable, but to also make it small.
To be applied to very sparse 3x3 sudoku 17-hint puzzles, the best format I can come up with is the following.
Hints are written with two characters, the first is [1..9] representing the column and the second is [1..9] representing the actual hint number or value. All hints are written for a row and each row is separated with some arbitrary character, such as ':'.
For example, the following puzzle:
Code: |
+---+---+---+
|...|...|.1.|
|4..|...|...|
|.2.|...|...|
+---+---+---+
|...|.5.|4.7|
|..8|...|3..|
|..1|.9.|...|
+---+---+---+
|3..|4..|2..|
|.5.|1..|...|
|...|8.6|...|
+---+---+---+
|
would be encoded as
Code: |
81:14:22:557497:3873:3159:134472:2541:4866
|
Since there are 9 rows, it would require 8 ':' characters, and 17 hints would require 17 * 2 characters, which gives (8+17*2) = 42 bytes per puzzle. 43 if you want to include a newline between puzzles.
Any suggestions on how this can be improved, or any suggestions on completely different formats which are both ASCII and small?
Edit: Ofcourse, to make it even more human readable you could use [a..i] for the column names. |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Mon Jan 23, 2006 10:31 am Post subject: |
|
|
Ruud wrote: | I think the 24 byte version also has some room for compression.
|
my point was that off the shelf compressors will usually beat specialized
encodings so why bother with another level of obfuscation
all of the solvers on this forum can parse text puzzles and with
a decompressor and a pipe they can all access an efficiently stored catalog
but if you still want a specialized encoding
here's an efficient one using 4 bit nibbles:
Code: |
nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble
|
rounding up to whole bytes this encoding uses 18 bytes per puzzle for gordon's 17 catalog
and it seems to use ~average#clues+1 for puzzles scraped from the forums |
|
Back to top |
|
|
| PSBlake
| Joined: 18 Jan 2006 | Posts: 8 | : | | Items |
|
Posted: Tue Jan 24, 2006 3:12 am Post subject: |
|
|
gsf wrote: |
Code: |
nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble
|
|
A) What's the deal with the last line there?
B) What about the following Sudoku:
Code: |
...|386|..5
..2|1..|7..
59.|.2.|...
-----------
.81|...|...
.6.|...|...
...|..8|...
-----------
...|..5|...
6..|...|3.8
...|4..|.17 |
The gap between the r5c2 "6" and the r6c6 "8" is too big to represent in 3 bits. Same for the gap between the r6c6 "8" and the r7c6 "5".
I see what you're getting at. It's just that your implementation disallows some puzzles.
Anyway, the minimal space approach is useful for say, an embedded system with extremely limited memory and low processor power. Like a $10 handheld unit dedicated entirely to Sudoku (Although that is, itself, a flawed example: Such handhelds usually generate puzzles on the fly, rather than using pregenerated puzzles). Compression software isn't generally feasable, as it requires memory overhead for execution, and processor power for retrieval. Less CPU ticks are used for simply parsing a binary format, and I don't think we've yet hit upon the "minimal space" solution.[/code] |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Tue Jan 24, 2006 3:53 am Post subject: |
|
|
gsf wrote: |
Code: |
nibble values 0-8 for cell values 1-9
nibble values 9-15 for run of 1-7 holes
if the nibble count is odd add another 0 nibble
|
|
PSBlake wrote: |
A) What's the deal with the last line there?
|
the encoding could be an odd number of nibbles
so to make it a whole byte count add 1 nibble of 0
PSBlake wrote: |
B) What about the following Sudoku:
Code: |
...|386|..5
..2|1..|7..
59.|.2.|...
-----------
.81|...|...
.6.|...|...
...|..8|...
-----------
...|..5|...
6..|...|3.8
...|4..|.17 |
The gap between the r5c2 "6" and the r6c6 "8" is too big to represent in 3 bits. Same for the gap between the r6c6 "8" and the r7c6 "5".
|
there's nothing in the spec that prohbits 2 or more nibbles to specify a run of holes > 7
a run of 8 holes would need 2 hex nibbles "f9"
a run of 16 holes would need 3 "ffa"
the empty grid is 12 nibbles, 6 bytes "ff ff ff ff ff fc"
this puzzle encodes in 19 bytes (listed in hex):
Code: | b2 75 a4 a1 0a 6a 48 a1 d7 0f 5f d7 f9 4b 5d 29 7b 3b 06 |
PSBlake wrote: |
Anyway, the minimal space approach is useful for say, an embedded system with extremely limited memory and low processor power.
|
hey, I was just responding to those who insist on an encoding beyond
the snarfable Code: forms seen on the forums
I won't use this encoding in my solver/generator because others would
need a special decoder to read my catalogs
it is an interesting exercize -- beat 19 bytes for this puzzle |
|
Back to top |
|
|
| PSBlake
| Joined: 18 Jan 2006 | Posts: 8 | : | | Items |
|
Posted: Thu Jan 26, 2006 11:06 pm Post subject: |
|
|
I don't think anyone was insisting on using special encoding. It was just a "minimal space" suggestion.
Consider: There are only 5,472,730,538 possible distinct Sudoku solutions (for a 9x9 grid). For each of those, between 17 and 80 boxes will be givens.
5,472,730,538 is representable in 5 bytes [actually, in 33 bits, but I'm rounding to the nearest byte]. I submit the following as a variation of gsf's method:
First 5 bytes: Enumeration of the sudoku grid.
Next byte: Size of the largest gap in the puzzle. Technically, the largest possible gap is 64, which resolves to 6 bits, but wasting two bits is trivial.
Next bytes: Enumeration of the gap sizes. If the largest gap size is X, then the list of gap sizes can be represented as a base-(x+1) string, converted to hex.
So the aforementioned Sudoku resolves to:
[00 00 00 00 00*] 0C [D3 3E DA 28 A3 DF E0 12**]
*These bits are unknown, as I don't know the enumeration for distinct Sudoku.
**Assuming I did my math right converting from base-13 to base 16.
I can demonstrably name that Sudoku in 14 bytes.
Additional space savings could be had by examining whether there were more adjacent clues than adjacent blanks, and using one of the two unused bits to represent which would provide better savings.
And I still think there's a smaller way. |
|
Back to top |
|
|
| gsf
| Joined: 18 Aug 2005 | Posts: 411 | : | Location: NJ USA | Items |
|
Posted: Thu Jan 26, 2006 11:38 pm Post subject: |
|
|
PSBlake wrote: | I don't think anyone was insisting on using special encoding. It was just a "minimal space" suggestion.
|
insisting in the face of readily available methods like gzip and bzip2
PSBlake wrote: |
Consider: There are only 5,472,730,538 possible distinct Sudoku solutions (for a 9x9 grid). For each of those, between 17 and 80 boxes will be givens.
5,472,730,538 is representable in 5 bytes [actually, in 33 bits, but I'm rounding to the nearest byte]. I submit the following as a variation of gsf's method:
First 5 bytes: Enumeration of the sudoku grid.
|
so to decode you would need a dictionary of 5,472,730,538 canonical puzzles
or a function that produces a canonical puzzle given its label
but that would only give you a puzzle equivalent to the original
to get the exact original you would also need the row/col permutations
and the cell label permutation |
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
Igloo Theme Version 1.0 :: Created By: Andrew Charron
|