Sudoku Programmers Forum Index

 
Online newspapers with Sudoku puzzles / beta testers needed.

 
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Sat Jun 04, 2005 3:10 am    Post subject: Online newspapers with Sudoku puzzles / beta testers needed.

Hi, as many of you already know, I'm writing a sudoku solver/assist app.

Those of you with Macs will be happy to know I'll probably be releasing a beta test version over the weekend (as luck would have it, on Monday I fly to the UK to visit my 99-yr-old grandmother!). I'll make a PC version available, but as I don't have a commercial RealBasic license, it'll only run for 5 minutes at a time [until I get someone to compile it for me].

To make the app easier to use, I'd like it to be possible to simply drag Sudokus from online sources right into it.

Figuring out text sudokus is reasonably easy, though I'm looking for more exemplars of how people tend to format them.

Doing character recognition on graphic sudokus as published in the major newspapers is a bit more difficult. However, because I've learned a lot of sleazy programming tricks over the years, I was able to whip up a quick recognizer that seems to do a perfect job on the timesonline.co.uk and dailymail.co.uk, and is about 99% accurate on the mirror.co.uk puzzles.

Are there any other major newspapers that are publishing Sudokus online that I can test the recognizer against?

Best, R
vindaloo

Joined: 03 Jun 2005
Posts: 7
Posted: Sat Jun 04, 2005 4:57 pm

Quote:
Doing character recognition on graphic sudokus as published in the major newspapers is a bit more difficult. However, because I've learned a lot of sleazy programming tricks over the years, I was able to whip up a quick recognizer that seems to do a perfect job on the timesonline.co.uk and dailymail.co.uk, and is about 99% accurate on the mirror.co.uk puzzles.


Sorry I can't help you with other publications, but I am interested to know what kind of algorithms you are employing (I have a background in AI).

Also it would be interesting to know what kind of errors you are getting (when you do get errors). Does it get 5/6 confused? Or 8/9? I notice that the mirror graphics aren't exactly the clearest.

Obviously, if you want to keep implementation details private, I understand.
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Sun Jun 05, 2005 1:10 am

vindaloo wrote:
Sorry I can't help you with other publications, but I am interested to know what kind of algorithms you are employing (I have a background in AI).


As an old-school hacker, I never believe in making things more complicated than they have to be, and I always use sleazy tricks if I can.

AI is all fine and dandy, but AS (Artificial Stupidity) kicks its ass 99% of the time.

The problem of recognizing characters in a Sudoku matrix is a very limited subset of character recognition, and a lot of the difficulties go away.

The algorithm is quite simple:

1) greyscale the image

2) strip the outer edges of the image until you get to the outer border of the puzzle. Be a little tricky so you skip past the text in the border of the Times puzzle. Basically, I just keep going until I hit a row or column with more than 75% black pixels, which I assume to be a line. I consider any pixel with a value < 200 (out of 255) to be black.

3) now do exactly the opposite, and continue stripping the outer border until you get to the non-line inside of the puzzle.

4) You now know how big the puzzle is, and about how big each square is.

5) For each square in the puzzle:

6) Start in the middle, and expand out in all 8 directions until you hit a black pixel. If you get more than halfway to the edge without hitting one, consider the square empty.

7) starting with a perimeter around the black pixel, expand the perimeter up, down, left and right until the entire perimeter is white. You now have the character inside the perimeter.

8) extract the character from the perimeter. Scale it to a standard size (I used 32x32) even if this distorts the character.

Now comes the sleazy trick.

9) compare the character against entries in an exemplar matrix. Mine had 9 columns of 32x32 greyscale "scaled" characters, and 3 rows, one for each of the three newspapers (this was more to get an idea of the different fonts, I could have in fact just generated exemplars of different standard fonts). This exemplar matrix works better if it is slightly gaussian-blurred.

10) The score for a particular entry in the matrix is the sum of the differences between the pixels in the exemplar entry and the character we are checking.

11) Lowest score wins.

12) repeat for the rest of the squares.
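
Steps 2 to 4 above can be sketched roughly as follows. This is a minimal Python sketch, not the RealBasic code from the post: the function names (`find_grid_interior`, `cell_size`) are made up for illustration, the image is assumed to be a list of rows of 0..255 grey values, and the black fraction is measured across the full image width or height, which assumes only a small margin around the puzzle. The trick for skipping the text in the Times border is not shown.

```python
BLACK_BELOW = 200      # per the post: a grey value < 200 counts as black
LINE_FRACTION = 0.75   # per the post: more than 75% black pixels is a line


def is_line(pixels):
    """True if more than 75% of the pixels in a row or column are black."""
    black = sum(1 for p in pixels if p < BLACK_BELOW)
    return black / len(pixels) > LINE_FRACTION


def scan(flags, indices):
    """Walk inwards along `indices`; return the first index past the border."""
    indices = list(indices)
    i = 0
    # Step 2: skip everything outside the puzzle until a line is hit.
    while i < len(indices) and not flags[indices[i]]:
        i += 1
    # Step 3: now skip the border line itself until it ends.
    while i < len(indices) and flags[indices[i]]:
        i += 1
    if i == len(indices):
        raise ValueError("no border found")
    return indices[i]


def find_grid_interior(img):
    """Return inclusive (top, bottom, left, right) of the area inside the border."""
    height, width = len(img), len(img[0])
    rows = [is_line(img[y]) for y in range(height)]
    cols = [is_line([img[y][x] for y in range(height)]) for x in range(width)]
    top = scan(rows, range(height))
    bottom = scan(rows, range(height - 1, -1, -1))
    left = scan(cols, range(width))
    right = scan(cols, range(width - 1, -1, -1))
    return top, bottom, left, right


def cell_size(bounds):
    """Step 4: approximate (height, width) of one square of a 9x9 puzzle."""
    top, bottom, left, right = bounds
    return (bottom - top + 1) / 9, (right - left + 1) / 9
```

With the interior bounds and the cell size in hand, each square's pixel region can be cut out for steps 5 to 8.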

Dirt simple, and reasonably fast (about 30secs/puzzle on my machine) even though it's running in RealBasic which isn't known for its speed, and I haven't really tuned it for speed yet.
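
The scaling and exemplar comparison in steps 8 to 11 might look something like this in Python. It is only a sketch under assumptions: the post says "sum of the differences", and absolute differences are assumed here so that lighter and darker pixels do not cancel; the names `scale_nearest`, `score` and `classify` and the `(font, digit)` dictionary layout are hypothetical, and nearest-neighbour scaling is just one simple choice. Blurring the exemplars is left out.

```python
def scale_nearest(img, size=32):
    """Nearest-neighbour resize to size x size, ignoring the aspect ratio."""
    height, width = len(img), len(img[0])
    return [
        [img[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]


def score(glyph, exemplar):
    """Sum of absolute per-pixel differences; lower means more similar."""
    return sum(
        abs(g - e)
        for glyph_row, exemplar_row in zip(glyph, exemplar)
        for g, e in zip(glyph_row, exemplar_row)
    )


def classify(glyph, exemplars):
    """Return the digit of the lowest-scoring exemplar.

    `exemplars` maps (font_name, digit) to an image of the same size as
    `glyph`; both are lists of rows of 0..255 grey values.
    """
    best_key = min(exemplars, key=lambda key: score(glyph, exemplars[key]))
    return best_key[1]
```

Because every grey level contributes to the score, a slightly shifted anti-aliased glyph is penalised only a little, which fits the explanation given later in the thread for keeping the grey pixels.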
vindaloo

Joined: 03 Jun 2005
Posts: 7
Posted: Sun Jun 05, 2005 11:46 am

I'm impressed with what you've managed to put together. I think you do yourself a disservice by calling them sleazy-tricks!

Here is something easy to implement which may help (before you scale down to 32x32):
You could try to clean up the images somewhat by ignoring the light gray bits, e.g. treating everything above 150 (you choose the exact threshold) as 255 (white). My guess is that it is these parts of the image in particular which cause most confusion between digits.

This should, for example, make the gaps in the digits more distinct, so it should discriminate better between 5 and 6. (It would be nice to know which digits make up the 1% you are getting wrong in the Mirror.)
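
As a rough illustration of that suggestion (a sketch only; the function name `whiten_light_greys` is made up, and the image is assumed to be a list of rows of 0..255 grey values):

```python
def whiten_light_greys(img, threshold=150):
    """Treat every pixel lighter than `threshold` as pure white (255)."""
    return [[255 if p > threshold else p for p in row] for row in img]
```

Darker pixels are left untouched, so the strokes of the digit keep their anti-aliased edges while the faint background is flattened.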

Let me know if that helps any. If not I can suggest something else.

Vin.
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Sun Jun 05, 2005 3:44 pm

I actually tried that, but it makes the recognition less accurate.

The reason is that the gifs/jpegs that are autocreated for the websites are all anti-aliased, and are often shifted by a fraction of a pixel relative to the exemplars. So keeping all the grey anti-aliased pixels (and even the jpeg artifacts) and comparing them to the slightly blurred exemplars compensates for this problem; the errors tend to cancel out.

The current recognizer is 100% accurate on all the test images I have, but now that I've said that, it'll stop working.

http://www.madoverlord.com/projects/sudoku.t has the first beta release of the program. Enjoy.
vindaloo

Joined: 03 Jun 2005
Posts: 7
Posted: Sun Jun 05, 2005 5:05 pm

Hey I thought you said it took 30 seconds to recognize a puzzle? I've installed it on my xp machine and it took 1-2 seconds max. Have you tweaked it?? It works really well.

The program itself is really good. The drag and drop feature alone is worth the download.

The font looks tiny on my (1280x1024) screen though - not very pleasing to the eye. Actually looking at the documentation the Mac version looks better all around. Damn you microsoft!!

Vin.
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Sun Jun 05, 2005 5:17 pm

vindaloo wrote:
Hey I thought you said it took 30 seconds to recognize a puzzle? I've installed it on my xp machine and it took 1-2 seconds max. Have you tweaked it??


Yes, I stumbled upon a faster way to access the pixel data in RealBasic.

vindaloo wrote:
The font looks tiny on my (1280x1024) screen though - not very pleasing to the eye. Actually looking at the documentation the Mac version looks better all around.


Email me a screenshot and I'll see what I can do: trebor@animeigo.com
pseudo coup

Joined: 28 Aug 2005
Posts: 6
Posted: Tue Sep 13, 2005 1:37 pm    Post subject: re: www Sudoku puzzles as images

MadOverlord wrote:
Doing character recognition on graphic sudokus as published in the major newspapers is a bit more difficult.

I was able to whip up a quick recognizer that seems to do a perfect job on the timesonline.co.uk and dailymail.co.uk, and is about 99% accurate on the mirror.co.uk puzzles.

Are there any other major newspapers that are publishing Sudokus online that I can test the recognizer against?




images:



and let me take this opportunity to ask whether you accept text in a PDF
e.g.
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Tue Sep 13, 2005 2:19 pm    Post subject: Re: re: www Sudoku puzzles as images

pseudo coup wrote:


images:




Reads them with ease.

Quote:


and let me take this opportunity to ask whether you accept text in a PDF
e.g.


Yes; the daily telegraph back-puzzles require clipping just the puzzle using an app like snap-n-drag (Mac) because of all the other stuff in the page. I have special-case code for the Guardian puzzles if you download the PDF file and drag that in, otherwise clipping out the puzzle is required (you don't have to be that exact; just so long as there's no extra junk in the stuff you clipped out). The Sudoku-xls stuff requires clipping because it has multiple puzzles per page, and multiple pages, but they load fine.

You can drag PDF files right into the Susser (on the Mac, at least) and it'll try and find the puzzle.

At this point, the only online sudokus that the Susser cannot OCR are those that have weird ornamentation inside the puzzle squares, or that are so small and anti-aliased that digit recognition is extremely difficult (i.e. the 6's and 8's are very similar). But anything over 150x150 pixels or so is no sweat.

It got a lot better about a month ago when I started using the puzzle context to verify the OCR; when the OCR gets a puzzle, I check it for validity and for a single solution, and retry with different parameters if I don't get something "good". 95% of the time, however, it loads properly on the first try.
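
A check of that kind could be sketched as below. This is not the Susser's code: the names (`peers_ok`, `is_valid`, `count_solutions`, `looks_good`) are hypothetical, the puzzle is assumed to be a flat list of 81 integers with 0 for an empty square, and plain backtracking is just one way to count solutions.

```python
def peers_ok(grid, pos, digit):
    """True if `digit` does not already appear in the row, column or box of pos."""
    row, col = divmod(pos, 9)
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    for i in range(9):
        if grid[row * 9 + i] == digit or grid[i * 9 + col] == digit:
            return False
        if grid[(box_row + i // 3) * 9 + box_col + i % 3] == digit:
            return False
    return True


def is_valid(grid):
    """True if no given digit repeats in its row, column or box."""
    for pos, digit in enumerate(grid):
        if digit:
            grid[pos] = 0                    # hide the cell from its own check
            ok = peers_ok(grid, pos, digit)
            grid[pos] = digit
            if not ok:
                return False
    return True


def count_solutions(grid, limit=2):
    """Count solutions by backtracking, stopping once `limit` are found."""
    best_pos, best_cands = None, None
    for pos in range(81):                    # pick the most constrained empty cell
        if grid[pos] == 0:
            cands = [d for d in range(1, 10) if peers_ok(grid, pos, d)]
            if best_cands is None or len(cands) < len(best_cands):
                best_pos, best_cands = pos, cands
    if best_pos is None:
        return 1                             # no empty cell left: one solution
    total = 0
    for digit in best_cands:
        grid[best_pos] = digit
        total += count_solutions(grid, limit - total)
        grid[best_pos] = 0
        if total >= limit:
            break
    return total


def looks_good(grid):
    """True if the OCR result is a valid puzzle with exactly one solution."""
    grid = list(grid)                        # work on a copy
    return is_valid(grid) and count_solutions(grid) == 1
```

If `looks_good` returns False, the caller would rerun the recognizer with different parameters, as described above. Stopping the count at two solutions keeps the check cheap, since "more than one" is all that needs to be known.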
pseudo coup

Joined: 28 Aug 2005
Posts: 6
Posted: Thu Sep 22, 2005 11:39 am    Post subject: OCR

i just tried the green image
The Sunday Times Su Doku 1: July 31, 2005,

and the OCR got 6s as 4, 8s as 3 or 6

( running under Microsoft Windows )

probably the fuzziness of their early images?
( the new puzzle 8 was recognized just fine )
jbum

Joined: 14 Aug 2005
Posts: 14
Location: Los Angeles
Posted: Fri Sep 23, 2005 12:13 am

I've got a test suite of 10000 sudokus that I generated of random complexity. If you want a copy (it's a text file) just drop me a note. jbum [at] jbum.com

- Jim
_________________
http://www.krazydad.com/
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Fri Sep 23, 2005 12:54 am    Post subject: Re: OCR

pseudo coup wrote:
i just tried the green image
The Sunday Times Su Doku 1: July 31, 2005,

and the OCR got 6s as 4, 8s as 3 or 6

( running under Microsoft Windows )

probably the fuzziness of their early images?
( the new puzzle 8 was recognized just fine )


Yeah, they're using a new font. I've had to add an exemplar for it; 2.0.0 of the Susser, out this weekend I hope, will read them properly.
Simes

Joined: 08 Apr 2005
Posts: 71
Location: North Yorkshire, UK
Posted: Sat Oct 01, 2005 10:27 pm

MadOverlord wrote:
(an OCR algorithm)

Wow! I'm really impressed.

Ever since I wrote my very first version I've wanted to implement OCR, but I never expected to actually do it. It was just a pipe dream, but I just implemented your algorithm in a test program and it works great. Every image from The Times is perfect.

OK, I expect a few problems when I try other papers, but I can expand the sample digit images to cope... I hope.

many thanks for sharing.

Simon
_________________
Simes
www.sadmansoftware.com/sudoku


Last edited by Simes on Wed Mar 14, 2007 9:54 pm; edited 1 time in total
MadOverlord

Joined: 01 Jun 2005
Posts: 80
Location: Wilmington, NC, USA
Posted: Sun Oct 02, 2005 12:39 am

My current version has several optimizations:

1) given a set of exemplar rows, it checks all the digits against all the rows and counts how many times each row is the best match for a digit. If one row scores 2/3 of the hits, it is used; otherwise the rows that scored 0 hits, plus the lowest-scoring row, are discarded, and the process is repeated until a row scores 2/3 of the hits or only one row is left.

2) The surviving row is used to OCR the puzzle.

3) The puzzle is then checked to see if it is a valid, single-solution sudoku. If it is, we're done. If not, the process is repeated, excluding the exemplar row(s) used in previous iteration(s).

There are some pathological puzzles that require 5-6 iterations before it catches them (mostly small, fuzzy digits, or ones with really "hooky" sixes that look like 8's), but it gets almost everything.
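
One way to read the row-selection procedure in step 1 is sketched here in Python. It rests on assumptions: "the lowest scoring row" is taken to mean the row with the fewest hits among those that got any, the score is a sum of absolute pixel differences, and the names (`pick_exemplar_row`, the `rows` dictionary, the `excluded` argument) are invented for the example.

```python
from collections import Counter


def score(glyph, exemplar):
    """Sum of absolute per-pixel differences; lower means more similar."""
    return sum(
        abs(g - e)
        for glyph_row, exemplar_row in zip(glyph, exemplar)
        for g, e in zip(glyph_row, exemplar_row)
    )


def pick_exemplar_row(glyphs, rows, excluded=()):
    """Choose the exemplar row (font) that best explains the glyphs.

    `rows` maps a row name to its list of exemplar images; `excluded`
    names rows already tried in earlier iterations (step 3).
    """
    if not glyphs:
        raise ValueError("no glyphs to match")
    alive = [name for name in rows if name not in excluded]
    if not alive:
        raise ValueError("no exemplar rows left")
    while len(alive) > 1:
        hits = Counter()
        for glyph in glyphs:
            # The row holding the single best-matching exemplar gets the hit.
            best = min(
                alive,
                key=lambda name: min(score(glyph, ex) for ex in rows[name]),
            )
            hits[best] += 1
        leader, leader_hits = hits.most_common(1)[0]
        if 3 * leader_hits >= 2 * len(glyphs):     # at least 2/3 of the hits
            return leader
        # Drop rows with no hits, plus the weakest of the remaining rows.
        scoring = [name for name in alive if hits[name] > 0]
        weakest = min(scoring, key=lambda name: hits[name])
        alive = [name for name in scoring if name != weakest]
    return alive[0]
```

For step 3, the caller would pass the names of rows that already produced an invalid puzzle as `excluded` and try again with whatever row wins next.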
All times are GMT