Full Tilt Tournament Summary Database and OCR Update
Categories: Tournament Summaries, GFTS, Full Tilt, Poker, G33k
I’ve ditched gocr in favor of tesseract. I found a script that makes all of the image manipulation, OCR’ing and cleanup a snap. All I had to do was tune the parameters passed to ImageMagick’s convert program to generate the best possible image.
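For example, this:
[image: original tournament summary screenshot]
turns into this:
[image: the same screenshot after processing with convert]
Which OCR’s to: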
$6 + $0.50 Sit & Go (Turbo)
Game: Hold’em(Turbo)No Limit Status: Completed
Buy-In: $6 + $0.50 Started: May 16 09:33
Entrants: 9 Ended: May 16 10:15
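Here is a rough sketch of the two-step pipeline described above, in Python. It is not the script I found, and the convert options shown (grayscale, 200% resize, 50% threshold) are illustrative guesses rather than my tuned parameters. The function names and file names are hypothetical; only the convert and tesseract command names and their basic usage come from the tools themselves.

```python
import subprocess
from pathlib import Path


def build_convert_cmd(screenshot, cleaned):
    """Return an ImageMagick command that prepares a screenshot for OCR.

    The options are illustrative guesses, not the tuned parameters
    mentioned in the post.
    """
    return [
        "convert", str(screenshot),
        "-colorspace", "Gray",   # drop color information
        "-resize", "200%",       # enlarge small screen fonts
        "-threshold", "50%",     # force pure black and white
        str(cleaned),            # output format follows the file extension
    ]


def build_tesseract_cmd(cleaned, out_base):
    """Return a tesseract command; the text is written to out_base + '.txt'."""
    return ["tesseract", str(cleaned), str(out_base)]


def ocr_screenshot(screenshot):
    """Run both steps and return the recognized text."""
    screenshot = Path(screenshot)
    cleaned = screenshot.with_suffix(".tif")   # tesseract input image
    out_base = screenshot.with_suffix("")      # e.g. "summary" for "summary.png"
    subprocess.run(build_convert_cmd(screenshot, cleaned), check=True)
    subprocess.run(build_tesseract_cmd(cleaned, out_base), check=True)
    return Path(str(out_base) + ".txt").read_text()
```

Calling `ocr_screenshot("summary.png")` would need both programs installed and on the PATH. Because convert picks the output format from the file extension, giving the cleaned image a `.pbm` name instead of `.tif` should produce input for gocr with the same switches.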
To show why I ditched gocr, here is its output for the same image. I used the same command-line switches to ImageMagick, but instead of writing tesseract’s required TIFF format, I wrote gocr’s preferred PBM format.
_6 + _O.5O Sit & Go (Turbo)
Game: HoId’em (Turbo) No Limit 5tatus: CompIeted
Bu_-In: _6 + _D.5D 5tarted: May 16 D9:33
Entrants: 9 Ended: May 16 lD:15
While gocr gets the spacing right, the quality of its character recognition is vastly worse: $ becomes _, 0 becomes O or D, l becomes I, and S becomes 5.
Tesseract’s spacing isn’t PERFECT, but the text, numbers and symbols ARE in this example, whereas the FTOPS #3 summary shows FTOPS as F1~OPS because the F and T run together. Next step: update the tournaments that have valid screenshots with the OCR’ed data. As a bonus, the start and end times will be 100% correct.
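To give an idea of how the OCR’ed text could feed the database update, here is a minimal Python sketch that pulls the buy-in, entrants and start/end times out of the tesseract output shown earlier. The function name, the returned field names and the `year` argument are all hypothetical (the summary text carries no year, so the caller has to supply one), and the regular expressions only cover the layout seen in this one example.

```python
import re
from datetime import datetime
from decimal import Decimal

# The tesseract output from the example above (straight apostrophe used here).
SAMPLE = """$6 + $0.50 Sit & Go (Turbo)
Game: Hold'em(Turbo)No Limit Status: Completed
Buy-In: $6 + $0.50 Started: May 16 09:33
Entrants: 9 Ended: May 16 10:15
"""

TIME = r"(\w{3} \d{1,2} \d{2}:\d{2})"   # e.g. "May 16 09:33"


def parse_summary(text, year):
    """Extract a few fields from an OCR'ed tournament summary.

    `year` must be supplied by the caller because the text has none.
    Raises ValueError if an expected field is missing.
    """
    buy_in = re.search(r"Buy-In:\s*\$([\d.]+)\s*\+\s*\$([\d.]+)", text)
    entrants = re.search(r"Entrants:\s*(\d+)", text)
    started = re.search(r"Started:\s*" + TIME, text)
    ended = re.search(r"Ended:\s*" + TIME, text)
    if not (buy_in and entrants and started and ended):
        raise ValueError("summary text is missing an expected field")

    def to_datetime(stamp):
        # Month abbreviations are parsed using the current locale.
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M")

    return {
        "buy_in": Decimal(buy_in.group(1)),
        "fee": Decimal(buy_in.group(2)),
        "entrants": int(entrants.group(1)),
        "started": to_datetime(started.group(1)),
        "ended": to_datetime(ended.group(1)),
    }
```

A misread like gocr’s `_6 + _O.5O` would fail the buy-in pattern and raise `ValueError`, which is a cheap way to flag screenshots that need a manual look before they touch the database.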
tags: full tilt, gfts, poker, tournament summaries
This work is published under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.







