An Open Source OCR program from Google

Google has (re-)released “Tesseract”, an OCR program, as Open Source. It was actually released several months ago, but this time they announced it.

This news caught my attention because it arrived just as I had been thinking about OCR systems, following a short chat about scanners and OCR. As far as I know, there were no good Open Source OCR systems, and Luc Vincent (the Google guy who announced the release) shares that opinion to some extent:

It’s not nearly as accurate as some of the best commercial OCR packages out there. Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other Open Source OCR package out there.

If you read the article to the end, you will also find a link to a job offer for top OCR engineers. I hope this means that we can expect a high-quality OCR program in the Open Source world soon!
I mean, Google definitely needs professional OCR software for its book scans. And if they now help bring this OCR system up to the state of the art so they can use it in their daily book scanning, then the Free Software movement would have a powerful tool in this market!

Google should, by the way, think about hiring the developers of the existing Open Source OCR solutions…


