An Open Source OCR program from Google

Google has (re-)released “Tesseract”, an OCR program, as Open Source. It was actually released several months ago, but this time they announced it.

This news caught my eye because it arrived just as I had been thinking about OCR systems, after a short chat about scanners and OCR. As far as I know, there have been no good Open Source OCR systems, and Luc Vincent (the Google guy who announced Tesseract) shares that opinion to some extent:

It’s not nearly as accurate as some of the best commercial OCR packages out there. Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other Open Source OCR package out there.

If you read the article to the end, you will also find a link to a job offer for top OCR engineers. I hope this means that we can expect a high-quality OCR program in the Open Source world soon!
I mean, Google definitely needs professional OCR software for its book scans. And if they now help bring this OCR system up to the state of the art so they can use it in their daily book scanning, then the Free Software movement would have a powerful tool in this market!

By the way, Google should think about hiring the developers of the existing Open Source OCR solutions…
