Google Docs Adds OCR: Converts Images and PDFs to Text
Posted By Pallab De On June 22, 2010 @ 8:31 am In Tech News
Optical Character Recognition (OCR) utilities are a handy breed of applications that extract text from images. Unfortunately, most desktop solutions are prohibitively expensive for casual use. Free online OCR services offer a way out with simple, quick conversion at no cost. Sure, they don’t have all the features you can find in a full-fledged desktop suite, but they are sufficient for most users.
Google Docs is the latest online service to get OCR capabilities. While Google has been experimenting with OCR in its Docs API since last year, the feature was added [1] to the Docs frontend only a little while ago. Now, when uploading a document, you are given the option to “convert text from PDF or image files to Google Docs documents”.
To test the OCR service, I used an image extracted from an AV-Comparatives report. While Google managed to detect most of the text correctly, it failed to retain any formatting. It correctly detected even non-dictionary words like ‘Kingsoft’, but failed to detect special characters like ‘&’ and superscripts.

Sample Document Used

Output Returned by Google Docs
Overall, the accuracy was quite good and, to be honest, a lot better than I had expected. However, there are obvious limitations: formatting is lost, and special characters and superscripts are missed. Nevertheless, it’s a handy addition to an already impressive service.
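If you want to put a rough number on “quite good” for your own documents, one option is to compare the text you expected with the text the OCR service returned. The snippet below is a minimal sketch using Python’s standard `difflib` module. The function names and the sample strings are made up for illustration; they are not the test document used above, and this is not how Google measures accuracy.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace so lost layout is not counted as an error."""
    return " ".join(text.split())


def ocr_similarity(expected: str, recognized: str) -> float:
    """Return a 0..1 similarity ratio between reference text and OCR output."""
    return SequenceMatcher(None, normalize(expected), normalize(recognized)).ratio()


def missing_characters(expected: str, recognized: str) -> set:
    """Non-space characters in the reference that never appear in the OCR output."""
    return {ch for ch in expected if not ch.isspace() and ch not in recognized}


# Hypothetical strings for illustration only.
reference = "Kingsoft & Partners"
ocr_output = "Kingsoft  Partners"

print(round(ocr_similarity(reference, ocr_output), 2))
print(missing_characters(reference, ocr_output))
```

The similarity ratio is only a crude indicator, since it ignores formatting entirely, but the second function makes it easy to spot dropped symbols such as ‘&’.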
Article printed from Techie Buzz: http://techie-buzz.com
URL to article: http://techie-buzz.com/tech-news/google-docs-ocr.html
URLs in this post:
[1] added: http://googlesystem.blogspot.com/2010/06/google-adds-ocr-for-pdf-files-and.html
Copyright © 2006-2011 Techie Buzz. All rights reserved.