Google Docs Adds OCR: Converts Images and PDFs to Text

Posted By Pallab De On June 22, 2010 @ 8:31 am In Tech News

Optical Character Recognition (OCR) utilities are a handy breed of applications that extract text from images. Unfortunately, most desktop solutions are prohibitively expensive for casual use. Free online OCR services offer a way out by providing simple, quick conversion at no cost. Sure, they don’t have all the features you can find in a full-fledged desktop suite, but they are sufficient for most users.

Google Docs is the latest online service to get OCR capabilities. While Google has been experimenting with OCR in its Docs API since last year, the feature was added [1] to the Docs frontend only a little while ago. Now, while uploading any document, you are given the option to “convert text from PDF or image files to Google Docs documents”.

To test the OCR service, I used an image extracted from an AV-Comparatives report. While Google managed to detect most of the text correctly, it failed to retain any formatting. It correctly detected even non-dictionary words like ‘Kingsoft’, but failed to detect special characters like ‘&’ and superscripts.

[Image: Sample Document Used]
[Image: Output Returned by Google Docs]
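
If you want to put a rough number on a test like this one, you can compare the text you know is in the image against whatever the OCR service returns. The snippet below is only a minimal sketch using Python's standard difflib module; the function names and the sample strings are made up for illustration and are not taken from the report or from anything Google provides.

```python
from difflib import SequenceMatcher
from typing import List


def _normalize(text: str) -> str:
    # Collapse whitespace runs so lost formatting (line breaks, indentation)
    # does not count against the score; only the characters matter here.
    return " ".join(text.split())


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0..1 similarity score between the known text and the OCR text."""
    ref, out = _normalize(reference), _normalize(ocr_output)
    # autojunk=False stops difflib from ignoring very frequent characters
    # when the texts are long.
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


def dropped_text(reference: str, ocr_output: str) -> List[str]:
    """List the pieces of the reference that are missing or altered in the OCR text."""
    ref, out = _normalize(reference), _normalize(ocr_output)
    matcher = SequenceMatcher(None, ref, out, autojunk=False)
    return [
        ref[i1:i2]
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    ]


if __name__ == "__main__":
    # Hypothetical strings for illustration, not the text from the test image.
    expected = "Kingsoft & Symantec"
    recognized = "Kingsoft Symantec"
    print(f"similarity: {ocr_similarity(expected, recognized):.2%}")
    print("missing or altered:", dropped_text(expected, recognized))
```

Whitespace is collapsed on purpose, so a score like this says nothing about lost formatting; it only reflects how many characters survived the conversion.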

Overall, the accuracy was quite good and, to be honest, a lot better than I had expected. However, the product has obvious limitations: it drops formatting and misses special characters and superscripts. Nevertheless, it’s a handy addition to an already impressive service.


Article printed from Techie Buzz: http://techie-buzz.com

URL to article: http://techie-buzz.com/tech-news/google-docs-ocr.html

URLs in this post:

[1] added: http://googlesystem.blogspot.com/2010/06/google-adds-ocr-for-pdf-files-and.html
