Google Docs Adds OCR: Converts Images and PDFs to Text

Optical Character Recognition (OCR) utilities are a handy breed of applications that extract text from images. Unfortunately, most desktop solutions are prohibitively expensive for casual use. Free online OCR services offer a way out with simple, quick conversion. Sure, they don’t have all the features you can find in a full-fledged desktop suite, but they are sufficient for most users.

Google Docs is the latest online service to get OCR capabilities. While Google has been experimenting with OCR in its Docs API since last year, the feature reached the Docs frontend only recently. Now, when you upload a document, you are given the option to “convert text from PDF or image files to Google Docs documents”.

To test the OCR service, I used an image extracted from an AV-Comparatives report. While Google managed to detect most of the text correctly, it failed to retain any formatting. It correctly detected even non-dictionary words like ‘Kingsoft’, but it failed to detect special characters like ‘&’ and superscripts.

Sample Document Used
Output Returned by Google Docs
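
For readers who want to put a number on a comparison like this, here is a minimal sketch in Python. It assumes you have the expected text and the OCR output as plain strings, and it uses the standard library's `difflib.SequenceMatcher` to compute a rough similarity ratio. The function name `ocr_similarity` and the sample strings are made up for illustration; they are not the actual report text or Google's output, and this is not how the test above was scored.

```python
from difflib import SequenceMatcher


def ocr_similarity(expected: str, recognized: str) -> float:
    """Return a rough similarity ratio between 0.0 and 1.0 for two strings."""
    # Collapse runs of whitespace so that lost formatting (line breaks,
    # extra spaces) does not count against the recognized text itself.
    def normalize(s: str) -> str:
        return " ".join(s.split())

    return SequenceMatcher(None, normalize(expected), normalize(recognized)).ratio()


# Hypothetical sample: the '&' is dropped, as happened with special characters.
expected = "Kingsoft & Partners"
recognized = "Kingsoft  Partners"
score = ocr_similarity(expected, recognized)
print(f"similarity: {score:.2f}")
```

A ratio of 1.0 means the two strings match after whitespace is normalized; dropped characters pull the score below 1.0. It is a crude measure and says nothing about formatting, which is deliberately ignored here.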

Overall, the accuracy was quite good and, to be honest, a lot better than I had expected. The limitations are obvious: formatting is lost, and special characters and superscripts are missed. Still, it’s a handy addition to an already impressive service.

Published by

Pallab De

Pallab De is a blogger from India who has a soft spot for anything techie. He loves trying out new software and spends most of his day breaking and fixing his PC. Pallab loves participating in the social web; he has been active in technology forums since he was a teenager and is an active user of both Twitter (@indyan) and Facebook.