Converting images to text can be quite easy, depending on the quality of the original image, and the kind of software that you use. There are, in fact, a number of online services that will do the conversion for free.
Scanned, converted documents are widely used in many types of business, including law, information technology, manufacturing, and consumer product support. There are many excellent online classes available to get you started in these and other fields.
The What and Why of Image-to-Text Conversion
What do we mean when we talk about converting images to text? Here’s what it’s all about:
You put a document in your scanner, and scan it. What does that give you? You wind up with a scanned image of the document, saved on your computer; in other words, a picture of the document. You can look at it and read it, but as far as your computer’s concerned, it’s just another picture. If you want the computer to be able to read it, you need to use software (or an online service) that will examine the document closely, recognize the individual letters (or characters), and convert them to text.
Optical Character Recognition
The general name for the kind of software that does this is “Optical Character Recognition,” or OCR, software. There was a time when OCR was a difficult and error-prone process, and the output of an OCR program generally needed a human editor to make sense of it. These days, the process is considerably more accurate and reliable (many ATMs use OCR to read deposited checks, for example), and the more sophisticated OCR applications use artificial intelligence techniques to make educated guesses about what they’re reading.
Being Smart About It
OCR isn’t an easy task for a computer. Human beings have a built-in talent for language, and a built-in talent for visual pattern recognition, as well, and we put both of those skills to use when we read. But those are two of the most difficult skills to program into a computer — in part because we don’t understand how human beings perform these tasks. For a computer to recognize text, it must sharpen the image so that the letters stand out clearly, recognize the boundaries of each letter, identify the shapes involved (uprights, horizontal and diagonal lines, loops, curves, tails, etc.), and match what it sees against stored letter templates. Smart OCR software will also look for recognizable words (based on built-in word-lists), and try to identify words from context: where they appear, how they are used, and what words are next to them.
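To make the last two steps a little more concrete, here is a minimal sketch in Python of matching a letter shape against stored templates, and then checking the result against a word list. It is a toy illustration only, not how any particular OCR product works: the tiny 3x5 letter bitmaps, the word list, and the function names (similarity, recognize, correct_word) are all made up for this example. The only library used is Python's standard difflib module.

```python
import difflib

# Hypothetical 3x5 letter templates: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}

# Hypothetical built-in word list used to clean up the raw output.
WORD_LIST = ["LOT", "TOIL", "LIT"]


def similarity(glyph, template):
    """Return the fraction of pixels that agree between two same-sized bitmaps."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def recognize(glyph):
    """Return the best-matching template letter and its similarity score."""
    best = max(TEMPLATES, key=lambda letter: similarity(glyph, TEMPLATES[letter]))
    return best, similarity(glyph, TEMPLATES[best])


def correct_word(raw, words=WORD_LIST):
    """Replace a raw OCR word with the closest word-list entry, if one is close enough."""
    matches = difflib.get_close_matches(raw, words, n=1, cutoff=0.6)
    return matches[0] if matches else raw


# A "T" with one pixel missing from its stem still matches the T template best.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
letter, score = recognize(noisy_t)

# A letter O misread as the digit zero is repaired by the word list.
fixed = correct_word("T0IL")
```

Real OCR software works with far larger images, many fonts, and much bigger dictionaries, but the idea is the same: score each candidate letter by how well it fits, then let the word list settle the close calls.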
Making It Easier for the Computer
Since converting images to text isn’t an easy job for a computer, let’s take a moment to look at how you can make the job a little less difficult.
First of all, if you’re scanning a document into the computer, check to see if the scanner software has an OCR setting; if it does, use it. (It may also have built-in OCR capabilities; we’ll discuss those below.) If there are no OCR settings for the scanner, try to scan the document with high contrast and in grayscale. If the scanner driver doesn’t have much in the way of image quality settings, or if you are working with an already-scanned image, you can use just about any standard graphics application to increase the contrast, convert to grayscale, crop the image, and if necessary, rotate it so that it is oriented like an ordinary page. Many OCR applications handle at least some of these tasks automatically (particularly things like contrast and grayscale adjustment), but there are times when it can help considerably to do some of them (such as rotation) ahead of time.
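If you are curious what those clean-up steps actually do to an image, the sketch below shows three of them (grayscale conversion, contrast increase, and rotation) in plain Python on a tiny made-up image. It assumes the image is simply a list of rows of (red, green, blue) pixel values; the function names are invented for this example, and in practice you would let a graphics application or an imaging library do this work for you.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to rows of 0-255 brightness values.

    Uses the common 29.9% / 58.7% / 11.4% weighting of red, green, and blue,
    done in integer arithmetic so the results are exact.
    """
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
        for row in rgb_rows
    ]


def stretch_contrast(gray_rows):
    """Rescale brightness so the darkest pixel becomes 0 and the brightest 255."""
    flat = [pixel for row in gray_rows for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return [row[:] for row in gray_rows]
    return [[(pixel - lo) * 255 // (hi - lo) for pixel in row] for row in gray_rows]


def rotate_clockwise(rows):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*rows[::-1])]


# A hypothetical 2x2 washed-out scan: every pixel is some shade of gray.
scan = [
    [(100, 100, 100), (150, 150, 150)],
    [(120, 120, 120), (200, 200, 200)],
]

gray = to_grayscale(scan)          # [[100, 150], [120, 200]]
sharp = stretch_contrast(gray)     # [[0, 127], [51, 255]]
upright = rotate_clockwise(sharp)  # [[51, 0], [255, 127]]
```

Notice how the contrast step spreads a narrow band of grays (100 to 200) across the full range from black to white, which is what makes letters stand out from the page.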
OCR Applications: Free Online Services
If you just want quick, reasonably accurate image-to-text conversion, and you aren’t dealing with a large number of documents, there are free, Internet-based OCR services which will generally produce good results. If you do a Google search for OCR or “Optical Character Recognition,” you will see links to some of the more widely used sites. Typically, you upload image files (in common graphics formats) one at a time for conversion; the output may be in text or some other common format, such as Microsoft Word. Most free sites place generally reasonable and lenient restrictions on the size of files to be uploaded, and on the number of documents per hour which you can convert.
OCR Applications: Bundled Scanner Software
If you have a scanner, and if you have the suite of software that originally came bundled with it, that software will probably include an OCR function. Bundled OCR software is typically rather bare-bones, and it may be less accurate than many of the online services, but it will generally do a decent job. The actual menu selections and options for using your scanner’s bundled OCR software will vary, depending on the software; consult your scanner’s documentation if the text-conversion options aren’t clearly visible in the scanner software’s menu.
OCR Applications: Using Adobe Acrobat for Conversion
If you have access to any recent release of the full version of Adobe Acrobat (the full, commercial application – not the free Acrobat Reader), you can use it to convert a scanned image to PDF format, and then convert the PDF to text. The actual command for conversion varies, depending on the version of Acrobat (in Acrobat 8, select OCR Text Recognition from the Document menu; in later versions, select Recognize Text from the Tools menu). The output is a PDF file with selectable text, which you can export using Acrobat’s Save As or Export options. The very high quality and accuracy of the OCR output, combined with the convenience of the PDF format, make Adobe Acrobat (full, commercial version only) the best OCR option, when it is available.
OCR Applications: Commercial Software
There are a variety of high-end commercial applications available for professional document scanning and conversion. These applications are often used by document-processing services, law offices, and businesses and institutions which need to convert large quantities of printed archival documents to electronic format. If you need to convert a large quantity of archival documents or images to text, you may want to consult with such a service.
Whether you want to learn more about office software, document management, information technology, or a career in law or other fields where document archiving is important, there are great online classes in just about any field!