Quick Link: What Authors Should Know About OCR

Quick links, bringing you great articles on writing from all over the web.

OCR, or optical character recognition, is the process of scanning printed text and using a computer program to recognize and extract the text from the scan. If your eBook was created from an older print edition or manuscript, this matters, because OCR is notorious for introducing errors. At Digital Book World, Ben Denckla explains.

~ * ~

What Authors Should Know About OCR

By: Ben Denckla | June 27, 2016

Typewritten manuscripts are especially difficult for OCR

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

If you published a book before 2008, its ebook edition was probably created using optical character recognition (OCR). And if your ebook was created using OCR, it probably has typos in it. That’s the bad news.

The good news: you don’t have to accept this situation.

What’s special about the year 2008? Nothing, really. I just chose 2008 because the first Kindle came out in late 2007. So 2008 is the earliest year I can imagine a significant number of publishers adopting a single-source workflow: a workflow in which the ebook is created from the same files used to create the paper book. For example, nowadays Adobe InDesign can create an ebook and a paper book (well, a PDF) from the same file. A single-source workflow avoids OCR and OCR-caused typos. It doesn’t avoid all problems, but it goes a long way toward making higher-quality ebooks.

Many publishers continued to use OCR for books published after 2008. On the other hand, and to their credit, some publishers used single-source workflows for books published before 2008. Since digital files may be available for books published as long ago as the 1970s, single-source workflows are possible (though unlikely) even for books published while Jeff Bezos was still a child.

The bottom line for authors is this: regardless of its year of paper publication, ask your publisher whether OCR was used to create the ebook edition of your book.

If OCR was used, your ebook probably has typos in it. It was probably spellchecked, but not carefully. The whole conversion, including spellchecking, was probably outsourced to inexpensive workers who, even if their English skills were good, were probably working under severe time constraints. And even the most careful spellchecking, as you know, is no substitute for good old proofreading. Your ebook was almost certainly not proofread.

So what can you do?

~ * ~

If you liked this article, please share. If you have suggestions for further articles, articles you would like to submit, or just general comments, please contact me at paula@publetariat.com or leave a message below.