Optical Character Recognition


In this digital age, it has become essential for all available information to exist in a digital form that machines can process. In a country like India, a wealth of information is held in manuscripts, ancient textbooks, and similar material that has traditionally been available only in printed or handwritten form. Such material is of little use when information has to be found among thousands of pages. It must be digitized and converted to text so that machines, which can search vast numbers of pages in a fraction of a second, can index and retrieve it. Only then will the knowledge of Indian history, tradition, and culture become available to the masses, and only then can the digital revolution truly be said to have reached the information age.

Introduction to Optical Character Recognition

Optical character recognition plays an important role in this digitization. It converts scanned images of books, magazines, and newspapers into machine-readable text.

Introduction to OCR

OCR (optical character recognition, sometimes expanded as optical character reader) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source image can be a scanned document, a photo of a document, or subtitle text superimposed on an image.
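To make this definition concrete, the following is a deliberately tiny sketch of the core idea: each character image is compared against stored templates and mapped to the closest machine-encoded character. The 3x5 bitmaps, the names `TEMPLATES`, `recognize_char` and `recognize_line`, and the choice of nearest-template matching by Hamming distance are all illustrative assumptions for this example, not how any particular OCR engine works; real systems add scanning cleanup, layout analysis, character segmentation, and much more robust classifiers.

```python
# Toy OCR by nearest-template matching (an illustrative assumption, not a real engine).
# Each glyph is a hypothetical 3x5 binary bitmap: '#' = ink, '.' = background.

TEMPLATES = {
    "I": ["###",
          ".#.",
          ".#.",
          ".#.",
          "###"],
    "L": ["#..",
          "#..",
          "#..",
          "#..",
          "###"],
    "T": ["###",
          ".#.",
          ".#.",
          ".#.",
          ".#."],
    "O": ["###",
          "#.#",
          "#.#",
          "#.#",
          "###"],
}


def to_bits(glyph):
    """Flatten a list of row strings into a tuple of 0/1 pixels."""
    return tuple(1 if ch == "#" else 0 for row in glyph for ch in row)


def hamming(a, b):
    """Count the pixels that differ between two equal-length bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def recognize_char(glyph):
    """Return the template label whose bitmap has the fewest mismatched pixels."""
    bits = to_bits(glyph)
    return min(TEMPLATES, key=lambda label: hamming(bits, to_bits(TEMPLATES[label])))


def recognize_line(glyphs):
    """Convert a sequence of already-segmented glyph images into a text string."""
    return "".join(recognize_char(g) for g in glyphs)


# Usage: a "scanned" T with one stray ink pixel is still recognized correctly.
noisy_T = ["###",
           ".#.",
           ".#.",
           "##.",  # stray pixel simulating scanner noise
           ".#."]
print(recognize_line([TEMPLATES["L"], TEMPLATES["O"], noisy_T]))  # prints "LOT"
```

Matching by pixel distance tolerates small amounts of noise, which is why the damaged "T" is still read correctly, but it breaks down quickly with changes in font, size, or rotation. That limitation is one reason practical OCR relies on learned features rather than raw templates.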