Optical Character Recognition (OCR)


The amount of data around us has grown at an unprecedented rate since people began using computers and other digital media. The result is a large volume of digital information that must be processed correctly before machines can read and interpret it.

Machines don't see the world the way we do, but they are being trained to do so thanks to advanced concepts such as Machine Learning, Deep Learning, and Artificial Intelligence, all of which help make machines a little more human. This time around, we will look at how to apply these techniques to build an optical character recognition tool.


Skyfi Labs Projects
Project Description

Optical Character Recognition, better known as OCR, is a tool that reads text from documents supplied to a machine, such as a picture, PDF, or screenshot. It converts the text shown in such images into machine-encoded text that behaves as it would in a text editor. This lets you edit, search, and combine text that was originally part of a picture or other visual source.

This project aims to create a tool which, when supplied with an input image, can extract letters, digits, and symbols from it. The process is easier to implement for printed text because printed characters are more uniform and easier to analyze, but the same system can be built for handwritten notes as well.

Project Implementation

  1. OCR automatically converts images of text into machine-encoded text that can be used in machine processes.
  2. This machine-encoded text may then be used for machine translation, text-to-speech conversion, or text mining.
  3. Use an optical scanner to capture a digital image of the required document.
  4. Next, convert the multi-level, multi-color image into a gray-scale image.
  5. Perform thresholding on the gray-scale image using a pre-defined value to convert it into a black-and-white (binary) image, which also reduces storage space.
  6. With a fixed threshold, gray levels below a particular value are set to black and those above it are set to white.
  7. Noise in the image causes poor recognition rates.
  8. Hence, noise is removed using a pre-processor or smoother. Such filters are covered in the earlier discussion of image enrichment and the principles of filtering.
  9. Once the document has been properly binarized, top-down segmentation is performed.
  10. The document is analyzed line by line, and words are extracted.
  11. The words are segmented into characters.
  12. The segmentation relies on connected-component extraction and an estimate of the average character height.
  13. A block-based Hough transform detects potential text lines.
  14. The K-Means clustering algorithm is used to create compact clusters.
  15. All characters are then grouped into k clusters.
  16. Convert these clusters into classes, each with a unique ASCII label.
  17. The system must now be trained using a fixed set of data.
  18. Characters are then identified and extracted by converting them into feature vectors.
  19. Approaching 100% accuracy requires a lot of fine-tuning and training.
  20. The pre-processing stage must also be improved for the tool to be effective.
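
The pre-processing and segmentation steps above (gray-scale conversion, fixed thresholding, noise smoothing, then splitting into lines and characters) can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the project's actual implementation: the luma weights, the default threshold of 128, the 3x3 median window, and all function names are choices made here for the example. It also swaps in simple projection profiles (counting ink pixels per row and per column) in place of the block-based Hough transform and connected-component analysis named in the steps, because profiles are much shorter to show.

```python
from statistics import median_low

def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of 0-255 gray levels."""
    # ITU-R BT.601 luma weights: an assumption, the article names no formula.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def median_filter(gray, size=3):
    """Smooth with a size x size median window (edges use the pixels available)."""
    h, w = len(gray), len(gray[0])
    half = size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - half), min(h, y + half + 1))
                      for i in range(max(0, x - half), min(w, x + half + 1))]
            # median_low always returns one of the actual pixel values.
            row.append(median_low(window))
        out.append(row)
    return out

def binarize(gray, threshold=128):
    """Fixed threshold: 1 = ink (dark pixel), 0 = background (light pixel)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]

def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_lines(binary):
    """Text lines are runs of rows that contain at least one ink pixel."""
    return runs([sum(row) for row in binary])

def segment_columns(binary, top, bottom):
    """Within rows top..bottom-1, characters are runs of columns containing ink."""
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)
```

A typical call chain would be `binary = binarize(median_filter(to_grayscale(image)))`, followed by `segment_lines(binary)` and then `segment_columns(binary, top, bottom)` for each detected line. Note that a median filter can erase strokes thinner than its window, so the window size would need tuning against the scan resolution.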
Concepts Used

  • Data Acquisition
  • Feature Extraction
  • Segmentation
  • Machine Learning
  • MATLAB or Octave
  • Java or Python programming
  • Image processing
  • Natural language processing
  • Artificial Intelligence
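
As an illustration of the machine learning concept used in the implementation, the K-Means step (grouping character feature vectors into k clusters) can be sketched as follows. This is a hedged, minimal version of the standard Lloyd iteration in plain Python. The function names, the fixed random seed, and the iteration cap are assumptions made for the example, and a real project would more likely rely on an existing library.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def k_means(vectors, k, iterations=20, seed=0):
    """Cluster vectors into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            new_centroids.append(mean_vector(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments
```

Once the clusters are stable, each cluster index can be mapped to a character label by hand, for example with a dictionary such as `{0: "A", 1: "B"}` (hypothetical values), which corresponds to giving every cluster a unique ASCII label.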
Components of an OCR

  • Digitizing
  • Analog to Digital Scanner
  • Data extraction
  • Segmentation
  • Pre-processor: filtering, increasing clarity
  • Feature Extraction
  • Comparison with known objects
  • Learning phase modules
  • Reconstruction of words
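
Two of the components listed above, feature extraction and comparison with known objects, can be illustrated with a small sketch. It assumes each segmented character arrives as a binary grid (1 = ink, 0 = background) and uses zoning features (the fraction of ink in each cell of a coarse grid) with a nearest-template comparison. Both are illustrative choices made here, not techniques the project prescribes, and the names `zoning_features` and `classify` are hypothetical.

```python
def zoning_features(glyph, zones=2):
    """Split a binary glyph into zones x zones cells; return the ink fraction per cell."""
    h, w = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones):
        y0, y1 = zy * h // zones, (zy + 1) * h // zones
        for zx in range(zones):
            x0, x1 = zx * w // zones, (zx + 1) * w // zones
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features

def classify(glyph, templates, zones=2):
    """Return the label of the known template whose feature vector is nearest."""
    target = zoning_features(glyph, zones)

    def distance(label):
        reference = zoning_features(templates[label], zones)
        return sum((a - b) ** 2 for a, b in zip(target, reference))

    return min(templates, key=distance)

# Two tiny hand-made templates (hypothetical training data).
TEMPLATES = {
    "I": [[0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 0, 0]],
    "_": [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [1, 1, 1, 1]],
}

# A vertical bar with one stray ink pixel in the bottom row.
noisy_bar = [[0, 1, 0, 0],
             [0, 1, 0, 0],
             [0, 1, 0, 0],
             [0, 1, 1, 0]]

print(classify(noisy_bar, TEMPLATES))  # prints "I"
```

A 2x2 grid keeps the example readable. Distinguishing a full character set would need a finer grid, more templates per character, or a trained classifier, which is where the learning phase modules come in.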