Handwriting reader using Machine Learning


As technology surges ahead, some of the most important developments have been the ones that save us time, because time is money in the twenty-first century. Handwriting analysis has been trying to make its mark in the industry for a while now, as it would make it much easier for people to upload documents to the web. Such character recognition draws on artificial intelligence, computer vision, machine learning, and pattern recognition. Computers that can analyse handwriting and recognise characters can convert paper documents into machine-readable form. Here’s a look at a simple handwriting reader project using machine learning.

Project Description


Handwriting readers find use in optical character recognition, automatic transcription, and the conversion of handwritten documents into digital ones, and they also help in creating smart character recognition interfaces. Such technology may be viewed as a subfield of image recognition and uses many of the same techniques. An image recognition algorithm follows a basic flow: it takes an image as input, analyses its contents, classifies it into one of the classes that the system can identify, and then displays the result. In this project, we will recreate that flow for numbers. This handwriting analyser will identify, classify and represent numerical digits using principles of machine learning.
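
To make that flow concrete, here is a minimal sketch in plain Python. It is only an illustration, not the project's actual method: the 5x3 bitmaps in TEMPLATES and the helper names to_pixels and classify are made up for this example, and the "classifier" simply picks the template with the smallest pixel-wise distance to the input.

```python
# Toy 5x3 bitmaps standing in for scanned digits (hypothetical templates).
TEMPLATES = {
    0: ["###",
        "#.#",
        "#.#",
        "#.#",
        "###"],
    1: [".#.",
        "##.",
        ".#.",
        ".#.",
        "###"],
    7: ["###",
        "..#",
        ".#.",
        ".#.",
        ".#."],
}


def to_pixels(rows):
    """Step 1 (input): flatten a bitmap into a list of 0/1 pixel values."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]


def classify(rows):
    """Steps 2-3 (analyse, classify): return the label of the closest template."""
    pixels = to_pixels(rows)

    def distance(label):
        # Squared Euclidean distance between the input and one template.
        ref = to_pixels(TEMPLATES[label])
        return sum((a - b) ** 2 for a, b in zip(pixels, ref))

    return min(TEMPLATES, key=distance)


# A slightly smudged "7": one stray pixel at the start of the second row.
noisy_seven = ["###",
               "#.#",
               ".#.",
               ".#.",
               ".#."]

predicted = classify(noisy_seven)
# Step 4 (display): show the result.
print("Predicted digit:", predicted)
```

A real handwriting reader replaces the hand-drawn templates with thousands of labelled samples and the distance rule with a trained model, but the four steps stay the same.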

Concepts Used

  1. Fundamentals of Machine Learning
  2. Character Recognition
  3. Computer Vision Fundamentals
  4. Segmentation
  5. Data Clustering
  6. Pattern Recognition algorithms
  7. Support Vector Machines
  8. Nearest Neighbor technique
Project Implementation

  • The main tasks involved in solving the problem and building this application are as follows:
  • Load the MNIST dataset
  • Preprocess the dataset and correct the errors
  • Train a classifier
  • Ensure the classifier can categorize the digits
  • Apply the classifier model on a test-set
  • Improve accuracy
  • Finalize a model which has been trained
  • The dataset we will use to build this model can be downloaded from Kaggle, which in turn took it from the Modified National Institute of Standards and Technology (MNIST) dataset.
  • The metric we use to judge the quality of our model is accuracy: the fraction of samples that the model recognises and categorises correctly.
  • The first thing to do is explore the dataset and find common ground. For example, in the MNIST dataset there are 42,000 samples with 784 features each. Each sample is 28 pixels in both height and width, making up a total of 784 pixels. The dataset is also roughly balanced, meaning each digit occurs about equally often.
  • Next, we need to find the average pixel intensity of each image, as this will allow us to segment the dataset and form clusters.
  • Once we have decided to use intensity to differentiate the characters, form a rough idea of which digits have higher intensity. Then calculate the standard deviation of these intensities and use it as a factor to classify the digits.
  • Support Vector Machines (SVMs) are a great choice for handwritten character recognition, as they are very effective in high-dimensional spaces.
  • However, they do not scale well to large datasets, so we will use the Nearest Neighbor technique to select a small neighborhood of training samples around each test sample.
  • This allows us to perform the SVM analysis on smaller sets of data.
  • Next, set a benchmark accuracy score for the model, somewhere in the high 90s (percent).
  • Use Principal Component Analysis (PCA) from the scikit-learn Python library to extract the essential features from the training set.
  • Next, apply cross-validation, using scikit-learn's StratifiedShuffleSplit class to split the data into several training and testing sets.
  • Next, perform SVM-KNN: find the K nearest neighbors of each sample by Euclidean distance, then run the SVM on that neighborhood. The KNeighborsClassifier class may be used for the neighbor search.
  • Keep training and tuning this model on the datasets we have until it reaches the benchmark accuracy; the app is then good to go.
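
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses the small 8x8 digits set bundled with scikit-learn (load_digits) as a stand-in for the 28x28 Kaggle MNIST file, the values n_components=30 and k=15 are arbitrary picks, and the helper svm_knn_predict is a hypothetical, simplified take on SVM-KNN that fits a small linear SVM on the neighbors of each test sample.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data (assumption): 8x8 digits shipped with scikit-learn,
# used here instead of the 28x28 Kaggle MNIST CSV from the article.
X, y = load_digits(return_X_y=True)

# Exploration: mean and standard deviation of per-image intensity by digit.
intensity = X.mean(axis=1)
for digit in range(10):
    values = intensity[y == digit]
    print(f"digit {digit}: mean={values.mean():.2f} std={values.std():.2f}")

# One stratified split keeps the class proportions similar in both parts.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# PCA is fitted on the training part only, then applied to both parts.
pca = PCA(n_components=30, random_state=0)
Z_train = pca.fit_transform(X_train)
Z_test = pca.transform(X_test)

# Plain K-nearest-neighbors baseline (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(Z_train, y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(Z_test))


def svm_knn_predict(knn, Z_train, y_train, Z_test, k=15):
    """Simplified SVM-KNN: fit a small SVM on each test sample's k neighbors."""
    neighbor_idx = knn.kneighbors(Z_test, n_neighbors=k, return_distance=False)
    predictions = np.empty(len(Z_test), dtype=y_train.dtype)
    for i, idx in enumerate(neighbor_idx):
        labels = y_train[idx]
        if np.unique(labels).size == 1:
            # All neighbors agree, so no local SVM is needed.
            predictions[i] = labels[0]
        else:
            # Train only on the neighborhood, which keeps the SVM small.
            local_svm = SVC(kernel="linear").fit(Z_train[idx], labels)
            predictions[i] = local_svm.predict(Z_test[i:i + 1])[0]
    return predictions


hybrid_accuracy = accuracy_score(
    y_test, svm_knn_predict(knn, Z_train, y_train, Z_test)
)
print(f"KNN accuracy:     {knn_accuracy:.3f}")
print(f"SVM-KNN accuracy: {hybrid_accuracy:.3f}")
```

Comparing the two printed scores against the chosen benchmark shows whether the extra SVM step is worth its cost on a given dataset. For the Kaggle data, the load_digits call would be swapped for reading the CSV file, and the number of components and neighbors would need tuning again.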