This is a school project in deep learning I am currently working on.
It consists of building a handwritten text recognition system using a CNN-LSTM-CTC architecture.
I plan to add a language model later to post-process the recognized words and improve accuracy.
This article was really helpful for understanding the concept of a Convolutional Recurrent Neural Network (CRNN).
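To make the CTC part of the architecture more concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is only an illustration, not the decoder used in this project: the function name `ctc_greedy_decode`, the blank index `0` and the toy alphabet are assumptions made for the example. The idea is that the network outputs one score vector per time step, and decoding keeps the best class at each step, merges consecutive repeats, then drops the blank symbol.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Decode per-time-step scores into text with best-path CTC decoding.

    frame_scores: list of score lists, one list per time step.
    alphabet: sequence mapping class index -> character
              (the entry at index `blank` is never emitted).
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[index] for index in collapsed if index != blank)


if __name__ == "__main__":
    # Toy alphabet: index 0 = blank, 1 = "t", 2 = "o".
    toy_alphabet = "-to"
    toy_scores = [
        [0.1, 0.8, 0.1],    # t
        [0.2, 0.7, 0.1],    # t (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # o
        [0.1, 0.2, 0.7],    # o (repeat, merged)
        [0.6, 0.2, 0.2],    # blank separates the two "o"
        [0.1, 0.1, 0.8],    # o
    ]
    print(ctc_greedy_decode(toy_scores, toy_alphabet))
```

The blank between the last two "o" frames is what lets CTC output a doubled letter; without it, the repeats would be merged into a single character.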
I am using the IAM Dataset, which includes about 115,000 labelled images of English words taken from more than 1,500 handwritten pages.
You have to register to download the dataset. Once that is done, unzip it and place the 'words' directory and the 'words.txt' file in the project repository as follows:
deepOCR repository
├── data
│   ├── words
│   │   ├── a01
│   │   ├── a02
│   │   └── ...
│   └── words.txt
├── src
└── ...
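With this layout in place, the labels can be paired with their images. The snippet below is a minimal sketch, not the loader from `src`. It assumes the usual 'words.txt' format documented in the file's own header comments: lines starting with `#` are comments, and each entry has space-separated fields with the word id first, the segmentation status (`ok` or `err`) second and the transcription from the ninth field on. It also assumes images are stored as `words/<form>/<form>-<suffix>/<word id>.png`. The names `WordSample`, `parse_words_line` and `load_samples` are hypothetical, and skipping `err` entries is a choice made for this example.

```python
from pathlib import Path
from typing import Iterator, NamedTuple, Optional


class WordSample(NamedTuple):
    image_path: Path
    text: str


def parse_words_line(line: str, words_dir: Path) -> Optional[WordSample]:
    """Turn one line of words.txt into a sample, or None if it is skipped."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # header comment or empty line
    fields = line.split(" ")
    # Assumed layout: id, status, graylevel, x, y, w, h, tag, transcription.
    if len(fields) < 9 or fields[1] != "ok":
        return None  # malformed entry or badly segmented word
    word_id = fields[0]                       # e.g. a01-000u-00-00
    form, suffix = word_id.split("-")[:2]     # e.g. a01, 000u
    image_path = words_dir / form / f"{form}-{suffix}" / f"{word_id}.png"
    text = " ".join(fields[8:])               # transcription
    return WordSample(image_path, text)


def load_samples(data_dir: str = "data") -> Iterator[WordSample]:
    """Yield every usable (image path, transcription) pair from data_dir."""
    root = Path(data_dir)
    with open(root / "words.txt", encoding="utf-8") as handle:
        for line in handle:
            sample = parse_words_line(line, root / "words")
            if sample is not None:
                yield sample


if __name__ == "__main__":
    example = "a01-000u-00-00 ok 154 408 768 27 51 AT A"
    print(parse_words_line(example, Path("data") / "words"))
```

Calling `load_samples()` from the repository root would then iterate over the whole dataset lazily, so the label file never has to be held in memory all at once.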