Correcting the noisy OCR output on Bangla language using seq2seq Model

umairanis03/Noise-Correction-OCR-Bangla

Bangla OCR Correction

Project aim:

  • Correcting OCR output on Bangla language

Workflow:

  1. Creating Bangla Corpus
  2. Pre-processing corpus, converting articles to lines
  3. Convert text-lines to Images
  4. Running OCR on these Images
  5. Calculating Word Error Rate of OCR output using ground truth text-lines
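Step 5 can be sketched as follows. This is a minimal illustration rather than the repository's own code: it assumes WER is the word-level edit distance (substitutions, insertions, deletions) divided by the number of words in the ground-truth line, with words split on whitespace. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are whitespace-separated words; no normalisation
    of punctuation or Unicode forms is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the OCR output is much longer than the ground truth.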

Project tree

Preparing Text Dataset

  • No Bangla text corpus was available
  • Wikipedia articles were scraped to prepare the corpus
  • Find the scraped corpus here
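The pre-processing step that turns scraped articles into text lines might look like the sketch below. It assumes sentences end at the Bangla danda (`।`), `?` or `!`, and that very short fragments are dropped; the function name `article_to_lines` and the `min_words` threshold are illustrative, not taken from the repository.

```python
import re


def article_to_lines(article: str, min_words: int = 3) -> list:
    """Split one article into cleaned sentence-level lines."""
    # Drop Wikipedia-style numeric citation markers such as [12].
    article = re.sub(r"\[\d+\]", "", article)

    # A sentence is a run of non-terminator characters followed by an
    # optional terminator (danda, question mark or exclamation mark).
    candidates = re.findall(r"[^।?!]+[।?!]?", article)

    lines = []
    for candidate in candidates:
        line = " ".join(candidate.split())  # collapse whitespace and newlines
        if len(line.split()) >= min_words:
            lines.append(line)
    return lines
```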

Conversion Text2Images
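One possible way to render a text line as an image is with Pillow, as sketched below. The library choice, canvas size, margin, and the font path in the usage comment are all assumptions, since the repository's own rendering code is not shown here. Note that correct shaping of Bangla conjuncts generally needs a Bangla-capable TrueType font and a Pillow build with libraqm support.

```python
from PIL import Image, ImageDraw, ImageFont


def render_line(text, font, size=(1200, 64), margin=8):
    """Draw one text line in black on a white greyscale canvas."""
    image = Image.new("L", size, color=255)  # "L" = 8-bit greyscale, white
    draw = ImageDraw.Draw(image)
    draw.text((margin, margin), text, font=font, fill=0)
    return image


# Usage (the font path is hypothetical; substitute a real Bangla font):
# font = ImageFont.truetype("fonts/SomeBanglaFont.ttf", 32)
# render_line("আমি ভাত খাই।", font).save("line_0001.png")
```

A fixed canvas keeps the sketch short; long lines would need a wider canvas or a size computed from the rendered text.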

Using pytesseract to convert images2text
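A minimal sketch of this step, assuming Tesseract is installed together with its Bengali language data (language code `ben`). `pytesseract.image_to_string` is the library's real entry point; the wrapper `ocr_lines` and its `recognize` parameter are hypothetical and exist only so the pipeline can be exercised without Tesseract.

```python
def ocr_lines(image_paths, lang="ben", recognize=None):
    """Return one whitespace-normalised OCR string per image path.

    `recognize` is an optional callable taking a path and returning raw
    text; when omitted, pytesseract is used.
    """
    if recognize is None:
        # Imported lazily so the function can be used with a stand-in
        # recogniser on machines without Tesseract.
        import pytesseract
        from PIL import Image

        def recognize(path):
            with Image.open(path) as image:
                return pytesseract.image_to_string(image, lang=lang)

    # Tesseract output usually carries trailing newlines; collapse them.
    return [" ".join(recognize(path).split()) for path in image_paths]
```

The returned strings can then be compared line by line against the ground-truth text to compute the word error rate.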

Spell checker for correcting noisy OCR output

Relevant links to blogs/repositories
