This repo contains code to extend or replicate the dataset used in the Kaggle Bengali.AI Handwritten Grapheme Classification competition. For the dataset, code, discussions and leaderboards, visit the Kaggle competition page. The paper describing the dataset, protocols and future directions can be found here or here.
Repository layout:

- data
  - scanned
  - extracted
  - error
  - packed
- codes
- collection
  - A4
  - Letter
- logs
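Git does not track empty directories, so some of these folders may be missing from a fresh clone. The following is a minimal Python sketch that recreates the folder skeleton. It assumes the layout listed above is complete. `LAYOUT` and `make_layout` are hypothetical names and are not part of this repo.

```python
from pathlib import Path

# Hypothetical: folder skeleton as listed above, relative to the repository root.
# Only directories are created; no scripts or data are copied.
LAYOUT = [
    "data/scanned",
    "data/extracted",
    "data/error",
    "data/packed",
    "codes",
    "collection/A4",
    "collection/Letter",
    "logs",
]


def make_layout(root="."):
    """Ensure every folder in LAYOUT exists under `root`; return their paths."""
    root = Path(root)
    paths = []
    for rel in LAYOUT:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in make_layout():
        print(path)
```

Because `exist_ok=True` is used, running it again is harmless and existing contents are left untouched.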
Extraction workflow:

- Run `python ./data/extracted/purge.py` to clear the extraction folders.
- Download and extract a batch of scanned `.jpg` files to `./data/scanned/<batchname>`. Then `cd ./data/scanned` and run `python transcribeGui.py <batchname>`.
- After the Roll/ID fields are transcribed, execute `extract.m` in MATLAB. Specify the `source` folder before executing. If GPU support is unavailable, replace `surfAlignGPU()` with `surfAlign()`. To validate extraction performance, set `disp=true` for `ocrForm()`, `surfAlign()` and `surfAlignGPU()`. For `surfAlign()`, set `nonrigid=true`.
  Then `cd ./data/error` and check for extraction failures. Next, `cd ./data/extracted` and check the sub-folders for label errors.
- Run `python pack.py`. This creates a separate folder for each extracted `<batchname>` inside `./data/packed`. Then `cd ./data/packed/` and run `python labelXGui.py <batchname>`. Select the blobs that contain overwriting, and any empty blobs, so they are discarded. Press `Ctrl+S` to save. Once you have gone through all of the packets, click the transfer button to remove the errors from the packaged folder.
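The download step only says that the scanned `.jpg` files must end up in `./data/scanned/<batchname>`. Below is a minimal Python sketch of that step. It assumes the batch arrives as a `.zip` archive, which this README does not state. `extract_batch` is a hypothetical helper, not one of the repo's scripts.

```python
import shutil
import zipfile
from pathlib import Path


def extract_batch(archive_path, batch_name, scanned_root="./data/scanned"):
    """Hypothetical helper: copy the .jpg scans from a batch archive
    (assumed to be a .zip) into <scanned_root>/<batch_name>.

    Folder structure inside the archive is flattened, so two scans with the
    same file name in different archive folders would overwrite each other.
    Returns the sorted list of extracted file names.
    """
    target = Path(scanned_root) / batch_name
    target.mkdir(parents=True, exist_ok=True)
    names = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and anything that is not a .jpg scan.
            if info.is_dir() or not info.filename.lower().endswith(".jpg"):
                continue
            name = Path(info.filename).name
            with zf.open(info) as src, open(str(target / name), "wb") as dst:
                shutil.copyfileobj(src, dst)
            names.append(name)
    return sorted(names)
```

For example, `extract_batch("batch01.zip", "batch01")` would fill `./data/scanned/batch01`. Both names are placeholders. `batch01` would then be the `<batchname>` passed to `transcribeGui.py`.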
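A per-folder file count can speed up the manual checks in `./data/error` and `./data/extracted`. Here is a minimal Python sketch that makes no assumptions about file names or types. `file_counts` is a hypothetical helper. An unexpected count only hints at where to look (for example, an empty sub-folder, or any files at all in `./data/error`). It does not replace opening the files and checking their labels.

```python
from pathlib import Path


def file_counts(root):
    """Count files under `root`. Loose files directly in `root` are reported
    under the key "."; each immediate sub-folder is reported with the number
    of files anywhere beneath it. A missing `root` gives {}."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    loose = sum(1 for p in root.iterdir() if p.is_file())
    if loose:
        counts["."] = loose
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return counts


if __name__ == "__main__":
    # Folders named in the workflow above; run from the repository root.
    for folder in ("./data/error", "./data/extracted"):
        print(folder)
        for name, n in file_counts(folder).items():
            print("  {}: {} file(s)".format(name, n))
```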
Requirements:

- MATLAB 2017b or higher
- MATLAB Computer Vision Toolbox
- Python 3.6.3 or higher
- Pillow == 4.2.1
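You can check the two Python-side requirements before running the scripts with a short sketch like the one below. It treats `Pillow == 4.2.1` as an exact pin, as written. `parse_version` and `check_requirements` are hypothetical helpers. The MATLAB requirements cannot be checked from Python and still need to be verified in MATLAB itself.

```python
import sys

# Values from the requirements list above (MATLAB items are not checked here).
MIN_PYTHON = (3, 6, 3)
PILLOW_PIN = (4, 2, 1)


def parse_version(text):
    """Leading numeric components of a version string: '4.2.1' -> (4, 2, 1).
    Parsing stops at the first non-numeric piece, so '4.2.1.post0' -> (4, 2, 1)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(python_version, pillow_version):
    """Return a list of human-readable problems (empty if both checks pass)."""
    problems = []
    if tuple(python_version[:3]) < MIN_PYTHON:
        found = ".".join(str(x) for x in python_version[:3])
        problems.append("Python {} found, 3.6.3 or higher required".format(found))
    if pillow_version is None:
        problems.append("Pillow is not installed")
    elif parse_version(pillow_version) != PILLOW_PIN:
        problems.append("Pillow {} found, 4.2.1 pinned".format(pillow_version))
    return problems


if __name__ == "__main__":
    try:
        import PIL
        # Recent Pillow exposes __version__; older releases expose PILLOW_VERSION.
        pil_version = getattr(PIL, "__version__", None) or getattr(PIL, "PILLOW_VERSION", None)
    except ImportError:
        pil_version = None
    for problem in check_requirements(sys.version_info, pil_version):
        print(problem)
```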
Links:

- Kaggle competition page: www.kaggle.com/c/bengaliai-cv19
- Dataset introduction: COCO-Grapheme