In this project, we aim to extract the required fields from Vietnamese receipts captured by mobile devices 😄 We built the full flow, containerized it, and deployed the system as a web page using Streamlit. Everything is ready to use now! The image below shows our main pipeline, which includes background subtraction (Mask R-CNN), invoice alignment (CRAFT + ResNet34), text detection (CRAFT), text recognition (VietOCR), and key information extraction (GraphSAGE).
For the dataset, we used MC-OCR 2021. The training set has 1155 images, each labeled with its key fields and texts. This dataset is quite challenging, with varied backgrounds and low-quality images, so EDA and preprocessing are required to get good model performance!
A bit more about the GraphSAGE model: it is an improved version of the original graph neural network that not only leverages the attribute information of adjacent nodes but can also generate a representation for new data it has never seen before. In detail, the graph is first split into k levels based on the distance from the current node. Then the feature of this node is updated by summarizing the embeddings of its neighbors; we used the mean aggregation operator for this step. Repeating this step multiple times lets information propagate back from the furthest level. In our deployed model, we stacked 5 consecutive GraphSAGE layers with ReLU activation. A simple fully connected layer on top of the model predicts a probability vector for key classification. The update process and the node classification are illustrated in the following image:
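The update and classification steps described above can also be sketched in a few lines of NumPy. This is only a minimal illustration, not the code of the deployed model: the weights are random instead of trained, the self and neighbor features are combined through two separate weight matrices (equivalent to one linear layer on their concatenation), and the names `mean_aggregate`, `sage_layer`, `forward`, `init_params`, as well as the feature size, hidden size, and number of classes, are assumptions made for this example.

```python
import numpy as np


def mean_aggregate(h, adj):
    """Average the embeddings of each node's neighbors (isolated nodes get zeros)."""
    deg = adj.sum(axis=1, keepdims=True)
    return (adj @ h) / np.maximum(deg, 1.0)


def sage_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE-style update: mix self and mean-neighbor features, then ReLU."""
    return np.maximum(h @ w_self + mean_aggregate(h, adj) @ w_neigh, 0.0)


def softmax(z):
    """Row-wise softmax, shifted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(in_dim, hidden_dim, num_classes, num_layers=5, seed=0):
    """Random weights for illustration only (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden_dim] * num_layers
    layers = [
        (rng.normal(0.0, 0.1, (a, b)), rng.normal(0.0, 0.1, (a, b)))
        for a, b in zip(dims[:-1], dims[1:])
    ]
    w_out = rng.normal(0.0, 0.1, (hidden_dim, num_classes))
    return layers, w_out


def forward(x, adj, layers, w_out):
    """Stack the GraphSAGE layers, then apply a fully connected classifier."""
    h = x
    for w_self, w_neigh in layers:
        h = sage_layer(h, adj, w_self, w_neigh)
    return softmax(h @ w_out)


# Toy graph: 4 text boxes connected in a chain 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
x = np.random.default_rng(1).normal(size=(4, 8))  # 8 made-up features per box
layers, w_out = init_params(in_dim=8, hidden_dim=16, num_classes=5)
probs = forward(x, adj, layers, w_out)
print(probs.shape)  # (4, 5): one probability vector per text box
```

Each row of `probs` is the probability vector for one node. With 5 stacked layers, a node's prediction can draw on information from nodes up to 5 hops away, which is the propagation from the furthest level described above.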
You can run the project with the commands below; just make sure Docker is already installed on your computer:
git clone https://github.com/manhph2211/MC-OCR.git
cd MC-OCR
docker build -t "app" .
docker run app
Thanks to the authors: