matteomedioli/language-classifier

Language Classifier

Binary Language Classifier with PyTorch and Flask.

Dataset: https://www.kaggle.com/datasets/basilb2s/language-detection


Local Setup

Install requirements

pip install -r requirements.txt

Debug

python src/app.py

Run

SET FLASK_APP=./src/app.py
python -m flask run

Docker Setup

Build Docker Image

docker build . -t language-classifier-image

Run Docker Container

docker run --name language-classifier -d -p 5000:5000 language-classifier-image

Usage

Train [POST]

Endpoint: http://localhost:5000/train

All hyperparameters are optional; any that are omitted fall back to their default values. The example request body below lists every configurable hyperparameter with its default value:

Request

    {
        "epochs": 10,
        "lr": 5, 
        "step_size": 1.0,
        "gamma": 0.1,
        "batch_size": 64,
        "input_dim": 4,
        "embed_dim": 32,
        "num_classes": 17,
        "eval_every": 100
    }

Response

    {
        "accuracy": 0.90
    }
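The names `lr`, `step_size`, and `gamma` in the request body suggest a step-decay learning-rate schedule, where the rate is multiplied by `gamma` every `step_size` epochs. That the repository applies exactly this schedule is an assumption based on the parameter names only; the function `step_decay_lr` below is a hypothetical helper, not code from the project. Under that assumption, a minimal sketch of how the default values would shrink the learning rate:

```python
def step_decay_lr(initial_lr: float, gamma: float, step_size: float, epoch: int) -> float:
    """Learning rate at a given epoch under step decay (assumed schedule)."""
    # The rate is multiplied by gamma once for every completed block of step_size epochs.
    completed_steps = int(epoch // step_size)
    return initial_lr * gamma ** completed_steps


if __name__ == "__main__":
    # Default values taken from the example request body.
    for epoch in range(4):
        print(epoch, step_decay_lr(5, 0.1, 1.0, epoch))
```

With these defaults the rate would drop by a factor of ten each epoch, so a large starting value such as 5 is less surprising than it first looks.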

Test [GET]

Endpoint: http://localhost:5000/test

Response

    {
        "test_accuracy": 0.90
    }
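The train and test endpoints above can also be called from Python. The sketch below uses only the standard library and assumes the server reads a JSON request body on `/train`; the helper names (`build_train_request`, `build_test_request`, `send`) are illustrative and do not come from the repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # host and port from the docker run command


def build_train_request(overrides=None):
    """Build a POST request for /train.

    Only the hyperparameters passed in `overrides` are sent; the rest
    are left to the server's defaults.
    """
    body = json.dumps(overrides or {}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_test_request():
    """Build a GET request for /test."""
    return urllib.request.Request(f"{BASE_URL}/test", method="GET")


def send(request):
    """Send a request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage, with the container running:
#   send(build_train_request({"epochs": 5}))
#   send(build_test_request())
```

Building the request separately from sending it keeps the payload easy to inspect before any training run is started.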

Inference [POST]

Endpoint: http://localhost:5000/predict

Request

    {
        "text": "questa è una frase in italiano!"
    }

Response

    {
        "class": 1
    }
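Because the example sentence contains a non-ASCII character ("è"), it helps to be explicit about encoding when calling the endpoint from code. The following is a minimal sketch using the standard library; `build_predict_request` and `is_italian` are hypothetical helpers, and the assumption is that the server accepts a UTF-8 encoded JSON body.

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(text: str) -> urllib.request.Request:
    """Build a POST request for /predict with a UTF-8 encoded JSON body."""
    # ensure_ascii=False keeps characters such as "è" as-is instead of \u escapes;
    # encoding to UTF-8 makes the byte representation explicit.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        PREDICT_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def is_italian(response_body: bytes) -> bool:
    """Interpret the binary response: class 1 means Italian, 0 means any other language."""
    return json.loads(response_body.decode("utf-8"))["class"] == 1


# Example usage, with the container running:
#   with urllib.request.urlopen(build_predict_request("questa è una frase in italiano!")) as r:
#       print(is_italian(r.read()))
```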

TensorBoard

docker cp language-classifier:/app/runs docker_runs
tensorboard --logdir docker_runs

Preprocessing Automated Tests

pytest src/tests.py

Binary to Multi-Class

The model is configured to recognize Italian sentences. To switch to a multi-class configuration, change this line in the predict() method:

response = 1 if LANG_LOOKUP[label.item()] == "Italian" else 0

to

response = LANG_LOOKUP[label.item()]
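The effect of the change can be seen in isolation with a small sketch. The `LANG_LOOKUP` contents below are a hypothetical three-entry excerpt (the real table lives in the repository), a plain integer stands in for `label.item()`, and `to_response` is an illustrative helper rather than project code.

```python
# Hypothetical excerpt: the real LANG_LOOKUP is defined in the repository.
LANG_LOOKUP = {0: "English", 1: "Italian", 2: "French"}


def to_response(label_index: int, multi_class: bool = False):
    """Map a predicted label index to the value returned by predict()."""
    language = LANG_LOOKUP[label_index]
    if multi_class:
        # Multi-class mode: return the language name itself.
        return language
    # Binary mode: 1 for Italian, 0 for every other language.
    return 1 if language == "Italian" else 0


if __name__ == "__main__":
    print(to_response(1))                    # binary mode, Italian
    print(to_response(2))                    # binary mode, not Italian
    print(to_response(2, multi_class=True))  # multi-class mode
```

Note that after the switch the response value is a language name (a string) rather than 0 or 1, so any client that checks for `1` needs to be updated as well.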
