
Big-Data-Programming/sentiment-analysis-gke-pipeline

Workflow status badges: pypi_upload · docker_frontend · docker_inference · docker_training · pytest

A fully GKE-managed sentiment analysis application, with language model training and inference pipelines automated using GitHub Actions.

Architecture Overview

Block diagram showing the internal workings of this project:

(architecture block diagram image)

Block diagram showing the GitHub Actions automation that establishes the CI/CD pipeline:

(CI/CD pipeline block diagram image)

Installation

  1. Clone repository.
  2. cd into cloned directory.
  3. Create and activate a new virtualenv or Miniconda Python environment with Python 3.10 or later, e.g. for Miniconda:
       conda create -n sentiment_analysis_ci_cd python=3.10
       conda activate sentiment_analysis_ci_cd
  4. Install the package in development (editable) mode via pip install -e .[tests].
  5. Install the pre-commit hooks via pre-commit install.
    • Optional: run the hooks once on all files via pre-commit run --all-files
    • Keep the hooks up to date via pre-commit autoupdate

Wandb commands

  1. To upload a dataset, make sure the data is available locally and the wandb project has been created in your account, then run the command below (you can also upload multiple files):
    • Run python sa_app/scripts/wandb_init.py --entity <your_user_name> --project <name of wandb project> --artifact_name <artifact name> --artifact_locations <artifact local path>
    • Example: python sa_app/scripts/wandb_init.py --entity bdp_grp2 --project sa-roberta --artifact_name sentiment-dataset --artifact_locations <you can provide multiple files separated by whitespace>
  2. To manually download a dataset, run the command below with the desired file name (a Python API equivalent for both steps is sketched after this list):
    • wandb artifact get prabhupad26/sa-roberta/sentiment-dataset:latest --root training.1600000.processed.noemoticon.csv
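
For reference, here is a minimal sketch of the same upload/download flow using the wandb Python API. The entity, project, artifact, and file names mirror the examples above; the actual logic inside sa_app/scripts/wandb_init.py may differ.

```python
import wandb

# Upload: log one or more local files as a dataset artifact.
run = wandb.init(entity="bdp_grp2", project="sa-roberta", job_type="upload-dataset")
artifact = wandb.Artifact("sentiment-dataset", type="dataset")
artifact.add_file("training.1600000.processed.noemoticon.csv")  # call again to add more files
run.log_artifact(artifact)
run.finish()

# Download: fetch the latest version of the artifact into a local directory.
api = wandb.Api()
artifact = api.artifact("prabhupad26/sa-roberta/sentiment-dataset:latest")
artifact.download(root="data/")
```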

Run training

If running for the first time, follow the steps below:

  1. Download the dataset (e.g. via the wandb download command above)
  2. Run python -m spacy download en_core_web_sm
  3. cd sa_app/src
  4. From the root directory of the training module (i.e. sa_train/sa_train_module), run: python training/train.py --config <path to train_cfg.yml file> (the entry-point pattern is sketched below)
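
For orientation, the training entry point follows the usual config-driven pattern: parse --config, load the YAML, and run. The sketch below illustrates that pattern only; the config keys shown are illustrative assumptions, not the actual train_cfg.yml schema.

```python
import argparse

import spacy
import yaml


def main() -> None:
    parser = argparse.ArgumentParser(description="Train the sentiment model")
    parser.add_argument("--config", required=True, help="Path to train_cfg.yml")
    args = parser.parse_args()

    # Load the training configuration from YAML.
    with open(args.config) as f:
        cfg = yaml.safe_load(f)

    # The spaCy model downloaded in step 2 is used for text preprocessing.
    nlp = spacy.load("en_core_web_sm")

    # Illustrative keys only; the real schema is defined by train_cfg.yml.
    dataset_path = cfg.get("dataset_path")
    epochs = cfg.get("epochs", 3)
    print(f"Training on {dataset_path} for {epochs} epochs ...")


if __name__ == "__main__":
    main()
```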

Build docker image

Run the commands below from the root path of this repo (a sketch of the Flask app the container serves follows the list):

  1. docker build -t <image name> .
  2. docker run -p 5000:5000 <image name>
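
The docker run command maps port 5000 because the container serves a Flask API on that port. Below is a minimal sketch of such an app; the /predict route and the placeholder scoring logic are assumptions for illustration, not the actual sa_inference_module code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical endpoint: accepts {"text": "..."} and returns a sentiment label.
    text = request.get_json(force=True).get("text", "")
    label = "positive" if "good" in text.lower() else "negative"  # placeholder, not the real model
    return jsonify({"text": text, "sentiment": label})


if __name__ == "__main__":
    # Bind to 0.0.0.0 so the app is reachable through docker's -p 5000:5000 mapping.
    app.run(host="0.0.0.0", port=5000)
```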

Workflow rules:

For training:

  1. Wildcard training/* : if a changed path matches this pattern, the model-training workflow runs (the matching semantics are sketched below)
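
To make the rule concrete: in GitHub Actions path filters, * matches any characters except /, while ** also crosses directory boundaries. The small Python sketch below emulates that matching for illustration only; it is not how Actions implements it.

```python
import re


def gh_path_match(pattern: str, path: str) -> bool:
    # Emulate GitHub Actions path-filter globs: '**' matches across '/',
    # while '*' stops at directory boundaries.
    regex = re.escape(pattern).replace(r"\*\*", ".*").replace(r"\*", "[^/]*")
    return re.fullmatch(regex, path) is not None


print(gh_path_match("training/*", "training/train.py"))       # True  -> workflow runs
print(gh_path_match("training/*", "training/deep/model.py"))  # False -> would need 'training/**'
print(gh_path_match("training/*", "README.md"))               # False
```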

Workflow (via GitHub Actions) flow diagram: (diagram image)

Screenshots from the frontend:

(frontend screenshot)

TODOS

  • Add Starter Code: Initial codebase setup is complete.
  • Complete Training Code: Training code for the model is implemented.
  • Setup Pre-commit: Pre-commit hooks for code quality are configured.
  • Add More Data Preprocessing Steps: Additional data preprocessing steps are in progress.
  • Create Three Containers:
    • For training via Flask API: container setup is pending.
    • For inferencing via Flask API: container setup is pending.
    • For dashboard app deployment on GCP App Engine: container setup is pending.
  • Complete Inference Code in sa_inference_module: Flask API for inference is not yet implemented.
  • Version Control on Dataset and Model: Explore MLFlow integration for dataset and model versioning.
  • CI / CD (DevOps): Continuous Integration (CI) and Continuous Deployment (CD) setup is pending.
    • Maintain container versions in a single file: they are currently updated in multiple files (.github/workflows and kubernetes-manifest)
