A website that uses webcam feeds to answer open-ended questions requiring outside knowledge. For more info, check out the ZenML blog post.
The visual question-answering pipeline is inspired by the paper from Microsoft linked in the credit section. In short, we prompt GPT-3 with a generated image caption and object tag list, the question, and in-context question-answer examples that demonstrate the task at hand (few-shot learning), achieving a BERTScore-computed F1 score of around 0.989 on the test set.
Because the image data is never fed directly to GPT-3, the best queries are descriptive, counting, or similar questions about one or more objects visible in the background. For example, if there are two people in the image, one wearing a hat and the other wearing glasses, questions that work well include:
- "How many people are in the room?"
- "What color is the hat in the picture?"
- "How many people are wearing glasses?"
To set up the production server for the website, we:

- Create an AWS Lambda function for the backend:

  ```bash
  . deploy/aws_login.sh
  python deploy/aws_lambda.py
  ```

- Implement continuous deployment, updating the AWS Lambda backend whenever a commit is pushed to the repo and the pipeline's BERTScore-computed F1 score has improved:

  ```bash
  . deploy/cont_deploy.sh
  ```
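At its core, that deployment gate is a comparison of the pipeline's current BERTScore F1 against the best previously recorded score. A hedged sketch of that check, using the `bert-score` package and an assumed `deploy/best_f1.json` history file (the real gating logic lives in `deploy/cont_deploy.sh` and may differ):

```python
# Sketch of the "deploy only if BERTScore F1 improved" check (illustrative;
# the actual logic in deploy/cont_deploy.sh may differ).
import json
from bert_score import score  # pip install bert-score

def current_f1(predictions, references):
    """Compute the mean BERTScore F1 of pipeline answers against reference answers."""
    _, _, f1 = score(predictions, references, lang="en")
    return f1.mean().item()

def should_deploy(predictions, references, history_path="deploy/best_f1.json"):
    """Return True when the new F1 beats the best recorded score (assumed file)."""
    new_f1 = current_f1(predictions, references)
    try:
        best_f1 = json.load(open(history_path))["f1"]
    except FileNotFoundError:
        best_f1 = 0.0
    if new_f1 > best_f1:
        json.dump({"f1": new_f1}, open(history_path, "w"))
        return True
    return False
```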
To contribute, check out the guide.
- Install conda if necessary:

  ```bash
  # Install conda: https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation
  # If on Windows, install chocolatey: https://chocolatey.org/install. Then, run:
  # choco install make
  ```

- Create the conda environment locally:

  ```bash
  cd admirer
  make conda-update
  conda activate admirer
  make pip-tools
  export PYTHONPATH=.
  echo "export PYTHONPATH=.:$PYTHONPATH" >> ~/.bashrc
  ```

- Install pre-commit:

  ```bash
  pre-commit install
  ```

- Sign up for an OpenAI account and get an API key here.

- Populate a `.env` file with your key and the backend URL in the format of `.env.template`, and reactivate the environment (a sketch of how these values are read at runtime appears after this list).

- Sign up for a Weights and Biases account here and download the CLIP ONNX file locally:

  ```bash
  wandb login
  python ./training/stage_model.py --fetch --from_project admirer
  ```

- (Optional) Sign up for an AWS account here and set up your AWS credentials locally, referring to this as needed:

  ```bash
  aws configure
  ```
If the instructions aren't working for you, head to this Google Colab, make a copy of it, and run the cells there to get an environment set up.
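Once `.env` is populated, its values are read as ordinary environment variables at runtime. A minimal sketch using `python-dotenv`; the variable names below are illustrative, so check `.env.template` for the names the repo actually expects:

```python
# Sketch of reading the .env values at runtime (variable names are illustrative;
# see .env.template for the names the repo actually expects).
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

openai_api_key = os.environ.get("OPENAI_API_KEY")  # assumed name
backend_url = os.environ.get("BACKEND_URL")        # assumed name

if not openai_api_key:
    raise RuntimeError("Populate .env before running the app or tests.")
```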
The repo is separated into main folders that each cover a part of the ML-project lifecycle (some of which contain interactive notebooks), plus supporting files and folders that store configurations and workflow scripts:
```
.
├── api_serverless    # the backend handler code using AWS Lambda.
├── app_gradio        # the frontend code using Gradio.
├── deploy            # the AWS Lambda backend setup and continuous deployment code.
├── data_manage       # the data management code using AWS S3 for training data and ZenML log storage, boto3 for data exploration, and ZenML + Great Expectations for data validation.
├── load_test         # the load testing code using Locust.
├── monitoring        # the model monitoring code using Gradio's flagging feature.
├── question_answer   # the inference code.
├── tasks             # the pipeline testing code.
├── training          # the model development code using PyTorch, PyTorch Lightning, and Weights and Biases.
```
From the main directory, there are various ways to test the pipeline:
- To start a W&B hyperparameter optimization sweep for the caption model (on one GPU):

  ```bash
  . ./training/sweep/sweep.sh
  CUDA_VISIBLE_DEVICES=0 wandb agent --project ${PROJECT} --entity ${ENTITY} ${SWEEP_ID}
  ```

- To train the caption model (add `--strategy ddp_find_unused_parameters_false` for multi-GPU machines; takes ~7.5 hrs on an 8xA100 Lambda Labs instance):

  ```bash
  python ./training/run_experiment.py \
    --data_class PICa --model_class ViT2GPT2 --gpus "-1" \
    --wandb --log_every_n_steps 25 --max_epochs 300 \
    --augment_data True --num_workers "$(nproc)" \
    --batch_size 2 --one_cycle_max_lr 0.01 --top_k 780 --top_p 0.65 --max_label_length 50
  ```

- To test the caption model (the best model can be downloaded from here):

  ```bash
  python ./training/test_model.py \
    --data_class PICa --model_class ViT2GPT2 \
    --num_workers "$(nproc)" --load_checkpoint training/model.pth
  ```

- To start the app locally (uncomment code in `PredictorBackend.__init__` and set `use_url=False` to use the local model instead of the API; see the sketch after this list for the idea behind that switch):

  ```bash
  python app_gradio/app.py
  ```

- To test the Gradio frontend by launching and pinging the frontend locally:

  ```bash
  python -c "from app_gradio.tests.test_app import test_local_run; test_local_run()"
  ```

- To test the caption model's ability to memorize a single batch:

  ```bash
  . ./training/tests/test_memorize_caption.sh
  ```

- To run integration tests for the model pipeline:

  ```bash
  . ./tasks/integration_test.sh
  ```

- To run unit tests for the model pipeline:

  ```bash
  . ./tasks/unit_test.sh
  ```

- To test the whole model pipeline:

  ```bash
  . ./tasks/test.sh
  ```

- To lint your code:

  ```bash
  pre-commit run --all-files
  ```
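For context on the `use_url` switch mentioned in the local-app item above: the frontend either POSTs the image and question to the deployed Lambda URL or runs the pipeline in-process. A rough sketch of that pattern follows (the class name mirrors `PredictorBackend`, but the rest is an assumption rather than the repo's actual code):

```python
# Rough sketch of switching the Gradio frontend between the deployed Lambda API
# and an in-process model (illustrative only; the repo's PredictorBackend differs).
import base64
import requests

class PredictorBackend:
    def __init__(self, url=None, use_url=True, local_model=None):
        # local_model is a hypothetical callable(image_bytes, question) -> answer
        if use_url and url is not None:
            self._predict = lambda image, question: self._predict_from_endpoint(url, image, question)
        else:
            self._predict = local_model  # run the pipeline in-process instead

    def run(self, image_bytes, question):
        return self._predict(image_bytes, question)

    @staticmethod
    def _predict_from_endpoint(url, image_bytes, question):
        # POST the webcam frame and question to the Lambda handler as JSON
        payload = {
            "image": base64.b64encode(image_bytes).decode("utf-8"),
            "question": question,
        }
        response = requests.post(url, json=payload, timeout=60)
        response.raise_for_status()
        return response.json().get("answer")  # assumed response field
```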
- GI4E for their database and Scale AI for their annotations.
- Facebook for their image segmentation model.
- NLP Connect for their base image caption model and Sachin Abeywardana for his fine-tuning code.
- OpenAI for their CLIP text and image encoder code and GPT-3 API.
- Microsoft for their visual question answering code.