Ask me anything about text or images!
This repository contains a Question Answering web app for both Text-based and Visual Question Answering (VQA).
- For the VQA tasks, we use the BLIP (Bootstrapping Language-Image Pre-training) model for unified vision-language understanding and generation.
- For the text-based tasks, we use the DistilBERT model for question answering.
Information:
- Frontend: Django templates (HTML, CSS, JavaScript).
- Backend: Django.
- Database: Not used in this web app.
- Deployment: Docker.
- Developers: Nguyen Bao Tin & Le Huu Trong.
- 1. Introduction
- 2. Technical Overview
- 3. How to Install
- 4. Usage
- 5. How to run automated tests
- 6. References
Visual Question Answering (VQA) is a challenging task that combines computer vision and natural language processing to answer questions about images.
Text-based Question Answering, in turn, aims to answer questions based on a given context. The context can be a paragraph, a document, or a set of documents, and the answer is a span of text within that context.
This repository provides a web application that lets users upload an image or enter a paragraph and ask questions; the models then provide answers based on the visual content of the image or the content of the paragraph.
The core AI models used in this web app are BLIP and DistilBERT.
- The BLIP model is a state-of-the-art vision-language model that achieves impressive results on a range of vision-language tasks, including VQA. This web app uses the PyTorch implementation from the original BLIP repository.
- The DistilBERT model is a distilled version of BERT (Bidirectional Encoder Representations from Transformers), trained to be smaller and faster while retaining most of BERT's accuracy. This web app uses the model via the Hugging Face API (DistilBERT).
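For illustration, here is a minimal sketch of how these two kinds of models can be loaded and queried with the Hugging Face `transformers` library. This is only an example under assumptions: the app itself uses the PyTorch implementation from the original BLIP repository, and the model identifiers below are standard public checkpoints, not necessarily the exact weights used in this repo.

```python
# Illustration only: the web app uses the original BLIP PyTorch repo for VQA,
# but the Hugging Face ports keep this sketch self-contained.
from PIL import Image
from transformers import pipeline, BlipProcessor, BlipForQuestionAnswering

# Text-based QA with DistilBERT (extractive: the answer is a span of the context).
text_qa = pipeline("question-answering",
                   model="distilbert-base-cased-distilled-squad")
result = text_qa(question="Who developed the app?",
                 context="The app was developed by Nguyen Bao Tin and Le Huu Trong.")
print(result["answer"])

# Visual QA with BLIP (public checkpoint assumed for this example).
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
image = Image.open("example.jpg").convert("RGB")
inputs = processor(image, "How many people are there?", return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```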
The web app is built using the Django framework. Django provides a convenient and efficient way to handle web requests and build interactive web applications. By utilizing Django, we can easily integrate these models into the web app and provide a seamless user experience.
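As a rough sketch of that integration (the view, URL, and template names here are hypothetical and may not match this repo), a Django view wrapping the text-based QA model might look like:

```python
# views.py (hypothetical sketch; the real view/template names in this repo may differ)
from django.shortcuts import render
from transformers import pipeline

# Load the model once at import time so each request stays fast.
qa_pipeline = pipeline("question-answering",
                       model="distilbert-base-cased-distilled-squad")

def text_qa(request):
    answer = None
    if request.method == "POST":
        context = request.POST.get("context", "")
        question = request.POST.get("question", "")
        if context and question:
            answer = qa_pipeline(question=question, context=context)["answer"]
    return render(request, "qa/text_qa.html", {"answer": answer})
```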
For the full list of dependencies, see requirement.txt.
To ensure consistent and reproducible installations, this repository is packaged using Docker. Docker allows us to encapsulate the entire application and its dependencies into a container, making it easy to deploy the app on any machine with Docker installed. The Docker image includes all the necessary libraries and dependencies required to run the web app and execute the BLIP model.
To install and run the VQA demo web app, please follow the steps below:
- Ensure that Docker is installed on your system. You can download and install Docker from the official website: Docker Engine for Ubuntu or Docker Desktop.
- Clone this repository to your local machine using the following command:

  `git clone https://github.com/nbtin/qa_web_demo`
- Navigate to the project directory:

  `cd qa_web_demo`
- Build the Docker image and run the container using the following command:

  `docker compose up --build`

  Note: the first time you run this command, you will need to be patient 😄. The process may take up to 30 minutes depending on your internet speed, because it downloads the required libraries (including some for GPU support, if available) and the BLIP model, which is approximately 1.35 GB.
- Wait for the installation process to complete. Once the download finishes, the web app is ready to use.
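If you want to check from another terminal that the container is serving requests, a small Python smoke test (assuming the `requests` package is available on your host) could look like this:

```python
# Optional smoke test from the host machine (assumes `requests` is installed).
import requests

response = requests.get("http://localhost:8080", timeout=10)
print(response.status_code)  # expect 200 once startup is complete
```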
To use the web app, follow the steps below:
- Open your web browser and navigate to http://localhost:8080.
- After logging in, you will be redirected to the home page. On the home page, you can use account functions such as viewing and updating your profile, changing your password, and logging out (see the image below for more details). Credit: the user management system is adapted from this repo.
- If you want to use the main functions of this web app, click the **Use Question Answering System** button. You will be redirected to the main page of the web app.
- After uploading an image or entering some text, you can ask one or more questions about the context by typing them into the input field, separated by question marks ("?"). For example, you can ask "How many people are there?" or "What are they doing? What color are their shirts?" (one way such input could be split is sketched after these steps).
  - You can break a line in the question box by holding **Shift** and pressing **Enter**.
- To submit your questions and obtain answers, press the **Enter** key or click the **Ask AI** button.
- The model will then process your request and provide answers based on the context you provided. The execution time depends on the number of questions asked: the more questions you ask, the longer it takes.
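One plausible way to split such multi-question input on the server side is sketched below (a hypothetical helper, not necessarily how this repo implements it):

```python
# Hypothetical helper: split "What are they doing? What color are their shirts?"
# into individual questions, restoring the trailing question mark on each one.
def split_questions(text: str) -> list[str]:
    parts = [part.strip() for part in text.split("?")]
    return [part + "?" for part in parts if part]

print(split_questions("How many people are there? What are they doing?"))
# -> ['How many people are there?', 'What are they doing?']
```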
Here is an example of the VQA task; you can do the same with the text-based task:
To run the automated tests for this application, simply open a new terminal and run the following command:

`docker exec -it qa-web-app ./app/manage.py test app/qa`
After running this command, you will see the test results displayed below:
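For context on what such a test might look like, here is a minimal hypothetical Django test case (the actual tests live under `app/qa` and may differ):

```python
# app/qa/tests.py (hypothetical sketch, not the repo's actual test suite)
from django.test import TestCase

class HomePageTest(TestCase):
    def test_home_page_responds(self):
        # Unauthenticated users may be redirected to the login page,
        # so accept either a 200 or a 302 response here.
        response = self.client.get("/")
        self.assertIn(response.status_code, (200, 302))
```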
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. arXiv:2201.12086 (2022).
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. Repo.
- Guides on using Docker for Python applications. Docker docs.
- Django REST API Unit Testing. Tafadzwa Lameck Nyamukapa. Video.
- hello ML. User management system. Repo.
- Hugging Face API. DistilBERT.
- Install Docker Engine on Ubuntu. Docker docs.
Thanks for going through this repository! Have a nice day.
Do you have any questions? Feel free to contact me via E-mail.