This tool converts natural language queries into SQL commands and fetches data from a configured database. It's an intuitive way for individuals without SQL knowledge to interact with databases, or for experts to streamline and automate their work.
There are two inputs
- schema.sql - The schema of the database
- text/prompt - The text to be converted to SQL. Example - "Number of students in class 5?"
The process is
- Figure out the subject of the query (Could be a query to ChatGPT) - "students"; Map it to a table - "students" -> "student"; Map relevant query params to either table or columns; using ChatGPT;
- A sample prompt
- A sample response -
{ "subject": "student", "relatedTables": ["subject", "midDayMealRecieved", "examMarks"] }
- Find out all the tables relevant to the subject - "student" -> ["student", "class", "teacher"]; This could a second level linkage as well.
- Setup a mock database for that schema (flavour wise - PSQL, SQLite, MySQL, etc.)
- Insert the schema into the mock database
- Run a query like this for the relevant flavour -
SELECT name FROM sqlite_master WHERE type='table'
to the tables having the subject in it. - Return the tables
- Find out all the columns relevant to the tables in the above step.
- Currently return all columns for a table
- Create a
schema-relevant.sql
file with the relevant tables and columns - Create a prompt for the query - "Given this SQL Schema - {schema-relevant.sql}, Can you give a SQL query as a code snippet to "{NL SQL Query}" and don't share with me anything else."
- Send a prompt to ChatGPT
- Return SQL query
- Verify the query on a mock DB -
validate_SQL(sql)
- Assuming this system is single tenant and single database query tool
- Onboard a Schema using the
/onboard
API => schema.sql => already onboarded to the database | P2 /prompt
=> takes in a two param,prompt
andschema_id
and based on that prompt return the SQL if ChatGPT provides a valid SQL.
- Rename .env.sample file to .env
- Update OPENAI_API_KEY with your own openai api key. You can get your own api key by clicking here.
- Now execute the below commands to setup server and other services.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r src/server/sql_graph/requirements.txt
docker compose -f docker-compose.gitpod.yml up -d
python -m unittest tests.related_tables
python src/server/app.py
cd src/server/db/mock-data
sudo sh init_mock_data.sh
Note down the Schema_ID that gets stored in the schema_id.txt file. You will have to send this as param in the prompt api.
This will be required for sql_graph.
sudo apt update
sudo apt-get install libgraphviz-dev
sudo apt install pkg-config graphviz-dev -y
pip install pygraphviz
sudo apt-get install libgraphviz-dev
pip3 install --global-option=build_ext --global-option="-I/opt/homebrew/Cellar/graphviz/8.0.5/include/" --global-option="-L/opt/homebrew/Cellar/graphviz/8.0.5/lib/" pygraphviz
brew install graphviz
python -m pip install \
--global-option=build_ext \
--global-option="-I$(brew --prefix graphviz)/include/" \
--global-option="-L$(brew --prefix graphviz)/lib/" \
pygraphviz
curl --location 'https://localhost:5078/prompt/v3' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic dGVzdDp0ZXN0' \
--data '{
"schema_id": "<Schema ID>",
"prompt": "How many Primary School are there?"
}'
Check our QuickStart guide if you want to contribute to this project.
You can watch this video which will help you in setting up Gitpod for the project.
This tool is released under the MIT License.