forked from gulcin/pgvector-rag-app
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1 from bilge-ince/aidb-rag
Implement aidb into RAG application
Showing 13 changed files with 93 additions and 165 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,12 @@ | ||
# DATABASE | ||
DB_NAME=vector_test | ||
DB_USER=postgres | ||
DB_PASSWORD=postgres | ||
DB_PASSWORD=password | ||
DB_HOST=localhost | ||
DB_PORT=5432 | ||
DB_PORT=15432 | ||
|
||
# MODEL | ||
AIDB_MODEL_NAME=all-MiniLM-L6-v2 | ||
MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2 | ||
TOKENIZER_NAME=mistralai/Mistral-7B-Instruct-v0.2 | ||
HUGGING_FACE_ACCESS_TOKEN= |
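The `DB_*` keys in this example file map onto database connection parameters. As a minimal sketch (not code from this commit), the hypothetical helper `load_db_config` below reads them from the environment. The fallback values mirror the updated example file and are assumptions, as is the choice of dictionary keys, which match keyword arguments that `psycopg2.connect` accepts.

```python
import os


def load_db_config(env=None):
    """Build connection parameters from DB_* variables (hypothetical helper)."""
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    return {
        "dbname": env.get("DB_NAME", "vector_test"),
        "user": env.get("DB_USER", "postgres"),
        # Fallbacks below are assumptions copied from the example file.
        "password": env.get("DB_PASSWORD", "password"),
        "host": env.get("DB_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        "port": int(env.get("DB_PORT", "15432")),
    }


config = load_db_config({"DB_PORT": "5432"})
```

The resulting dictionary could then be passed on as `psycopg2.connect(**load_db_config())`, assuming the application connects through psycopg2 as its requirements file suggests.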
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,20 @@ | ||
# pgvector-rag | ||
An application to demonstrate how can you make a RAG using pgvector and PostgreSQL | ||
# aidb-rag | ||
An application to demonstrate how can you make a RAG using EDB's aidb and PostgreSQL. | ||
|
||
![Sample Chat Console Output](/imgs/chat%20console.png) | ||
|
||
## Requirements | ||
- Python3 | ||
- PostgreSQL | ||
- pgvector | ||
- aidb | ||
|
||
## Install | ||
|
||
Clone the repository | ||
|
||
``` | ||
git clone git@github.com:gulcin/pgvector-rag.git | ||
cd pgvector-rag | ||
git clone git@github.com:gulcin/aidb-rag-app.git | ||
cd aidb-rag-app | ||
``` | ||
|
||
Install Dependencies | ||
|
@@ -31,10 +33,16 @@ cp .env-example .env | |
|
||
## Run | ||
|
||
First run your `aidb` extension by following the step by step installation guide: https://www.enterprisedb.com/docs/edb-postgres-ai/ai-ml/install-tech-preview/ | ||
|
||
Make sure your aidb extension is ready to accept connections. Then you can continue as follows: | ||
|
||
``` | ||
python app.py --help | ||
usage: app.py [-h] {create-db,import-data,chat} ... | ||
usage: app.py [-h] {create-db,import-data,chat} {data_source} | ||
e.g: python app.py import-data sample.pdf | ||
Application Description | ||
|
@@ -49,34 +57,3 @@ Subcommands: | |
chat Use chat feature | ||
``` | ||
|
||
## Run UI | ||
|
||
We use Streamlit for creating a simple Graphical User Interface for our pgvector-rag app. | ||
|
||
To be able to run Streamlit please do the following: | ||
|
||
``` | ||
pip install streamlit | ||
``` | ||
|
||
**Add keys/secrets to Streamlit secrets** | ||
|
||
If you need to store secrets that Streamlit app will use, you can do this by creating | ||
`.streamlit/secrets.toml` file under Streamlit directory and adding lines like following: | ||
|
||
``` | ||
# .streamlit/secrets.toml | ||
OPENAI_API_KEY = "YOUR_API_KEY" | ||
``` | ||
**Run Streamlit app for generating UI** | ||
|
||
``` | ||
streamlit run chatgptui.py | ||
``` | ||
You can create as many apps you'd like and place them under Streamlit directory, | ||
edit the keys if needed and run them like described above. | ||
|
||
|
||
|
||
|
||
|
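The usage text in the README's Run section lists three subcommands (`create-db`, `import-data`, `chat`) and a `data_source` argument. The actual `app.py` is not shown in this view, so the following is only a sketch of how such a command line could be declared with `argparse`. The function name `build_parser` is hypothetical, and attaching `data_source` to `import-data` alone is an assumption based on the example `python app.py import-data sample.pdf`.

```python
import argparse


def build_parser():
    """Declare the three subcommands named in the README (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="app.py", description="Application Description"
    )
    subparsers = parser.add_subparsers(
        dest="command", required=True, title="Subcommands"
    )
    subparsers.add_parser("create-db")
    # Assumption: only import-data takes the data_source positional argument.
    import_parser = subparsers.add_parser("import-data")
    import_parser.add_argument("data_source")
    subparsers.add_parser("chat", help="Use chat feature")
    return parser


args = build_parser().parse_args(["import-data", "sample.pdf"])
```

Here `args.command` carries the chosen subcommand and `args.data_source` the file path, which is the attribute the import command reads in this commit.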
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,22 @@ | ||
import numpy as np | ||
|
||
from db import get_connection | ||
from embedding import generate_embeddings, read_pdf_file | ||
|
||
|
||
def import_data(args, model, device, tokenizer): | ||
def import_data(args): | ||
data = read_pdf_file(args.data_source) | ||
|
||
embeddings = [ | ||
generate_embeddings(tokenizer=tokenizer, model=model, device=device, text=line) | ||
for line in data | ||
] | ||
|
||
conn = get_connection() | ||
cursor = conn.cursor() | ||
|
||
# Store each embedding in the database | ||
for i, (doc_fragment, embedding) in enumerate(embeddings): | ||
for i, (doc_fragment) in enumerate(data): | ||
cursor.execute( | ||
"INSERT INTO embeddings (id, doc_fragment, embeddings) VALUES (%s, %s, %s)", | ||
(i, doc_fragment, embedding[0]), | ||
"INSERT INTO documents (id, doc_fragment) VALUES (%s, %s)", | ||
(i, doc_fragment), | ||
) | ||
conn.commit() | ||
|
||
generate_embeddings() | ||
print( | ||
"import-data command executed. Data source: {}".format( | ||
args.data_source | ||
) | ||
) | ||
|
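Read together, the changed lines above suggest that the new `import_data` stores only the raw text fragments in a `documents` table and then calls `generate_embeddings()` with no arguments, presumably leaving the embedding work to aidb (an inference, since `embedding.py` is not displayed here). The sketch below shows that flow with indentation restored. The recording classes and the injected `read_fragments` and `generate_embeddings` parameters are hypothetical stand-ins so the example runs without a database, and the single commit after the loop is an assumption because the diff view lost the original indentation.

```python
from types import SimpleNamespace


class RecordingCursor:
    """Hypothetical stand-in for a database cursor; records statements."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


class RecordingConnection:
    """Hypothetical stand-in for the object returned by get_connection()."""

    def __init__(self):
        self._cursor = RecordingCursor()
        self.commits = 0

    def cursor(self):
        return self._cursor

    def commit(self):
        self.commits += 1


def import_data(args, conn, read_fragments, generate_embeddings):
    # The commit reads fragments with read_pdf_file(args.data_source).
    data = read_fragments(args.data_source)
    cursor = conn.cursor()
    # Store only the text fragments; no embedding is computed in Python.
    for i, doc_fragment in enumerate(data):
        cursor.execute(
            "INSERT INTO documents (id, doc_fragment) VALUES (%s, %s)",
            (i, doc_fragment),
        )
    # Assumption: one commit after all fragments are inserted.
    conn.commit()
    # In the diff this call takes no arguments and follows the inserts.
    generate_embeddings()
    print(
        "import-data command executed. Data source: {}".format(args.data_source)
    )


conn = RecordingConnection()
embedding_runs = []
import_data(
    SimpleNamespace(data_source="sample.pdf"),
    conn,
    read_fragments=lambda path: ["first fragment", "second fragment"],
    generate_embeddings=lambda: embedding_runs.append(True),
)
```

Passing the connection and helpers in as parameters is a choice made for this sketch only; the commit itself imports `get_connection`, `read_pdf_file`, and `generate_embeddings` at module level.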
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,6 @@ psycopg2 | |
transformers | ||
torch | ||
black | ||
pgvector | ||
PyPDF2 | ||
bitsandbytes | ||
accelerate |
Binary file not shown.
This file was deleted.
Empty file.