📄 Document question answering template

A simple Streamlit app that answers questions about an uploaded document via OpenAI's GPT-3.5.

How to run it on your own machine

Prerequisites

Python 3.12
Java 7 or higher
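The prerequisites above can be checked before setup with a small script. This is only a sketch: the helper names `python_ok` and `java_path` are hypothetical, it assumes that any Python at or above 3.12 is acceptable, and it only confirms that a `java` executable is on the PATH, not which Java version it is.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info, required=(3, 12)):
    """Return True if the interpreter is at least the required (major, minor)."""
    # Assumption: newer versions than 3.12 are fine; the README only names 3.12.
    return tuple(version_info[:2]) >= tuple(required)


def java_path():
    """Return the path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


if __name__ == "__main__":
    print("Python 3.12+:", python_ok())
    print("java on PATH:", java_path() or "not found")
```

If `java` is reported as not found, text extraction is likely to fail later, so it is worth fixing before running the app.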

Dev Setup

Create a Python venv. Open a terminal and run:

python3.12 -m venv .venv
source .venv/bin/activate

Install the requirements
```
$ pip install -r requirements.txt
```
Run the app
```
$ streamlit run streamlit_app.py
```

Users and Roles

Create a file called secrets.toml inside the .streamlit directory and add the following information.

API_KEY="<your-OpenAI-api-key>"

[passwords]
# Follow the rule: username = "password"
<user> = "<password>"

[roles]
# Follow the rule: username = "role"
<user> = "<role>"

Replace <user> with the actual user names used to log in to the application. Replace <role> with one of user, admin, or super-admin.

Troubleshooting

Certificate issues preventing text extraction

The application uses the Python port of Apache Tika to extract text from documents, which requires Java 7+ to be installed on the machine. On macOS, running the code might raise the exception below when uploading document(s):
```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)
```

To resolve this, consider following the steps described in this Stack Overflow question.
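Before applying a fix, it can help to see where the Python interpreter looks for trusted CA certificates, since a missing CA bundle is one common cause of this error on macOS. The snippet below is a diagnostic sketch only. It does not reproduce the steps from the linked question, and `describe_ca_paths` is a hypothetical helper name.

```python
import os
import ssl


def describe_ca_paths():
    """Report where this Python build looks for trusted CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations (None when the default file or directory is missing).
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Compiled-in default CA file and whether it exists on disk.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_cafile_exists": os.path.exists(paths.openssl_cafile),
        # Value of the environment variable that overrides the default CA file.
        "env_override": os.environ.get(paths.openssl_cafile_env),
    }


for key, value in describe_ca_paths().items():
    print(f"{key}: {value}")
```

If `cafile` is `None` and `openssl_cafile_exists` is `False`, the interpreter probably has no CA bundle to verify against, which would be consistent with the exception above.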

Known Issues

Token Limit: At the moment the application embeds the whole text of a document as a single entry in the vector DB. If the document exceeds the embedding model's token limit, the upload fails.
Chat Errors due to Token Limit: No attempt has been made yet to trim the amount of content sent to the model for RAG. The code limits the number of documents sent, but if their combined token count exceeds the model's limit, the request fails with an error.
Delete buttons work, but the solution is not efficient. An efficient way to delete individual documents from FAISS, when the indexing was done using the LangChain Indexing APIs, still needs to be worked out.
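One possible mitigation for the two token-limit issues is to split documents into smaller chunks before embedding and to cap how much retrieved text is sent to the model. The sketch below is not what the application currently does. `chunk_words` and `fit_to_budget` are hypothetical helpers, and whitespace-separated words stand in for tokens, so the limits are only approximate. A real implementation would count tokens with the model's own tokenizer.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def fit_to_budget(chunks, budget_words):
    """Keep chunks in order until adding one would exceed the word budget."""
    kept, used = [], 0
    for chunk in chunks:
        size = len(chunk.split())
        if used + size > budget_words:
            break
        kept.append(chunk)
        used += size
    return kept


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, max_words=4, overlap=1)
print(pieces)
print(fit_to_budget(pieces, budget_words=8))
```

The overlap keeps some shared context between neighbouring chunks, at the cost of embedding a little duplicated text.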

References

Streamlit Docs

Langchain How To Guides

Langchain Docs on RAG

Medium Blog Links

