This repository implements a question-answering (QA) system built on LangChain that lets you chat with multiple documents (PDF, TXT, etc.) as sources. It also includes a simple Streamlit app that provides a user-friendly interface.
Note: The conversation shown in the screenshot was based on a PDF from the UCSD International Student Office. The information displayed in the chat should not be taken as factual; it is for demonstration purposes only.
This repository includes the following unique features:
- **Persistent Database:** The database can optionally persist between sessions. By specifying a `persist_directory` when creating the database, you avoid recreating the index each time the code runs. To create a persistent database, use the following code:

  ```python
  vectordb = Chroma.from_documents(documents, embedding=embedding, persist_directory='db')
  ```
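
  On subsequent runs you can reload the persisted index instead of rebuilding it. A minimal sketch, assuming the index was previously written to `db` with the same embedding model:

  ```python
  from langchain.embeddings import OpenAIEmbeddings
  from langchain.vectorstores import Chroma

  # Reload the existing index from disk rather than re-embedding the documents
  # (assumes it was created earlier with persist_directory='db').
  embedding = OpenAIEmbeddings()
  vectordb = Chroma(persist_directory='db', embedding_function=embedding)
  ```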
- **Customizable Prompts:** The code shows exactly which prompts are sent to the language model (LLM) under the hood, so you can understand and modify them to tailor the responses to your use case. You can explore the prompts in the following code section:

  ```python
  # Print the chat prompts
  print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template)
  print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[1].prompt.template)
  ```
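
  To change the prompt rather than just inspect it, one option is to pass your own template through `chain_type_kwargs`. This is only a sketch; the template wording below is an illustrative assumption, not the repository's prompt:

  ```python
  from langchain.chains import RetrievalQA
  from langchain.prompts import PromptTemplate

  # Hypothetical replacement prompt; a "stuff" chain expects {context} and {question}.
  template = (
      "Use the following context to answer the question. "
      "If you don't know the answer, say you don't know.\n\n"
      "{context}\n\nQuestion: {question}\nAnswer:"
  )
  custom_prompt = PromptTemplate(template=template, input_variables=["context", "question"])

  # turbo_llm and retriever are created as shown in the usage steps below.
  qa_chain = RetrievalQA.from_chain_type(
      llm=turbo_llm,
      chain_type="stuff",
      retriever=retriever,
      return_source_documents=True,
      chain_type_kwargs={"prompt": custom_prompt},
  )
  ```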
To use the code, follow these steps:

1. Load and process the documents:
   - If you have text files, use the `TextLoader` class. Update the file path in the `DirectoryLoader` constructor to the directory containing your text files:

     ```python
     loader = DirectoryLoader('/path/to/text/files/', glob="./*.txt", loader_cls=TextLoader)
     ```
   - If you have PDF files, use the `PyPDFLoader` class. Update the file path in the `DirectoryLoader` constructor to the directory containing your PDF files:

     ```python
     loader = DirectoryLoader('/path/to/pdf/files/', glob="./*.pdf", loader_cls=PyPDFLoader)
     ```
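
   After loading, the documents are typically split into smaller chunks before indexing. A minimal sketch, assuming the common `RecursiveCharacterTextSplitter` approach (the chunk sizes are illustrative assumptions):

   ```python
   from langchain.document_loaders import DirectoryLoader, TextLoader
   from langchain.text_splitter import RecursiveCharacterTextSplitter

   loader = DirectoryLoader('/path/to/text/files/', glob="./*.txt", loader_cls=TextLoader)
   raw_documents = loader.load()

   # Split into overlapping chunks so each piece fits in the model's context window.
   text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
   documents = text_splitter.split_documents(raw_documents)
   ```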
2. Create the document database:
   - To create a new database each time:

     ```python
     embedding = OpenAIEmbeddings()
     vectordb = Chroma.from_documents(documents, embedding=embedding, persist_directory=None)
     ```
   - To create a database that persists between sessions:

     ```python
     embedding = OpenAIEmbeddings()
     vectordb = Chroma.from_documents(documents, embedding=embedding, persist_directory='db')
     ```
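
   To sanity-check the index before wiring up the chain, you can run a raw similarity search. A small sketch (the query string is just an example):

   ```python
   # Retrieve the three chunks most similar to the query.
   docs = vectordb.similarity_search("What documents do I need?", k=3)
   for doc in docs:
       print(doc.metadata.get("source"), doc.page_content[:80])
   ```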
3. Create the question-answering chain:

   ```python
   # Use GPT-3.5 Turbo as the LLM and retrieve the 3 most relevant chunks per query
   turbo_llm = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
   retriever = vectordb.as_retriever(search_kwargs={"k": 3})
   qa_chain = RetrievalQA.from_chain_type(llm=turbo_llm,
                                          chain_type="stuff",
                                          retriever=retriever,
                                          return_source_documents=True)
   ```
4. Use the chat prompts to interact with the QA system:

   ```python
   # Print the chat prompts
   print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[0].prompt.template)
   print(qa_chain.combine_documents_chain.llm_chain.prompt.messages[1].prompt.template)

   # Main loop for user input
   while True:
       query = input("Enter your query (or 'q' to quit): ")
       if query == 'q':
           break
       llm_response = qa_chain(query)
       process_llm_response(llm_response)
   ```
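
   The loop calls a `process_llm_response` helper defined in the script. If you are adapting the code, a minimal sketch of such a helper (the repository's exact formatting may differ):

   ```python
   def process_llm_response(llm_response):
       # Print the answer, then list the source documents it was drawn from.
       print(llm_response['result'])
       print('\nSources:')
       for source in llm_response['source_documents']:
           print(source.metadata.get('source'))
   ```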
5. Run the code:

   ```bash
   python your_script.py
   ```
The code also includes a simple Streamlit app for a more interactive experience (a minimal sketch of such an app follows the steps below). To run the Streamlit app, follow these steps:
1. Uncomment the necessary code lines in the provided script.
2. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```
3. Access the app in your browser by clicking the external URL printed in the terminal.
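
The repository's own app may differ, but as a rough sketch of how a Streamlit front end can wrap the same chain (the file name `app.py` matches the command above; the layout and widget choices are illustrative assumptions):

```python
import streamlit as st
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

st.title("Chat with your documents")

# Reload the persisted index built in the steps above (assumes persist_directory='db').
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory='db', embedding_function=embedding)
retriever = vectordb.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo'),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

query = st.text_input("Ask a question about your documents:")
if query:
    llm_response = qa_chain(query)
    st.write(llm_response["result"])
    with st.expander("Sources"):
        for doc in llm_response["source_documents"]:
            st.write(doc.metadata.get("source"))
```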