You can open this Jupyter notebook directly in Google Colab by clicking the link below:
In this lab we will show you how to generate and analyse contracts using GPT-3.5-turbo with its 16k context length, adding RAG to solve the issue we faced in Lab 0.1.
As NVIDIA puts it, in the clearest way possible: "Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources."
RAG retrieves the most relevant embeddings from a knowledge-base index and maps them to the prompt that will be submitted to the model. The model then uses these retrieved passages, together with its generative capability, to answer the query as accurately as possible.
In the notebook RAG_Responsible_AI.ipynb, we have curated a straightforward implementation of RAG: we create embeddings with a sentence-transformer model and build a FAISS index over them for retrieval. You can see all of this in action by running the notebook.
- OPENAI_API_KEY (you can create an OpenAI key using these instructions: Click Here)
- Google Colab setup. Follow the instructions here to set it up: Click Here
- You can find some contracts here: Small Doc, Large Doc
You can create an OpenAI key using these instructions: Click Here
Click on this link to go to the Colab:
Paste the OpenAI key here in Cell 2
Step 1:
Run cell 1 to install all the required packages to run the notebook.
If you get a warning prompt from Colab, you can click Run anyway.
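As a rough sketch, the install cell likely looks something like this (the exact package names and versions in the notebook may differ):

```python
# Assumed package list for this lab; the notebook's cell 1 may differ.
%pip install openai sentence-transformers faiss-cpu pypdf tiktoken
```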
Step 2:
Run cell 2 to import all the required modules used in the notebook; this will also load the keys and credentials into the environment.
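For reference, here is a minimal sketch of what cell 2 might contain, assuming the packages above and the pre-1.0 `openai` client interface (the notebook's actual imports may differ):

```python
import os

import faiss
import numpy as np
import openai
from sentence_transformers import SentenceTransformer

# Paste your OpenAI key here so it is available for the rest of the notebook.
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key
openai.api_key = os.environ["OPENAI_API_KEY"]
```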
Step 3:
Run cell 3 to load the sentence-transformer model; read the description in the notebook for why we use all-MiniLM for sentence embeddings.
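Conceptually this is a single line. The all-MiniLM-L6-v2 variant (assumed here) is small and fast and produces 384-dimensional sentence embeddings, which keeps indexing and retrieval cheap:

```python
# Load a small, fast sentence-embedding model (384-dimensional vectors).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
```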
Step 4:
Run cell 4 to load the PDF; paste the path to your PDF here.
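A possible sketch of this step, assuming pypdf is the reader (the notebook may use a different PDF library):

```python
from pypdf import PdfReader

pdf_path = "/content/small_contract.pdf"  # paste the path to your PDF here
reader = PdfReader(pdf_path)
# Concatenate the extracted text of every page into one string.
raw_text = "\n".join(page.extract_text() or "" for page in reader.pages)
```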
Step 5:
Run cell 5 to create chunks of the document and store them in a knowledge base.
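Chunking can be as simple as fixed-size slices of the extracted text; the notebook may use a more sophisticated splitter (for example overlapping or sentence-aware chunks):

```python
# Naive fixed-size chunking; chunk_size is an assumption.
chunk_size = 1000  # characters per chunk
knowledge_base = [raw_text[i:i + chunk_size] for i in range(0, len(raw_text), chunk_size)]
print(f"Created {len(knowledge_base)} chunks")
```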
Step 6:
Run cell 6 to use the knowledge_base to create the vector index.
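Building the vector index means embedding every chunk and adding the vectors to a FAISS index. A minimal sketch using a flat (exact-search) L2 index; the notebook's index type may differ:

```python
# Embed every chunk and add the vectors to a flat L2 FAISS index.
embeddings = embedder.encode(knowledge_base, convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
```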
Step 7:
Run cell 8 to load the answer_question() function into memory. When called, this function retrieves the chunks from the knowledge base that are most relevant to the question.
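In essence, answer_question() embeds the question and runs a nearest-neighbour search against the index. A hedged sketch, since the real function's signature and defaults may differ:

```python
def answer_question(question, k=3):
    """Return the k chunks most relevant to the question."""
    q_vec = embedder.encode([question], convert_to_numpy=True).astype("float32")
    _, ids = index.search(q_vec, k)          # nearest-neighbour search in FAISS
    return [knowledge_base[i] for i in ids[0]]
```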
Step 8:
Run cell 9 to load the return_RAG_passage() function, which retrieves the most relevant chunks and converts them into a properly formatted prompt following this pattern:
<Context1>
Chunk 1
</Context1>
<Context2>
Chunk 2
</Context2>
<Context3>
Chunk 3
</Context3>
.
.
.
<ContextN>
Chunk N
</ContextN>
<Question>The question that is being asked about the document</Question>
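A sketch of how return_RAG_passage() might build that prompt from the retrieved chunks (the helper name, parameters, and exact formatting are assumptions):

```python
def return_RAG_passage(question, k=3):
    """Wrap each retrieved chunk in <ContextN> tags and append the question."""
    chunks = answer_question(question, k)
    contexts = "\n".join(
        f"<Context{i}>\n{chunk}\n</Context{i}>" for i, chunk in enumerate(chunks, start=1)
    )
    return f"{contexts}\n<Question>{question}</Question>"
```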
Step 9:
Running cell 10 will initialize the CallOpenAI() function for later use.
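CallOpenAI() is essentially a thin wrapper around the chat-completions endpoint with the 16k-context model. A sketch assuming the pre-1.0 `openai` client; the system message and parameters are assumptions:

```python
def CallOpenAI(prompt):
    """Send the RAG prompt to GPT-3.5-turbo-16k and return the text of the reply."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": prompt},
        ],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]
```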
Step 10:
Running cell 11 will initialize the question and call the return_RAG_passage() function, which returns the relevant chunks.
Step 11:
Run cell 12 to combine the question with the retrieved RAG chunks; it will also print the token count of the entire prompt.
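Counting the prompt tokens can be done with tiktoken (assumed here; the notebook may count tokens differently). The question below is only an example:

```python
import tiktoken

question = "What is the termination clause in this contract?"  # example question
prompt = return_RAG_passage(question)

# Count how many tokens the assembled prompt will consume.
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print("Prompt tokens:", len(encoding.encode(prompt)))
```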
Step 12:
Run cell 13 to call the CallOpenAI() function and get the response from GPT.
Step 13:
Run cell 14 to print the response from GPT.
Step 14:
Now we will do the same thing with a larger document, preferably the one that caused problems in Lab 0.1 due to the context-length limit.
Run cell 15 to load the larger document; paste the path to your PDF here.
Step 15:
Run cell 16 to create chunks of the document and store them in a knowledge base.
Step 16:
Run cell 17 to use the knowledge_base to create the vector index.
Step 17:
Running cell 19 will initialize the question and call the return_RAG_passage() function, which returns the relevant chunks.
Step 18:
Run cell 20 to combine the question with the retrieved RAG chunks; it will also print the token count of the entire prompt.
Step 19:
Run cell 21 to call the CallOpenAI() function and get the response from GPT.
Step 20:
Run cell 18 to print the response.