Python latest version is giving some errors in installing Gensim and another library, so I recommend you to consider python 3.10 version.
-
Open VSCode > Terminal (command prompt) and continue the below commands:
# Create a new directory mkdir Graphrag # Create a new conda environment with Python 3.10 conda create -p ./graphragvenv python=3.10 # Activate the created conda environment conda activate ./graphragvenv # Install and update pip python -m pip install --upgrade pip # Install and upgrade the setuptools package python -m pip install --upgrade setuptools
-
Install GraphRAG:
# Install GraphRAG pip install graphrag # If installing GraphRAG raises any error, run the below command python -m pip install --no-cache-dir --force-reinstall gensim # Then install GraphRAG again pip install graphrag
-
Initialize GraphRAG:
python -m graphrag.index --init --root .
Running the above command, GraphRAG is initiated and creates folders and files.
-
Creating the Deployments of LLM Model and Embedding Model in Azure OpenAI:
- For embedding, the model is
text-embedding-small
- For LLM model, recommended is
GPT-4o
.
I have tried with
GPT-35-turbo-instruct
which is not working, raising some issues on creating the graphs at the end. I recommend you to useGPT-4o
, and the model is very good at providing accurate answers. You can also check which other modelsAfter creating the deployments, you need to make some changes in the
settings.yaml
file. - For embedding, the model is
-
Configure OPENAI_API_KEY:
.env
is also created when you initialize GraphRAG, you can configure yourOPENAI_API_KEY
there. -
Make Changes in
settings.yaml
File:Go to the
llm
section:- Change the
api_key
from${GRAPHRAG_API_KEY}
to${OPENAI_API_KEY}
- Change
type
fromopenai_chat
toazure_openai_chat
- Change
model
to the model name that you deployed in Azure OpenAI.- In my case, I have taken
GPT-4o
, you can try with other models as well.
- In my case, I have taken
- Uncomment the
api_base
and replace that URL with the endpoint of the Azure OpenAI- In my case
<your_azure_openai_resource_group_name>
is used, you can give any name to the instance.
- In my case
- Uncomment the
api_version
, no need to make changes in that, you can use the defaultapi_version
- Uncomment the
deployement_name
, and replace that with the deployment you have given to the model while creating the deployment.- In my case, I have taken
<deployment_name_of_model>
- In my case, I have taken
In the same
settings.yaml
file, go to theembeddings
section and make some changes:- Change the
api_key
from${GRAPHRAG_API_KEY}
to${OPENAI_API_KEY}
- Change
type
fromopenai_embeddings
toazure_openai_embeddings
- Change
model
to the model name that you deployed in Azure OpenAI.- In my case, I have taken
text-embedding-small
, you can try with other models as well.
- In my case, I have taken
- Uncomment the
api_base
and replace that URL with the endpoint of the Azure OpenAI- In my case
<your_azure_openai_resource_group_name>
is used, you can give any name to the instance.
- In my case
api_version
, you can comment that or you can uncomment and use that. It won't raise an error- Uncomment the
deployement_name
, and replace that with the deployment you have given to the model while creating the deployment.- In my case, I have taken
<deployment_name_of_model>
- In my case, I have taken
That's it. Now save the changes that you made in the
settings.yaml
file. - Change the
-
Add Input File:
You need to add an input text file, to do that first create a folder by running the below command:
mkdir input
Now in this
input
folder, you need to add the.txt
file (input text data file). -
Run GraphRAG to Create Graphs on the Data:
python -m graphrag.index --root .
This command will run the GraphRAG and create the parquet files. This will convert all your text data into entities and relationships graphs. You can check that in
output > last folder > artifacts folder
(you have parquet files, which are converted data into graphs). -
Evaluate Our RAG Model:
Now we need to test our RAG model, to do that we need to ask questions. We are doing things in the command prompt, we need to ask questions to our model along with a command. To ask a question, you need to write a command and then the question:
python -m graphrag.query --root . --method local/global "Your_Question"
When you run this command, the model uses either
local
(if writtenlocal
) orglobal
(if writtenglobal
) to answer the question.To confirm the model is accurate, we can copy any word from the output of the model and find it in the text file:
- Copy the word you need to check
- Go to the text file (input file) and hit
ctrl+f
, a search bar will open - Paste that copied word in this search bar, use the arrows to navigate to the words in the text file.