Commit

🩹 broken link fix (lancedb#186)
PrashantDixit0 authored May 13, 2024
1 parent 7bdef20 commit bbc96f0
Showing 6 changed files with 50 additions and 36 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -61,7 +61,7 @@ If you're looking for in-depth tutorial-like examples, checkout the [tutorials](
| [Evaluating Prompts with Prompttools](/examples/prompttools-eval-prompts/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/prompttools-eval-prompts/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#)| |
| [AI Agents: Reducing Hallucination](/examples/reducing_hallucinations_ai_agents/) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/reducing_hallucinations_ai_agents/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./examples/reducing_hallucinations_ai_agents/main.py) [![JS](https://img.shields.io/badge/javascript-%23323330.svg?style=for-the-badge&logo=javascript&logoColor=%23F7DF1E)](./examples/reducing_hallucinations_ai_agents/index.js) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#) |[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/how-to-reduce-hallucinations-from-llm-powered-agents-using-long-term-memory-72f262c3cc1f/)|
| [AI Trends Searcher with CrewAI](./examples/AI-Trends-with-CrewAI/) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/AI-Trends-with-CrewAI/CrewAI_AI_Trends.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/track-ai-trends-crewai-agents-rag/)|
| [SuperAgent Autogen](/examples/SuperAgent_Autogen) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/SuperAgent_Autogen/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/optimizing-ai-agents-harnessing-openai-compatible-technologies-and-vector-databases)|
| [SuperAgent Autogen](/examples/SuperAgent_Autogen) |<a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/SuperAgent_Autogen/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)||
[Sentiment Analysis : Analysing Hotel Reviews](/examples/Sentiment-Analysis-Analyse-Hotel-Reviews/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/Sentiment-Analysis-Analyse-Hotel-Reviews/Sentiment_Analysis_using_LanceDB.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/sentiment-analysis-using-lancedb-2da3cb1e3fa6)|
| [Facial Recognition](./examples/facial_recognition) | <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/facial_recognition/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)|
| [Imagebind demo app](/examples/imagebind_demo/) | <a href="https://huggingface.co/spaces/raghavd99/imagebind2"><img src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo-with-title.svg" alt="hf spaces" style="width: 80px; vertical-align: middle; background-color: white;"></a> [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)|
@@ -101,14 +101,14 @@ Looking to get started with LLMs, vectorDBs, and the world of Generative AI? The
| [Local RAG from Scratch with Llama3](./tutorials/Local-RAG-from-Scratch) | [![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)](./tutorials/Local-RAG-from-Scratch/rag.py) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| |
| [A Primer on Text Chunking and its Types](./tutorials/different-types-text-chunking-in-RAG) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/different-types-text-chunking-in-RAG/Text_Chunking_on_RAG_application_with_LanceDB.ipynb) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/a-primer-on-text-chunking-and-its-types-a420efc96a13) |
| [Langchain LlamaIndex Chunking](./tutorials/Langchain-LlamaIndex-Chunking) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Langchain-LlamaIndex-Chunking/Langchain_Llamaindex_chunking.ipynb) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/chunking-techniques-with-langchain-and-llamaindex/) |
| [Comparing Cohere Rerankers with LanceDB](./tutorials/cohere-reranker) | [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)]() |
| [Comparing Cohere Rerankers with LanceDB](./tutorials/cohere-reranker) | [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/benchmarking-cohere-reranker-with-lancedb/) |
| [NER powered Semantic Search](./tutorials/NER-powered-Semantic-Search) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/NER-powered-Semantic-Search/NER_powered_Semantic_Search_with_LanceDB.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![beginner](https://img.shields.io/badge/beginner-B5FF33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/ner-powered-semantic-search-using-lancedb-51051dc3e493) |
| [Product Quantization: Compress High Dimensional Vectors](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |[![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#) | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/product-quantization-compress-high-dimensional-vectors-dfcba98fab47) |
| [Product Quantization: Compress High Dimensional Vectors](https://blog.lancedb.com/benchmarking-lancedb-92b01032874a-2/) |[![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#) | [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/benchmarking-lancedb-92b01032874a-2/) |
| [Corrective RAG with Langgraph](./tutorials/Corrective-RAG-with_Langgraph/) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/Corrective-RAG-with_Langgraph/CRAG_with_Langgraph.ipynb) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/implementing-corrective-rag-in-the-easiest-way-2/)|
| [LLMs, RAG, & the missing storage layer for AI](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984) | [![intermediate](https://img.shields.io/badge/intermediate-FFDA33)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/llms-rag-the-missing-storage-layer-for-ai-28ded35fa984/) |
| [Fine-Tuning LLM using PEFT & QLoRA](./tutorials/fine-tuning_LLM_with_PEFT_QLoRA) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/fine-tuning_LLM_with_PEFT_QLoRA/main.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/optimizing-llms-a-step-by-step-guide-to-fine-tuning-with-peft-and-qlora-22eddd13d25b) |
| [Context-Aware Chatbot using Llama 2 & LanceDB](./tutorials/chatbot_using_Llama2_&_lanceDB) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/chatbot_using_Llama2_&_lanceDB/main.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#)| [![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/context-aware-chatbot-using-llama-2-lancedb-as-vector-database-4d771d95c755) |
| [Better RAG with FLARE](./tutorials/better-rag-FLAIR) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/better-rag-FLAIR/main.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://medium.com/@aksdesai1998/better-rag-enhancing-ai-with-active-retrieval-augmented-generation-flare-3b66646e2a9f) |
| [Better RAG with FLARE](./tutorials/better-rag-FLAIR) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/tutorials/better-rag-FLAIR/main.ipynb) [![local LLM](https://img.shields.io/badge/local-llm-green)](#) [![LLM](https://img.shields.io/badge/openai-api-white)](#) [![advanced](https://img.shields.io/badge/advanced-FF3333)](#)|[![Ghost](https://img.shields.io/badge/ghost-000?style=for-the-badge&logo=ghost&logoColor=%23F7DF1E)](https://blog.lancedb.com/better-rag-with-active-retrieval-augmented-generation-flare-3b66646e2a9f/) |



27 changes: 15 additions & 12 deletions applications/Healthcare_chatbot/main.py
@@ -28,26 +28,24 @@

app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
allow_methods=["*"],
allow_headers=["*"],
)


# Load the document
DATA_PATH = "data/"


loader = DirectoryLoader(DATA_PATH,
glob='*.pdf',
loader_cls=PyPDFLoader)
loader = DirectoryLoader(DATA_PATH, glob="*.pdf", loader_cls=PyPDFLoader)

docs = loader.load()
logging.info("Document loader done.")

# Set up the text processing and model chain
#llm = ChatOpenAI(model="gpt-4", temperature=0, openai_api_key=OPENAI_API_KEY)
# llm = ChatOpenAI(model="gpt-4", temperature=0, openai_api_key=OPENAI_API_KEY)

# download weights from https://huggingface.co/PrunaAI/OpenBioLLM-Llama3-8B-GGUF-smashed/tree/main
llm = LlamaCpp(
@@ -57,7 +55,9 @@
verbose=False, # Verbose is required to pass to the callback manager
)

embeddings_med = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")
embeddings_med = SentenceTransformerEmbeddings(
model_name="NeuML/pubmedbert-base-embeddings"
)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

logging.info("Embedding and LLM setup done.")
@@ -68,7 +68,9 @@
logging.info("Retriever setup done.")

compressor = CohereRerank(cohere_api_key=COHERE_API_KEY)
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
logging.info("Cohere compression retriever setup done.")

chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
@@ -80,17 +82,18 @@
class QueryRequest(BaseModel):
query: str


@app.post("/query/", response_model=dict)
async def handle_query(request: QueryRequest):
try:
compressed_docs = compression_retriever.invoke(request.query)
# Assuming pretty_print_docs function returns a string
response = chain({"query": request.query})
print("response",response['result'])
return {"answer": response['result']}
print("response", response["result"])
return {"answer": response["result"]}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))


if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
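
For reference, once this service is running (the script launches uvicorn on 0.0.0.0:8000), the `/query/` endpoint can be exercised with a minimal client along the lines of the sketch below. The host, port, and sample question are assumptions; the request field and the `answer` response key come from the handler above.

```python
# Minimal client sketch for the /query/ endpoint defined in main.py.
# Assumes the API is reachable at localhost:8000; the question is illustrative.
import requests

resp = requests.post(
    "http://localhost:8000/query/",
    json={"query": "What are common side effects of ibuprofen?"},
    timeout=300,  # local LlamaCpp generation can be slow
)
resp.raise_for_status()
print(resp.json()["answer"])
```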

Binary file added assets/critique-based-contexting.png
2 changes: 1 addition & 1 deletion examples/databricks_DBRX_website_bot/main.py
@@ -15,7 +15,7 @@ def get_doc_from_url(url):
def build_RAG(
url="https://harrypotter.fandom.com/wiki/Hogwarts_School_of_Witchcraft_and_Wizardry",
embed_model="mixedbread-ai/mxbai-embed-large-v1",
uri="~/tmp/lancedb_hogwarts_12",
uri="~/tmp/lancedb_hogwart",
force_create_embeddings=False,
):
Settings.embed_model = HuggingFaceEmbedding(model_name=embed_model)
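The only change here is the default LanceDB URI in `build_RAG`. For orientation, a hypothetical invocation might look like the sketch below; only the keyword defaults visible in this hunk are known, so the chosen argument values are assumptions.

```python
# Hypothetical call; rebuilds the index at the new default location from this commit.
build_RAG(
    url="https://harrypotter.fandom.com/wiki/Hogwarts_School_of_Witchcraft_and_Wizardry",
    embed_model="mixedbread-ai/mxbai-embed-large-v1",
    uri="~/tmp/lancedb_hogwart",       # new default path introduced by this change
    force_create_embeddings=True,      # force a rebuild at the new path (assumption)
)
```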
2 changes: 1 addition & 1 deletion examples/reducing_hallucinations_ai_agents/README.md
@@ -4,7 +4,7 @@ AI agents can help simplify and automate tedious workflows. By going through thi

Colab walkthrough - <a href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/reducing_hallucinations_ai_agents/main.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a>

![Untitled (34)](https://github.com/lancedb/vectordb-recipes/assets/15766192/e87d5fcc-6f04-4592-b9ec-0156ee2c98df)
![alt text](../../assets/critique-based-contexting.png)


### Setup
47 changes: 29 additions & 18 deletions tutorials/cohere-reranker/main.py
@@ -23,16 +23,16 @@ def evaluate(
query_type="auto",
verbose=False,
):
#corpus = dataset['corpus']
#queries = dataset['queries']
#relevant_docs = dataset['relevant_docs']
# corpus = dataset['corpus']
# queries = dataset['queries']
# relevant_docs = dataset['relevant_docs']

vector_store = LanceDBVectorStore(uri=f"/tmp/lancedb_cohere-bench-{time.time()}")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model)
index = VectorStoreIndex.from_documents(
docs,
service_context=service_context,
service_context=service_context,
show_progress=True,
storage_context=storage_context,
)
@@ -42,37 +42,48 @@ def evaluate(
eval_results = []
ds = dataset.to_pandas()
for idx in tqdm(range(len(ds))):
query = ds['query'][idx]
reference_context = ds['reference_contexts'][idx]
query = ds["query"][idx]
reference_context = ds["reference_contexts"][idx]
query_vector = embed_model.get_query_embedding(query)
try:
if reranker is None:
rs = tbl.search(query_vector).limit(top_k).to_pandas()
elif query_type == "auto":
rs = tbl.search((query_vector, query)).rerank(reranker=reranker).limit(top_k).to_pandas()
rs = (
tbl.search((query_vector, query))
.rerank(reranker=reranker)
.limit(top_k)
.to_pandas()
)
elif query_type == "vector":
rs = tbl.search(query_vector).rerank(reranker=reranker, query_string=query).limit(top_k*2).to_pandas() # Overfetch for vector only reranking
rs = (
tbl.search(query_vector)
.rerank(reranker=reranker, query_string=query)
.limit(top_k * 2)
.to_pandas()
) # Overfetch for vector only reranking
except Exception as e:
print(f'Error with query: {idx} {e}')
print(f"Error with query: {idx} {e}")
continue
retrieved_texts = rs['text'].tolist()[:top_k]
retrieved_texts = rs["text"].tolist()[:top_k]
expected_text = reference_context[0]
is_hit = expected_text in retrieved_texts # assume 1 relevant doc
eval_result = {
'is_hit': is_hit,
'retrieved': retrieved_texts,
'expected': expected_text,
'query': query,
"is_hit": is_hit,
"retrieved": retrieved_texts,
"expected": expected_text,
"query": query,
}
eval_results.append(eval_result)
return eval_results


rag_dataset = LabelledRagDataset.from_json("./data/rag_dataset.json")
documents = SimpleDirectoryReader(input_dir="./data/source_files").load_data()

embed_models = {
"bge": HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5"),
"colbert": HuggingFaceEmbedding(model_name="colbert-ir/colbertv2.0")
"bge": HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5"),
"colbert": HuggingFaceEmbedding(model_name="colbert-ir/colbertv2.0"),
}
rerankers = {
"None": None,
@@ -93,7 +104,7 @@
verbose=True,
)
print(f" Embedder {embed_name} Reranker: {reranker_name}")
score = pd.DataFrame(eval_results)['is_hit'].mean()
score = pd.DataFrame(eval_results)["is_hit"].mean()
print(score)
scores[reranker_name] = score

@@ -108,6 +119,6 @@
verbose=True,
)
print(f"Embedder {embed_name} Reranker: {reranker_name} (vector)")
score = pd.DataFrame(eval_results)['is_hit'].mean()
score = pd.DataFrame(eval_results)["is_hit"].mean()
print(score)
scores[f"{reranker_name}_vector"] = score
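
Outside the benchmark loop, the reranked-search pattern that this file reformats can be sketched on its own. The snippet below mirrors the "auto" (hybrid) branch of `evaluate()`; the database path, table name, and query text are illustrative assumptions, and the exact import path for `HuggingFaceEmbedding` depends on the installed llama-index version.

```python
# Standalone sketch of the hybrid-search-plus-Cohere-rerank pattern used in evaluate().
import lancedb
from lancedb.rerankers import CohereReranker
from llama_index.embeddings.huggingface import HuggingFaceEmbedding  # import path varies by version

db = lancedb.connect("/tmp/lancedb_cohere-bench-demo")   # illustrative path
tbl = db.open_table("vectors")                           # assumes the table written by LanceDBVectorStore
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")
reranker = CohereReranker()                              # expects COHERE_API_KEY in the environment

query = "What does this benchmark measure?"              # illustrative query
query_vector = embed_model.get_query_embedding(query)

# "auto" branch: search with both the vector and the raw query string, then rerank.
hits = (
    tbl.search((query_vector, query))
    .rerank(reranker=reranker)
    .limit(5)
    .to_pandas()
)
print(hits["text"].tolist())
```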
