add more content

jerpint · Oct 27, 2023 · 09f77b3
1 parent 809787a
commit 09f77b3
Showing 3 changed files with 128 additions and 7 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -22,6 +22,13 @@ We scraped the documentation of `huggingface 🤗 Transformers <https://huggingf
    :maxdepth: 2
 
    usage/installation
+
+
+.. toctree::
+   :caption: Customization
+   :maxdepth: 1
+
+   usage/components
    usage/configuration
    usage/custom_docs
 
@@ -32,11 +39,6 @@ We scraped the documentation of `huggingface 🤗 Transformers <https://huggingf
 
    usage/components_overview
 
-.. toctree::
-   :maxdepth: 2
-   :caption: API Reference
-
-   autoapi/index
 
 Useful links
 ============

diff --git a/docs/usage/components.md b/docs/usage/components.md
@@ -0,0 +1,19 @@
+# Buster Components
+
+Buster is built around components that can be customized and extended.
+
+For example, to do chat completion, we must use a `Completer` component.
+While we've implemented some completers like `ChatGPT`, adding more completers is possible by inheriting from the `Completer` base class.
+
+Currently, Buster implements the following components:
+
+* `Completer`: The language model responsible for generating a response
+* `Retriever`: Responsible for fetching the documents associated with a user's input
+* `DocumentsFormatter`: Responsible for formatting the retrieved documents. We support formatting documents into JSON-like and HTML-like objects.
+* `PromptFormatter`: Responsible for combining the formatted documents with the prompts for the LLM
+* `Validator`: Responsible for validating user inputs and/or model outputs. This can be implemented via checks on the question and the answer, before and after completion.
+* `Tokenizer`: Used to monitor the length of prompts and completions. The `Tokenizer` is generally assumed to match the one used by the `Completer`.
+
+
+Additional components are also available for managing documents:
+* `DocumentManager`: Generates and stores embeddings (should be used in conjunction with `Retriever` components)
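As a rough illustration of the extension pattern described in `components.md` (adding a completer by inheriting from a base class), the sketch below subclasses a stand-in base class. The `Completer` class defined here, its `complete` method, and `EchoCompleter` are hypothetical placeholders, not Buster's actual interface; check the real base class in `buster.completers` for the methods it expects before adapting this.

```python
from abc import ABC, abstractmethod


class Completer(ABC):
    """Hypothetical stand-in for a completer base class (not Buster's real API)."""

    def __init__(self, completion_kwargs: dict):
        # Keyword arguments that a real completer would forward to its model.
        self.completion_kwargs = completion_kwargs

    @abstractmethod
    def complete(self, prompt: str, user_input: str) -> str:
        """Return the model's answer for a prompt and a user question."""


class EchoCompleter(Completer):
    """Toy completer that echoes the question instead of calling a language model."""

    def complete(self, prompt: str, user_input: str) -> str:
        model = self.completion_kwargs.get("model", "unknown-model")
        return f"[{model}] {user_input}"


completer = EchoCompleter(completion_kwargs={"model": "echo", "temperature": 0})
answer = completer.complete(prompt="Answer briefly.", user_input="What is backpropagation?")
print(answer)
```

Because the base class is abstract, a subclass that forgets to implement `complete` fails at instantiation rather than at the first request.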
diff --git a/docs/usage/configuration.md b/docs/usage/configuration.md
@@ -1,3 +1,103 @@
-# Configuration
+# Configuration of Components
 
-Buster uses a config file to setup most of the app.
+Buster's internal configuration is controlled via the `BusterConfig` object.
+It sets the parameters for all of the components in one place.
+
+Here is a typical setup:
+
+```python
+from buster.busterbot import BusterConfig
+
+buster_cfg = BusterConfig(
+    retriever_cfg={
+        "path": "deeplake_store",
+        "top_k": 3,
+        "thresh": 0.7,
+        "max_tokens": 2000,
+        "embedding_model": "text-embedding-ada-002",
+    },
+    validator_cfg={
+        "unknown_response_templates": [
+            "I'm sorry, but I am an AI language model trained to assist with questions related to AI. I cannot answer that question as it is not relevant to the library or its usage. Is there anything else I can assist you with?",
+        ],
+        "unknown_threshold": 0.85,
+        "embedding_model": "text-embedding-ada-002",
+        "use_reranking": True,
+        "invalid_question_response": "This question does not seem relevant to my current knowledge.",
+        "check_question_prompt": """You are a chatbot answering questions on artificial intelligence.
+
+A user will submit a question. Respond 'true' if it is valid, respond 'false' if it is invalid.""",
+        "completion_kwargs": {
+            "model": "gpt-3.5-turbo",
+            "stream": False,
+            "temperature": 0,
+        },
+    },
+    documents_answerer_cfg={
+        "no_documents_message": "No documents are available for this question.",
+    },
+    completion_cfg={
+        "completion_kwargs": {
+            "model": "gpt-3.5-turbo",
+            "stream": False,
+            "temperature": 0,
+        },
+    },
+    tokenizer_cfg={
+        "model_name": "gpt-3.5-turbo",
+    },
+    documents_formatter_cfg={
+        "max_tokens": 3500,
+        "columns": ["content", "title", "source"],
+    },
+    prompt_formatter_cfg={
+        "max_tokens": 3500,
+        "text_before_docs": (
+            "You are a chatbot assistant answering technical questions about artificial intelligence (AI). "
+            "You can only respond to a question if the content necessary to answer the question is contained in the following provided documentation. "
+            "If the answer is in the documentation, summarize it in a helpful way to the user. "
+        ),
+        "text_after_docs": (
+            "REMEMBER:\n"
+            "You are a chatbot assistant answering technical questions about artificial intelligence (AI). "
+            "Here are the rules you must follow:\n"
+            "1) You must only respond with information contained in the documentation above. Say you do not know if the information is not provided.\n"
+            "2) Make sure to format your answers in Markdown format, including code block and snippets.\n"
+            "Now answer the following question:\n"
+        ),
+    },
+)
+```
+
+This `BusterConfig` can then be passed to initialize Buster and all of its components:
+
+```python
+from buster.busterbot import Buster, BusterConfig
+from buster.completers import ChatGPTCompleter, DocumentAnswerer
+from buster.formatters.documents import DocumentsFormatterJSON
+from buster.formatters.prompts import PromptFormatter
+from buster.retriever import DeepLakeRetriever, Retriever
+from buster.tokenizers import GPTTokenizer
+from buster.validators import QuestionAnswerValidator, Validator
+
+def setup_buster(buster_cfg: BusterConfig):
+    """Initialize Buster from a `BusterConfig` instance."""
+    retriever: Retriever = DeepLakeRetriever(**buster_cfg.retriever_cfg)
+    tokenizer = GPTTokenizer(**buster_cfg.tokenizer_cfg)
+    document_answerer: DocumentAnswerer = DocumentAnswerer(
+        completer=ChatGPTCompleter(**buster_cfg.completion_cfg),
+        documents_formatter=DocumentsFormatterJSON(tokenizer=tokenizer, **buster_cfg.documents_formatter_cfg),
+        prompt_formatter=PromptFormatter(tokenizer=tokenizer, **buster_cfg.prompt_formatter_cfg),
+        **buster_cfg.documents_answerer_cfg,
+    )
+    validator: Validator = QuestionAnswerValidator(**buster_cfg.validator_cfg)
+    buster: Buster = Buster(retriever=retriever, document_answerer=document_answerer, validator=validator)
+    return buster
+
+buster = setup_buster(buster_cfg)
+
+completion = buster.process_input("What is backpropagation?")
+print(completion)
+```
+
+Buster uses a config file to set up most of the app.
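
The setup in `configuration.md` relies on a simple pattern: each component's settings live in their own dictionary, which is unpacked into that component's constructor with `**`. The sketch below shows the pattern in isolation. `MiniConfig`, `MiniRetriever`, and `MiniTokenizer` are hypothetical names used only for illustration and are not part of Buster.

```python
from dataclasses import dataclass, field


@dataclass
class MiniConfig:
    """Hypothetical config holder: one dict of keyword arguments per component."""

    retriever_cfg: dict = field(default_factory=dict)
    tokenizer_cfg: dict = field(default_factory=dict)


class MiniRetriever:
    """Toy component whose constructor defines which config keys are valid."""

    def __init__(self, path: str, top_k: int = 3, thresh: float = 0.7):
        self.path = path
        self.top_k = top_k
        self.thresh = thresh


class MiniTokenizer:
    """Toy component with a single required setting."""

    def __init__(self, model_name: str):
        self.model_name = model_name


cfg = MiniConfig(
    retriever_cfg={"path": "deeplake_store", "top_k": 5},
    tokenizer_cfg={"model_name": "gpt-3.5-turbo"},
)

# Each dict is unpacked into the matching constructor; omitted keys use the defaults.
retriever = MiniRetriever(**cfg.retriever_cfg)
tokenizer = MiniTokenizer(**cfg.tokenizer_cfg)
print(retriever.top_k, retriever.thresh, tokenizer.model_name)
```

With this pattern, a misspelled or unsupported key raises a `TypeError` when the component is constructed, so configuration mistakes surface at startup rather than during a query.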