Commit

deploy: 5fc75c6
github-actions[bot] committed Jul 2, 2024
1 parent cebb4e6 commit 8d3a9e1
Showing 19 changed files with 156 additions and 63 deletions.
Binary file modified .doctrees/apis/core/core.string_parser.doctree
Binary file not shown.
Binary file modified .doctrees/apis/eval/eval.llm_as_judge.doctree
Binary file not shown.
Binary file modified .doctrees/developer_notes/evaluation.doctree
Binary file not shown.
Binary file modified .doctrees/developer_notes/index.doctree
Binary file not shown.
Binary file modified .doctrees/developer_notes/output_parsers.doctree
Binary file not shown.
Binary file modified .doctrees/environment.pickle
Binary file not shown.
4 changes: 2 additions & 2 deletions _modules/core/functional.html
@@ -1437,6 +1437,7 @@ <h1>Source code for core.functional</h1><div class="highlight"><pre>
<span class="sd"> Parse a YAML string to a Python object.</span>
<span class="sd"> yaml_str: has to be a valid YAML string.</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">yaml_str</span> <span class="o">=</span> <span class="n">yaml_str</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="kn">import</span> <span class="nn">yaml</span>

@@ -1454,8 +1455,7 @@ <h1>Source code for core.functional</h1><div class="highlight"><pre>
<div class="viewcode-block" id="parse_json_str_to_obj">
<a class="viewcode-back" href="../../apis/core/core.functional.html#core.functional.parse_json_str_to_obj">[docs]</a>
<span class="k">def</span> <span class="nf">parse_json_str_to_obj</span><span class="p">(</span><span class="n">json_str</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]:</span>
<span class="w"> </span><span class="sa">r</span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Parse a JSON string to a Python object.</span>
<span class="w"> </span><span class="sa">r</span><span class="sd">&quot;&quot;&quot;Parse a JSON string to a Python object.</span>
<span class="sd"> json_str: has to be a valid JSON string. Either {} or [].</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">json_str</span> <span class="o">=</span> <span class="n">json_str</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
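For readers following along, here is a minimal sketch of how the two parsing helpers touched above might be called. The function names and their contracts (a valid YAML string; a valid JSON string, either {} or []) come from the docstrings in this diff; the import path ``lightrag.core.functional`` and the sample inputs are assumptions for illustration only.

.. code-block:: python

    # Minimal sketch; the import path is an assumption, not confirmed by this diff.
    from lightrag.core import functional as F

    # parse_yaml_str_to_obj: the input has to be a valid YAML string
    # (the change above strips surrounding whitespace first).
    yaml_obj = F.parse_yaml_str_to_obj("  name: LightRAG\nversion: 0.1\n")
    print(yaml_obj)  # e.g. {'name': 'LightRAG', 'version': 0.1}

    # parse_json_str_to_obj: the input has to be a valid JSON string, either {} or [].
    json_obj = F.parse_json_str_to_obj('  {"name": "LightRAG", "version": "0.1"}  ')
    print(json_obj)  # e.g. {'name': 'LightRAG', 'version': '0.1'}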
6 changes: 3 additions & 3 deletions _modules/core/string_parser.html
@@ -454,7 +454,7 @@ <h1>Source code for core.string_parser</h1><div class="highlight"><pre>



<span class="n">JASON_PARSER_OUTPUT_TYPE</span> <span class="o">=</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]</span>
<span class="n">JSON_PARSER_OUTPUT_TYPE</span> <span class="o">=</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]</span>


<div class="viewcode-block" id="JsonParser">
@@ -478,7 +478,7 @@ <h1>Source code for core.string_parser</h1><div class="highlight"><pre>

<div class="viewcode-block" id="JsonParser.call">
<a class="viewcode-back" href="../../apis/core/core.string_parser.html#core.string_parser.JsonParser.call">[docs]</a>
<span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">JASON_PARSER_OUTPUT_TYPE</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">input</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">JSON_PARSER_OUTPUT_TYPE</span><span class="p">:</span>
<span class="nb">input</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">json_str</span> <span class="o">=</span> <span class="n">F</span><span class="o">.</span><span class="n">extract_json_str</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_missing_right_brace</span><span class="p">)</span>
@@ -491,7 +491,7 @@ <h1>Source code for core.string_parser</h1><div class="highlight"><pre>



<span class="n">YAML_PARSER_OUTPUT_TYPE</span> <span class="o">=</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]</span>
<span class="n">YAML_PARSER_OUTPUT_TYPE</span> <span class="o">=</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">object</span><span class="p">]</span>


<div class="viewcode-block" id="YamlParser">
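The renamed output types above are used by the ``JsonParser`` and ``YamlParser`` components. Below is a hedged usage sketch: the class and its ``call`` method appear in this diff, but the import path and the assumption that the component can be invoked directly (routing to ``call``) are mine.

.. code-block:: python

    # Hedged sketch; import path and direct-call behaviour are assumptions.
    from lightrag.core.string_parser import JsonParser

    parser = JsonParser()
    # call() first extracts the JSON substring from the surrounding text
    # (F.extract_json_str, as shown in the diff), then parses it into a dict.
    text = 'Final answer: {"name": "LightRAG", "confidence": 0.9}'
    data = parser(text)
    print(data)  # e.g. {'name': 'LightRAG', 'confidence': 0.9}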
22 changes: 12 additions & 10 deletions _modules/eval/llm_as_judge.html
@@ -453,7 +453,7 @@ <h1>Source code for eval.llm_as_judge</h1><div class="highlight"><pre>
<span class="k">class</span> <span class="nc">DefaultLLMJudge</span><span class="p">(</span><span class="n">Component</span><span class="p">):</span>
<span class="vm">__doc__</span> <span class="o">=</span> <span class="sa">r</span><span class="s2">&quot;&quot;&quot;Demonstrate how to use an LLM/Generator to output True or False for a judgement query.</span>

<span class="s2"> You can use any any of your template to adapt to more tasks and sometimes you can directly ask LLM to output a score in range [0, 1] instead of only True or False.</span>
<span class="s2"> You can use any of your template to adapt to more tasks and sometimes you can directly ask LLM to output a score in range [0, 1] instead of only True or False.</span>

<span class="s2"> A call on the LLM judge equalize to _compute_single_item method.</span>

@@ -495,8 +495,8 @@ <h1>Source code for eval.llm_as_judge</h1><div class="highlight"><pre>

<span class="sd"> Args:</span>
<span class="sd"> question (str): Question string.</span>
<span class="sd"> pred_answer (str): Predicted answer string.</span>
<span class="sd"> gt_answer (str): Ground truth answer string.</span>
<span class="sd"> pred_answer (str): Predicted answer string.</span>
<span class="sd"> judgement_query (str): Judgement query string.</span>

<span class="sd"> Returns:</span>
@@ -543,7 +543,7 @@ <h1>Source code for eval.llm_as_judge</h1><div class="highlight"><pre>
<span class="sd"> &gt;&gt;&gt; judgement_query = &quot;For the question, does the predicted answer contain the ground truth answer?&quot;</span>
<span class="sd"> &gt;&gt;&gt; llm_judge = LLMasJudge()</span>
<span class="sd"> &gt;&gt;&gt; avg_judgement, judgement_list = llm_judge.compute(</span>
<span class="sd"> questions, pred_answers, gt_answers, judgement_query</span>
<span class="sd"> questions, gt_answers, pred_answers, judgement_query</span>
<span class="sd"> )</span>
<span class="sd"> &gt;&gt;&gt; avg_judgement</span>
<span class="sd"> 2 / 3</span>
@@ -562,28 +562,30 @@ <h1>Source code for eval.llm_as_judge</h1><div class="highlight"><pre>
<span class="k">def</span> <span class="nf">compute</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
<span class="n">questions</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">],</span>
<span class="n">pred_answers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">],</span>
<span class="n">gt_answers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">],</span>
<span class="n">pred_answers</span><span class="p">:</span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">],</span>
<span class="n">judgement_query</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span>
<span class="p">)</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">bool</span><span class="p">]:</span>
<span class="w"> </span><span class="sa">r</span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd"> Get the judgement of the predicted answer for a list of questions.</span>

<span class="sd"> Args:</span>
<span class="sd"> questions (List[str]): List of question strings.</span>
<span class="sd"> pred_answers (List[str]): List of predicted answer strings.</span>
<span class="sd"> gt_answers (List[str]): List of ground truth answer strings.</span>
<span class="sd"> pred_answers (List[str]): List of predicted answer strings.</span>
<span class="sd"> judgement_query (str): Judgement query string.</span>

<span class="sd"> Returns:</span>
<span class="sd"> List[bool]: Judgement results.</span>
<span class="sd"> tuple:</span>
<span class="sd"> - float: Average judgement score.</span>
<span class="sd"> - List[bool]: Judgement results for each query.</span>
<span class="sd"> &quot;&quot;&quot;</span>
<span class="n">judgement_list</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">question</span><span class="p">,</span> <span class="n">pred_answer</span><span class="p">,</span> <span class="n">gt_answer</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span>
<span class="n">questions</span><span class="p">,</span> <span class="n">pred_answers</span><span class="p">,</span> <span class="n">gt_answers</span>
<span class="k">for</span> <span class="n">question</span><span class="p">,</span> <span class="n">gt_answer</span><span class="p">,</span> <span class="n">pred_answer</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span>
<span class="n">questions</span><span class="p">,</span> <span class="n">gt_answers</span><span class="p">,</span> <span class="n">pred_answers</span>
<span class="p">):</span>
<span class="n">judgement</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">llm_evaluator</span><span class="p">(</span>
<span class="n">question</span><span class="p">,</span> <span class="n">pred_answer</span><span class="p">,</span> <span class="n">gt_answer</span><span class="p">,</span> <span class="n">judgement_query</span>
<span class="n">question</span><span class="p">,</span> <span class="n">gt_answer</span><span class="p">,</span> <span class="n">pred_answer</span><span class="p">,</span> <span class="n">judgement_query</span>
<span class="p">)</span>
<span class="n">judgement_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">judgement</span><span class="p">)</span>

@@ -606,7 +608,7 @@ <h1>Source code for eval.llm_as_judge</h1><div class="highlight"><pre>
<span class="p">)</span>
<span class="n">llm_judge</span> <span class="o">=</span> <span class="n">LLMasJudge</span><span class="p">()</span>
<span class="n">avg_judgement</span><span class="p">,</span> <span class="n">judgement_list</span> <span class="o">=</span> <span class="n">llm_judge</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
<span class="n">questions</span><span class="p">,</span> <span class="n">pred_answers</span><span class="p">,</span> <span class="n">gt_answers</span><span class="p">,</span> <span class="n">judgement_query</span>
<span class="n">questions</span><span class="p">,</span> <span class="n">gt_answers</span><span class="p">,</span> <span class="n">pred_answers</span><span class="p">,</span> <span class="n">judgement_query</span>
<span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">avg_judgement</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">judgement_list</span><span class="p">)</span>
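The change above swaps the positions of ``gt_answers`` and ``pred_answers`` in ``LLMasJudge.compute`` (and in the underlying loop and ``llm_evaluator`` call). Here is a hedged sketch of the new call order with illustrative data; the import path is an assumption, and running it requires a configured model client, so the printed values are only indicative.

.. code-block:: python

    # Hedged sketch of the new argument order: gt_answers before pred_answers.
    # Import path is an assumption; a configured model client is required to run this.
    from lightrag.eval import LLMasJudge

    questions = ["Is Beijing in China?", "Is Apple founded before Google?", "Is the earth flat?"]
    gt_answers = ["Yes", "Yes", "No"]
    pred_answers = ["Yes", "Yes, Apple is founded before Google", "Yes"]
    judgement_query = (
        "For the question, does the predicted answer contain the ground truth answer?"
    )

    llm_judge = LLMasJudge()
    avg_judgement, judgement_list = llm_judge.compute(
        questions, gt_answers, pred_answers, judgement_query
    )
    print(avg_judgement)   # e.g. 2 / 3
    print(judgement_list)  # e.g. [True, True, False]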
46 changes: 27 additions & 19 deletions _sources/developer_notes/evaluation.rst.txt
@@ -73,31 +73,39 @@ If you are interested in computing metrics such as accuracy, F1-score, ROUGE, BE
If you are particularly interested in evaluating RAG (Retrieval-Augmented Generation) pipelines, we have several metrics available in LightRAG to assess both the quality of the retrieved context and the quality of the final generated answer.

- :class:`RetrieverEvaluator <eval.evaluators.RetrieverEvaluator>`: This evaluator is used to evaluate the performance of the retriever component of the RAG pipeline. It has metric functions to compute the recall and context relevance of the retriever.
- :class:`AnswerMacthEvaluator <eval.evaluators.AnswerMacthEvaluator>`: This evaluator is used to evaluate the performance of the generator component of the RAG pipeline. It has metric functions to compute the exact match and fuzzy match accuracy of the generated answer.
- :class:`LLMasJudge <eval.evaluators.LLMasJudge>`: This evaluator uses an LLM to get the judgement of the predicted answer for a list of questions. The task description and the judgement query of the LLM judge can be customized. It has a metric function to compute the judgement score, which is the number of generated answers that are judged as correct by the LLM divided by the total number of generated answers.
- :class:`RetrieverRecall <eval.retriever_recall>`: This is used to evaluate the recall of the retriever component of the RAG pipeline.
- :class:`RetrieverRelevance <eval.retriever_relevance>`: This is used to evaluate the relevance of the retrieved context to the query.
- :class:`AnswerMatchAcc <eval.answer_match_acc>`: This calculates the exact match accuracy or fuzzy match accuracy of the generated answers by comparing them to the ground truth answers.
- :class:`LLMasJudge <eval.llm_as_judge>`: This uses an LLM to get the judgement of the generated answer for a list of questions. The task description and the judgement query of the LLM judge can be customized. It computes the judgement score, which is the number of generated answers that are judged as correct by the LLM divided by the total number of generated answers.

For example, you can use the following code snippet to compute the recall and relevance of the retriever component of the RAG pipeline.

.. code-block:: python
:linenos:
from eval.evaluators import RetrieverEvaluator
retrieved_context = "Apple is founded before Google." # Retrieved context
gt_context = ["Apple is founded in 1976.",
"Google is founded in 1998.",
"Apple is founded before Google."] # Ground truth context
retriever_evaluator = RetrieverEvaluator() # Initialize the RetrieverEvaluator
recall = retriever_evaluator.compute_recall_single_query(
retrieved_context, gt_context
) # Compute the recall of the retriever
relevance = retriever_evaluator.compute_context_relevance_single_query(
retrieved_context, gt_context
) # Compute the relevance of the retriever
print(f"Recall: {recall}, Relevance: {relevance}")
# Recall: 0.3333333333333333, Relevance: 1.0
For a more detailed instructions on how to use these evaluators to evaluate RAG pipelines, you can refer to the tutorial on :doc:`Evaluating a RAG Pipeline <../tutorials/eval_a_rag>`, where we provide a step-by-step guide on how to use these evaluators to evaluate a RAG pipeline on HotpotQA dataset.
from lightrag.eval import RetrieverRecall, RetrieverRelevance
retrieved_contexts = [
"Apple is founded before Google.",
"Feburary has 28 days in common years. Feburary has 29 days in leap years. Feburary is the second month of the year.",
]
gt_contexts = [
[
"Apple is founded in 1976.",
"Google is founded in 1998.",
"Apple is founded before Google.",
],
["Feburary has 28 days in common years", "Feburary has 29 days in leap years"],
]
retriever_recall = RetrieverRecall()
avg_recall, recall_list = retriever_recall.compute(retrieved_contexts, gt_contexts) # Compute the recall of the retriever
print(f"Recall: {avg_recall}, Recall List: {recall_list}")
# Recall: 0.6666666666666666, Recall List: [0.3333333333333333, 1.0]
retriever_relevance = RetrieverRelevance()
avg_relevance, relevance_list = retriever_relevance.compute(retrieved_contexts, gt_contexts) # Compute the relevance of the retriever
print(f"Relevance: {avg_relevance}, Relevance List: {relevance_list}")
# Relevance: 0.803030303030303, Relevance List: [1.0, 0.6060606060606061]
For more detailed instructions on how to build and evaluate RAG pipelines, you can refer to the use case on :doc:`Evaluating a RAG Pipeline <../tutorials/eval_a_rag>`.

If you intend to use metrics that are not available in the LightRAG library, you can also implement your own custom metric functions or use other libraries such as `RAGAS <https://docs.ragas.io/en/stable/getstarted/index.html>`_ to compute the desired metrics for evaluating RAG pipelines.

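The bullet list above also mentions :class:`AnswerMatchAcc <eval.answer_match_acc>`, which is not covered by the snippet. Below is a hedged sketch of how it might be used, following the same compute pattern as the retriever metrics; the constructor keyword (``type``) and the argument order of ``compute`` are assumptions, not confirmed by this diff.

.. code-block:: python

    # Hedged sketch; the "type" keyword and compute() argument order are assumptions.
    from lightrag.eval import AnswerMatchAcc

    gt_answers = ["Apple is founded in 1976.", "Google is founded in 1998."]
    pred_answers = ["Apple is founded in 1976.", "Google was founded in 1997."]

    answer_match_acc = AnswerMatchAcc(type="exact_match")
    avg_acc, acc_list = answer_match_acc.compute(pred_answers, gt_answers)
    print(f"Accuracy: {avg_acc}, Accuracy List: {acc_list}")
    # e.g. Accuracy: 0.5, Accuracy List: [1.0, 0.0]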
3 changes: 2 additions & 1 deletion _sources/developer_notes/index.rst.txt
@@ -76,6 +76,7 @@ Code path: :ref:`lightrag.core <apis-core>`.

RAG Essentials
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

RAG components
^^^^^^^^^^^^^^^^^^^

@@ -98,14 +99,14 @@ Code path: :ref:`lightrag.core<apis-core>`. For abstract classes:
- ``ModelClient`` is the protocol and base class for LightRAG to **integrate all models**, either APIs or local, LLMs or Embedding models or any others.
* - :doc:`generator`
- The orchestrator for LLM prediction. It streamlines three components: `ModelClient`, `Prompt`, and `output_processors` and works with optimizer for prompt optimization.
- The **center component** that orchestrates the model client (LLMs in particular), prompt, and output processors for format parsing or any post-processing.
* - :doc:`output_parsers`
- The component that parses the output string to structured data.
* - :doc:`embedder`
- The component that orchestrates model client (Embedding models in particular) and output processors.
* - :doc:`retriever`
- The base class for all retrievers, which retrieve relevant documents from a given database to add **context** to the generator.


Data Pipeline and Storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

13 changes: 13 additions & 0 deletions _sources/developer_notes/output_parsers.rst.txt
@@ -1,2 +1,15 @@
Parser
=============
In this note, we will explain the LightRAG parser and output parsers.

Context
----------------

Parser
----------------
LLMs output text in string format.
A parser is a component that parses that string into the desired data structure for the use case.

Converting a string to structured data is similar to the deserialization step in a serialization-deserialization process.
We already have the powerful ``DataClass`` to handle serialization and deserialization for data class instances.
Parser builds on top of that.
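To make the deserialization analogy concrete, here is a hedged sketch that takes an LLM's string output, parses it with ``YamlParser`` (from the ``core.string_parser`` module diffed above), and loads it into a plain Python dataclass; the import path and the exact accepted input format are assumptions, and a plain dataclass is used purely for illustration rather than LightRAG's ``DataClass``.

.. code-block:: python

    # Hedged sketch; import path and accepted input format are assumptions.
    from dataclasses import dataclass
    from lightrag.core.string_parser import YamlParser

    @dataclass
    class Joke:
        setup: str
        punchline: str

    # A typical LLM response carrying YAML-formatted fields.
    llm_output = "setup: Why did the chicken cross the road?\npunchline: To get to the other side."

    parser = YamlParser()
    data = parser(llm_output)  # e.g. {'setup': '...', 'punchline': '...'}
    joke = Joke(**data)        # deserialize into a structured instance
    print(joke)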