feat: let LLM choose whether to retrieve context #62

lsorber · 2024-12-09T16:11:31Z

Changes:

Add support for tool use to the llama.cpp LiteLLM integration.
The rag or async_rag functions now update the message history with tool use and the assistant response.
Let LLM choose whether to retrieve context through tool use. Details:
- Tool use is skipped if RAG context is already provided.
- Llama.cpp models cannot stream the response with tool_choice="auto" (which lets the LLM choose whether or not to call a tool), so we add a workaround that forces tool use but lets them skip tool execution with a skip=True argument.
- To help llama.cpp models plan their tool use response, the final message is extended with the tools' JSON schema.
Fix registering of llama.cpp model_info with LiteLLM.
Improve RAG generation test cases.
Improve LLM-embedder test coverage.

src/raglite/_rag.py

feat: let LLM choose whether to retrieve context

319f522

lsorber requested a review from undo76 December 9, 2024 16:11

undo76 reviewed Dec 9, 2024

View reviewed changes

src/raglite/_rag.py Show resolved Hide resolved

lsorber added 10 commits December 9, 2024 18:09

test: improve config consistency

aa5cc80

feat: add tool use to Chainlit

93a75b9

test: increase LLM coverage

4728d28

fix: make tool use more robust

888cb43

fix: make RAG with SLMs more robust

f440ae4

fix: improve LiteLLM usage

4f5f038

fix: fix registering of llama.cpp model_info

9ae3d8d

feat: add an on_retrieval callback

a011621

docs: improve README section on dynamic routing

a446373

fix: update Chainlit integration to use the new callback

82def46

lsorber merged commit 574e407 into main Dec 15, 2024
2 checks passed

lsorber deleted the ls-tools branch December 15, 2024 10:42

Provide feedback