
Modify context before LLM #1034

Open
tpy37 opened this issue Nov 4, 2024 · 6 comments

Comments

@tpy37

tpy37 commented Nov 4, 2024

Thank you for the great package. I am looking for a way to modify the context before it is sent to the LLM in the MultimodalAgent class. I believe this exists in VoicePipelineAgent, and I am wondering how I could implement it with the OpenAI Realtime API.

@tpy37
Author

tpy37 commented Nov 5, 2024

I would be happy if you could implement the RAG part of the MultimodalAgent, as in the documentation! :)
`before_llm_cb=_enrich_with_rag`

[screenshot: VoicePipelineAgent documentation, "modify context before LLM" example]
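The docs pattern is roughly this (a sketch; `my_rag_lookup` stands in for the actual retrieval step):

```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent


async def my_rag_lookup(query: str) -> str:
    # Placeholder for a real retrieval backend (vector DB, search API, ...).
    return "...passages relevant to: " + query


async def _enrich_with_rag(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    # Take the latest user message, retrieve context related to it, and
    # append it to the chat context before the LLM is called.
    user_msg = chat_ctx.messages[-1]
    rag_content = await my_rag_lookup(user_msg.content)
    chat_ctx.messages.append(
        llm.ChatMessage.create(text="Context:\n" + rag_content, role="assistant")
    )
```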

@davidzhao
Member

It's a bit difficult with MultimodalAgent, because it goes directly from voice input to voice output.

The way to handle RAG is with function calling. If you define a function for the LLM to look up information with the user's query, it should be straightforward to pick up the function call and return the RAG results that way.
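For illustration, a rough sketch of that approach using `llm.FunctionContext` (the retrieval step here is a placeholder, not a real API):

```python
from typing import Annotated

from livekit.agents import llm


async def my_retrieval_backend(query: str) -> str:
    # Placeholder for an actual RAG lookup (vector DB, search API, ...).
    return "...retrieved passages for: " + query


class RagFunctions(llm.FunctionContext):
    @llm.ai_callable(description="Look up information related to the user's question")
    async def lookup_info(
        self,
        query: Annotated[str, llm.TypeInfo(description="the user's question")],
    ) -> str:
        # Whatever this returns is sent back to the model as the function
        # result, and the model speaks an answer grounded in it.
        return await my_retrieval_backend(query)
```

It would then be passed to the agent, e.g. `MultimodalAgent(model=model, fnc_ctx=RagFunctions())`.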

@tpy37
Author

tpy37 commented Nov 5, 2024

Thank you very much David!
I see... I was trying function calling, but changing the tool setting to "required" seemed to completely halt the process and break the conversation, so I stopped using it.

Since we have the transcribed text from the user's audio in `openai.realtime.RealtimeResponse`, I was thinking that we could analyze the transcribed text, use it to do RAG, and then send the results asynchronously to the OpenAI API as text? For example:
```js
const event = {
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{ type: 'input_text', text: 'Hello!' }],
  },
};
ws.send(JSON.stringify(event));
ws.send(JSON.stringify({ type: 'response.create' }));
```
https://platform.openai.com/docs/guides/realtime?text-generation-quickstart-example=text
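Roughly the same idea in Python (only the two event types above come from the OpenAI docs; `run_rag` is a placeholder for the retrieval step):

```python
import json


async def run_rag(transcript: str) -> str:
    # Placeholder: look up documents related to the user's transcript.
    return "Relevant context: ..."


async def inject_rag_results(ws, transcript: str) -> None:
    # `ws` is an already-open WebSocket connection to the Realtime API.
    rag_text = await run_rag(transcript)

    # Add the retrieved text to the conversation as a text item...
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": rag_text}],
        },
    }))
    # ...then ask the model to respond with that context available.
    await ws.send(json.dumps({"type": "response.create"}))
```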

Just some thoughts...

@prashantmetadome

@tpy37 can you guide me on how to change the context in VoicePipelineAgent?

@tpy37
Author

tpy37 commented Nov 6, 2024

I am sorry, I haven't done it myself in VoicePipelineAgent, but the example is available at https://docs.livekit.io/agents/voice-agent/voice-pipeline/#modify-context-before-llm

I think there was also an example of hooking it up to RAG using this before_llm_cb in one of the GitHub repositories:
`examples/voice-pipeline-agent/simple-rag/assistant.py`
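The callback is wired in via the agent constructor, roughly like this (a sketch assuming the usual plugin setup from the examples, with `_enrich_with_rag` defined as in the docs snippet earlier in this thread):

```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=openai.TTS(),
    chat_ctx=llm.ChatContext(),
    before_llm_cb=_enrich_with_rag,  # runs each turn, right before the LLM call
)
```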

Hope it helps!

@prashantmetadome

prashantmetadome commented Nov 6, 2024

@tpy37 thank you very much, it does help. But I am still unsure... my use case is that I want to manipulate the prompt based on a tool call.

To go deeper into the specific requirement: the conversation has outgrown the current prompt and gone into different territory that needs to be handled by a different prompt.

I do not need to manipulate the prompt inside the tool call itself; if I could just extract some metadata from the tool call and access it in the callback function, that should solve the problem, but I am not sure how to do that.
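Roughly what I have in mind, as a sketch (all names are hypothetical; the idea is that the tool and `before_llm_cb` share state through the function-context object):

```python
from typing import Annotated, Optional

from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent

# Hypothetical prompts for the different conversation territories.
PROMPTS = {
    "billing": "You are a billing support agent...",
    "technical": "You are a technical support agent...",
}


class TopicFunctions(llm.FunctionContext):
    def __init__(self) -> None:
        super().__init__()
        self.detected_topic: Optional[str] = None  # metadata written by the tool

    @llm.ai_callable(description="Call this when the conversation moves to a new topic")
    async def switch_topic(
        self,
        topic: Annotated[str, llm.TypeInfo(description="the new topic")],
    ) -> str:
        self.detected_topic = topic  # stash metadata for the callback to read
        return f"OK, switching to {topic}"


fnc_ctx = TopicFunctions()


async def _swap_prompt(agent: VoicePipelineAgent, chat_ctx: llm.ChatContext):
    # If the tool recorded a topic change, replace the system prompt
    # (assuming the system message is the first entry in the context).
    if fnc_ctx.detected_topic in PROMPTS:
        chat_ctx.messages[0] = llm.ChatMessage.create(
            text=PROMPTS[fnc_ctx.detected_topic], role="system"
        )
```

Both would then be passed to the agent, e.g. `VoicePipelineAgent(..., fnc_ctx=fnc_ctx, before_llm_cb=_swap_prompt)`. Is something like this workable?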
@davidzhao any help would be appreciated.
