
Feature: add after_llm_cb to modify llm generated text #1044

Open
achapla opened this issue Nov 5, 2024 · 1 comment
Labels
question Further information is requested

Comments


achapla commented Nov 5, 2024

Question regarding handling special tokens in conversation transcription

First of all, thanks for making this wonderful SDK to easily create voice-enabled applications!

I'm currently building a quiz agent that asks questions to users. The user's response is evaluated by an LLM, and if it's correct, the agent congratulates the user and adds a special 'QUESTION_END' token in the response. This token is used to identify that the conversation related to the current question is finished. I then create a new chat context with the next question.

Issue

The issue I'm facing: although I strip the special 'QUESTION_END' token in the before_tts_cb function, so my agent never speaks it, the token still appears in the conversation text. The 'QUESTION_END' text is not removed from the LLM's response before transcription.
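For context, my before_tts_cb filter looks roughly like the sketch below. This is simplified (only the stream-filtering part is shown, not the full callback signature the SDK passes), and the buffering logic is my own workaround for tokens that get split across streaming chunk boundaries:

```python
import asyncio
from typing import AsyncIterable

QUESTION_END = "QUESTION_END"


async def strip_question_end(text: AsyncIterable[str]) -> AsyncIterable[str]:
    """Remove the control token, even when it arrives split across chunks."""
    buffer = ""
    async for chunk in text:
        buffer += chunk
        buffer = buffer.replace(QUESTION_END, "")
        # hold back any tail that could be the start of a split token
        keep = 0
        for i in range(len(QUESTION_END) - 1, 0, -1):
            if buffer.endswith(QUESTION_END[:i]):
                keep = i
                break
        emit, buffer = buffer[: len(buffer) - keep], buffer[len(buffer) - keep :]
        if emit:
            yield emit
    if buffer:  # leftover tail that never completed into the token
        yield buffer


async def _demo() -> str:
    async def chunks() -> AsyncIterable[str]:
        # token split across two streaming chunks
        for c in ("Correct! QUEST", "ION_END"):
            yield c

    return "".join([c async for c in strip_question_end(chunks())])


result = asyncio.run(_demo())
```

This works for the audio path, but as described below, it never touches the transcript.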

Upon further investigation, I discovered that the following code runs after the LLM call; it uses two different variables, tts_source and transcript_source, for different purposes:

def _synthesize_agent_speech(
    self,
    speech_id: str,
    source: str | LLMStream | AsyncIterable[str],
) -> SynthesisHandle:
    assert (
        self._agent_output is not None
    ), "agent output should be initialized when ready"
 
    if isinstance(source, LLMStream):
        source = _llm_stream_to_str_iterable(speech_id, source)
 
    og_source = source
    transcript_source = source
    if isinstance(og_source, AsyncIterable):
        og_source, transcript_source = utils.aio.itertools.tee(og_source, 2)
 
    tts_source = self._opts.before_tts_cb(self, og_source)
    if tts_source is None:
        raise ValueError("before_tts_cb must return str or AsyncIterable[str]")
 
    return self._agent_output.synthesize(
        speech_id=speech_id,
        tts_source=tts_source,
        transcript_source=transcript_source,
        transcription=self._opts.transcription.agent_transcription,
        transcription_speed=self._opts.transcription.agent_transcription_speed,
        sentence_tokenizer=self._opts.transcription.sentence_tokenizer,
        word_tokenizer=self._opts.transcription.word_tokenizer,
        hyphenate_word=self._opts.transcription.hyphenate_word,
    )

The tts_source does not contain the 'QUESTION_END' token, so it is never spoken, but the transcript_source still contains 'QUESTION_END', so it ends up in the transcription when committed.
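To illustrate the divergence: the two branches returned by tee see identical chunks, and before_tts_cb filters only the TTS branch. A self-contained toy (using a list-buffering stand-in for utils.aio.itertools.tee, which is not stdlib) reproduces it:

```python
import asyncio
from typing import AsyncIterable, Tuple


async def _collect(src: AsyncIterable[str]) -> str:
    return "".join([c async for c in src])


async def tee2(source: AsyncIterable[str]):
    # toy stand-in for utils.aio.itertools.tee: buffer once, replay twice
    items = [c async for c in source]

    async def replay():
        for item in items:
            yield item

    return replay(), replay()


async def before_tts_cb(src: AsyncIterable[str]) -> AsyncIterable[str]:
    # strips the token from the TTS branch only
    async for chunk in src:
        yield chunk.replace("QUESTION_END", "")


async def _demo() -> Tuple[str, str]:
    async def llm() -> AsyncIterable[str]:
        yield "Nice! QUESTION_END"

    og_source, transcript_source = await tee2(llm())
    tts_text = await _collect(before_tts_cb(og_source))
    transcript_text = await _collect(transcript_source)
    return tts_text, transcript_text


tts_text, transcript_text = asyncio.run(_demo())
```

The token survives in transcript_text even though tts_text is clean, which is exactly the behavior I'm seeing.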

Request for Enhancement

Currently, I've been unable to find a way to remove the 'QUESTION_END' text from the transcription, so I've resorted to a crude hack: stripping it in my frontend.

I am looking for an after_llm_cb function or similar hook that would allow for observation and modification of the LLM-generated text before it's committed to transcription.
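A minimal sketch of what I have in mind, assuming a hypothetical after_llm_cb shaped like before_tts_cb: the callback runs on the LLM stream before it is split, so an edit there reaches both the spoken audio and the committed transcript. The names synthesize and after_llm_cb below are illustrative, not actual SDK API:

```python
import asyncio
from typing import AsyncIterable, Callable, Tuple

StreamFilter = Callable[[AsyncIterable[str]], AsyncIterable[str]]


async def after_llm_cb(src: AsyncIterable[str]) -> AsyncIterable[str]:
    # hypothetical callback: edit the LLM-generated text once, up front
    async for chunk in src:
        cleaned = chunk.replace("QUESTION_END", "")
        if cleaned:
            yield cleaned


async def synthesize(
    source: AsyncIterable[str], cb: StreamFilter
) -> Tuple[str, str]:
    # run the callback *before* the stream is duplicated, so the edit
    # reaches both the TTS branch and the transcript branch
    edited = [c async for c in cb(source)]
    tts_text = "".join(edited)         # what the agent speaks
    transcript_text = "".join(edited)  # what gets committed
    return tts_text, transcript_text


async def _demo() -> Tuple[str, str]:
    async def llm() -> AsyncIterable[str]:
        for c in ("Well done! ", "QUESTION_END"):
            yield c

    return await synthesize(llm(), after_llm_cb)


spoken, committed = asyncio.run(_demo())
```

With a hook at this point, per-branch filtering in before_tts_cb would no longer be needed for this use case.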

Thank you!

achapla added the question (Further information is requested) label on Nov 5, 2024
davidzhao (Member) commented:

This is a good point; we should offer a way to override it before it's committed to the context.
