Question regarding handling special tokens in conversation transcription
First of all, thanks for making this wonderful SDK to easily create voice-enabled applications!
I'm currently building a quiz agent that asks users questions. Each user response is evaluated by an LLM; if the answer is correct, the agent congratulates the user and appends a special 'QUESTION_END' token to its response. This token marks the end of the conversation for the current question, after which I create a new chat context with the next question.
Issue
The issue I'm facing is that although I strip the special 'QUESTION_END' token inside the `before_tts_cb` function, so my agent never speaks it, the token still appears in the conversation text. It seems the 'QUESTION_END' text is not removed from the LLM's response before transcription.
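For reference, here is a minimal sketch of the callback I'm using (simplified; for brevity it assumes the token arrives within a single streamed chunk, and `QUESTION_END` is my own constant):

```python
from typing import AsyncIterable, Union

QUESTION_END = "QUESTION_END"

def before_tts_cb(
    agent, source: Union[str, AsyncIterable[str]]
) -> Union[str, AsyncIterable[str]]:
    # The pipeline may hand us the full response or a token stream,
    # matching the str | AsyncIterable[str] source seen below.
    if isinstance(source, str):
        return source.replace(QUESTION_END, "")

    async def _strip(stream: AsyncIterable[str]) -> AsyncIterable[str]:
        async for chunk in stream:
            # Assumes the token is never split across chunk boundaries.
            yield chunk.replace(QUESTION_END, "")

    return _strip(source)
```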
Upon further investigation, I found that the following code runs after the LLM is called; it derives two different variables, `tts_source` and `transcript_source`, for different purposes:
```python
def _synthesize_agent_speech(
    self,
    speech_id: str,
    source: str | LLMStream | AsyncIterable[str],
) -> SynthesisHandle:
    assert (
        self._agent_output is not None
    ), "agent output should be initialized when ready"

    if isinstance(source, LLMStream):
        source = _llm_stream_to_str_iterable(speech_id, source)

    og_source = source
    transcript_source = source

    if isinstance(og_source, AsyncIterable):
        og_source, transcript_source = utils.aio.itertools.tee(og_source, 2)

    tts_source = self._opts.before_tts_cb(self, og_source)
    if tts_source is None:
        raise ValueError("before_tts_cb must return str or AsyncIterable[str]")

    return self._agent_output.synthesize(
        speech_id=speech_id,
        tts_source=tts_source,
        transcript_source=transcript_source,
        transcription=self._opts.transcription.agent_transcription,
        transcription_speed=self._opts.transcription.agent_transcription_speed,
        sentence_tokenizer=self._opts.transcription.sentence_tokenizer,
        word_tokenizer=self._opts.transcription.word_tokenizer,
        hyphenate_word=self._opts.transcription.hyphenate_word,
    )
```
The `tts_source` no longer contains the 'QUESTION_END' token (since `before_tts_cb` has stripped it), so it is never spoken, but the `transcript_source` still contains 'QUESTION_END', so the token ends up in the transcription when it is committed.
Request for Enhancement
Currently, I've been unable to find a way to remove the 'QUESTION_END' text from the transcription, so I've resorted to a crude hack that strips it in my frontend.
I am looking for an `after_llm_cb` function or a similar hook that would allow observing and modifying the LLM-generated text before it is committed to the transcription.
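To illustrate what I mean (purely hypothetical; `after_llm_cb` does not exist today), such a hook could be applied to the shared source before it is split into `tts_source` and `transcript_source`, so both the spoken audio and the transcription see the filtered text:

```python
# Hypothetical: apply a user-supplied after_llm_cb to the LLM output
# before it is tee'd, so TTS and transcription both get the filtered text.
if isinstance(source, LLMStream):
    source = _llm_stream_to_str_iterable(speech_id, source)

source = self._opts.after_llm_cb(self, source)  # <- proposed hook

og_source = source
transcript_source = source
if isinstance(og_source, AsyncIterable):
    og_source, transcript_source = utils.aio.itertools.tee(og_source, 2)
```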
Thank you!