[Bug] OpenAI GPT-4o Realtime Preview - Context Loss and Audio Event Handling #478

Open
Zeeshan-Satti opened this issue Nov 11, 2024 · 0 comments


Zeeshan-Satti commented Nov 11, 2024

Title: Context Loss on Long Prompts & Audio Event Handling Issue in GPT-4o Realtime Preview

Description:

I am encountering two issues with the OpenAI GPT-4o Realtime Preview (model gpt-4o-realtime-preview-2024-10-01). They affect prompt context retention and audio event handling when long or complex prompts are used.

  1. Context Loss with Long Prompts:
    When providing lengthy prompts, GPT-4o seems to "forget" prior context, even though I am staying well within the token limits.

    • Token Limits:
      • Context window: 128,000 tokens
      • Output tokens: 4,096 tokens
      • I am using a maximum of 20,000 tokens (input + output combined), well below the limit.
    • Expected Behavior: The model should retain the context of the input prompt, even if it is lengthy, and provide a response based on the entire prompt.
    • Observed Behavior: Despite being within the token limits, the model returns a response that seems disconnected or repetitive and does not reflect the full context of the long prompt. The result is a generic response rather than one informed by the entire input.
  2. Audio Event Handling:
    The second issue arises when using audio input alongside a long prompt. The model does not respond to audio events correctly.

    • Issue Details: When a long prompt is provided, the system fails to trigger or respond to any related audio events.
    • Expected Behavior: Audio-related events should be captured and processed alongside the text input. For instance, if I say something like "Can you assist me with a task?", the system should recognize and handle audio events related to that input.
    • Observed Behavior: The system answers socket events only with generic text-based responses such as “Hello, how may I assist you?”. No audio-related events are triggered or processed, regardless of the input prompt.
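
To make the "only text-based responses" observation easier to confirm, here is a minimal sketch that tallies incoming server events by kind. The event-type prefixes (`response.audio.`, `response.audio_transcript.`, `response.text.`) are my understanding of the Realtime API beta event names and should be checked against the API reference for this model version. `classify_event` and `tally_events` are hypothetical helper names, not part of any SDK.

```python
from collections import Counter

# Assumed event-type prefixes for the Realtime API beta; verify against the
# official event reference before relying on them.
AUDIO_PREFIXES = ("response.audio.", "response.audio_transcript.")
TEXT_PREFIXES = ("response.text.",)


def classify_event(event: dict) -> str:
    """Label a decoded server event as 'audio', 'text', or 'other'."""
    event_type = event.get("type", "")
    if event_type.startswith(AUDIO_PREFIXES):
        return "audio"
    if event_type.startswith(TEXT_PREFIXES):
        return "text"
    return "other"


def tally_events(events) -> Counter:
    """Count how many events of each kind were received."""
    return Counter(classify_event(event) for event in events)


if __name__ == "__main__":
    # Hand-written sample events, not captured from a real session.
    sample = [
        {"type": "response.text.delta"},
        {"type": "response.text.done"},
        {"type": "response.done"},
    ]
    counts = tally_events(sample)
    if counts["audio"] == 0:
        print("No audio events in this batch:", dict(counts))
```

Feeding every decoded socket message through `tally_events` for one session would show whether the audio count really stays at zero once the long prompt is set.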

Steps to Reproduce:

  1. Provide a long prompt to the GPT-4o model via the session.update event.
  2. Observe the output response for context retention.
  3. Test the system with an audio input and a lengthy prompt.
  4. Observe that the audio socket event does not get triggered and only text-based socket responses are returned.
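
The steps above can be sketched as the client messages below. This is an illustration under assumptions, not the exact code used when the issue was observed: the payload shapes for `session.update`, `input_audio_buffer.append`, `input_audio_buffer.commit`, and `response.create` follow my reading of the Realtime API beta, and the helper names are hypothetical. Opening the WebSocket and sending the strings is left out.

```python
import base64
import json


def build_session_update(instructions: str, modalities=("text", "audio")) -> dict:
    """Build a session.update event carrying the long prompt as instructions."""
    return {
        "type": "session.update",
        "session": {
            # Assumption: both modalities must be listed for audio output.
            "modalities": list(modalities),
            "instructions": instructions,
        },
    }


def build_audio_append(pcm_bytes: bytes) -> dict:
    """Build an input_audio_buffer.append event with base64-encoded audio."""
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }


def build_repro_messages(instructions: str, pcm_bytes: bytes) -> list:
    """Return the JSON strings to send over the socket, in order."""
    events = [
        build_session_update(instructions),
        build_audio_append(pcm_bytes),
        # Assumption: the explicit commit and response.create are only needed
        # when server-side voice activity detection is turned off.
        {"type": "input_audio_buffer.commit"},
        {"type": "response.create"},
    ]
    return [json.dumps(event) for event in events]


if __name__ == "__main__":
    long_prompt = "You are a support agent. " * 500  # placeholder long prompt
    silence = b"\x00\x00" * 2400  # placeholder 16-bit PCM samples
    for message in build_repro_messages(long_prompt, silence):
        print(message[:80])
```

Logging the exact `session.update` payload this way also makes it easy to check that `modalities` still includes `"audio"` when the long prompt is sent.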

Expected Behavior:

  • The model should retain context from long prompts without losing information and produce relevant, informed responses.
  • The system should trigger and process audio events in response to audio input alongside text, properly integrating the interaction.

Actual Behavior:

  • GPT-4o fails to maintain context when handling lengthy prompts, providing generic or repetitive responses.
  • Audio-related socket events are not triggered when a long prompt is used, and only basic text-based responses are returned.

Environment:

  • Model Version: gpt-4o-realtime-preview-2024-10-01
  • Token Limitations: 128,000 context tokens, 4,096 output tokens.
  • Token Usage: Maximum 20,000 tokens (input + output).
  • Issue observed across both text and audio input modes.
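
For reference, the token figures above work out as follows. This is plain arithmetic on the numbers reported in this issue; the constant and function names are only illustrative.

```python
CONTEXT_WINDOW_TOKENS = 128_000   # context window stated in this issue
MAX_OUTPUT_TOKENS = 4_096         # output limit stated in this issue
REPORTED_USAGE_TOKENS = 20_000    # maximum input + output used in testing


def context_utilization(used: int, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Return the fraction of the context window that is in use."""
    if used < 0 or window <= 0:
        raise ValueError("used must be >= 0 and window must be > 0")
    return used / window


def remaining_tokens(used: int, window: int = CONTEXT_WINDOW_TOKENS) -> int:
    """Return how many tokens of the context window are still free."""
    return window - used


if __name__ == "__main__":
    share = context_utilization(REPORTED_USAGE_TOKENS)
    free = remaining_tokens(REPORTED_USAGE_TOKENS)
    print(f"Using {share:.1%} of the window, {free} tokens free")
```

At 20,000 of 128,000 tokens the session uses about 15.6% of the window, so truncation at the context limit does not look like an explanation for the behavior.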

Additional Information:

  • The issue persists even when staying within the token limits.
  • The audio handling seems to be affected specifically when dealing with long prompts, which might suggest an issue with how the system is managing multiple input types (text and audio) in parallel.