Server to client media playback with frame-based processing #1362

Open
jamjambles opened this issue Aug 25, 2023 · 3 comments

@jamjambles

Many of the examples in this repo show client-to-server media sinks (mic / video capture) with frame-based callback processing. I am looking to do server-to-client media playback with the same kind of frame-based callback processing. This would be useful for real-time audio playback with real-time processing.

After searching through this discussion https://discuss.streamlit.io/t/new-component-streamlit-webrtc-a-new-way-to-deal-with-real-time-media-streams/8669, and the example pages in streamlit-webrtc, I have not been able to find an example of this.

To be specific, I am looking to do the following:

  1. Load an audio file (server)
  2. Start playback (from server to client), frame by frame
  3. Process each frame (before it is sent to the client) via a callback (processing should occur on the server, for example ML inference)
  4. Playback processed audio frame to client
  5. Continue in real-time
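To make the steps above concrete, here is a minimal, transport-agnostic sketch of the load / split / process loop using only the standard library. It does not touch WebRTC at all: the names `iter_processed_frames` and `halve_volume` are hypothetical, the 20 ms frame size is an assumption, and the example callback assumes 16-bit PCM. The callback is where something like ML inference would go.

```python
import struct
import wave
from typing import Callable, Iterator

# A frame callback takes raw PCM bytes for one frame and returns processed bytes.
FrameCallback = Callable[[bytes], bytes]


def iter_processed_frames(source, callback: FrameCallback,
                          frame_ms: int = 20) -> Iterator[bytes]:
    """Read a WAV file (path or file-like object) and yield processed frames.

    Each yielded chunk covers roughly `frame_ms` milliseconds of audio and has
    already been passed through `callback`.
    """
    with wave.open(source, "rb") as wav:
        samples_per_frame = wav.getframerate() * frame_ms // 1000
        while True:
            chunk = wav.readframes(samples_per_frame)
            if not chunk:  # end of file
                break
            yield callback(chunk)


def halve_volume(chunk: bytes) -> bytes:
    """Example callback: halve the amplitude (assumes 16-bit little-endian PCM)."""
    samples = [s[0] // 2 for s in struct.iter_unpack("<h", chunk)]
    return struct.pack(f"<{len(samples)}h", *samples)
```

For real-time playback the consumer of this generator would still need to pace itself (about one frame every `frame_ms` milliseconds) and hand each chunk to whatever sends audio to the client; that part is left out here.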

This example uses the MediaPlayer class from aiortc (from aiortc.contrib.media import MediaPlayer). However, it does not seem to provide any sort of callback on the stream (at the audio frame level).

Digging deeper, the MediaPlayer class exposes MediaStreamTrack instances (https://aiortc.readthedocs.io/en/latest/api.html#aiortc.MediaStreamTrack) through its audio and video attributes, and each track has a recv method that is awaited to produce the next frame.

Would the correct approach be to create a new subclass of MediaStreamTrack and write a custom recv for the required processing? I found this related thread: aiortc/aiortc#571
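As a rough sketch of what I mean by a custom subclass, assuming the usual aiortc pattern of wrapping one track in another: `ProcessedAudioTrack` and its `callback` parameter are hypothetical names, not part of aiortc or streamlit-webrtc. The `try`/`except` fallback is only there so the snippet can be exercised without aiortc installed; real code would import `MediaStreamTrack` directly.

```python
from typing import Any, Callable

try:
    from aiortc import MediaStreamTrack
except ImportError:
    # Assumption for illustration only: a stand-in base class so the sketch
    # still runs where aiortc is not installed.
    MediaStreamTrack = object


class ProcessedAudioTrack(MediaStreamTrack):
    """Wrap a source track and apply a callback to every frame it produces."""

    kind = "audio"

    def __init__(self, source: Any, callback: Callable[[Any], Any]) -> None:
        super().__init__()
        self._source = source      # e.g. the audio track of a MediaPlayer
        self._callback = callback  # per-frame processing, runs on the server

    async def recv(self) -> Any:
        # Pull the next frame from the wrapped track, process it, and return
        # the result so it is what gets sent on to the client.
        frame = await self._source.recv()
        return self._callback(frame)
```

The idea would be to pass `MediaPlayer(...).audio` as `source` and hand the wrapper to the peer connection instead of the raw player track. A slow callback (such as model inference) would block the event loop inside `recv`, so it probably needs to be offloaded to a thread or kept well under the frame duration.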

Is this functionality supported currently? I would appreciate any guidance here.

Thanks heaps!

@wenshutang

Interested in pointers to potential solutions. My use case is similar, generate some text-to-speech and play it back.

@whitphx
Owner

whitphx commented Sep 10, 2023

The example you mentioned (https://github.com/whitphx/streamlit-webrtc/blob/main/pages/8_media_files_streaming.py) uses a callback to process the video frames (video_frame_callback).
Would using audio_frame_callback there instead work for you?

The audio filter example may also be a useful reference for how to use the audio callback, although it is a client-to-server example.

@manhcuong17072002

manhcuong17072002 commented Oct 30, 2024

Interested in pointers to potential solutions. My use case is similar, generate some text-to-speech and play it back.

@wenshutang I'm also interested in this issue for my text-to-speech streaming problem. I wonder if you've found a solution yet. If so, could you please share some references or suggestions? Thank you.
