The SDL3 audio subsystem redesign! #7704
Conversation
Would it be possible to allow mixing to be done by the audio drivers using this new API in order to take advantage of hardware capabilities?
I'd have to look into what is available on various APIs, but my suspicion is that most don't offer this; we'd have to keep a separate buffer for each stream, and mixing isn't a high-overhead operation in general. I'd be more inclined to add SIMD versions of SDL_MixAudioFormat instead.
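For a sense of what that might involve, here is a minimal sketch of a SIMD float32 mixing loop using SSE intrinsics; the function name and the saturating clamp are illustrative assumptions, not SDL's actual implementation:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical sketch: mix `len` float samples from src into dst,
   four at a time, clamping the result to [-1, 1]. Not SDL code. */
static void MixFloat32_SSE(float *dst, const float *src, int len)
{
    const __m128 maxval = _mm_set1_ps(1.0f);
    const __m128 minval = _mm_set1_ps(-1.0f);
    int i = 0;
    for (; i + 4 <= len; i += 4) {
        __m128 mixed = _mm_add_ps(_mm_loadu_ps(dst + i), _mm_loadu_ps(src + i));
        mixed = _mm_min_ps(_mm_max_ps(mixed, minval), maxval);
        _mm_storeu_ps(dst + i, mixed);
    }
    for (; i < len; i++) {  /* scalar tail for leftover samples */
        const float mixed = dst[i] + src[i];
        dst[i] = (mixed < -1.0f) ? -1.0f : ((mixed > 1.0f) ? 1.0f : mixed);
    }
}
```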
I'm wondering if maybe it was too aggressive to remove SDL_AudioSpec entirely. A struct with just format, channels and sample rate could be nice, and probably save some code changes.
I think I might expose three extra functions in SDL_AudioStream:
- Lock and unlock the stream. (There's already a mutex, this just lets others use it explicitly.)
- Register a function that runs at the start of a SDL_GetAudioStreamData call. The callback can take the chance to add more data to the stream, or query the current amount, etc. The end result is you can have the SDL2 callback interface, if you want it, and you can have it for each bound stream. Actual code added to SDL is minimal, the old interface can be implemented without a lot of drama, or latency, or an extra single-header library. If SDL_mixer were so inclined, it could move each audio channel to a stream and still be able to do posteffects, on-demand decoding, etc.
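As a rough illustration of the proposal (the function name and callback signature here are assumptions based on what eventually landed in SDL3, not a quote from this commit; GenerateMoreAudio is a hypothetical app-side helper), a per-stream SDL2-style feed could look something like this:

```c
#include <SDL3/SDL.h>

/* GenerateMoreAudio() is a hypothetical app-side decoder/mixer. */
extern void GenerateMoreAudio(Uint8 *buf, int buflen);

/* Runs at the start of SDL_GetAudioStreamData: a chance to queue more
   data, which gives each bound stream its own SDL2-style callback. */
static void SDLCALL MyStreamCallback(void *userdata, SDL_AudioStream *stream,
                                     int additional_amount, int total_amount)
{
    Uint8 morebuf[4096];
    GenerateMoreAudio(morebuf, sizeof (morebuf));
    SDL_PutAudioStreamData(stream, morebuf, sizeof (morebuf));
}

static void HookUpStream(SDL_AudioStream *stream)
{
    SDL_SetAudioStreamGetCallback(stream, MyStreamCallback, NULL);
}
```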
That sounds awesome. :)
Hey, many thanks for all your work on SDL, and apologies if this is not the right place to raise this. One issue I've been running into when using SDL with SDL_mixer is that doing volume fades creates a popping sound, because volume is only changed on chunk boundaries in a hard-step fashion. This was not something that could be easily fixed in SDL_mixer, as it was simply calling MixAudioFormat() with a single volume for the whole chunk. Do you see any way to make the new audio API more flexible, so that smooth fades could be more easily implemented by devs and/or in SDL_mixer? There's an old discussion of this in the SDL_mixer repo: libsdl-org/SDL_mixer#190.
Yeah, this is a legit bug in SDL_mixer, but the interpolation should happen there, I think. libsdl-org/SDL_mixer#190 is the right place to discuss this.
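For reference, the interpolation being suggested is just a per-sample volume ramp across the chunk instead of one hard step at the chunk boundary. A minimal sketch of the technique (not SDL_mixer code; float32 samples and the function name are my own assumptions):

```c
/* Sketch: fade a float32 chunk by ramping volume linearly from
   old_vol to new_vol, per sample, so there is no hard volume step
   (and thus no pop) at the chunk boundary. Illustrative only. */
static void FadeChunkF32(float *samples, int num_samples,
                         float old_vol, float new_vol)
{
    int i;
    if (num_samples < 2) {
        if (num_samples == 1) {
            samples[0] *= new_vol;
        }
        return;
    }
    for (i = 0; i < num_samples; i++) {
        const float t = (float) i / (float) (num_samples - 1);
        samples[i] *= old_vol + ((new_vol - old_vol) * t);
    }
}
```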
A few more observations:
I suspect resampling and channel conversion are likely to be the key issues here (especially on CPUs without hardware floating point), since that needs to be done before mixing happens. That said, there doesn't seem to be anything major in the proposed API that prevents hardware mixing, so this might be something to investigate separately at a later date.
Good catch, I'll fix that.
This should accept a device ID of 0 to request the default device, but this isn't hooked up at the moment, and there are still some logistics to figure out. There are going to be some API changes here still, I think.
It isn't really undefined so much as it will absolutely ruin your output. :) But it's thread-safe and won't crash the app or anything. I've thought about this (and also refusing to let the app change the stream's output format when it's bound to an output device), but my thinking is that if you put your finger on a hot stove, eventually you'll figure out to not do that.
I think I have a FIXME in there to consider this. I was not going to do this, since it opens up a world of One More Things people would like added until we just reimplement SDL_mixer, but with the callback plan, maybe we can avoid feature creep, so maybe I will add this.
There's an SDL2 API that was lost in here, SDL_GetDefaultAudioInfo, which needs to be re-added once I figure out the default device politics. For specific devices, the current format (which is our best guess at a preferred format when not opened) is already available. SDL has never listed all possible formats, and I don't think it's useful to do so...in many cases, you're just moving where data conversion happens if you try to pick a "native" format.
Wishlist item: there should be a way to query whether device permission is available, forbidden, or pending user response, if this is something various platforms expose. iOS and Android obviously do this for the microphone, but web browsers will forbid access to audio output, too, until the user has interacted with the page, and WinRT makes approval of WASAPI device opens async, presumably for situations where they want users to approve it. Having a more formal way to deal with that in SDL apps would be nice. I don't know what, yet.
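Purely as a strawman (no such API exists in SDL at this point; the enum and function names below are hypothetical), such a query might look like:

```c
/* Strawman sketch only: these names are hypothetical, for illustration. */
typedef enum SDL_AudioDevicePermission
{
    SDL_AUDIO_PERMISSION_GRANTED,   /* ready to use */
    SDL_AUDIO_PERMISSION_DENIED,    /* the user/OS refused access */
    SDL_AUDIO_PERMISSION_PENDING    /* still waiting on user/OS approval */
} SDL_AudioDevicePermission;

extern SDL_AudioDevicePermission SDL_GetAudioDevicePermission(SDL_AudioDeviceID devid);
```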
SDL_AudioStream callbacks are in, and loopwave has been updated to use them for testing purposes; the changes to move from SDL2 to SDL3 are pretty small with this approach. This is a good improvement.
SDL_AudioSpec is back in, but just as a thing that holds format/channels/frequency. It actually tightens up a bunch of code, and its purpose is really clear now vs. SDL2, so I'm happy with its return.
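Assuming the revived struct matches what eventually shipped in the SDL3 headers, it is just the three fields:

```c
/* The slimmed-down SDL_AudioSpec, as described above: nothing but the
   format/channels/frequency triple. (Field names are assumed to match
   the SDL3 headers.) */
typedef struct SDL_AudioSpec
{
    SDL_AudioFormat format;  /* e.g. SDL_AUDIO_F32 */
    int channels;            /* e.g. 2 for stereo */
    int freq;                /* sample frames per second, e.g. 48000 */
} SDL_AudioSpec;
```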
So one stumbling point is that I wanted to remove device open/close and just let people bind streams to devices, but this causes other problems (people will want to have a definite shutdown point where "closing" the device will stop all their sounds, but what do you do if something else also has streams bound to that device? If you want to pause the device, there isn't an easy button here beyond unbinding all your streams at once, etc.). So I guess we're going to keep an open/close API, and opening will return a new device ID. Internally these fake devices will all just mix onto a single physical device, but the VoIP library's streams will be logically separated from the movie playback library's streams, and the app's own streams; pausing a device will just stop mixing one logical group, and closing a device will just unbind that group from the device. It adds a little internal complexity, but it seems like the right thing to do, and will be less confusing for app developers.
Of course, now we have device ids that can be used with some APIs (SDL_BindAudioStream needs a logical device), device ids that can be used with others (SDL_OpenAudioDevice needs a physical device), and some that can reasonably be used with both (SDL_GetAudioDeviceName). Have to think on this more.
Actually, this is probably fine. Binding a stream to a physical device will fail, which might be confusing, but everything else can be made to reasonably work, including opening a new logical device from an existing logical device (which might even be useful if you want to make a temporary logical grouping of streams).
Ok, logical audio devices are in. Here's the silly test program doing the two streams with music and sound, plus a second open of the same device (done by opening the logical device's id, so you don't have to keep the original physical device id around!), playing the sound at an offset, so you can hear them all mixing into a single buffer for the actual hardware:

```c
#include <SDL3/SDL.h>

int main(int argc, char **argv)
{
    SDL_Init(SDL_INIT_AUDIO);

    Uint8 *musicbuf, *soundbuf;
    Uint32 musicbuflen, soundbuflen;
    SDL_AudioSpec musicspec, soundspec;
    SDL_LoadWAV("music.wav", &musicspec, &musicbuf, &musicbuflen);
    SDL_LoadWAV("tink.wav", &soundspec, &soundbuf, &soundbuflen);

    SDL_AudioDeviceID *devices = SDL_GetAudioOutputDevices(NULL);
    const SDL_AudioDeviceID device = devices ? SDL_OpenAudioDevice(devices[0], &musicspec) : 0;
    SDL_free(devices);

    if (device) {
        /* second logical open of the same hardware, via the first logical device's id. */
        const SDL_AudioDeviceID device2 = SDL_OpenAudioDevice(device, &musicspec);
        SDL_AudioStream *musicstream = SDL_CreateAndBindAudioStream(device, &musicspec);
        SDL_AudioStream *soundstream = SDL_CreateAndBindAudioStream(device, &soundspec);
        SDL_AudioStream *soundstream2 = SDL_CreateAndBindAudioStream(device2, &soundspec);
        Uint64 nextsound = 0;

        SDL_PutAudioStreamData(musicstream, musicbuf, musicbuflen);
        SDL_free(musicbuf);

        while (SDL_GetAudioStreamAvailable(musicstream) > 0) {
            if (SDL_GetAudioStreamAvailable(soundstream) < soundbuflen) {
                SDL_PutAudioStreamData(soundstream, soundbuf, soundbuflen);
            }
            if (SDL_GetAudioStreamAvailable(soundstream2) == 0) {
                if (!nextsound) {
                    nextsound = SDL_GetTicks() + 1000;  /* start the second copy a second later. */
                } else if (nextsound <= SDL_GetTicks()) {
                    SDL_PutAudioStreamData(soundstream2, soundbuf, soundbuflen);
                }
            }
            SDL_Delay(10);
        }

        SDL_DestroyAudioStream(musicstream);
        SDL_DestroyAudioStream(soundstream);
        SDL_DestroyAudioStream(soundstream2);
        SDL_CloseAudioDevice(device2);
        SDL_CloseAudioDevice(device);
    }

    SDL_free(soundbuf);
    SDL_Quit();
    return 0;
}
```

This is complexity the average app won't need directly; it's intended to make things work when some external library wants to open a device too, and doesn't coordinate with the app to share one. But it's also kinda glorious.
Latest commit still has some loose ends to tie up, but not only are most of the details for default devices back in place, SDL can now handle migrating playback to a new default device when the system default changes. Before, this was pretty much something we handled explicitly in the CoreAudio backend (and just asked PulseAudio to manage for us implicitly), but now, for any backend, we can just scoot all the logical devices over to different physical hardware, change the format of the business end of their audio streams, and keep going. The backend just needs to be able to tell us when a new default device was selected (the user changed it in system controls, they plugged in headphones, etc.), and the higher level does the rest!
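Conceptually, the per-stream conversion is what makes this cheap: when the default device changes, only the device-facing side of each bound stream's format has to change. A sketch of that idea, assuming the public SDL_SetAudioStreamFormat call stands in for whatever the internals actually do:

```c
#include <SDL3/SDL.h>

/* Conceptual sketch, not the real internals: retarget a bound stream's
   output side to the new physical device's format. The app-facing input
   side is left alone (src_spec == NULL means "don't change it"), so the
   app keeps feeding data in its own format, oblivious to the migration. */
static void MigrateStreamToNewDefault(SDL_AudioStream *stream,
                                      const SDL_AudioSpec *new_device_spec)
{
    SDL_SetAudioStreamFormat(stream, NULL, new_device_spec);
}
```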
Nice!
sdl2-compat work is sitting in libsdl-org/sdl2-compat#80, which was like climbing a mountain, but I've almost reached the summit now.
So I'm reworking the PipeWire backend, and while this is proving to be a good test of the new system for backends that provide their own threads, I'm wondering if the PipeWire implementation is wrong. I suspect the API is meant to work like the new PulseAudio threading code, where you let it spin one thread to dispatch events, and then all your threads cooperate around that. Right now it is spinning a thread for each device, which is not bad in itself and what we would do ourselves, but I'm wondering if each thread they spin is fighting for the same socket and event queue anyhow, and whether we should restructure this to match the new PulseAudio code. As an added bonus, it could then use the standard SDL device thread.
This is allegedly lower-latency than the AAudioStream_write interface, but more importantly, it let me set this up to block in WaitDevice. Also turned on the low-latency performance mode, which trades battery life for a more efficient audio thread to some unspecified degree.
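This presumably refers to AAudio's data-callback stream setup and its performance-mode hint. As a point of reference (a simplified sketch of the NDK calls involved, not SDL's actual backend code), the setup looks roughly like this:

```c
#include <aaudio/AAudio.h>
#include <string.h>

/* Simplified sketch: an AAudio stream driven by a data callback, with
   the low-latency performance mode enabled. Error handling omitted. */
static aaudio_data_callback_result_t DataCallback(AAudioStream *stream,
                                                  void *userData,
                                                  void *audioData,
                                                  int32_t numFrames)
{
    /* produce numFrames frames here; this sketch just writes silence. */
    const int32_t channels = AAudioStream_getChannelCount(stream);
    memset(audioData, 0, (size_t) numFrames * channels * sizeof (float));
    return AAUDIO_CALLBACK_RESULT_CONTINUE;
}

static AAudioStream *OpenLowLatencyStream(void)
{
    AAudioStreamBuilder *builder = NULL;
    AAudioStream *stream = NULL;
    AAudio_createStreamBuilder(&builder);
    AAudioStreamBuilder_setFormat(builder, AAUDIO_FORMAT_PCM_FLOAT);
    AAudioStreamBuilder_setDataCallback(builder, DataCallback, NULL);
    AAudioStreamBuilder_setPerformanceMode(builder, AAUDIO_PERFORMANCE_MODE_LOW_LATENCY);
    AAudioStreamBuilder_openStream(builder, &stream);
    AAudioStreamBuilder_delete(builder);
    return stream;
}
```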
I can't ever find this when it's in the middle! It's a "me" problem. :)
On Linux,
On Android,
Also on Android,
This was a dumb bug, fixed now. I'll look at testaudiostreamdynamicresample.
It needed some help to find sample.wav after the test building changes. This is fixed now (but this program really needs to be reworked to use the test framework stuff that the "real" test apps use, at some point).
Oops, I launched it from the wrong directory. Sorry about that. I see the following behavior with
There's a resampling bug in mainline SDL3 pending a fix, which is where your pops are coming from on all platforms. I'll take a look at the other issues. Wrapping up 2.28.2 and such today and then I can get to that.
Sweet! Does it make sense to squash this to a single commit?
I was thinking of leaving it as a merge, not a rebase, because we could very well want to step through some of these 138 commits in the future as a pile of commits on an unnamed branch, but the main timeline is just going to see "this is where the audio changes merged" and not have to deal with the mess when bisecting, etc. Normally I'd say squash, but this is a ton of changes to make look like one unit of work. I might take a run at interactive rebasing to squash out "patch to compile on [whatever platform]" commits in any case. But let's CC a few people that might have opinions on the right way to merge this beast: @libsdl-org/a-team @smcv
Great presentation, and the test app looks really slick! I don't have a reproducer, other than me creating, removing, and moving around logical audio devices (from my laptop capture device) in the testaudio app, but the following assertion error is thrown after some time:
Another Android change, which I don't know if it's intentional: when I open the
That's good, right? :) Right now we don't have a way to determine the default audio device in Android, so it goes with the first one it sees during device enumeration (and on a phone, this generally works out). I haven't found an API that tells me the actual default yet, but there are various things that mention the "preferred" device in the API reference, so I'm assuming I'll stumble upon this at some point. Right now it's still an open question to be resolved. But that (deciding which integer is the default for a given device) just needs to be settled before 3.2.0 ships. I'll try to track down the assert over here.
I'm writing this based entirely on recent comments and have not looked at the actual changes.
I think you're right. I would say that even if the commits get rebased onto
Some projects like GLib and dbus do everything as a merge with a merge commit, even if the commits could have been fast-forwarded ("semi-linear history" - look at one of those projects in gitk or similar and you'll see what I mean). I personally like that approach better than always rebasing: one advantage of doing that is that the git commit history ends up with a reference to the pull request/merge request, rather than the PRs being GitHub-/GitLab-specific metadata that becomes invisible if you clone the repo elsewhere.
Almost 8K lines of diff seems unhelpfully big for a single commit! I think
If there are places where a commit was just wrong, and a subsequent commit fixes it up to be right, then doing the interactive rebase to make it look as if you had got it right the first time makes sense - when doing archaeology to find out why some code is the way it is, the history of what would have happened in a hypothetical past where you had never made mistakes is often more useful than the history of what actually happened. But if it's hard to achieve that, for a branch of this size it's not going to be realistic to bisect into the middle of it in any case, so it doesn't matter so much if the tree passed through some broken intermediate states that can't be compiled or don't work; better for future code-historians to preserve the context of separate commits than to squash everything into fewer/larger commits, IMO. (One of the things I've learned from contributing to projects like SDL, GLib and dbus is that "maintainer" and "code historian" have a surprising amount in common!)
Okay, I'm happy to see I'm on the right path here. I guess this is last call; I might do some minor cleanup to the revision history if possible, but I'll click the "merge" button later today unless someone says "stop!"
We are merged! 😬
This is a work in progress! (and this commit will probably get force-pushed over at some point).
This rips up the entire SDL audio subsystem! While we still feed the audio device from a separate thread, the audio callback into the app is now ~~gone~~ a totally optional alternative. Now the app will bind an SDL_AudioStream to a given device and feed data to it. As many streams as one likes can be bound to a device; SDL will mix them all into a single buffer and feed the device from there.
So not only does this function as a basic mixer, it also means that multiple device opens are handled seamlessly (so if you want to open the device for your game, but you also link to a library that provides VoIP and it wants to open the device separately, you don't have to worry about stepping on each other, or that the OS will fail to allow multiple opens of the same device, etc).
Here is a simple program that just opens a device, binds two streams to it, and plays them both at the same time, ending when the first stream is exhausted, and looping the other:
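A minimal sketch of the described program, using the in-development API names seen elsewhere in this PR (the filenames and exact details are placeholder assumptions, not the original listing):

```c
#include <SDL3/SDL.h>

/* Sketch only: open a device, bind two streams, play both at once,
   looping the second stream until the first one is exhausted. */
int main(int argc, char **argv)
{
    Uint8 *buf1, *buf2;
    Uint32 len1, len2;
    SDL_AudioSpec spec1, spec2;

    SDL_Init(SDL_INIT_AUDIO);
    SDL_LoadWAV("one.wav", &spec1, &buf1, &len1);   /* placeholder filenames */
    SDL_LoadWAV("two.wav", &spec2, &buf2, &len2);

    SDL_AudioDeviceID *devices = SDL_GetAudioOutputDevices(NULL);
    const SDL_AudioDeviceID device = devices ? SDL_OpenAudioDevice(devices[0], &spec1) : 0;
    SDL_free(devices);

    if (device) {
        SDL_AudioStream *stream1 = SDL_CreateAndBindAudioStream(device, &spec1);
        SDL_AudioStream *stream2 = SDL_CreateAndBindAudioStream(device, &spec2);

        SDL_PutAudioStreamData(stream1, buf1, len1);
        while (SDL_GetAudioStreamAvailable(stream1) > 0) {    /* until the first is exhausted */
            if (SDL_GetAudioStreamAvailable(stream2) == 0) {  /* loop the other */
                SDL_PutAudioStreamData(stream2, buf2, len2);
            }
            SDL_Delay(10);
        }

        SDL_DestroyAudioStream(stream1);
        SDL_DestroyAudioStream(stream2);
        SDL_CloseAudioDevice(device);
    }

    SDL_free(buf1);
    SDL_free(buf2);
    SDL_Quit();
    return 0;
}
```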
There are many, many other changes; the best plan is to find README-migration.md in the commit and read up on the differences. Notably: the commit deletes more code than it adds, so in many ways the new audio code is a simplification of the code and the API.
There is a lot to be done still, but this has been churning in my working copy for weeks now. Since this has finally gotten far enough that it can be made to work in the right conditions, I intend to work out of this PR and then squash it down before merging.
Some notable things to be done, still:
List of backends to be updated:
So we have a ways to go here, but this is the basic idea I'm moving towards. I have to spend some time on SDL 2.28.0 and sdl12-compat next week, and then I'll be returning to this. Feedback is certainly welcome in the meantime!
Fixes #7379.
Reference Issue #6889.
Reference Issue #6632.