Audio engine #39

MuffinTastic · 2023-01-29T00:55:49Z

Currently, there's no audio engine in Mocha at all. We have a lot of building blocks to choose from, but most won't fill all of our needs, so we're probably gonna have to do some leg-work. I'll lay out what I see as the pros and cons to each, and give the combinations I like best at the end.

Assume available on vcpkg/nuget unless stated otherwise.

Brace yourself for a long post.

Low-level APIs

Cross-platform playback and recording backends.

SDL_audio.h - Docs, Native, zlib license

Part of SDL2 which Mocha already uses. No extra legal considerations, probably doesn't even add another dependency?
Some old sources claim it doesn't support recording audio, which used to be true, but not anymore.

but...

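Older write-ups say it doesn't support WASAPI on Windows, which would make for more latency. Recent SDL2 releases do ship a WASAPI backend, so this may no longer apply.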

PortAudio - Website, Docs, vcpkg, Native, MIT license

Widely used.

libsoundio - Website, Docs, vcpkg, Native, MIT license

Is more thorough and resilient with its error checking than alternatives, or so the author would claim.
Can monitor events for device changes.

miniaudio - Website, Docs, Header file, Native, Public domain / MIT license

Not available on vcpkg, but because it's a single header library I feel that's not so bad (?)

NAudio - GitHub, nuget, Managed, MIT license

Has some DSP features like high-level APIs do, but no sound source abstraction, thus low-level.

High-level APIs

I'm defining this as APIs that provide some level of abstraction for playing sounds, and 3D spatialization - a listener or multiple listeners with position, rotation and velocity, and the same for 3D sound sources. They all have support for callbacks called by dedicated audio threads, meaning we generally don't have to guess when audio needs to be supplied to the backend.
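As a rough illustration of that callback model, here is a minimal sketch in Python. Everything in it is hypothetical (`make_sine_callback`, `fake_backend_pull`, the buffer sizes); none of it is the API of any library listed here. A real backend would invoke the callback from its own audio thread, whereas this stand-in just calls it in a loop.

```python
import math

SAMPLE_RATE = 44100  # Hz, assumed for illustration


def make_sine_callback(frequency_hz, amplitude=0.25):
    """Return a callback that fills a buffer with a mono sine wave.

    The phase is kept between calls so consecutive buffers join up
    without a discontinuity.
    """
    phase = 0.0
    step = 2.0 * math.pi * frequency_hz / SAMPLE_RATE

    def callback(out_buffer):
        nonlocal phase
        for i in range(len(out_buffer)):
            out_buffer[i] = amplitude * math.sin(phase)
            phase += step
        phase %= 2.0 * math.pi  # keep the phase bounded

    return callback


def fake_backend_pull(callback, frames_per_buffer, buffer_count):
    """Stand-in for a backend's audio thread (hypothetical).

    The backend decides when it needs samples; the engine only fills
    the buffer it is handed.
    """
    rendered = []
    for _ in range(buffer_count):
        buffer = [0.0] * frames_per_buffer
        callback(buffer)
        rendered.extend(buffer)
    return rendered


samples = fake_backend_pull(make_sine_callback(440.0), 256, 4)
```

The point of the design is that the engine never polls or guesses at timing: it supplies exactly as many frames as the backend asks for, whenever it asks.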

FMOD - Website, Docs, Download, Native, Proprietary

Extremely widely used
Extremely powerful
Easy DAW-like interface for artists
Admittedly quite relaxed pricing model, all things considered.

but...

It does cost money, if your game costs money and you earn above a certain amount of revenue.
Not on vcpkg.

OpenAL Soft - Website, Docs, vcpkg, Native, LGPLv2 license

And to be clear it's OpenAL Soft, not OpenAL, which is different. Soft is forked from OpenAL 1.0 from the year 2000, before OpenAL went closed-source and stopped being truly maintained.

Widely used, easier to find information on it

but...

Stagnant 2-decade-old API, with cruft left over from its history. OpenAL 1.0's API targets audio hardware, with support for extensions, but it never took off. OpenAL Soft implements that API with software backends, but that means it also always supports certain extensions.

miniaudio (cont.)

miniaudio has a high-level API as well. All of this stuff is optional. If we want to play audio directly and bypass it entirely, we can.

Built-in decoders for WAV, MP3, and FLAC, with OGG/Vorbis as opt-in. Lets you add your own decoders.
Built-in encoder for WAV.
Built-in flexible node graph-based audio engine for mixing and effect processing, with some built-in effects and noise generators. Lets you make your own effects.
We should be able to access effects directly, meaning we can use them outside of the built-in engine.
Automatic internal resource management - caching of sounds and such.
- Loads sounds on worker threads, doesn't block the audio thread

but...

By itself, its 3D spatialization is very simple. It doesn't even have HRTF, nor environmental occlusion. I suppose this could be an upside, depending on who you are.

SoLoud - Website, Docs, Download, Native, zlib license

Built-in decoders for OGG/Vorbis, WAV, MP3 and FLAC
More effects than I've seen in the other high-level APIs.
Good documentation for how it actually works. Even if we don't use it, there's something we can learn from it.

but...

Not on vcpkg

and also

It quite probably isn't as fast. As of this writing, it has limited specialized SSE optimizations. It contains no hand-written assembly.

I'm guessing most of the others are largely the same though? So I'm going to ignore this.

SDL_mixer - Website, Docs, vcpkg, Native, zlib license

The spatialization is... let's say, sub-par. Its Mix_SetPosition() function only takes a single angle (I assume between left/right stereo channels) and a distance, neither of them being floats, and in the documentation for it they even say:

If you need more precise positional audio, consider using OpenAL for spatialized effects instead of SDL_mixer. This is only meant to be a basic effect for simple "3D" games.

This library is worth mentioning because it's talked about often enough elsewhere, but it's clearly already off the table.

DSPs

Probably necessary if we don't go with FMOD, as even the DSPs included in the high-level libraries don't have some of the effects we'd want.

Quick glossary:

Online processing: Real-time processing of audio samples
Offline processing: By contrast, processing that is done ahead-of-time.

We want online processing.
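To make the distinction concrete, here is a minimal sketch, assuming a simple one-pole low-pass filter as the effect (the class name and parameters are made up for illustration). The only thing online processing needs beyond the offline version is state that survives between blocks.

```python
class OnePoleLowPass:
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]), state kept across blocks."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.last_output = 0.0

    def process_block(self, block):
        out = []
        y = self.last_output
        for x in block:
            y = y + self.alpha * (x - y)
            out.append(y)
        self.last_output = y  # remembered for the next block
        return out


# A short square wave as test input.
signal = [1.0 if (n // 8) % 2 == 0 else -1.0 for n in range(64)]

# Offline: the whole signal is available up front.
offline = OnePoleLowPass(0.2).process_block(signal)

# Online: samples arrive in small blocks, as they would from an audio callback.
online_filter = OnePoleLowPass(0.2)
online = []
for start in range(0, len(signal), 16):
    online.extend(online_filter.process_block(signal[start:start + 16]))
```

Both paths produce the same samples here; what differs is that the online path never needs more than one block in memory and can run while the sound is playing.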

NWaves - GitHub, Docs, nuget, Managed, MIT license

I have experience! I know it does the job.
Simple API

but...

Not optimised with SIMD instructions

KFR - Website, Docs, vcpkg, Native, GPLv2 license

Well-documented, widely used.
Optimised with SIMD

but...

Has kind of an elaborate template system, seemingly with a focus on offline processing. It's not clear to me yet how one would do online processing.
Odd pricing model that makes it free only for open-source. I suppose that works for us though, as we're currently AGPLv3...

Madronalib - GitHub, Native, MIT license

Seems to have an extensive set of effects, noise generators, etc.
Optimised with SIMD
Simple enough API from the looks of it?

but...

Basically no documentation
Not on vcpkg

Other

Steam Audio - Website, Docs, GitHub Releases, Native, Proprietary license

Its announcement is a good overview, but here's an even quicker summary:

Within the realm of game audio, this could be considered middleware. It does not play audio for us. It's basically a big box of effects, with its focus being on 3D spatialization, including HRTF, reverb and reflections, occlusion, etc.

The Steam Audio SDK is available free of charge, for use by teams of any size, without any royalty requirements.
Can be integrated with other audio libraries like FMOD and miniaudio.

but...

No support for listener/source velocity, so no support for Doppler? I mean, it's actually a small thing, not too hard to implement if we go real low-level, but still weird.
Not available on vcpkg
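For reference, a Doppler pitch ratio could be computed by hand along these lines. This is a minimal sketch using the textbook formula, with velocities projected onto the source-to-listener direction. The function name, the clamping factor, and the speed-of-sound constant are assumptions for illustration and are not part of Steam Audio.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def doppler_pitch_ratio(source_pos, source_vel, listener_pos, listener_vel,
                        speed_of_sound=SPEED_OF_SOUND):
    """Return the factor by which the source's pitch should be scaled."""
    to_listener = [l - s for l, s in zip(listener_pos, source_pos)]
    distance = math.sqrt(dot(to_listener, to_listener))
    if distance == 0.0:
        return 1.0  # direction undefined, so leave the pitch alone
    direction = [c / distance for c in to_listener]

    # Velocity components along the source-to-listener direction.
    v_source = dot(source_vel, direction)
    v_listener = dot(listener_vel, direction)

    # Clamp below the speed of sound so the ratio stays finite and positive.
    limit = 0.99 * speed_of_sound
    v_source = max(-limit, min(limit, v_source))
    v_listener = max(-limit, min(limit, v_listener))

    return (speed_of_sound - v_listener) / (speed_of_sound - v_source)


# Source at the origin, listener 10 m away along +x.
approaching = doppler_pitch_ratio((0, 0, 0), (34.3, 0, 0), (10, 0, 0), (0, 0, 0))
receding = doppler_pitch_ratio((0, 0, 0), (-34.3, 0, 0), (10, 0, 0), (0, 0, 0))
```

A source moving toward the listener gives a ratio above 1 (higher pitch) and one moving away gives a ratio below 1. The ratio would then drive a resampler or playback-rate control on the source.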

Standalone codec libraries

These are the same ones used by miniaudio and SoLoud.

https://github.com/mackron/dr_libs, Public domain / MIT license
- WAV
- MP3
- FLAC
https://github.com/nothings/stb, Public domain / MIT license
- OGG/Vorbis

Summary & Thoughts

I feel there's a lot to think about here, hence why I'm putting this in a discussion and not an issue.

So, uh... what should we use?

If we don't care about it costing money for commercial projects, and we don't care about it maaaybe being overkill, I mean, wow, FMOD. It would take care of just... everything, and it can be paired with Steam Audio. It does mean that we wouldn't have a strictly VMix-like system, but it has equivalent features, so who cares.

Barring that, miniaudio with its built-in node graph system looks quite promising, although I don't know how difficult it would be to make a system around it that allows things to be changed on the fly. Worth looking into.

The low-level APIs all have slight tradeoffs in design and platform support, but they're generally all okay. If we don't go for anything else and we intend to roll our own audio engine, I say we try out libsoundio due to its self-purported resilience.

Why the bias towards native libraries?

For simple sound, managed libraries will work fine, but I'm concerned about how performance will scale. Think about it: tens or perhaps even over a hundred sound sources in-game, at a sample rate of 44,100 Hz, each with its own effects applied and mixed. I mean, just 30 simultaneous mono sounds work out to 1,323,000 samples per second. Perhaps it's unfounded, but I start getting worried once we get into the range of millions of samples per second in managed code, especially once we involve VMix-like stuff, which will have some overhead that scales with how much processing it does.
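The arithmetic behind that estimate, as a small sketch. The helper names, the channel count, and the 512-frame buffer size are assumptions for illustration, not decisions made anywhere in this thread.

```python
SAMPLE_RATE_HZ = 44100


def samples_per_second(sources, sample_rate_hz=SAMPLE_RATE_HZ, channels=1):
    """Samples that must be produced each second before any mixing."""
    return sources * sample_rate_hz * channels


def buffer_period_ms(frames_per_buffer, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time available to fill one output buffer before it is due."""
    return 1000.0 * frames_per_buffer / sample_rate_hz


mono_30 = samples_per_second(30)                  # 1,323,000
mono_100 = samples_per_second(100)                # 4,410,000
stereo_100 = samples_per_second(100, channels=2)  # 8,820,000
budget_ms = buffer_period_ms(512)                 # roughly 11.6 ms
```

The second number is arguably the more important one: whatever the total throughput, all per-source effects and mixing for a buffer have to finish within that buffer's period, every time, or the output glitches.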

Despite the extra complexity, I'd personally prefer to do stuff in native code. I think doing that has a chance of letting us throw more crap at it before it chokes - we can set higher limits on simultaneous sound events.

What about proprietary licenses?

I think distributing closed-source binaries alongside AGPLv3-licensed software (like Mocha) is allowed so long as it's dynamically linked (as is the case for FMOD and Steam Audio), but I suppose more research should be done here.

How would a VMix-like system work?

This assumes no FMOD, as that library would take care of equivalent things for us entirely.

We want something like VMix, but it's actually just one piece of the puzzle. Source 2's audio engine as a whole can be summarised as being done in five stages:

Sound events
- Each sound event has associated sound files, and chooses one sound stack
- Each sound event sets parameters made available by its chosen sound stack
Sound stacks
- From the outside, they're like shaders but for sound / sound events. They are per-sound.
- They define parameters as input, to be set by sound events' presets and likely also settable by game code.
- Each stack is a node graph in its own right, just like VMix layers. Each node within it uses either the input parameters for the stack as a whole, or outputs from other nodes.
- Final outputs target specific 'tracks'. For example: weapon fire sounds, voiceover, ambient, reverb, music, etc. I believe it is valid to have multiple outputs targeting different tracks, considering the 'reverb' track in Half-Life: Alyx.
- Common effects include:
  - 3D spatialization for sound sources
Track mixing
- Grabs all the outputs of sound stacks and mixes matching output tracks down to single sets of samples.
- This is the transition point from per-sound processing to post-processing.
- Not configurable as far as I can tell, though I'm not sure what you would configure exactly.
VMix
- VMix is split up into multiple layers for the convenience of artists; it helps with organization.
- Each layer defines parameters as input to be manipulated by game code, for use on any layer.
- They are not strictly related to tracks in any way. It is not uncommon to have multiple track inputs on the same layer.
- Each layer is a node graph.
- Each layer has a single output.
- Common effects include:
  - Crossfades
  - Delays
  - Distortion
  - Compression
  - Etc.
Final mixing
- Mixes together all of the outputs from each VMix layer into a final composite to be heard by the player.
- Has some sort of limiter on it to prevent peaking / clipping.
- Not configurable.
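Structurally, the five stages above could be wired together roughly like this. It is a minimal sketch with made-up names throughout (`gain_stack`, `dry_plus_reverb_stack`, `run_layer`, the track names); real stacks and layers would be node graphs rather than single functions, and the hard clip at the end is only a stand-in for a proper limiter.

```python
from collections import defaultdict

BLOCK = 4  # frames per block, kept tiny for illustration


def mix(buffers):
    """Sum equally sized sample buffers element-wise."""
    out = [0.0] * BLOCK
    for buf in buffers:
        for i, s in enumerate(buf):
            out[i] += s
    return out


def gain_stack(samples, params):
    """Hypothetical sound stack: one gain node, one output track."""
    return [(params["track"], [s * params["gain"] for s in samples])]


def dry_plus_reverb_stack(samples, params):
    """Hypothetical stack with two outputs targeting different tracks."""
    return [
        (params["track"], list(samples)),
        ("reverb", [s * params["reverb_send"] for s in samples]),
    ]


def mix_tracks(stack_outputs):
    """Track mixing: mix matching output tracks down to one buffer each."""
    by_track = defaultdict(list)
    for track, samples in stack_outputs:
        by_track[track].append(samples)
    return {track: mix(bufs) for track, bufs in by_track.items()}


def run_layer(tracks, inputs, gain):
    """VMix-like layer: several track inputs, a single output."""
    mixed = mix([tracks[name] for name in inputs if name in tracks])
    return [s * gain for s in mixed]


def final_mix(layer_outputs, ceiling=1.0):
    """Final mixing: sum the layers, then hard-clip as a stand-in limiter."""
    return [max(-ceiling, min(ceiling, s)) for s in mix(layer_outputs)]


# Sound events: each picks a stack and sets that stack's parameters.
events = [
    (gain_stack, [0.5] * BLOCK, {"track": "weapons", "gain": 0.5}),
    (dry_plus_reverb_stack, [0.8] * BLOCK, {"track": "voice", "reverb_send": 0.25}),
    (gain_stack, [1.0] * BLOCK, {"track": "weapons", "gain": 1.0}),
]

# Sound stacks: per-sound processing, each output tagged with a track.
stack_outputs = []
for stack, samples, params in events:
    stack_outputs.extend(stack(samples, params))

tracks = mix_tracks(stack_outputs)
layers = [
    run_layer(tracks, ["weapons", "voice"], gain=1.0),
    run_layer(tracks, ["reverb"], gain=0.5),
]
output = final_mix(layers)
```

In this example the two weapon sounds land on the same track and are summed before any layer sees them, which is the per-sound to post-processing transition described above. The summed layers exceed the ceiling, so the final stage clamps them.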

How would Steam Audio integrate into this?

In Source 2, Steam Audio nodes are available in sound stacks and VMix. This makes a lot of sense with the way Steam Audio is designed, and I don't see why we'd do it any differently.

Woah, this is complicated.

Yeah.

From an engine developer perspective, I think FMOD and Source 2's audio engine are both really nice and things we should at least partially emulate. They're elegant solutions to a complicated problem, and from my experience a Source 2-like system isn't that hard to implement, structurally speaking. If we do it in native code, we may need to implement some basic form of C++ reflection.

From an end user perspective, this is slightly terrifying, and we should provide sane defaults so nobody has to touch it unless they need or want to. Unlike some platforms though, we should still expose it.

Am I overthinking/overengineering this?

Maybe. Definitely. I don't know. What do you guys think?

xezno · 2023-01-29T12:36:59Z

Maintainer

So ideally what we want when we're introducing new systems is to make sure that those new systems don't introduce any licensing dependencies / fees / etc. and we want those systems to cover as many cases as possible.

That being said, I'm going to go through these and apply some heuristics to this list so we know what we're looking for.
Ideally we want a library with a good FOSS license, we want to use something native for performance, we want something that's updated regularly, and we want something that is documented well enough that integrating it won't be a hassle.

Library	FOSS License	Native Library	Regularly Updated	Well Documented
SDL_audio.h	zlib	✅	❌	✅
PortAudio	MIT	✅	✅	✅
libsoundio	MIT	✅	✅	✅
miniaudio	Public domain / MIT	✅	✅	✅
NAudio	MIT	❌	✅	✅
FMOD	Proprietary	✅	✅	✅
OpenAL Soft	LGPLv2	✅	✅	✅
SoLoud	zlib	✅	✅	✅
SDL_mixer	zlib	✅	❌	✅
NWaves	MIT	❌	✅	✅
KFR	GPLv2	✅	✅	✅
Madronalib	MIT	✅	✅	❌
Steam Audio	Proprietary	✅	✅	✅

So the ideal libraries here are ones with everything checked, as well as a good FOSS license. That leaves miniaudio, libsoundio, PortAudio, OpenAL Soft, SoLoud and KFR. NWaves and Madronalib also have good licenses, but each misses one column (native and documentation, respectively).
Considering miniaudio has its own integration with Steam Audio, and the processing SA does is what we want, I would say that's the best library here for our uses.

That being said, FMOD already does all of the work. However, I would rather not introduce "license baggage" - if we add FMOD and someone wants to use the engine and sell their game, they would then have to worry about paying FMOD's license, which is something I want to avoid if possible.

Perhaps in future we could separate the audio system out into two subsystems, one for miniaudio and one for FMOD - that way people that want to use FMOD's workflow can do so, and people that want to make a game without worrying about additional license fees can do so as well.

1 reply

MuffinTastic Jan 29, 2023
Author

... Considering miniaudio has its own integration with Steam Audio ...

To be pedantic, all Steam Audio does is take in some audio samples, mess with em a bit and give them back to you. All integration really amounts to is figuring out where to intercept samples in the audio engine. If you're making an engine from scratch, that's easy. The hard part is... making the rest of the engine.

miniaudio has a lot of flexibility though, and seemingly does most of the work for us, so I'll be taking a closer look at what we can do with it today.
