-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic Interfaces #3
Comments
For completeness, I am repeating part of the dialog with @mewmew from his pull request: The main questions to resolve for it are:
If we provide runtime or OS wrappers, then the list of codecs available The non-uniformity would in turn mean that we'd have to be able to identify The idea I had in mind with Manufacturer was to help identify the codec, ie The idea I had in mind with Extensions is filename extensions. I see you are an author of a Go FLAC package, which is great! Do you have |
I have a proof of concept decoder ready to be reviewed for FLAC :) I did not really make use of the generic codec interface though. What are the benefits of doing so? I.e. what are the intended consumers of this interface? Note: The proof of concept flac zikichombo frontend contains some bug which makes the playback of music insert some incorrect samples once every second or something. This is probably due to an off-by-one error or an incorrect calculation of the stride between channels in the dst slide of
To be honest, I dislike this idea with a passion. It feels like the way FLAC tried to do with vendor IDs, and its a mess to have a central register around it, both in maintenance and prevents open source repos to easily provide implementations as you have to register the ID to avoid collisions. To communicate the feeling better, imagine trying to use a library in C, as compared to Go. In Go the package has a fully qualified import path with by convention contains the domain name for packages released to the public. In contrast, in C you have to make sure that library names don't collide, which is more or less impossible in the every dynamic nature of open source development. To handle collisions, Go relies on DNS which is already distributed in nature and handle collision by unique domain name registration. That being said, I don't know exactly what the solution may look like. Only that I would veto having a Manufature/vendor ID that requires a global register. Haha, given that I have any veto power in this issue of course :p |
As noted in zikichombo#3 (comment) there are some bugs to be ironed out.
My brother and I developed the flac library together. Inviting him to the discussion :) cc: @karlek |
Great! Thanks First, I guess we can add int32 versions of ToFixed/etc in sound/sample.
Second, for the purpose of the generic interface: The main consumer is a developer working on any aspect of sound not related to encodings: playback, music information retrieval, voip, etc. By proxy, their end users are also consumers. Many of these usages would like to treat the codecs as a black box, at least as much as possible. Developers would like to be able to respond to the question of what codecs they support. The generic codec package is intended to allow a developer to just say "we support [...]" with [...] being the list of codecs supported here. As you have observed, the generic codec interface is already halfway done: just implement sound.{Source,Sink,RadomAccess} and it plugs in nicely, without even needing to reference zikichombo.org . thanks to Go's interfaces. So one option is to do just that. The umbrella-level part is about
In order to at once enable more use of existing Go codec implementations and guarantee IP and quality, zikichombo vets upstream suppliers of codecs for inclusion in the codebase. In order to to have import "xyz" with "xyz" not in the standard library or based at zikichombo.org. we need an agreement on release guidelines and guarantees of upstream suppliers, so that we know how to incorporate the code safely for our users, and so that users can know that all dependencies of zikichombo.org code are license compatible. This aspect is not so much related to "manufacturer" in the API, which is intended more for the runtime codec selection mentioned above. We do not want to prevent or discourage distributed open source development, but as we have no choice for what happens when someone does "go get zikichombo.org", we do want to be able to control for quality of the result (as a matter of process, even though we are still in alpha), just as you want to control for what dependencies you use in your flac decoder. Just implementing the Source/Sink/RandomAccess interfaces is one option for distributed development. Maybe we can do something similar for filename extensions as well. That said, I think it would be a great demonstration of cooperation and very useful if users could find a functional, reliable list of supported codecs for doing sound in Go in one place. Zikichombo can provide that, and it would be great to add FLAC to the list. |
I like this design, a lot.
For file name extensions, we could follow the example set by the image library. Something along the lines of image.RegisterFormat. Note, this is the way Azul3D incorporated audio decoding, and it seemed to work quite well. Have a look at audio.RegisterFormat if you haven't already for some inspiration.
Definitely! |
Thanks for the pointers. I think we have more functionality to address than Azul3D's audio.RegisterFormat, most importantly, encoding and random access. In audio, my impression is that there is also more heavy usage and reliance on metadata than images, which can also be read/write. However, I also think it would be best to address decoding+encoding+random access first. It looks like Azul3D's "magic" let's codecs look at the first few bytes of data to say whether or not they volunteer to the task. It doesn't look documented exactly how the codec implementation is selected. Also, I never understood why the Go standard library image package didn't allow to register encoders. So my thoughts after reading your comments are to do something like RegisterFormat, but for encoding, decoding, and random access, and to document the selection mechanism so that somehow it can either be controlled directly, or predicted by developers in such a way that they can manipulate what codec actually gets used. Thoughts? |
Just to clarify, when you talk about random access, do you simply mean seek support? Just checking so I interpret it correctly.
I think this is mostly implemented in the same fashion as the Go image library. Peek on a few bytes to determine file format by magic values. If there are more than one codec for a specific file format, then the first file format to register the magic value will get precedence. At least this is how it works in the Go image library: https://golang.org/src/image/format.go?s=1005:1129#L63
Good point, it only allows to register decoders. My guess is that decoders need less options specified, whereas encoders can be very specific and thus providing a general interface may be bloated and adding too many layers of abstraction. Just a guess though. For instance, if you want to encode an animation (e.g. GIF), you have to provide option to do so in the encoder; but many of the other formats don't have support for animations. Similar comparisons can probably be made for sound codecs. That being said, it would perhaps make sense to provide a general encoding interface, which selects "good defaults" for each codec, and let users use the specific encoders directly if they require more control of the encoding options. I do not think it should be a goal to support everything in the general interface. This can also be a guiding principle when it comes to metadata support. Only let the general interface support the most basic metadata (e.g. key-value pairs), and use specific encoders/decoders for more exotic metadata. Well, these are only preliminary thoughts and they may change as we explore the problem and solution space more. |
As noted in zikichombo#3 (comment) there are some bugs to be ironed out.
Thanks for your thoughts. First, for random access, the sound.RandomAccess interface is what I had in mind; not Second, after reading your thoughts, I agree that limiting the functionality to a bare minimum at the registration level is desirable, and let direct interaction with individual codecs deal with everything else. For me, bare minimum would however include functionality for all of sound.{Source,Sink,RandomAccess}. Obviously RandomAccess would either be unavailable or of limited efficiency for most compressed formats. Unfortunately, there are more pure Go decoders than encoders. I would go ahead and fork a bare bones proposal, but I'm still thinking about if/how easy control of selection can be done, hopefully obviating the need for the developer to deal with package initialisation order. |
For info, I have created a "bare bones" pull request at #11 Upon coding it, it occurred to me that the role of the sample.Codec in the interface is important. Some codecs implicitly require or rely on the notion of a sample.Codec, and others, mostly related to perceptual based compression do not really have a notion of sample.Codec as the bitDepth varies during the encoding/decoding process and according to frequency range. This is handled in the proposed generic interface, and was not until now part of the discussion. The import comment was removed just so I could compile in a wsc1 fork, it is not part of the proposal. Thoughts appreciated. |
To keep discussion mostly in one place, I'll add the comment from #11 (review) @wsc1 Looks like a good start. I've added some comments for minor details. Regarding the overall API, we can only determine if it is good once we have lets say two or three encoders and decoders for various formats (especially for formats which are not similar to each other, e.g. varying sample bit rate in the same sound file). For this reason, I think that the API should be marked experimental, and only finalized once the above condition holds true. At which point we can evaluate if the API is a good match for the various different encoders/decoders. Cheers, |
Thanks. I'm glad you find it to be a good start. To summarise your more detailed comments that I feel can be continued here, there are:
About marking experimental, there is project maintenance overhead associated with making this distinction explicit rather than implicit via the fact it is in alpha and semver. I have addressed As this issue is referenced as a design discussion, I think anyone following would conclude it is experimental. Given all this, could you make more precise what you would like to see in terms of marking it experimental? |
It also occurred to me that it may be worth adding a function member of Codec which I can't think of a corresponding use case for sound.SinkSeeker Thoughts? |
As an update, I simplified the codec selection mechanism so there is no sense of codec-implementor priority, just package path. It now uses reflect and the package path of the encoding/decoding functions of a codec. It
Still working on integrating existing codecs. |
Sounds good to me. Let sem ver cover this. |
New PR #12 Looking for community opinions. |
Proposal: rename Codec to Godec just an idea. |
Why? Codec makes more sense to me at least. |
maybe it's too simple to merit being called a codec :) |
Ah. Well package encoding only provides interfaces for encoding implementations, but does not implement an encoding. In that sense, |
Yes, my feeling is that the main difference with sound is that sound, to me, is inherently flowing and not so easily capturable by an object/type per se. Not only in that it is stream-like but also in the physical/perceptual sense -- a block of samples is not really what a sound is in this sense because, for example, you will hear a click when it starts or stops unless it has fade in/out. So encoding to blocks of bytes, as in the encoding package, feels awkward to me as a high level functionality to provide for sound. |
As noted in zikichombo/codec#3 (comment) there are some bugs to be ironed out.
I think the package selection mechanism would be better (and this one may be buggy because of pointers and types) if done like in sio, where host.Entry is an interface, and reflect.TypeOf gives the implementing type. Should be fixed in this PR |
We need to settle in on generic codec interfaces.
In particular, do we want to do golang/image style compile time plugins or something more dynamic? Both?
Also, we need to iron out random access.
The text was updated successfully, but these errors were encountered: