Alternative? #81

Open
msqr1 opened this issue Apr 12, 2024 · 23 comments

@msqr1

msqr1 commented Apr 12, 2024

This is not really an issue, but I remade the repo from scratch using newer web technology and features: https://github.com/msqr1/Vosklet. Can I merge some changes from there into this repo? There is a lot to improve, as this is getting outdated. @ccoreilly

@ccoreilly
Owner

Hi @msqr1! Great initiative :) Software evolves and needs to be maintained. I do not have time to dedicate to this repository, so it is good that better alternatives emerge and gain traction.

I'll have a deeper look at your work later this week. In the end, users decide based on the developer experience and the features of these libraries, so I'd be interested in what other users like @Yahweasel or @erikh2000 think.

@Yahweasel
Contributor

The core thing I need out of vosk-browser is to not have an AudioContext-level API. I do all of my own audio capturing and ten other layers of processing. Further, although in my own project I do use threads, so SharedArrayBuffer is a nonissue, it's valuable to have a version that runs synchronously, because some users (including myself) manage their own threads. I would rather have a vosk running synchronously with a Worker thread I created on my own than running asynchronously with a Worker thread created by a library. To excessively toot my own horn, my own libav.js allows the user to load it in a synchronous mode, a worker mode, or a threaded mode, and provides the same API in all three.

Basically: I wouldn't mind a more up-to-date vosk adapter, but as it stands, your API is too opinionated for me.

@msqr1
Author

msqr1 commented Apr 13, 2024

You're right, I'm trying to make this as easy to use as possible: just some minimal setup and you can start recognizing. I agree that more features should be added, but as this is the first version, I want to make it as fast and easy to set up as possible. Other use cases can be addressed later.

@erikh2000

@msqr1 I'm interested in your project, but I'm likely to stick with vosk-browser out of inertia and not having any complaints with it. The main thing I saw in Vosklet that I'd like to see in vosk-browser, if practical, is more of the Vosk functions exposed. I had told myself that at some point I'd get vosk-browser building and try to contribute that myself, but I never got around to it.

The faster processing time is intriguing too. What kind of metrics are you seeing?

@msqr1
Author

msqr1 commented Apr 13, 2024

I didn't really measure it, ngl, so maybe I should remove that line. But I moved hot computations to C++ (like free, mapping input data), I use a simpler mechanism to communicate between JS and C++, I used the faster new Emscripten WasmFS and the new emmalloc, and I turned on O3, LTO, SIMD, non-trapping float-to-int, and many more... As such, I think it should be faster. You're right, I shouldn't claim anything without benchmarks.

@erikh2000

No worries, @msqr1. I don't expect you to be super-scientific in your claims. I was just curious about what kind of speed increase you might be seeing. Your changes for performance seem promising.

@Yahweasel
Contributor

FYI, simd will do not a damned thing (other than make it not work on Safari) unless the code is specifically written to use it. wasm simd is broadly compatible with x86 simd, but only the C API, and nobody uses the C API. I would be stunned to learn that that's gaining you anything. I had a simd version of libav.js for years and finally ditched it because it wasn't actually beneficial.

@msqr1
Author

msqr1 commented Apr 15, 2024

Well, the thing is kaldi just refuses to compile with simd off, so I have to turn it on. It may or may not do anything though.

@Yahweasel
Contributor

Oh, well that's just lovely X-D

@msqr1
Author

msqr1 commented Apr 15, 2024

Just curious, how do you use a speech recognition library with your libav project? Isn't that for audio formats?

@Yahweasel
Contributor

I do not. I use both in Ennuicastr.

@msqr1
Author

msqr1 commented Apr 19, 2024

I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

@Yahweasel
Contributor

> I can make a sync version, I just don't know how it is possible. If you block the current thread to recognize, how do you stop it? Synchronous model and recognizer loading should be easy. I'm not sure about the recognizer loop.

We're on an issue submitted to a synchronous version of the same API ;)

@msqr1
Author

msqr1 commented Apr 19, 2024

I can't see how the recognizer is synchronous. It can't be blocking the one thread that is controlling it.
Can I take a look at the issue? Maybe there is something I can do. Keep in mind that even if the recognizer is asynchronous, you can bind event listeners to it and call setXXX on it synchronously. The only synchronous part is the recognition process itself:

@Yahweasel
Contributor

The API of Vosk just takes a chunk at a time. That API is synchronous.

@msqr1
Author

msqr1 commented Apr 19, 2024

I get it, but wouldn't that block it from other actions? I can surely add acceptWaveformSync(), which would recognize (blocking) on the same thread and return the result. Will that fit your use case? Ngl, a fully synchronous API is even easier than the current one. I'd only need to translate it over without managing task queues and other stuff.

@Yahweasel
Contributor

My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats. For some people, that's presumably very useful. For me, that's actively unhelpful.

Also, to be clear: you should not be writing your code to fit my use case if that doesn't help you in any way. I'm perfectly happy with vosk-browser, and have no urgent need for a more updated version, though as a general principle I'd like for things to be up to date. I'm only presenting my case on this thread because I was asked to.
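
For readers following along, the synchronous chain described above could be sketched roughly as below. This is only an illustration under assumptions: `Stage`, `ChunkRecognizer`, `runPipeline`, `processChunk`, `halveGain`, and `CountingRecognizer` are hypothetical names, the recognizer interface merely mirrors the shape of the `acceptWaveformFloat` call mentioned in this thread, and nothing here is the actual vosk-browser or Ennuicastr code.

```typescript
// Each processing step takes raw Float32Array audio in and returns raw Float32Array audio out.
type Stage = (chunk: Float32Array) => Float32Array;

// Hypothetical minimal recognizer shape: it just accepts one chunk at a time, synchronously.
interface ChunkRecognizer {
  acceptWaveformFloat(chunk: Float32Array, sampleRate: number): void;
}

// Run every stage in order on the calling thread; no Workers or queues are created here.
function runPipeline(stages: Stage[], chunk: Float32Array): Float32Array {
  return stages.reduce((audio, stage) => stage(audio), chunk);
}

// Process one chunk and hand the result to the recognizer, all synchronously.
function processChunk(
  stages: Stage[],
  recognizer: ChunkRecognizer,
  chunk: Float32Array,
  sampleRate: number
): Float32Array {
  const processed = runPipeline(stages, chunk);
  recognizer.acceptWaveformFloat(processed, sampleRate);
  return processed;
}

// Placeholder stage standing in for real steps such as noise suppression.
const halveGain: Stage = (chunk) => chunk.map((sample) => sample * 0.5);

// Stand-in recognizer that only counts the samples it was given.
class CountingRecognizer implements ChunkRecognizer {
  samplesSeen = 0;
  acceptWaveformFloat(chunk: Float32Array, _sampleRate: number): void {
    this.samplesSeen += chunk.length;
  }
}
```

The caller decides which thread `processChunk` runs on (for example, a Worker it created itself), which is the point being made: the library does recognition only, and threading stays with the application.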

@msqr1
Author

msqr1 commented Apr 19, 2024

> My case is that I have vosk-browser loaded in a Worker thread which is also responsible for echo cancellation, noise suppression, audio metrics, and encoding. Each of these steps takes raw Float32Array audio in and spits raw Float32Array audio out, and I want them all to be synchronous because I'm managing all the threading myself. What I mean when I say that your API is opinionated is that it's doing more than just vosk: it's handling capture, it's handling threading, it's handling formats.

No, I just wanted to find out how you use it, because I wanted to see which use cases would need a synchronous vosk, so thanks for the information! The above really helped me learn!

@Yahweasel
Contributor

I can be totally precise: https://github.com/ennuicastr/ennuicastr/blob/3b3830fc979b039c245429a5ec7657594af4a705/awp/ennuicastr-worker.ts#L786

There's my call to acceptWaveformFloat :)

@msqr1
Author

msqr1 commented Apr 19, 2024

I completely understand it now :)))))))

@msqr1
Author

msqr1 commented Apr 22, 2024

@ccoreilly did you go over it?

@Utopiah

Utopiah commented Jun 16, 2024

FWIW I'd also be interested in an "updated" alternative that is actively maintained. Yet I would need to better understand how the alternative differs.

If it is entirely compatible, e.g

  • does NOT require anything new (i.e. it works in at least all the same contexts, e.g. browsers, and hosting servers wouldn't need to support SharedArrayBuffer and thus a secure context, etc.),
  • no new tooling to build,
  • uses exactly the same interface to program with it,

even without providing any improvement, I would probably be interested.

Yet if it does have any trade-offs, e.g. it breaks compatibility with some contexts (older browsers, Chromium only, etc.), then IMHO they should be made explicit.

PS: to clarify, even though https://github.com/ccoreilly/vosk-browser/tree/master/examples/modern-vanilla is 2 years old, it works for me even in rather "exotic" contexts, e.g. the Oculus browser for WebXR.

@msqr1
Author

msqr1 commented Jun 16, 2024

I have Vosklet, which I made as an alternative. You may want to check it out, @Utopiah! It does need SABs, though. I could make it SAB-less, but I think that is just too much work.
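
Since the SAB requirement is the trade-off in question, here is a small sketch of how a page might check up front whether SharedArrayBuffer is usable. In browsers this generally means the page is cross-origin isolated, which the hosting server enables by sending the `Cross-Origin-Opener-Policy` and `Cross-Origin-Embedder-Policy` headers. The names `IsolationScope` and `canUseSharedMemory` are made up for illustration and are not part of Vosklet or vosk-browser.

```typescript
// The subset of the global scope this check reads. It is passed in as a
// parameter so the check can also run outside a browser.
interface IsolationScope {
  SharedArrayBuffer?: unknown;
  crossOriginIsolated?: boolean;
}

// True only when the SharedArrayBuffer constructor is exposed and the
// context reports that it is cross-origin isolated.
function canUseSharedMemory(scope: IsolationScope): boolean {
  return (
    typeof scope.SharedArrayBuffer === "function" &&
    scope.crossOriginIsolated === true
  );
}
```

In a page or worker this might be called as `canUseSharedMemory(globalThis as unknown as IsolationScope)`, falling back to a SAB-free library such as vosk-browser when it returns `false`.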
