Comparison with waylonflinn/weblas #126
I'm the author of the linked library. I'd also be interested in seeing a detailed comparison. Here's a (very hastily done) comparison for matrix multiply (gemm) on a 512x512 matrix. Time is given in milliseconds.
This is a very interesting library with a lot of flexibility. I'd love to see how performance compares on this benchmark across a range of matrix sizes.
Some factors to consider:
Firefox, for example, runs about 120-130 times faster than CPU mode when using textures. Another factor we are considering adding as an optional setting is floating point precision. Currently we use 32-bit floating point precision; we could allow lower precision. In neural nets, for example, this can be reduced to, say, 16 bit or even 8 bit, and the net can compensate for it (imagine looking at a blurry picture of yourself and still knowing it is you), which (I'm not a mathematician) should be about an order of magnitude faster. We are working very hard to get v1 finished up. I've got a job and family, and was only recently added to the team, but we are making great progress (before work, during breaks, during lunch, etc.).
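To make the reduced-precision idea concrete, here is a minimal plain-JS sketch (not gpu.js code; `quantize8` and `dequantize8` are hypothetical helper names) of quantizing 32-bit weights down to 8 bits and back. The worst-case rounding error is half a quantization step, which a net can often absorb:

```javascript
// Sketch only: linear 8-bit quantization of float weights.
function quantize8(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255; // one quantization step
  const q = Uint8Array.from(weights, w => Math.round((w - min) / scale));
  return { q, min, scale };
}

function dequantize8({ q, min, scale }) {
  return Float32Array.from(q, b => b * scale + min);
}

const w = [0.12, -0.7, 0.33, 0.98, -0.05];
const restored = dequantize8(quantize8(w));
// Rounding to the nearest step bounds the error by scale / 2.
const maxErr = Math.max(...w.map((v, i) => Math.abs(v - restored[i])));
```

Storing four weights per 32-bit texel (instead of one) is where the bandwidth win would come from on the GPU side.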
Hi @waylonflinn, thanks for the quick matchup! I'm a big fan of the BLAS and LAPACK libraries, so I found reading your GLSL code really eye opening. Looks like gpu.js has its work cut out for it. Currently, even with a theoretical 85ms / 4 = 21.25ms speedup from a vectorizing SIMD compiler (which doesn't exist yet), it looks like we won't even scratch weblas's timings! Do you mind if we borrow your encode_float for an alternative fast implementation? It seems useful to have this as a configurable option.
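For readers unfamiliar with the trick: here is a rough plain-JS illustration (not weblas's actual GLSL `encode_float`) of what such an encode pass accomplishes, namely packing an IEEE 754 single into four bytes so the value can survive an RGBA/UNSIGNED_BYTE `readPixels()` on GPUs without float readback:

```javascript
// Sketch only: pack a float32 into 4 bytes (one "RGBA" texel) and back.
function encodeFloat(x) {
  const buf = new ArrayBuffer(4);
  new DataView(buf).setFloat32(0, x, true); // little-endian
  return new Uint8Array(buf); // the 4 "RGBA" channel bytes
}

function decodeFloat(bytes) {
  return new DataView(bytes.buffer).getFloat32(0, true);
}

const bytes = encodeFloat(3.14159);
const back = decodeFloat(bytes); // recovers the float32-rounded value
```

The GLSL version has to reconstruct the sign, exponent, and mantissa arithmetically per fragment, which is where the GPU-specific rounding headaches discussed below come from.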
@waylonflinn I totally missed that your library is GPU based, lol. Been a fun morning. Very interesting though!
@waylonflinn looking at http://waylonflinn.github.io/DeepBeliefSDK/, one word: fantastic. It has been a dream of mine for some time to see a convolutional neural net like this in the browser/JS.
@fuzzie360 please feel free to use
@robertleeplummerjr You might also be interested in: https://github.com/transcranial/keras-js I've been collaborating with the author to make full use of
very cool
@waylonflinn I've encountered the numerical stability issues myself and worked around them: https://github.com/gpujs/gpu.js/blob/develop/src/backend/web-gl/shader-frag.js#L45-L82 I've verified it works on notorious GPUs like the Intel HD 2000. I've actually been thinking of moving away from my safe implementation and back to the unsafe one, detecting the special rounding characteristics of the GPU and choosing the correct implementation accordingly.
@fuzzie360 very nice! I'm hoping that universal support for floating point textures in WebGL 2.0 will remove the need for the float encode altogether.
@waylonflinn do you have the specific source for your benchmark here: #126 (comment)?
To answer the original question from @mike1808: not yet, as we are still mostly in alpha. The libraries you mention are quite narrow in what they solve, whereas gpu.js is very open ended.
@robertleeplummerjr I ran the benchmarks this morning on my personal development machine. You can replicate this with the command
I was able to find and fix a flaw in our compilation that gives us a 300% boost over previous benchmarks. @waylonflinn We're coming for you!
Yoohoo, @waylonflinn... #206 (comment) Jaws theme plays...
Note: the comparison here isn't really fair, as it is showing off texture mode, which is like pipeline mode in weblas (which I would love to see the numbers on, and totally expect to be faster than gpu.js), but look at those numbers! 512 x 512 matrix multiplication:
Landed in dev today, fyi. |
Very much fast. I'm still working out how to do reliable benchmarks in pipeline mode for weblas. Every time I do it I get results that seem impossibly fast. I'll try to work something up and post it here for comparison.
@waylonflinn I very much look forward to it, and possibly collaborating in the future!
Hi @robertleeplummerjr, sorry it's been a long time since I last checked in. But I need to say this: you really cannot use benchmark.js to time texture mode, as it is not a fair representation. This is what is being timed by benchmark.js in texture mode:
What you really want to time is this, if you don't want to take into account the time taken to retrieve the data back to the CPU:
There is really no way to do it with a single kernel launch (e.g. a single matrix multiplication). To measure real-world texture mode performance you need to do something like raising a matrix to a power (multiple matrix multiplications in texture mode, then read the result back).
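The matrix-power methodology above can be sketched in plain JS (a CPU stand-in, not the gpu.js or weblas API): chain several multiplications and read the result back only once, so the one-off transfer cost is amortized across many kernel launches.

```javascript
// Sketch only: naive row-major n x n multiply, used to illustrate
// timing a chain of multiplies with a single final "readback".
function matMul(a, b, n) {
  const c = new Float32Array(n * n);
  for (let i = 0; i < n; i++)
    for (let k = 0; k < n; k++) {
      const aik = a[i * n + k];
      for (let j = 0; j < n; j++)
        c[i * n + j] += aik * b[k * n + j];
    }
  return c;
}

function matPow(a, p, n) {
  let r = a;
  // In texture/pipeline mode each intermediate stays on the GPU here.
  for (let i = 1; i < p; i++) r = matMul(r, a, n);
  return r; // single readback at the end
}

const n = 2;
const a = Float32Array.from([1, 1, 0, 1]); // shear matrix
const a4 = matPow(a, 4, n); // [[1, 4], [0, 1]]
```

Timing `matPow` end to end (launch through final readback) divides the fixed transfer overhead across `p` launches, which is much closer to how a real workload like a neural net uses the GPU.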
Yeah, I know it really isn't fair, which I did mention. What I was trying to convey is that it is just really fast. In machine learning, which is where gpu.js fascinates me, once values are on the GPU they don't need to come back to the CPU unless you want to inspect an output or check the error rate.
Nice comparisons!
Hey, you guys really rock with this project! Have you compared the performance of some popular kernels with waylonflinn/weblas? It would be very interesting to see how fast/slow your library is for these kernels: