Comparison with waylonflinn/weblas #126

Closed · mike1808 opened this issue Jul 11, 2017 · 22 comments

@mike1808 commented Jul 11, 2017

Hey, you guys really rock with this project! Did you compare the performance of some popular kernels against waylonflinn/weblas? It would be very interesting to see how fast or slow your library is for these kernels:

  • sscal - Matrix (and Vector) Scale (with addition)
  • sgemm - Matrix Multiply
  • sdwns - Matrix (and Image) Downsample (for Max Pooling)
  • sclmp - Matrix clamp (for ReLU)
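
For reference, a minimal sketch of how a matrix multiply like sgemm can be expressed with gpu.js's documented `createKernel`/`setOutput` API; the fixed 512×512 size and variable names are illustrative assumptions, not thread content:

```js
// Minimal sgemm-style kernel in gpu.js (sketch; 512x512 assumed).
const { GPU } = require('gpu.js');
const gpu = new GPU();

const sgemm = gpu.createKernel(function (a, b) {
  let sum = 0;
  for (let i = 0; i < 512; i++) {
    sum += a[this.thread.y][i] * b[i][this.thread.x];
  }
  return sum;
}).setOutput([512, 512]);

// Usage: const c = sgemm(a, b); where a and b are 512x512 numeric arrays.
```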
@waylonflinn commented Jul 11, 2017

I'm the author of the linked library. I'd also be interested in seeing a detailed comparison.

Here's a (very hastily done) comparison for matrix multiply (gemm) on a 512x512 matrix. Time is given in milliseconds.

| library | time  |
| ------- | ----- |
| gpu.js  | 85 ms |
| weblas  | 14 ms |

Time for gpu.js is from gpu.rocks.

This is a very interesting library with a lot of flexibility. I'd love to see how performance compares on this benchmark across a range of matrix sizes.
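
A minimal sketch of such a sweep, reusing the `gpu` instance from the sketch above; the size list and `randomMatrix` helper are assumptions, and kernels are compiled outside the timed region so only execution is measured:

```js
// Hypothetical benchmark sweep across matrix sizes.
function randomMatrix(n) {
  return Array.from({ length: n }, () =>
    Float32Array.from({ length: n }, () => Math.random()));
}

for (const n of [64, 128, 256, 512, 1024]) {
  const multiply = gpu.createKernel(function (a, b) {
    let sum = 0;
    for (let i = 0; i < this.constants.n; i++) {
      sum += a[this.thread.y][i] * b[i][this.thread.x];
    }
    return sum;
  }, { constants: { n: n }, output: [n, n] });

  const a = randomMatrix(n);
  const b = randomMatrix(n);
  const t0 = performance.now(); // time execution only, not compilation
  multiply(a, b);
  console.log(n + 'x' + n + ': ' + (performance.now() - t0).toFixed(2) + ' ms');
}
```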

@robertleeplummerjr (Member)

Some factors to consider:

  • GPU build time (compilation, kernel generation, etc. take some time)
  • Transferring data to and from the CPU, and converting between an array and a texture, takes additional time
  • The hard truth is that a single matrix transformation will likely not be much more performant, if at all. But if you stack a bunch of matrices, do all their transformations on the GPU, and make a single transfer between the CPU and GPU, you'll see a substantial gain in performance (as sketched below).
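
A hedged sketch of that stacking idea, assuming gpu.js's pipeline option (`pipeline: true`, which returns a texture rather than an array); the kernel bodies and sizes are illustrative:

```js
// Chain two kernels on the GPU; the intermediate never touches the CPU.
const scale = gpu.createKernel(function (m, alpha) {
  return m[this.thread.y][this.thread.x] * alpha;
}, { output: [512, 512], pipeline: true });

const relu = gpu.createKernel(function (m) {
  return Math.max(m[this.thread.y][this.thread.x], 0);
}, { output: [512, 512], pipeline: true });

const tex = scale(input, 0.5); // one upload of the 512x512 host array `input`
const out = relu(tex);         // texture in, texture out
const host = out.toArray();    // the single download at the end
```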

Firefox, for example, runs about 120-130 times faster than CPU mode when using textures.

Another factor we are considering adding as an optional setting is floating-point precision. Currently we have 32-bit floating-point precision; we could allow for lower precision. In neural nets, for example, this can be reduced to, say, 16 bits or even 8 bits, and the net can compensate for it (imagine looking at a blurry picture of yourself and still knowing it is you), which (I'm not a mathematician) should be about an order of magnitude faster.
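
A CPU-side illustration of the reduced-precision idea (not gpu.js code; the linear scheme and function names are assumptions):

```js
// Hypothetical linear 8-bit quantization over a known [min, max] range.
function quantize(f32, min, max) {
  const scale = 255 / (max - min);
  return Uint8Array.from(f32, (v) => Math.round((v - min) * scale));
}

function dequantize(u8, min, max) {
  const scale = (max - min) / 255;
  return Float32Array.from(u8, (v) => v * scale + min);
}

// Roundtrip example: values survive to within (max - min) / 255 ≈ 0.008.
const w = Float32Array.of(-0.9, 0.1, 0.73);
console.log(dequantize(quantize(w, -1, 1), -1, 1));
```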

We are working very hard to get v1 finished up. I've got a job and family, and was only recently added to the team, but we are making great progress (before work, during breaks, during lunch, etc.).

@fuzzie360 (Member)

Hi @waylonflinn, thanks for the quick matchup!

I'm a big fan of the BLAS and LAPACK libraries, so I found reading your GLSL code really eye-opening.

Looks like we at gpu.js have our work cut out for us. Even with the theoretical speedup of a vectorizing SIMD compiler (which doesn't exist yet) bringing 85 ms / 4 = 21.25 ms, it looks like we would not even scratch weblas's timings!

Do you mind if we borrow your `encode_float` for an alternative fast implementation? It seems useful to have this as a configurable option.
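
For context, a sketch of the general technique `encode_float` implements (not weblas's exact code): WebGL 1 only guarantees `readPixels` with RGBA/UNSIGNED_BYTE, so the shader packs each result float's IEEE-754 bits into the four color channels and the host reinterprets the raw bytes:

```js
// Host-side decode; assumes `gl`, `width`, `height`, and that the shader
// packed the bytes in the client's (little-endian) byte order.
const bytes = new Uint8Array(width * height * 4);
gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, bytes);
const floats = new Float32Array(bytes.buffer); // one float per RGBA texel
```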

@robertleeplummerjr (Member)

@waylonflinn I totally missed that your library is gpu based, lol. Been a fun morning. Very interesting though!

@robertleeplummerjr (Member)

@waylonflinn looking at http://waylonflinn.github.io/DeepBeliefSDK/, one word: fantastic

It has been a dream of mine for some time to see a convolutional neural net like this in the browser/js.

@waylonflinn commented Jul 11, 2017

@fuzzie360 please feel free to use `encode_float`. If you do end up using it, I have an open issue for testing here: waylonflinn/weblas#11. Any help would be greatly appreciated!

@waylonflinn

@robertleeplummerjr You might also be interested in: https://github.com/transcranial/keras-js

I've been collaborating with the author to make full use of weblas. It's still in the early stages, but I have high expectations for it!

@robertleeplummerjr (Member)

very cool

@fuzzie360 (Member)

@waylonflinn I've encountered the numerical stability issues myself and gotten around them: https://github.com/gpujs/gpu.js/blob/develop/src/backend/web-gl/shader-frag.js#L45-L82

I've verified it to work on notorious GPUs like the Intel HD 2000. I've actually been thinking of moving from my safe implementation back to the unsafe implementation, by detecting the special rounding characteristics of the GPU and choosing the correct implementation accordingly (a hypothetical shape for this is sketched below).
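
A hypothetical shape for that detection (the `runEncoder` helper and probe value are invented here for illustration): run each encoder variant on a value that is sensitive to the GPU's rounding mode and keep whichever round-trips exactly:

```js
// runEncoder(kind, x) is assumed to run a 1x1 kernel that encodes and then
// decodes x using the chosen float-encode variant ('safe' or 'unsafe').
function pickEncoder(runEncoder) {
  const probe = Math.fround(1 / 3); // not exactly representable in binary
  if (runEncoder('unsafe', probe) === probe) return 'unsafe';
  if (runEncoder('safe', probe) === probe) return 'safe';
  return 'cpu'; // neither round-trips bit-exactly on this GPU
}
```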

@waylonflinn

@fuzzie360 very nice! I'm hoping that universal support for floating point textures in WebGL 2.0 will remove the need for the float encode altogether.
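
A quick capability check along those lines (a sketch; `EXT_color_buffer_float` is the standard gate for float render targets in WebGL 2):

```js
// With WebGL 2 + EXT_color_buffer_float, a kernel can render to an RGBA32F
// target and read floats back directly, skipping the byte encoding entirely.
const gl2 = document.createElement('canvas').getContext('webgl2');
const canReadFloats = !!(gl2 && gl2.getExtension('EXT_color_buffer_float'));
console.log(canReadFloats ? 'direct float readback' : 'keep encode_float');
```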

@robertleeplummerjr (Member)

@waylonflinn do you have the specific source for your benchmark here #126 (comment)?

@robertleeplummerjr (Member)

To answer the original question from @mike1808: not yet, as we are still mostly in alpha. The libraries you mention are very focused in what they solve, whereas gpu.js is very open-ended.

@waylonflinn

@robertleeplummerjr I ran the benchmarks this morning on my personal development machine. You can replicate this with the command `npm run benchmark`, as described in the benchmarks section of the weblas README.

@robertleeplummerjr (Member) commented Jul 17, 2017

I was able to find and fix a flaw in our compilation that gives us a 300% boost over previous benchmarks. @waylonflinn We're coming for you!
😋

@robertleeplummerjr (Member)

Yoohoo, @waylonflinn... #206 (comment)

Jaws theme plays...

@robertleeplummerjr (Member)

Note: the performance here isn't really fair, as it is showing off texture mode, which is like pipeline mode in weblas (which I would love to see the numbers on, and totally expect to be faster than gpu.js). But look at those numbers!

512 x 512 matrix multiplication:

3 milliseconds

@robertleeplummerjr (Member)

Landed in dev today, fyi.

@waylonflinn

Very much fast.

I'm still working out how to do reliable benchmarks in pipeline mode for weblas. Every time I do it, I get results that seem impossibly fast. I'll try to work something up and post it here for comparison.

@robertleeplummerjr (Member)

@waylonflinn I very much look forward to it, and possibly collaborating in the future!

@fuzzie360 (Member) commented Oct 24, 2017

Hi @robertleeplummerjr, sorry it's been a long time since I last checked in. But I need to say this: you really cannot use benchmark.js to test the timing for texture mode, as it is not a fair representation.

This is what is being timed by benchmark.js in texture mode:

```
timing  +------+
cpu     +------+             +-----------+
copying        +---+    +----+
gpu                +----+
```

What you really want to time is this, if you don't want to take into account the time taken to retrieve the data back to the CPU:

```
timing  +---------------+
cpu     +------+             +-----------+
copying        +---+    +----+
gpu                +----+
```

There is really no way to do that with a single kernel launch (e.g. a single matrix multiplication). To measure real-world texture-mode performance, you need to do something like raising a matrix to a power (multiple matrix multiplications in texture mode, then getting the result back), as sketched below.
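
A hedged sketch of that matrix-power benchmark, again assuming gpu.js's pipeline option and a prebuilt `gpu` instance; only the final `toArray()` pays the readback cost:

```js
// Raise a 512x512 host matrix `a` to the 10th power entirely on the GPU.
const multiply = gpu.createKernel(function (x, y) {
  let sum = 0;
  for (let i = 0; i < 512; i++) {
    sum += x[this.thread.y][i] * y[i][this.thread.x];
  }
  return sum;
}, { output: [512, 512], pipeline: true });

const t0 = performance.now();
let result = multiply(a, a);    // first multiply: host arrays in, texture out
for (let k = 2; k < 10; k++) {
  result = multiply(result, a); // texture in, texture out
}
const host = result.toArray();  // the single readback, included in the timing
console.log((performance.now() - t0).toFixed(2) + ' ms for a^10');
```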

@robertleeplummerjr (Member)

Yeah, I know it really isn't fair, which I did mention. What I was trying to convey is that it is just really fast. In the case of machine learning, which is where gpu.js has my fascination, once values are on the GPU they don't need to come back to the CPU unless you want to see an output or check the error rate.

@robertleeplummerjr (Member)

Nice comparisons!
