Sparse and AVX2 #172

syzygy1 · 2020-10-30T19:48:13Z

On my AVX2 laptop, sparse multiplication now turns out to be slower than the non-sparse multiplication. I suspect that this is not the case on some other AVX2 CPUs, in particular Zen 1.

I have therefore added a compilation option.
To compile with sparse multiplication: make -j pgo sparse=yes
To compile without sparse multiplication: make -j pgo sparse=no

By default "sparse=yes" except for AVX2 targets (including BMI2, VNNI, AVX512).

If it is clear that "sparse=no" is still faster on Zen 1 or on other CPUs with AVX2, I can make it the default on those CPUs. I cannot test this myself, so if anyone is willing to try sparse=yes/no on Zen 1 or other CPUs, that would be very welcome.

It would also be interesting to know if sparse=no is faster on any non-AVX2 CPUs.
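For readers unfamiliar with the terms, here is a minimal sketch of the two strategies, written in plain Python rather than the engine's vectorized C. It assumes that "sparse" means skipping input activations that are zero. The function names and the transposed weight layout are illustrative and not taken from Cfish.

```python
def affine_dense(weights, biases, inputs):
    """Dense: every weight is multiplied, even when the input is zero.

    weights[o][i] is the weight from input i to output o.
    """
    return [
        bias + sum(w * x for w, x in zip(row, inputs))
        for row, bias in zip(weights, biases)
    ]


def affine_sparse(weights_t, biases, inputs):
    """Sparse: visit only nonzero inputs and add their weight columns.

    weights_t[i][o] is the weight from input i to output o (transposed layout).
    """
    outputs = list(biases)
    for i, x in enumerate(inputs):
        if x == 0:
            continue  # a zero input contributes nothing, so skip its column
        column = weights_t[i]
        for o in range(len(outputs)):
            outputs[o] += x * column[o]
    return outputs


# Hypothetical layer with 2 outputs and 4 inputs, half of them zero.
weights = [[1, 2, 3, 4], [5, 6, 7, 8]]
weights_t = [list(col) for col in zip(*weights)]
biases = [10, 20]
inputs = [0, 3, 0, 1]

dense_result = affine_dense(weights, biases, inputs)
sparse_result = affine_sparse(weights_t, biases, inputs)
```

Both functions return the same outputs. The sparse path does less arithmetic but adds a data-dependent branch and less regular memory access, which is one plausible reason the faster choice differs between CPUs.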


syzygy1 · 2020-10-30T19:49:32Z

The number of search threads might also have an impact on which is faster...

JavaMast · 2020-10-31T16:36:00Z

JavaMast · 2020-10-31T16:36:42Z

311020_1 = Correctly display castling rights for Chess960.
311020_2 = Improve non-sparse multiplication.

JavaMast · 2020-10-31T16:39:05Z

*Ryzen 3900X @3.8 GHz

JavaMast · 2020-10-31T16:48:24Z

syzygy1 · 2020-10-31T16:55:35Z

Thanks, so sparse AVX2 is still clearly better on AMD. Were these all tested on Ryzen 3900X?

JavaMast · 2020-10-31T17:01:40Z

Yes, all on my Ryzen 3900X.

I hope to get tests on other CPUs soon.

JavaMast · 2020-10-31T19:11:59Z

Intel i5 760 (Nehalem), 2.95 GHz

JavaMast · 2020-10-31T19:27:36Z

Athlon X4 870K

syzygy1 · 2020-10-31T20:39:15Z

Thanks again!

So on Nehalem, no_sparse is now better than sparse, which was the other way around before the improvement.
On my Sandybridge PC, no_sparse is improved, but sparse is still better.
So there is no clear Intel rule here.

The Athlon results have a pretty high variance, but seem to suggest sparse is better.

JavaMast · 2020-10-31T23:01:50Z

Intel Core i5-7600K

JavaMast · 2020-11-01T11:58:53Z

Intel 6800k

JavaMast · 2020-11-01T12:00:20Z

i7-7700HQ @2.80GHz

syzygy1 · 2020-11-01T19:48:26Z

Thanks.
So sparse=no is now better on Intel AVX2.
For SSE2, sparse=yes is better. (I have now improved non-sparse for SSE2, but it still doesn't get close to sparse.)
For SSSE3/SSE41, there is no clear winner on Intel.

On AMD, sparse=yes seems better.

JavaMast · 2020-11-01T21:16:30Z

It looks like this.

I am very confused by the results on Athlon 870K - today more tests were carried out and the variance has become even greater.

Was tested with network nn-cb26f10b1fd9.nnue

syzygy1 · 2020-11-01T21:21:45Z

Maybe the cpu is overheating and then throttles down?

AlexB123 · 2020-11-02T18:16:49Z

> It looks like this.
>
> I am very confused by the results on Athlon 870K - today more tests were carried out and the variance has become even greater.
>
> Was tested with network nn-cb26f10b1fd9.nnue

Hello guys! The test above was run on my PC, as were the speed tests below. My brother recently made a small upgrade to my PC and didn't tell me that I now have Turbo Boost, so I have to learn how to switch it off (lol). I've repeated the speed test with "Warm up CPU", and the speeds now look more or less correct.
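As a generic illustration of why a warm-up pass helps, the sketch below runs a workload a few times before timing it and reports the median of the timed runs. This is not the "Warm up CPU" option of the tool used here. The helper `timed_runs` and the `busy_work` stand-in workload are hypothetical names introduced only for this example.

```python
import statistics
import time


def timed_runs(workload, warmup=3, runs=5):
    """Run `workload` untimed `warmup` times, then time it `runs` times.

    Returns (median_seconds, samples). The median is less sensitive than the
    mean to a single run disturbed by a boost or throttling event.
    """
    for _ in range(warmup):
        workload()  # untimed: lets clocks and caches settle first
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), samples


def busy_work():
    # Stand-in for an engine benchmark run (assumption for illustration).
    return sum(i * i for i in range(100_000))


median_seconds, samples = timed_runs(busy_work)
```

A short warm-up like this does not control for thermal throttling during long runs. For that, fixing the clock speed or disabling boost, as mentioned above, is the more direct remedy.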

syzygy1 · 2020-11-03T00:47:40Z

@AlexB123
Which CPU is that?
It seems non-sparse might be a little bit better with 1 thread (except for SSE2, which is expected) but loses to sparse with multiple threads.
Non-sparse probably uses a bit more power and therefore increases CPU temps more.

JavaMast · 2020-11-03T15:18:18Z

@syzygy1
This is Athlon 870K

syzygy1 · 2020-11-03T19:07:55Z

Ah, I see now.

JavaMast · 2020-11-16T12:58:11Z

Looks like no_sparse is faster on new AMD CPUs
AMD RYZEN 9 5950x

==================
Hope to see BMI2 builds in speed test soon.

JavaMast · 2020-11-16T14:36:29Z

AMD RYZEN 9 5950x

JavaMast · 2020-12-17T17:34:25Z

After the update "AVX512, AVX2 and SSSE3 speedups":
Ryzen 3900X

syzygy1 · 2020-12-17T22:29:44Z

What is the difference between SSSE3.exe and SSSE3_popcnt_mingw_10.exe ?

syzygy1 · 2020-12-17T22:34:05Z

I think the fact that no_sparse now beats sparse on Zen 3 shows that AMD has improved their AVX2 implementation in Zen 3.

JavaMast · 2020-12-18T18:17:51Z

> What is the difference between SSSE3.exe and SSSE3_popcnt_mingw_10.exe ?

SSSE3 and SSSE3_sparse are 32-bit builds (compiled with MinGW i686-8.1.0-posix-dwarf-rt_v6-rev0).

syzygy1 · 2020-12-18T20:42:00Z

OK, so for 64-bit SSSE3 on Zen 2, sparse=yes is still faster than sparse=no.

But it seems sparse=no is now faster than sparse=yes for AVX2 on Zen 2. I thought sparse=yes was clearly faster before the AVX2 speed up. This suggests that sparse=no is now faster on all CPUs with AVX2.

syzygy1 · 2020-12-19T18:33:36Z

I just tested a Ryzen 4500U laptop and also found that sparse=yes was faster than sparse=no before the AVX2 speedup patch and is now slower.

JavaMast · 2021-04-12T19:20:31Z

Hello!

sparse=no is faster for all builds except SSE2 on the Core i5-11400F.

AVX512_VNNI is the fastest.

JavaMast · 2021-04-13T17:20:03Z

Just a curiosity: on my i5-11400F, Cfish is faster in Pure mode:

Only for AVX2 builds and higher. Not for SSE builds.
On Ryzen 3900X - NNUE is still faster than Pure.

syzygy1 · 2021-04-24T22:29:58Z

Pure being faster is pretty nice. Is it also stronger?

JavaMast · 2021-04-25T05:52:31Z

No, Hybrid is still stronger.

BMI2
10+0.1
concurrency 6

Score of Cfish_x64_120421_ELTO_BMI2 vs Cfish_x64_130421_ELTO_BMI2_Pure: 668 - 521 - 6564 [0.509]
... Cfish_x64_120421_ELTO_BMI2 playing White: 520 - 138 - 3219 [0.549] 3877
... Cfish_x64_120421_ELTO_BMI2 playing Black: 148 - 383 - 3345 [0.470] 3876
... White vs Black: 903 - 286 - 6564 [0.540] 7753
Elo difference: 6.6 +/- 3.0, LOS: 100.0 %, DrawRatio: 84.7 %
7758 of 20000 games finished.

AVX512_VNNI
10+0.1
concurrency 5

Score of Cfish_x64_120421_ELTO_AVX512___VNNI vs Cfish_x64_130421_ELTO_AVX512_VNNI_Pure: 527 - 507 - 6038 [0.501]
... Cfish_x64_120421_ELTO_AVX512___VNNI playing White: 406 - 119 - 3011 [0.541] 3536
... Cfish_x64_120421_ELTO_AVX512___VNNI playing Black: 121 - 388 - 3027 [0.462] 3536
... White vs Black: 794 - 240 - 6038 [0.539] 7072
Elo difference: 1.0 +/- 3.1, LOS: 73.3 %, DrawRatio: 85.4 %
7076 of 20000 games finished.
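The headline numbers in these two match logs can be reproduced from the win/loss/draw counts. The sketch below assumes the usual logistic Elo formula and the normal-approximation formula for LOS (likelihood of superiority). That the match tool uses exactly these formulas is an assumption, and the "+/-" error margin is not reproduced. The function name `match_stats` is illustrative.

```python
import math


def match_stats(wins, losses, draws):
    """Return (score, elo_difference, los, draw_ratio) for a finished match.

    Assumes at least one decisive game and a score strictly between 0 and 1;
    otherwise the logarithm or the division is undefined.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: expected score s corresponds to 400*log10(s/(1-s)).
    elo = 400.0 * math.log10(score / (1.0 - score))
    # LOS via the normal approximation; draws do not enter this formula.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    draw_ratio = draws / games
    return score, elo, los, draw_ratio


bmi2 = match_stats(668, 521, 6564)         # BMI2: Hybrid vs Pure
avx512_vnni = match_stats(527, 507, 6038)  # AVX512_VNNI: Hybrid vs Pure
```

Rounded to the precision of the logs, the BMI2 counts give about +6.6 Elo with an LOS near 100 %, and the AVX512_VNNI counts give about +1.0 Elo with an LOS of about 73.3 %, so only the BMI2 result clearly favours Hybrid.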

JavaMast · 2021-06-20T07:27:15Z

@syzygy1
Did you know how much faster Cfish is on old CPUs?
My friend with a Phenom II X6 1100T (compatible with the SSE2 build) told me that Cfish is 2 times faster than Stockfish...
On my i5-11400F it is "only" 50% faster.

Even the x32 build is faster.
