TODO: Test Silent army v5 of a wide variety of devices #5

maztheman · 2016-11-14T17:11:53Z

Discuss your results here

kruisdraad · 2016-11-14T18:03:44Z

Windows build please

maztheman · 2016-11-14T18:07:04Z

it is failing, i have to fix it up...

maztheman · 2016-11-14T20:28:23Z

If you have a 1070 or 1080 please test it:

https://github.com/maztheman/nheqminer/releases/tag/v0.4h

drigger · 2016-11-14T20:45:36Z

maztheman · 2016-11-14T20:51:47Z

try:
https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_1070_plus.7z

I removed all the thread waiting.

maztheman · 2016-11-14T20:52:30Z

The v5 silent army doesn't seem to be too great when converted to CUDA :( However, it might just be the way it's looping... I might be able to fix it.

drigger · 2016-11-14T21:01:31Z

Hmm, w/ 0.4h_1070_plus this happens: ;)
Just for the sake of comparison - this is what I get w/ the same 1080 on one of the original SA5 python ports for windows:

maztheman · 2016-11-14T21:08:23Z

Yeah, there is some major issue with CUDA and that code that was posted for OpenCL. CUDA keeps "timing" out when I try to do the same kind of calls... I guess it'll have to be a work in progress for now.

I'll post again when I see improvement with my GTX 650, which should then indicate some progress with 1080s, etc.

maztheman · 2016-11-14T21:13:15Z

I see from the author of silentarmy:

mbevand commented 2 days ago • edited
@tupieurods You are right. Didn't know shared atomics were not hardware implemented pre-Maxwell. That's definitively the cause of the slowdown then, because this commit makes heavy use of shared atomics. I see no solution other than maintaining a 2nd separate version of input.cl specifically for pre-Maxwell Nvidia GPUs then.

maztheman · 2016-11-14T21:20:32Z

Well, based off that guy's message I reverted back to a more direct conversion, maybe it'll work better:

https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_MAXWEL_PLUS.7z

It was super slow on my GTX 650, but it looks like that is to be expected because of the shared atomics.

drigger · 2016-11-14T21:27:32Z

It gives me much higher I/s numbers + full GPU load, but no Sols, unfortunately =)

maztheman · 2016-11-14T21:36:35Z

Hmm, weird, I get 78 sols/s with my R9 290 with the v5 code in OpenCL. 0 sols is probably because some timeout is causing a crash.

kruisdraad · 2016-11-14T21:37:17Z

I ran both versions.

Setup: 6x GTX1070 with Windows 10 latest drivers, etc (anno vers)

0.4h plain: 241 sol/s power usage 51%
0.4h maxplus: 2.4 sol/s and a near 400 I/s. Also the power usage is above 70%

The previous version got about 250, so it's a little less; power usage is much more stable though.
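
To compare rigs of different sizes, the totals in this comment can be normalized per card. This is only a minimal sketch using the figures reported here; the function name is hypothetical, and "about 250" is treated as exactly 250 for the sake of the arithmetic.

```python
def per_card(total_sols, cards):
    """Average solution rate per GPU for a multi-GPU rig."""
    return total_sols / cards


CARDS = 6          # 6x GTX 1070, as reported in the comment
current = 241.0    # 0.4h plain, Sol/s for the whole rig
previous = 250.0   # previous version; the comment says "about 250"

# Relative change of the new build against the previous one, in percent.
drop_pct = (previous - current) / previous * 100

print(f"0.4h plain: {per_card(current, CARDS):.1f} Sol/s per card")
print(f"previous:   {per_card(previous, CARDS):.1f} Sol/s per card")
print(f"change:     -{drop_pct:.1f}%")
```

Under those assumptions the difference between the two builds is a few percent, which is within the range that run-to-run variation could plausibly explain.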

maztheman · 2016-11-14T21:40:04Z

hmm 241 is really not that competitive to the SA linux version, is it?

kruisdraad · 2016-11-14T21:44:04Z

Honestly I haven't tried the Linux version yet. The zcminer-dev Windows version gets about 300, so it's not that bad.

Do you have a Linux example of actual speeds?

chronosek · 2016-11-14T22:05:24Z

Compiled. For me it crashes after 20 sec (I see GPU memory use constantly increasing until all 4GB is filled up) and then the nvidia card is stuck in a locked P5 state.

[23:00:05][0x000011dc] Using SSE2: YES
[23:00:05][0x000011dc] Using AVX: YES
[23:00:05][0x000011dc] Using AVX2: YES
[23:00:05][0x000011a8] stratum | Starting miner
[23:00:05][0x000011a8] stratum | Connecting to stratum server eu1-zcash.flypool.org:3333
[23:00:05][0x00001194] miner#0 | Starting thread #0 (CUDA-SILENTARMY) GeForce GTX 980 (#0) BLOCKS=64, THREADS=512
[23:00:05][0x000011a8] stratum | Connected!
[23:00:05][0x000011a8] stratum | Subscribed to stratum server
[23:00:05][0x000011a8] miner | Extranonce is fa613f0f55
[23:00:05][0x000011a8] stratum | Target set to 004189374bc6a7ef9db22d0e5604189374bc6a7ef9db22d0e5604189374bc6a7
[23:00:06][0x000011a8] stratum | Received new job #11ee30962342110c9785
[23:00:20][0x000011dc] Speed [300 sec]: 8.72074 I/s, 15.646 Sols/s
[23:00:36][0x000011dc] Speed [300 sec]: 8.94374 I/s, 16.6693 Sols/s
CUDA error 'out of memory' in func 'sa_cuda_context::solve' line 1029
CUDA error 'out of memory' in func 'sa_cuda_context::solve' line 1030
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1073
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1024
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1025
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1029
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1030
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1073
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1024
[ ... ]
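
When comparing logs like this one across builds, it can help to pull the speed readings out mechanically. The following is a rough sketch, assuming the "Speed [N sec]: X I/s, Y Sols/s" line format seen in the log; the function name and the regular expression are illustrative and not part of the miner.

```python
import re

# Matches the speed lines from the log, for example:
#   [23:00:20][0x000011dc] Speed [300 sec]: 8.72074 I/s, 15.646 Sols/s
# search() is used so that timestamps or terminal color codes around the
# reading do not matter.
SPEED_RE = re.compile(r"Speed \[(\d+) sec\]: ([\d.]+) I/s, ([\d.]+) Sols/s")


def parse_speeds(log_text):
    """Return a list of (iterations_per_sec, sols_per_sec) tuples."""
    readings = []
    for line in log_text.splitlines():
        match = SPEED_RE.search(line)
        if match:
            readings.append((float(match.group(2)), float(match.group(3))))
    return readings


sample = """\
[23:00:20][0x000011dc] Speed [300 sec]: 8.72074 I/s, 15.646 Sols/s
[23:00:36][0x000011dc] Speed [300 sec]: 8.94374 I/s, 16.6693 Sols/s
CUDA error 'out of memory' in func 'sa_cuda_context::solve' line 1029
"""

speeds = parse_speeds(sample)
for iters, sols in speeds:
    print(f"{iters} I/s, {sols} Sols/s")
```

Lines that carry no speed reading, such as the CUDA error lines, are simply skipped.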

maztheman · 2016-11-14T22:07:51Z

Yes, sorry, there is a memory "leak" that I have already fixed but have not pushed yet.
It's only supposed to create the 2 buffers once and reuse them; in my old code I was creating them every time and not deleting them... :P
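
The allocate-once pattern described here can be sketched as follows. This is plain Python standing in for GPU device buffers; the class, its method names, and the buffer size are hypothetical and are not taken from the miner's source.

```python
class Solver:
    """Toy stand-in for a GPU solver context (hypothetical, not the miner's class)."""

    allocations = 0  # counts buffer allocations across all instances

    def __init__(self, buffer_size):
        # Allocate the two working buffers exactly once, up front.
        self.buffers = (self._alloc(buffer_size), self._alloc(buffer_size))

    @classmethod
    def _alloc(cls, size):
        cls.allocations += 1
        return bytearray(size)

    def solve(self, nonce):
        # Reuse the existing buffers. A leaky variant would call _alloc()
        # here on every invocation and never release the result.
        src, dst = self.buffers
        src[0] = nonce & 0xFF
        dst[0] = src[0] ^ 0xFF  # placeholder for the real work
        return dst[0]


solver = Solver(buffer_size=1024)
for nonce in range(1000):
    solver.solve(nonce)

print(Solver.allocations)  # number of allocations after 1000 solve() calls
```

With the buffers owned by the context, memory use stays flat no matter how many times solve() runs, which matches the behaviour expected after the fix.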

auroracoin · 2016-11-14T22:39:58Z

I tried it on a 750ti (compute 5.0) and I get this message on Win 8.1: missing file MSVCP140.dll.

maztheman · 2016-11-14T22:41:40Z

oh, you need to install the redist file:
https://www.microsoft.com/en-ca/download/details.aspx?id=48145

maztheman · 2016-11-14T22:54:08Z

@chronosek ive checked in the fixes

auroracoin · 2016-11-14T22:58:58Z

That worked...thx :)

chronosek · 2016-11-15T00:06:25Z

@maztheman thx, it's not crashing now, but I always get 0 sol/s no matter what -cb and -ct I set.

maztheman · 2016-11-15T01:24:37Z

Yeah, some internal issue, I'll have to fully debug it again...

dtawom · 2016-11-15T09:05:34Z

Getting @ 17-18 sol/s with a 1060 3GB card with 0.4h. I found the best rate for this card is obtained with -cb 128 -ct 32 (autodetect puts it at -cb 63 -ct 64 and only yields 15 to 16 sol/s).

krnlx · 2016-11-15T09:52:13Z

Removing packed atomic counters = bad. I tested them in OpenCL and they are the fastest. I tested packed 64-bit atomics too; they were 1-2% slower.

chronosek · 2016-11-15T12:33:17Z

tpruvot did a good job porting to CUDA, especially the atomic part (working stable at 58 sol/s, but eqm is still doing 66 sol/s). I think maztheman's problems came from some hardcoded values or other code that was not in silentarmy.

maztheman · 2016-11-15T20:28:15Z

I added a test program that will help me debug this cuda issue. Please post your log files.

drigger · 2016-11-15T20:43:22Z

log.txt

maztheman · 2016-11-15T21:49:48Z

I created a new build that should "work" on 1080 and 1070's again. Probably wont break any records though...

maztheman · 2016-11-26T21:15:29Z

I uploaded a 16 thread version

maztheman · 2016-11-26T21:19:26Z

Thanks for your testing! Not exactly what I was expecting but seems each card has its own sweet spot.

jddebug · 2016-11-26T21:50:07Z

t16
16.5 sols on gtx760
So far seems the 32 is the best.

Whenever I start the miner it always shows blocks 42 and threads 64. Is that what the mining software thinks I should be using?

maztheman · 2016-11-26T21:57:58Z

The original miner used that information but I don't use it.

42 is 7 * your sm count, which I guess is 6. It seems silent army uses a different block dimension so that it can use the thread id as an index into the hash table.
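
The arithmetic behind that number can be written out as a small sketch. The factor 7 comes from the comment above; the function names are hypothetical, and applying the same rule to other cards is an assumption.

```python
BLOCKS_PER_SM = 7  # factor quoted in the comment above


def default_blocks(sm_count):
    """Block count the autodetect would report for a given SM count."""
    return BLOCKS_PER_SM * sm_count


def sm_count_from_blocks(blocks):
    """Invert the rule: recover the SM count from a reported block count."""
    if blocks % BLOCKS_PER_SM != 0:
        raise ValueError("block count is not a multiple of 7")
    return blocks // BLOCKS_PER_SM


print(default_blocks(6))         # the 6-SM guess from the comment
print(sm_count_from_blocks(63))  # 63 blocks were reported for a 1060 3GB in this thread
```

If the same rule holds, the 63 blocks autodetected for the 1060 3GB elsewhere in this thread would imply 9 SMs on that card.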

drigger · 2016-11-27T02:36:57Z

My results with latest beta:

GTX 580:

t16 22.41
t32 27.5
t64 32.3
t128 29.5
t256 26

GTX 550 Ti:

t16 5.7
t32 7.4
t64 7.5
t128 6.5
t256 5.4

GTX 650:

t16 5.3
t32 6.6
t64 8.8
t128 8.6
t256 8.0
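
Picking the best thread setting per card from these numbers can be done mechanically. This is a small sketch over the figures posted above; the post gives no unit, so the values are assumed to be Sol/s, and the helper name is hypothetical.

```python
# Rates keyed by thread count (t16 ... t256), copied from the results above.
# The unit is not stated in the post; Sol/s is assumed.
results = {
    "GTX 580": {16: 22.41, 32: 27.5, 64: 32.3, 128: 29.5, 256: 26.0},
    "GTX 550 Ti": {16: 5.7, 32: 7.4, 64: 7.5, 128: 6.5, 256: 5.4},
    "GTX 650": {16: 5.3, 32: 6.6, 64: 8.8, 128: 8.6, 256: 8.0},
}


def best_setting(rates):
    """Return (threads, rate) for the highest measured rate."""
    threads = max(rates, key=rates.get)
    return threads, rates[threads]


best = {card: best_setting(rates) for card, rates in results.items()}
for card, (threads, rate) in best.items():
    print(f"{card}: t{threads} at {rate}")
```

For these three cards the maximum falls at t64 in every case, although on the GTX 550 Ti the gap to t32 is only 0.1.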

maztheman · 2016-11-27T04:10:19Z

@drigger. Thanks for the testing.

Kahana82 · 2016-11-27T16:40:42Z

I'm getting around 100 sols with GTX980 with version L now
best result (102 sols) with -cs -cb 512 -ct 64 -cd 0

with cpu (7 out of 8 threads) it gives around 120 sols
-t 7 -cs -cb 512 -ct 64 -cd 0
I found keeping 1 thread free (7 of 8 total) gives better result on gpu and same if not better final sols.

On previous versions (i) best (42 sols) was -cs -cb 256 -ct 32 -cd 0
around 66 sols with -t7 -cs -cb 256 -ct 32 -cd 0

This is a huge improvement ! keep up the good work Maz :)

maztheman · 2016-11-27T17:52:16Z

Tnx I'll keep porting any enhancements I find.

dtawom · 2016-11-28T08:09:28Z

@jddebug Kenai/Soldotna area. Thanks for the updates @maztheman, I'll post updates for all my stuff tomorrow.

ceozero · 2016-11-28T18:38:19Z

Can you please send me a private generation absenteeism?

maztheman · 2016-11-28T18:41:27Z

@ceozero Sorry, I don't understand what you mean...

ceozero · 2016-11-28T18:46:39Z

@maztheman I need a new miner. You need to modify the code to re compile. can you do it?

maztheman · 2016-11-28T18:49:15Z

Yeah I can compile anything for windows.

ceozero · 2016-11-28T18:52:05Z

May I know your email address? Or do you want to send a mail to me. [email protected]

maztheman · 2016-11-28T18:56:55Z

I sent you an email

dtawom · 2016-11-28T20:08:19Z

Getting @ 75-85 sol/sec off my GTX 1060 3GB with 0.4l, for some reason that number doesn't improve much when I have all 4 CPU's running too. I was getting better results using your miner for CPU and Silentarmy for GPU at a combined 100 sol/s. Default detection for the 1060 is 63 blocks & 64 threads, I cap out at about 75 with this setting, 128b and 32t seems to give me the best capping out at about 85 sol/sec. Is there a calculator out there that will tell you what the best block and thread count is for any given model and ram size video cuda card?
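
The thread does not point to such a calculator, and the results posted earlier suggest each card has its own sweet spot, so one option is an empirical sweep. The sketch below only builds the -cb/-ct argument lists (both flags appear in this thread) and picks the best measured configuration; the function names are hypothetical, and the measurement step is a stand-in fed with the two approximate peak rates from this comment.

```python
from itertools import product


def candidate_args(blocks_options, threads_options):
    """Build one ['-cb', B, '-ct', T] argument list per combination."""
    return [
        ["-cb", str(blocks), "-ct", str(threads)]
        for blocks, threads in product(blocks_options, threads_options)
    ]


def pick_best(measure, configs):
    """Return the (blocks, threads) pair with the highest measured rate.

    measure(blocks, threads) must return a rate in Sol/s. In practice it
    would run the miner for a while and read the reported speed.
    """
    return max(configs, key=lambda config: measure(*config))


# Stand-in measurements: the two approximate peaks reported in this comment.
reported = {(63, 64): 75.0, (128, 32): 85.0}

best = pick_best(lambda blocks, threads: reported[(blocks, threads)], list(reported))
print(best)

# Argument lists for a wider sweep; the value ranges here are arbitrary.
for args in candidate_args([64, 128], [32, 64]):
    print(" ".join(args))
```

Each candidate would need to run long enough for the averaged speed to settle before the readings are compared.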

ceozero · 2016-11-28T20:10:26Z

I can't run, I'm still working on that.

maztheman · 2016-11-28T20:45:54Z

@dtawom, the silentarmy does not include some of the changes I added, so it may be more efficient for that specific card, unless you are using the https://github.com/zawawawa/silentarmy version. I ported those changes and some other nvidia specific parameters and stuff. If I had all the nvidia cards I would be testing and testing before I created builds. I just have to trust that the changes I'm porting "work" and "work well" with all nvidia 10xx series cards. There is a "Version 6" of silentarmy coming VERY soon. I will be porting those changes ASAP.

Stay tuned...

dtawom · 2016-11-30T01:22:51Z

@maztheman yeah the zawawawa one is the one I was using. The difference is so negligible (5-10 sol/sec per machine) that it is easier just to use your latest build only. The new nvidia one you made for older cards is getting me @ 40 sol/s on my 580 gtx which is @ 15 more sol/sec than I was getting before. Thanks soo much for the updates, they've got me hitting peaks of 650 sol/s now between all my rigs. Looking forward to silentarmy V6, hope the improvements are as significant as they were from 4 to 5.

maztheman · 2016-11-30T01:56:08Z

I'm just glad the work I'm doing is useful to someone :-)

ceozero · 2016-12-07T13:04:17Z

@maztheman when will the faster update be out?

maztheman · 2016-12-08T14:31:51Z

A few more days

dtawom · 2016-12-09T14:14:01Z

I dunno if this guy is running off silentarmy v6 or something else, but these numbers are soooo sexy. Gonna give um a try on my 1060 rigs. https://forum.z.cash/t/ewbfs-nvidia-cuda-zcash-miner-1060-170-h-s-gtx-1070-250-h-s/12523

maztheman · 2016-12-09T14:19:04Z

Oh it's closed source so I can't see what he is doing. I can profile the exe probably and see what I can see

dtawom · 2016-12-09T14:31:21Z

Confirmed 175 sol/sec on 1060 with no CPU. You should do what this guy does and take a couple percent for yourself. I'd rather support you than him.

dtawom · 2016-12-09T15:05:41Z

Holy moly, getting triple speed off my R9 Fury's now with claymore. 325 H/s per card with https://github.com/nanopool/ClaymoreZECMiner
Gonna break the gigahash mark per sec tomorrow :)

maztheman · 2016-12-09T15:11:06Z

:) wow, looks like claymore is on top again!

maztheman · 2016-12-14T21:54:32Z

Looks like an update will happen pretty soon...

maztheman · 2016-12-20T15:26:27Z

new build new thread

#8

maztheman mentioned this issue Nov 14, 2016

TODO: Test CUDA Silentarmy v4 Implementation on a wide variety of devices #1

Closed

maztheman closed this as completed Dec 20, 2016
