
TODO: Test Silent army v5 on a wide variety of devices #5

Closed
maztheman opened this issue Nov 14, 2016 · 125 comments

Comments

@maztheman
Owner

Discuss your results here

@kruisdraad

Windows build please

@maztheman
Owner Author

It is failing; I have to fix it up...

@maztheman
Owner Author

If you have a 1070 or 1080, please test it:

https://github.com/maztheman/nheqminer/releases/tag/v0.4h

@drigger

drigger commented Nov 14, 2016

[screenshot]

@maztheman
Owner Author

Try:
https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_1070_plus.7z

I removed all the thread waiting.

@maztheman
Owner Author

The v5 Silent Army doesn't seem to perform very well when converted to CUDA :( However, it might just be the way it's looping... I might be able to fix it.

@drigger

drigger commented Nov 14, 2016

Hmm, with 0.4h_1070_plus this happens: [screenshot] ;)
Just for the sake of comparison, this is what I get with the same 1080 on one of the original SA5 Python ports for Windows: [screenshot]

@maztheman
Owner Author

maztheman commented Nov 14, 2016

Yeah, there is some major issue with CUDA and the code that was posted for OpenCL. CUDA keeps "timing out" when I try to make the same kind of calls... I guess it'll have to be a work in progress for now.

I'll post again when I see improvement on my GTX 650, which should then indicate some progress on 1080s, etc.

@maztheman
Owner Author

I see this from the author of silentarmy:

mbevand commented 2 days ago:
@tupieurods You are right. Didn't know shared atomics were not hardware implemented pre-Maxwell. That's definitely the cause of the slowdown then, because this commit makes heavy use of shared atomics. I see no solution other than maintaining a 2nd separate version of input.cl specifically for pre-Maxwell Nvidia GPUs then.

@maztheman
Owner Author

Well, based on that guy's message, I reverted to a more direct conversion; maybe it'll work better:

https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_MAXWEL_PLUS.7z

It was super slow on my GTX 650, but it looks like that is because of the shared atomics.

@drigger

drigger commented Nov 14, 2016

It gives me much higher I/s numbers + full GPU load, but no Sols, unfortunately =)
[screenshot]

@maztheman
Owner Author

Hmm, weird. I get 78 sols/s on my R9 290 with the v5 code in OpenCL. The 0 sols is probably because some timeout is causing a crash.

@kruisdraad

kruisdraad commented Nov 14, 2016

I ran both versions.

Setup: 6x GTX 1070 with Windows 10 (Anniversary version), latest drivers, etc.

0.4h plain: 241 sol/s power usage 51%
0.4h maxplus: 2.4 sol/s at nearly 400 I/s. Also, power usage is above 70%.

The previous version got about 250, so this is a little less; power usage is much more stable, though.

@maztheman
Owner Author

Hmm, 241 is really not that competitive with the SA Linux version, is it?

@kruisdraad

Honestly, I haven't tried the Linux version yet. The zcminer-dev Windows version gets about 300, so it's not that bad.

Do you have Linux examples of actual speeds?

@chronosek

chronosek commented Nov 14, 2016

Compiled. For me it crashes after 20 seconds (I see GPU memory use constantly increasing until all 4 GB is filled up), and then the Nvidia card is stuck in the P5 state.

[23:00:05][0x000011dc] Using SSE2: YES
[23:00:05][0x000011dc] Using AVX: YES
[23:00:05][0x000011dc] Using AVX2: YES
[23:00:05][0x000011a8] stratum | Starting miner
[23:00:05][0x000011a8] stratum | Connecting to stratum server eu1-zcash.flypool.org:3333
[23:00:05][0x00001194] miner#0 | Starting thread #0 (CUDA-SILENTARMY) GeForce GTX 980 (#0) BLOCKS=64, THREADS=512
[23:00:05][0x000011a8] stratum | Connected!
[23:00:05][0x000011a8] stratum | Subscribed to stratum server
[23:00:05][0x000011a8] miner | Extranonce is fa613f0f55
[23:00:05][0x000011a8] stratum | Target set to 004189374bc6a7ef9db22d0e5604189374bc6a7ef9db22d0e5604189374bc6a7
[23:00:06][0x000011a8] stratum | Received new job #11ee30962342110c9785
[23:00:20][0x000011dc] Speed [300 sec]: 8.72074 I/s, 15.646 Sols/s
[23:00:36][0x000011dc] Speed [300 sec]: 8.94374 I/s, 16.6693 Sols/s
CUDA error 'out of memory' in func 'sa_cuda_context::solve' line 1029
CUDA error 'out of memory' in func 'sa_cuda_context::solve' line 1030
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1073
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1024
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1025
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1029
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1030
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1073
CUDA error 'unspecified launch failure' in func 'sa_cuda_context::solve' line 1024
[ ... ]

@maztheman
Owner Author

Yes, sorry, there is a memory "leak" that I have already fixed but haven't pushed yet.
It's only supposed to create the 2 buffers once and reuse them; in my old code I was creating them every time and never deleting them... :P
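A minimal sketch of the allocate-once pattern described above, written in plain C++ so it runs without a GPU. The `SolverContext` and `Buffer` names, the buffer size, and the use of host memory in place of `cudaMalloc`/`cudaFree` are assumptions for illustration, not the miner's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a device allocation. Real CUDA code would wrap
// cudaMalloc/cudaFree; host memory keeps this sketch portable.
struct Buffer {
    std::unique_ptr<std::uint8_t[]> data;
    std::size_t size = 0;
};

// Owns the two hash-table buffers for the lifetime of the solver context.
class SolverContext {
public:
    explicit SolverContext(std::size_t bytes_per_buffer)
        : bytes_(bytes_per_buffer) {
        assert(bytes_per_buffer > 0);
    }

    // Called once per nonce. Each buffer is created on first use and then
    // reused, instead of being allocated (and leaked) on every call.
    void solve() {
        for (Buffer& b : buffers_) {
            if (!b.data) {
                b.data = std::make_unique<std::uint8_t[]>(bytes_);
                b.size = bytes_;
                ++allocations_;
            }
        }
        // ... launch the kernels that read/write buffers_[0] and buffers_[1] ...
    }

    int allocations() const { return allocations_; }

private:
    std::size_t bytes_;
    Buffer buffers_[2];    // freed automatically when the context is destroyed
    int allocations_ = 0;  // how many real allocations have happened
};
```

However many times `solve()` is called, `allocations()` stays at 2, and `unique_ptr` releases both buffers when the context goes away, so GPU memory use would stay flat instead of growing until an "out of memory" error.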

@auroracoin

I tried it on a 750 Ti (compute capability 5.0) and I get this message: missing file MSVCP140.dll, on Windows 8.1.

@maztheman
Owner Author

Oh, you need to install the Visual C++ 2015 redistributable:
https://www.microsoft.com/en-ca/download/details.aspx?id=48145

@maztheman
Owner Author

@chronosek I've checked in the fixes.

@auroracoin

That worked...thx :)

@chronosek

@maztheman Thanks, it's not crashing now, but I always get 0 sol/s no matter what -cb and -ct values I set.

@maztheman
Owner Author

Yeah, some internal issue; I'll have to fully debug it again...

@dtawom

dtawom commented Nov 15, 2016

Getting about 17-18 sol/s with a 1060 3GB card on 0.4h. I found the best rate for this card is obtained with -cb 128 -ct 32 (autodetect puts it at -cb 63 -ct 64, which only yields 15 to 16 sol/s).

@krnlx

krnlx commented Nov 15, 2016

Removing the packed atomic counters is bad: I tested them in OpenCL and they are the fastest. I also tested packed 64-bit atomics; they were 1-2% slower.
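To illustrate what packed atomic counters means, here is a hedged host-side C++ sketch: two 16-bit per-row counters share one 32-bit atomic word, so bumping a row costs a single `fetch_add` on the shared word. The two-per-word layout, the 16-bit field width, and the class name are assumptions, not SilentArmy's actual layout; on the GPU the same idea would use `atomicAdd` on the packed word.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: two 16-bit row counters packed into each 32-bit atomic word.
constexpr unsigned kCountersPerWord = 2;
constexpr unsigned kBitsPerCounter = 16;
constexpr std::uint32_t kCounterMask = (1u << kBitsPerCounter) - 1;

class PackedRowCounters {
public:
    explicit PackedRowCounters(std::size_t rows)
        : words_((rows + kCountersPerWord - 1) / kCountersPerWord) {
        for (auto& w : words_) w.store(0);
    }

    // Atomically increments the counter for `row` with one fetch_add on the
    // shared word and returns the row's previous count (i.e. its slot index).
    // The caller must keep each count below 2^16, or it carries into its neighbour.
    std::uint32_t increment(std::size_t row) {
        const std::size_t word = row / kCountersPerWord;
        const unsigned shift = static_cast<unsigned>(row % kCountersPerWord) * kBitsPerCounter;
        assert(word < words_.size());
        const std::uint32_t old = words_[word].fetch_add(std::uint32_t{1} << shift);
        return (old >> shift) & kCounterMask;
    }

    // Reads the current count for `row`.
    std::uint32_t load(std::size_t row) const {
        const std::size_t word = row / kCountersPerWord;
        const unsigned shift = static_cast<unsigned>(row % kCountersPerWord) * kBitsPerCounter;
        assert(word < words_.size());
        return (words_[word].load() >> shift) & kCounterMask;
    }

private:
    std::vector<std::atomic<std::uint32_t>> words_;
};
```

Packing halves the number of counter words, which means less memory traffic and fewer distinct atomic targets. The trade-off is that two rows now contend on the same word, and each field's width caps how many items a row can count.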

@chronosek

tpruvot did a good job porting to CUDA, especially the atomic part (stable at 58 sol/s, though EQM still does 66 sol/s). I think maztheman's problems came from some hardcoded values or other code that was not in silentarmy.

@maztheman
Owner Author

I added a test program that will help me debug this cuda issue. Please post your log files.

@drigger

drigger commented Nov 15, 2016

log.txt

@maztheman
Owner Author

I created a new build that should "work" on 1080s and 1070s again. It probably won't break any records, though...

@maztheman
Owner Author

I uploaded a 16-thread version.

@maztheman
Owner Author

Thanks for your testing! Not exactly what I was expecting, but it seems each card has its own sweet spot.

@jddebug

jddebug commented Nov 26, 2016

t16: 16.5 sols on GTX 760.
So far it seems t32 is the best.

Whenever I start the miner it always shows blocks 42 and threads 64. Is that what the mining software thinks I should be using?

@maztheman
Owner Author

The original miner used that information, but I don't use it.

42 is 7 × your SM count, which I guess is 6. It seems Silent Army uses a different block dimension so that it can use the thread ID as an index into the hash table.
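The block-count arithmetic above, written out as a small constexpr C++ sketch. `kBlocksPerSm = 7` comes from the comment; the thread-ID-to-row mapping is an assumption about how a SilentArmy-style kernel indexes its hash table. In real code the SM count would come from `cudaDeviceProp::multiProcessorCount`.

```cpp
// Default grid size reported by the original miner: 7 blocks per SM.
constexpr int kBlocksPerSm = 7;

constexpr int default_block_count(int sm_count) {
    return kBlocksPerSm * sm_count;
}

// Assumed SilentArmy-style indexing: each thread's global ID selects one
// hash-table row (equivalent to blockIdx.x * blockDim.x + threadIdx.x in CUDA).
constexpr unsigned global_thread_id(unsigned block, unsigned thread,
                                    unsigned threads_per_block) {
    return block * threads_per_block + thread;
}

// Number of blocks needed so that every row gets exactly one thread
// (rounded up; threads past the last row would simply exit early).
constexpr unsigned blocks_to_cover(unsigned rows, unsigned threads_per_block) {
    return (rows + threads_per_block - 1) / threads_per_block;
}

// A 6-SM card gets the "blocks 42" seen in the miner's startup line.
static_assert(default_block_count(6) == 42, "6 SMs -> 42 blocks");
```

This is why the reported block count has no effect here: when the grid is sized from the hash-table row count rather than from the SM count, the per-SM default is never used.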

@drigger

drigger commented Nov 27, 2016

My results with the latest beta (Sol/s):

GTX 580:

  • t16 22.41
  • t32 27.5
  • t64 32.3
  • t128 29.5
  • t256 26

GTX 550 Ti:

  • t16 5.7
  • t32 7.4
  • t64 7.5
  • t128 6.5
  • t256 5.4

GTX 650:

  • t16 5.3
  • t32 6.6
  • t64 8.8
  • t128 8.6
  • t256 8.0

@maztheman
Owner Author

@drigger, thanks for the testing.

@Kahana82

I'm getting around 100 sols with a GTX 980 on version L now.
Best result (102 sols) with -cs -cb 512 -ct 64 -cd 0

With the CPU also mining (7 of 8 threads), it gives around 120 sols:
-t 7 -cs -cb 512 -ct 64 -cd 0
I found that keeping 1 thread free (7 of 8 total) gives a better GPU result and the same, if not better, total sols.

On the previous version (i), the best result (42 sols) was with -cs -cb 256 -ct 32 -cd 0,
and around 66 sols with -t7 -cs -cb 256 -ct 32 -cd 0.

This is a huge improvement ! keep up the good work Maz :)

@maztheman
Owner Author

Thanks, I'll keep porting any enhancements I find.

@dtawom

dtawom commented Nov 28, 2016

@jddebug Kenai/Soldotna area. Thanks for the updates @maztheman, I'll post updates for all my stuff tomorrow.

@ceozero

ceozero commented Nov 28, 2016

Can you please send me a private generation absenteeism?

@maztheman
Owner Author

@ceozero Sorry, I don't understand what you mean...

@ceozero

ceozero commented Nov 28, 2016

@maztheman I need a new miner. It needs the code modified and recompiled. Can you do it?

@maztheman
Owner Author

Yeah I can compile anything for windows.

@ceozero

ceozero commented Nov 28, 2016

May I know your email address? Or you can send an email to me: [email protected]

@maztheman
Owner Author

I sent you an email

@dtawom

dtawom commented Nov 28, 2016

Getting about 75-85 sol/s off my GTX 1060 3GB with 0.4l. For some reason that number doesn't improve much when I have all 4 CPUs running too. I was getting better results using your miner for the CPU and Silentarmy for the GPU, at a combined 100 sol/s. Default detection for the 1060 is 63 blocks and 64 threads, and I cap out at about 75 with that setting; 128 blocks and 32 threads seems to work best, capping out at about 85 sol/s. Is there a calculator out there that will tell you the best block and thread count for any given CUDA card model and memory size?
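As far as this thread shows, there is no closed-form calculator. The best settings people post come from trying values. Here is a minimal C++ sketch of automating that sweep, assuming a hypothetical `measure_sols` callback that runs the miner with given `-cb`/`-ct` values and returns the measured Sol/s. The `LaunchConfig`, `SweepResult`, and `sweep` names are illustrative only.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct LaunchConfig {
    int blocks;   // value that would be passed as -cb
    int threads;  // value that would be passed as -ct
};

struct SweepResult {
    LaunchConfig best{0, 0};
    double best_sols = -1.0;
};

// Empirical tuning: benchmark every candidate and keep the fastest.
// `measure_sols` is a hypothetical callback (e.g. run the miner for a few
// minutes with the given settings and average its reported Sol/s).
SweepResult sweep(const std::vector<LaunchConfig>& candidates,
                  const std::function<double(const LaunchConfig&)>& measure_sols) {
    assert(!candidates.empty());
    SweepResult result;
    for (const LaunchConfig& cfg : candidates) {
        const double sols = measure_sols(cfg);
        if (sols > result.best_sols) {
            result.best_sols = sols;
            result.best = cfg;
        }
    }
    return result;
}
```

For example, feeding in drigger's GTX 580 thread results from earlier in this thread (t16 22.41 through t256 26) would select t64. Because each card has its own sweet spot, measuring on the actual card is more dependable than deriving settings from the model name or memory size.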

@ceozero

ceozero commented Nov 28, 2016 via email

@maztheman
Owner Author

@dtawom, silentarmy does not include some of the changes I added, so it may be more efficient for that specific card, unless you are using the https://github.com/zawawawa/silentarmy version. I ported those changes plus some other Nvidia-specific parameters. If I had all the Nvidia cards, I would test thoroughly before creating builds. As it is, I just have to trust that the changes I'm porting "work" and "work well" on all Nvidia 10xx series cards. There is a "Version 6" of silentarmy coming VERY soon, and I will port those changes ASAP.

Stay tuned...

@dtawom

dtawom commented Nov 30, 2016

@maztheman Yeah, the zawawawa one is the one I was using. The difference is so negligible (5-10 sol/s per machine) that it is easier to just use your latest build. The new Nvidia build you made for older cards is getting me about 40 sol/s on my GTX 580, which is about 15 sol/s more than I was getting before. Thanks so much for the updates; they've got me hitting peaks of 650 sol/s across all my rigs. Looking forward to silentarmy v6; I hope the improvements are as significant as they were from v4 to v5.

@maztheman
Owner Author

I'm just glad the work I'm doing is useful to someone :-)

@ceozero

ceozero commented Dec 7, 2016

@maztheman When is the next update? For faster speeds.

@maztheman
Owner Author

A few more days

@dtawom

dtawom commented Dec 9, 2016

I dunno if this guy is running off silentarmy v6 or something else, but these numbers are soooo sexy. Gonna give 'em a try on my 1060 rigs. https://forum.z.cash/t/ewbfs-nvidia-cuda-zcash-miner-1060-170-h-s-gtx-1070-250-h-s/12523

@maztheman
Owner Author

Oh, it's closed source, so I can't see what he is doing. I can probably profile the exe and see what I can find.

@dtawom

dtawom commented Dec 9, 2016

Confirmed 175 sol/sec on 1060 with no CPU. You should do what this guy does and take a couple percent for yourself. I'd rather support you than him.

@dtawom

dtawom commented Dec 9, 2016

Holy moly, getting triple speed off my R9 Furys now with Claymore: 325 H/s per card with https://github.com/nanopool/ClaymoreZECMiner
Gonna break the gigahash mark per sec tomorrow :)

@maztheman
Owner Author

:) wow, looks like claymore is on top again!

@maztheman
Owner Author

Looks like an update will happen pretty soon...

@maztheman
Owner Author

New build, new thread:

#8
