Intermittent startup error on CUDA miner #19

closerm · 2018-03-02T16:49:57Z

When benchmarking the CUDA miner (v0.1.9) I get an intermittent error, as shown below.

        ============================= aion reference miner======================
                        Equihash<210,9> CPU&GPU Miner for AION v0.1.9
                        Base on NiceHash equihash miner.
        ============================= aion reference miner======================

Setting log level to 2
[20:31:50][0x00007f6ae3ad4740] Using SSE2: YES
[20:31:50][0x00007f6ae3ad4740] Using AVX: NO
[20:31:50][0x00007f6ae3ad4740] Using AVX2: NO
[20:31:50][0x00007f6ae3ad4740] Benchmarking CUDA worker (CUDA-TROMP) GeForce GTX 1080 Ti (#0) BLOCKS=64, THREADS=64
[20:31:51][0x00007f6ae3ad4740] Benchmark starting... this may take several minutes, please wait...
[20:32:12][0x00007f6adb04c700] CUDA error 'the launch timed out and was terminated' in func 'solve' line 1186

This doesn't happen every time I launch the miner, but it happened several times in a short period of running different benchmarks.


closerm · 2018-03-11T17:33:20Z

I'm still getting this error pretty consistently. Any thoughts?

        ============================= aion reference miner======================
                        Equihash<210,9> CPU&GPU Miner for AION v0.1.9
                        Base on NiceHash equihash miner.
        ============================= aion reference miner======================

Setting log level to 2
[12:31:40][0x00007f6161bb7740] Using SSE2: YES
[12:31:40][0x00007f6161bb7740] Using AVX: NO
[12:31:40][0x00007f6161bb7740] Using AVX2: NO
[12:31:40][0x00007f615912f700] stratum | Starting miner
[12:31:40][0x00007f615912f700] stratum | Connecting to stratum server 192.168.1.35:3333
[12:31:40][0x00007f615892e700] miner#0 | Starting thread #0 (CUDA-TROMP) GeForce GTX 1080 Ti (#0) BLOCKS=56, THREADS=64
[12:31:40][0x00007f615912f700] stratum | Connected!
[12:31:40][0x00007f615912f700] stratum | Subscribed to stratum server
[12:31:40][0x00007f615912f700] miner | Extranonce is 50000004
[12:31:40][0x00007f615912f700] stratum | Received new job #9
[12:31:40][0x00007f615912f700] stratum | Authorized worker 0x0000000000000000000000000000000000000000000000000000000000000000
[12:31:45][0x00007f615912f700] stratum | Received new job #a
[12:31:51][0x00007f6161bb7740] Speed [15 sec]: 5.16016 I/s, 10.223 Sols/s
[12:32:01][0x00007f6161bb7740] Speed [15 sec]: 2.33333 I/s, 5.26667 Sols/s
[12:32:03][0x00007f615892e700] miner#0 | CUDA error 'the launch timed out and was terminated' in func 'solve' line 1186

closerm · 2018-03-11T18:30:00Z

I am also getting some additional CUDA errors, and this is becoming less "intermittent".

[13:26:34][0x00007f4cb6ffd700] miner#4 | CUDA error 'unspecified launch failure' in func 'solve' line 1186
[13:26:21][0x00007f06f9359700] miner#4 | CUDA error 'an illegal memory access was encountered' in func 'solve' line 1186

[13:26:56][0x00007ffbf13f4700] miner#3 | CUDA error 'the launch timed out and was terminated' in func 'setheadernonce' line 260

These errors are all being produced by the pre-built 0.1.9 CUDA miner.

closerm · 2018-03-12T18:05:55Z

These errors appear to be related to the NVIDIA driver's watchdog timer, which keeps the X Window display responsive in mixed X / compute environments.

Per this thread, the first two options may not be tenable, since they involve not running X, which appears to be required if the user wants to control fan / power / clock speeds on the GPU. (I know there have been ways to run startx, set the parameters, and have them persist after closing X, but this process hasn't worked for me.)

The fourth option is working for me right now, though the use of that option is the least recommended of the ways forward.

Which brings me to option 3, the recommended option, which is effectively "break kernel execution into small enough pieces that their execution does not exceed the driver watchdog." I realize that this is a big request, but I gather from other (older) pages that this is a bigger problem on Windows, so it will likely resurface when the miner is released for Windows. Refactoring the kernel code into smaller, faster-executing segments could prevent this problem on both platforms.
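To make option 3 a little more concrete, here is a minimal host-side sketch of the idea, written in plain C++ so it can be compiled without a GPU. It is not taken from the miner's source: `plan_chunks`, `run_in_chunks`, and the `launch` callback are hypothetical names, and the assumption is that the solver's work can be divided into independent slices, which may not hold for every stage of the Equihash kernels.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the miner.
// Splits `total` work items into consecutive (offset, count) slices of at
// most `max_per_launch` items each.
std::vector<std::pair<std::size_t, std::size_t>>
plan_chunks(std::size_t total, std::size_t max_per_launch) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (max_per_launch == 0) return chunks;  // nothing sensible to plan
    for (std::size_t offset = 0; offset < total; offset += max_per_launch) {
        chunks.emplace_back(offset, std::min(max_per_launch, total - offset));
    }
    return chunks;
}

// Calls `launch(offset, count)` once per slice, in order.
// In a CUDA program, `launch` would start one short kernel over that slice
// and wait for it to finish, so that no single launch runs long enough to
// hit the driver watchdog.
template <typename Launch>
void run_in_chunks(std::size_t total, std::size_t max_per_launch,
                   Launch launch) {
    for (const auto& chunk : plan_chunks(total, max_per_launch)) {
        launch(chunk.first, chunk.second);
    }
}
```

For example, `run_in_chunks(10, 4, launch)` would call `launch` three times, with slices of 4, 4, and 2 items. The slice size would have to be tuned per card so that each launch finishes well inside the watchdog limit, at the cost of some extra launch overhead.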

closerm · 2018-04-12T01:59:04Z

Despite the comments above, I am still getting
CUDA error 'an illegal memory access was encountered' in func 'solve' line 1186

even with v0.2.0. It does appear to happen less often, but it has still occurred twice in the past hour.
