eip | title | author | discussions-to | status | type | category | created |
---|---|---|---|---|---|---|---|
1057 |
ProgPoW, a Programmatic Proof-of-Work |
Radix Pi <[email protected]>, Ifdef Else <[email protected]> |
Draft |
Standards Track |
Core |
2018-05-02 |
The following is a proposal for an alternate proof-of-work algorithm - “ProgPoW” - tuned for commodity hardware in order to close the efficiency gap available to specialized ASICs.
The security of proof-of-work is built on a fair, randomized lottery where miners with similar resources have a similar chance of generating the next block.
For Ethereum - a community based on widely distributed commodity hardware - specialized ASICs enable certain participants to gain a much greater chance of generating the next block, and undermine the distributed security.
ASIC-resistance is a misunderstood problem. FPGAs, GPUs and CPUs can themselves be considered ASICs. Any algorithm that executes on a commodity ASIC can have a specialized ASIC made for it; most existing algorithms provide opportunities that reduce power usage and cost. Thus, the proper question to ask when solving ASIC-resistance is “how much more efficient will a specialized ASIC be, in comparison with commodity hardware?”
EIP presents an algorithm that is tuned for commodity GPUs where there is minimal opportunity for ASIC specialization. This prevents specialized ASICs without resorting to a game of whack-a-mole where the network changes algorithms every few months.
Until Ethereum transitions to a pure proof-of-stake model, proof-of-work will continue to be a part of the security of the network - whether it’s adapted into a hybrid model (as is the case of Casper FFG), or adopted by a hard fork.
Ethash allows for the creation of an ASIC that is roughly twice as efficient as a commodity GPU. Ethash’s memory accesses are paired with a very small amount of fixed compute. Most of a GPU’s capacity and complexity sits idle, wasting power, while waiting for DRAM accesses. A specialized ASIC can implement a much smaller (and cheaper) compute engine that burns much less power.
As miner rewards are reduced with Casper FFG, it will remain profitable to mine on a specialized ASIC long after GPUs have exited the network. This will make it easier for an entity that has access to private ASICs to stage a 51% attack on the Ethereum network.
ProgPoW is based on Ethash. The algorithm has five main elements, each tuned for commodity GPUs while minimizing the possible advantage of a specialized ASIC.
Changes keccak_f1600 (with 64-bit words) to keccak_f800 (with 32-bit words).
On 64-bit architectures f1600 processes twice as many bits as f800 in roughly the same time. As GPUs are natively 32-bit architectures, f1600 takes twice as long as f800. ProgPow doesn’t require all the bits f1600 can consume, thus reducing the size reduces the optimization opportunity for a specialized ASIC.
Increases mix state.
A significant part of a GPU’s area, power, and complexity is the large register file. A large mix state ensures that a specialized ASIC would need to implement similar state storage, limiting any advantage.
Adds a random sequence of math in the main loop.
The random math changes every 50 blocks to amortize compilation overhead. Having a random sequence of math that reads and writes random locations within the state ensures that the ASIC executing the algorithm is fully programmable. There is no possibility to create an ASIC with a fixed pipeline that is much faster or lower power.
Adds reads from a small, low-latency cache that supports random addresses.
Another significant part of a GPU’s area, power, and complexity is the memory hierarchy. Adding cached reads makes use of this hierarchy and ensures that a specialized ASIC also implements a similar hierarchy, preventing power or area savings.
Increases the DRAM read from 128 bytes to 256 bytes.
The DRAM read from the DAG is the same as Ethash’s, but with the size increased to 256 bytes
. This better matches the workloads seen on commodity GPUs, preventing a specialized ASIC from being able to gain performance by optimizing the memory controller for abnormally small accesses.
The DAG file is generated according to traditional Ethash specifications, with an additional PROGPOW_SIZE_CACHE
bytes generated that will be cached in the L1.
ProgPoW can be tuned using the following parameters. The proposed settings have been tuned for a range of existing, commodity GPUs:
PROGPOW_LANES:
The number of parallel lanes that coordinate to calculate a single hash instance; default is32.
PROGPOW_REGS:
The register file usage size; default is16.
PROGPOW_CACHE_BYTES:
The size of the cache; default is16 x 1024.
PROGPOW_CNT_MEM:
The number of frame buffer accesses, defined as the outer loop of the algorithm; default is64
(same as Ethash).PROGPOW_CNT_CACHE:
The number of cache accesses per loop; default is8.
PROGPOW_CNT_MATH:
The number of math operations per loop; default is8.
ProgPoW uses FNV1a for merging data. The existing Ethash uses FNV1 for merging, but FNV1a provides better distribution properties.
ProgPow uses KISS99 for random number generation. This is the simplest (fewest instruction) random generator that passes the TestU01 statistical test suite. A more complex random number generator like Mersenne Twister can be efficiently implemented on a specialized ASIC, providing an opportunity for efficiency gains.
ProgPoW utilizes almost all parts of a commodity GPU, excluding:
- The graphics pipeline (displays, geometry engines, texturing, etc);
- Floating point math.
Making use of either of these would have significant portability issues between commodity hardware vendors, and across programming languages.
Since the GPU is almost fully utilized, there’s little opportunity for specialized ASICs to gain efficiency. Removing both the graphics pipeline and floating point math could provide up to 1.2x gains in efficency, compared to the 2x gains possible in Ethash, and 50x gains possible for CryptoNight.
This algorithm is not backwards compatible with the existing Ethash, and will require a fork for adoption. Furthermore, the network hashrate will halve as the time spent in the core is now balanced with time spent in memory.
This PoW algorithm was tested against six different models from two different manufacturers. Selected models span two different chips and memory types from each manufacturer (Polaris20-GDDR5 and Vega10-HBM2 for AMD; GP104-GDDR5 and GP102-GDDR5X for NVIDIA). The average hashrate results are listed below. Additional tests are ongoing.
As the algorithm nearly fully utilizes GPU functions in a natural way, the results reflect relative GPU performance that is similar to other gaming and graphics applications.
Model | Hashrate |
---|---|
RX580 | 9.4 |
Vega56 | 16.6 |
Vega64 | 18.7 |
GTX1070Ti | 13.1 |
GTX1080 | 14.9 |
GTX1080Ti | 21.8 |
Please refer to the official code located at ProgPOW for the full code, implemented in the standard ethminer.
Copyright and related rights waived via CC0.