
Rework the LXRHash to make it faster to verify without the 1GB map #55

Open

PaulSnow opened this issue Oct 6, 2019 · 15 comments

@PaulSnow
Collaborator

PaulSnow commented Oct 6, 2019

So the idea here is to run the state through the ByteMap, building the state with a (mostly) fixed number of bytes and ByteMap lookups.

This will work better for PoW, less so for general cryptographic hashing.

Implemented in this branch as LXRHash2

@WhoSoup
Contributor

WhoSoup commented Oct 8, 2019

I don't see how this is in any way feasible. You can use literally anything you want as the 224 verifying bytes. Example: https://play.golang.org/p/EHIQpIVLkfE

You can use any payload and a random 224-byte slice, and this will generate a hash that passes HashValidate without ever touching the bytemap.

@PaulSnow
Collaborator Author

PaulSnow commented Oct 9, 2019

So I did fail to consider the fact that you can build a hash and a verification stream of bytes with a byte map of any size; that's a pretty big hole.

Spent some time considering the issue, and the obvious fix is to hold some of the ByteMap. So consider:

So suppose you have the whole ByteMap on disk, but only load a random 5 to 10 percent of it into memory. When the wallet processes the grading, it can run the verification using the verification stream (the 224 bytes provided with the 32-byte hash) and check that any byte whose index falls within the 5 to 10 percent of the ByteMap you have loaded matches the ByteMap.

The odds of catching a bad hash are 1 − (19/20)^224 when holding 5% of the ByteMap, or about 99.999 percent. If the wallet's run through a block's grading results appears to disagree with the miners (excluding any OPR that your validation appears to pass), you can pull in a different 5% and double-check the hashes you thought passed. (Do not recheck any OPRs that failed the first time around, since a failure is proof the hash is bad.) The additional 5% raises the odds of catching a failure to 1 − (9/10)^224, leaving a miss probability on the order of 6×10⁻¹¹.
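
A quick back-of-the-envelope check of those odds (a sketch; it ignores the extra 255-out-of-256 chance that a checked byte also fails to match, which only improves detection):

package main

import (
	"fmt"
	"math"
)

func main() {
	const samples = 224 // bytes in the verification stream

	// Probability that at least one of the 224 sampled indices lands in
	// the fraction of the ByteMap held in memory, assuming indices are
	// uniform over the map.
	catch := func(held float64) float64 {
		return 1 - math.Pow(1-held, samples)
	}

	fmt.Printf("hold  5%%: catch %.7f\n", catch(0.05))   // ~0.9999898
	fmt.Printf("hold 10%%: miss  %.2g\n", 1-catch(0.10)) // ~5.6e-11
}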

Another way of looking at this is that using only 10 percent of the ByteMap to validate hashes saves 90% of the memory footprint during validation. The combination of using less memory and comparing the wallet's results with the miners' yields very strong validation without the memory footprint, and far faster validation to boot.

@PaulSnow
Collaborator Author

PaulSnow commented Oct 9, 2019

BTW, putting the current code into the playground demonstrates you can't easily fool the validation without running something through the hash function. (your code doesn't check the results of the hash against the hash... maybe I failed to check in the current code?) The exploit is really that you can use a very small ByteMap and create a validation stream that will pass, if you don't check at least some of the bytes in the validation stream against what the ByteMap would have produced.

See my update to the playground here: https://play.golang.org/p/S9RJtYdPuDm

@WhoSoup
Contributor

WhoSoup commented Oct 9, 2019

So suppose you have the whole ByteMap on disk, but only load a random 5 to 10 percent of it into memory.

I'd be interested in seeing some code that accomplishes this. How does the validator know which bytes are in a subsection of the bytemap?

@PaulSnow
Collaborator Author

PaulSnow commented Oct 9, 2019

You pick a range of bytes from the ByteMap. B() gets the index, so you verify any byte whose index falls in that range.

I'll update the branch.
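
Roughly, the check could look like this (a minimal sketch with hypothetical names; the real field and function names in the branch may differ):

package main

import "fmt"

// checkStream verifies a claimed byte stream against the slice of the
// ByteMap held in memory. held starts at map offset start; indices are
// the ByteMap indices B() produced at each step; stream holds the bytes
// the prover claims were read.
func checkStream(held []byte, start uint64, indices []uint64, stream []byte) bool {
	for i, idx := range indices {
		if idx >= start && idx < start+uint64(len(held)) {
			// We hold this part of the ByteMap, so the byte must match.
			if stream[i] != held[idx-start] {
				return false
			}
		}
		// Indices outside our range are taken on faith.
	}
	return true
}

func main() {
	held := []byte{7, 13, 42}     // map bytes 100..102
	idx := []uint64{5, 101, 9000} // only 101 falls in our range
	fmt.Println(checkStream(held, 100, idx, []byte{0, 13, 0})) // true
	fmt.Println(checkStream(held, 100, idx, []byte{0, 99, 0})) // false
}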

@PaulSnow
Collaborator Author

PaulSnow commented Oct 9, 2019

More complete playground test. But the unit tests for validation in the LX-55 branch are better.

https://play.golang.org/p/sYz-cgWfGMv

@PaulSnow
Collaborator Author

PaulSnow commented Oct 9, 2019

Updated the branch with a TestFastValidation() unit test that checks against just portions of the ByteMap. The odds of getting a false positive when checking 10 percent of the ByteMap are about 1 in 6 trillion.

Given the odds, nobody is going to be putting many engineered hashes in the OPRs, and this reduces our footprint from 1 GB of memory + code to about 100 MB + code. We are still big, but can now run inside 1 GB instead of requiring 2 GB. It also speeds up validation quite a bit (we do 1/10th the ByteMap lookups, and against a smaller ByteMap).

There are other tests that have to be run on this version of the LXRHash, so I am not sure when we need to deploy this.

@WhoSoup
Contributor

WhoSoup commented Oct 10, 2019

Updated the branch with a TestFastValidation() unit test that tests doing checks on just portions of the ByteMap.

Okay let's assume that validators are loading only 1/10th of the bytemap.

If all validators load the same 10% into memory, then we have the same problem as before. Anyone hashing will also only need to load that same 10% into memory to hash.

If validators load different (random?) 10% into memory, then we have a different problem. An attacker could create hashes that validate on 50% of the bytemap and intentionally fail in the other 50%. This would create a soft fork between all of the validators. With just a single OPR that makes it into the top 25, you could fork PegNet.

@WhoSoup
Contributor

WhoSoup commented Oct 10, 2019

Creating such a malicious hash is trivial, for example:

1d99cde01cad8b4e64d229196d3712c2b97349987a4a4364a522fabc7ebfb540000000676f008e009946000000006500873b00005a6f0000860000000000d8440000cc000000c4ac00004615eb00dc79e400002400c60000000084ee00c900003400009c000000230000002f33000000b000009500786b007a00000052c00085050000a842c6000000000000a60000000000a50000d7b50000000073a70002000000005800811f2a00c6009b62d800c000dc00000047f1350000000055e800a4578400009000e57b000090000077d500bd00f700000000e6de6a65af060a002277000066006e04090000002400efef00092ceb0000000000000000008a009100

This was generated using the 30-bit bytemap and will validate correctly if you verify against the lower half of the bytemap (ValidationIndex = 0, ValidationSize = 536870912), but fail if you load any part of the upper half.

Test code used (using latest commit to branch):

package main

import (
	"encoding/hex"
	"fmt"

	lxr "github.com/pegnet/LXRHash"
)

func main() {
	var LXR lxr.LXRHash2
	LXR.Init(lxr.Seed, lxr.MapSizeBits, lxr.HashSize, lxr.Passes)

	payload := []byte("this is the hash payload")

	fake, _ := hex.DecodeString("1d99cde01cad8b4e64d229196d3712c2b97349987a4a4364a522fabc7ebfb540000000676f008e009946000000006500873b00005a6f0000860000000000d8440000cc000000c4ac00004615eb00dc79e400002400c60000000084ee00c900003400009c000000230000002f33000000b000009500786b007a00000052c00085050000a842c6000000000000a60000000000a50000d7b50000000073a70002000000005800811f2a00c6009b62d800c000dc00000047f1350000000055e800a4578400009000e57b000090000077d500bd00f700000000e6de6a65af060a002277000066006e04090000002400efef00092ceb0000000000000000008a009100")

	// Validate against the lower half of the 1 GB ByteMap: passes.
	LXR.ValidationIndex = 0
	LXR.ValidationSize = 536870912
	_, err := LXR.HashValidate(payload, fake)
	fmt.Println(err)

	// Validate against the upper half: fails.
	LXR.ValidationIndex = 536870912
	_, err = LXR.HashValidate(payload, fake)
	fmt.Println(err)
}

Result:

<nil>
error found in the validation stream of the hash

@PaulSnow
Collaborator Author

PaulSnow commented Oct 23, 2019

@WhoSoup Can you provide the code to produce the fake hash? Observations:

  1. If it takes far longer to create a fake hash than to hash with the 1 GB buffer, then faking is useless as a way to speed up LXRHash.
  2. This is a proof-of-work hash, not a cryptographic hash. If a set of data can have many hash solutions, we don't care. What we care about is that finding a fake is harder than finding a real one.

So if we consider a valid proof of work a hash that has the right difficulty, and we use the validation path (not caring if the miner found the fake hash or the real hash) then what difference does it make to us that some miners might take 3 trillion attempts to find one fake, and iterate down that path to try and find a solution? I doubt reasonable hardware can beat just having the 1 GB buffer to create a hash.

tl;dr: the question to settle is whether someone can create fakes roughly as fast as real hashes. If it takes on average 3 trillion attempts to create one fake, and that fake's difficulty is no more likely to be good than a hash made with the 1 GB buffer, then I think we are good.

@PaulSnow
Collaborator Author

PaulSnow commented Oct 23, 2019

Well, thinking harder about a PoW hash, we can see that there are things we can do that a cryptographic hash could not.

And the easiest fix for the last issue is to simply require the hash to have 10 hits in the lower 10 percent of the ByteMap. Since we take 224 samples, we will usually have at least 10 such hits. When you create your fake hash, it has no hits, so it fails (we can return all zeros, as we define difficulty as being the biggest number). The same hash won't validate, but that's okay with us, because it has no difficulty either.

Engineering 10 hits that also have the right bytes to keep the hash on its rails is really hard.

So without this fix, what a miner could do is use a 256-byte ByteMap and find a solution really fast (on an ASIC or GPU), then do the recursive search to find a validation list that avoids the lower 10% of the 1 GB ByteMap.

With the fix, this doesn't work. If they don't hit the lower 10% 10 times, the hash is invalid. If they do hit the lower 10%, they face 255-out-of-256 odds of not having the right byte. With 10 bytes, it is really unlikely that they can create a fake that will pass.
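
For concreteness, a quick sketch of the two odds involved (assuming samples are uniform and independent):

package main

import (
	"fmt"
	"math"
)

// binom computes the binomial coefficient C(n, k) via log-gamma.
func binom(n, k int) float64 {
	lg := func(x int) float64 { v, _ := math.Lgamma(float64(x + 1)); return v }
	return math.Exp(lg(n) - lg(k) - lg(n-k))
}

func main() {
	// P(an honest hash gets fewer than 10 of its 224 samples in the
	// lower 10% of the ByteMap).
	var pLess float64
	for k := 0; k < 10; k++ {
		pLess += binom(224, k) * math.Pow(0.1, float64(k)) * math.Pow(0.9, float64(224-k))
	}
	fmt.Printf("P(honest hash has <10 hits): %.2g\n", pLess)

	// P(10 forged bytes in the checked region all match by luck).
	fmt.Printf("P(10 forged bytes all pass): %.2g\n", math.Pow(1.0/256, 10))
}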

Actually this doesn't work, so never mind.

@PaulSnow
Collaborator Author

So in conclusion I think we have to make it impossible for the attacker to know which 10% is going to be used for validation.

@WhoSoup
Contributor

WhoSoup commented Oct 24, 2019

Alright, first of all I want to differentiate between fake hashes and malicious hashes. In this context, a fake hash is a hash that validates the same way for everyone but was created without the use of the full bytemap. A malicious hash is a hash that validates for some but not for others.

A malicious hash will soft fork anyone who uses this partial-bytemap algorithm. For example in the pegnet miner, if there is an OPR that reaches the top 25 with a malicious hash that verifies on half of the nodes, the list of "previous winners" will be different for half of the miners.

A fake hash becomes a malicious hash when not all validators do the same thing. To prevent malicious hashes, all validators have to get the exact same output for the same input. This means we can't randomize the percent of the bytemap that is going to be checked.

Fake hashes can be created while holding exactly as much of the ByteMap as the validator does.

Example:

8391d771aef4f4a975d38cb6036a8db933918cbb9d7ec45978689d5926397154000000000000320000000000000000000000000000040000000000000000000000000000000000000000400000000000a8000000000000000000000000e20036000000000000e60000000000fb0000000000000000d6000000000000000000000000000000000002000000000000000000000000000000000000a52a0000000000000000000000004900000000000000000000000000000000000000000000000000fbf0fe0000000000000000000000000000000000000000220000000000000000000000000000000000000000000000000000fd000000008c000000000000

This is a valid hash that passes your latest algorithm with the low10percent approach. The code essentially just pulls data from the bytemap if it is going to be verified: WhoSoup@80777dc

And here's the code I used to create and verify this hash:

package main

import (
	"encoding/hex"
	"fmt"

	lxr "github.com/pegnet/LXRHash"
)

func main() {
	var LXR lxr.LXRHash2
	LXR.Init(lxr.Seed, lxr.MapSizeBits, lxr.HashSize, lxr.Passes)

	payload := []byte("this is the hash payload")

	// Verify only the first 107374183 bytes (~10%) of the 1 GB ByteMap.
	LXR.ValidationSize = 107374183
	hash, err := LXR.HashValidate(payload, nil)
	fmt.Printf("%x\n", hash)

	// The precomputed fake passes against that same lower ~10%.
	fake, _ := hex.DecodeString("8391d771aef4f4a975d38cb6036a8db933918cbb9d7ec45978689d5926397154000000000000320000000000000000000000000000040000000000000000000000000000000000000000400000000000a8000000000000000000000000e20036000000000000e60000000000fb0000000000000000d6000000000000000000000000000000000002000000000000000000000000000000000000a52a0000000000000000000000004900000000000000000000000000000000000000000000000000fbf0fe0000000000000000000000000000000000000000220000000000000000000000000000000000000000000000000000fd000000008c000000000000")
	_, err = LXR.HashValidate(payload, fake)
	fmt.Println(err)
}

As you can see, I set the validation size to be 10% of the 1GB bytemap (rounded up).

For the unverified bytes I'm just using the default 0, but if you wanted to hide the fact that you're creating a fake hash, you could fill them from a random number generator instead. The validity of those bytes is never checked.

The fundamental flaw in your approach is that you are trusting user input. Every single byte of the user-provided hash has to be verified; any byte that is not verified can be used to craft fake hashes. I don't think it's feasible to create an algorithm that requires the miners to load the full 1 GB but allows validators to hold less than 1 GB. If it can be validated with less than 1 GB, it can also be mined with the same amount.
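
To make that concrete, here is a sketch of the forgery (hypothetical helper; the commit linked above does the equivalent inside HashValidate): every index outside the verified range is a free byte, and the forger only ever needs the verified portion of the map.

package main

import "fmt"

// forgeStream builds a verification stream that passes a validator which
// only checks indices in [valStart, valStart+len(checked)). checked is
// the only part of the ByteMap the forger needs to hold.
func forgeStream(checked []byte, valStart uint64, indices []uint64) []byte {
	stream := make([]byte, len(indices))
	for i, idx := range indices {
		if idx >= valStart && idx < valStart+uint64(len(checked)) {
			stream[i] = checked[idx-valStart] // will be checked: use the real byte
		} else {
			stream[i] = 0 // never checked: any value passes
		}
	}
	return stream
}

func main() {
	checked := []byte{7, 13, 42} // only map bytes 0..2 are ever verified
	fmt.Printf("%x\n", forgeStream(checked, 0, []uint64{1, 500, 2}))
}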

@PaulSnow
Collaborator Author

PaulSnow commented Oct 24, 2019

So the preferred solution might be to pick 1000 random 1 MB blocks and load them into memory. Now the odds of an attacker guessing which 10% is going to be used by any validator are very close to zero. Even guessing 80% of any validator's set is very close to zero.

In order to win a block the miners are going to use the 1 GB, so they don't win if they cheat.

Now wallets might just use the majority of miners to validate the winning list, and use this method for the block until the miners finish mining and publish their blocks. Outside of that, simply trusting the miners works here because of how many miners are involved. Unlike the typical "trust the miner" situation, the wallet can verify with a random data set and also look at the majority of what the miners say about who won. The odds of an attack succeeding with hash power AND avoiding a random 10% used by wallets are near nil.
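
A sketch of that loading step, assuming the map's 1024 one-megabyte blocks and a hypothetical file name:

package main

import (
	"fmt"
	"math/rand"
	"os"
)

const (
	blockSize = 1 << 20             // 1 MB blocks
	mapSize   = 1 << 30             // 1 GB ByteMap
	numBlocks = mapSize / blockSize // 1024
)

// loadRandomBlocks reads count randomly chosen blocks of the ByteMap
// file into memory, keyed by block number. The selection is local to
// each validator, so an attacker cannot predict which ranges a given
// wallet will check.
func loadRandomBlocks(path string, count int) (map[int][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	held := make(map[int][]byte, count)
	for _, b := range rand.Perm(numBlocks)[:count] {
		buf := make([]byte, blockSize)
		if _, err := f.ReadAt(buf, int64(b)*blockSize); err != nil {
			return nil, err
		}
		held[b] = buf
	}
	return held, nil
}

func main() {
	held, err := loadRandomBlocks("lxrhash.dat", 100) // ~10% of the map
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("holding %d of %d blocks\n", len(held), numBlocks)
}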

@WhoSoup
Contributor

WhoSoup commented Oct 28, 2019

So the preferred solution might be to pick 1000 random 1 MB blocks and load them into memory.

That's the entire table, assuming it's a typo and you meant 100 random?

Now the odds of an attacker guessing which 10% is going to be used by any validator are very close to zero. Even guessing 80% of any validator's set is very close to zero.

That is a solution to discourage people from faking hashes, but it does nothing to address malicious hashes. If this algorithm can't be used for mining and can't be used for pegnetd, where do you see it being used?

We already have a good way of grading without using lxrhash: just use the self-reported difficulty. That has similar properties to this reduced-RAM verification algorithm:

  • It doesn't affect mining/pegnetd because they verify via the full bytemap
  • There's no tangible benefit for anyone to submit fake records
  • We can never trust the results completely, so it can't be used anywhere important
