Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable Hash Functions #7111

Open
2 tasks done
illuzen opened this issue Jan 10, 2025 · 11 comments
Open
2 tasks done

Configurable Hash Functions #7111

illuzen opened this issue Jan 10, 2025 · 11 comments
Labels
I5-enhancement An additional feature request. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@illuzen
Copy link

illuzen commented Jan 10, 2025

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Motivation

I want to change the internal hash functions of substrate to poseidon in order to make it zk-friendly. Currently it seems the only way to do that for Accounts, Events, Extrinsics, Blocks and entropy is to directly modify the frame-system pallet, which is not-ideal. Blake2 and Twox64Concat are hardcoded into several places despite almost always having access to T:Config, which allows configurability. So you could replace Blake2_256 with T::Hashing in most places.

Request

Please make the internal hash functions Blake2 and Twox64Concat configurable via setting Config in the runtime so I don't have to mod frame-system.

The one I'm not sure about is pub fn unique(entropy: impl Encode) -> [u8; 32], since it doesn't have access to the Config object. Maybe just put it in the pallet module.

Solution

	#[pallet::getter(fn account)]
	pub type Account<T: Config> = StorageMap<
		_,
	        T::Hashing,
		T::AccountId,
		AccountInfo<T::Nonce, T::AccountData>,
		ValueQuery,
	>;
	
		/// Map of block numbers to block hashes.
	#[pallet::storage]
	#[pallet::getter(fn block_hash)]
	pub type BlockHash<T: Config> =
		StorageMap<_, T::Hashing, BlockNumberFor<T>, T::Hash, ValueQuery>;

	/// Extrinsics data for the current block (maps an extrinsic's index to its data).
	#[pallet::storage]
	#[pallet::getter(fn extrinsic_data)]
	#[pallet::unbounded]
	pub(super) type ExtrinsicData<T: Config> =
		StorageMap<_, T::Hashing, u32, Vec<u8>, ValueQuery>;

	#[pallet::storage]
	#[pallet::unbounded]
	#[pallet::getter(fn event_topics)]
	pub(super) type EventTopics<T: Config> =
		StorageMap<_, T::Hashing, T::Hash, Vec<(BlockNumberFor<T>, EventIndex)>, ValueQuery>;

Are you willing to help with this request?

Yes!

@illuzen illuzen added the I5-enhancement An additional feature request. label Jan 10, 2025
@github-actions github-actions bot added the I10-unconfirmed Issue might be valid, but it's not yet known. label Jan 10, 2025
@bkchr
Copy link
Member

bkchr commented Jan 10, 2025

The hash functions you are mentioning there are only used to calculate the storage keys. Why do you need to use a different hash function there?

I think this is a xyproblem. So, you want to use ZK, but for what exactly?

@illuzen
Copy link
Author

illuzen commented Jan 10, 2025

The main use case is for EIP-7503 style ZK-Wormhole. This requires forming a zk-circuit out of state proofs, which, if I'm not mistaken, involves calculating storage keys.

Is there a good reason not to make these hash functions configurable? @bkchr

@burdges
Copy link

burdges commented Jan 10, 2025

In principle, you could use any hash function you like for your chain's data, but parachain infrastructure needs substrate's regular storage and hash function. In practice, our DSL infrastructure makes doing your own thing difficult.

At least for our envisioned identity stuff, we'd want an append only Poseidon MMR for whole blocks, and then another Poseidon Merkle tree within blocks, but being append only makes this simpler than maintaining some actual state with updates.

We've much interest in making NOMT work in substrate, which basically brings exactly the same problems, so maybe one should figure out who is making progress there? JAM involves some new storage proposal too, but afaik it mnight lack some of the important optimizations in NOMT etc.

@illuzen
Copy link
Author

illuzen commented Jan 11, 2025

Seems like, if you want to support both parachains and solochains, you could make a separate types for the solochain-only configs and just clearly label them as such. Maybe T::Hashing is too broad, could do something more specific like T::EventHashing and T::ExtrinsicHashing.

We will just fork frame-system if necessary. I don't see anything close to what Substrate has accomplished in terms of a build-your-own-blockchain-library.

@gui1117
Copy link
Contributor

gui1117 commented Jan 11, 2025

What is the issue for NOMT usage in substrate, only the hash choice? Or the whole storage type architecture? (Key being: 128 bit for pallet name + 128 bit for storage item name + choosen hash and encoding for the key that can be very long as it can include the key itself)

I think we can bring more flexibility and configurability.

@burdges
Copy link

burdges commented Jan 11, 2025

Afaik NOMT hasn't even optimized their hashing: Ideally, you'd invoke the bare blake3 compression function, which gives really nice SIMD implementations, and then provide some new leaf hash.

In substrate, there is a string based lookup so that related prefixes show up together, which reduces PoV size. In NOMT there is a single 32 byte (?) key, of which yes you can shave off a few bits, but mostly you'd want complex accounts records to be substrucutres, maybe subtrees, or maybe blake3 encoded bytes, since you can do a read or edit proof into blake3 hashed data, or maybe accounts should be bytes in a designated tree-like format ala JSON, but employ some structure aware hashing. It's lots of work doing this nicely.

We will just fork frame-system if necessary

Are you sure you do not need append only here?

@illuzen
Copy link
Author

illuzen commented Jan 12, 2025

Are you sure you do not need append only here?

We are storing nullifiers on chain permanently (in the latest state) to prevent double-spends, which I would describe as append-only, and the sequence of finalized blocks is also append-only, but you seem to be talking about a different layer?

Using the Merkle-Patricia-Trie for storage doesn't seem to interfere with our use case, but I would be curious to hear more.

@illuzen
Copy link
Author

illuzen commented Jan 14, 2025

I think we can bring more flexibility and configurability.

We're happy to support this if you can suggest the boundary conditions for the edits. @gui1117

@bkchr
Copy link
Member

bkchr commented Jan 14, 2025

The main use case is for EIP-7503 style ZK-Wormhole. This requires forming a zk-circuit out of state proofs, which, if I'm not mistaken, involves calculating storage keys.

If you want to build a zk storage proof, you can compute the keys "offline" and input them into the prover. No need to compute them inside the circuit?

@bkchr
Copy link
Member

bkchr commented Jan 14, 2025

Afaik NOMT hasn't even optimized their hashing: Ideally, you'd invoke the bare blake3 compression function, which gives really nice SIMD implementations, and then provide some new leaf hash.

I don't see how the request here relates to NOMT. We need the key structure that we have to support iterations, reconstructing the keys and similar. However, NOMT doesn't really care how the key is being build. The problem there right now is that they don't support iterating the storage, but this can probably be build on top of it. Still we are talking here about the hash that is being used to generate the KEYs and not the hash that is being used to build the trie.

@illuzen
Copy link
Author

illuzen commented Jan 15, 2025

The main use case is for EIP-7503 style ZK-Wormhole. This requires forming a zk-circuit out of state proofs, which, if I'm not mistaken, involves calculating storage keys.

If you want to build a zk storage proof, you can compute the keys "offline" and input them into the prover. No need to compute them inside the circuit?

Yes the prover would be the off-chain wallet, but the verifier is on chain and the verifier complexity is also a function of the circuit complexity, which is a function of the hash function. Poseidon is orders of magnitude more efficient than Blake2 in this context.

Regarding storage keys, I narrowed it down, if we can do extrinsic proofs without editing the storage key I think it will work.

Seems like we can just edit runtime/src/lib.rs

pub type Header = generic::Header<BlockNumber, PoseidonHasher>;

We also need to change the hash function that is used to derive the accountId, but that also seems to be a separate hasher that we don't need to edit frame-system for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
None yet
Development

No branches or pull requests

4 participants