Public IO #627

matthiasgoergens · 2024-11-25T09:15:39Z

matthiasgoergens
Nov 25, 2024
Maintainer

We recently discussed hints (also known as unconstrained private input).

Now it is time to look at public IO.

We have two choices to make:

Simulate input or simulate output?
Merkle-ise the data by default?

Simulate input or simulate output?

SP1 makes a curious choice here: for hints, they simulate input via functions like read or read_slice. We do the same. However, for public IO, they simulate output via functions like commit or commit_slice:

pub fn commit<T>(value: &T)
where
    T: Serialize,
pub fn commit_slice(buf: &[u8])

Note that they need a Serialize bound, because they are actually serialising inside the VM.

Serialising is a rather expensive operation. Deserialising is much cheaper. (But the gap between the two is smaller in SP1, because their deserialisation is literally orders of magnitude less efficient than ours.)

So I suggest we deviate from SP1 here, and simulate public input instead of public output. Both models have the same expressiveness, but simulating input is more efficient thanks to rkyv's zero-copy deserialisation.

(And we can implement commit and commit_slice as convenience functions on top of read_public and read_public_slice, without even having to serialise inside the VM.)

I suggest we follow the same model as for hints. We already have public IO memory regions in the VM.

As an extra wrinkle, in the code for 'successful exit' we will want to assert that all public input values have been consumed. This is not necessary for hints.
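A minimal sketch of this model, assuming a hypothetical `PublicInput` cursor over the public IO region. The type, its constructor and `assert_fully_consumed` are made up for illustration and are not an existing API. It shows `commit_slice` as a read-and-compare on top of `read_public_slice`, plus the exit-time check that nothing is left unread:

```rust
/// Hypothetical cursor over the public IO memory region (illustrative only).
pub struct PublicInput<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> PublicInput<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    /// Hand out the next `len` public bytes without copying them.
    pub fn read_public_slice(&mut self, len: usize) -> &'a [u8] {
        let end = self.pos.checked_add(len).expect("length overflow");
        assert!(end <= self.data.len(), "read past end of public input");
        let out = &self.data[self.pos..end];
        self.pos = end;
        out
    }

    /// SP1-style "output" on top of input: read the claimed bytes and compare.
    pub fn commit_slice(&mut self, buf: &[u8]) {
        let expected = self.read_public_slice(buf.len());
        assert_eq!(expected, buf, "committed value differs from public input");
    }

    /// Meant to be called on the 'successful exit' path.
    pub fn assert_fully_consumed(&self) {
        assert_eq!(self.pos, self.data.len(), "unconsumed public input");
    }
}
```

Nothing is serialised inside the guest here: the prover supplies the claimed bytes up front, and the guest only compares them against what it computed.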

Merkle-ise by default?

The discussion so far assumes that we specify public IO as a list of values. If we have a long collection of inputs and we only need to read some of them, it can make sense to use a Merkle-tree.

In that case, the public input would consist of just the root of the Merkle-tree.

To make that work we need one specific pre-compile that can be wrapped in a Rust function:

pub fn unhash(digest: &Digest) -> &Preimage

unhash takes a digest and returns the corresponding pre-image according to some hash function. This combines reading the pre-image from some private input (not necessarily the same as for our unconstrained hints) and verifying its hash digest.

On top of this primitive, we can build functionality that abstracts away from the hashing and presents the guest developer with an API to navigate a tree (or directed acyclic graph) of public input data.
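To make the idea concrete, here is a rough sketch under loud assumptions: FNV-1a stands in for whatever hash the pre-compile would offer (it is not collision resistant), `PreimageStore` is a hypothetical stand-in for the private input channel, and inner nodes are assumed to be the concatenation of their two child digests. None of these names exist in the code base.

```rust
use std::collections::HashMap;

/// Toy 64-bit digest; a real design would use the pre-compile's hash.
pub type Digest = [u8; 8];

/// FNV-1a, used only as a stand-in hash. It is NOT collision resistant.
pub fn toy_hash(data: &[u8]) -> Digest {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h.to_le_bytes()
}

/// Hypothetical private side-channel that maps digests to pre-images.
#[derive(Default)]
pub struct PreimageStore {
    preimages: HashMap<Digest, Vec<u8>>,
}

impl PreimageStore {
    /// Prover side: register a pre-image and get its digest back.
    pub fn insert(&mut self, preimage: &[u8]) -> Digest {
        let digest = toy_hash(preimage);
        self.preimages.insert(digest, preimage.to_vec());
        digest
    }

    /// Guest side: fetch the pre-image and check that it hashes to `digest`.
    pub fn unhash(&self, digest: &Digest) -> &[u8] {
        let preimage = self.preimages.get(digest).expect("no pre-image supplied");
        assert_eq!(&toy_hash(preimage), digest, "pre-image does not match digest");
        preimage
    }
}

/// Walk from `root` to a leaf; `true` in `path` means "take the right child".
/// Inner nodes are assumed to be `left_digest || right_digest`.
pub fn open_leaf<'a>(store: &'a PreimageStore, root: &Digest, path: &[bool]) -> &'a [u8] {
    let mut current = *root;
    for &go_right in path {
        let node = store.unhash(&current);
        assert_eq!(node.len(), 16, "inner node must hold two digests");
        let child = if go_right { &node[8..] } else { &node[..8] };
        current.copy_from_slice(child);
    }
    store.unhash(&current)
}

/// Build a two-leaf tree and return the store together with the root digest.
pub fn two_leaf_tree(left: &[u8], right: &[u8]) -> (PreimageStore, Digest) {
    let mut store = PreimageStore::default();
    let l = store.insert(left);
    let r = store.insert(right);
    let root = store.insert(&[l, r].concat());
    (store, root)
}
```

Only the root digest would need to be public; the guest pulls in exactly the nodes on the path it cares about. A real scheme would also want domain separation between leaves and inner nodes, which this sketch leaves out.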

Details depend on the hash functions we want to implement as pre-compiles.

We can also look into eg vector commitments and other schemes.

Sidenote: designs for Merkle-trees and commitment schemes work better with simulated public input than with SP1-style public output. (Unless you want your VM to copy a lot.)

naure · 2024-11-25T10:59:47Z

naure
Nov 25, 2024
Collaborator

The program will typically be driven by private inputs, e.g. a rollup block. And it will generate public summaries, e.g. hash digests and counters. Then it can write these as public outputs. If only inputs were available, it would simulate outputs by loading inputs, comparing, and branching.

The output style is easier and safer to use.

Easier: No need to generate the public IO somehow before tracing, as it is naturally a result of the trace.

we will want to assert that all public input values have been consumed

Safer: This is an automatic and intuitive feature of the output style.

5 replies

matthiasgoergens Nov 25, 2024
Maintainer Author

Thanks for your feedback.

To quote myself:

And we can implement commit and commit_slice as convenience functions on top of read_public and read_public_slice, without even having to serialise inside the VM.

Serialising is expensive.

naure Nov 25, 2024
Collaborator

Serialising is a rather expensive operation. Deserialising is much cheaper.

There’s the problem then. Surely you’re thinking of some particular case. Can you expand on why that would be?

The format should be something like a few fixed-size values (digests and numbers) in a simple layout. So about the same complexity to read or write.
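As an illustration of what such a layout could look like, here is a sketch with a hypothetical `PublicSummary` (one 32-byte digest plus one counter, 40 bytes in total). The field names and the little-endian choice are assumptions made for the example, not something we have agreed on:

```rust
/// Size of the assumed fixed layout: 32-byte digest followed by a u64 counter.
pub const SUMMARY_SIZE: usize = 40;

/// Hypothetical public summary (field names are illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PublicSummary {
    pub state_root: [u8; 32],
    pub tx_count: u64,
}

impl PublicSummary {
    /// Write: digest first, then the counter as little-endian bytes.
    pub fn to_bytes(&self) -> [u8; SUMMARY_SIZE] {
        let mut out = [0u8; SUMMARY_SIZE];
        out[..32].copy_from_slice(&self.state_root);
        out[32..].copy_from_slice(&self.tx_count.to_le_bytes());
        out
    }

    /// Read: the mirror image of `to_bytes`.
    pub fn from_bytes(bytes: &[u8; SUMMARY_SIZE]) -> Self {
        let mut state_root = [0u8; 32];
        state_root.copy_from_slice(&bytes[..32]);
        let mut counter = [0u8; 8];
        counter.copy_from_slice(&bytes[32..]);
        Self { state_root, tx_count: u64::from_le_bytes(counter) }
    }
}
```

Both directions are a couple of fixed-size copies, which is the point: with a layout like this there is no generic serialiser involved on either side.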

kunxian-xia Nov 25, 2024
Maintainer

Serialising is expensive.

For example, in a typical workload that we care about (i.e. proving a Scroll mainnet block), the cost of serializing is negligible compared to the other parts.

┌╴program
│ ┌╴read_size
│ └╴859 cycles
│ ┌╴read_traces
│ └╴196,303 cycles
│ ┌╴exec_batch
│ │ ┌╴exec_block
│ │ └╴35,176,327 cycles (99.94%)
│ │ ┌╴commit_changes
│ │ └╴67,510,476 cycles (99.67%)
│ │ ┌╴Keccak::finalize
│ │ └╴1,360 cycles
│ │ ┌╴sp1_zkvm::io::commit_slice
│ │ └╴18,166 cycles
│ └╴190,658,309 cycles (99.95%)
└╴190,858,491 cycles (100.00%)

icemelon Nov 26, 2024
Maintainer

I agree that in general it's true that serialization is more expensive in the guest program. We must also take into account that public input could be computed in a more expensive environment like smart contracts on Ethereum. In such a case, we should make the public input cheaper for the verifier.

matthiasgoergens Nov 26, 2024
Maintainer Author

Yes, if you only ever want to have a very small amount of data, then even a scheme that's expensive per byte will be cheap in absolute terms.

Though there's a cost in terms of complexity for each additional scheme we introduce. By default, we can copy what we are doing for private input / hints.

We must also take into account that public input could be computed in a more expensive environment like smart contracts on Ethereum. In such a case, we should make the public input cheaper for the verifier.

Yes, in those cases, we probably want to specify the byte layout of the public input precisely, instead of relying on rkyv (or serde or postcard etc) to come up with a layout for us.

From the point of view of the VM, it's the same mechanism: the prover specifies a few (public) bytes that will be mapped into a well-known memory location somewhere. But the libraries that we use inside the VM to interpret these bytes, and outside of the VM to create those bytes, will be bespoke for eg Ethereum smart contracts.

naure · 2024-11-25T11:21:28Z

naure
Nov 25, 2024
Collaborator

Note for the implementation of unhash or Merkle openings: this is best offered as a software library, as opposed to some solution built into the zkVM.

First because there can be many variants of these for different contexts, and the applications actually come with their own methods: e.g. the existing formats of blocks and the Merkle trees of a rollup. So we want the flexibility of changing or adding functions in this library. Of course that will use zkVM intrinsics for the actual hashing.

2 replies

matthiasgoergens Nov 25, 2024
Maintainer Author

For the circuits, handing over the pre-image as an argument and asking for the digest to be filled in costs about the same as handing over the digest and asking for the pre-image to be filled in.

The latter comes up a lot more often in a zkVM setting.

(Unless you want an intrinsic that gets both pre-image and digest as arguments, and then just aborts, if they don't match.)

matthiasgoergens Nov 26, 2024
Maintainer Author

But you are right that support for this will be a combination of VM built-in functionality and libraries.

See also how even private hints are a mixture of these two.

kunxian-xia · 2024-11-25T12:23:28Z

kunxian-xia
Nov 25, 2024
Maintainer

The public input region is expected to be relatively small, e.g. less than 2^4 field elements, as the verifier needs to evaluate the MLE at a random point. Therefore it should hold a hash digest of the data that a guest program wants to output.

SP1's approach looks good to me. It feeds the bytes to a static SHA256 object and gets the hash digest by reading the value of that static object. This digest is then connected to the public values (committed_value_digest) using a dedicated syscall, COMMIT.
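Roughly, the shape of that approach is sketched below. This is not SP1's code: FNV-1a stands in for SHA256 (toy hash, not secure), and `PublicValuesHasher` and `verify_public_values` are hypothetical names chosen for the example.

```rust
/// Running FNV-1a state, standing in for the static SHA256 object.
pub struct PublicValuesHasher {
    state: u64,
}

impl PublicValuesHasher {
    pub fn new() -> Self {
        Self { state: 0xcbf2_9ce4_8422_2325 }
    }

    /// Guest side: every committed slice is absorbed into the running hash.
    pub fn commit_slice(&mut self, buf: &[u8]) {
        for &b in buf {
            self.state ^= u64::from(b);
            self.state = self.state.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    /// The only value that has to land in the small public input region.
    pub fn digest(&self) -> [u8; 8] {
        self.state.to_le_bytes()
    }
}

/// Verifier side: recompute the digest from the claimed output bytes.
pub fn verify_public_values(claimed: &[u8], digest: &[u8; 8]) -> bool {
    let mut hasher = PublicValuesHasher::new();
    hasher.commit_slice(claimed);
    &hasher.digest() == digest
}
```

The guest can commit as many bytes as it likes, while the public input region only ever carries the fixed-size digest.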

1 reply

matthiasgoergens Nov 26, 2024
Maintainer Author

Thanks for looking that up! Doing public input via such a hash digest is something we should definitely explore.

The main reason I didn't go into further detail on that approach here is that we haven't implemented any of the necessary machinery, so far.

I don't know whether we need a dedicated syscall, but that's an internal detail that's not super relevant for users of the VM.
