Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test #368

Merged
merged 83 commits into from
Nov 19, 2024
Merged

Integration test #368

merged 83 commits into from
Nov 19, 2024

Conversation

kunxian-xia
Copy link
Collaborator

@kunxian-xia kunxian-xia commented Oct 12, 2024

Summary

This PR aims to add an integration test in which we proves the fibonacci rust program. The goal will be achieved by two steps:

  1. We will only prove the fibonacci ELF compiled by SP1 rust toolchain in the first place;
  2. We will prove the fibonacci rust guest program once our toolchain is ready.

This PR currently builds on top of the changes to ceno_emul in #487.

naure:

Summary of changes

  • The new example program from sp1 is similar to the existing example except:
    • This uses a fixed set of memory addresses: the data segments from the ELF and the stack.
    • This accepts ECALLs as NOPs.
  • Move the circuits for memory initialization into their own module called MmuConfig (Memory Management Unit). Meanwhile, Rv32imConfig focuses on instruction circuits.
  • Make assignment in memory circuits (NonVolatileRamCircuit) more flexible.
    • Simplify the assign functions.
    • Move the flexibility of memory initialization to a dedicated type MemPadder.
    • Pad with valid addresses and zero values.
    • Do not assume that the table covers the whole range of valid addresses. Instead, it is only as large as necessary which is much smaller.
  • Removed the notion of program data out of circuits and generalize it as static memory. That is memory with a static set of addresses.
  • Refactor SetTableSpec to reflect that the address parameters are not relevant in the case FixedAddr.
  • Fix missing padding of the program table witness.
  • Various debug logs and assertions.

Performance

By running the command cargo run --example fibonacci_elf --release, we get the e2e proving time is 48.5s 1 for 11538921 execution steps. This is equivalent to 237.9 kHz.

Footnotes

  1. Configuration of the benchmark machine: AMD EPYC 9R14 (32 threads) + 64GB memory

@kunxian-xia kunxian-xia self-assigned this Oct 12, 2024
@kunxian-xia kunxian-xia linked an issue Oct 12, 2024 that may be closed by this pull request
2 tasks
@kunxian-xia kunxian-xia marked this pull request as draft October 12, 2024 16:08
@kunxian-xia kunxian-xia changed the base branch from master to feat/load_mem October 25, 2024 15:14
Base automatically changed from feat/load_mem to master October 28, 2024 11:47
@kunxian-xia kunxian-xia changed the base branch from master to feat/expose_insns_in_emul October 28, 2024 12:29
Base automatically changed from feat/expose_insns_in_emul to master October 29, 2024 08:53
@kunxian-xia kunxian-xia marked this pull request as ready for review October 29, 2024 17:35
@kunxian-xia
Copy link
Collaborator Author

kunxian-xia commented Oct 29, 2024

We need to resolve #245 to support writes to X0.

ceno_zkvm/src/instructions/riscv/jump/jalr.rs Outdated Show resolved Hide resolved
ceno_zkvm/src/structs.rs Outdated Show resolved Hide resolved
ceno_zkvm/src/scheme/prover.rs Outdated Show resolved Hide resolved
ceno_emul/src/vm_state.rs Outdated Show resolved Hide resolved
@naure
Copy link
Collaborator

naure commented Nov 18, 2024

Status: it still breaks starting from --max-steps=4097 with logup_sum != 0

@naure naure mentioned this pull request Nov 18, 2024
@naure
Copy link
Collaborator

naure commented Nov 18, 2024

Fixed a bug: missing padding in the program table witness.

That seems unrelated but it was just randomly triggered by changes in this PR.

@kunxian-xia Ready again.

Copy link
Collaborator Author

@kunxian-xia kunxian-xia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new changes look good to me!

@naure naure merged commit 63ad6f9 into master Nov 19, 2024
6 checks passed
@naure naure deleted the feat/guest-example branch November 19, 2024 17:08
assert!(DVRAM::max_len().is_power_of_two());
let mut final_table =
RowMajorMatrix::<F>::new(final_mem.len().next_power_of_two(), num_witness);
let mut final_table = RowMajorMatrix::<F>::new(DVRAM::max_len(), num_witness);
Copy link
Collaborator

@hero78119 hero78119 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @naure , could you elaborate in new design how we achieve non-uniform overhead? previously the allocated memory in RowMajorMatrix are dynamic and cover final_ram which I assumed final ram size can be dynamic according to memory access "per proof". However It's not obvious to me for how it work in new design

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type is not used in the example program, so this change was maybe not necessary. We can revert it if that’s incorrect.

@kunxian-xia Which version do you think is right?

Copy link
Collaborator

@hero78119 hero78119 Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new version is incorrect to me. Original design is the key of our non-uniform memory overhead for RAM holding heap/stack.
Originally I thought the design was kept in some other places so I bring up this question.
(I was also a bit surprise how this type is not used in Fabonacci program 😮, will check detail for how Fabonacci works in this PR)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was some minor bug in it but that was maybe not the right fix. Let me revert / fix this in a new PR.

Copy link
Collaborator

@naure naure Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fibonacci program uses only a few predictable static addresses. So it works perfectly with NonVolatileRamCircuit and its addr: Fixed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert: #601

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the old design is wrong as it should not create the final_table based on the length of final_mem. For example, if the addresses in final_mem are vec![4, 16, 20, 128, 168], then the old assignment will create a final_table that only has 4 elements, while the correct final_table should has 168.next_power_of_two() = 256 elements.

Copy link
Collaborator

@hero78119 hero78119 Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old design:

  • final_mem can start on arbitrary offset address
  • final_mem are in dense and last is the max_access address

For example, if offset address is 0x12340000, and max access is 0x12340018
then address final_mem should be [0x12340000, 0x12340004, 0x12340008, 0x1234000c, 0x12340010, 0x12340014, 0x12340018]. In this case, final table size will be 7.next_power_of_two() = 8.

@kunxian-xia does the explanation make sense to you :) ?

Previously I didnt have clear documents on that so it cause misunderstanding, and at that moment emul mem design not settle yet, sorry for that

Will review on #601 and let settle the design there

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the above explanation makes sense to me. The main issue we have is we need to insert untouched memory records into final_mem manually. As in your example, if the real touched addresses are vec![0x12340000, 0x12340008, 0x12340018], then we need to insert vec![0x12340004, 0x1234000c, 0x12340010, 0x12340014] into final_mem manually, is this what you have in mind?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue we have is we need to insert untouched memory

yes exactly, as I previously explanation we should be able to resolve unconsecutive mem access in imperfect but good enough version.

naure pushed a commit that referenced this pull request Nov 19, 2024
@naure naure mentioned this pull request Nov 19, 2024
@kunxian-xia kunxian-xia changed the title WIP: integration test Integration test Nov 20, 2024
.map(|rs2| rs2.value);

let final_access = vm.tracer().final_accesses();
let end_cycle: u32 = vm.tracer().cycle().try_into().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If PublicValues needs a u32 for end_cycle, then fn cycle should probably return a u32?

.assign_table_circuit::<ExampleProgramTableCircuit<E>>(&zkvm_cs, &prog_config, vm.program())
.unwrap();

if std::env::var("MOCK_PROVING").is_ok() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MOCK_PROVING=no ./run_my_programming would also result in this branch being taken? This might be a bit suprising.

hero78119 pushed a commit that referenced this pull request Nov 21, 2024
- Revert #368 for `DynVolatileRamTableConfig`.
- Add padding of `final_cycle`.

---------

Co-authored-by: Aurélien Nicolas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

prove fibonacci program example to improve testing coverage
4 participants