Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add equivalence tester #108

Merged
merged 2 commits into from
Dec 9, 2024
Merged

Add equivalence tester #108

merged 2 commits into from
Dec 9, 2024

Conversation

hanno-becker
Copy link
Collaborator

@hanno-becker hanno-becker commented Nov 25, 2024

By default, SLOTHY includes a 'selfcheck' verifying that the DFGs
of input and output are isomorphic with respect to the rescheduling
found by SLOTHY. This selfcheck is very powerful and should catch most
modelling errors in SLOTHY.

However, some aspects of SLOTHY remain outside of the scope of the
DFG-based selfcheck:

  • Address offset fixup

    This is a historically rather error prone feature to which the self-check
    is blind (since it relies on making the DFG blind to address increments).

  • Loop boundaries

    If SW pipelining is enabled, the selfcheck checks equivalence of a
    bounded unrolling of the input and output loops.

    Therefore, it would not catch issues with (a) the loop boundary, or
    (b) exceptional iterations for low loop counts.

While it is not SLOTHY's goal to provide formal guarantees of correctness
of the optimizations -- this should be gauged/confirmed through extensive test
and ideally formal verification -- development would be faster if bugs
in the aforementioned areas were caught before the developer integrates
and tests the SLOTHY output manually.

As a step towards catching more potential pitfalls early in the SLOTHY
workflow, this commit starts the development of an equivalence tester.

The idea of the equivalence tester is simple: Just run original and optimized
assembly with various random inputs, and check that they indeed lead to
equivalent outputs.

This commit implements this idea for AArch64 and Armv7-M, leveraging
LLVM-MC as an assembler and the unicorn-engine for emulation.

An initial limitation is support for loops: For now, the equivalence
checker tests bounded unrollings of loops, just like the selfcheck. That is
against the point of the equivalenc check, but it will be improved at a later
point, after some refactoring of the loop generation code.

The checker added in this commit is able to detect issues with address offset
fixup, however, which would previously have gone unnoticed by the selfcheck.

The new equivalence test is configured via config.selftest, which is enabled
by default, but silently skipped for unsupported platforms or target architectures.

Copy link
Collaborator

@dop-amin dop-amin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like how well the feature is integrated with slothy already.

As you write, there's room for improvment regarding the abstraction of the approach. I think things like the number/types of registers etc. could be inferred from the Arch models automatically for example.

I tested the checker with some small examples and also introduced mistakes on purpose that were caught.
Apart from the suggested changes to the code, there have been no issues for me.

Possibly, adding an example in example.py could be nice to make it complete.

tests/aarch64/checker/README.md Outdated Show resolved Hide resolved
tests/aarch64/checker/checker.c Outdated Show resolved Hide resolved
slothy/targets/aarch64/aarch64_neon.py Outdated Show resolved Hide resolved
@hanno-becker hanno-becker marked this pull request as draft November 28, 2024 12:14
@hanno-becker
Copy link
Collaborator Author

Marking this is as draft because @dop-amin had promising ideas on how to achieve the goals of this PR by reusing the Unicorn Framework.

@hanno-becker hanno-becker force-pushed the equiv_test branch 6 times, most recently from c96bc14 to 87e6043 Compare December 8, 2024 06:19
By default, SLOTHY includes a 'selfcheck' verifying that the DFGs
of input and output are isomorphic with respect to the rescheduling
found by SLOTHY. This selfcheck is very powerful and should catch most
modelling errors in SLOTHY.

However, some aspects of SLOTHY remain outside of the scope of the
DFG-based selfcheck:

- Address offset fixup

  This is a historically rather error prone feature to which the self-check
  is blind (since it relies on making the DFG blind to address increments).

- Loop boundaries

  If SW pipelining is enabled, the selfcheck checks equivalence of a
  _bounded unrolling_ of the input and output loops.

  Therefore, it would not catch issues with (a) the loop boundary, or
  (b) exceptional iterations for low loop counts.

While it is not SLOTHY's goal to provide formal guarantees of correctness
of the optimizations -- this should be gauged/confirmed through extensive test
and ideally formal verification -- development would be faster if bugs
in the aforementioned areas were caught before the developer integrates
and tests the SLOTHY output manually.

As a step towards catching more potential pitfalls early in the SLOTHY
workflow, this commit starts the development of an _equivalence tester_.

The idea of the equivalence tester is simple: Just run original and optimized
assembly with various random inputs, and check that they indeed lead to
equivalent outputs.

This commit implements this idea for AArch64 and Armv7-M, leveraging
LLVM-MC as an assembler and the unicorn-engine for emulation.

An initial limitation is support for loops: For now, the equivalence
checker tests bounded unrollings of loops, just like the selfcheck. That is
against the point of the equivalenc check, but it will be improved at a later
point, after some refactoring of the loop generation code.

The checker added in this commit is able to detect issues with address offset
fixup, however, which would previously have gone unnoticed by the selfcheck.

The new equivalence test is configured via config.selftest, which is enabled
by default, but silently skipped for unsupported platforms or target architectures.
@hanno-becker hanno-becker marked this pull request as ready for review December 8, 2024 06:28
@hanno-becker hanno-becker force-pushed the equiv_test branch 3 times, most recently from 495860c to c65d7b9 Compare December 9, 2024 06:44
@hanno-becker hanno-becker changed the title AArch64: Add equivalence tester Add equivalence tester Dec 9, 2024
@hanno-becker
Copy link
Collaborator Author

@dop-amin Can you pick up the global selftest for Armv7-M? The following should be made work

python3 example.py --example armv7m_simple0_func_m7

At the moment, it passes the local selftest for me, but the global selftest ƒails with unmapped memory or invalid instruction errors -- it looks like it is missing the function return branch. My suspicion is that I am messing up and somehow llvm-mc (assembling) and unicorn (emulation) aren't called in a compatible configuration. This is configured in arch_v7m.py

llvm_mca_arch = "arm"
llvm_mc_arch = "arm" ### TODO: What to put here?
llvm_mc_attr = "armv5te,thumb2,dsp" ### TODO: What to put here?

unicorn_arch = UC_ARCH_ARM
unicorn_mode = UC_MODE_THUMB

@hanno-becker hanno-becker force-pushed the equiv_test branch 3 times, most recently from a359d3f to 76cadea Compare December 9, 2024 10:37
This commit prototypes an implementation of a 'global' selftest,
building and emulating the entire source code loaded into SLOTHY
and checking that input and output behave equivalently.

To use, call `slothy.global_selftest(...)` at any point during
stateful SLOTHY operations. Then, the global selftest will compare
the current state of the code with the original state.

See ntt_kyber_123_4567 for an example.
@hanno-becker hanno-becker merged commit c31b6b3 into main Dec 9, 2024
13 checks passed
@hanno-becker hanno-becker deleted the equiv_test branch December 9, 2024 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants