AVX-512 support for RSA Signing #1273
Force-pushed from 1ceedcf to 676b064.
Thanks for catching that @dkostic; updated.
Force-pushed from f5236af to 91881d5.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1273      +/-   ##
==========================================
- Coverage   78.51%   78.30%   -0.22%
==========================================
  Files         583      584       +1
  Lines       98809    99188     +379
  Branches    14159    14189      +30
==========================================
+ Hits        77583    77666      +83
- Misses      20598    20892     +294
- Partials      628      630       +2
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
#include <string.h>
#include "rsaz_exp.h"

# define ALIGN_OF(ptr, boundary) \
We can use aws-lc/crypto/poly1305/poly1305.c, line 45 in 7e6aef8:
static inline struct poly1305_state_st *poly1305_aligned_state(
const BIGNUM *m2, const BN_MONT_CTX *in_mont2,
BN_CTX *ctx)
{
int ret = 0;
Can the indentation be set to 2 spaces?
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
return (bitsize + digit_size - 1) / digit_size;
}

/*
I suggest these declarations be moved to crypto/fipsmodule/bn/internal.h and be documented there.
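The rounding-up division in the snippet above gives the number of fixed-width digits needed to hold a value of a given bit length. A minimal sketch, assuming 52-bit digits (suggested by the `amm52x20_x1` routine name elsewhere in this PR); the function name `number_of_digits` is a placeholder and is not confirmed by the quoted lines:

```c
// Hypothetical helper mirroring the expression in the diff above:
// the number of digit_size-bit digits needed to hold bitsize bits,
// i.e. ceil(bitsize / digit_size) computed in integer arithmetic.
static int number_of_digits(int bitsize, int digit_size) {
  return (bitsize + digit_size - 1) / digit_size;
}
```

Under that assumption, `number_of_digits(1024, 52)` evaluates to 20, `number_of_digits(1536, 52)` to 30, and `number_of_digits(2048, 52)` to 40.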
###############################################################################
{
# input parameters ("%rdi","%rsi","%rdx","%rcx","%r8")
my ($res,$a,$b,$m,$k0) = @_6_args_universal_ABI;
It's not clear to me if this takes win64 into account.
I think you're right, I don't think it is either! I've updated this with a ternary check.
# Registers mapping for normalization.
my ($T0,$T0h,$T1,$T1h,$T2) = ("$zero", "$Bi", "$Yi", map("%ymm$_", (25..26)));

sub amm52x20_x1() {
If this is Algorithm 7 (Fig 5) in the J Cryptographic Eng. (2012) paper or Alg 3 in iacr 2011-239, can the exact algorithm be cited? It would be great if the various blocks of steps were annotated with the corresponding steps from the algorithm.
Hi @nebeid, could I plan to follow up with a second PR with the documentation?
Thank you, Dan. We will be looking at the delocate issue on Arm. If, meanwhile, you can shed any light on the algorithm steps, that would be much appreciated.
Force-pushed from 7ede753 to e3ca1f5.
Looks like Linux CI runs are failing. Would someone mind sharing the details on those failures?
Thanks! Looks like the IFMA flag isn't being passed in this command. Building and running tests seems okay on my SPR dev machine, though. Maybe I've missed a build conditional somewhere…
I think you need to add it here as well: https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/CMakeLists.txt#L367
The fipstool delocation only allows the use of `lea` when interacting with this symbol. This commit uses `lea` and `r11` as required by the delocation process.
.github/workflows/mingw.yml
Outdated
@@ -0,0 +1,42 @@
name: MinGW
why do we need this? Is this not covered by our existing intel SDE tests?
I'm not sure how this file got into this PR. This appears to be a duplicate of CI tests we already have: https://github.com/aws/aws-lc/blob/main/.github/workflows/windows-alt.yml#L11
It must have come in with an intermediate merge somewhere along the way. I will remove it.
I expect to get back to this next week; chaos reigns over here at the moment.
I've made it through @nebeid and @dkostic's last reviews and I'll start the testing / merging process here in a bit. I should have something pushed up in the next day or so. That leaves:
Which I will get started on as soon as I get the current state of the patch in order. As for follow-ups:
// - We have AMM(t, 2^k) = R^4 * 2^{4*(s-n)} / R'^2 mod m
//                       = R'^4 / R'^2 mod m
//                       = R'^2 mod m
Is there a reason the example wasn't added back? It was in the original commit, I just reworded it. I think it was a helpful illustration.
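One way to sanity-check the derivation quoted above is to track only the exponents of 2. The sketch below rests on assumptions that the comment's notation suggests but the quoted lines do not state outright: R = 2^n, R' = 2^s, k = 4*(s - n), t = R^4 / R' (that is, AMM applied to R^2 with itself), and AMM(a, b) = a * b / R' mod m. The function names are illustrative only and the reduction mod m is ignored.

```c
// Exponent of 2 in AMM(a, b) = a * b / R', where a = 2^ea, b = 2^eb
// and R' = 2^s. Assumed definition; the reduction mod m is ignored.
static int amm_exponent(int ea, int eb, int s) {
  return ea + eb - s;
}

// Exponent of 2 in AMM(t, 2^k), with t = AMM(R^2, R^2), R = 2^n,
// R' = 2^s and k = 4 * (s - n). All of these are assumptions taken
// from the notation in the quoted comment.
static int converted_rr_exponent(int n, int s) {
  int rr = 2 * n;                   // R^2
  int t = amm_exponent(rr, rr, s);  // R^4 / R'
  int k = 4 * (s - n);              // conversion coefficient 2^k
  return amm_exponent(t, k, s);     // R^4 * 2^k / R'^2
}
```

For n = 1024 and s = 1040 (20 digits of 52 bits) this gives k = 64 and a final exponent of 2080, which equals 2s and so matches the R'^2 on the last line of the comment.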
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
* Number of word-size (uint64_t) digits to store in redundant
* representation.
*/
// Number of word-size (uint64_t) digits to store in redundant
Suggested change:
- // Number of word-size (uint64_t) digits to store in redundant
+ // Number of word-size (uint64_t) digits to store values in redundant
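As background for the "redundant representation" wording: the sketch below shows one way a little-endian array of 64-bit words could be repacked into 52-bit digits, each stored in its own `uint64_t`. It assumes a radix of 2^52, which the `amm52` naming suggests; the function name `to_radix52` is hypothetical, and this is neither the patch's conversion routine nor constant-time.

```c
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BITS 52
#define DIGIT_MASK ((UINT64_C(1) << DIGIT_BITS) - 1)

// Hypothetical helper: split the little-endian value in[0..in_len) into
// out_len digits of DIGIT_BITS bits each, one digit per uint64_t.
// Bits beyond the end of the input are treated as zero.
static void to_radix52(uint64_t *out, size_t out_len,
                       const uint64_t *in, size_t in_len) {
  for (size_t i = 0; i < out_len; i++) {
    size_t bit = i * DIGIT_BITS;   // position of the digit's lowest bit
    size_t word = bit / 64;
    unsigned shift = (unsigned)(bit % 64);
    uint64_t v = 0;
    if (word < in_len) {
      v = in[word] >> shift;
      // The digit spills into the next word when fewer than DIGIT_BITS
      // bits remain in the current one.
      if (shift > 64 - DIGIT_BITS && word + 1 < in_len) {
        v |= in[word + 1] << (64 - shift);
      }
    }
    out[i] = v & DIGIT_MASK;
  }
}
```

The top 12 bits of every output word stay zero after conversion, which is the headroom a 52-bit multiply-accumulate scheme relies on.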
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1);
amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1);
Suggested change:
- amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1);
- amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1);
+ amm(rr1_red, rr1_red, rr1_red, m1_red, k0_1); // (1) for m1
+ amm(rr1_red, rr1_red, coeff_red, m1_red, k0_1); // (2) for m1
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2);
amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2);
Suggested change:
- amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2);
- amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2);
+ amm(rr2_red, rr2_red, rr2_red, m2_red, k0_2); // (1) for m2
+ amm(rr2_red, rr2_red, coeff_red, m2_red, k0_2); // (2) for m2
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
@@ -120,11 +121,11 @@ int RSAZ_mod_exp_avx512_x2(uint64_t *res1,
uint64_t *storage = NULL;
uint64_t *storage_aligned = NULL;
int storage_len_bytes = 7 * regs_capacity * sizeof(uint64_t)
+ 64 /* alignment */;
+ 64;
Suggested change:
-     + 64;
+     + 64; // alignment
the added 64 is for alignment, right?
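To illustrate why an extra 64 bytes would be requested: over-allocating by the alignment guarantees that an aligned pointer with the full requested length fits inside the buffer. A minimal sketch of that pattern follows; the helper names `align_up` and `alloc_aligned` are hypothetical and are not the helpers used in the patch or in aws-lc.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64

// Hypothetical helper: round ptr up to the next multiple of ALIGNMENT
// (ALIGNMENT must be a power of two).
static void *align_up(void *ptr) {
  uintptr_t p = (uintptr_t)ptr;
  return (void *)((p + ALIGNMENT - 1) & ~(uintptr_t)(ALIGNMENT - 1));
}

// Allocate len usable bytes starting at a 64-byte boundary. The extra
// ALIGNMENT bytes absorb the worst-case shift of up to ALIGNMENT - 1.
// *raw receives the pointer that must later be passed to free().
static void *alloc_aligned(size_t len, void **raw) {
  *raw = malloc(len + ALIGNMENT);
  if (*raw == NULL) {
    return NULL;
  }
  return align_up(*raw);
}
```

The caller keeps both pointers: the aligned one for the vector loads and stores, and the raw one for `free`.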
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
// `rem` is { 1024, 1536, 2048 } % 5 which is { 4, 1, 3 }
// respectively.
//
// If this assertion ever fails the fix above is easy.
Suggested change:
- // If this assertion ever fails the fix above is easy.
+ // If this assertion ever fails then we should set this easy fix
+ // exp_bit_no = modlen - exp_win_size
Is that what's intended? Because the change removed "the fix above".
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
* Get additional bits from then next quadword
* when 64-bit boundaries are crossed.
*/
red_table_idx_1 = expz[exp_chunk_no + 0 * (exp_digits + 1)];
Suggested change:
- red_table_idx_1 = expz[exp_chunk_no + 0 * (exp_digits + 1)];
+ red_table_idx_1 = expz[EXP_CHUNK(0)];
{
const int rem = modulus_bitsize % exp_win_size;
const BN_ULONG table_idx_mask = exp_win_mask;
const int rem = modlen % exp_win_size;
(if it's correct)
Suggested change:
- const int rem = modlen % exp_win_size;
+ // Find the location of the 5-bit window in the exponent which is stored
+ // in 64-bit digits. Left pad it with 0s to form a 64-bit digit to become
+ // an index in the precomputed table.
+ // The window location in the exponent is identified by its least
+ // significant bit `exp_bit_no`.
+ const int rem = modlen % exp_win_size;
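The window extraction described in the suggested comment can be sketched as follows. This is a simplified illustration, assuming a 5-bit window and an exponent stored in little-endian 64-bit words with one extra zero word of padding at the end; the name `extract_window` is hypothetical and the code is not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define EXP_WIN_SIZE 5
#define EXP_WIN_MASK ((UINT64_C(1) << EXP_WIN_SIZE) - 1)

// Hypothetical helper: return the EXP_WIN_SIZE-bit window whose least
// significant bit sits at position bit_no of the exponent. The array
// exp must hold at least bit_no / 64 + 2 words (the last one may be
// zero padding), so reading the following word is always safe.
static uint64_t extract_window(const uint64_t *exp, int bit_no) {
  size_t chunk = (size_t)bit_no / 64;
  int shift = bit_no % 64;
  uint64_t idx = exp[chunk] >> shift;
  // Pull in the missing high bits from the next word when the window
  // crosses a 64-bit boundary.
  if (shift > 64 - EXP_WIN_SIZE) {
    idx |= exp[chunk + 1] << (64 - shift);
  }
  return idx & EXP_WIN_MASK;
}
```

The branch depends only on the bit position, which is public, and not on the exponent bits themselves.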
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
}
{
red_table_idx_1 = expz[exp_chunk_no + 1 * (exp_digits + 1)];
T = expz[exp_chunk_no + 1 + 1 * (exp_digits + 1)];
red_table_idx_2 = expz[exp_chunk_no + 1 * (exp_digits + 1)];
Suggested change:
- red_table_idx_2 = expz[exp_chunk_no + 1 * (exp_digits + 1)];
+ red_table_idx_2 = expz[EXP_CHUNK(1)];
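The definition of `EXP_CHUNK` is not part of the quoted lines. Judging only by the expression it replaces, it presumably computes a flat index into a buffer that holds both exponents back to back, each occupying `exp_digits + 1` words. The function below is a guess at that indexing, written as a plain function with a hypothetical name rather than as the macro itself:

```c
#include <stddef.h>

// Hypothetical equivalent of the indexing behind EXP_CHUNK(i): the flat
// index of word chunk_no of exponent i, when every exponent occupies
// exp_digits + 1 consecutive words (one extra word of padding).
static size_t exp_chunk_index(size_t i, size_t chunk_no, size_t exp_digits) {
  return chunk_no + i * (exp_digits + 1);
}
```

With `exp_digits` equal to 16, for example, word 3 of the first exponent is at index 3 and word 3 of the second exponent is at index 20.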
crypto/fipsmodule/bn/rsaz_exp_x2.c
Outdated
}

/* Series of squaring */
// Series of squaring
crypto/impl_dispatch_test.cc
Outdated
uint64_t k0_2 = 0;
int modlen = 0;

RSAZ_mod_exp_avx512_x2(&res1, &base1, &exp1, &m1, &rr1, k0_1,
I think we should be calling `BN_mod_exp_mont_consttime_x2` in exponentiation.c to make sure this function gets called. It may be why the test is failing as follows: the function is called where the conditions in `flag.second` are false.
[ RUN ] ImplDispatchTest.BN_mod_exp_mont_consttime_x2
../crypto/impl_dispatch_test.cc:105: Failure
Expected equality of these values:
flag.second
Which is: false
BORINGSSL_function_hit[flag.first] == 1
Which is: true
Google Test trace:
../crypto/impl_dispatch_test.cc:103: 8
Of course you're right. I'm not sure what I was thinking here.
I even named the test the right thing! But got mixed up while filling it in. Anyways, 92b9e3f fixes this, and with that I think we're caught up.
crypto/impl_dispatch_test.cc
Outdated
BN_CTX_end(ctx);
BN_MONT_CTX_free(mont1);
BN_MONT_CTX_free(mont2);
Suggested change:
- BN_CTX_end(ctx);
- BN_MONT_CTX_free(mont1);
- BN_MONT_CTX_free(mont2);
+ BN_MONT_CTX_free(mont1);
+ BN_MONT_CTX_free(mont2);
+ BN_CTX_end(ctx);
+ BN_CTX_free(ctx);
@@ -47,5 +47,5 @@ batch:
env:
type: LINUX_CONTAINER
privileged-mode: true
compute-type: BUILD_GENERAL1_MEDIUM
compute-type: BUILD_GENERAL1_LARGE
I guess this is a leftover of a merge with main?
That's right. The team advised increasing the capacity for that particular Android cross build, which wasn't passing.
Pushing a merge commit "dismissed" your approval @nebeid, that was not my intention!
## What's Changed
* AVX-512 support for RSA Signing by @pittma in #1273
Description of changes:
This patch adds AVX-512 support for RSA 2k, 3k and 4k signing. It is built around the use of AVX512_IFMA within the (Almost) Montgomery Multiplication implementation that comprises the modular exponentiation part of the RSA algorithm. It is ported from the OpenSSL patch.
When running the provided speed tests, the following contains the results with and without this patch:
There is currently no support for 8k, so no change there. However, this could be a follow-on if there is interest in that.
Call-outs:
This patch is primarily additive, modulo a small logic change that occurs here, where, previously, the calls to `mod_montgomery` and `BN_mod_exp_mont_consttime` were interleaved. The intermediate value of `r1` is needed for the first exponentiation call; in order to make this possible when doing parallel exponentiations, we create a new `BIGNUM` on the context (`r2`).
Testing:
I added coverage for the fuzzer and borrowed a couple of test cases from the existing `mod_exp` tests to hit the new `BN_mod_exp_mont_consttime_x2` function. I'm more than happy to pull out more cases from those tests, or whatever else is suggested here, just let me know!
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.