
Constant-time EC point multiplication (Montgomery ladder) implementation #325

Open · wants to merge 22 commits into main

Conversation

@ChinoCribioli (Contributor) commented Sep 11, 2024

Description

The mulPointScalar method is implemented with the regular square-and-multiply algorithm, which is prone to timing attacks because the number of EC point additions depends on the number of 1's in the binary representation of the scalar.

This implementation (the Montgomery ladder) avoids that by making the number of additions depend only on the length of the binary representation of the scalar.

As documented in my implementation of the method, the algorithm works because of the following invariant: at each step, R0 will be r_0*base, where r_0 is the prefix of e written in binary, and R1 will be (r_0+1)*base. In other words, at iteration i of the loop, r_0's binary representation will be the first i+1 most significant bits of e. If the upcoming bit is a 0, we just have to add R0 to R1 and double R0 to maintain the invariant. If it is a 1, we have to set R0 to 2*R0 + base (equivalently, R0 + R1, since R1 = (r_0+1)*base) and double R1 to maintain the invariant.
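For illustration, here is a minimal TypeScript sketch of the ladder loop described above. It assumes a fixed 254-bit scalar width and an addPoint helper, and it starts from the identity point as in the later revision of this PR; it is a sketch, not the exact merged code.

```typescript
type Point = [bigint, bigint]
declare function addPoint(p1: Point, p2: Point): Point

function mulPointScalarSketch(base: Point, e: bigint): Point {
  let R0: Point = [0n, 1n] // identity point of the curve
  let R1: Point = base

  // Always visit all 254 bit positions, most significant first, so the
  // number of point additions is independent of the scalar's value.
  for (let i = 253; i >= 0; i -= 1) {
    if (((e >> BigInt(i)) & 1n) === 1n) {
      // Upcoming bit is 1: R0 <- R0 + R1, R1 <- 2*R1.
      R0 = addPoint(R0, R1)
      R1 = addPoint(R1, R1)
    } else {
      // Upcoming bit is 0: R1 <- R0 + R1, R0 <- 2*R0.
      R1 = addPoint(R0, R1)
      R0 = addPoint(R0, R0)
    }
    // Invariant: R0 = r_0*base and R1 = (r_0 + 1)*base, where r_0 is the
    // prefix of e processed so far.
  }

  return R0
}
```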

Related Issue(s)

Fixes #324

Other information

Aside from the new implementation of the mulPointScalar method, I added some tests to automatically check basic behaviors such as EC point additions, EC point multiplications, and the orders of the generating/base points used in the protocol.

These tests are independent of the new implementation of the multiply method and are intended to make the test suite more robust.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have run yarn style without getting any errors
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

ChinoCribioli and others added 17 commits August 28, 2024 17:08
@artwyman (Collaborator) left a comment

This is cool, @ChinoCribioli. Thanks for taking it on.

I feel underqualified to review the correctness of the cryptographic algorithm, and how well it meets the goal of resistance to timing attacks. Perhaps @cedoor is willing to do so, or can suggest someone to double-check here. I mostly trust the unit tests elsewhere (particularly the compatibility tests in eddsa-poseidon comparing results to circomlibjs) to ensure the results of this calculation are correct.

I do feel competent to comment on some aspects of the code, and potential micro-optimizations, so I've done that here. One thing I'm concerned about is the relative performance of this algorithm compared to the old one. Can you do some testing of the relative speed of this raw operation, and/or higher-level operations which depend on it (like eddsa-poseidon sign/verify)? Just timing operations in a loop and reporting results in the PR would be sufficient here. I don't think this needs an automated performance test in the test suite.
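A rough sketch of the kind of timing loop meant here (illustrative only; it assumes the package's mulPointScalar and Base8 exports):

```typescript
import { Base8, mulPointScalar } from "@zk-kit/baby-jubjub"

// Time repeated scalar multiplications to compare the two implementations.
const iterations = 1000
const scalar = 123456789012345678901234567890n // arbitrary test value

const start = performance.now()
for (let i = 0; i < iterations; i += 1) {
  mulPointScalar(Base8, scalar)
}
const elapsedMs = performance.now() - start

console.log(`${iterations} multiplications took ${elapsedMs.toFixed(1)} ms`)
```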

@@ -65,25 +65,43 @@ export function addPoint(p1: Point<bigint>, p2: Point<bigint>): Point<bigint> {
/**
* Performs a scalar multiplication by starting from the 'base' point and 'adding'
* it to itself 'e' times.
* This works given the following invariant: At each step, R0 will be r_0*base where r_0 is the prefix of e
Collaborator:

I suggest additionally referring to the named algorithm being used here (Montgomery ladder), perhaps with a link to a description of the algorithm (Wikipedia or some other source).

Contributor Author:

Sure! Will do.

return [BigInt(0), BigInt(1)]
}
const eBits: Array<boolean> = []
while (!scalar.isZero(e)) {
Collaborator:

Is the goal constant time? This loop doesn't seem constant time to me, since it'll exit early if high bits of the input are zero. As a result the length of eBits is also variable. If you want constant time, it seems like there has to be a hard-coded max bit length (254 I think, but someone should double-check me) somewhere.

If you wish, I think this loop can probably be skipped entirely with some bigint bitwise operations below. Something like this:

  for (let bitMask = 1n; bitMask < (1n << 254n); bitMask <<= 1n) {
    if ((e & bitMask) !== 0n) {
      // ...

I'm not sure of the relative efficiency, but I'd guess that the bitmasks are cheaper than allocating memory for an array.

Collaborator:

Yes, this sounds correct to me. If e is 7, for example, this will loop 3 times, whereas for something like 4182498124124891489144 it will loop significantly more times.

Hardcoding an iteration count is a good approach, but we'll still have timing vulnerabilities due to the use of the JavaScript BigInt type, which is not safe against timing attacks.

It's also unclear if the scalar.isZero function is constant time.

Contributor Author:

I agree with this point. I replaced the loop with a hardcoded for loop.

while (!scalar.isZero(rem)) {
if (scalar.isOdd(rem)) {
res = addPoint(res, exp)
if (scalar.isZero(e)) {
Collaborator:

This isn't constant time if e is zero. Is that okay?

Collaborator:

I think although the exponent is marked as secret, it may be considered public if it's set to 0. The rationale in my mind is "no private key would ever be selected as the 0 value". That said, I don't think it's a good idea to short circuit here because it breaks the comment above of e being secret in all cases.

Contributor Author:

Yes, I also thought initially that we shouldn't worry too much about the zero case since it is irrelevant cryptographically. But I agree to change this to make the method more consistent.

@cedoor (Member) commented Sep 12, 2024

Thanks @ChinoCribioli and @artwyman!

Perhaps @cedoor is willing to do so, or can suggest someone to double-check here.

I'm also not competent enough to check the logic here. I'll ask other people to review it.

@ChinoCribioli Is this a breaking change?

@chancehudson (Collaborator) left a comment

I'm happy to see discussion about timing vulnerabilities, and a constant-time EC multiplication implementation.

Unfortunately this function is still unsafe because of the use of the JavaScript BigInt type. Internally a BigInt is (likely) represented as the minimum number of bytes for the value. This means any operation we do here will be variable time based on the value of the BigInt.

I've tried to think of how to convert from BigInt to an alternate representation in constant time, but haven't found an approach that seems like it would be safe. Approaches I've considered:

  • use e.toString(2): this will return a binary string, but it will be of variable length based on the size of the value
  • use repeated division to extract little-endian bytes: division is generally not a constant-time operation with respect to the numerator, and it would also be operating on a BigInt, which is not constant time
  • do e + (1n << 254n) to get a constant-size value, then operate on that and discard bits above 254 (sketched below): this could work, and may be the safest approach, but it's unclear whether the addition is constant time (the implementation may have to backfill the bits between the value and 254). It's also unclear whether BigInts are variable time based only on the number of bits in the value, or on something more specific like the number of 1's in the value plus the total number of bits

I think it's a good idea to include this implementation though (after the requested changes) to mitigate timing attacks as much as possible.
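For illustration, the third approach might look like the following sketch (this only fixes the extracted bit count; as noted above, it does not make the underlying BigInt arithmetic constant time):

```typescript
const e = 123456789n // example scalar

// Pad e so every input has the same bit length before extracting bits.
const padded = e + (1n << 254n) // bit 254 is now always set

const eBits: boolean[] = []
for (let i = 0; i < 254; i += 1) {
  // Extract exactly 254 bits; the sentinel bit above position 253 is discarded.
  eBits.push(((padded >> BigInt(i)) & 1n) === 1n)
}
```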


}
const eBits: Array<boolean> = []
while (!scalar.isZero(e)) {
if (scalar.isOdd(e)) {
Collaborator:

Unclear if isOdd is constant time.

}
e = scalar.shiftRight(e, BigInt(1))
Collaborator:

A right shift on a BigInt is almost certainly not constant time.

let R0: Point<bigint> = base
let R1: Point<bigint> = addPoint(base, base)

for (const bit of eBits.slice(0, -1).reverse()) {
@chancehudson (Collaborator) commented Sep 12, 2024:

This slice is a no-op, I think. It may be necessary to invoke the reverse function, though.

Never mind, it returns all elements but the last one.
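For reference, a quick illustration of the slice behavior (assuming eBits stores bits least significant first, as the extraction loop above suggests):

```typescript
const eBits = [true, false, false, true] // bits of 0b1001, LSB first

eBits.slice(0, -1) // [true, false, false]: every element except the last
eBits.slice(0, -1).reverse() // [false, false, true]: remaining bits, MSB first
```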

@artwyman (Collaborator) left a comment

Comments on the latest changes. I'm still curious about relative performance of this implementation vs. the previous one.

packages/baby-jubjub/src/baby-jubjub.ts (resolved thread)

let R0: Point<bigint> = base
let R1: Point<bigint> = addPoint(base, base)
let R0: Point<bigint> = [0n, 1n]
Collaborator:

I see there's a unit test covering multiply-by-zero, which is good to confirm this still behaves the same way as before without the special case.
As a side suggestion, it might be clearer to make this a public const id like the one used in the unit tests, to make it clear that [0n, 1n] is the representation of the zero/identity point on this curve.

Contributor Author:

I like that. It is done.
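For illustration, the suggested naming might look like this hypothetical sketch (Point is the package's point type):

```typescript
// Name the identity element so [0n, 1n] is self-documenting at its uses.
export const id: Point<bigint> = [0n, 1n]

// ...then in mulPointScalar:
let R0: Point<bigint> = id
```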

@ChinoCribioli (Contributor Author):

@chancehudson I just pushed most of the changes you requested. My only remaining doubt about possible timing vulnerabilities lies in the lines e %= order and if (e & mask). Mostly the latter, since I don't know whether evaluating an if condition on an integer is constant time.

@ChinoCribioli (Contributor Author):

Comments on the latest changes. I'm still curious about relative performance of this implementation vs. the previous one.

I've run the tests of the baby-jubjub and eddsa-poseidon packages (which are the main packages affected by this PR) with both implementations and these are the results:

[Screenshots of test timings: babyjubjub_before, babyjubjub_withMyChange, eddsa_before, eddsa_withMyChange]

@chancehudson (Collaborator):

on the lines e %= order and if (e & mask). Mostly the latter, since I don't know whether evaluating an if condition on an integer is constant time.

The if statement should be fine as long as e is always the same size. This branch is specifically designed so that each code path takes equal time (this is the Montgomery ladder strategy). We just need to make sure the condition evaluation is constant time as well, which I suggested a change to try to ensure.

Again though, this is best effort. Regardless, we're going to have some timing leakage from the BigInt implementation.
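For context, the standard branch-free alternative in Montgomery ladder implementations is a conditional swap driven by arithmetic rather than control flow. A sketch (not the change proposed in this PR, and the BigInt arithmetic may still leak timing):

```typescript
type Point = [bigint, bigint]
declare function addPoint(p1: Point, p2: Point): Point

// Conditional swap: bit = 1n exchanges the points, bit = 0n keeps them;
// the same arithmetic executes either way. Since (1n - bit) + bit = 1n and
// one term is always zero, each coordinate is selected exactly.
function cswap(bit: bigint, a: Point, b: Point): [Point, Point] {
  const keep = 1n - bit
  return [
    [keep * a[0] + bit * b[0], keep * a[1] + bit * b[1]],
    [keep * b[0] + bit * a[0], keep * b[1] + bit * a[1]],
  ]
}

// One ladder step with no data-dependent branch.
function ladderStep(R0: Point, R1: Point, bit: bigint): [Point, Point] {
  let [a, b] = cswap(bit, R0, R1)
  b = addPoint(a, b)
  a = addPoint(a, a)
  return cswap(bit, a, b)
}
```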

@artwyman (Collaborator):

I've run the tests of the baby-jubjub and eddsa-poseidon packages (which are the main packages affected by this PR) with both implementations and these are the results:

I think this needs a more direct test of the affected operations (repeated in a loop to avoid fixed overhead and caching effects) to make the differences more clear. It does look to me like the test times are uniformly slower, though, which isn't very pleasing. I submitted an issue requesting to make it faster (which looks like it may be met by a WASM implementation written in Rust). The unit test which targets small values in particular is expected to get slower for a constant-time algorithm, but I assume the eddsa-poseidon tests are operating on more typical values.

I'm not entirely sure of the right trade-off here. Speaking for the Zupass use case, I think we probably care more about performance than timing attacks right now, but that's not a universal choice I expect to apply to everyone.

while (!scalar.isZero(rem)) {
if (scalar.isOdd(rem)) {
res = addPoint(res, exp)
e %= order
Collaborator:

Looking at this again, the original implementation doesn't make any assertions or modifications to this variable. This is an exponent, which is not necessarily a field element, so the reduction should be unnecessary.

I think if we remove this we have a pretty safe function. We just need to look more deeply at the performance hit and decide if it's worth it.

@artwyman (Collaborator):

An exponent doesn't have to be a field element, but the curve group is cyclic with its order, so a value greater than the order is redundant compared to the modular reduction of the same value (either leads to the same output). And the constant-ish-time algorithm below works only if you can fix the number of bits in the input. If we remove the modular reduction, then I think this function becomes incorrect for any value greater than the order. Modular reduction is not the same as cutting off high bits (because the order is not a power of 2).
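A tiny illustration of that last point, with a toy order:

```typescript
// With order = 13 (not a power of 2), dropping high bits and reducing
// modulo the order give different scalars.
const order = 13n
const e = 21n // 0b10101

const reduced = e % order // 8n: the equivalent scalar on the curve
const truncated = e & 0b1111n // 5n: just the low 4 bits

console.log(reduced === truncated) // false
```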

Contributor Author:

Indeed. As Andrew says, some tests break when you remove the reduction, because the case where the exponent passed as input is longer than 254 bits is then handled incorrectly.

Collaborator:

Ah right, because the loop count is hardcoded. I see.

@ChinoCribioli (Contributor Author):

I've run the tests of the baby-jubjub and eddsa-poseidon packages (which are the main packages affected by this PR) with both implementations and these are the results:

I think this needs a more direct test of the affected operations (repeated in a loop to avoid fixed overhead and caching effects) to make the differences more clear. It does look to me like the test times are uniformly slower, though, which isn't very pleasing. I submitted an issue requesting to make it faster (which looks like it may be met by a WASM implementation written in Rust). The unit test which targets small values in particular is expected to get slower for a constant-time algorithm, but I assume the eddsa-poseidon tests are operating on more typical values.

I'm not entirely sure of the right trade-off here. Speaking for the Zupass use case, I think we probably care more about performance than timing attacks right now, but that's not a universal choice I expect to apply to everyone.


I get that for some use cases performance is more important than safety; I honestly don't know which are the main projects that use this library. However, I think that any implementation of a rather complex cryptographic protocol must have a strong focus on security and, as you said, the performant variant may not be the best option for a lot of uses this library might have.

@cedoor (Member) commented Sep 16, 2024

However, I think that any implementation of a rather complex cryptographic protocol must have a strong focus on security and, as you said, the performant variant may not be the best option for a lot of uses this library might have.

I think security should have priority over performance, but I would wait for a full review by the audit team.

One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway.

@cedoor (Member) commented Oct 3, 2024

One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway.

@chancehudson @artwyman any thoughts?

@artwyman (Collaborator) commented Oct 3, 2024

One solution might be to release a new major if the security benefits are substantial and continue to maintain version 1 anyway.
@chancehudson @artwyman any thoughts?

I have thoughts in two directions. First, general thoughts on maintaining two versions:

It's workable, but definitely has a cost in maintenance burden and confusion. Either both versions need to be kept up to date with future patches, or else v1 will slowly become less useful, and potentially even insecure. I historically have a preference for "develop on main" workflows, since the code everyone looks at and uses is the code that will remain up-to-date and workable. I realize this is a preference, though, not a clear-cut definition of "best".

If you're going to support multiple algorithms for a significant time period, it may be better to keep them both in the code separately. They could be separate packages, separate functions within the package, or configurable via an "options" parameter. That keeps them both "on main" so they should benefit from ongoing work, unit testing, etc.
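For illustration, the options-parameter route might look like this hypothetical sketch (all names invented; Point is the package's point type):

```typescript
type MulOptions = { constantTime?: boolean }

// One public entry point dispatching to either algorithm, so both stay
// on main and share tests and maintenance.
export function mulPointScalar(
  base: Point<bigint>,
  e: bigint,
  options: MulOptions = {}
): Point<bigint> {
  return options.constantTime
    ? mulPointScalarLadder(base, e) // Montgomery ladder (this PR)
    : mulPointScalarSquareMultiply(base, e) // original square-and-multiply
}
```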

Second, thoughts on the specifics of the security vs. performance tradeoff here:

Given bigint arithmetic isn't constant-time anyway, this algorithm isn't a complete solution to the security problem. That makes me personally de-prioritize it vs. performance, but that's a call reasonable people can make differently. I would defer to someone with more security expertise, and who has specific threat models in mind to make the call on whether a constant-time algorithm built on variable-time bigint math reduces timing attack risk by enough of a margin to be worth prioritizing.

Calling the current implementation the "performant variant" is admittedly a bit of a stretch, since any pure-JS implementation of cryptography isn't going to be very performant. A preferable alternative would be to solve this problem fully as part of a more optimized implementation (probably WASM) which is fast enough to be an improvement to performance even with a constant-time algorithm. Such an implementation could improve both performance and security at once. I'm not sure how soon that's likely to happen, though, so we may need to accept trade-offs or provide multiple options in the near term.
