-
Notifications
You must be signed in to change notification settings - Fork 127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attributable errors #60
base: master
Are you sure you want to change the base?
Conversation
c342f47
to
c99e01e
Compare
c99e01e
to
bb61d8d
Compare
1e5ca3e
to
5783ac8
Compare
00bce74
to
83d4580
Compare
e6b9314
to
e2a890e
Compare
e2a890e
to
8d8737e
Compare
a8fce5c
to
444400b
Compare
attributable_error_crypto.go
Outdated
) | ||
|
||
type AttributableErrorStructure struct { | ||
HopCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are u16
,u8
etc in the BOLT, maybe use the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used ints here to prevent casting to a wider data type in the various calculations that use these values. I think you could argue that the byte
that is specified in the bolt is a limitation on the wire level and that the code in this repo doesn't need to follow that limitation necessarily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to have the types 1:1. How bad would the casting be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the casting overhead would be negligible in terms of affected code sites. Inheriting the proper int types here aides in readability IMO.
Also in most cases, you don't really need to cast for comparisons, as you can make generic funcs that use constraints. Integer
or w/e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also missing godoc
comments here.
Is this an error struct (actually error
type), or something else entirely? In either case, adding "Structure" to the end seems superflous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not an error struct, but a description of the actual structure of the onion failure. I think the addition Structure
is appropriate in this case. Added godoc to explain.
Will leave this thread open and await movement on the spec side with regards to the data types. In the spec PR I chose byte
for now, but perhaps we might go for something even more compact with just a few options for the structure (enum-like).
cc @bitromortac |
52a8a08
to
fccd040
Compare
9afb8a0
to
c0682ea
Compare
@joostjager, remember to re-request review from reviewers when ready |
!lightninglabs-deploy mute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial pass to build up my mental model. Will sit with the spec for a bit before doing another pass here.
} | ||
|
||
// generateSharedSecret generates the shared secret by given ephemeral key. | ||
func (r *Router) generateSharedSecret(dhKey *btcec.PublicKey) (Hash256, error) { | ||
func (r *Router) GenerateSharedSecret(dhKey *btcec.PublicKey) (Hash256, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Godoc above not updated.
Is this just so the implemented interface only has public methods, or that this method is to be used elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Godoc updated.
The method is used in #7139. Access is needed to the shared secret so that the type of onion encryptor can be 'upgraded' after reading the error structure parameters from the onion tlv.
attributable_error_crypto.go
Outdated
) | ||
|
||
type AttributableErrorStructure struct { | ||
HopCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the casting overhead would be negligible in terms of affected code sites. Inheriting the proper int types here aides in readability IMO.
Also in most cases, you don't really need to cast for comparisons, as you can make generic funcs that use constraints. Integer
or w/e.
attributable_error_crypto.go
Outdated
) | ||
|
||
type AttributableErrorStructure struct { | ||
HopCount int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also missing godoc
comments here.
Is this an error struct (actually error
type), or something else entirely? In either case, adding "Structure" to the end seems superflous.
attributable_error_crypto.go
Outdated
} | ||
|
||
func newAttributableErrorBase( | ||
structure *AttributableErrorStructure) attributableErrorBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
onionError
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the legacy error an onion-routed error too?
hash := hmac.New(sha256.New, umKey[:]) | ||
|
||
// Include message. | ||
_, _ = hash.Write(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually an encrypted message right? So the ciphertext and not the plaintext.
If not, then we need to revise the proposal, as encrypt-then-mac is a must as we want to ensure integrity of the ciphertext. Otherwise the MAC may also give some information about the plaintext itself.
Might just have an out of date mental model tho....decided to take a look at the code before the spec to reaload some context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is hmac-then-encrypt. Legacy failures are also hmac-then-encrypt - at least for the error source because the other nodes only encrypt.
If the complete message including hmacs is encrypted as the final step before passing it back to the upstream node, how can information about the plaintext leak?
attributable_error_crypto.go
Outdated
var hmacIdx = maxHops + position | ||
|
||
// Iterate over all downstream nodes. | ||
for j := 0; j < maxHops-position-1; j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the inclusion of positional/index information here confusing: how will a forwarding node actually derive its position? If this is given to it by the sender, ofc that breaks w/e privacy we may have w/ the current scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do not need to derive their position. And this information also isn't given to them by the sender, because that would indeed break privacy.
The core idea of this proposal is to iterate over every possible position that the node may have in the path, and add a corresponding hmac for each of those positions.
// Extract the payload and exit with a nil message if it is invalid. | ||
source, payload, err := o.extractPayload(payloads) | ||
if sender == 0 { | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we don't handle the error at all if the sender is known?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling consists of returning the current position to the caller.
Some invalid data was encountered because we couldn't extract the payload. This means that there is no actual message to return. With the position information, the caller can apply a selective penalty.
Allow for more flexible usage of the error encrypter. This is useful when upgrading an existing legacy error encrypter to fat errors in lnd.
e244f62
to
6649a30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, quite a smart way to approach the problem!
Having HopCount
is probably not ideal because it could be used for fingerprinting, but this ensures to have enough space for future extensions, right? This brings me to the question whether it's worthwhile to stay below the TCP MTU limit for latency reasons (similarly as for the onion size in update_add_htlc
)?
If we have b
bits (say 8000) available for the HMACs and security of s
bits in the HMACs (HMAC chopped off after s
bits, as you suggested out of band), we would get
s = b / (HopCount * (HopCount + 1) / 2)
HopCount | s
--------------
27 | 21
20 | 38
18 | 47
16 | 59
14 | 76
12 | 102
For comparison, the Bitcoin network has a hash rate that manages s~80 in 10 min. In practice anything higher than that seems to be very high. The downside of having a fixed total message size below b
is that the number of total hops is limited.
attributable_error_crypto.go
Outdated
|
||
// AttributableErrorStructure contains the parameters that define the structure | ||
// of the error message that is passed back. | ||
type AttributableErrorStructure struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would manage to reduce the size of the total error, would it be desirable to get rid of the HopCount
as it may add a fingerprint vector? Together with FixedPayloadLen
, this is mostly to allow flexibility concerning future usage like nodes communicating back different metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, FixedPayloadLen
is to allow future flexibility without increasing the failure message until it is needed.
HopCount
is slightly different. It gives senders some control over the failure message size, at the cost of limiting the maximum route length. If the size of the total error can be reduced so that HopCount
can be hard-coded to the max route length of 27, that would be great. On the other hand, perhaps long routes are so uncommon that a lower constant could be acceptable too.
attributable_error_crypto.go
Outdated
_, _ = hash.Write(message) | ||
|
||
// Include payloads including our own. | ||
_, _ = hash.Write(payloads[:(o.maxHops()-position)*o.payloadLen()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify the position parameter? Currently the comment states that this is the distance to the error source. If position=0, i.e, we are the erring node, we would only need to include our own payload, is it? If so, I would expect payloads[:(position+1)*o.payloadLen()]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this parameter is indeed inverted. Updated the code to match the comment. In the spec pr, I also define position relative to the error source.
attributable_error_decrypt.go
Outdated
|
||
// Ensure the error message length is enough to contain the payloads and | ||
// hmacs blocks. Otherwise blame the first hop. | ||
if len(encryptedData) < minPaddedOnionErrorLength+o.hmacsAndPayloadsLen() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would an intermediate node react to another node forwarding a short message (probably out of scope for this PR)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good question.
In master what happens is that a random failure message is made up if the length is below the (legacy) minimum:
https://github.com/lightningnetwork/lnd/blob/e549096b8837a95e0515c8c63fa01c634714d913/htlcswitch/link.go#LL1889C2-L1889C2
For attributable errors though, it may pass this check. Then in the intermediate EncryptError, an error is returned.
I think currently in lightningnetwork/lnd#7139, the intermediate node ignores the error and passes back the failure message as is. This allows an attacker to get away with it.
What should happen instead is that the intermediate node creates a new, properly formatted failure to attribute blame to the connection between itself and its downstream peer. Added comment https://github.com/lightningnetwork/lnd/pull/7139/files#r1228037240
attributable_error_decrypt.go
Outdated
// Clear first hmac slot. This slot is for the position farthest away | ||
// from the error source. Because we are shifting, this cannot be | ||
// relevant. | ||
copy(hmacs[destIdx*sha256.Size:], zeroHMAC[:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't convince myself whether clearing is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should never be reading this field when decrypting the error. I left this in as a 'sanity clear'...
@@ -0,0 +1,25 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add a comment to explain the test, for example which payloads were used or what parameters were used? Payloads could also be added to the test vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is taken straight from the spec PR where the parameters are explained. I will do more polishing after we've decided that attr errors is going to happen and what variation of it exactly.
Yes, I think this would be a useful optimization. You could argue that attributing failure isn't the most critical thing in lightning. Even with a relatively high chance of an attacker forging an hmac that is considered valid (let's say 21 bits, 1 in 2,000,000), game theory is still very much against them. If they guess wrong, they'll be penalized and will have to wait for their reputation to be restored before they'll receive new traffic from that sender. |
09db49b
to
4852f56
Compare
Feedback from lnd dev call:
For the payload that each node adds, we could for now just go for the minimum size. No tlv also. Especially because we don't even have concrete ideas what to add besides the timestamp. If this changes in the future, we can invent another signal in the forward onion. This would eliminate all parameters from the forward onion and leave just a zero-length tlv record to signal sender support for attributable errors. |
9b2478b
to
74fcef9
Compare
Added a parameter in the attributable error structure struct for the size of the hmac. The other changes mentioned above can happen in the lnd pr. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉! I compared the left/right-shift/HMAC operations to my notes and they are identical. Great to have the attributable errors completely parametrized such that we can define them for the protocol.
|
||
return &DecryptedAttrError{ | ||
DecryptedError: DecryptedError{ | ||
SenderIdx: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be helpful to distinguish a malicious sender from the true sender of the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we do that? If our peer returns a message that is too short, then they didn't do what do should have done. They should always return a message of the correct length that includes at least their valid hmacs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about distinguishing the happy path decoding (we can determine the true sender) from the unhappy one (isCorruptedSender bool
), but I'm not sure how that would be used for penalization yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unhappy path is currently signaled through a nil DecryptedError.Message
. But perhaps it should be more explicit? This is where I miss rust's expressiveness in data types.
fbef088
to
9530e97
Compare
) | ||
copy(dummySecret[:], bytes.Repeat([]byte{1}, 32)) | ||
|
||
// We'll iterate a constant amount of hops to ensure that we don't give |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the intention to guard against timing attacks by constant-hopping is admirable, it may be a case of over-engineering. Timing attacks can originate from numerous other sources, including differences in data processing times, network latency, and even hardware behavior. To truly protect against timing attacks, a comprehensive approach is needed that covers all potential sources, making constant-hopping only a part of the solution.
Moreover, adding complex measures such as this may introduce new, unforeseen security vulnerabilities or performance issues. It could also divert resources from tackling more likely and more dangerous threats. Therefore, a balanced, risk-based approach to security might be more appropriate than focusing too heavily on one specific, and potentially less probable, attack vector.
[So far for the AI]
In short, shall we just leave this out for the sake of simplicity?
Test vector updated to 20/4/4 structure in line with lightning/bolts#1044 (comment) |
Error attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.
Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.
This PR updates the failure encryption mechanism so that the failure source can always be determined.
References: