Attributable errors #60

joostjager · 2022-11-10T13:30:46Z

Error attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened.

This PR updates the failure encryption mechanism so that the failure source can always be determined.

References:

crypto.go

fat_error_decrypt.go

halseth · 2023-04-12T11:08:46Z

attributable_error_crypto.go

+)
+
+type AttributableErrorStructure struct {
+	HopCount        int


these are u16,u8 etc in the BOLT, maybe use the same here?

I used ints here to prevent casting to a wider data type in the various calculations that use these values. I think you could argue that the byte that is specified in the bolt is a limitation on the wire level and that the code in this repo doesn't need to follow that limitation necessarily?

I think it would be nice to have the types 1:1. How bad would the casting be?

I think the casting overhead would be negligible in terms of affected code sites. Inheriting the proper int types here aides in readability IMO.

Also in most cases, you don't really need to cast for comparisons, as you can make generic funcs that use constraints. Integer or w/e.

Also missing godoc comments here.

Is this an error struct (actually error type), or something else entirely? In either case, adding "Structure" to the end seems superflous.

It is not an error struct, but a description of the actual structure of the onion failure. I think the addition Structure is appropriate in this case. Added godoc to explain.

Will leave this thread open and await movement on the spec side with regards to the data types. In the spec PR I chose byte for now, but perhaps we might go for something even more compact with just a few options for the structure (enum-like).

attributable_error_crypto.go

fat_error_decrypt.go

Roasbeef · 2023-04-19T19:34:44Z

cc @bitromortac

lightninglabs-deploy · 2023-05-04T11:12:42Z

@joostjager, remember to re-request review from reviewers when ready

joostjager · 2023-05-04T14:12:49Z

!lightninglabs-deploy mute

Roasbeef

An initial pass to build up my mental model. Will sit with the spec for a bit before doing another pass here.

crypto.go

Roasbeef · 2023-05-23T02:32:51Z

crypto.go

 }

 // generateSharedSecret generates the shared secret by given ephemeral key.
-func (r *Router) generateSharedSecret(dhKey *btcec.PublicKey) (Hash256, error) {
+func (r *Router) GenerateSharedSecret(dhKey *btcec.PublicKey) (Hash256, error) {


Godoc above not updated.

Is this just so the implemented interface only has public methods, or that this method is to be used elsewhere?

Godoc updated.

The method is used in #7139. Access is needed to the shared secret so that the type of onion encryptor can be 'upgraded' after reading the error structure parameters from the onion tlv.

Roasbeef · 2023-05-23T02:34:46Z

attributable_error_crypto.go

+)
+
+type AttributableErrorStructure struct {
+	HopCount        int


I think the casting overhead would be negligible in terms of affected code sites. Inheriting the proper int types here aides in readability IMO.

Also in most cases, you don't really need to cast for comparisons, as you can make generic funcs that use constraints. Integer or w/e.

Roasbeef · 2023-05-23T02:35:30Z

attributable_error_crypto.go

+)
+
+type AttributableErrorStructure struct {
+	HopCount        int


Also missing godoc comments here.

Is this an error struct (actually error type), or something else entirely? In either case, adding "Structure" to the end seems superflous.

Roasbeef · 2023-05-23T02:36:32Z

attributable_error_crypto.go

+}
+
+func newAttributableErrorBase(
+	structure *AttributableErrorStructure) attributableErrorBase {


onionError instead?

Isn't the legacy error an onion-routed error too?

attributable_error_crypto.go

Roasbeef · 2023-05-23T02:42:06Z

attributable_error_crypto.go

+	hash := hmac.New(sha256.New, umKey[:])
+
+	// Include message.
+	_, _ = hash.Write(message)


This is actually an encrypted message right? So the ciphertext and not the plaintext.

If not, then we need to revise the proposal, as encrypt-then-mac is a must as we want to ensure integrity of the ciphertext. Otherwise the MAC may also give some information about the plaintext itself.

Might just have an out of date mental model tho....decided to take a look at the code before the spec to reaload some context.

No, this is hmac-then-encrypt. Legacy failures are also hmac-then-encrypt - at least for the error source because the other nodes only encrypt.

If the complete message including hmacs is encrypted as the final step before passing it back to the upstream node, how can information about the plaintext leak?

Roasbeef · 2023-05-23T02:43:47Z

attributable_error_crypto.go

+	var hmacIdx = maxHops + position
+
+	// Iterate over all downstream nodes.
+	for j := 0; j < maxHops-position-1; j++ {


I find the inclusion of positional/index information here confusing: how will a forwarding node actually derive its position? If this is given to it by the sender, ofc that breaks w/e privacy we may have w/ the current scheme.

They do not need to derive their position. And this information also isn't given to them by the sender, because that would indeed break privacy.

The core idea of this proposal is to iterate over every possible position that the node may have in the path, and add a corresponding hmac for each of those positions.

Roasbeef · 2023-05-23T02:47:09Z

attributable_error_decrypt.go

+		// Extract the payload and exit with a nil message if it is invalid.
+		source, payload, err := o.extractPayload(payloads)
+		if sender == 0 {
+			if err != nil {


So we don't handle the error at all if the sender is known?

The handling consists of returning the current position to the caller.

Some invalid data was encountered because we couldn't extract the payload. This means that there is no actual message to return. With the position information, the caller can apply a selective penalty.

Allow for more flexible usage of the error encrypter. This is useful when upgrading an existing legacy error encrypter to fat errors in lnd.

bitromortac

Nice work, quite a smart way to approach the problem!

Having HopCount is probably not ideal because it could be used for fingerprinting, but this ensures to have enough space for future extensions, right? This brings me to the question whether it's worthwhile to stay below the TCP MTU limit for latency reasons (similarly as for the onion size in update_add_htlc)?

If we have b bits (say 8000) available for the HMACs and security of s bits in the HMACs (HMAC chopped off after s bits, as you suggested out of band), we would get

s = b / (HopCount * (HopCount + 1) / 2)

HopCount |   s
--------------
27       |  21
20       |  38
18       |  47
16       |  59
14       |  76
12       | 102

For comparison, the Bitcoin network has a hash rate that manages s~80 in 10 min. In practice anything higher than that seems to be very high. The downside of having a fixed total message size below b is that the number of total hops is limited.

bitromortac · 2023-06-06T14:27:16Z

attributable_error_crypto.go

+
+// AttributableErrorStructure contains the parameters that define the structure
+// of the error message that is passed back.
+type AttributableErrorStructure struct {


If we would manage to reduce the size of the total error, would it be desirable to get rid of the HopCount as it may add a fingerprint vector? Together with FixedPayloadLen, this is mostly to allow flexibility concerning future usage like nodes communicating back different metadata?

Yes, FixedPayloadLen is to allow future flexibility without increasing the failure message until it is needed.

HopCount is slightly different. It gives senders some control over the failure message size, at the cost of limiting the maximum route length. If the size of the total error can be reduced so that HopCount can be hard-coded to the max route length of 27, that would be great. On the other hand, perhaps long routes are so uncommon that a lower constant could be acceptable too.

attributable_error_crypto.go

bitromortac · 2023-06-06T15:20:21Z

attributable_error_crypto.go

+	_, _ = hash.Write(message)
+
+	// Include payloads including our own.
+	_, _ = hash.Write(payloads[:(o.maxHops()-position)*o.payloadLen()])


Could you clarify the position parameter? Currently the comment states that this is the distance to the error source. If position=0, i.e, we are the erring node, we would only need to include our own payload, is it? If so, I would expect payloads[:(position+1)*o.payloadLen()].

Good catch, this parameter is indeed inverted. Updated the code to match the comment. In the spec pr, I also define position relative to the error source.

bitromortac · 2023-06-08T12:49:42Z

attributable_error_decrypt.go

+
+	// Ensure the error message length is enough to contain the payloads and
+	// hmacs blocks. Otherwise blame the first hop.
+	if len(encryptedData) < minPaddedOnionErrorLength+o.hmacsAndPayloadsLen() {


How would an intermediate node react to another node forwarding a short message (probably out of scope for this PR)?

This is good question.

In master what happens is that a random failure message is made up if the length is below the (legacy) minimum:
https://github.com/lightningnetwork/lnd/blob/e549096b8837a95e0515c8c63fa01c634714d913/htlcswitch/link.go#LL1889C2-L1889C2

For attributable errors though, it may pass this check. Then in the intermediate EncryptError, an error is returned.

I think currently in lightningnetwork/lnd#7139, the intermediate node ignores the error and passes back the failure message as is. This allows an attacker to get away with it.

What should happen instead is that the intermediate node creates a new, properly formatted failure to attribute blame to the connection between itself and its downstream peer. Added comment https://github.com/lightningnetwork/lnd/pull/7139/files#r1228037240

attributable_error_decrypt.go

bitromortac · 2023-06-08T13:11:16Z

attributable_error_decrypt.go

+		// Clear first hmac slot. This slot is for the position farthest away
+		// from the error source. Because we are shifting, this cannot be
+		// relevant.
+		copy(hmacs[destIdx*sha256.Size:], zeroHMAC[:])


Couldn't convince myself whether clearing is needed.

We should never be reading this field when decrypting the error. I left this in as a 'sanity clear'...

bitromortac · 2023-06-08T16:33:00Z

testdata/attributable_error.json

@@ -0,0 +1,25 @@
+{


Would it make sense to add a comment to explain the test, for example which payloads were used or what parameters were used? Payloads could also be added to the test vector.

This file is taken straight from the spec PR where the parameters are explained. I will do more polishing after we've decided that attr errors is going to happen and what variation of it exactly.

joostjager · 2023-06-13T12:28:08Z

If we have b bits (say 8000) available for the HMACs and security of s bits in the HMACs (HMAC chopped off after s bits, as you suggested out of band)

Yes, I think this would be a useful optimization. You could argue that attributing failure isn't the most critical thing in lightning. Even with a relatively high chance of an attacker forging an hmac that is considered valid (let's say 21 bits, 1 in 2,000,000), game theory is still very much against them. If they guess wrong, they'll be penalized and will have to wait for their reputation to be restored before they'll receive new traffic from that sender.

joostjager · 2023-06-15T17:49:00Z

Feedback from lnd dev call:

Hard-code max_hops to 27 to avoid fingerprinting. Perhaps it could be one or two hops less also because the max hops goes down when we signal attributable errors.
Reduce hmac bits to keep failure message reasonably small (bits tbd, maybe divisible by 8, maybe 32 bits?)

For the payload that each node adds, we could for now just go for the minimum size. No tlv also. Especially because we don't even have concrete ideas what to add besides the timestamp. If this changes in the future, we can invent another signal in the forward onion.

This would eliminate all parameters from the forward onion and leave just a zero-length tlv record to signal sender support for attributable errors.

joostjager · 2023-06-16T10:59:30Z

Added a parameter in the attributable error structure struct for the size of the hmac.

The other changes mentioned above can happen in the lnd pr.

bitromortac

LGTM 🎉! I compared the left/right-shift/HMAC operations to my notes and they are identical. Great to have the attributable errors completely parametrized such that we can define them for the protocol.

bitromortac · 2023-06-19T16:14:10Z

attributable_error_decrypt.go

+
+		return &DecryptedAttrError{
+			DecryptedError: DecryptedError{
+				SenderIdx: 1,


Would it be helpful to distinguish a malicious sender from the true sender of the error message?

How would we do that? If our peer returns a message that is too short, then they didn't do what do should have done. They should always return a message of the correct length that includes at least their valid hmacs.

I thought about distinguishing the happy path decoding (we can determine the true sender) from the unhappy one (isCorruptedSender bool), but I'm not sure how that would be used for penalization yet.

The unhappy path is currently signaled through a nil DecryptedError.Message. But perhaps it should be more explicit? This is where I miss rust's expressiveness in data types.

attributable_error_crypto_test.go

joostjager · 2023-06-20T08:00:08Z

attributable_error_decrypt.go

+	)
+	copy(dummySecret[:], bytes.Repeat([]byte{1}, 32))
+
+	// We'll iterate a constant amount of hops to ensure that we don't give


While the intention to guard against timing attacks by constant-hopping is admirable, it may be a case of over-engineering. Timing attacks can originate from numerous other sources, including differences in data processing times, network latency, and even hardware behavior. To truly protect against timing attacks, a comprehensive approach is needed that covers all potential sources, making constant-hopping only a part of the solution.

Moreover, adding complex measures such as this may introduce new, unforeseen security vulnerabilities or performance issues. It could also divert resources from tackling more likely and more dangerous threats. Therefore, a balanced, risk-based approach to security might be more appropriate than focusing too heavily on one specific, and potentially less probable, attack vector.

[So far for the AI]

In short, shall we just leave this out for the sake of simplicity?

joostjager · 2023-08-08T12:32:13Z

Test vector updated to 20/4/4 structure in line with lightning/bolts#1044 (comment)

joostjager mentioned this pull request Nov 10, 2022

htlcswitch: attributable errors lightningnetwork/lnd#7139

Open

joostjager force-pushed the fat-errors branch from c342f47 to c99e01e Compare November 10, 2022 15:22

joostjager commented Nov 10, 2022

View reviewed changes

crypto.go Outdated Show resolved Hide resolved

joostjager force-pushed the fat-errors branch from c99e01e to bb61d8d Compare November 10, 2022 15:37

joostjager changed the title ~~convert to fat errors~~ fat errors Nov 10, 2022

joostjager force-pushed the fat-errors branch from 1e5ca3e to 5783ac8 Compare November 21, 2022 08:14

joostjager force-pushed the fat-errors branch 9 times, most recently from 00bce74 to 83d4580 Compare December 12, 2022 07:32

joostjager force-pushed the fat-errors branch 3 times, most recently from e6b9314 to e2a890e Compare January 16, 2023 11:37

joostjager force-pushed the fat-errors branch from e2a890e to 8d8737e Compare January 23, 2023 11:34

halseth reviewed Jan 26, 2023

View reviewed changes

fat_error_decrypt.go Outdated Show resolved Hide resolved

carlaKC reviewed Feb 6, 2023

View reviewed changes

fat_error_decrypt.go Outdated Show resolved Hide resolved

joostjager force-pushed the fat-errors branch 3 times, most recently from a8fce5c to 444400b Compare February 8, 2023 12:18

joostjager changed the title ~~fat errors~~ Attributable errors Feb 8, 2023

halseth reviewed Apr 12, 2023

View reviewed changes

joostjager force-pushed the fat-errors branch from 444400b to ed2e8be Compare April 19, 2023 11:28

joostjager force-pushed the fat-errors branch 2 times, most recently from 52a8a08 to fccd040 Compare April 24, 2023 18:30

joostjager force-pushed the fat-errors branch 2 times, most recently from 9afb8a0 to c0682ea Compare April 24, 2023 18:40

Roasbeef requested changes May 23, 2023

View reviewed changes

joostjager added 3 commits May 23, 2023 14:52

add minPaddedOnionErrorLength constant

b0bf831

initialize NewOnionErrorEncrypter with shared secret directly

7b82733

Allow for more flexible usage of the error encrypter. This is useful when upgrading an existing legacy error encrypter to fat errors in lnd.

export GenerateSharedSecret

78880cd

joostjager force-pushed the fat-errors branch 3 times, most recently from e244f62 to 6649a30 Compare May 23, 2023 13:39

joostjager requested a review from Roasbeef May 23, 2023 13:43

bitromortac reviewed Jun 13, 2023

View reviewed changes

joostjager force-pushed the fat-errors branch 2 times, most recently from 09db49b to 4852f56 Compare June 13, 2023 13:11

joostjager requested a review from bitromortac June 13, 2023 13:17

joostjager mentioned this pull request Jun 14, 2023

Lightning Specification Meeting 2023/06/19 lightning/bolts#1088

Closed

28 tasks

joostjager force-pushed the fat-errors branch 2 times, most recently from 9b2478b to 74fcef9 Compare June 16, 2023 10:53

bitromortac approved these changes Jun 19, 2023

View reviewed changes

joostjager force-pushed the fat-errors branch 2 times, most recently from fbef088 to 9530e97 Compare June 19, 2023 18:05

joostjager commented Jun 20, 2023

View reviewed changes

add attributable error encryption and decryption

787ad3d

joostjager force-pushed the fat-errors branch from 9530e97 to 787ad3d Compare August 8, 2023 12:11

Attributable errors #60

Are you sure you want to change the base?

Attributable errors #60

Conversation

joostjager commented Nov 10, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager May 23, 2023 • edited Loading

Choose a reason for hiding this comment

Roasbeef commented Apr 19, 2023

lightninglabs-deploy commented May 4, 2023

joostjager commented May 4, 2023

Roasbeef left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager May 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bitromortac left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager Jun 13, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager commented Jun 13, 2023 • edited Loading

joostjager commented Jun 15, 2023

joostjager commented Jun 16, 2023

bitromortac left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joostjager commented Aug 8, 2023

joostjager commented Nov 10, 2022 •

edited

Loading

joostjager May 23, 2023 •

edited

Loading

joostjager May 23, 2023 •

edited

Loading

joostjager Jun 13, 2023 •

edited

Loading

joostjager Jun 13, 2023 •

edited

Loading

joostjager Jun 13, 2023 •

edited

Loading

joostjager Jun 13, 2023 •

edited

Loading

joostjager commented Jun 13, 2023 •

edited

Loading