Can the bep52 spec clearly explain what happens with non-powers of two. #128

alanmcgovern · 2022-06-04T11:13:51Z

Lines 419 to 423 in 7b7b41f

 Index MUST be a multiple of length, this includes zero. 

 Length is the number of hashes to include from the base layer. 

 Length MUST be equal-to-or-greater-than two and a power of two. 

 Length SHOULD NOT be greater than 512. Proof layers is the 

 number of ancestor layers to include. Note that the limits imposed on

The spec is not clear about what should happen in the event a bep52 hashed file does not have an exact multiple of 2 number of nodes in a given layer. For example, a file has 7 pieces in it's piece layer. Note: The omitted piece h is referred to as p as it's a padding hash.

                  abcdefgp
         abcd                 efgp
   ab        cd          ef           gp
a    b     c     d     e     f    g

Is it valid to send a request message with: index = 0, length = 7 as that is the full layer even though length is not a power of 2 as the spec requires?
If not, is it intended that a request message would be sent for index =0, length = 8. If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?
Similarly, if this layer were retrieved in smaller chunks, would it be correct to send a request for: index = 0, length = 4 + index = 4, length = 3, or should the requesting client send index = 0, length = 4 + index = 4, length = 4 ?
If option 2 is the intention, as a worst case scenario would it be correct (albeit inefficient) to request a layer with 513 nodes using this combination of requests: index = 0 length = 512 + index = 512, length = 512. 511 of the hashes would be padding hashes in this case.

The text was updated successfully, but these errors were encountered:

the8472 · 2022-06-04T11:52:25Z

Hrm, good question. At the very least the index must stay properly aligned to ensure we minimize the amount of proof hashes needed.

Allowing non power-of-two lengths at the tail would at least make some spec phrasings and offset calculations in implementations a bit more complicated I think such as finding the proof layers in a hashes message.

@arvidn how does libtorrent handle this today?

If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?

Requesting N and replying with N-x doesn't seem sane to me. Either we allow shorter requests or the client should reply with the padding hashes, otherwise it makes offset calculation and matching requests to responses even more complicated.

alanmcgovern · 2022-06-04T13:10:19Z

Requesting N and replying with N-x doesn't seem sane to me

This is exactly how proof layers work, right? The expectation from the spec is that I request the full set of proof layers (typically covering the range from piece layer -> the layer which generates the piece root) and the responding client explicitly omits the ones which aren't required. As such the spec guarantees you'll receive between 0 and the requested amount of proof layers. You have to figure it out how many were sent by counting the number of hashes in the response.

The invariant being maintained is that you can merkle hash the "hashes" you requested until you generate the "root" of that subtree, and then each proof layer can be merkle hashed, in order, until you end up with the actual piece root. Right?

If so, it would be good to clearly call that out in the spec too. It was gonna be my follow up question ;)

the8472 · 2022-06-04T13:30:59Z

This is exactly how proof layers work, right? The expectation from the spec is that I request the full set of proof layers (typically covering the range from piece layer -> the layer which generates the piece root) and the responding client explicitly omits the ones which aren't required.

You got a point there. At least it's explicitly documented. I agree, it does strike me as silly that we optimized that part but not the number of layer hashes one can request.

You have to figure it out how many were sent by counting the number of hashes in the response.

Well, you don't really need to count them since the number of base layer hashes is given in the length field of the message and the number of omitted hashes can be directly derived from that, which can then be consistency-checked with the total message length.

The invariant being maintained is that you can merkle hash the "hashes" you requested until you generate the "root" of that subtree, and then each proof layer can be merkle hashed, in order, until you end up with the actual piece root. Right?

Unless you request fewer than necessary because you already have them from other requests.

alanmcgovern · 2022-06-04T15:01:00Z

Yup! Agree on both points there! I think the docs explain that reasonably clearly - what you described is how I interpreted it.

The only actual question then is what should occur when the layer length is not an exact power of 2. I'll see what arvid chips in with :)

Thanks for the responses so far!

alanmcgovern · 2022-06-09T07:20:40Z

@arvidn gentle ping just in case you saw this and forgot!

arvidn · 2022-06-09T08:12:23Z

Is it valid to send a request message with: index = 0, length = 7 as that is the full layer even though length is not a power of 2 as the spec requires?

Yes, that is valid according to libtorrent's interpretation.

If not, is it intended that a request message would be sent for index =0, length = 8. If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?

Similarly, if this layer were retrieved in smaller chunks, would it be correct to send a request for: index = 0, length = 4 + index = 4, length = 3, or should the requesting client send index = 0, length = 4 + index = 4, length = 4 ?

I don't know what you mean by the multiple equal signs here: length = 4 + index = 4

libtorrent ensures that the range of hashes you request all fall within the tree. You're allowed to request padding hashes, but you don't have to.

If option 2 is the intention, as a worst case scenario would it be correct (albeit inefficient) to request a layer with 513 nodes using this combination of requests: index = 0 length = 512 + index = 512, length = 512. 511 of the hashes would be padding hashes in this case.

Ah, + is a separator between different request messages.

Those would be valid requests. It would also be valid to just request a single hash in the last request.

arvidn · 2022-06-09T08:17:29Z

These test cases illustrate how libtorrent responds to hash requests:

https://github.com/arvidn/libtorrent/blob/RC_2_0/test/test_merkle_tree.cpp#L402

The parameters to get_hashes() are: base_layer, index, count, proof_layers

alanmcgovern · 2022-06-09T19:37:31Z

Ah, + is a separator between different request messages.

That could've been clearer - my bad!

Those would be valid requests. It would also be valid to just request a single hash in the last request.

Perfect. I proposed a minor wording change to the spec, re-using the same language used to describe how the final piece/blocks are requested. From my understanding of the the tests you linked, and the intent of the spec, this feels appropriate. #129

If you think the spec should be more permissive and clients should be allowed request hashes beyond the length of the layer (i.e. padding hashes), then let me know and I'll update as appropriate.

alanmcgovern mentioned this issue Jun 9, 2022

Clarify that non power-of-two lengths are valid #129

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can the bep52 spec clearly explain what happens with non-powers of two. #128

Can the bep52 spec clearly explain what happens with non-powers of two. #128

alanmcgovern commented Jun 4, 2022

the8472 commented Jun 4, 2022

alanmcgovern commented Jun 4, 2022

the8472 commented Jun 4, 2022

alanmcgovern commented Jun 4, 2022

alanmcgovern commented Jun 9, 2022

arvidn commented Jun 9, 2022

arvidn commented Jun 9, 2022

alanmcgovern commented Jun 9, 2022 •

edited

Loading

Can the bep52 spec clearly explain what happens with non-powers of two. #128

Can the bep52 spec clearly explain what happens with non-powers of two. #128

Comments

alanmcgovern commented Jun 4, 2022

the8472 commented Jun 4, 2022

alanmcgovern commented Jun 4, 2022

the8472 commented Jun 4, 2022

alanmcgovern commented Jun 4, 2022

alanmcgovern commented Jun 9, 2022

arvidn commented Jun 9, 2022

arvidn commented Jun 9, 2022

alanmcgovern commented Jun 9, 2022 • edited Loading

alanmcgovern commented Jun 9, 2022 •

edited

Loading