Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the bep52 spec clearly explain what happens with non-powers of two. #128

Open
alanmcgovern opened this issue Jun 4, 2022 · 8 comments

Comments

@alanmcgovern
Copy link

Index MUST be a multiple of length, this includes zero.
Length is the number of hashes to include from the base layer.
Length MUST be equal-to-or-greater-than two and a power of two.
Length SHOULD NOT be greater than 512. Proof layers is the
number of ancestor layers to include. Note that the limits imposed on

The spec is not clear about what should happen in the event a bep52 hashed file does not have an exact multiple of 2 number of nodes in a given layer. For example, a file has 7 pieces in it's piece layer. Note: The omitted piece h is referred to as p as it's a padding hash.

                  abcdefgp
         abcd                 efgp
   ab        cd          ef           gp
a    b     c     d     e     f    g
  1. Is it valid to send a request message with: index = 0, length = 7 as that is the full layer even though length is not a power of 2 as the spec requires?

  2. If not, is it intended that a request message would be sent for index =0, length = 8. If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?

  3. Similarly, if this layer were retrieved in smaller chunks, would it be correct to send a request for: index = 0, length = 4 + index = 4, length = 3, or should the requesting client send index = 0, length = 4 + index = 4, length = 4 ?

  4. If option 2 is the intention, as a worst case scenario would it be correct (albeit inefficient) to request a layer with 513 nodes using this combination of requests: index = 0 length = 512 + index = 512, length = 512. 511 of the hashes would be padding hashes in this case.

@the8472
Copy link
Contributor

the8472 commented Jun 4, 2022

Hrm, good question. At the very least the index must stay properly aligned to ensure we minimize the amount of proof hashes needed.

Allowing non power-of-two lengths at the tail would at least make some spec phrasings and offset calculations in implementations a bit more complicated I think such as finding the proof layers in a hashes message.

@arvidn how does libtorrent handle this today?

If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?

Requesting N and replying with N-x doesn't seem sane to me. Either we allow shorter requests or the client should reply with the padding hashes, otherwise it makes offset calculation and matching requests to responses even more complicated.

@alanmcgovern
Copy link
Author

Requesting N and replying with N-x doesn't seem sane to me

This is exactly how proof layers work, right? The expectation from the spec is that I request the full set of proof layers (typically covering the range from piece layer -> the layer which generates the piece root) and the responding client explicitly omits the ones which aren't required. As such the spec guarantees you'll receive between 0 and the requested amount of proof layers. You have to figure it out how many were sent by counting the number of hashes in the response.

The invariant being maintained is that you can merkle hash the "hashes" you requested until you generate the "root" of that subtree, and then each proof layer can be merkle hashed, in order, until you end up with the actual piece root. Right?

If so, it would be good to clearly call that out in the spec too. It was gonna be my follow up question ;)

@the8472
Copy link
Contributor

the8472 commented Jun 4, 2022

This is exactly how proof layers work, right? The expectation from the spec is that I request the full set of proof layers (typically covering the range from piece layer -> the layer which generates the piece root) and the responding client explicitly omits the ones which aren't required.

You got a point there. At least it's explicitly documented. I agree, it does strike me as silly that we optimized that part but not the number of layer hashes one can request.

You have to figure it out how many were sent by counting the number of hashes in the response.

Well, you don't really need to count them since the number of base layer hashes is given in the length field of the message and the number of omitted hashes can be directly derived from that, which can then be consistency-checked with the total message length.

The invariant being maintained is that you can merkle hash the "hashes" you requested until you generate the "root" of that subtree, and then each proof layer can be merkle hashed, in order, until you end up with the actual piece root. Right?

Unless you request fewer than necessary because you already have them from other requests.

@alanmcgovern
Copy link
Author

Yup! Agree on both points there! I think the docs explain that reasonably clearly - what you described is how I interpreted it.

The only actual question then is what should occur when the layer length is not an exact power of 2. I'll see what arvid chips in with :)

Thanks for the responses so far!

@alanmcgovern
Copy link
Author

@arvidn gentle ping just in case you saw this and forgot!

@arvidn
Copy link
Contributor

arvidn commented Jun 9, 2022

Is it valid to send a request message with: index = 0, length = 7 as that is the full layer even though length is not a power of 2 as the spec requires?

Yes, that is valid according to libtorrent's interpretation.

If not, is it intended that a request message would be sent for index =0, length = 8. If so, should this request be fulfilled by the receiving client by including a padding hash as the 8th hash in the Hashes response message, or should the client who receives this only include the 7 actual hashes?

Similarly, if this layer were retrieved in smaller chunks, would it be correct to send a request for: index = 0, length = 4 + index = 4, length = 3, or should the requesting client send index = 0, length = 4 + index = 4, length = 4 ?

I don't know what you mean by the multiple equal signs here: length = 4 + index = 4

libtorrent ensures that the range of hashes you request all fall within the tree. You're allowed to request padding hashes, but you don't have to.

If option 2 is the intention, as a worst case scenario would it be correct (albeit inefficient) to request a layer with 513 nodes using this combination of requests: index = 0 length = 512 + index = 512, length = 512. 511 of the hashes would be padding hashes in this case.

Ah, + is a separator between different request messages.

Those would be valid requests. It would also be valid to just request a single hash in the last request.

@arvidn
Copy link
Contributor

arvidn commented Jun 9, 2022

These test cases illustrate how libtorrent responds to hash requests:

https://github.com/arvidn/libtorrent/blob/RC_2_0/test/test_merkle_tree.cpp#L402

The parameters to get_hashes() are: base_layer, index, count, proof_layers

@alanmcgovern
Copy link
Author

alanmcgovern commented Jun 9, 2022

Ah, + is a separator between different request messages.

That could've been clearer - my bad!

Those would be valid requests. It would also be valid to just request a single hash in the last request.

Perfect. I proposed a minor wording change to the spec, re-using the same language used to describe how the final piece/blocks are requested. From my understanding of the the tests you linked, and the intent of the spec, this feels appropriate. #129

If you think the spec should be more permissive and clients should be allowed request hashes beyond the length of the layer (i.e. padding hashes), then let me know and I'll update as appropriate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@alanmcgovern @arvidn @the8472 and others