Compression support for inlined data (v2-compatible) #201

Open
mostynb opened this issue Jul 12, 2021 · 5 comments


@mostynb
Contributor

mostynb commented Jul 12, 2021

We added support for compressed blobs via the bytestream API a while back (#147), and this has been working well so far. Given that we don't seem to be gearing up for a non-backwards-compatible REAPIv3 release, I think we should start investigating backwards-compatible support for compression of inlined data, with the aim of minimizing roundtrips while still using compressed data.

One idea that I have been thinking about is to create a new Blob message which has the same fields as Digest, but with two additional optional fields: bytes data to hold the inlined data and Compressor.Value compressor to describe the encoding. Then, we would replace each usage of Digest in a message that currently also has a separate field for inlined (uncompressed) data with a Blob field, and deprecate the previous data field.

For example:

message Blob {
  // The first two field types + numbers must match those in Digest
  string hash = 1;
  int64 size_bytes = 2; // This refers to the uncompressed size

  reserved 3, 4; // Leave some room in case we want to extend Digest later(?)

  bytes data = 5; // Possibly compressed data, if set
  Compressor.Value compressor = 6; // Encoding used, if the data field is set
}

For "request" messages that can be responded to with inlined data, we would add a repeated field that specifies the compression types that the client accepts. The server would pick one of those encodings (or identity/uncompressed) for each Blob that it chooses to inline data for. We would also add a new field in the Capabilities API for servers to advertise support for this.
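To make the negotiation concrete, here is a minimal Python sketch of how a server might choose an encoding for each inlined blob. It is an illustration under stated assumptions, not part of the proposal: the `Compressor` enum is assumed to mirror `Compressor.Value` (the numeric values are illustrative), `inline_blob` and `decode_blob` are hypothetical helper names, and the standard library's `zlib` stands in for the DEFLATE compressor.

```python
import zlib
from enum import IntEnum
from typing import Sequence, Tuple


class Compressor(IntEnum):
    # Assumed to mirror REAPI's Compressor.Value; numbers are illustrative.
    IDENTITY = 0
    ZSTD = 1
    DEFLATE = 2


# Encoders this hypothetical server implements (stdlib only, so no ZSTD).
SERVER_ENCODERS = {
    Compressor.DEFLATE: zlib.compress,
}


def inline_blob(raw: bytes, acceptable: Sequence[Compressor]) -> Tuple[Compressor, bytes]:
    """Pick an encoding for one inlined blob.

    Tries the client's accepted compressors in order and falls back to
    IDENTITY when none is supported or compression does not shrink the data.
    """
    for comp in acceptable:
        encode = SERVER_ENCODERS.get(comp)
        if encode is None:
            continue  # The server does not implement this compressor.
        encoded = encode(raw)
        if len(encoded) < len(raw):
            return comp, encoded
    return Compressor.IDENTITY, raw


def decode_blob(comp: Compressor, data: bytes) -> bytes:
    """Client side: undo whichever encoding the server reported."""
    if comp == Compressor.IDENTITY:
        return data
    if comp == Compressor.DEFLATE:
        return zlib.decompress(data)
    raise ValueError(f"unsupported compressor: {comp!r}")
```

An old client sends an empty accepted list, so `inline_blob` always returns `IDENTITY` for it, which matches the behaviour such a client already expects.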

I believe this would be backwards-compatible: old clients would not request compressed inlined data and would receive Blobs that are decodable as Digests in response, and servers would have a clear signal for when this feature can be used in responses.
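The "decodable as Digests" claim rests on protobuf parsers skipping field numbers they do not know. The Python sketch below hand-encodes a Blob on the wire and parses it the way a Digest-only reader would. It is an illustration only: `encode_blob` and `decode_as_digest` are hypothetical helpers written for this example, real code would use generated protobuf classes, and only the two wire types that Blob needs (varint and length-delimited) are handled.

```python
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # More bytes follow.
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def _varint_field(number: int, value: int) -> bytes:
    return encode_varint(number << 3 | 0) + encode_varint(value)


def _bytes_field(number: int, value: bytes) -> bytes:
    return encode_varint(number << 3 | 2) + encode_varint(len(value)) + value


def encode_blob(hash_: str, size_bytes: int, data: bytes, compressor: int) -> bytes:
    """Serialize the sketched Blob: fields 1 and 2 match Digest, 5 and 6 are new."""
    return (
        _bytes_field(1, hash_.encode("utf-8"))
        + _varint_field(2, size_bytes)
        + _bytes_field(5, data)
        + _varint_field(6, compressor)
    )


def decode_as_digest(buf: bytes) -> Tuple[str, int]:
    """Parse like a reader that only knows Digest, skipping unknown fields."""
    hash_, size_bytes, pos = "", 0, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
            if number == 2:
                size_bytes = value
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
            if number == 1:
                hash_ = payload.decode("utf-8")
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
    return hash_, size_bytes
```

Feeding the output of `encode_blob` to `decode_as_digest` returns the original hash and uncompressed size, with fields 5 and 6 silently skipped.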

Is anyone else interested in this feature?

@EricBurnett
Collaborator

EricBurnett commented Jul 12, 2021 via email

@mostynb
Contributor Author

mostynb commented Jul 12, 2021

I have mostly been considering this from the "cleaner v3 design, retrofitted to v2" angle (3), rather than the "minimal required changes to v2" point of view (1). (2) doesn't sound like a great compromise to me, but I'd need to read up more on protobufv3 to understand the implications well.

There are benefits to each of (1) and (3) of course, and I'd be happy to sketch out solutions to both, or to either one if there's a clear preference in the group.

@EricBurnett
Collaborator

EricBurnett commented Jul 12, 2021 via email

@peterebden
Contributor

We'd be interested in this too; we definitely observe a lot of builds with plenty of small but compressible artifacts (Go tends to be pretty heavy on this).

Agreed that both (1) and (3) have benefits; (2) is worth mentioning for completeness, but it's unlikely to be appealing as a solution. I think (1) is my favourite; (3) is a little weird from the perspective of the generated code, given that you have a thing that is like a Digest but is not actually one (so it'd be fine in languages like Python but not in, say, Go or Java).

I like the idea in general of encapsulating all blobs/digests with something like this Blob message as a V3 change; that seems more elegant than lots of individual sites re-inventing the same thing. But that doesn't seem practical for V2 and the embedded Request/Response messages seem like reasonable things to extend with the compressor field.

mostynb added a commit to mostynb/remote-apis that referenced this issue Jul 13, 2021
This is a small API change which allows for inlined data to be
compressed.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Jul 13, 2021
By defining a Blob message that overlaps with Digest, we can repurpose
Digest fields to allow data inlining with optional compression in more
places.  The Digest message is then only used in cases where the
receiver wants a reference rather than data.

In this example I have changed most of the Digest fields to Blobs,
which may lead to over-inlining of data.  Each case needs to be
considered separately before this change is ready for consideration.

Refers to bazelbuild#201.
@mostynb
Contributor Author

mostynb commented Jul 13, 2021

mostynb added a commit to mostynb/remote-apis that referenced this issue Jul 19, 2021
This is a small API change which allows for inlined data to be
compressed.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Jul 19, 2021
This is a small API change which allows for inlined data to be
compressed.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Aug 12, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Aug 16, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Aug 21, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Aug 21, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Sep 14, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
mostynb added a commit to mostynb/remote-apis that referenced this issue Oct 4, 2021
This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to bazelbuild#201.
EricBurnett pushed a commit that referenced this issue Oct 4, 2021
* Add support for inlined compressed data in batch CAS operations

This is a small API change which allows for inlined data to be in
compressed form in BatchReadBlobs and BatchUpdateBlobs calls.

Refers to #201.

* Remove some stray parentheses