This repository has been archived by the owner on May 3, 2019. It is now read-only.

Implement zero-copy ByteString <=> ByteVector conversions #1

Open
mpilquist opened this issue Sep 7, 2015 · 12 comments

Comments

@mpilquist
Contributor

No description provided.

@aloiscochard

:+1:

That would be awesome!

@rkuhn
Contributor

rkuhn commented Sep 7, 2015

Step one is done by #2; taking care of the composite variants is more complex than I can tackle tonight. Shall we create a more specific ticket for those (or even one per direction)?

@mpilquist
Contributor Author

That sounds good. A ticket per direction might be good -- I've been experimenting with ByteVector => ByteString and there's not much more we can do without a minor change to scodec-bits. Tracking those details would be best in an issue specific to that conversion.

Another thing to note -- the next major release of scodec-bits changes the indexing on ByteVector from Int to Long, so in the case where a ByteVector is larger than Int.MaxValue, the conversion would have to fail.
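
A minimal sketch of that failure case, assuming only that the source size arrives as a `Long` and the target structure is `Int`-indexed. The `SizeCheck` object and `toIntSize` name are hypothetical and not part of scodec-bits or Akka; the point is just that the narrowing should be checked up front rather than allowed to overflow silently.

```scala
object SizeCheck {
  /** Narrow a Long size to an Int, or explain why that is not possible. */
  def toIntSize(size: Long): Either[String, Int] =
    if (size < 0L) Left(s"negative size: $size")
    else if (size > Int.MaxValue.toLong)
      Left(s"size $size exceeds Int.MaxValue (${Int.MaxValue})")
    else Right(size.toInt) // safe: the value is known to fit in an Int
}
```

A conversion could call `SizeCheck.toIntSize(vector.size)` first and only proceed on `Right`; whether the real API should return an `Either`, an `Option`, or throw is a design choice this sketch does not settle.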

@rkuhn
Contributor

rkuhn commented Oct 1, 2015

I’m unfortunately quite overwhelmed with other things right now. @rklaehn, would you be interested in tinkering with this?

@rklaehn

rklaehn commented Oct 1, 2015

@rkuhn So basically akka.util.ByteString and scodec.bits.ByteVector are both unbalanced rope-like data structures representing sequences of bytes. And you want a conversion that does not copy the byte arrays in the leaves. Conversion between leaves is already implemented using this Java hack to get around accessibility restrictions, right?
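
To make the "do not copy the leaf arrays" idea concrete, here is a toy model using only the Scala standard library. `RopeA`, `LeafA`, `ConcatA` and `RopeB` are hypothetical stand-ins, not the real `ByteString` or `ByteVector` internals; the sketch only shows that a conversion can rebuild the tree shape while passing each leaf's array along by reference.

```scala
object RopeSketch {
  // Toy stand-in for one rope: a leaf wraps an array, a node concatenates two ropes.
  sealed trait RopeA
  final case class LeafA(bytes: Array[Byte]) extends RopeA
  final case class ConcatA(left: RopeA, right: RopeA) extends RopeA

  // Toy stand-in for the other rope: a flat sequence of array-backed chunks.
  final case class RopeB(chunks: Vector[Array[Byte]]) {
    // Materializing the bytes does copy; the conversion below does not.
    def toArray: Array[Byte] = Array.concat(chunks: _*)
  }

  // Walk the tree left to right and reuse every leaf array by reference.
  def convert(rope: RopeA): RopeB = {
    def leaves(r: RopeA, acc: Vector[Array[Byte]]): Vector[Array[Byte]] = r match {
      case LeafA(bytes)         => acc :+ bytes
      case ConcatA(left, right) => leaves(right, leaves(left, acc))
    }
    RopeB(leaves(rope, Vector.empty))
  }
}
```

Sharing arrays like this is only sound if neither side mutates them afterwards, which is presumably why the real constructors involved are not public.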

@rkuhn
Contributor

rkuhn commented Oct 1, 2015

Well, kind of. Currently it “works”, but it avoids the copy only in the compact case. Concerning access restrictions @mpilquist told me that on the scodec end there might be some work needed to make the necessary constructors accessible, even for Java.

@mpilquist
Contributor Author

I'm not sure what to do about access restrictions. I don't really want to make the various concrete ByteVector subtypes public, though perhaps we could make them private[bits] and use a similar Java accessibility workaround. My concern with doing that is that scodec-bits and this library are more tightly coupled than implied by semantic versioning -- e.g., scodec-bits could evolve in a binary compatible way as far as API usage is concerned, but end up breaking this interop library.

IIRC, we have a similar issue with ByteString, where some of the constructors are private[akka].

@rkuhn
Contributor

rkuhn commented Oct 1, 2015

Yes, this interop library may require more frequent release than normal client code, but that might be worth it. WDYT?

@mpilquist
Contributor Author

Works for me, assuming we can avoid exposing internals to Scala clients (via package private, Java workarounds, etc.).

@rklaehn

rklaehn commented Oct 1, 2015

@rkuhn Yes, that is what I meant by leaves. The compact ByteString1C is the leaf of your rope data structure, right? It is (currently?) the only implementation of CompactByteString. I am not familiar with the scodec one, but it seems to be roughly similar except for some trickery with mutable buffers...

In the long term, wouldn't it be best if akka and scodec could come up with a common ByteString implementation that works for both?

@rkuhn
Contributor

rkuhn commented Oct 1, 2015

Well, ideally we would just use ByteString :-) (saying that since Akka is quite a bit older than scodec, but also because we would have quite some difficulty phasing out ByteString given our binary compatibility constraints). But scodec will probably not want to depend on akka-actor and we cannot break ByteString out of that artifact without violating some useful practices (like not splitting packages across multiple artifacts—which is a definite no-go for OSGi [which we unfortunately do support]). What would the migration strategies look like?

@mpilquist
Contributor Author

@rkuhn Strangely, I feel that the ideal is that we'd just use ByteVector... :)

In all seriousness though, you are correct that we don't want scodec depending on akka-actor. Also, BitVector is much more important to scodec than ByteVector, and we get quite a bit of convenience by having these two types defined together. Further, ByteVector has been optimized for the types of code paths that occur frequently when encoding/decoding binary. ByteString hasn't been optimized for those code paths (you can see micro benchmark results that compare the two implementations in the scodec-bits/benchmark project).

Internally, the ByteVector structure is a balanced tree, with a mutable scratch buffer at the end, which allows referentially transparent copying into the scratch buffer (where concurrent writes are raced, with the loser having to copy).
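
The raced scratch buffer can be illustrated with a toy model. This is a sketch of the idea as described above, not the scodec-bits implementation: `Scratch`, `View` and `append` are hypothetical names, and the growth policy (doubling) is an assumption. Each immutable view covers a prefix of a shared array; an append tries to claim the next slot with a compare-and-set, and a view that loses the claim copies its prefix into a fresh buffer instead.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ScratchSketch {
  // Shared mutable scratch space; `claimed` counts the bytes already won by some view.
  final class Scratch(capacity: Int) {
    val bytes = new Array[Byte](capacity)
    val claimed = new AtomicInteger(0)
  }

  // Immutable view over the first `size` bytes of a scratch buffer.
  final case class View(scratch: Scratch, size: Int) {
    def append(b: Byte): View = {
      val s = scratch
      // Win only if there is room and nobody has appended past our end yet.
      if (size < s.bytes.length && s.claimed.compareAndSet(size, size + 1)) {
        s.bytes(size) = b
        View(s, size + 1)
      } else {
        // Lost the race (or buffer full): copy our prefix and append in a fresh buffer.
        val fresh = new Scratch(math.max(s.bytes.length, size + 1) * 2)
        System.arraycopy(s.bytes, 0, fresh.bytes, 0, size)
        fresh.bytes(size) = b
        fresh.claimed.set(size + 1)
        View(fresh, size + 1)
      }
    }

    def toArray: Array[Byte] = java.util.Arrays.copyOf(scratch.bytes, size)
  }

  def empty(capacity: Int): View = View(new Scratch(capacity), 0)
}
```

Appending twice to the same view shows the behaviour: the first append extends the shared buffer in place, the second finds the slot taken and copies, and the original view still reads the same bytes either way.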

As a result, I think an interop layer is our best bet for the foreseeable future. If folks were interested in SLIP-ing something based on BitVector/ByteVector/ByteString, I'd be interested in moving to a standard library type, but only if the performance and convenience of that type was on par with the existing scodec-bits types.
