Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support minimum-size two's complement encoding #402

Open
JesusMcCloud opened this issue Oct 16, 2024 · 4 comments
Open

Support minimum-size two's complement encoding #402

JesusMcCloud opened this issue Oct 16, 2024 · 4 comments

Comments

@JesusMcCloud
Copy link

We'd like to contribute minimum-size two's complement number encoding and decoding for both signed and unsigned numbers.

We have an implementation of this encoding including a comprehensive test suite (over here).
The plan would be to extract everything but BigInt encoding and create a PR for it.

The test suite relies on property-based testing using Kotest. Here's the issue: How can this be implemented using kotlin.test?
What about data-driven testing?

Encoding tests need reproducible inputs and we do no know how to achieve this without re-implementing parts of Kotest and this is beyond the scope of contributing two's complement encoding.

@fzhinkin
Copy link
Collaborator

Hey!

Disclaimer: I'm not proficient in property-based testing.

The test suite relies on property-based testing using Kotest. Here's the issue: How can this be implemented using kotlin.test?

Well, it all depends on a particular test. For a test mostly relying on generator functions, a set of randomly generated values could probably be reduced to a several equivalence sets (good old), then only a few values could be picked from each set, and a test will be reduced to a data-driven unit test. That leads to a second part of the question:

What about data-driven testing?

As of today, kotlin-test does not help with such tests, unfortunately. So everything has to be written manually, but for tests with one-dimensional inputs (i.e. check how everything works with that int value) that should not be a dramatic burden (but it is annoying indeed).

We'd like to contribute minimum-size two's complement number encoding and decoding for both signed and unsigned numbers.

Could you please elaborate a bit on what exactly you'd like to contribute, what scenarios these encoding schemes cover and how widespread they are, what API you have in mind?

We're striving to keep the core module's API as minimal as possible, and provide staple functionality mostly.
Depending on "minimum-size two's complement number encoding" popularity, it might be worth either adding it to a new separate variadic-length-encoding-centric kotlinx-io module, or to keep it in a separate library.

@JesusMcCloud
Copy link
Author

Thanks for the swift reply!
What you are describing is a roll-yuor-own poor man's kotest-property and this would probably take more time to implement properly than the actual contribution.

As for the Encoding itself: it is just regular two's complement encoding without wasting zero bytes:

  • -1 -> FF
  • 1 -> 01
  • 0 -> 00
  • 257 -> 0101
  • 16777215 -> 00FFFFFF
  • 65555 -> 010013
  • 16777216 -> 01000000
  • 15253481 -> 00E8BFE9
  • -1446230472 -> A9CC4638
  • 8935431257042531517 -> 7C010762E6F134BD

The API would look as follows (these are just examples):

/**
 *  Encodes an unsigned Long to a minimum-size twos-complement byte array
 * @return the number of bytes written
 */
fun Sink.writeTwosComplement(number: ULong): Int {…}
/**
 * Consumes all remaining data from this source and interprets it as a signed [Int]
 *
 * @throws IllegalArgumentException if no or too much data is present
 */
@Throws(IllegalArgumentException::class)
fun Source.readTwosComplementInt(): Int {…}

We're using it for ASN.1 encoding, but personally: I lost count how many times I needed this in the past. Implementing it from scratch is error-prone and far from not fun, which is why we wanted to share. Then again, maybe that's just me and it is not that of a common use-case.
We're happy to maintain it ourselves as part of our library. After all, it is far less of a hassle and we can keep our tests as they are, if this is more clutter than a wanted feature. If this does turn out to be a desired feature at a later point in time, just ping me

@fzhinkin
Copy link
Collaborator

We're using it for ASN.1 encoding, but personally: I lost count how many times I needed this in the past. Implementing it from scratch is error-prone and far from not fun, which is why we wanted to share. Then again, maybe that's just me and it is not that of a common use-case.

Unlike other variadic-length encoding schemes (LEB/ULEB128, Protobuf's varint, ASN.1 Object Identifier's encoding), the scheme employed by ASN.1 to encode integers requires storing the length of an encoded value separately.

Most likely (as in case of ASN.1), field's length need to precede the encoded value.

Due to Sinks and Sources are being sequentially accessible only (or the other way around, they don't provide random access), there's no easy way to prepend a value returned from Sink.writeTwosComplement before the actual message.

It is a nice building block for different encoding schemes and it should be useful in such cases, but I doubt it could be used on its own as a general purpose way to write/read integer values.

We're happy to maintain it ourselves as part of our library. After all, it is far less of a hassle and we can keep our tests as they are, if this is more clutter than a wanted feature. If this does turn out to be a desired feature at a later point in time, just ping me

Sounds good to me. Anyway, thanks for opening the issue!

@JesusMcCloud
Copy link
Author

I'll be closing this issue here once we have a release of our library ready, so I can add a reference to it here, in case someone else needs this in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants