Support minimum-size two's complement encoding #402

JesusMcCloud · 2024-10-16T11:20:47Z

We'd like to contribute minimum-size two's complement number encoding and decoding for both signed and unsigned numbers.

We have an implementation of this encoding including a comprehensive test suite (over here).
The plan would be to extract everything but BigInt encoding and create a PR for it.

The test suite relies on property-based testing using Kotest. Here's the issue: How can this be implemented using kotlin.test?
What about data-driven testing?

Encoding tests need reproducible inputs and we do no know how to achieve this without re-implementing parts of Kotest and this is beyond the scope of contributing two's complement encoding.

The text was updated successfully, but these errors were encountered:

fzhinkin · 2024-10-16T18:06:22Z

Hey!

Disclaimer: I'm not proficient in property-based testing.

The test suite relies on property-based testing using Kotest. Here's the issue: How can this be implemented using kotlin.test?

Well, it all depends on a particular test. For a test mostly relying on generator functions, a set of randomly generated values could probably be reduced to a several equivalence sets (good old), then only a few values could be picked from each set, and a test will be reduced to a data-driven unit test. That leads to a second part of the question:

What about data-driven testing?

As of today, kotlin-test does not help with such tests, unfortunately. So everything has to be written manually, but for tests with one-dimensional inputs (i.e. check how everything works with that int value) that should not be a dramatic burden (but it is annoying indeed).

We'd like to contribute minimum-size two's complement number encoding and decoding for both signed and unsigned numbers.

Could you please elaborate a bit on what exactly you'd like to contribute, what scenarios these encoding schemes cover and how widespread they are, what API you have in mind?

We're striving to keep the core module's API as minimal as possible, and provide staple functionality mostly.
Depending on "minimum-size two's complement number encoding" popularity, it might be worth either adding it to a new separate variadic-length-encoding-centric kotlinx-io module, or to keep it in a separate library.

JesusMcCloud · 2024-10-16T19:14:06Z

Thanks for the swift reply!
What you are describing is a roll-yuor-own poor man's kotest-property and this would probably take more time to implement properly than the actual contribution.

As for the Encoding itself: it is just regular two's complement encoding without wasting zero bytes:

-1 -> FF
1 -> 01
0 -> 00
257 -> 0101
16777215 -> 00FFFFFF
65555 -> 010013
16777216 -> 01000000
15253481 -> 00E8BFE9
-1446230472 -> A9CC4638
8935431257042531517 -> 7C010762E6F134BD

The API would look as follows (these are just examples):

/**
 *  Encodes an unsigned Long to a minimum-size twos-complement byte array
 * @return the number of bytes written
 */
fun Sink.writeTwosComplement(number: ULong): Int {…}

/**
 * Consumes all remaining data from this source and interprets it as a signed [Int]
 *
 * @throws IllegalArgumentException if no or too much data is present
 */
@Throws(IllegalArgumentException::class)
fun Source.readTwosComplementInt(): Int {…}

We're using it for ASN.1 encoding, but personally: I lost count how many times I needed this in the past. Implementing it from scratch is error-prone and far from not fun, which is why we wanted to share. Then again, maybe that's just me and it is not that of a common use-case.
We're happy to maintain it ourselves as part of our library. After all, it is far less of a hassle and we can keep our tests as they are, if this is more clutter than a wanted feature. If this does turn out to be a desired feature at a later point in time, just ping me

fzhinkin · 2024-10-18T18:44:33Z

We're using it for ASN.1 encoding, but personally: I lost count how many times I needed this in the past. Implementing it from scratch is error-prone and far from not fun, which is why we wanted to share. Then again, maybe that's just me and it is not that of a common use-case.

Unlike other variadic-length encoding schemes (LEB/ULEB128, Protobuf's varint, ASN.1 Object Identifier's encoding), the scheme employed by ASN.1 to encode integers requires storing the length of an encoded value separately.

Most likely (as in case of ASN.1), field's length need to precede the encoded value.

Due to Sinks and Sources are being sequentially accessible only (or the other way around, they don't provide random access), there's no easy way to prepend a value returned from Sink.writeTwosComplement before the actual message.

It is a nice building block for different encoding schemes and it should be useful in such cases, but I doubt it could be used on its own as a general purpose way to write/read integer values.

We're happy to maintain it ourselves as part of our library. After all, it is far less of a hassle and we can keep our tests as they are, if this is more clutter than a wanted feature. If this does turn out to be a desired feature at a later point in time, just ping me

Sounds good to me. Anyway, thanks for opening the issue!

JesusMcCloud · 2024-10-21T07:35:45Z

I'll be closing this issue here once we have a release of our library ready, so I can add a reference to it here, in case someone else needs this in the future.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support minimum-size two's complement encoding #402

Support minimum-size two's complement encoding #402

JesusMcCloud commented Oct 16, 2024

fzhinkin commented Oct 16, 2024

JesusMcCloud commented Oct 16, 2024

fzhinkin commented Oct 18, 2024

JesusMcCloud commented Oct 21, 2024

Support minimum-size two's complement encoding #402

Support minimum-size two's complement encoding #402

Comments

JesusMcCloud commented Oct 16, 2024

fzhinkin commented Oct 16, 2024

JesusMcCloud commented Oct 16, 2024

fzhinkin commented Oct 18, 2024

JesusMcCloud commented Oct 21, 2024