IPROTO-265 Remove additional byte[] allocations for nested writers #192

Merged on Aug 10, 2023 (2 commits)


wburns commented May 8, 2023

  • Add a new subWriter method for implementations to provide, so that encoder instances can be reused
  • Add some common default methods to the TagWriter/TagReader interfaces
  • Add a common way to write a fixed-width varint of 5 bytes
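
As an illustration of the last bullet, here is a minimal sketch of a fixed-width 5-byte varint, assuming the usual protobuf varint layout (7 payload bits per byte, high bit as continuation flag). The class and method names are hypothetical and are not taken from this PR; the real implementation may differ.

```java
// Hypothetical helper, not the PR's code: pads a 32-bit varint to exactly 5 bytes.
final class FixedVarint {
    private FixedVarint() {}

    /** Writes value as exactly 5 varint bytes starting at offset. */
    static void writeFixedVarint32(byte[] buf, int offset, int value) {
        // The first four bytes carry 7 payload bits each, with the continuation bit forced on
        for (int i = 0; i < 4; i++) {
            buf[offset + i] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        // The last byte carries the remaining 4 bits and has no continuation bit
        buf[offset + 4] = (byte) value;
    }

    /** Plain varint decoder; it reads the padded form without special handling. */
    static int readVarint32(byte[] buf, int offset) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[offset++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("Malformed varint32");
    }
}
```

Because the width is always 5 bytes, a slot of that size can be reserved before the value is known and filled in later, at the cost of up to 4 extra bytes compared with the shortest encoding.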

https://issues.redhat.com/browse/IPROTO-265
https://issues.redhat.com/browse/IPROTO-266

Also added changes to allow for custom Decoder/Encoder instances to be used instead of the supplied ones.

wburns commented May 8, 2023

Performance numbers for writers, where I was able to completely remove all additional byte[] allocations:

NEW

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   444.554 ±  2.691  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   424.578 ±  6.260  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6   567.723 ±  7.429  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6   540.126 ±  5.248  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  1886.560 ± 15.246  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  3659.781 ±  8.508  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  1851.959 ± 33.655  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  2799.700 ±  7.628  ns/op

4.6.3-SNAPSHOT

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   483.135 ±  2.088  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   736.977 ±  2.959  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6  1112.077 ± 20.424  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6  1342.824 ±  4.419  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  2600.131 ± 42.651  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  5377.703 ± 15.728  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  2869.619 ± 27.668  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  4899.339 ± 23.065  ns/op

4.6.2.Final

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   488.974 ±  2.547  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   736.031 ±  6.277  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6  1102.625 ±  5.741  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6  1343.379 ±  4.145  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  2607.106 ± 34.377  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  5378.692 ± 22.828  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  2856.660 ± 20.676  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  4904.252 ± 13.979  ns/op
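
One way a nested writer can avoid a temporary byte[] is to reserve a fixed 5-byte length slot in the parent buffer, write the nested fields directly after it, and then fill in the length. The sketch below illustrates that idea only. It is an assumption about the general technique, not the code in this PR, and every name in it (NestedWriteSketch, beginNested, endNested) is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: nested messages are written in place, with a back-patched length.
final class NestedWriteSketch {
    private byte[] buf = new byte[64];
    private int pos;

    private void ensure(int extra) {
        if (pos + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, pos + extra));
        }
    }

    /** Shortest-form varint; used for tags and values. */
    void writeVarint32(int value) {
        ensure(5);
        while ((value & ~0x7F) != 0) {
            buf[pos++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[pos++] = (byte) value;
    }

    /** Writes a varint field (wire type 0); this sketch handles non-negative values only. */
    void writeInt32(int fieldNumber, int value) {
        writeVarint32(fieldNumber << 3);
        writeVarint32(value);
    }

    /** Writes the length-delimited tag (wire type 2), reserves 5 bytes, returns the slot position. */
    int beginNested(int fieldNumber) {
        writeVarint32((fieldNumber << 3) | 2);
        ensure(5);
        int lengthPos = pos;
        pos += 5;
        return lengthPos;
    }

    /** Back-patches the reserved slot with the nested length as a padded 5-byte varint. */
    void endNested(int lengthPos) {
        int length = pos - (lengthPos + 5);
        for (int i = 0; i < 4; i++) {
            buf[lengthPos + i] = (byte) ((length & 0x7F) | 0x80);
            length >>>= 7;
        }
        buf[lengthPos + 4] = (byte) length;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }
}
```

For example, calling beginNested(1), writeInt32(1, 150) and endNested(slot) yields 9 bytes: the tag, a 5-byte length and a 3-byte payload. With a shortest-form length the same message would take 5 bytes, so the padded form is 4 bytes larger.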

Unfortunately, reading from a stream does not benefit as much, because there is no easy way to skip ahead using marks without conflicting with the isAtEnd method. I will post the perf numbers for reads later today, but I expect the byte-array-based TagReaderImpl to be faster.

wburns commented May 9, 2023

Read perf isn't quite as good as I had hoped yet; I am going to look at it some more tomorrow. The case that was actually affected is the array-based read, which allows reading without a copy. Unfortunately, the InputStream variant requires a copy that I am not sure I can get rid of, so in that case I will probably just try to match its current performance.

NEW

Benchmark                                    (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testUnmarshallAddress                  true                  N/A  avgt    6   138.618 ±  2.057  ns/op
ProtostreamBenchmark.testUnmarshallAddress                 false                  N/A  avgt    6   406.437 ±  2.792  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                 true                  N/A  avgt    6   301.055 ±  2.816  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                false                  N/A  avgt    6   582.103 ± 12.788  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                   10  avgt    6   704.985 ±  7.099  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                 8096  avgt    6  1429.527 ±  9.361  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                   10  avgt    6  1057.003 ± 34.231  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                 8096  avgt    6  2674.062 ±  4.725  ns/op

4.6.3-SNAPSHOT

Benchmark                                    (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testUnmarshallAddress                  true                  N/A  avgt    6   137.458 ±  1.184  ns/op
ProtostreamBenchmark.testUnmarshallAddress                 false                  N/A  avgt    6   331.708 ±  2.669  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                 true                  N/A  avgt    6   304.312 ±  2.491  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                false                  N/A  avgt    6   514.835 ±  2.664  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                   10  avgt    6   712.873 ±  7.157  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                 8096  avgt    6  2282.761 ±  8.351  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                   10  avgt    6  1040.317 ±  9.922  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                 8096  avgt    6  2683.187 ± 21.002  ns/op

wburns commented May 9, 2023

I am not finding any way to speed up the InputStream version further. Our final usage, though, will be with ByteBuf, which, like ByteBuffer, can more easily take advantage of the read changes because it can use a slice instead.
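
To illustrate the difference between a slice and a copy, here is a small sketch using java.nio.ByteBuffer rather than Netty's ByteBuf, to stay dependency-free. The class and method names are hypothetical and do not come from ProtoStream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch: two ways to hand a nested message's bytes to a sub-reader.
final class NestedReadSketch {
    private NestedReadSketch() {}

    /** Returns a view over the next length bytes and advances the parent; nothing is copied. */
    static ByteBuffer sliceNested(ByteBuffer parent, int length) {
        ByteBuffer view = parent.slice(); // shares the backing storage, starts at parent's position
        view.limit(length);
        parent.position(parent.position() + length);
        return view;
    }

    /** A stream offers no view, so the nested bytes have to be copied into a new array. */
    static byte[] copyNested(InputStream in, int length) {
        byte[] copy = new byte[length];
        try {
            int read = 0;
            while (read < length) {
                int n = in.read(copy, read, length - read);
                if (n < 0) {
                    throw new EOFException("Stream ended inside nested message");
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copy;
    }
}
```

The slice shares storage with its parent, so the sub-reader sees the same bytes without any allocation proportional to the message size, whereas the stream variant allocates and fills a new array for every nested message.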

wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 7afea21 to b3074e5 on May 9, 2023 12:31

wburns commented May 9, 2023

I have some more changes, to follow once this is integrated, that will test ByteBuf perf.

A preview of their current state:

Benchmark                                      (serializationType)  (userByteArraySize)  Mode  Cnt     Score   Error  Units
ProtostreamBenchmark.testMarshallAddress                  BYTE_BUF                  N/A  avgt    6   272.715 ± 4.272  ns/op
ProtostreamBenchmark.testMarshallIracMetadata             BYTE_BUF                  N/A  avgt    6   497.901 ± 3.265  ns/op
ProtostreamBenchmark.testMarshallUser                     BYTE_BUF                   10  avgt    6  1020.351 ± 9.291  ns/op
ProtostreamBenchmark.testMarshallUser                     BYTE_BUF                 8096  avgt    6  1109.903 ± 9.272  ns/op
ProtostreamBenchmark.testUnmarshallAddress                BYTE_BUF                  N/A  avgt    6   180.323 ± 1.314  ns/op
ProtostreamBenchmark.testUnmarshallMetadata               BYTE_BUF                  N/A  avgt    6   366.013 ± 2.660  ns/op
ProtostreamBenchmark.testUnmarshallUser                   BYTE_BUF                   10  avgt    6  1152.420 ± 8.283  ns/op
ProtostreamBenchmark.testUnmarshallUser                   BYTE_BUF                 8096  avgt    6  2040.321 ± 6.300  ns/op

So the write perf is amazing, as it uses a pooled ByteBuf instance. The read numbers for User are a bit odd though; I am going to look closer.

wburns changed the title from "IPROTO Remove additional byte[] allocations for nested writers" to "IPROTO-265 Remove additional byte[] allocations for nested writers" on May 9, 2023

wburns commented May 9, 2023

I have added a new commit that allows custom Encoder/Decoder instances to be used, which is how I was able to test the ByteBuf-based encoders and decoders.

wburns commented May 9, 2023

infinispan/infinispan-benchmarks#18 is the benchmark PR that requires these changes and produced those results.

pruivo left a comment

LGTM, just minor suggestions 👍

wburns commented May 22, 2023

I pushed changes addressing the review comments. Let me know if they are okay and I can squash the commits.

wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 763bd1e to 12490a3 on May 22, 2023 19:18
@@ -58,6 +57,7 @@ public void testComputeMessageSize() throws Exception {

messageSize = ProtobufUtil.computeWrappedMessageSize(ctx, user);

// Actual array is 4 bigger because of fixed Varint
Member

outdated

Comment on lines 302 to 304
if (nestedWriter instanceof Closeable) {
((Closeable) nestedWriter).close();
}
Member

Forgot to ask: don't you have to close it every time?

Suggested change
- if (nestedWriter instanceof Closeable) {
-    ((Closeable) nestedWriter).close();
- }
+ nestedWriter.getWriter().close();

Member Author

Technically the instanceof check is always true AFAIK, but this is cleaner.

…iters

* Add new subWriter method to implement to allow reusing encoder
  instances
* Add some common default methods to the TagWriter/TagReader interfaces
* Add common way to write a fixed varint of 5 bytes
wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 12490a3 to ca65721 on August 10, 2023 17:49

wburns commented Aug 10, 2023

Updated

pruivo merged commit 2a34478 into infinispan:main on Aug 10, 2023
4 checks passed

pruivo commented Aug 10, 2023

merged! thanks @wburns !
