IPROTO-265 Remove additional byte[] allocations for nested writers #192

wburns · 2023-05-08T21:39:06Z

* Add new subWriter method to implement to allow reusing encoder instances
* Add some common default methods to the TagWriter/TagReader interfaces
* Add common way to write a fixed varint of 5 bytes
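To illustrate the fixed-width varint mentioned in the last item, here is a minimal sketch, not the code from this PR. It assumes the usual protobuf varint layout (7 payload bits per byte, high bit as continuation flag) and pads a 32-bit value to exactly 5 bytes, so a length slot can be reserved before the size of a nested message is known. The class and method names are hypothetical.

```java
final class FixedVarintSketch {
    // Hypothetical helper: encodes value as exactly 5 varint bytes at buf[pos..pos+4].
    // The first four bytes always carry the continuation bit, even when the
    // remaining bits are zero, so the encoded width never depends on the value.
    static void writeFixedVarint32(byte[] buf, int pos, int value) {
        for (int i = 0; i < 4; i++) {
            buf[pos + i] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        // After 28 bits have been consumed, at most 4 bits remain; no continuation bit.
        buf[pos + 4] = (byte) value;
    }

    // Plain varint reader, used here to show that the padded form decodes normally.
    static int readVarint32(byte[] buf, int pos) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[pos++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed varint");
    }

    public static void main(String[] args) {
        byte[] buf = new byte[5];
        writeFixedVarint32(buf, 0, 300);
        System.out.println(readVarint32(buf, 0));
    }
}
```

A regular varint reader decodes the padded form to the same value. The cost is up to 4 extra bytes per length prefix compared with the shortest encoding, which is consistent with the test comment in this PR that the actual array is 4 bigger.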

https://issues.redhat.com/browse/IPROTO-265
https://issues.redhat.com/browse/IPROTO-266

Also added changes to allow for custom Decoder/Encoder instances to be used instead of the supplied ones.

wburns · 2023-05-08T21:40:59Z

Write performance, where I was able to completely remove all additional byte[] allocations:

NEW

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   444.554 ±  2.691  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   424.578 ±  6.260  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6   567.723 ±  7.429  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6   540.126 ±  5.248  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  1886.560 ± 15.246  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  3659.781 ±  8.508  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  1851.959 ± 33.655  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  2799.700 ±  7.628  ns/op

6.4.3-SNAPSHOT

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   483.135 ±  2.088  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   736.977 ±  2.959  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6  1112.077 ± 20.424  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6  1342.824 ±  4.419  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  2600.131 ± 42.651  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  5377.703 ± 15.728  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  2869.619 ± 27.668  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  4899.339 ± 23.065  ns/op

6.4.2.Final

Benchmark                                      (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testMarshallAddress                      true                  N/A  avgt    6   488.974 ±  2.547  ns/op
ProtostreamBenchmark.testMarshallAddress                     false                  N/A  avgt    6   736.031 ±  6.277  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                 true                  N/A  avgt    6  1102.625 ±  5.741  ns/op
ProtostreamBenchmark.testMarshallIracMetadata                false                  N/A  avgt    6  1343.379 ±  4.145  ns/op
ProtostreamBenchmark.testMarshallUser                         true                   10  avgt    6  2607.106 ± 34.377  ns/op
ProtostreamBenchmark.testMarshallUser                         true                 8096  avgt    6  5378.692 ± 22.828  ns/op
ProtostreamBenchmark.testMarshallUser                        false                   10  avgt    6  2856.660 ± 20.676  ns/op
ProtostreamBenchmark.testMarshallUser                        false                 8096  avgt    6  4904.252 ± 13.979  ns/op

Unfortunately, reading from a stream does not benefit as much, as there is no easy way to skip ahead with marks that doesn't conflict with the isAtEnd method. I will post the perf numbers for reads later today, but I expect the byte array based TagReaderImpl to be faster.

wburns · 2023-05-09T03:38:44Z

Read perf isn't quite as good as I had hoped yet; I'm going to look at it some more tomorrow. The only case that actually changed is the array-based read, which allows reading without a copy. Unfortunately, the InputStream variant requires a copy that I am not sure I can get rid of, so I will probably just try to keep its performance the same as it is currently.

NEW

Benchmark                                    (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testUnmarshallAddress                  true                  N/A  avgt    6   138.618 ±  2.057  ns/op
ProtostreamBenchmark.testUnmarshallAddress                 false                  N/A  avgt    6   406.437 ±  2.792  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                 true                  N/A  avgt    6   301.055 ±  2.816  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                false                  N/A  avgt    6   582.103 ± 12.788  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                   10  avgt    6   704.985 ±  7.099  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                 8096  avgt    6  1429.527 ±  9.361  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                   10  avgt    6  1057.003 ± 34.231  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                 8096  avgt    6  2674.062 ±  4.725  ns/op

6.4.3-SNAPSHOT

Benchmark                                    (byteArrayOrStream)  (userByteArraySize)  Mode  Cnt     Score    Error  Units
ProtostreamBenchmark.testUnmarshallAddress                  true                  N/A  avgt    6   137.458 ±  1.184  ns/op
ProtostreamBenchmark.testUnmarshallAddress                 false                  N/A  avgt    6   331.708 ±  2.669  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                 true                  N/A  avgt    6   304.312 ±  2.491  ns/op
ProtostreamBenchmark.testUnmarshallMetadata                false                  N/A  avgt    6   514.835 ±  2.664  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                   10  avgt    6   712.873 ±  7.157  ns/op
ProtostreamBenchmark.testUnmarshallUser                     true                 8096  avgt    6  2282.761 ±  8.351  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                   10  avgt    6  1040.317 ±  9.922  ns/op
ProtostreamBenchmark.testUnmarshallUser                    false                 8096  avgt    6  2683.187 ± 21.002  ns/op

wburns · 2023-05-09T12:30:27Z

I'm not finding any way to speed up the InputStream version further. Our final usage, though, will be ByteBuf, which, like ByteBuffer, can more easily take advantage of the read changes since it can use a slice instead.
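The difference between the two read paths can be sketched with the JDK's own types. This is an illustration only, not the TagReaderImpl code: `nestedView` and `nestedCopy` are hypothetical names, and `java.nio.ByteBuffer` stands in for whatever buffer type is in use. A buffer can hand out a view over the nested message bytes, while a plain InputStream has to copy them out first.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

final class NestedReadSketch {
    // Buffer path: returns a view of the next `length` bytes that shares the
    // parent's storage (no copy), then advances the parent past them.
    static ByteBuffer nestedView(ByteBuffer parent, int length) {
        ByteBuffer view = parent.slice();   // starts at the parent's current position
        view.limit(length);                 // restrict the view to the nested message
        parent.position(parent.position() + length);
        return view;
    }

    // Stream path: the nested bytes must be copied into a fresh array
    // before they can be read as an independent message.
    static byte[] nestedCopy(InputStream in, int length) throws IOException {
        byte[] copy = new byte[length];
        new DataInputStream(in).readFully(copy);
        return copy;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        ByteBuffer parent = ByteBuffer.wrap(data);
        parent.position(1);
        ByteBuffer view = nestedView(parent, 3);
        System.out.println(view.remaining() + " bytes viewed, parent now at " + parent.position());

        byte[] copy = nestedCopy(new ByteArrayInputStream(data), 3);
        System.out.println(copy.length + " bytes copied");
    }
}
```

The view stays valid only as long as the underlying storage is not reused, which is the usual trade-off of slicing instead of copying.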

wburns · 2023-05-09T21:08:35Z

I have some more changes after this is integrated that will test ByteBuf perf.

A preview of their current state:

Benchmark                                      (serializationType)  (userByteArraySize)  Mode  Cnt     Score   Error  Units
ProtostreamBenchmark.testMarshallAddress                  BYTE_BUF                  N/A  avgt    6   272.715 ± 4.272  ns/op
ProtostreamBenchmark.testMarshallIracMetadata             BYTE_BUF                  N/A  avgt    6   497.901 ± 3.265  ns/op
ProtostreamBenchmark.testMarshallUser                     BYTE_BUF                   10  avgt    6  1020.351 ± 9.291  ns/op
ProtostreamBenchmark.testMarshallUser                     BYTE_BUF                 8096  avgt    6  1109.903 ± 9.272  ns/op
ProtostreamBenchmark.testUnmarshallAddress                BYTE_BUF                  N/A  avgt    6   180.323 ± 1.314  ns/op
ProtostreamBenchmark.testUnmarshallMetadata               BYTE_BUF                  N/A  avgt    6   366.013 ± 2.660  ns/op
ProtostreamBenchmark.testUnmarshallUser                   BYTE_BUF                   10  avgt    6  1152.420 ± 8.283  ns/op
ProtostreamBenchmark.testUnmarshallUser                   BYTE_BUF                 8096  avgt    6  2040.321 ± 6.300  ns/op

So the write perf is amazing, as it uses a pooled ByteBuf instance. Reads for User are a bit odd though; I'm going to be looking closer.

wburns · 2023-05-09T21:44:45Z

I have added a new commit that allows for custom Encoder/Decoder instances to be used which is how I was able to do the test for ByteBuf based encoder and decoders.

wburns · 2023-05-09T21:47:55Z

infinispan/infinispan-benchmarks#18 is the benchmark PR that requires these changes and produced those results.

pruivo

LGTM, just minor suggestions 👍

core/src/main/java/org/infinispan/protostream/TagWriter.java

core/src/main/java/org/infinispan/protostream/TagReader.java

core/src/main/java/org/infinispan/protostream/TagWriter.java

core/src/main/java/org/infinispan/protostream/annotations/impl/GeneratedMarshallerBase.java

core/src/test/java/org/infinispan/protostream/ProtobufUtilTest.java

core/src/main/java/org/infinispan/protostream/impl/TagWriterImpl.java

wburns · 2023-05-22T18:34:17Z

I pushed review comments. Let me know if it is okay and I can squash the commit.

pruivo · 2023-05-23T17:02:34Z

core/src/test/java/org/infinispan/protostream/ProtobufUtilTest.java

@@ -58,6 +57,7 @@ public void testComputeMessageSize() throws Exception {

      messageSize = ProtobufUtil.computeWrappedMessageSize(ctx, user);

+      // Actual array is 4 bigger because of fixed Varint


core/src/main/java/org/infinispan/protostream/impl/TagReaderImpl.java

pruivo · 2023-05-23T17:16:52Z

core/src/main/java/org/infinispan/protostream/WrappedMessage.java

+               if (nestedWriter instanceof Closeable) {
+                  ((Closeable) nestedWriter).close();
+               }


forgot to ask, don't you have to close every time?

Suggested change

-               if (nestedWriter instanceof Closeable) {
-                  ((Closeable) nestedWriter).close();
-               }
+               nestedWriter.getWriter().close();

Technically it is always true afaik, but this is cleaner.
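Why closing matters for a nested writer can be sketched as follows. This is an assumption about the general technique, not the WrappedMessage or TagWriterImpl code: `Buffer` and `SubWriter` are hypothetical types. The sub-writer writes straight into the parent's buffer, reserves 5 bytes for the length up front, and only back-fills that length when it is closed. If close is skipped, the length prefix stays zero.

```java
import java.io.Closeable;
import java.util.Arrays;

final class NestedWriterSketch {
    /** Hypothetical growable output buffer; not a ProtoStream class. */
    static final class Buffer {
        byte[] bytes = new byte[64];
        int pos;

        void write(int b) {
            if (pos == bytes.length) {
                bytes = Arrays.copyOf(bytes, pos * 2);
            }
            bytes[pos++] = (byte) b;
        }
    }

    /** Hypothetical sub-writer that shares the parent's buffer instead of allocating its own. */
    static final class SubWriter implements Closeable {
        private final Buffer out;
        private final int lengthPos;

        SubWriter(Buffer out) {
            this.out = out;
            this.lengthPos = out.pos;
            for (int i = 0; i < 5; i++) {
                out.write(0);               // reserve a fixed 5-byte length slot
            }
        }

        void write(int b) {
            out.write(b);
        }

        @Override
        public void close() {
            // Back-fill the nested length as a 5-byte padded varint.
            int length = out.pos - lengthPos - 5;
            for (int i = 0; i < 4; i++) {
                out.bytes[lengthPos + i] = (byte) ((length & 0x7F) | 0x80);
                length >>>= 7;
            }
            out.bytes[lengthPos + 4] = (byte) length;
        }
    }

    public static void main(String[] args) {
        Buffer buffer = new Buffer();
        try (SubWriter nested = new SubWriter(buffer)) {
            nested.write(7);
            nested.write(8);
        }
        System.out.println(buffer.pos + " bytes written");
    }
}
```

Using try-with-resources makes the close unconditional, which is in the spirit of the suggestion above to always close rather than check the type first.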

wburns · 2023-08-10T17:49:36Z

Updated

pruivo · 2023-08-10T18:02:28Z

merged! thanks @wburns !

wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 7afea21 to b3074e5 Compare May 9, 2023 12:31

wburns changed the title ~~IPROTO Remove additional byte[] allocations for nested writers~~ IPROTO-265 Remove additional byte[] allocations for nested writers May 9, 2023

pruivo requested changes May 17, 2023

wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 763bd1e to 12490a3 Compare May 22, 2023 19:18

pruivo reviewed May 23, 2023

wburns added 2 commits August 10, 2023 10:31

IPROTO-265 Remove additional byte[] allocations for nested readers/wr…

4eadd2d

…iters * Add new subWriter method to implement to allow reusing encoder instances * Add some common default methods to the TagWriter/TagReader interfaces * Add common way to write a fixed varint of 5 bytes

IPROTO-266 Allow for custom Encoder/Decoder implementations for TagWr…

ca65721

…iter/TagReader

wburns force-pushed the IPROTO-265_nested_writer_allocations branch from 12490a3 to ca65721 Compare August 10, 2023 17:49

pruivo approved these changes Aug 10, 2023

pruivo merged commit 2a34478 into infinispan:main Aug 10, 2023
4 checks passed
