feat: don't split header and body across TCP packets #168

stormshield-pj50 · 2023-07-19T15:28:47Z

Currently, header and body of a frame are written to the underlying IO source in two steps. Depending on how eager the underlying stream is about flushing data, this can lead to two TCP packets being sent instead of header and body ending up in one.

By refactoring the internal implementation of Frame, we can reference it as a single byte slice and as a result, only use a single call to poll_write without additional allocations per frame.

thomaseizinger · 2023-07-19T15:51:57Z

Thank you for digging into this! This looks like a great improvement. Is there a reason why this needs to be configurable? We will always need to write a header so can't we just always reserve 12 bytes for it?

thomaseizinger · 2023-07-19T15:56:46Z

Could we also achieve something similar by just changing Io to not write the header and body in two distinct steps? Like, can't we concatenate header and body and try to write them as one byte blob?

thomaseizinger · 2023-07-19T16:05:09Z

concatenate header and body and try to write them as one byte blob?

This would mean an additional allocation per frame which is not great. But, we could introduce a new constructor for Frame<Data> that gets initialized from a slice. Currently, AsyncWrite::poll_write for Stream allocates anyway for each Frame. If we move this allocation into the Frame type, we can directly allocate an additional 12 bytes and store the header alongside it.

For this to work, we'd also have to change how Io accesses the bytes to be written but for all other frame-types, we don't have any payload, meaning we need something like "encode frame into byte vec" and use that within Io::start_send.

Thoughts?

stormshield-pj50 · 2023-07-20T08:20:46Z

Thank you for digging into this! This looks like a great improvement. Is there a reason why this needs to be configurable? We will always need to write a header so can't we just always reserve 12 bytes for it?

You're right we can assume that if we have enough space it must be 12 bytes, but in that case we have to expose header::HEADER_SIZE to the outside.

thomaseizinger · 2023-07-20T08:31:22Z

Thank you for digging into this! This looks like a great improvement. Is there a reason why this needs to be configurable? We will always need to write a header so can't we just always reserve 12 bytes for it?

You're right we can assume that if we have enough space it must be 12 bytes, but in that case we have to expose header::HEADER_SIZE to the outside.

I am not sure I fully understand what you mean?

Regardless, I think that #168 (comment) is an avenue worth exploring because it should avoid the need for any of these APIs by merging header + body into one chunk for data frames which should always result in one TCP packet, assuming it fits.

stormshield-pj50 · 2023-07-20T08:34:32Z

concatenate header and body and try to write them as one byte blob?

This would mean an additional allocation per frame which is not great. But, we could introduce a new constructor for Frame<Data> that gets initialized from a slice. Currently, AsyncWrite::poll_write for Stream allocates anyway for each Frame. If we move this allocation into the Frame type, we can directly allocate an additional 12 bytes and store the header alongside it.

For this to work, we'd also have to change how Io accesses the bytes to be written but for all other frame-types, we don't have any payload, meaning we need something like "encode frame into byte vec" and use that within Io::start_send.

Thoughts?

So you propose to allocate directly into a new Frame constructor: allocate 12 bytes more for the header, encode the header into the buffer then copy the slice into the buffer at offset 12. Eventually it means the Stream API set_reserved_header_size() becomes useless, isn't it ?

thomaseizinger · 2023-07-20T08:44:26Z

concatenate header and body and try to write them as one byte blob?

This would mean an additional allocation per frame which is not great. But, we could introduce a new constructor for Frame<Data> that gets initialized from a slice. Currently, AsyncWrite::poll_write for Stream allocates anyway for each Frame. If we move this allocation into the Frame type, we can directly allocate an additional 12 bytes and store the header alongside it.
For this to work, we'd also have to change how Io accesses the bytes to be written but for all other frame-types, we don't have any payload, meaning we need something like "encode frame into byte vec" and use that within Io::start_send.
Thoughts?

So you propose to allocate directly into a new Frame constructor: allocate 12 bytes more for the header, encode the header into the buffer then copy the slice into the buffer at offset 12. Eventually it means the Stream API set_reserved_header_size() becomes useless, isn't it ?

Correct! Frame<Data> is the only variant that actually has a body so I think it makes more sense to embed this feature there. We probably need to slightly change the internals of Frame and the interaction within Io where we extract the data to be written to the socket.

I'd have to look into the details but off the top of my head I am thinking:

Expose a &[u8] from Frame
For Frame<Data>, we just expose the internal buffer (where header and body are concatenated)
For all other frames, we just expose the slice of the buffer
Io needs to keep track of how many bytes of the current frame it already wrote. We do something like this already I believe.

The last point makes sure we don't allocate again in Io but instead request the slice and index into it based on how many bytes we already wrote to the socket.

stormshield-pj50 · 2023-07-20T09:40:06Z

concatenate header and body and try to write them as one byte blob?

This would mean an additional allocation per frame which is not great. But, we could introduce a new constructor for Frame<Data> that gets initialized from a slice. Currently, AsyncWrite::poll_write for Stream allocates anyway for each Frame. If we move this allocation into the Frame type, we can directly allocate an additional 12 bytes and store the header alongside it.
For this to work, we'd also have to change how Io accesses the bytes to be written but for all other frame-types, we don't have any payload, meaning we need something like "encode frame into byte vec" and use that within Io::start_send.
Thoughts?

So you propose to allocate directly into a new Frame constructor: allocate 12 bytes more for the header, encode the header into the buffer then copy the slice into the buffer at offset 12. Eventually it means the Stream API set_reserved_header_size() becomes useless, isn't it ?

Correct! Frame<Data> is the only variant that actually has a body so I think it makes more sense to embed this feature there. We probably need to slightly change the internals of Frame and the interaction within Io where we extract the data to be written to the socket.

I'd have to look into the details but off the top of my head I am thinking:
* Expose a `&[u8]` from `Frame`

* For `Frame<Data>`, we just expose the internal buffer (where header and body are concatenated)

* For all other frames, we just expose the slice of the buffer

* `Io` needs to keep track of how many bytes of the current frame it already wrote. We do something like this already I believe.
The last point makes sure we don't allocate again in Io but instead request the slice and index into it based on how many bytes we already wrote to the socket.

OK that's makes sense. I will try to work on it. In the same time I was wondering if we could avoid the copy of the buffer in poll_write() (or in Frame as you suggest) by implementing a second API on Stream similar to the Sink trait. For instance a start_send() method that takes ownership of a BytesMut and indicates whether there is some room at the start of the buffer for a Yamux header. With this we could avoid an allocation and a copy. What do you think about it ?

yamux/src/frame.rs

mxinden · 2023-08-02T10:42:58Z

Thank you @stormshield-pj50 for posting your findings here!

Our software has some gains of 50% throughput with this patch, as we are mostly sending small buffers with Yamux.

Can you add more details on the setup where you are seeing a 50% throughput increase? What is the latency and bandwidth between the two endpoints? Is it a LAN, WAN or routed over the Internet? How many TCP connections (or other flows) run concurrently? What is the lifetime of a stream on a single connection?

stormshield-pj50 · 2023-08-21T07:36:18Z

Thank you @stormshield-pj50 for posting your findings here!

Our software has some gains of 50% throughput with this patch, as we are mostly sending small buffers with Yamux.

Can you add more details on the setup where you are seeing a 50% throughput increase? What is the latency and bandwidth between the two endpoints? Is it a LAN, WAN or routed over the Internet? How many TCP connections (or other flows) run concurrently? What is the lifetime of a stream on a single connection?

Hi Max,
Sorry for the late reply, I was on vacation !
Using libp2p + Yamux + rustls, we tunnel a single iperf3 TCP connection between two local docker containers. As we prefix each tunneled packet with its size in the Yamux stream, in the end we write 4 packets for one tunneled packet. The Yamux stream lives as long as the connection.

thomaseizinger

Thank you!

Couple of questions regarding ergonomics of these new APIs.

yamux/src/frame.rs

yamux/src/connection/stream.rs

yamux/src/frame.rs

yamux/src/connection.rs

mxinden

I am not yet convinced that this is a feature worth supporting. Keep in mind, I might be missing something.

Using libp2p + Yamux + rustls, we tunnel a single iperf3 TCP connection between two local docker containers.

Do I understand correctly, that this is localhost networking only? Is this your primary use-case? If not, can you please provide numbers on a more realistic network environment (i.e. an environment with significant latency and bandwidth limitations)?

In my eyes this pull request introduces significant complexity. At this point I don't think this complexity is worth the improvement of a minor use-case (localhost networking) of this library.

thomaseizinger · 2023-09-09T07:59:06Z

In my eyes this pull request introduces significant complexity. At this point I don't think this complexity is worth the improvement of a minor use-case (localhost networking) of this library.

Is it really "significant"? I think we should be able to implement this such that it is almost all contained within Frame.

I think it makes perfect sense and is quite intuitive that a frame can be encoded into a single byte buffer. Not doing that is the weird thing, no?

I agree that we may need to polish the API a bit more but as long as it is contained, I am happy to accept this.

stormshield-pj50 · 2023-09-11T07:17:43Z

I am not yet convinced that this is a feature worth supporting. Keep in mind, I might be missing something.

Using libp2p + Yamux + rustls, we tunnel a single iperf3 TCP connection between two local docker containers.

Do I understand correctly, that this is localhost networking only? Is this your primary use-case? If not, can you please provide numbers on a more realistic network environment (i.e. an environment with significant latency and bandwidth limitations)?

The first aim of the dev is to prevent sending two TCP packets on the wire each time some data is written to the Yamux stream (one packet for Yamux header and one packet for data). Thus we can save one useless standalone header packet and at least 14 (ETH) + 20 (IP) + 20 (TCP) = 54 bytes for each bunch of data written to the stream. Take a look at the start of the PR for a packet capture that show this. That also means less network syscalls and a more scalable code.

Our local test is just for benchmarking purpose, however I think with or without latency the important thing is to limit the number of packets sent on the wire.

In my eyes this pull request introduces significant complexity. At this point I don't think this complexity is worth the improvement of a minor use-case (localhost networking) of this library.

thomaseizinger · 2023-09-15T09:59:32Z

@stormshield-pj50 I'll have a play with this API next week, ideally, I'd like to encapsulate more of this in the Frame type. Currently the zero-parsing stuff leaks to many other components which isn't very nice and makes things harder to understand.

Do you think we can refactor this such that it is completely an implementation detail of Frame?

thomaseizinger · 2023-09-15T10:00:39Z

@stormshield-pj50 I'll have a play with this API next week, ideally, I'd like to encapsulate more of this in the Frame type. Currently the zero-parsing stuff leaks to many other components which isn't very nice and makes things harder to understand.

Do you think we can refactor this such that it is completely an implementation detail of Frame?

I think, if we can contain this optimisation mostly in Frame, then we have a lot less complexity to deal with and can maybe convince @mxinden to land this :)

thomaseizinger · 2023-09-17T06:36:22Z

@stormshield-pj50 I pushed a commit that outlines a different approach for using zerocopy. It needs more work but I hope that it demonstrates, how I'd like to contain the functionality within the Frame module. As you can see from the diff, connection.rs and other modules are pretty much unaffected.

Unfortunately, zerocopy doesn't handle enums very well as you've probably encountered yourself in the previous commits. They seem to be working on it but I am not sure when they'll land that work. For now, I think the workaround isn't too bad.

Let me know if you are willing to continue down this path :)

thomaseizinger

Left some more notes on what still needs work!

yamux/src/frame.rs

yamux/src/frame/io.rs

stormshield-pj50 · 2023-09-18T06:40:46Z

@stormshield-pj50 I pushed a commit that outlines a different approach for using zerocopy. It needs more work but I hope that it demonstrates, how I'd like to contain the functionality within the Frame module. As you can see from the diff, connection.rs and other modules are pretty much unaffected.

Unfortunately, zerocopy doesn't handle enums very well as you've probably encountered yourself in the previous commits. They seem to be working on it but I am not sure when they'll land that work. For now, I think the workaround isn't too bad.

Yes zerocopy poor enum handling is a pity. But their TryFromBytes wip could make the job in theory.

Let me know if you are willing to continue down this path :)

Containing zerocopy into the Frame module is a good idea! However I won't be able to work on it this week, maybe next week. I'll finish the work and push a new commit. Thank you for your help!

thomaseizinger · 2023-09-18T07:15:33Z

Let me know if you are willing to continue down this path :)

Containing zerocopy into the Frame module is a good idea! However I won't be able to work on it this week, maybe next week. I'll finish the work and push a new commit. Thank you for your help!

No rush from my end! Let me know if you have any questions on the code!

thomaseizinger · 2023-09-19T10:12:13Z

Note: Having this would be helpful in debugging the TCP RTT issue in libp2p/test-plans#304. At the moment, header and body are always split into two TCP packets which makes this rather cumbersome to follow along!

yamux/src/frame/header.rs

thomaseizinger · 2023-10-06T01:25:39Z

I spent some more cycles on this and I think this is now a quite clean implementation. Would appreciate another review @mxinden! The implementation is now entirely contained within Frame so there isn't much complexity and it is all guarded by the type-system, in some places even more than before!

thomaseizinger · 2023-10-06T01:26:30Z

@stormshield-pj50 Can you do me two final favours?

Test this in your system to make sure it has the desired effect.
Create a pull-request against rust-libp2p so we run our interoperability tests and check that we didn't break anything.

yamux/src/frame/io.rs

stormshield-pj50 · 2023-10-06T07:00:54Z

yamux/src/frame/header.rs

-}
-
-/// Decode a [`Header`] value.
+// Decode a [`Header`] value.
 pub fn decode(buf: &[u8; HEADER_SIZE]) -> Result<Header<()>, HeaderDecodeError> {


I'm not convinced that the decoding should be done on the raw buffer, as we have introduced zerocopy and can work instead on the struct header.

It isn't really good use of the type system to allow construction of types that are in an invalid state. Thus, until zerocopy can give us these guarantees, I'd rather have a dedicated decode function. This somewhat ensures that we are not working with a Header that has invalid values set.

stormshield-pj50 · 2023-10-06T07:35:15Z

@stormshield-pj50 Can you do me two final favours?

1. Test this in your system to make sure it has the desired effect.

2. Create a pull-request against `rust-libp2p` so we run our interoperability tests and check that we didn't break anything.

Sure! I will do it today.

stormshield-pj50 · 2023-10-06T09:19:15Z

@stormshield-pj50 Can you do me two final favours?

1. Test this in your system to make sure it has the desired effect.

2. Create a pull-request against `rust-libp2p` so we run our interoperability tests and check that we didn't break anything.

Sure! I will do it today.

Integration tests PR in rust-libp2p: libp2p/rust-libp2p#4598

stormshield-pj50 · 2023-10-09T07:12:38Z

@stormshield-pj50 Can you do me two final favours?

1. Test this in your system to make sure it has the desired effect.

2. Create a pull-request against `rust-libp2p` so we run our interoperability tests and check that we didn't break anything.

Sure! I will do it today.

On our side, our CI tests are all OK.

thomaseizinger

Good from my end but also needs @mxinden's approval.

mxinden

Overall I am fine moving forward here. The rational behind the change makes sense. Thank you for the continuous work.

While intuitively I expect this to have no performance drawback, I would still like to see some benchmarks. Would you mind:

Testing the latest version in your setup and reporting the latest numbers?
Running the concurrent.rs benchmarks, especially the unconstrained, comparing this patch with latest master, e.g. using critcmp?
Ideally (optional) plug this into https://github.com/libp2p/test-plans/blob/master/perf/README.md#libp2p-performance-benchmarking.

mxinden · 2023-10-09T08:40:43Z

yamux/src/connection.rs

-    fn poll_new_outbound(&mut self, cx: &mut Context<'_>) -> Poll<Result<Stream>> {
+    fn poll_new_outbound(&mut self, cx: &Context<'_>) -> Poll<Result<Stream>> {


Why diverge from the Future signature here? In other words, why remove the mut?

Triggered by a clippy lint, we can put it back but given that it is an internal API, I wouldn't bother.

mxinden · 2023-10-09T13:49:29Z

yamux/Cargo.toml


 [dev-dependencies]
 quickcheck = "1.0"
+futures = "0.3.4"


Why is this dev-dependency needed? futures is already imported above using 0.3.12.

mxinden · 2023-10-09T13:50:55Z

yamux/src/frame.rs

+use self::header::HEADER_SIZE;
+
+/// A Yamux message frame consisting of a single buffer with header followed by body.
+/// The header can be zerocopy parsed into a Header struct by calling header()/header_mut().


Suggested change

/// The header can be zerocopy parsed into a Header struct by calling header()/header_mut().

/// The header can be zerocopy parsed into a Header struct by calling `Header::header` and `Header::header_mut`.

mxinden · 2023-10-09T13:59:14Z

yamux/src/frame/header.rs


 /// The message frame header.
-#[derive(Clone, Debug, PartialEq, Eq)]
+#[derive(Clone, PartialEq, Eq, FromBytes, AsBytes, FromZeroes)]
+#[repr(packed)]


With this attribute the struct is no longer aligned to the target machine architecture. Do I understand correctly that we don't expect many accesses to Header fields and thus for this not to have a performance impact?

This is necessary for zerocopy to do its job so I don't think we have a choice unless we want to trade zerocopy for more allocations.

thomaseizinger · 2023-10-10T01:22:35Z

Running the concurrent.rs benchmarks, especially the unconstrained, comparing this patch with latest master, e.g. using critcmp?

I'd take these with a grain of salt. When I worked on the refactoring to poll APIs, I couldn't get really consistent numbers out of repeated benchmark runs without code changes, i.e. it was hard to establish a base-line. Plugging it into your perf-setup might make more sense.

stormshield-pj50 · 2023-10-10T07:21:53Z

Here's the unconstrained compared benchmarks between master and this PR:

thomaseizinger · 2023-10-10T08:02:59Z

Great results! Looks like we actually measurably improved performance :)

stormshield-pj50 · 2023-10-10T09:47:44Z

Results from our iperf3 CI test:

With Yamux master:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   324 MBytes   272 Mbits/sec   26             sender
[  5]   0.00-10.05  sec   322 MBytes   269 Mbits/sec                  receiver

With Yamux PR:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   831 MBytes   697 Mbits/sec  32              sender
[  5]   0.00-10.04  sec   829 MBytes   693 Mbits/sec                  receiver

thomaseizinger · 2023-10-10T10:32:48Z

We just discussed this in the open maintainers call and @umgefahren made a really good suggestion: Instead of doing all this work with zerocopy, we could just use AsyncWrite::poll_write_vectored. That would allow us to pass two buffers down to the underlying implementation.

It is unfortunate that this idea only comes up now but I think it is well worth exploring as it would avoid the complexity of zerocopy.

stormshield-pj50 · 2023-10-10T11:52:23Z

We just discussed this in the open maintainers call and @umgefahren made a really good suggestion: Instead of doing all this work with zerocopy, we could just use AsyncWrite::poll_write_vectored. That would allow us to pass two buffers down to the underlying implementation.

It is unfortunate that this idea only comes up now but I think it is well worth exploring as it would avoid the complexity of zerocopy.

You can't be sure that poll_write_vectored() is well implemented on the underlying stream, as the default implementation takes the first non empty buffer and call poll_write() on it. I tested it a couple of weeks ago before submitting this PR and it was working with a TcpStream but not with a TlsStream on top of it.

thomaseizinger · 2023-10-10T18:45:01Z

We just discussed this in the open maintainers call and @umgefahren made a really good suggestion: Instead of doing all this work with zerocopy, we could just use AsyncWrite::poll_write_vectored. That would allow us to pass two buffers down to the underlying implementation.
It is unfortunate that this idea only comes up now but I think it is well worth exploring as it would avoid the complexity of zerocopy.

You can't be sure that poll_write_vectored() is well implemented on the underlying stream, as the default implementation takes the first non empty buffer and call poll_write() on it. I tested it a couple of weeks ago before submitting this PR and it was working with a TcpStream but not with a TlsStream on top of it.

But isn't that just a shortcoming of the TlsStream then? Shouldn't we patch those impls instead of making this more complex?

stormshield-pj50 · 2023-10-11T08:09:54Z

We just discussed this in the open maintainers call and @umgefahren made a really good suggestion: Instead of doing all this work with zerocopy, we could just use AsyncWrite::poll_write_vectored. That would allow us to pass two buffers down to the underlying implementation.
It is unfortunate that this idea only comes up now but I think it is well worth exploring as it would avoid the complexity of zerocopy.

You can't be sure that poll_write_vectored() is well implemented on the underlying stream, as the default implementation takes the first non empty buffer and call poll_write() on it. I tested it a couple of weeks ago before submitting this PR and it was working with a TcpStream but not with a TlsStream on top of it.

But isn't that just a shortcoming of the TlsStream then? Shouldn't we patch those impls instead of making this more complex?

On our side we won't spend more work time on this PR. We think the implementation is good enough:

zerocopy is a mainstream crate designed by Google and used by many Fuchsia OS devices.
internally zerocopy is just a safe cast with endianness bytes swap when needed.
the PR solution is self contained and we think it's not a good idea to rely on an underlying stream implementation of poll_write_vectored(), which by the way is optional in the specifications.
we have been using this code for months in our code base and it's working fine with improved performance as shown before in the previous discussion.

thomaseizinger · 2023-10-11T08:15:29Z

Which TLS library are you using?

stormshield-pj50 · 2023-10-11T08:17:58Z

Which TLS library are you using?

tokio-rustls

thomaseizinger · 2023-10-11T10:10:18Z

We just discussed this in the open maintainers call and @umgefahren made a really good suggestion: Instead of doing all this work with zerocopy, we could just use AsyncWrite::poll_write_vectored. That would allow us to pass two buffers down to the underlying implementation.
It is unfortunate that this idea only comes up now but I think it is well worth exploring as it would avoid the complexity of zerocopy.

You can't be sure that poll_write_vectored() is well implemented on the underlying stream, as the default implementation takes the first non empty buffer and call poll_write() on it. I tested it a couple of weeks ago before submitting this PR and it was working with a TcpStream but not with a TlsStream on top of it.

But isn't that just a shortcoming of the TlsStream then? Shouldn't we patch those impls instead of making this more complex?

On our side we won't spend more work time on this PR. We think the implementation is good enough:

zerocopy is a mainstream crate designed by Google and used by many Fuchsia OS devices.

internally zerocopy is just a safe cast with endianness bytes swap when needed.

the PR solution is self contained and we think it's not a good idea to rely on an underlying stream implementation of poll_write_vectored(), which by the way is optional in the specifications.

we have been using this code for months in our code base and it's working fine with improved performance as shown before in the previous discussion.

Thank you for writing down your view-points. zerocopy is definitely a well-designed library and I am not disputing its production-readiness. I do want to stress though that it isn't intrinsically necessary for this feature.

If we didn't care about reduced allocations, we could easily keep the Frame type as is and simply create a new buffer and write that in a single pass to the underlying stream. The only thing that zerocopy gives us is reducing this allocation. At the same time, we currently have to implement hacks because zerocopy doesn't allow for any validations during parsing.

Unfortunately, despite being familiar with AsyncWrite, it completely slipped my mind that we could use poll_write_vectored to achieve the exact same thing without taking on another dependency and deviating from what would be the "natural" design of Frame when ignoring performance optimisations.

I cannot follow your reasoning why we shouldn't rely on AsyncWrite::poll_write_vectored. In fact, I find using that is a very clean solution. The abstraction is specifically there to optimise performance when it is possible to write multiple buffers at the same time. tokio-rustls doesn't override that in its AsyncWrite implementation but I would be very surprised if they wouldn't accept a patch to do that.

Overall, I am hesitant to merge this as is knowing that there is an easier and more maintainable solution. Our time is limited and we need to carefully consider, what we take on as maintenance responsibility.

Thank you for iterating on this solution with me. I wish we would have talked about AsyncWrite::poll_write_vectored earlier before going down this pathway. I accept that you don't want to put any more time into it. I hope you understand that we don't want to take on more maintenance burden than necessary.

If I find some free time next week, I might check how hard it would be change to AsyncWrite::poll_write_vectored. I have a feeling it would be quite easy!

thomaseizinger · 2023-10-29T08:34:35Z

tokio-rustls doesn't override that in its AsyncWrite implementation but I would be very surprised if they wouldn't accept a patch to do that.

I've opened issue with them to discuss the implementation: rustls/tokio-rustls#26. I think that properly implementing this could be a nice optimization that the entire async rustls ecosystem can benefit from which is another reason on why we shouldn't build a specialised implementation here in rust-yamux for this problem.

stormshield-pj50 · 2023-10-30T08:10:01Z

tokio-rustls doesn't override that in its AsyncWrite implementation but I would be very surprised if they wouldn't accept a patch to do that.

I've opened issue with them to discuss the implementation: rustls/tokio-rustls#26. I think that properly implementing this could be a nice optimization that the entire async rustls ecosystem can benefit from which is another reason on why we shouldn't build a specialised implementation here in rust-yamux for this problem.

OK that's fine for us. However if you want to cover all Tokio ecosystem libp2p related, you should also take a look at Tokio Tunsgtenite WebSockets and also Yamux over Yamux as used by libp2p relay_v2.

thomaseizinger · 2024-02-14T07:40:17Z

I am closing this as stale.

For reference, I started working on correctly forwarding vectored writes but ran out of time. Here are the drafts / issues:

thomaseizinger reviewed Jul 20, 2023

View reviewed changes

yamux/src/frame.rs Outdated Show resolved Hide resolved

Compilation + test using zerocopy ok

1c06435

stormshield-pj50 force-pushed the yamux_reserved_header_size branch from 70f6777 to 1c06435 Compare September 6, 2023 14:54

thomaseizinger reviewed Sep 7, 2023

View reviewed changes

stormshield-pj50 added 2 commits September 7, 2023 15:07

Fix after comments

39fa1e5

cargo fmt

5d4e2d2

mxinden requested changes Sep 9, 2023

View reviewed changes

thomaseizinger added 2 commits September 17, 2023 16:20

WIP Minimal diff

62002ae

Some basic fixes

ee00e1d

thomaseizinger reviewed Sep 17, 2023

View reviewed changes

thomaseizinger changed the title ~~Stream can be configured with a reserved buffer size for data frame~~ Don't split header and body across TCP packets Sep 19, 2023

stormshield-pj50 commented Oct 3, 2023

View reviewed changes

yamux/src/frame/header.rs Outdated Show resolved Hide resolved

thomaseizinger requested a review from mxinden October 6, 2023 01:24

stormshield-pj50 commented Oct 6, 2023

View reviewed changes

yamux/src/frame/io.rs Outdated Show resolved Hide resolved

stormshield-pj50 commented Oct 6, 2023

View reviewed changes

stormshield-pj50 mentioned this pull request Oct 6, 2023

deps: update rust-yamux libp2p/rust-libp2p#4598

Closed

Ensure we check max body len before allocating

b65f2bd

thomaseizinger changed the title ~~Don't split header and body across TCP packets~~ feat: don't split header and body across TCP packets Oct 9, 2023

Don't allocate unless necessary

ee9c920

thomaseizinger approved these changes Oct 9, 2023

View reviewed changes

mxinden reviewed Oct 9, 2023

View reviewed changes

WIP: Use AsyncWrite::poll_write_vectored

ae3bc3d

thomaseizinger closed this Feb 14, 2024

		fn poll_new_outbound(&mut self, cx: &mut Context<'_>) -> Poll<Result<Stream>> {
		fn poll_new_outbound(&mut self, cx: &Context<'_>) -> Poll<Result<Stream>> {

	/// The header can be zerocopy parsed into a Header struct by calling header()/header_mut().
	/// The header can be zerocopy parsed into a Header struct by calling `Header::header` and `Header::header_mut`.

feat: don't split header and body across TCP packets #168

feat: don't split header and body across TCP packets #168

Conversation

stormshield-pj50 commented Jul 19, 2023 • edited by thomaseizinger Loading

thomaseizinger commented Jul 19, 2023 • edited Loading

thomaseizinger commented Jul 19, 2023

thomaseizinger commented Jul 19, 2023

stormshield-pj50 commented Jul 20, 2023

thomaseizinger commented Jul 20, 2023

stormshield-pj50 commented Jul 20, 2023

thomaseizinger commented Jul 20, 2023

stormshield-pj50 commented Jul 20, 2023

mxinden commented Aug 2, 2023

stormshield-pj50 commented Aug 21, 2023

thomaseizinger left a comment

Choose a reason for hiding this comment

mxinden left a comment

Choose a reason for hiding this comment

thomaseizinger commented Sep 9, 2023

stormshield-pj50 commented Sep 11, 2023

thomaseizinger commented Sep 15, 2023

thomaseizinger commented Sep 15, 2023

thomaseizinger commented Sep 17, 2023

thomaseizinger left a comment

Choose a reason for hiding this comment

stormshield-pj50 commented Sep 18, 2023

thomaseizinger commented Sep 18, 2023

thomaseizinger commented Sep 19, 2023

thomaseizinger commented Oct 6, 2023

thomaseizinger commented Oct 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stormshield-pj50 commented Oct 6, 2023

stormshield-pj50 commented Oct 6, 2023

stormshield-pj50 commented Oct 9, 2023

thomaseizinger left a comment

Choose a reason for hiding this comment

mxinden left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thomaseizinger commented Oct 10, 2023

stormshield-pj50 commented Oct 10, 2023

thomaseizinger commented Oct 10, 2023

stormshield-pj50 commented Oct 10, 2023

thomaseizinger commented Oct 10, 2023

stormshield-pj50 commented Oct 10, 2023

thomaseizinger commented Oct 10, 2023

stormshield-pj50 commented Oct 11, 2023

thomaseizinger commented Oct 11, 2023

stormshield-pj50 commented Oct 11, 2023

thomaseizinger commented Oct 11, 2023

thomaseizinger commented Oct 29, 2023

stormshield-pj50 commented Oct 30, 2023

thomaseizinger commented Feb 14, 2024

stormshield-pj50 commented Jul 19, 2023 •

edited by thomaseizinger

Loading

thomaseizinger commented Jul 19, 2023 •

edited

Loading