-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CDC-NCM poor performance for known reasons #2068
Comments
Hello rgrr, The same thing has been observed on my platform. RNDIS performance is fairly reasonable, But NCM is 10 times slower then RNDIS. Also there is a bug in the implementation which I filed sometime ago but still haven't got any response yet. |
Hello joshikeval3131, to be honest, I'm more or less a novice to (Tiny)USB from developer side. Do you have some pointers where and how to start debugging the NCM code? Of course I have found ncm_device.c et al, but even adding some debug output still generates more confusion than clarity. |
New evidence for the NCM problem. I have done some tests with the iperf server provided in the lwIP examples and recorded with Wireshark. Once I have done the tests with ECM and once with NCM. iperf command line on client side (Linux) is
Wiresharks statistics graphs are as follows. Good case is with ECM, although with MSS=1460/1500 I find it suspicious: Bad case is with NCM, small MSS are generating a lot of errors within the stream:
Does anyone has any pointers where to search (ok, I know it is most likely in ncm_device) |
Hmmm, does anybody read this? I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups. I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission. This would put the focus on reliability, not so much on performance. If one of the maintainers agree, I will file a PR on this. If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c |
|
Hello rgrr, Thanks for finding out these bugs I'll test this on my device and let you know. |
Thanks for this analysis |
Note, that there is a pull request #2227 about this. I have done a complete rewrite in the meantime. |
and some more images |
rgrr, Kudos and thanks for writing a very clear concise documentation. I was able to see that the NCM bandwidth increased by a factor or 2 😄 . However I observed that as the segment size kept increasing the bandwidth was decreasing. Currently trying to understand and analyze your changes in the driver. |
Do you have some kind of available test application? What is the exact iperf command line? |
Yup I am using standard lwip webserver example from TinyUSB and iperf application from lwip-contrib/examples/lwiperf I am executing following command. Here are the logs that I am getting. Iperf logs
TinyUSB NCM Driver logs
|
ooopssss... and the example is running with ECM/RNDIS? "request blocked" is not bad, but the "VALIDATION FAILED" messages are fatal. Possible reasons that come to my mind:
What I also do not understand is "WARNING: attempt to set TCP maximum segment size to 100, but got 536". That also indicates that something is really wrong. |
It is possible, that I have created a dependency of the driver with my glue code to TinyUSB. The suspicious part looks like: static void context_tinyusb_linkoutput(void *param)
/**
* Put \a xmt_buff into TinyUSB.
*
* Context: TinyUSB
*/
{
if ( !tud_network_can_xmit(xmt_buff_len)) {
//printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(5));
taskDISABLE_INTERRUPTS();
usbd_defer_func(context_tinyusb_linkoutput, NULL, false); // put yourself at end of TinyUSB event queue
taskENABLE_INTERRUPTS();
}
else {
tud_network_xmit(xmt_buff, xmt_buff_len);
}
} // context_tinyusb_linkoutput
static err_t linkoutput_fn(struct netif *netif, struct pbuf *p)
/**
* called by lwIP to transmit data to TinyUSB
*
* Context: lwIP
*/
{
if ( !tud_ready()) {
return ERR_USE;
}
if (xmt_buff_len != 0) {
return ERR_USE;
}
// copy data into temp buffer
xmt_buff_len = pbuf_copy_partial(p, xmt_buff, p->tot_len, 0);
assert(xmt_buff_len < sizeof(xmt_buff));
taskDISABLE_INTERRUPTS();
usbd_defer_func(context_tinyusb_linkoutput, NULL, false);
taskENABLE_INTERRUPTS();
return ERR_OK;
} // linkoutput_fn This glue code is driving me nuts, because actually there is no legal (and correct) way to make a function call in TinyUSB context. The above is my best bet. But as said: perhaps the first usbd_xfer_func() perhaps introduces something unintended. I will look deeper into that. |
Ok, first check shows, that with the lwip-webserver example it is running as well. Little bit slow, but anyway. You can find the sources here: https://github.com/rgrr/tinyusb/tree/feature/standard-webserver/examples/device/net_lwip_webserver |
rgrr, I have confirmed that , lwip-webserver is also running fine on my platform.
I suspect that this definitely indicates some problem. Also it works for smaller segment sizes but as I increase the segment size the bandwidth decreases and later point driver just crashes. :( |
Please consider ncm_device as a rewrite. I have dropped more or less the original and started fresh but used the original as a template for some things. Problem with the original ncm_device was, that it reused internal buffers already for reception while transmission of its contents to lwIP was still running. |
I have taken a reference of glue logic from Implemented the same. As I increased the TinyUSB buffer size, the test case worked for all segment size. 😃 iperf command used:
Wireshark analysis Also please find the logs attached. Observation -:
Note-: I am using the NCM driver in NO OS setting. |
Actually I have no idea what the bottleneck could be. Have you checked the wireshark log if you see the expected? Perhaps your iperf behaves not as you want. Your iperf output:
Mine (2.1.9):
For debugging purposes: could you please change
so that it is printing. I had the case under windows, that after several days, the driver seemed to be in a state were it did output the above message very often. |
Tried with iperf 2.1.9 same result
Also with little clean up #2227 should be okay to merge 👍 Do you see any other blockers ? |
Expectations with glue logic: there should be an official call to put an event into the TinyUSB event queue so that one get into the context of TinyUSB. lwIP API offers tcpip_try_callback() or tcpip_callback(). Currently one has to use usbd_defer_func() which is actually TinyUSB private and also does just block USB interrupts. Clean up: will follow soon. Currently I cannot see any further improvement. Perhaps wider use of the net_device will open some new wishes/issues. |
One point for improvement could be to change the webserver example to use NCM by default. This would improve compatibility over the several OSes, because NCM is supported by Linux, MacOS (AFAIK), Win11 (AFAIK) and Win10 (with minor driver installation). With ECM or RNDIS I had no luck under Win10, at least on my machine. |
One more thing (out of topic): how about adding iperf to the webserver example to get a reference example for performance measurements? |
Out of curiosity: have you ever compared with ECM/RNDIS? And have you done a bidirectional test with e.g. ((to me) surprisingly server-to-client does not work for small packets) |
As far as I know RNDIS is windows standard offering and Linux also supports RNDIS by default. Where as in NCM extra driver installation is required. Thus it is okay to keep RNDIS as default.
lwip Iperf will be a good to have example. 👍 I think we should add it.
No haven't done Bidirectional test. When I tried I showed following error message.
I suspect that in default lwip iperf example there is no RX logic. Can you confirm the same ? Thanks |
yes, that is a valid point: RNDIS is correctly detected by Win10. But there are some drawbacks: RNDIS changes somehow routing of my Linux desktop (Debian) and also RNDIS works only under Win10 if it is the only class of the USB device. So actually not a generic solution. Nevertheless as an example RNDIS should do it. But: those examples aren't just examples. To my understanding they are also some kind of compilation test. So if all examples go for RNDIS, the NCM device will not be compile tested which to my understanding is "bad". Concerning iperf: I'm observing the same behavior but just for MSS=100. MSS=800,1450 are working. So actually the iperf example is capable of doing transmission. But still there must be somewhere a bug in the iperf example. |
@Keval-Joshi-TI : "funny" thing, if you limit in lwiperf_tcp_client_send_more() (in lwiperf.c) the txlen_max to 700 (instead of TCP_MSS) then Don't know yet, if this indicates a bug in lwiperf, lwIP, TinyUSB or the device driver (but ECM shows the same behavior). Comparing ECM / NCM shows that NCM is factor 4 better in direction device -> host. PS: actual problem is, that tcp_output() does not create a lwiperf_tcp_client_sent() callback |
@rgrr : I am stuck at more basic problem. Even with your suggested change int the lwip example DEV-> HOST transfer is not happening in my case :(
-> Moreover I am trying to run lwip + tinyUSB in freeRTOS i.e LWIP with NO_SYS = 0 where lwip task has lower priority. Quick question on glue logic: -> The part where you call tcpip_try_callback_block which essentially schedules a function call which puts received ethernet packet into lwip context , looks all okay to me. -> But the one where you are calling the defer function looks little bit missplaced. In baremetal scenario I tried with and without defer function, its identical for me. Can you please why we need to put context_tinyusb_linkoutput call in the tud task event queue ?. |
Perhaps our lwIP configuration is different. Essential parts of mine:
Concerning the glue logic: this is also a little bit historically grown. I struggled around with lwIP, NCM, FreeRTOS and did not get it flying. So I made a lot of testing. One of them was to assure that the NCM device is always called from TinyUSB context. After many tests and attempts to repair NCM, I decided to rewrite it. First approach was ncm_device_simple in my repo which handles only one datagram in each direction. Compatibility even with Win10 was ways better, because with NCM Win10 allows other USB classes in parallel, with RNDIS it does not. Second step was to reimplement NCM device. Special case in the glue logic is context_tinyusb_linkoutput(): the idea is to put the deferred call at the end of the TinyUSB queue, so not using a simple while loop. But I made a quick test and it seems (for the iperf testcase) that the while loop is also ok: while ( !tud_network_can_xmit(xmt_buff_len)) {
//printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(5));
}
tud_network_xmit(xmt_buff, xmt_buff_len); But I did not test if this context switching is actually required. For me this is the cleaner way although the glue logic looks a little bit more cumbersome. |
gosh... sometimes it is more complicated... I have to correct myself: the above loop does not work, the one with the usbd_defer_func() is the correct one. And this is also true for ECM/RNDIS: they do fine with the defer loop but not with the simple loop from above. I will investigate this further if there is time, but I guess the simple loop blocks TinyUSB while it actually wants something to do. The problems appears for ECM immediately, for NCM (because of better buffering) for very small "-M" in the iperf invocation. Concerning lwiperf: the problem with the sender operations lies within the condition of the do-while-loop in lwiperf_tcp_client_send_more(). The condition Replacing the above condition with @joshikeval3131: Could you please test this as well? |
@rgrr :
Even with above mentioned change rx is not working in my case.
Let me check my lwip-stack version Updates on NCM /RNDIS + LWIP FreeRTOS migration.Converted my reference application to work with FreeRTOS. At max TX iperf hit 49Mbps bandwidth.
However the story so far with NCM is not good. With OS migration I can see some performance increase for smaller packet size i.e MSS = 100, 200. For MSS=400 & above the NCM driver crashes.
|
@rgrr : NCM driver worked for me for all segment sizes. When lwip_task priority == tud_task priority it works well and as expected. I am trying to get iperf -r working on my platform. |
My device is a Raspi Pico (RP2040), i.e. dual-core Cortex-M0. TinyUSB has higher priority than lwIP. I will test, if it matters which process has higher priority (both processes are limited to one core). lwiperf: I inserted several printf() for debugging. They pointed very fast to the loop in lwiperf_tcp_client_send_more(). What is your current glue code? Especially if you are using TinyUSB I would use mine ;-) |
Just since I'm also using the NCM device and didn't experience any problem with my RP2040-based project up to now: @rgrr: You are using an RTOS? I'm going bare-metal. In case you want to do measurements using my setup (you just need to enable iperf but you know how to do that ;): https://github.com/OpenLightingProject/rp2040-dmxsun |
@kripton : does it mean that you are using the new NCM device? It did not do any tests with your project although I like your work. I stopped when "make" asked for npm :-/ Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing. Corrected version looks like this: static void context_tinyusb_linkoutput(void *param)
/**
* Put \a xmt_buff into TinyUSB.
*
* Context: TinyUSB
*/
{
while ( !tud_network_can_xmit(xmt_buff_len)) {
printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(1));
tud_task();
}
tud_network_xmit(xmt_buff, xmt_buff_len);
} // context_tinyusb_linkoutput |
No, i'm using the original, first version and I did notnface any problems with it.
True, dmxsun is compiling a React web app, Sorry, I know it causes additional effort.
So does it solve your problems with the original NCM device or your new one? |
You will stumble into problems if the transmitted packets are coming fast and in different sizes. My initial application was SystemView.
I'm actually not the one having problems with it ;-) I'm just trying to track and understand the problems @joshikeval3131 has. The simple loop was just a reaction to his complaint about an additional usb_defer call. Unfortunately wrong in the first attempt. |
Made some tests with priority: for my device tud_task priority > lwip_task priority is mandatory. Otherwise especially target->host transfer collapses. |
Thanks for conformation on this Iperf Rx is finally functional for me I am getting good results with the changes you suggested in lwiperf_tcp_client_send_more() In my case so far I am getting much better performance with the new driver compare to older one. Note -:
|
This has been closed by #2227 |
Operating System
Linux
Board
Raspi Pico RP2040
Firmware
My probe project (https://github.com/rgrr/yapicoprobe/tree/feature/systemview-rndis) uses lwIP / TinyUSB to provide a network interface for SystemView (https://www.segger.com/products/development-tools/systemview/).
The probe is running on a Pico (RP2040), the connected target board is an nRF52840. The nRF52840 generates the SystemView events (https://github.com/rgrr/playground/tree/feature/xray/tools/SystemView), the events are collected from the target via RTT by the RP2040. SystemView itself is connected via TCP to the RP2040 and collects the generated data.
For the underlying Ethernet over USB protocol one can select between ECM/RNDIS/NCM in CMakeLists.txt of the probe firmware.
The probe is using the raw lwIP interface.
For performance testing there is the example iperf server contained in the firmware.
What happened ?
if one selects ECM/RNDIS as underlying protocol, the event throughput is around 30000 events/s,
if one selects NCM, the event throughput is < 1000 events/s.
It seems that network traffic is stuttering, i.e. data transfer (traced in the sent-event) pauses and then transfer bursts are happening.
Data is fed from the application to lwIP/TinyUSB the same in all three cases.
What is surprising is, that iperf throughput shows almost the same values for all three cases. So I'm wondering what's the difference between the iperf server and my SystemView server.
Is it because SystemView traffic is almost unidirectional and NCM needs somehow a trigger to start transfer?
How to reproduce ?
Complicated... and I'm actually not sure if this is a bug in TinyUSB or a configuration/handling problem on my side.
To reproduce one needs a Pico and a target board. The required firmware/setup is described above.
I'm really willing to dig deep into TinyUSB, so I'm actually looking for some pointers where to go.
Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)
On the target (nRF52840) the same firmware is running. The generated events lead to a data volume of < 20KByte/s.
ECM case
ECM transfers the data continuously, every ~170ms data transfer is happening.
NCM case
Here are long pauses between data transfer bursts
Screenshots
No response
I have checked existing issues, dicussion and documentation
The text was updated successfully, but these errors were encountered: