How to speed things up (ST7789V2 + ESP32-C6-DevKitC-1) #142
Hmm, I don't see anything particularly wrong in the code, but AFAICS you should be able to push roughly 60 fps with 80 MHz SPI. Of course in practice that's going to be a bit lower (and I only did my napkin math real quick, so I might've made a mistake here). For one full screen of filler you're doing 320x240x2x8 = 1,228,800 bits. One thing I do not see in your code, though, is setting the MCU's processing speed/clocks. I don't know how the ESP32 inits, but I remember that on my HiFive1 I had to set my clock to full. In my case that's 320 MHz; in yours it's 160 MHz from what I see. The default was something quite a bit lower, IIRC, so it might be that your MCU is just running at a lower clock.
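The napkin math above can be written out as a small std-only Rust sketch (the function names are made up for illustration). A 320x240 RGB565 frame is 1,228,800 bits, so at an 80 MHz SPI clock the theoretical ceiling is about 65 frames per second, or roughly 15.4 ms per frame, assuming the bus never idles and ignoring command bytes.

```rust
/// Bits needed to push one full frame of RGB565 pixels (2 bytes per pixel).
const fn bits_per_frame(width: u64, height: u64) -> u64 {
    width * height * 2 * 8
}

/// Upper bound on frames per second for a given SPI clock, assuming the bus
/// is never idle (no command bytes, no gaps between transfers).
fn max_fps(spi_hz: u64, width: u64, height: u64) -> f64 {
    spi_hz as f64 / bits_per_frame(width, height) as f64
}
```

For the 280x240 panel from the original post, the same formula gives roughly 74 fps at 80 MHz. Any frame time well above ~15 ms at that clock is time spent outside the raw data transfer.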
Hmm, that's quite unfortunate. Do you think we should abandon …
Thank you for the response and thoughts.
The full log (which of course I should've posted in the first place, but here) says
so unless something in ESP-IDF drops that down after the app starts, I think I should be running at 160 MHz.
That seems to be this bit of code. Looks like the intent is to read a 64-byte chunk from the pixel iterator, endian-flip it and send it off. EDIT: Changing that buffer size in a local copy of
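A hypothetical sketch of the chunked transfer described above (send_pixels is a made-up name, not mipidsi's actual code): pixels are pulled from an iterator as raw u16 RGB565 values, written big-endian into a fixed 64-byte buffer, and each full or final partial chunk is handed to a write closure that stands in for the SPI call.

```rust
/// Pull RGB565 pixels (raw `u16` values) from an iterator, store them
/// big-endian in a fixed-size byte buffer, and pass each full chunk (plus a
/// final partial one) to `write`, which stands in for the SPI write call.
fn send_pixels<I, W, E>(pixels: I, mut write: W) -> Result<(), E>
where
    I: IntoIterator<Item = u16>,
    W: FnMut(&[u8]) -> Result<(), E>,
{
    const BUF_LEN: usize = 64; // bytes per chunk, i.e. 32 pixels
    let mut buf = [0u8; BUF_LEN];
    let mut len = 0;
    for px in pixels {
        // The display expects the high byte first, so flip to big-endian.
        buf[len..len + 2].copy_from_slice(&px.to_be_bytes());
        len += 2;
        if len == BUF_LEN {
            write(&buf)?;
            len = 0;
        }
    }
    // Flush whatever is left over after the last full chunk.
    if len > 0 {
        write(&buf[..len])?;
    }
    Ok(())
}
```

Since every pixel still goes through the iterator one at a time, enlarging BUF_LEN reduces the number of write calls but not the per-pixel work, which fits the sub-linear speedup discussed in the next comment.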
Perfect, I was hoping to see how this goes. It seems the flip-flopping of the buffer isn't the main slowdown here. Making the buffer bigger improves things, but not linearly at all: the 1024-byte case should in theory have been 16 times better (1024 / 64 = 16 times fewer chunks), yet it is a less than 10% improvement. Could you try changing fill_contiguous into fill_solid, or possibly calling set_pixels directly? I just want to make sure it's not something funky.
I did some testing and I think the ESP HAL has quite a bit of overhead for the kind of transfers we use. I'm currently testing on an ESP32-C3 with the no_std HAL. With some tweaks I was able to get the time to clear the framebuffer down to ~60 ms, which still isn't great. Using a logic analyzer did reveal some interesting patterns: … I'm not familiar enough with the ESP32-C3 internals to know if this dead time could be reduced. The "best" solution would probably be a framebuffer in RAM, which is then transferred to the display via DMA, but that requires a lot of RAM, and there is currently no DMA support in this crate or in the embedded-graphics ecosystem in general.
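A minimal sketch of the "framebuffer in RAM" idea, assuming a 320x240 RGB565 panel: drawing only touches memory, and the finished frame is serialized once and sent as a single large transfer (the part DMA would take over). FrameBuffer and its methods are hypothetical names, not part of mipidsi or embedded-graphics, and a Vec is used so the sketch runs on std; a no_std target would more likely use a static buffer. It also shows the RAM cost: 320 x 240 x 2 = 153,600 bytes.

```rust
/// Hypothetical RAM framebuffer: drawing only writes to memory, and the whole
/// frame is serialized once for a single large transfer.
struct FrameBuffer<const W: usize, const H: usize> {
    pixels: Vec<u16>, // RGB565 values, row-major, native byte order
}

impl<const W: usize, const H: usize> FrameBuffer<W, H> {
    fn new() -> Self {
        Self { pixels: vec![0; W * H] }
    }

    /// Out-of-bounds writes are silently ignored.
    fn set_pixel(&mut self, x: usize, y: usize, color: u16) {
        if x < W && y < H {
            self.pixels[y * W + x] = color;
        }
    }

    fn fill(&mut self, color: u16) {
        self.pixels.fill(color);
    }

    /// Serialize the frame in the big-endian byte order the display expects.
    fn to_be_bytes(&self) -> Vec<u8> {
        self.pixels.iter().flat_map(|p| p.to_be_bytes()).collect()
    }
}
```

The trade-off is exactly the one mentioned above: the bus sees one long burst instead of many small ones, at the price of keeping the whole frame resident in RAM.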
I changed the draw bit to

```rust
display.fill_solid(&fullscreen, if i % 2 == 0 { Rgb565::BLACK } else { Rgb565::WHITE }).unwrap();
```

which is still kind of unfortunate (and funky, I think?)... For what it's worth, changing …
I've worked on this a bit more and got the update reasonably fast by using DMA and bypassing the iterator. My implementation uses two DMA buffers, which makes it possible to fill one while the other is being transmitted. There are still gaps between the individual blocks, but the overall impact of this overhead is now fairly low. This really shows that using a dynamically dispatched iterator to transfer lots of data is a bad idea. Even with two DMA buffers, when they are filled from the iterator, the time it takes to fill one buffer is much longer than the time it takes to transmit it. This causes the SPI bus to be idle about 80% of the time, which is why it still takes 70 ms to update the display at an 80 MHz SPI clock. But if we bypass the iterator and use … I've uploaded my custom …
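The two-buffer ("ping-pong") scheme can be sketched on std with a thread standing in for the DMA engine and channels passing buffers back and forth. This only illustrates the structure; it is not esp-hal's DMA API, and transmit_double_buffered is a made-up name.

```rust
use std::sync::mpsc;
use std::thread;

/// While one buffer is being "transmitted" by the background thread (standing
/// in for DMA), the caller refills the other one. Returns everything that was
/// "sent on the bus", in order.
fn transmit_double_buffered(frame: &[u8], chunk: usize) -> Vec<u8> {
    // Filled buffers go to the "DMA" thread; empty ones come back to the CPU.
    let (to_dma, dma_rx) = mpsc::channel::<Vec<u8>>();
    let (to_cpu, cpu_rx) = mpsc::channel::<Vec<u8>>();

    // Start with exactly two empty buffers available to the CPU.
    to_cpu.send(Vec::with_capacity(chunk)).unwrap();
    to_cpu.send(Vec::with_capacity(chunk)).unwrap();

    let dma = thread::spawn(move || {
        let mut wire = Vec::new();
        for buf in dma_rx {
            wire.extend_from_slice(&buf);
            // Hand the buffer back so the CPU can refill it.
            let _ = to_cpu.send(buf);
        }
        wire
    });

    for piece in frame.chunks(chunk) {
        // Blocks until a buffer is free, i.e. its "transfer" has finished.
        let mut buf = cpu_rx.recv().unwrap();
        buf.clear();
        buf.extend_from_slice(piece);
        to_dma.send(buf).unwrap();
    }
    drop(to_dma); // no more data: lets the "DMA" thread finish
    dma.join().unwrap()
}
```

Double buffering only hides the shorter of the two phases: if refilling a buffer (for example from a per-pixel iterator) takes longer than transmitting it, the bus still sits idle between chunks, which matches the ~80% idle time reported above.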
@rfuest Nice work! Am I understanding correctly that since the …
Yes, that's correct. Some API changes will be necessary to make this safer to use, without the chance of accidentally using the wrong endianness. In my code the CPU is often just waiting for the DMA transfer to finish, so converting a little-endian framebuffer into big-endian format during that time wouldn't negatively impact the time it takes to update the display. But the …
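One possible shape for such a safer API, sketched with hypothetical types (nothing here exists in mipidsi): the byte order becomes part of the type, so the send function cannot be handed an unconverted native-endian buffer, and the conversion writes into a reusable buffer that could be refilled while the previous DMA transfer is still in flight.

```rust
/// RGB565 pixels as `u16` values in the CPU's native byte order
/// (little-endian on the ESP32 family).
struct NativePixels<'a>(&'a [u16]);

/// Bytes already in the big-endian order the display expects. In a real
/// crate the field would be private to a module so that only `encode_into`
/// could construct this type.
struct DisplayBytes<'a>(&'a [u8]);

impl<'a> NativePixels<'a> {
    /// Encode as many pixels as fit into `out` (2 bytes each). Returns the
    /// number of pixels consumed and a typed view of the encoded bytes.
    fn encode_into<'b>(&self, out: &'b mut [u8]) -> (usize, DisplayBytes<'b>) {
        let n = self.0.len().min(out.len() / 2);
        for (dst, px) in out.chunks_exact_mut(2).zip(&self.0[..n]) {
            dst.copy_from_slice(&px.to_be_bytes());
        }
        (n, DisplayBytes(&out[..n * 2]))
    }
}

/// Stand-in for a transfer function that only accepts converted data.
fn send(bytes: DisplayBytes<'_>, bus: &mut Vec<u8>) {
    bus.extend_from_slice(bytes.0);
}
```

The caller advances through the frame by re-wrapping the remaining slice, for example NativePixels(&frame[n..]), and reuses the same byte buffer for every chunk.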
So from what I understand there are a few issues here:
Optionally, I think another issue is that:
I think if …
No, it doesn't force runtime conversion, but it only supports different endianness with …
This has been an issue for a long time, and there is even an open draft PR from 2021 with a possible solution in the … I'm not sure if we (or someone) should fix …
This might help for some drawing operations, but I'm not sure it is needed. Randomly drawing individual pixels over a relatively slow connection like SPI always comes with a lot of overhead. For images we shouldn't use iterators at all. I'm planning to add something like …
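To put a number on the per-pixel overhead, here is a rough count of bytes on the bus, assuming the usual MIPI DCS sequence used by ST7789-class controllers (CASET and RASET with four parameter bytes each, then RAMWR followed by pixel data) and ignoring D/C and chip-select toggling. Under those assumptions, drawing a full 320x240 frame one pixel at a time needs about 6.5 times as many bytes as one contiguous write, before counting any per-transaction gaps like the ones measured earlier in this thread.

```rust
/// Command overhead for one address window: CASET (1 command byte + 4
/// parameter bytes), RASET (1 + 4), then RAMWR (1 command byte).
const WINDOW_SETUP_BYTES: usize = (1 + 4) + (1 + 4) + 1;
/// RGB565 pixel data.
const BYTES_PER_PIXEL: usize = 2;

/// Every pixel gets its own 1x1 address window.
fn bytes_individual(pixels: usize) -> usize {
    pixels * (WINDOW_SETUP_BYTES + BYTES_PER_PIXEL)
}

/// One window covering all pixels, then a single contiguous RAMWR burst.
fn bytes_contiguous(pixels: usize) -> usize {
    WINDOW_SETUP_BYTES + pixels * BYTES_PER_PIXEL
}
```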
One thing that … If we allowed something like … That could then be used by the expanded …
If you look at my luluu repo, I went through the process of getting within a few percent of the theoretical bandwidth limit of the RP2040 SPI peripheral. There is some interesting stuff in the main firmware, but in particular see the patches in the vendored crates, including the rp-hal implementation and mipidsi.
I believe I did something very similar to what you're describing here. It's not clear to me how much of these "fixes" can or should actually be upstreamed, but maybe it's useful for reference: https://github.com/fu5ha/luluu/tree/main/software
@akx could you please try the …
@almindor Sorry, I was away from this project for a while 😅 Not a great improvement (current mess of code here; curiously enough, my particular display doesn't seem to care if I pass in 319 or 320 as the "width" coordinate):
EDIT: Rendering 320x120 is exactly twice as fast:
Rendering 160x120 is exactly twice as fast as that:
EDIT 2: A similar program in C, using Espressif's own SDK and SPI drivers etc., pushes 65 FPS:
That's unfortunate. I'd have expected a much bigger improvement. I got myself an ESP32-C6; once I have some time I'll try to experiment myself and see where the holdups are.
OK, so for me, using the …
So at least on the C6 this clearly shows that the buffering/abstraction we currently depend on is a big problem, but not the only issue.
We should be able, when using …
I did some more testing, and if I deconstruct the SPI back after sending all the init stuff and prepping a data window, I'm able to send the raw buffer directly using the SPI, without … and … This means that the SPI implementation of the … I also identified that … Full code here. NOTE: you need … My results are:
I'm wondering if this part of the C code:

```c
printf("Initializing SPI bus!\n");
ESP_ERROR_CHECK(spi_bus_initialize(
    LCD_HOST, &buscfg, SPI_DMA_CH_AUTO)); // Enable the DMA feature
```

might explain the discrepancy here. @rfuest what do you think? I'm guessing perhaps DMA isn't being used in the Rust code? I found a reference to SpiDmaBus, but I'm unsure how to actually instantiate it. I suspect it's what the C code uses with their …
@rfuest @akx Found it! Note I used only … The main issues are:
The code for the "fixed up" version is here. The result now looks like this:
Which means ~62.5 FPS. I'll keep this issue open until we release …
Heya,
I'm wondering how to speed things up for an application that will likely need full-screen updates most of the time.
I have an ESP32-C6-DevKitC-1 and a WaveShare 280x240 1.69" LCD module, using esp-idf-svc (at present anyway). My current experiment code (please excuse the mess, it's an experiment so far) is … and the interesting (performance) output for a --release build is: …
IOW, 99.2% of the frame time is spent in fill_contiguous. Setting the SPI baudrate to something lower than 80 MHz (which, AIUI, is already pushing it, especially given my display is behind 10-centimeter DuPont wires 🤠) doesn't change things a lot; the blit time becomes about 123547 µs.
Is there something obvious I'm doing wrong for an application like this, where I basically just have a buffer of Rgb565 to push to the screen?
And of course thank you for the work you've put into the library and the ecosystem at large! I was surprised to see things working at all (after, of course, having heeded the big red instructions on WaveShare's wiki and powered the display from 3V3 and not 5V...).