Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

St7789 async dma and partial update #593

Open
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

AndrewCapon
Copy link
Contributor

@AndrewCapon AndrewCapon commented Dec 3, 2022

I put in this PR to ask for some help testing this, basically it adds async DMA transfers for the st7789 and also partial updates.

There is a test example for the pico display for testing the new code, I am also waiting for a pico display 2. When that arrives I will port over the example for that.

I do not have the other st7789 Pimoroni displays though so these will not be tested, the PIO route is also not tested.

So is this PR something you guys would like to add in, you can get a nice speedup with these changes?

Can someone help with testing these other displays?

Cheers

Andy

Changes

Added async DMA for st7789. For 565 we don’t have to wait for the SPI transfer now.

Added partial update of display so we can just update a region via SPI.

Async DMA is also implemented for partial update for 565. As the DMA on the pico doesn’t support stride this is done using two DMA channels with chained transfers.

Added ability to destruct graphics and display.

For an overview look at pico_display_async_region.cpp, this is a test example, comments at top showing what is going on.

st7789 changes

Constructors changed to take use_async_dma parameter, defaults to false.

command() changed to take use_async_dma flag.

is_busy() implemented, returns true if waiting for async DMA.

wait_for_update_to_finish() added, this needs to be called if using async dma. Waits for dma to finish and disables the DMA chain.

Fix for display rotating for rotate180.

Added partial_update() implementation.

setup_dma_control_if_needed() added, sets up a second DMA channel for chained transfers.

Examples changes

Added example test code for the pico display: pico_display_async_region.cpp

pico_graphics changes

rect_convert_rgb565() added, like frame_convert_rgb565() but for Rects/Regions.

rect_convert() added for each pen.

create_owned_frame_buffer() added to create frame buffer owned by us

destructor added to free up frame buffer if owned.

types

Rect.equals() added to test equality of Rects.

pico_graphics_pen_1bit changes

Use create_owned_frame_buffer() to create frame buffer if none provided.

pico_graphics_pen_1bitY changes

Use create_owned_frame_buffer() to create frame buffer if none provided.

pico_graphics_pen_3bit changes

Use create_owned_frame_buffer() to create frame buffer if none provided.
Add empty rect_convert()

pico_graphics_pen_P4 changes

Use create_owned_frame_buffer() to create frame buffer if none provided.
Add rect_convert()

pico_graphics_pen_P8 changes

Use create_owned_frame_buffer() to create frame buffer if none provided.
Add rect_convert()

pico_graphics_pen_RGB332 changes

Use create_owned_frame_buffer() to create frame buffer if none provided.
Add rect_convert()

pico_graphics_pen_RGB565 changes

Use create_owned_frame_buffer() to create frame buffer if none provided.

pico_graphics_pen_RGB888 changes

Use create_owned_frame_buffer() to create frame buffer if none provided.

@Gadgetoid
Copy link
Member

This is extremely interesting, thank you! Always eager to push the state of the art with our display updates and see what the Pico is capable of.

I wont get any time to feed back on this until the new year (new baby plus seemingly constant waves of sickness have demolished my productivity and I have a lot of catching up to do) but it might pique the interest of @MichaelBell who's also had some input on our display driving code.

If you want/need other screens to tinker with- not necessarily even to test or with any strings attached- let me know.

@AndrewCapon
Copy link
Contributor Author

AndrewCapon commented Dec 6, 2022

You are always welcome to send me screens :)

The PicoDisplay2 has arrived, haven't unpacked it yet.

I am doing a ILI9341 "Pimoroni" driver at the moment, along with a XMP2046 touch screen driver. The ILI9341 has the same region and async code in, I'm going to also add new additional "pens" for direct updating via display drivers rather than via a frame buffer, this will open up larger displays to be used with the limited memory the pico has.

I don't think you sell any displays based on these but I will put in another PR for this code when I have finished the example test code. Might be useful for some of your other customers.

When I finish this I will add an example for the PicoDisplay2...

p.s. Congratulations on the new baby.

@AndrewCapon
Copy link
Contributor Author

I have added an example for the Pico Display 2 as well.

@AndrewCapon
Copy link
Contributor Author

AndrewCapon commented Dec 9, 2022

I have added the ili9341 and xpt2046 drivers and examples to this PR : #596

@MichaelBell
Copy link
Contributor

This looks awesome, I have also wanted to do partial updates and have some hacky patch to do that, but great to see a suggestion that could be merged. It would be nice to expose the partial update to MicroPython.
Similarly the async DMA is a great option to have - this would give me one less reason to use my own st7789 driver for C++ projects :)

I think the extra up to ~2KB memory allocation for the chaining DMA commands could be an issue - heap RAM is at a premium especially on PicoW builds. Maybe you could just have a single chain CB, and use the DMA complete interrupt to set it back up for each line? Though that might occasionally stall the SPI transfer and I don't know whether the st7789 can cope with that. That said, you haven't exposed this to MicroPython, so maybe it's not such a big deal unless that is done.

Final minor comment: the indenting is inconsistent - I think you need to convert tabs into spaces.

@AndrewCapon
Copy link
Contributor Author

AndrewCapon commented Dec 20, 2022

Your idea of using a single chain is good if the memory is an issue.

What I would like to do is to implement a "minimal memory" display list type rendering, so you push updates to the renderer (rects, points, etc), the renderer just uses a ping pong buffer with 2 scanlines, sorts by updates by Y and intersects the objects with a single scanline which is rendered while the dma is transferring the other scanline until all updates are rendered.

So higher cpu usage but less memory, not so great for whizing things round the screen but ok for UIs. The renderer can then run on the other core. Maybe the first step is just direct to spi pens with no frame buffer at all...

What would be good is a pico with more memory!

Concerning the tabs I used the Pimoroni VSCode workspace and hoped that the tabs would be ok, I will look for some hidden setting somewhere and see what is wrong...

p.s. I have kept away from python for years using the tactic of finding someone else that knows it and getting them to do whatever needs doing, so I can't be much help with that!

@AndrewCapon
Copy link
Contributor Author

I have added a pr: #613 with an idea of how to use direct pens without frame buffers.

@ZodiusInfuser ZodiusInfuser added [- pico graphics library -] c++ This issue or request relates to C++ code needs discussion This issue or request is in need of further discussion labels Jan 20, 2023
@Gadgetoid
Copy link
Member

Your idea of using a single chain is good if the memory is an issue.

For MicroPython, memory is a huge constraint. Pretty much any heap allocation outside of MicroPython's own managed "heap" will either immediately, or eventually, cause an OOM lockup. This is particularly severe on Pico W where the RAM constraints are even tighter and the slightest misstep will cause the MicroPython build to not even finish booting 💀

Any buffers we use, must be possible to allocate using MicroPython's m_new or m_malloc which just allocate a block of its heap and gives you back a pointer. These w ill, IIRC, always be 32-bit aligned but no other alignment is guaranteed. This is why PicoGraphics is absolutely rife with buffers allocated in the MicroPython bindings and passed down the chain-

if(pen_type == PEN_INKY7) {
self->buffer = m_new_class(PSRamDisplay, width, height);
} else {
if (args[ARG_buffer].u_obj != mp_const_none) {
mp_buffer_info_t bufinfo;
mp_get_buffer_raise(args[ARG_buffer].u_obj, &bufinfo, MP_BUFFER_RW);
self->buffer = bufinfo.buf;
if(bufinfo.len < (size_t)(required_size)) {
mp_raise_ValueError("Supplied buffer is too small!");
}
} else {
self->buffer = m_new(uint8_t, required_size);
}
}

Even our C++ classes are allocated in MicroPython's heap using m_new_class which is our own convenience wrapper around C++'s "Placement New"-

switch((PicoGraphicsPenType)pen_type) {
case PEN_1BIT:
if (display == DISPLAY_INKY_PACK) {
self->graphics = m_new_class(PicoGraphics_Pen1BitY, self->display->width, self->display->height, self->buffer);
} else {
self->graphics = m_new_class(PicoGraphics_Pen1Bit, self->display->width, self->display->height, self->buffer);
}
break;
case PEN_3BIT:
self->graphics = m_new_class(PicoGraphics_Pen3Bit, self->display->width, self->display->height, self->buffer);
break;
case PEN_P4:
self->graphics = m_new_class(PicoGraphics_PenP4, self->display->width, self->display->height, self->buffer);
break;
case PEN_P8:
self->graphics = m_new_class(PicoGraphics_PenP8, self->display->width, self->display->height, self->buffer);
break;
case PEN_RGB332:
self->graphics = m_new_class(PicoGraphics_PenRGB332, self->display->width, self->display->height, self->buffer);
break;
case PEN_RGB565:
self->graphics = m_new_class(PicoGraphics_PenRGB565, self->display->width, self->display->height, self->buffer);
break;
case PEN_RGB888:
self->graphics = m_new_class(PicoGraphics_PenRGB888, self->display->width, self->display->height, self->buffer);
break;
case PEN_INKY7:
self->graphics = m_new_class(PicoGraphics_PenInky7, self->display->width, self->display->height, *(IDirectDisplayDriver<uint8_t> *)self->buffer);
break;
default:
break;
}

This is a continuous pain point 😭

@AndrewCapon
Copy link
Contributor Author

AndrewCapon commented Feb 23, 2023

That is all greek to me :)

So are you saying the C/C++ code here has to be limited in order to work with the python stuff, so the python stuff is a wrapper around the C/C++ stuff and the python "runtime" uses up enough ram to limit the c/c++ stuff?

So 2KB of extra memory means the users of the C/C++ stuff don't get the functionality because the python stuff is near the limit?

@Gadgetoid
Copy link
Member

Yep, all of the C++ stuff here (or most of it, anyway) is wrapped directly via MicroPython bindings which are just a thin veneer over the underlying C++ class to handle type conversions, argument checking, throwing errors and all that usual user facing garbage you have to deal with in a non-compiled language.

MicroPython eats RAM for breakfast, so the safest assumption is to not use any RAM the user hasn't supplied, or at least give the user the option and default to allocating if they don't supply a buffer.

We didn't know this when we started out, quite a trial by fire 😬

@AndrewCapon
Copy link
Contributor Author

Ok, Got you.

So are we saying that dma_control_chain_blocks should be able to be passed in after being allocated in python or the whole idea of using the dma chaining needs to be changed?

Set to 1 to use dma interrupts for partial updates instead of control chain.
For Python to remove memory requirements for the control block buffer.
@AndrewCapon
Copy link
Contributor Author

Ok, I just added a define USE_ASYNC_INTERRUPTS.

If this is set to 1 then the control chain is not used but interrupts are used instead.

I guess somehow you can get this define set outside of st7789.hpp depending on if using c++ or python.

@Gadgetoid
Copy link
Member

Thank you.

IIRC from @MichaelBell's musings with the camera, chained DMA requires alignment that MicroPython can't guarantee and I think that might apply here, but I'm not sure. Either way the buffers would need to be allocated by MicroPython's functions, whether those are user-supplied or handled by the bindings is relatively inconsequential.

Setting a define in the respective CMake files for MicroPython vs C++ builds should be straightforward.

I am still in very much over my head here!

@MichaelBell
Copy link
Contributor

The extra alignment was for use with DMA ring buffers, I haven't reviewed in detail but I don't think Andrew is doing that here.

I think I did add some explicit alignment to 4-byte boundary on the camera thing just as belt and braces, but fairly sure that buffers from Python are 4-byte aligned unless it's a user provided buffer and they've given a non-multiple of 4 offset into a larger byte buffer.

@AndrewCapon
Copy link
Contributor Author

AndrewCapon commented Feb 27, 2023

USE_ASYNC_INTERRUPTS=1 turns chains off entirely, it is then all handled in the interrupt handler instead.

Basically it is then a bodge for stride DMA as the pico doesn't have it. So there is a new structure:

struct DMAStride
{
  uint32_t size = 0;
  uint32_t count = 0;
  uint32_t width = 0;
  uint8_t* data = nullptr;
};

The data part points into the frame buffer, the size if the size in bytes of the scanline, count is the number of scanlines and width is the width of the frame buffer in bytes (stride).

This structure is set up and the DMA is done one scanline at a time, when each scanline is complete the interrupt handler sets of the next DMA transfer until all of them are done.

So there are no allocations being made at all.

I think this was Michaels idea to begin with?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[- pico graphics library -] c++ This issue or request relates to C++ code needs discussion This issue or request is in need of further discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants