Locks up if a thread is run on core1: Pico Pi W #27
I can reproduce the problem on v1.20 firmware using Pin object interrupts (which are also used in the Rotary library). I used your example to create a minimal test to reproduce the failure. The processor sometimes locks up when pin 16 is pulled low, thereby creating an interrupt. Note that lockups don't happen every time: typically only 1 in 3 times when the pin is pulled low. There are many GitHub issues and discussions about IRQs and how they run in an RP2 multi-threaded program, for example micropython/micropython#10690. I didn't look deeply enough into these issues to know whether you have discovered a new bug, or whether one of the existing issues already describes the failure you have found. In any case, this bug needs to be fixed in the RP2 firmware before the Rotary library will work in a multi-threaded application.

```python
import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1, 2))
p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)
```
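For readers hitting the same lockup, a pattern that reduces the time spent in the IRQ handler is flag-and-poll: the callback only sets a flag and the main loop does the work in normal context. This is a minimal, hardware-free sketch of that pattern, not code from the thread; the worker thread below merely simulates the interrupt firing, and `fake_irq_source` and the timings are invented for illustration:

```python
import _thread
import time

# Flag written by the "ISR", read and cleared by the main loop.
event_pending = False

def pin_callback(p):
    # Keep the ISR body tiny: no allocation, no printing.
    global event_pending
    event_pending = True

def fake_irq_source():
    # Stand-in for the falling-edge interrupt on pin 16 firing once.
    time.sleep(0.05)
    pin_callback(None)

_thread.start_new_thread(fake_irq_source, ())

handled = 0
deadline = time.time() + 2
while time.time() < deadline and handled == 0:
    if event_pending:
        event_pending = False
        handled += 1  # the real work happens here, in main-loop context
    time.sleep(0.01)

print(handled)
```

MicroPython also provides `micropython.schedule` for deferring work out of a hard ISR, which serves a similar purpose without an explicit polling loop.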
Thanks so much, I realised of course this was a general problem and not specific to your library. Frankly, I'm very discouraged by the immaturity of multi-core support in MicroPython. I have an open-source (and open-hardware) project which is working brilliantly hardware-wise: a fully user-programmable MIDI controller with an OLED display and a matrix of 16x8 RGB buttons, everything 3D-printed. The basic hardware is working nicely; the Pico's multiple ADC channels made interfacing to the resistive touchscreen, which acts as an underlay to the buttons, very easy. But I really need both cores working, as one will handle MIDI and the other non-time-critical tasks such as updating the OLED display and interfacing with the touchscreen and the backing WS2812 LED matrices. Until then I just can't really release it.
So effectively I'm now stalled. Not only do I have this issue, but the WLAN stack doesn't work on core 1: it hangs on configuration, and someone suggested adding the country code. That stopped the hangs, but it still doesn't receive UDP packets, so my RTP-MIDI implementation (which works perfectly on core 0) is also stalled.
I do appreciate that the whole endeavour is run by volunteers, but it feels like the RPi Foundation really ought to contribute some support here as well. After all, they're putting these boards forward as MicroPython-programmable microcontrollers, and they have dual cores, so that's kind of important. In any case, the ESP32 and other controllers are now also multi-core. I accept I could probably get all this working in C++, but I want users to be able to program custom functionality, and MicroPython is essential to the whole concept. What do we do to get these threading issues prioritised?
@miketeachman This is a rather extreme repro because core 0 has no running code. With this remedied, and some action on pin 16, the sample runs (link P16-P17):

```python
import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1, 2))
p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)
p17 = Pin(17, Pin.OUT)
while True:
    time.sleep(1)
    p17(not p17())
```

Outcome:
Hmm, well yes, although I did see the problem with code actually running on core 0, I'm pretty sure. Let me try a while loop with a sleep as a placeholder and see what happens.
Ok, well, this may be a stupid error on my part, but the following code, while it does not hang, doesn't work if the encoder library is launched from core 1: I do not get value changes when it is rotated, but I do of course get switch events, because these are polled. The test harness is (bodies elided in the original):

```python
from _thread import *

def onEncoderSwitchDown():
    ...

def onEncoderChange(v):
    ...

#t = start_new_thread(Encoder.Start, (onEncoderSwitchDown, onEncoderChange))
```

and the class is below. This is uploaded to the /lib folder on the Pico, and I'm then running the test harness from Thonny directly. Obviously when running on core 1, uncomment the while loop in the Encoder class and remove it from the harness, so that the thread performs the poll to look for an encoder value change. This is just ripped out of the larger project; in reality I would have polled the encoder in the WLAN module while it's polling for UDP datagrams, as the overhead is negligible and the value change will occur via an IRQ. But - and I could have made a stupid error here - this code doesn't seem to process the IRQs on core 1 at all. I get this is different from the original problem, of course.

```python
from rotary_irq_rp2 import RotaryIRQ

class Encoder:
    ...
```
IRQs are processed on core 0.

In my experience core 1 should be reserved for computationally intensive tasks, with great care taken when communicating between cores. I wouldn't expect classes such as sockets to work when shared between cores, because the internal state of the class is not designed for a GIL-free environment. Core 1 is ideally suited for running blocking functions in a way which enables the core 0 code to continue running.

I believe there was discussion among the maintainers as to whether core 1 should run GIL-free. My view is to prefer the high-performance solution that we have, but it does mean that an appreciation of multiprocessor coding has to be employed. I favour using asyncio for concurrency on core 0, with core 1 being reserved for code that has an absolute need for true concurrency.

The following allows an encoder to run on core 0 with its state being tracked on core 1:

```python
import _thread
from machine import Pin
import time
import uasyncio as asyncio
from primitives import Encoder

position = 0  # Note constraints on shared globals: see THREADING.md
change = 0

def core_1():
    while True:
        time.sleep(1)
        print(f"Position = {position} change={change}")

def cb(pos, delta):
    global position, change
    position = pos
    change = delta

async def main():
    t = _thread.start_new_thread(core_1, ())
    px = Pin(16, Pin.IN, Pin.PULL_UP)
    py = Pin(17, Pin.IN, Pin.PULL_UP)
    enc = Encoder(px, py, div=4, callback=cb)  # div matches mechanical detents
    while True:
        await asyncio.sleep(1)

try:
    asyncio.run(main())
finally:
    asyncio.new_event_loop()
```

Docs on encoder class and on THREADING.md.
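The example above works because the callback writes only small ints, whose assignment is effectively atomic. As a hedged extension (my sketch, not part of the original reply): when the shared state is more than one word, guard it with a lock so the reader never sees a half-updated (position, change) pair. `_thread.allocate_lock` exists in both CPython and MicroPython:

```python
import _thread

# Lock-protected shared state; the dict name and fields are illustrative.
lock = _thread.allocate_lock()
state = {"position": 0, "change": 0}

def cb(pos, delta):
    # Producer side (core 0 in the original example).
    with lock:
        state["position"] = pos
        state["change"] = delta

def reader():
    # Consumer side (core 1): take a snapshot under the lock,
    # then work on the copy outside it.
    with lock:
        return dict(state)

cb(5, 1)
snap = reader()
print(snap["position"], snap["change"])
```

Keeping the critical sections this short means neither core stalls the other for long.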
Thank you very much for the valuable insights. To be clear, I had no intention of sharing resources like sockets across cores. My intention was for the wireless RTP-MIDI code to run on core 1 along with the encoder, because the cost of polling the encoder for value changes is minimal. The core 1 code calls back to send available MIDI data or to accept MIDI data to transmit; this seems safe, as we are just marshalling a 3-byte command and not accessing internal state across core boundaries. Also, as you see, we have a callback for encoder value change and switch down.

Core 0 then runs code which the user will put together from building blocks. There is an OLED display, an SD card reader and a touchscreen which provides 137 buttons, backed by a WS2812 set of addressable LEDs. Code to communicate with these components will all run on core 0. This then allows the time-critical MIDI operations to run on core 1. At some point an in-memory data structure will be supplied to the core 1 code to define a sequence of MIDI events to be transmitted with timestamps. However, at present I cannot achieve this. I would much prefer this architecture, as the device user only adds custom code to core 0, so it's much simpler for them to understand.

I had not anticipated the issues I encountered. I am experienced in multithreaded coding on actual operating systems, and while I appreciate this is a bare-metal implementation and am aware of the need for locking, I found that adding locks did not mitigate my initial issues. If I could resolve these, I certainly realise the need to put locks around things like array allocations etc., but at present the code doesn't do anything like that.

So to summarise: ideally I need the WLAN subsystem to actually work on core 1, and the rotary encoder to work on core 1. This would make things much simpler for my intended audience. I hope this helps clarify where I was going architecturally. I appreciate your input very much.
And your comment on IRQs and core 0 led me to Google and find this interesting thread, because I said "how did he know that, I don't recall reading that in the docs":
https://github.com/orgs/micropython/discussions/10638

It looks like the whole multi-threading area is evolving, and that right now what I want to do isn't really feasible. The problem is that I don't want the user having to write a loop to capture encoder events on core 0. And I assume that possibly the WLAN failures on core 1 may be because IRQs aren't being handled in that scenario as well. Now that the Pi Zero is becoming available again, perhaps I need to redesign around that (probably the Zero 2 W ideally, though I'm not sure about availability). That's a nuisance, because I need two ADC channels for the touchscreen, so it will add significant cost. Or hope that multicore support on the RP2040 might be enhanced and just park this project for a few months. Or get my hands dirty on the code myself, of course!
I went back to the project today and confirmed that with code actually running on core 0, doing the RTP-MIDI stuff with the WLAN plus handling the rotary encoder, and with the touchscreen code running on core 1 (and no shared state except the callbacks when a touch event occurs), an IRQ asserted on core 0 when the encoder is turned DOES lock up the processor. So it wasn't just a scenario where I'm dropping back to the REPL on core 0. I am not entirely sure why this doesn't repro directly, but I'm using the Pico W; I'm not sure whether this might be material. I don't think there's anything specific to the core 1 touchscreen code, which does not use any IRQs; it just uses the ADC channels to read the resistive touchscreen and then sends events for touch and release.
It's not. I used a Pico W to check the above script. The Pico W behaves identically to the Pico, because WiFi is handled by a separate chip: a very nice design. If you're going to write your own encoder code, you might like to read this. Encoders are surprisingly subtle, and there is a lot of nonsense written about them, with absurdly complex algorithms. Hint: with the right algorithm you don't need to debounce them. Here is another approach to the design, which keeps all the hardware interfaces on core 0. Use
I'm not sure that there is a fault in MP. Writing multiprocessor code is notoriously difficult; subtle bear-traps abound. It's easy to encounter a situation where there is shared state whose existence is hidden. My strong advice is to use asyncio.
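One way to avoid hidden shared state altogether is strict one-way message passing between the cores. Below is a portable sketch of that idea (my illustration, not code from the thread): a worker thread pushes items into a deque, and an asyncio task drains it. Plain polling is used for portability; on MicroPython, `asyncio.ThreadSafeFlag` can replace the polling sleep with an efficient wake-up, and note that MicroPython's `deque` constructor requires a maxlen argument.

```python
import _thread
import time
import asyncio
from collections import deque

# One-way queue: only the worker appends, only the consumer pops.
msgs = deque()

def worker():
    # Stand-in for core 1 producing events (pretend these are MIDI notes).
    for note in (60, 64, 67):
        msgs.append(note)
        time.sleep(0.01)

async def consumer(expected):
    got = []
    while len(got) < expected:
        while msgs:
            got.append(msgs.popleft())
        await asyncio.sleep(0.01)  # yield to other tasks; poll again shortly
    return got

_thread.start_new_thread(worker, ())
received = asyncio.run(consumer(3))
print(received)
```

Because only immutable items cross the boundary and each end touches only its own end of the queue, no richer object state is ever shared.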
Thanks for your very valuable suggestions. You are quite right that multiprocessor code is tricky. I have spent years working with spinlocks, mutexes, critical sections, memory barriers, atomic operations etc., primarily in C# and C++, and I have certainly managed to create some very subtle bugs in the process. Just for clarity: it does appear, for the code below, that IRQs asserted on core 0 with code running (i.e. not dropping back to the REPL), and with a trivial piece of thread code running which, I assume, can't really expose any kind of multi-threading "beartrap", DO cause the whole processor to lock up. This is 100% reproducible, every time: one movement on the encoder and it locks up. Why this isn't entirely consistent with some of the other attempts to reproduce the issue I'm not sure; however, it may relate to the number and timing of IRQs that occur with the rotary encoder I'm using. At any event, the test harness code works perfectly without the thread created, but locks up as soon as the encoder is turned if the thread is created as part of the invocation process.
Please could you edit this post so that all the code is enclosed in treble backticks? Then the indentation is preserved, making it easy to read.
Sorry, of course; I didn't know about that, and have added them. So hopefully this code example shows that it seems to be impossible to use core 1 at all if you want to assert IRQs on core 0, at least with a real rotary encoder connected and using the RotaryIRQ module. Unless of course there's something wrong with my hardware. I have plenty more of these, so I can re-test on a different Pico W if necessary, or indeed a standard Pico. If I could get this working, I could probably just have the WLAN code and rotary check on core 0, and put the other stuff, which doesn't use IRQs and shouldn't expose any threading issues, on core 1.
It doesn't really, because my sample here shows the opposite: my library uses IRQs, and the sample ran with a real encoder. Rather than trawl through the RotaryIRQ code, why not give my library a try?
I definitely did not drink enough coffee! Should have looked closer at your code to realise you weren't calling the same module. I will try that and see. |
I've had a look at Mike Teachman's driver. It uses a different approach to mine. My driver attempts to minimise the runtime of the ISRs: the ISR code is as small as I could make it. RotaryIRQ has potentially quite a long runtime, especially if the Listener mechanism is used. The reason this matters (in my opinion) is as follows. Using IRQs to handle switch contact closures is undesirable because of contact bounce. This can cause arbitrarily short pulses and, worse, invalid logic levels. The risk of the latter can be mitigated with low-value pullup resistors, which I recommend. Short pulses can cause re-entrancy if the ISR runtime is long. Alas, IRQs are necessary for an encoder interface, hence my approach of minimising ISR runtime. Another design point is that my callbacks are run from an `asyncio` context.

My ISRs run in 36μs on a Pyboard 1.1 and 50μs on an ESP32. Note that soft ISRs still run fast, but are subject to long latency. The ISR design aims to allow for potential latency. Hard ISRs are used if available (e.g. on RP2), because latency sets a limit on allowable rotational speed.

To be fair, I can't identify a specific reason why RotaryIRQ is failing in a dual-core environment. Please note that my Switch class does not use IRQs: normal switch and pushbutton operations are much better handled by polling.
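For context on the "no debounce needed" hint: the classic quadrature decode maps each valid two-bit A/B transition to a step of +1 or -1, and every invalid transition (such as both bits flipping at once from bounce) to zero, so bounce never corrupts the count. This is a generic sketch of the technique, not Peter's actual ISR code:

```python
# Transition table: (previous AB reading, current AB reading) -> delta.
# Anything not listed (no change, or both bits flipped) counts as 0.
_TABLE = {
    (0b00, 0b01): +1, (0b01, 0b11): +1, (0b11, 0b10): +1, (0b10, 0b00): +1,
    (0b00, 0b10): -1, (0b10, 0b11): -1, (0b11, 0b01): -1, (0b01, 0b00): -1,
}

def decode(prev, curr):
    """Return the position delta for one A/B transition."""
    return _TABLE.get((prev, curr), 0)

# One full clockwise detent: 00 -> 01 -> 11 -> 10 -> 00 yields +4 counts,
# which is why drivers often divide by 4 to get one count per detent.
seq = [0b00, 0b01, 0b11, 0b10, 0b00]
pos = 0
for prev, curr in zip(seq, seq[1:]):
    pos += decode(prev, curr)
print(pos)
```

In a real ISR, only the table lookup and an integer add would run, which is how the handler stays down in the tens of microseconds.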
Yes, I have to agree with your analysis, and can confirm that indeed your code runs correctly. I'm not clear whether globals, as opposed to, say, static class-scoped variables, hit the same problems with initialisation. Certainly my reasonably complex touchscreen code did seem to run OK on core 1 with the MIDI stuff running on core 0, and I hadn't done anything specific to lock, because there's no explicit shared state. But whether that was just luck I don't know.
I think it is. But that's beside the point: it's damn near essential in firmware. I suspect that your experience is in PC programming rather than firmware. I first got involved with MP nine years ago when it was in its infancy. I urge you to learn `asyncio`.

Tools such as interrupts, timers and threads have vital uses, but they bring significant complexity compared to concurrent cooperative tasks. Unfortunately, people new to MP look at these tools and think they are the answer to concurrency, using them in quite inappropriate ways. My approach is to use these tools only when the requirement is unachievable with `asyncio`.

Incidentally, I too wrote TSRs, but not professionally. I first encountered cooperative scheduling when, in about 198, I took on a professional firmware project from a firm that had gone bust. It was written in Intel 8080 Assembler and was a revelation. I realised I'd spent the previous five years writing firmware the wrong way...
Well, yes, I do have quite a bit of firmware experience, but that was a long time back, working with Microchip PIC devices with a massive 1K of program storage and 128 bytes of RAM. These modern microcontrollers are orders of magnitude more complex and often not terribly well documented. For example, in C#, async/await works under the hood by actually using a second thread, so that the async routine really returns twice: once to the caller immediately, and then a second time, on a different thread, back to the runtime when the operation actually completes. But in MicroPython this must operate a bit differently, since you're clearly not going to spawn an actual thread, there being no OS to do this; so the underlying scheduling subsystem is presenting the abstraction of a small cooperative multitasking OS to the caller, I assume, in order to implement this. Which is pretty amazing, because really this is a small near-RTOS you're effectively building, at least at the task-management level. You're obviously deferring other things out to the bare metal, like the filesystem, but fortunately there are of course other modules to take care of this.
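As a quick check of the mental model above: in Python, `asyncio` interleaves tasks on a single thread by suspending at each `await`; nothing spawns a second thread. A small demonstration (this uses CPython's `threading` module purely to read thread IDs, so it is CPython-only; MicroPython lacks `threading`):

```python
import asyncio
import threading

# Record which OS thread each task step runs on.
log = []

async def task(name, n):
    for _ in range(n):
        log.append((name, threading.get_ident()))
        await asyncio.sleep(0)  # suspension point: the scheduler may switch tasks

async def main():
    await asyncio.gather(task("a", 2), task("b", 2))

asyncio.run(main())
thread_ids = {tid for _, tid in log}
print(len(thread_ids))  # every step ran on the same single thread
```

All four task steps report the same thread ID, confirming the cooperative single-thread model.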
The fundamental principle behind The code may be viewed here. *The actual implementation has the
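The cooperative principle being described can be illustrated with a toy round-robin scheduler built on generators (an illustrative sketch, not MicroPython's actual implementation): each `yield` stands in for a point where real code would `await`, and the loop resumes tasks in turn until all are exhausted, with no threads at all.

```python
from collections import deque

def scheduler(tasks):
    """Run (name, generator) pairs round-robin; return the order of steps."""
    order = []
    queue = deque(tasks)
    while queue:
        name, gen = queue.popleft()
        try:
            next(gen)                   # run the task until its next yield
            order.append(name)
            queue.append((name, gen))   # requeue it: round robin
        except StopIteration:
            pass                        # task finished; drop it
    return order

def work(steps):
    for _ in range(steps):
        yield  # cooperative suspension point, like an await

print(scheduler([("a", work(2)), ("b", work(1))]))
```

Task "a" yields twice and "b" once, so the interleaved run order is a, b, a: concurrency from nothing more than generators and a queue.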
On MicroPython 1.20, if you start a thread on core 1, even if that thread does nothing, then when an IRQ is triggered from the rotary encoder by code running on core 0, the whole machine locks up. E.g., on core 0:

```python
t = start_new_thread(EventManager.Start, (onKeyDown, onKeyUp))
```

followed by some code to interact with the encoder using this library, which works perfectly if the thread isn't spawned first.
The thread on core 1 is intended as a class which would handle a touchscreen; I have commented out that code because the failure occurs without it. However, the code does work fine on core 1. But then running anything on core 1, even the dummy code above, seems to cause the rotary encoder code to fail. (Note: Python indentation seems to get lost when I save the issue.)
```python
class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
```
The problem occurs even if the while loop just has a call to time.sleep_ms(1000); it seems that if there is a process running on core 1, then IRQs fail.