Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locks up if a thread is run on core1: Pico Pi W #27

Open
ayjaym opened this issue Jul 14, 2023 · 20 comments
Open

Locks up if a thread is run on core1: Pico Pi W #27

ayjaym opened this issue Jul 14, 2023 · 20 comments

Comments

@ayjaym
Copy link

ayjaym commented Jul 14, 2023

On Micropython 1.20 if you start a thread on Core1, even if that thread does nothing then when an IRQ is triggered from the rotary encoder on code running on Core0 the whole machine locks up e.g

on core 0

t = start_new_thread(EventManager.Start, (onKeyDown,onKeyUp))
and then some code after that to interact with the encoder using this library, which works perfectly if the thread isn't spawned first.

The thread on core 1 is intended as a class which would handle a touchscreen - I have commented out that code because the failure occurs without it. However the code does work fine on Core1. But then running anything on Core1 even the dummy code above, seems to cause the rotary encoder code to fail. (note: python indentation seems to get lost when I save the issue)

class EventManager:
@staticmethod
def Start(onKeyDown, onKeyUp):
while True:
a = 1

Problem occurs even if the while loop just has a call to time.sleep_ms(1000), it seems like if there is a process running on Core1 then IRQs fail.

@miketeachman
Copy link
Owner

miketeachman commented Aug 8, 2023

I can reproduce the problem using v1.20 firmware using Pin object interrupts (which are also used in the Rotary library). I used your example to create a minimal test to reproduce the failure. The processor sometimes locks up when pin 16 is pulled low, thereby creating an interrupt. Note that lockups don't happen every time, typically only 1 in 3 times when the pin is pulled low. There are many Github issues and discussions about IRQs and how they run in a RP2 multi-threaded program, for example micropython/micropython#10690. I didn't look deeply into these issues to know if you have discovered a new bug, or one of the existing issues already describes the failure you have found.

In any case, this bug needs to be fixed in RP2 firmware before the Rotary library will work in a multi threaded application.

import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1,2))

p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)

@ayjaym
Copy link
Author

ayjaym commented Aug 8, 2023 via email

@peterhinch
Copy link

@miketeachman This is a rather extreme repro because core 0 has no running code. With this remedied, and some action on pin 16, the sample runs (link P16-P17):

import _thread
from machine import Pin
import time

def pin_callback(p):
    print('callback')

class EventManager:
    @staticmethod
    def Start(onKeyDown, onKeyUp):
        while True:
            a = 1
            time.sleep(1)
            print('core 1')

t = _thread.start_new_thread(EventManager.Start, (1,2))

p16 = Pin(16, Pin.IN, Pin.PULL_UP)
p16.irq(handler=pin_callback, trigger=Pin.IRQ_FALLING)
p17 = Pin(17, Pin.OUT)
while True:
    time.sleep(1)
    p17(not p17())

Outcome:

<irq>
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
core 1
core 1
callback
...

@ayjaym
Copy link
Author

ayjaym commented Aug 11, 2023

Hmm, well yes although I did see the problem with code actually running on core 0 I'm pretty sure. Let me try a while loop with a sleep as a placeholder and see what happens.

@ayjaym
Copy link
Author

ayjaym commented Aug 11, 2023

Ok, well, this may be a stupid error on my part but the following code, while it does not hang, doesn't work if the encoder library is launched from core 1 - I do not get value changes when it is rotated, but I do of course get switch events because these are polled.

i.e the test harness is

from _thread import *
from encoder import Encoder
import time

def onEncoderSwitchDown():
print ("On Encoder Switch Down")

def onEncoderChange(v):
print ("On Encoder Change" , v)

#t = start_new_thread(Encoder.Start,(onEncoderSwitchDown, onEncoderChange))
Encoder.Start(onEncoderSwitchDown, onEncoderChange)
while True:
Encoder.CheckEncoder()
time.sleep_ms(100)

and the class is below. This of course is uploaded to the /lib folder on the Pico and I'm then running the test harness from Thonny directly.

Obviously when run in core 1 uncomment out the while loop in the Encoder class and remove it from the harness, so that the thread performs the poll to look for an encoder value change.

This is just kinda ripped out of the larger project and in reality I would have polled the encoder in the WLAN module when it's polling for UDP datagrams as the overhead is negligible and the value change will occur via an IRQ. But - and I could have made a stupid error here - this code doesn't seem to process the IRQs on core 1 at all. I get this is different from the original problem of course.

from rotary_irq_rp2 import RotaryIRQ
from machine import Pin
import time

class Encoder:

old_val = None
encsw_val = None
OnEncoderSwitchDown = None
OnEncoderChange = None
encoder = None
encsw = None

@staticmethod
def CheckEncoder():

    v = Encoder.encoder.value()
    print(v)
    sw = Encoder.encsw.value()
    if sw != Encoder.encsw_val:
        Encoder.encsw_val = sw
        if sw == 0:
            Encoder.OnEncoderSwitchDown()
        
    if v != Encoder.old_val:
        Encoder.old_val = v
        Encoder.OnEncoderChange(v)
        
@staticmethod         
def Start(onEncoderSwitchDown, onEncoderChange):
    Encoder.OnEncoderSwitchDown = onEncoderSwitchDown
    Encoder.OnEncoderChange = onEncoderChange
    Encoder.encoder = RotaryIRQ(pin_num_clk=17, 
          pin_num_dt=19, 
          min_val=0, 
          max_val=6,
          pull_up=True,
          reverse=False,            
          range_mode=RotaryIRQ.RANGE_BOUNDED)
          
    Encoder.old_val = Encoder.encoder.value()
    Encoder.encsw = Pin(16, mode=Pin.IN, pull=Pin.PULL_UP)
    Encoder.encsw_val = Encoder.encsw.value()
    #while True:
    #    Encoder.CheckEncoder()
    #    time.sleep_ms(100)

@peterhinch
Copy link

IRQ's are processed on core 0.

In my experience core 1 should be reserved for computationally intensive tasks with great care taken when communicating between cores. I wouldn't expect classes such as sockets to work when shared between cores because the internal state of the class is not designed for a GIL-free environment. Core 1 is ideally suited for running blocking functions in a way which enables the core 0 code to continue running.

I believe there was discussion among the maintainers as to whether core 1 should run GIL-free. My view is to prefer the high performance solution that we have, but it does mean that an appreciation of multiprocessor coding has to be employed. I favour using asyncio for concurrency on core 0 with core 1 being reserved for code that has an absolute need for true concurrency.

The following allows an encoder to run on core 0 with its state being tracked on core 1:

import _thread
from machine import Pin
import time
import uasyncio as asyncio
from primitives import Encoder

position = 0  # Note constraints on shared globals see THREADING.md
change = 0

def core_1():
    while True:
        time.sleep(1)
        print(f"Position = {position} change={change}")

def cb(pos, delta):
    global position, change
    position = pos
    change = delta

async def main():
    t = _thread.start_new_thread(core_1, ())
    px = Pin(16, Pin.IN, Pin.PULL_UP)
    py = Pin(17, Pin.IN, Pin.PULL_UP)
    enc = Encoder(px, py, div=4, callback=cb)  # div matches mechanical detents
    while True:
        await asyncio.sleep(1)

try:
    asyncio.run(main())
finally:
    asyncio.new_event_loop()

Docs on encoder class and on THREADING.md.

@ayjaym
Copy link
Author

ayjaym commented Aug 12, 2023 via email

@ayjaym
Copy link
Author

ayjaym commented Aug 12, 2023 via email

@ayjaym
Copy link
Author

ayjaym commented Aug 15, 2023

I went back to the project today and confirmed that with code actually running on core 0 which is doing the RTP MIDI stuff with the WLAN plus handling the rotary encoder, that if I then have the touchscreen code running on core 1 (and no shared state except the callbacks when a touch event occurs), then an IRQ asserted on core 0 when the encoder is turned DOES lock up the processor. So it wasn't just a scenario where I'm dropping back to the REPL on core 0. I am not entirely sure why this doesn't repro directly but I'm using the Pico W, not sure if this might be material.

I don't think there's anything specific to the core 1 touchscreen code, which does not use any IRQs, it just uses the ADC channels to read the resistive touchscreen and then send events for touch and release.
I guess I'll need to read the encoder by polling and debouncing manually, which is a shame, as your library is such an elegant technical solution, but at least then I can work around all the other threading issues by just running the MIDI stuff on core 0, the touchscreen and OLED and SD card reader stuff ought to be able to run on core 1 (touchscreen code certainly does) and I can then just implement a standard request block protocol queue which core 1 will poll to pick up e.g a request to display a message on the OLED screen etc, so that the MIDI code on core 0 won't jitter.
It's not what I would like but needs must, I have to work within the current limitations of multicore support and just hope at some point Micropython will be a bit more robust in this area. Really appreciate your comments, I learned a lot about internals from reading them!

@peterhinch
Copy link

I'm using the Pico W, not sure if this might be material.

It's not. I used a Pico W to check the above script. The Pico W behaves identically to the Pico because WiFi is handled by a separate chip: a very nice design.

If you're going to write your own encoder code, you might like to read this. Encoders are surprisingly subtle, and there is a lot of nonsense written about them with absurdly complex algorithms. Hint: with the right algorithm you don't need to debounce them.

Here is another approach to design, which keeps all the hardware interfaces on core 0. Use asyncio and my encoder class on core 0. Run an asyncio task which reads the ADC. Whenever there is a significant change, the task puts a pair of readings onto a threadsafe queue. This queue is read by your code on core 1.

hope at some point Micropython will be a bit more robust in this area

I'm not sure that there is a fault in MP. Writing multiprocessor code is notoriously difficult. Subtle bear-traps abound. It's easy to encounter a situation where there is shared state whose existence is hidden. My strong advice is to use asyncio for concurrency on core 0. Keep all the "difficult" interfaces on core 0. Use core 1 only for processes which would block core 0 and follow the guidelines in THREADING.md for inter-core communication.

@ayjaym
Copy link
Author

ayjaym commented Aug 17, 2023

Thanks for your very valuable suggestions. You are quite right that multiprocessor code is tricky. I have spent years working with spinlocks, mutexes, critical sections, memory barriers, atomic operations etc in C# and C++ primarily and I certainly have managed to create some very subtle bugs in the process.

Just for clarity, it does appear for the code below that IRQs asserted on core 0 with code running (i.e not dropping back to the REPL) and with a trivial piece of thread code running which - I assume - can't really expose any kind of multi-threading "beartrap" DO cause the whole processor to lock up. This is 100% reproducible, every time. One movement on the encoder and it locks up.

Why this isn't entirely consistent with some of the other attempts to reproduce the issue I'm not sure, However it may relate to the number and timing of IRQs that occur with the rotary encoder I'm using.

At any event the test harness code works perfectly without the thread created, but locks up as soon as the encoder is turned, if the thread is created as part of the invocation process.

from _thread import *
from encoder import Encoder
import time

def onEncoderSwitchDown():
    print ("On Encoder Switch Down")
    
def onEncoderChange(v):
    print ("On Encoder Change" , v)

def breakit():
    print ("thread start")
    while True:
        time.sleep_ms(100)
        
t = start_new_thread(breakit,()) # this will cause any IRQ later on to lock up the processor
Encoder.Start(onEncoderSwitchDown, onEncoderChange)
while True:
    Encoder.CheckEncoder()
    time.sleep_ms(100)

and the Encoder module is just a wrapper around the standard encoder library

from rotary_irq_rp2 import RotaryIRQ
from machine import Pin
import time

class Encoder:

    old_val = None
    encsw_val = None
    OnEncoderSwitchDown = None
    OnEncoderChange = None
    encoder = None
    encsw = None
    
    @staticmethod
    def CheckEncoder():

        v = Encoder.encoder.value()
        sw = Encoder.encsw.value()
        if sw != Encoder.encsw_val:
            Encoder.encsw_val = sw
            if sw == 0:
                Encoder.OnEncoderSwitchDown()
            
        if v != Encoder.old_val:
            Encoder.old_val = v
            Encoder.OnEncoderChange(v)
            
    @staticmethod         
    def Start(onEncoderSwitchDown, onEncoderChange):
        Encoder.OnEncoderSwitchDown = onEncoderSwitchDown
        Encoder.OnEncoderChange = onEncoderChange
        Encoder.encoder = RotaryIRQ(pin_num_clk=17, 
              pin_num_dt=19, 
              min_val=0, 
              max_val=6,
              pull_up=True,
              reverse=False,            
              range_mode=RotaryIRQ.RANGE_BOUNDED)
              
        Encoder.old_val = Encoder.encoder.value()
        Encoder.encsw = Pin(16, mode=Pin.IN, pull=Pin.PULL_UP)
        Encoder.encsw_val = Encoder.encsw.value()
        #while True:
        #    Encoder.CheckEncoder()
        #    time.sleep_ms(100)

        


@peterhinch
Copy link

Please could you edit this post so that all the code is enclosed in treble backticks. Then the indention is preserved making it easy to read.

@ayjaym
Copy link
Author

ayjaym commented Aug 18, 2023

Sorry, of course, I didn't know about that, have added them. So hopefully this code example shows that it seems to be impossible to use core 1 at all if you want to assert IRQs on Core 0 - at least, with a real rotary encoder connected and using the RotaryIRQ module. Unless of course there's something wrong with my hardware. I have plenty more of these, so I can re-test on a different Pico W if necessary or indeed a standard Pico. If I could get this working I could probably just have the WLAN code and rotary check on core 0 and put the other stuff - which doesn't use IRQs and shouldn't expose any threading issues - on core 1.
If not I could try and get everything working with async and I appreciate your suggestion, but it is a huge shame to have a whole processor core unused - it would really help enormously to be able to move some of the processing to core 1.

@peterhinch
Copy link

So hopefully this code example shows that it seems to be impossible to use core 1 at all if you want to assert IRQs on Core 0

It doesn't really because my sample here shows the opposite: my library uses IRQ's and the sample ran with a real encoder. Rather than trawl through the RotaryIRQ code why not give my library a try?

@ayjaym
Copy link
Author

ayjaym commented Aug 18, 2023

I definitely did not drink enough coffee! Should have looked closer at your code to realise you weren't calling the same module. I will try that and see.

@peterhinch
Copy link

I've had a look at Mike Teachman's driver. It uses a different approach to mine. My driver attempts to minimise the runtime of the ISR's: the ISR code is as small as I could make it. RotaryIRQ has potentially quite a long runtime, especially if the Listener mechanism is used. The reason this matters (in my opinion) is as follows.

Using IRQ's to handle switch contact closures is undesirable because of contact bounce. This can cause arbitrarily short pulses and - worse - invalid logic levels. The risk of the latter can be mitigated with low value pullup resistors which I recommend. Short pulses can cause re-entrancy if the ISR runtime is long. Alas IRQ's are necessary for an encoder interface, hence my approach of minimising ISR runtime.

Another design point is that my callbacks are run from an asyncio context. This entirely decouples the callback from the ISR.

My ISR's run in 36μs on a Pyboard 1.1 and 50μs on ESP32. Note that soft ISR's still run fast, but are subject to long latency. The ISR design aims to allow for potential latency. Hard ISR's are used if available (e.g. on RP2) because latency sets a limit on allowable rotational speed.

To be fair I can't identify a specific reason why RotaryIRQ is failing in a dual-core environment.

Please note that my Switch class does not use IRQ's: normal switch and pushbutton operations are much better handled by asyncio polling.

@ayjaym
Copy link
Author

ayjaym commented Aug 18, 2023

Yes, I have to agree with your analysis and can confirm that indeed your code runs correctly.
I have written time-critical interrupt code previously and indeed you want to be in and out as quickly as possible.
I apologise for not looking more closely at what you were originally proposing, where your comments were that core 1 could track rotary encoder changes on core 0 and I stupidly thought 'yeah but that's not what I want to do' without actually looking in detail at your code.
Now I need to read and digest the whole asyncio library and have a second read through threading.md.
I have a real sense of deja vu in that WAY back in time I used to write TSR (terminate and stay resident) code for MSDOS which definitely hits many of these issues.
I need to think about how I can implement this model in that the project is intended to be relatively easy for the user to customise. So I provide services for writing to the OLED display, reading/writing to the SD card reader, reading the button matrix (which is implemented as I said via buttons on top of a resistive touchscreen, and writing to the underlying WS2812 LED array. Some of these functions will take time although they could be carefully cut up into bits - but cooperative multitasking isn't much fun and I don't really miss Windows 3.x :)
I would ideally like the user to just write code and poll the encoder in their code to get state change callbacks (as in my original code, because it's not time critical, if they don't poll, the IRQs ensure that the state is kept accurate anyway), although the alternative is just to pass a 'YourCode' function in which will be executed regularly, which obviously the current async encoder architecture you have can do. [I can probably set something up around timers of course, need to think on this]
And I have to think carefully about the wireless MIDI stuff, which I had wanted to run on core 1 out of the way. That doesn't look like it'll work for as yet undetermined reasons (possibly IRQs but who knows?)
Still, this is just good architectural design, at least I can see it IS theoretically possible although the caveats in THREADING.md are going to be significant. Alas, I'm spoiled with C# and intrinsically threadsafe collections etc.

I'm not clear whether 'globals' as opposed to say static class-scoped variables hit the same problems with initialisation. Certainly my reasonably complex touchscreen code did seem to run ok on core 1 with the MIDI stuff running on core 0 and I hadn't done anything specific to lock because there's no explicit shared state. But whether that was just luck I don't know.
Most of the stuff I was just wrapping in static classes because there's no need to instantiate anything other than a singleton - there is only one of each of the I/O peripherals and so on. Hence classes give me Separation Of Concerns at least.
Anyway, many thanks. I had not realised the depth of information in this project which is probably unavailable elsewhere so it behooves me to just go through and read everything and experiment.

@peterhinch
Copy link

cooperative multitasking isn't much fun

I think it is. But that's beside the point: it's damn near essential in firmware.

I suspect that your experience is in PC programming rather than firmware. I first got involved with MP nine years ago when it was in its infancy. It's asyncio was useless with a time resolution of 1s. I wrote my own scheduler with ms resolution because, in my experience, cooperative multi tasking is essential in almost all firmware projects. Since then MP has acquired an absolutely excellent asyncio so I abandoned mine long ago.

I urge you to learn asyncio - I wrote the tutorial in an attempt to promote its use.

Tools such as interrupts, timers and threads have vital uses but they bring significant complexity compared to concurrent cooperative tasks. Unfortunately people new to MP look at these tools and think they are the answer to concurrency, using them in quite inappropriate ways. My approach is to use these tools only when the requirement is unachievable with asyncio. I suggest you consider a design based on this principle.

Incidentally I too wrote TSR's but not professionally. I first encountered cooperative scheduling when, in about 198, I took on a professional firmware project from a firm that had gone bust. It was written in Intel 8080 Assembler and was a revelation. I realised I'd spent the previous five years writing firmware the wrong way...

@ayjaym
Copy link
Author

ayjaym commented Aug 19, 2023

Well, yes, I do have quite a bit of firmware experience but that was a long time back working with the Microchip PIC devices with a massive 1K of program storage and 128 bytes of RAM. These modern microcontrollers are orders of magnitude more complex and often not terribly well documented.
I certainly agree though 100% with what you say. Clearly this is a sophisticated and well engineered toolset and not learning it would be just damn stupid on my part, so I need to spend some time getting my head around how things work.

For example in C# async.. await work under the hood by actually using a second thread so that the async routine really returns twice, once to the caller immediately and then a second time on a different thread back to the runtime when the operation actually completes. But in Micropython this must operate a bit differently since you're clearly not going to spawn an actual thread, there being no OS to do this, so the underlying scheduling subsystem etc. is presenting the abstraction of a small cooperative multitasking OS to the caller, I assume, in order to actually implement this. Which is pretty amazing because really this is a small near RTOS you're effectively building, at least, at the task management level - you're obviously deferring other things out to the bare metal like the filesystem - but fortunately there are of course other modules to take care of this.
One good thing is it seems possible to set the Pico clock rate to 250MHz without anything obviously failing - I think this is about as far as you can go before flash clock speed limitations become an issue - but this substantially increases the processing power at a fairly negligible current drain increase as far as I can see. My whole project comes in at 420mA on the USB bus which for everything including 137 LEDs is pretty good. As I said, hardware works perfectly, I'm delighted with it, the whole thing is 3d printed, it's cheap to build, I had just underestimated the software challenge but, that's life. I have learned such a lot from interacting with yourself here and I do appreciate your time, effort and patience. I'll try and get this damn thing done, and released as soon as I can, now.

@peterhinch
Copy link

The fundamental principle behind asyncio is simple (although the implementation is extremely clever). If you understand Python generators you have the gist of how coroutines may be implemented: a coroutine is a special case of a generator. A coroutine can yield at chosen points and resume execution at the instruction after the yield. The scheduler maintains a list of suspended coroutines and transfers execution to them in a way that ensures that each gets a share of the action. While this is highly simplified, there is no "magic" going on. It is pure Python, no threading is involved and there is no true concurrency.*

The code may be viewed here.

*The actual implementation has the task.py module duplicated in C for performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants