Debugger reset causes lockup during wifi initialize #268

Closed
alexbohm opened this issue Sep 21, 2023 · 11 comments · Fixed by #390

Comments

@alexbohm

While working on a project with the esp32c6, the wifi features, and embassy, I'm hitting an issue when debugging with probe-rs.

On first power-up, esp_wifi::initialize works as I would expect. However, after a reset/restart through the probe-rs debugger in VSCode, esp_wifi::initialize starts to run but then gets stuck in the DefaultInterruptHandler.

From some initial investigation, it appears that esp_wifi::initialize gets as far as setup_timer_isr, and as soon as that function starts enabling interrupts, an interrupt fires and execution ends up in the default interrupt handler loop.

I'm not sure if anyone has insight into what would cause this, but I'm going to see if I can put together an example project that reproduces it.

@alexbohm
Author

With a debug build, I was able to get a backtrace. It looks like an Interrupt::WIFI_BB is being handled:
[image: backtrace screenshot]

@bjoernQ
Contributor

bjoernQ commented Sep 21, 2023

I could imagine this:

  • during the first debug session the interrupt gets asserted, and it hasn't been handled yet when the reset happens
  • the reset doesn't clear the interrupt pending bit
  • when we enable the interrupt, no ISR is registered yet, so the interrupt triggers but nothing handles it. Because of that, the interrupt is never de-asserted and keeps triggering
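The hypothesis above can be illustrated with a toy model. This is only a sketch: Peripheral, Cpu and ack_isr are hypothetical names, not esp-hal or esp-wifi APIs, and the dispatch cap stands in for the real CPU re-entering the handler forever. A level-triggered source that stays asserted across a reset keeps re-entering the interrupt path when no ISR is registered, whereas a handler that acknowledges the peripheral clears it after one dispatch.

```rust
/// Hypothetical model of a peripheral with a level-triggered interrupt line.
/// The line stays asserted until a handler acknowledges the peripheral.
pub struct Peripheral {
    pub irq_asserted: bool,
}

/// Hypothetical model of the CPU side: an enable bit plus an optional ISR.
pub struct Cpu {
    pub irq_enabled: bool,
    pub isr: Option<fn(&mut Peripheral)>,
}

impl Cpu {
    /// Dispatches the interrupt while it is enabled and asserted, at most
    /// `max_dispatches` times, and returns how many dispatches happened.
    /// Real hardware has no cap: it would keep re-entering indefinitely.
    pub fn run(&self, periph: &mut Peripheral, max_dispatches: u32) -> u32 {
        let mut dispatches = 0;
        while self.irq_enabled && periph.irq_asserted && dispatches < max_dispatches {
            dispatches += 1;
            if let Some(handler) = self.isr {
                // A registered ISR acknowledges the peripheral, de-asserting the line.
                handler(periph);
            }
            // With no ISR (the default-handler case) nothing clears the source,
            // so the loop condition stays true.
        }
        dispatches
    }
}

/// Example ISR that acknowledges the peripheral's interrupt.
pub fn ack_isr(periph: &mut Peripheral) {
    periph.irq_asserted = false;
}
```

For example, a Peripheral { irq_asserted: true } left over from before the reset, run against Cpu { irq_enabled: true, isr: None }, exhausts the whole cap, while the same setup with isr: Some(ack_isr) returns after a single dispatch.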

@alexbohm
Author

Thanks! That sounds like it could be what's happening. Do you know offhand if there's a good way to clear the interrupts or reset the modem? It's a little hard to find information on these specific interrupts.

I poked around in the SVD and the ESP-IDF repo to see what I could find, and tried this code before initialize():

```rust
{
    log::info!("Resetting modem.");

    // Reset the MODEM_SYSCON? Pulse every bit of modem_rst_conf: assert, then release.
    unsafe {
        peripherals
            .MODEM_SYSCON
            .modem_rst_conf
            .modify(|_, w| w.bits(0xffffffff));
        peripherals
            .MODEM_SYSCON
            .modem_rst_conf
            .modify(|_, w| w.bits(0x00000000));
    }
    // Reset the MODEM_LPCON? Same assert/release pulse on rst_conf.
    unsafe {
        peripherals
            .MODEM_LPCON
            .rst_conf
            .write(|w| w.bits(0xffffffff));
        peripherals
            .MODEM_LPCON
            .rst_conf
            .write(|w| w.bits(0x00000000));
    }

    // Reset the modem subsystem? Toggle MODEM_RST_EN in the PCR.
    let pcr = unsafe { &*esp32c6_hal::peripherals::PCR::PTR };
    pcr.modem_apb_conf.modify(|_, w| w.modem_rst_en().set_bit());
    pcr.modem_apb_conf
        .modify(|_, w| w.modem_rst_en().clear_bit());
}

// Regular esp-wifi initialization follows the reset attempt.
let timer = SystemTimer::new(peripherals.SYSTIMER).alarm0;

let init = initialize(
    EspWifiInitFor::Wifi,
    timer,
    Rng::new(peripherals.RNG),
    system.radio_clock_control,
    &clocks,
)
.unwrap();
```

With the reset code in place, I can still reproduce the issue in a release build, but a debug build no longer seems to hit it.

Tomorrow, I'm going to see if I can get one of the examples to reproduce this.

@bjoernQ
Contributor

bjoernQ commented Sep 22, 2023

Interesting that it works in debug mode. The real peripherals are private, so I don't know of a safe way to disable their interrupts.

I'm usually still using OpenOCD and reset by jumping to the start of the ROM code, and I haven't seen any problems with that. I don't know how probe-rs performs the reset or whether it's configurable; maybe worth a try.

Probably the safest approach would be to only enable the interrupt once the ISR callback is set. That would require some refactoring, but not too much, I guess.
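A minimal sketch of that idea, assuming a single interrupt line: the handler is stored before the interrupt is unmasked, so a pending interrupt can never find an empty slot. ISR, IRQ_ENABLED, set_isr and dispatch are hypothetical names, not esp-wifi internals; on the target, std::sync::Mutex would be replaced by an interrupt-safe cell and the AtomicBool by the real interrupt-enable call.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical ISR slot, empty until someone installs a handler.
static ISR: Mutex<Option<fn()>> = Mutex::new(None);
/// Hypothetical stand-in for the interrupt-controller enable bit.
static IRQ_ENABLED: AtomicBool = AtomicBool::new(false);

/// Installs the handler and only then unmasks the interrupt.
pub fn set_isr(handler: fn()) {
    *ISR.lock().unwrap() = Some(handler);
    // Release pairs with the Acquire in `irq_enabled`: whoever sees the
    // enable bit set also sees the stored handler.
    IRQ_ENABLED.store(true, Ordering::Release);
}

pub fn irq_enabled() -> bool {
    IRQ_ENABLED.load(Ordering::Acquire)
}

/// What happens when the (level-triggered) line is asserted.
/// Returns true if a handler ran.
pub fn dispatch() -> bool {
    if !irq_enabled() {
        // Masked: the pending interrupt simply waits until `set_isr` runs.
        return false;
    }
    // Copy the fn pointer out so the lock is not held while the handler runs.
    let handler = *ISR.lock().unwrap();
    match handler {
        Some(h) => {
            h();
            true
        }
        // Not reachable in this design, since enabling requires a handler.
        None => false,
    }
}
```

Because set_isr is the only place that unmasks the line, a pending bit left over from a previous session stays pending harmlessly until a handler exists.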

@alexbohm
Author

I do seem to be hitting the same issue with the embassy_dhcp example. Again, it seems like an unhandled interrupt is being raised inside initialize() after a probe-rs reset.

I couldn't get a debug build of the example to run; it seems to hit a problem in the malloc compat layer, but that looks like a separate issue:

```text
Starting wifi


!! A panic occured in 'esp-wifi/src/compat/malloc.rs', at line 13, column 14

PanicInfo {
    payload: Any { .. },
    message: Some(
        already borrowed: BorrowMutError,
    ),
    location: Location {
        file: "esp-wifi/src/compat/malloc.rs",
        line: 13,
        col: 14,
    },
    can_unwind: true,
    force_no_backtrace: false,
}

Backtrace:

0x420a5b4a
0x4201bb8c
0x4201b756
0x420198a2
0x42022bb2
0x42022f58
0x42017218
0x4201730c
```

Here are the launch.json and tasks.json files I used with the embassy_dhcp example:
https://gist.github.com/alexbohm/c42a979f646d55e5cb0fdec0628f6a2d

Going to poke a bit more on this over the weekend and see what I can figure out.

@bjoernQ
Contributor

bjoernQ commented Sep 25, 2023

The latter would suggest that the application doesn't fully re-initialize on reset, i.e. HEAP was borrowed at the moment of the reset, and the reset didn't clear that state.
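The "already borrowed: BorrowMutError" message in the log above is what Rust's RefCell reports when a mutable borrow is requested while another is still outstanding. The sketch below reproduces that state with the standard library only; Heap and try_alloc are hypothetical stand-ins for the HEAP used by esp-wifi's malloc compat, and std::mem::forget models a borrow that was never released because static state survived the reset. It illustrates the hypothesis; it does not confirm it.

```rust
use std::cell::{BorrowMutError, RefCell};

/// Hypothetical stand-in for the allocator state behind the malloc compat layer.
pub struct Heap {
    pub used: usize,
}

/// Allocation path that needs exclusive access to the heap state.
/// `try_borrow_mut` returns Err(BorrowMutError) in the situation where
/// `borrow_mut` would panic, as in the log above.
pub fn try_alloc(heap: &RefCell<Heap>, size: usize) -> Result<usize, BorrowMutError> {
    let mut state = heap.try_borrow_mut()?;
    state.used += size;
    Ok(state.used)
}
```

Usage: after `std::mem::forget(heap.borrow_mut())` the borrow flag stays set for the lifetime of the RefCell, so every later try_alloc fails, which mirrors state that is never re-initialized.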

@alexbohm
Author

I'm seeing the malloc error even after a full power cycle, but only with a debug build. A release build seems to work, so maybe I need to check whether something is off with the opt-level settings of the examples.

@MabezDev
Member

I spoke to the hardware team and the following modules are left active through a USB-SERIAL-JTAG reset:

  • TRACE
  • ROOT_CLK_CONTROL
  • JTAG
  • TEE_APM
  • CPU_TIMEOUT
  • AHB_TIMEOUT
  • MEM_MONITOR
  • AHB_BUS
  • ASSIST_DEBUG
  • USB_DEVICE (this is USB serial jtag's internal name)
  • MODEM

The last one on that list, MODEM, is most likely our issue here. I am, however, quite surprised that the code snippet above, which writes to the reset registers, didn't fix it. We must be missing something else. I've spent a little while looking through the esp-idf code but haven't found anything glaringly obvious yet.

@MabezDev
Member

I seem to have solved the MODEM_TIMEOUT issue by enabling the interrupt and creating a handler like so.

```rust
#[cfg(feature = "wifi")]
#[interrupt]
fn MODEM_PERI_TIMEOUT() {
    warn!("MODEM_PERI_TIMEOUT fired");
    let hp = unsafe { &*esp32c6::HP_SYS::PTR };
    hp.modem_peri_timeout_conf().modify(|_, w| w.modem_peri_timeout_int_clear().set_bit());
}
```

I'm still unable to stop the WIFI_BB interrupt from firing constantly.

@MabezDev
Member

I don't think WIFI_BB is the actual problem; it's WIFI_PWR. In particular, there are cases where the WIFI_PWR interrupt triggers before crate::wifi::os_adapter::ISR_INTERRUPT_1 is initialized, meaning the interrupt doesn't actually do anything! WIFI_PWR is a level-triggered interrupt, so it won't clear until the peripheral clears it.

Another confusing detail: bit 3 (WIFI_PWR), when printed as a number, is 4, which may be another source of confusion about which interrupt is actually firing.
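The mix-up between a bit's position and its printed value is easy to check: bit n of a status word has the value 1 << n, so a printed value of 4 (0b100) means only the bit at zero-based index 2 is set, which is the third bit when counting from one (assuming that is how "bit 3" is counted above). The helper below is a hypothetical sketch, not part of esp-hal; it lists zero-based indices so a raw value is not mistaken for an index.

```rust
/// Returns the zero-based indices of the bits set in `status`, lowest first.
pub fn set_bit_indices(status: u32) -> Vec<u32> {
    (0..32).filter(|&i| status & (1 << i) != 0).collect()
}
```

For example, set_bit_indices(4) yields [2], while set_bit_indices(1 << 3), i.e. a printed value of 8, yields [3].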

I'm going to try moving the ISR enable into the blob API; hopefully it'll work 🤞.

@MabezDev
Member

I have a draft PR available here: #390, which only enables the ISR once the ISR has been installed by the blobs.

With this patch, I flawlessly flashed my C6 10+ times in a row. Please do try it out if you can; it would be good to see the results.

The esp32c3 seems to behave a little differently: it at least no longer gets stuck in an interrupt loop, but the WiFi API often returns 12300, which is ESP_ERR_WIFI_TIMEOUT. Other chips are untested.
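The number 12300 is easier to recognise in hex: 12300 = 0x300C = 0x3000 + 12, i.e. offset 12 in ESP-IDF's Wi-Fi error range. The sketch copies a few constants from ESP-IDF's esp_err.h and esp_wifi.h into Rust; wifi_err_name is a hypothetical helper, not an esp-wifi API.

```rust
/// Base of the Wi-Fi error range in ESP-IDF (`ESP_ERR_WIFI_BASE` in esp_err.h).
pub const ESP_ERR_WIFI_BASE: i32 = 0x3000;
/// A few codes from ESP-IDF's esp_wifi.h, each an offset from the base.
pub const ESP_ERR_WIFI_NOT_INIT: i32 = ESP_ERR_WIFI_BASE + 1;
pub const ESP_ERR_WIFI_NOT_STARTED: i32 = ESP_ERR_WIFI_BASE + 2;
pub const ESP_ERR_WIFI_TIMEOUT: i32 = ESP_ERR_WIFI_BASE + 12;

/// Maps a raw esp_err_t value to its name for the constants above; None otherwise.
pub fn wifi_err_name(code: i32) -> Option<&'static str> {
    match code {
        ESP_ERR_WIFI_NOT_INIT => Some("ESP_ERR_WIFI_NOT_INIT"),
        ESP_ERR_WIFI_NOT_STARTED => Some("ESP_ERR_WIFI_NOT_STARTED"),
        ESP_ERR_WIFI_TIMEOUT => Some("ESP_ERR_WIFI_TIMEOUT"),
        _ => None,
    }
}
```

Printing unknown codes with {:#x} makes the 0x3000 base stand out, which is usually enough to find the matching constant in esp_wifi.h.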

Labels: None yet
Projects: Archived in project
3 participants