Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA update (TZ-201) #24

Closed
FHFS opened this issue Jul 24, 2023 · 31 comments
Closed

OTA update (TZ-201) #24

FHFS opened this issue Jul 24, 2023 · 31 comments

Comments

@FHFS
Copy link

FHFS commented Jul 24, 2023

Hello,

I'm trying to do a OTA update on the esp-thread-br. I've enabled CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE in menuconfig, and the file br_ota_image is generated in the build folder.

I'm serving through a python http server with: python -m http.server -b 192.168.3.47 8070

However when I call ota download http://192.168.3.47:8070/br_ota_image on the esp-thread-br, I get an error. Does else have this problem?
I've tried multiple times, it only downloads the first time after an erase. Is it a flash issue?

> ota download http://192.168.3.47:8070/br_ota_image

I (26815) BR_OTA: Downloading http://192.168.3.47:8070/br_ota_image, RCP target file /rcp_fw/ot_rcp_1/rcp_image
I (29655) BR_OTA: subfile_info: tag 0xff size 84 offset 0
I (29655) BR_OTA: subfile_info: tag 0x0 size 72 offset 84
I (29655) BR_OTA: subfile_info: tag 0x1 size 24 offset 156
I (29655) BR_OTA: subfile_info: tag 0x2 size 20976 offset 180
I (29655) BR_OTA: subfile_info: tag 0x3 size 3072 offset 21156
I (29655) BR_OTA: subfile_info: tag 0x4 size 312896 offset 24228
I (29655) BR_OTA: subfile_info: tag 0x5 size 1577136 offset 337124
I (29665) BR_OTA: Downloaded 1108 bytes
…..
I (34715) BR_OTA: Downloaded 99412 bytes
E (40025) BR_OTA: download_ota_image(258): Failed to write OTA
E (40025) BR_OTA: esp_br_http_ota(292): Failed to download ota image
Failed to download image
I (40025) wifi:state: run -> init (0)
I (40025) wifi:pm stop, total sleep time: 1075574 us / 35919614 us
I (40025) wifi:<ba-del>idx:0, tid:0
I (40025) wifi:new:<6,0>, old:<6,2>, ap:<255,255>, sta:<6,2>, prof:1
I (40075) wifi:flush txq
I (40075) wifi:stop sw txq
I (40075) wifi:lmac stop hw txq
I (40075) wifi:Deinit lldesc rx mblock:10
@github-actions github-actions bot changed the title OTA update OTA update (TZ-201) Jul 24, 2023
@FHFS
Copy link
Author

FHFS commented Jul 28, 2023

Does anyone have any idea how I can fix this? or what is causing this?

@zwx1995esp
Copy link
Collaborator

@FHFS Hi, sorry for the delay, I will go on your issue right now.
I try to reproduce your issue but failed, all things work well on my side. So could you please provide more details for your issue?
1.Do you try to use the OTA step shown in the BR docs? And enable a http server using openssl?
2.You say that it only downloads the first time after an erase. But in your logs, the OTA image tried to be saved on /rcp_fw/ot_rcp_1/rcp_image which might be empty at the first time, if you re-run OTA, it should try to be saved on /rcp_fw/ot_rcp_0/rcp_image.refer to the OTA Image Rollback Mechanism.
3. For your logs, we can see that:

I (34715) BR_OTA: Downloaded 99412 bytes
E (40025) BR_OTA: download_ota_image(258): Failed to write OTA

the timestamp for the first line is 34715ms and the second is 40025ms, it means that after almost 5s sence the last downloaded data from the server. Actually our http client have a default timeout 5s for receiving data from server, if timeout it will return right now. So there might be some network issue(maybe some congestions or network broken)

@FHFS
Copy link
Author

FHFS commented Aug 1, 2023

1.Do you try to use the OTA step shown in the BR docs? And enable a http server using openssl?

No, I am just using a python http server. I will try to follow the guide with openssl, and get back to you.

2.You say that it only downloads the first time after an erase. But in your logs, the OTA image tried to be saved on /rcp_fw/ot_rcp_1/rcp_image which might be empty at the first time, if you re-run OTA, it should try to be saved on /rcp_fw/ot_rcp_0/rcp_image.refer to the OTA Image Rollback Mechanism.

I've included the logs of the first and second time downloading the OTA file:
First time:

I (1659169) OPENTHREAD: Received PIOI (1671149) OPENTHREAD: Received PIO
> ota download http://192.168.2.114:8070/br_ota_image

I (1677139) BR_OTA: Downloading http://192.168.2.114:8070/br_ota_image, RCP target file /rcp_fw/ot_rcp_1/rcp_image

I (1677279) BR_OTA: subfile_info: tag 0xff size 84 offset 0

I (1677279) BR_OTA: subfile_info: tag 0x0 size 72 offset 84

I (1677279) BR_OTA: subfile_info: tag 0x1 size 24 offset 156

I (1677279) BR_OTA: subfile_info: tag 0x2 size 21056 offset

I (1677279) BR_OTA: subfile_info: tag 0x3 size 3072 offset 2
I (1677279) BR_OTA: subfile_info: tag 0x4 size 318416 offset 24308

I (1677289) BR_OTA: subfile_info: tag 0x5 size 1597776 offset 342724

I (1677299) BR_OTA: Downloaded 1108 bytes
....
.... removed some lines showing downloaded byte count
....
I (1682309) BR_OTA: Downloaded 95316 bytes
I (1683129) OPENTHREAD: Received PIO
E (1688509) BR_OTA: download_ota_image(258): Failed to write OTA
E (1688509) BR_OTA: esp_br_http_ota(292): Failed to download ota image
Failed to download imageI (1688509) wifi:state: run -> init (0)
I (1688509) wifi:pm stop, total sleep time: 892964 us / 1684291987 us

I (1688509) wifi:new:<13,0>, old:<13,2>, ap:<255,255>, sta:<13,2>, prof:1
I (1688559) wifi:flush txq
I (1688559) wifi:stop sw txq
I (1688559) wifi:lmac stop hw txq
I (1688559) wifi:Deinit lldesc rx mblock:10

Second time:

I (133318) OPENTHREAD: Received PIO
> ota download I (146318) OPENTHREAD: Received PIO
> ota download http://192.168.2.114:8070/br_ota_image

I (152218) BR_OTA: Downloading http://192.168.2.114:8070/br_ota_image, RCP target file /rcp_fw/ot_rcp_1/rcp_image

E (152278) BR_OTA: download_ota_image(217): Failed to open target file
E (152278) BR_OTA: esp_br_http_ota(292): Failed to download ota image
Failed to download imageI (152278) wifi:state: run -> init (0)
I (152278) wifi:pm stop, total sleep time: 875004 us / 148052536 us

I (152278) wifi:new:<13,0>, old:<13,2>, ap:<255,255>, sta:<13,2>, prof:1
I (152318) wifi:flush txq
I (152318) wifi:stop sw txq
I (152318) wifi:lmac stop hw txq
I (152318) wifi:Deinit lldesc rx mblock:10

So there might be some network issue(maybe some congestions or network broken)

I don't think so, I've used curl and wget to download the file from multiple devices on the same wifi network and the file is downloaded without a problem.

@zwx1995esp
Copy link
Collaborator

For 1. I have tried for almost 20 times, no errors occur and all passed.

For 2. for your test scenario, could you please share your whole test step by step? (maybe start from how to build the example and how to set up a http server and copy which file on the server and everythings related to your test environment, the more details the more better) I want to use your step to reproduce this issue.

@FHFS
Copy link
Author

FHFS commented Aug 2, 2023

I will do a write-up for my scenario. I will get back to you asap.

@zwx1995esp
Copy link
Collaborator

I will do a write-up for my scenario. I will get back to you asap.

Thanks a lot!

@FHFS
Copy link
Author

FHFS commented Aug 3, 2023

I've followed this guide. It does not seem very comprehensive to me.

Lets start with my build setup:

Build command: docker run --rm -v ${workspaceFolder}:/IoT/esp-thread-br -w /IoT/esp-thread-br/examples/basic_thread_border_router espressif/idf:latest@sha256:62387daaa65bfc7dae4f04a4b4c573a592a5a8ecca1a04018357ce5ee64205cb idf.py build

esp-thread-br git commit: 7fd29e7

sdkconfig:

CONFIG_OPENTHREAD_BR_AUTO_UPDATE_RCP=y
CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE=y
CONFIG_OPENTHREAD_BR_AUTO_START=y
CONFIG_OPENTHREAD_BR_START_WEB=y

CONFIG_OPENTHREAD_NETWORK_NAME="NEST-PAN-5961"
CONFIG_OPENTHREAD_NETWORK_CHANNEL=19
CONFIG_OPENTHREAD_NETWORK_PANID=0x5961
CONFIG_OPENTHREAD_NETWORK_EXTPANID="1952473fb4d78dda"
CONFIG_OPENTHREAD_NETWORK_MASTERKEY="ab11344873cc3fc3889ab62784e9e05d"
CONFIG_OPENTHREAD_NETWORK_PSKC="104810e2315100afd6bc9215a6bfac53"

CONFIG_EXAMPLE_WIFI_SSID="myrouter"
CONFIG_EXAMPLE_WIFI_PASSWORD="mypassword"

CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE creates a OTA file in the build folder. I copy the file to a empty folder where I start the http or https server from the commandline.

http server called from commandline: python -m http.server -b 192.168.3.47 8070

https server called from python script:

import http.server, ssl, socketserver
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("ca_cert.pem","ca_key.pem")
server_address = ("192.168.3.47", 8070)
handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(server_address, handler) as httpd:
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()

When generating the pem with: openssl req -x509 -newkey rsa:2048 -keyout ca_key.pem -out ca_cert.pem -days 365-nodes
CommonName: "192.168.3.47"

Calling the script with: python script.py from the folder contain the br_ota_image file.

I can use my browser to go to https://192.168.3.47:8070/ and download the br_ota_image file if I ignore the certificate warning.

It does not matter if I use a HTTP or HTTPS server. The result is the same on my esp-thread-br

I did the same thing again with a breakpoint on components/esp_br_http_ota/src/esp_br_http_ota.c:258. Here I can see r is -1 returned by write_file_for_length. It seems the write operation was retried 5 times. I also noticed the first part of the download goes pretty quickly, and slows down when reaching 70000 bytes and slows down further until 95000b

@zwx1995esp
Copy link
Collaborator

I've followed this guide. It does not seem very comprehensive to me.

Lets start with my build setup:

Build command: docker run --rm -v ${workspaceFolder}:/IoT/esp-thread-br -w /IoT/esp-thread-br/examples/basic_thread_border_router espressif/idf:latest@sha256:62387daaa65bfc7dae4f04a4b4c573a592a5a8ecca1a04018357ce5ee64205cb idf.py build

esp-thread-br git commit: 7fd29e7

sdkconfig:

CONFIG_OPENTHREAD_BR_AUTO_UPDATE_RCP=y
CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE=y
CONFIG_OPENTHREAD_BR_AUTO_START=y
CONFIG_OPENTHREAD_BR_START_WEB=y

CONFIG_OPENTHREAD_NETWORK_NAME="NEST-PAN-5961"
CONFIG_OPENTHREAD_NETWORK_CHANNEL=19
CONFIG_OPENTHREAD_NETWORK_PANID=0x5961
CONFIG_OPENTHREAD_NETWORK_EXTPANID="1952473fb4d78dda"
CONFIG_OPENTHREAD_NETWORK_MASTERKEY="ab11344873cc3fc3889ab62784e9e05d"
CONFIG_OPENTHREAD_NETWORK_PSKC="104810e2315100afd6bc9215a6bfac53"

CONFIG_EXAMPLE_WIFI_SSID="myrouter"
CONFIG_EXAMPLE_WIFI_PASSWORD="mypassword"

CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE creates a OTA file in the build folder. I copy the file to a empty folder where I start the http or https server from the commandline.

http server called from commandline: python -m http.server -b 192.168.3.47 8070

https server called from python script:

import http.server, ssl, socketserver
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("ca_cert.pem","ca_key.pem")
server_address = ("192.168.3.47", 8070)
handler = http.server.SimpleHTTPRequestHandler
with socketserver.TCPServer(server_address, handler) as httpd:
    httpd.socket = context.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()

When generating the pem with: openssl req -x509 -newkey rsa:2048 -keyout ca_key.pem -out ca_cert.pem -days 365-nodes CommonName: "192.168.3.47"

Calling the script with: python script.py from the folder contain the br_ota_image file.

I can use my browser to go to https://192.168.3.47:8070/ and download the br_ota_image file if I ignore the certificate warning.

It does not matter if I use a HTTP or HTTPS server. The result is the same on my esp-thread-br

I did the same thing again with a breakpoint on components/esp_br_http_ota/src/esp_br_http_ota.c:258. Here I can see r is -1 returned by write_file_for_length. It seems the write operation was retried 5 times. I also noticed the first part of the download goes pretty quickly, and slows down when reaching 70000 bytes and slows down further until 95000b

Copy that, I will try ASAP.

@zwx1995esp
Copy link
Collaborator

Hi, I reproduce this issue, I can investigate this issue now. but it is too hard for me to reproduce it on my side....

@FHFS
Copy link
Author

FHFS commented Aug 8, 2023

Great news you were able to reproduce the issue. Why is it hard to reproduce for you?

Also you were able to "ota download" without problems, which step in my writeup is different from your setup?

@zwx1995esp
Copy link
Collaborator

@FHFS Hi, Sorry for the delay, these days I jumped for another work, and I do not have much more time on it, I synced your issue to @gytxxsy and he will help you to investigate on your issues.

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 10, 2023

I'll be focusing on this issue recently and will update as soon as I make progress.

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 11, 2023

I have successfully reproduced this issue, it seems it is a problem caused by insufficient storage.
You can try modifying esp-thread-br/examples/basic_thread_border_router/partition.csv to temporarily fix this issue. Below is an example:

# Name,   Type, SubType, Offset,  Size, Flags
# Note: if you have increased the bootloader size, make sure to update the offsets to avoid overlap
nvs,        data, nvs,      , 0x6000,
otadata,    data, ota,      , 0x2000,
phy_init,   data, phy,      , 0x1000,
ota_0,      app,  ota_0,    , 1500K,
ota_1,      app,  ota_1,    , 1500K,
web_storage,data, spiffs,   , 100K,
rcp_fw,     data, spiffs,   , 820K,

After I use this configuration, I can solve the problem of OTA download failure. I hope this helps you.
In addition, the size of the OTA image I use is different from yours, so this configuration may not be fully applicable to yours, and you may need to make some adjustments.

@FHFS
Copy link
Author

FHFS commented Aug 11, 2023

I've updated the partition table. Got some overflow and decided to shrink by optimizing for size instead of debug with CONFIG_COMPILER_OPTIMIZATION_SIZE=y

The download finishes, but I get stuck at the end. These are my logs:

I (122370) BR_OTA: 1436832/1436832 bytes
I (122380) esp_image: segment 0: paddr=001a0020 vaddr=3c110020 size=4366ch (276076) map
I (122410) esp_image: segment 1: paddr=001e3694 vaddr=3fc96800 size=04f00h ( 20224) 
I (122420) esp_image: segment 2: paddr=001e859c vaddr=4037
I (122420) esp_image: segment 3: paddr=001f0020 vaddr=42000020 size=103f18h (1064728) map
I (122560) esp_image: segment 4: paddr=002f3f40 vaddr=4037ba7c size=0ad50h ( 44368) 
I (122580) esp_image: segment 0: paddr=001a0020 vaddr=3c110020 size=4366ch (276076) map
I (122610) esp_image: segment 1: paddr=001e3694 vaddr=3fc96800 size=04f00h ( 20224)I (122620) esp_image: segment 2: paddr=001e859c vaddr=40374000 size=07a7ch ( 31356) 
I (122620) esp_image: segment 3: paddr=001f0020 vaddr=42000020 size=103f18h (1064728) map
I (122760) esp_image: segment 4: paddr=002f3f40 vaddr=4037ba7c size=0ad50h ( 44368) 
I (122960) wifi:state: run -> init (0)
I (122960) wifi:pm stop, total sleep time: 715764 us / 118751525 us

I (122960) wifi:new:<13,0>, old:<13,2>, ap:<255,255>, sta:<13,2>, prof:1
I (123010) wifi:flush txq
I (123010) wifi:stop sw txq
I (123010) wifi:lmac stop hw txq
I (123010) wifi:Deinit lldesc rx mblock:10

When I restart idf monitor, the esp-thread-br starts up but the image hash is not ok?:

--- idf_monitor on /dev/ttyACM0 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x15 (USB_UART_CHIP_RESET),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x4037ae3a
0x4037ae3a: esp_cpu_wait_for_intr at /home/fhfs/Repos/idf50/components/esp_hw_support/cpu.c:145

SPIWP:0xee
mode:DIO, clock div:2
load:0x3fce3820,len:0x1718
load:0x403c9700,len:0x4
load:0x403c9704,len:0xbec
load:0x403cc700,len:0x2e60
SHA-256 comparison failed:
Calculated: 71089d92facc3c0042909678e8442d671c238a59bdc3b1d83ca670f997cb71ce
Expected: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Attempting to boot anyway...
entry 0x403c990c
I (45) boot: ESP-IDF v5.2-dev-1948-g02f5b9a898-dirty 2nd stage bootloader
I (46) boot: compile time Aug 11 2023 13:31:58
I (47) boot: Multicore bootloader
I (51) boot: chip revision: v0.1
I (55) boot.esp32s3: Boot SPI Speed : 40MHz
I (59) boot.esp32s3: SPI Mode       : DIO
I (64) boot.esp32s3: SPI Flash Size : 4MB
I (69) boot: Enabling RNG early entropy source...
I (74) boot: Partition Table:
I (78) boot: ## Label            Usage          Type ST Offset   Length
I (85) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (93) boot:  1 otadata          OTA data         01 00 0000f000 00002000
I (100) boot:  2 phy_init         RF data          01 01 00011000 00001000
I (108) boot:  3 ota_0            OTA app          00 10 00020000 00177000
I (115) boot:  4 ota_1            OTA app          00 11 001a0000 00177000
I (123) boot:  5 web_storage      Unknown data     01 82 00317000 00019000
I (130) boot:  6 rcp_fw           Unknown data     01 82 00330000 000cd000
I (138) boot: End of partition table
I (142) esp_image: segment 0: paddr=001a0020 vaddr=3c110020 size=4366ch (276076) map
I (197) esp_image: segment 1: paddr=001e3694 vaddr=3fc96800 size=04f00h ( 20224) load
I (202) esp_image: segment 2: paddr=001e859c vaddr=40374000 size=07a7ch ( 31356) load
I (209) esp_image: segment 3: paddr=001f0020 vaddr=42000020 size=103f18h (1064728) map
I (392) esp_image: segment 4: paddr=002f3f40 vaddr=4037ba7c size=0ad50h ( 44368) load
I (409) boot: Loaded app from partition at offset 0x1a0000
I (409) boot: Disabling RNG early entropy source...
I (410) cpu_start: Multicore app
I (414) cpu_start: Pro cpu up.
I (417) cpu_start: Starting app cpu, entry point is 0x40375334
0x40375334: call_start_cpu1 at /home/fhfs/Repos/idf50/components/esp_system/port/cpu_start.c:170

I (0) cpu_start: App cpu up.
I (433) cpu_start: Pro cpu start user code
I (433) cpu_start: cpu freq: 160000000 Hz
I (433) cpu_start: Application information:
I (434) cpu_start: Project name:     esp_ot_br
I (434) cpu_start: App version:      3f01d8d-dirty
I (434) cpu_start: Compile time:     Aug 11 2023 13:31:53
I (434) cpu_start: ELF file SHA256:  569134500...
I (434) cpu_start: ESP-IDF:          v5.2-dev-1948-g02f5b9a898-dirty
I (435) cpu_start: Min chip rev:     v0.0
I (435) cpu_start: Max chip rev:     v0.99 
I (435) cpu_start: Chip rev:         v0.1
I (435) heap_init: Initializing. RAM available for dynamic allocation:
I (436) heap_init: At 3FCACB00 len 0003CC10 (243 KiB): DRAM
I (436) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (436) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (437) heap_init: At 600FE010 len 00001FD8 (7 KiB): RTCRAM
I (438) spi_flash: detected chip: generic
I (439) spi_flash: flash io: dio
I (439) sleep: Configure to isolate all GPIO pins in sleep state
I (440) sleep: Enable automatic switching of GPIO sleep configuration
I (441) app_start: Starting scheduler on CPU0
I (441) app_start: Starting scheduler on CPU1
I (441) main_task: Started on CPU0
I (451) main_task: Calling app_main()
I (551) example_connect: Start example_connect.
I (551) pp: pp rom version: e7ae62f
I (551) net80211: net80211 rom version: e7ae62f

Questions:

  1. Why doesn't the OTA update finish. Or it finishes but does not automatically reboots?
  2. I don't understand why shrinking OTA_0/1 and expanding rcp_fw partitions fixes the first problem with downloading.
  3. Why am I getting the failed SHA256 comparison?

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 14, 2023

  1. It seems that the OTA has been updated, but there is some failure when connecting to wifi, so it gets stuck.
  2. Because the direct cause of the failure of the firmware update is the lack of rcp_fw space, so it needs to be expanded (reducing OTA_0/1 is to save more space for rcp_fw, because the flash of the BR board is only 4MB ).
  3. I am not clear about this and trying to reproduce this problem.

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 17, 2023

There are some mistakes in my description above, corrected here:
In 1, the program doesn't get stuck when connecting to wifi, but when the OTA is done and the program restarts.
In this regard, we think that there may be some problems with the OTA image used, which can also explain why the hash is incorrect after restarting.

I've tried many times on my side, but can't reproduce this problem. However, you can make sure that you copied the correct OTA image to the correct server path. I thought this might be of some help.

@FHFS
Copy link
Author

FHFS commented Aug 17, 2023

I just tried it again with the br_ota_image I just generated. Same result. I will try this again with a fresh clone of the repository and let you know the results.

@FHFS
Copy link
Author

FHFS commented Aug 17, 2023

I also tried the same thing with a fresh clone.

  1. I changed the partitions.csv
  2. Changed sdkconfig
git diff examples/basic_thread_border_router/sdkconfig | grep '^[+-][^+-]'
+# CONFIG_BOOTLOADER_VDDSDIO_BOOST_1_8V is not set
-CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
-# CONFIG_ESPTOOLPY_FLASHFREQ_40M is not set
+# CONFIG_ESPTOOLPY_FLASHFREQ_80M is not set
+CONFIG_ESPTOOLPY_FLASHFREQ_40M=y
-CONFIG_ESPTOOLPY_FLASHFREQ="80m"
+CONFIG_ESPTOOLPY_FLASHFREQ="40m"
-# CONFIG_ESPTOOLPY_HEADER_FLASHSIZE_UPDATE is not set
+CONFIG_ESPTOOLPY_HEADER_FLASHSIZE_UPDATE=y
-# CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE is not set
-# CONFIG_OPENTHREAD_BR_AUTO_START is not set
-# CONFIG_OPENTHREAD_BR_START_WEB is not set
+CONFIG_OPENTHREAD_BR_CREATE_OTA_IMAGE=y
+CONFIG_OPENTHREAD_BR_AUTO_START=y
+CONFIG_OPENTHREAD_BR_START_WEB=y
-CONFIG_EXAMPLE_WIFI_SSID="myssid"
-CONFIG_EXAMPLE_WIFI_PASSWORD="mypassword"
+CONFIG_EXAMPLE_WIFI_SSID="FreshTomato24"
+CONFIG_EXAMPLE_WIFI_PASSWORD="asdf1234"
-CONFIG_COMPILER_OPTIMIZATION_DEBUG=y
-# CONFIG_COMPILER_OPTIMIZATION_SIZE is not set
+# CONFIG_COMPILER_OPTIMIZATION_DEBUG is not set
+CONFIG_COMPILER_OPTIMIZATION_SIZE=y
-# CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP is not set
+CONFIG_ESP_HTTPS_OTA_ALLOW_HTTP=y
-CONFIG_FREERTOS_TASK_FUNCTION_WRAPPER=y
-CONFIG_OPENTHREAD_NETWORK_NAME="OpenThread-ESP"
-CONFIG_OPENTHREAD_NETWORK_CHANNEL=15
-CONFIG_OPENTHREAD_NETWORK_PANID=0x1234
-CONFIG_OPENTHREAD_NETWORK_EXTPANID="dead00beef00cafe"
-CONFIG_OPENTHREAD_NETWORK_MASTERKEY="00112233445566778899aabbccddeeff"
+CONFIG_OPENTHREAD_NETWORK_NAME="NEST-PAN-5961"
+CONFIG_OPENTHREAD_NETWORK_CHANNEL=19
+CONFIG_OPENTHREAD_NETWORK_PANID=0x5961
+CONFIG_OPENTHREAD_NETWORK_EXTPANID="1952473fb4d78dda"
+CONFIG_OPENTHREAD_NETWORK_MASTERKEY="ab11344873cc3fc3889ab62784e9e05d"
-# CONFIG_OPENTHREAD_COMMISSIONER is not set
-# CONFIG_OPENTHREAD_JOINER is not set
+CONFIG_OPENTHREAD_COMMISSIONER=y
+CONFIG_OPENTHREAD_JOINER=y
-CONFIG_OPTIMIZATION_LEVEL_DEBUG=y
-CONFIG_COMPILER_OPTIMIZATION_LEVEL_DEBUG=y
-CONFIG_COMPILER_OPTIMIZATION_DEFAULT=y
-# CONFIG_OPTIMIZATION_LEVEL_RELEASE is not set
-# CONFIG_COMPILER_OPTIMIZATION_LEVEL_RELEASE is not set
+# CONFIG_OPTIMIZATION_LEVEL_DEBUG is not set
+# CONFIG_COMPILER_OPTIMIZATION_LEVEL_DEBUG is not set
+# CONFIG_COMPILER_OPTIMIZATION_DEFAULT is not set
+CONFIG_OPTIMIZATION_LEVEL_RELEASE=y
+CONFIG_COMPILER_OPTIMIZATION_LEVEL_RELEASE=y
-# CONFIG_OTA_ALLOW_HTTP is not set
+CONFIG_OTA_ALLOW_HTTP=y
  1. esp-idf commit 02f5b9a8982db2d59b593fab4e01ecffd35cfdff
  2. build ot-rcp
  3. build esp-thread-br
  4. serve br_ota_image
  5. flash and monitor esp-thread-br
  6. ota download http://192.168.3.47/br_ota_image

@FHFS
Copy link
Author

FHFS commented Aug 18, 2023

I meant to say that I got stuck at the same part as before

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 18, 2023

Yes, I got it am trying to reproduce the issue.

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 18, 2023

I used your config and commit and reproduced the wrong hash. However, the problem of program stuck was not reproduced. After testing, I think it is caused by the macro CONFIG_ESPTOOLPY_HEADER_FLASHSIZE_UPDATE being set to true, you can close it and try building the image again.
To close the macro, you can run idf.py menuconfig and diselect Serial flasher config ---> Detect flash size when flashing bootloader.
In fact, even without using OTA, but directly using idf.py -p PORT flash, the hash value will also be wrong (if you turn on the macro said above).

@FHFS
Copy link
Author

FHFS commented Aug 21, 2023

Indeed, when I disable the FLASHSIZE_UPDATE config, the wrong hash does not show.

I will try to debug the freezing, maybe I will get some hint about what is causing this.

@FHFS
Copy link
Author

FHFS commented Aug 22, 2023

When debugging I just hit esp_restart() and then it is stuck.

I've searched on esp-idf issues, maybe this issue #9887 is related.

In this issue anatol-grabowski mentions creating a task to restart using xTaskCreate. This works for me, but only with a larger stack size, 1024 stack overflows. When I changed the stack size to 4096 it works perfectly.

JustQY mentions esp_system_abort which also works. However using a task to restart seems to me to be the proper way to do this.

@gytxxsy Why do you think I get stuck in esp_restart and you do not? Can you change esp_restart() to a xTaskCreate with stack size 4096?

@gytxxsy
Copy link
Contributor

gytxxsy commented Aug 23, 2023

We usually don't create a new task to call esp_restart. Of course, if you suspect that the problem is caused by stack overflow, it's a good idea to increase the stack of ot_task_worker to test.
Can you give me some more information to reproduce the issue? The information needed are as follows:

  1. idf commit ID and esp-thread-br commit ID.
  2. The modification of the codes(such as partitions.csv I told you serveral days ago).
  3. After fullclean and set target, which configurations did you change via menuconfig?

I think reproducing the issue locally will help me to find the root reason of this issue.

@FHFS
Copy link
Author

FHFS commented Aug 24, 2023

We usually don't create a new task to call esp_restart. Of course, if you suspect that the problem is caused by stack overflow, it's a good idea to increase the stack of ot_task_worker to test.

I increased the stack of the task to 10240, and now it does continue a little bit further. See log:


I (74574) BR_OTA: 1436672/1436912 bytes
I (74594) BR_OTA: 1436912/1436912 bytes
I (74594) esp_image: segment 0: paddr=001a0020 vaddr=3c110020 size=4367ch (276092) map
I (74644) esp_image: segment 1: paddr=001e36a4 vaddr=3fc96800 size=04f00h ( 20224) 
I (74654) esp_image: segment 2: paddr=001e85ac vaddr=40374000 size=07a6ch ( 31340) 
I (74654) esp_image: segment 3: paddr=001f0020 vaddr=42000020 size=103f44h (1064772) map
I (74864) esp_image: segment 4: paddr=002f3f6c vaddr=4037ba6c size=0ad60h ( 44384) 
I (74884) esp_image: segment 0: paddr=001a0020 vaddr=3c110020 size=4367ch (276092) map
I (74934) esp_image: segment 1: paddr=001e36a4 vaddr=3fc96800 size=04f00h ( 20224) 
I (74944) esp_image: segment 2: paddr=001e85ac vaddr=40374000 size=07a6ch ( 31340) 
I (74944) esp_image: segment 3: paddr=001f0020 vaddr=42000020 size=103f44h (1064772) map
I (75154) esp_image: segment 4: paddr=002f3f6c vaddr=4037ba6c size=0ad60h ( 44384) 
Done
I (75344) wifi:state: run -> init (0)
I (75344) wifi:pm stop, total sleep time: 804250 us / 71086740 us

I (75354) wifi:<ba-del>idx:0, tid:0
I (75354) wifi:new:<6,0>, old:<6,2>, ap:<255,255>, sta:<6,2>, prof:1
                                                                           I (75364) wifi:flush txq
I (75364) wifi:stop sw txq
I (75364) wifi:lmac stop hw txq
I (75374) wifi:Deinit lldesc rx mblock:10
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x40375971
0x40375971: esp_restart_noos at /home/fhfs/Repos/idf50/components/esp_system/port/soc/esp32s3/system_internal.c:158 (discriminator 1)

SPIWP:0xee
mode:DIO, clock div:2
load:0x3fce3820,len:0x1718
load:0x403c9700,len:0x4
load:0x403c9704,len:0xbec

@FHFS
Copy link
Author

FHFS commented Aug 24, 2023

Can you give me some more information to reproduce the issue? The information needed are as follows:

1. idf commit ID and esp-thread-br commit ID.
2. The modification of the codes(such as partitions.csv I told you serveral days ago).
3. After fullclean and set target, which configurations did you change via menuconfig?

I think reproducing the issue locally will help me to find the root reason of this issue.

  1. idf commit sha: 02f5b9a8982db2d59b593fab4e01ecffd35cfdff, esp-thread-br commit sha: 7fd29e78ca2713c7e7e09a96d60820a0dc6d66ff
  2. Only changes in paritions.csv
  3. After a fullclean I do not do a set target. I use this sdkconfig file

@FHFS
Copy link
Author

FHFS commented Aug 28, 2023

We usually don't create a new task to call esp_restart. Of course, if you suspect that the problem is caused by stack overflow, it's a good idea to increase the stack of ot_task_worker to test.

I increased the stack of the task to 10240, and now it does continue a little bit further. See log:

Here the esp32 did restart but the monitor had some problems reattaching or something.

I think if you make the task stack size configurable with menuconfig or increase it, we can close this issue.

@FHFS
Copy link
Author

FHFS commented Aug 31, 2023

@gytxxsy The increased stack size you suggested fixes the issue, how do you suggest we proceed?

@gytxxsy
Copy link
Contributor

gytxxsy commented Sep 8, 2023

I've tried the configuration you said above, but still can't reproduce the problem. I suggest you use our latest ESP-THREAD-BR Release v1.0-beta to test it (with adding the partitions.csv changes I told you about). Also, do you use the ESP Thread Border Router board v1.2 for test? If you are not using it but other hardware devices, please let me know and I will do analysis from this aspect. Thanks.

@gytxxsy
Copy link
Contributor

gytxxsy commented Sep 8, 2023

@FHFS In addition, basic_thread_border_router is just an example, so its codes are not unchangeable. If you can solve the current problem by modifying the task stack size, you can modify it directly and continue your development.

@chshu
Copy link
Collaborator

chshu commented Sep 8, 2023

No SDK change required, closing.

@chshu chshu closed this as completed Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants