Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In ESP32 core version V3.0.5 speed is 30-50% slower than V2.0.17 with Arduino Nano ESP32. #10397

Closed
1 task done
ednieuw opened this issue Oct 2, 2024 · 14 comments · Fixed by #10428
Closed
1 task done
Assignees
Labels
Area: Performance Issue related to performance problems and improvements Priority: High 🗻 Issues with high priority which needs to be solved first. Status: In Progress Issue is in progress
Milestone

Comments

@ednieuw
Copy link

ednieuw commented Oct 2, 2024

Board

Arduino Nano ESP32

Device Description

Arduino Nano ESP32

Hardware Configuration

Nothing attached

Version

v3.0.4

IDE Name

Arduino IDE 1.0 and 2.0

Operating System

Windows 11

Flash frequency

NA

PSRAM enabled

no

Upload speed

NB Arduino settings in board

Description

I noticed while transferring a large Arduino sketch on an Arduino Nano ESP32 from ESP32 core version 2.0.17 to ESP32 core V3..0.5 the software loop speeds dropped from 220,000 to 169,000 loops per second.
The speed reduction is also with versions 3.0.1 - 3.0.4

With the sketch below the speed reduction is even more and almost 50%.
I did not test the ESP32 Zero and C3-12F board with the ESP32 V2 core but I expect the same results in speed differences

Below the 'Loops per second' when adding one to the loop counter for one second.
No difference noticed using uint64_t or uint32_t

*** Arduino Nano ESP32 board on Arduino ESP32 core V2.0.17
Loops per second: 737744
Loops per second: 734583
Loops per second: 734451
Loops per second: 734451
Loops per second: 734451
Loops per second: 734451

*** Processors on V3.0.5

Arduino Nano ESP32
Loops per second: 380670
Loops per second: 380709
Loops per second: 380726
Loops per second: 380724
Loops per second: 380727

ESP32 S3 Zero
Loops per second: 382485
Loops per second: 382488
Loops per second: 382487
Loops per second: 382487
Loops per second: 382487

ESP32-C3-12F
Loops per second: 125696
Loops per second: 126240
Loops per second: 125695
Loops per second: 126241
Loops per second: 125695
Loops per second: 126240
Loops per second: 125696

Sketch

uint32_t  Loopcounter = 0;
static uint32_t msTick;      

void setup() 
{
 Serial.begin(115200);                     // Setup the serial port to 115200 baud //
 msTick = millis(); 
}

void loop() 
{
 Loopcounter++;
 EverySecondCheck();
}

//--------------------------------------------//
// COMMON Update routine done every second
//--------------------------------------------
void EverySecondCheck(void)
{
 uint32_t msLeap = millis() - msTick;     // 
 if (msLeap >999)                         // Every second enter the loop
 {
  msTick = millis();
 Serial.printf("Loops per second: %u\n",Loopcounter);
 Loopcounter = 0;
 }  
}

Debug Message

NA

Other Steps to Reproduce

NA

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@ednieuw ednieuw added the Status: Awaiting triage Issue is waiting for triage label Oct 2, 2024
@VojtechBartoska VojtechBartoska added the Area: Performance Issue related to performance problems and improvements label Oct 2, 2024
@P-R-O-C-H-Y
Copy link
Member

@lucasssvaz Can we compare the performance with the tests you made?

@ednieuw
Copy link
Author

ednieuw commented Oct 2, 2024

Ofcourse? You can also use other ESP32 boards.

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 2, 2024

Ofcourse? You can also use other ESP32 boards.

@P-R-O-C-H-Y is talking about including performance test and report in the Arduino Github CI jobs.
Adding this test case would give us information about execution speed and we will know about it ahead of time.

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 2, 2024

Based on the sketch, it sounds like the difference may be in how FreeRTOS deals with the tasks or there may be more Tasks being executed at the same time in the 3.0.5 version.

It is not exactly the HW or System performance.

Please add code to list all the RTOS running tasks along the loop() execution.
This may lead to the difference.

@lucasssvaz
Copy link
Collaborator

We can add a test to check this but as @SuGlider it is probably related to how the FreeRTOS handles the tasks

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 4, 2024

@ednieuw - I have tested it. There is a difference as you say. It seems to be the way how the software sets the cores that will run each task. In order to ilustrate it, I have created a sketch that demonstrates how it works within the ESP32-S3 (the same used in ESP32 Nano Board). You may want to change priority, running core, etc

// loop() variables
uint32_t  Loopcounter = 0;
static uint32_t msTick;
// duplicated to run on separated Task
uint32_t  t_Loopcounter = 0;
static uint32_t t_msTick;

#define TASK_PRIO 1 // Lowest = 1 - same as Arduino loop()

void sameLoopTask(void *pvParameters) {
  (void) pvParameters;
  t_msTick = millis();
  while (1) {
    t_Loopcounter++;
    t_EverySecondCheck();
  }
}

void setup()
{
  Serial.begin(115200);                     // Setup the serial port to 115200 baud //
  Serial.printf("\r\nArduino is running on Core %d\r\n", ARDUINO_RUNNING_CORE);
  Serial.println();
  delay(1000);
  msTick = millis();
#if CONFIG_IDF_TARGET_ESP32 || CONFIG_IDF_TARGET_ESP32S3
  uint8_t taskCore = ARDUINO_RUNNING_CORE == 0 ? 1 : 0;  // runs on the other Core
#else
  uint8_t taskCore = 0;
#endif
  xTaskCreatePinnedToCore(
    sameLoopTask  // Task Function
    ,  "Counter Task" // Task Name - text
    ,  2048  // Stack size
    ,  NULL  // When no parameter is used, simply pass NULL
    ,  TASK_PRIO  // Priority - Arduino Setup/Loop Task has priority 1 (lowest)
    ,  NULL // With task handle we will be able to manipulate with this task.
    ,  taskCore // Core on which the task will run 0, 1, tskNO_AFFINITY
  );
}

void loop()
{
  Loopcounter++;
  EverySecondCheck();
}

//--------------------------------------------//
// COMMON Update routine done every second
//--------------------------------------------
void EverySecondCheck(void)
{
  uint32_t msLeap = millis() - msTick;     //
  if (msLeap > 999)                        // Every second enter the loop
  {
    msTick = millis();
    Serial.printf("Loops per second: %ld\n", Loopcounter);
    Loopcounter = 0;
  }
}

void t_EverySecondCheck(void)
{
  uint32_t t_msLeap = millis() - t_msTick;     //
  if (t_msLeap > 999)                        // Every second enter the loop
  {
    delay(1); // feed the dog (WDT)
    Serial.printf("TASK:: Loops per second: %ld\n", t_Loopcounter);
    t_Loopcounter = 0;
    t_msTick = millis();
  }
}

Output using S3 + ESP32 Arduino Core 3.0.5

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x109c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xb50
load:0x403cc700,len:0x2fe4
entry 0x403c98ac

Arduino is running on Core 1

Loops per second: 137496
TASK:: Loops per second: 756376
Loops per second: 137579
TASK:: Loops per second: 756753
Loops per second: 137583
TASK:: Loops per second: 756756
Loops per second: 137581
TASK:: Loops per second: 756753
Loops per second: 137583
TASK:: Loops per second: 756756
Loops per second: 137581
TASK:: Loops per second: 756753
Loops per second: 137583

@SuGlider SuGlider self-assigned this Oct 4, 2024
@TD-er
Copy link
Contributor

TD-er commented Oct 5, 2024

Doesn't the "Arduino" loop() call also do stuff right after the loop() ends?
Like similar code as what is called when calling delay().

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 5, 2024

I have investigated the issue and found the root cause: ESP32 Arduino UART Driver.
The difference from ESP32 Nano and a ESP32 DevKit is that the Nano enables by default the USB TinyUSB CDC port as Serial object. By other hand, ESP32 DevKit uses defauls UART0 as Serial.

In order to replicate the issue in the Nano ESP32, it is necessary to force the UART0 to start:
@ednieuw - Please confirm it. This is the same sketch from this issue + one line that causes the creation of a UART Task that reduces the time slice given to loop() Task.

uint32_t  Loopcounter = 0;
static uint32_t msTick;      

void setup() 
{
 Serial.begin(115200);                     // Setup the serial port to 115200 baud //
 msTick = millis(); 

 Serial0.begin(115200); // THIS Line starts UART Arduino Driver and also starts an additional Task that "eats" CPU time...
}

void loop() 
{
 Loopcounter++;
 EverySecondCheck();
}

//--------------------------------------------//
// COMMON Update routine done every second
//--------------------------------------------
void EverySecondCheck(void)
{
 uint32_t msLeap = millis() - msTick;     // 
 if (msLeap >999)                         // Every second enter the loop
 {
  msTick = millis();
 Serial.printf("Loops per second: %u\n",Loopcounter);
 Loopcounter = 0;
 }  
}

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 5, 2024

Based on the testing I have done here, this is the root cause.
@ednieuw - Thanks for reporting. This has forced me to go deeper into it and found an enhancement for the Arduino Core code.

I'll work on a patch.

@SuGlider SuGlider added this to the 3.0.5 milestone Oct 5, 2024
@SuGlider SuGlider added Status: In Progress Issue is in progress Priority: High 🗻 Issues with high priority which needs to be solved first. and removed Status: Awaiting triage Issue is waiting for triage labels Oct 5, 2024
@SuGlider
Copy link
Collaborator

SuGlider commented Oct 5, 2024

After testing the sketch from the issue, it seems clear that IDF 4.x and 5.x UART Driver used by ESP32 Arduino HardwareSerial is not efficient and causes a huge performance issue.

The evidence is that whenever Serial0.begin() is executed, there is a significative time slot reduction to the main Arduino Task, affecting the time that loop() has to process data.

In Arduino Core 1.0.6, the UART driver was local one, faster and way more efficient.
The solution for this issue is to refactor HardwareSerial and create its own Arduino UART Driver, the same way it is done for SPI.
@me-no-dev - FYI.

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 7, 2024

Doesn't the "Arduino" loop() call also do stuff right after the loop() ends? Like similar code as what is called when calling delay().

The whole issue is, indeed, related to HardwareSerial...
There is a serialEventRun() function that is executed with each loop() execution.
This implements serialEvent() Arduino API. It was supose to do nothing whenever the sketch doesn't declare serialEvent().
But it was executing available() every single time, which does nothig when Serial.begin() (UART0) isn't used.
Whenever Serial.begin() is excuted, available() is executed in every single loop().
This consumes time... apparently, a lot, because it uses FreeRTOS queues and so on.

I managed a way to check if the sketch has declared serialEvent() and avoid executing available(), specially when Serial.begin() for UART is executed.

PR #10428 shall solve the issue here reported.

@SuGlider
Copy link
Collaborator

SuGlider commented Oct 7, 2024

Results with ESP32 (dual core):
before the PR:

Loops per second: 119497
Loops per second: 119496
Loops per second: 119497
Loops per second: 119496
Loops per second: 119497

After the PR:

Loops per second: 765389
Loops per second: 765877
Loops per second: 765974
Loops per second: 765974
Loops per second: 765974

@ednieuw
Copy link
Author

ednieuw commented Oct 7, 2024

I also searched for a delay in hardwareserial but I did not noticed this. It was a lovely bug and in the end an easy solution
Thanks

@ednieuw
Copy link
Author

ednieuw commented Oct 7, 2024

On ESP32 core v2.0.18_rc3 the dual core counter sketch works fine.
Arduino is running on Core 1

Loops per second: 689880
TASK:: Loops per second: 689869
Loops per second: 689946
TASK:: Loops per second: 689491
Loops per second: 689975
TASK:: Loops per second: 689518
Loops per second: 689954
TASK:: Loops per second: 689497
Loops per second: 689955
TASK:: Loops per second: 689496


When choosing the ESP32 core 3.0.5 I loose my COM11 after upload that changes to COM9 ESP32 family when using IDE2.
Pressing reset on the Arduino does not restart the Arduino and only pressing reset twice sets the Arduino Nano esp32 back to COM11 for a new upload. but my COM11 disappears also

Uploading the Blink sketch and the Arduino Nano ESP32 is happy again and my own loopcounter sketch also works fine

Loops per second: 734550
Loops per second: 734551
Loops per second: 734550
Loops per second: 734551
Loops per second: 734552
Loops per second: 734551
Loops per second: 734551

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Performance Issue related to performance problems and improvements Priority: High 🗻 Issues with high priority which needs to be solved first. Status: In Progress Issue is in progress
Projects
6 participants