Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(espcoredump): Include boot time and unix time in header (IDFGH-12792) #13770

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nebkat
Copy link
Contributor

@nebkat nebkat commented May 9, 2024

Unfinished and untested but creating draft to gather feedback. Will create corresponding patch for esp-coredump python.

Core dumps do not contain any form of timestamp to help identify when a crash occurred.

This patch adds two new fields to the core dump header - boot time and unix time:

  • Boot time (microseconds since boot) is useful in determining whether a crash occurred immediately after boot or many hours in, which can aid in debugging. Observing the core dump data can indirectly provide this information but requires additional steps. In applications with known bugs (e.g. memory leaks) knowing this in advance can save time by reducing unnecessary debugging.
  • Unix time (microseconds since epoch) may provide hints as to why the crash occurred in applications that depend on external inputs, and is useful when organizing files, both on local and remote storage. In my application we copy core dumps to an SD card, but we have no way of knowing whether a particular core dump occurred 1 week or 1 year ago.

Copy link

github-actions bot commented May 9, 2024

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello nebkat, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against 523d526

@espressif-bot espressif-bot added the Status: Opened Issue is new label May 10, 2024
@github-actions github-actions bot changed the title feat(espcoredump): Include boot time and unix time in header feat(espcoredump): Include boot time and unix time in header (IDFGH-12792) May 10, 2024
@@ -146,6 +146,8 @@ typedef struct _core_dump_header_t {
uint32_t tcb_sz; /*!< Size of a TCB, in bytes */
uint32_t mem_segs_num; /*!< Number of memory segments */
uint32_t chip_rev; /*!< Chip revision */
uint64_t boot_time; /*!< Boot time, in microseconds */
uint64_t unix_time; /*!< Unix time, in microseconds */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid changing the header format (which requires changing the COREDUMP_VERSION...), could we keep this information in an ELF info segment? That way, the change would not cause compatibility issues even if an older version of esp-coredump package is used to decode the new core dump binary.

(Of course, this feature would be specific to the ELF format, but BIN format is present for backwards compatibility only, so not adding the feature there is okay.)

#define COREDUMP_VERSION_BIN 0
#define COREDUMP_VERSION_ELF 1
#define COREDUMP_VERSION_BIN 1
#define COREDUMP_VERSION_ELF 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct, unfortunately — this changes the definition of COREDUMP_VERSION_BIN_LEGACY a few lines below, meaning that the coredumps in legacy format won't be encoded correctly, and older versions of esp-coredump won't be able to read them.

dump_hdr.boot_time = esp_timer_get_time();

struct timeval tv;
gettimeofday(&tv, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it is safe to call gettimeofday from the panic handler. Internally, gettimeofday may attempt to acquire a mutex (e.g. here and a couple other places), which in general might not be possible in the panic handler. (Panic handler might be invoked from a high-priority ISR, such as an interrupt WDT ISR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the best alternative I can suggest is to handle this at the application level.

You can designate a static variable with COREDUMP_DRAM_ATTR to make sure it gets saved in the coredump. Then, set up a periodic timer to save the gettimeofday output into that variable. If a crash happens, you will get the latest value of that variable inside the dump.

@igrr
Copy link
Member

igrr commented May 10, 2024

Thanks for the nice improvement idea @nebkat, I left some comments...

cc @erhankur

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Opened Issue is new
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants