
[BUG] DSP panic when testing multiple-pause-resume on LNL/MTL-NOCODEC #8651

Closed
keqiaozhang opened this issue Dec 20, 2023 · 10 comments
Labels:
• bug: Something isn't working as expected
• DSP panic: DSP panic observed
• I2S: Applies to I2S bus for codec connection
• Intel Linux Daily tests: This issue can be found in internal Linux daily tests
• LNL: Applies to Lunar Lake platform
• MTL: Applies to Meteor Lake platform
• multicore: Issues observed when not only core#0 is used
• P2: Critical bugs or normal features
• suspend-resume: Issues observed when doing system suspend and resume


keqiaozhang (Collaborator) commented Dec 20, 2023

Describe the bug
Observed this issue on the LNL-NOCODEC platform. We have seen similar DSP panics before, but this one appears to be new.
The reproduction rate is about 50%.

dmesg

[ 7757.883706] kernel: snd_sof:sof_ipc4_set_pipeline_state: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc4 set pipeline instance 0 state 4
[ 7757.883713] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc tx      : 0x13000004|0x0: GLB_SET_PIPELINE_STATE
[ 7757.889677] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx      : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[ 7757.889689] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump start ]------------
[ 7757.893295] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: DSP panic!
[ 7757.894681] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[ 7757.896605] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ROM status: 0x5, ROM error: 0x0
[ 7757.898418] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ROM debug status: 0x0, ROM debug error: 0x0
[ 7757.900502] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ROM feature bit enabled
[ 7757.902203] kernel: snd_sof:sof_ipc4_find_debug_slot_offset_by_type: sof-audio-pci-intel-lnl 0000:00:1f.3: Slot type 0x4c455400 is not available in debug window
[ 7757.902205] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ DSP dump end ]------------
[ 7757.904205] kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 7 -> 8
[ 7757.904234] kernel: snd_sof:sof_ipc4_log_header: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc rx done : 0x1b0a0000|0x0: GLB_NOTIFICATION|EXCEPTION_CAUGHT
[ 7758.384257] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ipc timed out for 0x13000004|0x0
[ 7758.389367] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Attempting to prevent DSP from entering D3 state to preserve context
[ 7758.389370] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ IPC dump start ]------------
[ 7758.391423] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: Host IPC initiator: 0x93000004|0x0|0x0, target: 0x1b0a0000|0x0|0x0, ctl: 0x3
[ 7758.394322] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ------------[ IPC dump end ]------------
[ 7758.396364] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: IPC timeout
[ 7758.397840] kernel: sof-audio-pci-intel-lnl 0000:00:1f.3: ASoC: error at soc_dai_trigger on SSP0 Pin: -110
[ 7758.400062] kernel:  Port0: ASoC: error at dpcm_be_dai_trigger on Port0: -110
[ 7758.401650] kernel:  Port0: ASoC: trigger FE cmd: 4 failed: -110

To Reproduce
~/sof-test/test-case/multiple-pause-resume.sh -r 50

Reproduction Rate
About 50%.

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
  2. Name of the topology file
    • Topology: {development/sof-lnl-nocodec.tplg}
  3. Name of the platform(s) on which the bug is observed.
    • Platform: {LNLM_RVP_NOCODEC}

dmesg.txt
mtrace.txt


keqiaozhang added the bug, P1 (Blocker bugs or important features), I2S, Intel Linux Daily tests, DSP panic, and LNL labels Dec 20, 2023
keqiaozhang added the multicore label Dec 25, 2023
tobonex self-assigned this Jan 9, 2024
marc-hb (Collaborator) commented Jan 11, 2024

I found another pause-resume failure with more logs (but no panic). I don't know whether it's related, but just in case:

abonislawski added the P2 label and removed the P1 label Jan 23, 2024
keqiaozhang (Collaborator, Author) commented:
A reproduction in CI daily test 37267.

keqiaozhang added the MTL label Jan 25, 2024
keqiaozhang changed the title from "[BUG] DSP panic when testing multiple-pause-resume on LNL-NOCODEC" to "[BUG] DSP panic when testing multiple-pause-resume on LNL/MTL-NOCODEC" Jan 25, 2024

wszypelt commented Feb 28, 2024

@keqiaozhang, does this issue still reproduce?
If so, could you provide fresh logs with payloads from LNL?

keqiaozhang (Collaborator, Author) commented:
The last time I saw this issue in CI was last week. The reproduction rate seems lower than before, but I believe the issue still exists.

Please refer to: https://sof-ci.ostc.intel.com/#/result/planresultdetail/38161?model=LNLM_RVP_NOCODEC&testcase=multiple-pause-resume-50

tobonex (Contributor) commented Apr 26, 2024

@fredoh9 @marc-hb Fix merged. This issue was quite rare (partly because unrelated failures often stop the test before it can occur, and partly because it is rare on its own), so the absence of reproductions doesn't prove much. Still, I'm fairly confident it is fixed now. Should we close it and reopen it if it happens again?

lgirdwood added this to the v2.10 milestone Apr 26, 2024

marc-hb (Collaborator) commented Apr 26, 2024

I haven't seen any such panic for weeks. That's all I know.

> Fix merged

Did you mean #9020?

marc-hb added the suspend-resume label Apr 26, 2024

tobonex (Contributor) commented May 8, 2024

> Did you mean #9020?

Yes

tobonex (Contributor) commented May 10, 2024

Should be fixed. Closing for now. Reopen if it's observed again.

tobonex closed this as completed May 10, 2024

7 participants