Test GPU (AMD Radeon RX 6700 XT) #222

Open
geerlingguy opened this issue Sep 7, 2021 · 184 comments

Comments

@geerlingguy
Owner

geerlingguy commented Sep 7, 2021

Working branch: geerlingguy/linux#1

Just received an OEM AMD Radeon RX 6700 XT in the mail. I was able to get it at MSRP+Shipping, which is something of a miracle these days:

[Photos of the card: DSC02333, DSC02363]

I will be interested in seeing what, if anything, the card does when powered up and plugged into the Compute Module 4 IO Board!

The following issues are closely related:

Current steps to get this card working with Pi OS Bookworm

Last updated: 2025-01-03

  1. Clone the Raspberry Pi Linux kernel source and patch the default Raspberry Pi 6.6.y kernel tree with Coreforge's GPU-enablement patch (or just check out Coreforge's branch directly).
  2. Before compiling the kernel, run make menuconfig and select the options:
    1. Kernel Features > Page Size > 4 KB (for Box86 compatibility)
    2. Kernel Features > Kernel support for 32-bit EL0 > Fix up misaligned multi-word loads and stores in user space
    3. Kernel Features > Fix up misaligned loads and stores from userspace for 64bit code
    4. Device Drivers > Graphics support > AMD GPU (optionally SI/CIK support too)
    5. Device Drivers > Graphics support > Direct Rendering Manager (XFree86 4.1.0 and higher DRI support) > Force Architecture can write-combine memory
  3. Recompile the kernel following Raspberry Pi's instructions
  4. Install the AMD firmware: sudo apt install -y firmware-amd-graphics
  5. Reboot the Pi with the card attached using an appropriate PCIe riser and external ATX power supply.
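
Before starting a long build, it can help to confirm that the menuconfig choices from step 2 actually landed in .config. The sketch below is only a rough check under stated assumptions: check_kernel_config is a hypothetical helper name, and CONFIG_ARM64_4K_PAGES and CONFIG_DRM_AMDGPU are the mainline symbols for the 4 KB page size and AMD GPU options. The symbols behind the misaligned-access and write-combine options come from Coreforge's patch and are not named in the steps above, so they are left out here.

```shell
# check_kernel_config CONFIG_FILE SYMBOL...
# Report whether each symbol is enabled in a kernel .config file.
# The helper name and the symbol list are illustrative assumptions.
check_kernel_config() {
    config_file=$1
    shift
    missing=0
    for sym in "$@"; do
        # An enabled option appears as SYMBOL=y (built in) or SYMBOL=m (module).
        if grep -Eq "^${sym}=(y|m)$" "$config_file"; then
            echo "ok      ${sym}"
        else
            echo "MISSING ${sym}"
            missing=1
        fi
    done
    # Non-zero exit status if any requested symbol was not enabled.
    return "$missing"
}

# Example (run from the kernel source tree after `make menuconfig`):
#   check_kernel_config .config CONFIG_ARM64_4K_PAGES CONFIG_DRM_AMDGPU
```

A non-zero exit status means at least one of the listed options still needs to be selected.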

Confirm everything is working by plugging a monitor into the graphics card; then confirm the card's GPU is in use by running glxinfo -B (part of the mesa-utils package), for example:

$ sudo apt install -y mesa-utils
$ DISPLAY=:0 glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon RX 6700 XT (navi22, LLVM 15.0.6, DRM 3.54, 6.6.51-v8-16k+) (0x73df)
    Version: 23.2.1
    Accelerated: yes
    Video memory: 12288MB
...

(Prepend DISPLAY=:0 if running commands over SSH.)
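
For scripted checks, the same glxinfo -B output can be tested instead of read by eye. This is a minimal sketch, assuming the output format shown above; gpu_in_use is a hypothetical helper name, and it only looks at the "direct rendering" and "Vendor" lines.

```shell
# gpu_in_use: read `glxinfo -B` output on stdin and succeed only if direct
# rendering is enabled and the renderer vendor is AMD.
# The helper name is illustrative, not part of mesa-utils.
gpu_in_use() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q '^direct rendering: Yes' || return 1
    printf '%s\n' "$out" | grep -Eq '^[[:space:]]*Vendor: AMD' || return 1
}

# Example:
#   DISPLAY=:0 glxinfo -B | gpu_in_use && echo "AMD GPU is rendering"
```

If the desktop has fallen back to software rendering, the Vendor line names something other than AMD and the check fails.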

@geerlingguy
Owner Author

A few notes on drivers from the Twitterverse:

@linux4kix mentioned:

@geerlingguy You will need to use a pre 5.10 kernel for basic Navi on Aarch64. A driver rework needs to be done to fix amdgpu dcn support which was reverted for 5.10. https://lists.freedesktop.org/archives/dri-devel/2021-January/292867.html

@ric96 said:

@geerlingguy Don't forget to use upstream linux-firmware for the correct blob

So yeah... this one could be interesting, and I think my first attempts will be a bit faltering. We'll see.

@geerlingguy
Owner Author

geerlingguy commented Sep 8, 2021

pi@cm4:~ $ lspci
00:00.0 PCI bridge: Broadcom Limited Device 2711 (rev 20)
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1478 (rev c1)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1479
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73df (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28

pi@cm4:~ $ sudo lspci -vvvv
...
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1478 (rev c1) (prog-if 00 [Normal decode])
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at 618200000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: d8000000-d81fffff
	Prefetchable memory behind bridge: 00000000c0000000-00000000d7ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Upstream Port, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x16, ASPM L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [270 v1] #19
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [400 v1] #25
	Capabilities: [410 v1] #26
	Capabilities: [440 v1] #27

02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Device 1479 (prog-if 00 [Normal decode])
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: d8000000-d81fffff
	Prefetchable memory behind bridge: 00000000c0000000-00000000d7ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Downstream Port (Slot-), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed unknown, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-, Selectable De-emphasis: -3.5dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 1479
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [270 v1] #19
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [400 v1] #25
	Capabilities: [410 v1] #26
	Capabilities: [440 v1] #27

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73df (rev c1) (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0e36
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at 600000000 (64-bit, prefetchable) [disabled] [size=256M]
	Region 2: Memory at 610000000 (64-bit, prefetchable) [disabled] [size=2M]
	Region 4: I/O ports at <unassigned> [disabled]
	Region 5: Memory at 618000000 (32-bit, non-prefetchable) [disabled] [size=1M]
	[virtual] Expansion ROM at 618100000 [disabled] [size=128K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed unknown, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [200 v1] #15
	Capabilities: [240 v1] Power Budgeting <?>
	Capabilities: [270 v1] #19
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [2d0 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv+, Max PASID Width: 10
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [320 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [410 v1] #26
	Capabilities: [440 v1] #27

03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 255
	Region 0: Memory at 618120000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: [48] Vendor Specific Information: Len=08 <?>
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed unknown, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
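
The negotiated link is buried in the LnkSta lines of that dump. A small filter can pull out one line per device; this is a sketch that assumes the lspci -vv layout shown above, and link_status is a hypothetical helper name.

```shell
# link_status: read `lspci -vv` output on stdin and print the negotiated
# PCIe link speed and width for each device that reports LnkSta.
# The helper name is illustrative.
link_status() {
    awk '
        # Device header lines start in column 0 with bus:device.function.
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / { dev = $1 }
        # LnkSta: carries the negotiated speed and width (LnkSta2: is skipped
        # because the pattern requires the colon right after "LnkSta").
        /LnkSta:/ {
            line = $0
            sub(/.*LnkSta:[[:space:]]*/, "", line)
            split(line, parts, ", ")
            print dev, parts[1], parts[2]
        }
    '
}

# Example:
#   sudo lspci -vv | link_status
```

Against the output above, the first bridge would be reported as 01:00.0 Speed 5GT/s Width x1.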

@geerlingguy
Owner Author

pi@cm4:~ $ dmesg | grep pci
[    1.261278] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.261305] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.261373] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    1.261447] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0400000000
[    1.308507] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[    1.308896] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[    1.308914] pci_bus 0000:00: root bus resource [bus 00-ff]
[    1.308940] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff])
[    1.309028] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[    1.309262] pci 0000:00:00.0: PME# supported from D0 D3hot
[    1.313103] pci 0000:00:00.0: bridge configuration invalid ([bus ff-ff]), reconfiguring
[    1.313417] pci 0000:01:00.0: [1002:1478] type 01 class 0x060400
[    1.313474] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff]
[    1.313873] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
[    1.313969] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    1.317679] pci 0000:01:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.318042] pci 0000:02:00.0: [1002:1479] type 01 class 0x060400
[    1.318515] pci 0000:02:00.0: PME# supported from D0 D3hot D3cold
[    1.322211] pci 0000:02:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    1.322530] pci 0000:03:00.0: [1002:73df] type 00 class 0x030000
[    1.322595] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[    1.322637] pci 0000:03:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref]
[    1.322667] pci 0000:03:00.0: reg 0x20: [io  0x0000-0x00ff]
[    1.322695] pci 0000:03:00.0: reg 0x24: [mem 0x00000000-0x000fffff]
[    1.322724] pci 0000:03:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[    1.323058] pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
[    1.323147] pci 0000:03:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[    1.323306] pci 0000:03:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    1.323421] pci 0000:03:00.1: [1002:ab28] type 00 class 0x040300
[    1.323470] pci 0000:03:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[    1.323795] pci 0000:03:00.1: PME# supported from D1 D2 D3hot D3cold
[    1.327530] pci_bus 0000:03: busn_res: [bus 03-ff] end is updated to 03
[    1.327555] pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 03
[    1.327576] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 03
[    1.327628] pci 0000:00:00.0: BAR 9: assigned [mem 0x600000000-0x617ffffff 64bit pref]
[    1.327644] pci 0000:00:00.0: BAR 8: assigned [mem 0x618000000-0x6182fffff]
[    1.327665] pci 0000:01:00.0: BAR 9: assigned [mem 0x600000000-0x617ffffff 64bit pref]
[    1.327680] pci 0000:01:00.0: BAR 8: assigned [mem 0x618000000-0x6181fffff]
[    1.327696] pci 0000:01:00.0: BAR 0: assigned [mem 0x618200000-0x618203fff]
[    1.327716] pci 0000:01:00.0: BAR 7: no space for [io  size 0x1000]
[    1.327729] pci 0000:01:00.0: BAR 7: failed to assign [io  size 0x1000]
[    1.327747] pci 0000:02:00.0: BAR 9: assigned [mem 0x600000000-0x617ffffff 64bit pref]
[    1.327761] pci 0000:02:00.0: BAR 8: assigned [mem 0x618000000-0x6181fffff]
[    1.327774] pci 0000:02:00.0: BAR 7: no space for [io  size 0x1000]
[    1.327786] pci 0000:02:00.0: BAR 7: failed to assign [io  size 0x1000]
[    1.327805] pci 0000:03:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[    1.327844] pci 0000:03:00.0: BAR 2: assigned [mem 0x610000000-0x6101fffff 64bit pref]
[    1.327880] pci 0000:03:00.0: BAR 5: assigned [mem 0x618000000-0x6180fffff]
[    1.327902] pci 0000:03:00.0: BAR 6: assigned [mem 0x618100000-0x61811ffff pref]
[    1.327917] pci 0000:03:00.1: BAR 0: assigned [mem 0x618120000-0x618123fff]
[    1.327936] pci 0000:03:00.0: BAR 4: no space for [io  size 0x0100]
[    1.327949] pci 0000:03:00.0: BAR 4: failed to assign [io  size 0x0100]
[    1.327964] pci 0000:02:00.0: PCI bridge to [bus 03]
[    1.327987] pci 0000:02:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[    1.328007] pci 0000:02:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[    1.328032] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[    1.328053] pci 0000:01:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[    1.328072] pci 0000:01:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[    1.328096] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[    1.328115] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6182fffff]
[    1.328131] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[    1.328349] pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
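
The two bandwidth figures in that log (4.000 Gb/s available, 252.048 Gb/s capable) can be reproduced with a little arithmetic: raw transfer rate per lane, scaled by the line-encoding efficiency (8b/10b at 5.0 GT/s, 128b/130b at 16.0 GT/s), times the lane count. The sketch below assumes the per-lane figure is truncated to whole Mb/s before multiplying, which is what yields 252.048 rather than 252.062; pcie_mbps is a hypothetical helper name.

```shell
# pcie_mbps RATE_MTS LANES ENC_NUM ENC_DEN
# Usable bandwidth in Mb/s: raw rate per lane scaled by the encoding
# efficiency, truncated to whole Mb/s, then multiplied by the lane count.
# The helper name and the truncation step are assumptions for illustration.
pcie_mbps() {
    rate_mts=$1 lanes=$2 enc_num=$3 enc_den=$4
    per_lane=$(( rate_mts * enc_num / enc_den ))
    echo $(( per_lane * lanes ))
}

pcie_mbps 5000 1 8 10        # 5.0 GT/s x1, 8b/10b:       4000 Mb/s
pcie_mbps 16000 16 128 130   # 16.0 GT/s x16, 128b/130b:  252048 Mb/s
```

In other words, the x1 link on the Compute Module 4 offers roughly 1/63 of the bandwidth the card could use in a full x16 slot.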

@geerlingguy
Owner Author

While compiling on kernel version 5.10 from the raspberrypi/linux tree, I noticed an error:

  AR      drivers/ptp/built-in.a
  CC [M]  drivers/i2c/busses/i2c-brcmstb.o
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c: In function 'amdgpu_dm_atomic_commit_tail':
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7757:4: error: implicit declaration of function 'is_hdr_metadata_different'; did you mean 'is_scaling_state_different'? [-Werror=implicit-function-declaration]
    is_hdr_metadata_different(old_con_state, new_con_state);
    ^~~~~~~~~~~~~~~~~~~~~~~~~
    is_scaling_state_different
  CC [M]  drivers/media/i2c/cx25840/cx25840-firmware.o
  CC [M]  drivers/media/i2c/cx25840/cx25840-vbi.o
  AR      drivers/i2c/muxes/built-in.a
...
  LD [M]  drivers/media/dvb-frontends/drxd.o
  LD [M]  drivers/media/dvb-frontends/stv0900.o
  LD [M]  drivers/media/dvb-frontends/cxd2820r.o
  LD [M]  drivers/media/dvb-frontends/drxk.o
make: *** [Makefile:1825: drivers] Error 2

@6by9

6by9 commented Sep 8, 2021

Looks like it was missed in raspberrypi/linux@6bd4634, which removed is_hdr_metadata_different in favor of the generic helper function drm_connector_atomic_hdr_metadata_equal.

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

2nd Attempt:

  1. Recompiled kernel on rpi-5.14.y branch with AMDGPU selected. Seemed to work.
  2. Copied over to Pi.
  3. Installed the AMD firmware: sudo apt install -y firmware-amd-graphics
  4. Blacklisted amdgpu via /etc/modprobe.d/blacklist-amdgpu.conf
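
For step 4, the blacklist file only needs a single line. This is a sketch of one way to write it; blacklist_line is a hypothetical helper, and the file path is the one named in the step.

```shell
# blacklist_line MODULE
# Print the modprobe.d directive that stops a module from being auto-loaded
# at boot. An explicit `sudo modprobe MODULE` still loads it by hand.
# The helper name is illustrative.
blacklist_line() {
    printf 'blacklist %s\n' "$1"
}

# Example (writes the file named in step 4; needs root):
#   blacklist_line amdgpu | sudo tee /etc/modprobe.d/blacklist-amdgpu.conf
```

Keeping the driver out of the boot path means a failure during GPU init happens only when the module is loaded manually, with a second terminal free to watch dmesg.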

Rebooting...

@geerlingguy
Owner Author

Without the card plugged in, a sudo modprobe amdgpu gets me:

[  431.751110] [drm] amdgpu kernel modesetting enabled.

Now trying with the card plugged in...

@geerlingguy
Owner Author

Good news! The Pi doesn't completely lock up and halt now... it errors out then goes back to letting me debug. Makes test cycles oh-so-much-simpler:

In one terminal:

pi@cm4:~ $ sudo modprobe amdgpu

And in the other:

pi@cm4:~ $ dmesg --follow
...
[   83.281692] [drm] amdgpu kernel modesetting enabled.
[   83.282319] pci 0000:00:00.0: enabling device (0000 -> 0002)
[   83.282361] pci 0000:01:00.0: enabling device (0000 -> 0002)
[   83.282398] pci 0000:02:00.0: enabling device (0000 -> 0002)
[   83.282430] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[   83.282453] [drm] initializing kernel modesetting (NAVY_FLOUNDER 0x1002:0x73DF 0x1002:0x0E36 0xC1).
[   83.282474] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   83.282543] [drm] register mmio base: 0x18000000
[   83.282554] [drm] register mmio size: 1048576
[   83.282578] [drm] PCIE atomic ops is not supported
[   83.284144] [drm] add ip block number 0 <nv_common>
[   83.284150] [drm] add ip block number 1 <gmc_v10_0>
[   83.284373] [drm] add ip block number 2 <navi10_ih>
[   83.284395] [drm] add ip block number 3 <psp>
[   83.284401] [drm] add ip block number 4 <smu>
[   83.284419] [drm] add ip block number 5 <gfx_v10_0>
[   83.284425] [drm] add ip block number 6 <sdma_v5_2>
[   83.284431] [drm] add ip block number 7 <vcn_v3_0>
[   83.284435] [drm] add ip block number 8 <jpeg_v3_0>
[   83.319061] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM
[   83.319078] amdgpu: ATOM BIOS: 113-D5121100-101
[   83.319115] [drm] VCN(0) decode is enabled in VM mode
[   83.319121] [drm] VCN(0) encode is enabled in VM mode
[   83.319127] [drm] JPEG decode is enabled in VM mode
[   83.319148] [drm] GPU posting now...
[   83.319230] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[   83.319265] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x610000000-0x6101fffff 64bit pref]
[   83.319275] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x600000000-0x60fffffff 64bit pref]
[   83.319324] pci 0000:02:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319332] pci 0000:01:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319343] pci 0000:00:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319362] pci 0000:00:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   83.319369] pci 0000:00:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   83.319378] pci 0000:01:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   83.319383] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   83.319391] pci 0000:02:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   83.319397] pci 0000:02:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   83.319406] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x400000000 64bit pref]
[   83.319411] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x400000000 64bit pref]
[   83.319419] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[   83.319424] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[   83.319431] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[   83.319442] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6182fffff]
[   83.319456] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[   83.319465] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6182fffff]
[   83.319473] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319483] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[   83.319494] pci 0000:01:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[   83.319504] pci 0000:01:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319517] pci 0000:02:00.0: PCI bridge to [bus 03]
[   83.319529] pci 0000:02:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[   83.319538] pci 0000:02:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   83.319566] [drm] Not enough PCI address space for a large BAR.
[   83.319573] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[   83.319595] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x610000000-0x6101fffff 64bit pref]
[   83.319625] amdgpu 0000:03:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
[   83.319633] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   83.319641] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[   83.319649] [drm] Detected VRAM RAM=12272M, BAR=256M
[   83.319654] [drm] RAM width 192bits GDDR6
[   83.319767] [drm] amdgpu: 12272M of VRAM memory ready
[   83.319775] [drm] amdgpu: 2845M of GTT memory ready.
[   83.319794] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   83.319943] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   83.322016] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/navy_flounder_sos.bin failed with error -2
[   83.322037] amdgpu 0000:03:00.0: amdgpu: failed to init sos firmware
[   83.322044] [drm:psp_sw_init [amdgpu]] *ERROR* Failed to load psp firmware!
[   83.322472] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block <psp> failed -2
[   83.322795] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[   83.322802] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[   83.322808] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[   83.323187] amdgpu: probe of 0000:03:00.0 failed with error -2
[   83.323329] [drm] amdgpu: ttm finalized

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Hmm... firmware-amd-graphics might not include firmware for the RX 6700 XT (see NixOS/nixpkgs#122776), since the card is new enough to not have been packaged in whatever build that package is based on :(

See more: Radeon RX 6700 XT "Navy Flounder" Microcode Lands In Linux-Firmware.Git, and the commit where firmware was added. (Good ol' Phoronix)
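
To see exactly which blob the driver asked for, the filename can be pulled out of the kernel log. This is a minimal sketch that assumes the "Direct firmware load for ... failed" wording from the dmesg output above (error -2 is ENOENT, i.e. file not found); missing_firmware is a hypothetical helper name.

```shell
# missing_firmware: read kernel log text on stdin and print each firmware
# path the kernel tried and failed to load, one per line, de-duplicated.
# The helper name is illustrative.
missing_firmware() {
    # Matches lines such as:
    #   ... Direct firmware load for amdgpu/navy_flounder_sos.bin failed with error -2
    sed -n 's/.*Direct firmware load for \([^ ]*\) failed with error.*/\1/p' | sort -u
}

# Example:
#   dmesg | missing_firmware
#   ls /lib/firmware/amdgpu/ | grep navy_flounder
```

Paths are printed relative to the firmware directory, so amdgpu/navy_flounder_sos.bin corresponds to /lib/firmware/amdgpu/navy_flounder_sos.bin.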

@geerlingguy
Owner Author

First time doing this (grabbing newer firmware from the linux-firmware repo):

  1. git clone git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
  2. sudo cp linux-firmware/amdgpu/navy_flounder* /lib/firmware/amdgpu
  3. sudo reboot

And now trying again...

@geerlingguy
Owner Author

Okay, earlier firmware bug gave me false hope. We're still crashing and burning:

[   85.221462] [drm] amdgpu kernel modesetting enabled.
[   85.221843] pci 0000:00:00.0: enabling device (0000 -> 0002)
[   85.221866] pci 0000:01:00.0: enabling device (0000 -> 0002)
[   85.221886] pci 0000:02:00.0: enabling device (0000 -> 0002)
[   85.221904] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[   85.221916] [drm] initializing kernel modesetting (NAVY_FLOUNDER 0x1002:0x73DF 0x1002:0x0E36 0xC1).
[   85.221929] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[   85.221965] [drm] register mmio base: 0x18000000
[   85.221970] [drm] register mmio size: 1048576
[   85.221984] [drm] PCIE atomic ops is not supported
[   85.223501] [drm] add ip block number 0 <nv_common>
[   85.223508] [drm] add ip block number 1 <gmc_v10_0>
[   85.223513] [drm] add ip block number 2 <navi10_ih>
[   85.223518] [drm] add ip block number 3 <psp>
[   85.223524] [drm] add ip block number 4 <smu>
[   85.223530] [drm] add ip block number 5 <gfx_v10_0>
[   85.223535] [drm] add ip block number 6 <sdma_v5_2>
[   85.223540] [drm] add ip block number 7 <vcn_v3_0>
[   85.223545] [drm] add ip block number 8 <jpeg_v3_0>
[   85.258238] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM
[   85.258256] amdgpu: ATOM BIOS: 113-D5121100-101
[   85.258293] [drm] VCN(0) decode is enabled in VM mode
[   85.258298] [drm] VCN(0) encode is enabled in VM mode
[   85.258304] [drm] JPEG decode is enabled in VM mode
[   85.258324] [drm] GPU posting now...
[   85.258413] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[   85.258451] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x610000000-0x6101fffff 64bit pref]
[   85.258461] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x600000000-0x60fffffff 64bit pref]
[   85.258510] pci 0000:02:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258517] pci 0000:01:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258524] pci 0000:00:00.0: BAR 9: releasing [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258545] pci 0000:00:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   85.258551] pci 0000:00:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   85.258560] pci 0000:01:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   85.258566] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   85.258574] pci 0000:02:00.0: BAR 9: no space for [mem size 0x600000000 64bit pref]
[   85.258580] pci 0000:02:00.0: BAR 9: failed to assign [mem size 0x600000000 64bit pref]
[   85.258588] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x400000000 64bit pref]
[   85.258594] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x400000000 64bit pref]
[   85.258601] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[   85.258607] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[   85.258614] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[   85.258624] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6182fffff]
[   85.258638] pci 0000:00:00.0: PCI bridge to [bus 01-03]
[   85.258647] pci 0000:00:00.0:   bridge window [mem 0x618000000-0x6182fffff]
[   85.258655] pci 0000:00:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258665] pci 0000:01:00.0: PCI bridge to [bus 02-03]
[   85.258676] pci 0000:01:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[   85.258686] pci 0000:01:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258699] pci 0000:02:00.0: PCI bridge to [bus 03]
[   85.258710] pci 0000:02:00.0:   bridge window [mem 0x618000000-0x6181fffff]
[   85.258720] pci 0000:02:00.0:   bridge window [mem 0x600000000-0x617ffffff 64bit pref]
[   85.258747] [drm] Not enough PCI address space for a large BAR.
[   85.258754] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x600000000-0x60fffffff 64bit pref]
[   85.258775] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x610000000-0x6101fffff 64bit pref]
[   85.258804] amdgpu 0000:03:00.0: amdgpu: VRAM: 12272M 0x0000008000000000 - 0x00000082FEFFFFFF (12272M used)
[   85.258813] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   85.258820] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[   85.258828] [drm] Detected VRAM RAM=12272M, BAR=256M
[   85.258834] [drm] RAM width 192bits GDDR6
[   85.258945] [drm] amdgpu: 12272M of VRAM memory ready
[   85.258953] [drm] amdgpu: 2845M of GTT memory ready.
[   85.258971] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   85.259113] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).

@geerlingguy
Owner Author

It does seem like it's running out of address space for a large BAR:

[   85.258747] [drm] Not enough PCI address space for a large BAR.
[   85.258828] [drm] Detected VRAM RAM=12272M, BAR=256M

But that doesn't seem to be the issue here.
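
The sizes in the log above can be checked with a little arithmetic. A minimal Python sketch (the helper name `range_size` is made up for illustration, it is not a kernel function) that converts the inclusive address ranges and `mem size` values from the dmesg output into sizes:

```python
MIB = 1 << 20
GIB = 1 << 30

def range_size(start: int, end: int) -> int:
    """Size in bytes of an inclusive [start, end] range as printed by the PCI core."""
    return end - start + 1

# Values copied from the dmesg output above.
bar0_assigned = range_size(0x600000000, 0x60FFFFFFF)  # BAR 0 after the fallback
bar2_assigned = range_size(0x610000000, 0x6101FFFFF)  # BAR 2
bridge_window = range_size(0x600000000, 0x617FFFFFF)  # prefetchable bridge window
bar0_requested = 0x400000000    # "BAR 0: no space for [mem size 0x400000000 ...]"
bridge_requested = 0x600000000  # "BAR 9: no space for [mem size 0x600000000 ...]"

print(f"BAR 0 assigned:   {bar0_assigned // MIB} MiB")     # 256 MiB
print(f"BAR 2 assigned:   {bar2_assigned // MIB} MiB")     # 2 MiB
print(f"bridge window:    {bridge_window // MIB} MiB")     # 384 MiB
print(f"BAR 0 requested:  {bar0_requested // GIB} GiB")    # 16 GiB
print(f"bridge requested: {bridge_requested // GIB} GiB")  # 24 GiB
```

So the resize attempt asked for a 16 GiB BAR 0 (24 GiB at the bridges), could not fit it, and fell back to the original 256 MiB BAR inside the 384 MiB window, which matches the `BAR=256M` line.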

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Added a few debug lines, and things were a little different!

[  115.560635] [drm] amdgpu: 12272M of VRAM memory ready
[  115.560677] [drm] amdgpu: 2845M of GTT memory ready.
[  115.560718] [drm] GART: num cpu pages 131072, num gpu pages 131072
[  115.560755] DEBUG: Passed gmc_v10_0_hw_init 1069 
[  115.560973] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[  115.560984] DEBUG: Passed gmc_v10_0_hw_init 1078 
[  115.587372] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[  116.615220] ------------[ cut here ]------------
[  116.615231] Firmware transaction timeout
[  116.615282] WARNING: CPU: 3 PID: 37 at drivers/firmware/raspberrypi.c:67 rpi_firmware_transaction+0xdc/0x108
[  116.615301] Modules linked in: amdgpu(+) drm_ttm_helper ttm i2c_algo_bit rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc snd_soc_hdmi_codec brcmfmac brcmutil v3d vc4 cec cfg80211 bcm2835_codec(C) drm_kms_helper gpu_sched rfkill snd_soc_core drm raspberrypi_hwmon v4l2_mem2mem snd_compress snd_bcm2835(C) bcm2835_v4l2(C) drm_panel_orientation_quirks bcm2835_isp(C) videobuf2_vmalloc snd_pcm_dmaengine bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common i2c_brcmstb snd_pcm videodev snd_timer dwc2 mc vc_sm_cma(C) snd syscopyarea sysfillrect sysimgblt roles fb_sys_fops backlight rpivid_mem uio_pdrv_genirq uio nvmem_rmem i2c_dev aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[  116.615461] CPU: 3 PID: 37 Comm: kworker/3:1 Tainted: G         C        5.14.2-v8+ #1
[  116.615467] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[  116.615472] Workqueue: events dbs_work_handler
[  116.615485] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[  116.615490] pc : rpi_firmware_transaction+0xdc/0x108
[  116.615495] lr : rpi_firmware_transaction+0xdc/0x108
[  116.615499] sp : ffffffc0117639c0
[  116.615502] x29: ffffffc0117639c0 x28: ffffffc011763d20 x27: 0000000000000000
[  116.615512] x26: ffffff8042fddd00 x25: ffffff80409cdd00 x24: ffffffc011a7e008

Not sure what "PSP runtime database doesn't exist" means, but the "Firmware transaction timeout" seems related to the Pi's own firmware?

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Tried: sudo SKIP_KERNEL=1 rpi-update, then rebooted. Now it's just hanging at:

[  115.560984] DEBUG: Passed gmc_v10_0_hw_init 1078 

And the green ACT light on the IO board just stays lit.

@geerlingguy
Owner Author

Trying a few more times, with various debug statements. I can definitely get to gmc_v10_0_hw_init but I'm trying to dig around and see where the code is calling that through the amd_ip_funcs struct.

Anyways, sometimes I get back to:

[   96.885394] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Another run with some more debugging:

[   59.061056] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   59.061084] DEBUG: Passed gmc_v10_0_hw_init 1075 
[   59.061216] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   59.061222] DEBUG: Passed gmc_v10_0_hw_init 1084 
[   59.061784] DEBUG: Passed psp_sw_init 250 
[   59.083186] DEBUG: Passed psp_sw_init 266 
[   59.083216] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[   59.083223] DEBUG: Passed psp_sw_init 289 
[   61.088295] ------------[ cut here ]------------
[   61.088317] Firmware transaction timeout
[   61.088366] WARNING: CPU: 3 PID: 98 at drivers/firmware/raspberrypi.c:67 rpi_firmware_transaction+0xdc/0x108
[   61.088392] Modules linked in: amdgpu(+) drm_ttm_helper ttm i2c_algo_bit rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc snd_soc_hdmi_codec brcmfmac vc4 brcmutil cec v3d drm_kms_helper gpu_sched drm cfg80211 rfkill drm_panel_orientation_quirks bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig raspberrypi_hwmon videobuf2_memops videobuf2_v4l2 snd_soc_core i2c_brcmstb videobuf2_common dwc2 roles videodev snd_compress snd_bcm2835(C) mc snd_pcm_dmaengine vc_sm_cma(C) snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops rpivid_mem backlight uio_pdrv_genirq uio nvmem_rmem i2c_dev aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[   61.088679] CPU: 3 PID: 98 Comm: kworker/3:2 Tainted: G         C        5.14.2-v8+ #1
[   61.088690] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[   61.088698] Workqueue: events dbs_work_handler
[   61.088718] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[   61.088727] pc : rpi_firmware_transaction+0xdc/0x108
[   61.088736] lr : rpi_firmware_transaction+0xdc/0x108
[   61.088744] sp : ffffffc011be39c0
[   61.088749] x29: ffffffc011be39c0 x28: ffffffc011be3d20 x27: 0000000000000000
[   61.088768] x26: ffffff8058594d80 x25: ffffff80409cdd00 x24: ffffffc011a7d008
[   61.088785] x23: 0000000000001000 x22: ffffff80409cdd00 x21: 00000000ffffff92
[   61.088802] x20: ffffffc01146f520 x19: ffffffc0112f8948 x18: 0000000000000000
[   61.088818] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   61.088833] x14: 0000000000000000 x13: 74756f656d697420 x12: ffffffc0113862c8
[   61.088849] x11: 0000000000000003 x10: ffffffc01136e288 x9 : ffffffc0100e6f00
[   61.088866] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : ffffffc011be3650
[   61.088882] x5 : ffffffc0ea7b0000 x4 : 0000000000000000 x3 : 0000000000000001
[   61.088897] x2 : 0000000000000000 x1 : 20ef52a5bc805600 x0 : 0000000000000000
[   61.088913] Call trace:
[   61.088918]  rpi_firmware_transaction+0xdc/0x108
[   61.088926]  rpi_firmware_property_list+0xc0/0x180
[   61.088935]  rpi_firmware_property+0x78/0x110
[   61.088942]  raspberrypi_fw_set_rate+0x5c/0xd8
[   61.088953]  clk_change_rate+0xdc/0x4e8
[   61.088965]  clk_core_set_rate_nolock+0x1e4/0x238
[   61.088975]  clk_set_rate+0x44/0xb8
[   61.088984]  _set_opp+0x230/0x4f8
[   61.088996]  dev_pm_opp_set_rate+0x128/0x190
[   61.089007]  set_target+0x38/0x48

(Hit that same Pi firmware issue, but system is still hard locked up.)

Looks like it might be failing somewhere in here:

static int psp_sw_init(void *handle)
...
	if (mem_training_ctx->enable_mem_training) {
		ret = psp_memory_training_init(psp);
		if (ret) {
			DRM_ERROR("Failed to initialize memory training!\n");
			return ret;
		}

		ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
		if (ret) {
			DRM_ERROR("Failed to process memory training!\n");
			return ret;
		}
	}

@geerlingguy
Owner Author

Opened an issue on the 'official' tracker: Freedesktop GitLab - Can't get RX 6700 XT running on Raspberry Pi CM4.

@elmeyer

elmeyer commented Sep 10, 2021

The way I read this log is that the actual panic occurs when the Raspberry Pi itself is setting some clockspeed (PCIE bus? its own CPU? But why would that fail…) through a firmware call that times out. I think that’s why we’re not seeing that DRM error about failed memory training being printed, which leads me to believe we’re seeing the crashes occur at random points again? Smells familiar…

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

> Which leads me to believe we’re seeing the crashes occur at random points again? Smells familiar…

Indeed, I'm running through a few more tests just to see if I can get consistent results (with a ton of 0.5 s delays mixed in).

I just checked before I was going to load amdgpu again, and saw these two errors too (completely random, a few minutes after booting the Pi, hadn't touched it):

[  610.888425] ------------[ cut here ]------------
[  610.888447] fw-clk-m2mc already disabled
[  610.888492] WARNING: CPU: 3 PID: 86 at drivers/clk/clk.c:960 clk_core_disable+0x258/0x290
...
[  610.889440] fw-clk-m2mc already unprepared
[  610.889474] WARNING: CPU: 3 PID: 86 at drivers/clk/clk.c:819 clk_core_unprepare+0x23c/0x260

And looking back, those same two errors occurred 10 seconds into the boot cycle. PCIe bus seems to not be up either on this boot:

[    1.228140] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[    1.228179] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[    1.228265] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x063fffffff -> 0x00c0000000
[    1.228355] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x00ffffffff -> 0x0400000000
[    1.545482] brcm-pcie fd500000.pcie: link down

But a reboot brings it right back.
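
When repeating boot tests, it can help to check for this condition automatically before loading amdgpu. A rough Python sketch: the function name and return convention are my own, and the "link up" wording is an assumption, since only the "link down" message appears in this thread.

```python
import re
from typing import Optional

def pcie_link_state(dmesg_text: str) -> Optional[bool]:
    """Return False if brcm-pcie logged 'link down', True for 'link up',
    and None if neither message is present. The last message wins."""
    state = None
    for line in dmesg_text.splitlines():
        if "brcm-pcie" not in line:
            continue
        if re.search(r"\blink down\b", line):
            state = False
        elif re.search(r"\blink up\b", line):  # assumed wording, not shown above
            state = True
    return state

sample = "[    1.545482] brcm-pcie fd500000.pcie: link down"
print(pcie_link_state(sample))  # False
```

Feeding it the saved output of `dmesg` after each boot gives a quick yes/no on whether the card was even visible on the bus for that run.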

@geerlingguy
Owner Author

I'm also adding .5s delays with two lines like the following:

	printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
	msleep(500);

And it looks like I can very consistently reach:

[   76.507503] [drm] Detected VRAM RAM=12272M, BAR=256M
[   76.507508] [drm] RAM width 192bits GDDR6
[   76.507617] [drm] amdgpu: 12272M of VRAM memory ready
[   76.507625] [drm] amdgpu: 2845M of GTT memory ready.
[   76.507643] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   76.507672] DEBUG: Passed gmc_v10_0_hw_init 1075 
[   76.507796] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   76.507803] DEBUG: Passed gmc_v10_0_hw_init 1084 
[   76.508260] DEBUG: Passed psp_sw_init 262 
[   77.046534] DEBUG: Passed psp_sw_init 279 
[   77.564552] DEBUG: Passed psp_get_runtime_db_entry 201 
[   78.076551] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[   78.076566] DEBUG: Passed psp_sw_init 303 
[   78.588509] DEBUG: Passed psp_sw_init 308 
[   79.100496] DEBUG: Passed psp_sw_init 317

The next block of code, which does not run, is:

		ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
		if (ret) {
			DRM_ERROR("Failed to process memory training!\n");
			return ret;
		}
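
Since every marker has the same `DEBUG: Passed <function> <line>` shape, finding the last one reached before a hang can be scripted. A small Python sketch (helper name is hypothetical) that works on saved dmesg or serial console output:

```python
import re
from typing import Optional, Tuple

# Matches the markers produced by the printk() shown earlier.
CHECKPOINT = re.compile(r"DEBUG: Passed (\w+) (\d+)")

def last_checkpoint(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (function, line) of the final 'DEBUG: Passed' marker, or None."""
    last = None
    for match in CHECKPOINT.finditer(log_text):
        last = (match.group(1), int(match.group(2)))
    return last

sample = """\
[   78.076566] DEBUG: Passed psp_sw_init 303
[   78.588509] DEBUG: Passed psp_sw_init 308
[   79.100496] DEBUG: Passed psp_sw_init 317
"""
print(last_checkpoint(sample))  # ('psp_sw_init', 317)
```

Comparing that result across several boots shows whether the hang point is stable or moves around.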

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Debugging psp_v11_0_memory_training now:

[   26.845578] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[   26.845590] DEBUG: Passed psp_sw_init 303 
[   27.357576] DEBUG: Passed psp_sw_init 308 
[   27.869578] DEBUG: Passed psp_sw_init 317 
[   28.381584] DEBUG: Passed psp_v11_0_memory_training 612 
[   28.893586] DEBUG: Passed psp_v11_0_memory_training 623 
[   29.405609] DEBUG: Passed psp_v11_0_memory_training 634 
[   29.917580] DEBUG: Passed psp_v11_0_memory_training 642 
[   30.429593] DEBUG: Passed psp_v11_0_memory_training 650 
[   30.941605] DEBUG: Passed psp_v11_0_memory_training 658 
[   31.453586] DEBUG: Passed psp_v11_0_memory_training 667 
[   31.965598] DEBUG: Passed psp_v11_0_memory_training 677 
[   32.477579] DEBUG: Passed psp_v11_0_memory_training 686 
[   32.989579] DEBUG: Passed psp_v11_0_memory_training 694 
[   33.501583] DEBUG: Passed psp_v11_0_memory_training 708 
[   34.013581] DEBUG: Passed psp_v11_0_memory_training 718 
[   34.526817] DEBUG: Passed psp_v11_0_memory_training 727 

It looks like it's hitting this portion of code:

static int psp_v11_0_memory_training(struct psp_context *psp, uint32_t ops)
...
		if (drm_dev_enter(&adev->ddev, &idx)) {
			memcpy_fromio(buf, adev->mman.aper_base_kaddr, sz);
			ret = psp_v11_0_memory_training_send_msg(psp, PSP_BL__DRAM_LONG_TRAIN);
			if (ret) {
				DRM_ERROR("Send long training msg failed.\n");
				vfree(buf);
				drm_dev_exit(idx);
				return ret;
			}

memcpy_fromio() seems the likely culprit?

Edit: It seems like every time with debug statements around it, the system halts on the line:

memcpy_fromio(buf, adev->mman.aper_base_kaddr, sz);

@geerlingguy
Owner Author

Maybe it's time for me to read through the entire Linux Device Drivers book on PCIe memory access?

@geerlingguy
Owner Author

geerlingguy commented Sep 10, 2021

Trimming down the debug to just before the memcpy_fromio() line:

static int psp_v11_0_memory_training(struct psp_context *psp, uint32_t ops)
...
		if (drm_dev_enter(&adev->ddev, &idx)) {
			printk(KERN_ALERT "DEBUG: Passed %s %d \n",__FUNCTION__,__LINE__);
			printk(KERN_ALERT "DEBUG: addr %p, value %u, count %d \n",buf,adev->mman.aper_base_kaddr,sz);
			msleep(500);

			memcpy_fromio(buf, adev->mman.aper_base_kaddr, sz);

I see:

[   48.987688] amdgpu 0000:03:00.0: amdgpu: PSP runtime database doesn't exist
[   48.988976] DEBUG: Passed psp_v11_0_memory_training 692 
[   48.988991] DEBUG: addr 0000000022ac6957, value 536870912, count 33554432 
[   51.837474] ------------[ cut here ]------------
[   51.837490] Firmware transaction timeout
[   51.837532] WARNING: CPU: 1 PID: 177 at drivers/firmware/raspberrypi.c:67 rpi_firmware_transaction+0xdc/0x108

@Coreforge

Can you send the rest of the Oops? Most importantly the stack trace and link register.

@martinx72

> Can you send the rest of the Oops? Most importantly the stack trace and link register.

rx6600_ dmesg.txt

dmesg attached.

@geerlingguy
Owner Author

geerlingguy commented Nov 1, 2024

Just realized btop's AMD GPU metrics support requires ROCm, which as I noted is not available on arm64 as of right now.

Also, looks like a rogue memset in @martinx72's output:

[    7.498614] Call trace:
[    7.501061]  __memset+0x16c/0x188
[    7.504383]  kernel_queue_init+0x50/0xa8 [amdgpu]
[    7.509349]  pm_init+0x78/0xf8 [amdgpu]
[    7.513443]  start_cpsch+0x70/0x208 [amdgpu]
[    7.517966]  kgd2kfd_device_init+0x59c/0xa78 [amdgpu]
[    7.523275]  amdgpu_amdkfd_device_init+0x158/0x218 [amdgpu]
[    7.529107]  amdgpu_device_init+0x212c/0x2220 [amdgpu]
[    7.534502]  amdgpu_driver_load_kms+0x20/0x1a8 [amdgpu]
[    7.539983]  amdgpu_pci_probe+0x154/0x420 [amdgpu]
[    7.545027]  pci_device_probe+0xa0/0x148
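
For longer traces, picking out the innermost driver frame by hand gets tedious. A rough Python sketch (helper names are hypothetical) that parses `symbol+0xoff/0xsize [module]` frames in the format shown above:

```python
import re
from typing import List, Optional, Tuple

FRAME = re.compile(
    r"^\[\s*[\d.]+\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:[ \t]+\[(\w+)\])?\s*$"
)

def parse_frames(trace: str) -> List[Tuple[str, Optional[str]]]:
    """Return (symbol, module) pairs, innermost first; module is None for built-in code."""
    frames = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames

def first_module_frame(frames, module: str) -> Optional[str]:
    """Return the innermost symbol that belongs to the given module."""
    for symbol, mod in frames:
        if mod == module:
            return symbol
    return None

sample = """\
[    7.498614] Call trace:
[    7.501061]  __memset+0x16c/0x188
[    7.504383]  kernel_queue_init+0x50/0xa8 [amdgpu]
[    7.509349]  pm_init+0x78/0xf8 [amdgpu]
"""
print(first_module_frame(parse_frames(sample), "amdgpu"))  # kernel_queue_init
```

On the trace above, that points at the `memset` called from `kernel_queue_init()` as the place to look.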

@Coreforge

Can you give it a try with this patch applied?
I wasn't able to replicate the issue with the 6700xt (even with debug_largebar enabled), so there's a good chance there will be a few more places it'll get caught up on.

Interestingly enough, with your card, it seems to only need an 8GB BAR for 8GB of VRAM, while it's requesting 24GB for the 12GB of VRAM on the 6700xt.

@martinx72

martinx72 commented Nov 2, 2024

> Can you give it a try with this patch applied? I wasn't able to replicate the issue with the 6700xt (even with debug_largebar enabled), so there's a good chance there will be a few more places it'll get caught up on.
>
> Interestingly enough, with your card, it seems to only need an 8GB BAR for 8GB of VRAM, while it's requesting 24GB for the 12GB of VRAM on the 6700xt.

Nice fix! Now it boots up and displays output over HDMI.
Your work here is fantastic.

I ran glmark2-es2 windowed:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 6600 XT (navi23, LLVM 15.0.6, DRM 3.54, 6.6.51-v8-16k+)
    GL_VERSION:     OpenGL ES 3.2 Mesa 23.2.1-1~bpo12+rpt3
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 389 FrameTime: 2.577 ms
[build] use-vbo=true: FPS: 2743 FrameTime: 0.365 ms
[texture] texture-filter=nearest: FPS: 2729 FrameTime: 0.367 ms
[texture] texture-filter=linear: FPS: 2741 FrameTime: 0.365 ms
[texture] texture-filter=mipmap: FPS: 2747 FrameTime: 0.364 ms
[shading] shading=gouraud: FPS: 2740 FrameTime: 0.365 ms
[shading] shading=blinn-phong-inf: FPS: 2757 FrameTime: 0.363 ms
[shading] shading=phong: FPS: 2747 FrameTime: 0.364 ms
[shading] shading=cel: FPS: 2737 FrameTime: 0.365 ms
[bump] bump-render=high-poly: FPS: 2747 FrameTime: 0.364 ms
[bump] bump-render=normals: FPS: 2735 FrameTime: 0.366 ms
[bump] bump-render=height: FPS: 2735 FrameTime: 0.366 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 2795 FrameTime: 0.358 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2803 FrameTime: 0.357 ms
[pulsar] light=false:quads=5:texture=false: FPS: 2731 FrameTime: 0.366 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 2939 FrameTime: 0.340 ms
[desktop] effect=shadow:windows=4: FPS: 2473 FrameTime: 0.404 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 153 FrameTime: 6.575 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 150 FrameTime: 6.695 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 192 FrameTime: 5.213 ms
[ideas] speed=duration: FPS: 1972 FrameTime: 0.507 ms
[jellyfish] <default>: FPS: 2656 FrameTime: 0.377 ms
[terrain] <default>: FPS: 2076 FrameTime: 0.482 ms
[shadow] <default>: FPS: 1443 FrameTime: 0.693 ms
[refract] <default>: FPS: 1646 FrameTime: 0.608 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2917 FrameTime: 0.343 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2963 FrameTime: 0.338 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2944 FrameTime: 0.340 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2911 FrameTime: 0.344 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2910 FrameTime: 0.344 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2951 FrameTime: 0.339 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2951 FrameTime: 0.339 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2864 FrameTime: 0.349 ms
=======================================================
                                  glmark2 Score: 2362 
=======================================================
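
To compare runs like this one across cards or kernel options, the per-scene lines can be parsed instead of eyeballed. A minimal Python sketch, assuming only the line format visible in the output above (names are my own):

```python
import re
from statistics import mean
from typing import List, Tuple

SCENE = re.compile(
    r"^\[(?P<scene>[^\]]+)\]\s+(?P<options>.*?):\s+FPS:\s+(?P<fps>\d+)"
    r"\s+FrameTime:\s+(?P<ms>[\d.]+)\s+ms"
)

def parse_scenes(output: str) -> List[Tuple[str, str, int]]:
    """Return (scene, options, fps) tuples from glmark2 text output."""
    scenes = []
    for line in output.splitlines():
        match = SCENE.match(line.strip())
        if match:
            scenes.append((match["scene"], match["options"], int(match["fps"])))
    return scenes

sample = """\
[build] use-vbo=false: FPS: 389 FrameTime: 2.577 ms
[build] use-vbo=true: FPS: 2743 FrameTime: 0.365 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 153 FrameTime: 6.575 ms
"""
scenes = parse_scenes(sample)
slowest = min(scenes, key=lambda s: s[2])
print(slowest[0], slowest[2])              # buffer 153
print(round(mean(s[2] for s in scenes)))   # 1095
```

Run over all 33 scene lines above, the plain mean of the FPS values comes out at roughly 2363, in line with the reported score of 2362. The three `[buffer]` scenes (150 to 192 FPS) and `use-vbo=false` (389 FPS) are the clear outliers.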

I also have an RX 7700 XT here and will try it next...

@martinx72

I just tested with the RX 7700 XT, and sure enough it failed.
I've attached some logs here; hope they help.

dmesg_rx7700xt.txt
lspci_rx7700xt.txt

0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 38
	Region 0: Memory at 1b80200000 (32-bit, non-prefetchable) [size=16K]
	Bus: primary=01, secondary=02, subordinate=03, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
	Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 11) (prog-if 00 [Normal decode])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Bus: primary=02, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: [disabled] [32-bit]
	Memory behind bridge: 80000000-801fffff [size=2M] [32-bit]
	Prefetchable memory behind bridge: 1800000000-1817ffffff [size=384M] [32-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport

0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] (rev ff) (prog-if 00 [VGA controller])
	Subsystem: Sapphire Technology Limited Navi 32 [Radeon RX 7700 XT / 7800 XT]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 38
	Region 0: Memory at 1800000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at 1810000000 (64-bit, prefetchable) [size=2M]
	Region 5: Memory at 1b80000000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at 1b80100000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel modules: amdgpu

0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 255
	Region 0: Memory at 1b80120000 (32-bit, non-prefetchable) [disabled] [size=16K]
	Capabilities: <access denied>
[    4.802435] [drm] amdgpu kernel modesetting enabled.
[    4.803361] amdgpu: DSDT table not found for OEM information
[    4.803369] amdgpu: IO link not available for non x86 platforms
[    4.803371] amdgpu: Virtual CRAT table created for CPU
[    4.812943] amdgpu: Topology: Add CPU node
[    4.813146] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[    4.813156] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x747E 0x1DA2:0x475F 0xFF).
[    4.813176] [drm] register mmio base: 0x80000000
[    4.813178] [drm] register mmio size: 1048576
[    5.339163] [drm] add ip block number 0 <soc21_common>
[    5.339170] [drm] add ip block number 1 <gmc_v11_0>
[    5.339173] [drm] add ip block number 2 <ih_v6_0>
[    5.339175] [drm] add ip block number 3 <psp>
[    5.339178] [drm] add ip block number 4 <smu>
[    5.339180] [drm] add ip block number 5 <dm>
[    5.339183] [drm] add ip block number 6 <gfx_v11_0>
[    5.339185] [drm] add ip block number 7 <sdma_v6_0>
[    5.339187] [drm] add ip block number 8 <vcn_v4_0>
[    5.339189] [drm] add ip block number 9 <jpeg_v4_0>
[    5.339191] [drm] add ip block number 10 <mes_v11_0>
[    5.354102] [drm] BIOS signature incorrect ff df
[    5.391367] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    5.391376] amdgpu: ATOM BIOS: 113-D7120600-P03
[    5.397583] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/psp_13_0_10_sos.bin failed with error -2
[    5.397596] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <psp> failed -19
[    5.399268] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/smu_13_0_10.bin failed with error -2
[    5.399278] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <smu> failed -19
[    5.403375] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_pfp.bin failed with error -2
[    5.403385] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <gfx_v11_0> failed -19
[    5.403799] [drm] VCN(0) encode/decode are enabled in VM mode
[    5.403800] [drm] VCN(1) encode/decode are enabled in VM mode
[    5.412191] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    5.415987] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes_2.bin failed with error -2
[    5.415996] [drm] try to fall back to amdgpu/gc_11_0_3_mes.bin
[    5.416020] amdgpu 0000:03:00.0: Direct firmware load for amdgpu/gc_11_0_3_mes.bin failed with error -2
[    5.416024] [drm:amdgpu_device_init [amdgpu]] *ERROR* early_init of IP block <mes_v11_0> failed -19
[    5.416433] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[    5.416435] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.

@Coreforge

The 7000 series has gfx11, which currently doesn't have any fixes, but I'd expect it to not really be much different from gfx10. Those firmware load issues are odd though. Can you check if those files actually exist on your system? Error -2 would suggest they don't.
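
For reference, the negative numbers in those messages are standard Linux errno values, which can be looked up quickly from Python on the Pi itself:

```python
import errno
import os

# -2 is from the firmware loader, -19 from the early_init failures that follow.
for code in (2, 19):
    print(f"-{code}: {errno.errorcode[code]} ({os.strerror(code)})")
```

On Linux this reports `ENOENT` (no such file or directory) for -2 and `ENODEV` (no such device) for -19, which fits firmware files missing from disk followed by the IP blocks giving up.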

@Coreforge

I did a bit of cleanup and added a proper config option for the alignment trap under Kernel Features -> Fix up misaligned loads and stores from userspace for 64bit code.
Under Device Drivers -> Graphics support -> Direct Rendering Manager -> Force Architecture can write-combine memory, I added an option to force drm_arch_can_wc_memory to return true for potentially better performance, though it might not be entirely stable.

@martinx72

> The 7000 series has gfx11, which currently doesn't have any fixes, but I'd expect it to not really be much different from gfx10. Those firmware load issues are odd though. Can you check if those files actually exist on your system? Error -2 would suggest they don't.

I installed the firmware via

sudo apt install firmware-amd-graphics

But you are right: those files don't exist there.
Screenshot 2024-11-03 123916

I'll source them, put them in place, and see how it goes.

@martinx72

martinx72 commented Nov 3, 2024

I downloaded the necessary firmware bins manually:

cd /usr/lib/firmware/amdgpu
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_sos.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/smu_13_0_10.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_pfp.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes_2.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mes1.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/psp_13_0_10_ta.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_me.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_rlc.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_mec.bin
sudo wget https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/amdgpu/gc_11_0_3_imu.bin

Now the missing-file errors are gone,
but I get the errors below:
Screenshot 2024-11-03 130601

also full dmesg here: dmesg_rx7700.txt
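
For anyone repeating this with another card, a small Python sketch that lists which requested firmware files are missing and builds the matching linux-firmware URLs. The file names are the ones the driver asked for in the earlier dmesg output; the helper names are hypothetical, and the check only looks for uncompressed `.bin` files in the directory used above.

```python
from pathlib import Path
from typing import Iterable, List

BASE_URL = (
    "https://git.kernel.org/pub/scm/linux/kernel/git/firmware/"
    "linux-firmware.git/plain/amdgpu/"
)

# Names taken from the "Direct firmware load ... failed" lines in dmesg.
REQUESTED = [
    "psp_13_0_10_sos.bin",
    "smu_13_0_10.bin",
    "gc_11_0_3_pfp.bin",
    "gc_11_0_3_mes_2.bin",
    "gc_11_0_3_mes.bin",
]

def missing_firmware(names: Iterable[str],
                     firmware_dir: str = "/usr/lib/firmware/amdgpu") -> List[str]:
    """Return the names that are not present as regular files in firmware_dir."""
    directory = Path(firmware_dir)
    return [name for name in names if not (directory / name).is_file()]

def firmware_url(name: str) -> str:
    """URL of a firmware file in the linux-firmware tree (same base as the wget lines)."""
    return BASE_URL + name

for name in missing_firmware(REQUESTED):
    print(firmware_url(name))
```

The driver may request further files once these load (the list downloaded above is longer than the initial errors), so re-running the check against a fresh dmesg after each attempt is probably the practical approach.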

@geerlingguy
Owner Author

geerlingguy commented Nov 4, 2024

@Coreforge - I just played Doom Eternal in 4K for about 20 minutes, was switching between 4K, 720p, 1080p, and trying some different graphics settings. It ran between 15-24 fps, but I didn't have any instability (this was with your patch as of a week ago, it took the weekend to get the game downloaded lol).

Interestingly, lower resolutions ran slower than 4K, as I think the CPU became a greater bottleneck? Or at least lower resolutions weren't any faster. The GPU never got over like 60% utilization:

image

480p:

image

4K:

image

I may try a 3.0 GHz overclock and see if that makes a big difference or not on my setup.

@Coreforge

Was this with write-combining enabled or disabled? If you didn't enable the write-combining option I added, can you check dmesg if it's using a large BAR? I've noticed that the 6700xt in my PC seems to be fine with a 16GB BAR, so maybe some cards can get a large BAR on the Pi, while others can't?
I had to enable write-combining, as otherwise something took too long when starting Doom Eternal, which would then lead to a GPU timeout (disabling or increasing the timeout would probably work too, I didn't try that).

Maybe the 3GHz overclock also isn't quite stable on my Pi and I just need to lower the clock speed a little bit.

@geerlingguy
Owner Author

@Coreforge - I don't think you had that option ready yet in the commit I was running: Coreforge/linux@7fa79e5

I'm assuming you're referring to the drm_arch_can_wc_memory option? I don't have that in the branch I'm currently testing.

Regarding BAR space:

[    9.887175] [drm] Not enough PCI address space for a large BAR.
[    9.887178] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x1800000000-0x180fffffff 64bit pref]
[    9.887190] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x1810000000-0x18101fffff 64bit pref]
[    9.887217] [drm] Detected VRAM RAM=12272M, BAR=256M

@Coreforge

Then either write-combining isn't entirely stable (which is very possible, there's a reason it's generally disabled for Arm in DRM), or the version of Mesa I'm using has some bug.
It's a bit odd that you apparently didn't get the timeouts I was getting, but I don't entirely know what was causing them in the first place.

@geerlingguy
Owner Author

@Coreforge - It should be noted Crysis was less stable. It crashed out after a while on every game session I started (4K, 1080p, whatever). Also, OC'ing the Pi to 3.0 GHz increased frame rates a little, but even Doom Eternal was less stable at that clock (it never crashed at 2.4, but crashed after a few minutes at 3.0).

image

Interestingly, it can play in 'Can it run Crysis', it just takes a loooong time to get all the assets loaded in :D

@geerlingguy
Owner Author

@martinx72 - FYI, we can maybe continue discussion of the 7xxx series cards over in #680. It looks like you're having the same issues that I'm hitting with the Radeon Pro W7700.

@Coreforge

Since I mentioned it yesterday in #680, I did a run of glmark2-drm to rule out desktop compositor overhead and, hopefully, get more comparable results.

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 6700 XT (radeonsi, navi22, LLVM 19.1.1, DRM 3.54, 6.6.51-v8-16k+)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 24.2.4-1
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   3840x2160 fullscreen
=======================================================
[build] use-vbo=false: FPS: 1260 FrameTime: 0.794 ms
[build] use-vbo=true: FPS: 3151 FrameTime: 0.317 ms
[texture] texture-filter=nearest: FPS: 3547 FrameTime: 0.282 ms
[texture] texture-filter=linear: FPS: 3568 FrameTime: 0.280 ms
[texture] texture-filter=mipmap: FPS: 3349 FrameTime: 0.299 ms
[shading] shading=gouraud: FPS: 3551 FrameTime: 0.282 ms
[shading] shading=blinn-phong-inf: FPS: 3591 FrameTime: 0.279 ms
[shading] shading=phong: FPS: 3459 FrameTime: 0.289 ms
[shading] shading=cel: FPS: 3630 FrameTime: 0.276 ms
[bump] bump-render=high-poly: FPS: 3093 FrameTime: 0.323 ms
[bump] bump-render=normals: FPS: 2263 FrameTime: 0.442 ms
[bump] bump-render=height: FPS: 2671 FrameTime: 0.374 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3575 FrameTime: 0.280 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2045 FrameTime: 0.489 ms
[pulsar] light=false:quads=5:texture=false: FPS: 3734 FrameTime: 0.268 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 2020 FrameTime: 0.495 ms
[desktop] effect=shadow:windows=4: FPS: 2618 FrameTime: 0.382 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 383 FrameTime: 2.615 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 810 FrameTime: 1.236 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 357 FrameTime: 2.806 ms
[ideas] speed=duration: FPS: 2080 FrameTime: 0.481 ms
[jellyfish] <default>: FPS: 3081 FrameTime: 0.325 ms
[terrain] <default>: FPS: 662 FrameTime: 1.512 ms
[shadow] <default>: FPS: 766 FrameTime: 1.306 ms
[refract] <default>: FPS: 1033 FrameTime: 0.968 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 3885 FrameTime: 0.257 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 3867 FrameTime: 0.259 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 3896 FrameTime: 0.257 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 3907 FrameTime: 0.256 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3903 FrameTime: 0.256 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 3921 FrameTime: 0.255 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 3904 FrameTime: 0.256 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3892 FrameTime: 0.257 ms
=======================================================
                                  glmark2 Score: 2770 
=======================================================

That test was with the Pi overclocked to 3.0 GHz; without overclocking, I got 2584.
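
To put those two scores in proportion, here is a quick calculation of the relative gain. It assumes the non-overclocked run was at 2.4 GHz (the other clock mentioned in this thread); that figure is not stated for this particular run.

```python
def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


score_gain = percent_gain(2584, 2770)  # glmark2 scores quoted above
clock_gain = percent_gain(2.4, 3.0)    # assumed stock clock vs. the overclock, in GHz

print(f"score: +{score_gain:.1f}%  clock: +{clock_gain:.1f}%")
# prints: score: +7.2%  clock: +25.0%
```

Under that assumption, a 25% higher CPU clock bought about 7% more glmark2 score, so the benchmark is sensitive to CPU speed but far from scaling linearly with it.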

It was still mostly limited by the SDMA block according to amdgpu_top, and I also saw a lot of CPU usage from the sdma1 kernel thread in htop (sdma1 often consumed about 10-15% CPU time, while the two glmark2 threads were in about the same range, one was a bit higher at around 28%).
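
To see which scenes pull the score down, the per-scene lines can be parsed and sorted. This is a minimal sketch; the regular expression is an assumption based on the output format shown above, and `sample` holds only four of the lines as an illustration.

```python
import re

# Matches lines like:
#   [build] use-vbo=false: FPS: 1260 FrameTime: 0.794 ms
LINE_RE = re.compile(r"^\[(\w+)\] (.+): FPS: (\d+) FrameTime: ([\d.]+) ms$")


def parse_scenes(text):
    """Return a list of (scene_name, fps, frame_time_ms) tuples."""
    scenes = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            name = f"[{m.group(1)}] {m.group(2)}"
            scenes.append((name, int(m.group(3)), float(m.group(4))))
    return scenes


sample = """\
[build] use-vbo=false: FPS: 1260 FrameTime: 0.794 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 383 FrameTime: 2.615 ms
[terrain] <default>: FPS: 662 FrameTime: 1.512 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 3921 FrameTime: 0.255 ms
"""

scenes = parse_scenes(sample)
slowest = sorted(scenes, key=lambda s: s[1])[:2]  # lowest FPS first

for name, fps, frame_ms in slowest:
    print(f"{fps:>5} FPS  {frame_ms:.3f} ms  {name}")
```

Run against the full output above, the lowest entries are the `[buffer]` scenes (357, 383 and 810 FPS), followed by `[terrain]` and `[shadow]`.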

@geerlingguy
Owner Author

I also saw an issue from back in 2021 about CPU bottlenecking with glmark2... I wonder if there's a better GPU benchmark we can run that is almost entirely transparent to the CPU? So far only a few games like Doom Eternal even got the GPU up past 70% utilization, and only sometimes (the CPU was usually chugging along with one or two cores at 99-100%, blocking rendering).

@Coreforge

GravityMark has been quite light on the CPU, especially the Vulkan version (the OpenGL version not quite as much).

@DanaGoyette

Not exactly a benchmark, but I like using the game Veloren to test Vulkan performance. It's available in flatpak (net.veloren.airshipper is the launcher package).
I'll usually just create a world and then spectate the world.

Sadly, v3d/vc4 doesn't seem to support some feature or other that the game needs.

@geerlingguy
Owner Author

@Coreforge - Indeed, GravityMark is the only thing I've found that will max out the GPU even more than SuperTuxKart. CPU isn't doing a thing, while GPU is rendering out 200+ fps with 200k asteroids, at 150W (the limit for the W7700). Nice!

@jamesfmackenzie

I was able to get the RX 6600 XT running with:

  1. Coreforge's rpi-6.6.y-gpu branch
  2. Coreforge's updated memcpy lib
  3. Alignment fix (enabled through menuconfig kernel build option)

It works great!

Very stable with fast performance too:

image

Thanks for the work on this!

@Srandista

Is the @Coreforge memcpy lib still a requirement? I'm asking because it's not mentioned in @geerlingguy's recent article regarding eGPUs.

@geerlingguy
Owner Author

@Srandista - nope! Not anymore.

@jamesfmackenzie

> @Srandista - nope! Not anymore.

Is there a performance advantage to using the library? If not, I'll remove it from my system too! :-)

@geerlingguy
Owner Author

@jamesfmackenzie - After my latest round of testing, I don't think so... Benchmarks like GravityMark and glmark2-es2 seem to give similar results with and without it. But more testing would be good. Before you remove it, maybe run a couple of benchmarks. Then remove it, run them again, and make sure :D
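
A before/after check like that can be summarized with a few lines of Python. This is only a sketch: the scores below are hypothetical placeholders, not measurements from this thread, and comparing the difference against the run-to-run spread is one simple choice among several.

```python
from statistics import mean, stdev


def compare_runs(with_lib, without_lib):
    """Return (difference_pct, noise_pct) for two lists of benchmark scores.

    difference_pct: change in mean score after removing the library.
    noise_pct: the larger run-to-run spread (stdev / mean) of the two sets.
    """
    m_with, m_without = mean(with_lib), mean(without_lib)
    difference_pct = (m_without - m_with) / m_with * 100.0
    noise_pct = max(stdev(with_lib) / m_with, stdev(without_lib) / m_without) * 100.0
    return difference_pct, noise_pct


# Hypothetical scores from three runs each (replace with real results).
scores_with_lib = [2580, 2590, 2585]
scores_without_lib = [2575, 2595, 2588]

difference_pct, noise_pct = compare_runs(scores_with_lib, scores_without_lib)
print(f"difference: {difference_pct:+.2f}%  run-to-run noise: {noise_pct:.2f}%")

if abs(difference_pct) <= noise_pct:
    print("Difference is within run-to-run noise.")
else:
    print("Difference exceeds run-to-run noise; worth a closer look.")
```

Running each benchmark at least three times per configuration keeps a single outlier from deciding the comparison.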

@Coreforge

Technically, there could be a performance advantage in specific scenarios, but I don't know how much it transfers into the real world.
It was required for the Pi 4 (which would otherwise lock up) and for the Pi 5 before the alignment trap was complete enough; now it all gets handled by the trap (at a performance cost, as this requires entering the kernel and interpreting the instruction).
In synthetic tests this should be measurable, but I haven't actually done that yet. I probably should, as that would also let me check whether the write-combining option actually helps or not. My current OpenGL test program might not have enough control over the memory, though.

@Srandista

@Coreforge can you please rebase your patch against the latest changes in the RPi kernel? The patch hasn't applied cleanly since mid-December.
