Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] mmc: host: bcm2835: Capture MMC errors #6475

Open
wants to merge 688 commits into
base: rpi-6.12.y
Choose a base branch
from

Conversation

lategoodbye
Copy link
Contributor

The commit 91f059c ("mmc: core: Capture eMMC and SD card errors") introduced MMC error stats, which are accessible via debugfs. I thought it would be helpful to implement this for the BCM2835 SDHOST driver. Unfortunately i'm not sure which error should increment what counter. So the commit is currently intended for review and not for merging.

Here is the error mapping:
[MMC_ERR_CMD_TIMEOUT] = "Command Timeout Occurred",
[MMC_ERR_CMD_CRC] = "Command CRC Errors Occurred",
[MMC_ERR_DAT_TIMEOUT] = "Data Timeout Occurred",
[MMC_ERR_DAT_CRC] = "Data CRC Errors Occurred",
[MMC_ERR_AUTO_CMD] = "Auto-Cmd Error Occurred",
[MMC_ERR_ADMA] = "ADMA Error Occurred",
[MMC_ERR_TUNING] = "Tuning Error Occurred",
[MMC_ERR_CMDQ_RED] = "CMDQ RED Errors",
[MMC_ERR_CMDQ_GCE] = "CMDQ GCE Errors",
[MMC_ERR_CMDQ_ICCE] = "CMDQ ICCE Errors",
[MMC_ERR_REQ_TIMEOUT] = "Request Timedout",
[MMC_ERR_CMDQ_REQ_TIMEOUT] = "CMDQ Request Timedout",
[MMC_ERR_ICE_CFG] = "ICE Config Errors",
[MMC_ERR_CTRL_TIMEOUT] = "Controller Timedout errors",
[MMC_ERR_UNEXPECTED_IRQ] = "Unexpected IRQ errors",

mripard and others added 30 commits November 11, 2024 11:36
The BCM2712 has a completely different HVS than the previous
generations, so let's add a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has 3 different pixelvalves that are similar to the ones
found in the previous generations but with slightly different
capabilities.

Express that using a new set of compatibles.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a MOP controller which is basically a new revision of
the TXP.

Express that by adding a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a MOPLET controller which is basically a TXP without the
transpose feature.

Express that by adding a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 SoC comes with a new variation of the videocore display
pipeline. Let's create a new compatible for it.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has an improved display pipeline, most notably with a
different HVS and only HDMI and writeback outputs.

Let's introduce it as a new VideoCore generation and compatible.

Signed-off-by: Maxime Ripard <[email protected]>
The HVS found in the BCM2712, while having a similar role, is very
different from the one found in the previous SoCs. Indeed, the register
layout is fairly different, and the DLIST format is new as well.

Let's introduce the needed functions to support the new HVS.

Signed-off-by: Maxime Ripard <[email protected]>

drm/vc4: Fix atomic_async_check to call the right mode_set function

vc4_plane_atomic_async_check was always calling vc4_plane_mode_set
to validate and generate the dlist for the check. If async_check
decided it had to fall back to a sync commit, then this GEN4/5
dlist could get used on GEN6.

Call either vc4_plane_mode_set or vc6_plane_mode_set as appropriate.

Fixes: 1ab1fbb ("drm/vc4: hvs: Support BCM2712 HVS")
Signed-off-by: Dave Stevenson <[email protected]>

drm/vc4: Add 2712 support to vc4_plane_async_set_fb

vc4_plane_async_set_fb directly overwrites the plane address in
the dlist entry, but hadn't been updated for the GEN6 / 2712
dlist format, corrupting the address in the process.

Add support for the 2712 dlist format to the function.

Fixes: 1ab1fbb ("drm/vc4: hvs: Support BCM2712 HVS")
Signed-off-by: Dave Stevenson <[email protected]>
The PixelValves found on the BCM2712 are similar to the ones found in
the previous generation.

Compared to BCM2711, the pixelvalves only drive one HDMI controller each
and HDMI1 PixelValve has a FIFO long enough to support 4k at 60Hz.

Signed-off-by: Maxime Ripard <[email protected]>
The HDMI controllers found in the BCM2712 are largely the ones found in
the BCM2711 with a different PHY.

There's some difference with how timings are split between registers,
and HDMI1 is now able to run at 4k/60Hz.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 will have several TXP with small differences. Let's add a
structure tied to the compatible to deal with those differences.

Signed-off-by: Maxime Ripard <[email protected]>
The TXP data structure has a name too generic for the multiple variants
we'll have to support. Let's rename it to mention the SoC it applies to.

Signed-off-by: Maxime Ripard <[email protected]>
The MOPLET doesn't have the BYTE_ENABLE field to set, but the TXP and
MOP do, so let's add a boolean to control whether or not we need to set
it.

Signed-off-by: Maxime Ripard <[email protected]>
The new writeback controllers that can be found on the BCM2712 require
to have their horizontal and vertical size reduced by one.

Let's tie that behaviour to the compatible so we can support both the
new and old controllers.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 MOP and MOPLET can handle addresses larger than 32bits
through an extra register. We can easily support it and make it
conditional based on the compatible through a boolean in our variant
structure.

Signed-off-by: Maxime Ripard <[email protected]>
We'll have multiple TXP instances in the BCM2712, so we can't use a
single encoder type anymore. Let's tie the encoder type to the
compatible.

Signed-off-by: Maxime Ripard <[email protected]>
Starting with BCM2712, we'll have a two TXP. Let's follow the HDMI
example and add two encoder types for TXP: TXP0 and TXP1.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has an evolution of what used to be called TXP in the
earlier SoCs, but is now called MOP.

There's a few differences still, so we can add a new compatible to deal
with them easily.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 features a simpler TXP called MOPLET. Let's add support for
it.

Signed-off-by: Maxime Ripard <[email protected]>
Some code path in vc4 are conditional to a generation and cannot be
executed on others. Let's put a WARN_ON if that ever happens.

Signed-off-by: Maxime Ripard <[email protected]>
Testing whether the VideoCore generation we want to mock is vc5 or vc4
worked so far, but will be difficult to extend to support BCM2712 (VC6).

Convert to a switch.

Signed-off-by: Maxime Ripard <[email protected]>
The DRM device pointer and the DRM encoder pointer are redundant, since
the latter is attached to the former and we can just follow the
drm_encoder->dev pointer.

Let's remove the drm_device pointer argument.

Signed-off-by: Maxime Ripard <[email protected]>
Some tests will need to retrieve the output that was just allocated by
vc4_mock_atomic_add_output().

Instead of making them look them up in the DRM device, we can simply
make vc4_mock_atomic_add_output() return an error pointer that holds the
allocated output instead of the error code.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a simpler pipeline that can only output to a writeback
connector and two HDMI controllers.

Let's allow our kunit tests to create a mock of that pipeline.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 has a simpler pipeline than the BCM2711, and thus the muxing
requirements are different. Create some tests to make sure we get proper
muxing decisions.

Signed-off-by: Maxime Ripard <[email protected]>
The current mock planes were just using the regular drm_plane_state,
while the driver expect struct vc4_plane_state that subclasses
drm_plane_state.

Hook the proper implementations of reset, duplicate_state, destroy and
atomic_check to create vc4_plane_state.

Signed-off-by: Maxime Ripard <[email protected]>
Some tests will need to find a plane to run a test on for a given CRTC.
Let's create a small helper to do that.

Signed-off-by: Maxime Ripard <[email protected]>
We'll start to add some tests for the plane state logic, so let's create
a helper to add a plane to an existing atomic state.

Signed-off-by: Maxime Ripard <[email protected]>
We'll start testing our planes code in situations where we will use more
than XRGB8888, so let's add a few common pixel formats.

Signed-off-by: Maxime Ripard <[email protected]>
The BCM2712 comes with a different LBM size computation than the
previous generations, so let's add the few examples provided as kunit
tests to make sure we always satisfy those requirements.

Signed-off-by: Maxime Ripard <[email protected]>
On 2712, the firmware always runs these clock at a speed sufficient
for dual 4kp60.

The requests here prevent the gpu from going into its lowest voltage
mode, so just skip the clock requests.

With this applied the idle voltage on my pi 5 reduces from 0.7424V
to 0.72V.

Signed-off-by: Dom Cobley <[email protected]>
mairacanal and others added 28 commits November 11, 2024 11:37
Commit 94a23e9 ("cgroup: Disable cgroup "memory" by default")
disabled the memory cgroup by default when initing the cgroups. However,
it's possible to disable the memory cgroup by a kernel command line.
Hard-coding such a feature can be problematic as some memory management
features depend on the order that things are set.

For example, it is possible to see a NULL pointer dereference caused by
commit 94a23e9. The NULL pointer
dereference is triggered by the memory shrinker and ends up in a kernel
crash.

[   50.028629] ==================================================================
[   50.028645] BUG: KASAN: null-ptr-deref in do_shrink_slab+0x1fc/0x978
[   50.028663] Write of size 8 at addr 0000000000000000 by task gfxrecon-replay/1965

[   50.028676] CPU: 3 UID: 1000 PID: 1965 Comm: gfxrecon-replay Tainted: G         C         6.12.0-rc4-v8-thp-kasan+ raspberrypi#85
[   50.028685] Tainted: [C]=CRAP
[   50.028689] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[   50.028694] Call trace:
[   50.028697]  dump_backtrace+0xfc/0x120
[   50.028706]  show_stack+0x24/0x38
[   50.028711]  dump_stack_lvl+0x40/0x88
[   50.028720]  print_report+0xe4/0x708
[   50.028728]  kasan_report+0xcc/0x130
[   50.028733]  kasan_check_range+0x254/0x298
[   50.028738]  __kasan_check_write+0x20/0x30
[   50.028745]  do_shrink_slab+0x1fc/0x978
[   50.028751]  shrink_slab+0x318/0xc38
[   50.028756]  shrink_one+0x254/0x6d8
[   50.028762]  shrink_node+0x26b4/0x2848
[   50.028767]  do_try_to_free_pages+0x3e4/0x1190
[   50.028773]  try_to_free_pages+0x5a4/0xb40
[   50.028778]  __alloc_pages_direct_reclaim+0x144/0x298
[   50.028787]  __alloc_pages_slowpath+0x5c4/0xc70
[   50.028793]  __alloc_pages_noprof+0x4a8/0x6a8
[   50.028800]  __folio_alloc_noprof+0x24/0xa8
[   50.028806]  shmem_alloc_and_add_folio+0x2ec/0xce0
[   50.028812]  shmem_get_folio_gfp+0x380/0xc20
[   50.028818]  shmem_read_folio_gfp+0xe0/0x160
[   50.028824]  drm_gem_get_pages+0x238/0x620 [drm]
[   50.029039]  drm_gem_shmem_get_pages_sgt+0xd8/0x4b8 [drm_shmem_helper]
[   50.029053]  v3d_bo_create_finish+0x58/0x1e0 [v3d]
[   50.029083]  v3d_create_bo_ioctl+0xac/0x210 [v3d]
[   50.029105]  drm_ioctl_kernel+0x1d8/0x2b8 [drm]
[   50.029220]  drm_ioctl+0x4b4/0x920 [drm]
[   50.029330]  __arm64_sys_ioctl+0x11c/0x160
[   50.029337]  invoke_syscall+0x88/0x268
[   50.029345]  el0_svc_common+0x160/0x1d8
[   50.029351]  do_el0_svc+0x50/0x68
[   50.029358]  el0_svc+0x34/0x80
[   50.029364]  el0t_64_sync_handler+0x84/0x100
[   50.029371]  el0t_64_sync+0x190/0x198
[   50.029376] ==================================================================

This happens because the memory shrinker is unaware that we are
artificially disabling the memory cgroups and therefore it doesn't
allocate `nr_deferred` (as it would if we used the kernel command line).

To avoid such an issue, revert the artificial disablement and disable it
through the command line. If a user wants to enable the feature, it can
use the `cgroup_enable=` command line.

Signed-off-by: Maíra Canal <[email protected]>
There is a difference in I2C usage between CM4IO and CM5IO. Present a
simple, consistent view of the world by moving the assignment of the
bus IDs into carrier-specific files.

CM5 has reduced connectivity on CM4IO - the DPHYs are connected to
CAM1 and DISP1. Keep i2c-10 for use with that pair, as is the case for
CM4 on CM4IO.

Fixes: 36faab6 ("dts: bcm2712-rpi: Add aliases for the CSI/DSI I2Cs")
See: raspberrypi#6421

Signed-off-by: Phil Elwell <[email protected]>
Clients such as libcamera do not change the LS table dmabuf on every
frame. In such cases instead of mapping/remapping the same dmabuf on
every frame to send to the firmware, cache the dmabuf once and only
update and remap if the dmabuf has been changed by the userland client.

Signed-off-by: Naushir Patuck <[email protected]>
Enable the PWM GPIO module in Raspberry Pi kernels.

Signed-off-by: Tim Gover <[email protected]>
As a workaround (and possibly a fix) for CPU spins observed on BCM2837,
use ptep_clear_flush_young instead of ptep_test_and_clear_young inside
lru_gen_look_around in order to expose PTE changes to the MMU. Note that
on architectures that don't require an explicit flush,
ptep_clear_flush_young just calls ptep_test_and_clear_young.

Signed-off-by: Phil Elwell <[email protected]>
iommu_dma_numa_policy=interleave is not valid in the current tree
It generates an unknown setting will be passed to usespace warning

system_heap.max_order=0 is wanted when numa is enabled, but may not
be when it is disabled.

Add it on firmware side when we know if numa=fake=<n> is used.

Signed-off-by: Dom Cobley <[email protected]>
A user has reported that a card of this model from late 2021 doesn't
work, so extend the date range and make it match on all card sizes.

Signed-off-by: Jonathan Bell <[email protected]>
Add two quirk properties that control whether or not the controller
issues many more handshakes to FS/HS Async endpoints in a single
(micro)frame. Enabling these can significantly increase throughput for
endpoints that frequently respond with NAKs.

Signed-off-by: Jonathan Bell <[email protected]>
If a device frequently NAKs, it can exhaust the scheduled handshakes in
a frame. It will then not get polled by the controller until the next
frame interval. This is most noticeable on FS devices as the controller
schedules a small set of transactions only once per full-speed frame.

Setting the ENH_PER_NAK_FS/LS bits in the GUCTL1 register increases the
number of transactions that can be scheduled to Async (Control/Bulk)
endpoints in the respective frame time. In the FS case, this only
applies to FS devices directly connected to root ports.

Signed-off-by: Jonathan Bell <[email protected]>
There seem to be only benefits, and no downsides.

Signed-off-by: Jonathan Bell <[email protected]>
For platforms that have xHCI controllers attached over PCIe, and
non-coherent routes to main memory, a theoretical race exists between
posting new TRBs to a ring, and writing to the doorbell register.

In a contended system, write traffic from the CPU may be stalled before
the memory controller, whereas the CPU to Endpoint route is separate
and not likely to be contended. Similarly, the DMA route from the
endpoint to main memory may be separate and uncontended.

Therefore the xHCI can receive a doorbell write and find a stale view
of a transfer ring. In cases where only a single TRB is ping-ponged at
a time, this can cause the endpoint to not get polled at all.

Adding a readl() before the write forces a round-trip transaction
across PCIe, definitively serialising the CPU along the PCI
producer-consumer ordering rules.

Signed-off-by: Jonathan Bell <[email protected]>
The DHT11 datasheet is pretty cryptic, but it does suggest that after
each integer value (humidity and temperature) there are "decimal"
values. Validate these as integers in the range 0-9 and treat them as
tenths of a unit.

Link: raspberrypi#6220

Signed-off-by: Phil Elwell <[email protected]>
The logic for dropping a plane less than zero didn't account for the
possibility that a plane could be being upscaled with a src_rect with
width/height < 1 pixel, but not 0 subpixels.

Check for not 0 subpixels, not < 1, in both vc4 and vc6 paths.

Fixes: dac6168 ("drm/vc4: Drop planes that have 0 destination size")
Fixes: f73b18e ("drm/vc4: Drop planes that are completely off-screen")
Signed-off-by: Dave Stevenson <[email protected]>
The documentation says that the TPZ filter can not upscale,
and requesting a scaling factor > 1:1 will output the original
image in the top left, and repeat the right/bottom most pixels
thereafter.
That fits perfectly with upscaling a 1x1 image which is done
a fair amount by some compositors to give solid colour, and it
saves a large amount of LBM (TPZ is based on src size, whilst
PPF is based on dest size).

Select TPZ filter for images with source rectangle <=1.

Signed-off-by: Dave Stevenson <[email protected]>
…ctors

The non-desktop property "Indicates the output should be ignored for
purposes of displaying a standard desktop environment or console."

That sounds like it should be true for all writeback and virtual
connectors as you shouldn't render a desktop to them, so set it
by default.

Signed-off-by: Dave Stevenson <[email protected]>
The limit of 32 planes per DRM device is dictated by the use
of planes_mask returning a u32.

Change to a u64 such that 64 planes can be supported by a device.

Signed-off-by: Dave Stevenson <[email protected]>
The HVS can accept an arbitrary number of planes, provided
that the overall pixel read load is within limits, and
the display list can fit into the dlist memory.

Now that DRM will support 64 planes per device, increase
the number of overlay planes from 16 to 48 so that the
dlist complexity can be increased (eg 4x4 video wall on
each of 3 displays).

Signed-off-by: Dave Stevenson <[email protected]>
Instead of having 48 generic overlay planes, assign 32 to the
writeback connector so that there is no ambiguity in wlroots
when trying to find a plane for composition using the writeback
connector vs display.

Signed-off-by: Dave Stevenson <[email protected]>
Some hardware will implement transpose as a rotation operation,
which when combined with X and Y reflect can result in a rotation,
but is a discrete operation in its own right.

Add an option for transpose only.

Signed-off-by: Dave Stevenson <[email protected]>
Some connectors, particularly writeback, can implement flip
or transpose operations as writing back to memory.

Add a connector rotation property to control this.

Signed-off-by: Dave Stevenson <[email protected]>
The txp block can implement transpose as it writes out the image
data, so expose that through the new connector rotation property.

Signed-off-by: Dave Stevenson <[email protected]>
For devices where transfer lengths are not known upfront, there is a
danger when the destination is wider than the source that partial words
can be lost at the end of a transfer. Ideally the controller would be
able to flush the residue, but it can't - it's not even possible to tell
that there is any.

Instead, allow the client driver to avoid the problem by setting a
smaller width.

Signed-off-by: Phil Elwell <[email protected]>
SPI transfers are of defined length, unlike some UART traffic, so it is
safe to let the DMA controller choose a suitable memory width.

Signed-off-by: Phil Elwell <[email protected]>
In order to avoid losing residue bytes when a receive is terminated
early, set the destination width to single bytes.

Link: raspberrypi#6365

Signed-off-by: Phil Elwell <[email protected]>
Historically we have booted in powersave mode, and switched to ondemand with the
/etc/init.d/raspi-config scipt if the shift key is not pressed.

This was intended to protect users from an unsafe overclock, or inadequate power supply.

But there's little evedence that this option is benefitting anyone,
and we can boot faster without it.

Signed-off-by: Dom Cobley <[email protected]>
The xHC may commence Host Initiated Data Moves for streaming endpoints -
see USB3.2 spec s8.12.1.4.2.4. However, this behaviour is typically
counterproductive as the submission of UAS URBs in {Status, Data,
Command} order and 1 outstanding IO per stream ID means the device never
enters Move Data after a HIMD for Status or Data stages with the same
stream ID. For OUT transfers this is especially inefficient as the host
will start transmitting multiple bulk packets as a burst, all of which
get NAKed by the device - wasting bandwidth.

Also, some buggy UAS adapters don't properly handle the EP flow control
state this creates - e.g. RTL9210.

Set Host Initiated Data Move Disable to always defer stream selection to
the device. xHC implementations may treat this field as "don't care,
forced to 1" anyway - xHCI 1.2 s4.12.1.

Signed-off-by: Jonathan Bell <[email protected]>
The commit 91f059c ("mmc: core: Capture eMMC and SD card errors")
introduced MMC error stats, which are accessible via debugfs. Implement
the necessary increment calls for the BCM2835 SDHOST driver, which
isn't SDHCI compatible. This is useful for debug and testing.

Signed-off-by: Stefan Wahren <[email protected]>
@popcornmix popcornmix force-pushed the rpi-6.12.y branch 2 times, most recently from 886c7da to 12856cc Compare November 18, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.