You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In our application we are using your ufat library (via my RTOS). In one of the devices the SD card got damaged - it seems that it is able to correctly read (maybe also write) about 600-700 initial blocks, but for further blocks reads (possibly also writes) cause the SD card to fail with a timeout. This is the behaviour observed on the MCU, on a PC the card just fails to mount and trying to dump it allows you to read just the first ~700 blocks and about 350 kB of data, then it throws an I/O error each block.
The problem now seems to be that if the FAT file system mounts properly (and it does, because the initial part of card is fine), then trying to do some operation - in our case creating a folder - will result in a practically infinite loop due to ignored read errors in alloc_cluster(), exactly here:
if (!ufat_read_fat(uf, idx, &c) &&c==UFAT_CLUSTER_FREE) {
This loop will try to iterate all existing clusters (in case of our 16 GB card - about 2 million...) to find a free one, ignoring read errors. Given that the mode of error of the SD card is "timeout", then each read attempt lasts about 100 ms, so it will take "forever" for the loop to end due to checking all the clusters.
Is ignoring read errors here intentional (e.g. to ignore single broken clusters, without completely failing the top-level operation) or just an omission? I'm not sure how to understand that part of code, so I'm starting with a question before potentially providing a fix.
The text was updated successfully, but these errors were encountered:
alloc_cluster() just ignores read errors, trying next cluster until it
either succeeds (finds an empty one) or runs out of clusters (after
checking all of them). A large volume may have quite a lot of clusters -
eg. a 16 GB SD card with a standard format has about 2 million clusters.
When a "persistent" read error happens during the alloc_cluster() (after
a successful mount operation) - for example a volume is physically
disconnected in a very inconvenient moment or the volume is/gets damaged
and all further reads fail - then this loop becomes practically
infinite. In one application we found a damaged SD card, for which reads
of first 600-700 blocks work perfectly fine, but any read beyond that
results in a SDIO interface timing-out (the card will not switch to
expected state within specified time). As the timeout for the operation
is ~100 ms, then the function would loop for over 2 days. The same card
just fails to work in a PC, where any read beyond first ~350 kB (which
is about 700 blocks) fails with an I/O error.
Fix this by returning from alloc_cluster() with an error when any read
operation fails.
Fixesdlbeer#15
In our application we are using your ufat library (via my RTOS). In one of the devices the SD card got damaged - it seems that it is able to correctly read (maybe also write) about 600-700 initial blocks, but for further blocks reads (possibly also writes) cause the SD card to fail with a timeout. This is the behaviour observed on the MCU, on a PC the card just fails to mount and trying to dump it allows you to read just the first ~700 blocks and about 350 kB of data, then it throws an I/O error each block.
The problem now seems to be that if the FAT file system mounts properly (and it does, because the initial part of card is fine), then trying to do some operation - in our case creating a folder - will result in a practically infinite loop due to ignored read errors in
alloc_cluster()
, exactly here:ufat/ufat.c
Line 652 in e2a7466
This loop will try to iterate all existing clusters (in case of our 16 GB card - about 2 million...) to find a free one, ignoring read errors. Given that the mode of error of the SD card is "timeout", then each read attempt lasts about 100 ms, so it will take "forever" for the loop to end due to checking all the clusters.
Is ignoring read errors here intentional (e.g. to ignore single broken clusters, without completely failing the top-level operation) or just an omission? I'm not sure how to understand that part of code, so I'm starting with a question before potentially providing a fix.
The text was updated successfully, but these errors were encountered: