Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdcv: Don't walk into "res/" subdirectories #1961

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

georgeto
Copy link

@georgeto georgeto commented Oct 9, 2024

When searching for .ifo files use the same optimization as we do in the koreader frontend in getIfosInDir(path) in readerdictionary.lua.

Fortunately sdcv uses the for_each_file function exclusively to search for .ifo files, making the patch trivial.

I have several dictionaries on my Kobo Aura One, some of which contain 'res' subdirectories with many files. After a restart or cold boot, the first dictionary lookup would take around 45s, while subsequent lookups would only take merely around 0.5s. My theory for this massive difference is that after reboot, the file system cache is empty, and therefore the file system structure for all the resource directories must first be loaded from the flash, which seems to be quite slow.

With the patched sdcv there is now no difference between the first and the following lookups anymore. In addition, the average lookup time on my device drops from 0.5s to about 0.3s.


This change is Reviewable

When searching for .ifo files use the same optimization as we do in the
koreader frontend in getIfosInDir(path) in readerdictionary.lua.

Fortunately sdcv uses the for_each_file function exclusively to search
for .ifo files, making the patch trivial.

I have several dictionaries on my Kobo Aura One, some of which contain
'res' subdirectories with many files. After a restart or cold boot, the
first dictionary lookup would take around 45s, while subsequent lookups
would only take merely around 0.5s. My theory for this massive
difference is that after reboot, the file system cache is empty, and
therefore the file system structure for all the resource directories
must first be loaded from the flash, which seems to be quite slow.

With the patched sdcv there is now no difference between the first and
the following lookups anymore. In addition, the average lookup time on
my device drops from 0.5s to about 0.3s.
@benoit-pierre
Copy link
Contributor

Wouldn't it be better to contribute this upstream?

@georgeto
Copy link
Author

georgeto commented Oct 9, 2024

Wouldn't it be better to contribute this upstream?

I have my doubts that upstream would accept it in the current form, because it does not handle the edge case that there is a dictionary named res. For koreader that is not a problem however, because the check in readerdictionary.lua has the same limitation. But with a patch that doesn't have this limitation I could try it upstream. I will think about it.

@georgeto
Copy link
Author

georgeto commented Oct 9, 2024

But with a patch that doesn't have this limitation I could try it upstream. I will think about it.

I looked at the code again, but I currently don't see how to implement this nicely in a generalized way, i.e. outside the controlled conditions that we have in koreader.

A possible approach could the following, however I wouldn't call it nice: The directory iteration could remember when a res subdirectory is encountered, and skip it at first. Then at the end if there was no .ifo file in the directory, enter the res subdirectory. However, it remains that for_each_file is actually a generic function that could theoretically also be used for data types other than .ifo.

@Frenzie
Copy link
Member

Frenzie commented Oct 10, 2024

What concretely is many files?

@georgeto
Copy link
Author

What concretely is many files?

In my data/dict directory there are a total of 10500 files which are located in res subdirectories .
Not necessarily a number where I would expect the measured delay of 45s.

@Frenzie
Copy link
Member

Frenzie commented Oct 10, 2024

On an SD card with FAT32 that can sometimes be surprisingly slow, even if I don't necessarily mean quite that slow. You could run a simple time ls to compare the numbers.

@Frenzie
Copy link
Member

Frenzie commented Oct 10, 2024

Or since I just realized you said subdirectories plural, perhaps that should be something like time find . -type f > /dev/null.

@georgeto
Copy link
Author

Or since I just realized you said subdirectories plural, perhaps that should be something like time find . -type f > /dev/null.

Basically the same result as observed with sdcv:

[root@kobo koreader]# time find data/dict -type f > /dev/null
real    0m 43.31s
user    0m 0.14s
sys     0m 43.04s
[root@kobo koreader]# time find data/dict -type f > /dev/null
real    0m 0.23s
user    0m 0.04s
sys     0m 0.18s
[root@kobo koreader]# find data/dict -type f | wc
    10558     10595    869759

@Frenzie
Copy link
Member

Frenzie commented Oct 10, 2024

Right, so sdcv presumably isn't accidentally doing something inefficient, other than going through res. That's not very useful per se, but it's always good to double check. ^_^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants