Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tldr --update_cache does not delete local-only pages #242

Open
mooreye opened this issue Apr 10, 2024 · 6 comments · May be fixed by #249
Open

tldr --update_cache does not delete local-only pages #242

mooreye opened this issue Apr 10, 2024 · 6 comments · May be fixed by #249

Comments

@mooreye
Copy link

mooreye commented Apr 10, 2024

How to reproduce on Linux:

  1. Download all pages by running tldr -u
  2. Go to ~/.cache/tldr/pages/linux/ dir
  3. Create a test file: cp cp.md zzzzz.md
  4. Running tldr zzzzz now shows the copied cp page
  5. Run tldr -u again
  6. See that the file zzzzz.md is still present.
@mooreye
Copy link
Author

mooreye commented Apr 10, 2024

diff-ing from my backup shows for example that the following pages were once present but were deleted upstream:

[/path/to/my/backup/of/home/dir/.cache/] $ diff -r tldr/ ~/.cache/tldr/
Only in tldr/pages/common: amass-db.md
Only in tldr/pages/common: sockstat.md
Only in tldr/pages/linux: ffuf.md
Only in tldr/pages/linux: flock.md
Only in tldr/pages/linux: select.md
Only in tldr/pages/linux: zipcloak.md

@vitorhcl
Copy link
Member

vitorhcl commented Apr 13, 2024

Hi, thank you for opening this issue!

This repository is just for the pages, specification for tldr clients (programs to visualize tldr pages), scripts for updating pages... But not for tldr clients. So your proposal would be for enforcing this to all clients.

Well, I see some pros and cons of not allowing local-only pages:

  • Pros:
    1. Standard environment across clients.
    2. Encourages the user to submit a pull request for creating a new page.
    3. If page is added on the repository and there was already a local-only page with the same name, it would be likely overwritten, so not allowing them would avoid this situation.
  • Cons:
    1. Local-only pages could give the user the possibility of mantain pages that don't fit the project's goal (e.g. documenting file formats, mixed utilities, code functions or anything desired).
    2. Increases the complexity for all clients for implementing the cache handling system.

We have to discuss if the pros would outweigh the cons on doing this (or if there is another unseen pro or con, really).

diff-ing from my backup shows for example that the following pages were once present but were deleted upstream:

[/path/to/my/backup/of/home/dir/.cache/] $ diff -r tldr/ ~/.cache/tldr/
Only in tldr/pages/common: amass-db.md
Only in tldr/pages/common: sockstat.md
Only in tldr/pages/linux: ffuf.md
Only in tldr/pages/linux: flock.md
Only in tldr/pages/linux: select.md
Only in tldr/pages/linux: zipcloak.md

PS: I don't really see how the comment above is related with the original issue

@gutjuri
Copy link
Member

gutjuri commented Apr 14, 2024

How to reproduce on Linux:

1. Download all pages by running `tldr -u`

2. Go to `~/.cache/tldr/pages/linux/` dir

3. Create a test file: `cp cp.md zzzzz.md`

4. Running `tldr zzzzz` now shows the copied `cp` page

5. Run `tldr -u` again

6. See that the file `zzzzz.md` is still present.

Which tldr client do you use? I.e. how did you install it?

@gutjuri
Copy link
Member

gutjuri commented Apr 14, 2024

Hi, thank you for opening this issue!

This repository is just for the pages, specification for tldr clients (programs to visualize tldr pages), scripts for updating pages... But not for tldr clients. So your proposal would be for enforcing this to all clients.

Well, I see some pros and cons of not allowing local-only pages:

* Pros:
  
  1. Standard environment across clients.
  2. Encourages the user to submit a pull request for creating a new page.
  3. If page is added on the repository and there was already a local-only page with the same name, it would be likely overwritten, so not allowing them would avoid this situation.

* Cons:
  
  1. Local-only pages could give the user the possibility of mantain pages that don't fit the project's goal (e.g. documenting file formats, mixed utilities, code functions or anything desired).
  2. Increases the complexity for all clients for implementing the cache handling system.

We have to discuss if the pros would outweigh the cons on doing this (or if there is another unseen pro or con, really).

diff-ing from my backup shows for example that the following pages were once present but were deleted upstream:

[/path/to/my/backup/of/home/dir/.cache/] $ diff -r tldr/ ~/.cache/tldr/
Only in tldr/pages/common: amass-db.md
Only in tldr/pages/common: sockstat.md
Only in tldr/pages/linux: ffuf.md
Only in tldr/pages/linux: flock.md
Only in tldr/pages/linux: select.md
Only in tldr/pages/linux: zipcloak.md

PS: I don't really see how the comment above is related with the original issue

I do not think that mooreye is proposing to force clients to disallow local-only pages (which would be next to impossible anyways).
They raised the issue that pages that are deleted from this repository are not deleted locally by the client they are using. For example, the ffuf page was first added in linux/ (tldr-pages/tldr#6183), then it was added in common/ (tldr-pages/tldr#12082). Eventually, the version in linux/ was deleted (tldr-pages/tldr#12592). However, the client mooreye is using does not perform this deletion of linux/ffuf.md, so linux/ffuf.md keeps existing on mooreye's computer even though it has been deleted in this repository (mooreye, please correct me if I misunderstood the issue at hand).

I think that this is not an issue with the client spec, because the client spec uses the term "cache" and updating a cache implicates deleting outdated entries IMO. If I understood correctly, this is an issue with the specific tldr client used by mooreye.

@mooreye
Copy link
Author

mooreye commented Apr 14, 2024

I use Fedora's DNF package, which is now v3.2.0

Their upstream info points to https://github.com/tldr-pages/tldr-python-client

@gutjuri gutjuri transferred this issue from tldr-pages/tldr Apr 14, 2024
@gutjuri
Copy link
Member

gutjuri commented Apr 14, 2024

I use Fedora's DNF package, which is now v3.2.0

Their upstream info points to https://github.com/tldr-pages/tldr-python-client

Thanks! I transferred the issue into the python-client's repo.

Also, I have spotted the cause of the issue: The update_cache function of the client downloads a Zip Archive containing the repo's pages. It then individually copies each page into the current machine's filesystem. At no point does it delete pages that are not present in the downloaded Zip file from the current machine's filesystem.

tldr-python-client/tldr.py

Lines 380 to 407 in 68acf05

def update_cache(language: Optional[List[str]] = None) -> None:
languages = get_language_list()
if language and language[0] not in languages:
languages.append(language[0])
for language in languages:
try:
cache_location = f"{DOWNLOAD_CACHE_LOCATION[:-4]}-pages.{language}.zip"
req = urlopen(Request(
cache_location,
headers=REQUEST_HEADERS
), context=URLOPEN_CONTEXT)
zipfile = ZipFile(BytesIO(req.read()))
pattern = re.compile(r"(.+)/(.+)\.md")
cached = 0
for entry in zipfile.namelist():
match = pattern.match(entry)
if match:
store_page_to_cache(
zipfile.read(entry),
match.group(2),
match.group(1),
language
)
cached += 1
print(
"Updated cache for language "
f"{language}: {cached} entries"
)

@patricedenis patricedenis linked a pull request Jun 28, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants