Blacklist requests that are duplicates of existing resources or bound to fail #28
Comments
Can you move your comment to #25 and close this? This is the scraper's repo.
@rgaudin Moved it but I'd keep it open as this ticket is a little bit different.
This one's better; closing the other one, but the problem raised there remains: where do we point to for stuff that we know exists?
Is your question "in case there are several versions of the same zim" (e.g., Wikipedia mini/nopic/maxi)? The basic assumption here is that zimit provides a copy of the real thing, so we should send them the
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
See also #33
I've started to document the blacklist entries I encounter during maintenance tasks at https://docs.google.com/spreadsheets/d/1mBjWT0hLmeg6EqT4nNEfCzLU8hGSzYs4IgbWDInhPqA/edit?gid=0#gid=0
Should we add a link to it to the weekly infra routine (the manual checks we do every week to ensure infra is up and running)? Should we count them in some way?
Added the link to the routine; indeed, it will help to have the link at hand.
That's why I asked. The value would be to distinguish their relative importance, should the eventual actions need to be prioritized.
I've added two more to the list.
Following openzim/zimit#113, we should think about implementing an easily editable list (hosted on drive.kiwix.org?) of blacklisted sites that cannot be requested on zimit, e.g.
It's probably a matter for a separate ticket, but requests for websites we already have a dedicated scraper for (Wikipedia, StackOverflow, etc.) should also be soft-blocked, with the user offered a direct link to the existing ZIM file.
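As a rough illustration of what such a soft block could look like, here is a minimal Python sketch. It assumes an externally hosted, easily editable CSV blocklist; the URL, file layout and field names below are made up for the example and are not the actual zimit/zimfarm implementation. A domain found on the list is rejected with a message that either points to the existing ZIM or explains why the crawl is bound to fail.

```python
# Hypothetical sketch only: check a requested URL against a hosted,
# easily editable blocklist before accepting a zimit job.
# BLOCKLIST_URL, the CSV columns and the messages are assumptions.
import csv
import io
import urllib.request
from urllib.parse import urlparse

# Assumed location of a simple CSV blocklist: domain,reason,existing_zim_url
BLOCKLIST_URL = "https://drive.kiwix.org/zimit/blocklist.csv"  # hypothetical path


def load_blocklist(url: str = BLOCKLIST_URL) -> dict:
    """Fetch the CSV blocklist and index it by domain."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    entries = {}
    for row in csv.DictReader(io.StringIO(text)):
        entries[row["domain"].lower().strip()] = {
            "reason": row.get("reason", ""),
            "existing_zim_url": row.get("existing_zim_url", ""),
        }
    return entries


def check_request(requested_url: str, blocklist: dict) -> tuple[bool, str]:
    """Return (allowed, message) for a zimit request.

    A blocklisted domain is soft-blocked: the message either points the
    user to an existing ZIM or explains why the request is rejected.
    """
    domain = urlparse(requested_url).netloc.lower().removeprefix("www.")
    entry = blocklist.get(domain)
    if entry is None:
        return True, "OK"
    if entry["existing_zim_url"]:
        return False, f"A ZIM of {domain} already exists: {entry['existing_zim_url']}"
    return False, f"Requests for {domain} are blocked: {entry['reason']}"


if __name__ == "__main__":
    # Example usage with an in-memory blocklist instead of fetching the CSV.
    sample = {
        "en.wikipedia.org": {
            "reason": "dedicated scraper exists",
            "existing_zim_url": "https://download.kiwix.org/zim/wikipedia/",
        }
    }
    print(check_request("https://en.wikipedia.org/wiki/Main_Page", sample))
```

Keeping the list as a plain CSV on a shared drive would let anyone on the team edit it without a deploy; the request form would only need to fetch and check it before queuing a job.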