Blacklist requests that are duplicates of existing resources or bound to fail #28
Comments
Can you move your comment to #25 and close this? This is the scraper's repo.
@rgaudin Moved it but I'd keep it open as this ticket is a little bit different.
This one's better; closing the other one, but the problem raised there remains: where do we point to for stuff that we know exists?
Is your question "in case there are several versions of the same zim" (e.g., Wikipedia mini/nopic/maxi)? The basic assumption here is that zimit provides a copy of the real thing, so we should send them the
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
See also #33
I've started to document the blacklist entries I encounter during maintenance tasks at https://docs.google.com/spreadsheets/d/1mBjWT0hLmeg6EqT4nNEfCzLU8hGSzYs4IgbWDInhPqA/edit?gid=0#gid=0
Should we add a link to it to the weekly infra routine (the manual checks we do every week to ensure infra is up and running)? Should we count them in some way?
Added the link to the routine; indeed, it will help to have the link at hand.
That's why I asked. The value would be to distinguish their relative importance, should the eventual actions need to be prioritized.
I've added two more to the list.
Following openzim/zimit#113, we should think about implementing an easily editable list (hosted on drive.kiwix.org?) of blacklisted sites that cannot be requested on zimit, e.g.
It's probably a matter for a separate ticket, but requests for websites we already have a dedicated scraper for (Wikipedia, StackOverflow, etc.) should also be soft-blocked, with the user offered a direct link to the existing ZIM file.
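As a rough illustration of what such a soft block could look like, here is a minimal Python sketch. It assumes an externally hosted, easily editable CSV blocklist; the URL, file layout and field names below are made up for the example and are not the actual zimit/zimfarm implementation. A domain found on the list is rejected with a message that either points to the existing ZIM or explains why the crawl is bound to fail.

```python
# Hypothetical sketch only: check a requested URL against a hosted,
# easily editable blocklist before accepting a zimit job.
# BLOCKLIST_URL, the CSV columns and the messages are assumptions.
import csv
import io
import urllib.request
from urllib.parse import urlparse

# Assumed location of a simple CSV blocklist: domain,reason,existing_zim_url
BLOCKLIST_URL = "https://drive.kiwix.org/zimit/blocklist.csv"  # hypothetical path


def load_blocklist(url: str = BLOCKLIST_URL) -> dict:
    """Fetch the CSV blocklist and index it by domain."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    entries = {}
    for row in csv.DictReader(io.StringIO(text)):
        entries[row["domain"].lower().strip()] = {
            "reason": row.get("reason", ""),
            "existing_zim_url": row.get("existing_zim_url", ""),
        }
    return entries


def check_request(requested_url: str, blocklist: dict) -> tuple[bool, str]:
    """Return (allowed, message) for a zimit request.

    A blocklisted domain is soft-blocked: the message either points the
    user to an existing ZIM or explains why the request is rejected.
    """
    domain = urlparse(requested_url).netloc.lower().removeprefix("www.")
    entry = blocklist.get(domain)
    if entry is None:
        return True, "OK"
    if entry["existing_zim_url"]:
        return False, f"A ZIM of {domain} already exists: {entry['existing_zim_url']}"
    return False, f"Requests for {domain} are blocked: {entry['reason']}"


if __name__ == "__main__":
    # Example usage with an in-memory blocklist instead of fetching the CSV.
    sample = {
        "en.wikipedia.org": {
            "reason": "dedicated scraper exists",
            "existing_zim_url": "https://download.kiwix.org/zim/wikipedia/",
        }
    }
    print(check_request("https://en.wikipedia.org/wiki/Main_Page", sample))
```

Keeping the list as a plain CSV on a shared drive would let anyone on the team edit it without a deploy; the request form would only need to fetch and check it before queuing a job.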