Implement a human validation to the download #402
We want to make it more difficult for crawlers to pull every plugin from the site, which creates a crippling load. So we propose two steps:
1 - The name of the plugin is shown to the user. Hovering over the name will show a 'copy' indicator. Clicking it will copy the name to your clipboard.
Thanks for reacting quickly to this 👏! Is it possible to add user-agents to the whitelist? For example, I develop QDT, a tool that automates the deployment of QGIS profiles, and it definitely downloads plugins. I'm also thinking of @Gustry's QGIS Plugin Manager.
Thanks for the dev @Xpirix and @timlinux. QGIS-Plugin-Manager was fixed this morning to add a User-Agent, cf. 3liz/qgis-plugin-manager#66
So you simulate the QGIS user-agent? Is that the recommended practice? I thought it was better for every application to have its own user-agent. This is what I implemented in QDT: https://github.com/Guts/qgis-deployment-cli/blob/ef43bbc658f00ad019e6e0e7b2341961a7ae49ba/qgis_deployment_toolbelt/utils/file_downloader.py#L44
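Sending a tool-specific User-Agent, as opposed to simulating QGIS's, is a small change in most download code. A minimal sketch using Python's standard `urllib.request`; the `MyDeployTool` name, version, contact URL, and download URL are placeholders, not values used by QDT or QGIS-Plugin-Manager:

```python
import shutil
import urllib.request


def build_download_request(url: str, tool_name: str, version: str, contact_url: str) -> urllib.request.Request:
    """Build a GET request that identifies the client with its own User-Agent."""
    # "product/version (+contact)" is a common convention for tool/bot User-Agents.
    user_agent = f"{tool_name}/{version} (+{contact_url})"
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, dest_path: str, tool_name: str, version: str,
             contact_url: str, timeout: float = 30.0) -> None:
    """Stream the response body to dest_path, sending the tool-specific User-Agent."""
    request = build_download_request(url, tool_name, version, contact_url)
    with urllib.request.urlopen(request, timeout=timeout) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Note that `urllib` normalizes header names with `str.capitalize()`, so the value is read back with `request.get_header("User-agent")`. A distinct User-Agent per tool also lets the server operators whitelist or rate limit each tool separately.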
From qgis/QGIS#57428 (comment), I thought his CI pipeline was already broken because of the change. It seems I read too fast :/ As QGIS-Plugin-Manager is used on our hosting infrastructure, I thought I needed to make a quick patch to make it work. As I said, I'm of course fine with defining a proper user-agent for this tool if simulating QGIS's is not acceptable. I'm following the discussion and will update if needed. Let us know @timlinux and @Xpirix
@Gustry Yes, it was broken – but most likely due to overload.
@Guts I do not see any harm in [qgis-plugin-manager] simulating the QGIS user-agent. No, this is not best practice.
@timlinux Crawlers/Robots will do the same, i.e. set the user-agent to QGIS, as soon as they get blocked.
@timlinux I suggest rate limiting and possibly automatic blocking of CIDR blocks or single IP addresses.
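The rate limiting plus automatic blocking idea can be illustrated with a token-bucket limiter keyed by network prefix rather than by single address. This is a minimal in-process sketch, not the plugin site's implementation; the class name, the /24 (IPv4) and /64 (IPv6) grouping, and the `rate_per_sec`, `burst`, and `block_after` parameters are all hypothetical choices. In production this would more likely live in the reverse proxy (for example nginx's `limit_req` module).

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class Bucket:
    tokens: float
    last_seen: float


class PrefixRateLimiter:
    """Token-bucket rate limiter keyed by client network (CIDR block), with optional auto-blocking."""

    def __init__(self, rate_per_sec: float, burst: int, v4_prefix: int = 24,
                 v6_prefix: int = 64, block_after: Optional[int] = None) -> None:
        self.rate_per_sec = rate_per_sec  # sustained requests per second per network
        self.burst = burst                # bucket capacity, i.e. allowed short burst
        self.v4_prefix = v4_prefix        # IPv4 clients are grouped into /v4_prefix blocks
        self.v6_prefix = v6_prefix        # IPv6 clients are grouped into /v6_prefix blocks
        self.block_after = block_after    # consecutive rejections before blocking; None disables it
        self.buckets: Dict[Network, Bucket] = {}
        self.rejections: Dict[Network, int] = {}
        self.blocked: Set[Network] = set()

    def key_for(self, ip: str) -> Network:
        """Map a single client address to the CIDR block it is rate limited as."""
        addr = ipaddress.ip_address(ip)
        prefix = self.v4_prefix if addr.version == 4 else self.v6_prefix
        return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

    def allow(self, ip: str, now: float) -> bool:
        """Return True if a request from `ip` at time `now` (in seconds) may proceed."""
        key = self.key_for(ip)
        if key in self.blocked:
            return False
        bucket = self.buckets.setdefault(key, Bucket(tokens=float(self.burst), last_seen=now))
        # Refill for the time elapsed since the last request, capped at the burst size.
        elapsed = max(0.0, now - bucket.last_seen)
        bucket.tokens = min(float(self.burst), bucket.tokens + elapsed * self.rate_per_sec)
        bucket.last_seen = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            self.rejections[key] = 0
            return True
        # Out of tokens: count the rejection and block the whole network if it keeps going.
        self.rejections[key] = self.rejections.get(key, 0) + 1
        if self.block_after is not None and self.rejections[key] >= self.block_after:
            self.blocked.add(key)
        return False
```

Keying on a prefix means a crawler rotating through addresses in one block shares a single budget, at the cost of occasionally throttling unrelated users behind the same block; the prefix lengths are the knob for that trade-off.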
Thanks @benz0li , I think your suggestion makes sense. I will take a look at it.
Sure, I will work on this issue and let you know if a specific user-agent is required. But I also think simulating QGIS's user-agent is not harmful, even if it is not good practice. Otherwise, we would have to add each specific user-agent to the nginx configuration. @timlinux Is it okay if I combine the ideas in 2 levels:
Is there any insight into which clients or user-agents are causing the issues? Are they crawlers that would abide by robots.txt or similar rules? Why is the server's CPU saturated by what seem like simple file downloads, which should need little CPU to serve and would normally saturate the network bandwidth first? Is the site's attempt to count downloads the only reason the files are not served from a static directory, and is that what drives the CPU usage? If so, switching to a simpler serving setup and a regular cron job that parses the logs for download counts might be a less intensive solution. As a user who occasionally downloads plugins manually and very rarely downloads all plugins (maybe 1-2 times per year), any intermediate step sounds annoying to me.
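The log-based counting alternative could look like the following sketch: files are served statically and a periodic job (for example from cron) counts successful GET requests for plugin zips in the access log. Assumptions: the web server writes the common "combined" log format, and download URLs follow a `/plugins/<name>/version/<version>/download/` pattern; both are illustrative and would need checking against the real server configuration. Persisting the counts is not shown.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the start of a "combined"-format access log line (assumed log format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)
# Assumed download URL layout; adjust to the real routes.
DOWNLOAD_PATH = re.compile(r"^/plugins/(?P<plugin>[^/]+)/version/(?P<version>[^/]+)/download/?$")


def count_downloads(lines: Iterable[str]) -> Counter:
    """Count successful plugin downloads per (plugin, version) from access log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Skip unparseable lines, non-GET requests and non-2xx responses (e.g. 304, 404, 429).
        if not match or match["method"] != "GET" or not match["status"].startswith("2"):
            continue
        path = match["path"].split("?", 1)[0]  # ignore query strings
        download = DOWNLOAD_PATH.match(path)
        if download:
            counts[(download["plugin"], download["version"])] += 1
    return counts
```

Because counting happens offline, each download request costs the server only a static file transfer.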
We are facing issues with the repository sometimes being unavailable for automated deployments. We have a preconfigured list of plugins that are downloaded and extracted on deployment for our environments, so all our users have the necessary plugins available on startup. Each deployment (up to ~10 a day, usually fewer) downloads the required plugin zips (currently ~3, possibly more in the future), extracts them, and uses QGIS_PLUGINPATH to provide the specified plugin versions. Clients do not download plugins themselves, and users are instructed not to. While it is possible, and probably very beneficial, to implement a mirror repository to avoid deployments failing due to external service downtime and to reduce load when a cached package can be used instead, there are important considerations about how the limiting method affects mirroring use cases:
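The caching part of the mirroring idea can be sketched as a cache-first lookup: because these deployments pin plugin versions, a zip already downloaded for a given name and version can be reused instead of requesting it from the repository again. Assumptions: `fetch` is a hypothetical callable that performs the actual download (for example an HTTP GET sending a tool-specific User-Agent), and the `<name>-<version>.zip` naming scheme is illustrative.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def get_plugin_zip(name: str, version: str, cache_dir: Path,
                   fetch: Callable[[str, str], bytes]) -> Path:
    """Return a local path to a pinned plugin zip, downloading only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: one immutable file per pinned (name, version).
    target = cache_dir / f"{name}-{version}.zip"
    if target.exists():
        return target  # cache hit: no request reaches the upstream repository
    data = fetch(name, version)  # cache miss: exactly one upstream download
    # Write to a temporary file in the same directory, then rename atomically,
    # so an interrupted download is never mistaken for a cached zip.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, target)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return target
```

With ~10 deployments a day and ~3 pinned plugins, this turns roughly 30 daily upstream requests into one request per new plugin version, and a cache hit keeps working during repository downtime.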
Thanks @komima for your input. So for the short term, please update your script's UA so that it can download the plugins without user interaction. We will implement a better approach for the longer term. We will be happy to build a formal API with tokens issued to users in the medium term, but for now I just want to make sure the broad gamut of users has a good experience.
OK, I discussed this further with @mbernasocchi, and we agreed that starting with just the rate-limiting option would be better. @Xpirix, will you implement it accordingly?
@timlinux Sure, I will implement it.
Please find the proposed PR for #402 (comment) at #413
From @timlinux
Original post at https://lists.osgeo.org/pipermail/qgis-user/2024-May/054439.html