Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Webhook to generate metapackages for SpaceDock #177

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

HebaruSan
Copy link
Member

Motivation

SpaceDock has a modpack feature (internally called ModList), which doesn't do a lot today. Users can create a "pack" and add mods to it, but all this does is show those mods in one listing with separate Download buttons:

image

@V1TA5 is interested in adding functionality to make this more useful. One idea was to allow users to download a .ckan file for a modpack. SpaceDock can't do this by itself because it only has the SpaceDock id numbers for each mod, not the CKAN module identifiers.

Changes

Now a new /sd/metapackage accepts a form list parameter of mod_ids, as well as form parameters for name and abstract, and returns JSON:

{
    "missing": [ List of mod ids that are not in CKAN, in case SD wants to raise an error ],
    "metapackage": {
        Contents of .ckan file to install the modpack
    }
}

This way SpaceDock has the option to warn users and let them download a partial .ckan file if some modules are missing, or just cancel out.

Note that if multiple netkans share a SpaceDock entry, all will be added to the modpack.

If no modules are found, a 404 error is returned instead.

@HebaruSan HebaruSan added Enhancement New feature or request Webhooks Listens for notifications and triggers actions via queues labels May 30, 2020
@HebaruSan HebaruSan requested a review from DasSkelett May 30, 2020 08:06
@techman83
Copy link
Member

Hmm, I'm not 100% sure about this. If I understand correctly, it makes our hooks server an integral part of Spacedock's production workflow, with a function that isn't guaranteed to be reliable or fast. This could make troubleshooting a little hazy and potentially result in failures for end users.

Could Spacedock consume/cache our metadata to figure this out? We could add a webhook notification to let SD know when it's changed.

@HebaruSan HebaruSan force-pushed the feature/webhook-sd-metapackage branch from 8e36588 to 35ca281 Compare June 1, 2020 04:37
@HebaruSan
Copy link
Member Author

Could Spacedock consume/cache our metadata to figure this out?

If I'm understanding this suggestion correctly, it would involve embedding the following knowledge in SpaceDock-owned code:

  • Format of CKAN's metadata
  • Structure of NetKAN repo
  • The mapping from SpaceDock IDs to CKAN identifiers (which we just recently finished moving over to the CKAN side)
  • Format of a metapackage

I think that all belongs in CKAN-owned code. It's implementation details of our stuff that shouldn't be tightly coupled to other projects.

So is the concern that we might slow down SpaceDock? Or that SpaceDock might overwhelm NetKAN's servers with extra traffic? I'm trying to strategize about how to start getting this back on track. I did work on KSP-SpaceDock/SpaceDock#293 recently, so modpacks should be less egregiously awful soon, but they could still use a "killer app" to make them really useful.

@techman83
Copy link
Member

If I'm understanding this suggestion correctly, it would involve embedding the following knowledge in SpaceDock-owned code:

Format of CKAN's metadata

The CKAN spec is known, consistent and well documented. It is designed to be consumable by any client.

Structure of NetKAN repo

I'm not 100% across why this is needed?

The mapping from SpaceDock IDs to CKAN identifiers (which we just recently finished moving over to the CKAN side)
Format of a metapackage
I think that all belongs in CKAN-owned code. It's implementation details of our stuff that shouldn't be tightly coupled to other projects.

So is the concern that we might slow down SpaceDock? Or that SpaceDock might overwhelm NetKAN's servers with extra traffic?

This is a pretty big change from being an indexing service to an API provider used by end users. The service is not architected with that in mind. If the webhooks go down for a short period, it's annoying, but it's not impacting anyone's ability to do the thing they want to do. Not only that, it's not rate limited, load balanced, redundant or multi-threaded. We'd probably want run a separate service, with it's own workers, own config. The webhooks also have limited but privileged access to our repos, github etc etc, which is locked down to only allow spacedock + github.

I'm not opposed to the concept as a whole. If as a project, we want to provide this service, we should make a conscious and deliberate decision to do so. We would want to step back and do some policy and technical design before throwing another endpoint into the webhooks.

Solid idea and I think it's a good one. Let's flesh it out and fully understand the requirements and consequences of running the service.

@HebaruSan
Copy link
Member Author

OK thanks, that makes sense. Pulling out the key points as notes to self for future reference:

  • Rate limiting
  • Load balancing
  • Redundancy
  • Multi-threading
  • Separate service
    • Its own workers
    • Its own config

The webhooks also have limited but privileged access to our repos, github etc etc, which is locked down to only allow spacedock + github.

For what it's worth, as I had imagined it, this would have continued to be the case. The user would have clicked a download link in SpaceDock, which would have gone to a route on SpaceDock's web server; SpaceDock, on the server side, would then have contacted the CKAN-owned server (whether it's the current NetKAN-Infra webhooks or some new service), received the result, and returned the result back to the user. I definitely am not proposing to open the NetKAN-Infra webhooks up to wide public access.

If the webhooks go down for a short period, it's annoying, but it's not impacting anyone's ability to do the thing they want to do.

Continuing to brainstorm, did we discuss the possibility of doing this communication after a user saves a mod pack on SpaceDock, rather than on the fly at download? The SpaceDock database could add a ModList.metapackage column to store JSON received from NetKAN-Infra, update it in a background task when the mod pack changes, and then serve that up to users clicking the download link immediately. SpaceDock could manage a simple queue locally to re-try failed requests. This should keep the potential delay out of workflows and ensure that even a very popular mod pack would only need one request to CKAN. Does that address any of the concerns about reliability or load balancing?

@techman83
Copy link
Member

OK thanks, that makes sense. Pulling out the key points as notes to self for future reference:

  • Rate limiting

  • Load balancing

  • Redundancy

  • Multi-threading

  • Separate service

    • Its own workers
    • Its own config

Yep those are the important things to think about.

For what it's worth, as I had imagined it, this would have continued to be the case. The user would have clicked a download link in SpaceDock, which would have gone to a route on SpaceDock's web server; SpaceDock, on the server side, would then have contacted the CKAN-owned server (whether it's the current NetKAN-Infra webhooks or some new service), received the result, and returned the result back to the user. I definitely am not proposing to open the NetKAN-Infra webhooks up to wide public access.

Cool, that's probably a good approach for now. I would love to expand things more widely, but we'd have to do a whole lot of work to ensure things are secure, scale nicely and are reliable.

Continuing to brainstorm, did we discuss the possibility of doing this communication after a user saves a mod pack on SpaceDock, rather than on the fly at download? The SpaceDock database could add a ModList.metapackage column to store JSON received from NetKAN-Infra, update it in a background task when the mod pack changes, and then serve that up to users clicking the download link immediately. SpaceDock could manage a simple queue locally to re-try failed requests. This should keep the potential delay out of workflows and ensure that even a very popular mod pack would only need one request to CKAN. Does that address any of the concerns about reliability or load balancing?

I think that's a pretty workable approach. Ultimately this is an in kind feature for SpaceDock and our service isn't geared up for on-demand end user things. We can always send rate-limiting responses if we needed to and SpaceDock could backoff/retry etc. I don't think we'd need to implement it on the first pass though.

It would be nice if this were spun up as a separate container (we can use nginx to host it under a separate path like we did the legacy hooks), as we probably don't want to clash with the webhooks. But we're also almost at capacity on the instance.

@HebaruSan HebaruSan removed the request for review from DasSkelett November 10, 2021 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or request Webhooks Listens for notifications and triggers actions via queues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants