Cache WFS data to object storage #349
Comments
@smnorris can you clarify if you mean local or remote object storage?
Remote. I have not yet tinkered with optimizing the parquet files; the data volume is trivial in most cases, so it isn't a big deal. This presumes that clients generally do not need 'live' downloads: data replicated on a scheduled basis by a centralized workflow would meet user needs. That is likely true for 99% of WFS data, with fire perimeters an obvious exception.
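The core of such a replication job is just paging through a WFS layer and dumping the features to a file for upload. A minimal sketch in Python, assuming a hypothetical `fetch_page(start_index, count)` callable that wraps the actual WFS `GetFeature` request (`startIndex`/`count` paging is standard WFS 2.0); a real job would write GeoParquet rather than newline-delimited GeoJSON:

```python
import json
from typing import Callable, Dict, Iterator

def iter_features(fetch_page: Callable[[int, int], Dict],
                  page_size: int = 10000) -> Iterator[Dict]:
    """Page through a WFS layer using WFS 2.0 startIndex/count paging.

    fetch_page(start_index, count) is a hypothetical callable returning
    one GeoJSON FeatureCollection (as a dict) per request.
    """
    start = 0
    while True:
        page = fetch_page(start, page_size)
        features = page.get("features", [])
        if not features:          # empty page means we are past the end
            return
        yield from features
        start += len(features)

def replicate(fetch_page: Callable[[int, int], Dict], out_path: str) -> int:
    """Dump every feature to a newline-delimited GeoJSON file and return
    the feature count. A production job would convert to (Geo)Parquet
    and upload the result to object storage instead."""
    n = 0
    with open(out_path, "w") as f:
        for feat in iter_features(fetch_page):
            f.write(json.dumps(feat) + "\n")
            n += 1
    return n
```

Injecting the fetch function keeps the paging logic testable without a live WFS endpoint.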
FWIW, I could just add scheduled replication workflows to Python bcdata - but I don't have an NRS object storage bucket with that mandate, so I'm asking here 😁
Hey @smnorris - love the idea, but I think it's out of scope for us here. If the mapping team does something like this though we will 100% make use of it!
Too bad!
I think the other thing that makes this less practical for us is that we support spatial and non-spatial querying (I can't remember if you do?), so we would have to build a whole other query backend for parquet. Not hard, but time is limited! :)
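The gap between the two query types can be sketched in a few lines: an attribute predicate maps directly onto Parquet column filters, while even the simplest spatial predicate (plain bounding-box intersection here; real WFS spatial filters are far richer) needs geometry-aware code and, in practice, a spatial index. A pure-Python illustration with hypothetical names, not anyone's actual backend:

```python
from typing import Dict, List, Tuple

# An axis-aligned bounding box: (xmin, ymin, xmax, ymax)
BBox = Tuple[float, float, float, float]

def attribute_filter(rows: List[Dict], column: str, value) -> List[Dict]:
    """Non-spatial query: a plain equality predicate, the kind that a
    Parquet reader can push down onto column statistics."""
    return [r for r in rows if r.get(column) == value]

def bbox_intersects(a: BBox, b: BBox) -> bool:
    """Spatial building block: two axis-aligned boxes intersect iff
    they overlap on both the x and y axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_filter(rows: List[Dict], query: BBox) -> List[Dict]:
    """Spatial query: requires per-row geometry (here just a stored
    bbox); a production backend would add real geometry predicates
    and an index rather than this linear scan."""
    return [r for r in rows if bbox_intersects(r["bbox"], query)]
```

The asymmetry is the point: the first function is one line of column logic, while the spatial path is where the "whole other query backend" effort lives.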
Good point, they won't work as a resource without that work.
As mentioned in #345, caching frequently accessed WFS datasets to file on object storage could work well to reduce maintenance burden / outages related to WFS / WFS server load.
Functional proof of concept:
https://github.com/smnorris/bcfishpass/blob/main/.github/workflows/replicate-monthly.yaml
https://github.com/smnorris/bcfishpass/blob/main/.github/workflows/replicate-weekly.yaml
https://github.com/smnorris/bcfishpass/blob/main/jobs/replicate_bcgw
https://github.com/smnorris/bcfishpass/blob/main/jobs/bcgw_sources.json
The existing structure presumes clients would access the cache via direct URL rather than interfacing with bcdata, but that could be tweaked.
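The scheduled-replication pattern in the linked workflows boils down to a cron-triggered GitHub Actions job that runs a replication script with object storage credentials. A stripped-down sketch of that shape, assuming hypothetical secret names and entry point (not copied from the linked files):

```yaml
name: replicate-weekly
on:
  schedule:
    - cron: "0 10 * * 0"    # every Sunday at 10:00 UTC
  workflow_dispatch: {}      # also allow manual runs
jobs:
  replicate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Replicate WFS layers to object storage
        env:
          # hypothetical secret names for the S3-compatible bucket
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: ./jobs/replicate_bcgw   # hypothetical entry point
```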
WFS vs S3 cache for a ~370k record dataset: