[RFC|WIP] deploy to GCP #240

Merged: 1 commit merged into portier:main from jimdigriz:gcp on Jan 17, 2022
Conversation

jimdigriz
Contributor

@jimdigriz jimdigriz commented Apr 20, 2021

WARNING: Work in Progress, do not merge.

Some client work I was doing included making Portier deployable to GCP, so I took the opportunity to generalise the solution so that others should be able to use it too. The deployment is described in the documentation in my branch at https://github.com/jimdigriz/portier-broker/tree/gcp/contrib/gcp and includes a list of issues/limitations; these are unlikely to be solved by me, as most require a Rust coder to work on the actual broker.

Probably of most interest to the project is the (non-awful) conditional build feature slipped into the Dockerfile that lets the user deploy a non-forked version of the project in their environment and pull in the data directory as an externally sourced tarball from an HTTP server.

As for the GCP deployment, it is the usual meat and potatoes you would expect...both the good and bad...

The project is far enough along to start seeking feedback from the maintainers and, if there is an appetite for it, what changes would be necessary for consideration of inclusion in the project.

@jimdigriz jimdigriz changed the title [WiP] deploy to GCP [RFC|WiP] deploy to GCP Apr 20, 2021
@jimdigriz jimdigriz changed the title [RFC|WiP] deploy to GCP [RFC|WIP] deploy to GCP Apr 22, 2021
Member

@stephank stephank left a comment

Thanks for working on this! This is super interesting as someone who works at an AWS + Terraform shop. 😃

I'm no expert in any of the gcloud stuff, so that looks good to me. Just want to flesh out the Docker part some more, because I consider it one of the primary installation methods. (Even though I haven't paid enough attention to it.)

* there is no cast operator available that works on `$(ref ...)`
* we cannot use [`outputs`](https://cloud.google.com/deployment-manager/docs/configuration/expose-information-outputs) as it uses pass by reference and creates the same problem
* support for [in-transit encryption](https://cloud.google.com/memorystore/docs/redis/in-transit-encryption)
* [redis crate supports it](https://docs.rs/redis/0.20.0/redis/enum.ConnectionAddr.html#variant.TcpTls) though Portier's [`pubsub.rs`](../../src/utils/redis/pubsub.rs) explicitly does not
Member

Ah, sorry about that. I'm not super happy with the pubsub code, but also haven't really worked with upstream to get us everything we need. Maybe newer versions already do what we want. (Can't say, don't remember off the top of my head what the exact pain points were.)

Member

Filed #351 for this.

@jimdigriz
Contributor Author

jimdigriz commented Oct 12, 2021

Sorry for taking some time to get back to this, but I have managed to get everything into a much better shape, including migrating to your data_url implementation.

I have this now running where we are about to make it live in production (~100k blocked domains and ~65k allowed origins[1] slipped in via data_url) so definitely a good time to get feedback and review from the project owners if this is something worth merging.

Thanks

[1] http://localhost:[1-65535] for development in addition to our public origins
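
A list like the one in footnote [1] can be generated rather than written by hand. The following is a minimal Python sketch, assuming one origin per TCP port; the helper name `localhost_origins` is hypothetical, and how the resulting list is packaged for `data_url` is not shown in this thread.

```python
from typing import List


def localhost_origins(first_port: int = 1, last_port: int = 65535) -> List[str]:
    """Return one http://localhost:<port> origin for each port in the range (inclusive)."""
    return [f"http://localhost:{port}" for port in range(first_port, last_port + 1)]


if __name__ == "__main__":
    origins = localhost_origins()
    # One entry per valid TCP port, which accounts for the "~65k" figure above.
    print(len(origins), origins[0], origins[-1])
```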

Member

@stephank stephank left a comment

This looks great! Feel free to file more issues if you think there are more gains to be made here. I'll happily merge this as-is, if that's okay with you too?

Has the production deploy gone well?


* [Cloud Run](https://cloud.google.com/run/pricing#tables) (~$10/region/month)
    * You may wish to pick a Tier 1 region for better pricing where possible
* [Redis](https://cloud.google.com/memorystore/docs/redis/pricing#instance_pricing) (~$40/region/month)
Member

Curious: Does GCP have other storage options that could be more attractive if we supported them? For example, AWS has DynamoDB, which allows request-based pricing.

Contributor Author

@jimdigriz jimdigriz Oct 24, 2021

There is Firestore, but to be honest if you wanted to go down that path I would recommend providing a 'slum it with'/'poor man's database' option of using an S3 backend where garbage collection is handled by the bucket retention policy. Azure does not support S3 (though their storage accounts have interesting functionality making a DB not necessary) but GCP does speak S3.
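
To make the garbage-collection idea concrete, here is a hedged Python sketch that builds an S3 lifecycle configuration expiring objects under a prefix. It assumes that "retention policy" above refers to an S3 lifecycle expiration rule; the helper name `expiry_lifecycle` and the `sessions/` prefix are hypothetical, and nothing like this exists in Portier today. The dictionary follows the shape accepted by S3's PutBucketLifecycleConfiguration API; GCS's native lifecycle format is different and is not shown.

```python
import json


def expiry_lifecycle(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    Lifecycle expiration is expressed in whole days, so this can only act as
    coarse garbage collection; an application storing short-lived data would
    still have to check its own expiry timestamps when reading.
    """
    if days < 1:
        raise ValueError("lifecycle expiration needs a whole number of days >= 1")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix; print the JSON document that would be uploaded.
    print(json.dumps(expiry_lifecycle("sessions/", 1), indent=2))
```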

If I wanted to do this cheaper I could have run Redis from a $4/month burstable shared-core CPU instance, but forking out $1000/year so I do not have to maintain authentication services is a bargain; it definitely is for the company I am doing work for, where their cloud services bill is at least an order of magnitude higher per month. The other option would have been a third party identity management provider, which would have cost substantially more than the current GCP deployment.

The only real reason I can think of to avoid the Redis cost is "let's work to give Google less money", which I can get behind, but personally I would prefer to see the project focus on all the other things it wants to do.

So, in short, I would not fret about it; if someone wants to make this cheaper, they will balance the $1000/year against their dev effort, ongoing support and PR submission 'costs' and probably, like me, conclude that they have bigger fish to fry.

Maybe, though not possible for GCP (as you need $1m/year to play), maybe consider offering an Azure and AWS managed service now you know Portier is already worth at least $1000/year to lazy people like me. ;)

Member

> The only real reason I can think of to avoid the Redis cost is "let's work to give Google less money", which I can get behind, but personally I would prefer to see the project focus on all the other things it wants to do.

Hmm, well, project goals are a little lacking, so it's "whatever one wants to work on". 🙃

> Maybe, though not possible for GCP (as you need $1m/year to play), maybe consider offering an Azure and AWS managed service now you know Portier is already worth at least $1000/year to lazy people like me. ;)

That is interesting! Hadn't considered these, and I can definitely investigate the AWS side.

> There is Firestore but to be honest if you wanted to go down that path I would recommend providing a 'slum it with'/'poormans database' option of using an S3 backend...

S3 honestly sounds like a pretty good idea, something I hadn't considered either. I was thinking about what it would take to adapt Portier to work with serverless runtimes, and that combined with S3 or DynamoDB could make the whole deployment more cloud native and reduce ops overhead.

I don't know how much you can share, but am curious about what factors played into the decision making for your org. Is self-hosting about control / trust? Would self-hosting have been a requirement for competing identity services? Is the branding part (custom data) important?

No worries if you're not allowed to answer these. 🙂

Contributor Author

@jimdigriz jimdigriz Oct 28, 2021

> I don't know how much you can share, but am curious about what factors played into the decision making for your org. Is self-hosting about control / trust? Would self-hosting have been a requirement for competing identity services? Is the branding part (custom data) important?

We would have preferred a managed service simply because OAuth2/authentication is hard to get right and not something developers tend to be familiar with; most are only as involved as importing the relevant library for their language of choice. Also...monitoring/response/recovery, I always like to make that someone else's problem as I have been there, got the t-shirt and it is really the worst. :)

I am struggling to see how 'trust' would be an issue here, as Portier only handles authentication (no authorization or accounting or the resource/data being protected) and so only sees the user's email address and user-agent details. From a compliance perspective, it probably would be no different in practice to a "log in with Google/Microsoft/Facebook/...".

We needed custom branding (emails and the landing page), which made self-hosting necessary, and operationally it would have been impossible, unwise and impolite to rely on your demo/public service for our services. Self-hosting also let us put into place other controls such as origins and domain allow/block lists, etc. A managed service would also have been great, as I would not have had to plumb in SendGrid or spend my time following up why the receiver blocked one of SendGrid's IPs due to poor reputation.

I did look around for comparable third party services before building out the self-hosted serverless GCP solution, but they tended to be more of a whole IDM kitchen sink, expected all users to be pre-registered and charged a per-seat/account fee. I do a lot of work in the RADIUS world and balk at costs that are not percentile (i.e. 95%ile) request rate orientated.

Wonderfully though, for both self-hosted and managed, what is great is that Portier is formalised as something the organisation supports and there is no barrier to internal developer adoption.

I have built a bespoke email-loop system before and it was tedious and time consuming, whilst Portier presents as an OAuth2 service, which crucially means I no longer have to do 'bespoke' knowledge transfers to others. I cannot emphasise enough how important it is to be able to say "this is just OAuth2 with ID tokens" and not "@jimdigriz's custom-sauce programming black-magic" :)

Cheers

* Hard coding of the Redis port to `6379/tcp`
    * workaround: `Reference [$(ref.portier-europe-west4-redis.port)], was not of type string but [NUMBER], cannot replace inside ["redis://:[email protected]:$($(ref.portier-europe-west4-redis.port))/0\n"]`
    * GCP's Deployment Manager is mostly awful, so when GCP throws you lemons, it provides zero tools (or documentation) to make lemonade
        * there is no cast operator available that works on `$(ref ...)`
Member

Would it help if we implemented separate BROKER_REDIS_HOST, BROKER_REDIS_PORT, etc.?

Contributor Author

@jimdigriz jimdigriz Oct 24, 2021

Unfortunately not, as GCP barfs at the attempt to shoehorn a number into a string field in the YAML template. IIRC all fields are strings at the templating phase and the result is only later parsed as YAML (and converted back to numbers), similar to the situation in a Jinja template...though of course there numbers are automatically cast to strings.

Fortunately in practice the hardcoding is not a real problem and I only document it to stop other developers thinking I must have been an idiot to do it this way and then wasting their time rediscovering this. It could only ever become a problem if the administrator ran multiple Redis instances in the same GCP project, which I suspect no one would ever want to do; it costs nothing to keep Portier in its own isolated GCP project (or a separate account, if this were in AWS) and gain the security benefits of doing so.
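
The difference described above can be illustrated with a toy model in Python. This is not Deployment Manager's implementation: the functions `substitute_strict` and `substitute_casting`, the reference names and the address are all hypothetical, and the sketch only mimics the reported behaviour of refusing to splice a number into a string, next to Jinja-style automatic casting.

```python
import re

# Hard-coded as a string, mirroring the workaround documented in the README.
REDIS_PORT = "6379"

# Matches placeholders of the form $(ref.some-name.property).
_REF = re.compile(r"\$\(ref\.([\w.-]+)\)")


def substitute_strict(template: str, refs: dict) -> str:
    """Toy model of strict substitution: only string values may be spliced in."""
    def repl(match):
        name = match.group(1)
        value = refs[name]
        if not isinstance(value, str):
            raise TypeError(f"reference {name!r} is {type(value).__name__}, not str")
        return value
    return _REF.sub(repl, template)


def substitute_casting(template: str, refs: dict) -> str:
    """Jinja-like behaviour: every value is converted with str() first."""
    return _REF.sub(lambda m: str(refs[m.group(1)]), template)


if __name__ == "__main__":
    refs = {"redis.host": "10.0.0.3", "redis.port": 6379}  # hypothetical values
    # Works, because the port never goes through a numeric reference.
    print(substitute_strict(f"redis://$(ref.redis.host):{REDIS_PORT}/0", refs))
    # Also works, because the number is cast to a string.
    print(substitute_casting("redis://$(ref.redis.host):$(ref.redis.port)/0", refs))
```

In this model, passing the second template to `substitute_strict` raises `TypeError`, which is the analogue of the `was not of type string but [NUMBER]` error quoted in the README.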

@stephank
Member

@jimdigriz I'd like to merge this. Is it still WIP in your opinion?

@jimdigriz
Contributor Author

I've had this in production for a client for a few months now and have not needed to make any major changes since the end of October; I had to move to a single region (instead of multi-region) as in production I found the C2S geo-targeting hit a different region to the S2S communication and so the Redis server in that region had no idea what was going on.

Everything has been working fine though since. I think it is ready to merge.

@stephank stephank merged commit bbc2a69 into portier:main Jan 17, 2022
@jimdigriz jimdigriz deleted the gcp branch January 17, 2022 09:03