Create basic CDK support and minimalist demo stack #6

Merged 18 commits on Nov 22, 2023

@pcholakov pcholakov commented Nov 20, 2023

This change introduces a CDK project with several reusable constructs, and a demonstration stack that lets customers easily deploy a self-hosted Restate stack and Lambda-based service handlers to their AWS account.

As this is still a private repository (see linked GH issue for planned work), feedback is primarily sought along these dimensions for now:

  • Meta: this is not a tiny PR; happy to rebase and split it into more logical chunks if it's too much to digest
  • README - what basic pointers would we like to give a customer who is reasonably proficient at AWS (and specifically CDK and Lambda/the AWS Serverless stack), but new to Restate?
  • Naming, granularity, and scope of responsibility of various components?
  • Is RestateSelfHostedStack (restate-self-hosted-stack.ts) simple and intuitive to you? (Imagine that in the future, a customer would write this code + their handler code only, while the Restate constructs will be provided as a library)

Testing

"Works on my machine"

❯ npx cdk deploy --require-approval never --no-rollback
...

pavel-RestateStack: deploying... [1/1]
pavel-RestateStack: creating CloudFormation changeset...

 ✅  pavel-RestateStack

✨  Deployment time: 365.63s

Outputs:
pavel-RestateStack.RestateIngressEndpoint = http://pavel--Resta-ZioBu8Ip87RB-1719749793.eu-central-1.elb.amazonaws.com:80
pavel-RestateStack.RestateServiceApiEndpoint61395974 = https://zabs4h8qa3.execute-api.eu-central-1.amazonaws.com/prod/
Stack ARN:
arn:aws:cloudformation:eu-central-1:663487780041:stack/pavel-RestateStack/a5769500-878b-11ee-bb73-064c3e60e9bf

✨  Total time: 378.22s

❯ export RESTATE_INGRESS_ENDPOINT=$(aws cloudformation describe-stacks --stack-name ${USER}-RestateStack \
    --query "Stacks[0].Outputs[?OutputKey=='RestateIngressEndpoint'].OutputValue" --output text)

❯ curl -X POST -w \\n ${RESTATE_INGRESS_ENDPOINT}/Greeter/greet -H 'content-type: application/json' -d '{"key": "Restate"}'

{"response":"Hello, Restate! :-)"}

Issue: #1
Closes: #3

@jackkleeman jackkleeman left a comment

Love it. Let's get this in and iterate from there!


const controller = new AbortController();
const healthCheckTimeout = setTimeout(() => controller.abort(), 3000);
const healthCheckUrl = `${props.ingressEndpoint}/grpc.health.v1.Health/Check`;
Collaborator

I think we should be using the health endpoint on the meta here, right?

Collaborator Author

You're quite right, for pre-registration the meta health check is probably the one that matters! I'm going to revisit this entire handler to make it more robust but switching over to meta health check for now. (Separately, the ALB is continuously monitoring the ingress health check.)
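A minimal sketch of the timeout-guarded health check discussed in this thread, keeping the `AbortController` pattern, the 3000 ms timeout, and the 2xx status test from the diff. The helper names `isSuccessStatus` and `checkHealth` are hypothetical, and the health URL is passed in by the caller because the exact meta health route is not shown here.

```typescript
// Hypothetical helpers, not part of the PR.

// True for any 2xx status, matching the check used in the handler.
function isSuccessStatus(status: number): boolean {
  return status >= 200 && status < 300;
}

// Resolves to true if the endpoint answers with a 2xx status within timeoutMs.
// healthCheckUrl is an assumption supplied by the caller (for example a route
// on the meta endpoint); no specific path is implied here.
async function checkHealth(healthCheckUrl: string, timeoutMs = 3000): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(healthCheckUrl, { signal: controller.signal });
    return isSuccessStatus(response.status);
  } catch {
    // Network failure, or the request was aborted by the timeout.
    return false;
  } finally {
    // Always clear the timer so it cannot fire after the request completes.
    clearTimeout(timer);
  }
}
```

Clearing the timer in `finally` keeps a fast response from leaving a pending timeout behind in the Lambda execution environment.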

console.log(`Got registration response back: ${await discoveryResponse.text()} (${discoveryResponse.status})`);

if (!(healthResponse.status >= 200 && healthResponse.status < 300)) {
// TODO: retry until successful, or some overall timeout is reached
Collaborator

if only we had some kind of orchestrator that did this for us ;)
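The TODO above ("retry until successful, or some overall timeout is reached") could be sketched as a small loop with a deadline. This is only an illustration: `retryUntil` and `sleep` are hypothetical names, the fixed delay between attempts is an assumption, and the overall timeout would need to stay below the Lambda function's own timeout.

```typescript
// Hypothetical helpers, not part of the PR.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls attempt() repeatedly until isDone(result) is true, waiting delayMs
// between attempts. Throws once another wait would pass the overall deadline.
async function retryUntil<T>(
  attempt: () => Promise<T>,
  isDone: (result: T) => boolean,
  overallTimeoutMs: number,
  delayMs = 1000,
): Promise<T> {
  const deadline = Date.now() + overallTimeoutMs;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      const result = await attempt();
      if (isDone(result)) return result;
    } catch {
      // Treat a thrown error like an unsuccessful attempt and keep retrying.
    }
    if (Date.now() + delayMs > deadline) {
      throw new Error(`Gave up after ${attempts} attempts within ${overallTimeoutMs} ms`);
    }
    await sleep(delayMs);
  }
}
```

In the handler this might wrap the health request, for example `await retryUntil(() => fetch(healthCheckUrl), (r) => r.status >= 200 && r.status < 300, 60000)`, where the 60 second budget is an arbitrary placeholder.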

* The name of the Secrets Store secret that contains the GitHub token to use for Docker login.
* This is `/restate/docker/github-token` by default.
*/
githubTokenPath?: string;
Collaborator

can be a followup, but i think we just made dist publicly accessible :)

Collaborator Author

Yeah, bit of a waste! But it was useful to remember how to do this in CDK – it may yet come up again. Something else I want to investigate: if we can push the Restate image into an ECR registry, perhaps we can eliminate the internet connectivity for the Restate VPC altogether. That would be a big security and cost win.

githubTokenPath?: string;

/**
* Temporary! This will disappear once we move to direct Lambda calls.
Collaborator

can be a followup - but latest dist now does support direct calls

Collaborator Author

Ack! I'll iterate on that as a follow up; I still have something weird going on with the discovery resource that I want to get to the bottom of, I'll update to direct Lambda integration asap after I figure it out.

Collaborator Author

Ended up doing this now, the latest branch changes switch to direct integration.

// These rules allow the service registration component to trigger service discovery as needed; the requests
// originate from a VPC-bound Lambda function that backs the custom resource.
this.vpc.privateSubnets.forEach((subnet) => {
restateInstanceSecurityGroup.addIngressRule(ec2.Peer.ipv4(subnet.ipv4CidrBlock), ec2.Port.tcp(RESTATE_META_PORT), "Allow traffic from the VPC to Restate meta");
Collaborator

probably will want also 9071 and 9072 (worker http and postgres endpoints)

Collaborator

open question how introspection should work - vpn vs punch through load balancer. if punch through, then need some sort of auth story for the postgres endpoint, can't just use aws stuff there

Collaborator Author

One quick and secure way might be to allow customers to start an interactive psql session using Session Manager? Not ideal for all users but better than opening an unauthenticated endpoint to the internet.
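If the VPC-internal rules were extended to the extra ports suggested earlier in this thread (9071 and 9072), one option is to generate the subnet and port combinations up front and apply them in a single loop. The sketch below is a plain helper with no CDK dependency; `ingressRulesFor` and `IngressRuleSpec` are hypothetical names, and the meta port is left to the caller because the value of `RESTATE_META_PORT` is not shown in this diff. It covers traffic from inside the VPC only and does not address the open question about external introspection access.

```typescript
// Hypothetical types and helper, not part of the PR.
interface NamedPort {
  name: string; // e.g. "meta", "worker http", "postgres"
  port: number;
}

interface IngressRuleSpec {
  cidr: string;
  port: number;
  description: string;
}

// Builds one rule per (subnet CIDR, port) pair, in subnet-major order.
function ingressRulesFor(subnetCidrs: string[], ports: NamedPort[]): IngressRuleSpec[] {
  const rules: IngressRuleSpec[] = [];
  for (const cidr of subnetCidrs) {
    for (const { name, port } of ports) {
      rules.push({
        cidr,
        port,
        description: `Allow traffic from the VPC to Restate ${name}`,
      });
    }
  }
  return rules;
}
```

Inside the stack, each entry could then be applied with the same calls the diff already uses: `restateInstanceSecurityGroup.addIngressRule(ec2.Peer.ipv4(rule.cidr), ec2.Port.tcp(rule.port), rule.description)`, with the CIDRs taken from `this.vpc.privateSubnets`.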

@pcholakov pcholakov removed the request for review from gvdongen November 21, 2023 13:20

@jackkleeman jackkleeman left a comment

lgtm!

@@ -43,7 +42,8 @@ export const handler: Handler<CloudFormationCustomResourceEvent, Partial<CloudFo

const registerCallTimeout = setTimeout(() => controller.abort(), 3000);
const discoveryEndpointUrl = `${props.metaEndpoint}/endpoints`;
const registrationRequest = JSON.stringify({ uri: props.serviceEndpoint });
// const registrationRequest = JSON.stringify({ uri: props.serviceEndpoint });
Collaborator

nit; can clean this up

Collaborator Author

Doh, managed to miss this! Will clean up in the next PR.

@pcholakov pcholakov merged commit 4750047 into main Nov 22, 2023
@jackkleeman jackkleeman deleted the pavel/initial branch November 22, 2023 22:58
Development

Successfully merging this pull request may close these issues.

Migrate to native Lambda invoke mechanism
2 participants