Starting a new project, I often find myself grappling with a familiar dilemma: the choice between a straightforward setup that may lack future scalability and a more scalable infrastructure that could be overkill or costly for initial needs. Opting for simplicity may lead to a future overhaul as the project grows while starting with a scalable solution can introduce complexity and waste resources during periods of low utilization. The agony of this decision arises from the challenging transition between these two approaches and the absence of seamless integration.
The cloud offers the allure of limitless scalability. However, the ability to scale efficiently from zero to peak capacity, and from peak capacity back to zero continues to be elusive.
AWS Lambda and AWS API Gateway shine in scaling from a non-existent base to moderate levels, making it suitable for early development and small deployments. Yet, scaling beyond that to handle high demand presents challenges (ie. cold starts, per invocation costs). Other services such as AWS EC2, AWS ECS, AWS EKS, AWS Fargate, and AWS App Runner may fit that niche better... though, they all do not scale to zero.
To overcome these issues LambdaFlex combines AWS Lamba and AWS Fargate for scaling from moderate to peak demand. AWS Fargate is capable of running the same Docker image used at the initial AWS Lambda stage, maintaining deployment uniformity. AWS Fargate allows for most re-use, while AWS App Runner and AWS EKS might be explored as alternatives.
Bridging AWS Lambda and AWS Fargate involves addressing how they scale depending on traffic and how traffic is routed between them. Initially, AWS API Gateway directs traffic to AWS Lambda function(s), which scale out as demand transitions from zero to a medium traffic level. This functionality must be replicated on AWS Fargate, along with managing the transition between services. Furthermore, it's noteworthy that AWS Lambda can operate independently of an AWS Virtual Private Cloud (VPC), whereas AWS Fargate requires VPC."
The progression from moderate use to peak demand occurs in a phased approach:
- Initially, AWS Route53, AWS API Gateway, and AWS Lambda are deployed to handle incoming requests.
- As demand increases, AWS VPC, AWS Load Balancer, AWS Target Group(s), and AWS Fargate are deployed
- AWS Route53 is updated to redirect traffic to the AWS Load Balancer, which is backed by AWS Fargate pods.
- As demand subsides, the AWS Route53 records, AWS Fargate services, and AWS VPC are removed and the system is back to cost-efficient AWS Lambda deployment.
flowchart TB
subgraph "Docker Image"
ServiceCode[Service Code]
end
subgraph " "
APIGateway[AWS API Gateway] --> Lambda[AWS Lambda] --> ServiceCode[Common Service Code]
end
subgraph "AWS VPC"
LoadBalancer[AWS LoadBalancer] --> TargetGroups[AWS Target Groups]
TargetGroups --> Fargate[AWS Fargate] --> ServiceCode
end
Route53 -. "Moderate-Peak Traffic" .-> LoadBalancer
Route53 -- "Zero-Moderate Traffic" --> APIGateway
The LambdaFlex methodology effectively merges the simplicity of minimal infrastructures with the robustness of highly scalable systems. By automatically shifting between AWS Lambda for low to moderate demand and AWS Fargate for higher traffic, and back as traffic subsides.
-
Route53 TTL Consideration: DNS records for AWS API Gateway and AWS Load Balancer are set with a TTL of 60 seconds. This implies that changes may take up to 60 seconds to take effect.
-
Scaling Time: AWS Load Balancer and AWS Fargate deployment(s) are not instantaneous, hence scale-up triggers should be tuned accordingly.
-
Scale-Down Process: The scale-down process should consider the TTL of the DNS records and include a delay before tearing down AWS Load Balancer and AWS Fargate services.
-
Scale Up/Down Automation: The scale-up/down process is triggered with AWS Cloudwatch alarms and transitions the AWS CloudFormation stack based on configurable thresholds.
The underlying service is implemented with FastAPI, a robust framework for building APIs and generating OpenAPI specifications. It operates independently from AWS Lambda, AWS Fargate, and Docker (despite Docker's benefits, I prefer avoiding its build time and debugging complexities). FastAPI is versatile and can be utilized as is or replaced with any other framework (ex. NodeJS, GO).
The bootstrapping logic resides in the entrypoint.sh
file. This script evaluates the environment and starts the service according to AWS Lambda or AWS Fargate.
Try some of these endpoints and see the stack scale up and down
graph TD
Lambda --> |Scale Up| Fargate
Fargate -->|Scale Down| Lambda
%% Start --> Lambda
%% Start --> Fargate
%% Lambda
%% --> |High Concurrency Alarm| C{High \nConcurrency}
%% --> |Scale Up| Fargate
%% Fargate
%% --> |Low Concurrency Alarm| D{Low \nConcurrency}
%% --> |Scale Down| Fargate
%% A[Start] -->|Deploy| B[Deployed]
%% B -->|High Concurrency Alarm| C{High Concurrency Alarm State}
%% C -->|ALARM| D[Scale Up]
%% C -->|OK| E[Normal Operation]
%% C -->|INSUFFICIENT_DATA| F[No Data]
%% D -->|Scale Up Complete| E
%% D -->|Scale Up Failed| G[Scale Up Failed]
%% E -->|Low Concurrency Alarm| H{Low Concurrency Alarm State}
%% H -->|ALARM| I[Scale Down]
%% H -->|OK| E
%% H -->|INSUFFICIENT_DATA| F
%% I -->|Scale Down Complete| E
%% I -->|Scale Down Failed| J[Scale Down Failed]
%% G -->|Retry Scale Up| D
%% J -->|Retry Scale Down| I