This repository creates infrastructure.
It needs to be run as a pre-requisite for each stage
dev|qa|stg|prd
before running respective application
build pipeline. The value of stage
is used at
many places in the template to create aws specific and
application specific resources.
Please resolve the todo
items before you go ahead running
the pipeline.
Make sure the following environment variables are available in your pipeline tool.
<stage>
_AWS_ACCESS_KEY_ID<stage>
_AWS_SECRET_ACCESS_KEY<stage>
_AWS_ACCOUNT_ID<stage>
_AWS_SHARED_ACCOUNT_ID<stage>
_AWS_REGION
This repository creates the following infrastructure.
- Elasticache of type Redis (2000-elasticache.yml)
- Elastic Container service for Kubernetes (2000-eks.yml)
- Managed Service for Kafka (2000-msk.yml)
- Postgresql database (2000-rds.yml)
- Elastic Map Reduce (2000-emr.yml)
- ElasticSearch (2000-elasticsearch.yml)
- SSM parameters (1000-ssm.yml)
Recommended values for stagename
are dev|qa|stg|prd
in lowercase, although
any values can be chosen, although SSM Parameters template's Mapping
section needs to be updated accordingly.
All the templates have a condition name IsProd
which
evaluates to true if the stagename
is prd
(and not "prod").
- 1000-ssm.yml is the cloudformation template
- contains parameters that are to be used by all the stacks.
- Used in stack templates using dynamic reference e.g. {{param}}
- Contains a mapping of fairly static elements e.g. vpc, route53 domain, ssh key pair.
- The
stage
argument to be supplied to pipeline tool should be mapped to the Mapping section for each stack. e.g. if 'stage' isdev
, the Mappings section should contain an inner 'dev' section for each stack. - SSM has been chosen over SecretsManager since secretmanager cannot be dynamically referenced in Cloudformation custom resources as of now whereas SSM parameter can be.
- Please check and update this templates for parameters that are customizable for each stack.
- Keep in mind the Cloudformation limit of 200 resources per stack, since each parameter in the template is a resource.
- takes about a minute to deploy.
- 2000-rds.yml is the cloudformation template
- Creates a
private
postgresql database instance for non-prod. For prod, creates a slave instance as well. - creates necessary cloudwatch alarms mapped to an sns topic, only to be created for prod env.
- A KMS key is created to create a Secret that will hold the
password for the postgresql dba user
mydbuser
. The database created will bemydb
. - An ec2 instance will be created to import data into postgres
database or run any sql scripts required. The sql scripts
are to be kept under
resources/_postgres/<stagename>
, if an import-order is important then make sure the alphabetical order of the files is the same as required import-order. - creates Route53 record set for both Primary instance and
slave instance (if applicable). E.g. If the private Route53 hosted
zone is
private.example.com
and stagename isdev
then Primary DB instance will be resolvable atdb-dev.private.example.com
and Slave instance is resolvable atdb-slave-dev.private.example.com
. The format isdb-<stagename>.<Private Zone Name>
. - WARNING: please alter DbParam Resource in the cloudformation template according to the parameters required for postgresql database based on performance requirements before production use.
- termination protection is enabled for the master db in production as a safety measure.
- takes about 45 minutes to deploy for production.
- Note: Production database is set to be not-terminated. Hence before deleting production stack for rds, manually change the termination-enabled flag for RDS.
- 2000-eks.yml is the cloudformation template
- Creates an EKS cluster, nodes and required infrastructure such as security groups etc.
- EKS cluster is created using a cloudformation custom resource. This allows selection of features that are not available through standard AWS::EKS::Cluster resource e.g. cloudwatch logging options and API Server access modes. In addition it also allows automate the creation of ConfigMap that allows nodes to communicate with the EKS cluster.
- EKS cluster resource has a property called
AccessMode
whic takes values fromFullPublic|HalfPublic|FullPrivate
.FullPublic
creates a publicly accessible API Server.HalfPublic
creates a publicly accessible API Server which can be connected from within vpc from vpc-scoped network components.FullPrivate
creates an API Server which is not publicly available and can only be accessed from within vpc (or corporate network). A cicd tool's executors e.g. gitlab managed runners may not be able to access such API server. - A service account is also created that can be used for deployment to the eks cluster. A kubeconfig file meant for the service account is created and uploaded to an S3 bucket for usage.
- An ec2 instance is created to run kubectl command which
creates a ConfigMap and also any kubernetes application
files placed under
resources/_kube/<stagename>
, Kubectl will apply them in alphabetical order, so if apply-order matters ensure that the files have the same alphabetical order as the required apply-order. - This template is intended to be used for datplatform
web-ui, microservices and nifi. If these are to be hosted
on separate eks clusters then create that many eks template
files and change
AppName
cloudformation parameter accordingly. AWS Fargate for EKS
is generally available since 03-Dec-19, It can be considered a viable option however the following points need to be considered about it - 1. There is a maximum of 4 vCPU and 30Gb memory per pod. 2. no support for stateful workloads that require persistent volumes 3. cannot run Daemonsets, Privileged pods, or pods that use HostNetwork or HostPort- Recently in Dec-19, a resource
AWS::EKS::Nodegroup
was added to aws cloudformation. However, since we are creating an EKS cluster using a custom resource, it may not bode well with waiting for eks cluster to become active etc., so it was not explored.
- 2000-elasticache.yml is the cloudformation template.
- Creates a
private
Replication Group which is a collection of Elasticache clusters. At-rest and In-transit encryption are enabled. Allows automatic failover in case the primary-endpoint in unvailable. A separate KMS key is created for the replication group. - Creates a Route53 record set to resolve to redis cluster.
E.g. if the private route53 zone name is
private.example.com
and stagename isdev
then redis cluster is resolvable atdataplatform-redis-dev.private.example.com
and port6379
- 2000-emr.yml is the cloudformation template
- Creates a
private
EMR cluster MasterInstanceGroup, CoreInstanceGroup and TaskInstanceGroup with supporting infrastructure e.g. iam roles, security groups etc. Creating apublic
emr cluster requires different configuration in the template - Creates a route53 record set for the master emr. E.g. if the
private route53 zone is
private.example.com
and stagename isdev
then master emr is resolvable atdataplatform-masteremr-dev.private.example.com
. - Current Issue: Master and Slave security groups cross reference each other, hence while deleting the stack, the deletion fails. Consider manually removing the security group ingress rules from the security groups that fail to get deleted.
- 2000-elasticsearch.yml is the cloudformation template
- Creates a VPC scoped elasticsearch domain.
- It is required by AWS that when creating a VPC domain, you must choose a subnet that uses either 10.x.x.x or 172.x.x.x for its CIDR block.
- AWS::Elasticsearch::Domain does not support creating public log options. Hence a custom lambda resource has been created to create index,search,app log options for the domain. This log options are only created for production. (As of 07-Nov-19, a LogPublishingOption property is available for ElasticSearch domain.)
- For non-prod, it creates three instances by default, where as for prod the number is customizable in the SSM templates.
- creates cloudwatch alerts for the domains for prod.
- Note: currently there's an issue with creating an
elasticsearch service linked role using cloudformation.
As an alternative execute the following one-off command per aws
account as a pre-requisite before running the template:
aws iam create-service-linked-role --aws-service-name es.amazonaws.com
- Creates a route53 record set for elasticsearch vpc endpoint.
E.g. if the private route53 domain name is
private.example.com
and stagename isdev
then Elasticsearch domain is resolvable todataplatform-es-dev.private.example.com
- 2000-msk.yml is the cloudformation template
- Creates an MSK Cluster Configuration, MSK Cluster, Topics on the msk cluster, Schema Registry on an EC2.
- Make sure a VPC with atleast 3 private subnets in different availability zones has been selected for creating a cluster.
- AWS::MSK::Cluster does not support creating Cluster
configuration hence a custom lambda function is written
to perform that.
Note
: Please make sure that the cloudformation custom resourceKafkaPreProcessor
is updated with required cluster properties based on performance requirements. - MSK cluster is a private cluster with TLS enabled meaning that clients can verify server's authenticity, however clients are not required to authenticate themselves. At-rest and In-transit encryption is enabled.
- Once the cluster is created, a custom resource creates
some default topics on the cluster, which is useful if
create.topics.enable
property is not enabled, as recommeded by Kafka. The purpose of the function is to perform any postprocessing that may be required to perform but for now only supports creating topics throughKafkaPostProcessor
custom resource. - SchemaRegistry is not available by default with MSK cluster hence an autoscaling group for ec2 instances will be created that will host the schema-registry to update or read schemas.
- Currently no api is available to delete the cluster configuration.
- Updating cluster configuration is also not possible, since update-cluster-configuration operation is supported on an existing cluster, which could create cyclic dependency in the template if mentioned. Hence for now, it needs to be done manually.
- Deleting the stack could take some 40 minutes easily due to vpc-scoped network interfaces held by lamda (custom resources) need to be deleted, which takes time apparently.
- Creates Route53 record sets to connect to Zookeepers and
schema registries. E.g. if the private route53 zone name
is
private.example.com
and stagename isdev
then a list of zookeepers will be resolvable atdataplatform-zookeepers-dev.private.example.com
and a list of schema registry ec2 will be resolvable atdataplatform-schemareg-dev.private.example.com
- stack creation time is around 25 minutes
- 3000-gitlab.yml is the cloudformation template.
- It creates a private gitlab runner in the aws account being
used according to the
stage
set in. It allows private access to internal resources e.g. private artifactory, nexus etc which is not possible using shared public runners. - It runs just like any other gitlab runner.
- It has access to the account that it is created in according to the IAM role assigned to it in the template. It doesn't need to be compute heavy instance and can be light-weight e.g. t2.micro, t2.small.
- each runner will be tagged according to the stage it is being run in.
- It uses a
GitLabRunnerToken
parameter, which can be fetched from gitlab's project/group/subgroup -> Setting -> CI/CD. Please replace it with a valid value before using it. - Ideally
GitLabRunnerToken
value should be a gitlab Group's token (a group that is specific to a business unit) so that it can be used by all the projects under that group.