Merge pull request #1 from immutable/d740
DATA-740: Dataflow changes for blockchain ingestion
roopak-immutable authored Jul 19, 2023
2 parents 5498617 + d6e390a commit 3146f65
Showing 14 changed files with 389 additions and 29 deletions.
19 changes: 19 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,19 @@
# Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

## Jira

[XXX](https://immutable.atlassian.net/browse/XXX)

## Type of change

- [ ] Bug fix
- [ ] New feature
- [ ] Documentation
- [ ] Configuration

# How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce them. Please also list any relevant details of your test configuration.

119 changes: 119 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,119 @@
on:
  push:
    branches:
      - master
  pull_request:

name: Deploy blockchain etl dataflow into GCP

jobs:
  terraform:
    name: "Run Terraform"
    runs-on: ubuntu-latest
    environment: ${{ github.head_ref }}
    permissions: write-all

    steps:
      - uses: hashicorp/setup-terraform@v1
        with:
          terraform_wrapper: false

      - name: Checkout
        uses: actions/checkout@v2

      - id: 'auth-server'
        name: 'Authenticate to Google Cloud'
        if: ${{ env.ACT == '' }}
        uses: 'google-github-actions/auth@v1'
        with:
          token_format: 'access_token'
          workload_identity_provider: 'projects/953944850513/locations/global/workloadIdentityPools/github/providers/github-provider'
          service_account: '[email protected]'

      - name: Set env vars (dev)
        if: endsWith(github.ref, '/develop')
        run: |
          echo "ENV=dev" >> $GITHUB_ENV
      - name: Set env vars (prod)
        if: endsWith(github.ref, '/master')
        run: |
          echo "ENV=prod" >> $GITHUB_ENV
      - name: 'Set up Cloud SDK'
        uses: 'google-github-actions/setup-gcloud@v1'
        with:
          version: '>= 363.0.0'

      - name: Login to Google Artifact Repository
        id: login-gar
        run: |
          gcloud auth configure-docker us-docker.pkg.dev
      - name: Build, tag, and push image to Google Artifact Repository
        id: build-image-ecr
        run: |
          make gcp.ar.push
      - name: Terraform Init
        working-directory: ./terraform
        id: init
        run: terraform init

      - name: Terraform Workspace Selection
        working-directory: ./terraform
        id: select
        run: terraform workspace select prod

      - name: Terraform Validate
        working-directory: ./terraform
        id: validate
        run: terraform validate -no-color

      - name: Terraform Plan
        working-directory: ./terraform
        id: plan
        if: github.event_name == 'pull_request'
        run: |
          out="$(terraform plan -no-color)"
          out="${out//'%'/'%25'}"
          out="${out//$'\n'/'%0A'}"
          out="${out//$'\r'/'%0D'}"
          echo "::set-output name=out::$out"
        continue-on-error: true

      - uses: actions/github-script@v6
        if: github.event_name == 'pull_request'
        env:
          github-token: ${{ secrets.GITHUB_TOKEN }}
        with:
          script: |
            const output = `#### Terraform Initialization ⚙️\`${{ steps.init.outcome }}\`
            #### Terraform Validation 🤖\`${{ steps.validate.outcome }}\`
            #### Terraform Plan 📖\`${{ steps.plan.outcome }}\`
            <details><summary>Show Plan</summary>
            \`\`\`terraform\n
            ${{ steps.plan.outputs.out }}
            \n\`\`\`
            </details>
            *Pusher: @${{ github.actor }}, Action: \`${{ github.event_name }}\`, Workflow: \`${{ github.workflow }}\`*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            })
      - name: Terraform Plan Status
        if: steps.plan.outcome == 'failure'
        run: exit 1

      - name: Terraform Apply
        working-directory: ./terraform
        if: github.ref == 'refs/heads/master' && github.event_name == 'push'
        run: terraform apply -auto-approve -input=false
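
Note on the Terraform Plan step in the workflow above: it percent-encodes the plan text before passing it to `::set-output`. That workflow command takes a single line, so `%`, newline, and carriage return are escaped as `%25`, `%0A`, and `%0D`, with `%` handled first so the later escapes are not double-encoded. A minimal bash sketch of the same escaping follows; `escape_output` is a hypothetical helper name used only for illustration and is not part of the workflow.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the three substitutions in the "Terraform Plan" step.
# escape_output is a hypothetical helper, not something the workflow defines.
escape_output() {
  local s=$1
  local pct='%' nl=$'\n' cr=$'\r'
  s=${s//"$pct"/%25}   # escape '%' first so later escapes are not re-encoded
  s=${s//"$nl"/%0A}    # newline -> %0A
  s=${s//"$cr"/%0D}    # carriage return -> %0D
  printf '%s' "$s"
}
```

For example, `escape_output "$(terraform plan -no-color)"` would produce the single-line value that the step echoes after `::set-output name=out::`.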
49 changes: 49 additions & 0 deletions .gitignore
@@ -0,0 +1,49 @@
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*
*.terraform.lock.hcl
org/.terraform/*
department/.terraform/*
terraform/.terraform

# Crash log files
crash.log

#java
target/classes
target/generated-sources
target/generated-test-sources
target/maven-archiver
target/maven-status
target/surefire-reports
target/test-classes
target/*dataflow-0.1*
target/*original*


# Ignore any .tfvars files that are generated automatically for each Terraform run. Most
# .tfvars files are managed as part of configuration and so should be included in
# version control.
#
# example.tfvars

# Ignore override files as they are usually used to override resources locally and so
# are not checked in
override.tf
override.tf.json
*_override.tf
*_override.tf.json

# Include override files you do wish to add to version control using negated pattern
#
# !example_override.tf

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*
.idea/*
terraform/*/.terraform.lock.hcl
*.DS_Store

9 changes: 9 additions & 0 deletions Dockerfile
@@ -0,0 +1,9 @@
FROM gcr.io/dataflow-templates-base/java8-template-launcher-base:latest

ARG env

ENV FLEX_TEMPLATE_JAVA_CLASSPATH=/template/*
ENV FLEX_TEMPLATE_JAVA_MAIN_CLASS=io.blockchainetl.ethereum.EthereumPubSubToBigQueryPipeline

COPY target/blockchain-etl-dataflow-bundled-0.1.jar /template/
COPY chain-config/blockchain_zkevm_imtbl_testnet_13392_${env}.json /template/
42 changes: 42 additions & 0 deletions Makefile
@@ -0,0 +1,42 @@
# Initialise `ENV` variable to `dev` if `ENV` is not a defined environment variable
ENV?=dev
# Extract current repo's URL and remove the `.git` suffix
repo_url:=$(shell git config --get remote.origin.url)
repo_url:=$(repo_url:.git=)
# Declare gcp variables
gcp_region:=us
gcp_ar_docker_region:=$(gcp_region)-docker.pkg.dev
image_name:=blockchain-etl-dataflow
gcp_ar_url:=$(gcp_ar_docker_region)/$(ENV)-im-data/dataflow

# Declare default image tag (short hash of the latest commit)
tag_commit:=$(shell git log --format=short --pretty="format:%h" -1)

############################################################################################################################################################
# Make command to build Docker image ######
# - `--platform linux/amd64` ==> build for the linux/amd64 platform ######
############################################################################################################################################################

docker.build:
	docker build --platform linux/amd64 -t $(image_name):$(tag_commit) -f Dockerfile --build-arg env=$(ENV) .

############################################################################################################################################################
# Make command to build docker image and publish image to Google Artifact Registry ######
# - `gcp.ar.push: docker.build gcp.ar.login` ==> Invokes `docker.build` and `gcp.ar.login` before executing steps below ######
############################################################################################################################################################

define push_image
	docker tag $1:$(tag_commit) $(gcp_ar_url)/$1:$(tag_commit)
	docker push $(gcp_ar_url)/$1:$(tag_commit)
	docker tag $1:$(tag_commit) $(gcp_ar_url)/$1:latest
	docker push $(gcp_ar_url)/$1:latest
endef

gcp.ar.login:
	gcloud auth configure-docker $(gcp_ar_docker_region)

gcp.ar.push: docker.build gcp.ar.login
	$(call push_image,${image_name})

generate.jar:
	mvn -e -Pdataflow-runner package -DskipTests
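
Note on the Makefile above: the pushed image reference is assembled from `gcp_region`, `ENV`, and `image_name`, then tagged with both the short commit hash and `latest`. As a rough illustration of how those variables combine, here is a bash sketch; `image_ref` is a hypothetical helper and not a target or function in the repository.

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds the image reference the way the Makefile variables do.
# image_ref is a hypothetical helper, not part of the repository.
image_ref() {
  local env=${1:-dev}                            # mirrors `ENV?=dev`
  local tag=$2                                   # short commit hash or "latest"
  local gcp_region=us                            # mirrors `gcp_region:=us`
  local registry="${gcp_region}-docker.pkg.dev"  # mirrors `gcp_ar_docker_region`
  local image_name=blockchain-etl-dataflow       # mirrors `image_name`
  printf '%s/%s-im-data/dataflow/%s:%s\n' "$registry" "$env" "$image_name" "$tag"
}
```

With these values, `image_ref dev abc1234` prints `us-docker.pkg.dev/dev-im-data/dataflow/blockchain-etl-dataflow:abc1234`, which has the same shape as the references that `push_image` tags and pushes.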
9 changes: 5 additions & 4 deletions README.md
@@ -2,8 +2,9 @@

Dataflow pipelines for Bitcoin ETL. Connects Pub/Sub topics with BigQuery tables.

-## Run the following command to deploy the dataflow pipeline to the dev environment
+## Local development (All commands in Makefile)

+- Generate the jar file: `make generate.jar`
+- Push the image to Google Artifact Repository: `make gcp.ar.push`
+- Preview the Dataflow job deployment in the dev workspace using `terraform plan` (`terraform apply` performs the actual deployment)
+
-```commandline
-./deploy-immutable-zkevm-13372.sh
-```
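
Note on the local development steps in the README change above: they could be strung together roughly as follows. This is a sketch under assumptions: `local_dev_steps` is a hypothetical helper, the Terraform workspace is assumed to be named after the environment (the workflow in this commit only selects `prod`), and `-chdir=terraform` stands in for the workflow's `working-directory: ./terraform`. The function only prints the commands, so it is safe to inspect before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints the local development commands in order.
# local_dev_steps is a hypothetical helper; the workspace name is an assumption.
local_dev_steps() {
  local env=${1:-dev}   # defaults to dev, like `ENV?=dev` in the Makefile
  printf '%s\n' \
    "make generate.jar" \
    "ENV=${env} make gcp.ar.push" \
    "terraform -chdir=terraform init" \
    "terraform -chdir=terraform workspace select ${env}" \
    "terraform -chdir=terraform plan"
}
```

Running `local_dev_steps dev` lists the five commands; piping the output to `bash -ex` would execute them in sequence and stop at the first failure.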
8 changes: 8 additions & 0 deletions chain-config/blockchain_zkevm_imtbl_testnet_13392_dev.json
@@ -0,0 +1,8 @@
[
  {
    "transformNamePrefix": "blockchain_zkevm_imtbl_testnet_13392_",
    "pubSubSubscriptionPrefix": "projects/dev-im-data/subscriptions/blockchain-zkevm-imtbl-testnet-13392-dataflow",
    "bigQueryDataset": "raw_blockchain_zkevm_imtbl_testnet_13392",
    "startTimestamp": "2019-03-02T00:00:00Z"
  }
]
8 changes: 8 additions & 0 deletions chain-config/blockchain_zkevm_imtbl_testnet_13392_prod.json
@@ -0,0 +1,8 @@
[
  {
    "transformNamePrefix": "blockchain_zkevm_imtbl_testnet_13392_",
    "pubSubSubscriptionPrefix": "projects/prod-im-data/subscriptions/blockchain-zkevm-imtbl-testnet-13392-dataflow",
    "bigQueryDataset": "raw_blockchain_zkevm_imtbl_testnet_13392",
    "startTimestamp": "2019-03-02T00:00:00Z"
  }
]
17 changes: 0 additions & 17 deletions deploy-immutable-zkevm-13372.sh

This file was deleted.

8 changes: 0 additions & 8 deletions immutable_zkevm_13372_config.json

This file was deleted.

Binary file added target/blockchain-etl-dataflow-bundled-0.1.jar
Binary file not shown.
12 changes: 12 additions & 0 deletions terraform/locals.tf
@@ -0,0 +1,12 @@
locals {
  source_code = "im-data-blockchain-etl-dataflow"
  region      = "us-central1"
  zone        = "us-central1-a"
  team_owner  = "data"
  tags = {
    team_owner  = local.team_owner
    terraform   = "true"
    environment = terraform.workspace
    source      = local.source_code
  }
}