feat(cdp): add cdp destination APIs (#14994)
This commit adds the CDP destination APIs. Key changes include:

 - use `db-migrate` for migrations
 - Jest for functional tests (I'd be happy to use Vitest or an alternative
   if we want, but I didn't want to change too much at once)
 - pnpm for package management
 - koajs for the server
 - Ajv for validation
 - a separate PostgreSQL logical database for destination API persistence

Things still to do:

 - add a delivery mechanism that takes events from Kafka and delivers them
   to the destinations
 - add CI
 - add to Helm Chart
 - add some method of authentication. I've added the API here, but I may
   end up putting that in the main app in the end, depending on how much
   momentum there is to try separating out the API, and the logistics of
   doing so.
Harry Waye authored Apr 12, 2023
1 parent 5b0b748 commit 9529cdd
Showing 16 changed files with 6,455 additions and 0 deletions.
128 changes: 128 additions & 0 deletions .github/workflows/cdp.yml
@@ -0,0 +1,128 @@
#
# Build and test the Docker image for the CDP service found in the cdp/
# directory.
#
# This job is triggered by pushes to the master branch and by pull requests that
# touch the cdp/ directory.
#
# Once built, we run the functional tests against the running image.

name: CDP CI

on:
    push:
        branches:
            - master
        paths:
            - cdp/**
    pull_request:
        branches:
            - master
        paths:
            - cdp/**

jobs:
    build:
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@v3
            - uses: docker/setup-buildx-action@v2
            - uses: docker/login-action@v2
              with:
                  registry: ghcr.io
                  username: ${{ github.actor }}
                  password: ${{ secrets.GITHUB_TOKEN }}

            - uses: docker/metadata-action@v4
              id: meta
              with:
                  images: ghcr.io/${{ github.repository }}/cdp
                  flavor: |
                      latest=${{ github.ref == 'refs/heads/master' }}

            # Make the image tags used for the Docker cache. We use this rather
            # than ${{ github.repository }} directly because the repository
            # organization name has upper-case characters, which are not
            # allowed in Docker image names.
            - uses: docker/metadata-action@v4
              id: meta-cache
              with:
                  images: ghcr.io/${{ github.repository }}/cdp
                  tags: |
                      type=raw,value=cache

            - uses: docker/build-push-action@v4
              with:
                  context: cdp
                  file: cdp/Dockerfile
                  push: true
                  tags: ${{ steps.meta.outputs.tags }}
                  labels: ${{ steps.meta.outputs.labels }}
                  cache-from: type=registry,ref=${{ steps.meta-cache.outputs.tags }}
                  cache-to: type=registry,ref=${{ steps.meta-cache.outputs.tags }},mode=max

        # Output the image tags so that we can use them in the next job.
        outputs:
            tags: ${{ steps.meta.outputs.tags }}

    test:
        # Run the functional tests against the CDP service. We pull the image
        # from GHCR and run it locally. We need only the db service from the
        # main docker-compose.dev.yml file, so we start just that one service.
        runs-on: ubuntu-latest
        needs: build
        steps:
            - uses: actions/checkout@v3
            - uses: docker/setup-buildx-action@v2
            - uses: docker/login-action@v2
              with:
                  registry: ghcr.io
                  username: ${{ github.actor }}
                  password: ${{ secrets.GITHUB_TOKEN }}

            - name: Install PNPM
              run: |
                  npm install -g pnpm

            - name: Setup node
              uses: actions/setup-node@v3
              with:
                  node-version: '18'
                  cache: 'pnpm'
                  cache-dependency-path: cdp/pnpm-lock.yaml

            - name: Install test dependencies
              working-directory: cdp
              run: |
                  pnpm install --frozen-lockfile

            - name: Start CDP
              working-directory: cdp
              run: |
                  mkdir -p /tmp/logs
                  docker compose -f ../docker-compose.dev.yml up -d db >> /tmp/logs/db.txt
                  # Wait for the db service to be ready, up to 30 seconds.
                  SECONDS=0
                  until docker compose -f ../docker-compose.dev.yml exec -T db pg_isready; do
                      if [ $SECONDS -gt 30 ]; then
                          echo "Timed out waiting for db service to be ready."
                          exit 1
                      fi
                      sleep 1
                  done
                  # Define a helper variable for running the image we just
                  # built, using the tags output from the build job.
                  export SECRET_KEY=$(openssl rand -hex 32)
                  CDP_RUN="docker run -e SECRET_KEY=$SECRET_KEY -e DATABASE_URL=postgres://posthog:posthog@localhost:5432/posthog --rm --network=host ${{ needs.build.outputs.tags }}"
                  # Run the migrations.
                  $CDP_RUN sqlx migrate run
                  # Start the CDP service.
                  $CDP_RUN &> /tmp/logs/cdp.txt &
                  # Run the functional tests.
                  pnpm jest
1 change: 1 addition & 0 deletions cdp/.gitignore
@@ -0,0 +1 @@
dist/
15 changes: 15 additions & 0 deletions cdp/.swcrc
@@ -0,0 +1,15 @@
{
    "jsc": {
        "parser": {
            "syntax": "typescript",
            "tsx": false,
            "decorators": false,
            "dynamicImport": false
        },
        "target": "es2020",
        "baseUrl": "."
    },
    "module": {
        "type": "commonjs"
    }
}
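
The `.swcrc` above configures swc to compile the TypeScript sources to CommonJS targeting ES2020. The `pnpm build` step referenced in the Dockerfile presumably maps to an `@swc/cli` invocation along these lines; the exact `scripts` entries are an assumption, as `package.json` is not shown in this diff:

```json
{
    "scripts": {
        "build": "swc src --out-dir dist",
        "test": "jest"
    }
}
```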
82 changes: 82 additions & 0 deletions cdp/Dockerfile
@@ -0,0 +1,82 @@
# Build the CDP server image. We use a multi-stage build to first build the CDP
# node application, then copy the built files to the final image.
#
# Note: separately we bundle the resulting dist folder into the
# production.Dockerfile image so that the main image can run the entire
# application without needing to build the CDP server.
#
# We also need to copy the migrations folder, as the CDP server needs it to run
# the migrations. Migrations are run with the Rust application sqlx-cli, so we
# copy the compiled binary from the Rust image. I'm sure there's a better way
# to do this, but this works for now.

FROM rust:1.68.2-slim-bullseye AS sqlx-cli-build

WORKDIR /code
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Since we are using the slim image, we need to install `pkg-config` and
# `libssl-dev` so cargo install completes successfully.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    "pkg-config" \
    "libssl-dev" \
    && \
    rm -rf /var/lib/apt/lists/*

# Install SQLx CLI.
RUN cargo install --version 0.6.3 sqlx-cli --no-default-features --features native-tls,postgres

FROM node:18.12.1-bullseye-slim AS cdp-build

WORKDIR /code
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Install Node.js dependencies.
COPY package.json pnpm-lock.yaml ./
RUN corepack enable && \
    mkdir /tmp/pnpm-store && \
    pnpm install --frozen-lockfile --store-dir /tmp/pnpm-store && \
    rm -rf /tmp/pnpm-store

# Build the CDP server.
#
# Note: we run the build as a separate action to increase
# the cache hit ratio of the layers above.
COPY ./src/ ./src/
COPY tsconfig.json .swcrc ./
RUN pnpm build

# As the CDP server is now built, keep only production dependencies in the
# node_modules folder, as we will copy it to the final image. We remove all
# dependencies first to ensure we end up with the smallest possible image.
RUN rm -rf node_modules && \
    corepack enable && \
    mkdir /tmp/pnpm-store && \
    pnpm install --frozen-lockfile --store-dir /tmp/pnpm-store --prod && \
    rm -rf /tmp/pnpm-store

# Build the final image.
FROM node:18.12.1-bullseye-slim

WORKDIR /code
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Install tini.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    "tini" \
    && \
    rm -rf /var/lib/apt/lists/*

# Copy the SQLx CLI binary from the previous stage.
COPY --from=sqlx-cli-build --link /usr/local/cargo/bin/sqlx /usr/local/bin/sqlx

# Copy the built CDP server from the previous stage.
COPY --from=cdp-build --link /code/node_modules/ ./node_modules/
COPY --from=cdp-build --link /code/dist/ ./dist/
COPY --link ./migrations/ ./migrations/

# Set [Tini](https://github.com/krallin/tini) as the entrypoint.
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "dist/rest.js"]
5 changes: 5 additions & 0 deletions cdp/README.md
@@ -0,0 +1,5 @@
# Customer Data Pipeline

Handles delivering event streams to destinations.

TODO: fill this in a bit more. Very much a work in progress at the moment.