Data warehouse for access logs #46

Merged on Oct 19, 2022

41 commits
3ca9dec feat(cdk): output access logs bucket (kikuomax, Sep 20, 2022)
1a13ccf chore(cdk-ops): update CDK (kikuomax, Sep 20, 2022)
ae2887c fix(cdk-ops): name of the main stack (kikuomax, Sep 20, 2022)
16042a7 feat(cdk-ops): resolve access logs bucket (kikuomax, Sep 20, 2022)
0211abb feat(cdk-ops): mask access logs (kikuomax, Sep 20, 2022)
827846f feat(cdk-ops): provision AccessLogsMasking (kikuomax, Sep 20, 2022)
714c370 feat(cdk-ops): process S3 event via SQS (kikuomax, Sep 21, 2022)
e459feb feat(cdk-ops): ignores non-existing logs (kikuomax, Sep 23, 2022)
3af141e feat(cdk-ops): delete masked access logs (kikuomax, Sep 23, 2022)
3433cd6 chore(cdk-ops): rename AccessLogsMasking (kikuomax, Sep 24, 2022)
1fffb18 feat(cdk-ops): prepend prefix to masked logs (kikuomax, Sep 26, 2022)
58327f3 feat(cdk-ops): prefix date (kikuomax, Sep 26, 2022)
344b6d3 feat(cdk-ops): configure env (kikuomax, Oct 1, 2022)
91dfd32 feat(cdk-ops): latest boto3 layer (kikuomax, Oct 1, 2022)
b2b6987 feat(cdk-ops): provision data warehouse for access logs (kikuomax, Oct 1, 2022)
268279f chore(cdk-ops): install cdk2-python-library-layer (kikuomax, Oct 8, 2022)
570d151 feat(cdk-ops): add libdatawarehouse (kikuomax, Oct 8, 2022)
29b0b25 feat(cdk-ops): use libdatawarehouse (kikuomax, Oct 8, 2022)
a9796ab feat(cdk-ops): load access logs (kikuomax, Oct 8, 2022)
68efb84 feat(cdk-ops): add sequential numbers to rows (kikuomax, Oct 8, 2022)
27d4910 fix(cdk-ops): check existence of access logs (kikuomax, Oct 9, 2022)
b0d9808 chore(cdk-ops): add thoughts about VACUUM (kikuomax, Oct 9, 2022)
b669dc0 feat(cdk-ops): schedule loading access logs (kikuomax, Oct 10, 2022)
ae03435 fix(cdk-ops): ignore invalid keys (kikuomax, Oct 10, 2022)
e7a8ea8 feat(cdk-ops): add vacuum workflow (kikuomax, Oct 10, 2022)
8a58bc9 feat(cdk-ops): start VACUUM after load (kikuomax, Oct 10, 2022)
605f682 chore(cdk-ops): output ARN of load-access-logs (kikuomax, Oct 11, 2022)
c60c541 chore(cdk-ops): install @aws-sdk/client-lambda (kikuomax, Oct 11, 2022)
6d7ac60 feat(cdk-ops): add data warehouse population script (kikuomax, Oct 11, 2022)
27d794b feat(cdk-ops): add production data warehouse (kikuomax, Oct 11, 2022)
00f8ea7 fix(cdk-ops): intuitive timeout (kikuomax, Oct 14, 2022)
284ae05 feat(cdk-ops): enable enhanced VPC routing (kikuomax, Oct 14, 2022)
00ab4f6 fix(cdk-ops): NULL user_agent (kikuomax, Oct 15, 2022)
39d1779 feat(cdk-ops): enable enhancedVpcRouting (kikuomax, Oct 18, 2022)
7973ee5 docs(cdk-ops): add data warehouse architecture (kikuomax, Oct 18, 2022)
4b7a7b9 docs(cdk-ops): update README (kikuomax, Oct 19, 2022)
9b61fec docs(cdk-ops): refine data-warehouse.md (kikuomax, Oct 19, 2022)
81cea1e docs(cdk-ops): translate data-warehouse.md → ja (kikuomax, Oct 19, 2022)
b2b47ad docs(cdk-ops): translate README → ja (kikuomax, Oct 19, 2022)
8ba5e96 docs(cdk-ops): link to Redshift admin info (kikuomax, Oct 19, 2022)
c658d26 docs(cdk-ops): update README (kikuomax, Oct 19, 2022)
7 changes: 5 additions & 2 deletions README.ja.md
@@ -16,7 +16,10 @@

サブフォルダ[`zola`](zola/README.ja.md)をご覧ください。

## Continuous Delivery
## DevOps

以下の["DevOps"](https://en.wikipedia.org/wiki/DevOps)機能も提供します。
- Continuous Delivery: このレポジトリの`main`ブランチが更新されると、codemongerウェブサイトを更新するためのワークフローが開始します。
- データウェアハウス: codemongerウェブサイトのアクセスログはデータウェアハウスに格納されます。

このレポジトリの`main`ブランチが更新されると、codemongerウェブサイトを更新するためのワークフローが開始します。
詳しくはサブフォルダ[`cdk-ops`](cdk-ops/README.ja.md)をご覧ください。
7 changes: 5 additions & 2 deletions README.md
@@ -16,7 +16,10 @@ Please refer to the subfolder [`cdk`](cdk).

Please refer to the subfolder [`zola`](zola).

## Continuous delivery
## DevOps

The following ["DevOps"](https://en.wikipedia.org/wiki/DevOps) features are also provided:
- Continuous delivery: when the `main` branch of this repository is updated, the workflow to update the codemonger website starts.
- Data warehouse: access logs of the codemonger website are stored in the data warehouse.

When the `main` branch of this repository is updated, the workflow to update the codemonger website starts.
Please refer to the subfolder [`cdk-ops`](cdk-ops) for more details.
1 change: 1 addition & 0 deletions cdk-ops/.gitignore
@@ -1,5 +1,6 @@
*.js
!jest.config.js
!/bin/populate-data-warehouse.js
*.d.ts
node_modules

43 changes: 43 additions & 0 deletions cdk-ops/README.ja.md
@@ -21,6 +21,11 @@ codemongerウェブサイトのコンテンツを保管し配信するAWSリソ
ワークフローは`main`ブランチが更新された際(例えばプルリクエストがマージされた際)に開始されます。
プルリクエストの作成前に作成者は[`zola serve`](https://www.getzola.org/documentation/getting-started/cli-usage/#serve)でローカルにコンテンツをレビューしなければなりません。

## アクセスログ用のデータウェアハウス

このCDKスタックはアクセスログ用のデータウェアハウスを確保します。
詳しくは[`docs/data-warehouse.ja.md`](./docs/data-warehouse.ja.md)をご参照ください。

## 事前準備

### コンテンツのためのCDKスタックをデプロイする
@@ -106,6 +111,44 @@ npx cdk deploy --toolkit-stack-name $TOOLKIT_STACK_NAME -c "@aws-cdk/core:bootst

CDKスタックをデプロイすると、CloudFormationスタック`codemonger-operation`が作成または更新されます。

#### Amazon Redshift Serverlessネームスペースの管理ユーザー

このCDKスタックは[Amazon Redshift Serverless (Redshift Serverless)](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html)ネームスペースの確保時に管理ユーザーを作成します。
管理ユーザーのパスワードは[AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html)の管理するシークレットとして作成されます。
**CloudFormationはRedshift Serverlessネームスペースの管理ユーザー名とパスワードを一度作成すると変更することができない**ので、**シークレットが更新(再生成)されると管理パスワードが失われます**。

これが起きてしまったら、別のスーパーユーザーで管理パスワードを手作業で更新しなければなりません。
Redshift Serverlessコンソールで管理パスワードを変更するか、CloudFormationの実行ロール\*で[Query Editor v2](https://aws.amazon.com/redshift/query-editor-v2/)を実行して管理パスワードをリセットすることもできます。

\* Redshift Serverlessはネームスペースの作成者に管理権限を与えます。
Redshift Serverlessネームスペースの確保にCDK (CloudFormation)を使用しているので、CloudFormationの実行ロールがその力を授かることになります。

## デプロイ後

### データウェアハウスにデータベースとテーブルを作成する

このCDKスタックをデプロイした後、データウェアハウスにデータベースとテーブルを作成しなければなりません。
以下のコマンドを実行してください。

```sh
npm run populate-dw -- development
npm run populate-dw -- production
```

`populate-dw`スクリプトは[`bin/populate-data-warehouse.js`](./bin/populate-data-warehouse.js)を実行します。

この手続きはCDKスタックを最初に確保した際に一度だけ必要です。

### 日々のアクセスログ読み込みを有効にする

このCDKスタックは、CloudFrontのアクセスログをデータウェアハウスに読み込むLambda関数を1日に1回実行する[Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html)のルールを確保します。
ルールはデフォルトで無効化されているので、日々のアクセスログ読み込みを実行するには有効化しなければなりません。
開発用\*と製品用で別々のルールがあります。

確実に[データウェアハウスにデータベースとテーブルを作成](#データウェアハウスにデータベースとテーブルを作成する)しておいてください。

\* 開発用のルールは**毎時**トリガーされます。

## なぜExportを使わないのか?

このCDKスタックはメインとなるcodemongerのCloudFormationスタックに依存します。
43 changes: 43 additions & 0 deletions cdk-ops/README.md
@@ -21,6 +21,11 @@ This CDK stack provisions a [AWS CodePipeline](https://docs.aws.amazon.com/codep
The workflow is triggered when the `main` branch is updated; e.g., a pull request is merged.
An author of a pull request has to locally review contents with [`zola serve`](https://www.getzola.org/documentation/getting-started/cli-usage/#serve) before making the pull request.

## Data warehouse for access logs

This CDK stack provisions a data warehouse for access logs.
Please refer to [`docs/data-warehouse.md`](./docs/data-warehouse.md) for more details.

## Prerequisites

### Deploying CDK stack for contents
@@ -106,6 +111,44 @@ npx cdk deploy --toolkit-stack-name $TOOLKIT_STACK_NAME -c "@aws-cdk/core:bootst

After deploying the CDK stack, you will find the CloudFormation stack `codemonger-operation` created or updated.

#### Admin user of the Amazon Redshift Serverless namespace

This CDK stack creates the admin user of the [Amazon Redshift Serverless (Redshift Serverless)](https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-serverless.html) namespace when it provisions the namespace.
The password of the admin user is created as a secret managed by [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html).
Since **CloudFormation cannot change the admin username and password of the Redshift Serverless namespace** once it is provisioned, the **admin password is lost if the secret is updated (regenerated)**.

If this happens, you have to manually reset the admin password as another superuser.
You can change the admin password on the Redshift Serverless console, or you can assume the CloudFormation execution role\* in [Query Editor v2](https://aws.amazon.com/redshift/query-editor-v2/) to reset the admin password.

\* Redshift Serverless grants admin privileges on a new namespace to its creator.
Because we use CDK (CloudFormation) to provision the Redshift Serverless namespace, the CloudFormation execution role receives those privileges.

## Post deployment

### Populating the database and tables on the data warehouse

After deploying this CDK stack, you have to create the database and tables in the data warehouse.
Run the following commands, one per deployment stage.

```sh
npm run populate-dw -- development
npm run populate-dw -- production
```

The `populate-dw` script runs [`bin/populate-data-warehouse.js`](./bin/populate-data-warehouse.js).

This procedure is necessary only once when you deploy this CDK stack for the first time.
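
As a rough illustration of what the script does with the `<stage>` argument, the sketch below isolates the lookup of the populate function's ARN among CloudFormation stack outputs. The two output keys are the ones used in `bin/populate-data-warehouse.js`; the helper name `findPopulateFunctionArn`, the error messages, and the sample ARN are hypothetical and only serve the example.

```javascript
// Output keys as they appear in bin/populate-data-warehouse.js.
const OUTPUT_KEYS = {
    development: 'PopulateDevelopmentDwDatabaseLambdaArn',
    production: 'PopulateProductionDwDatabaseLambdaArn',
};

// Hypothetical helper: picks the ARN of the populate function for a stage
// out of the `Outputs` array that CloudFormation's DescribeStacks returns.
function findPopulateFunctionArn(outputs, stage) {
    if (!Object.prototype.hasOwnProperty.call(OUTPUT_KEYS, stage)) {
        throw new Error(`unknown stage: ${stage}`);
    }
    const outputKey = OUTPUT_KEYS[stage];
    const output = (outputs ?? []).find(o => o.OutputKey === outputKey);
    if (output == null) {
        throw new Error(`stack output not found: ${outputKey}`);
    }
    return output.OutputValue;
}

// Made-up outputs; the ARN is a placeholder, not a real resource.
const sampleOutputs = [
    {
        OutputKey: 'PopulateDevelopmentDwDatabaseLambdaArn',
        OutputValue: 'arn:aws:lambda:us-east-1:123456789012:function:populate-dev',
    },
];
console.log(findPopulateFunctionArn(sampleOutputs, 'development'));
```

Keeping the lookup free of AWS calls makes it easy to check locally; the actual script performs the same selection on live `DescribeStacks` results.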

### Enabling the daily access log loading

This CDK stack provisions an [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-what-is.html) rule that runs, once a day, the Lambda function that loads CloudFront access logs into the data warehouse.
The rule is disabled by default, so you have to enable it to start the daily access log loading.
There are separate rules for development\* and production.

Before enabling the rules, please make sure that you have [populated the database and tables on the data warehouse](#populating-the-database-and-tables-on-the-data-warehouse).

\* The rule for development triggers **every hour**.
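
One way to enable a rule without the console is the EventBridge `EnableRule` API. The following is a minimal sketch, assuming that `@aws-sdk/client-eventbridge` is installed (this README does not state that it is) and that you have looked up the rule name yourself, since this README does not give the rule names. The function names `buildEnableRuleInput` and `enableRule` are hypothetical.

```javascript
// Builds the request parameters for EventBridge's EnableRule API.
// `ruleName` must be the name of the deployed rule; it is not stated in
// this README, so look it up first (e.g., on the EventBridge console).
function buildEnableRuleInput(ruleName) {
    if (typeof ruleName !== 'string' || ruleName.length === 0) {
        throw new Error('rule name must be a non-empty string');
    }
    return { Name: ruleName };
}

// Enables the rule with a given name on the default event bus.
async function enableRule(ruleName) {
    const input = buildEnableRuleInput(ruleName);
    // loaded lazily; assumes @aws-sdk/client-eventbridge is installed
    const {
        EventBridgeClient,
        EnableRuleCommand,
    } = require('@aws-sdk/client-eventbridge');
    const client = new EventBridgeClient({});
    await client.send(new EnableRuleCommand(input));
}
```

Calling `enableRule('<rule-name>')` once for the development rule and once for the production rule would cover both stages; enabling them on the EventBridge console works just as well.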

## Why am I not using exports?

This CDK stack depends on the main codemonger CloudFormation stacks.
9 changes: 9 additions & 0 deletions cdk-ops/bin/cdk-ops.ts
@@ -23,6 +23,15 @@ resolveCodemongerResourceNames()
// env: { account: '123456789012', region: 'us-east-1' },

/* For more information, see https://docs.aws.amazon.com/cdk/latest/guide/environments.html */
env: {
// without the following properties `account` and `region`,
// the stack becomes "environment-agnostic."
// only two availability zones (AZs) are visible in an
// environment-agnostic stack.
// https://docs.aws.amazon.com/cdk/v2/guide/environments.html
account: process.env.CDK_DEFAULT_ACCOUNT,
region: process.env.CDK_DEFAULT_REGION,
},
codemongerResourceNames: names,
tags: {
project: 'codemonger',
76 changes: 76 additions & 0 deletions cdk-ops/bin/populate-data-warehouse.js
@@ -0,0 +1,76 @@
/* Populates the data warehouse. */

const yargs = require('yargs/yargs');
const { hideBin } = require('yargs/helpers');
const {
    CloudFormationClient,
    DescribeStacksCommand,
} = require('@aws-sdk/client-cloudformation');
const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');

const CODEMONGER_OPERATIONS_STACK_NAME = 'codemonger-operations';

yargs(hideBin(process.argv))
    .command(
        '$0 <stage>',
        'populates the data warehouse',
        _yargs => {
            _yargs.positional('stage', {
                describe: 'deployment stage of the data warehouse',
                choices: ['development', 'production'],
            });
        },
        run,
    )
    .help()
    .argv;

async function run({ stage }) {
    console.log('obtaining populate function for', stage);
    const functionArn = await getPopulateFunctionArn(stage);
    console.log('running populate function for', stage);
    await runPopulate(functionArn);
    console.log('populated the data warehouse for', stage);
}

// obtains the ARN of the Lambda function that populates the database and
// tables.
async function getPopulateFunctionArn(stage) {
    const client = new CloudFormationClient({});
    const command = new DescribeStacksCommand({
        StackName: CODEMONGER_OPERATIONS_STACK_NAME,
    });
    const results = await client.send(command);
    const outputs = (results.Stacks ?? [])[0]?.Outputs;
    if (outputs == null) {
        throw new Error(
            `please deploy the latest stack ${CODEMONGER_OPERATIONS_STACK_NAME}`,
        );
    }
    const outputKey = stage === 'production'
        ? 'PopulateProductionDwDatabaseLambdaArn'
        : 'PopulateDevelopmentDwDatabaseLambdaArn';
    const output = outputs.find(o => o.OutputKey === outputKey);
    if (output == null) {
        throw new Error(
            `please deploy the latest stack ${CODEMONGER_OPERATIONS_STACK_NAME}`,
        );
    }
    return output.OutputValue;
}

// runs a given populate function.
async function runPopulate(functionArn) {
    const client = new LambdaClient({});
    const command = new InvokeCommand({
        FunctionName: functionArn,
        Payload: '{}',
    });
    const results = await client.send(command);
    if (results.StatusCode !== 200) {
        const decoder = new TextDecoder();
        const payload = decoder.decode(results.Payload);
        console.error('failed to populate the data warehouse', payload);
        throw new Error('failed to populate the data warehouse');
    }
}
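
A note on the status check in `runPopulate`: for a synchronous Lambda invocation, the `Invoke` API reports `StatusCode` 200 even when the function itself raises an error, and signals the failure through the `FunctionError` field of the response. The sketch below shows one possible stricter check as a pure function over the response object. The helper name `describeInvokeFailure` and its message formats are hypothetical; this is not what the script in this pull request does.

```javascript
// Hypothetical helper: returns a description of the failure in a Lambda
// Invoke response, or null if the invocation succeeded.
function describeInvokeFailure(results) {
    if (results.StatusCode !== 200) {
        return `unexpected status code: ${results.StatusCode}`;
    }
    if (results.FunctionError != null) {
        // when the function fails, Payload carries the error details as
        // UTF-8 encoded JSON
        const payload = results.Payload != null
            ? new TextDecoder().decode(results.Payload)
            : '';
        return `function error (${results.FunctionError}): ${payload}`;
    }
    return null;
}

// Example with a made-up response of a function that threw an error.
const failedResponse = {
    StatusCode: 200,
    FunctionError: 'Unhandled',
    Payload: new TextEncoder().encode('{"errorMessage":"boom"}'),
};
console.log(describeInvokeFailure(failedResponse));
```

With such a helper, `runPopulate` could throw whenever the returned description is not `null`, so that an error raised inside the populate function would not be reported as a success.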
1 change: 1 addition & 0 deletions cdk-ops/docs/data-warehouse-aws-architecture.drawio
@@ -0,0 +1 @@
<mxfile host="Electron" modified="2022-10-18T09:23:10.261Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/20.3.0 Chrome/104.0.5112.114 Electron/20.1.3 Safari/537.36" etag="woa8o7jWEie3WvTo1pGm" version="20.3.0" type="device"><diagram id="JEM28LiEYBrcV714s1jI" name="ページ1">7V1bc6M4Fv41eewUQuLiR1/i2dlKqlzjrZ7Zp5RsFJtpjLwgx8n8+pW42ULCxnEwjkN3VwcdQAKd850bR8odHK7efovwevlEPRLcmYb3dgdHd6YJoOXwH4LynlLcHkgJi8j3sot2hKn/D8mIRkbd+B6JpQsZpQHz1zJxTsOQzJlEw1FEt/JlLzSQR13jBVEI0zkOVOqfvseWGRXYvd2JfxF/scyGds3shVc4vzh7k3iJPbrdI8GHOziMKGXp0eptSAIxefm8pPeNK84WDxaRkNW54f1HH0Dn55RMcPAD/3x++jeY/8h6ecXBJnvhn5Nh9rzsPZ+ENfVDlkykNeD/+DhD487iZ4aidW9aJUK57cgEoLZEHzKh3HZkAih3D0rjg/ID7hGUltS9URrf2HtA/g8O6IYFfkiGhcgZnLiIsOdzVgxpQCNOC2nIZ2+wZKuAtwA/3C59RqZrPBezuuVw4bQXGrJM6IGZt7OJF71ysWaYjxVlfSScINHDK0kZkl4TBHgd+7PirojMN1Hsv5I/SJx2LqhcANfiePW2EFi9x9sY3S8iulknj/87H0t79vl1PRe3s4j+Ivnr3ZnQRK4LkHhoPwhKr/1KIuZzFPUDfyF6ZVQMgrNWQF6Y6JHPhR8uHpPWCBrZ++8N0e8PnIHL6R6Ol8TLXiSTWT4EeasEAyggxnUToSvCond+SXaDlaMyU0vIRGl7uwM5zHXQcg/fJrAz5ZIplkXR9w57/CCD3wlQNBUoTiL/FTMisLiZhYR1uOxwuY/LWHTns/fn3cXTBKR5x2XEAuQ8DPolxHL6gz02x+7nwbYY53TYVlupSiwDGcrQUJBs9jRIBo7REJKhgmQFufEvwubLbGK0cluaUv53LB6hSp5LHEUj0+n3FRnILpbYkvP8Ec9IMKGxz/xE4GaUMbo6KhRzIgRfRtQx9OB4nb7oi/8mnkOBD6iCBQm9XOmdIU/mQXmCzjF5clxVnHLap0sTUqTpN24Utpj3bAcCj7OIHy3EUT47irjxGWIyiyKhgfBONeENo5lOAjruloVg5XueuHkg26CR3h3QSiFXWqFXiOG+zNhNchdI3AVGT2GvTltYTSkLq31lAceubaLbUxazzZzP3PPWZ8tnOvub91JbbRy2OKYD7i1JjHqq++hYF1QStiI0xONxbNakEVvSBQ1x8LCjlvC3u+aRCp4l0/o3Yew9Y47QEDLn0jHFQIfnkT8X3URzcsjC57E9jhbkELgtPWMiEmDGHStp1E+fZeeLzzJCF5rl7NZJaoxy2CAke2ogD7ryLtLnmhQmLONVP4qEtSsuy0xO9Tg2kseBsMT6tMedIBTv+HHZcL+4bMCaooFgI7JhGbJCtUwod5G+wPmyUYoW8nHGH7u+GVnqKT5B/88pJwwDuvEUOeuyAN86C8APn+eJYOhydLkn2FyOrhji03N0sha3VRcLcKrGyTKbStHlumAfmCv8D+dMhs1xRL979NU7yNOSNgWGqzLV0YRfsKnwK3c8a8dfWm2r1bg6ravVvKr2lS5L9KFmhDJRR3NUIlAvy1WoStTRdPaifDfQ3A1Kd1dr67rhLD/XQ9ZobO6dG/lcv2ZRaShcKjXetfrQGFg6dfmS/LnGoFdrHrgaSbyi1DgIraIzE4lxeEk0U3PAtmwJ2G0nzYCK4utwxw/5e8ej5LpO
ulWRA7tMmAzUBHhhJqdQTVvi+ZzE3Gc3Arrg/6dpm86KHrCibg0zal0yiwnULPVhM9pAGnNgGC4a3V4aU4j1CqdP2JhIuTloiw/ibWtwNTFeKJE/iBcvfR6RKKpkhBnmx3gl2BXOYvGjP/m9UybVnHcc+XsX6jltKxO788m/sk8+HqOxOzjNJx8MAbTsb+OTE6H1Z5GfuIoNKnU5hdK6V+607yWMEMeMc3teQoBXMw8/v2zCedOugtmTpEonVkgjVqgxsXIVsXqk2OsnYcUjXcSd+a9v/gv0HzP/zaXk1M8fCgNvIJjPHeyjwXyqNdsK5vPH1PnhiVYapFasg1g1xGxXglj7We/8xToPu/Owb9TDjhlZF97QeeWih9Hds0vuUJ65aMvLzrGslA9M+ZTwH+NiUjqdXc1VU86HmT3Yts6GbXhBfA6j97+y+5PGf0Xj3sqbo7f9k6P3/daERD5/+aQ+YlcU0OD3EaemS1VVeXAhl+oKkuVdGHy2hnDl5AqCqt6/aBhsqhnzn3i+2az+kyjwTtlXsxLJDjpyNcrevKiyv9aC7vNC3rr1x2nesjX9fK113udNvlnXOLY7+ddaSH2eZ+J+Dcnv3eLk50u+jk9+q6tL4KlJmc4tvEK30DVlX8I27HbdQqimAyZ0veESTUZbUS8xw3HnHR5M38qpANvSVKK7l/QO4YECu+ramCmJ+OQESaldx+0qbttArmRFvbZr6yC6SaNs1TTKyNDz60JG+dT12d2Xkqv6UtKtDzj6pSTKLUaDStUyZaXadnUpbCW/UiTTi8Y1J9PrJmtgM6vFndKSEmT15C4qVot/1iJcWKMMpfGS9rHhAvWb7ZcPxyKaAOpS6qZu4SNC1UJ63h4+ahHNzk83QrwicbLyWPHZk4nqvPVqThuykjAdTWym+0zbmLeO1CzOwcVPLMJh/EKjFZ+7bgVU7RVQdmn/Rs0GPOCi+zgh1TrcQJSGam8LYn+GE3D6bh4lHyF9jGZ350BXsL9ft2XXZ2gRbhpK+3gVAtXWRl6oqw25gY8AsFeKd1suDUFqFmtEAh5MdmskarETymZGVwtoX/IDAFJXSKr8LHua/9uQTRdMnMBmTrhXgavldHNupbpk7gnHvzrcfmSfBJinkFpb24TUpWqdef9y5r28iTDMd5Fqzb5/jyVzqG5999mbVJ63pbNxi5Nv1a2iSqW8tclX829lg9k5RmfaURNqHOCL1kTkCf5uv6HPt6MpHJrMushuNnA1ORed9XSaEqZWVlY1rq/rfr9FrVa9WleQ8OqQ/DEkw9LeERzJmoD5oliGp24x09VrXVW91thCDo9mTqrXGjoAgrEC31ut14rJPCIsfl7hEC+SB2nM7wNm6fta23Vbdiuh1ZdaBF17kR1sdV8ZqGbCsl0KUvFWw7SnTN67QO0AYI0a1eu6lazNRWrfe0kZPDcZ8qG6CABsWQ7yVQ2NFkZY1pWyGtQNmey6XG11/wjr1P1RGwiZkstvMGTaxPU9qsO/vgQUv0ewWDfW8sZAlhob9b2VH34bi3qUYfKOHo4mzXnRLX+sa10LX1eh1l7k1dD3Gt6MKGX7Zk5A/Yl6RFzxfw==</diagram></mxfile>
Binary file added cdk-ops/docs/data-warehouse-aws-architecture.png