Merge pull request #1 from all-of-us/jaycarlton/reporting
[RW-5729][risk=no] Reporting Terraform module (public side)
jaycarlton authored Dec 1, 2020
2 parents 898b522 + 3c9cc54 commit e4eb4c5
Showing 17 changed files with 1,052 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
**/.idea/*
*.tfstate
*.backup
*.iml
.DS_Store
45 changes: 45 additions & 0 deletions README.md
@@ -0,0 +1,45 @@
# Workbench Terraform Modules
The module directories here represent individually deployable subsystems,
microservices, or other functional units. It's easy enough to put all buckets, say,
in a `gcs` module, but that wouldn't really let us operate on an individual component's bucket.

Following is a broad outline of each child module. If you feel irritated that you can't see, for example,
all dashboards in one place, you can still go to the Console or use `gcloud`.
## Goals
### Automate ourselves out of a job
All the existing and planned Terraform modules have some level of scripted or otherwise automated
support processes.
## Non-goals
### Become the only game in town
We don't want to get into a position where we force anyone to use Terraform if it's not the best
choice for them. Terraform is still pretty new, and changing rapidly. The Google provider is also
under rapid development.
### Wag the Dog
We do not have any aspirations to absorb any of the tasks that external teams are responsible for,
including building the GCP projects for each of our environments or conducting all administrative
tasks in either pmi-ops or terra projects. If Terraform really "takes off", then it may make sense to
share learnings, and at that point, there may be opportunities for our Terraform stack to use theirs,
or vice versa. While these boundaries may be fuzzy today, hopefully the addition of clear module
inputs and documentation will drive clarification of responsibilities and visibility into state,
dependencies, etc.
### Bypass security approvals
In some cases, actions that require security approval can be performed in Terraform, particularly
around IAM bindings, access groups, and roles. We don't want a situation where an audit finds that
individuals or service accounts were added or modified without going through the proper channels.

One potential workaround here is to invite sysadmin or security personnel to the private repository
to approve changes to the Terraform module inputs.

## Currently Supported Modules

### Reporting
The state for reporting is currently the BigQuery dataset and its tables and views. In the future,
it makes sense to add these sorts of things:
* Reporting-specific metrics
* Notifications on the system
* Reporting-specific logs
* Data blocks for views (maybe)

In other words, the primary focus of the module is the Reporting system, but it may be convenient to
add reporting-specific artifacts that might otherwise be concerned with Monitoring or other auxiliary
services.
218 changes: 218 additions & 0 deletions TERRAFORM_QUICKSTART.md
@@ -0,0 +1,218 @@
# Terraform Quickstart
The [official documentation](https://www.terraform.io/) for Terraform
is quite readable and exposes the functionality and assumptions at a good pace.
In particular, I found the [Get Started - Google Cloud](https://learn.hashicorp.com/collections/terraform/gcp-get-started) guide to be very helpful.

It's worth making an alias for terraform and putting it in your `.bash_profile` or other shell init file, as
it's difficult to spell `terraform` correctly when caffeinated.
```shell script
alias tf='terraform'
```
The above tip also serves as a warning and non-apology that I'm going to forget to spell out the
command name repeatedly below.

## Installation
For the work so far, I've used the [Terraform CLI](https://www.terraform.io/docs/cli-index.html), which has the advantage of not costing
money or requiring an email registration. On the Mac, `brew install terraform` is pretty much all it takes.

Terraform works by keeping state on the local filesystem for evaluating diffs and staging changes. Primary files for users to author
and check in to source control are:
* main.tf - listing providers and specifying Terraform version and other global options
* <subsystem_name>.tf - list of resources and their properties and dependencies. This file can reference any other .tf files in the local directory.
* variables.tf - any string, numeric, or map variables to be provided to the script.
* external text files - useful files with text input, such as BigQuery table schema JSON files

Output files produced by Terraform (and not checked in to source control) include
* tfstate files - a record of the current known state of resources under Terraform's control.
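
For orientation, here's a minimal sketch of what a `main.tf` might contain (the version constraint, region, and provider config are illustrative, not this repo's actual settings):
```hcl-terraform
# Pin the Terraform version and configure the Google provider.
terraform {
  required_version = ">= 0.13"
}

provider "google" {
  project = var.project_id # supplied via variables.tf / .tfvars
  region  = "us-central1"
}
```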

## Organization
Terraform configuration settings are reusable for all environments (after binding environment-specific
variables in `.tfvars` files). The reuse comes from Terraform's variable mechanism.
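For example, a hypothetical invocation binding one environment's variables:
```shell script
terraform apply -var-file=test.tfvars
```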
## Running
If you have a small change to make to a resource under Terraform's management, in the simplest case the workflow is (a condensed shell sketch follows the list):
* Run `terraform init` to initialize the providers
* Run `terraform state list` to list all artifacts currently known and managed by Terraform within
the scope of the `.tf` files in the current directory.
* Run `terraform show` to view the current state of the (managed) world, and check any errors.
* Change the setting in the `.tf` file (such as `reporting.tf`).
* Run `terraform plan` to see the execution plan. This can be saved with the `-out` argument in
situations where it's important to apply exactly the planned changes. Otherwise, new changes to the
environment might be picked up in the `apply` step, giving possibly significantly different behaviors
than were expected based on the `plan` output.
* Run `terraform apply` to execute the plan and apply the changes. You'll need to type "yes" to
proceed with the changes (or use `-auto-approve` in a non-interactive workflow.)
* Check in changes to the terraform file.
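
The same happy path, condensed (the plan file name is invented):
```shell script
terraform init
terraform state list
terraform show
# ...edit reporting.tf...
terraform plan -out=reporting.tfplan
terraform apply reporting.tfplan
```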

## Managing Ownership
### Importing resources
Frequently, resources to be managed already exist. By default, Terraform will try to re-create them
if they're added to a configuration and fail because the name or other unique identifier is already in use.
Using `terraform import` allows the existing resource to be included
in the `tfstate` file as if Terraform created it from scratch.
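
For example, adopting an existing BigQuery dataset might look like the following (the address and import ID are hypothetical; check the resource's documentation for its exact import format):
```shell script
terraform import google_bigquery_dataset.main projects/my-project/datasets/my_dataset
```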

### Removing items from Terraform
Occasionally, it's desirable to remove a resource from Terraform state. This can be helpful when reorganizing
resources or `tf` files. The `terraform state rm` command accomplishes this, and moves those resources
into a state where Terraform doesn't know it either created or owned them. The
[official docs](https://www.terraform.io/docs/commands/state/rm.html) are pretty good for this.
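
For example (the resource address is hypothetical):
```shell script
# Forget the table in state without destroying it in the cloud.
terraform state rm 'module.bigquery_dataset.google_bigquery_table.main["cohort"]'
```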

## Good Practices
### Formatting
A built-in formatter is available with the `terraform fmt` command. It spaces assignments in clever ways
that would be difficult to maintain by hand, but that are easy to read. It's easy to set up in IntelliJ
by installing the File Watchers plugin and adding a Terraform Format action. It runs fast, too.

### Labels
It's handy to have a human-readable label such as `managed_by_terraform` and set it to `true` for all TF
artifacts (GCP label keys must be lowercase, so `managedByTerraform` won't fly). It's possible to set up default labels for this.
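
A minimal sketch (the dataset ID is invented):
```hcl-terraform
resource "google_bigquery_dataset" "main" {
  dataset_id = "reporting_scratch" # illustrative
  labels = {
    managed_by_terraform = "true"
  }
}
```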
### Local Variables
Using a `locals` block allows you to assign values (computed once) to variables to be used elsewhere. This
is especially useful for nested map lookups:
```hcl-terraform
locals {
  project = var.aou_env_info[var.aou_env]["project"]
  dataset = var.aou_env_info[var.aou_env]["dataset"]
}
```

Later, simply reference the value by `dataset_id = local.dataset`. Note that these "local" variables
are available to other `.tf` files; since everything is initialized at once and immutable,
it doesn't really matter whether you define them in `chicken.tf` or `egg.tf`. It just works as long
as both files are part of the same logical configuration.

It's useful in some cases to specify `default` values for the resources in use, but it's advisable to
force the user to specify certain fundamental things (such as the AoU environment) every time in order
to avoid migrating the wrong environment prematurely (such as removing artifacts that code running on
that environment expects to be there).
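
A hedged sketch of that pattern in `variables.tf` (descriptions and the default are invented):
```hcl-terraform
# No default: the caller must name the environment deliberately.
variable "aou_env" {
  type        = string
  description = "Short name of the AoU environment, e.g. test or prod."
}

# Safe to default: same value everywhere.
variable "location" {
  type        = string
  description = "BigQuery dataset location."
  default     = "US"
}
```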

### Starting with a scratch state collection
It's much faster to work with Terraform-created artifacts, properties, etc., than to attach to existing infrastructure.
For this purpose, it can be handy to add new BigQuery datasets just for the development of the configuration,
capture resource and module identifiers for import, and then tear down the temporary artifacts with `terraform destroy`.

### Use Modules
[Modules](https://www.terraform.io/docs/configuration/modules.html) are the basis of reuse,
encapsulation, and separation of concerns in Terraform. Frequently, the provider (such as Google
Cloud Platform) has already written handy base modules that provide reasonable
defaults, logical arrangement of resources, and convenient output variable declarations.
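
For instance, Google maintains a BigQuery module on the public registry; a hedged sketch of wiring it up (the version pin and inputs are illustrative):
```hcl-terraform
module "bigquery_dataset" {
  source     = "terraform-google-modules/bigquery/google"
  version    = "~> 4.0" # illustrative pin
  dataset_id = var.reporting_dataset_id
  project_id = var.project_id
}
```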

### Separate Private Vars from Community-use Settings
Names of artifacts, deployments (such as test and staging), service accounts, or other pseudo-secrets
should be kept separate from the primary module definitions outlining behavior. For example, looking
at the reporting project, we have:
* public: table schemas, names, and clustering/partitioning settings
* public: view queries (with dataset and project names abstracted out)
* private: names of AoU environments (currently exposed in several places publicly, but of no legitimate
use to the general public)
* private: BigQuery dataset names. We have a simple convention of naming it after the environment,
but this isn't a contract enforced by our application code or the Terraform configurations.

Why do we include the environment name in the dataset name (as opposed to just calling it `reporting`) in every
environment? Firstly, we have two environments that share a GCP project, so we would have a name clash.
More fundamentally, it would be too easy to apply a query to a dataset in the wrong environment
if it simply referred to `reporting.workspace` instead of `reporting_prod.workspace`, as the BigQuery
console lets you mix datasets from multiple environments as long as you have the required credentials. In most
cases, I'd argue against such inconsistent resource naming.
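
A private per-environment `.tfvars` file might then look like this (all values invented):
```hcl-terraform
aou_env              = "scratch"
project_id           = "my-scratch-project"
reporting_dataset_id = "reporting_scratch"
```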

### Don't fear the `tfstate` file
Despite the scary name, the contents of `tfstate` are in JSON, and largely readable. You can operate
on it with utilities such as `jq`:

```shell script
$ jq '.resources[0].instances[0].attributes.friendly_name' terraform.tfstate
"Workbench Scratch Environment Reporting Data"
```

I'd keep any operations read-only whenever possible, but I have a feeling one of the keys to mastering
Terraform will be understanding the `tfstate` file.
## Gotchas
### A Terra by any other name
[Terra](https://terra.bio/) and [Terraform](https://www.terraform.io/) are different things, and for
the most part going to one organization for help with the other's platform will result in bemusement
at best. Good luck differentiating them on your resume.

### Mis-configuring a tfstate file
The file really shouldn't be checked into source control, because
it's not safe to have multiple developers working with it. It's too easy to get into an inconsistent view of the world.

However, that doesn't mean it's safe to lose track of the tfstate JSON file altogether.
When working with multiple people, a shared online backend with locking is really
required.

### Using two terminals in the same terraform root module working directory.
You'll see frequent error messages about the lock file and how you can use `-lock=false` but should really never
do so. It's basically that two processes think they own something in `.terraform/`. So don't do that.

### Using `terraform state show` with `for-each` or an array-declared value.
When creating many items of the same type at the same level/scope, it's useful to use arrays or
`for-each`. However, the syntax for `tf state show` is trickier because you need to pass a double-quoted
string index from the command line.

Given the following output of `terraform state list`:
```
$ tf state list
module.bigquery_dataset.google_bigquery_dataset.main
module.bigquery_dataset.google_bigquery_table.main["cohort"]
module.bigquery_dataset.google_bigquery_table.main["user"]
module.bigquery_dataset.google_bigquery_table.main["workspace"]
module.bigquery_dataset.google_bigquery_table.view["latest_users"]
```
The naive approach gives you this [cryptic error message](https://github.com/hashicorp/terraform/pull/22395).
```
$ tf state show module.bigquery_dataset.google_bigquery_table.main["cohort"]
Error parsing instance address: module.bigquery_dataset.google_bigquery_table.main[cohort]
This command requires that the address references one specific instance.
To view the available instances, use "terraform state list". Please modify
the address to reference a specific instance.
```
The approach that seems to work in Bash is
```
terraform state show 'module.bigquery_dataset.google_bigquery_table.main["cohort"]'
```

### Cloud not quite ready to use newly created resource
When creating a new BigQuery dataset with tables and views
all at once, I once ran into an issue where the new table
wasn't ready for a view creation yet. The error message was
```
Error: googleapi: Error 404: Not found: Table my-project:my_dataset.user, notFound
on .terraform/modules/aou_rw_reporting/main.tf line 76, in resource "google_bigquery_table" "view":
76: resource "google_bigquery_table" "view" {
```

Re-running `terraform apply` fixed this.
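If the race recurs, one possible mitigation (my assumption, not necessarily what this module does) is to make the ordering explicit with `depends_on`:
```hcl-terraform
resource "google_bigquery_table" "view" {
  dataset_id = "reporting_scratch" # illustrative
  table_id   = "latest_users"      # illustrative
  depends_on = [google_bigquery_table.main]

  view {
    query          = file("views/latest_users.sql") # illustrative path
    use_legacy_sql = false
  }
}
```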
### Renaming files and directories
It's really easy to refactor yourself into a corner by renaming modules or directories in their paths.
If you see this error, it probably means you've moved something in the local filesystem that the
cached state was depending on.
```
Error: Module not found
The module address
"/repos/workbench/ops/terraform/modules/aou-rw-reporting/"
could not be resolved.
If you intended this as a path relative to the current module, use
"/repos/workbench/ops/terraform/modules/aou-rw-reporting/"
instead. The "./" prefix indicates that the address is a relative filesystem
path.
```
So the last chance to rename things freely is just before you've created them and people are depending on them in prod.
It's not really easy to rework your `.tf` files after deployment. (Another good reason for a scratch project.)

### Running in wrong terminal window
If things get created on the wrong cloud, that's not good. I was really confused when I tried running
the AWS tutorial tf file. `tf destroy` is cathartic in such situations. I'm not even sure it's OK to use two
terminals in the same root module at once.

### Using new BigQuery resources
The BigQuery console UI frequently doesn't list all of the new datasets for several minutes, so using
`bq show` is helpful if you want to see things "with your own eyes" after a `tf` operation.
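
For example (project and dataset names invented):
```shell script
bq show --format=prettyjson my-project:my_dataset
```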

### Yes Man
If you "yes" out of habit but `terraform apply` or `destroy` bailed out earlier than the prompt,
you see a string of `y`s in your terminia. I nearly filed a bug for this, but then realized the `yes`
command with no argument does that for all time (at least, so far...).
Binary file added modules/.DS_Store
Binary file not shown.
72 changes: 72 additions & 0 deletions modules/workbench/WORKBENCH-MODULE-PLAN.md
@@ -0,0 +1,72 @@
# Workbench Module Plan
The module directories here represent individually deployable subsystems,
microservices, or other functional units. It's easy enough to put all buckets, say,
in a `gcs` module, but that wouldn't really let us operate on an individual component's bucket.

Following is a broad outline of each child module. If you feel irritated that you can't see, for example,
all dashboards in one place, you can still go to the Console or use `gcloud`.

# Workbench Module Development Plan
The Workbench is the topmost parent module in the AoU Workbench
Application configuration. It depends on several modules for individual
subsystems.

After creating a valid Terraform configuration we're not finished,
as we need to make sure we don't step on other tools or automation.
For example, items that pertain to cloud resources will need to move
out of the workbench JSON config system.

I already have automation for Stackdriver settings that fetches all of their configurations,
and I plan to migrate it to Terraform.

## Reporting
The state for reporting is currently the BigQuery dataset and its tables and views.
Highlights:
* Reporting-specific metrics with the `google_logging_metric` [resource](https://www.terraform.io/docs/providers/google/r/logging_metric.html)
and others (a hedged sketch follows this list)
* Notifications on the system
* Reporting-specific logs
* Data blocks for views (maybe)
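
A hedged sketch of such a metric (the name and filter are invented):
```hcl-terraform
resource "google_logging_metric" "reporting_errors" {
  name   = "reporting/errors"
  filter = "resource.type=\"gae_app\" AND severity>=ERROR"

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}
```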

## Backend Database (notional)
This resource is inherently cross-functional, so we can just put in:
* The application DB
* Backup settings
This will take advantage of the `google_sql_database_instance` resource.
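
Since this module is notional, here's only a rough sketch (all names and sizes invented):
```hcl-terraform
resource "google_sql_database_instance" "workbench" {
  name             = "workbench-db"
  database_version = "MYSQL_5_7"
  region           = "us-central1"

  settings {
    tier = "db-n1-standard-1"
    backup_configuration {
      enabled = true
    }
  }
}
```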

Schema migrations work via `Ruby->Gradle->Liquibase->MySQL`.
Maybe it needs a `Terraform` caboose. It looks like there's not currently a Liquibase provider.

It may not make sense organizationally to do this in Terraform, as there are dependencies on other
team(s) when instantiating or migrating databases.

## Workbench to RDR Pipeline
Instantiate [google_cloud_tasks_queue](https://www.terraform.io/docs/providers/google/r/cloud_tasks_queue.html)
resources as necessary.
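
A minimal sketch (queue name and location invented):
```hcl-terraform
resource "google_cloud_tasks_queue" "rdr_export" {
  name     = "rdr-export"
  location = "us-central1"
}
```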

## API Server
* AppEngine versions, instances, logs, etc. It isn't just named
App Engine, since that's the resource that gets created.

At the moment, there are no plans to rip and replace our existing deployment process or automation,
but we may find areas that the Terraform approach could be helpful (such as managing dependent
deployment artifacts or steps in a declarative way.)

## Action Audit
This module maps to
* Stackdriver logs for each environment. (It will need to
move from the application JSON config likely.)
* Logs-based metrics on the initial log stream
* Sink to BigQuery dataset for each environment (Stackdriver may need to create it initially, in which
case we need to do `terraform import`.)
* Reporting datasets in BigQuery

## Tiers and Egress Detection
There is a [sumo logic provider](https://www.sumologic.com/blog/terraform-provider-hosted/) for Terraform, which is very good
news. It looks really svelte.

We will also want to control the VPC flow logs,
perimeters, etc, but it won't be in this `workbench` module,
because Terra-not-form owns the organization and needs to do
creation manually for now.
11 changes: 11 additions & 0 deletions modules/workbench/main.tf
@@ -0,0 +1,11 @@
# Workbench Analytics Reporting Subsystem
module "reporting" {
source = "./modules/reporting"

# reporting
aou_env = var.aou_env
reporting_dataset_id = var.reporting_dataset_id

# provider
project_id = var.project_id
}
Binary file added modules/workbench/modules/reporting/.DS_Store
Binary file not shown.