Merge pull request #1 from all-of-us/jaycarlton/reporting
[RW-5729][risk=no] Reporting Terraform module (public side)
jaycarlton authored Dec 1, 2020
2 parents 898b522 + 3c9cc54 commit e4eb4c5
Showing 17 changed files with 1,052 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
**/.idea/*
*.tfstate
*.backup
*.iml
.DS_Store
45 changes: 45 additions & 0 deletions README.md
@@ -0,0 +1,45 @@
# Workbench Terraform Modules
The module directories here represent individually deployable subsystems,
microservices, or other functional units. It's easy enough to put all buckets, say,
in a `gcs` module, but that wouldn't really let us operate on an individual component's bucket.

Following is a broad outline of each child module. If you feel irritated that you can't see, for example,
all dashboards in one place, you can still go to the Console or use `gcloud`.
## Goals
### Automate ourselves out of a job
All the existing and planned Terraform modules have some level of scripted or otherwise automated
support processes.
## Non-goals
### Become the only game in town
We don't want to get into a position where we force anyone to use Terraform if it's not the best
choice for them. Terraform is still pretty new, and changing rapidly. The Google provider is also
under rapid development.
### Wag the Dog
We do not have any aspirations to absorb any of the tasks that external teams are responsible for,
including building the GCP projects for each of our environments or conducting all administrative
tasks in either pmi-ops or terra projects. If Terraform really "takes off", then it may make sense to
share learnings, and at that point, there may be opportunities for our Terraform stack to use theirs,
or vice versa. While these boundaries may be fuzzy today, hopefully the addition of clear module
inputs and documentation will drive clarification of responsibilities and visibility into state,
dependencies, etc.
### Bypass security approvals
In some cases, actions that require security approval can be performed in Terraform, particularly
around IAM bindings, access groups, and roles. We don't want a situation where an audit finds that
individuals or service accounts were added or modified without going through the proper channels.

One potential workaround here is to invite sysadmin or security personnel to the private repository
to approve changes to the Terraform module inputs.

## Currently Supported Modules

### Reporting
The state for reporting is currently the BigQuery dataset and its tables and views. In the future,
it makes sense to add these sorts of things:
* Reporting-specific metrics
* Notifications on the system
* Reporting-specific logs
* Data blocks for views (maybe)

In other words, the primary focus of the module is the Reporting system, but it may be convenient to
add reporting-specific artifacts that might otherwise be concerned with Monitoring or other auxiliary
services.
218 changes: 218 additions & 0 deletions TERRAFORM_QUICKSTART.md
@@ -0,0 +1,218 @@
# Terraform Quickstart
The [official documentation](https://www.terraform.io/) for Terraform
is quite readable and exposes the functionality and assumptions at a good pace.
In particular, I found the [Get Started - Google Cloud](https://learn.hashicorp.com/collections/terraform/gcp-get-started) guide to be very helpful.

It's worth making an alias for terraform and putting it in your `.bash_profile` or other shell init file, as
it's difficult to spell `terraform` correctly when caffeinated.
```shell script
alias tf='terraform'
```
The above tip also serves as a warning and non-apology that I'm going to forget to spell out the
command name repeatedly below.

## Installation
For the work so far, I've used the [Terraform CLI](https://www.terraform.io/docs/cli-index.html), which has the advantage of not costing
money or requiring an email registration. On the Mac, `brew install terraform` is pretty much all it takes.

Terraform works by keeping state on the local filesystem for evaluating diffs and staging changes. Primary files for users to author
and check in to source control are:
* main.tf - listing providers and specifying Terraform version and other global options
* <subsystem_name>.tf - list of resources and their properties and dependencies. This file can reference any other .tf files in the local directory.
* variables.tf - any string, numeric, or map variables to be provided to the script.
* external text files - useful files with text input, such as BigQuery table schema JSON files

Output files produced by Terraform (and not checked in to source control) include
* tfstate files - a record of the current known state of resources under Terraform's control.
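
For orientation, here's a minimal sketch of what a `main.tf` might contain (the version constraint, region, and provider config are illustrative, not this repo's actual settings):
```hcl-terraform
# Pin the Terraform version and configure the Google provider.
terraform {
  required_version = ">= 0.13"
}

provider "google" {
  project = var.project_id # supplied via variables.tf / .tfvars
  region  = "us-central1"
}
```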

## Organization
Terraform configuration settings are reusable for all environments (after binding environment-specific
variables in `.tfvars` files). The reuse comes from Terraform's variable mechanism.
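For example, a hypothetical invocation binding one environment's variables:
```shell script
terraform apply -var-file=test.tfvars
```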
## Running
If you have a small change to make to a resource under Terraform's management, in the simplest case the workflow is (a condensed shell sketch follows the list):
* Run `terraform init` to initialize the providers
* Run `terraform state list` to list all artifacts currently known and managed by Terraform within
the scope of the `.tf` files in the current directory.
* Run `terraform show` to view the current state of the (managed) world, and check any errors.
* Change the setting in the `.tf` file (such as `reporting.tf`).
* Run `terraform plan` to see the execution plan. This can be saved with the `-out` argument in
situations where it's important to apply exactly the planned changes. Otherwise, new changes to the
environment might be picked up in the `apply` step, giving possibly significantly different behaviors
than were expected based on the `plan` output.
* Run `terraform apply` to execute the plan and apply the changes. You'll need to type "yes" to
proceed with the changes (or use `-auto-approve` in a non-interactive workflow.)
* Check in changes to the terraform file.
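
The same happy path, condensed (the plan file name is invented):
```shell script
terraform init
terraform state list
terraform show
# ...edit reporting.tf...
terraform plan -out=reporting.tfplan
terraform apply reporting.tfplan
```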

## Managing Ownership
### Importing resources
Frequently, resources to be managed already exist. By default, Terraform will try to re-create them
if they're added to a configuration and fail because the name or other unique identifier is already in use.
Using `terraform import` allows the existing resource to be included
in the `tfstate` file as if Terraform created it from scratch.
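
For example, adopting an existing BigQuery dataset might look like the following (the address and import ID are hypothetical; check the resource's documentation for its exact import format):
```shell script
terraform import google_bigquery_dataset.main projects/my-project/datasets/my_dataset
```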

### Removing items from Terraform
Occasionally, it's desirable to remove a resource from Terraform state. This can be helpful when reorganizing
resources or `tf` files. The `terraform state rm` command accomplishes this, and moves those resources
into a state where Terraform doesn't know it either created or owned them. The
[official docs](https://www.terraform.io/docs/commands/state/rm.html) are pretty good for this.
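
For example (the resource address is hypothetical):
```shell script
# Forget the table in state without destroying it in the cloud.
terraform state rm 'module.bigquery_dataset.google_bigquery_table.main["cohort"]'
```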

## Good Practices
### Formatting
A built-in formatter is available with the `terraform fmt` command. It spaces assignments in clever ways
that would be difficult to maintain by hand, but that are easy to read. It's easy to set up in IntelliJ
by installing the File Watchers plugin and adding a Terraform Format action. It runs fast, too.

### Labels
It's handy to have a human-readable label such as `managed_by_terraform` and set it to `true` for all TF
artifacts (GCP label keys must be lowercase, so `managedByTerraform` won't fly). It's possible to set up default labels for this.
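
A minimal sketch (the dataset ID is invented):
```hcl-terraform
resource "google_bigquery_dataset" "main" {
  dataset_id = "reporting_scratch" # illustrative
  labels = {
    managed_by_terraform = "true"
  }
}
```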
### Local Variables
Using a `locals` block allows you to assign values (computed once) to variables to be used elsewhere. This
is especially useful for nested map lookups:
```hcl-terraform
locals {
  project = var.aou_env_info[var.aou_env]["project"]
  dataset = var.aou_env_info[var.aou_env]["dataset"]
}
```

Later, simply reference the value by `dataset_id = local.dataset`. Note that these "local" variables
are available to other `.tf` files; since everything is initialized at once and immutable,
it doesn't really matter whether you define them in `chicken.tf` or `egg.tf`. It just works as long
as both files are part of the same logical configuration.

It's useful in some cases to specify `default` values for the resources in use, but it's advisable to
force the user to specify certain fundamental things (such as the AoU environment) every time in order
to avoid migrating the wrong environment prematurely (such as removing artifacts that code running on
that environment expects to be there).
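
A hedged sketch of that pattern in `variables.tf` (descriptions and the default are invented):
```hcl-terraform
# No default: the caller must name the environment deliberately.
variable "aou_env" {
  type        = string
  description = "Short name of the AoU environment, e.g. test or prod."
}

# Safe to default: same value everywhere.
variable "location" {
  type        = string
  description = "BigQuery dataset location."
  default     = "US"
}
```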

### Starting with a scratch state collection
It's much faster to work with Terraform-created artifacts, properties, etc., than to attach to existing infrastructure.
For this purpose, it can be handy to add new BigQuery datasets just for the development of the configuration,
capture resource and module identifiers for import, and then tear down the temporary artifacts with `terraform destroy`.

### Use Modules
[Modules](https://www.terraform.io/docs/configuration/modules.html) are the basis of reuse,
encapsulation, and separation of concerns in Terraform. Frequently, the provider (such as Google
Cloud Platform) has already written handy base modules that provide reasonable
defaults, logical arrangement of resources, and convenient output variable declarations.
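
For instance, Google maintains a BigQuery module on the public registry; a hedged sketch of wiring it up (the version pin and inputs are illustrative):
```hcl-terraform
module "bigquery_dataset" {
  source     = "terraform-google-modules/bigquery/google"
  version    = "~> 4.0" # illustrative pin
  dataset_id = var.reporting_dataset_id
  project_id = var.project_id
}
```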

### Separate Private Vars from Community-use Settings
Names of artifacts, deployments (such as test and staging), service accounts, or other pseudo-secrets
should be kept separate from the primary module definitions outlining behavior. For example, looking
at the reporting project, we have:
* public: table schemas, names, and clustering/partitioning settings
* public: view queries (with dataset and project names abstracted out)
* private: names of AoU environments (currently exposed in several places publicly, but of no legitimate
use to the general public)
* private: BigQuery dataset names. We have a simple convention of naming it after the environment,
but this isn't a contract enforced by our application code or the Terraform configurations.

Why do we include the environment name in the dataset name (as opposed to just calling it `reporting`) in every
environment? Firstly, we have two environments that share a GCP project, so we would have a name clash.
More fundamentally, it would be too easy to apply a query to a dataset in the wrong environment
if it simply referred to `reporting.workspace` instead of `reporting_prod.workspace`, as the BigQuery
console lets you mix datasets from multiple environments as long as you have the required credentials. In most
cases, I'd argue against such inconsistent resource naming.
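
A private per-environment `.tfvars` file might then look like this (all values invented):
```hcl-terraform
aou_env              = "scratch"
project_id           = "my-scratch-project"
reporting_dataset_id = "reporting_scratch"
```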

### Don't fear the `tfstate` file
Despite the scary name, the contents of `tfstate` are in JSON, and largely readable. You can operate
on it with utilities such as `jq`:

```shell script
$ jq '.resources[0].instances[0].attributes.friendly_name' terraform.tfstate
"Workbench Scratch Environment Reporting Data"
```

I'd keep any operations read-only whenever possible, but I have a feeling one of the keys to mastering
Terraform will be understanding the `tfstate` file.
## Gotchas
### A Terra by any other name
[Terra](https://terra.bio/) and [Terraform](https://www.terraform.io/) are different things, and for
the most part going to one organization for help with the other's platform will result in bemusement
at best. Good luck differentiating them on your resume.

### Mis-configuring a tfstate file
The file really shouldn't be checked into source control, because
it's not safe to have multiple developers working with it. It's too easy to get into an inconsistent view of the world.

However, that doesn't mean it's safe to lose track of the tfstate JSON file altogether.
When working with multiple people, a shared online backend with locking is really
required.

### Using two terminals in the same terraform root module working directory.
You'll see frequent error messages about the lock file and how you can use `-lock=false` but should really never
do so. It's basically that two processes think they own something in `.terraform/`. So don't do that.

### Using `terraform state show` with `for-each` or an array-declared value.
When creating many items of the same type at the same level/scope, it's useful to use arrays or
`for-each`. However, the syntax for `tf state show` is trickier because you need to pass a double-quoted
string index from the command line.

Given the following output of `terraform state list`:
```
$ tf state list
module.bigquery_dataset.google_bigquery_dataset.main
module.bigquery_dataset.google_bigquery_table.main["cohort"]
module.bigquery_dataset.google_bigquery_table.main["user"]
module.bigquery_dataset.google_bigquery_table.main["workspace"]
module.bigquery_dataset.google_bigquery_table.view["latest_users"]
```
The naive approach gives you this [cryptic error message](https://github.com/hashicorp/terraform/pull/22395).
```
$ tf state show module.bigquery_dataset.google_bigquery_table.main["cohort"]
Error parsing instance address: module.bigquery_dataset.google_bigquery_table.main[cohort]
This command requires that the address references one specific instance.
To view the available instances, use "terraform state list". Please modify
the address to reference a specific instance.
```
The approach that seems to work in Bash is
```
terraform state show 'module.bigquery_dataset.google_bigquery_table.main["cohort"]'
```

### Cloud not quite ready to use newly created resource
When creating a new BigQuery dataset with tables and views
all at once, I once ran into an issue where the new table
wasn't ready for a view creation yet. The error message was
```
Error: googleapi: Error 404: Not found: Table my-project:my_dataset.user, notFound
on .terraform/modules/aou_rw_reporting/main.tf line 76, in resource "google_bigquery_table" "view":
76: resource "google_bigquery_table" "view" {
```

Re-running `terraform apply` fixed this.
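If the race recurs, one possible mitigation (my assumption, not necessarily what this module does) is to make the ordering explicit with `depends_on`:
```hcl-terraform
resource "google_bigquery_table" "view" {
  dataset_id = "reporting_scratch" # illustrative
  table_id   = "latest_users"      # illustrative
  depends_on = [google_bigquery_table.main]

  view {
    query          = file("views/latest_users.sql") # illustrative path
    use_legacy_sql = false
  }
}
```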
### Renaming files and directories
It's really easy to refactor yourself into a corner by renaming modules or directories in their paths.
If you see this error, it probably means you've moved something in the local filesystem that the
cached state was depending on.
```
Error: Module not found
The module address
"/repos/workbench/ops/terraform/modules/aou-rw-reporting/"
could not be resolved.
If you intended this as a path relative to the current module, use
"/repos/workbench/ops/terraform/modules/aou-rw-reporting/"
instead. The "./" prefix indicates that the address is a relative filesystem
path.
```
So the last chance to rename things freely is just before you've created them and people are depending on them in prod.
It's not really easy to rework your `.tf` files after deployment. (Another good reason for a scratch project.)

### Running in wrong terminal window
If things get created on the wrong cloud, that's not good. I was really confused when I tried running
the AWS tutorial tf file. `tf destroy` is cathartic in such situations. I'm not even sure it's OK to use two
terminals in the same root module at once.

### Using new BigQuery resources
The BigQuery console UI frequently doesn't list all of the new datasets for several minutes, so using
`bq show` is helpful if you want to see things "with your own eyes" after a `tf` operation.
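
For example (project and dataset names invented):
```shell script
bq show --format=prettyjson my-project:my_dataset
```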

### Yes Man
If you "yes" out of habit but `terraform apply` or `destroy` bailed out earlier than the prompt,
you see a string of `y`s in your terminia. I nearly filed a bug for this, but then realized the `yes`
command with no argument does that for all time (at least, so far...).
Binary file added modules/.DS_Store
Binary file not shown.
72 changes: 72 additions & 0 deletions modules/workbench/WORKBENCH-MODULE-PLAN.md
@@ -0,0 +1,72 @@
# Workbench Module Plan
The module directories here represent individually deployable subsystems,
microservices, or other functional units. It's easy enough to put all buckets, say,
in a `gcs` module, but that wouldn't really let us operate on an individual component's bucket.

Following is a broad outline of each child module. If you feel irritated that you can't see, for example,
all dashboards in one place, you can still go to the Console or use `gcloud`.

# Workbench Module Development Plan
The Workbench is the topmost parent module in the AoU Workbench
Application configuration. It depends on several modules for individual
subsystems.

After creating a valid Terraform configuration we're not finished,
as we need to make sure we don't step on other tools or automation.
For example, items that pertain to cloud resources will need to move
out of the workbench JSON config system.

I already have automation for Stackdriver settings that fetches all of their configurations,
and I plan to migrate it to Terraform.

## Reporting
The state for reporting is currently the BigQuery dataset and its tables and views.
Highlights:
* Reporting-specific metrics with the `google_logging_metric` [resource](https://www.terraform.io/docs/providers/google/r/logging_metric.html)
and others (a hedged sketch follows this list)
* Notifications on the system
* Reporting-specific logs
* Data blocks for views (maybe)
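
A hedged sketch of such a metric (the name and filter are invented):
```hcl-terraform
resource "google_logging_metric" "reporting_errors" {
  name   = "reporting/errors"
  filter = "resource.type=\"gae_app\" AND severity>=ERROR"

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}
```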

## Backend Database (notional)
This resource is inherently cross-functional, so we can just put in:
* The application DB
* Backup settings
This will take advantage of the `google_sql_database_instance` resource.
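
Since this module is notional, here's only a rough sketch (all names and sizes invented):
```hcl-terraform
resource "google_sql_database_instance" "workbench" {
  name             = "workbench-db"
  database_version = "MYSQL_5_7"
  region           = "us-central1"

  settings {
    tier = "db-n1-standard-1"
    backup_configuration {
      enabled = true
    }
  }
}
```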

Schema migrations work via `Ruby->Gradle->Liquibase->MySQL`.
Maybe it needs a `Terraform` caboose. It looks like there's not currently a Liquibase provider.

It may not make sense organizationally to do this in Terraform, as there are dependencies on other
team(s) when instantiating or migrating databases.

## Workbench to RDR Pipeline
Instantiate [google_cloud_tasks_queue](https://www.terraform.io/docs/providers/google/r/cloud_tasks_queue.html)
resources as necessary.
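
A minimal sketch (queue name and location invented):
```hcl-terraform
resource "google_cloud_tasks_queue" "rdr_export" {
  name     = "rdr-export"
  location = "us-central1"
}
```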

## API Server
* AppEngine versions, instances, logs, etc. It isn't just named
App Engine, since that's the resource that gets created.

At the moment, there are no plans to rip and replace our existing deployment process or automation,
but we may find areas that the Terraform approach could be helpful (such as managing dependent
deployment artifacts or steps in a declarative way.)

## Action Audit
This module maps to
* Stackdriver logs for each environment. (It will need to
move from the application JSON config likely.)
* Logs-based metrics on the initial log stream
* Sink to BigQuery dataset for each environment (Stackdriver may need to create it initially, in which
case we need to do `terraform import`.)
* Reporting datasets in BigQuery

## Tiers and Egress Detection
There is a [sumo logic provider](https://www.sumologic.com/blog/terraform-provider-hosted/) for Terraform, which is very good
news. It looks really svelte.

We will also want to control the VPC flow logs,
perimeters, etc, but it won't be in this `workbench` module,
because Terra-not-form owns the organization and needs to do
creation manually for now.
11 changes: 11 additions & 0 deletions modules/workbench/main.tf
@@ -0,0 +1,11 @@
# Workbench Analytics Reporting Subsystem
module "reporting" {
source = "./modules/reporting"

# reporting
aou_env = var.aou_env
reporting_dataset_id = var.reporting_dataset_id

# provider
project_id = var.project_id
}
Binary file added modules/workbench/modules/reporting/.DS_Store
Binary file not shown.