Move Production DNS to Notify AWS Accounts #36

Open
3 of 6 tasks
ben851 opened this issue Mar 27, 2023 · 49 comments
Assignees
Labels
Reliability Task related to reliability. Support l Soutien Maintenance and bugs while on call. Tech Debt An issue targeting an identified technical debt

Comments

@ben851
Contributor

ben851 commented Mar 27, 2023

Description

As a developer/operator of Notify, I want to be able to control the DNS flow for GC Notify in my own infrastructure as code, and not have a dependency on an external team.

WHY

Currently the Production DNS is owned by CDS SRE, and we are required to submit PRs to their code base in order to modify these references. In Staging, this is done via click-ops and Slack requests. It is important to codify both production and staging DNS entries to ensure consistency. It is equally important to remove dependencies on external teams so that development and operation cycles stay fast.

WHAT

We can open a PR with CDS SRE to have the *.notification. delegated to our own AWS accounts, at which point we can codify our DNS entries in our own terraform repositories.
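
A minimal sketch of what the delegation amounts to, assuming the usual mechanism: the parent zone holds an NS record set for the delegated name that lists the child zone's name servers. The Python below only builds that record set in the shape the Route 53 API uses for a `ResourceRecordSet`. The function name, zone name and name server hosts are placeholders for illustration, not values from our accounts.

```python
def _fqdn(name):
    """Normalise a DNS name to lower case with a single trailing dot."""
    return name.strip().lower().rstrip(".") + "."


def delegation_record_set(zone_name, name_servers, ttl=300):
    """Build the NS record set a parent zone needs to delegate zone_name."""
    if not name_servers:
        raise ValueError("a delegation needs at least one name server")
    # Drop duplicates and sort so the output is stable between runs.
    servers = sorted({_fqdn(ns) for ns in name_servers})
    return {
        "Name": _fqdn(zone_name),
        "Type": "NS",
        "TTL": ttl,
        "ResourceRecords": [{"Value": ns} for ns in servers],
    }
```

For example, `delegation_record_set("staging.example.org", ["ns-1.example.net", "ns-2.example.net"])` returns one NS record set with two values. In practice the name servers would come from the hosted zone created in the target account.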

VALUE

By having full control over our DNS entries, we will streamline the release and change management process. We will also be able to quickly spin up new environments and automatically create DNS entries for them under the sandbox DNS zone.

Acceptance Criteria

  • Both DNS zones are delegated to their respective AWS accounts
  • All DNS entries for notify are done using Terraform
  • Terraform pipelines in GitHub Actions manage the DNS as well

QA Steps

  • Verify that the website works
  • DNS entries resolve correctly
  • New environment is spun up with automatic DNS endpoints created
@ben851 ben851 added Low Hanging Fruit | Pff! Facile! Low hanging fruit task. Reliability Task related to reliability. Support l Soutien Maintenance and bugs while on call. Tech Debt An issue targeting an identified technical debt labels Mar 30, 2023
@jimleroyer
Member

Hey team! Please add your planning poker estimate with Zenhub @sastels @ben851

@ben851
Contributor Author

ben851 commented Apr 3, 2023

This is a pretty quick change, but we will need to be careful w/ Production.

@jimleroyer jimleroyer removed the Low Hanging Fruit | Pff! Facile! Low hanging fruit task. label Apr 4, 2023
@ben851
Contributor Author

ben851 commented May 5, 2023

Attaching this to the BCP epic as I need to get this done in order to automate the validation of ACM certificates.

@ben851
Contributor Author

ben851 commented May 8, 2023

@ben851
Contributor Author

ben851 commented May 9, 2023

  • came up with a better way to do this, and will put forward a new PR today.

@ben851
Contributor Author

ben851 commented May 10, 2023

  • New PR up for review

@ben851
Contributor Author

ben851 commented May 10, 2023

  • Staging released
  • Will migrate DNS for staging tomorrow morning

@ben851 ben851 self-assigned this May 10, 2023
@ben851
Contributor Author

ben851 commented May 11, 2023

  • Staging DNS Migrated
  • Moving to blocked while we wait for SRE to give access to Prod.

@ben851
Contributor Author

ben851 commented May 17, 2023

  • Had to implement some magic for the DKIM validation records

@ben851
Contributor Author

ben851 commented May 24, 2023

  • I deployed a fix to staging to correct the issue with DKIM validation
  • AWS says it may take up to 72 hours to take effect. Will check at the end of the week
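
For context, a rough sketch of what DKIM validation records typically look like. It assumes Amazon SES Easy DKIM, which the thread does not confirm: under that scheme each DKIM token becomes one CNAME record pointing at `dkim.amazonses.com`. The function name, domain and tokens are hypothetical, and this is not the fix that was deployed.

```python
def dkim_cname_records(domain, tokens, ttl=1800):
    """Return one CNAME record per DKIM token (SES Easy DKIM layout, assumed)."""
    domain = domain.strip().rstrip(".")
    return [
        {
            # Record name: <token>._domainkey.<domain>
            "Name": f"{token}._domainkey.{domain}",
            "Type": "CNAME",
            "TTL": ttl,
            # Record target: <token>.dkim.amazonses.com
            "Value": f"{token}.dkim.amazonses.com",
        }
        for token in tokens
    ]
```

Generating the records from the token list keeps them in step with whatever identity issued the tokens, rather than hard-coding three CNAMEs per environment.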

@ben851
Contributor Author

ben851 commented Jun 5, 2023

Will follow up with SRE today

@ben851
Contributor Author

ben851 commented Jun 6, 2023

  • I contacted Calvin yesterday with no response. Guillaume Charest is the owner of the ticket, but is on vacation.

@ben851
Contributor Author

ben851 commented Jun 7, 2023

  • Ben to follow up with Calvin and Guillaume when they return from vacation

@ben851
Contributor Author

ben851 commented Jun 8, 2023

  • Calvin says he needs to talk w/ Max about this and will get back to me.

@ben851
Contributor Author

ben851 commented Jun 13, 2023

  • Calvin has said that the SRE team wants us to move towards requesting a name server redirect with SSC so we can manage it in our own account. I have found their discussion thread in Slack and proposed an alternate option to see if that is acceptable.

@ben851
Contributor Author

ben851 commented Jun 13, 2023

  • Alternate option will not work due to DNS technical limitations
  • I've sent an email to the SSC account managers for CDS asking for the procedure to get a request in with the SSC DNS team.

@ben851
Contributor Author

ben851 commented Jun 15, 2023

  • SSC Responded saying they will set up a meeting. Have not heard back, will poke today.

@ben851
Contributor Author

ben851 commented Jun 19, 2023

  • Emailed SSC on Friday. Have not heard back. Will ask again.

@ben851
Contributor Author

ben851 commented Jun 20, 2023

  • Emailed SSC Monday, no response.

@ben851
Contributor Author

ben851 commented Jun 22, 2023

  • I forwarded the email to Mario and Calvin, asking for them to escalate since I have not heard back for over a week.

@ben851
Contributor Author

ben851 commented Jun 27, 2023

  • Meeting w/ SSC scheduled for today at 1pm.

@ben851
Contributor Author

ben851 commented Jun 27, 2023

  • SSC Confirmed that I should open a ticket with TBS help desk which will in turn be forwarded to SSC.
  • Ticket has been opened.

@jimleroyer
Member

Ticket was closed; we need to submit a form to Principal Publisher instead. Ben is in the process of submitting the new form.

@ben851
Contributor Author

ben851 commented Jul 12, 2023

Ben will do the pre-requisite work before opening the ticket again.

@jimleroyer
Member

To do a Terraform release this morning with Ben to move DNS setup to production.

@jimleroyer
Member

We encountered an issue with the permissions. Ben to resolve as the release is currently blocked.

@jimleroyer
Member

We encountered issues while running the plan against production. The new DNS provider wasn't behaving as expected, and we had to do further refactoring to accommodate the differences between our environments.

@ben851
Contributor Author

ben851 commented Jul 20, 2023

DNS zone is now deploying to production. Needs to be tested and still need integration w/ SSC.

@ben851
Contributor Author

ben851 commented Jun 26, 2024

We had a heck of a time getting SSC and ESDC together to deploy this.

Instead, we are looking at giving Terraform access to the notification.canada.ca Route 53 zone already owned by CDS.

@ben851 ben851 changed the title Move Production and Staging/Sandbox DNS to Notify AWS Accounts Move Production DNS to Notify AWS Accounts Jun 26, 2024
@ben851
Contributor Author

ben851 commented Jun 26, 2024

Opened an issue with SRE to get access to the notification.canada.ca Route 53 zone
cds-snc/dns#395

@ben851
Contributor Author

ben851 commented Jun 26, 2024

After speaking with Pat some more, I'm going to kill two birds with one stone and move the terraform plan/apply workflows to OIDC authentication.
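
A hedged sketch of the AWS side of OIDC authentication for GitHub Actions: the role that the workflow assumes needs a trust policy that accepts tokens from GitHub's OIDC provider and restricts them to a given repository. The account ID, repository name and function name below are placeholders, and the actual role definitions may be scoped differently.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id, repo, ref_pattern="*"):
    """Trust policy letting GitHub Actions runs from `repo` assume a role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The IAM OIDC identity provider registered in the AWS account.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Audience used by the official AWS credentials action.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Limit which repository (and optionally ref) may assume the role.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:{ref_pattern}"},
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
    print(json.dumps(policy, indent=2))
```

On the workflow side, the job also needs the `id-token: write` permission so it can request the token. Narrowing `ref_pattern` (for example to `ref:refs/heads/main`) is one way to keep an apply role from being assumed by pull request branches.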

@ben851
Contributor Author

ben851 commented Jul 3, 2024

Migrated Terraform to OIDC yesterday. Will reach out to Pat to get the new permission scheme

@ben851
Contributor Author

ben851 commented Jul 4, 2024

Had to revert the OIDC change because it was causing problems with QuickSight. Investigating.

@ben851
Contributor Author

ben851 commented Jul 8, 2024

Will be debugging today

@ben851
Contributor Author

ben851 commented Jul 9, 2024

Refactored OIDC into the new multi-job workflows. Reproduced the bug with QuickSight, added an additional permission for pull-requests on the GitHub workflow, and it's been resolved. Need 2 PRs approved:

Staging fix:
cds-snc/notification-terraform#1421

Production:
cds-snc/notification-terraform#1419

@ben851
Contributor Author

ben851 commented Jul 10, 2024

Staging and Production running on OIDC again.

Sylvia is working on this ticket today to grant us access to the prod DNS account, at which point I will be doing diffs and imports to migrate our stuff over.

@jimleroyer
Member

Ben will work on a Terraform release as the OIDC prod changes that were done did not work. He will fix this. Afterward, we will be waiting on Sylvia to unblock us.

@ben851
Contributor Author

ben851 commented Jul 10, 2024

OIDC fixed in prod, had to open an issue with SRE to get increased permissions on the notification-terraform-plan role in prod.

@ben851
Contributor Author

ben851 commented Jul 11, 2024

PR for new role here:
cds-snc/dns#397

I commented that we also need at least read access for the notify-core team.

@ben851
Contributor Author

ben851 commented Jul 15, 2024

I got access on Friday - I will work on this today.

@ben851
Contributor Author

ben851 commented Jul 16, 2024

Did a comparison between "real" prod and the "fake" prod that DNS records are set to. There were a few discrepancies - merged in some missing entries for:

doc.notification.canada.ca
document.notification.canada.ca
api.document.notification.canada.ca (maybe this is why doc-download-api didn't work in dev)
documentation.notification.canada.ca
www.notification.canada.ca

Also had a mismatch on the weighted api.notification.canada.ca which in real life points to itself instead of the api-gateway lambda endpoint. That PR will be merged today.

Once both are merged, I will re-compare between real and fake, and if all is good, change the provider in TF to point to the "real" DNS and start doing imports.
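
The real-versus-fake comparison could be sketched roughly as below. This is an illustration only, not the tooling that was actually used: it assumes each zone has been exported to a list of dicts with hypothetical `name`, `type`, `values` and optional `set_identifier` keys, and it reports records that are missing, extra, or present in both zones with different values.

```python
def record_key(record):
    """Identify a record by name, type and (for weighted records) set identifier."""
    name = record["name"].lower().rstrip(".")
    return (name, record["type"].upper(), record.get("set_identifier", ""))


def diff_zones(real, fake):
    """Compare two lists of record dicts; return (missing, extra, mismatched) keys."""
    real_by_key = {record_key(r): r for r in real}
    fake_by_key = {record_key(r): r for r in fake}
    # Present in the real zone but absent from the fake one.
    missing = sorted(real_by_key.keys() - fake_by_key.keys())
    # Present in the fake zone but absent from the real one.
    extra = sorted(fake_by_key.keys() - real_by_key.keys())
    # Present in both, but pointing at different values.
    mismatched = sorted(
        key
        for key in real_by_key.keys() & fake_by_key.keys()
        if sorted(real_by_key[key]["values"]) != sorted(fake_by_key[key]["values"])
    )
    return missing, extra, mismatched
```

Including the set identifier in the key matters for weighted records, where several records share the same name and type. An empty result for all three lists would be the signal that it is safe to repoint the provider and start importing.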

@P0NDER0SA P0NDER0SA assigned P0NDER0SA and ben851 and unassigned ben851 and P0NDER0SA Jul 18, 2024
@ben851
Contributor Author

ben851 commented Jul 22, 2024

Finished checks and did an import on prod Friday. I then made a backup of these import states and restored the old states. Need to merge this PR to staging, create a prod release, and then restore the new states:
cds-snc/notification-terraform#1447
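
A rough sketch of how the import step could be scripted, assuming the AWS provider's documented import ID for `aws_route53_record` (zone ID, record name and record type joined by underscores, with the set identifier appended for weighted records). The resource addresses, zone ID and helper names are hypothetical, and the commands actually run may have differed.

```python
import shlex


def route53_import_id(zone_id, name, record_type, set_identifier=None):
    """Build the import ID for a Route 53 record (format assumed from provider docs)."""
    parts = [zone_id, name.rstrip("."), record_type.upper()]
    if set_identifier:
        # Weighted and other routed records carry an extra identifier.
        parts.append(set_identifier)
    return "_".join(parts)


def import_command(address, zone_id, name, record_type, set_identifier=None):
    """Return a `terraform import` command line for one existing record."""
    import_id = route53_import_id(zone_id, name, record_type, set_identifier)
    return f"terraform import {shlex.quote(address)} {shlex.quote(import_id)}"
```

Generating the commands from the same record list used for the comparison keeps the import in step with what was verified, and printing them first allows a review before anything touches state.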

@ben851
Contributor Author

ben851 commented Jul 22, 2024

Merged to staging, production release ready.

cds-snc/notification-terraform#1449

@P0NDER0SA

Pond will review this PR

@P0NDER0SA

Just approved this one

@P0NDER0SA P0NDER0SA self-assigned this Jul 24, 2024
@ben851
Contributor Author

ben851 commented Jul 24, 2024

Implemented in prod. Need to get SRE to remove the references to our DNS records from their repository.

@ben851
Contributor Author

ben851 commented Jul 24, 2024

Issue opened with SRE
cds-snc/dns#408

3 participants