Context
Recently, the AWS Terraform provider encountered a critical issue (related GitHub issue) that lasted for a couple of days. During this time, Burrito was unable to plan or apply any layers using this provider.
Problem
After the AWS Terraform provider issue was resolved (around 11 PM CEST), Burrito resumed applying layers that had drifted. One of these drifted layers was applied unintentionally, which caused the failure of several frontend services.
Request
We would like to understand if there is a way for Burrito to handle such scenarios more effectively. Specifically:
- Is there any mechanism in Burrito to monitor or detect provider-related outages and prevent the automatic application of drifted layers until the system is confirmed to be stable? (A rough sketch of what we have in mind is included below.)
- Does Burrito provide an integrated monitoring or alerting endpoint that can help identify these situations before they cause unintended consequences?
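For reference, this is roughly what we imagine for temporarily pausing automatic remediation on an affected layer while a provider outage is ongoing. This is only a sketch based on our reading of the TerraformLayer CRD; the apiVersion and the remediationStrategy field shown here are assumptions and may not match the actual schema:

```yaml
# Hypothetical sketch: pause auto-apply on a layer during a provider outage.
# The apiVersion and field names are assumptions and may differ from Burrito's actual CRD schema.
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: frontend-network
  namespace: burrito
spec:
  branch: main
  path: terraform/frontend
  repository:
    name: infrastructure
    namespace: burrito
  # Switch from automatic remediation to plan-only until the provider is stable,
  # then flip it back once the pending plans have been reviewed.
  remediationStrategy:
    autoApply: false
```

Ideally, Burrito itself would hold off on automatic applies after an extended period of failed plans, rather than relying on a manual toggle like this.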
Impact
The issue resulted in downtime for our frontend services, and we are looking for a solution to prevent similar incidents in the future.