Reduce housekeeping pressure on CloudFormation #17

Merged: 6 commits merged on Jan 4, 2024

Commits on Jan 4, 2024

  1. JPERF-1332: Reduce pressure on CloudFormation

    When there are a lot of expired stacks (e.g. due to a housekeeping outage),
    listing them all requires a lot of CloudFormation requests.
    Then we get throttled, so the AWS SDK retries a few times underneath,
    but we eventually time out and never reach the stack deletion part.
    So the list stays long, and the problem perpetuates itself.
    
    Instead, start deleting stacks as soon as each batch is listed.
    dagguh committed 6c0036b on Jan 4, 2024
  2. Scroll-delete EC2 instances too

    EC2 hasn't throttled us the way CloudFormation has,
    but do it anyway for consistency with the CloudFormation housekeeping.
    dagguh committed 8291fc8 on Jan 4, 2024
  3. JPERF-1208: Clean up security groups before stacks

    Network stacks contain VPCs, and a VPC cannot be deleted while it still
    contains security groups. Some security groups are provisioned outside
    of the stack, so deleting such a stack fails on that dependency.
    
    Delete stacks at the end, so that all external dependencies are already
    cleaned up.
    dagguh committed f2c6336 on Jan 4, 2024
  4. Add jenv

    dagguh committed 6892620 on Jan 4, 2024
  5. Fix always reporting failure, even on success

    Avoid false positives like these (note the missing stack traces too):
    ```
    16:36:00,592 ERROR {} Ec2Instance(instanceId = i-080edd5933a74418f) failed to release itself
    16:36:30,577 ERROR {} Ec2Instance(instanceId = i-061f9c1f94990848e) failed to release itself
    ```
    dagguh committed b258ad7 on Jan 4, 2024
  6. Release SSH keys sequentially

    When you hit the 5k key limit, starting 5k threads is fine for Java,
    but not for AWS: we get throttled immediately.
    It's actually faster to delete the keys sequentially, since AWS deletes each one quickly.
    dagguh committed b0952f3 on Jan 4, 2024
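The scroll-delete idea behind the first two commits (delete each listed batch right away instead of listing everything first) can be sketched in Java as below (Java 16+ for `record`). This is a minimal illustration, not the project's code: `Page`, `Lister`, and the in-memory pager are hypothetical stand-ins for a token-paginated AWS listing call, and `deletePage` is any callback that deletes one batch of stacks or instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ScrollDelete {

    // One page of listed resource names, plus the token for the next page (null when done)
    record Page(List<String> items, String nextToken) {}

    // Hypothetical paginated listing call, standing in for a NextToken-style AWS API
    interface Lister {
        Page list(String token);
    }

    // Lists page by page and deletes each page immediately. If listing fails midway
    // (for example, due to throttling), the pages already seen have been deleted,
    // so the backlog shrinks on every run instead of staying the same.
    static int scrollDelete(Lister lister, Consumer<List<String>> deletePage) {
        int deleted = 0;
        String token = null;
        do {
            Page page = lister.list(token);
            deletePage.accept(page.items());
            deleted += page.items().size();
            token = page.nextToken();
        } while (token != null);
        return deleted;
    }

    // Hypothetical in-memory pager: serves `all` in pages of `pageSize`,
    // using the start index of the next page as the token
    static Lister inMemory(List<String> all, int pageSize) {
        return token -> {
            int start = token == null ? 0 : Integer.parseInt(token);
            int end = Math.min(start + pageSize, all.size());
            String next = end < all.size() ? String.valueOf(end) : null;
            return new Page(all.subList(start, end), next);
        };
    }

    public static void main(String[] args) {
        List<String> expired = List.of("stack-1", "stack-2", "stack-3", "stack-4", "stack-5");
        List<String> deleted = new ArrayList<>();
        int count = scrollDelete(inMemory(expired, 2), deleted::addAll);
        System.out.println("Deleted " + count + ": " + deleted);
    }
}
```

Because deletion is interleaved with listing, a throttling failure halfway through still leaves the earlier batches deleted, so each housekeeping run makes progress.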
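The ordering fix for network stacks (JPERF-1208) can be illustrated with a small in-memory simulation. Everything here is hypothetical: stacks are identified by their VPC id for brevity, and a stack deletion simply fails while externally provisioned security groups remain in its VPC, mimicking the dependency failure described in that commit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CleanupOrder {

    // VPCs that still contain security groups created outside of any stack
    private final Set<String> vpcsWithExternalGroups;
    // Network stacks, keyed by VPC id for brevity, that were deleted or failed
    final List<String> deletedStacks = new ArrayList<>();
    final List<String> failedStacks = new ArrayList<>();

    CleanupOrder(Set<String> vpcsWithExternalGroups) {
        this.vpcsWithExternalGroups = new HashSet<>(vpcsWithExternalGroups);
    }

    // Deletes all externally provisioned security groups
    void deleteExternalSecurityGroups() {
        vpcsWithExternalGroups.clear();
    }

    // Simulated stack deletion: fails while the stack's VPC still has external security groups
    void deleteNetworkStack(String vpcId) {
        if (vpcsWithExternalGroups.contains(vpcId)) {
            failedStacks.add(vpcId);
        } else {
            deletedStacks.add(vpcId);
        }
    }

    // Old order: stacks first, so stacks with external dependencies fail
    static CleanupOrder stacksFirst(Set<String> externalGroupVpcs, List<String> stackVpcs) {
        CleanupOrder run = new CleanupOrder(externalGroupVpcs);
        stackVpcs.forEach(run::deleteNetworkStack);
        run.deleteExternalSecurityGroups();
        return run;
    }

    // New order: clear external dependencies first, delete stacks last
    static CleanupOrder stacksLast(Set<String> externalGroupVpcs, List<String> stackVpcs) {
        CleanupOrder run = new CleanupOrder(externalGroupVpcs);
        run.deleteExternalSecurityGroups();
        stackVpcs.forEach(run::deleteNetworkStack);
        return run;
    }

    public static void main(String[] args) {
        Set<String> external = Set.of("vpc-a");
        List<String> stacks = List.of("vpc-a", "vpc-b");
        System.out.println("stacks first, failed: " + stacksFirst(external, stacks).failedStacks);
        System.out.println("stacks last, failed: " + stacksLast(external, stacks).failedStacks);
    }
}
```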
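The commit that stops false "failed to release itself" errors doesn't show the faulty code, so the following is only a generic Java sketch of the intended behavior: log an error only when the release really fails, and keep the cause so the stack trace isn't lost. `reportRelease` and the `LOG` list are hypothetical; with a logger such as SLF4J, passing the `Throwable` as the last argument is what prints the stack trace.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

public class ReleaseReporting {

    // Collected log lines, standing in for a real logger (hypothetical)
    static final List<String> LOG = new CopyOnWriteArrayList<>();

    // Reports the outcome of an asynchronous release. The error branch runs only
    // when the release actually completed exceptionally, and it keeps the cause.
    static CompletableFuture<Void> reportRelease(String resource, CompletableFuture<Void> release) {
        return release.whenComplete((ignored, error) -> {
            if (error != null) {
                LOG.add("ERROR " + resource + " failed to release itself: " + error);
            } else {
                LOG.add("DEBUG " + resource + " released");
            }
        });
    }

    public static void main(String[] args) {
        reportRelease("i-ok", CompletableFuture.completedFuture(null)).join();

        CompletableFuture<Void> failing = new CompletableFuture<>();
        failing.completeExceptionally(new IllegalStateException("throttled"));
        // exceptionally(...) swallows the failure here so that join() doesn't throw
        reportRelease("i-broken", failing).exceptionally(error -> null).join();

        LOG.forEach(System.out::println);
    }
}
```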
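Releasing SSH keys sequentially amounts to a plain loop on one thread. A minimal sketch, assuming a hypothetical `deleteKeyPair` callback in place of the real EC2 `DeleteKeyPair` call; the `ConcurrencyProbe` fake only exists to show that the loop never has more than one request in flight, in contrast to one thread per key (Java 16+ for `Stream.toList()`).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class SequentialKeyRelease {

    // Deletes key pairs one at a time on the calling thread, so at most one
    // delete request is in flight. deleteKeyPair is a hypothetical stand-in
    // for the real AWS call.
    static int releaseSequentially(List<String> keyNames, Consumer<String> deleteKeyPair) {
        int released = 0;
        for (String keyName : keyNames) {
            deleteKeyPair.accept(keyName);
            released++;
        }
        return released;
    }

    // Fake API that records the highest number of concurrent calls it saw
    static final class ConcurrencyProbe implements Consumer<String> {
        private final AtomicInteger inFlight = new AtomicInteger();
        final AtomicInteger maxInFlight = new AtomicInteger();

        @Override
        public void accept(String keyName) {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max);
            inFlight.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        List<String> keys = IntStream.range(0, 5000).mapToObj(i -> "key-" + i).toList();
        ConcurrencyProbe api = new ConcurrencyProbe();
        int released = releaseSequentially(keys, api);
        System.out.println("released " + released + " keys, max concurrent calls: " + api.maxInFlight.get());
    }
}
```

The design choice mirrors the commit message: a single loop trades parallelism for never tripping the API rate limit, which avoids the throttling and retries that a 5k-thread fan-out triggers.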