Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devhost: cleanup unused VMs (PL-132737) #1086

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

leona-ya
Copy link
Member

@leona-ya leona-ya commented Aug 21, 2024

PL-132737

@flyingcircusio/release-managers

Release process

Impact:

  • devhost VMs will be shut down after 14 days and deleted after 31 days now if they don't get a deployment in the mean time

Changelog:

  • devhost VMs will be shut down after 14 days and deleted after 31 days now if they don't get a deployment in the mean time (PL-132737)

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.

Security implications

  • Security requirements defined? (WHERE)

    • This is according the process as discussed internally and with customers
    1. VMs created before this PR should not be stopped and deleted directly, but their countdown should then start.
    2. Data should not be lost
  • Security requirements tested? (EVIDENCE)

    • Change tested on largo00
    1. VMs that were created before the last_deploy_date config attribute was introduced - i.e. before this PR arrived on machines - get this attribute with the first cleanup run and a value of now
    2. devhost VMs are intended to be short-lived, so by concept they should not have any valuable data. Also by shutting down the VM before it gets deleted, users of the VM get informed when they want to use it

@leona-ya leona-ya force-pushed the leona/PL-132737-devhost-cleanup-unused-vms branch from 6084dde to 01f946d Compare August 21, 2024 13:47
fcntl.flock(self.lockfile, fcntl.LOCK_UN)
# Wait for the VM to get online
print("Waiting for VM to become pingable ...")
while True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - nothing to change right away: In fc.qemu and other places we have timeout helpers for loops like this.

vm_cfg["last_deploy_date"] = (
datetime.datetime.utcnow().isoformat()
)
with open(CONFIG_DIR / f"{vm_cfg['name']}.json", mode="w") as f:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI - nothing to do now: There's a pattern around (also fc.qemu) that provides a safe "write and replace" approach that ensures atomicity. This pattern here is fragile in crash/power-off situations and can leave defunct files behind.

with open(CONFIG_DIR / f"{vm_cfg['name']}.json", mode="w") as f:
f.write(json.dumps(vm_cfg))

if datetime.datetime.fromisoformat(vm_cfg["last_deploy_date"]) < (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small pattern nitpick: computing the age and comparing it is a tiny bit easier to read than comparing two absolute dates.

@ctheune ctheune merged commit 3591d4c into fc-21.05-dev Aug 22, 2024
2 checks passed
@ctheune ctheune deleted the leona/PL-132737-devhost-cleanup-unused-vms branch August 22, 2024 06:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants