Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

Merged

Conversation

spbolton
Copy link
Contributor

@spbolton spbolton commented Jan 3, 2025

Proposed Changes

  • Add a FinalStartupServlet servlet that initializes after all other servlets that are set to run on startup. This will check for the shutdown-on-startup config property (DOT_SHUTDOWN_ON_STARTUP as an env var) default false, and if true the server will shut down with exit code 0 just before accepting connections. All startup initialization and upgrade tasks should have been completed. This can be used to run as an Init Container in K8s
  • Add a skip-upgrade config ( DOT_SKIP_UPGRADE ) default false. If set to true the DB startup tasks will not be attempted to be run. This can be used in a k8s pod replica when it is known that the upgrade tasks have been handled by the init container. There may not be much of a downside of leaving this off as it should detect the work has already been done, but this may help to reduce some unecessary work and also distinguish if there are some tasks that need to be run per instance vs per cluster/db.
  • Add and fix some logging helping to distinguish the location of startup steps between Servlets and ServletListeners.

Testing

At this stage a basic test that the server does shut down with exit 0 after all startup tasks when DOT_SHUTDOWN_ON_STARTUP=true is passed, and the fact that with the defaults, it does not impact any existing behavior is required. This will be further tested when implementing the init container that will make use of this flag.

This PR fixes: #31003

@spbolton spbolton changed the title feat(deployment): Add shutdownonstartup and skipupgrade properties t (#31003) feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) Jan 3, 2025
@spbolton spbolton enabled auto-merge January 3, 2025 10:57
Copy link
Contributor

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right direction? I think the upgrade tasks can be run independently of setting up the Application Server, e.g. I know in gradle you could run the upgrade tasks from the CLI. My concern is that starting up the app server takes way longer than just running the tasks and when there are no tasks, this could double the container startup time.

Does this run every startup? How will we know when to run it? Maybe there is an init script that calls the DB db_version table, e.g.

psql -c select max(db_version) from db_version

and then reads the dotcms.jar file, parsing the UpgradeTasks for the highest numbered task, e.g. regexing for something like UpgradeTask(\d{8}).class and if the db_version does not match that, force the run?

Also, another thing to think about is when multiple containers in a pod are coming up at the same time.

@spbolton
Copy link
Contributor Author

spbolton commented Jan 3, 2025

Is this the right direction? I think the upgrade tasks can be run independently of setting up the Application Server, e.g. I know in gradle you could run the upgrade tasks from the CLI. My concern is that starting up the app server takes way longer than just running the tasks and when there are no tasks, this could double the container startup time.

Does this run every startup? How will we know when to run it? Maybe there is an init script that calls the DB db_version table, e.g.

psql -c select max(db_version) from db_version

and then reads the dotcms.jar file, parsing the UpgradeTasks for the highest numbered task, e.g. regexing for something like UpgradeTask(\d{8}).class and if the db_version does not match that, force the run?

Also, another thing to think about is when multiple containers in a pod are coming up at the same time.

However, there is a case to be made for the current approach of starting the full container in an init phase:

  1. Consistency and Simplicity: Starting and shutting down the same container image during initialization simplifies the implementation, keeping the upgrade process consistent with how the running replicas behave. This avoids creating new scripts and custom upgrade logic. This approach aligns with Quarkus’ default Kubernetes deployment strategy, though our startup process is less streamlined.
  2. Init Container Behavior: The init container does not run every time a replica starts. It only runs when a change occurs in the deployment descriptor, such as a configuration update or a new image version. Scaling replicas without configuration changes does not trigger the init container, even if instances are deleted and recreated. The init container will rely on the existing upgrade logic based on the database state. The init container does not run for each replica it runs once before the parallel replicas are started. For more info on when an init container runs check this link https://stackoverflow.com/questions/58693757/running-init-container-on-deployment-update
  3. Parallel Replica Startup: Currently, replicas are scaled up one at a time. By offloading upgrade tasks to the init container, replicas can start in parallel since the upgrade tasks will already be complete.
  4. Smoke Testing and Reliability: Running the same container image for both the init container and the replicas acts as a smoke test. If the init container fails, it prevents the replicas from starting and can trigger a rollback or retry instead of introducing issues later in production.
  5. Health Probes and Startup Timing: Offloading upgrades to the init container allows for better tuning of health probe timeouts since the replicas themselves won’t be performing long upgrade tasks, reducing the risk of false positives in startup delays.

A separate effort is already underway to audit the startup process more closely. This will help clarify whether some tasks can be moved to run post-startup and determine if any steps need to be repeated on each replica or if concurrency issues could arise when multiple instances perform upgrades simultaneously.

For now, running the same image in both the init container and replicas minimizes complexity, ensures a single code path, and reduces the risk of inconsistencies between the upgrade process and normal startup behavior. We may eventually want to factor out the init container especially if we manage to make all the upgrade tasks safe to run in the background with the server started, but for now it solves an immediate problem without adding additional code logic and scripts in the deployment layer, keeping it in the core code.

Either way this current change only adds extra configurability and information on startup order and should not alter any other existing behaviour allowing us to make any of these choices going forward.

@spbolton spbolton force-pushed the issue-31003-add-shutdown-on-startup-and-skip-upgrade branch from 0efbbff to 452a5fb Compare January 6, 2025 18:10
Copy link
Contributor

@dcolina dcolina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍🏽

Copy link
Contributor

@wezell wezell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great responses Steve. Point number #2 addresses. most if not all of my concerns.

@spbolton spbolton added this pull request to the merge queue Jan 8, 2025
Merged via the queue into main with commit 7bdf6b5 Jan 8, 2025
36 checks passed
@spbolton spbolton deleted the issue-31003-add-shutdown-on-startup-and-skip-upgrade branch January 8, 2025 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add shutdown-on-startup and skip-upgrade properties to support running as init-container
3 participants