feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

spbolton · 2025-01-03T10:40:15Z

Proposed Changes

Add a FinalStartupServlet servlet that initializes after all other servlets that are set to run on startup. This will check for the shutdown-on-startup config property (DOT_SHUTDOWN_ON_STARTUP as an env var) default false, and if true the server will shut down with exit code 0 just before accepting connections. All startup initialization and upgrade tasks should have been completed. This can be used to run as an Init Container in K8s
Add a skip-upgrade config ( DOT_SKIP_UPGRADE ) default false. If set to true the DB startup tasks will not be attempted to be run. This can be used in a k8s pod replica when it is known that the upgrade tasks have been handled by the init container. There may not be much of a downside of leaving this off as it should detect the work has already been done, but this may help to reduce some unecessary work and also distinguish if there are some tasks that need to be run per instance vs per cluster/db.
Add and fix some logging helping to distinguish the location of startup steps between Servlets and ServletListeners.

Testing

At this stage a basic test that the server does shut down with exit 0 after all startup tasks when DOT_SHUTDOWN_ON_STARTUP=true is passed, and the fact that with the defaults, it does not impact any existing behavior is required. This will be further tested when implementing the init container that will make use of this flag.

This PR fixes: #31003

wezell

Is this the right direction? I think the upgrade tasks can be run independently of setting up the Application Server, e.g. I know in gradle you could run the upgrade tasks from the CLI. My concern is that starting up the app server takes way longer than just running the tasks and when there are no tasks, this could double the container startup time.

Does this run every startup? How will we know when to run it? Maybe there is an init script that calls the DB db_version table, e.g.

psql -c select max(db_version) from db_version

and then reads the dotcms.jar file, parsing the UpgradeTasks for the highest numbered task, e.g. regexing for something like UpgradeTask(\d{8}).class and if the db_version does not match that, force the run?

Also, another thing to think about is when multiple containers in a pod are coming up at the same time.

spbolton · 2025-01-03T17:06:24Z

Is this the right direction? I think the upgrade tasks can be run independently of setting up the Application Server, e.g. I know in gradle you could run the upgrade tasks from the CLI. My concern is that starting up the app server takes way longer than just running the tasks and when there are no tasks, this could double the container startup time.

Does this run every startup? How will we know when to run it? Maybe there is an init script that calls the DB db_version table, e.g.
psql -c select max(db_version) from db_version
and then reads the dotcms.jar file, parsing the UpgradeTasks for the highest numbered task, e.g. regexing for something like UpgradeTask(\d{8}).class and if the db_version does not match that, force the run?

Also, another thing to think about is when multiple containers in a pod are coming up at the same time.

However, there is a case to be made for the current approach of starting the full container in an init phase:

Consistency and Simplicity: Starting and shutting down the same container image during initialization simplifies the implementation, keeping the upgrade process consistent with how the running replicas behave. This avoids creating new scripts and custom upgrade logic. This approach aligns with Quarkus’ default Kubernetes deployment strategy, though our startup process is less streamlined.
Init Container Behavior: The init container does not run every time a replica starts. It only runs when a change occurs in the deployment descriptor, such as a configuration update or a new image version. Scaling replicas without configuration changes does not trigger the init container, even if instances are deleted and recreated. The init container will rely on the existing upgrade logic based on the database state. The init container does not run for each replica it runs once before the parallel replicas are started. For more info on when an init container runs check this link https://stackoverflow.com/questions/58693757/running-init-container-on-deployment-update
Parallel Replica Startup: Currently, replicas are scaled up one at a time. By offloading upgrade tasks to the init container, replicas can start in parallel since the upgrade tasks will already be complete.
Smoke Testing and Reliability: Running the same container image for both the init container and the replicas acts as a smoke test. If the init container fails, it prevents the replicas from starting and can trigger a rollback or retry instead of introducing issues later in production.
Health Probes and Startup Timing: Offloading upgrades to the init container allows for better tuning of health probe timeouts since the replicas themselves won’t be performing long upgrade tasks, reducing the risk of false positives in startup delays.

A separate effort is already underway to audit the startup process more closely. This will help clarify whether some tasks can be moved to run post-startup and determine if any steps need to be repeated on each replica or if concurrency issues could arise when multiple instances perform upgrades simultaneously.

For now, running the same image in both the init container and replicas minimizes complexity, ensures a single code path, and reduces the risk of inconsistencies between the upgrade process and normal startup behavior. We may eventually want to factor out the init container especially if we manage to make all the upgrade tasks safe to run in the background with the server started, but for now it solves an immediate problem without adding additional code logic and scripts in the deployment layer, keeping it in the core code.

Either way this current change only adds extra configurability and information on startup order and should not alter any other existing behaviour allowing us to make any of these choices going forward.

dcolina

Looks good 👍🏽

wezell

Great responses Steve. Point number #2 addresses. most if not all of my concerns.

spbolton linked an issue Jan 3, 2025 that may be closed by this pull request

Add shutdown-on-startup and skip-upgrade properties to support running as init-container #31003

Closed

github-actions bot mentioned this pull request Jan 3, 2025

Add shutdown-on-startup and skip-upgrade properties to support running as init-container #31003

Closed

spbolton changed the title ~~feat(deployment): Add shutdownonstartup and skipupgrade properties t (#31003)~~ feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) Jan 3, 2025

spbolton enabled auto-merge January 3, 2025 10:57

wezell requested changes Jan 3, 2025

View reviewed changes

Add shutdown-on-startup

452a5fb

spbolton force-pushed the issue-31003-add-shutdown-on-startup-and-skip-upgrade branch from 0efbbff to 452a5fb Compare January 6, 2025 18:10

dcolina approved these changes Jan 8, 2025

View reviewed changes

wezell approved these changes Jan 8, 2025

View reviewed changes

spbolton added this pull request to the merge queue Jan 8, 2025

Merged via the queue into main with commit 7bdf6b5 Jan 8, 2025
36 checks passed

spbolton deleted the issue-31003-add-shutdown-on-startup-and-skip-upgrade branch January 8, 2025 15:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

spbolton commented Jan 3, 2025 •

edited by github-actions bot

Loading

wezell left a comment

spbolton commented Jan 3, 2025 •

edited

Loading

dcolina left a comment

wezell left a comment

feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

feat(deployment): Add shutdown-on-startup and skip-upgrade properties to support k8s init container use (#31003) #31047

Conversation

spbolton commented Jan 3, 2025 • edited by github-actions bot Loading

Proposed Changes

Testing

wezell left a comment

Choose a reason for hiding this comment

spbolton commented Jan 3, 2025 • edited Loading

dcolina left a comment

Choose a reason for hiding this comment

wezell left a comment

Choose a reason for hiding this comment

spbolton commented Jan 3, 2025 •

edited by github-actions bot

Loading

spbolton commented Jan 3, 2025 •

edited

Loading