sigterm to workers #2129

Open
wants to merge 1 commit into
base: master

madhur-ob
Collaborator

closes #1349

Dummy flow to check this (a foreach with two tasks, each installing its own SIGTERM handler):

from metaflow import FlowSpec, step
import signal
import time

class GracefulShutdownTest(FlowSpec):
    @step
    def start(self):
        self.ways = ['quick', 'slow']
        # Split into multiple tasks to test different behaviors
        self.next(self.shutdown_handlers, foreach='ways')

    @step
    def shutdown_handlers(self):
        self.running = True
        # Different behaviors when receiving SIGTERM
        if self.input == 'quick':
            # This task exits quickly when receiving SIGTERM
            def handler(signum, frame):
                print("Quick task received SIGTERM, exiting gracefully")
                self.running = False
            signal.signal(signal.SIGTERM, handler)
            
        elif self.input == 'slow':
            # This task takes a few seconds to clean up
            def handler(signum, frame):
                print("Slow task received SIGTERM, cleaning up...")
                time.sleep(3)  # Simulate cleanup
                print("Slow task finished cleanup, exiting")
                self.running = False
            signal.signal(signal.SIGTERM, handler)

        # Both tasks loop here until their SIGTERM handler clears self.running
        print(f"Task {self.input} running...")
        while self.running:
            time.sleep(1)

        self.next(self.join_handlers)

    @step
    def join_handlers(self, inputs):
        self.next(self.end)
        
    @step
    def end(self):
        pass

if __name__ == '__main__':
    GracefulShutdownTest()

Interrupting the run with Ctrl+C produces logs like these:

Metaflow 2.12.3.post147-git77510f0 executing GracefulShutdownTest for user:madhur
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2024-11-04 13:01:18.534 Workflow starting (run-id 1730705478533731):
2024-11-04 13:01:18.544 [1730705478533731/start/1 (pid 74179)] Task is starting.
2024-11-04 13:01:18.823 [1730705478533731/start/1 (pid 74179)] Foreach yields 2 child steps.
2024-11-04 13:01:18.823 [1730705478533731/start/1 (pid 74179)] Task finished successfully.
2024-11-04 13:01:18.828 [1730705478533731/shutdown_handlers/2 (pid 74184)] Task is starting.
2024-11-04 13:01:18.833 [1730705478533731/shutdown_handlers/3 (pid 74185)] Task is starting.
2024-11-04 13:01:19.089 [1730705478533731/shutdown_handlers/3 (pid 74185)] Task slow running...
2024-11-04 13:01:19.090 [1730705478533731/shutdown_handlers/2 (pid 74184)] Task quick running...
^C2024-11-04 13:01:21.233 Workflow interrupted.
2024-11-04 13:01:21.233 Attempting graceful shutdown of 2 active tasks...
2024-11-04 13:01:21.233 [1730705478533731/shutdown_handlers/2 (pid 74184)] [TERMINATED BY ORCHESTRATOR]
2024-11-04 13:01:21.233 [1730705478533731/shutdown_handlers/2 (pid 74184)] [TERMINATED BY ORCHESTRATOR]
2024-11-04 13:01:21.233 [1730705478533731/shutdown_handlers/3 (pid 74185)] [TERMINATED BY ORCHESTRATOR]
2024-11-04 13:01:21.233 [1730705478533731/shutdown_handlers/3 (pid 74185)] [TERMINATED BY ORCHESTRATOR]
2024-11-04 13:01:24.265 Flushing logs...
2024-11-04 13:01:24.265 [1730705478533731/shutdown_handlers/3 (pid 74185)] Slow task received SIGTERM, cleaning up...
2024-11-04 13:01:24.265 [1730705478533731/shutdown_handlers/3 (pid 74185)] 
2024-11-04 13:01:24.265 [1730705478533731/shutdown_handlers/3 (pid 74185)] Aborted!
2024-11-04 13:01:24.265 [1730705478533731/shutdown_handlers/3 (pid 74185)] Slow task finished cleanup, exiting
2024-11-04 13:01:24.266 [1730705478533731/shutdown_handlers/3 (pid 74185)] Task failed.
2024-11-04 13:01:24.266 This failed task will not be retried.
2024-11-04 13:01:24.266 [1730705478533731/shutdown_handlers/2 (pid 74184)] Quick task received SIGTERM, exiting gracefully
2024-11-04 13:01:24.266 [1730705478533731/shutdown_handlers/2 (pid 74184)] 
2024-11-04 13:01:24.266 [1730705478533731/shutdown_handlers/2 (pid 74184)] Aborted!
2024-11-04 13:01:24.266 [1730705478533731/shutdown_handlers/2 (pid 74184)] Task failed.
2024-11-04 13:01:24.267 This failed task will not be retried.

Aborted!
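
For readers unfamiliar with the pattern, the terminate-then-kill sequence suggested by the "Attempting graceful shutdown" log line can be sketched with the standard library alone. This is an illustration, not the code in this PR: `shutdown_workers`, `grace_seconds`, and the `"exited"`/`"killed"` labels are hypothetical names, and it assumes a POSIX system where `Popen.terminate()` delivers SIGTERM.

```python
import subprocess
import time


def shutdown_workers(procs, grace_seconds=5.0):
    """Ask workers to stop with SIGTERM, then SIGKILL any that outlive the grace period.

    `procs` is a list of subprocess.Popen objects (hypothetical stand-ins for
    worker processes). Returns a dict mapping pid -> "exited" or "killed".
    """
    # Phase 1: send SIGTERM to every worker that is still alive, so all of
    # them start cleaning up at roughly the same time.
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()

    # Phase 2: wait against one shared deadline, so the total wait is bounded
    # by grace_seconds rather than grace_seconds per worker.
    outcome = {}
    deadline = time.monotonic() + grace_seconds
    for proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
            outcome[proc.pid] = "exited"
        except subprocess.TimeoutExpired:
            # The worker ignored SIGTERM or is still cleaning up: force it.
            proc.kill()
            proc.wait()
            outcome[proc.pid] = "killed"
    return outcome
```

Sending the signal to all workers before waiting on any of them matters for a flow like the one above: the slow task's 3-second cleanup overlaps with the quick task's exit instead of running after it.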

Development

Successfully merging this pull request may close these issues.

Send a warning signal to tasks before killing them
1 participant