Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Lambda graceful shutdown mechanism only works for Python 3.7, not 3.8 or 3.9 #2

Closed
handlerbot opened this issue Dec 12, 2022 · 14 comments

Comments

@handlerbot
Copy link

Hi! I implemented this pattern as described in a Lambda function using the Python 3.9 runtime on the ARM architecture in us-west-2, and while I'm getting the EXTENSION Name: cloudwatch_lambda_agent State: Ready Events: [SHUTDOWN,INVOKE] logs, my shutdown handler is never being run.

Any suggestions for debugging I can do? Does anyone have a live version of this working for Python (any version, any architecture, any region) that I could try reproducing by matching the successfully working characteristics?

Thanks for any help anyone can provide!

@bnusunny
Copy link
Contributor

Python3.8 and Python3.9 runtimes have this issue. Python3.7 is not affected. I'll forward your feedback to Lambda service team.

@handlerbot handlerbot changed the title Does this approach work / still work with Python? AWS Lambda graceful shutdown mechanism only works for Python 3.7, not 3.8 or 3.9 Dec 12, 2022
@handlerbot
Copy link
Author

handlerbot commented Dec 12, 2022

Boo for that, but good to know, and thank you for the quick response and sending the feedback in!

I have updated the issue title to help others who come in and look, please let me know if you want it changed to something else, or feel free to change it yourself if you have the permissions.

@andidev
Copy link

andidev commented Jan 26, 2023

really? come on!
@bnusunny
can you please add it very clearly to the documentation.

@bnusunny
Copy link
Contributor

bnusunny commented Feb 9, 2023

Yes. I added a note in the README. The Lambda team is aware of this issue and has a plan to improve this.

@andidev
Copy link

andidev commented Apr 30, 2023

@bnusunny and what about new python 3.10? there support added for it?

@bnusunny
Copy link
Contributor

bnusunny commented May 2, 2023

Not yet. I have one workaround if you are building an API: use Lambda Web Adapter to run your Python API on Lambda. It receives SIGTERM with that tool. Here are examples for running Flask and FastAPI with LWA.

@jodylent
Copy link

aws-lambda-python-runtime-interface-client#105 has an update with details which may be relevant here.

@bnusunny
Copy link
Contributor

Great!

@bnusunny
Copy link
Contributor

Good news! This is fixed in Python 3.12 runtime.

https://aws.amazon.com/blogs/compute/python-3-12-runtime-now-available-in-aws-lambda/

@jodylent
Copy link

Thanks for the update -- VERY glad to see progress here!

Is there any plan to backport the fix to prior Python versions?

This remains an issue for anyone using 3.8 <= x <= 3.11, which is going to be the vast majority of organizations, as 3.12 isn't even 3 months into GA

eschwartz added a commit to eschwartz/graceful-shutdown-with-aws-lambda that referenced this issue Dec 15, 2023
@eschwartz
Copy link
Contributor

Please make a note of this issue on the documentation page. I lost many hours trying to get the examples here to work, before finding this (closed) GitHub issue. I'd say this warrants a large warning message on top of the python readme, considering that the examples in this repo will not currently work for many users.

I made a PR with the documentation change: #9

@jodylent
Copy link

Additional info:

  • The Python atexit module does not work on any supported AWS Runtime
    • If this is intentional (and it may be, as a consequence of the Firecracker MicroVM environement), it's not well-documented.
  • The Python signal module works on ONLY Python 3.7 && 3.12
  • Both modules are easily tested via the following dummy handler
Test Handler (collapsed)
# Set Function handler -> $this_file.my_func
import atexit
import signal


def _atexit_hook(message):
    """Register twice, with different messages, to distinguish Lambda INIT/Handler behavior"""
    print("==== ATEXIT HOOK ====")
    print(f"Hook received message: {message}")
    print("==== ATEXIT HOOK ====")


def _signal_hook_1(sig, frame):
    """Signal module accepts a single handler per-signal"""
    print("==== SIGNAL HOOK (1) ====")
    print(f"Caught SIGTERM with sig={sig}, frame={frame}")
    print("==== SIGNAL HOOK (1) ====")


def _signal_hook_2(sig, frame):
    """This func provides a way to test OVERWRITE behavior due to re-registration for a given SIGFOO"""
    print("==== SIGNAL HOOK (2) ====")
    print(f"Caught SIGTERM with sig={sig}, frame={frame}")
    print("==== SIGNAL HOOK (2) ====")


hook_msg = "INIT hook"
msg = f"Register atexit hook: {_atexit_hook} in INIT with message={hook_msg}"
print(f"ATTEMPT: {msg}")
atexit.register(_atexit_hook, hook_msg)
print(f"SUCCESS: {msg}")


msg = f"Register signal hook: {_signal_hook_1} in INIT"
print(f"ATTEMPT: {msg}")
signal.signal(signal.SIGTERM, _signal_hook_1)
print(f"SUCCESS: {msg}")


def my_func(event_records=None, tools=None, *args, **kwargs):
    hook_msg = "ENTRYPOINT hook"
    msg = f"Register atexit hook: {_atexit_hook} in ENTRYPOINT with message={hook_msg}"
    print(f"ATTEMPT: {msg}")
    atexit.register(_atexit_hook, hook_msg)
    print(f"SUCCESS: {msg}")

    msg = f"Register signal hook: {_signal_hook_2} in ENTRYPOINT"
    print(f"ATTEMPT: {msg}")
    signal.signal(signal.SIGTERM, _signal_hook_2)
    print(f"SUCCESS: {msg}")
  • And are expected to produce output similar to the following
Test Handler (collapsed)
INIT_START Runtime Version: python:3.12.v16    Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:c9875014cbcc77e3455765804516f064d18fe7b27ae7bdb2b1d84ab01ba784f5
ATTEMPT: Register atexit hook: <function _atexit_hook at 0x7f1945b3b4c0> in INIT with message=INIT hook
SUCCESS: Register atexit hook: <function _atexit_hook at 0x7f1945b3b4c0> in INIT with message=INIT hook
ATTEMPT: Register signal hook: <function _signal_hook_1 at 0x7f1945b3b740> in INIT
SUCCESS: Register signal hook: <function _signal_hook_1 at 0x7f1945b3b740> in INIT

# Any extension will do -- I used DD_EXTENSION because it logs at INFO during both INIT and SHUTDOWN lifecycle stages

2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | 0 Features detected from environment:
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Retry queue storage on disk is disabled
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Creating TimeSampler #0
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | 0 Features detected from environment:
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Loaded configuration: /var/task/datadog.yaml
TELEMETRY    Name: datadog-agent    State: Subscribed    Types: [Platform, Function, Extension]
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Starting logs-agent...
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | logs-agent started
EXTENSION    Name: datadog-agent    State: Ready    Events: [SHUTDOWN, INVOKE]

# BEGIN ACTUAL INVOCATION

START RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757 Version: $LATEST
ATTEMPT: Register atexit hook: <function _atexit_hook at 0x7f1945b3b4c0> in ENTRYPOINT with message=ENTRYPOINT hook
SUCCESS: Register atexit hook: <function _atexit_hook at 0x7f1945b3b4c0> in ENTRYPOINT with message=ENTRYPOINT hook
ATTEMPT: Register signal hook: <function _signal_hook_2 at 0x7f1945b3ba60> in ENTRYPOINT
SUCCESS: Register signal hook: <function _signal_hook_2 at 0x7f1945b3ba60> in ENTRYPOINT
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Triggering a flush in the logs-agent
2023-12-15 16:10:33 UTC | DD_EXTENSION | INFO | Successfully posted payload to \"https://7-46-0-app.agent.datadoghq.com/api/v1/series\", the agent will only log transaction success every 500 transactions
END RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757
REPORT RequestId: f72ef0ec-2e57-4fc3-aa2e-06553484c757    Duration: 42.64 ms    Billed Duration: 43 ms    Memory Size: 1792 MB    Max Memory Used: 88 MB    Init Duration: 636.30 ms

# END ACTUAL INVOCATION

2023-12-15 16:17:16 UTC | DD_EXTENSION | WARN | Could not send payload: Post "https://http-intake.logs.datadoghq.com/api/v2/logs": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
==== SIGNAL HOOK (2) ====
Caught SIGTERM with sig=15, frame=<frame at 0x7f1945b15490, file '/var/lang/lib/python3.12/threading.py', line 1146, code _wait_for_tstate_lock>
==== SIGNAL HOOK (2) ====
2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | Triggering a flush in the logs-agent
2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | Stopping logs-agent
2023-12-15 16:17:18 UTC | DD_EXTENSION | INFO | logs-agent stopped

@bnusunny
Copy link
Contributor

bnusunny commented Dec 21, 2023

@jodylent

At the moment, the change in 3.12 will not be back-ported to 3.8 to 3.11. People need to use Python 3.12 to get this feature.

Python atexit module does not trigger on SIGTERM. But you can leverage a signal handler to trigger them. Here is an example.

def sigterm_handler(signum, frame):
    print("Received SIGTERM signal")
    atexit._run_exitfuncs()

# Register the signal handler for SIGTERM
signal.signal(signal.SIGTERM, sigterm_handler)

@jodylent
Copy link

At the moment, the change in 3.12 will not be back-ported to 3.8 to 3.11

... [one] can leverage signal handler to trigger atexit.

Thank you for the updates -- I'll advertise them internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants