Flow run with many concurrent tasks intermittently crashing, ECS Task doesn't spin down #9837
Comments
Hm, that last log (Lines 2407 to 2414 in 179afa0) should result in our process exiting. It seems very weird that the ECS task would not exit. Can you see the status of the container on AWS?
I'm also curious about the container logs for the ECS task - is the issue here just that the tasks keep going after a crash, or do you think Prefect has something to do with the tasks crashing in the first place?
Well, Prefect is definitely crashing due to an error in
I do think Prefect is responsible for the tasks crashing, and that's part of my issue here. Prefect's concurrent task runner very reliably has trouble with more than a few hundred concurrent tasks submitted at once. I see this kind of crash about half the time I try to run any flow that sends more than a few hundred tasks to a concurrent task runner. What I sent above are the container logs for the ECS task. The traceback does not show up in the Prefect logs. Let me know if you're asking for something else that I'm not understanding.
I'll post a container status next time I see one of these crashes - I've got to catch it before my nightly script spins down hanging containers.
Ok, I've got one. The container status is "Running". Logs look essentially the same as above.
Just as a note, the task batcher in this package is solving my problem with crashes caused by submitting too many tasks at once, so that particular bugginess is less urgent on my end. I'm still seeing most crashed tasks fail to spin down.
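For context, a minimal sketch of the batching pattern I mean (not the linked package's actual implementation; `process_record` and `batch_size` are hypothetical names) looks roughly like this:

```python
# Rough sketch of batched submission (illustrative only; `process_record`
# and `batch_size` are hypothetical, not from the package linked above).
from prefect import flow, task


@task
def process_record(record: int) -> int:
    return record * 2


@flow
def batched_flow(records: list[int], batch_size: int = 100):
    results = []
    # Submit tasks in batches and wait for each batch to finish before
    # submitting the next, instead of submitting all ~200-500 futures at once.
    for start in range(0, len(records), batch_size):
        futures = [process_record.submit(r) for r in records[start : start + batch_size]]
        results.extend(f.result() for f in futures)
    return results
```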
This issue is stale because it has been open 30 days with no activity. To keep this issue open, remove the stale label or comment.
This issue was closed because it has been stale for 14 days with no activity. If this issue is important or you have more to add feel free to re-open it. |
Commenting here as I've recently run into the same issues using the
@WillRaphaelson would you mind reopening this issue or pointing me in the direction of a related issue that's open? |
Hi @austinweisgrau, we've added this to our backlog, but would also welcome a contributor. |
This issue is different from - but has the same resolution as - #10149. Essentially, there is a problem in the lower-level library we use to handle HTTP/2. As a temporary measure, you can set
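The specific setting is cut off above; assuming it refers to Prefect's HTTP/2 toggle, `PREFECT_API_ENABLE_HTTP2` (an assumption, not confirmed by the comment), a sketch of the workaround is to set it to false in the ECS task definition's environment, or early in the process before any Prefect client is created:

```python
# Hedged sketch: assuming the relevant setting is PREFECT_API_ENABLE_HTTP2,
# disabling HTTP/2 makes the Prefect client fall back to HTTP/1.1 and avoids
# the lower-level h2/httpcore issue. This must take effect before the client
# is created - ideally set it in the ECS task definition's environment.
import os

os.environ["PREFECT_API_ENABLE_HTTP2"] = "false"
```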
First check
Bug summary
A Prefect flow that submits several hundred concurrent Prefect tasks for execution intermittently crashes due to an exception raised in the Prefect engine/runner. This flow runs on prefect_aws.ECSTask infrastructure, which normally spins down and deregisters after a task finishes, but the ECS Task stays running indefinitely after the crash. The final exception shows up in the Prefect logs, but the stack trace does not, although it does show up in the CloudWatch/ECS logs.
Note: not sure if it's relevant, but I'm using my semaphore implementation described here to rate-limit the execution of these concurrent Prefect tasks - only 3 Prefect tasks execute at a time, but all ~200-500 are submitted initially.
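A rough sketch of that rate-limiting pattern (not necessarily the linked implementation; assumes Prefect 2.x async tasks on the ConcurrentTaskRunner): a shared semaphore gates execution while everything is submitted up front.

```python
# Rough sketch of semaphore rate limiting (assumes Prefect 2.x; not the exact
# implementation linked above). All task runs are submitted up front, but the
# shared semaphore allows only 3 to do work at a time.
import asyncio

from prefect import flow, task
from prefect.task_runners import ConcurrentTaskRunner

EXECUTION_SLOTS = asyncio.Semaphore(3)  # shared gate across all task runs


@task
async def rate_limited_work(item: int) -> int:
    async with EXECUTION_SLOTS:
        await asyncio.sleep(1)  # placeholder for the real work
        return item * 2


@flow(task_runner=ConcurrentTaskRunner())
async def semaphore_flow(n_items: int = 300):
    # Submit every task immediately; the semaphore, not the runner,
    # limits how many execute concurrently.
    futures = [await rate_limited_work.submit(i) for i in range(n_items)]
    return [await future.result() for future in futures]
```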
Reproduction
Error
Additional context
No response