Flow run with many concurrent tasks intermittently crashing, ECS Task doesn't spin down #9837

Closed
4 tasks done
austinweisgrau opened this issue Jun 5, 2023 · 13 comments
Labels
bug Something isn't working enhancement An improvement of an existing feature

Comments

@austinweisgrau

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

A Prefect flow that submits several hundred concurrent Prefect tasks for execution intermittently crashes due to an exception raised in the Prefect engine/runner. This flow runs in prefect_aws.ECSTask infrastructure, which normally spins down and deregisters after a task finishes, but the ECS Task stays running indefinitely after the crash. The final exception shows up in the Prefect logs, but the stack trace does not, although it does show up in the CloudWatch/ECS logs.

Note, not sure if it's relevant, but I'm using my semaphore implementation described here to rate limit the execution of these concurrent Prefect tasks - only 3 Prefect tasks execute at a time, but all ~200-500 are submitted initially.
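
As a rough illustration of the pattern described above (not the author's actual implementation, which is linked rather than shown), the sketch below submits every work item up front but lets only three bodies run at a time by guarding each body with a semaphore. It uses only the Python standard library in place of Prefect; `push_one`, `run_all`, and the pool size of 32 are hypothetical names and values chosen for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_ACTIVE = 3  # assumed limit, matching the "3 at a time" in the report
_slots = threading.Semaphore(MAX_ACTIVE)
_lock = threading.Lock()
_active = 0
peak_active = 0  # highest number of bodies observed running at once


def push_one(item):
    """Hypothetical stand-in for a rate-limited task body."""
    global _active, peak_active
    with _slots:  # blocks until one of the MAX_ACTIVE slots is free
        with _lock:
            _active += 1
            peak_active = max(peak_active, _active)
        try:
            return item * 2  # placeholder for the real API call
        finally:
            with _lock:
                _active -= 1


def run_all(items):
    """Submit everything at once; the semaphore throttles execution."""
    with ThreadPoolExecutor(max_workers=32) as pool:
        futures = [pool.submit(push_one, i) for i in items]
        return [f.result() for f in futures]


results = run_all(range(200))
```

A limiter of this kind throttles the task bodies only. It would not reduce the burst of task-run creation requests, and the traceback in the Error section appears to fail inside `create_task_run` rather than inside a task body.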

Reproduction

Can provide if necessary, but because this is an intermittent bug, it's hard to know what a minimal reproduction would be.

Error

/usr/local/lib/python3.10/runpy.py:126: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
21:45:49.587 | INFO    | Flow run 'rigorous-coyote' - Downloading flow code from storage at '/opt/prefect/flows'
21:45:57.757 | INFO    | Flow run 'rigorous-coyote' - Populated environment with credentials.
21:45:57.758 | INFO    | Flow run 'rigorous-coyote' - Environment credentials verified.
21:45:57.872 | INFO    | Flow run 'rigorous-coyote' - Created task run 'fetch_objects_from_sql-0' for task 'fetch_objects_from_sql'
21:45:57.873 | INFO    | Flow run 'rigorous-coyote' - Executing 'fetch_objects_from_sql-0' immediately...
21:46:25.153 | INFO    | Task run 'fetch_objects_from_sql-0' - Found ActBlueTransactions. [rows=323]
21:46:25.307 | INFO    | Task run 'fetch_objects_from_sql-0' - Finished in state Completed()
21:46:25.912 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-1' for task 'push_contribution_to_everyaction'
21:46:25.913 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-1' for execution.
21:46:25.916 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-8' for task 'push_contribution_to_everyaction'
21:46:25.917 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-8' for execution.
21:46:26.101 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-0' for task 'push_contribution_to_everyaction'
21:46:26.102 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-0' for execution.
21:46:26.107 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-2' for task 'push_contribution_to_everyaction'
21:46:26.108 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-2' for execution.
21:46:26.111 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-3' for task 'push_contribution_to_everyaction'
21:46:26.112 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-3' for execution.
21:46:26.161 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-88' for task 'push_contribution_to_everyaction'
21:46:26.162 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-88' for execution.
21:46:26.174 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-4' for task 'push_contribution_to_everyaction'
21:46:26.175 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-4' for execution.
21:46:26.179 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-5' for task 'push_contribution_to_everyaction'
21:46:26.180 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-5' for execution.
21:46:26.183 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-6' for task 'push_contribution_to_everyaction'
21:46:26.183 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-6' for execution.
21:46:26.200 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-9' for task 'push_contribution_to_everyaction'
21:46:26.201 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-9' for execution.
21:46:26.208 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-10' for task 'push_contribution_to_everyaction'
21:46:26.209 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-10' for execution.
21:46:26.214 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-14' for task 'push_contribution_to_everyaction'
21:46:26.215 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-14' for execution.
21:46:26.220 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-15' for task 'push_contribution_to_everyaction'
21:46:26.220 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-15' for execution.
21:46:26.227 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-17' for task 'push_contribution_to_everyaction'
21:46:26.228 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-17' for execution.
21:46:26.234 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-18' for task 'push_contribution_to_everyaction'
21:46:26.234 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-18' for execution.
21:46:26.239 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-19' for task 'push_contribution_to_everyaction'
21:46:26.240 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-19' for execution.
21:46:26.247 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-20' for task 'push_contribution_to_everyaction'
21:46:26.247 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-20' for execution.
21:46:26.251 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-21' for task 'push_contribution_to_everyaction'
21:46:26.252 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-21' for execution.
21:46:26.256 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-23' for task 'push_contribution_to_everyaction'
21:46:26.258 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-23' for execution.
21:46:26.266 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-26' for task 'push_contribution_to_everyaction'
21:46:26.266 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-26' for execution.
21:46:26.272 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-28' for task 'push_contribution_to_everyaction'
21:46:26.272 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-28' for execution.
21:46:26.283 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-32' for task 'push_contribution_to_everyaction'
21:46:26.284 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-32' for execution.
21:46:26.289 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-33' for task 'push_contribution_to_everyaction'
21:46:26.294 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-33' for execution.
21:46:26.303 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-36' for task 'push_contribution_to_everyaction'
21:46:26.304 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-36' for execution.
21:46:26.309 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-38' for task 'push_contribution_to_everyaction'
21:46:26.310 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-38' for execution.
21:46:26.315 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-40' for task 'push_contribution_to_everyaction'
21:46:26.316 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-40' for execution.
21:46:26.323 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-47' for task 'push_contribution_to_everyaction'
21:46:26.323 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-47' for execution.
21:46:26.327 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-48' for task 'push_contribution_to_everyaction'
21:46:26.328 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-48' for execution.
21:46:26.332 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-49' for task 'push_contribution_to_everyaction'
21:46:26.333 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-49' for execution.
21:46:26.341 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-51' for task 'push_contribution_to_everyaction'
21:46:26.342 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-51' for execution.
21:46:26.348 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-53' for task 'push_contribution_to_everyaction'
21:46:26.349 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-53' for execution.
21:46:26.354 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-56' for task 'push_contribution_to_everyaction'
21:46:26.355 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-56' for execution.
21:46:26.360 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-58' for task 'push_contribution_to_everyaction'
21:46:26.360 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-58' for execution.
21:46:26.365 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-61' for task 'push_contribution_to_everyaction'
21:46:26.366 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-61' for execution.
21:46:26.372 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-67' for task 'push_contribution_to_everyaction'
21:46:26.373 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-67' for execution.
21:46:26.377 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-68' for task 'push_contribution_to_everyaction'
21:46:26.378 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-68' for execution.
21:46:26.386 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-76' for task 'push_contribution_to_everyaction'
21:46:26.387 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-76' for execution.
21:46:26.391 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-77' for task 'push_contribution_to_everyaction'
21:46:26.392 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-77' for execution.
21:46:26.395 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-78' for task 'push_contribution_to_everyaction'
21:46:26.396 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-78' for execution.
21:46:26.406 | INFO    | Flow run 'rigorous-coyote' - Created task run 'push_contribution_to_everyaction-82' for task 'push_contribution_to_everyaction'
21:46:26.407 | INFO    | Flow run 'rigorous-coyote' - Submitted task run 'push_contribution_to_everyaction-82' for execution.
21:46:26.431 | ERROR   | Task run 'push_contribution_to_everyaction-1' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.432 | ERROR   | Task run 'push_contribution_to_everyaction-8' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.434 | ERROR   | Task run 'push_contribution_to_everyaction-0' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.435 | ERROR   | Task run 'push_contribution_to_everyaction-2' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.436 | ERROR   | Task run 'push_contribution_to_everyaction-3' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.438 | ERROR   | Task run 'push_contribution_to_everyaction-88' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.439 | ERROR   | Task run 'push_contribution_to_everyaction-4' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.440 | ERROR   | Task run 'push_contribution_to_everyaction-5' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.441 | ERROR   | Task run 'push_contribution_to_everyaction-6' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.442 | ERROR   | Task run 'push_contribution_to_everyaction-9' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.444 | ERROR   | Task run 'push_contribution_to_everyaction-10' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.445 | ERROR   | Task run 'push_contribution_to_everyaction-14' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.446 | ERROR   | Task run 'push_contribution_to_everyaction-15' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.447 | ERROR   | Task run 'push_contribution_to_everyaction-17' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.448 | ERROR   | Task run 'push_contribution_to_everyaction-18' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.450 | ERROR   | Task run 'push_contribution_to_everyaction-19' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.451 | ERROR   | Task run 'push_contribution_to_everyaction-20' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.452 | ERROR   | Task run 'push_contribution_to_everyaction-21' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.453 | ERROR   | Task run 'push_contribution_to_everyaction-23' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.455 | ERROR   | Task run 'push_contribution_to_everyaction-26' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.456 | ERROR   | Task run 'push_contribution_to_everyaction-28' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.457 | ERROR   | Task run 'push_contribution_to_everyaction-32' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.458 | ERROR   | Task run 'push_contribution_to_everyaction-33' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.459 | ERROR   | Task run 'push_contribution_to_everyaction-36' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.461 | ERROR   | Task run 'push_contribution_to_everyaction-38' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.462 | ERROR   | Task run 'push_contribution_to_everyaction-40' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.463 | ERROR   | Task run 'push_contribution_to_everyaction-47' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.464 | ERROR   | Task run 'push_contribution_to_everyaction-82' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.466 | ERROR   | Task run 'push_contribution_to_everyaction-51' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.467 | ERROR   | Task run 'push_contribution_to_everyaction-53' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.468 | ERROR   | Task run 'push_contribution_to_everyaction-67' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.469 | ERROR   | Task run 'push_contribution_to_everyaction-56' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.470 | ERROR   | Task run 'push_contribution_to_everyaction-58' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.472 | ERROR   | Task run 'push_contribution_to_everyaction-77' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.473 | ERROR   | Task run 'push_contribution_to_everyaction-48' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.474 | ERROR   | Task run 'push_contribution_to_everyaction-68' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.477 | ERROR   | Task run 'push_contribution_to_everyaction-78' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.478 | ERROR   | Task run 'push_contribution_to_everyaction-49' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.479 | ERROR   | Task run 'push_contribution_to_everyaction-61' - Crash detected! Execution was cancelled by the runtime environment.
21:46:26.480 | ERROR   | Task run 'push_contribution_to_everyaction-76' - Crash detected! Execution was cancelled by the runtime environment.
21:46:27.183 | ERROR   | Flow run 'rigorous-coyote' - Crash detected! Execution was interrupted by an unexpected exception: KeyError: 247
04:00:57.927 | ERROR   | prefect.engine - Engine execution of flow run 'ae12784c-3cec-4595-b526-d64711911cf6' exited with unexpected exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 2301, in <module>
    enter_flow_run_engine_from_subprocess(flow_run_id)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 223, in enter_flow_run_engine_from_subprocess
    state = from_sync.wait_for_call_in_loop_thread(
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/api.py", line 232, in wait_for_call_in_loop_thread
    return call.result()
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 173, in result
    return self.future.result(timeout=timeout)
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/site-packages/prefect/_internal/concurrency/calls.py", line 218, in _run_async
    result = await coro
  File "/usr/local/lib/python3.10/site-packages/prefect/client/utilities.py", line 40, in with_injected_client
    return await fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 357, in retrieve_flow_then_begin_flow_run
    return await begin_flow_run(
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 392, in begin_flow_run
    async with AsyncExitStack() as stack:
  File "/usr/local/lib/python3.10/contextlib.py", line 714, in __aexit__
    raise exc_details[1]
  File "/usr/local/lib/python3.10/contextlib.py", line 217, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1793, in report_flow_run_crashes
    yield
  File "/usr/local/lib/python3.10/contextlib.py", line 697, in __aexit__
    cb_suppress = await cb(*exc_details)
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1275, in create_task_run_then_submit
    task_run = await create_task_run(
  File "/usr/local/lib/python3.10/site-packages/prefect/engine.py", line 1320, in create_task_run
    task_run = await flow_run_context.client.create_task_run(
  File "/usr/local/lib/python3.10/site-packages/prefect/client/orchestration.py", line 1839, in create_task_run
    response = await self._client.post(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1848, in post
    return await self.request(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1530, in request
    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "/usr/local/lib/python3.10/site-packages/prefect/client/base.py", line 251, in send
    response = await self._send_with_retry(
  File "/usr/local/lib/python3.10/site-packages/prefect/client/base.py", line 193, in _send_with_retry
    response = await request()
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 261, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/connection.py", line 96, in handle_async_request
    return await self._connection.handle_async_request(request)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 157, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 121, in handle_async_request
    await self._send_request_body(request=request, stream_id=stream_id)
  File "/usr/local/lib/python3.10/site-packages/httpcore/_async/http2.py", line 233, in _send_request_body
    self._h2_state.end_stream(stream_id)
  File "/usr/local/lib/python3.10/site-packages/h2/connection.py", line 883, in end_stream
    frames = self.streams[stream_id].end_stream()
KeyError: 247


Versions

Version:             2.10.11
API version:         0.8.4
Python version:      3.10.11
Git commit:          8c651ffc
Built:               Thu, May 25, 2023 2:59 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         cloud

Additional context

No response

@austinweisgrau austinweisgrau added bug Something isn't working status:triage labels Jun 5, 2023
@zanieb
Contributor

zanieb commented Jun 5, 2023

Hm that last log

prefect/src/prefect/engine.py

Lines 2407 to 2414 in 179afa0

engine_logger.error(
    (
        f"Engine execution of flow run '{flow_run_id}' exited with unexpected "
        "exception"
    ),
    exc_info=True,
)
exit(1)

Should result in our process exiting. It seems very weird that the ECS task would not exit. Can you see the status of the container on AWS?

@WillRaphaelson
Contributor

WillRaphaelson commented Jun 5, 2023

I'm also curious about the container logs for the ECS task - is the issue here just that the task keeps going after a crash, or do you think Prefect has something to do with the tasks crashing in the first place?

@zanieb
Contributor

zanieb commented Jun 5, 2023

Well Prefect is definitely crashing due to an error in h2/httpx but the infrastructure should tear down.

@zanieb zanieb added the needs:details Blocked by a need for more info from user label Jun 12, 2023
@austinweisgrau
Author

I'm also curious about the container logs for the ECS task - is the issue here just that the task keeps going after a crash, or do you think Prefect has something to do with the tasks crashing in the first place?

I do think Prefect is responsible for the tasks crashing, and that's part of my issue here. Prefect's concurrent task runner very reliably has trouble with more than a few hundred concurrent tasks submitted at once. I see this kind of crash about half the time I try to run any flow that sends more than a few hundred tasks to a concurrent task runner.

What I sent above is the container logs for the ECS task. The traceback does not show up in the Prefect logs. Let me know if you're asking for something else that I'm not understanding.

@austinweisgrau
Author

I'll post a container status next time I see one of these crashes - I've got to catch it before my nightly script spins down hanging containers.
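
The nightly cleanup script mentioned here is not shown in the thread. Purely as a sketch of what its selection step might look like, the function below picks out long-running tasks from dictionaries shaped like the entries of a boto3 ECS `describe_tasks` response (`taskArn`, `lastStatus`, `startedAt`). The function name, the six-hour threshold, and the sample ARNs are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def select_hanging_tasks(tasks, now, max_age=timedelta(hours=6)):
    """Return ARNs of RUNNING tasks that started more than max_age ago."""
    stale = []
    for task in tasks:
        started = task.get("startedAt")
        if task.get("lastStatus") == "RUNNING" and started is not None:
            if now - started > max_age:
                stale.append(task["taskArn"])
    return stale


# Made-up sample data; real entries would come from describe_tasks.
now = datetime(2023, 6, 13, 4, 0, tzinfo=timezone.utc)
sample = [
    {"taskArn": "arn:example:task/old", "lastStatus": "RUNNING",
     "startedAt": now - timedelta(hours=7)},
    {"taskArn": "arn:example:task/new", "lastStatus": "RUNNING",
     "startedAt": now - timedelta(minutes=5)},
    {"taskArn": "arn:example:task/done", "lastStatus": "STOPPED",
     "startedAt": now - timedelta(hours=9)},
]
hanging = select_hanging_tasks(sample, now)
```

Each selected ARN could then be passed to the ECS client's `stop_task` call. Keeping the selection logic separate from the AWS calls makes it easy to check without touching a real cluster.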

@austinweisgrau
Author

Ok, I've got one. The container status is "Running". Logs look essentially the same as above.

@austinweisgrau
Author

Just as a note, the task batcher in this package is solving my problem with crashes caused by submitting too many tasks at once, so that particular bugginess is less urgent on my end.

Still seeing most crashed tasks fail to spin down.
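
The batching workaround can be pictured roughly as follows. This is a generic standard-library sketch and not the API of the package referred to above: work is split into fixed-size batches, and each batch is submitted and awaited before the next one starts, so only a bounded number of submissions is in flight at any time. `batches`, `run_in_batches`, and the batch size of 50 are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_in_batches(fn, items, batch_size=50):
    """Submit one batch, wait for it to finish, then submit the next."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for chunk in batches(items, batch_size):
            futures = [pool.submit(fn, item) for item in chunk]
            results.extend(f.result() for f in futures)
    return results


doubled = run_in_batches(lambda x: x * 2, range(120), batch_size=50)
```

Unlike a semaphore around the task body, this bounds how many items are submitted at once, which is the part that seems to matter for the crash reported in this issue.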

@github-actions
Contributor

This issue is stale because it has been open 30 days with no activity. To keep this issue open remove stale label or comment.

@github-actions
Contributor

github-actions bot commented Aug 5, 2023

This issue was closed because it has been stale for 14 days with no activity. If this issue is important or you have more to add feel free to re-open it.

@github-actions github-actions bot closed this as not planned Aug 5, 2023
@bcodell

bcodell commented Aug 7, 2023

Commenting here as I've recently run into the same issues using the ConcurrentTaskRunner - both Prefect tasks crashing due to too many tasks being mapped, and the hanging ECS task after the flow run crashed. Using the cancel button in the UI spins down the ECS task after the fact, and the aforementioned task batching functionality from the prefecto library serves as a reasonable workaround, but I'd expect this functionality to be solved in Prefect's native implementation.

@bcodell

bcodell commented Aug 7, 2023

@WillRaphaelson would you mind reopening this issue or pointing me in the direction of a related issue that's open?

@serinamarie serinamarie reopened this Aug 25, 2023
@serinamarie serinamarie added enhancement An improvement of an existing feature and removed status:stale needs:details Blocked by a need for more info from user labels Aug 29, 2023
@serinamarie
Contributor

Hi @austinweisgrau, we've added this to our backlog, but would also welcome a contributor.

@EmilRex
Contributor

EmilRex commented Oct 11, 2023

This issue is different from - but has the same resolution as - #10149. Essentially, there is a problem in the lower level library we use to handle HTTP/2. As a temporary measure you can set PREFECT_API_ENABLE_HTTP2=false on your agent or worker to disable the use of HTTP/2. I have not seen a report of exactly this error, but would welcome one, especially if it is reproducible.
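
One way to apply this setting when the agent or worker is launched from Python is to pass it through the child process environment. The helper below is a minimal sketch under that assumption; `with_http2_disabled` is a hypothetical name, and only the variable name and value come from the comment above.

```python
import os


def with_http2_disabled(base_env=None):
    """Return a copy of the environment with HTTP/2 turned off for Prefect."""
    env = dict(os.environ if base_env is None else base_env)
    # Setting name and value taken from the comment above.
    env["PREFECT_API_ENABLE_HTTP2"] = "false"
    return env


child_env = with_http2_disabled({"PATH": "/usr/bin"})
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting the process. If the infrastructure block accepts an environment mapping, the same key and value could be supplied there instead.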

@WillRaphaelson WillRaphaelson closed this as not planned Jul 10, 2024