Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rally hangs indefinitely on OSX when running 'esrally race --revision' #1575

Open
b-deam opened this issue Sep 12, 2022 · 6 comments · May be fixed by #1653
Open

Rally hangs indefinitely on OSX when running 'esrally race --revision' #1575

b-deam opened this issue Sep 12, 2022 · 6 comments · May be fixed by #1653
Labels
bug Something's wrong

Comments

@b-deam
Copy link
Member

b-deam commented Sep 12, 2022

Rally version (get with esrally --version): Only attempted reproduction with main/master branch: esrally 2.6.1.dev0, but suspect more branches are affected.

Invoked command:

esrally race --revision="@2022-09-11" --track=geonames --test-mode --kill-running-processes --target-hosts=127.0.0.1:29200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --runtime-jdk=bundled

Configuration file (located in ~/.rally/rally.ini)):

[meta]
config.version = 17

[system]
env.name = local

[node]
root.dir = /Users/bradleydeam/.rally/benchmarks
src.root.dir = /Users/bradleydeam/.rally/benchmarks/src

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch
cache = false

[benchmarks]
local.dataset.cache = /Users/bradleydeam/.rally/benchmarks/data

[reporting]
datastore.type = in-memory

[tracks]
default.url = https://github.com/elastic/rally-tracks

[teams]
default.url = https://github.com/elastic/rally-teams

[defaults]
preserve_benchmark_candidate = false

[distributions]
release.cache = true

JVM version: N/A (reproducible without JVM installed)

OS version:

Darwin MacBook-Pro-2.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:46:32 PDT 2022; root:xnu-8020.101.4~15/RELEASE_ARM64_T6000 x86_64

Description of the problem including expected versus actual behavior:

Expected: Rally installs and starts specified revision of Elasticsearch and plugin(s) and then runs the specified benchmark experiment against the cluster
Actual: Rally hangs indefinitely, after having successfully built Elastiscsearch and any plugin(s) from source

Note that esrally install ... works as expected using the same code paths as esrally race --revision initially does. The only difference between the two is that esrally race --revision starts the Actor Sub system.

Steps to reproduce:

  1. Install Rally from master, python -m venv .venv && source .venv/bin/activate && make install
  2. Execute esrally race --revision="@2022-09-11" --track=geonames --test-mode --kill-running-processes --target-hosts=127.0.0.1:29200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --runtime-jdk=bundled or pytest -s it -k test_sources
  3. Watch as Rally hangs indefinitely:
$ pytest -s -it -k test_sources
[...]
it/sources_test.py::test_sources[es-it]
   ____        ____
  / __ \____ _/ / /_  __
 / /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
               /____/

[INFO] Race id is [2b313325-31b2-4f45-884d-c9ca86495c73]
[INFO] Preparing for race ...

Provide logs (if relevant):

py-spy dump:

$ pstree -s esrally -w
-+= 00001 root /sbin/launchd
 |-+= 00766 bradleydeam /Library/Application Support/iTerm2/iTermServer-3.4.3 /Library/Application Support/iTerm2/iterm2-daemon-1.socket
 | \-+= 00767 root /usr/bin/login -fpl bradleydeam /Applications/iTerm.app/Contents/MacOS/iTerm2 --launch_shell
 |   \-+= 00777 bradleydeam -zsh
 |     \-+= 92100 bradleydeam /Applications/Xcode.app/Contents/Developer/usr/bin/make it
 |       \-+- 92624 bradleydeam /bin/bash -c . /elastic/rally/.venv/bin/activate; tox -e py38-it
 |         \-+- 92625 bradleydeam /elastic/rally/.venv/bin/python3 /elastic/rally/.venv/bin/tox -e py38-it
 |           \-+- 92757 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/pytest -s it --junitxml=junit-py38-it.xml
 |             \-+- 98774 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
 |               \--= 98777 bradleydeam (python3.8)
 \-+- 98778 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
   |--- 98779 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
   \-+- 98780 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
     \-+- 98803 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
       \-+- 98822 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
         \--- 98823 bradleydeam /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it

$ sudo py-spy dump --pid 98823
Process 98823: /elastic/rally/.tox/py38-it/bin/python /elastic/rally/.tox/py38-it/bin/esrally race --revision=latest --track=geonames --test-mode --target-hosts=127.0.0.1:19200 --challenge=append-no-conflicts --car=4gheap,basic-license --elasticsearch-plugins=analysis-icu --kill-running-processes --on-error=abort --enable-assertions --configuration-name=es-it
Python v3.8.10 (/.pyenv/versions/3.8.10/bin/python3.8)

Thread 0x100710580 (active): "MainThread"
    exclusive_processing (thespian/system/transport/asyncTransportBase.py:46)
    __enter__ (contextlib.py:113)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1094)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    drainTransmits (thespian/system/systemCommon.py:202)
    run (thespian/system/actorManager.py:112)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_StartEngine (esrally/mechanic/mechanic.py:486)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_StartEngine (esrally/mechanic/mechanic.py:392)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    createActor (thespian/system/actorManager.py:316)
    createActor (thespian/actors.py:189)
    receiveMsg_Setup (esrally/racecontrol.py:112)
    guard (esrally/actor.py:92)
    receiveMessage (thespian/actors.py:838)
    _handleOneMessage (thespian/system/actorManager.py:163)
    handleMessages (thespian/system/actorManager.py:121)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/actorManager.py:87)
    startChild (thespian/system/multiprocCommon.py:591)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startChildActor (thespian/system/multiprocCommon.py:346)
    h_PendingActor (thespian/system/admin/adminCore.py:318)
    h_PendingActor (thespian/system/admin/globalNames.py:19)
    handleIncoming (thespian/system/admin/adminCore.py:114)
    _runWithExpiry (thespian/system/transport/TCPTransport.py:1310)
    _run_subtransport (thespian/system/transport/wakeupTransportBase.py:80)
    run (thespian/system/transport/wakeupTransportBase.py:71)
    run (thespian/system/admin/convention.py:643)
    startAdmin (thespian/system/multiprocCommon.py:207)
    run (multiprocessing/process.py:108)
    _bootstrap (multiprocessing/process.py:315)
    _launch (multiprocessing/popen_fork.py:75)
    __init__ (multiprocessing/popen_fork.py:19)
    _Popen (multiprocessing/context.py:277)
    start (multiprocessing/process.py:121)
    _startAdmin (thespian/system/multiprocCommon.py:104)
    __init__ (thespian/system/systemBase.py:326)
    __init__ (thespian/system/multiprocCommon.py:86)
    __init__ (thespian/system/multiprocTCPBase.py:28)
    _startupActorSys (thespian/actors.py:676)
    __init__ (thespian/actors.py:635)
    bootstrap_actor_system (esrally/actor.py:263)
    with_actor_system (esrally/rally.py:775)
    race (esrally/rally.py:767)
    dispatch_sub_command (esrally/rally.py:991)
    main (esrally/rally.py:1082)
    <module> (esrally:8)

Setting Thespian debug logs with:

export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
export THESPLOG_THRESHOLD="DEBUG"

Allowed me to capture this as Rally 'hung':

2022-09-12 13:22:58.969211 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999865-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 702, "User-specified revision...-quit_0:04:59.999849)
2022-09-12 13:22:58.969813 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999934-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 688, "Skip fetching sources f...-quit_0:04:59.999923)
2022-09-12 13:22:59.006508 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999783-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 702, "User-specified revision...-quit_0:04:59.999766)
2022-09-12 13:22:59.007283 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999946-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 439, "Setting build JDK major...-quit_0:04:59.999933)
2022-09-12 13:22:59.034824 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999868-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 746, "Running build command [...-quit_0:04:59.999853)
2022-09-12 13:23:14.644823 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999840-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 746, "Running build command [...-quit_0:04:59.999824)
2022-09-12 13:24:02.459694 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999668-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 439, "Setting build JDK major...-quit_0:04:59.999629)
2022-09-12 13:24:35.662401 p16740 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61034)-pending-ExpiresIn_0:04:59.999743-<class 'esrally.mechanic.mechanic.NodesStarted'>-<esrally.mechanic.mechanic.NodesStarted object at 0x11036fc70>-quit_0:04:59.999734)
2022-09-12 13:24:35.664993 p16740 ERR  Actor esrally.mechanic.mechanic.NodeMechanicActor @ ActorAddr-(T|:61053) transport run exception: Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/actorManager.py", line 88, in run
    r = self.transport.run(self.handleMessages)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1219, in _runWithExpiry
    self._acceptNewIncoming()
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1342, in _acceptNewIncoming
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
OSError: [Errno 22] Invalid argument

2022-09-12 13:24:35.665390 p16740 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61050)-pending-ExpiresIn_0:04:59.999884-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-(T|:61053)-quit_0:04:59.999871)
Full actor debug logs
2022-09-12 13:22:53.875894 p16688 dbg  ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2022-09-12 13:22:53.876248 p16688 I    ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2022-09-12 13:22:53.884362 p16688 dbg  Admin of ReceiveEnvelope(from: ActorAddr-(T|:61007), <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x1102c7700>)
2022-09-12 13:22:53.884931 p16688 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61006)-pending-ExpiresIn_0:04:59.999856-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1102c7af0>-quit_0:04:59.999847)
2022-09-12 13:22:53.887899 p16669 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:1900)-pending-ExpiresIn_0:04:59.999802-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x1102acd00>-quit_0:04:59.999787)
2022-09-12 13:22:53.888433 p16688 dbg  Admin of ReceiveEnvelope(from: ActorAddr-(T|:61006), <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x1102c78b0>)
2022-09-12 13:22:53.888944 p16688 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61006)-pending-ExpiresIn_0:04:59.999784-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1102c70a0>-quit_0:04:59.999773)
2022-09-12 13:22:53.889216 p16688 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61006)-pending-ExpiresIn_0:04:59.999473-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1102c70a0>-quit_0:04:59.999466)
2022-09-12 13:22:53.891642 p16669 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:1900)-pending-ExpiresIn_0:04:59.999928-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#1_of_None-quit_0:04:59.999919)
2022-09-12 13:22:53.894153 p16688 dbg  Admin of ReceiveEnvelope(from: ActorAddr-(T|:61011), <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#1_of_None)
2022-09-12 13:22:53.894394 p16688 I    Pending Actor request received for esrally.racecontrol.BenchmarkActor reqs {'coordinator': True} from ActorAddr-(T|:61011)
2022-09-12 13:22:53.894965 p16688 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999917-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127....-quit_0:04:59.999910)
2022-09-12 13:22:53.897728 p16690 I    Starting Actor esrally.racecontrol.BenchmarkActor at ActorAddr-(T|:61013) (parent ActorAddr-(T|:1900), admin ActorAddr-(T|:1900), srcHash None)
2022-09-12 13:22:53.898158 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:1900)-pending-ExpiresIn_0:04:59.999887-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x110287670>-quit_0:04:59.999878)
2022-09-12 13:22:53.900451 p16688 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61011)-pending-ExpiresIn_0:04:59.999824-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-(T|:61013)-quit_0:04:59.999817)
2022-09-12 13:22:53.900716 p16688 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61011)-pending-ExpiresIn_0:04:59.999505-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for None inst# 1) errCode None actual ActorAddr-(T|:61013)-quit_0:04:59.999499)
2022-09-12 13:22:53.901609 p16669 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61013)-pending-ExpiresIn_0:04:59.999833-<class 'esrally.racecontrol.Setup'>-<esrally.racecontrol.Setup object at 0x1102acd00>-quit_0:04:59.999825)
2022-09-12 13:22:53.906689 p16690 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61006), <class 'esrally.racecontrol.Setup'> msg: <esrally.racecontrol.Setup object at 0x1102eb190>)
2022-09-12 13:22:54.728708 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999890-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/repo.py, 68, "Checking out [master] in [/Users/bradl...-quit_0:04:59.999874)
2022-09-12 13:22:54.765182 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999863-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Already on 'master'
Your branch ...-quit_0:04:59.999846)
2022-09-12 13:22:54.765634 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999969-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/repo.py, 72, "Rebasing on [master] in [/Users/bradle...-quit_0:04:59.999961)
2022-09-12 13:22:54.823954 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999851-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Already on 'master'
Your branch ...-quit_0:04:59.999833)
2022-09-12 13:22:54.851653 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999822-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Current branch master is up to d...-quit_0:04:59.999804)
2022-09-12 13:22:54.889225 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999876-<class 'logging.LogRecord'>-<LogRecord: esrally.track.loader, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/track/loader.py, 1007, "Reading track specification file...-quit_0:04:59.999860)
2022-09-12 13:22:54.907152 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999943-<class 'logging.LogRecord'>-<LogRecord: esrally.track.loader, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/track/loader.py, 1015, "Final rendered track for '/Users...-quit_0:04:59.999930)
2022-09-12 13:22:54.912686 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999933-<class 'logging.LogRecord'>-<LogRecord: esrally.track.loader, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/track/loader.py, 1282, "Loading template [definition for...-quit_0:04:59.999918)
2022-09-12 13:22:54.915892 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999954-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.modules, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/modules.py, 85, "Loading component [geonames] fro...-quit_0:04:59.999942)
2022-09-12 13:22:54.917120 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999953-<class 'logging.LogRecord'>-<LogRecord: esrally.track.loader, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/track/loader.py, 892, "Preparing track [geonames] for te...-quit_0:04:59.999940)
2022-09-12 13:22:54.986244 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999898-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/metrics.py, 289, "Creating in-memory metrics store">-quit_0:04:59.999882)
2022-09-12 13:22:54.986695 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999969-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/metrics.py, 408, "Opening metrics store for race timestamp=[2...-quit_0:04:59.999961)
2022-09-12 13:22:54.987257 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999911-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/metrics.py, 1239, "Creating file race store">-quit_0:04:59.999898)
2022-09-12 13:22:54.987705 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999967-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py, 111, "Asking mechanic to start the engine.">-quit_0:04:59.999959)
2022-09-12 13:22:54.988052 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999970-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127....-quit_0:04:59.999962)
2022-09-12 13:22:54.990063 p16690 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999712-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103afbe0>-quit_0:04:59.999699)
2022-09-12 13:22:54.990282 p16690 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-12 13:22:54.991046 p16690 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61013), <class 'thespian.system.messages.multiproc.ChildMayHaveDied'> msg: <thespian.system.messages.multiproc.ChildMayHaveDied object at 0x1103aff10>)
2022-09-12 13:22:54.991759 p16718 I    Starting Actor esrally.mechanic.mechanic.MechanicActor at ActorAddr-(T|:61034) (parent ActorAddr-(T|:61013), admin ActorAddr-(T|:1900), srcHash None)
2022-09-12 13:22:54.992870 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61013)-pending-ExpiresIn_0:04:59.999306-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1103a5370>-quit_0:04:59.999296)
2022-09-12 13:22:55.000417 p16690 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61034), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x1102eb880>)
2022-09-12 13:22:55.000901 p16690 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:61034)-pending-ExpiresIn_0:04:59.988730-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103afbe0>-quit_0:04:59.988722)
2022-09-12 13:22:55.001487 p16690 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61034)-pending-ExpiresIn_0:04:59.988184-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103afbe0>-quit_0:04:59.988175)
2022-09-12 13:22:55.002233 p16718 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61013), <class 'esrally.mechanic.mechanic.StartEngine'> msg: <esrally.mechanic.mechanic.StartEngine object at 0x1103a5970>)
2022-09-12 13:22:55.002662 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999949-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/mechanic.py, 366, "Received signal from race control t...-quit_0:04:59.999940)
2022-09-12 13:22:55.758482 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999873-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/repo.py, 68, "Checking out [master] in [/Users/bradl...-quit_0:04:59.999856)
2022-09-12 13:22:55.805396 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999860-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Switched to branch 'master'
Your...-quit_0:04:59.999845)
2022-09-12 13:22:55.806018 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999916-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/repo.py, 72, "Rebasing on [master] in [/Users/bradle...-quit_0:04:59.999900)
2022-09-12 13:22:55.858162 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999847-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Already on 'master'
Your branch ...-quit_0:04:59.999827)
2022-09-12 13:22:55.885869 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999818-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Current branch master is up to d...-quit_0:04:59.999799)
2022-09-12 13:22:55.924845 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999911-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.team, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/team.py, 339, "Loading plugin [analysis-icu] w...-quit_0:04:59.999896)
2022-09-12 13:22:55.925763 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999953-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/mechanic.py, 387, "Cluster consisting of [{'host': '12...-quit_0:04:59.999927)
2022-09-12 13:22:55.926146 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999965-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127....-quit_0:04:59.999956)
2022-09-12 13:22:55.927880 p16718 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999724-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103a5970>-quit_0:04:59.999711)
2022-09-12 13:22:55.928101 p16718 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-12 13:22:55.928848 p16718 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61034), <class 'thespian.system.messages.multiproc.ChildMayHaveDied'> msg: <thespian.system.messages.multiproc.ChildMayHaveDied object at 0x1103ab850>)
2022-09-12 13:22:55.929355 p16739 I    Starting Actor esrally.mechanic.mechanic.Dispatcher at ActorAddr-(T|:61050) (parent ActorAddr-(T|:61034), admin ActorAddr-(T|:1900), srcHash None)
2022-09-12 13:22:55.929827 p16739 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61034)-pending-ExpiresIn_0:04:59.999873-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1103abb80>-quit_0:04:59.999863)
2022-09-12 13:22:55.935353 p16718 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61050), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x1103a62b0>)
2022-09-12 13:22:55.935740 p16718 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:61050)-pending-ExpiresIn_0:04:59.991700-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103a5970>-quit_0:04:59.991692)
2022-09-12 13:22:55.936455 p16718 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61050)-pending-ExpiresIn_0:04:59.991061-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x1103a5970>-quit_0:04:59.991052)
2022-09-12 13:22:55.937301 p16739 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61034), <class 'esrally.mechanic.mechanic.StartEngine'> msg: <esrally.mechanic.mechanic.StartEngine object at 0x1103abd30>)
2022-09-12 13:22:55.937950 p16739 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999932-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127....-quit_0:04:59.999923)
2022-09-12 13:22:55.940678 p16739 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999685-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x1103a6940>-quit_0:04:59.999666)
2022-09-12 13:22:55.940877 p16739 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-12 13:22:55.942104 p16740 I    Starting Actor esrally.mechanic.mechanic.NodeMechanicActor at ActorAddr-(T|:61053) (parent ActorAddr-(T|:61050), admin ActorAddr-(T|:1900), srcHash None)
2022-09-12 13:22:55.942537 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61050)-pending-ExpiresIn_0:04:59.999866-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1103bf8e0>-quit_0:04:59.999856)
2022-09-12 13:22:55.944364 p16739 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61053), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x1103a69a0>)
2022-09-12 13:22:55.944744 p16739 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:61053)-pending-ExpiresIn_0:04:59.995443-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x1103a6940>-quit_0:04:59.995435)
2022-09-12 13:22:55.945263 p16739 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61053)-pending-ExpiresIn_0:04:59.994926-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x1103a6940>-quit_0:04:59.994919)
2022-09-12 13:22:55.946014 p16740 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:61050), <class 'esrally.mechanic.mechanic.StartNodes'> msg: <esrally.mechanic.mechanic.StartNodes object at 0x1103c2190>)
2022-09-12 13:22:55.946453 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999947-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/mechanic.py, 552, "Starting node(s) [0] on [127.0.0.1]...-quit_0:04:59.999939)
2022-09-12 13:22:55.947798 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999971-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/metrics.py, 408, "Opening metrics store for race timestamp=[2...-quit_0:04:59.999962)
2022-09-12 13:22:56.733382 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999869-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/repo.py, 127, "Checking out revision [c18ab4d] in [/...-quit_0:04:59.999852)
2022-09-12 13:22:56.771657 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999841-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Note: switching to 'c18ab4d'.

Y...-quit_0:04:59.999824)
2022-09-12 13:22:56.773528 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999949-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.team, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/team.py, 339, "Loading plugin [analysis-icu] w...-quit_0:04:59.999936)
2022-09-12 13:22:56.774563 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999967-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 86, "Enabling source artifact...-quit_0:04:59.999959)
2022-09-12 13:22:56.777628 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999960-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 127, "Adding core plugin sour...-quit_0:04:59.999951)
2022-09-12 13:22:56.778063 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999971-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.java_resolver, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/java_resolver.py, 48, "Allowed JDK ve...-quit_0:04:59.999963)
2022-09-12 13:22:56.874006 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999882-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.java_resolver, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/java_resolver.py, 50, "Detected JDK w...-quit_0:04:59.999867)
2022-09-12 13:22:56.874872 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999968-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 685, "Fetching latest sources...-quit_0:04:59.999960)
2022-09-12 13:22:58.594374 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999850-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "From https://github.com/elastic/...-quit_0:04:59.999834)
2022-09-12 13:22:58.760902 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999858-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "Switched to branch 'main'
Your b...-quit_0:04:59.999841)
2022-09-12 13:22:58.934777 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999858-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/utils/process.py, 95, "
�[KSuccessfully rebased and upd...-quit_0:04:59.999843)
2022-09-12 13:22:58.969211 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999865-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 702, "User-specified revision...-quit_0:04:59.999849)
2022-09-12 13:22:58.969813 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999934-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 688, "Skip fetching sources f...-quit_0:04:59.999923)
2022-09-12 13:22:59.006508 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999783-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 702, "User-specified revision...-quit_0:04:59.999766)
2022-09-12 13:22:59.007283 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999946-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 439, "Setting build JDK major...-quit_0:04:59.999933)
2022-09-12 13:22:59.034824 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999868-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 746, "Running build command [...-quit_0:04:59.999853)
2022-09-12 13:23:14.644823 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999840-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 746, "Running build command [...-quit_0:04:59.999824)
2022-09-12 13:24:02.459694 p16740 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:61007)-pending-ExpiresIn_0:04:59.999668-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/mechanic/supplier.py, 439, "Setting build JDK major...-quit_0:04:59.999629)
2022-09-12 13:24:35.662401 p16740 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61034)-pending-ExpiresIn_0:04:59.999743-<class 'esrally.mechanic.mechanic.NodesStarted'>-<esrally.mechanic.mechanic.NodesStarted object at 0x11036fc70>-quit_0:04:59.999734)
2022-09-12 13:24:35.664993 p16740 ERR  Actor esrally.mechanic.mechanic.NodeMechanicActor @ ActorAddr-(T|:61053) transport run exception: Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/actorManager.py", line 88, in run
    r = self.transport.run(self.handleMessages)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1219, in _runWithExpiry
    self._acceptNewIncoming()
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1342, in _acceptNewIncoming
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
OSError: [Errno 22] Invalid argument

2022-09-12 13:24:35.665390 p16740 dbg  Attempting intent TransportIntent(ActorAddr-(T|:61050)-pending-ExpiresIn_0:04:59.999884-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-(T|:61053)-quit_0:04:59.999871)

Thanks to @pquentin, we found a workaround for this by commenting out line 1342 in .venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py

- lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
+ # lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

We're not exactly sure what is happening (envoyproxy/envoy#1446 suggests that perhaps we're trying to set an option on a socket that is shut down), but this appears to only affect OSX.

@b-deam b-deam added the bug Something's wrong label Sep 12, 2022
@inqueue
Copy link
Member

inqueue commented Sep 12, 2022

On macOS, it only reproduces when Rally must invoke a build, otherwise, the race succeeds:

uname -a
Darwin grape.lan 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:25:27 PDT 2022; root:xnu-8020.141.5~2/RELEASE_X86_64 x86_64

The behavior is consistent with the rally.ini configuration for the local repository cache; i.e., do not cache artifacts:

[source]
remote.repo.url = https://github.com/elastic/elasticsearch.git
elasticsearch.src.subdir = elasticsearch
cache = false

During a successful benchmark, an stream of lsock objects are created at the start and they look like this:

_acceptNewIncoming:
      lsock=<socket.socket fd=29, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.5.116', 52934), raddr=('192.168.5.116', 52967)>

Starting a run we know will fail (one without a cached ES artifact), lsock starts like this:

_acceptNewIncoming:
      lsock=<socket.socket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.5.116', 53029), raddr=('192.168.5.116', 53077)>

then ~85s later right before the failure point:

83.5s _acceptNewIncoming:
      lsock=<socket.socket fd=27, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('192.168.5.116', 53523)>

The repro and error are the same every time. I don't know yet what any of this means, but it has me thinking that either: 1) the ES build is getting in the way, or 2) there is something wrong in the actor system or how it is used.

@b-deam
Copy link
Member Author

b-deam commented Sep 13, 2022

  1. the ES build is getting in the way,

You can stub out the build command with something simple (i.e. sleep 10) and the hanging is still reproducible, so I don't think it's related to the build causing a 'timeout' or something similar.

  1. there is something wrong in the actor system or how it is used.

We should probably try and put together a minimal reproduction, but TBH I am not super familiar with how this part of the codebase works, and it's going to require quite some effort.

@inqueue
Copy link
Member

inqueue commented Sep 13, 2022

The differences with the socket local and remote addresses might be a red herring. Since Rally uses thespian 3.10.1, I tried 3.10.3 just to see if it would have an impact and it did not. I suspect the issue has more to do with attempting to act on an invalid TCP socket. I can confirm the issue occurs somewhere in here, reliably, every time ES is built from source:

[2022-09-13T11:05:20,746][INFO ][o.e.x.s.c.f.PersistentCache] [rally-node-0] persistent cache index loaded
[2022-09-13T11:05:20,747][INFO ][o.e.x.d.l.DeprecationIndexingComponent] [rally-node-0] deprecation component started
[2022-09-13T11:05:20,824][INFO ][o.e.t.TransportService   ] [rally-node-0] publish_address {127.0.0.1:29300}, bound_addresses {127.0.0.1:29300}
[2022-09-13T11:05:21,053][INFO ][o.e.c.c.ClusterBootstrapService] [rally-node-0] this node has not joined a bootstrapped cluster yet; [cluster.initial_master_nodes] is set to [rally-node-0]
[2022-09-13T11:05:21,059][INFO ][o.e.c.c.Coordinator      ] [rally-node-0] setting initial configuration to VotingConfiguration{_QoTerzwS5-mpL4yREcdVQ}
[2022-09-13T11:05:21,379][INFO ][o.e.c.s.MasterService    ] [rally-node-0] elected-as-master ([1] nodes joined)[_FINISH_ELECTION_, {rally-node-0}{_QoTerzwS5-mpL4yREcdVQ}{PC4Ey4lhRc6PLajebkJ6Sg}{rally-node-0}{127.0.0.1}{127.0.0.1:29300}{cdfhilmrstw} completing elect
ion], term: 1, version: 1, delta: master node changed {previous [], current [{rally-node-0}{_QoTerzwS5-mpL4yREcdVQ}{PC4Ey4lhRc6PLajebkJ6Sg}{rally-node-0}{127.0.0.1}{127.0.0.1:29300}{cdfhilmrstw}]}
[2022-09-13T11:05:21,463][INFO ][o.e.c.c.CoordinationState] [rally-node-0] cluster UUID set to [u_62GO_3Q6qNgIN3pmNcyA]
[2022-09-13T11:05:21,589][INFO ][o.e.c.s.ClusterApplierService] [rally-node-0] master node changed {previous [], current [{rally-node-0}{_QoTerzwS5-mpL4yREcdVQ}{PC4Ey4lhRc6PLajebkJ6Sg}{rally-node-0}{127.0.0.1}{127.0.0.1:29300}{cdfhilmrstw}]}, term: 1, version: 1, r
eason: Publication{term=1, version=1}
[2022-09-13T11:05:21,632][INFO ][o.e.r.s.FileSettingsService] [rally-node-0] starting file settings watcher ...
[2022-09-13T11:05:21,637][INFO ][o.e.r.s.FileSettingsService] [rally-node-0] file settings service up and running [tid=45]
[2022-09-13T11:05:21,639][INFO ][o.e.h.AbstractHttpServerTransport] [rally-node-0] publish_address {127.0.0.1:29200}, bound_addresses {127.0.0.1:29200}
[2022-09-13T11:05:21,639][INFO ][o.e.n.Node               ] [rally-node-0] started {rally-node-0}{_QoTerzwS5-mpL4yREcdVQ}{PC4Ey4lhRc6PLajebkJ6Sg}{rally-node-0}{127.0.0.1}{127.0.0.1:29300}{cdfhilmrstw}{xpack.installed=true}
[2022-09-13T11:05:21,778][INFO ][o.e.g.GatewayService     ] [rally-node-0] recovered [0] indices into cluster_state
[2022-09-13T11:05:22,162][INFO ][o.e.c.m.MetadataIndexTemplateService] [rally-node-0] adding component template [.deprecation-indexing-mappings]
<continues>

If Rally does not have to initiate an ES build, the benchmark always succeeds.

rally.log

2022-09-13 11:03:56,765 ActorAddr-(T|:59122)/PID:44858 esrally.utils.repo INFO Rebasing on [master] in [/Users/jbryan/.rally/benchmarks/teams/default] for distribution version [None].
2022-09-13 11:03:56,804 ActorAddr-(T|:59122)/PID:44858 esrally.utils.process INFO Already on 'master'
Your branch is up to date with 'origin/master'.

2022-09-13 11:03:56,835 ActorAddr-(T|:59122)/PID:44858 esrally.utils.process INFO Current branch master is up to date.

2022-09-13 11:03:56,855 ActorAddr-(T|:59122)/PID:44858 esrally.mechanic.team INFO Loading plugin [analysis-icu] with default configuration.
2022-09-13 11:03:56,856 ActorAddr-(T|:59122)/PID:44858 esrally.actor INFO Cluster consisting of [{'host': '127.0.0.1', 'port': 29200}] will be provisioned by Rally.
2022-09-13 11:03:56,858 ActorAddr-(T|:59122)/PID:44858 esrally.actor INFO Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'Thespian ActorSystem Version': 2, 'Thespi
an Watch Supported': True, 'Python Version': (3, 10, 4, 'final', 0), 'Thespian Generation': (3, 10), 'Thespian Version': '1663081435739'}] match requirements [{}].
2022-09-13 11:03:56,898 ActorAddr-(T|:59138)/PID:44884 esrally.actor INFO Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900', 'Thespian ActorSystem Name': 'multiprocTCPBase', 'Thespian ActorSystem Version': 2, 'Thespi
an Watch Supported': True, 'Python Version': (3, 10, 4, 'final', 0), 'Thespian Generation': (3, 10), 'Thespian Version': '1663081435739'}] match requirements [{'coordinator': True}].
2022-09-13 11:03:56,915 ActorAddr-(T|:59141)/PID:44885 esrally.actor INFO Starting node(s) [0] on [127.0.0.1].
(END)

Actor system log

2022-09-13 11:03:56.447174 p44836 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999909-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/jbryan/dev/src/rally/esrally/metrics.py, 1239, "Creating file race store">-quit_0:04:59.999890)
2022-09-13 11:03:56.447769 p44836 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999940-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/racecontrol.py, 111, "Asking mechanic to start the engine.">-quit_0:04:59.999923)
2022-09-13 11:03:56.448454 p44836 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999931-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention ...-quit_0:04:59.99
9913)
2022-09-13 11:03:56.452935 p44836 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999416-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c4c0580>-quit_0:04:59.998350)
2022-09-13 11:03:56.453415 p44836 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-13 11:03:56.455489 p44858 I    Starting Actor esrally.mechanic.mechanic.MechanicActor at ActorAddr-(T|:59122) (parent ActorAddr-(T|:59101), admin ActorAddr-(T|:1900), srcHash None)
2022-09-13 11:03:56.456596 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59101)-pending-ExpiresIn_0:04:59.999606-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x10c4c2380>-quit_0:04:59.999562)
2022-09-13 11:03:56.456894 p44836 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59101), <class 'thespian.system.messages.multiproc.ChildMayHaveDied'> msg: <thespian.system.messages.multiproc.ChildMayHaveDied object at 0x10c4c2290>)
2022-09-13 11:03:56.486113 p44836 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59122), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x10c4c2230>)
2022-09-13 11:03:56.487008 p44836 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:59122)-pending-ExpiresIn_0:04:59.964128-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c4c0580>-quit_0:04:59.964107)
2022-09-13 11:03:56.487854 p44836 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59122)-pending-ExpiresIn_0:04:59.963275-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c4c0580>-quit_0:04:59.963258)
2022-09-13 11:03:56.489045 p44858 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59101), <class 'esrally.mechanic.mechanic.StartEngine'> msg: <esrally.mechanic.mechanic.StartEngine object at 0x10c47f130>)
2022-09-13 11:03:56.489814 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999909-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/mechanic.py, 366, "Received signal from race control to start engine.">-quit_0:04:59.999891)
2022-09-13 11:03:56.743671 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999772-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/jbryan/dev/src/rally/esrally/utils/repo.py, 68, "Checking out [master] in [/Users/jbryan/.rally/benchmarks/...-quit_0:04:59.999733)
2022-09-13 11:03:56.764912 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999740-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Switched to branch 'master'
Your branch is up to dat...-quit_0:04:59.999715)
2022-09-13 11:03:56.765567 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999937-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/jbryan/dev/src/rally/esrally/utils/repo.py, 72, "Rebasing on [master] in [/Users/jbryan/.rally/benchmarks/t...-quit_0:04:59.999921)
2022-09-13 11:03:56.804984 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999742-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Already on 'master'
Your branch is up to date with '...-quit_0:04:59.999715)
2022-09-13 11:03:56.836045 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999766-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Current branch master is up to date.
">-quit_0:04:59.999740)
2022-09-13 11:03:56.856111 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999794-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.team, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/team.py, 339, "Loading plugin [analysis-icu] with default configur...-quit_0:04:59.999771)
2022-09-13 11:03:56.857504 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999924-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/mechanic.py, 387, "Cluster consisting of [{'host': '127.0.0.1', 'port': 29...-quit_0:04:59.999906)
2022-09-13 11:03:56.858302 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999838-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention ...-quit_0:04:59.999821)
2022-09-13 11:03:56.861429 p44858 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999415-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c47f130>-quit_0:04:59.999357)
2022-09-13 11:03:56.861996 p44858 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-13 11:03:56.864175 p44884 I    Starting Actor esrally.mechanic.mechanic.Dispatcher at ActorAddr-(T|:59138) (parent ActorAddr-(T|:59122), admin ActorAddr-(T|:1900), srcHash None)
2022-09-13 11:03:56.865495 p44884 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59122)-pending-ExpiresIn_0:04:59.999467-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x10c47fb20>-quit_0:04:59.999421)
2022-09-13 11:03:56.866421 p44858 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59122), <class 'thespian.system.messages.multiproc.ChildMayHaveDied'> msg: <thespian.system.messages.multiproc.ChildMayHaveDied object at 0x10c47fa30>)
2022-09-13 11:03:56.893966 p44858 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59138), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x10c4c2320>)
2022-09-13 11:03:56.894771 p44858 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:59138)-pending-ExpiresIn_0:04:59.965815-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c47f130>-quit_0:04:59.965797)
2022-09-13 11:03:56.895724 p44858 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59138)-pending-ExpiresIn_0:04:59.964839-<class 'esrally.mechanic.mechanic.StartEngine'>-<esrally.mechanic.mechanic.StartEngine object at 0x10c47f130>-quit_0:04:59.964822)
2022-09-13 11:03:56.897395 p44884 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59122), <class 'esrally.mechanic.mechanic.StartEngine'> msg: <esrally.mechanic.mechanic.StartEngine object at 0x10c4e7a30>)
2022-09-13 11:03:56.898738 p44884 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999878-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/actor.py, 124, "Capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention ...-quit_0:04:59.999859)
2022-09-13 11:03:56.902820 p44884 dbg  Attempting intent TransportIntent(ActorAddr-LocalAddr.0-pending-ExpiresIn_0:04:59.999394-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x10c4e79d0>-quit_0:04:59.999342)
2022-09-13 11:03:56.903376 p44884 dbg  CannotPickleAddress, appending intent for ActorAddr-LocalAddr.0
2022-09-13 11:03:56.905181 p44885 I    Starting Actor esrally.mechanic.mechanic.NodeMechanicActor at ActorAddr-(T|:59141) (parent ActorAddr-(T|:59138), admin ActorAddr-(T|:1900), srcHash None)
2022-09-13 11:03:56.906326 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59138)-pending-ExpiresIn_0:04:59.999581-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x10c4e47f0>-quit_0:04:59.999537)
2022-09-13 11:03:56.911268 p44884 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59141), <class 'thespian.system.messages.multiproc.EndpointConnected'> msg: <thespian.system.messages.multiproc.EndpointConnected object at 0x10c4e7c40>)
2022-09-13 11:03:56.912409 p44884 dbg  Attempting intent TransportIntent(ActorAddr-0-ActorAddr-(T|:59141)-pending-ExpiresIn_0:04:59.989520-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x10c4e79d0>-quit_0:04:59.989499)
2022-09-13 11:03:56.913529 p44884 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59141)-pending-ExpiresIn_0:04:59.988366-<class 'esrally.mechanic.mechanic.StartNodes'>-<esrally.mechanic.mechanic.StartNodes object at 0x10c4e79d0>-quit_0:04:59.988347)
2022-09-13 11:03:56.914883 p44885 dbg  ACTOR got ReceiveEnvelope(from: ActorAddr-(T|:59138), <class 'esrally.mechanic.mechanic.StartNodes'> msg: <esrally.mechanic.mechanic.StartNodes object at 0x10c4e57e0>)
2022-09-13 11:03:56.915862 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999846-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/mechanic.py, 552, "Starting node(s) [0] on [127.0.0.1].">-quit_0:04:59.999826)
2022-09-13 11:03:56.920998 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999909-<class 'logging.LogRecord'>-<LogRecord: esrally.metrics, 20, /Users/jbryan/dev/src/rally/esrally/metrics.py, 408, "Opening metrics store for race timestamp=[20220913T150348Z], tr...-quit_0:04:59.99
9889)
2022-09-13 11:03:57.165369 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999780-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.repo, 20, /Users/jbryan/dev/src/rally/esrally/utils/repo.py, 127, "Checking out revision [c18ab4d] in [/Users/jbryan/.rally/...-quit_0:04:59.99
9754)
2022-09-13 11:03:57.199535 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999768-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Note: switching to 'c18ab4d'.

You are in 'detached ...-quit_0:04:59.999741)
2022-09-13 11:03:57.202417 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999928-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.team, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/team.py, 339, "Loading plugin [analysis-icu] with default configur...-quit_0:04:59.99
9909)
2022-09-13 11:03:57.204056 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999933-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 74, "Enabling source artifact caching.">-quit_0:04:59.999915)
2022-09-13 11:03:57.206042 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999941-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 115, "Adding core plugin source supplier for [ana...-quit_0:04:59.99
9924)
2022-09-13 11:03:57.206654 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999943-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.java_resolver, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/java_resolver.py, 44, "Using JDK bundled with Elasticsear...-quit_0:04:59.99
9926)

Noting a break here since it takes a moment to download ES source.

2022-09-13 11:05:00.271099 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999732-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Cloning into '/Users/jbryan/.rally/benchmarks/src/el...-quit_0:04:59.99
9702)
2022-09-13 11:05:00.271966 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999918-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 630, "Fetching from remote and checking out revis...-quit_0:04:59.99
9898)
2022-09-13 11:05:01.300922 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999746-<class 'logging.LogRecord'>-<LogRecord: esrally.utils.process, 20, /Users/jbryan/dev/src/rally/esrally/utils/process.py, 95, "Note: switching to '2c4731e900b4da94503fa570a27547c2...-quit_0:04:59.99
9719)
2022-09-13 11:05:01.319588 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999767-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 640, "User-specified revision [@2022-09-11] for [...-quit_0:04:59.99
9741)
2022-09-13 11:05:01.320486 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999902-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 626, "Skip fetching sources for Elasticsearch.">-quit_0:04:59.999883
)
2022-09-13 11:05:01.347340 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999706-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 640, "User-specified revision [current] for [Elas...-quit_0:04:59.99
9677)
2022-09-13 11:05:01.348167 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999928-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 342, "Using cached artifact in [/Users/jbryan/.ra...-quit_0:04:59.99
9908)
2022-09-13 11:05:01.348863 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999887-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.supplier, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/supplier.py, 342, "Using cached artifact in [/Users/jbryan/.ra...-quit_0:04:59.99
9866)
2022-09-13 11:05:01.349631 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999925-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.provisioner, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/provisioner.py, 275, "Preparing candidate locally in [/User...-quit_0:04:59.99
9905)
2022-09-13 11:05:01.353926 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999668-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.provisioner, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/provisioner.py, 280, "Unzipping /Users/jbryan/.rally/benchm...-quit_0:04:59.999645)
2022-09-13 11:05:05.310665 p44885 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:59095)-pending-ExpiresIn_0:04:59.999753-<class 'logging.LogRecord'>-<LogRecord: esrally.mechanic.provisioner, 20, /Users/jbryan/dev/src/rally/esrally/mechanic/provisioner.py, 287, "Deleting pre-bundled Elasticsearch co...-quit_0:04:59.999692)
2022-09-13 11:05:21.996964 p44885 dbg  Attempting intent TransportIntent(ActorAddr-(T|:59122)-pending-ExpiresIn_0:04:59.999525-<class 'esrally.mechanic.mechanic.NodesStarted'>-<esrally.mechanic.mechanic.NodesStarted object at 0x10c4f2fe0>-quit_0:04:59.999501)
2022-09-13 11:05:22.002274 p44885 ERR  Actor esrally.mechanic.mechanic.NodeMechanicActor @ ActorAddr-(T|:59141) transport run exception: Traceback (most recent call last):
  File "/Users/jbryan/dev/src/rally/.venv/lib/python3.10/site-packages/thespian/system/actorManager.py", line 87, in run
    r = self.transport.run(self.handleMessages)
  File "/Users/jbryan/dev/src/rally/.venv/lib/python3.10/site-packages/thespian/system/transport/wakeupTransportBase.py", line 71, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/Users/jbryan/dev/src/rally/.venv/lib/python3.10/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/Users/jbryan/dev/src/rally/.venv/lib/python3.10/site-packages/thespian/system/transport/TCPTransport.py", line 1219, in _runWithExpiry
    self._acceptNewIncoming()
  File "/Users/jbryan/dev/src/rally/.venv/lib/python3.10/site-packages/thespian/system/transport/TCPTransport.py", line 1346, in _acceptNewIncoming
    lsock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
OSError: [Errno 22] Invalid argument

2022-09-13 11:05:22.003103 p44885 dbg  Attempting intent TransportIntent(ActorAddr-(T|:59138)-pending-ExpiresIn_0:04:59.999732-<class 'thespian.actors.ChildActorExited'>-ChildActorExited:ActorAddr-(T|:59141)-quit_0:04:59.999701)

The Actor system error occurring right at esrally.mechanic.mechanic.NodesStarted is suspicious. Like @b-deam says, I think it is going to require some minimal reproduction to track it down.

@pquentin
Copy link
Member

I reported the issue upstream: thespianpy/Thespian#70

@pquentin
Copy link
Member

pquentin commented Oct 4, 2022

Right now if anyone faces the issue I'd recommend to apply the local workaround until we hear more from the Thespian side.

@inqueue
Copy link
Member

inqueue commented Jan 12, 2023

I encountered another way to reproduce the issue after forgetting to use the workaround. The stall can happen directly after downloading elastic/logs track corpora and building file offsets. It reproduces pretty reliably on macOS.

https://gist.github.com/inqueue/ddb5c28bb0512ebdc26391bd32e10d91

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something's wrong
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants