Remove dispatching in TaskCollection #8903
Conversation
```diff
@@ -1149,7 +1130,7 @@ class TaskGroup(TaskCollection):
     __slots__ = tuple(__annotations__)

     def __init__(self, name: str, prefix: TaskPrefix):
-        super().__init__(name)
+        TaskCollection.__init__(self, name)
```
`super()` is convenient and very useful in complex MROs, but it also adds overhead that we don't need or want here.
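The pattern under discussion can be sketched as follows (class shapes are simplified and assumed; only the call style mirrors the diff):

```python
class TaskCollection:
    """Simplified stand-in for the shared base class."""

    def __init__(self, name: str):
        self.name = name
        self.nbytes_total = 0


class TaskGroup(TaskCollection):
    def __init__(self, name: str, prefix: str):
        # Direct base-class call: skips creating the super() proxy and
        # walking the MRO, which matters on hot instantiation paths.
        TaskCollection.__init__(self, name)
        self.prefix = prefix


tg = TaskGroup("sum-1234", "sum")
print(tg.name, tg.prefix)  # → sum-1234 sum
```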
```diff
-        super().update_nbytes(diff)
         self.prefix.update_nbytes(diff)
```
I haven't benchmarked this, but the function dispatch is more expensive than the computation itself.
```python
duration_us = max(round((stop - start) * 1e6), 0)
self._duration_us += duration_us
self._all_durations_us[action] += duration_us
```
This contradicts DRY, but I think it's worth it considering that it's only two places and the complexity is low.
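As a hedged sketch, the duplicated computation looks like this in each class (attribute names follow the diff; the enclosing class and the `add_duration` method name are simplified assumptions):

```python
from collections import defaultdict


class TaskGroup:
    """Simplified: only the duration-tracking state from the diff."""

    def __init__(self) -> None:
        self._duration_us = 0
        self._all_durations_us: defaultdict[str, int] = defaultdict(int)

    def add_duration(self, action: str, start: float, stop: float) -> None:
        # These three lines are intentionally repeated in TaskPrefix as
        # well, trading DRY for skipping a shared-helper dispatch.
        duration_us = max(round((stop - start) * 1e6), 0)
        self._duration_us += duration_us
        self._all_durations_us[action] += duration_us


g = TaskGroup()
g.add_duration("compute", 1.0, 1.5)
print(g._duration_us)  # → 500000
```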
```diff
@@ -1033,7 +1015,7 @@ class TaskPrefix(TaskCollection):
     __slots__ = tuple(__annotations__)

     def __init__(self, name: str):
-        super().__init__(name)
+        TaskCollection.__init__(self, name)
```
I think it still makes some sense to keep the base class for common state initialization and getters, but we should try to avoid `super()`.
distributed/scheduler.py (outdated)
```python
@property
def types(self) -> Set[str]:
    """The result types of this collection"""
    return self._types.keys()

def update_nbytes(self, diff: int) -> None:
    self.nbytes_total += diff

@staticmethod
def _calculate_duration_us(start: float, stop: float) -> int:
```
TODO: this is dead (dispatch not worth it)
```diff
 def __len__(self) -> int:
-    return sum(self.states.values())
+    return self._size
```
I've seen plugins access this, and it can also be costly if calculated on demand.
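A minimal sketch of the cached-size approach, assuming a simplified state-counting collection (method names here are illustrative, not the scheduler's actual API):

```python
from collections import defaultdict


class TaskCollection:
    """Simplified sketch: tracks per-state counts plus a cached total."""

    def __init__(self) -> None:
        self.states: defaultdict[str, int] = defaultdict(int)
        self._size = 0

    def add(self, state: str) -> None:
        self.states[state] += 1
        self._size += 1  # keep the cached total in sync

    def transition(self, old: str, new: str) -> None:
        # Moving a task between states does not change the total size.
        self.states[old] -= 1
        self.states[new] += 1

    def __len__(self) -> int:
        # O(1) cached counter instead of sum(self.states.values()),
        # which plugins may call frequently.
        return self._size


c = TaskCollection()
c.add("processing")
c.add("processing")
c.transition("processing", "memory")
print(len(c))  # → 2
```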
Unit Test Results

See the test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

25 files +1, 25 suites +1, 10h 28m 45s ⏱️ +30m 13s

For more details on these failures, see this check. Results for commit e660bbf. ± Comparison against base commit 48509b3.

♻️ This comment has been updated with latest results.
Thanks, @fjetter
We refactored the TaskState a while back such that the TaskPrefix and TaskGroup classes are subclasses of TaskCollection and use similar interfaces to update their internal state. See #8681 and the linked ticket for context.
For large graphs (see the py-spy profile below of the high-level climatology workload from the coiled geo benchmarks), this class structure adds notable overhead.
scheduler.zip
This screenshot shows one of the many papercuts we're dealing with here (a left-heavy view of a client_releases_keys of about 1M tasks).
The state update/transition and other related state updates pop up all over the place: tiny papercuts that add up over time. I haven't measured how this will play out end to end; likely not a huge deal, but small things add up on these hot paths.
Micro benchmarks
The micro benchmarks (see above) show that ditching the indirection entirely speeds this up by a factor of 2. Not game-changing, but this abstraction is not worth any performance penalty.
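A minimal micro-benchmark sketch (not the benchmark used in this PR) that illustrates the kind of dispatch overhead being removed; class names are illustrative:

```python
import timeit


class Base:
    def __init__(self) -> None:
        self.nbytes_total = 0

    def update_nbytes(self, diff: int) -> None:
        self.nbytes_total += diff


class Dispatching(Base):
    def update(self, diff: int) -> None:
        super().update_nbytes(diff)  # extra method dispatch per call


class Inlined(Base):
    def update(self, diff: int) -> None:
        self.nbytes_total += diff  # attribute update done in place


d, i = Dispatching(), Inlined()
n = 1_000_000
t_dispatch = timeit.timeit(lambda: d.update(1), number=n)
t_inline = timeit.timeit(lambda: i.update(1), number=n)
# Absolute numbers vary by machine; the inlined version is typically
# noticeably faster since the += is cheaper than the extra call.
print(f"dispatch: {t_dispatch:.3f}s  inline: {t_inline:.3f}s")
```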