Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about hook call order #250

Open
tlambert03 opened this issue Jan 25, 2020 · 27 comments
Open

Question about hook call order #250

tlambert03 opened this issue Jan 25, 2020 · 27 comments

Comments

@tlambert03
Copy link
Contributor

❓ Question

I'm new to pluggy, and trying to decide whether to use it as the plugin manager in a project I work on (looks great!). One of the main things I find myself wanting more control over is the exact order in which implementations of a specific hook are called (beyond simply registering an @hookimpl with trylast or tryfirst). Specifically, I would ultimately like to let the end user reorder plugin implementations of specific hooks after plugin registration.

To accomplish that I find myself tempted to write methods to reorder _HookCaller._nonwrappers ... but because that is a private attribute, I am hesitant to do so and wonder if I am maybe misunderstanding a fundamental design principle here. Is that a valid approach, or discouraged?

@goodboy
Copy link
Contributor

goodboy commented Jan 26, 2020

Hey @tlambert03. The call time order is LIFO relative to registration as mentioned in the docs. So if you want a specific order currently you'd have to register hooks last to first in order to get that order when invoking. We don't have a public API for ordering currently but it might be something useful? I've thought about offering a generator API which allows for controlling which hooks are invoked by yielding them in the order desired - this might play with #50 potentially.

If you're concerned about order it may be more sensible to build a small *ordered hook protocol" much like pytest does. Instead of trying to control the order of invocation of a set of implementations for a single hook, make multiple hooks that are invoked in a particular order: If multiple implementations are provided for a hook you get all results in a list so if it's a matter of just reordering results then it should be mostly straight forward (though you might not know which element is from which implementation). I have feeling what you're actually after is something more along the lines of what we've discussed in #50.

Hope this helps!

@tlambert03
Copy link
Contributor Author

tlambert03 commented Jan 26, 2020

Thanks for the response! Definitely helpful to get your thoughts on this. I think my case is along the lines of #50 (though actually... after reading that, I'm now slightly confused about my understanding of the firstresult option). The ordered hook protocol is interesting, but I'm still not sure I see a full/ideal solution for my usage. In case it's helpful, I'll provide a little more background on our use case.

I'm contemplating using pluggy as the plugin manager for napari, a multidimensional image/data viewer. The first hook we're discussing would be an file reader plugin, where plugins can add support for different file formats. Plugins would implement a hook that accepts a filepath (str) and returns None if the path represents a file/directory/URL format that the plugin does not support, or a callback function capable of actually reading the data if it does recognize the format (I'm glossing over some side details, but that's the basic idea). Here are the reasons I wanted the consumer to be able to reorder the hooks:

  • there may be many plugins registered, yet we want to ensure that this particular hook runs very fast.
  • we want to use firstresult to ensure that as soon as one of the plugins recognizes the format, its callback function is used and the other plugins do not even get a chance to run their hooks.
  • multiple plugins will likely recognize/support a given file, so we'd very much like the end-user to be able to tune the order of hook calls to suite their use case (we would provide a GUI for them to do so), potentially even reordering multiple times after plugins have been registered.

it's not immediately clear how to use an ordered hook protocol to allow the end user to reoderd the hooks, while also using the firstresult idea to immediately terminate the hook chain once a plugin claims that file. If you agree that this is a use case that can't be cleanly accomplished at the moment and would consider a PR, I can think about it some more and potentially submit one. Thanks again!

(sorry accidentally hit the close button).

@elchupanebrej
Copy link

Hey @tlambert03 , @goodboy

Maybe the next kind of hookimpl API could be helpful:

from pluggy import PluginManager, HookimplMarker

hookimpl = HookimplMarker("myproject")


class Plugin1:
    @hookimpl(before=['Plugin3'])
    def myhook(self, args):
        return 1


class Plugin2:
    @hookimpl(before=['Plugin1'])
    def myhook(self, args):
        return 2


class Plugin3:
    @hookimpl(after=['Plugin2'])
    def myhook(self, args):
        return 3


pm = PluginManager("myproject")
pm.register(Plugin1())
pm.register(Plugin2())
pm.register(Plugin3())

assert pm.hook.myhook(args=()) == [2, 1, 3]

Implementation could be pretty easy based on a projection of partially ordered set to strict order. Tryfirst/trylast also could be in place but have to be less strict than the defined order. Please share your thoughts.

@RonnyPfannschmidt
Copy link
Member

Having topologies is a long desired feature, what's lacking is a actually sensible implementation

@AvlWx2014
Copy link

AvlWx2014 commented Sep 15, 2022

What about introducing public API to allow subclasses of PluginManager (or through constructor parameters to PluginManager instead) to provide a custom hook execution function? With some hand-holding from pluggy I think this would help address running plugins in any order clients want without clients being able to go wrong easily, and without necessarily having to overhaul PluginManager.

For example, I'm working on being able to execute plugins that have dependencies on other plugins. Right now I have a subclass of PluginManager that maintains the set of registered plugins in a DAG and overrides self._inner_hookexec with its own function that executes plugins in a different order.

Example:

class DagPluginManager(PluginManager):
    def __init__(self, project_name):
        super().__init__(project_name)
        self._dag = SimpleDirectedGraph()
        self._inner_hookexec = self._dag_exec
        
    def _dag_exec(self, _, hook_impls, caller_kwargs, __):
        # execute plugins in custom order based on graph

Obviously this approach is pretty fragile since it relies on internal API and some implementation details, which is why I want to help implement something that isn't fragile. Wdyt?

@RonnyPfannschmidt
Copy link
Member

Having a dag of plugins both for conditional and for unconditional activation and ordering is a long wanted feature

Unfortunately there is a number of caveats and not enough time

Im happy to provide input, but I can't work on it myself anytime soon

@AvlWx2014
Copy link

Rather than overhauling PluginManager to use a dag (or adding another PluginManager implementation which uses a dag), I was suggesting it could be helpful to simply allow client code to inject their own exec function to substitute for multicall. I'm basically doing this right now, but it's not public API. If you think that's useful I can submit a WIP PR for it, but if not I could try the DAG approach.

@RonnyPfannschmidt
Copy link
Member

instead of replacing multicall, i would suggest to have something that orders the items added baed on a better topology

right now we have tryfirst/trylast and the hookwrapper vs normal hook order

whats needed as well would be to have something like before/after/requires to add the extra items

@pirate
Copy link

pirate commented Sep 27, 2024

Would ya'll be open to me submitting a new PR to add two new methods inspired by: pluggy.PluginManager.subset_hook_caller()?

  • PluginManager.ordered_hook_caller(name, plugin_order=[...])
  • PluginManager.reversed_hook_caller(name)

These would allow the user to specify a plugin order at calltime, just like they can specify a subset currently. You can then use any arbitrary other code to determine the topology, and simply pass it via the plugin_order=... arg.

class PluginManager:
    ...
    
    def ordered_hook_caller(
        self, name: str, plugin_order: Iterable[_Plugin]
    ) -> HookCaller:
        """Return a proxy :class:`~pluggy.HookCaller` instance for the named
        method which manages calls to the given plugins in the given order."""
        orig: HookCaller = getattr(self.hook, name)
        return _OrderedHookCaller(orig, plugin_order)

    def reversed_hook_caller(self) -> HookCaller:
        """Return a proxy :class:`~pluggy.HookCaller` instance for the named
        method which manages calls to the all plugins in reverse (FIFO) order."""

		return self.ordered_hook_caller(plugin_order=reversed(self.get_plugins()))
class _OrderedHookCaller(HookCaller):
    """A proxy to another HookCaller which manages calls to all registered
    plugins in a fixed order passed as an argument. (based on _SubsetHookCaller)"""

    __slots__ = ("_orig", "_plugin_order")

    def __init__(self, orig: HookCaller, plugin_order: AbstractSet[_Plugin]) -> None:
        self._orig = orig
        self._plugin_order = plugin_order
        self.name = orig.name  # type: ignore[misc]
        self._hookexec = orig._hookexec  # type: ignore[misc]

    @property  # type: ignore[misc]
    def _hookimpls(self) -> list[HookImpl]:
		return sorted(
			self._orig._hookimpls,
			key=lambda order, _impl: self._plugin_order.index(impl.plugin)
		)

    @property
    def spec(self) -> HookSpec | None:  # type: ignore[override]
        return self._orig.spec

    @property
    def _call_history(self) -> _CallHistory | None:  # type: ignore[override]
        return self._orig._call_history

    def __repr__(self) -> str:
        return f"<_OrderedHookCaller {self.name!r}>"

@RonnyPfannschmidt
Copy link
Member

I need a better explanation on the wins here,as proposed here I consider it a -1 as it adds confusion to invocation

@pirate
Copy link

pirate commented Sep 27, 2024

The benefit is the ability to call hookimpls in reverse (FIFO) order.

To start with a really simple example, lets say your plugins log some console messages to stdout on startup, and the user has defined their plugins to be ['plugin1', 'plugin2', 'plugin2'].

Currently if each plugin defines a hook like def log_on_startup(): print('Plugin1 has started!'), etc., then calling pm.hook.log_on_startup will confuse the user as it will print:

Plugin3 has started!
Plugin2 has started!
Plugin1 has started!

(reversed from the order they defined them)

You wouldn't want to reverse the actual plugin order just to make the log lines print in order, because you do actually want the later plugin3 to override anything it clashes with from the earlier plugin1 for example. (and that is also what the user would expect given that order, last plugin listed wins)

In this scenario it makes sense to have an optional HookCaller that can invoke hookimpls in reverse, that way we could do:

if __name__ == '__main__':
    pm.reversed_hook_caller().log_on_startup()

and it would print:

Plugin1 has started!
Plugin2 has started!
Plugin3 has started!

(matching the order the user defined their plugins, and keeping last-plugin-wins behavior intact)

There are many other cases too where having reverse order is helpful: e.g. when later plugins depend on side effects (filesystem writes, db changes, etc.) triggered by earlier plugins.

Would you prefer a different name for the function? Perhaps pm.fifo_hook_caller() or maybe pm.hook_fifo()?


Custom order is also useful in many cases, e.g. if plugin1 writes to stdout and runs really quickly and plugin2 writes to a remote filestore and is quite slow, we may want to run plugin1 before plugin2 so that the user sees some progress quickly.

Another scenario is where each plugin exposes some function that operates on some object, and the order is user-customizable at runtime using a flag on the object, we need a way to override plugin order to match the user-provided order.

Basically you can imagine any scenario where subset_hook_caller() is useful, there is probably a similar scenario where a ordered_hook_caller() is also useful (and they might even be needed together too).

@RonnyPfannschmidt
Copy link
Member

the tryfirst and trylast arguments to hookimpl are the baseline control for order

adding call site controlled order as a api seems like a point of confusion as all call sites must remember to call the correct hook invocation

the outlined use-case looks like something that can solve with tryfirst/last as a approximation, while not a good/perfect solution, introducing call site order control is a api that users will get wrong way more easy

@RonnyPfannschmidt
Copy link
Member

the key problem with "being able to" in the proposed case is "and also having to ensure/remember to do it in the cases where needed"

@pirate
Copy link

pirate commented Sep 27, 2024

the outlined use-case looks like something that can solve with tryfirst/last as a approximation

None of the examples I gave would really be solved by tryfirst/trylast, as none are actually about being last or first, reversed_ is about maintaining the originally defined relative order / and ordered_ is for customizing relative order at call-time (which tryfirst/trylast cant do).

as all call sites must remember to call the correct hook invocation

Note neither of the two proposed funcs are designed to be used unless they're actually needed, they're explicitly long-named helper functions provided as escape hatches like pm.subset_hook_caller(), and are much less ergonomic than pm.hook.xyz, so I'm not sure I understand the concern with "having to remember which one to use". I was imagining they'd only get a tiny section in the docs explaining the edge case where they'd be needed, just like subset_hook_caller. These are not features I'd direct people to use until they've already tried tryfirst/trylast, but if you need them, there is not really any other option.


My personal use-case is that I've pluginized ArchiveBox, and there are dozens of plugins with lots of different behavior, including user-defined plugins. There's a whole directed graph of dependencies between plugins that can be changed at runtime, and the plugins dont know about their relative order at build time so tryfirst/trylast doesn't work. Controlling relative order at calltime is critical for the system to work.

For example, three of my plugins produce various html outputs when given a URL (singlefile, dom, and wget). Some later plugins depend on whatever html has been collected so far and parse it for article text (using readability/mercury). The user may want to parse article text from only the first html output, so they could order the plugins ['singlefile', 'readability', 'wget', 'dom']. That way readability only runs on the output from singlefile, but wget and dom still run after that. last/first semantics are not enough to express the order here, as:

['singlefile', 'readability', 'wget', 'dom'] (this generates 1 text dump)
!=
['singlefile', 'wget', 'dom', 'readability'] (this will generate 3 article text dumps)
!=
['readability', 'singlefile', 'wget', 'dom'] (this will generate 0 article text dumps)

Now scale this up to 15+ different plugins varying user-by-user depending on what they install + exposed in a UI that allows re-ordering their behavior on each invocation, and it becomes essential to be able to manage order at the callsite.

image
(and this is just the start, I'm building it to support a public marketplace with an ecosystem of hundreds of plugins in the future, like HomeAssistant/NextCloud/etc.)

@pirate
Copy link

pirate commented Sep 27, 2024

This may be a more elegant way to do this @RonnyPfannschmidt: #485 (comment)

@RonnyPfannschmidt
Copy link
Member

Thanks for outlining your use case

There are a few problems

  1. There's no defined order and I'm hesitant to add one, in particular if it's dynamic
  2. I need clarification on the call tree, my initial understanding implies that hook calls nest based on order

I'd expect collection and processing to be part of a pipeline

For example pytest splits collection and test running to be able to reorder or remove test items

@pirate
Copy link

pirate commented Sep 28, 2024

  1. There's no defined order and I'm hesitant to add one, in particular if it's dynamic

Perhaps the docs should be updated to reflect that it's less formalized then? Right now it pretty strongly indicates that hooks are called in LIFO order (based on registration): https://pluggy.readthedocs.io/en/stable/index.html#call-time-order (with the exception of tryfirst/trylast marked hooks and wrapper hooks)

  1. I need clarification on the call tree, my initial understanding implies that hook calls nest based on order

The call tree is linear in my case, not nested, the invocation is just one top-level pm.hook.extract_url(url) which triggers all the extractors in serial order, and they share a common state directory for all output. No hookimpls call other hookimpls.

For example pytest splits collection and test running to be able to reorder or remove test items

I would love to do that but I cannot find a function to get the hookimpl for a given plugin #485 (comment). I feel like that would be a great solution though, as then call order/subsetting/etc. could be arbitrarily controlled at the callsite. Can I submit a PR to add PluginManager.get_hookimpl(plugin_name, hook_name, fallback=None)?

Alternatively, for the least new conceptual surface area, the same could be accomplished by adding a function on HookCaller like HookCaller.get_hookimpls_by_plugin(hook_name) that returns a dictionary of {plugin_name: hookimpl}. It would behave exactly like the existing HookCaller.get_hookimpls, but just return a dict instead of a list.

@RonnyPfannschmidt
Copy link
Member

the results not telling the origin of value are to me a necessary help in decoupling
if you need to know the origin of a value to make sense of it, then there is a design issue

get_hookimpls_by_plugin is unsafe as a plugin may register multiple hooks for the same hook-name

i did some re-reading of the ideas on how reordering changes the number and types of outputs,

its a very terrifying detail in some sense - ideally the final outcome is not dependent on hook order

my basic understanding is that the plan here is to encode a processing pipeline into the hook order, which is kind of terrifying

however i do agree that providing a reordered hook caller for pluggy is helpful

the idea being that we get a way to get a reordered hook caller that deals with priorities

def reordered_hook_caller(
        self, name: str, remove_plugins: Iterable[_Plugin].  prepend_plugins: Iterable[_Plugin].  append_plugins: Iterable[_Plugin],
        #todo: figure more sane arg names/semantics
    ) -> HookCaller:
    # fetch the hook callers, and while respecting tryfirst/trylast - reoder based on remove, prepend and append
    ...

@pirate
Copy link

pirate commented Oct 1, 2024

if you need to know the origin of a value to make sense of it, then there is a design issue

Sure if you only have 3 or 4 plugins that are all designed to play nice with each other. But imagine a Home Assistant / Nextcloud ecosystem where you have hundreds. The more complexity is implemented with plugins the more clear it becomes that there will be situations where you have many outputs in a pile that need to be identified, or situations where plugins need to be skipped or reordered at runtime.

self, name: str, remove_plugins: Iterable[_Plugin].  prepend_plugins: Iterable[_Plugin].  append_plugins: Iterable[_Plugin],

I don't think pluggy needs to implement all of this internally, especially if you're averse to encouraging this pattern in the first place, couldn't pluggy just expose a HookCaller/PluginManager.get_plugin_hooks(plugin: _Plugin, hook_name: str) that returns a list of hookimpls and let the caller deal with it? That seems to conflict the least with the philosophy you're expressing of avoiding pluggy-sanctioned ordering.

@RonnyPfannschmidt
Copy link
Member

again - pluggy is not designed to run a processing graph make it compose one instead of running one

if you have situation dependent distinct needs for different graphs of the processing - hooks are thew wrong tool, i refuse to enable them to do something so distinct -

there ought to be a single topology of hooks at max - if more is needed, then use hooks to compose it instead of hooks to execute it

@RonnyPfannschmidt
Copy link
Member

the use-case you outline literally reads like "i will have hundreds of practically hostile to each other plugins and i want the plugin framework to solve that" to me

make hook that provide the configured processors, and then have your tool sort them into a sensible topology instead

@pirate
Copy link

pirate commented Oct 5, 2024

I think you may have skipped over the second part of my comment. I'm explicitly not asking pluggy to solve any of my topolgy issues.

At this point I just want a very basic function: HookCaller/PluginManager.get_plugin_hooks(plugin: _Plugin, hook_name: str) (or get_hookimpls_by_plugin returning a list for each)

This just returns the list of hookimpls that given plugin has registered that match a given hook name, nothing more than that. No topology solving, no new ordering code added to pluggy. I will implement all the topolgy solving on my own end.

@RonnyPfannschmidt
Copy link
Member

Am i correct in assuming that there will not be a correct was to invoke such a hook in the normal way, similar to how historic hooks habe constraints

@pirate
Copy link

pirate commented Oct 5, 2024

If we return a bound closure method for each hook referencing its self: HookCaller then it should work, because it'll retain self._call_history/self.is_historic as usual. Is there any other code path that would prevent it from just being called like a normal function after that?

@RonnyPfannschmidt
Copy link
Member

The Feature is not sanely possible for historic hooks

My intent is to make sure reorder required hooks cannot be invoked without applying the controll

@pirate
Copy link

pirate commented Oct 5, 2024

What if the function returned the control order with the hooks and left it to the caller:

class HookCaller:
    def hooks_by_plugin(self, hook_name: str):
        return { # or a NamedTuple or Tuple work fine too
            'tryfirst': {
                'plugin_name': [hookimpl1, hookimpl2, ...],
                'plugin_name2': [hookimpl1, hookimpl2, ...],
			     ...,
			},
			'main': {
			    'plugin_name': [somehookimpl, ...],
                'plugin_name2': [somehookimpl, ...],
			     ...,
			},
			'trylast: {
				'plugin_name': [someotherhookimpl, ...],
                'plugin_name2': [someotherhookimpl, ...],
			     ...,
		    },
	    }

That would actually help me display it in a UI very nicely, as I could show the user that there are before/after hooks and enforce that they don't reorder those.

@RonnyPfannschmidt
Copy link
Member

This can be implemented in terms of get hookimpls, im opposed to encode the ux target in pluggy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants