[Feature Request] Lazily retrieve pipeline stages from PSR-11 container #55
Comments
This is already possible with the current implementation.
The "only" thing that needs to be changed is to remove the typehint from … See working example: https://3v4l.org/3qGGe
Yes, that's roughly what the implementation could look like! To clarify, the main reason I raised the feature request is to explore whether the maintainers are interested in having this functionality (along with docs, tests, etc.) contributed as part of Pipeline. If not, users can always build the abstraction themselves.
I don't think it should be implemented in this package, as it's already possible to provide this. What I could imagine is a pipeline-psr11 package of some sort. This way you can provide an extension to certain other implementations.
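For illustration, here is one userland way to get lazy stages without widening the callable stage type: wrap each container lookup in a closure, so the pipeline only ever receives callables and each stage is fetched on its first call. This is a sketch, not necessarily what the linked 3v4l example does. It assumes PHP 8.0+, uses a local interface mirroring the get()/has() methods of Psr\Container\ContainerInterface so it runs without Composer, and uses array_reduce as a stand-in for running a pipeline; ArrayContainer and lazyStage are hypothetical names.

```php
<?php
declare(strict_types=1);

// Local stand-in for the two methods of Psr\Container\ContainerInterface.
interface ContainerInterface
{
    public function get(string $id);
    public function has(string $id): bool;
}

// Hypothetical container: each entry is a factory that runs on the first get().
final class ArrayContainer implements ContainerInterface
{
    private array $instances = [];

    /** @param array<string, Closure> $factories */
    public function __construct(private array $factories) {}

    public function get(string $id)
    {
        if (!$this->has($id)) {
            // A real PSR-11 container throws a NotFoundExceptionInterface here.
            throw new InvalidArgumentException("No entry for '$id'");
        }
        // Build on first request, then reuse the cached instance.
        return $this->instances[$id] ??= ($this->factories[$id])();
    }

    public function has(string $id): bool
    {
        return array_key_exists($id, $this->factories);
    }
}

// Hypothetical helper: a plain callable that resolves the stage only when invoked.
function lazyStage(ContainerInterface $container, string $id): callable
{
    return fn($payload) => $container->get($id)($payload);
}

// Usage: factories record when they run, standing in for expensive setup.
$built = [];
$container = new ArrayContainer([
    'stage.trim' => function () use (&$built) {
        $built[] = 'stage.trim';
        return fn(string $s): string => trim($s);
    },
    'stage.upper' => function () use (&$built) {
        $built[] = 'stage.upper';
        return fn(string $s): string => strtoupper($s);
    },
]);

$stages = [lazyStage($container, 'stage.trim'), lazyStage($container, 'stage.upper')];
$builtBeforeRun = $built; // nothing has been constructed yet

// array_reduce stands in for running the pipeline over the stages.
$result = array_reduce($stages, fn($payload, callable $stage) => $stage($payload), '  hello ');
```

Because this container caches instances, running the same stages again reuses the objects it already built; whether a stage is shared or rebuilt per run then becomes a container configuration choice rather than a pipeline one.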
Description
Pipelines are great for data processing. However, there may be cases where the data fed into the pipeline is invalid, causing one of the stages to fail. That means there can be quite a few pipeline stages that we have loaded, configured, et cetera, that are never going to be called. We can optimize performance in these cases by lazily initializing pipeline stages.
Instead of coming up with some bespoke interface to do so, we can instead delegate this to an existing PSR-11 container implementation. PSR-11 can be considered quite mature at this point, and seems like a good match.
So, instead of doing:
We might have something like:
What problem does this solve?
As mentioned, lazy loading can do a lot for performance in larger applications. This idea came up because my application has a data processing pipeline with various stages that can fail. There are also (class-based) stages that interact with a remote database, use configuration files, etc., which are expensive to initialize.
The cleanest way to write these stages would usually be a simple class whose dependencies are passed to the constructor, with initialization such as preparing SQL statements or parsing a configuration file done in the constructor as well. The __invoke() method is then ready to just do its work. However, that setup is expensive: not only the initialization that happens within the stage itself, but also the stage's own dependencies need to be resolved already. For example, if a stage depends on a PDO object to do its database work, we need to have already set up a connection to the database.
That means that if the pipeline is processing a payload that fails during the very first stage (e.g. a validation step fails), we have already done the expensive initialization for all the stages that follow it, even though they are never going to be invoked.
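To make the cost argument concrete, here is a small self-contained sketch (PHP 8.0+; all class and function names are hypothetical). Each stage records when its constructor runs, standing in for expensive setup such as opening a database connection. The payload fails validation in the first stage: with eager wiring both stages are constructed anyway, while with lazy wiring only the validation stage is built. The process() loop is a stand-in for a pipeline run, not Pipeline's own code.

```php
<?php
declare(strict_types=1);

// Records which stage constructors actually ran (hypothetical bookkeeping).
final class Log
{
    public static array $constructed = [];
}

// Hypothetical first stage: rejects payloads without an id.
final class ValidateStage
{
    public function __construct()
    {
        Log::$constructed[] = 'validate';
    }

    public function __invoke(array $payload): array
    {
        if (!isset($payload['id'])) {
            throw new InvalidArgumentException('Payload has no id');
        }
        return $payload;
    }
}

// Hypothetical later stage whose constructor stands in for expensive setup
// (opening a database connection, preparing statements, parsing config).
final class SaveStage
{
    public function __construct()
    {
        Log::$constructed[] = 'save';
    }

    public function __invoke(array $payload): array
    {
        return $payload + ['saved' => true];
    }
}

// Stand-in for a pipeline run: feeds the payload through each stage in order.
function process(array $stages, array $payload): array
{
    foreach ($stages as $stage) {
        $payload = $stage($payload);
    }
    return $payload;
}

$invalid = ['name' => 'missing id'];

// Eager wiring: every stage object exists before the payload is inspected.
Log::$constructed = [];
$eager = [new ValidateStage(), new SaveStage()];
try {
    process($eager, $invalid);
} catch (InvalidArgumentException $e) {
    // validation failed; SaveStage was built for nothing
}
$eagerBuilt = Log::$constructed;

// Lazy wiring: a stage is constructed only when the run reaches it.
Log::$constructed = [];
$lazy = [
    fn(array $p): array => (new ValidateStage())($p),
    fn(array $p): array => (new SaveStage())($p),
];
try {
    process($lazy, $invalid);
} catch (InvalidArgumentException $e) {
    // validation failed before SaveStage was ever needed
}
$lazyBuilt = Log::$constructed;
```

These lazy closures construct a fresh stage on every call; a container would normally cache the instance, which is part of why delegating to PSR-11 is attractive.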
(A currently possible workaround is to pass a container instance into the stages and have them lazily load their dependencies and do their setup whenever the stage is first invoked. This adds a lot of complexity to the stages, and passing a container around like that is a bit of an anti-pattern. Solving this within the Pipeline abstraction would generally make for much nicer code.)
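A rough sketch of what that workaround tends to look like, and why it is noisy: the stage holds the whole container and has to check, resolve and cache its own dependency inside __invoke(). All names are hypothetical, ContainerInterface is a local stand-in mirroring PSR-11's get()/has(), and a stdClass with a rows array stands in for a PDO connection so the example runs without a database (PHP 8.0+).

```php
<?php
declare(strict_types=1);

// Local stand-in for the two methods of Psr\Container\ContainerInterface.
interface ContainerInterface
{
    public function get(string $id);
    public function has(string $id): bool;
}

// Hypothetical container with a single entry; counts how often it is asked.
final class FakeContainer implements ContainerInterface
{
    public int $lookups = 0;
    private ?stdClass $db = null;

    public function get(string $id)
    {
        if (!$this->has($id)) {
            throw new InvalidArgumentException("No entry for '$id'");
        }
        $this->lookups++;
        // Stand-in for opening a PDO connection; done once, on first request.
        if ($this->db === null) {
            $this->db = new stdClass();
            $this->db->rows = [];
        }
        return $this->db;
    }

    public function has(string $id): bool
    {
        return $id === 'db.connection';
    }
}

// Hypothetical stage under the workaround: it receives the container itself
// (service-locator style) and defers resolving its dependency to first use.
final class SaveToDatabaseStage
{
    private ?stdClass $connection = null;

    public function __construct(private ContainerInterface $container) {}

    public function __invoke(array $payload): array
    {
        // The check-resolve-cache boilerplate every such stage must repeat.
        $this->connection ??= $this->container->get('db.connection');
        $this->connection->rows[] = $payload;
        return $payload;
    }
}

$container = new FakeContainer();
$stage = new SaveToDatabaseStage($container);
$lookupsBeforeRun = $container->lookups; // constructing the stage connected nothing

$stage(['id' => 1]);
$stage(['id' => 2]);
```

The laziness works, but it is paid for with container coupling inside every stage, compared with a stage that simply receives its connection as a constructor argument.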
Brainstorm: If we implement this, how?
callable|string: if it's a callable, it is used directly as a stage. If it's a non-callable string, then it's used as a key to retrieve the stage from the container.
… ContainerAwarePipeline(Builder) (as in my example above). We would still need to widen the callable type used in the interfaces.
I'd be happy to do the initial work and make a pull request if the Pipeline maintainers are interested in having this kind of functionality added.
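A rough sketch of the callable|string rule, assuming PHP 8.0+ (for the union type and throw expressions): a hypothetical ContainerAwarePipeline stores stages as given and resolves each one only when the run reaches it. Callables are used directly; any other string is treated as a container id. The container is a local stand-in mirroring PSR-11's get()/has(); none of these names are part of the Pipeline package.

```php
<?php
declare(strict_types=1);

// Local stand-in for the two methods of Psr\Container\ContainerInterface.
interface ContainerInterface
{
    public function get(string $id);
    public function has(string $id): bool;
}

// Hypothetical container: builds each entry from its factory on first get().
final class FactoryContainer implements ContainerInterface
{
    public array $built = []; // ids in the order they were constructed
    private array $instances = [];

    public function __construct(private array $factories) {}

    public function get(string $id)
    {
        if (!$this->has($id)) {
            throw new InvalidArgumentException("No entry for '$id'");
        }
        if (!array_key_exists($id, $this->instances)) {
            $this->built[] = $id;
            $this->instances[$id] = ($this->factories[$id])();
        }
        return $this->instances[$id];
    }

    public function has(string $id): bool
    {
        return array_key_exists($id, $this->factories);
    }
}

// Hypothetical pipeline accepting callable|string stages, resolved at run time.
final class ContainerAwarePipeline
{
    private array $stages = [];

    public function __construct(private ContainerInterface $container) {}

    // Returns a copy, so an already-built pipeline is never mutated.
    public function pipe(callable|string $stage): self
    {
        $pipeline = clone $this;
        $pipeline->stages[] = $stage;
        return $pipeline;
    }

    public function process(mixed $payload): mixed
    {
        foreach ($this->stages as $stage) {
            $payload = $this->resolve($stage)($payload);
        }
        return $payload;
    }

    private function resolve(callable|string $stage): callable
    {
        // Callables (including strings naming a function) are used directly.
        if (is_callable($stage)) {
            return $stage;
        }
        // Any other string is a container id, looked up only at this point.
        $resolved = $this->container->get($stage);
        if (!is_callable($resolved)) {
            throw new UnexpectedValueException("Entry '$stage' is not callable");
        }
        return $resolved;
    }
}

// Usage: two container-managed stages mixed with plain function-name callables.
$factories = [
    'stage.validate' => fn() => fn(string $s): string => $s !== '' ? $s : throw new InvalidArgumentException('Empty payload'),
    'stage.exclaim' => fn() => fn(string $s): string => $s . '!',
];

$container = new FactoryContainer($factories);
$pipeline = (new ContainerAwarePipeline($container))
    ->pipe('trim')           // callable function name: used as-is
    ->pipe('stage.validate') // not callable: fetched from the container
    ->pipe('strtoupper')
    ->pipe('stage.exclaim');

$builtBeforeRun = $container->built; // wiring alone resolves nothing
$result = $pipeline->process('  hello ');

// An invalid payload stops at validation, so 'stage.exclaim' is never built.
$failing = new FactoryContainer($factories);
try {
    (new ContainerAwarePipeline($failing))->pipe('stage.validate')->pipe('stage.exclaim')->process('');
} catch (InvalidArgumentException $e) {
    // expected: the empty payload is rejected
}
```

One consequence of the "non-callable string" rule shows up in the usage: 'trim' and 'strtoupper' run as PHP functions, so a container id that happened to match a global function name would never reach the container. Dotted ids such as 'stage.validate' sidestep that, and the ambiguity may be an argument for the separate ContainerAwarePipeline(Builder) option.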