Package upload plugin #8698
I would be very interested in such a plugin; there is currently no easy way to use a Dask cluster in "development" mode, that is, when you are working on unpackaged source code.
I spent a few days dealing with these Python package upload issues and env/dependency management. I came up with this solution to upload my CWD project, and it seems to be working. dask_plugins.py:
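Roughly, a minimal sketch of the idea, assuming the project root is the current working directory; the class name and the unpack location on the workers are illustrative, not the exact code:

```python
# dask_plugins.py -- minimal illustrative sketch
import io
import os
import zipfile

from distributed.diagnostics.plugin import WorkerPlugin


class UploadProject(WorkerPlugin):
    """Zip the local project directory on the client and unpack it on every worker."""

    def __init__(self, project_dir="."):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as archive:
            for root, _dirs, files in os.walk(project_dir):
                # Skip virtual environments and bytecode caches to keep the archive small.
                if ".venv" in root or "__pycache__" in root:
                    continue
                for name in files:
                    path = os.path.join(root, name)
                    archive.write(path, os.path.relpath(path, project_dir))
        self.data = buffer.getvalue()

    def setup(self, worker):
        import sys
        import tempfile

        target = os.path.join(tempfile.gettempdir(), "uploaded_project")
        with zipfile.ZipFile(io.BytesIO(self.data)) as archive:
            archive.extractall(target)
        if target not in sys.path:
            sys.path.insert(0, target)
```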
And a file using the Dask client:
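Something along these lines; the scheduler address, project module, and function names are placeholders:

```python
# run_pipeline.py (illustrative)
from distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Make dask_plugins importable on the remote side before the plugin instance
# is deserialized there, then register the plugin itself.
client.upload_file("dask_plugins.py")

from dask_plugins import UploadProject

client.register_plugin(UploadProject("."))  # on older versions: client.register_worker_plugin(...)

# Code from the freshly uploaded project can now run inside tasks.
from my_project.pipeline import transform  # placeholder module

future = client.submit(transform, 42)
print(future.result())
```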
I think all dependency uploaders now need to upload dependencies to BOTH the scheduler and the workers due to the new serialization changes (mentioned here): #7797
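A rough sketch of what the scheduler-side half could look like, assuming the same zipped-source approach as the worker plugin above; the class name is illustrative:

```python
import io
import os
import sys
import tempfile
import zipfile

from distributed.diagnostics.plugin import SchedulerPlugin


class UploadProjectToScheduler(SchedulerPlugin):
    """Unpack the zipped project source on the scheduler process as well."""

    def __init__(self, data: bytes):
        self.data = data  # the zip archive built on the client

    async def start(self, scheduler):
        target = os.path.join(tempfile.gettempdir(), "uploaded_project")
        with zipfile.ZipFile(io.BytesIO(self.data)) as archive:
            archive.extractall(target)
        if target not in sys.path:
            sys.path.insert(0, target)
```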
I've added our implementation in PR #8884
Dask's remote task execution is very straightforward when a function does not depend on an external package. However, the most common use case relies on existing installed libraries or project packages. There are two kinds of dependencies: third-party packages installed into the environment, and the project's own local source code.
To deliver both kinds of dependencies to workers, we do the following: we export the project's pinned requirements with
poetry export -f requirements.txt --output requirements.txt
and build a Docker image remotely using the Kubernetes driver. The PipInstall plugin is another way to do this, but it can slow cluster start-up down to several minutes; in our case the cluster starts in under a minute once the image has been warmed up on the Kubernetes nodes (see the PipInstall sketch below).

While we have successfully solved the delivery of extra dependencies to remote worker nodes, it requires a deep understanding of Dask cluster deployment and extra helper functions that do not come with Dask out of the box. I propose improving the developer experience in this direction, and I would focus on local source delivery to worker nodes first.
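For reference, the PipInstall route mentioned above looks roughly like this; the scheduler address and package pins are placeholders:

```python
from dask.distributed import Client, PipInstall

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Every worker pip-installs the listed packages when the plugin is registered;
# simple, but heavy dependency sets can add minutes to cluster start-up.
client.register_plugin(PipInstall(packages=["my-package==1.2.3", "pandas==2.2.2"]))
```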
To be more specific:

- Introduce upload_package(module: ModuleType) as a complementary function to the existing upload_file(path).
- upload_package() should also be able to replace .venv packages, such as Dask-specific modules, on remote worker nodes, which should simplify the debug process. In the scope of the #11160 investigation I have already proved that this is possible (please see "Can not process datasets created by the older version of Dask" dask#11160 (comment)).

We already have a working prototype of the Worker/Scheduler plugin that performs all of the above. If there is demand for such a plugin, we look forward to contributing our source. Any comments and suggestions are very welcome 🤗
Here are some usage examples:
Uploading the project source to all workers:
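A hypothetical illustration of the call; upload_package() is the proposed API (not part of Dask today), and my_project stands in for a local package:

```python
import my_project  # placeholder local package
from distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Proposed API: ship the package's source tree to the scheduler and all workers.
client.upload_package(my_project)

future = client.submit(my_project.some_function, 1)  # placeholder function
print(future.result())
```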
We can replace part of the Dask source on all worker nodes for debugging purposes:
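Again hypothetically, the same proposed call could push a locally patched Dask subpackage to the workers:

```python
import dask.dataframe
from distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder address

# Proposed API: replace the installed dask.dataframe sources on every worker
# with the locally modified copy, e.g. one with extra logging for debugging.
client.upload_package(dask.dataframe)
```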
Here is an example of an adjusted function: dask/dask#11160 (comment)