Failing to deserialize a Dask Dataframe on a distributed cluster #8038
The issue was fixed by manually installing msgpack.
It's June 2024, and I am unable to install/update msgpack-python to its current version 1.0.8. I am using the commands:
We are testing against msgpack-python==1.0.0 and the most recent version. If anybody can provide us with a reproducer, that would be helpful, and we would also adjust pinning accordingly to avoid this problem. I tested against the CSV reproducer above but could not reproduce the issue.
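One way to spot a mismatched msgpack between the client environment and the workers is to compare the versions directly. This is only a hedged sketch; the scheduler address below is a placeholder, not taken from the original report:

```python
# Sketch: compare the msgpack version on the client with the version each
# worker reports. The scheduler address is a placeholder.
import msgpack
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")

def worker_msgpack_version():
    import msgpack
    return msgpack.version

print("client msgpack:", msgpack.version)
print("worker msgpack:", client.run(worker_msgpack_version))
```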
Background
I'm trying to load a CSV file on a SLURM cluster and convert it to Parquet, but every time I perform this conversion on the cluster I receive errors on the worker side. This happens in both a Jupyter notebook and a normal Python script. Miniconda3 and all of the code are stored in a directory shared across the controller and worker nodes of the cluster.
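For reference, a minimal sketch of that workflow, assuming dask-jobqueue for the SLURM cluster; the resource settings and paths below are placeholders, not the original code:

```python
# Sketch only: CSV -> Parquet conversion on a SLURM cluster via dask-jobqueue.
# Cores, memory, walltime, and paths are illustrative placeholders.
import dask.dataframe as dd
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=4, memory="16GB", walltime="01:00:00")
cluster.scale(jobs=4)                          # launch four SLURM worker jobs
client = Client(cluster)

df = dd.read_csv("/shared/data/input-*.csv")   # path on the shared filesystem
df.to_parquet("/shared/data/output_parquet/")  # conversion runs on the workers
```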
EDIT: I've tried reading the CSV with LocalCluster() as well, and still get the exact same error.

Example Code
EDIT: Here's a more reproducible example than my previous one. Upon running this, I still get the same error as I did when I was reading data from cloud storage.
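The original snippet isn't preserved in this text; the following is only an illustrative sketch of the same read-then-compute pattern on a local cluster, with placeholder paths and worker counts:

```python
# Illustrative sketch of the reproduction pattern (not the original example):
# read a CSV with distributed workers and materialize the result.
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)

df = dd.read_csv("data-*.csv")   # placeholder path
result = df.compute()            # the worker-side deserialization error appears here
```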
The following error report is generated upon execution of df.compute():

Additional Info
I have checked the version of Dask across my cluster with client.get_versions(check=True) (per the suggestion from @mrocklin in #2124), which outputs the following:

I have also confirmed that df.compute() works fine without using dask.distributed (albeit I run out of memory, but no deserialization errors are thrown).

Environment
Dask version: 2023.7.1
Python version: 3.11.4
Operating System: Linux
Install method: conda