Use Xfuse in Windows #30

Yao-14 · 2021-10-25T02:09:03Z

Hi there,
When I use Xfuse in Windows, I encounter the following error.

🚨 ERROR : RuntimeError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 4.00 GiB total capacity; 1.75 GiB already allocated; 0 bytes free; 1.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But when I check the available memory of the GPU, it shows that there is still 3GB of unused space. How can I solve this problem?

ludvb · 2021-10-25T11:15:48Z

Hi!

My understanding of this error message is that PyTorch was only able to reserve 1.88 GiB of memory. Perhaps the gap between that and the 3 GB you see is memory that other processes have reserved but not allocated? Either way, it will be difficult to train models on 3 GB of VRAM. We have been able to run on 11 GB, but even then only with very small batch sizes.

Besides closing down other processes that may be using the GPU, you can try downsampling the experiments more and using smaller patch sizes, although this will come at the expense of output resolution. I would aim for a resolution of roughly 1200x1200 px, by setting the --scale argument in xfuse convert to 1200 divided by the resolution of the image file. For the patch size, try 384 or 256, by setting the attribute patch_size = 384 under the optimization header in the .toml config file passed to xfuse run. The batch_size attribute will likely need to be either 2 or 3.
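As a rough illustration of the scale calculation described above, here is a minimal sketch. The helper name scale_for_target is hypothetical and not part of xfuse, and it assumes that "resolution" means the longer side of the image in pixels, which the comment above does not spell out.

```python
def scale_for_target(width_px: int, height_px: int, target_px: int = 1200) -> float:
    """Return a scale factor that brings the longer image side to about target_px.

    Assumption: "resolution of the image file" is read as the longer side in pixels.
    """
    longer_side = max(width_px, height_px)
    if longer_side <= 0:
        raise ValueError("image dimensions must be positive")
    return target_px / longer_side


if __name__ == "__main__":
    # Example: a 12000 x 9000 px image would be passed to xfuse convert with this --scale value.
    print(scale_for_target(12000, 9000))
```

The resulting number is what would be passed as the --scale argument; patch_size and batch_size are set separately in the .toml config as described above.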

Yao-14 · 2021-10-25T13:57:53Z

Thank you very much for your reply; your suggestions were very useful. However, near the end of the run, I encountered the following error.

🚨 ERROR : IndexError: tensors used as indices must be long, byte or bool tensors
Traceback (most recent call last):
File "e:\conda\lib\site-packages\xfuse\model\experiment\st\st.py", line 519, in model
data = data[idxs - 1]
IndexError: tensors used as indices must be long, byte or bool tensors

Is there an error in the code? The torch version I use is 1.10.0+cu113.

ludvb · 2021-10-25T18:00:34Z

Thanks for the report! I failed to reproduce this on my Linux computer with a clean environment and torch 1.10.0+cu102, so it may be Windows-specific. The idxs tensor originates from here in the dataloader:

xfuse/xfuse/data/slide/data/st_slide.py

Line 169 in 5ac2333

label=torch.as_tensor(label).long(),

As far as I can see, we don't do anything that could change its data type from long. As a quick fix, you could try changing the line where the error occurs to data = data[idxs.long() - 1]. You are welcome to submit a PR if you find a solution!
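The quick fix can be tried in isolation with a small sketch like the one below. The int32 dtype is an assumption chosen purely for illustration; the thread does not establish which dtype idxs actually had on the Windows machine. The error message only says that index tensors must be long, byte, or bool.

```python
import torch

data = torch.tensor([10.0, 20.0, 30.0])

# Labels are 1-based here, mirroring the `idxs - 1` in the traceback.
# Assumption for illustration: the indices arrive as int32 rather than int64.
idxs = torch.tensor([1, 3], dtype=torch.int32)

# Casting to int64 ("long") before indexing satisfies the dtype requirement
# named in the error message.
selected = data[idxs.long() - 1]
print(selected)  # tensor([10., 30.])
```

Calling .long() on a tensor that is already int64 returns it unchanged, so the cast should be harmless on platforms where the error does not occur.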

Yao-14 · 2021-10-26T02:39:27Z

I have now managed to run XFuse on Windows, but at the end of the run the following warnings appear. Can I ignore them, or how should I deal with them?

⚠ WARNING : UserWarning (e:\conda\lib\site-packages\xfuse\session\io.py:25): Failed to store session item "covariates".The error returned was: [Errno 2] No such file or directory: '/dev/null'
⚠ WARNING : UserWarning (e:\conda\lib\site-packages\xfuse\session\io.py:25): Failed to store session item "genes".The error returned was: [Errno 2] No such file or directory: '/dev/null'
⚠ WARNING : UserWarning (e:\conda\lib\site-packages\xfuse\session\io.py:25): Failed to store session item "model".The error returned was: [Errno 2] No such file or directory: '/dev/null'
⚠ WARNING : UserWarning (e:\conda\lib\site-packages\xfuse\session\io.py:25): Failed to store session item "training_data".The error returned was: [Errno 2] No such file or directory: '/dev/null'
⚠ WARNING : UserWarning (e:\conda\lib\site-packages\xfuse\session\io.py:25): Failed to store session item "metagene_expansion_strategy".The error returned was: [Errno 2] No such file or directory: '/dev/null'

ludvb · 2021-10-26T09:33:06Z

These warnings indicate that the model couldn't be saved, so it won't be possible to use the trained model for additional analyses. I have attempted to fix this in #31. If you have time to try it out, any feedback would be great! You can install the fixed version using pip install --force-reinstall --user git+https://github.com/ludvb/xfuse@check-pickle.
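For background on the '/dev/null' part of the warnings: that path exists only on POSIX systems, while the Windows null device is named nul. The sketch below shows the portable way to refer to it in Python through os.devnull. It is only an illustration of the platform difference, not a description of what the fix in #31 does, and the helper name write_to_null is hypothetical.

```python
import os


def write_to_null(payload: bytes) -> int:
    """Write payload to the platform's null device and return the byte count.

    os.devnull is "/dev/null" on POSIX and "nul" on Windows, so this works
    on both, unlike a hard-coded "/dev/null".
    """
    with open(os.devnull, "wb") as sink:
        return sink.write(payload)


if __name__ == "__main__":
    print(os.devnull)
    print(write_to_null(b"discarded"))
```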
