Installation of xfuse #75

Open
cathalgking opened this issue Aug 24, 2023 · 11 comments

Comments

@cathalgking

I am trying to install xfuse on my laptop and on our HPC. I run the pip install command listed on the GitHub page but hit the error shown below on both machines. Does this package require some dependencies? Can xfuse be installed through conda? Does it matter whether I install it with pip or pip3?

`File "", line 188, in configuration
File "/tmp/pip-build-env-da5v2_l9/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 1050, in add_subpackage
config_list = self.get_subpackage(subpackage_name, subpackage_path,
File "/tmp/pip-build-env-da5v2_l9/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 1016, in get_subpackage
config = self._get_configuration_from_setup_py(
File "/tmp/pip-build-env-da5v2_l9/overlay/lib/python3.10/site-packages/numpy/distutils/misc_util.py", line 958, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
File "/tmp/pip-install-455o447
/scikit-learn_ed3da863168d40b98d93dbf65283ce18/sklearn/setup.py", line 83, in configuration
cythonize_extensions(top_path, config)
File "/tmp/pip-install-455o447
/scikit-learn_ed3da863168d40b98d93dbf65283ce18/sklearn/_build_utils/__init__.py", line 70, in cythonize_extensions
config.ext_modules = cythonize(
File "/tmp/pip-build-env-da5v2_l9/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1125, in cythonize
result.get(99999) # seconds
File "/homes/cathal.king/anaconda3/envs/xfuse/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
Cython.Compiler.Errors.CompileError: sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`

@cathalgking
Author

I have tried on an HPC, in a new conda environment. I am still having a problem installing xfuse. The Python version that I have is:
Python 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0] on linux

The command I then try to install with is:
pip install --user git+https://github.com/ludvb/xfuse@master

The error is similar to the above comment. Can this package be installed another way? The contents of the output are shown below:

` File "/tmp/pip-build-env-fbz3xb7k/overlay/lib/python3.11/site-packages/numpy/distutils/misc_util.py", line 958, in _get_configuration_from_setup_py
config = setup_module.configuration(*args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-install-la_vbire/scikit-learn_3cb818c059e343ffa0d5f51ac241aa5c/sklearn/setup.py", line 83, in configuration
cythonize_extensions(top_path, config)
File "/tmp/pip-install-la_vbire/scikit-learn_3cb818c059e343ffa0d5f51ac241aa5c/sklearn/_build_utils/__init__.py", line 70, in cythonize_extensions
config.ext_modules = cythonize(
^^^^^^^^^^
File "/tmp/pip-build-env-fbz3xb7k/overlay/lib/python3.11/site-packages/Cython/Build/Dependencies.py", line 1125, in cythonize
result.get(99999) # seconds
^^^^^^^^^^^^^^^^^
File "/homes/cathal.king/miniconda3/envs/xfuse2/lib/python3.11/multiprocessing/pool.py", line 774, in get
raise self._value
Cython.Compiler.Errors.CompileError: sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.`

@ludvb
Owner

ludvb commented Aug 28, 2023

Thanks for reporting. I can reproduce this on my computer. It seems the scikit-learn version used by xfuse does not support Python 3.10 or 3.11. I would recommend using Python 3.8 instead, since that is the version xfuse was developed on. When creating your conda environment, you can specify the Python version like this: `conda create -n xfuse python=3.8`
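A quick guard can catch the wrong interpreter before a long pip build fails. This is a minimal sketch, assuming Python 3.8 is the only version confirmed to work in this thread; `python_matches` is a hypothetical helper, not part of xfuse.

```python
import sys

# Assumption: 3.8 is the only version this thread confirms as working.
RECOMMENDED = (3, 8)


def python_matches(version_info=sys.version_info, recommended=RECOMMENDED):
    """Return True if the interpreter's major.minor equals the recommended one."""
    return tuple(version_info[:2]) == tuple(recommended)


# Report on the interpreter that is currently running.
if python_matches():
    print("Python version matches the recommended 3.8")
else:
    print("Python %d.%d found; the thread recommends 3.8" % tuple(sys.version_info[:2]))
```

Running this inside the freshly created conda environment shows whether the environment picked up the intended Python version.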

@cathalgking
Author

@ludvb
Got this installed eventually. However, I am now getting an IndexError: "ERROR : IndexError: index 7809 is out of bounds for axis 0 with size 2000"

Does this relate to one of the Space Ranger files? If so, which one? The error message is unclear.
[Screenshot: Screen Shot 2023-08-29 at 2 10 59 pm]

@ludvb
Owner

ludvb commented Aug 29, 2023

I think this error may happen when there is a mismatch between the files passed to the --image and --tissue-positions arguments, so that the spot coordinates are off. The image file should be the same as the one passed to Space Ranger (note that the image files produced by Space Ranger, tissue_hires_image.png and tissue_lowres_image.png, are not used by xfuse).
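Such a mismatch can be detected before conversion by checking that every spot coordinate falls inside the image. This is a minimal sketch, assuming the six-column tissue positions layout mentioned in this thread (barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres) with the header already removed, and an image height and width obtained separately. `spots_outside_image` is a hypothetical helper, not part of xfuse, and the example rows are made up.

```python
def spots_outside_image(rows, height, width):
    """Return barcodes whose pixel coordinates fall outside a height x width image.

    Each row is a sequence of six fields in the tissue positions order:
    barcode, in_tissue, array_row, array_col, pxl_row, pxl_col.
    """
    outside = []
    for barcode, _in_tissue, _array_row, _array_col, pxl_row, pxl_col in rows:
        r = round(float(pxl_row))
        c = round(float(pxl_col))
        if not (0 <= r < height and 0 <= c < width):
            outside.append(barcode)
    return outside


# Made-up rows: the first spot has a pixel row larger than the image height.
example_rows = [
    ["SPOT-A", "1", "0", "0", "7809", "1200"],
    ["SPOT-B", "1", "1", "1", "150", "300"],
]
print(spots_outside_image(example_rows, height=2000, width=2000))
```

A non-empty result suggests that the positions file and the image do not belong together, or that the image has been downscaled relative to the coordinates.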

@cathalgking
Author

cathalgking commented Aug 29, 2023

@ludvb
Before I ran the above, I edited the tissue_positions_list.csv file by deleting the first row which contained barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres. This was because of the TypeError I received which was: TypeError: type str doesn't define __round__ method. Would this matter for xfuse? Or would it cause the mismatch?
I also re-named the file to tissue_positions.csv but I don't think that would matter. The file now looks like:
[Screenshot: Screen Shot 2023-08-29 at 8 16 25 pm]

For the --image parameter, I point to the tissue_hires_image.png file in the spatial directory. If neither of those images you mentioned are used by xfuse, then which image should I use instead?
[Screenshot: Screen Shot 2023-08-29 at 8 19 21 pm]

@ludvb
Owner

ludvb commented Aug 29, 2023

Before I ran the above, I edited the tissue_positions_list.csv file by deleting the first row which contained barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres. This was because of the TypeError I received which was: TypeError: type str doesn't define __round__ method. Would this matter for xfuse? Or would it cause the mismatch?

👍 This is fine and necessary. Newer versions of Space Ranger include a header which xfuse doesn't expect.
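Removing the header can be scripted so that the step is repeatable across samples. This is a minimal sketch, assuming the header row starts with the field `barcode` as quoted above; `strip_header` is a hypothetical helper, not part of xfuse.

```python
import csv
import io

# Assumption: the header row, when present, starts with this field name.
HEADER_FIRST_FIELD = "barcode"


def strip_header(text):
    """Return the CSV text without its header row, if one is present."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows and rows[0] and rows[0][0] == HEADER_FIRST_FIELD:
        rows = rows[1:]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up example content with a header and one spot.
example = (
    "barcode,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres\n"
    "SPOT-A,1,0,0,100,200\n"
)
print(strip_header(example))
```

Files that already lack the header pass through unchanged, so the function is safe to apply to output from older and newer Space Ranger versions alike.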

I also re-named the file to tissue_positions.csv but I don't think that would matter. The file now looks like:

This should also not be a problem.

For the --image parameter, I point to the tissue_hires_image.png file in the spatial directory. If neither of those images you mentioned are used by xfuse, then which image should I use instead?

The tissue_hires_image.png file is a downsampled version of the original brightfield image. The tissue_positions file specifies spot coordinates in the original image and not the hires image, so xfuse convert only works with the original image. If you don't have access to the original image, it is also possible to convert the positions to coordinates in the hires image by multiplying them by the hires scale factor "tissue_hires_scalef" in the scalefactors_json.json file. If you can get hold of the original brightfield image, however, that is the better option, since it avoids resampling the downscaled image.
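The fallback conversion amounts to multiplying the two pixel columns by `tissue_hires_scalef`. This is a minimal sketch, assuming the six-column positions layout with the header removed; `to_hires_coords` is a hypothetical helper, the scale factor value is made up, and rounding to whole pixels is an illustrative choice rather than something xfuse is known to require.

```python
import json


def to_hires_coords(rows, scalefactors):
    """Scale full-resolution pixel coordinates to hires image coordinates.

    rows: sequences of barcode, in_tissue, array_row, array_col, pxl_row, pxl_col.
    scalefactors: dict parsed from scalefactors_json.json.
    """
    scale = scalefactors["tissue_hires_scalef"]
    scaled = []
    for barcode, in_tissue, array_row, array_col, pxl_row, pxl_col in rows:
        scaled.append([
            barcode, in_tissue, array_row, array_col,
            round(float(pxl_row) * scale),  # rounded to whole pixels (assumption)
            round(float(pxl_col) * scale),
        ])
    return scaled


# Made-up scale factor; in practice, load it from scalefactors_json.json.
example_scalefactors = json.loads('{"tissue_hires_scalef": 0.25}')
example_rows = [["SPOT-A", "1", "0", "0", "7809", "1200"]]
print(to_hires_coords(example_rows, example_scalefactors))
```

The scaled rows would then be written back to a positions file and passed together with tissue_hires_image.png, keeping in mind that the original brightfield image remains the preferred input.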

@cathalgking
Author

Pointing to the brightfield image seems to have worked, thanks!
Only a data.h5 file was created in the directory. However, the log file contains the message WARNING : UserWarning (/conda_apps/xfuse_0.2.1/lib/python3.8/site-packages/xfuse/utility/mask.py:74): Failed to mask tissue

It also contains other messages, such as:
The image resolution is very large! 😱 XFuse typically works best on medium resolution images (approximately 1000x1000 px). If you experience performance issues, please consider reducing the resolution.
[2023-08-29 21:49:21,857] WARNING : UserWarning (/conda_apps/xfuse_0.2.1/lib/python3.8/site-packages/xfuse/convert/utility.py:216): Count matrix contains duplicated columns. Counts will be summed by column name.

Do I need to make any considerations for the remaining part of the xfuse analysis because of this output?

@cathalgking
Author

The HPC that I use does not have a GPU. Approximately how long would a run for one of the samples take? My image is approximately 350M and the data.h5 file is 7.7G.

@ludvb
Owner

ludvb commented Aug 29, 2023

Only a data.h5 file was made in directory. However, the log file contains a message WARNING : UserWarning (/conda_apps/xfuse_0.2.1/lib/python3.8/site-packages/xfuse/utility/mask.py:74): Failed to mask tissue

This can sometimes be caused by passing the filtered_feature_bc_matrix.h5 instead of the raw_feature_bc_matrix.h5. The filtered matrix does not contain spots outside the tissue, which xfuse uses to initialize the inside/outside mask.
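One way to tell the two matrices apart is to check whether any barcode in the matrix is marked as outside the tissue in the positions file. This is a minimal sketch, assuming the barcodes have already been read from the .h5 file by some other means and that `in_tissue` is `0` for spots outside the tissue; `matrix_looks_filtered` is a hypothetical helper, not part of xfuse.

```python
def matrix_looks_filtered(matrix_barcodes, position_rows):
    """Return True if no barcode in the matrix is marked in_tissue == 0.

    position_rows: sequences whose first two fields are barcode and in_tissue.
    """
    in_tissue = {row[0]: str(row[1]) for row in position_rows}
    return not any(in_tissue.get(barcode) == "0" for barcode in matrix_barcodes)


# Made-up positions: SPOT-B lies outside the tissue.
example_positions = [
    ["SPOT-A", "1", "0", "0", "100", "200"],
    ["SPOT-B", "0", "0", "1", "100", "400"],
]
print(matrix_looks_filtered(["SPOT-A"], example_positions))
print(matrix_looks_filtered(["SPOT-A", "SPOT-B"], example_positions))
```

The check is only a heuristic: a raw matrix from a slide where every spot is covered by tissue would also be reported as filtered.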

And also other messages such as The image resolution is very large!...

xfuse runs best on images that have a height and width in the 1000 to 2000 px range. So if your brightfield image is, for example, 10000x10000, it's a good idea to specify --scale 0.15 to the convert command in order to downscale the image to this range.
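The arithmetic behind that example can be written down so the value is easy to recompute for other image sizes. This is a minimal sketch, assuming a target of 1500 px for the longer side (the midpoint of the 1000 to 2000 px range, an illustrative choice); `suggest_scale` is a hypothetical helper, not part of xfuse.

```python
def suggest_scale(height, width, target=1500):
    """Return a scale factor that brings the longer image side to about target px.

    Never suggests upscaling: images already smaller than target get 1.0.
    """
    return min(1.0, target / max(height, width))


# A 10000x10000 image gives the 0.15 used in the example above.
print(suggest_scale(10000, 10000))
```

The result is the value to pass as `--scale` to the convert command.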

The HPC that I use does not have a GPU. Approximately how long would a run for 1 of the samples take to process?

Running on the CPU - while possible - is not really suitable for anything besides testing on small toy datasets. You will see a time estimate when running, but it will probably take several days if not weeks to finish. What you can try is to reduce the --scale further, perhaps aiming for a resolution around 500x500. But please note that results will suffer significantly by doing so. Unfortunately there is no way around this.

@cathalgking
Author

So you are saying it is better to use --scale in this scenario? But only at around 0.15? And anything more than that might compromise results? I might have access to a GPU, so I would like to get the most accurate results possible.

@ludvb
Owner

ludvb commented Sep 4, 2023

So you are saying it is better to use --scale in this scenario? But only at around 0.15? And anything more than that might compromise results?

In my experience, yes, something like that. It also depends on the patch_size that you specify in the config file; I think results are usually better when the training patches capture quite a large part of the tissue area. One theory is that this makes it easier for the recognition network to know which part of the tissue it is looking at.
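How much of the image a training patch captures can be estimated with simple arithmetic. This is a minimal sketch, assuming `patch_size` is the side length in pixels of a square patch measured at the scaled image resolution (an assumption, not confirmed in this thread); `patch_coverage` is a hypothetical helper and the numbers are made up.

```python
def patch_coverage(patch_size, height, width):
    """Return the fraction of a height x width image covered by one square patch."""
    covered = min(patch_size, height) * min(patch_size, width)
    return covered / (height * width)


# Made-up numbers: a 768 px patch on a 1500x1500 image.
print(round(patch_coverage(768, 1500, 1500), 3))
```

A smaller `--scale` or a larger `patch_size` both raise this fraction, which is the trade-off described above.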
