Issues with reading 10X VISIUM Cytassist data SpaceRanger Output #76

Closed
thjimmylee opened this issue Aug 11, 2023 · 6 comments · Fixed by #91

Comments

@thjimmylee

Hi,
This is a cool spatial tool, but I ran into an issue that might be specific to the new Visium CytAssist Space Ranger output.
For instance, if I read the Space Ranger output directly using spatialdata_io.visium, I get the error below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 spatialdata_io.visium('./spaceranger210_count_47058_WTSI_GRCh38-2020-A')

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/spatialdata_io/readers/visium.py:92, in visium(path, dataset_id, counts_file, fullres_image_file, tissue_positions_file, scalefactors_file, imread_kwargs, image_models_kwargs, **kwargs)
     90         library_id = first_file.replace(f"_{VisiumKeys.COUNTS_FILE}", "")
     91     else:
---> 92         raise ValueError(
     93             f"Cannot determine the library_id. Expecting a file with format <library_id>_{VisiumKeys.COUNTS_FILE}. Has "
     94             f"the files been renamed?"
     95         )
     96     counts_file = f"{library_id}_{VisiumKeys.COUNTS_FILE}"
     97 except IndexError as e:

ValueError: Cannot determine the library_id. Expecting a file with format <library_id>_filtered_feature_bc_matrix.h5. Has the files been renamed?

From the error message, I gathered that the tool expects the matrix.h5 file name to carry a library_id prefix, which Space Ranger does not include in its output. I renamed the file with an arbitrary prefix and that got past the first error, but then I hit another error, shown below:

/Users/tl7/mambaforge/envs/spatialdata/lib/python3.9/site-packages/anndata/_core/anndata.py:1840: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")
/Users/tl7/mambaforge/envs/spatialdata/lib/python3.9/site-packages/anndata/_core/anndata.py:1840: UserWarning: Variable names are not unique. To make them unique, call `.var_names_make_unique`.
  utils.warn_names_duplicates("var")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 spatialdata_io.visium('./spaceranger210_count_47058_WTSI_GRCh38-2020-A')

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/spatialdata_io/readers/visium.py:166, in visium(path, dataset_id, counts_file, fullres_image_file, tissue_positions_file, scalefactors_file, imread_kwargs, image_models_kwargs, **kwargs)
    161 transform_hires = Scale(
    162     np.array([scalefactors[VisiumKeys.SCALEFACTORS_HIRES], scalefactors[VisiumKeys.SCALEFACTORS_HIRES]]),
    163     axes=("y", "x"),
    164 )
    165 shapes = {}
--> 166 circles = ShapesModel.parse(
    167     coords,
    168     geometry=0,
    169     radius=scalefactors["spot_diameter_fullres"] / 2.0,
    170     index=adata.obs["spot_id"].copy(),
    171     transformations={
    172         "global": transform_original,
    173         "downscaled_hires": transform_hires,
    174         "downscaled_lowres": transform_lowres,
    175     },
    176 )
    177 shapes[dataset_id] = circles
    178 adata.obs["region"] = dataset_id

File ~/mambaforge/envs/spatialdata/lib/python3.9/functools.py:938, in singledispatchmethod.__get__.<locals>._method(*args, **kwargs)
    936 def _method(*args, **kwargs):
    937     method = self.dispatcher.dispatch(args[0].__class__)
--> 938     return method.__get__(obj, cls)(*args, **kwargs)

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/spatialdata/models/models.py:382, in ShapesModel._(cls, data, geometry, offsets, radius, index, transformations)
    370 @parse.register(np.ndarray)
    371 @classmethod
    372 def _(
   (...)
    379     transformations: MappingToCoordinateSystem_t | None = None,
    380 ) -> GeoDataFrame:
    381     geometry = GeometryType(geometry)
--> 382     data = from_ragged_array(geometry_type=geometry, coords=data, offsets=offsets)
    383     geo_df = GeoDataFrame({"geometry": data})
    384     if GeometryType(geometry).name == "POINT":

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/shapely/_ragged_array.py:440, in from_ragged_array(geometry_type, coords, offsets)
    438 if geometry_type == GeometryType.POINT:
    439     assert offsets is None or len(offsets) == 0
--> 440     return _point_from_flatcoords(coords)
    441 if geometry_type == GeometryType.LINESTRING:
    442     return _linestring_from_flatcoords(coords, *offsets)

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/shapely/_ragged_array.py:303, in _point_from_flatcoords(coords)
    302 def _point_from_flatcoords(coords):
--> 303     result = creation.points(coords)
    305     # Older versions of GEOS (<= 3.9) don't automatically convert NaNs
    306     # to empty points -> do manually
    307     empties = np.isnan(coords).all(axis=1)

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/shapely/decorators.py:77, in multithreading_enabled.<locals>.wrapped(*args, **kwargs)
     75     for arr in array_args:
     76         arr.flags.writeable = False
---> 77     return func(*args, **kwargs)
     78 finally:
     79     for arr, old_flag in zip(array_args, old_flags):

File ~/mambaforge/envs/spatialdata/lib/python3.9/site-packages/shapely/creation.py:74, in points(coords, y, z, indices, out, **kwargs)
     72 coords = _xyz_to_coords(coords, y, z)
     73 if indices is None:
---> 74     return lib.points(coords, out=out, **kwargs)
     75 else:
     76     return simple_geometries_1d(coords, indices, GeometryType.POINT, out=out)

TypeError: ufunc 'points' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Below is the file tree of the spaceranger output:

./spaceranger210_count_47058_WTSI_GRCh38-2020-A
├── _invocation
├── analysis
│   ├── clustering
│   │   ├── gene_expression_graphclust
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_10_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_2_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_3_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_4_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_5_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_6_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_7_clusters
│   │   │   └── clusters.csv
│   │   ├── gene_expression_kmeans_8_clusters
│   │   │   └── clusters.csv
│   │   └── gene_expression_kmeans_9_clusters
│   │       └── clusters.csv
│   ├── diffexp
│   │   ├── gene_expression_graphclust
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_10_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_2_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_3_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_4_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_5_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_6_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_7_clusters
│   │   │   └── differential_expression.csv
│   │   ├── gene_expression_kmeans_8_clusters
│   │   │   └── differential_expression.csv
│   │   └── gene_expression_kmeans_9_clusters
│   │       └── differential_expression.csv
│   ├── pca
│   │   └── gene_expression_10_components
│   │       ├── components.csv
│   │       ├── dispersion.csv
│   │       ├── features_selected.csv
│   │       ├── projection.csv
│   │       └── variance.csv
│   ├── tsne
│   │   └── gene_expression_2_components
│   │       └── projection.csv
│   └── umap
│       └── gene_expression_2_components
│           └── projection.csv
├── cloupe.cloupe
├── deconvolution
│   ├── deconvolution_k10
│   │   ├── deconvolution_topic_features_k10.csv
│   │   └── deconvolved_spots_k10.csv
│   ├── deconvolution_k11
│   │   ├── deconvolution_topic_features_k11.csv
│   │   └── deconvolved_spots_k11.csv
│   ├── deconvolution_k12
│   │   ├── deconvolution_topic_features_k12.csv
│   │   └── deconvolved_spots_k12.csv
│   ├── deconvolution_k13
│   │   ├── deconvolution_topic_features_k13.csv
│   │   └── deconvolved_spots_k13.csv
│   ├── deconvolution_k14
│   │   ├── deconvolution_topic_features_k14.csv
│   │   └── deconvolved_spots_k14.csv
│   ├── deconvolution_k15
│   │   ├── deconvolution_topic_features_k15.csv
│   │   └── deconvolved_spots_k15.csv
│   ├── deconvolution_k16
│   │   ├── deconvolution_topic_features_k16.csv
│   │   └── deconvolved_spots_k16.csv
│   ├── deconvolution_k17
│   │   ├── deconvolution_topic_features_k17.csv
│   │   └── deconvolved_spots_k17.csv
│   ├── deconvolution_k18
│   │   ├── deconvolution_topic_features_k18.csv
│   │   └── deconvolved_spots_k18.csv
│   ├── deconvolution_k19
│   │   ├── deconvolution_topic_features_k19.csv
│   │   └── deconvolved_spots_k19.csv
│   ├── deconvolution_k2
│   │   ├── deconvolution_topic_features_k2.csv
│   │   └── deconvolved_spots_k2.csv
│   ├── deconvolution_k3
│   │   ├── deconvolution_topic_features_k3.csv
│   │   └── deconvolved_spots_k3.csv
│   ├── deconvolution_k4
│   │   ├── deconvolution_topic_features_k4.csv
│   │   └── deconvolved_spots_k4.csv
│   ├── deconvolution_k5
│   │   ├── deconvolution_topic_features_k5.csv
│   │   └── deconvolved_spots_k5.csv
│   ├── deconvolution_k6
│   │   ├── deconvolution_topic_features_k6.csv
│   │   └── deconvolved_spots_k6.csv
│   ├── deconvolution_k7
│   │   ├── deconvolution_topic_features_k7.csv
│   │   └── deconvolved_spots_k7.csv
│   ├── deconvolution_k8
│   │   ├── deconvolution_topic_features_k8.csv
│   │   └── deconvolved_spots_k8.csv
│   ├── deconvolution_k9
│   │   ├── deconvolution_topic_features_k9.csv
│   │   └── deconvolved_spots_k9.csv
│   ├── dendrogram_k19.png
│   └── dendrogram_k19_distances.png
├── filtered_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── filtered_feature_bc_matrix.h5
├── metrics_summary.csv
├── molecule_info.h5
├── probe_set.csv
├── raw_feature_bc_matrix
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
├── raw_feature_bc_matrix.h5
├── raw_probe_bc_matrix.h5
├── spaceranger210_count_47058_GRCh38-2020-A.html
├── spatial
│   ├── aligned_fiducials.jpg
│   ├── aligned_tissue_image.jpg
│   ├── cytassist_image.tiff
│   ├── detected_tissue_image.jpg
│   ├── scalefactors_json.json
│   ├── spatial_enrichment.csv
│   ├── tissue_hires_image.png
│   ├── tissue_lowres_image.png
│   ├── tissue_positions.csv
│   └── tissue_positions_list.csv

I am currently using the latest version of Space Ranger 2.0.1 (January 18, 2023).

@LucaMarconato
Member

Hi Jimmy, thanks for reporting. Can you try using the latest main version? @giovp worked on a related problem on #51 and therefore it could be fixed now.

Otherwise, @giovp could you please have a look? Maybe we could test the various SpaceRanger versions with scripts in the spatialdata-sandbox that I run nightly, wdyt?

@thjimmylee
Author

Hi @LucaMarconato ,
Thanks for your reply. Yes, I am using the latest version, 0.0.7, which has this error, and this is how I read the Space Ranger output:

import spatialdata_io

sp_data = spatialdata_io.visium('./spaceranger210_count_47058_WTSI_GRCh38-2020-A')

@ilia-kats
Contributor

Having the same issue here, and I think it's definitely related to #51. @giovp, which data sets did you test it on? I know that files downloaded from the 10x website do have a library_id prepended, but this is never the case for actual spaceranger output, which is why I had removed it in #44.

@grst
Contributor

grst commented Sep 14, 2023

Same issue here... the IO function should ideally support both h5 files with and without library_id prefix.
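
A minimal sketch of what supporting both naming schemes could look like, using only the standard library. The helper name `find_counts_file` is hypothetical and this is not the code from spatialdata-io or from PR #91; it only assumes the file suffix named in the error message above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Suffix taken from the error message in this issue.
COUNTS_FILE = "filtered_feature_bc_matrix.h5"


def find_counts_file(path: str | Path, suffix: str = COUNTS_FILE) -> tuple[str | None, Path]:
    """Hypothetical helper: return (library_id, counts_path).

    library_id is None when the counts file carries no prefix.
    """
    root = Path(path)

    # Case 1: file without a prefix, as in the file tree reported above.
    plain = root / suffix
    if plain.is_file():
        return None, plain

    # Case 2: file named <library_id>_<suffix>.
    matches = sorted(root.glob(f"*_{suffix}"))
    if len(matches) == 1:
        library_id = matches[0].name[: -len(f"_{suffix}")]
        return library_id, matches[0]
    if not matches:
        raise FileNotFoundError(f"No '{suffix}' or '<library_id>_{suffix}' found in {root}")
    raise ValueError(f"Several candidate counts files found in {root}: {[m.name for m in matches]}")


# Usage: an unprefixed file resolves with library_id set to None.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / COUNTS_FILE).touch()
    library_id, counts_path = find_counts_file(tmp)
    print(library_id, counts_path.name)
```

Checking the unprefixed name first means a folder that somehow contains both variants resolves to the unprefixed file; raising on several prefixed candidates avoids silently picking the wrong library.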

@benedekp

I just hit the same issue with the naming of the files; it would be very useful to have the option to load without the prefix.
I also encountered the second error message:
TypeError: ufunc 'points' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
For me it was related to the two types of SpaceRanger position files: "tissue_positions.csv" and "tissue_positions_list.csv". When using squidpy I had renamed this file so that sc.read_visium() would accept it and then corrected the format, which is what caused the problem here. I see that @thjimmylee also has both files under the spatial folder, which probably caused the same mismatch between file name and format.
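
To illustrate the format mismatch, here is a minimal standard-library sketch that accepts either layout. It assumes, as far as I know, that "tissue_positions.csv" starts with a header row while "tissue_positions_list.csv" has none, and that both share the same six columns. The function name `read_tissue_positions` is hypothetical and this is not the reader used by spatialdata-io. If a header row is parsed as data, the coordinate columns end up as strings, which would plausibly lead to a casting error like the one above.

```python
import csv
import io

# Column order assumed to be shared by both file layouts.
COLUMNS = [
    "barcode",
    "in_tissue",
    "array_row",
    "array_col",
    "pxl_row_in_fullres",
    "pxl_col_in_fullres",
]


def read_tissue_positions(handle):
    """Hypothetical helper: parse either layout into dicts with numeric fields."""
    rows = [row for row in csv.reader(handle) if row]  # skip blank lines

    # Drop the header row if present, based on content rather than file name.
    if rows and rows[0][0] == "barcode":
        rows = rows[1:]

    records = []
    for row in rows:
        record = dict(zip(COLUMNS, row))
        for key in ("in_tissue", "array_row", "array_col"):
            record[key] = int(record[key])
        # Pixel coordinates are cast to float so they are never left as strings.
        for key in ("pxl_row_in_fullres", "pxl_col_in_fullres"):
            record[key] = float(record[key])
        records.append(record)
    return records


# Usage: the same spot parsed from a headerless and a headered file.
headerless = "AAACAACGAATAGTTC-1,1,50,102,1000,2000\n"
headered = ",".join(COLUMNS) + "\n" + headerless
print(read_tissue_positions(io.StringIO(headerless)) == read_tissue_positions(io.StringIO(headered)))
```

Detecting the header from the first cell rather than from the file name keeps the parser working when a file has been renamed, which is the situation described above.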

@LucaMarconato
Member

The PR #91 should fix the problem.

I haven't run a full test across the various SpaceRanger versions like @giovp did in #51, but I am testing against three datasets (see details in the PR), including one that doesn't contain the dataset_id in the file name.
These three datasets are now also tested in a nightly job, which should prevent this bug from coming back in the future.

@grst @benedekp @ilia-kats @thjimmylee, if you have the chance, please let me know whether this fixes your problem. If not, I am happy to be more systematic and include more datasets in the nightly job.
