Adapting SpatialData to naming restrictions #707
Thanks @aeisenbarth, this looks great. The code runs smoothly and can be used to easily adjust for naming restrictions. I found an edge case that is not handled by this code: when the name of a var column contains a A concrete example: the steinbock dataset from spatialdata-sandbox https://github.com/giovp/spatialdata-sandbox/tree/main/steinbock_io has a var column called The edge case should fade when the above-mentioned anndata issue is addressed.
Since naming restrictions were introduced in 137e1e0 and #703, previously created SpatialData files may be invalid in the current SpatialData version. To use them in newer versions, they need to be adapted to the new naming rules.
The rules are as follows:
- Names may contain only alphanumeric characters, underscores `_`, hyphens `-` and dots `.`. Alphanumeric includes upper- and lowercase letters of various alphabets (`µ`, `福`), and number-like characters (`²`). Whitespace and symbols are excluded (especially those disallowed by file systems like `/`, `\`).
- Names cannot be `.` or `..`, which would have special meanings as file paths.
- Names cannot start with `__`.
- Names must be unique regardless of case: `abc`, `Abc`, `ABC` cannot occur together, but a single one of either case is allowed.
- `_index` is a reserved name.

With the following code samples, I show how to identify invalid names and how to rename them. The examples work without parsing the files with SpatialData, since it would reject them.
Approach
Collect paths of invalid names. I chose paths because they contain the nesting level and make renaming easier.
It is important to know:
- Element names may also be referenced in the column `region` (or whatever REGION_KEY is) in tables. This column is stored in binary Zarr files and cannot be edited in a text editor. If that is the case, you need to read the valid SpatialData afterwards and correct the entries in the table.
- `obs` and `var` columns are also referenced in the `obs/.zattrs` file (or `var/.zattrs`) in the `"column-order"` list. You must also rename the entries in this list.

Code
Assume `spatialdata_path` is the file path to the SpatialData `.zarr` file.
1. Iterate over all elements and check names against the validation functions
Collect SpatialData elements whose names contain forbidden characters:
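A minimal sketch of this step, using only the standard library. It re-implements the rules listed above with `str.isalnum()` instead of calling SpatialData's own validation functions. It assumes the store is a plain directory laid out as `<store>/<element type>/<element name>`. The element type list and the names `name_problem` and `find_invalid_element_paths` are illustrative assumptions, not part of the SpatialData API.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the element groups found at the top level of the store.
ELEMENT_TYPES = ("images", "labels", "points", "shapes", "tables")


def name_problem(name: str) -> str | None:
    """Return a description of the violated rule, or None if the name is valid."""
    if name in (".", ".."):
        return "reserved path name"
    if name.startswith("__"):
        return "starts with '__'"
    if not all(c.isalnum() or c in "_-." for c in name):
        return "contains forbidden characters"
    return None


def find_invalid_element_paths(spatialdata_path: Path) -> list[Path]:
    """Collect element directories whose names break the naming rules."""
    invalid = []
    for element_type in ELEMENT_TYPES:
        type_dir = spatialdata_path / element_type
        if not type_dir.is_dir():
            continue
        for element_dir in sorted(type_dir.iterdir()):
            # Only directories are elements; this skips metadata files such as .zattrs.
            if element_dir.is_dir() and name_problem(element_dir.name) is not None:
                invalid.append(element_dir)
    return invalid
```

A call such as `invalid_element_paths = find_invalid_element_paths(Path(spatialdata_path))` would then give the list of paths to inspect and rename.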
Collect SpatialData elements whose names are not unique on case-insensitive file systems:
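One possible way to detect such collisions is to group names by their lowercase form. The sketch below assumes the same directory layout as before (`<store>/<element type>/<element name>`) and only compares names within one element type, since that is where a case-insensitive file system would clash. The function names and the element type list are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

# Assumption: the element groups found at the top level of the store.
ELEMENT_TYPES = ("images", "labels", "points", "shapes", "tables")


def find_case_collisions(names: list[str]) -> list[list[str]]:
    """Group names that are equal when compared case-insensitively."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def find_duplicate_element_paths(spatialdata_path: Path) -> list[Path]:
    """Collect element directories whose names collide within one element type."""
    duplicates = []
    for element_type in ELEMENT_TYPES:
        type_dir = spatialdata_path / element_type
        if not type_dir.is_dir():
            continue
        names = [p.name for p in type_dir.iterdir() if p.is_dir()]
        for group in find_case_collisions(names):
            duplicates.extend(type_dir / name for name in group)
    return duplicates
```

The result of `find_duplicate_element_paths(Path(spatialdata_path))` could be stored as `duplicate_element_paths`. Every member of a colliding group is reported, so you can decide which of them to rename.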
Find table columns with forbidden characters, and table columns with colliding names:
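For table columns, a sketch along the same lines can read the column names from the `"column-order"` list in `obs/.zattrs` and `var/.zattrs` rather than from directory names. It assumes tables live under `<store>/tables/<table name>/` and that `.zattrs` is plain JSON. Treating `_index` as reserved for column names is likewise an assumption based on the rules above, and the function names are illustrative.

```python
from __future__ import annotations

import json
from collections import defaultdict
from pathlib import Path


def column_name_problem(name: str) -> str | None:
    """Return the violated rule for a table column name, or None if it is valid."""
    if name == "_index":
        return "reserved name"
    if name in (".", "..") or name.startswith("__"):
        return "reserved path name or '__' prefix"
    if not all(c.isalnum() or c in "_-." for c in name):
        return "forbidden characters"
    return None


def read_column_order(dataframe_dir: Path) -> list[str]:
    """Read the "column-order" list from <dataframe_dir>/.zattrs, if present."""
    attrs_file = dataframe_dir / ".zattrs"
    if not attrs_file.is_file():
        return []
    attrs = json.loads(attrs_file.read_text(encoding="utf-8"))
    return list(attrs.get("column-order", []))


def find_table_column_problems(spatialdata_path: Path) -> tuple[list[Path], list[Path]]:
    """Return (invalid column paths, case-colliding column paths) for all tables."""
    invalid_table_columns: list[Path] = []
    duplicate_table_columns: list[Path] = []
    tables_dir = spatialdata_path / "tables"
    if not tables_dir.is_dir():
        return invalid_table_columns, duplicate_table_columns
    for table_dir in sorted(p for p in tables_dir.iterdir() if p.is_dir()):
        for dataframe_name in ("obs", "var"):
            dataframe_dir = table_dir / dataframe_name
            columns = read_column_order(dataframe_dir)
            invalid_table_columns += [dataframe_dir / c for c in columns if column_name_problem(c)]
            # Group columns by lowercase form to find case-insensitive collisions.
            by_lowercase = defaultdict(list)
            for column in columns:
                by_lowercase[column.lower()].append(column)
            for group in by_lowercase.values():
                if len(group) > 1:
                    duplicate_table_columns += [dataframe_dir / c for c in group]
    return invalid_table_columns, duplicate_table_columns
```

Unpacking the result as `invalid_table_columns, duplicate_table_columns = find_table_column_problems(Path(spatialdata_path))` gives the two lists used in the later steps.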
Now that you have collected paths that you need to rename, print them and inspect them.
2. Transform the names into valid names with a transformation function
For simplicity, we continue with this primitive replacement function.
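A primitive replacement function could look like the following sketch. It swaps every forbidden character for an underscore and also handles the `.`, `..` and `__` rules. The name `make_valid_name` and the choice of `_` as replacement are assumptions for illustration.

```python
def make_valid_name(name: str, replacement: str = "_") -> str:
    """Replace each forbidden character with `replacement` (a primitive fix)."""
    cleaned = "".join(c if c.isalnum() or c in "_-." else replacement for c in name)
    # "." and ".." are reserved path names, so replace their dots as well.
    if cleaned in (".", ".."):
        cleaned = replacement * len(cleaned)
    # Names must not start with "__": drop leading underscores until one is left.
    while cleaned.startswith("__"):
        cleaned = cleaned[1:]
    return cleaned
```

For example, `make_valid_name("cell type")` gives `"cell_type"`, while letters such as `µ` or `福` are left untouched.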
Note that this can cause new name collisions if different forbidden characters are replaced by the same character. For a more concrete use case, you may prefer to write the names to a CSV file, assign new names there, and read them back into a dictionary.
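The CSV round trip could be sketched as follows, using only the standard library. The column headers `path` and `new_name` and both function names are assumptions. The `new_name` column is prefilled with the current name so that it can be edited by hand before reading the file back.

```python
from __future__ import annotations

import csv
from pathlib import Path


def write_rename_table(paths: list[Path], csv_path: Path) -> None:
    """Write one row per path, with the current name as a placeholder new name."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "new_name"])
        for path in paths:
            writer.writerow([str(path), path.name])


def read_rename_table(csv_path: Path) -> dict[Path, str]:
    """Read the edited CSV back into a mapping from path to new name."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {Path(row["path"]): row["new_name"] for row in csv.DictReader(f)}
```

After editing the file, the returned dictionary can replace the transformation function in the renaming step.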
3. Rename the paths
For elements:
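A hedged sketch of the renaming itself, assuming the store is a plain directory so that an element can be renamed by renaming its folder. The function name `rename_element_paths` is illustrative. The `transform` argument stands for whatever replacement function you chose in step 2. The sketch refuses to overwrite an existing path, which guards against the collisions mentioned above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def rename_element_paths(paths: list[Path], transform: Callable[[str], str]) -> dict[Path, Path]:
    """Rename each directory to transform(name) and return {old path: new path}."""
    renamed = {}
    for old_path in paths:
        new_path = old_path.with_name(transform(old_path.name))
        if new_path == old_path:
            continue  # The name is already valid under the transformation.
        if new_path.exists():
            raise FileExistsError(f"Cannot rename {old_path}: {new_path} already exists")
        old_path.rename(new_path)
        renamed[old_path] = new_path
    return renamed
```

The returned mapping is useful afterwards for correcting the `region` column in tables, which still refers to the old element names.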
And for table columns:
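For table columns, the folder rename has to be combined with an update of the `"column-order"` list in the neighbouring `.zattrs` file. The sketch below assumes each column is stored as a folder inside `obs` or `var` and that `.zattrs` is plain JSON. The function name `rename_table_column` is an assumption.

```python
from __future__ import annotations

import json
from pathlib import Path


def rename_table_column(column_path: Path, new_name: str) -> Path:
    """Rename a column folder and update "column-order" in the parent's .zattrs."""
    attrs_file = column_path.parent / ".zattrs"
    attrs = json.loads(attrs_file.read_text(encoding="utf-8"))
    old_name = column_path.name
    new_path = column_path.with_name(new_name)
    if new_path.exists():
        raise FileExistsError(f"Cannot rename {column_path}: {new_path} already exists")
    column_path.rename(new_path)
    # Keep the column order, replacing only the renamed entry.
    attrs["column-order"] = [new_name if c == old_name else c for c in attrs.get("column-order", [])]
    attrs_file.write_text(json.dumps(attrs, indent=4), encoding="utf-8")
    return new_path
```

Calling this once per entry, for example `rename_table_column(path, new_name)` for each path in the list of invalid table columns, keeps the folder names and the `"column-order"` list in sync.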
Proceed analogously with `duplicate_element_paths` and `duplicate_table_columns`.
User feedback
We encourage users to give feedback on the workaround shown here. If this discussion solved your problem, please give it an arrow-up.