Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[24.1] Fix h5ad metadata #18635

Merged
merged 1 commit into from
Aug 9, 2024
Merged

Conversation

nilchia
Copy link
Contributor

@nilchia nilchia commented Aug 1, 2024

This PR fixes h5ad metadata problem.

The metadata of h5ad is not shown correctly in the history:
image

This little change fixes that :)

image

@davelopez davelopez changed the title Fix metadata [24.1] Fix h5ad metadata Aug 5, 2024
@davelopez davelopez requested a review from a team August 5, 2024 09:53
@bernt-matthias
Copy link
Contributor

Wondering if we should add a test for this, analogous to https://github.com/galaxyproject/galaxy/blob/dev/test/unit/data/datatypes/test_qiime2.py

@dannon
Copy link
Member

dannon commented Aug 5, 2024

@bernt-matthias Yes, for sure. I add a visualization test in #18552 that inadvertently exercises this as well, and I wonder if this explains that test failure that I haven't got around to fixing up yet :)

Edit: Followed up on that test failure -- that was due to the test suite not loading the h5 datatype at all, so unrelated, though I'm still +1 on a datatype test here.

@nilchia
Copy link
Contributor Author

nilchia commented Aug 7, 2024

Dear all,
I made a test file (I hope that's correct :) ).
I have an anndata as a test file but I don't know which directory to put that.

@bgruening
Copy link
Member

How large is it? If its too large put it into https://github.com/galaxyproject/galaxy-test-data

@nilchia
Copy link
Contributor Author

nilchia commented Aug 7, 2024

It is 74.0 kB

@bgruening
Copy link
Member

Then have a look where qiime2.qza is stored and put your file next to it :)

@nilchia
Copy link
Contributor Author

nilchia commented Aug 8, 2024

Hi @bgruening @davelopez @bernt-matthias @dannon,
Thanks for your review. Is the test good now?

@@ -1574,7 +1574,7 @@ def _layercountsize(tmp, lennames=0):
dataset.metadata.shape = (int(dataset.metadata.obs_size), int(dataset.metadata.var_size))

def set_peek(self, dataset: DatasetProtocol, **kwd) -> None:
if not dataset.dataset.purged:
if not dataset.file_name_ is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not dataset.file_name_ is None:
if dataset.file_name_ is not None:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make that change, this looks quite wrong to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it gives this error:

    def set_peek(self, dataset: DatasetProtocol, **kwd) -> None:
>       if dataset.dataset.purged:
E       AttributeError: 'NoneType' object has no attribute 'purged'

the MockDataset class does not have purge attribute:

class MockDataset:
    def __init__(self, id):
        self.id = id
        self.metadata = MockMetadata()
        self.dataset = None
        self.file_name_: Optional[str] = None

    def get_file_name(self, sync_cache=True):
        return self.file_name_

    def set_file_name(self, file_name):
        self.file_name_ = file_name

    def has_data(self):
        return True

    def get_size(self):
        return self.dataset and os.path.getsize(self.dataset.get_file_name())
        ```
        

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't adjust code for unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please give me a hint of what I should do?
That class does not have "purged" attribute. am I right?

@@ -1596,7 +1596,7 @@ def _makelayerstrings(layer, count, names):
peekstr += _makelayerstrings("uns", tmp.uns_count, tmp.uns_layers)

dataset.peek = peekstr
dataset.blurb = f"Anndata file ({nice_size(dataset.get_size())})"
dataset.blurb = "Anndata file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was wrong with the dataset size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gives this error:

lib/galaxy/datatypes/binary.py:1599: in set_peek
    dataset.blurb = f"Anndata file ({nice_size(dataset.get_size())})"
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

size = None

    def nice_size(size: Union[float, int, str, Decimal]) -> str:
        """
        Returns a readably formatted string with the size
    
        >>> nice_size(100)
        '100 bytes'
        >>> nice_size(10000)
        '9.8 KB'
        >>> nice_size(1000000)
        '976.6 KB'
        >>> nice_size(100000000)
        '95.4 MB'
        """
        try:
>           size = float(size)
E           TypeError: float() argument must be a string or a number, not 'NoneType'

Maybe sth is wrong with nice_size ?
it is in init.py

@@ -1574,7 +1574,7 @@ def _layercountsize(tmp, lennames=0):
dataset.metadata.shape = (int(dataset.metadata.obs_size), int(dataset.metadata.var_size))

def set_peek(self, dataset: DatasetProtocol, **kwd) -> None:
if not dataset.dataset.purged:
if dataset.file_name_ is not None:
Copy link
Member

@mvdbeek mvdbeek Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if dataset.file_name_ is not None:
if not dataset.dataset.purged:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is too large, can you create a smaller one please ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks, we'll use that later for a different type of test.

@mvdbeek
Copy link
Member

mvdbeek commented Aug 8, 2024

You've now tailored the code to the mock implementation and it will not actually work. If the core team decides we want to validate metadata attributes we can do this more systematically. I would be in favor of merging only the fix.

@nilchia
Copy link
Contributor Author

nilchia commented Aug 9, 2024

This fix worked on my local galaxy with only changing layers = list(tmp.attrs) to layers = list(tmp.keys()) without other changes.

But for making a test, the other classes (like MockDataset) are problematic.

@mvdbeek
Copy link
Member

mvdbeek commented Aug 9, 2024

You can add the required attributes to the mock class if you like, but the fix seems fine as is.

Copy link
Member

@mvdbeek mvdbeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @nilchia!

mvdbeek added a commit to galaxyproject/galaxy-test-data that referenced this pull request Aug 9, 2024
@nilchia
Copy link
Contributor Author

nilchia commented Aug 9, 2024

Thank you @mvdbeek!

@mvdbeek mvdbeek merged commit b0705ed into galaxyproject:release_24.1 Aug 9, 2024
43 of 49 checks passed
Copy link

github-actions bot commented Aug 9, 2024

This PR was merged without a "kind/" label, please correct.

@nilchia nilchia deleted the fix_metadata branch August 10, 2024 08:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants