Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow for incomplete data ingestion, fix ypix bug, add mean image to caiman seg class #227

Merged
merged 17 commits into from
Mar 4, 2024

Conversation

RichieHakim
Copy link
Contributor

@RichieHakim RichieHakim commented Jun 3, 2023

This PR contains focuses on some changes to the Suite2pSegmentationExtractor class. The following changes are made:

  1. Now allows for incomplete data ingestion. New argument (allow_incomplete_import). If any files are missing (ie F.npy is not there), it can simply ignore it an move on. Also, a new argument (warn_missing_files) toggles printing warnings for missing files.
  2. New arguments for searching within a subdirectory specified by the plane number. This is often not how directories are structured and this toggle removes the user burden of restructuring directories if they choose not to.
  3. Potential bug fix in the orientation of roi_masks. It seems that the code was writen to flip the roi_masks in the y direction. I'm not sure why this is the case. I removed code that ingested this way and code that undid this during saving.
  4. Typo fix in readme.
  5. Add mean image extraction from caiman segmentation class.

@RichieHakim RichieHakim changed the title added new args to allow for incomplete data import and fixed ypix bug allow for incomplete data ingestion, fix ypix bug, add mean image to caiman seg class Jun 18, 2023
@RichieHakim
Copy link
Contributor Author

Transposing the NWB summary image may also be necessary. I can add that to this PR if you agree @CodyCBakerPhD
Screenshot from 2023-06-18 17-29-13

@CodyCBakerPhD
Copy link
Member

@RichieHakim Sorry I haven't had the bandwidth to get to this until now - I'm personally spread across a number of projects and this corner of the ecosystem has often not gotten as much attention (unless we need to add a new format, such as the recent additions of Miniscope and Bruker)

To alleviate this, we'll be bringing in @pauladkisson who will become the key point person for reviewing, dev ops, and other miscellaneous admin responsibilities; I'll be bringing them up to speed on this repository in the coming weeks

In the meantime, I have a couple concerns about this PR being more complicated than it needs to be

We certainly want the SegmentationExtractor for Suite2P to not fail on some target output, and given that you indicate it is possible to have an absence of certain files in your output, I think the general request to enable safe loading of individual streams (potential trace files) is perfectly reasonable

We generally try to avoid try/except patterns as a style whenever possible however, and I'd say it is perfectly acceptable for the time being to simply skip the loading of the numpy file conditional on whether or not the file exists on the system; this reduces the code changes here by quite a bit while maintaining consistency of trace dictionary behavior in the current paradigm (though the very notion of the trace dictionary is something we'd eventually like to refactor out)

  1. Potential bug fix in the orientation of roi_masks. It seems that the code was writen to flip the roi_masks in the y direction.

What I can say in general is that ROIExtractors follows image (height x width) shape convention similar to OpenCV and other image parsing libraries

Unfortunately, this often conflicts with the NWB equivalent, whose schema mandates usage of the (width x height) data arrays instead; hence, why the NWB extractor for any of these fields will also transpose the data from the source

However, given how long ago this class was implemented, and how the individual who originally implemented it is no longer active, I can't actually judge if it is correct or not

Now, if you're saying the flipping being done at https://github.com/catalystneuro/roiextractors/pull/227/files#diff-cdefaf6b12d83c6942b1dcdc3245b73ef75445fac2293e7ca046be26a77d720fL118 and https://github.com/catalystneuro/roiextractors/pull/227/files#diff-cdefaf6b12d83c6942b1dcdc3245b73ef75445fac2293e7ca046be26a77d720fL176-R207 is a bug as far as you can tell, can you do me a favor and raise a separate issue for that and remove the changes from this PR (so this PR can focus purely on the partial loading of streams +/- the Caiman summary improvement?); then we can do a deep dive using some source images as reference to confirm if it's intentional or not

Typo fix in readme.

Thanks, if you're open to making the changes as I've suggested, can you also update the CHANGELOG.md file describing the changes of this PR?

Add mean image extraction from caiman segmentation class.

Back to the question of testing; do you know if the example files on the GIN repo have these new fields that you're adding a functionality for? And if not can you share a version of your files (or a stubbed one that is < 10 MB) that you developed it for?

@RichieHakim
Copy link
Contributor Author

@CodyCBakerPhD

Thanks Cody! I appreciate that you all are maintaining a pretty large amount of code and projects so I'm thankful that you were able to address this PR. Your comments were good.

There are a few changes in here so I'll go one by one

Suite2p class:

  • supporting incomplete ingestion: This is critical and it looks like your changes support this
  • Non-strict directory structure for plane numbers: This is not critical and I'm cool with leaving it as is.

CaImAn class:

  • Fortran ordering of reshaping: I'm 95% sure this is correct but I'd suggest some further testing with new datasets.

NWB class:

  • transposition of image: I can't really conclude on the code links you sent. I'm not sure. This is not data I handle myself. I just plotted the results and it clearly showed that the image was transposed.

Changelog: Will do later today.

Data files:

Back to the question of testing; do you know if the example files on the GIN repo have these new fields that you're adding a functionality for? And if not can you share a version of your files (or a stubbed one that is < 10 MB) that you developed it for?

I don't have access to personally extracted caiman files. The example caiman files on the GIN repo do not have a mean image. I think a recurring theme here is that we are unable to test changes effectively because we don't trust the GIN repo data. It might be a good idea to invest a little bit of time and rerun some example data through the different extraction pipelines (and maybe make a simple pipeline to run new tests as updates to those repos occur).

I think it might be useful to provide a little more context to how I (and others) have been using this repo.
Here is the ingestion class I'm wrapping roiextractors classes with: https://github.com/RichieHakim/ROICaT/blob/dev/roicat/data_importing.py#L1429
and here is a demo on how it is being used: https://github.com/RichieHakim/ROICaT/blob/main/notebooks/jupyter/other/demo_data_importing.ipynb

Let me know if there is anything else I can help with. Thanks again!

@CodyCBakerPhD CodyCBakerPhD linked an issue Jun 29, 2023 that may be closed by this pull request
2 tasks
Comment on lines 93 to 99
def _attempt_load_npy(self, filename, mmap_mode=None) -> np.ndarray | None:
"""Attempt to load the filename located in the current `plane_no` subfolder; return None if file is missing."""

file_path = self.folder_path / f"plane{self.plane_no}" / filename
if not file_path.exists():
return
return np.load(file_path, mmap_mode=mmap_mode, allow_pickle=mmap_mode is None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _attempt_load_npy(self, filename, mmap_mode=None) -> np.ndarray | None:
"""Attempt to load the filename located in the current `plane_no` subfolder; return None if file is missing."""
file_path = self.folder_path / f"plane{self.plane_no}" / filename
if not file_path.exists():
return
return np.load(file_path, mmap_mode=mmap_mode, allow_pickle=mmap_mode is None)
def _attempt_load_npy(self, filename, mmap_mode=None, transpose: bool = False) -> np.ndarray | None:
"""Attempt to load the filename located in the current `plane_no` subfolder; return None if file is missing."""
file_path = self.folder_path / f"plane{self.plane_no}" / filename
if not file_path.exists():
return
array = np.load(file_path, mmap_mode=mmap_mode, allow_pickle=mmap_mode is None)
if transpose:
return array.T
return array

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small improvement for above code reduction

Comment on lines 67 to 85
def try_load_npy(filename, mmap_mode=None, fn_transform=lambda x: x):
"""
This function allows for incomplete import of files.
"""
try:
return fn_transform(self._attempt_load_npy(filename, mmap_mode=mmap_mode))
except FileNotFoundError:
if allow_incomplete_import:
warnings.warn(f"File {filename} not found.") if warn_missing_files else None
return None
else:
raise FileNotFoundError(f"File {filename} not found.")

self.stat = try_load_npy("stat.npy")
self._roi_response_raw = try_load_npy("F.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self._roi_response_neuropil = try_load_npy("Fneu.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self._roi_response_deconvolved = try_load_npy("spks.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self.iscell = try_load_npy("iscell.npy", mmap_mode="r")
self.ops = try_load_npy("ops.npy", fn_transform=lambda x: x.item())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def try_load_npy(filename, mmap_mode=None, fn_transform=lambda x: x):
"""
This function allows for incomplete import of files.
"""
try:
return fn_transform(self._attempt_load_npy(filename, mmap_mode=mmap_mode))
except FileNotFoundError:
if allow_incomplete_import:
warnings.warn(f"File {filename} not found.") if warn_missing_files else None
return None
else:
raise FileNotFoundError(f"File {filename} not found.")
self.stat = try_load_npy("stat.npy")
self._roi_response_raw = try_load_npy("F.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self._roi_response_neuropil = try_load_npy("Fneu.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self._roi_response_deconvolved = try_load_npy("spks.npy", mmap_mode="r", fn_transform=lambda x: x.T)
self.iscell = try_load_npy("iscell.npy", mmap_mode="r")
self.ops = try_load_npy("ops.npy", fn_transform=lambda x: x.item())
self.stat = _attempt_load_npy(filename="stat.npy", mmap_mode="r")
self._roi_response_raw = _attempt_load_npy(filename="F.npy", mmap_mode="r", transpose=True)
self._roi_response_neuropil = _attempt_load_npy(filename="Fneu.npy", mmap_mode="r", transpose=True)
self._roi_response_deconvolved = _attempt_load_npy(filename="spks.npy", mmap_mode="r", transpose=True)
self.iscell = _attempt_load_npy(filename="iscell.npy", mmap_mode="r")
self.ops = _attempt_load_npy(filename="ops.npy", mmap_mode="r")
self.ops = self.ops.item() if self.ops is not None else self.ops

After the first comment regarding attempt_load_npy was accepted (and with extra simplification I just added), it allows us to remove a lot of this

Comment on lines 40 to 44
allow_incomplete_import: bool
If True, will not raise an error if the file is incomplete.
warn_missing_files: bool
If True, will raise a warning if a file is incomplete and
allow_incomplete_import is True.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allow_incomplete_import: bool
If True, will not raise an error if the file is incomplete.
warn_missing_files: bool
If True, will raise a warning if a file is incomplete and
allow_incomplete_import is True.

With cleanup below, optional warning no longer needed - incomplete import always enabled

Comment on lines 26 to 27
allow_incomplete_import: bool = False,
warn_missing_files: bool = True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allow_incomplete_import: bool = False,
warn_missing_files: bool = True,

With cleanup below, optional warning no longer needed - incomplete import always enabled

@@ -1,6 +1,7 @@
import shutil
from pathlib import Path
from typing import Optional
import warnings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import warnings

No need for warning either if the rest is cleaned up

@CodyCBakerPhD
Copy link
Member

transposition of image: I can't really conclude on the code links you sent. I'm not sure. This is not data I handle myself. I just plotted the results and it clearly showed that the image was transposed.

I see; thanks for the report - given this PR already has a couple other things going for it, I say let's do a follow-up that specifically checks it for consistency across our repo

As you say, generating some up-to-date examples from each pipeline using some standard base data could be very helpful in this endeavor so I'll task @pauladkisson (who will also be improving the documentation for our project) with handling this in the coming weeks

@RichieHakim
Copy link
Contributor Author

What is the status of this PR? Are we waiting on anything else before merging? Happy to help if you point me in the right direction.

@CodyCBakerPhD
Copy link
Member

What is the status of this PR?

Thanks for checking in - just waiting on the cleanup of the unneeded lines pointed out in my suggestions above

@pauladkisson
Copy link
Member

@RichieHakim, circling back around to this. Do you need any clarification on the suggestions Cody gave?

@CodyCBakerPhD
Copy link
Member

@weiglszonja Can you confirm that most of the approved features from this request were merged in with #234? If so, this can likely be closed

@weiglszonja
Copy link
Contributor

These are addressed in #242

Not added in #242:

  • readme fix (CNNM-E)
  • Add mean image extraction from caiman segmentation class.

@CodyCBakerPhD
Copy link
Member

@weiglszonja Thanks for summarizing

Not added in #242:

readme fix (CNNM-E)
Add mean image extraction from caiman segmentation class.

Looks like the README thing is no longer a problem (fixed at some point on main)

@alessandratrapani Can you take a crack at implementing isolated fixes to the Caiman images as seen from this PR?

@alessandratrapani alessandratrapani self-assigned this Dec 1, 2023
@pauladkisson pauladkisson enabled auto-merge (squash) March 4, 2024 18:48
Copy link

codecov bot commented Mar 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.37%. Comparing base (0f46b2c) to head (1e2ce74).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #227      +/-   ##
==========================================
+ Coverage   79.33%   79.37%   +0.04%     
==========================================
  Files          39       39              
  Lines        3063     3069       +6     
==========================================
+ Hits         2430     2436       +6     
  Misses        633      633              
Flag Coverage Δ
unittests 79.37% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...s/extractors/caiman/caimansegmentationextractor.py 85.18% <100.00%> (+0.87%) ⬆️

@pauladkisson pauladkisson merged commit 7aa2a63 into catalystneuro:main Mar 4, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Import mean or summary images from all segmentation methods
5 participants