Conversation
Compare: e4268ce to 7fcf23e
Reviewer: Actually, `for_wsi_classification` should also accept a list of pre-transforms, so we can just add `pre_transforms` in the configs.
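A minimal sketch of what that could look like, with a stripped-down stand-in for the factory; the `pre_transforms` parameter and the simplified class are illustrative, not ahcore's actual API:

```python
from typing import Any, Callable, Optional

PreTransformCallable = Callable[[dict[str, Any]], dict[str, Any]]


class PreTransformTaskFactory:
    """Stripped-down stand-in for ahcore's pre-transform factory."""

    def __init__(self, transforms: list[PreTransformCallable]):
        self._transforms = transforms

    @classmethod
    def for_wsi_classification(
        cls, pre_transforms: Optional[list[PreTransformCallable]] = None
    ) -> "PreTransformTaskFactory":
        transforms: list[PreTransformCallable] = []
        # config-supplied pre-transforms are added before the defaults
        transforms.extend(pre_transforms or [])
        # ... default transforms (e.g. feature sampling) would be appended here ...
        return cls(transforms)
```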
Reviewer: That should be generalized the same way the loss factory is.
Commits:
- …an read features correctly
- …d is set in the writers
- …oss. In the future maybe this should be done with an argument from the configs. Now it works with whatever is available
- …uld return that for features
- merge main branch that includes classification models
Author: In addition to the available tests, I've also confirmed that I can extract features with this model and then use them in a classification pipeline as expected.
Reviewer: Some comments to begin with. I think we could use two functions: `read_region_at_scale` and `read_region_at_level`. The latter can be used, for instance, for features, and the former for images/masks. The former can use the SlideImage backend.
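A rough sketch of that split, with numpy arrays standing in for the slide backends; the signatures are assumptions, not the final API:

```python
import numpy as np


def read_region_at_level(levels: dict[int, np.ndarray], location: tuple[int, int],
                         level: int, size: tuple[int, int]) -> np.ndarray:
    """Read from a concrete stored level, no resampling -- suitable for features."""
    x, y = location
    w, h = size
    return levels[level][y:y + h, x:x + w]


def read_region_at_scale(levels: dict[int, np.ndarray], location: tuple[int, int],
                         scaling: float, size: tuple[int, int]) -> np.ndarray:
    """Read at an arbitrary scale -- suitable for images/masks.

    Nearest-neighbour resampling stands in here for what the SlideImage
    backend would actually do.
    """
    x, y = location
    w, h = size
    src = levels[0][int(y / scaling):int((y + h) / scaling),
                    int(x / scaling):int((x + w) / scaling)]
    rows = (np.arange(h) * src.shape[0] / h).astype(int)
    cols = (np.arange(w) * src.shape[1] / w).astype(int)
    return src[rows][:, cols]
```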
```python
elif data_format == DataFormat.FEATURE:
    size = (num_samples, 1)
    mpp = 1.0
    tile_size = (1, 1)
    tile_overlap = (0, 0)
```
Reviewer: I would make the mpp equal to the product of the tile size and the mpp.
Author: I do not fully understand what you mean here, but these settings are mainly here so that the reader gets them correctly. I can change the mpp as long as the reader also reads at that specific mpp; 1.0 was chosen for ease.
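If I read the suggestion correctly, the idea is that one feature "pixel" summarizes a whole tile, so its mpp would be the base mpp times the tile size rather than a hard-coded 1.0. A one-line sketch (the helper name is mine):

```python
def effective_feature_mpp(base_mpp: float, tile_size: int) -> float:
    """mpp of one feature vector: one feature covers an entire tile,
    i.e. tile_size * base_mpp microns per side."""
    return base_mpp * tile_size
```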
ahcore/cli/tiling.py (outdated)
```python
data_format=DataFormat.COMPRESSED_IMAGE if compression != "none" else DataFormat.IMAGE,
color_profile=color_profile,
```
Reviewer: Doesn't `'none'` need to be an Enum?
Author: Fixed. I've looked at it, and here the data format should always be `IMAGE`, because the compression here does not make the image a compressed image.
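For reference, a sketch of what the Enum-based flag could look like; the `Compression` members are illustrative:

```python
from enum import Enum


class Compression(str, Enum):
    NONE = "none"
    JPEG = "jpeg"
    ZSTD = "zstd"


# the comparison then no longer relies on a magic string, e.g.:
# data_format = DataFormat.COMPRESSED_IMAGE if compression is not Compression.NONE else DataFormat.IMAGE
```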
```python
if self._stitching_mode != StitchingMode.CROP:
    raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
```
Reviewer: You also need to check that the overlap is 0, or at least think about what this means for the features, and explain it somewhere.
Author: Added the check. At the moment I don't see how features with overlap should work, so we do not allow it.
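The combined guard could look roughly like this (a sketch; the reader class itself is elided and the helper name is mine):

```python
from enum import Enum


class StitchingMode(Enum):
    CROP = "crop"
    AVERAGE = "average"


def check_feature_settings(stitching_mode: StitchingMode, tile_overlap: tuple[int, int]) -> None:
    """Reject configurations that are undefined for feature readers."""
    if stitching_mode != StitchingMode.CROP:
        raise NotImplementedError("Stitching mode other than CROP is not supported for features.")
    if tile_overlap != (0, 0):
        raise ValueError("A non-zero tile overlap is not supported for features.")
```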
```python
def read_region(self, location: tuple[int, int], level: int, size: tuple[int, int]) -> pyvips.Image:
    """
    Reads a region in the stored h5 file. This function stitches the regions as saved in the cache file. Doing this
    it takes into account:
```
Reviewer: Mention somewhere the difference between this function and the feature function (e.g. interpolation, etc.).
Author: Added comments to the function.
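To make the documentation point concrete: image regions may be interpolated when resized, but feature vectors must never be blended, so only nearest-neighbour selection is safe for them. A small numpy illustration (the helper names are mine):

```python
import numpy as np


def resize_image_1d(values: np.ndarray, new_len: int) -> np.ndarray:
    # linear interpolation is fine for pixel intensities
    old = np.linspace(0, 1, num=len(values))
    new = np.linspace(0, 1, num=new_len)
    return np.interp(new, old, values)


def resize_features_1d(features: np.ndarray, new_len: int) -> np.ndarray:
    # nearest-neighbour: every output entry is an unmodified input feature
    idx = (np.arange(new_len) * len(features) / new_len).astype(int)
    return features[idx]
```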
ahcore/transforms/pre_transforms.py (outdated)
```python
@classmethod
def for_wsi_classification(
    cls, data_description: DataDescription, requires_target: bool = True
) -> PreTransformTaskFactory:
    transforms: list[PreTransformCallable] = []

    transforms.append(SampleNFeatures(n=1000))

    if not requires_target:
        return cls(transforms)
    return cls(transforms, data_description, requires_target)

    index_map = data_description.index_map
    if index_map is None:
        raise ConfigurationError("`index_map` is required for classification models when the target is required.")

    label_keys = data_description.label_keys
```
Reviewer: Rewrite this in a more factory-like design; check how this is done with the losses and metrics.
Author: I've improved the function and added a docstring, but I am not sure how I would improve the factory design further.
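For comparison, a hedged sketch of the registry pattern that loss/metric factories tend to follow; every name here is illustrative, not ahcore's actual code:

```python
from typing import Any, Callable

# transform builders register themselves by name; configs then refer to names
PRE_TRANSFORM_REGISTRY: dict[str, Callable[..., Any]] = {}


def register_pre_transform(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        PRE_TRANSFORM_REGISTRY[name] = fn
        return fn
    return decorator


@register_pre_transform("sample_n_features")
def build_sample_n_features(n: int = 1000) -> tuple[str, int]:
    # placeholder for constructing the real SampleNFeatures transform
    return ("SampleNFeatures", n)


def build_pre_transforms(names: list[str]) -> list[Any]:
    # the factory resolves config-listed names via the registry
    return [PRE_TRANSFORM_REGISTRY[name]() for name in names]
```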
```python
raise ValueError(f"Expected features to have a width dimension of 1, got {h}.")

n_random_indices = (
    np.random.choice(w, self.n, replace=False) if w > self.n else np.random.choice(w, self.n, replace=True)
)

# Extract the selected columns (indices) from the image
# Create a new image from the selected indices
# todo: this can probably be done without a for-loop quicker
selected_columns = [features.crop(idx, 0, 1, h) for idx in n_random_indices]
```
Reviewer: You need to seed on something to make sure that each run gives reproducible values. Maybe the path is a good seed?
Author: I think the `seed_everything` function in the entrypoints should handle this, right?
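A global `seed_everything` fixes the run-level seed, but the sampled indices then still depend on iteration order; seeding per file makes each slide's sample stable on its own. A possible helper (the name and approach are mine, not part of the PR):

```python
import hashlib

import numpy as np


def rng_for_path(path: str) -> np.random.Generator:
    """Derive a stable, per-file RNG from the file path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    return np.random.default_rng(seed)


# each file now gets the same sampled indices on every run, e.g.:
# indices = rng_for_path("slide_001.h5").choice(w, n, replace=False)
```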
ahcore/transforms/pre_transforms.py (outdated)
```python
class SelectSpecificLabels:
    def __init__(self, keys: list[str] | str):
        if isinstance(keys, str):
            keys = [keys]
        self._keys = keys
```
Reviewer: Docstrings, please.
Author: Done.
```python
using_features = False

if tile.bands > 4:
    # assuming that more than four bands/channels means that we are handling features
    using_features = True
    tile_ = tile
```
Reviewer: I don't think we should use pyvips when we have features. In that case, `torch.Tensor` or numpy arrays should be what we are using.
Author: A dlup dataset will always return a pyvips image, and it does work as currently implemented, but indeed we could think about whether we want to change that.
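If the feature path were to bypass pyvips, the usual route is to dump the tile buffer to memory once and reshape it as an array. Below, a plain bytes buffer stands in for a real pyvips tile; the pyvips calls in the comment follow the documented `write_to_memory` pattern:

```python
import numpy as np

# With a real pyvips tile one would typically do:
#   mem = tile.write_to_memory()
#   arr = np.ndarray(buffer=mem, dtype=np.float32,
#                    shape=(tile.height, tile.width, tile.bands))
# Here a fake float32 buffer stands in for the tile: 2x3 "pixels" with 8 feature channels.
raw = np.arange(2 * 3 * 8, dtype=np.float32).tobytes()
features = np.frombuffer(raw, dtype=np.float32).reshape(2, 3, 8)
# flatten to (n_tiles, n_features) for torch/numpy classification pipelines
feature_vectors = features.reshape(-1, 8)
```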
Reviewer: Can you add a full example of how you run the extraction? Adding it here is fine; afterwards we can see how to add it to the documentation.
The changes in this PR make it possible to load features (h5 or Zarr) from a database.
By changing the backend of the TiledWsiDataset, the features can be loaded in a similar way to how we do it with images. At the moment I'm assuming the classification task only, but it should be extendable to segmentation tasks as well. (In segmentation tasks you might want multiple "tiles" of features per WSI; in classification you would only want one set of features per WSI.)
LMK what you think.