A fundamental concept of the htc framework is the DataPath class, which is the main interaction point for all image-based actions such as reading data cubes, accessing image properties or loading annotations. This tutorial gives hints for the case that the default classes (like DataPathTivita) are no longer enough and you want to write your own data path class. As a general tip, you can explore the existing classes in ../htc/tivita for examples. In general, datasets are expected to be structured as follows:
YOUR_DATASET
├── data  # Your raw data and annotations
│   └── dataset_settings.json  # General information about your dataset with easy access in the framework (see below)
└── intermediates  # Generated files
This structure is not required, but it may make your life easier ;-)
In the following, some important general concepts are explained, but feel free to override any method in the DataPath class as you wish.
To adapt your data path class to your dataset structure, you can override the default iterate method. Here, you define the logic to find all images in your dataset and store custom attributes (for example, subject identifiers). You can also load your custom dataset settings (see below) here and assign them to your data paths. A stub for the iterate method looks as follows:
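from collections.abc import Callable, Iterator
from pathlib import Path
from typing import Union

# Import paths below are assumed from the htc package layout; adjust them if they differ in your version
from htc.settings import settings
from htc.tivita.DataPath import DataPath
from htc.tivita.DatasetSettings import DatasetSettings


class DataPathCustom(DataPath):
    @staticmethod
    def iterate(
        data_dir: Path,
        filters: list[Callable[["DataPath"], bool]],
        annotation_name: Union[str, list[str]],
    ) -> Iterator["DataPathCustom"]:
        # Optional but recommended (see below)
        dataset_settings = DatasetSettings(data_dir / "dataset_settings.json")
        # Optional, only if needed/available
        intermediates_dir = settings.data_dirs.find_intermediates_dir(data_dir)
        # Adjust looping according to your dataset structure
        for image_dir in sorted(data_dir.iterdir()):
            # Add custom attributes as needed
            path = DataPathCustom(image_dir, data_dir, intermediates_dir=intermediates_dir, dataset_settings=dataset_settings, annotation_name_default=annotation_name)
            # Only yield paths which are accepted by every filter
            if all(f(path) for f in filters):
                yield path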
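The filters argument in the stub is a list of callables that each receive a path and return a bool; a path is only yielded if every filter accepts it. The following is a minimal stand-in sketch of that pattern which runs without htc. FakePath, iterate_paths, only_p01 and the `<subject>/<image>` folder layout are hypothetical names and assumptions for illustration, not part of the framework.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakePath:
    """Hypothetical stand-in for a data path object (not part of htc)."""

    image_dir: Path
    subject_name: str


def iterate_paths(
    image_dirs: list[Path],
    filters: list[Callable[[FakePath], bool]],
) -> Iterator[FakePath]:
    # Same pattern as in the stub: build a path, then yield it only if every filter accepts it
    for image_dir in sorted(image_dirs):
        # Assumed layout for this sketch: <subject>/<image>
        path = FakePath(image_dir, subject_name=image_dir.parent.name)
        if all(f(path) for f in filters):
            yield path


def only_p01(path: FakePath) -> bool:
    # Example filter: keep only images of subject P01
    return path.subject_name == "P01"


dirs = [Path("P02/img_1"), Path("P01/img_2"), Path("P01/img_1")]
selected = [p.image_dir.as_posix() for p in iterate_paths(dirs, [only_p01])]
print(selected)  # ['P01/img_1', 'P01/img_2']
```

With an empty filter list, all paths are yielded because all() of an empty sequence is True.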
It may be useful to make the annotations for an image available. In the case of segmentations (same spatial shape as the input image), you can override the read_segmentation() method. It has no required arguments and is expected to return a NumPy array containing the label index for each pixel:
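def read_segmentation(self) -> np.ndarray:
    # Or relative to self.intermediates_dir if available
    # np.load() returns an archive for .npz files, so select the array by its key
    # ("arr_0" is the default key of np.savez(); adjust it if you named your array)
    return np.load(self.data_dir / "annotations.npz")["arr_0"]  # e.g. np.uint8 with shape [480, 640]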
It can be very useful to store information that applies to all your images in one place. For this, you can use the DatasetSettings class, which basically loads a global JSON file with attributes for all your images. This object is created once and a reference to it is stored in each data path (path.dataset_settings) so that you can easily access the information. A typical dataset settings file may look like this:
{
    "dataset_name": "NAME_OF_YOUR_DATASET",
    "data_path_class": "htc.tivita.DataPathMultiorgan>DataPathMultiorgan",
    "shape": [
        480,
        640,
        100
    ],
    "shape_names": [
        "height",
        "width",
        "channels"
    ],
    "label_mapping": {
        "class_1": 0,
        "class_2": 1,
        "unlabeled": 255
    },
    "last_valid_label_index": 1
}
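Since the settings file is plain JSON, its content can also be inspected without the framework. The following sketch uses only the standard json module on an inline copy of the example (without the data_path_class key); it is not how DatasetSettings is implemented. It assumes that label indices above last_valid_label_index mark pixels that should be ignored, which is an interpretation made for this example.

```python
import json

# Inline copy of the example settings; in practice the text would be read from
# data_dir / "dataset_settings.json"
settings_text = """
{
    "dataset_name": "NAME_OF_YOUR_DATASET",
    "shape": [480, 640, 100],
    "shape_names": ["height", "width", "channels"],
    "label_mapping": {"class_1": 0, "class_2": 1, "unlabeled": 255},
    "last_valid_label_index": 1
}
"""
dataset_settings = json.loads(settings_text)

# Pair every axis name with its size
shape = dict(zip(dataset_settings["shape_names"], dataset_settings["shape"]))

# Invert the mapping to go from a label index back to its name
index_to_name = {index: name for name, index in dataset_settings["label_mapping"].items()}

# Assumption for this sketch: indices above last_valid_label_index are not valid classes
last_valid = dataset_settings["last_valid_label_index"]
valid_names = [name for index, name in sorted(index_to_name.items()) if index <= last_valid]

print(shape)  # {'height': 480, 'width': 640, 'channels': 100}
print(index_to_name[255])  # unlabeled
print(valid_names)  # ['class_1', 'class_2']
```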
Of course, you can add custom information as you wish.
label_mapping is especially interesting if you also override the read_segmentation() method. It basically gives a meaning to the labels in your segmentation mask (which can in turn be used for remapping tasks during training).

data_path_class is a very special key. Here you can specify a Python import (module>class). If your dataset settings file is stored in data_dir / "dataset_settings.json" (i.e. at the top level of your data directory), then you can just write

paths = list(DataPath.iterate(Path("your path")))

and it will automatically pick up your custom class. This way, you can always use the DataPath class as the general entry point.
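To illustrate the module>class notation, here is a minimal sketch of how such a string could be resolved to a class with the standard importlib module. resolve_class is a hypothetical helper for this example, not the function htc uses internally, and a standard library class stands in for a custom data path class so that the snippet runs without the framework.

```python
import importlib


def resolve_class(spec: str) -> type:
    """Resolve a 'module>class' string to the class object (hypothetical helper)."""
    module_name, class_name = spec.split(">")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard library class so that the sketch runs without htc
cls = resolve_class("pathlib>PurePosixPath")
print(cls.__name__)  # PurePosixPath
print(cls("data", "dataset_settings.json"))  # data/dataset_settings.json
```

For your own class, the module part must be importable from the Python environment in which the framework runs.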