curious about images and masks in a segmentation model #1567

roybenhayun · 2023-09-18T09:17:52Z


Hi,
I was looking at the example at https://github.com/microsoft/satellite-imagery-labeling-tool/blob/main/example/training-example.ipynb and was wondering where the link between the images and the mask occurs.

would be happy to get an explanation on where these two parts meet...

here are my thoughts while reading and running the example code:

images is a 4-channel GeoTIFF, masks is a single-channel one
SegmentationDataModule extends LightningDataModule and creates an IntersectionDataset from images and masks
SemanticSegmentationTask receives in_channels=4 (matching the images only)
trainer.fit(model=task, datamodule=datamodule) binds the data module and the task

initially, before looking at the images GeoTIFF (and having missed the documentation that specifically says images has 4 channels), I assumed that in_channels=4 meant 3 RGB channels from images plus the mask as the 4th channel. I probably assumed the channels were arranged this way because of the "Early Fusion" approach mentioned in the IADF workshop at 1:17:03 (see https://youtu.be/R_FhY8aq708?si=-bsGBX6PO0eyuuTX&t=4623)

only when I changed the example to work with another dataset did I realize that in_channels may actually be the 4 channels of images, and that I am probably missing a few points and making inaccurate assumptions. I am still wondering how the images and masks channels relate to in_channels.

so would be happy to hear how the images&masks, in_channels and early fusion, all connect and work together in torchgeo.

thanks!


adamjstewart · 2023-09-18T17:21:10Z

Maintainer

@calebrob6 can add more details since he wrote that specific tutorial, but I'll answer the question more generally.

Regarding "here are my thoughts while reading and running the example code": I believe all of those thoughts reflect the correct understanding.

"image" datasets take a geospatial bounding box as input and return a dictionary with an "image" key. "mask" datasets do the same but return a dict with a "mask" key. IntersectionDataset simply combines these into a single dict with both keys. The sampler is responsible for choosing a geospatial bounding box from the intersection of both datasets.
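The merging described above can be sketched in plain Python. This is only a toy illustration under stated assumptions: `ToyRaster`, `ToyIntersection`, and `overlap` are hypothetical stand-ins, not torchgeo classes, and the bounding boxes are simple `(minx, maxx, miny, maxy)` tuples in pixel units.

```python
# Toy stand-ins only: these are NOT torchgeo classes, just an illustration
# of the "two dicts merged into one" idea described above.


def overlap(a, b):
    """Intersection of two boxes given as (minx, maxx, miny, maxy)."""
    minx, maxx = max(a[0], b[0]), min(a[1], b[1])
    miny, maxy = max(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        raise ValueError("datasets do not overlap")
    return (minx, maxx, miny, maxy)


class ToyRaster:
    """Hypothetical dataset: returns {key: channels x rows x cols} for a box."""

    def __init__(self, key, channels, bounds):
        self.key = key
        self.channels = channels
        self.bounds = bounds

    def __getitem__(self, box):
        minx, maxx, miny, maxy = box
        rows, cols = maxy - miny, maxx - minx
        # Zero-filled placeholder pixels; a real dataset would read from disk.
        data = [[[0] * cols for _ in range(rows)] for _ in range(self.channels)]
        return {self.key: data}


class ToyIntersection:
    """Hypothetical combination of two datasets over their shared extent."""

    def __init__(self, first, second):
        self.first, self.second = first, second
        self.bounds = overlap(first.bounds, second.bounds)

    def __getitem__(self, box):
        # Query both datasets with the same box and merge the two dicts.
        sample = {}
        sample.update(self.first[box])
        sample.update(self.second[box])
        return sample


images = ToyRaster("image", channels=4, bounds=(0, 100, 0, 100))
masks = ToyRaster("mask", channels=1, bounds=(50, 150, 50, 150))
dataset = ToyIntersection(images, masks)

# A sampler would choose a box inside dataset.bounds; here one is hard-coded.
box = (60, 68, 60, 68)
sample = dataset[box]
print(sorted(sample), len(sample["image"]), len(sample["mask"]))
```

The point of the sketch is that the image and the mask stay under separate keys of one sample. They are never stacked into a single tensor, so the mask does not add to the image's channel count.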

Let me know if this answers your question or not.

4 replies

roybenhayun Sep 19, 2023
Author

Hi,
I expected in_channels to receive 4 channels from the images dataset and 1 channel from the masks dataset, and that this was due to the use of Early Fusion.
however, in the example, it receives only 4.

so would be happy to hear how the images&masks, in_channels and early fusion, all connect and work together in torchgeo.

thanks!

isaaccorley Sep 19, 2023
Maintainer

Hi @roybenhayun. Our SemanticSegmentationTask uses semantic segmentation models from the segmentation_models.pytorch library. The model takes in images with in_channels channels and outputs mask predictions. Can you clarify what EarlyFusion you're referring to? We don't do any early fusion in the model.
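To make the image-in, mask-out relationship concrete, here is a minimal pure-Python sketch. It is not segmentation_models.pytorch code: `toy_segmentation_model` and `pixel_accuracy` are hypothetical names, and the "model" is just a per-pixel linear classifier (roughly a 1x1 convolution) chosen for brevity. The assumption it illustrates is that the mask is only compared against the prediction and never passed to the model as an input channel.

```python
# Toy illustration only; this is not segmentation_models.pytorch code.


def toy_segmentation_model(image, weights):
    """Per-pixel linear classifier (similar in spirit to a 1x1 convolution).

    image:   in_channels x H x W nested lists
    weights: num_classes x in_channels
    returns: H x W nested list of predicted class indices
    """
    in_channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    prediction = []
    for r in range(height):
        row = []
        for c in range(width):
            pixel = [image[ch][r][c] for ch in range(in_channels)]
            scores = [sum(w * x for w, x in zip(class_w, pixel)) for class_w in weights]
            row.append(scores.index(max(scores)))  # argmax over classes
        prediction.append(row)
    return prediction


def pixel_accuracy(prediction, mask):
    """Fraction of pixels where the prediction matches the target mask."""
    total = sum(len(row) for row in mask)
    correct = sum(
        p == m for pred_row, mask_row in zip(prediction, mask)
        for p, m in zip(pred_row, mask_row)
    )
    return correct / total


# A 4-channel 2x2 image: this is what in_channels=4 refers to.
image = [
    [[1, 0], [0, 1]],  # channel 0
    [[0, 1], [1, 0]],  # channel 1
    [[0, 0], [0, 0]],  # channel 2
    [[0, 0], [0, 0]],  # channel 3
]
# The single-channel target mask is used only to score the prediction.
mask = [[0, 1], [1, 0]]
weights = [
    [1, 0, 0, 0],  # class 0 responds to channel 0
    [0, 1, 0, 0],  # class 1 responds to channel 1
]

prediction = toy_segmentation_model(image, weights)
accuracy = pixel_accuracy(prediction, mask)
print(prediction, accuracy)
```

In a real training loop the comparison would be a loss such as cross-entropy rather than accuracy, but the data flow is the same: only the image passes through the model.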

Answer selected by roybenhayun

roybenhayun Sep 19, 2023
Author

thanks @isaaccorley for the reply. it probably clarifies most of the questions - in_channels is the number of channels in the input image.

what I referred to as Early Fusion was probably a wrong association I made from watching the IADF workshop at 1:17:03: I wrongly thought in_channels should be the concatenation of the image and mask channels, 4+1 in this example.

so in_channels is simply the number of channels in the input image. cool.

isaaccorley Sep 19, 2023
Maintainer

Ah, I see what you mean. The EarlyFusion method in the video is for change detection, where you have pre- and post-event images and concatenate the 2 images together (not the image and the mask). In that case the resulting in_channels would be 2 * num_channels_per_image.
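The channel arithmetic for early fusion can be sketched as follows. This is a hedged illustration, not torchgeo code: `early_fusion` and `constant_image` are hypothetical helpers, and images are assumed to be channels-first nested lists.

```python
# Toy illustration of early fusion for change detection (not torchgeo code).


def constant_image(channels, height, width, value):
    """Build a channels x height x width image filled with one value."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


def early_fusion(pre_image, post_image):
    """Concatenate two channels-first images along the channel axis."""
    pre_shape = (len(pre_image[0]), len(pre_image[0][0]))
    post_shape = (len(post_image[0]), len(post_image[0][0]))
    if pre_shape != post_shape:
        raise ValueError("pre and post images must share height and width")
    return pre_image + post_image


num_channels_per_image = 4
pre = constant_image(num_channels_per_image, 2, 2, 0)   # before the event
post = constant_image(num_channels_per_image, 2, 2, 1)  # after the event

fused = early_fusion(pre, post)
in_channels = len(fused)
print(in_channels)  # 8, i.e. 2 * num_channels_per_image
```

The target change mask would still be kept separate from this fused input, just as in the single-image segmentation case.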
