Curious about images and masks in a segmentation model #1567
-
Hi, I would be happy to get an explanation of where these two parts meet. Here are my thoughts from reading and running the example code:
Initially, before looking at the images GeoTIFF (and missing the part of the documentation that specifically says the images have 4 channels), I assumed that `in_channels=4` meant 3 RGB channels from the images plus a 4th channel from the mask. I probably assumed this ordering because of the "Early Fusion" approach mentioned in the IADF workshop at 1:17:03 (see https://youtu.be/R_FhY8aq708?si=-bsGBX6PO0eyuuTX&t=4623). Only when adapting the example to another dataset did I realize that `in_channels` may actually refer to the 4 channels of the images, and that I am probably missing a few points and making inaccurate assumptions. I am still wondering how the image and mask channels relate to `in_channels`, so I would be happy to hear how images and masks, `in_channels`, and early fusion all connect and work together in torchgeo. Thanks!
-
@calebrob6 can add more details since he wrote that specific tutorial, but I'll answer the question more generally.
I believe all of those thoughts reflect the correct understanding. "Image" datasets take a geospatial bounding box as input and return a dictionary with an "image" key. "Mask" datasets do the same but return a dict with a "mask" key. `IntersectionDataset` simply combines these into a single dict with both keys. The sampler is responsible for choosing a geospatial bounding box from the intersection of both datasets. Let me know whether this answers your question.
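To make that data flow concrete, here is a toy, dependency-free sketch. All names in it (`ToyBox`, `ToyImageDataset`, `ToyMaskDataset`, `ToyIntersection`, `random_query`) are hypothetical stand-ins, not torchgeo's API; in torchgeo itself the intersection is built with `image_ds & mask_ds` (an `IntersectionDataset`) and boxes come from a sampler such as `RandomGeoSampler`. It also assumes one pixel per map unit and uses zero-filled nested lists in place of real rasters.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBox:
    """Hypothetical axis-aligned box in map units (not torchgeo's BoundingBox)."""
    minx: int
    maxx: int
    miny: int
    maxy: int

    def intersect(self, other):
        return ToyBox(max(self.minx, other.minx), min(self.maxx, other.maxx),
                      max(self.miny, other.miny), min(self.maxy, other.maxy))


def _grid(box, fill):
    # Assumption: one pixel per map unit; rows follow y extent, columns follow x.
    return [[fill] * (box.maxx - box.minx) for _ in range(box.maxy - box.miny)]


class ToyImageDataset:
    """Stand-in "image" dataset: box in, {"image": (bands, H, W)} out."""
    def __init__(self, bounds, bands):
        self.bounds, self.bands = bounds, bands

    def __getitem__(self, query):
        return {"image": [_grid(query, 0.0) for _ in range(self.bands)]}


class ToyMaskDataset:
    """Stand-in "mask" dataset: box in, {"mask": (H, W)} out."""
    def __init__(self, bounds):
        self.bounds = bounds

    def __getitem__(self, query):
        return {"mask": _grid(query, 0)}


class ToyIntersection:
    """Queries both datasets with the same box and merges the two dicts."""
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.bounds = a.bounds.intersect(b.bounds)

    def __getitem__(self, query):
        return {**self.a[query], **self.b[query]}


def random_query(bounds, size, rng):
    """The sampler's job: pick a size x size box lying inside the overlap."""
    x = rng.randint(bounds.minx, bounds.maxx - size)
    y = rng.randint(bounds.miny, bounds.maxy - size)
    return ToyBox(x, x + size, y, y + size)


image_ds = ToyImageDataset(ToyBox(0, 100, 0, 100), bands=4)
mask_ds = ToyMaskDataset(ToyBox(50, 150, 20, 120))
ds = ToyIntersection(image_ds, mask_ds)

query = random_query(ds.bounds, size=8, rng=random.Random(0))
sample = ds[query]
print(ds.bounds)  # ToyBox(minx=50, maxx=100, miny=20, maxy=100)
print(sorted(sample), len(sample["image"]), len(sample["mask"]))  # ['image', 'mask'] 4 8
```

Merging with `{**a, **b}` works here only because the two datasets use disjoint keys, so the image and the mask sit side by side in one sample rather than being stacked into one array.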
Hi @roybenhayun. Our `SegmentationTask` uses semantic segmentation models from the segmentation_models.pytorch library. The model takes in images with `in_channels` channels and outputs mask predictions. Can you clarify which early fusion you're referring to? We don't do any early fusion in the model.
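As a rough illustration of where `in_channels` and the mask each fit, here is a toy, pure-Python sketch. It is not how segmentation_models.pytorch or torchgeo are implemented: a single hypothetical 1x1 "first layer" (`first_layer_forward`) stands in for the whole network, and `argmax_mask` and `pixel_accuracy` are illustrative helpers. The point it tries to show is that `in_channels` must match the image's band count, while the mask is only compared against the predictions (in a real task, through the loss). The second source in the last step (two "SAR" bands) is an assumed example of what early fusion in the question's sense would look like: sources concatenated along the channel axis before the model.

```python
# Toy, dependency-free sketch; NOT the smp/torchgeo implementation.

def first_layer_forward(image, weights):
    """Toy 1x1 convolution.

    image:   list of bands, each an H x W grid -> shape (C, H, W)
    weights: shape (num_classes, in_channels)
    returns: per-class scores, shape (num_classes, H, W)
    """
    in_channels = len(weights[0])
    if len(image) != in_channels:
        raise ValueError(f"expected {in_channels} input channels, got {len(image)}")
    h, w = len(image[0]), len(image[0][0])
    return [
        [[sum(wk[c] * image[c][i][j] for c in range(in_channels)) for j in range(w)]
         for i in range(h)]
        for wk in weights
    ]


def argmax_mask(scores):
    """Collapse (num_classes, H, W) scores into an H x W predicted mask."""
    num_classes, h, w = len(scores), len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda k: scores[k][i][j]) for j in range(w)]
        for i in range(h)
    ]


def pixel_accuracy(pred, target):
    """Fraction of pixels where the predicted class equals the target mask."""
    total = sum(len(row) for row in target)
    hits = sum(p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return hits / total


# Hypothetical 4-band, 2 x 2 image and its ground-truth mask (the target).
image = [
    [[1.0, 1.0], [0.0, 0.0]],  # band 0
    [[0.0, 0.0], [0.0, 0.0]],  # band 1
    [[0.0, 0.0], [0.0, 0.0]],  # band 2
    [[0.0, 0.0], [1.0, 1.0]],  # band 3
]
mask = [[0, 0], [1, 1]]

# First layer built for in_channels=4 and 2 classes.
weights = [[1.0, 0.0, 0.0, 0.0],  # class 0 score = band 0
           [0.0, 0.0, 0.0, 1.0]]  # class 1 score = band 3

pred = argmax_mask(first_layer_forward(image, weights))
print(pred, pixel_accuracy(pred, mask))  # [[0, 0], [1, 1]] 1.0

# Appending the mask as a 5th input channel does not fit a 4-channel model.
try:
    first_layer_forward(image + [mask], weights)
    mask_as_input_ok = True
except ValueError:
    mask_as_input_ok = False

# Early fusion (assumed second source with 2 bands): concatenate along the
# channel axis, so the model would have to be built with in_channels=4+2.
sar = [[[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.1], [0.1, 0.1]]]
fused = image + sar
fused_weights = [row + [0.0, 0.0] for row in weights]  # in_channels=6
fused_pred = argmax_mask(first_layer_forward(fused, fused_weights))
```

Under these assumptions, even with fusion the mask stays on the output side as the target; only image-like sources add to `in_channels`.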