Subtask of #15.
Depends on #16.
High Level Overview
We should be able to load any one of our segmentation implementations that can be found here: https://github.com/tritonuas/hutzler-571/tree/master/hutzler_571/segmentors.
For this issue, we should focus on using our FCN model. The FCN weights and architecture are taken from torchvision and we perform additional transfer learning with our custom competition dataset.
Regardless of which model we're using, segmentation models should have the following properties:
Inputs:
- A 3-channel (RGB) image.
  - This will probably be represented as an OpenCV matrix of size 3xWxH.
  - This image represents the cropped version of a target after being run through saliency (not the full-resolution aerial image).
  - This target will also never be an emergent target (saliency will sort out the emergent vs. classical targets and only pass the classical ones down).
Outputs:
- Two 1-channel binary masks.
  - Also two OpenCV matrices, each of size 1xWxH.
  - One mask represents the pixels that correspond to the shape and the other mask covers the character.
  - Each mask is a 1-channel matrix with the same width/height as the original image. Pixels that are turned on mark where the shape/character exists and pixels that are turned off mark where that class does not exist (see the short sketch after this list).
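To make the mask format concrete, here is a minimal sketch, assuming we represent each mask as an 8-bit, single-channel cv::Mat where 255 means the pixel is "on" (belongs to the class) and 0 means "off"; the exact encoding is up to the implementer:

```cpp
#include <opencv2/core.hpp>

// Hypothetical example of the mask format: a 1-channel matrix the same
// size as the cropped image, where nonzero pixels mark the class.
cv::Mat makeEmptyMask(int width, int height) {
    return cv::Mat::zeros(height, width, CV_8UC1);  // all pixels "off"
}

bool pixelIsOn(const cv::Mat& mask, int row, int col) {
    return mask.at<uchar>(row, col) != 0;  // "on" pixels belong to the class
}
```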
Some models are PyTorch models that we will have to load and pass images through (similar to saliency, #17). Other models are just Python code that we need to port over. TODO: write up more implementation details.
Data Flow Outline
Here I will outline the flow of data for the segmentation stage of the CV pipeline.
Here are a few Python code snippets that illustrate how to use the model and extract the outputs:
Our C++ code will follow most of the same steps. All the steps in this section will be implemented in the Segmentation::segment method (https://github.com/tritonuas/obcpp/blob/feat/cv-orchestrator/include/cv/segmentation.hpp#L24).
We start off with the cropped target found by the saliency stage. This is currently the only argument to the segment method. To see what's inside, see the struct definition over here. It basically stores the 3-channel color cropped image (same as the one displayed above).
Next we need to convert this image into something the model can accept as input. The model does not take an OpenCV matrix but instead various LibTorch/PyTorch types. I'm not exactly sure what this specific model takes. We've been working with the saliency model and learned that it takes a std::vector<torch::jit::IValue>. I'm not sure if this FCN model takes the same type, but it will throw an error if it doesn't receive the right one. If the type is a std::vector<torch::jit::IValue>, then you can do the conversion using the functions from this comment: you may need to call ToTensor first and then pass the output into ToInput, which creates a std::vector<torch::jit::IValue> that the model can accept.
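As a rough sketch of what those helpers could look like (ToTensor/ToInput are the names from the linked comment; the BGR-to-RGB conversion and the [0, 1] scaling are assumptions to verify against our Python preprocessing):

```cpp
#include <opencv2/imgproc.hpp>
#include <torch/script.h>

// Convert an 8-bit BGR cv::Mat into a 1x3xHxW float tensor in [0, 1].
// (Whether the model wants BGR->RGB conversion or extra normalization
// is an assumption to check against our Python implementation.)
at::Tensor ToTensor(const cv::Mat& img) {
    cv::Mat rgb;
    cv::cvtColor(img, rgb, cv::COLOR_BGR2RGB);
    at::Tensor tensor = torch::from_blob(
        rgb.data, {rgb.rows, rgb.cols, 3}, torch::kUInt8).clone();
    return tensor.permute({2, 0, 1})  // HxWxC -> CxHxW
                 .unsqueeze(0)        // add batch dim -> 1xCxHxW
                 .to(torch::kFloat32)
                 .div(255.0);
}

// Wrap the tensor in the input type the module's forward method expects.
std::vector<torch::jit::IValue> ToInput(const at::Tensor& tensor) {
    return {tensor};
}
```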
Next, you might need to resize the image. In our old code we always resize to 3x128x128. You might not need to for this model, but I would play around with it. If you do the resizing on the tensor, do it before converting to a std::vector<torch::jit::IValue> and operate on the at::Tensor type. This interpolate function seems like it would do the trick (see discussion here). Feel free to find another way of doing the resizing/interpolation. We could also do the resize while the image is still an OpenCV matrix (before turning it into a tensor). The OpenCV API seems a lot easier to use, so this might be a better option, but it's up to you and whatever works.
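Both options might look roughly like this; 128x128 is just the size from our old code, so treat it as a tunable assumption:

```cpp
#include <opencv2/imgproc.hpp>
#include <torch/torch.h>

namespace F = torch::nn::functional;

// Option 1: resize the tensor (expects a 1xCxHxW float tensor).
at::Tensor resizeTensor(const at::Tensor& input) {
    return F::interpolate(
        input,
        F::InterpolateFuncOptions()
            .size(std::vector<int64_t>{128, 128})
            .mode(torch::kBilinear)
            .align_corners(false));
}

// Option 2: resize the image while it is still an OpenCV matrix.
cv::Mat resizeMat(const cv::Mat& img) {
    cv::Mat resized;
    cv::resize(img, resized, cv::Size(128, 128));
    return resized;
}
```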
Once you've got the right type (and possibly the right size), you can pass it in as input to the model. To do so, you can use the model's forward method. Here's an example of how it works. Then you should get a tensor as output (you might need to call the .toTensor() function from the example code).
Next we need to extract the shape and character masks from the output (see above for examples of masks). This somewhat follows our existing Python code. The model returns a dictionary, so we must get the value associated with the "out" key, according to our old code and online examples.
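Putting the forward call and the dictionary lookup together could look something like this, assuming the scripted FCN returns a dict keyed by "out" like the torchvision examples:

```cpp
#include <torch/script.h>

// `model` is the torch::jit::Module loaded in the constructor and
// `inputs` is the std::vector<torch::jit::IValue> built above.
at::Tensor runModel(torch::jit::Module& model,
                    const std::vector<torch::jit::IValue>& inputs) {
    torch::NoGradGuard no_grad;  // inference only
    torch::jit::IValue output = model.forward(inputs);
    // torchvision's FCN returns a Dict[str, Tensor]; grab the "out" entry.
    auto dict = output.toGenericDict();
    return dict.at("out").toTensor();  // shape: 1 x num_classes x H x W
}
```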
Once we've done that, we need to apply softmax to the model output. The model actually returns a tensor of logits: unbounded numbers that correlate with the predicted class (shape/character/background) for each pixel in the original image. For example, if the model predicts that a given pixel corresponds to the shape class, it might output a large value in that position compared to the other classes. We don't want our predictions in this format; it would be nicer to have probabilities that sum to 1 across the classes for each pixel. For example, for a single pixel we might have the shape class at 0.8, character at 0.1, and background at 0.1. This prediction means the model is very confident that the pixel falls under the shape class. To get our data in this format we can use the softmax function. Read more about the PyTorch implementation here. One crucial detail to pay attention to is the dimension we take the softmax across: we want to compute it across the class channels for a single pixel (we DON'T want to take it across multiple pixel values in one channel lengthwise or widthwise).
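For a 1 x num_classes x H x W output, the per-pixel class scores live along dimension 1, so that is the dimension to softmax over. A minimal sketch:

```cpp
#include <torch/torch.h>

// logits has shape 1 x num_classes x H x W; dim 1 holds the per-pixel
// class scores, so softmax over dim 1 gives per-pixel probabilities
// that sum to 1 across shape/character/background.
at::Tensor toProbabilities(const at::Tensor& logits) {
    return torch::softmax(logits, /*dim=*/1);
}
```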
Once we've computed softmax, we can resize the output back to its original size (if we resized it earlier). We'll also have to index out the shape and character masks. The model returns a 3-channel prediction where each channel represents the segmentation mask for its respective class, so you'll need to do some indexing to get out the shape and character masks.
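The indexing could look like the following; which channel corresponds to which class is an assumption here, so check it against the training labels in our Python code:

```cpp
#include <torch/torch.h>

// probs has shape 1 x 3 x H x W after softmax. Assume (unverified) that
// channel 1 is the shape class and channel 2 is the character class.
struct MaskTensors {
    at::Tensor shape;      // H x W probability map
    at::Tensor character;  // H x W probability map
};

MaskTensors splitMasks(const at::Tensor& probs) {
    MaskTensors masks;
    masks.shape = probs.index({0, 1});      // drop batch dim, pick channel 1
    masks.character = probs.index({0, 2});  // drop batch dim, pick channel 2
    return masks;
}
```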
Once you've separated out the shape and character masks, you should have two tensors. The final step is to convert them back into OpenCV images to be returned as part of the SegmentationResults struct.
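Converting a single H x W probability map back into a binary OpenCV mask could look like this (the 0.5 threshold is an assumption; an argmax across classes would also work):

```cpp
#include <cstring>

#include <opencv2/core.hpp>
#include <torch/torch.h>

// Turn an H x W probability tensor into a 1-channel 8-bit mask where
// pixels above the threshold are 255 ("on") and the rest are 0 ("off").
cv::Mat tensorToMask(const at::Tensor& prob, float threshold = 0.5f) {
    at::Tensor mask = (prob > threshold)
                          .to(torch::kUInt8)
                          .mul(255)
                          .contiguous()
                          .to(torch::kCPU);
    cv::Mat out(static_cast<int>(mask.size(0)),
                static_cast<int>(mask.size(1)), CV_8UC1);
    std::memcpy(out.data, mask.data_ptr<uint8_t>(),
                mask.numel() * sizeof(uint8_t));
    return out;
}
```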
Then you should be good to return SegmentationResults.
Implementation Notes
First focus on loading the model with LibTorch and C++, following the sample code here. We also have it working in one of the integration tests. For the FCN model, you'll want to use the model weights from here. When loading the model file, be wary of the paths: if you use a relative path, it will likely be relative to the build folder since that's where we run all our commands. As for where this goes, I think it makes sense to put this in the constructor of the Segmentation class.
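A rough sketch of what that loading step could look like; the function name is a placeholder, and wherever it ends up (likely the Segmentation constructor) should follow whatever the existing saliency/integration-test code does:

```cpp
#include <string>

#include <torch/script.h>

// Hypothetical sketch of the model loading that would live in the
// Segmentation constructor.
torch::jit::Module loadSegmentationModel(const std::string& modelPath) {
    // torch::jit::load throws c10::Error if the file can't be found, and
    // relative paths resolve from the build folder where we run commands.
    torch::jit::Module model = torch::jit::load(modelPath);
    model.eval();
    return model;
}
```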
For testing, I would do everything within your own integration test that calls Segmentation::segment with a testing image loaded from a file. You can add a new .cpp file to tests/integration/ and add lines to CMakeLists.txt to get it to compile (https://github.com/tritonuas/obcpp/blob/feat/cv-orchestrator/CMakeLists.txt#L93-L97). Then you can run make executable-name and ./bin/executable-name to run it.
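A skeleton for such an integration test might look like the following; the file name, image path, and constructor arguments are all placeholders to adapt to the real types in segmentation.hpp:

```cpp
// Hypothetical file: tests/integration/segmentation.cpp
#include <iostream>

#include <opencv2/opencv.hpp>

#include "cv/segmentation.hpp"  // assumed include path

int main() {
    // Paths are placeholders; remember they resolve relative to build/.
    cv::Mat croppedImage = cv::imread("../tests/integration/images/target.jpg");
    if (croppedImage.empty()) {
        std::cerr << "could not load test image" << std::endl;
        return 1;
    }

    // Construct the Segmentation class (constructor arguments are an
    // assumption), build the cropped-target struct from the test image,
    // call segment(), then inspect or save the returned masks, e.g.:
    // Segmentation segmentation("../models/fcn_model.pt");
    // SegmentationResults results = segmentation.segment(/* cropped target */);
    // cv::imwrite("shape_mask.jpg", results.shapeMask);

    return 0;
}
```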