
Integrate Segmentation model(s) #18

Closed · atar13 opened this issue Oct 8, 2023 · 1 comment

atar13 commented Oct 8, 2023

Subtask of #15.

Depends on #16.

High-Level Overview

We should be able to load any one of our segmentation implementations that can be found here: https://github.com/tritonuas/hutzler-571/tree/master/hutzler_571/segmentors.

For this issue, we should focus on using our FCN model. The FCN weights and architecture are taken from torchvision and we perform additional transfer learning with our custom competition dataset.

Regardless of which model we're using, segmentation models should have the following properties:

Inputs:

  • 3-channel (RGB) image.
    • This will probably be represented as an OpenCV matrix of size (3xWxH).
    • This image is the cropped version of a target after being run through saliency (not the full-resolution aerial image).
    • This target will also never be an emergent target (saliency will sort out emergent vs. classical targets and only pass the classical ones down).

[Example cropped target image: 000000002]

Outputs:

  • Two 1-channel binary masks.
    • These will also be OpenCV matrices, each of size (1xWxH).
    • One mask represents the pixels that correspond to the shape and the other represents the pixels that correspond to the character.
    • Each mask is a 1-channel matrix with the same width/height as the original image. Pixels that are turned on mark where the shape/character exists and pixels that are turned off mark where that class does not exist.

[Example output masks: 000000001 jpg_shape (shape mask) and 000000001 jpg_char (character mask)]
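
As a rough illustration, here is how those inputs and outputs might look as C++ types. The struct and field names below are hypothetical; the real definitions live in include/cv/segmentation.hpp.

```cpp
#include <opencv2/opencv.hpp>

// Illustrative only: the shapes of the data flowing in and out of segmentation.
struct CroppedTargetExample {
    cv::Mat croppedImage;   // CV_8UC3: the 3-channel color crop produced by saliency
};

struct SegmentationResultsExample {
    cv::Mat shapeMask;      // CV_8UC1: nonzero where the shape is predicted
    cv::Mat characterMask;  // CV_8UC1: nonzero where the character is predicted
};
```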

Some models are PyTorch models that we will have to load and pass images through (similar to saliency, #17). Other models are just some Python code that we need to port over. TODO: write up more implementation details.

Data Flow Outline

Here I will outline the flow of data for the segmentation stage of the CV pipeline.

Here are a few Python code snippets that illustrate how to use the model and extract the outputs:

Our C++ code will follow most of the same steps.

All the steps in this section will be implemented in the Segmentation::segment method https://github.com/tritonuas/obcpp/blob/feat/cv-orchestrator/include/cv/segmentation.hpp#L24

  • We start off with the cropped target found by the saliency stage. This is currently the only argument to the segment method. To see what's inside, check the struct definition over here. It basically stores the 3-channel color cropped image (same as the one displayed above).
  • Next we need to convert this image to something the model can accept as input. The model does not take an OpenCV matrix but instead various libtorch/PyTorch types. I'm not exactly sure what this specific model takes. We've been working with the saliency model and learned that it takes a std::vector<torch::jit::IValue>. I'm not sure if this FCN model takes the same type, but I'm sure it will throw an error if it doesn't receive the right type. If the type is a std::vector<torch::jit::IValue>, then you can do the conversion using the functions from this comment. You may need to call ToTensor first and then pass the output into ToInput, which creates a std::vector<torch::jit::IValue> that the model can accept (see the preprocessing sketch after this list).
  • Next, you might need to resize the image. In our old code we always resize to 3x128x128. You might not need to for this model, but I would play around with it. To do the resizing, you should probably do it before converting to a std::vector<torch::jit::IValue> and operate on the at::Tensor type. This interpolate function seems like it would do the trick (see discussion here). Feel free to find another way of doing the resizing/interpolation. We could also do the resize while the image is still an OpenCV matrix (before turning it into a tensor). The OpenCV API seems a lot easier to use, so this might be a better option, but it's up to you and whatever works.
  • Once you've got the right type (and possibly the right size), you can pass it in as input to the model. To do so, you can use the model's forward method. Here's an example of how it works. You should then get a tensor as output (you might need to call the .toTensor() function from the example code).
  • Next we need to extract our shape and character masks from the output (see above for examples of masks). This somewhat follows our existing Python code. The model returns a dictionary, so we must get the value associated with the "out" key, according to our old code and online examples.
  • Once we've done that, we need to apply softmax to the model output. The model actually returns a tensor of logits: unbounded numbers that correlate to a predicted class (shape/character/background) for each pixel in the original image. For example, if the model predicts that a given pixel corresponds to the shape class, it might output a large value in that position compared to the other classes. We don't want our predictions in this format; it would be nicer to have probabilities that sum to 100% across the classes. For example, for a single pixel we might have the shape class at 0.8, character at 0.1, and background at 0.1, which means the model is very confident that the pixel falls under the shape class. To get our data in this format we can use the softmax function. Read more about the PyTorch implementation here. One crucial detail to pay attention to is the dimension we take the softmax across: we want to compute it per pixel across the three channels (we DON'T want to take it across multiple pixel values in one channel lengthwise or widthwise). See the post-processing sketch after this list.
  • Once we've computed softmax, we can resize the output back to its original size (if we resized it earlier). We'll also have to index out the shape and character masks. The model will return a 3-channel prediction where each channel represents the segmentation mask for its respective class, so you'll need to do some indexing to get out the shape and character masks.
  • Once you've separated out the shape and character masks, you should have two tensors. The final step is to convert them back into OpenCV images to be returned as part of the SegmentationResults struct.
  • Then you should be good to return SegmentationResults.
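
Here is a rough preprocessing sketch under the assumptions above: a 3-channel cv::Mat in, a std::vector<torch::jit::IValue> out. The ToTensor/ToInput names mirror the helpers mentioned in the linked comment, but the bodies here are only illustrative, the 128x128 resize is simply what our old code did, and any normalization the FCN weights expect is left out and should be checked against the training code.

```cpp
#include <opencv2/opencv.hpp>
#include <torch/script.h>

#include <vector>

// Illustrative helper: convert the 3-channel cropped target into a float
// tensor of shape (1, 3, H, W). Normalization (torchvision-style mean/std)
// is intentionally omitted here; verify what the trained FCN expects.
at::Tensor ToTensor(const cv::Mat& cropped) {
    cv::Mat resized;
    cv::resize(cropped, resized, cv::Size(128, 128));   // optional, matches our old code
    cv::Mat rgb;
    cv::cvtColor(resized, rgb, cv::COLOR_BGR2RGB);       // OpenCV stores images as BGR
    rgb.convertTo(rgb, CV_32FC3, 1.0 / 255.0);           // [0, 255] -> [0, 1]

    // HWC float data -> (1, 3, H, W) tensor. clone() so the tensor owns its memory.
    at::Tensor t = torch::from_blob(rgb.data, {rgb.rows, rgb.cols, 3}, torch::kFloat32).clone();
    return t.permute({2, 0, 1}).unsqueeze(0);
}

// Illustrative helper: wrap the tensor in the container type that
// torch::jit::script::Module::forward accepts.
std::vector<torch::jit::IValue> ToInput(const at::Tensor& t) {
    return {t};
}
```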
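And a rough post-processing sketch, assuming the model returns a dictionary whose "out" entry is a (1, C, h, w) tensor and that class indices 0/1/2 correspond to background/shape/character (an assumption; verify against the training code). It uses an argmax over the softmax probabilities to binarize the masks, which is one way to do the indexing described above.

```cpp
#include <opencv2/opencv.hpp>
#include <torch/script.h>

#include <utility>
#include <vector>

// Illustrative post-processing: logits -> per-pixel probabilities -> two
// binary masks, converted back into OpenCV matrices.
std::pair<cv::Mat, cv::Mat> ExtractMasks(torch::jit::script::Module& model,
                                         std::vector<torch::jit::IValue> input,
                                         cv::Size originalSize) {
    // torchvision segmentation models return a dict; grab the "out" entry.
    at::Tensor logits = model.forward(input).toGenericDict().at("out").toTensor();

    // Softmax across the class dimension (dim=1): per pixel, across channels,
    // NOT across width or height.
    at::Tensor probs = torch::softmax(logits, /*dim=*/1);

    // Resize back to the crop's original size (only needed if we resized earlier).
    namespace F = torch::nn::functional;
    probs = F::interpolate(probs, F::InterpolateFuncOptions()
        .size(std::vector<int64_t>{originalSize.height, originalSize.width})
        .mode(torch::kBilinear)
        .align_corners(false));

    // Most likely class per pixel, then threshold into binary masks.
    at::Tensor classes = probs.argmax(/*dim=*/1).squeeze(0);                    // (H, W)
    at::Tensor shape = (classes == 1).to(torch::kUInt8).mul(255).contiguous();
    at::Tensor character = (classes == 2).to(torch::kUInt8).mul(255).contiguous();

    // Wrap the tensor memory in cv::Mat headers, then clone so the Mats own their data.
    cv::Mat shapeMask(originalSize, CV_8UC1, shape.data_ptr<uint8_t>());
    cv::Mat charMask(originalSize, CV_8UC1, character.data_ptr<uint8_t>());
    return {shapeMask.clone(), charMask.clone()};
}
```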

Implementation Notes

  • First focus on loading the model with libtorch and C++, following the sample code here. We also have it working in one of the integration tests. For the FCN model, you'll want to use the model weights from here. When loading the model file, be wary of the paths. If you use a relative path it will likely be relative to the build folder, since that's where we run all our commands. With regard to where this goes, I think it makes sense to put it in the constructor of the Segmentation class (see the loading sketch after this list).
  • For testing, I would do everything within your own integration test that calls Segmentation::segment with a test image loaded from a file. You can add a new .cpp file to tests/integration/ and add lines to CMakeLists.txt to get it to compile https://github.com/tritonuas/obcpp/blob/feat/cv-orchestrator/CMakeLists.txt#L93-L97. Then you can run make executable-name and ./bin/executable-name to run it.
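
A minimal sketch of the model-loading piece, assuming the weights are exported as a TorchScript file. The member and parameter names are illustrative only; the real class lives in include/cv/segmentation.hpp.

```cpp
#include <torch/script.h>

#include <stdexcept>
#include <string>

// Illustrative constructor: load the TorchScript FCN weights once, up front.
// Remember that a relative path resolves against the build/ directory, since
// that's where we run all our commands.
class Segmentation {
 public:
    explicit Segmentation(const std::string& modelPath) {
        try {
            model_ = torch::jit::load(modelPath);
            model_.eval();
        } catch (const c10::Error&) {
            // Failing loudly here makes path mistakes obvious early.
            throw std::runtime_error("failed to load segmentation model: " + modelPath);
        }
    }

 private:
    torch::jit::script::Module model_;
};
```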
atar13 changed the title from "Integrate Segmentation model" to "Integrate Segmentation model(s)" on Nov 30, 2023
shijiew555 self-assigned this on Jan 17, 2024
atar13 commented Apr 1, 2024

Closed by #123

atar13 closed this as completed Apr 1, 2024