🐛 Bug
Currently, we save all our images to an in-memory queue. Each additional image appears to consume about 100MB of memory. LUCID says the image buffer size is 20MB, but in our observations memory usage is growing much faster than that.
Even if the images only take 20MB each, it would only take a few hundred images to exhaust the Jetson's memory and crash it. Assuming 4GB of the 8GB is free (since other things are running alongside obcpp on the host), 200 images of 20MB each would fill that memory. That is not outside the realm of possibility during a flight.
To Reproduce
Steps to reproduce the behavior:
1. SSH into the Jetson
2. Check out the feat/lucid-camera branch
3. Set the image-taking delay to 0s in the camera_lucid.cpp test so it quickly takes a bunch of images
4. Compile the camera_lucid test and run it
5. Watch memory consumption skyrocket
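To put numbers on step 5, one way to watch the test's resident memory is to poll VmRSS from /proc (a sketch; the `pgrep` pattern is an assumption and should match the actual test binary name):

```shell
# Sample the test process's resident set size (VmRSS) once per second.
# The "camera_lucid" pattern is an assumption; adjust to the real binary name.
pid=$(pgrep -f camera_lucid | head -n1)
while kill -0 "$pid" 2>/dev/null; do
  grep VmRSS "/proc/$pid/status"
  sleep 1
done
```

If each capture really costs ~100MB, VmRSS should climb by roughly that much per image.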
Potential Solutions
These aren't mutually exclusive, so we can do a combination of them.
Option 1: Save images to disk instead of memory
Instead of saving the images to a queue, write them to a file (PNG or JPEG; both should be smaller than our in-memory representation).
Pros:
- We have a lot more disk space than memory
- We can keep taking images without worrying about running out of memory

Cons:
- Disk I/O is slow, and copying data to disk wastes CPU cycles
- We have to load images back into memory when running the pipeline
- We might still run out of disk space. The current SD card is only 64GB, and much of that is already taken up by Docker images, so we're running low. It's not unreasonable to run out of space after leaving the camera running for a while. We could mitigate this by putting the 512GB SD card back in.
Option 2: Run the pipeline sooner and free image data
Instead of waiting until all the image taking is done, we can take a few images and run the pipeline sooner rather than later, so we can free the image buffers once the pipeline (or at least its saliency stage) completes.
The implementation of this depends on how long the pipeline takes to run (which is something we have to benchmark).
If the pipeline is quick, we can take a picture, synchronously run the pipeline, free the image, and move on to taking the next one. This only works if the pipeline takes a second or two to run.
If the pipeline takes a while, we don't want to block image taking, since we could miss targets while flying over the search area. So we should run two threads simultaneously: one taking images and another running the pipeline. This works as a producer-consumer model where the pipeline consumes the images the camera produces. The one thing to be careful of here is making sure the image producer doesn't use too much memory, since it will produce images faster than the pipeline can consume them. Also keep in mind the additional memory used by the pipeline itself (PyTorch models and other state).
Option 3: Optimize the existing code
There might be a way to optimize how we store data in the code. Maybe we can use lossless compression instead of storing huge WxHxC arrays. It's also possible the images are being copied and stored multiple times somewhere (which would explain 100MB per image instead of 20MB). To rule out memory leaks, we can check with Valgrind, which reportedly works on the Jetson inside Docker.