Hi Liam,

Thanks for reaching out and describing your issues in detail! Let's take them one at a time:
(1) Good question. There isn't a definitive answer to this, but here's what we can recommend from experience: we've only rigorously quantified this in a study we did for the SLEAP paper (Fig. 2c), where accuracy is still increasing past 1,000 labels, but note the log scale on the x-axis.
(2) Good question. A common and rigorous way would be to generate three splits of your dataset: train, validation, and test.
SLEAP does a 90%/10% train/validation split automatically every time you train. This is because, in practice, generating three splits would mean that even less of your labeled data is used for training, and the validation and test sets are usually fairly close anyway.

The most rigorous way would account for the fact that the dataset is fairly small and perform cross-validation, where you generate 80/10/10 splits many times so that you get a distribution of accuracy metrics evaluated over different samplings of the splits. In practice, this is usually not feasible unless you have access to a lot of GPUs, since it would require training many neural networks.

Our recommendation is generally to rely on the validation set accuracy, which is what is reported by default within the GUI and saved to the model folder after each run. We also have a notebook on model evaluation that might be helpful for more detailed accuracy analyses. Note that this is all regarding pose estimation performance, not identity tracking, which is a whole 'nother can of worms...
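If you do want to try the cross-validation route, here's a minimal sketch of how you might generate repeated 80/10/10 splits over your labeled frames (plain NumPy, nothing SLEAP-specific; the label count, repeat count, and seed are just placeholders):

```python
import numpy as np

def make_splits(n_labels, n_repeats=10, fracs=(0.8, 0.1, 0.1), seed=0):
    """Generate repeated train/val/test index splits for cross-validation."""
    rng = np.random.default_rng(seed)
    n_train = int(fracs[0] * n_labels)
    n_val = int(fracs[1] * n_labels)
    splits = []
    for _ in range(n_repeats):
        idx = rng.permutation(n_labels)  # reshuffle all labeled frame indices
        splits.append({
            "train": idx[:n_train],
            "val": idx[n_train:n_train + n_val],
            "test": idx[n_train + n_val:],
        })
    return splits

# e.g., 600 labeled frames -> 10 different 80/10/10 samplings
for split in make_splits(600):
    print(len(split["train"]), len(split["val"]), len(split["test"]))
```

You'd then train one model per split and look at the spread of the resulting metrics.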
(3) The mAP metric is a pretty good holistic summary of the accuracy across your whole dataset (see the paper for details on how it's computed). The values depend a lot on the type of skeleton you have, but it ranges between 0 and 1 (with 1 being the best). Our "gold standard" models usually reach ~0.8 mAP. The other things to look at are the distributions of distances (Fig. 2 in the paper), which tell you about localization error, or the distributions of OKS values (see the metrics notebook), which are a more principled way of scoring pose predictions since they account for visibility and body size.
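If you want to pull those numbers out programmatically (this is what the metrics notebook walks through), something like the following should work; the model folder path is a placeholder, and the exact metric keys may differ slightly across SLEAP versions:

```python
import sleap

# Placeholder path -- point this at one of your trained model folders.
model_path = "models/my_run.centered_instance"

# Load the metrics computed on the validation split after training.
metrics = sleap.load_metrics(model_path, split="val")

print("mAP (OKS-based):", metrics["oks_voc.mAP"])
print("Mean localization error (px):", metrics["dist.avg"])
print("95th percentile error (px):", metrics["dist.p95"])
```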
(4 & 5) Hmm, this is tricky. What's likely happening is that SLEAP loses track of more than 1 animal for more frames than the window size, so when one of them returns, there's no easy way to figure out which of the original tracks it should be assigned to. Despite having 80 tracks, assuming everything is working correctly, you should find that there are always at most 3 tracks in any given frame.

We should definitely be able to do better, though! Try enabling "Connect Single Track Breaks" and using the "flow" method, which can improve the association after crossings like the one in your screenshot. You may also want to increase the frame window to 10.

Our tracker is definitely not ideal for harder cases, and we should add some more heuristic options that, while not ideal for everyone, might help with proofreading in some cases. A couple of relevant suggestions are in #797 and #737. Another might be to have an option to force assignment to a maximum of 3 tracks for the entire duration of the video, even if we're effectively just randomly guessing at times. If you have any other suggestions, please feel free to post them in Ideas! If tracking is prohibitively bad on your data after trying out some different settings, let us know and perhaps share your data with us over email so we can have a closer look.
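For reference, here's roughly how those settings look when running tracking from the command line; the video and model paths are placeholders, and it's worth double-checking the flag names against `sleap-track --help` on your installed version:

```bash
sleap-track my_video.mp4 \
    -m models/centroid_model \
    -m models/centered_instance_model \
    --tracking.tracker flow \
    --tracking.track_window 10 \
    --tracking.target_instance_count 3 \
    --tracking.pre_cull_to_target 1 \
    --tracking.post_connect_single_breaks 1
```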
(6) We're definitely always open to suggestions! The best way is to improve the model predictions, but I appreciate that that's not always enough. Generally, our approach has been to get the best predictions we can, such that 99%+ of frames are good enough, and then simply delete the egregiously bad ones during proofreading. This is faster than correcting them, and we can usually get good enough results by interpolating across missing frames during analysis (provided they happen infrequently enough). Again, we'd love to hear user feedback on how we can improve the labeling interface and workflow, so don't hesitate to ask for your dream features in Ideas!
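As a sketch of what that interpolation step can look like at analysis time (this is just pandas on exported coordinates, not a SLEAP feature; the array shape and gap lengths are assumptions):

```python
import numpy as np
import pandas as pd

# Assume `tracks` is (n_frames, n_nodes, 2) for one animal, with NaNs
# where egregiously bad predictions were deleted during proofreading.
n_frames, n_nodes = 1000, 5
tracks = np.random.rand(n_frames, n_nodes, 2)
tracks[100:103] = np.nan  # a short gap left by deleted frames

# Linearly interpolate across short gaps only, leaving long dropouts as NaN.
flat = pd.DataFrame(tracks.reshape(n_frames, -1))
filled = flat.interpolate(method="linear", limit=5, limit_area="inside")
tracks_filled = filled.to_numpy().reshape(n_frames, n_nodes, 2)
```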
(7) This sounds pretty bad! Data integrity is our top priority, so we should definitely follow up on this, especially if you can reproduce it. Do you mind filling out a Bug Report so we can try to reproduce it on our end?

Thanks for taking the time to ask all these questions -- I'm sure they'll be super helpful to other users browsing the discussions! Let me know if you have any follow-ups!

Cheers,
Talmo
Hello!
I am very new to SLEAP and have several questions to help improve my model. I began by following along with the tutorial, seeking help here, and reading relevant posts. My lab is hoping to use SLEAP for pose and identity tracking of 3 mice in a small box where they are free to interact.
I began by labeling 100 random frames across 5 videos, then entered the loop of training and correcting the model on 20-100 random frames at a time. I went through this process about 5-6 times (top-down pipeline), ending up with 500-600 frames and roughly 1500-1800 instances in my training set. (1) Do you recommend a certain number of frames to initially label, and is there an average or recommended number of final frames and instances to achieve an accurate model? (2) Should I run inference on a new video to assess accuracy, and what is the best method of evaluating the accuracy of the trained model?
I stopped this process after I was "satisfied" with the performance of the predictions (I'd estimate 85% accuracy on the final 100 randomly predicted frames). (3) What level of accuracy should I look for in my model?
I next went through with running inference on one entire video (8,999 frames), using the settings recommended in a previous post.
After running inference, I had around 80 identified tracks in total, which I had to go through and correct down to the 3 actual identities present. (4) Are there better settings I should use for identity tracking to limit this number? I did use the "target instances per frame" and "cull to count" settings. (5) However, is there a way to set a target number of identities across the whole video and specify a 3-animal project?
I then went through to assess the accuracy of the skeleton placement across frames, which was about 80% right. The only way I could seem to correct the 20% that were mislabeled (scrambled skeleton, backwards skeleton, piece of skeleton labeled on the wrong body, etc.) was to go through frame by frame making corrections. (6) Is there any way to make corrections across a clip, or otherwise improve the process of fixing mislabeled skeletons? (I hope and imagine that improving the training process will limit the number of mislabels and improve the accuracy of the model in general.)
Lastly, I recently updated SLEAP and lost all the predictions on my videos, left with only the instances I had labeled or corrected. This has happened before when closing the program, even after saving multiple times. (7) Is there any way to save predictions when closing the program?
Thank you for all of your help! Your team has been amazing in answering questions so far.
Please let me know if you would like to know anything more about the process or specifications I have used.
Liam