Key Areas of Focus:
Enhanced AI Model Integration & Functionality:
YOLOv11 Support: Integration of YOLOv11n and YOLOv11x models for object detection and instance segmentation, including training capabilities.
Multi-Modal Capabilities: Leveraging models like Florence-2 for image captioning and Molmo for behavior analysis.
Ollama Integration: Ability to chat with Ollama models (like llama3.2-vision) for video frame analysis and caption improvement.
CountGD Integration: Introduction of a new multi-modal counting app based on CountGD, apparently aimed at counting small objects.
SAM 2.1 as Default: Setting Segment Anything Model (SAM) 2.1 as the default segmentation model.
Improved Prompt Handling: Enhancements in how text prompts are used for segmentation and other AI tasks.
CPU Fallback: Ensuring operations can fall back to CPU if MPS (Apple Silicon GPU) is not supported.
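As an illustration of the CPU fallback item, here is a minimal sketch of device selection in PyTorch. The helper name pick_device is hypothetical and not taken from the project. The sketch assumes PyTorch may or may not be installed, and that PYTORCH_ENABLE_MPS_FALLBACK is set before torch is imported.

```python
import os

# Assumption: this variable must be set before torch is imported. It asks
# PyTorch to run operators that MPS does not implement on the CPU instead
# of raising an error.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda", "mps", or "cpu", whichever is usable first.

    Hypothetical helper for illustration only.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # No PyTorch available: the only safe answer is the CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

A model or tensor can then be moved with .to(device), so the same code path runs on CUDA, Apple Silicon, or CPU-only machines.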
Improved User Interface and Workflow:
Video Management: Introduction of a Video Manager for importing, loading, and deleting videos.
Interactive Flag Table: An enhanced table for managing user-defined flags with start/end button events, checkboxes/icons, and editing capabilities.
Canvas Screenshot Feature: Ability to take screenshots of the annotation canvas.
Video Frame Navigation: Improvements to navigating video frames.
Behavior Tracking & Analysis: Significant work on behavior analysis features, including:
    Loading behavior data with event timestamps.
    Creating ethograms for visualizing behavior.
    Behavior tracking with defined ranges.
    A BehaviorDataset class for PyTorch integration.
    Behavior classification models using Transformers and CLIP.
    Behavior evaluation modules.
LanceDB Integration: Implementation of LanceDB for image indexing and video frame search.
Recording Widget: Added a widget for video recording functionality.
Improved Caption Handling: Displaying video frame captions in a text edit widget and adding an "Improve Caption" feature using Ollama.
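To make the behavior items above more concrete (event timestamps and defined ranges), the following sketch pairs start and end events into time ranges, the kind of structure an ethogram could be drawn from. The event tuple layout and the function names are assumptions for illustration, not the project's actual data format or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed event layout: (timestamp_seconds, behavior_name, "start" or "end").
Event = Tuple[float, str, str]
Ranges = Dict[str, List[Tuple[float, float]]]


def events_to_ranges(events: Iterable[Event]) -> Ranges:
    """Pair each "start" event with the next "end" event of the same behavior."""
    open_starts: Dict[str, float] = {}
    ranges = defaultdict(list)
    for timestamp, behavior, kind in sorted(events):
        if kind == "start":
            # Keep the earliest unmatched start if starts are repeated.
            open_starts.setdefault(behavior, timestamp)
        elif kind == "end" and behavior in open_starts:
            ranges[behavior].append((open_starts.pop(behavior), timestamp))
    return dict(ranges)


def behaviors_at(ranges: Ranges, timestamp: float) -> List[str]:
    """List the behaviors whose ranges contain the given timestamp."""
    return sorted(
        name
        for name, spans in ranges.items()
        if any(start <= timestamp <= end for start, end in spans)
    )


events = [
    (0.0, "grooming", "start"),
    (2.5, "grooming", "end"),
    (1.0, "rearing", "start"),
    (4.0, "rearing", "end"),
    (5.0, "grooming", "start"),
    (6.5, "grooming", "end"),
]
ranges = events_to_ranges(events)
print(ranges["grooming"])         # [(0.0, 2.5), (5.0, 6.5)]
print(behaviors_at(ranges, 2.0))  # ['grooming', 'rearing']
```

Unmatched end events are ignored and unmatched start events are dropped in this sketch; a real loader would probably want to report them.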
Data Handling & Conversion:
LabelMe to YOLO Conversion: Functionality to convert LabelMe annotations to YOLO format.
JSON to CSV Conversion: Added a dialog for converting LabelMe JSON files to CSV.
DAVIS Dataset Conversion: Scripts for converting DAVIS datasets.
Keypoint Extraction: Features to extract and save keypoints from JSON files.
Video Clips Dataset Handling: Support for working with video clips as datasets.
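For the LabelMe to YOLO conversion item, the core of such a conversion can be sketched as below. It relies on the standard LabelMe JSON keys (imageWidth, imageHeight, shapes, label, points, shape_type) and on YOLO's normalized label lines. The function name and the class_ids mapping are hypothetical, and the project's own converter may handle more shape types and options.

```python
from typing import Dict, List


def labelme_to_yolo_lines(labelme: dict, class_ids: Dict[str, int]) -> List[str]:
    """Convert one LabelMe annotation dict into YOLO label lines.

    Rectangles become "class cx cy w h"; every other shape is written as a
    polygon "class x1 y1 x2 y2 ...". All coordinates are divided by the
    image size so they fall in the range 0 to 1.
    """
    width = labelme["imageWidth"]
    height = labelme["imageHeight"]
    lines = []
    for shape in labelme["shapes"]:
        class_id = class_ids[shape["label"]]  # raises KeyError on unknown labels
        points = shape["points"]
        if shape.get("shape_type") == "rectangle":
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            x_min, x_max = min(xs), max(xs)
            y_min, y_max = min(ys), max(ys)
            values = [
                (x_min + x_max) / 2 / width,   # box center x
                (y_min + y_max) / 2 / height,  # box center y
                (x_max - x_min) / width,       # box width
                (y_max - y_min) / height,      # box height
            ]
        else:
            values = [c for x, y in points for c in (x / width, y / height)]
        lines.append(" ".join([str(class_id)] + [f"{v:.6f}" for v in values]))
    return lines


example = {
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [
        {"label": "mouse", "shape_type": "rectangle", "points": [[50, 20], [150, 80]]},
        {"label": "food", "shape_type": "polygon", "points": [[0, 0], [100, 0], [100, 50]]},
    ],
}
yolo_lines = labelme_to_yolo_lines(example, {"mouse": 0, "food": 1})
print("\n".join(yolo_lines))
```

In practice the input dict would come from json.load on a LabelMe file, and the lines would be written to a .txt file with the same stem as the image.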
Performance & Stability:
Decord Integration: Utilizing decord for faster random frame access.
Asynchronous Frame Loading: Implementing asynchronous loading for long videos.
Memory Optimizations: Improvements in video frame loading and storage.
Batching for Inference: Writing prediction results to JSON in batches for long videos.
Bug Fixes: Several fixes addressing issues like ONNX reshape errors, caption handling, and MPS support.
Documentation & Tutorials:
Updates to the README and Jupyter Book documentation.
A Colab notebook for YOLOv11 instance segmentation.
Tutorials for converting SLEAP keypoints and labeling for place preference.