Release v1.2.2 · healthonrails/annolid

Key Areas of Focus:

Enhanced AI Model Integration & Functionality:

YOLOv11 Support: Integration of YOLOv11n and YOLOv11x models for object detection and instance segmentation, including training capabilities.

Multi-Modal Capabilities: Leveraging models like Florence-2 for image captioning and Molmo for behavior analysis.

Ollama Integration: Ability to chat with Ollama models (like llama3.2-vision) for video frame analysis and caption improvement.

CountGD Integration: Introduction of a new CountGD multi-modal counting app, likely focused on counting small objects.

SAM 2.1 as Default: Setting Segment Anything Model (SAM) 2.1 as the default segmentation model.

Improved Prompt Handling: Enhancements in how text prompts are used for segmentation and other AI tasks.

CPU Fallback: Operations fall back to the CPU when MPS (the Apple Silicon GPU backend) is not supported.
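
The CPU fallback described above can be illustrated with a minimal device-selection sketch. This is not Annolid's actual code: the function name select_device is hypothetical, and the sketch assumes PyTorch is the backend. It uses only torch.cuda.is_available() and torch.backends.mps.is_available(), and returns "cpu" whenever neither accelerator is usable (or PyTorch is not installed).

```python
def select_device() -> str:
    """Pick a compute device name, falling back to the CPU.

    Hypothetical helper for illustration; not part of Annolid's API.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator to select.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no torch.backends.mps attribute.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(select_device())
```

For individual operators that the MPS backend does not implement, PyTorch also honors the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable, which runs those operators on the CPU instead.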

Improved User Interface and Workflow:

Video Management: Introduction of a Video Manager for importing, loading, and deleting videos.

Interactive Flag Table: An enhanced table for managing user-defined flags with start/end button events, checkboxes/icons, and editing capabilities.

Canvas Screenshot Feature: Ability to take screenshots of the annotation canvas.

Video Frame Navigation: Improvements to navigating video frames.

Behavior Tracking & Analysis: Significant work on behavior analysis features, including:

Loading behavior data with event timestamps.

Creating ethograms for visualizing behavior.

Behavior tracking with defined ranges.

A BehaviorDataset class for PyTorch integration.

Behavior classification models using Transformers and CLIP.

Behavior evaluation modules.

LanceDB Integration: Implementation of LanceDB for image indexing and video frame search.

Recording Widget: Added a widget for video recording functionality.

Improved Caption Handling: Displaying video frame captions in a text edit widget and adding an "Improve Caption" feature using Ollama.
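
As a rough illustration of how behavior events with timestamps can feed an ethogram, the sketch below pairs start and end events into bouts (behavior, start time, end time), which is the shape of data an ethogram plot typically needs. The event tuple layout and the function name events_to_bouts are assumptions made for this example; the release notes do not describe Annolid's actual data format.

```python
def events_to_bouts(events):
    """Pair 'start'/'end' events into (behavior, start, end) bouts.

    events: iterable of (timestamp_seconds, behavior, kind) tuples,
    where kind is "start" or "end". This layout is an assumption.
    """
    open_starts = {}  # behavior -> timestamp of its unmatched start event
    bouts = []
    for timestamp, behavior, kind in sorted(events):
        if kind == "start":
            open_starts[behavior] = timestamp
        elif kind == "end" and behavior in open_starts:
            bouts.append((behavior, open_starts.pop(behavior), timestamp))
    return bouts


if __name__ == "__main__":
    demo = [
        (0.0, "grooming", "start"),
        (1.0, "rearing", "start"),
        (2.5, "grooming", "end"),
        (4.0, "rearing", "end"),
    ]
    for behavior, start, end in events_to_bouts(demo):
        print(f"{behavior}: {start:.1f}s to {end:.1f}s")
```

End events without a matching start are ignored, and a start without an end produces no bout; a real loader would need to decide how to report such cases.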

Data Handling & Conversion:

LabelMe to YOLO Conversion: Functionality to convert LabelMe annotations to YOLO format.

JSON to CSV Conversion: Added a dialog for converting LabelMe JSON files to CSV.

DAVIS Dataset Conversion: Scripts for converting DAVIS datasets.

Keypoint Extraction: Features to extract and save keypoints from JSON files.

Video Clips Dataset Handling: Support for working with video clips as datasets.
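
To make the LabelMe to YOLO conversion concrete, here is a minimal sketch of the core step for polygon annotations. It assumes the standard LabelMe JSON fields (imageWidth, imageHeight, shapes, label, points, shape_type) and the YOLO segmentation label layout of a class index followed by normalized x y pairs. The function name and the class_ids mapping are hypothetical and are not taken from Annolid's converter.

```python
def labelme_to_yolo_lines(labelme, class_ids):
    """Convert LabelMe polygon shapes to YOLO segmentation label lines.

    labelme:   dict parsed from a LabelMe JSON file.
    class_ids: mapping of label name to integer class index (hypothetical).
    """
    width = labelme["imageWidth"]
    height = labelme["imageHeight"]
    lines = []
    for shape in labelme["shapes"]:
        # Only polygons are handled in this sketch.
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        fields = [str(class_ids[shape["label"]])]
        for x, y in shape["points"]:
            # YOLO expects coordinates normalized to the range [0, 1].
            fields.append(f"{x / width:.6f}")
            fields.append(f"{y / height:.6f}")
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    example = {
        "imageWidth": 100,
        "imageHeight": 50,
        "shapes": [
            {
                "label": "mouse",
                "shape_type": "polygon",
                "points": [[10, 5], [50, 5], [50, 25]],
            }
        ],
    }
    print("\n".join(labelme_to_yolo_lines(example, {"mouse": 0})))
```

Each returned line would be written to a .txt file named after the image, one line per object.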

Performance & Stability:

Decord Integration: Utilizing decord for faster random frame access.

Asynchronous Frame Loading: Implementing asynchronous loading for long videos.

Memory Optimizations: Improvements in video frame loading and storage.

Batching for Inference: Writing prediction results to JSON in batches for long videos.

Bug Fixes: Several fixes addressing issues like ONNX reshape errors, caption handling, and MPS support.
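
The batched JSON writing mentioned above can be sketched as follows. The idea is to keep only a bounded number of per-frame predictions in memory and flush them to disk as each batch fills. The function names, the batch_00000.json file naming, and the default batch size are assumptions for illustration; Annolid's actual on-disk layout may differ.

```python
import json
from pathlib import Path


def _flush(batch, out_dir, index):
    """Write one batch of predictions to its own JSON file."""
    path = out_dir / f"batch_{index:05d}.json"
    with path.open("w", encoding="utf-8") as f:
        json.dump(batch, f)
    return path


def write_predictions_in_batches(predictions, out_dir, batch_size=100):
    """Stream predictions to disk, holding at most batch_size in memory.

    predictions: any iterable (for example a generator) of
    JSON-serializable per-frame results.
    Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batch = []
    for prediction in predictions:
        batch.append(prediction)
        if len(batch) == batch_size:
            written.append(_flush(batch, out_dir, len(written)))
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        written.append(_flush(batch, out_dir, len(written)))
    return written
```

Because predictions can be a generator, inference results for a long video never need to be held in memory all at once.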

Documentation & Tutorials:

Updates to the README and Jupyter Book documentation.

A Colab notebook for YOLOv11 instance segmentation.

Tutorials on converting SLEAP keypoints and on labeling for place preference.
