✨✨Latest Advances on Multimodal Large Language Models
Updated Dec 13, 2024
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Mixture-of-Experts for Large Vision-Language Models
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
[NeurIPS'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
[NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models
This is the official repository of LLAVIDAL
Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
Leverages a multimodal large vision-language model for quantitative analysis