Stars
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark. 🔊 Text-prompted Generative Audio Model, with the ability to clone voices
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
TensorFlow Lite Micro for Espressif Chipsets
Optimised Neural Network functions for Espressif chipsets
Helper functions to create COCO datasets
This repo is a PyTorch implementation of the YOLOV series
NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16) and runs at 97 FPS on a cellphone 🔥
Gradient based receptive field estimation for Convolutional Neural Networks
Official implementation for DMNet: Density map guided object detection in aerial images (CVPR 2020 EarthVision workshop)
🤘 TT-NN operator library, and TT-Metalium low level kernel programming model.
[NeurIPS 2024] One-Step Effective Diffusion Network for Real-World Image Super-Resolution
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
A very compact representation of a placeholder for an image.
Zstandard - Fast real-time compression algorithm
[SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation
This repository contains the official implementation of the research paper, "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" CVPR 2024
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 2…
Implementations of few-shot object detection benchmarks
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones