diff --git a/index.md b/index.md index 1eb04ad..98861f9 100644 --- a/index.md +++ b/index.md @@ -72,6 +72,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [HuggingChat](https://huggingface.co/chat/) | Making the community's best AI chat models available to everyone. | | | Tool | | [Hugging Face API Unity Integration](https://github.com/huggingface/unity-api) | This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool | | [ImageBind](https://github.com/facebookresearch/ImageBind) | ImageBind One Embedding Space to Bind Them All. |[arXiv](https://arxiv.org/abs/2305.05665) | | Tool | +| [Index-1.9B](https://github.com/bilibili/Index-1.9B) | A SOTA lightweight multilingual LLM. | | | Tool | | [InteractML-Unity](https://github.com/Interactml/iml-unity) | InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D. | | Unity | Tool | | [InteractML-Unreal Engine](https://github.com/Interactml/iml-ue4) | Bringing Machine Learning to Unreal Engine. | | Unreal Engine | Tool | | [InternLM](https://github.com/InternLM/InternLM) | InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios and the training system. |[arXiv](https://arxiv.org/abs/2403.17297) | | Tool | @@ -90,6 +91,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [LLaSM](https://github.com/LinkSoul-AI/LLaSM) | Large Language and Speech Model. | | | Tool | | [LLM Answer Engine](https://github.com/developersdigest/llm-answer-engine) | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool | | [llm.c](https://github.com/karpathy/llm.c) | LLM training in simple, raw C/CUDA. 
| | | Tool |
+| [LLMUnity](https://github.com/undreamai/LLMUnity) | Create characters in Unity with LLMs! | | Unity | Tool |
| [LLocalSearch](https://github.com/nilsherzig/LLocalSearch) | LLocalSearch is a completely locally running search engine using LLM Agents. | | | Tool |
| [LogicGamesSolver](https://github.com/fabridigua/LogicGamesSolver) | A Python tool to solve logic games with AI, Deep Learning and Computer Vision. | | | Tool |
| [Large World Model (LWM)](https://github.com/LargeWorldModel/LWM) | Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. |[arXiv](https://arxiv.org/abs/2402.08268) | | Tool |
@@ -104,6 +106,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [MLC LLM](https://github.com/mlc-ai/mlc-llm) | Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. | | | Tool |
| [MobiLlama](https://github.com/mbzuai-oryx/MobiLlama) | Towards Accurate and Lightweight Fully Transparent GPT. |[arXiv](https://arxiv.org/abs/2402.16840) | | Tool |
| [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA) | Mixture of Experts for Large Vision-Language Models. |[arXiv](https://arxiv.org/abs/2401.15947) | | Tool |
+| [Moshi](https://www.moshi.chat/) | Moshi is an experimental conversational AI. | | | Tool |
| [MOSS](https://github.com/OpenLMLab/MOSS) | An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
| [mPLUG-Owl🦉](https://github.com/X-PLUG/mPLUG-Owl) | Modularization Empowers Large Language Models with Multimodality. |[arXiv](https://arxiv.org/abs/2304.14178) | | Tool |
| [Nemotron-4](https://arxiv.org/abs/2402.16819) | A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. 
|[arXiv](https://arxiv.org/abs/2402.16819) | | Tool |
@@ -118,6 +121,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Perplexica](https://github.com/ItzCrazyKns/Perplexica) | An AI-powered search engine. | | | Tool |
| [Pi](https://heypi.com/talk) | AI chatbot designed for personal assistance and emotional support. | | | Tool |
| [Qwen1.5](https://github.com/QwenLM/Qwen1.5) | Qwen1.5 is the improved version of Qwen. | | | Tool |
+| [Qwen2](https://github.com/QwenLM/Qwen2) | Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud. | | | Tool |
| [Qwen-7B](https://github.com/QwenLM/Qwen-7B) | The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. | | | Tool |
| [RepoAgent](https://github.com/OpenBMB/RepoAgent) | RepoAgent is an Open-Source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. |[arXiv](https://arxiv.org/abs/2402.16667) | | Tool |
| [Sanity AI Engine](https://github.com/tosos/SanityEngine) | Sanity AI Engine for the Unity Game Development Tool. | | Unity | Tool |
@@ -154,6 +158,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [AgentSims](https://github.com/py499372727/AgentSims/) | An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
| [AI Town](https://github.com/a16z-infra/ai-town) | AI Town is a virtual town where AI characters live, chat and socialize. | | | Agent |
| [anime.gf](https://github.com/cyanff/anime.gf) | Local & Open Source Alternative to CharacterAI. | | | Game |
+| [Astrocade](https://www.astrocade.com/) | Create games with AI. | | | Game |
| [Atomic Agents](https://github.com/KennyVaneetvelde/atomic_agents) | The Atomic Agents framework is designed to be modular, extensible, and easy to use. | | | Agent |
| [AutoAgents](https://github.com/Link-AGI/AutoAgents) | A Framework for Automatic Agent Generation. 
| | | Agent | | [AutoGen](https://github.com/microsoft/autogen) | Enable Next-Gen Large Language Model Applications. |[arXiv](https://arxiv.org/abs/2308.08155) | | Agent | @@ -185,8 +190,10 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [Langflow](https://github.com/logspace-ai/langflow) | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent | | [LARP](https://github.com/MiAO-AI-Lab/LARP) | Language-Agent Role Play for open-world games. |[arXiv](https://arxiv.org/abs/2312.17653) | | Agent | | [LlamaIndex](https://github.com/run-llama/llama_index) | LlamaIndex is a data framework for your LLM application. | | | Agent | +| [Mixture of Agents (MoA)](https://github.com/togethercomputer/MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. |[arXiv](https://arxiv.org/abs/2406.04692) | | Agent | | [Moonlander.ai](https://www.moonlander.ai/) | Start building 3D games without any coding using generative AI. | | | Framework | | [MuG Diffusion](https://github.com/Keytoyze/Mug-Diffusion) | MuG Diffusion is a charting AI for rhythm games based on Stable Diffusion (one of the most powerful AIGC models) with a large modification to incorporate audio waves. | | | Game | +| [OmAgent](https://github.com/om-ai-lab/OmAgent) | A multimodal agent framework for solving complex tasks. | | | Agent | | [OpenAgents](https://github.com/xlang-ai/OpenAgents) | An Open Platform for Language Agents in the Wild. | | | Agent | | [Opus](https://opus.ai/) | An AI app that turns text into a video game. | | | Game | | [Pipecat](https://github.com/pipecat-ai/pipecat) | Open Source framework for voice and multimodal conversational AI. | | | Agent | @@ -198,6 +205,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [Translation Agent](https://github.com/andrewyng/translation-agent) | Agentic translation using reflection workflow. 
| | | Agent |
| [Video2Game](https://github.com/video2game/video2game) | Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. |[arXiv](https://arxiv.org/abs/2404.09833) | | Game |
| [V-IRL](https://virl-platform.github.io/) | Grounding Virtual Intelligence in Real Life. |[arXiv](https://arxiv.org/abs/2402.03310) | | Agent |
+| [WebDesignAgent](https://github.com/DAMO-NLP-SG/WebDesignAgent) | An agent for web design. | | | Agent |
| [XAgent](https://github.com/OpenBMB/XAgent) | An Autonomous LLM Agent for Complex Task Solving. | | | Agent |
@@ -213,6 +221,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [Chapyter](https://github.com/chapyter/chapyter) | ChatGPT Code Interpreter in Jupyter Notebooks. | | | Code | | [CodeGeeX](https://github.com/THUDM/CodeGeeX) | An Open Multilingual Code Generation Model. |[arXiv](https://arxiv.org/abs/2303.17568) | | Code | | [CodeGeeX2](https://github.com/THUDM/CodeGeeX2) | A More Powerful Multilingual Code Generation Model. | | | Code | +| [CodeGeeX4](https://github.com/THUDM/CodeGeeX4) | CodeGeeX4: Open Multilingual Code Generation Model. | | | Code | | [CodeGen](https://github.com/salesforce/CodeGen) | CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. |[arXiv](https://arxiv.org/abs/2203.13474) | | Code | | [CodeGen2](https://github.com/salesforce/CodeGen2) | CodeGen2 models for program synthesis. |[arXiv](https://arxiv.org/abs/2305.02309) | | Code | | [Code Llama](https://github.com/facebookresearch/codellama) | Code Llama is a large language models for code based on Llama 2. | | | Code | @@ -250,6 +259,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: | | [AnyDoor](https://ali-vilab.github.io/AnyDoor-Page/) | Zero-shot Object-level Image Customization. |[arXiv](https://arxiv.org/abs/2307.09481) | | Image | | [AnyText](https://github.com/tyxsspa/AnyText) | Multilingual Visual Text Generation And Editing. |[arXiv](https://arxiv.org/abs/2311.03054) | | Image | +| [AutoStudio](https://github.com/donahowe/AutoStudio) | Crafting Consistent Subjects in Multi-turn Interactive Image Generation. 
|[arXiv](https://arxiv.org/abs/2406.01388) | | Image |
| [Blender-ControlNet](https://github.com/coolzilj/Blender-ControlNet) | Using ControlNet right in Blender. | | Blender | Image |
| [BriVL](https://github.com/BAAI-WuDao/BriVL) | Bridging Vision and Language Model. |[arXiv](https://arxiv.org/abs/2103.06561) | | Image |
| [CLIPasso](https://github.com/yael-vinker/CLIPasso) | A method for converting an image of an object to a sketch, allowing for varying levels of abstraction. |[arXiv](https://arxiv.org/abs/2202.05822) | | Image |
@@ -261,6 +271,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Dashtoon Studio](https://www.dashtoon.ai/) | Dashtoon Studio is an AI-powered comic creation platform. | | | Comic |
| [DeepAI](https://deepai.org/) | DeepAI offers a suite of tools that use AI to enhance your creativity. | | | Image |
| [DeepFloyd IF](https://github.com/deep-floyd/IF) | IF by DeepFloyd Lab at StabilityAI. | | | Image |
+| [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2) | A More Capable Foundation Model for Monocular Depth Estimation. |[arXiv](https://arxiv.org/abs/2406.09414) | | Image |
| [Depth map library and poser](https://github.com/jexom/sd-webui-depth-lib) | Depth map library for use with the Control Net extension for Automatic1111/stable-diffusion-webui. | | | Image |
| [Diffuse to Choose](https://diffuse2choose.github.io/) | Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. |[arXiv](https://arxiv.org/abs/2401.13795) | | Image |
| [Disco Diffusion](https://github.com/alembics/disco-diffusion) | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations. | | | Image |
@@ -282,6 +293,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [InstantID](https://github.com/InstantID/InstantID) | Zero-shot Identity-Preserving Generation in Seconds. 
|[arXiv](https://arxiv.org/abs/2401.07519) | | Image | | [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer) | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2401.16420) | | Image | | [KOALA](https://youngwanlee.github.io/KOALA/) | Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image | +| [Kolors](https://github.com/Kwai-Kolors/Kolors) | Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image | | [KREA](https://www.krea.ai/) | Generate images and videos with a delightful AI-powered design tool. | | | Image | | [LaVi-Bridge](https://github.com/ShihaoZhaoZSH/LaVi-Bridge) | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. |[arXiv](https://arxiv.org/abs/2403.07860) | | Image | | [LayerDiffusion](https://github.com/layerdiffusion/LayerDiffusion) | Transparent Image Layer Diffusion using Latent Transparency. |[arXiv](https://arxiv.org/abs/2305.18676) | | Image | @@ -289,10 +301,12 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [LlamaGen](https://github.com/FoundationVision/LlamaGen) | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation. |[arXiv](https://arxiv.org/abs/2406.06525) | | Image | | [MetaShoot](https://metashoot.vinzi.xyz/) | MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. | | Unreal Engine | Image | | [Midjourney](https://www.midjourney.com/) | Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. 
| | | Image | +| [MIGC](https://github.com/limuloo/MIGC) | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. |[arXiv](https://arxiv.org/abs/2402.05408) | | Image | | [MimicBrush](https://github.com/ali-vilab/MimicBrush) | Zero-shot Image Editing with Reference Imitation. |[arXiv](https://arxiv.org/abs/2406.07547) | | Image | | [Omost](https://github.com/lllyasviel/Omost) | Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability. | | | Image | | [Openpose Editor](https://github.com/fkunn1326/openpose-editor) | Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. | | | Image | | [Outfit Anyone](https://humanaigc.github.io/outfit-anyone/) | Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image | +| [PaintsUndo](https://github.com/lllyasviel/Paints-UNDO) | PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings. | | | Image | | [PhotoMaker](https://photo-maker.github.io/) | Customizing Realistic Human Photos via Stacked ID Embedding. |[arXiv](https://arxiv.org/abs/2312.04461) | | Image | | [Photoroom](https://www.photoroom.com/backgrounds) | AI Background Generator. | | | Image | | [Plask](https://plask.ai/) | AI image generation in the cloud. | | | Image | @@ -335,6 +349,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [InstructHumans](https://github.com/viridityzhu/InstructHumans) | Editing Animated 3D Human Textures with Instructions. |[arXiv](https://arxiv.org/abs/2404.04037) | | Texture | | [InteX](https://github.com/ashawkey/InTeX) | Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. |[arXiv](https://arxiv.org/abs/2403.11878) | | Texture | | [MaterialSeg3D](https://github.com/PROPHETE-pro/MaterialSeg3D_) | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. 
|[arXiv](https://arxiv.org/abs/2404.13923) | | Texture |
+| [MeshAnything](https://github.com/buaacyw/MeshAnything) | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. |[arXiv](https://arxiv.org/abs/2406.10163) | | Mesh |
| [Neuralangelo](https://github.com/NVlabs/neuralangelo) | High-Fidelity Neural Surface Reconstruction. |[arXiv](https://arxiv.org/abs/2306.03092) | | Texture |
| [Paint-it](https://github.com/postech-ami/paint-it) | Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
| [Polycam](https://poly.cam/material-generator) | Create your own 3D textures just by typing. | | | Texture |
@@ -342,6 +357,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Text2Tex](https://daveredrum.github.io/Text2Tex/) | Text-driven Texture Synthesis via Diffusion Models. |[arXiv](https://arxiv.org/abs/2303.11396) | | Texture |
| [Texture Lab](https://www.texturelab.xyz/) | AI-generated textures. You can generate your own with a text prompt. | | | Texture |
| [With Poly](https://withpoly.com/browse/textures) | Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library. | | | Texture |
+| [X-Mesh](https://github.com/xmu-xiaoma666/X-Mesh) | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. |[arXiv](https://arxiv.org/abs/2303.15764) | | Texture |
@@ -363,12 +379,15 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [BlenderGPT](https://github.com/gd3kr/BlenderGPT) | Use commands in English to control Blender with OpenAI's GPT-4. | | Blender | Model |
| [Blender-GPT](https://github.com/TREE-Ind/Blender-GPT) | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. 
| | Blender | Model | | [Blockade Labs](https://www.blockadelabs.com/) | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | | | Model | +| [CharacterGen](https://github.com/zjp-shadow/CharacterGen) | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. |[arXiv](https://arxiv.org/abs/2402.17214) | | 3D | | [chatGPT-maya](https://github.com/LouisRossouw/chatGPT-maya) | Simple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions. | | Maya | Model | | [CityDreamer](https://github.com/hzxie/city-dreamer) | Compositional Generative Model of Unbounded 3D Cities. |[arXiv](https://arxiv.org/abs/2309.00610) | | 3D | | [CSM](https://www.csm.ai/) | Generate 3D worlds from images and videos. | | | 3D | | [Dash](https://www.polygonflow.io/) | Your Copilot for World Building in Unreal Engine. | | Unreal Engine | 3D | | [DreamGaussian4D](https://github.com/jiawei-ren/dreamgaussian4d) | Generative 4D Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2312.17142) | | 4D | | [DUSt3R](https://github.com/naver/dust3r) | Geometric 3D Vision Made Easy. |[arXiv](https://arxiv.org/abs/2312.14132) | | 3D | +| [GALA3D](https://github.com/VDIGPKU/GALA3D) | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. |[arXiv](https://arxiv.org/abs/2402.07207) | | 3D | +| [GaussianCube](https://github.com/GaussianCube/GaussianCube) | A Structured and Explicit Radiance Representation for 3D Generative Modeling. |[arXiv](https://arxiv.org/abs/2403.19655) | | 3D | | [GaussianDreamer](https://github.com/hustvl/GaussianDreamer) | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. |[arXiv](https://arxiv.org/abs/2310.08529) | | 3D | | [GenieLabs](https://www.genielabs.tech/) | Empower your game with AI-UGC. 
| | | 3D | | [HiFA](https://hifa-team.github.io/HiFA-site/) | High-fidelity Text-to-3D with advance Diffusion guidance. | | | Model | @@ -402,6 +421,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [3DTopia](https://github.com/3DTopia/3DTopia) | Text-to-3D Generation within 5 Minutes. |[arXiv](https://arxiv.org/abs/2403.02234) | | 3D | | [threestudio](https://github.com/threestudio-project/threestudio) | A unified framework for 3D content generation. | | | Model | | [TripoSR](https://github.com/VAST-AI-Research/TripoSR) | A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. |[arXiv](https://arxiv.org/abs/2403.02151) | | Model | +| [Unique3D](https://github.com/AiuniAI/Unique3D) | High-Quality and Efficient 3D Mesh Generation from a Single Image. |[arXiv](https://arxiv.org/abs/2405.20343) | | 3D | | [UnityGaussianSplatting](https://github.com/aras-p/UnityGaussianSplatting) | Toy Gaussian Splatting visualization in Unity. | | Unity | 3D | | [ViVid-1-to-3](https://github.com/ubc-vision/vivid123) | Novel View Synthesis with Video Diffusion Models. |[arXiv](https://arxiv.org/abs/2312.01305) | | 3D | | [Voxcraft](https://voxcraft.ai/) | Crafting Ready-to-Use 3D Models with AI. | | | 3D | @@ -420,19 +440,25 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [ChatAvatar](https://hyperhuman.deemos.com/chatavatar) | Progressive generation Of Animatable 3D Faces Under Text guidance. | | | Avatar | | [ChatdollKit](https://github.com/uezo/ChatdollKit) | ChatdollKit enables you to make your 3D model into a chatbot. | | Unity | Avatar | | [DreamTalk](https://github.com/ali-vilab/dreamtalk) | When Expressive Talking Head Generation Meets Diffusion Probabilistic Models. 
|[arXiv](https://arxiv.org/abs/2312.09767) | | Avatar |
+| [Duix](https://github.com/GuijiAI/duix.ai) | Silicon-Based Digital Human SDK. | | | Avatar |
+| [EchoMimic](https://github.com/BadToBest/EchoMimic) | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. |[arXiv](https://arxiv.org/abs/2407.08136) | | Avatar |
| [EMOPortraits](https://github.com/neeek2303/EMOPortraits) | Emotion-enhanced Multimodal One-shot Head Avatars. | | | Avatar |
+| [E3 Gen](https://github.com/olivia23333/E3Gen) | Efficient, Expressive and Editable Avatars Generation. |[arXiv](https://arxiv.org/abs/2405.19203) | | Avatar |
| [GeneAvatar](https://github.com/zju3dv/GeneAvatar) | Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. |[arXiv](https://arxiv.org/abs/2404.02152) | | Avatar |
| [GeneFace++](https://github.com/yerfor/GeneFacePlusPlus) | Generalized and Stable Real-Time 3D Talking Face Generation. | | | Avatar |
| [Hallo](https://github.com/fudan-generative-vision/hallo) | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. |[arXiv](https://arxiv.org/abs/2406.08801) | | Avatar |
| [HeadSculpt](https://brandonhan.uk/HeadSculpt/) | Crafting 3D Head Avatars with Text. |[arXiv](https://arxiv.org/abs/2306.03038) | | Avatar |
| [Linly-Talker](https://github.com/Kedreamix/Linly-Talker) | Digital Avatar Conversational System. | | | Avatar |
+| [LivePortrait](https://github.com/KwaiVGI/LivePortrait) | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. |[arXiv](https://arxiv.org/abs/2407.03168) | | Avatar |
| [MotionGPT](https://github.com/OpenMotionLab/MotionGPT) | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. |[arXiv](https://arxiv.org/abs/2306.14795) | | Avatar |
| [MusePose](https://github.com/TMElyralab/MusePose) | MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation. 
| | | Avatar | | [MuseTalk](https://github.com/TMElyralab/MuseTalk) | Real-Time High Quality Lip Synchorization with Latent Space Inpainting. | | | Avatar | | [MuseV](https://github.com/TMElyralab/MuseV) | Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | | | Avatar | +| [Portrait4D](https://github.com/YuDeng/Portrait-4D) | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. |[arXiv](https://arxiv.org/abs/2311.18729) | | Avatar | | [Ready Player Me](https://readyplayer.me/) | Integrate customizable avatars into your game or app in days. | | | Avatar | | [StyleAvatar3D](https://github.com/icoz69/StyleAvatar3D) | Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. |[arXiv](https://arxiv.org/abs/2305.19012) | | Avatar | | [Text2Control3D](https://text2control3d.github.io/) | Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. |[arXiv](https://arxiv.org/abs/2309.03550) | | Avatar | +| [Topo4D](https://github.com/XuanchenLi/Topo4D) | Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. |[arXiv](https://arxiv.org/abs/2406.00440) | | Avatar | | [UnityAIWithChatGPT](https://github.com/haili1234/UnityAIWithChatGPT) | Based on Unity, ChatGPT+UnityChan voice interactive display is realized. | | Unity | Avatar | | [Vid2Avatar](https://moygcc.github.io/vid2avatar/) | 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. |[arXiv](https://arxiv.org/abs/2302.11566) | | Avatar | | [VLOGGER](https://enriccorona.github.io/vlogger/) | Multimodal Diffusion for Embodied Avatar Synthesis. 
| | | Avatar | @@ -478,16 +504,21 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | Source | Description | Paper | Game Engine | Type | | :------------------------------------------------------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------: | :-----------: | :-------: | +| [Cambrian-1](https://github.com/cambrian-mllm/cambrian) | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. |[arXiv](https://arxiv.org/abs/2406.16860) | | Multimodal LLMs | | [CogVLM2](https://github.com/THUDM/CogVLM2) | GPT4V-level open-source multi-modal model based on Llama3-8B. | | | Visual | | [CoTracker](https://co-tracker.github.io/) | It is Better to Track Together. |[arXiv](https://arxiv.org/abs/2307.07635) | | Visual | | [FaceHi](https://m.facehi.ai/) | It is Better to Track Together. | | | Visual | +| [InternLM-XComposer2](https://github.com/InternLM/InternLM-XComposer) | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. |[arXiv](https://arxiv.org/abs/2404.06512) | | Visual | | [LGVI](https://jianzongwu.github.io/projects/rovi/) | Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | | | Visual | | [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp) | Extending Visual Capabilities with LLaMA-3 and Phi-3. | | | Visual | | [MaskViT](https://maskedvit.github.io/) | Masked Visual Pre-Training for Video Prediction. |[arXiv](https://arxiv.org/abs/2206.11894) | | Visual | | [MiniCPM-Llama3-V 2.5](https://github.com/OpenBMB/MiniCPM-V) | A GPT-4V Level MLLM on Your Phone. | | | Visual | +| [MoE-LLaVA](https://github.com/PKU-YuanGroup/MoE-LLaVA) | Mixture of Experts for Large Vision-Language Models. 
|[arXiv](https://arxiv.org/abs/2401.15947) | | Visual | | [MotionLLM](https://github.com/IDEA-Research/MotionLLM) | Understanding Human Behaviors from Human Motions and Videos. |[arXiv](https://arxiv.org/abs/2405.20340) | | Visual | | [PLLaVA](https://github.com/magic-research/PLLaVA) | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. |[arXiv](https://arxiv.org/abs/2404.16994) | | Visual | | [Qwen-VL](https://github.com/QwenLM/Qwen-VL) | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. |[arXiv](https://arxiv.org/abs/2308.12966) | | Visual | +| [ShareGPT4V](https://github.com/ShareGPT4Omni/ShareGPT4V) | Improving Large Multi-modal Models with Better Captions. |[arXiv](https://arxiv.org/abs/2311.12793) | | Visual | +| [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) | Learning United Visual Representation by Alignment Before Projection. |[arXiv](https://arxiv.org/abs/2311.10122) | | Visual | | [VideoLLaMA 2](https://github.com/DAMO-NLP-SG/VideoLLaMA2) | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. |[arXiv](https://arxiv.org/abs/2406.07476) | | Visual | | [Video-MME](https://github.com/BradyFU/Video-MME) | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. |[arXiv](https://arxiv.org/abs/2405.21075) | | Visual | | [Vitron](https://github.com/SkyworkAI/Vitron) | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | | | Visual | @@ -513,6 +544,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM, | [CoNR](https://github.com/megvii-research/CoNR) | Genarate vivid dancing videos from hand-drawn anime character sheets(ACS). |[arXiv](https://arxiv.org/abs/2207.05378) | | Video | | [Decohere](https://www.decohere.ai/) | Create what can't be filmed. | | | Video | | [Descript](https://www.descript.com/) | Descript is the simple, powerful , and fun way to edit. 
| | | Video |
+| [Diffutoon](https://github.com/modelscope/DiffSynth-Studio) | High-Resolution Editable Toon Shading via Diffusion Models. |[arXiv](https://arxiv.org/abs/2401.16224) | | Video |
| [dolphin](https://github.com/kaleido-lab/dolphin) | General video interaction platform based on LLMs. | | | Video |
| [DomoAI](https://domoai.app/) | Amplify Your Creativity with DomoAI. | | | Video |
| [DynamiCrafter](https://doubiiu.github.io/projects/DynamiCrafter/) | Animating Open-domain Images with Video Diffusion Priors. |[arXiv](https://arxiv.org/abs/2310.12190) | | Video |
@@ -548,6 +580,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [MicroCinema](https://wangyanhui666.github.io/MicroCinema.github.io/) | A Divide-and-Conquer Approach for Text-to-Video Generation. |[arXiv](https://arxiv.org/abs/2311.18829) | | Video |
| [Mini-Gemini](https://github.com/dvlab-research/MiniGemini) | Mining the Potential of Multi-modality Vision Language Models. | | | Vision |
| [MobileVidFactory](https://arxiv.org/abs/2307.16371) | Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | | | Video |
+| [MOFA-Video](https://github.com/MyNiuuu/MOFA-Video) | Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. |[arXiv](https://arxiv.org/abs/2405.20222) | | Video |
| [MoneyPrinterTurbo](https://github.com/harry0703/MoneyPrinterTurbo) | Use large models to generate short videos with one click. | | | Video |
| [Moonvalley](https://moonvalley.ai/) | Moonvalley is a groundbreaking new text-to-video generative AI model. | | | Video |
| [Mora](https://github.com/lichao-sun/Mora) | More like Sora for Generalist Video Generation. |[arXiv](https://arxiv.org/abs/2403.13248) | | Video |
@@ -597,6 +630,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Video LDMs](https://research.nvidia.com/labs/toronto-ai/VideoLDM/) | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. |[arXiv](https://arxiv.org/abs/2304.08818) | | Video |
| [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA) | Learning United Visual Representation by Alignment Before Projection. |[arXiv](https://arxiv.org/abs/2311.10122) | | Video |
| [VideoMamba](https://github.com/OpenGVLab/VideoMamba) | State Space Model for Efficient Video Understanding. |[arXiv](https://arxiv.org/abs/2403.06977) | | Video |
+| [Video-of-Thought](https://github.com/scofield7419/Video-of-Thought) | Step-by-Step Video Reasoning from Perception to Cognition. | | | Video |
| [VideoPoet](https://sites.research.google/videopoet/) | A large language model for zero-shot video generation. |[arXiv](https://arxiv.org/abs/2312.14125) | | Video |
| [Vispunk Motion](https://vispunk.com/video) | Create realistic videos using just text. | | | Video |
| [VisualRWKV](https://github.com/howard-hou/VisualRWKV) | VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | | | Visual |
@@ -623,14 +657,17 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [AudioLDM 2](https://github.com/haoheliu/audioldm2) | Learning Holistic Audio Generation with Self-supervised Pretraining. |[arXiv](https://arxiv.org/abs/2308.05734) | | Audio |
| [Auffusion](https://github.com/happylittlecat2333/Auffusion) | Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. |[arXiv](https://arxiv.org/abs/2401.01044) | | Audio |
| [CTAG](https://github.com/PapayaResearch/ctag) | Creative Text-to-Audio Generation via Synthesizer Programming. | | | Audio |
+| [FoleyCrafter](https://github.com/open-mmlab/FoleyCrafter) | Bring Silent Videos to Life with Lifelike and Synchronized Sounds. |[arXiv](https://arxiv.org/abs/2407.01494) | | Audio |
| [MAGNeT](https://pages.cs.huji.ac.il/adiyoss-lab/MAGNeT/) | Masked Audio Generation using a Single Non-Autoregressive Transformer. | | | Audio |
| [Make-An-Audio](https://text-to-audio.github.io/) | Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. |[arXiv](https://arxiv.org/abs/2301.12661) | | Audio |
+| [Make-An-Audio 3](https://github.com/Text-to-Audio/Make-An-Audio-3) | Transforming Text into Audio via Flow-based Large Diffusion Transformers. |[arXiv](https://arxiv.org/abs/2305.18474) | | Audio |
| [NeuralSound](https://github.com/hellojxt/NeuralSound) | Learning-based Modal Sound Synthesis with Acoustic Transfer. |[arXiv](https://arxiv.org/abs/2108.07425) | | Audio |
| [OptimizerAI](https://www.optimizerai.xyz/) | Sounds for Creators, Game makers, Artists, Video makers. | | | Audio |
| [SEE-2-SOUND](https://github.com/see2sound/see2sound) | Zero-Shot Spatial Environment-to-Spatial Sound. |[arXiv](https://arxiv.org/abs/2406.06612) | | Audio |
| [SoundStorm](https://google-research.github.io/seanet/soundstorm/examples/) | Efficient Parallel Audio Generation. |[arXiv](https://arxiv.org/abs/2305.09636) | | Audio |
| [Stable Audio](https://www.stableaudio.com/) | Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
| [Stable Audio Open](https://huggingface.co/stabilityai/stable-audio-open-1.0) | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
+| [SyncFusion](https://github.com/mcomunita/syncfusion) | Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. |[arXiv](https://arxiv.org/abs/2310.15247) | | Audio |
| [TANGO](https://github.com/declare-lab/tango) | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
| [WavJourney](https://github.com/Audio-AGI/WavJourney) | Compositional Audio Creation with Large Language Models. |[arXiv](https://arxiv.org/abs/2307.14335) | | Audio |
@@ -646,6 +683,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Boomy](https://boomy.com/) | Create generative music. Share it with the world. | | | Music |
| [ChatMusician](https://shanghaicannon.github.io/ChatMusician/) | Fostering Intrinsic Musical Abilities Into LLM. | | | Music |
| [Chord2Melody](https://github.com/tanreinama/chord2melody) | Automatic Music Generation AI. | | | Music |
+| [Diff-BGM](https://github.com/sizhelee/Diff-BGM) | A Diffusion Model for Video Background Music Generation. | [arXiv](https://arxiv.org/abs/2405.11913) | | Music |
| [GPTAbleton](https://github.com/BurnedGuitarist/GPTAbleton) | Draft script for processing GPT responses and sending MIDI notes into Ableton clips with AbletonOSC and python-osc. | | | Music |
| [HeyMusic.AI](https://heymusic.ai/zh) | AI Music Generator. | | | Music |
| [Image to Music](https://imagetomusic.top/) | AI Image to Music Generator is a tool that uses artificial intelligence to convert images into music. | | | Music |
@@ -688,6 +726,8 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2) | VITS2 backbone with multilingual BERT. | | | Speech |
| [ChatTTS](https://github.com/2noise/ChatTTS) | ChatTTS is a generative speech model for daily dialogue. | | | Speech |
| [CLAPSpeech](https://clapspeech.github.io/) | Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | [arXiv](https://arxiv.org/abs/2305.10763) | | Speech |
+| [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) | Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. | | | Speech |
+| [DEX-TTS](https://github.com/winddori2002/DEX-TTS) | Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | [arXiv](https://arxiv.org/abs/2406.19135) | | Speech |
| [EmotiVoice](https://github.com/netease-youdao/EmotiVoice) | A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
| [Fliki](https://fliki.ai/) | Turn text into videos with AI voices. | | | Speech |
| [Glow-TTS](https://github.com/jaywalnut310/glow-tts) | A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | [arXiv](https://arxiv.org/abs/2005.11129) | | Speech |
@@ -702,6 +742,7 @@ Here we will keep track of the latest AI Game Development Tools, including LLM,
| [OpenVoice](https://github.com/myshell-ai/OpenVoice) | Instant voice cloning by MyShell. | | | Speech |
| [OverFlow](https://github.com/shivammehta25/OverFlow) | Putting flows on top of neural transducers for better TTS. | | | Speech |
| [RealtimeTTS](https://github.com/KoljaB/RealtimeTTS) | RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
+| [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) | SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). | | | Speech |
| [SpeechGPT](https://github.com/0nutation/SpeechGPT) | Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | [arXiv](https://arxiv.org/abs/2305.11000) | | Speech |
| [speech-to-text-gpt3-unity](https://github.com/dr-iskandar/speech-to-text-gpt3-unity) | This is the repo where I use the Whisper and ChatGPT APIs from OpenAI in Unity. | | Unity | Speech |
| [Stable Speech](https://github.com/sanchit-gandhi/stable-speech) | Stability AI's Text-to-Speech model. | | | Speech |