AI Game DevTools (AI-GDT) 🎮

<p align="center"> <img src="AI-Game.png" alt="AI-Game" style="display:block; margin:auto; width:580px;" /> </p>

Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥

Table of Contents

Project List

<span id="tool">Tool (AI LLM)</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AgentGPT | 🤖 Assemble, configure, and deploy autonomous AI Agents in your browser. | | | Tool |
| AICommand | ChatGPT integration with Unity Editor. | | Unity | Tool |
| AIOS | LLM Agent Operating System. | | | Tool |
| AI Scientist | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. | arXiv | | Tool |
| Assistant CLI | A comfortable CLI tool for using the ChatGPT service. 🔥 | | | Tool |
| Auto-GPT | An experimental open-source attempt to make GPT-4 fully autonomous. | | | Tool |
| BabyAGI | This Python script is an example of an AI-powered task management system. | | | Tool |
| 👶🤖🖥️ BabyAGI UI | BabyAGI UI is designed to make it easier to run and develop with babyagi in a web app, like ChatGPT. | | | Tool |
| baichuan-7B | A large-scale 7B pretraining language model developed by Baichuan. | | | Tool |
| Baichuan-13B | A 13B large language model developed by Baichuan Intelligent Technology. | | | Tool |
| Baichuan 2 | A series of large language models developed by Baichuan Intelligent Technology. | | | Tool |
| Bisheng | Bisheng is an open LLM devops platform for next generation AI applications. | | | Tool |
| Character-LLM | A Trainable Agent for Role-Playing. | arXiv | | Tool |
| ChatDev | Communicative Agents for Software Development. | arXiv | | Tool |
| ChatGPT-API-unity | Binds ChatGPT chat completion API to pure C# on Unity. | | Unity | Tool |
| ChatGPTForUnity | ChatGPT for Unity. | | Unity | Tool |
| ChatRWKV | ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source. | | | Tool |
| ChatYuan | Large Language Model for Dialogue in Chinese and English. | | | Tool |
| Chinese-LLaMA-Alpaca-3 | (Chinese Llama-3 LLMs) developed from Meta Llama 3. | | | Tool |
| Chrome-GPT | An AutoGPT agent that controls Chrome on your desktop. | | | Tool |
| CogVLM | CogVLM, a powerful open-source visual language foundation model. | arXiv | | Tool |
| CoreNet | A library for training deep neural networks. | | | Tool |
| DBRX | DBRX is a large language model trained by Databricks. | | | Tool |
| DCLM | DataComp for Language Models. | arXiv | | Tool |
| DemoGPT | Auto Gen-AI App Generator with the Power of Llama 2. | | | Tool |
| Design2Code | Automating Front-End Engineering. | | | Tool |
| Devika | Devika is an Agentic AI Software Engineer. | | | Tool |
| Devon | An open-source pair programmer. | | | Tool |
| Dora | Generating powerful websites, one prompt at a time. | | | Tool |
| Flowise | Drag & drop UI to build your customized LLM flow using LangchainJS. | | | Tool |
| Gemini | Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code. | | | Tool |
| Gemma | Gemma is a family of lightweight, state-of-the-art open models built from research and technology used to create Google Gemini models. | | | Tool |
| gemma.cpp | Lightweight, standalone C++ inference engine for Google's Gemma models. | | | Tool |
| GLM-4 | GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. | | | Tool |
| GPT4All | A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue. | | | Tool |
| GPT-4o | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | Tool |
| GPTScript | Develop LLM Apps in Natural Language. | | | Tool |
| Grok-1 | The weights and architecture of our 314 billion parameter Mixture-of-Experts model, Grok-1. | | | Tool |
| HuggingChat | Making the community's best AI chat models available to everyone. | | | Tool |
| Hugging Face API Unity Integration | This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool |
| ImageBind | ImageBind: One Embedding Space to Bind Them All. | arXiv | | Tool |
| Index-1.9B | A SOTA lightweight multilingual LLM. | | | Tool |
| InteractML-Unity | InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D. | | Unity | Tool |
| InteractML-Unreal Engine | Bringing Machine Learning to Unreal Engine. | | Unreal Engine | Tool |
| InternLM | InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios and the training system. | arXiv | | Tool |
| InternLM-XComposer | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Tool |
| Jan | Bring AI to your Desktop. | | | Tool |
| Lamini | Lamini allows any engineering team to outperform general purpose LLMs through RLHF and fine-tuning on their own data. | | | Tool |
| LaMini-LM | LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. | | | Tool |
| LangChain | LangChain is a framework for developing applications powered by language models. | | | Tool |
| LangFlow | ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Tool |
| LaVague | Automate automation with Large Action Model framework. | | | Tool |
| Lemur | Open Foundation Models for Language Agents. | | | Tool |
| Lepton AI | A Pythonic framework to simplify AI service building. | | | Tool |
| Lit-LLaMA | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. | | | Tool |
| llama2-webui | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). | | | Tool |
| Llama 3 | The official Meta Llama 3 GitHub site. | | | Tool |
| Llama 3.1 | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. | | | Tool |
| LLaSM | Large Language and Speech Model. | | | Tool |
| LLM Answer Engine | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
| llm.c | LLM training in simple, raw C/CUDA. | | | Tool |
| LLMUnity | Create characters in Unity with LLMs! | | Unity | Tool |
| LLocalSearch | LLocalSearch is a completely locally running search engine using LLM Agents. | | | Tool |
| LogicGamesSolver | A Python tool to solve logic games with AI, Deep Learning and Computer Vision. | | | Tool |
| LongWriter | LongWriter: Unleashing 10,000+ Word Generation From Long Context LLMs. | arXiv | | Tool |
| Large World Model (LWM) | Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. | arXiv | | Tool |
| Lumina-T2X | Lumina-T2X is a unified framework for Text to Any Modality Generation. | arXiv | | Tool |
| MetaGPT | The Multi-Agent Framework. | | | Tool |
| MiniCPM-2B | An end-side LLM that outperforms Llama2-13B. | | | Tool |
| MiniGPT-4 | Enhancing Vision-language Understanding with Advanced Large Language Models. | arXiv | | Tool |
| MiniGPT-5 | Interleaved Vision-and-Language Generation via Generative Vokens. | arXiv | | Tool |
| Mixtral 8x7B | A high quality Sparse Mixture-of-Experts. | arXiv | | Tool |
| Mistral 7B | The best 7B model to date, Apache 2.0. | | | Tool |
| Mistral Large | Mistral Large is a new cutting-edge text generation model. It reaches top-tier reasoning capabilities. | | | Tool |
| MLC LLM | Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. | | | Tool |
| MobiLlama | Towards Accurate and Lightweight Fully Transparent GPT. | arXiv | | Tool |
| MoE-LLaVA | Mixture of Experts for Large Vision-Language Models. | arXiv | | Tool |
| Moshi | Moshi is an experimental conversational AI. | | | Tool |
| Moshi | Moshi: a speech-text foundation model for real time dialogue. | | | Tool |
| MOSS | An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
| mPLUG-Owl | 🦉 Modularization Empowers Large Language Models with Multimodality. | arXiv | | Tool |
| Nemotron-4 | A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. | arXiv | | Tool |
| NExT-GPT | Any-to-Any Multimodal Large Language Model. | | | Tool |
| OLMo | Open Language Model. | arXiv | | Tool |
| OmniLMM | Large multi-modal models for strong performance and efficient deployment. | | | Tool |
| OneLLM | One Framework to Align All Modalities with Language. | arXiv | | Tool |
| Open-Assistant | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | | | Tool |
| OpenDevin | An autonomous AI software engineer. | | | Tool |
| Orion-14B | Orion-14B is a family of models that includes a 14B foundation LLM and a series of models. | arXiv | | Tool |
| Panda | An open-source large language model from overseas Chinese developers, based on Llama-7B, -13B, -33B, and -65B with continual pre-training on Chinese-domain data. | | | Tool |
| Perplexica | An AI-powered search engine. | | | Tool |
| Pi | AI chatbot designed for personal assistance and emotional support. | | | Tool |
| Qwen1.5 | Qwen1.5 is the improved version of Qwen. | | | Tool |
| Qwen2 | Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. | | | Tool |
| Qwen-7B | The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. | | | Tool |
| RepoAgent | RepoAgent is an open-source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. | arXiv | | Tool |
| Sanity AI Engine | Sanity AI Engine for the Unity Game Development Tool. | | Unity | Tool |
| SearchGPT | 🌳 Connecting ChatGPT with the Internet. | | | Tool |
| ShareGPT4V | Improving Large Multi-Modal Models with Better Captions. | | | Tool |
| Skywork | Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. | | | Tool |
| StableLM | Stability AI Language Models. | arXiv | | Tool |
| Stanford Alpaca | An Instruction-following LLaMA Model. | | | Tool |
| Text generation web UI | A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. | | | Tool |
| TinyChatEngine | On-Device LLM Inference Library. | | | Tool |
| ToolBench | An open platform for training, serving, and evaluating large language models for tool learning. | | | Tool |
| Unity ChatGPT | Unity ChatGPT Experiments. | | Unity | Tool |
| Unity OpenAI-API Integration | Integrate the OpenAI GPT-3 language model and ChatGPT API into a Unity project. | | Unity | Tool |
| Unreal Engine 5 Llama LoRA | A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. | | Unreal Engine | Tool |
| UnrealGPT | A collection of Unreal Engine 5 Editor Utility widgets powered by GPT3/4. | | Unreal Engine | Tool |
| Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | | Tool |
| WebGPT | Run GPT model on the browser with WebGPU. | | | Tool |
| Web3-GPT | Deploy smart contracts with AI. | | | Tool |
| WordGPT | 🤖 Bring the power of ChatGPT to Microsoft Word. | | | Tool |
| XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Tool |
| Yi | A series of large language models trained from scratch by developers. | | | Tool |
| 01 Project | The open-source language model computer. | | | Tool |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="game">Game (Agent)</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AgentBench | A Comprehensive Benchmark to Evaluate LLMs as Agents. | arXiv | | Agent |
| Agent Group Chat | An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior. | arXiv | | Agent |
| Agent K | An autoagentic AGI that is self-evolving and modular. | | | Agent |
| AgentScope | Start building LLM-empowered multi-agent applications in an easier way. | arXiv | | Agent |
| AgentSims | An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
| AI Town | AI Town is a virtual town where AI characters live, chat and socialize. | | | Agent |
| anime.gf | Local & Open Source Alternative to CharacterAI. | | | Game |
| Astrocade | Create games with AI. | | | Game |
| Atomic Agents | The Atomic Agents framework is designed to be modular, extensible, and easy to use. | | | Agent |
| AutoAgents | A Framework for Automatic Agent Generation. | | | Agent |
| AutoGen | Enable Next-Gen Large Language Model Applications. | arXiv | | Agent |
| behaviac | Behaviac is a framework for game AI development. | | | Framework |
| Biomes | Biomes is an open source sandbox MMORPG built for the web using web technologies such as Next.js, Typescript, React and WebAssembly. | | | Game |
| Buffer of Thoughts | Thought-Augmented Reasoning with Large Language Models. | arXiv | | Agent |
| Byzer-Agent | Easy, fast, and distributed agent framework for everyone. | | | Agent |
| Cat Town | A C(h)atGPT-powered simulation with cats. | | | Agent |
| CharacterGLM | Customizing Chinese Conversational AI Characters with Large Language Models. | arXiv | | Agent |
| ChatDev | Communicative Agents for Software Development. | arXiv | | Agent |
| CogAgent | CogAgent is an open-source visual language model improved based on CogVLM. | arXiv | | Agent |
| Cradle | Towards General Computer Control. | | | Agent |
| crewAI | Framework for orchestrating role-playing, autonomous AI agents. | | | Agent |
| Dify | Dify is an open-source LLM app building platform. | | | Agent |
| Digital Life Project | Autonomous 3D Characters with Social Intelligence. | arXiv | | Agent |
| everything-ai | Your fully proficient, AI-powered and local chatbot assistant 🤖. | | | Agent |
| fabric | fabric is an open-source framework for augmenting humans using AI. | | | Agent |
| FastGPT | FastGPT is a knowledge-based platform built on the LLM. | | | Agent |
| fastRAG | Efficient Retrieval Augmentation and Generation Framework. | | | Agent |
| GameAISDK | Image-based game AI automation framework. | | | Framework |
| GameNGen | Diffusion Models Are Real-Time Game Engines. | arXiv | | Game |
| GameGen-O | GameGen-O: Open-world Video Game Generation. | | | Game |
| GenAgent | GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI. | arXiv | | Agent |
| Generative Agents | Interactive Simulacra of Human Behavior. | arXiv | | Agent |
| Genesis | Genesis: A Generative and Universal Physics Engine for Robotics and Beyond. | | | Game |
| Genie | Generative Interactive Environments. | | | Game |
| gigax | Runtime, LLM-powered NPCs. | | | Game |
| HippoRAG | Neurobiologically Inspired Long-Term Memory for Large Language Models. | arXiv | | Agent |
| Interactive LLM Powered NPCs | Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
| IoA | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
| KwaiAgents | A generalized information-seeking agent system with Large Language Models (LLMs). | arXiv | | Agent |
| LangChain | Get your LLM application from prototype to production. | | | Agent |
| Langflow | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
| LangGraph Studio | LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. | | | Agent |
| LARP | Language-Agent Role Play for open-world games. | arXiv | | Agent |
| LLama Agentic System | Agentic components of the Llama Stack APIs. | | | Agent |
| LlamaIndex | LlamaIndex is a data framework for your LLM application. | | | Agent |
| MindSearch | 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
| Mixture of Agents (MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. | arXiv | | Agent |
| MMRole | MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. | arXiv | | Agent |
| Moonlander.ai | Start building 3D games without any coding using generative AI. | | | Framework |
| MuG Diffusion | MuG Diffusion is a charting AI for rhythm games based on Stable Diffusion (one of the most powerful AIGC models) with a large modification to incorporate audio waves. | | | Game |
| Oasis | Oasis is an interactive world model developed by Decart and Etched. Based on diffusion transformers, Oasis takes in user keyboard input and generates gameplay in an autoregressive manner. | | | Game |
| OmAgent | A multimodal agent framework for solving complex tasks. | | | Agent |
| OpenAgents | An Open Platform for Language Agents in the Wild. | | | Agent |
| Opus | An AI app that turns text into a video game. | | | Game |
| Pipecat | Open Source framework for voice and multimodal conversational AI. | | | Agent |
| Qwen-Agent | Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
| Ragas | Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. | | | Agent |
| RPBench-Auto | An automated pipeline for evaluating LLMs for role-playing. | | | Game |
| SIMA | A generalist AI agent for 3D virtual environments. | | | Agent |
| StoryGames.ai | AI for Dreamers Make Games. | | | Game |
| SWE-agent | Agent Computer Interfaces Enable Software Engineering Language Models. | arXiv | | Agent |
| TaskGen | A Task-based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
| TEN Agent | TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities. | | | Agent |
| Translation Agent | Agentic translation using reflection workflow. | | | Agent |
| Twitter | Twitter Personality is a web application that analyzes your Twitter handle to create a personalized personality profile using Wordware AI Agent. | | | Agent |
| Unbounded | Unbounded: A Generative Infinite Game of Character Life Simulation. | arXiv | | Game |
| Video2Game | Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. | arXiv | | Game |
| V-IRL | Grounding Virtual Intelligence in Real Life. | arXiv | | Agent |
| WebDesignAgent | An agent used for web design. | | | Agent |
| XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Agent |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="code">Code</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AI Code Translator | Use AI to translate code from one language to another. | | | Code |
| aiXcoder-7B | aiXcoder-7B Code Large Language Model. | | | Code |
| bloop | bloop is a fast code search engine written in Rust. | | | Code |
| Chapyter | ChatGPT Code Interpreter in Jupyter Notebooks. | | | Code |
| CodeGeeX | An Open Multilingual Code Generation Model. | arXiv | | Code |
| CodeGeeX2 | A More Powerful Multilingual Code Generation Model. | | | Code |
| CodeGeeX4 | CodeGeeX4: Open Multilingual Code Generation Model. | | | Code |
| CodeGen | CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. | arXiv | | Code |
| CodeGen2 | CodeGen2 models for program synthesis. | arXiv | | Code |
| Code Llama | Code Llama is a family of large language models for code based on Llama 2. | | | Code |
| CodeTF | One-stop Transformer Library for State-of-the-art Code LLM. | | | Code |
| CodeT5 | Open Code LLMs for Code Understanding and Generation. | | | Code |
| Cursor | Write, edit, and chat about your code with GPT-4 in a new type of editor. | | | Code |
| DeepSeek Coder | DeepSeek Coder: Let the Code Write Itself. | arXiv | | Code |
| OpenAI Codex | OpenAI Codex is a descendant of GPT-3. | | | Code |
| PandasAI | Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational. | | | Code |
| RobloxScripterAI | RobloxScripterAI is an AI-powered code generation tool for Roblox. | | Roblox | Code |
| Scikit-LLM | Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. | | | Code |
| SoTaNa | The Open-Source Software Development Assistant. | arXiv | | Code |
| Stable Code 3B | Coding on the Edge. | | | Code |
| StarCoder | 💫 StarCoder is a language model (LM) trained on source code and natural language text. | arXiv | | Code |
| StarCoder 2 | StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues. | arXiv | | Code |
| UnityGen AI | UnityGen AI is an AI-powered code generation plugin for Unity. | | Unity | Code |
| Void | Void is an open source Cursor alternative. Write code with the best AI tools, retain full control over your data, and access powerful AI features. | | | Code |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="writer">Writer</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AI-Writer | AI writes novels, generates fantasy and romance web articles, etc. Chinese pre-trained generative model. | | | Writer |
| Notebook.ai | Notebook.ai is a set of tools for writers, game designers, and roleplayers to create magnificent universes – and everything within them. | | | Writer |
| Novel | Notion-style WYSIWYG editor with AI-powered autocompletions. | | | Writer |
| NovelAI | Driven by AI, painlessly construct unique stories, thrilling tales, seductive romances, or just fool around. | | | Writer |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="image">Image</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AnyDoor | Zero-shot Object-level Image Customization. | arXiv | | Image |
| AnyText | Multilingual Visual Text Generation And Editing. | arXiv | | Image |
| AutoStudio | Crafting Consistent Subjects in Multi-turn Interactive Image Generation. | arXiv | | Image |
| Blender-ControlNet | Using ControlNet right in Blender. | | Blender | Image |
| BriVL | Bridging Vision and Language Model. | arXiv | | Image |
| CatVTON | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models. | arXiv | | Image |
| CLIPasso | A method for converting an image of an object to a sketch, allowing for varying levels of abstraction. | arXiv | | Image |
| ClipDrop | Create stunning visuals in seconds. | | | Image |
| ComfyUI | A powerful and modular stable diffusion GUI with a graph/nodes interface. | | | Image |
| ConceptLab | Creative Generation using Diffusion Prior Constraints. | arXiv | | Image |
| ControlNet | ControlNet is a neural network structure to control diffusion models by adding extra conditions. | arXiv | | Image |
| CSGO | CSGO: Content-Style Composition in Text-to-Image Generation. | arXiv | | Image |
| DALL·E 2 | DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. | | | Image |
| Dashtoon Studio | Dashtoon Studio is an AI powered comic creation platform. | | | Comic |
| DeepAI | DeepAI offers a suite of tools that use AI to enhance your creativity. | | | Image |
| DeepFloyd IF | IF by DeepFloyd Lab at StabilityAI. | | | Image |
| Depth Anything V2 | Depth Anything V2. | arXiv | | Image |
| Depth map library and poser | Depth map library for use with the Control Net extension for Automatic1111/stable-diffusion-webui. | | | Image |
| Diffuse to Choose | Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. | arXiv | | Image |
| Disco Diffusion | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations. | | | Image |
| DragGAN | Interactive Point-based Manipulation on the Generative Image Manifold. | arXiv | | Image |
| Draw Things | AI-assisted image generation in Your Pocket. | | | Image |
| DWPose | Effective Whole-body Pose Estimation with Two-stages Distillation. | arXiv | | Image |
| EasyPhoto | Your Smart AI Photo Generator. | | | Image |
| Flux | This repo contains minimal inference code to run text-to-image and image-to-image with our Flux latent rectified flow transformers. | | | Image |
| Follow-Your-Click | Open-domain Regional Image Animation via Short Prompts. | arXiv | | Image |
| Fooocus | Focus on prompting and generating. | | | Image |
| GIFfusion | Create GIFs and Videos using Stable Diffusion. | | | Image |
| Grounded-Segment-Anything | Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs. | arXiv | | Image |
| HivisionIDPhotos | HivisionIDPhotos: a lightweight and efficient AI ID photo tool. | | | Image |
| Hua | Hua is an AI image editor with Stable Diffusion (and more). | | | Image |
| Hunyuan-DiT | A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. | arXiv | | Image |
| IC-Light | IC-Light is a project to manipulate the illumination of images. | | | Image |
| Ideogram | Helping people become more creative. | | | Image |
| Imagen | Imagen is an AI system that creates photorealistic images from input text. | | | Image |
| img2img-turbo | One-Step Image-to-Image with SD-Turbo. | | | Image |
| Img2Prompt | Get prompts from stable diffusion generated images. | | | Image |
| InstantID | Zero-shot Identity-Preserving Generation in Seconds. | arXiv | | Image |
| InternLM-XComposer2 | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Image |
| KOALA | Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image |
| Kolors | Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image |
| KREA | Generate images and videos with a delightful AI-powered design tool. | | | Image |
| LaVi-Bridge | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. | arXiv | | Image |
| LayerDiffusion | Transparent Image Layer Diffusion using Latent Transparency. | arXiv | | Image |
| Lexica | A Stable Diffusion prompts search engine. | | | Image |
| LlamaGen | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation. | arXiv | | Image |
| Lumina-mGPT | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. | arXiv | | Image |
| MetaShoot | MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. | | Unreal Engine | Image |
| Midjourney | Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. | | | Image |
| MIGC | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. | arXiv | | Image |
| MimicBrush | Zero-shot Image Editing with Reference Imitation. | arXiv | | Image |
| OmniGen | OmniGen: Unified Image Generation. | arXiv | | Image |
| Omost | Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability. | | | Image |
| Openpose Editor | Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. | | | Image |
| Outfit Anyone | Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image |
| PaintsUndo | PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings. | | | Image |
| PhotoMaker | Customizing Realistic Human Photos via Stacked ID Embedding. | arXiv | | Image |
| Photoroom | AI Background Generator. | | | Image |
| Plask | AI image generation in the cloud. | | | Image |
| Prompt.Art | The Generators Hub. | | | Image |
| PuLID | Pure and Lightning ID Customization via Contrastive Alignment. | arXiv | | Image |
| Rich-Text-to-Image | Expressive Text-to-Image Generation with Rich Text. | arXiv | | Image |
| RPG-DiffusionMaster | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). | | | Image |
| SEED-Story | SEED-Story: Multimodal Long Story Generation with Large Language Model. | arXiv | | Image |
| Segment Anything | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object, in any image, with a single click. | arXiv | | Image |
| Segment Anything Model 2 (SAM 2) | SAM 2: Segment Anything in Images and Videos. | arXiv | | Image |
| sd-webui-controlnet | WebUI extension for ControlNet. | | | Image |
| SDXL-Lightning | Progressive Adversarial Diffusion Distillation. | arXiv | | Image |
| SDXS | Real-Time One-Step Latent Diffusion Models with Image Conditions. | | | Image |
| Stable.art | Photoshop plugin for Stable Diffusion with Automatic1111 as backend (locally or with Google Colab). | | | Image |
| Stable Cascade | Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade for generating images, hence the name "Stable Cascade". | | | Image |
| Stable Diffusion | A latent text-to-image diffusion model. | | | Image |
| stable-diffusion.cpp | Stable Diffusion in pure C/C++. | | | Image |
| Stable Diffusion web UI | A browser interface based on Gradio library for Stable Diffusion. | | | Image |
| Stable Diffusion web UI | Web-based UI for Stable Diffusion. | | | Image |
| Stable Diffusion WebUI Chinese | Chinese version of stable-diffusion-webui. | | | Image |
| Stable Diffusion XL | Generate images from text. | arXiv | | Image |
| Stable Diffusion XL Turbo | Real-Time Text-to-Image Generation. | | | Image |
| Stable Diffusion 3.5 | Stable Diffusion 3.5 open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. | | | Image |
| Stable Doodle | Stable Doodle is a sketch-to-image tool that converts a simple drawing into a dynamic image. | | | Image |
| StableStudio | StableStudio by Stability AI. | | | Image |
| StoryMaker | StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation. | arXiv | | Image |
| StreamDiffusion | A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
| StyleDrop | Text-To-Image Generation in Any Style. | arXiv | | Image |
| SyncDreamer | Generating Multiview-consistent Images from a Single-view Image. | arXiv | | Image |
| UltraEdit | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. | arXiv | | Image |
| UltraPixel | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. | arXiv | | Image |
| Unity ML Stable Diffusion | Core ML Stable Diffusion on Unity. | | Unity | Image |
| Vispunk Visions | Text-to-Image generation platform. | | | Image |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="texture">Texture</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| CRM | Single Image to 3D Textured Mesh with Convolutional Reconstruction Model. | arXiv | | Texture |
| DreamMat | High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. | arXiv | | Texture |
| DreamSpace | Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. | | | Texture |
| Dream Textures | Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | | Blender | Texture |
| InstructHumans | Editing Animated 3D Human Textures with Instructions. | arXiv | | Texture |
| InteX | Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. | arXiv | | Texture |
| LLaMA-Mesh | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. | arXiv | | Mesh |
| MaterialSeg3D | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. | arXiv | | Texture |
| MeshAnything | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. | arXiv | | Mesh |
| Neuralangelo | High-Fidelity Neural Surface Reconstruction. | arXiv | | Texture |
| Paint-it | Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | | | Texture |
| Polycam | Create your own 3D textures just by typing. | | | Texture |
| TexFusion | Synthesizing 3D Textures with Text-Guided Image Diffusion Models. | arXiv | | Texture |
| Text2Tex | Text-driven texture Synthesis via Diffusion Models. | arXiv | | Texture |
| Texture Lab | AI-generated textures. You can generate your own with a text prompt. | | | Texture |
| With Poly | Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library. | | | Texture |
| X-Mesh | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. | arXiv | | Texture |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="shader">Shader</span>

| Source | Description | Paper | Game Engine | Type |
| --- | --- | --- | --- | --- |
| AI Shader | ChatGPT-powered shader generator for Unity. | | Unity | Shader |
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="model">3D Model</span>

SourceDescriptionPaperGame EngineType
Animate3DAnimate3D: Animating Any 3D Model with Multi-view Video Diffusion.arXiv3D
Anything-3DSegment-Anything + 3D. Let's lift the anything to 3D.arXivModel
Any2PointAny2Point: Empowering Any-modality Large Models for Efficient 3D Understanding.arXiv3D
BlenderGPTUse commands in English to control Blender with OpenAI's GPT-4.BlenderModel
Blender-GPTAn all-in-one Blender assistant powered by GPT3/4 + Whisper integration.BlenderModel
Blockade LabsDigital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts.Model
CF-3DGSCOLMAP-Free 3D Gaussian Splatting.arXiv3D
CharacterGenCharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization.arXiv3D
chatGPT-mayaSimple Maya tool that utilizes open AI to perform basic tasks based on descriptive instructions.MayaModel
CityDreamerCompositional Generative Model of Unbounded 3D Cities.arXiv3D
CSMGenerate 3D worlds from images and videos.3D
DashYour Copilot for World Building in Unreal Engine.Unreal Engine3D
DreamCatalystDreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation.arXiv3D
DreamGaussian4DGenerative 4D Gaussian Splatting.arXiv4D
DUSt3RGeometric 3D Vision Made Easy.arXiv3D
Edify 3DEdify 3D: Scalable High-Quality 3D Asset Generation.arXiv3D
GALA3DGALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting.arXiv3D
GaussCtrlGaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing.arXiv3D
GaussianCubeA Structured and Explicit Radiance Representation for 3D Generative Modeling.arXiv3D
GaussianDreamerFast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors.arXiv3D
GenieLabsEmpower your game with AI-UGC.3D
HiFAHigh-fidelity Text-to-3D with advance Diffusion guidance.Model
HoloDreamerHoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions.arXiv3D
Hunyuan3D-1.0Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation.arXiv3D
InfinigenInfinite Photorealistic Worlds using Procedural Generation.arXiv3D
Instruct-NeRF2NeRFEditing 3D Scenes with Instructions.arXivModel
Interactive3DCreate What You Want by Interactive 3D Generation.arXiv3D
Isotropic3DImage-to-3D Generation Based on a Single CLIP Embedding.3D
LATTE3DLarge-scale Amortized Text-To-Enhanced3D Synthesis.arXiv3D
LIONLatent Point Diffusion Models for 3D Shape Generation.arXivModel
Luma AICapture in lifelike 3D. Unmatched photorealism, reflections, and details. The future of VFX is now, for everyone!Model
lumine AIAI-Powered Creativity.3D
Make-It-3DHigh-Fidelity 3D Creation from A Single Image with Diffusion Prior.arXivModel
MeshyCreate Stunning 3D Game Assets with AI.3D
MootionMagical 3D AI Animation Maker.3D
MVDreamMulti-view Diffusion for 3D Generation.arXiv3D
NVIDIA Instant NeRFInstant neural graphics primitives: lightning fast NeRF and more.Model
One-2-3-45Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization.arXivModel
Paint3DPaint Anything 3D with Lighting-Less Texture Diffusion Models.arXiv3D
PAniC-3DStylized Single-view 3D Reconstruction from Portraits of Anime Characters.arXivModel
Point·EPoint cloud diffusion for 3D model synthesis.Model
ProlificDreamerHigh-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation.arXivModel
SF3DSF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement.arXiv3D
Shap-EGenerate 3D objects conditioned on text or images.arXivModel
Sloyd3D modelling has never been easier.Model
Spline AIThe power of AI is coming to the 3rd dimension. Generate objects, animations, and textures using prompts.Model
Stable DreamfusionA pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model.Model
SV3DNovel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion.arXiv3D
TafiAI text to 3D character engine.Model
3D-GPTProcedural 3D Modeling with Large Language Models.arXiv3D
3D-LLMInjecting the 3D World into Large Language Models.arXiv3D
3DpressoExtract a 3D model of an object, captured on a video.Model
3DTopiaText-to-3D Generation within 5 Minutes.arXiv3D
3DTopia-XL3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion.arXiv3D
threestudioA unified framework for 3D content generation.Model
TripoSRA state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image.arXivModel
Unique3DHigh-Quality and Efficient 3D Mesh Generation from a Single Image.arXiv3D
UnityGaussianSplattingToy Gaussian Splatting visualization in Unity.Unity3D
ViVid-1-to-3Novel View Synthesis with Video Diffusion Models.arXiv3D
VoxcraftCrafting Ready-to-Use 3D Models with AI.3D
Wonder3DSingle Image to 3D using Cross-Domain Diffusion.arXiv3D
Zero-1-to-3Zero-shot One Image to 3D Object.arXivModel
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="avatar">Avatar</span>

SourceDescriptionPaperGame EngineType
AniPortraitAudio-Driven Synthesis of Photorealistic Portrait Animations.arXivAvatar
CALMConditional Adversarial Latent Models for Directable Virtual Characters.arXivAvatar
ChatAvatarProgressive Generation of Animatable 3D Faces under Text Guidance.Avatar
ChatdollKitChatdollKit enables you to make your 3D model into a chatbot.UnityAvatar
DreamTalkWhen Expressive Talking Head Generation Meets Diffusion Probabilistic Models.arXivAvatar
DuixDuix - Silicon-Based Digital Human SDK 🌐🤖Avatar
EchoMimicEchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions.arXivAvatar
EMOPortraitsEmotion-enhanced Multimodal One-shot Head Avatars.Avatar
E3 GenEfficient, Expressive and Editable Avatars Generation.arXivAvatar
ExAvatarExAvatar - Expressive Whole-Body 3D Gaussian Avatar.arXivAvatar
GeneAvatarGeneric Expression-Aware Volumetric Head Avatar Editing from a Single Image.arXivAvatar
GeneFace++Generalized and Stable Real-Time 3D Talking Face Generation.Avatar
HalloHierarchical Audio-Driven Visual Synthesis for Portrait Image Animation.arXivAvatar
Hallo2Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation.arXivAvatar
HeadSculptCrafting 3D Head Avatars with Text.arXivAvatar
IntrinsicAvatarIntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing.arXivAvatar
Linly-TalkerDigital Avatar Conversational System.Avatar
LivePortraitLivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control.arXivAvatar
MotionGPTHuman Motion as a Foreign Language, a unified motion-language generation model using LLMs.arXivAvatar
MusePoseMusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation.Avatar
MuseTalkReal-Time High Quality Lip Synchronization with Latent Space Inpainting.Avatar
MuseVInfinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising.Avatar
Portrait4DLearning One-Shot 4D Head Avatar Synthesis using Synthetic Data.arXivAvatar
Ready Player MeIntegrate customizable avatars into your game or app in days.Avatar
RodinHDRodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.arXivAvatar
StyleAvatar3DLeveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation.arXivAvatar
Text2Control3DControllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model.arXivAvatar
Topo4DTopology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture.arXivAvatar
UnityAIWithChatGPTImplements a voice-interactive ChatGPT + UnityChan display in Unity.UnityAvatar
Vid2Avatar3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition.arXivAvatar
VLOGGERMultimodal Diffusion for Embodied Avatar Synthesis.Avatar
Wild2AvatarRendering Humans Behind Occlusions.arXivAvatar
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="animation">Animation</span>

SourceDescriptionPaperGame EngineType
Animate AnyoneConsistent and Controllable Image-to-Video Synthesis for Character Animation.arXivAnimation
AnimateAnythingFine-Grained Open Domain Image Animation with Motion Guidance.arXivAnimation
AnimateDiffAnimate Your Personalized Text-to-Image Diffusion Models without Specific Tuning.arXivAnimation
AnimateLCMLet's Accelerate the Video Generation within 4 Steps!arXivAnimation
Animate-XAnimate-X: Universal Character Image Animation with Enhanced Motion Representation.arXivAnimation
AnimateZeroVideo Diffusion Models are Zero-Shot Image Animators.arXivAnimation
AnimationGPTAn AIGC tool for generating game combat motion assets.Animation
DeforumDeforum leverages Stable Diffusion to generate evolving AI visuals.Animation
DrawingSpinUpDrawingSpinUp: 3D Animation from Single Character Drawings.arXivAnimation
DreaMovingA Human Video Generation Framework based on Diffusion Models.arXivAnimation
FaceFusionNext generation face swapper and enhancer.Animation
FreeInitBridging Initialization Gap in Video Diffusion Models.arXivAnimation
GeneFaceGeneralized and High-Fidelity Audio-Driven 3D Talking Face Synthesis.arXivAnimation
ID-AnimatorZero-Shot Identity-Preserving Human Video Generation.arXivAnimation
MagicAnimateTemporally Consistent Human Image Animation using Diffusion Model.arXivAnimation
NUWADragNUWA is an open-domain diffusion-based video generation model that takes text, image, and trajectory controls as inputs to achieve controllable video generation.arXivAnimation
NUWA-InfinityNUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input.Animation
NUWA-XLA novel Diffusion over Diffusion architecture for eXtremely Long video generation.Animation
Omni AnimationAI Generated High Fidelity Animations.Animation
PIAYour Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models.arXivAnimation
SadTalkerLearning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation.arXivAnimation
SadTalker-Video-Lip-SyncThis project is based on SadTalker's Wav2Lip for video lip synthesis.Animation
Stable AnimationA powerful text-to-animation tool for developers.Animation
TaleCrafterAn interactive story visualization tool that supports multiple characters.arXivAnimation
ToonCrafterToonCrafter: Generative Cartoon Interpolation.arXivAnimation
Wav2LipAccurately Lip-syncing Videos In The Wild.arXivAnimation
Wonder StudioAn AI tool that automatically animates, lights and composes CG characters into a live-action scene.Animation
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="visual">Visual</span>

SourceDescriptionPaperGame EngineType
Cambrian-1Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.arXivMultimodal LLMs
CogVLM2GPT4V-level open-source multi-modal model based on Llama3-8B.Visual
CoTrackerIt is Better to Track Together.arXivVisual
EVF-SAMEVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model.arXivVisual
FaceHiIt is Better to Track Together.Visual
InternLM-XComposer2InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.arXivVisual
KangarooKangaroo: A Powerful Video-Language Model Supporting Long-context Video Input.Visual
LGVITowards Language-Driven Video Inpainting via Multimodal Large Language Models.Visual
LLaVA++Extending Visual Capabilities with LLaMA-3 and Phi-3.Visual
LLaVA-OneVisionLLaVA-OneVision: Easy Visual Task Transfer.arXivVisual
LongVALong Context Transfer from Language to Vision.arXivVisual
MaskViTMasked Visual Pre-Training for Video Prediction.arXivVisual
MiniCPM-Llama3-V 2.5A GPT-4V Level MLLM on Your Phone.Visual
MoE-LLaVAMixture of Experts for Large Vision-Language Models.arXivVisual
MotionLLMUnderstanding Human Behaviors from Human Motions and Videos.arXivVisual
PLLaVAParameter-free LLaVA Extension from Images to Videos for Video Dense Captioning.arXivVisual
Qwen-VLA Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond.arXivVisual
SapiensSapiens: Foundation for Human Vision Models.arXivVisual
ShareGPT4VImproving Large Multi-modal Models with Better Captions.arXivVisual
SOLOSOLO: A Single Transformer for Scalable Vision-Language Modeling.arXivVisual
Video-CCAMVideo-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks.Visual
Video-LLaVALearning United Visual Representation by Alignment Before Projection.arXivVisual
VideoLLaMA 2Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs.arXivVisual
Video-MMEThe First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.arXivVisual
VitronA Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing.Visual
VILAVILA: On Pre-training for Visual Language Models.arXivVisual
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="video">Video</span>

SourceDescriptionPaperGame EngineType
360DVDControllable Panorama Video Generation with 360-Degree Video Diffusion Model.arXivVideo
Animate-A-StoryRetrieval-Augmented Video Generation for Telling a Story.arXivVideo
Anything in Any ScenePhotorealistic Video Object Insertion.Video
ART•VAuto-Regressive Text-to-Video Generation with Diffusion Models.arXivVideo
AssistiveMeet the generative video platform that brings your ideas to life.Video
AtomoVideoHigh Fidelity Image-to-Video Generation.arXivVideo
BackgroundRemoverBackground Remover lets you remove the background from images and video using AI with a simple command line interface that is free and open source.Video
BoximatorGenerating Rich and Controllable Motions for Video Synthesis.arXivVideo
CoDeFContent Deformation Fields for Temporally Consistent Video Processing.arXivVideo
CogVideoGenerate Videos from Text Descriptions.Video
CogVideoXCogVideoX is an open-source version of the video generation model, which is homologous to 清影.Video
CogVLMCogVLM is a powerful open-source visual language model (VLM).Visual
CoNRGenerate vivid dancing videos from hand-drawn anime character sheets (ACS).arXivVideo
DecohereCreate what can't be filmed.Video
DescriptDescript is the simple, powerful, and fun way to edit.Video
DiffutoonHigh-Resolution Editable Toon Shading via Diffusion Models.arXivVideo
dolphinGeneral video interaction platform based on LLMs.Video
DomoAIAmplify Your Creativity with DomoAI.Video
DreamCinemaDreamCinema: Cinematic Transfer with Free Camera and 3D Character.arXivVideo
DynamiCrafterAnimating Open-domain Images with Video Diffusion Priors.arXivVideo
EDGEWe introduce EDGE, a powerful method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to arbitrary input music.arXivVideo
EMOEmote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions.arXivVideo
Emu VideoFactorizing Text-to-Video Generation by Explicit Image Conditioning.Video
EtnaEtna can generate corresponding video content based on short text descriptions.Video
FairyFast Parallelized Instruction-Guided Video-to-Video Synthesis.Video
Follow-Your-CanvasFollow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation.arXivVideo
Follow Your PosePose-Guided Text-to-Video Generation using Pose-Free Videos.arXivVideo
FullJourneyYour complete suite of AI Creation tools at your fingertips.Video
Gen-2A multi-modal AI system that can generate novel videos with text, images, or video clips.Video
Generative DynamicsGenerative Image Dynamics.Video
GenieGenerative Interactive Environments.arXivVideo
GenmoMagically make videos with AI.Video
GenTronDiffusion Transformers for Image and Video Generation.Video
HiGenHierarchical Spatio-temporal Decoupling for Text-to-Video generation.Video
Hotshot-XLHotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL.Video
Imagen VideoGiven a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models.Video
InstructVideoInstructing Video Diffusion Models with Human Feedback.arXivVideo
I2VGen-XLHigh-Quality Image-to-Video Synthesis via Cascaded Diffusion Models.arXivVideo
LaVieHigh-Quality Video Generation with Cascaded Latent Diffusion Models.arXivVideo
LTX StudioLTX Studio is a holistic, AI-driven filmmaking platform for creators, marketers, filmmakers and studios.Video
LTX-VideoLTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them.Video
LumiereA Space-Time Diffusion Model for Video Generation.arXivVideo
LVDMLatent Video Diffusion Models for High-Fidelity Long Video Generation.arXivVideo
MagicVideoEfficient Video Generation With Latent Diffusion Models.arXivVideo
MagicVideo-V2Multi-Stage High-Aesthetic Video Generation.arXivVideo
Magic HourAI Video for Creators made simple.Video
MAGVIT-v2Tokenizer is key to visual generation.Video
MAGVITMasked Generative Video Transformer.Video
Make-A-VideoMake-A-Video is a state-of-the-art AI system that generates videos from text.arXivVideo
Make Pixels DanceHigh-Dynamic Video Generation.arXivVideo
Make-Your-VideoCustomized Video Generation Using Textual and Structural Guidance.arXivVideo
MicroCinemaA Divide-and-Conquer Approach for Text-to-Video Generation.arXivVideo
MIMOMIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling.arXivVideo
Mini-GeminiMining the Potential of Multi-modality Vision Language Models.Vision
MobileVidFactoryAutomatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text.Video
Mochi 1Mochi 1 is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.Video
MOFA-VideoControllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.arXivVideo
MoneyPrinterTurboUse large models to generate short videos with one click.Video
MoonvalleyMoonvalley is a groundbreaking new text-to-video generative AI model.Video
MoraMore like Sora for Generalist Video Generation.arXivVideo
Morph StudioWith our Text-to-Video AI Magic, manifest your creativity through your prompt.Video
MotionCloneMotionClone: Training-Free Motion Cloning for Controllable Video Generation.arXivVideo
MotionCtrlA Unified and Flexible Motion Controller for Video Generation.arXivVideo
MotionDirectorMotion Customization of Text-to-Video Diffusion Models.arXivVideo
MotionshopAn application for replacing characters in video with 3D avatars.Video
Mov2movMov2mov plugin for Automatic1111/stable-diffusion-webui.Video
MovieFactoryAutomatic Movie Creation from Text using Large Generative Models for Language and Images.arXivVideo
Neural FramesDiscover the synthesizer for the visual world.Video
NeverEndsCreate your world.Video
Open-SoraDemocratizing Efficient Video Production for All.Video
Open-SoraOpen-Sora Plan.Video
PhenakiA model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes.arXivVideo
Pika LabsPika Labs is revolutionizing video-making experience with AI.Video
PixelingPixeling empowers our customers to create highly precise, ultra-realistic, and extremely controllable visual content including images, videos and 3D models.Video
PixVerseCreate breath-taking videos with AI.Video
PollinationsCreating gets easy, fast, and fun.Video
Reuse and DiffuseIterative Denoising for Text-to-Video Generation.arXivVideo
ShortGPTAn experimental AI framework for automated short/video content creation.Video
Show-1Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation.arXivVideo
Snap VideoScaled Spatiotemporal Transformers for Text-to-Video Synthesis.arXivVideo
SoraCreating video from text.Video
SoraWebuiSoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model.Video
StableVideoText-driven Consistency-aware Diffusion Video Editing.Video
Stable Video DiffusionStable Video Diffusion (SVD) Image-to-Video.Video
StoryDiffusionConsistent Self-Attention for Long-Range Image and Video Generation.arXivVideo
StreamingT2VConsistent, Dynamic, and Extendable Long Video Generation from Text.arXivVideo
StyleCrafterEnhancing Stylized Text-to-Video Generation with Style Adapter.arXivVideo
TATSLong Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer.Video
Text2Video-ZeroText-to-Image Diffusion Models are Zero-Shot Video Generators.arXivVideo
TF-T2VA Recipe for Scaling up Text-to-Video Generation with Text-free Videos.arXivVideo
ToraTora: Trajectory-oriented Diffusion Transformer for Video Generation.arXivVideo
Track-AnythingTrack-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem.arXivVideo
Tune-A-VideoOne-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.arXivVideo
TwelveLabsMultimodal AI that understands videos like humans.Video
UniVGTowards UNIfied-modal Video Generation.Video
Vchitect-2.0Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.Video
VGenA holistic video generation ecosystem for video generation building on diffusion models.arXivVideo
ViewCrafterViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis.arXivVideo
Video-ChatGPTVideo-ChatGPT is a video conversation model capable of generating meaningful conversation about videos.arXivVideo
VideoComposerCompositional Video Synthesis with Motion Controllability.arXivVideo
VideoCrafter1Open Diffusion Models for High-Quality Video Generation.arXivVideo
VideoCrafter2Overcoming Data Limitations for High-Quality Video Diffusion Models.arXivVideo
VideoDrafterContent-Consistent Multi-Scene Video Generation with LLM.arXivVideo
VideoElevatorElevating Video Generation Quality with Versatile Text-to-Image Diffusion Models.arXivVideo
VideoFactorySwap Attention in Spatiotemporal Diffusions for Text-to-Video Generation.Video
VideoGenA Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation.arXivVideo
VideoLCMVideo Latent Consistency Model.arXivVideo
Video LDMsAlign your Latents: High-Resolution Video Synthesis with Latent Diffusion Models.arXivVideo
Video-LLaVALearning United Visual Representation by Alignment Before Projection.arXivVideo
VideoMambaState Space Model for Efficient Video Understanding.arXivVideo
Video-of-ThoughtVideo-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition.Video
VideoPoetA large language model for zero-shot video generation.arXivVideo
Vispunk MotionCreate realistic videos using just text.Video
VisualRWKVVisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.Visual
V-JEPAVideo Joint Embedding Predictive Architecture.arXivVideo
W.A.L.TPhotorealistic Video Generation with Diffusion Models.arXivVideo
ZeroscopeZeroscope Text-to-Video.Video
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="audio">Audio</span>

SourceDescriptionPaperGame EngineType
AcademiCodecAn Open Source Audio Codec Model for Academic Research.Audio
AmphionAn Open-Source Audio, Music, and Speech Generation Toolkit.arXivAudio
ArchiSoundAudio generation using diffusion models, in PyTorch.Audio
AudioboxUnified Audio Generation with Natural Language Prompts.Audio
AudioEditingZero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion.arXivAudio
Audiogen CodecA low-compression 48kHz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵.Audio
AudioGPTUnderstanding and Generating Speech, Music, Sound, and Talking Head.arXivAudio
AudioLCMText-to-Audio Generation with Latent Consistency Models.arXivAudio
AudioLDMText-to-Audio Generation with Latent Diffusion Models.arXivAudio
AudioLDM 2Learning Holistic Audio Generation with Self-supervised Pretraining.arXivAudio
AuffusionLeveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.arXivAudio
CTAGCreative Text-to-Audio Generation via Synthesizer Programming.Audio
FoleyCrafterFoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds.arXivAudio
MAGNeTMasked Audio Generation using a Single Non-Autoregressive Transformer.Audio
Make-An-AudioText-To-Audio Generation with Prompt-Enhanced Diffusion Models.arXivAudio
Make-An-Audio 3Transforming Text into Audio via Flow-based Large Diffusion Transformers.arXivAudio
NeuralSoundLearning-based Modal Sound Synthesis with Acoustic Transfer.arXivAudio
OptimizerAISounds for Creators, Game makers, Artists, Video makers.Audio
Qwen2-AudioQwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.arXivAudio
SEE-2-SOUNDZero-Shot Spatial Environment-to-Spatial Sound.arXivAudio
SoundStormEfficient Parallel Audio Generation.arXivAudio
Stable AudioFast Timing-Conditioned Latent Audio Diffusion.Audio
Stable Audio OpenStable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts.Audio
SyncFusionSyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis.arXivAudio
TANGOText-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model.Audio
VTA-LDMVideo-to-Audio Generation with Hidden Alignment.arXivAudio
WavJourneyCompositional Audio Creation with Large Language Models.arXivAudio
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="music">Music</span>

SourceDescriptionPaperGame EngineType
AIVAThe Artificial Intelligence composing emotional soundtrack music.Music
Amper MusicCustom music generation technology powered by Amper.Music
BoomyCreate generative music. Share it with the world.Music
ChatMusicianFostering Intrinsic Musical Abilities Into LLM.Music
Chord2MelodyAutomatic Music Generation AI.Music
Diff-BGMA Diffusion Model for Video Background Music Generation.arXivMusic
FluxMusicFluxMusic: Text-to-Music Generation with Rectified Flow Transformer.arXivMusic
GPTAbletonDraft script for processing GPT response and sending the MIDI notes into the Ableton clips with AbletonOSC and python-osc.Music
HeyMusic.AIAI Music Generator.Music
Image to MusicAI Image to Music Generator is a tool that uses artificial intelligence to convert images into music.Music
JEN-1Text-Guided Universal Music Generation with Omnidirectional Diffusion Models.Music
JukeboxA Generative Model for Music.arXivMusic
MagentaMagenta is a research project exploring the role of machine learning in the process of creating art and music.Music
MeLoDyEfficient Neural Music Generation.Music
MubertAI Generative Music.Music
MuseNetA deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles.Music
MusicGenSimple and Controllable Music Generation.arXivMusic
MusicLDMEnhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies.arXivMusic
MusicLMGenerating Music From Text.arXivMusic
Riffusion AppRiffusion is an app for real-time music generation with stable diffusion.Music
SonautoSonauto is an AI music editor that turns prompts, lyrics, or melodies into full songs in any style.Music
SoundRawAI music generator for creators.Music
Soundry AIGenerative AI tools including text-to-sound and infinite sample packs.Music
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="voice">Singing Voice</span>

SourceDescriptionPaperGame EngineType
DiffSingerSinging Voice Synthesis via Shallow Diffusion Mechanism.arXivSinging Voice
Retrieval-based-Voice-Conversion-WebUIAn easy-to-use SVC framework based on VITS.Singing Voice
so-vits-svcSoftVC VITS Singing Voice Conversion.Singing Voice
VI-SVSUse VITS and Opencpop to develop singing voice synthesis; different from VISinger.Singing Voice
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="speech">Speech</span>

SourceDescriptionPaperGame EngineType
ApplioUltimate voice cloning tool, meticulously optimized for unrivaled power, modularity, and user-friendly experience.Speech
AudyoText in. Audio out.Speech
BarkText-Prompted Generative Audio Model.Speech
Bert-VITS2VITS2 Backbone with multilingual BERT.Speech
ChatTTSChatTTS is a generative speech model for daily dialogue.Speech
CLAPSpeechLearning Prosody from Text Context with Contrastive Language-Audio Pre-Training.arXivSpeech
CosyVoiceMulti-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.Speech
DEX-TTSDiffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability.arXivSpeech
EmotiVoiceA Multi-Voice and Prompt-Controlled TTS Engine.Speech
FlikiTurn text into videos with AI voices.Speech
GLM-4-VoiceGLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions.Speech
Glow-TTSA Generative Flow for Text-to-Speech via Monotonic Alignment Search.arXivSpeech
GPT-SoVITSA Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.Speech
LOVOLOVO is the go-to AI Voice Generator & Text to Speech platform for thousands of creators.Speech
MahaTTSAn Open-Source Large Speech Generation Model.Speech
Matcha-TTSA fast TTS architecture with conditional flow matching.arXivSpeech
MeloTTSHigh-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.Speech
MetaVoice-1BAI for human-level speech intelligence.Speech
NarakeetEasily Create Voiceovers Using Realistic Text to Speech.Speech
Mini-OmniMini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.arXivSpeech
One-Shot-Voice-CloningOne Shot Voice Cloning based on Unet-TTS.Speech
OpenVoiceInstant voice cloning by MyShell.Speech
OverFlowPutting flows on top of neural transducers for better TTS.Speech
RealtimeTTSRealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications.Speech
SenseVoiceSenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED).Speech
SpeechGPTEmpowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities.arXivSpeech
speech-to-text-gpt3-unityThis is the repo where I use the Whisper and ChatGPT APIs from OpenAI in Unity.UnitySpeech
Stable SpeechStability AI's Text-to-Speech model.Speech
StableTTSNext-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3.Speech
StyleTTS 2Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.arXivSpeech
tortoise.cpptortoise.cpp: GGML implementation of tortoise-tts.Speech
TorToiSe-TTSA multi-voice TTS system trained with an emphasis on quality.Speech
TTS Generation WebUITTS Generation WebUI (Bark, MusicGen, Tortoise, RVC, Vocos, Demucs).Speech
VALL-ENeural Codec Language Models are Zero-Shot Text to Speech Synthesizers.arXivSpeech
VALL-E XSpeak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.arXivSpeech
VocodeVocode is an open-source library for building voice-based LLM applications.Speech
VoiceboxText-Guided Multilingual Universal Speech Generation at Scale.arXivSpeech
VoiceCraftZero-Shot Speech Editing and Text-to-Speech in the Wild.Speech
WhisperWhisper is a general-purpose speech recognition model.Speech
WhisperSpeechAn Open Source text-to-speech system built by inverting Whisper.Speech
X-E-SpeechJoint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion.Speech
XTTSXTTS is a library for advanced Text-to-Speech generation.Speech
YourTTSTowards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone.arXivSpeech
ZMM-TTSZero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations.arXivSpeech
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>

<span id="analytics">Analytics</span>

SourceDescriptionGame EngineType
Ludo.aiAssistant for game research and design.Analytics
<p style="text-align: right;"><a href="#table-of-contents">^ Back to Contents ^</a></p>