:basketball: GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
https://github.com/jiaxilv/GPT4Motion/assets/67190845/f4309a24-fb8d-4800-8709-5794adf338e3
We introduce GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis.
Jiaxi Lv*, Yi Huang*, Mingfu Yan*, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, VIVO AI Lab
News!!! :fire::fire::fire:
- 2024-06-18 :trophy::trophy::trophy: Congratulations! GPT4Motion received the Best Paper Runner-Up Award at the CVPR 2024 PBDL workshop.
- 2024-04-16 We released the code for GPT4Motion.
- 2024-04-09 GPT4Motion was accepted by the CVPR 2024 PBDL workshop!
- 2023-11-28 GPT4Motion was covered by Synced.
- 2023-11-22 GPT4Motion was recommended by AK and included in Hugging Face's daily papers.
- 2023-11-21 The GPT4Motion paper was uploaded to arXiv.
Overview
First, the user prompt is inserted into our designed prompt template. Then, the Python script generated by GPT-4 drives the Blender physics engine to simulate the corresponding motion, producing sequences of edge maps and depth maps. Finally, two ControlNets are employed to constrain the physical motion of video frames generated by Stable Diffusion, where a temporal consistency constraint is designed to enforce the coherence among frames.
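The temporal consistency constraint is not spelled out in this README. As a rough illustration only, the sketch below shows one common form of cross-frame attention, in which the queries of every frame attend to the keys and values of a single reference frame. The function names and the choice of the first frame as reference are assumptions made for this example; the code in "VideoGeneration/utils/Cross_Frame_Attention.py" may work differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def cross_frame_attention(frames_q, frames_k, frames_v, ref=0):
    """Let every frame attend to the keys/values of frame `ref`.

    Assumption for illustration: the reference frame is the first one.
    Sharing keys and values ties the appearance of all frames together.
    """
    return [attention(q, frames_k[ref], frames_v[ref]) for q in frames_q]
```

Because all frames read from the same keys and values, their outputs are mixtures of the reference frame's features, which is what encourages a consistent appearance across the video.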
Performance
Comparisons with Baselines
https://github.com/jiaxilv/GPT4Motion/assets/67190845/83ce746d-f2e0-4e40-bc0d-cd572b6b867e
Comparison of the video results generated by different text-to-video models with the prompt "A white flag flaps in the wind".
https://github.com/jiaxilv/GPT4Motion/assets/67190845/9780d680-888f-4fe9-8371-cbc0a1c69801
Comparison of the video results generated by different text-to-video models with the prompt "Water flows into a white mug on a table, top-down view".
Controlling Physical Properties
https://github.com/jiaxilv/GPT4Motion/assets/67190845/0724e560-606c-409e-b6ec-038141a6055c
GPT4Motion's results on basketball drop and collision.
https://github.com/jiaxilv/GPT4Motion/assets/67190845/71513144-57ba-4743-807c-6f7ebd9c9eb2
GPT4Motion's results on "A white flag flaps in [light / the / strong] wind".
https://github.com/jiaxilv/GPT4Motion/assets/67190845/b3f2e66c-65b4-4a2f-b656-dd163e28c5b7
GPT4Motion's results on "A white T-shirt flutters in [light / the / strong] wind".
https://github.com/jiaxilv/GPT4Motion/assets/67190845/753a7a15-28a2-40cf-818e-dd96e29713c7
GPT4Motion's results on "[Water / Viscous / Very viscous] flows into a white mug on a table, top-down view".
Directory of our code
For ease of reading, we list our directory structure.
├── data
│   └── basketball
│       └── A basketball free falls in the air
│           ├── depth
│           │   ├── depth_0000.png
│           │   └── ... (more depth images)
│           └── freestyle
│               ├── canny_0000.png
│               └── ... (more canny images)
├── PhysicsGeneration
│   ├── BlenderTool
│   │   ├── assets
│   │   │   └── basketball.obj
│   │   ├── __init__.py
│   │   └── utils.py
│   ├── prompt_for_GPT4.txt
│   └── script.py
├── README.MD
└── VideoGeneration
    ├── config
    │   └── basketball.yaml
    ├── main.py
    ├── requirements.txt
    └── utils
        ├── Cross_Frame_Attention.py
        ├── __init__.py
        └── utils_all.py
- data: This directory stores the data used in the project.
  - basketball: A subdirectory specifically for basketball-related data.
    - A basketball free falls in the air: Contains data for a scenario where a basketball is free-falling.
      - depth: Contains depth images.
      - freestyle: Contains canny images.
- PhysicsGeneration: This directory contains the complete prompt for GPT-4 and the code to obtain depth maps and edge maps rendered through Blender.
- VideoGeneration: This directory contains the code for generating a video using depth maps and edge maps generated by Blender.
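Since the video stage consumes one depth map and one edge map per frame, it can help to check that the two folders line up before running it. The helper below is a minimal sketch, not part of the repository: the name "paired_frames" is hypothetical, and it assumes the "depth_0000.png" / "canny_0000.png" naming shown in the tree above.

```python
import re
from pathlib import Path


def _frame_index(path):
    """Extract the numeric suffix from names like depth_0000.png."""
    match = re.search(r"_(\d+)\.png$", path.name)
    return int(match.group(1)) if match else None


def paired_frames(scene_dir):
    """Return (depth, edge) path pairs sorted by frame index.

    Assumes the layout <scene_dir>/depth/depth_NNNN.png and
    <scene_dir>/freestyle/canny_NNNN.png. Raises ValueError if a
    frame index exists in one folder but not in the other.
    """
    scene = Path(scene_dir)
    depth = {_frame_index(p): p for p in (scene / "depth").glob("depth_*.png")}
    edges = {_frame_index(p): p for p in (scene / "freestyle").glob("canny_*.png")}
    depth.pop(None, None)  # drop files whose names did not match the pattern
    edges.pop(None, None)
    unmatched = sorted(set(depth) ^ set(edges))
    if unmatched:
        raise ValueError(f"frames without a partner: {unmatched}")
    return [(depth[i], edges[i]) for i in sorted(depth)]
```

For the bundled example, this would be called as paired_frames("data/basketball/A basketball free falls in the air") from the repository root.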
Reproducing Our Work
Generation of Motion Edge Maps and Depth Maps with GPT-4 and Blender
Please install Blender 3.6 according to https://www.blender.org/download/.
cd PhysicsGeneration
Please copy the prompt from "prompt_for_GPT4.txt" to GPT-4 and add the following prefix to the Python code generated by GPT-4:
import bpy
import os
import math
import random
from random import uniform
import mathutils
from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *
ASSETS_PATH = 'BlenderTool/assets/'
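If you prefer not to paste the prefix by hand, a small helper can prepend it to the code returned by GPT-4. This is only a convenience sketch: the function names "add_prefix" and "write_script" are hypothetical and not part of the repository, and the prefix text is copied from the lines above.

```python
from pathlib import Path

# Prefix copied verbatim from the instructions above.
PREFIX = """\
import bpy
import os
import math
import random
from random import uniform
import mathutils
from sys import path
path.append(bpy.path.abspath("./"))
from BlenderTool.utils import *
ASSETS_PATH = 'BlenderTool/assets/'
"""


def add_prefix(generated_code: str) -> str:
    """Return the GPT-4 output with the required prefix in front of it."""
    return PREFIX + "\n" + generated_code.lstrip("\n")


def write_script(generated_code: str, out_path: str = "script.py") -> Path:
    """Write the prefixed code to out_path and return the path."""
    out = Path(out_path)
    out.write_text(add_prefix(generated_code), encoding="utf-8")
    return out
```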
You will get a Python file like "script.py". Use the following command to generate the edge and depth maps:
blender -b -P script.py
The generated edge maps and depth maps are saved in the "../data/new/" folder.
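When several scripts need to be rendered, the same headless Blender call can be driven from Python. The wrapper below is a minimal sketch under the assumption that the "blender" executable is on your PATH; the function names are hypothetical, and it only uses the "-b" (background) and "-P" (run Python script) flags from the command above.

```python
import shutil
import subprocess


def blender_command(script="script.py", blender="blender"):
    """Build the argument list for a headless Blender run of `script`."""
    return [blender, "-b", "-P", script]


def run_blender(script="script.py", blender="blender"):
    """Run Blender in background mode and raise if it fails.

    Run this from the PhysicsGeneration folder so that the relative
    paths used by the script resolve as in the manual command.
    """
    if shutil.which(blender) is None:
        raise FileNotFoundError(f"'{blender}' was not found on PATH")
    return subprocess.run(blender_command(script, blender), check=True)
```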
Video Generation
Please move to the "VideoGeneration" folder and install the corresponding environment:
cd ../VideoGeneration/
conda create -n GPT4Motion python=3.9
conda activate GPT4Motion
pip install -r requirements.txt
You can generate videos based on our pre-existing depth and edge maps by following the instructions below:
python main.py config/basketball.yaml
The generated results are shown below:
https://github.com/jiaxilv/GPT4Motion/assets/67190845/cb0b2b9d-b2ec-44b1-a6cd-4076bfaede8d