
<p align="center"> <h2 align="center">MotionDirector: Motion Customization of Text-to-Video Diffusion Models</h2> <p align="center"> <a href="https://ruizhaocv.github.io/"><strong>Rui Zhao</strong></a> · <a href="https://ycgu.site/"><strong>Yuchao Gu</strong></a> · <a href="https://zhangjiewu.github.io/"><strong>Jay Zhangjie Wu</strong></a> · <a href="https://junhaozhang98.github.io//"><strong>David Junhao Zhang</strong></a> · <a href="https://jia-wei-liu.github.io/"><strong>Jia-Wei Liu</strong></a> · <a href="https://weijiawu.github.io/"><strong>Weijia Wu</strong></a> · <a href="https://www.jussikeppo.com/"><strong>Jussi Keppo</strong></a> · <a href="https://sites.google.com/view/showlab"><strong>Mike Zheng Shou</strong></a> <br> <br> <a href="https://arxiv.org/abs/2310.08465"><img src='https://img.shields.io/badge/arXiv-2310.08465-b31b1b.svg'></a> <a href='https://showlab.github.io/MotionDirector'><img src='https://img.shields.io/badge/Project_Page-MotionDirector-blue'></a> <a href='https://huggingface.co/spaces/ruizhaocv/MotionDirector'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow'></a> <a href='https://www.youtube.com/watch?v=Wq93zi8bE3U'><img src='https://img.shields.io/badge/Demo_Video-MotionDirector-red'></a> <br> <b>Show Lab, National University of Singapore</b> </p> <p align="center"> <img src="https://github.com/showlab/MotionDirector/blob/page/assets/teaser.gif" width="1080px"/> <br> <em>MotionDirector can customize text-to-video diffusion models to generate videos with desired motions.</em> </p>

Task Definition

Motion Customization of Text-to-Video Diffusion Models: </br> Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate diverse videos with this motion.

Demos

Demo Video:

[Demo Video of MotionDirector](https://www.youtube.com/watch?v=Wq93zi8bE3U)

Customize both Appearance and Motion: <a name="Customize_both_Appearance_and_Motion"></a>

<table class="center"> <tr> <td style="text-align:center;"><b>Reference images or videos</b></td> <td style="text-align:center;" colspan="3"><b>Videos generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/customized_appearance_results/reference_images.png></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_horse_through_an_ancient_battlefield_1455028.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_playing_golf_in_front_of_the_Great_Wall_5804477.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_walking_cross_the_ancient_army_captured_with_a_reverse_follow_cinematic_shot_653658.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference images for appearance customization: "A Terracotta Warrior on a pure color background."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a horse through an ancient battlefield."</br> seed: 1455028</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is playing golf in front of the Great Wall." </br> seed: 5804477</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is walking cross the ancient army captured with a reverse follow cinematic shot." 
</br> seed: 653658</td> </tr> <tr> <td><img src=assets/multi_videos_results/reference_videos.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_riding_a_bicycle_past_an_ancient_Chinese_palace_166357.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_lifting_weights_in_front_of_the_Great_Wall_5635982.gif></td> <td><img src=assets/customized_appearance_results/A_Terracotta_Warrior_is_skateboarding_9033688.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">Reference videos for motion customization: "A person is riding a bicycle."</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is riding a bicycle past an ancient Chinese palace."</br> seed: 166357.</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is lifting weights in front of the Great Wall." </br> seed: 5635982</td> <td width=25% style="text-align:center;">"A Terracotta Warrior is skateboarding." </br> seed: 9033688</td> </tr> </table>

News

ToDo

Model List

| Type | Training Data | Description | Link |
|------|---------------|-------------|------|
| MotionDirector for Sports | Multiple videos for each model. | Learns motion concepts of sports, e.g., lifting weights, riding a horse, playing golf. | Link |
| MotionDirector for Cinematic Shots | A single video for each model. | Learns motion concepts of cinematic shots, e.g., dolly zoom, zoom in, zoom out. | Link |
| MotionDirector for Image Animation | A single image for the spatial path, and a single video or multiple videos for the temporal path. | Animates the given image with learned motions. | Link |
| MotionDirector with Customized Appearance | A single image or multiple images for the spatial path, and a single video or multiple videos for the temporal path. | Customizes both appearance and motion in video generation. | Link |

Setup

Requirements

```bash
# create virtual environment
conda create -n motiondirector python=3.8
conda activate motiondirector
# install packages
pip install -r requirements.txt
```

Weights of Foundation Models

```bash
git lfs install
## You can choose ModelScopeT2V, ZeroScope, etc., as the foundation model.
## ZeroScope
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ./models/zeroscope_v2_576w/
## ModelScopeT2V
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope/
```

Weights of trained MotionDirector <a name="download_weights"></a>

```bash
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ruizhaocv/MotionDirector_weights ./outputs

# More and better-trained MotionDirector weights are released in a newer repo:
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
# Their usage is slightly different; instructions will be updated later.
```
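If `git lfs install` was skipped before cloning, the downloaded "weights" are tiny git-lfs pointer stubs rather than real checkpoints. As a quick sanity check (a generic git-lfs property, not specific to this repo), a pointer stub is a small text file whose first line starts with `version http`:

```shell
# Detect a git-lfs pointer stub; real weight files are large binaries.
is_lfs_pointer() {
  head -c 12 "$1" 2>/dev/null | grep -q '^version http'
}

# Demo on a synthetic stub (check files under ./outputs the same way):
stub=$(mktemp)
printf 'version https://git-lfs.github.com/spec/v1\noid sha256:0\nsize 1\n' > "$stub"
if is_lfs_pointer "$stub"; then
  echo "pointer stub: run 'git lfs install' and re-clone"
fi
rm -f "$stub"
```

If a checkpoint file turns out to be a stub, install git-lfs and run `git lfs pull` inside the cloned repo.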

Usage

Training

Train MotionDirector on multiple videos:

```bash
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```

Train MotionDirector on a single video:

```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```
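For orientation, a training config generally names the foundation model, the reference clip(s) and prompt, and the LoRA/optimization settings. The field names below are illustrative assumptions only, not the repo's actual schema; consult the shipped config files for the real keys:

```yaml
# Illustrative sketch only -- field names are assumptions;
# see ./configs/config_multi_videos.yaml for the actual schema.
pretrained_model_path: ./models/zeroscope_v2_576w   # foundation model
output_dir: ./outputs/train
train_data:
  path: ./test_data/riding_bicycle/                 # reference clip(s)
  prompt: "A person is riding a bicycle."
  n_sample_frames: 16
lora_rank: 32
learning_rate: 5.0e-4
max_train_steps: 300
```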

Note:

Inference

```bash
python MotionDirector_inference.py --model /path/to/the/foundation/model  --prompt "Your prompt" --checkpoint_folder /path/to/the/trained/MotionDirector --checkpoint_index 300 --noise_prior 0.
```

Note:
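The `--noise_prior` value (0 to 1) roughly controls how much noise derived from the reference video is retained in the initial latent; higher values keep generations closer to the reference motion. Conceptually, the blend can be sketched as a square-root-weighted mix that preserves unit variance (a simplified illustration, not the repo's exact code):

```python
import numpy as np

def mix_noise(ref_noise, rand_noise, noise_prior):
    """Blend reference-derived noise with fresh Gaussian noise.

    A sqrt-weighted blend keeps the result unit-variance when both
    inputs are independent standard Gaussians. Conceptual sketch of a
    `noise_prior`-style weight, not the repo's exact implementation.
    """
    return np.sqrt(noise_prior) * ref_noise + np.sqrt(1.0 - noise_prior) * rand_noise

rng = np.random.default_rng(0)
shape = (16, 4, 32, 32)  # frames, channels, height, width (illustrative)
ref = rng.standard_normal(shape)
rnd = rng.standard_normal(shape)

mixed = mix_noise(ref, rnd, noise_prior=0.5)
print(float(mixed.std()))  # stays close to 1.0
```

With `--noise_prior 0.` the initial noise is fully random; values like 0.3 or 0.5 (used in the single-video examples below) anchor the generation more firmly to the reference.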

Inference with pre-trained MotionDirector

All available weights are hosted in the official Hugging Face repo. Run the download command above, and the weights will be saved to the `outputs` folder; then run the following inference commands to generate videos.

MotionDirector trained on multiple videos:

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A person is riding a bicycle past the Eiffel Tower." --checkpoint_folder ./outputs/train/riding_bicycle/ --checkpoint_index 300 --noise_prior 0. --seed 7192280
```

Note:

Results:

<table class="center"> <tr> <td style="text-align:center;"><b>Reference Videos</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/multi_videos_results/reference_videos.gif></td> <td><img src=assets/multi_videos_results/A_person_is_riding_a_bicycle_past_the_Eiffel_Tower_7192280.gif></td> <td><img src=assets/multi_videos_results/A_panda_is_riding_a_bicycle_in_a_garden_2178639.gif></td> <td><img src=assets/multi_videos_results/An_alien_is_riding_a_bicycle_on_Mars_2390886.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A person is riding a bicycle."</td> <td width=25% style="text-align:center;">"A person is riding a bicycle past the Eiffel Tower.” </br> seed: 7192280</td> <td width=25% style="text-align:center;">"A panda is riding a bicycle in a garden." </br> seed: <s>2178639</s> </td> <td width=25% style="text-align:center;">"An alien is riding a bicycle on Mars." </br> seed: 2390886</td> </table>

MotionDirector trained on a single video:

16 frames:

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A tank is running on the moon." --checkpoint_folder ./outputs/train/car_16/ --checkpoint_index 150 --noise_prior 0.5 --seed 8551187
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/single_video_results/reference_video.gif></td> <td><img src=assets/single_video_results/A_tank_is_running_on_the_moon_8551187.gif></td> <td><img src=assets/single_video_results/A_lion_is_running_past_the_pyramids_431554.gif></td> <td><img src=assets/single_video_results/A_spaceship_is_flying_past_Mars_8808231.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td> <td width=25% style="text-align:center;">"A tank is running on the moon.” </br> seed: 8551187</td> <td width=25% style="text-align:center;">"A lion is running past the pyramids." </br> seed: 431554</td> <td width=25% style="text-align:center;">"A spaceship is flying past Mars." </br> seed: 8808231</td> </tr> </table>

24 frames:

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A truck is running past the Arc de Triomphe." --checkpoint_folder ./outputs/train/car_24/ --checkpoint_index 150 --noise_prior 0.5 --width 576 --height 320 --num-frames 24 --seed 34543
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/single_video_results/24_frames/reference_video.gif></td> <td><img src=assets/single_video_results/24_frames/A_truck_is_running_past_the_Arc_de_Triomphe_34543.gif></td> <td><img src=assets/single_video_results/24_frames/An_elephant_is_running_in_a_forest_2171736.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td> <td width=25% style="text-align:center;">"A truck is running past the Arc de Triomphe.” </br> seed: 34543</td> <td width=25% style="text-align:center;">"An elephant is running in a forest." </br> seed: 2171736</td> </tr> <tr> <td><img src=assets/single_video_results/24_frames/reference_video.gif></td> <td><img src=assets/single_video_results/24_frames/A_person_on_a_camel_is_running_past_the_pyramids_4904126.gif></td> <td><img src=assets/single_video_results/24_frames/A_spacecraft_is_flying_past_the_Milky_Way_galaxy_3235677.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td> <td width=25% style="text-align:center;">"A person on a camel is running past the pyramids." </br> seed: 4904126</td> <td width=25% style="text-align:center;">"A spacecraft is flying past the Milky Way galaxy." </br> seed: 3235677</td> </tr> </table>

MotionDirector for Sports <a name="MotionDirector_for_Sports"></a>

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A panda is lifting weights in a garden." --checkpoint_folder ./outputs/train/lifting_weights/ --checkpoint_index 300 --noise_prior 0. --seed 9365597
```
<table class="center"> <tr> <td style="text-align:center;" colspan="4"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td style="text-align:center;" colspan="2"><b>Lifting Weights</b></td> <td style="text-align:center;" colspan="2"><b>Riding Bicycle</b></td> </tr> <tr> <td><img src=assets/sports_results/lifting_weights/A_panda_is_lifting_weights_in_a_garden_1699276.gif></td> <td><img src=assets/sports_results/lifting_weights/A_police_officer_is_lifting_weights_in_front_of_the_police_station_6804745.gif></td> <td><img src=assets/multi_videos_results/A_panda_is_riding_a_bicycle_in_a_garden_2178639.gif></td> <td><img src=assets/multi_videos_results/An_alien_is_riding_a_bicycle_on_Mars_2390886.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A panda is lifting weights in a garden.” </br> seed: 1699276</td> <td width=25% style="text-align:center;">"A police officer is lifting weights in front of the police station.” </br> seed: 6804745</td> <td width=25% style="text-align:center;">"A panda is riding a bicycle in a garden." </br> seed: <s>2178639</s> </td> <td width=25% style="text-align:center;">"An alien is riding a bicycle on Mars." 
</br> seed: 2390886</td> </tr> <tr> <td style="text-align:center;" colspan="2"><b>Riding Horse</b></td> <td style="text-align:center;" colspan="2"><b>Riding Horse</b></td> </tr> <tr> <td><img src=assets/sports_results/riding_horse/A_knight_riding_on_horseback_passing_by_a_castle_6491893.gif></td> <td><img src=assets/sports_results/riding_horse/A_man_riding_an_elephant_through_the_jungle_6230765.gif></td> <td><img src=assets/sports_results/riding_horse/A_girl_riding_a_unicorn_galloping_under_the_moonlight_6940542.gif></td> <td><img src=assets/sports_results/riding_horse/An_adventurer_riding_a_dinosaur_exploring_through_the_rainforest_6972276.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A knight riding on horseback passing by a castle.” </br> seed: 6491893</td> <td width=25% style="text-align:center;">"A man riding an elephant through the jungle.” </br> seed: 6230765</td> <td width=25% style="text-align:center;">"A girl riding a unicorn galloping under the moonlight." </br> seed: 6940542</td> <td width=25% style="text-align:center;">"An adventurer riding a dinosaur exploring through the rainforest." 
</br> seed: 6972276</td> </tr> <tr> <td style="text-align:center;" colspan="2"><b>Skateboarding</b></td> <td style="text-align:center;" colspan="2"><b>Playing Golf</b></td> </tr> <tr> <td><img src=assets/sports_results/skateboarding/A_robot_is_skateboarding_in_a_cyberpunk_city_1020673.gif></td> <td><img src=assets/sports_results/skateboarding/A_teddy_bear_skateboarding_in_Times_Square_New_York_3306353.gif></td> <td><img src=assets/sports_results/playing_golf/A_man_is_playing_golf_in_front_of_the_White_House_8870450.gif></td> <td><img src=assets/sports_results/playing_golf/A_monkey_is_playing_golf_on_a_field_full_of_flowers_2989633.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A robot is skateboarding in a cyberpunk city.” </br> seed: 1020673</td> <td width=25% style="text-align:center;">"A teddy bear skateboarding in Times Square New York.” </br> seed: 3306353</td> <td width=25% style="text-align:center;">"A man is playing golf in front of the White House." </br> seed: 8870450</td> <td width=25% style="text-align:center;">"A monkey is playing golf on a field full of flowers." </br> seed: 2989633</td> <tr> </table>

More sports, to be continued ...

MotionDirector for Cinematic Shots <a name="MotionDirector_for_Cinematic_Shots"></a>

1. Zoom

1.1 Dolly Zoom (Hitchcockian Zoom)

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a dolly zoom." --checkpoint_folder ./outputs/train/dolly_zoom/ --checkpoint_index 150 --noise_prior 0.5 --seed 9365597
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/dolly_zoom_16.gif></td> <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_dolly_zoom_9365597.gif></td> <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_dolly_zoom_1675932.gif></td> <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_dolly_zoom_2310805.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A man standing in room captured with a dolly zoom."</td> <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a dolly zoom." </br> seed: 9365597 </br> noise_prior: 0.5</td> <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a dolly zoom." </br> seed: 1675932 </br> noise_prior: 0.5</td> <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a dolly zoom." </br> seed: 2310805 </br> noise_prior: 0.5 </td> </tr> <tr> <td><img src=assets/cinematic_shots_results/dolly_zoom_16.gif></td> <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_dolly_zoom_4615820.gif></td> <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_dolly_zoom_4114896.gif></td> <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_dolly_zoom_7492004.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A man standing in room captured with a dolly zoom."</td> <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a dolly zoom." 
</br> seed: 4615820 </br> noise_prior: 0.3</td> <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a dolly zoom." </br> seed: 4114896 </br> noise_prior: 0.3</td> <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a dolly zoom." </br> seed: 7492004</td> </tr> </table>

1.2 Zoom In

The reference video was shot with my own water cup. You can film your own cup, or any other object, to practice camera movements and turn the footage into imaginative videos. Create your own AI films with customized camera movements!

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom in." --checkpoint_folder ./outputs/train/zoom_in/ --checkpoint_index 150 --noise_prior 0.3 --seed 1429227
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/zoom_in_16.gif></td> <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_zoom_in_1429227.gif></td> <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_zoom_in_487239.gif></td> <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_zoom_in_1393184.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A cup in a lab captured with a zoom in."</td> <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a zoom in." </br> seed: 1429227</td> <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a zoom in." </br> seed: 487239 </td> <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a zoom in." </br> seed: 1393184</td> </tr> </table>

1.3 Zoom Out

```bash
python MotionDirector_inference.py --model /path/to/the/ZeroScope  --prompt "A firefighter standing in front of a burning forest captured with a zoom out." --checkpoint_folder ./outputs/train/zoom_out/ --checkpoint_index 150 --noise_prior 0.3 --seed 4971910
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="3"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/zoom_out_16.gif></td> <td><img src=assets/cinematic_shots_results/A_firefighter_standing_in_front_of_a_burning_forest_captured_with_a_zoom_out_4971910.gif></td> <td><img src=assets/cinematic_shots_results/A_lion_sitting_on_top_of_a_cliff_captured_with_a_zoom_out_1767994.gif></td> <td><img src=assets/cinematic_shots_results/A_Roman_soldier_standing_in_front_of_the_Colosseum_captured_with_a_zoom_out_8203639.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A cup in a lab captured with a zoom out."</td> <td width=25% style="text-align:center;">"A firefighter standing in front of a burning forest captured with a zoom out." </br> seed: 4971910</td> <td width=25% style="text-align:center;">"A lion sitting on top of a cliff captured with a zoom out." </br> seed: 1767994 </td> <td width=25% style="text-align:center;">"A Roman soldier standing in front of the Colosseum captured with a zoom out." </br> seed: 8203639</td> </tr> </table>

2. Advanced Cinematic Shots

<table class="center"> <tr> <td style="text-align:center;" colspan="2"><b>Follow</b></td> <td style="text-align:center;" colspan="2"><b>Reverse Follow</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/more_results/A_fireman_is_walking_through_fire_captured_with_a_follow_cinematic_shot_4926511.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_spaceman_is_walking_on_the_moon_with_a_follow_cinematic_shot_7594623.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_fireman_is_walking_through_fire_captured_with_a_reverse_follow_cinematic_shot_9759630.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_spaceman_walking_on_the_moon_captured_with_a_reverse_follow_cinematic_shot_4539309.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A fireman is walking through fire captured with a follow cinematic shot.” </br> seed: 4926511</td> <td width=25% style="text-align:center;">"A spaceman is walking on the moon with a follow cinematic shot.” </br> seed: 7594623</td> <td width=25% style="text-align:center;">"A fireman is walking through fire captured with a reverse follow cinematic shot.” </br> seed: 9759630</td> <td width=25% style="text-align:center;">"A spaceman walking on the moon captured with a reverse follow cinematic shot." 
</br> seed: 4539309</td> </tr> <tr> <td style="text-align:center;" colspan="2"><b>Chest Transition</b></td> <td style="text-align:center;" colspan="2"><b>Mini Jib Reveal: Foot-to-Head Shot</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/more_results/A_fireman_is_walking_through_the_burning_forest_captured_with_a_chest_transition_cinematic_shot_5236349.gif></td> <td><img src=assets/cinematic_shots_results/more_results/An_ancient_Roman_soldier_walks_through_the_crowd_on_the_street_captured_with_a_chest_transition_cinematic_shot_3982271.gif></td> <td><img src=assets/cinematic_shots_results/more_results/An_ancient_Roman_soldier_walks_through_the_crowd_on_the_street_captured_with_a_mini_jib_reveal_cinematic_shot_654178.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_British_Redcoat_soldier_is_walking_through_the_mountains_captured_with_a_mini_jib_reveal_cinematic_shot_566917.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A fireman is walking through the burning forest captured with a chest transition cinematic shot.” </br> seed: 5236349</td> <td width=25% style="text-align:center;">"An ancient Roman soldier walks through the crowd on the street captured with a chest transition cinematic shot.” </br> seed: 3982271</td> <td width=25% style="text-align:center;">"An ancient Roman soldier walks through the crowd on the street captured with a mini jib reveal cinematic shot.” </br> seed: 654178</td> <td width=25% style="text-align:center;">"A British Redcoat soldier is walking through the mountains captured with a mini jib reveal cinematic shot." 
</br> seed: 566917</td> </tr> <tr> <td style="text-align:center;" colspan="2"><b>Pull Back: Subject Enters form the Left</b></td> <td style="text-align:center;" colspan="2"><b>Orbit</b></td> </tr> <tr> <td><img src=assets/cinematic_shots_results/more_results/A_robot_looks_at_a_distant_cyberpunk_city_captured_with_a_pull_back_cinematic_shot_9342597.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_woman_looks_at_a_distant_erupting_volcano_captured_with_a_pull_back_cinematic_shot_4197508.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_fireman_in_the_burning_forest_captured_with_an_orbit_cinematic_shot_8450300.gif></td> <td><img src=assets/cinematic_shots_results/more_results/A_spaceman_on_the_moon_captured_with_an_orbit_cinematic_shot_5899496.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"A robot looks at a distant cyberpunk city captured with a pull back cinematic shot.” </br> seed: 9342597</td> <td width=25% style="text-align:center;">"A woman looks at a distant erupting volcano captured with a pull back cinematic shot.” </br> seed: 4197508</td> <td width=25% style="text-align:center;">"A fireman in the burning forest captured with an orbit cinematic shot.” </br> seed: 8450300</td> <td width=25% style="text-align:center;">"A spaceman on the moon captured with an orbit cinematic shot." </br> seed: 5899496</td> </tr> </table>

More Cinematic Shots, to be continued ....

MotionDirector for Image Animation <a name="MotionDirector_for_Image_Animation"></a>

Train

Train the spatial path with the reference image.

```bash
python MotionDirector_train.py --config ./configs/config_single_image.yaml
```

Then train the temporal path to learn the motion in the reference video.

```bash
python MotionDirector_train.py --config ./configs/config_single_video.yaml
```

Inference

Run inference with the spatial path learned from the reference image and the temporal path learned from the reference video.

```bash
python MotionDirector_inference_multi.py --model /path/to/the/foundation/model  --prompt "Your prompt" --spatial_path_folder /path/to/the/trained/MotionDirector/spatial/lora/ --temporal_path_folder /path/to/the/trained/MotionDirector/temporal/lora/ --noise_prior 0.
```
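The two folders hold independent LoRA weights: the spatial path adapts spatial layers (appearance) and the temporal path adapts temporal layers (motion). Each LoRA is a low-rank update `W' = W + scale * (up @ down)` applied to a base weight; the toy sketch below illustrates that update in isolation (an illustration with hypothetical names, not the repo's loading code):

```python
import numpy as np

def apply_lora(weight, down, up, scale=1.0):
    """Return a LoRA-adapted weight: W' = W + scale * (up @ down).

    `down` (r x in) and `up` (out x r) form a low-rank update.
    Illustrative sketch only; in a MotionDirector-style setup, one
    such pair adapts spatial layers and another adapts temporal ones.
    """
    return weight + scale * (up @ down)

rng = np.random.default_rng(0)
out_dim, in_dim, rank = 8, 8, 2
base = rng.standard_normal((out_dim, in_dim))
down = rng.standard_normal((rank, in_dim))
up = rng.standard_normal((out_dim, rank))

adapted = apply_lora(base, down, up, scale=0.5)
delta = adapted - base
print(np.linalg.matrix_rank(delta))  # the update has rank at most `rank`
```

Because the two updates touch disjoint layers, an appearance LoRA and a motion LoRA trained separately can be combined freely at inference time.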

Example

Download the pre-trained weights.

```bash
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
```

Run the following command.

```bash
python MotionDirector_inference_multi.py --model /path/to/the/ZeroScope  --prompt "A car is running on the road." --spatial_path_folder ./outputs/train/image_animation/train_2023-12-26T14-37-16/checkpoint-300/spatial/lora/ --temporal_path_folder ./outputs/train/image_animation/train_2023-12-26T13-08-20/checkpoint-300/temporal/lora/ --noise_prior 0.5 --seed 5057764
```
<table class="center"> <tr> <td style="text-align:center;"><b>Reference Image</b></td> <td style="text-align:center;"><b>Reference Video</b></td> <td style="text-align:center;" colspan="2"><b>Videos Generated by MotionDirector</b></td> </tr> <tr> <td><img src=test_data/img_car/car.jpg></td> <td><img src=assets/image_animation_results/car-turn-original.gif></td> <td><img src=assets/image_animation_results/A_car_is_running_on_the_road_5057764.gif></td> <td><img src=assets/image_animation_results/A_car_is_running_on_the_road_covered_with_snow_4904543.gif></td> </tr> <tr> <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td> <td width=25% style="text-align:center;color:gray;">"A car is running on the road."</td> <td width=25% style="text-align:center;">"A car is running on the road." </br> seed: 5057764</td> <td width=25% style="text-align:center;">"A car is running on the road covered with snow." </br> seed: 4904543</td> </tr> </table>

MotionDirector with Customized Appearance <a name="MotionDirector_with_Customized_Appearance"></a>

Train

Train the spatial path with the reference images.

```bash
python MotionDirector_train.py --config ./configs/config_multi_images.yaml
```

Then train the temporal path to learn the motions in the reference videos.

```bash
python MotionDirector_train.py --config ./configs/config_multi_videos.yaml
```

Inference

Run inference with the spatial path learned from the reference images and the temporal path learned from the reference videos.

```bash
python MotionDirector_inference_multi.py --model /path/to/the/foundation/model  --prompt "Your prompt" --spatial_path_folder /path/to/the/trained/MotionDirector/spatial/lora/ --temporal_path_folder /path/to/the/trained/MotionDirector/temporal/lora/ --noise_prior 0.
```

Example

Download the pre-trained weights.

```bash
git clone https://huggingface.co/ruizhaocv/MotionDirector ./outputs
```

Run the following command.

```bash
python MotionDirector_inference_multi.py --model /path/to/the/ZeroScope  --prompt "A Terracotta Warrior is riding a horse through an ancient battlefield." --spatial_path_folder ./outputs/train/customized_appearance/terracotta_warrior/checkpoint-default/spatial/lora --temporal_path_folder ./outputs/train/riding_horse/checkpoint-default/temporal/lora/ --noise_prior 0. --seed 1455028
```

Results are shown in the [Customize both Appearance and Motion](#Customize_both_Appearance_and_Motion) table above.

More results

If you have a more impressive MotionDirector or generated videos, please feel free to open an issue and share them with us. We would greatly appreciate it. Improvements to the code are also highly welcome.

Please refer to the [Project Page](https://showlab.github.io/MotionDirector) for more results.

Astronaut's daily life on Mars:

<table class="center"> <tr> <td style="text-align:center;" colspan="4"><b>Astronaut's daily life on Mars (Motion concepts learned by MotionDirector)</b></td> </tr> <tr> <td style="text-align:center;"><b>Lifting Weights</b></td> <td style="text-align:center;"><b>Playing Golf</b></td> <td style="text-align:center;"><b>Riding Horse</b></td> <td style="text-align:center;"><b>Riding Bicycle</b></td> </tr> <tr> <td><img src=assets/astronaut_mars/An_astronaut_is_lifting_weights_on_Mars_4K_high_quailty_highly_detailed_4008521.gif></td> <td><img src=assets/astronaut_mars/Astronaut_playing_golf_on_Mars_659514.gif></td> <td><img src=assets/astronaut_mars/An_astronaut_is_riding_a_horse_on_Mars_4K_high_quailty_highly_detailed_1913261.gif></td> <td><img src=assets/astronaut_mars/An_astronaut_is_riding_a_bicycle_past_the_pyramids_Mars_4K_high_quailty_highly_detailed_5532778.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"An astronaut is lifting weights on Mars, 4K, high quailty, highly detailed.” </br> seed: 4008521</td> <td width=25% style="text-align:center;">"Astronaut playing golf on Mars” </br> seed: 659514</td> <td width=25% style="text-align:center;">"An astronaut is riding a horse on Mars, 4K, high quailty, highly detailed." </br> seed: 1913261</td> <td width=25% style="text-align:center;">"An astronaut is riding a bicycle past the pyramids Mars, 4K, high quailty, highly detailed." 
</br> seed: 5532778</td> </tr> <tr> <td style="text-align:center;"><b>Skateboarding</b></td> <td style="text-align:center;"><b>Cinematic Shot: "Reverse Follow"</b></td> <td style="text-align:center;"><b>Cinematic Shot: "Follow"</b></td> <td style="text-align:center;"><b>Cinematic Shot: "Orbit"</b></td> </tr> <tr> <td><img src=assets/astronaut_mars/An_astronaut_is_skateboarding_on_Mars_6615212.gif></td> <td><img src=assets/astronaut_mars/An_astronaut_is_walking_on_Mars_captured_with_a_reverse_follow_cinematic_shot_1224445.gif></td> <td><img src=assets/astronaut_mars/An_astronaut_is_walking_on_Mars_captured_with_a_follow_cinematic_shot_6191674.gif></td> <td><img src=assets/astronaut_mars/An_astronaut_is_standing_on_Mars_captured_with_an_orbit_cinematic_shot_7483453.gif></td> </tr> <tr> <td width=25% style="text-align:center;">"An astronaut is skateboarding on Mars"</br> seed: 6615212</td> <td width=25% style="text-align:center;">"An astronaut is walking on Mars captured with a reverse follow cinematic shot." </br> seed: 1224445</td> <td width=25% style="text-align:center;">"An astronaut is walking on Mars captured with a follow cinematic shot." </br> seed: 6191674</td> <td width=25% style="text-align:center;">"An astronaut is standing on Mars captured with an orbit cinematic shot." </br> seed: 7483453</td> <tr> </table>

Citation


```bibtex
@article{zhao2023motiondirector,
  title={MotionDirector: Motion Customization of Text-to-Video Diffusion Models},
  author={Zhao, Rui and Gu, Yuchao and Wu, Jay Zhangjie and Zhang, David Junhao and Liu, Jiawei and Wu, Weijia and Keppo, Jussi and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2310.08465},
  year={2023}
}
```

Shoutouts