ComfyUI_EchoMimic

You can use EchoMimic and EchoMimic V2 in ComfyUI.

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation


Updates:

2024/11/22


1. Installation

In the ./ComfyUI/custom_nodes directory, run the following:

git clone https://github.com/smthemex/ComfyUI_EchoMimic.git

2. Requirements

pip install -r requirements.txt
pip install --no-deps facenet-pytorch

Notice

pip uninstall torchaudio torchvision torch xformers
pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
pip install xformers
python -m pip uninstall torchaudio torchvision torch xformers
python -m pip install torch torchvision torchaudio --index-url  https://download.pytorch.org/whl/cu124
python -m pip install xformers
pip uninstall ffmpeg   
pip install ffmpeg-python  

Troubleshooting errors with stable-audio-tools / other audio issues

If you are using conda and Python > 3.12

Uninstall everything and downgrade Python:

pip uninstall torchaudio torchvision torch xformers ffmpeg

conda uninstall python
conda install python=3.11.9

pip install --upgrade pip wheel
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
Or install torch 2.4:
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

You should already have most of these packages if you installed the custom nodes from git URLs:

pip install flash-attn spandrel opencv-python diffusers jwt bitsandbytes omegaconf decord carvekit insightface easydict open_clip ffmpeg-python taming onnxruntime
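
To see which of the packages above still need installing, a small check like the following can help. It is only a sketch: the import names in MODULES are assumptions about how the pip packages are imported (for example, ffmpeg-python imports as "ffmpeg" and opencv-python as "cv2"), and missing_modules is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Import names assumed for some of the pip packages named above.
# Adjust the list to match your own environment.
MODULES = [
    "torch", "torchvision", "torchaudio", "xformers",
    "ffmpeg", "facenet_pytorch", "cv2", "diffusers", "omegaconf",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same Python interpreter that ComfyUI uses; otherwise the result describes a different environment.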

3. Models Required


├── ComfyUI/models/echo_mimic
|         ├── unet
|             ├── diffusion_pytorch_model.bin
|             ├── config.json
|         ├── audio_processor
|             ├── whisper_tiny.pt
|         ├── vae
|             ├── diffusion_pytorch_model.safetensors
|             ├── config.json

├── ComfyUI/models/echo_mimic
|         ├── denoising_unet.pth
|         ├── face_locator.pth
|         ├── motion_module.pth
|         ├── reference_unet.pth

Audio-driven inference, accelerated (ACC) version

├── ComfyUI/models/echo_mimic
|         ├── denoising_unet_acc.pth
|         ├── face_locator.pth
|         ├── motion_module_acc.pth
|         ├── reference_unet.pth

Pose-driven inference

├── ComfyUI/models/echo_mimic
|         ├── denoising_unet_pose.pth
|         ├── face_locator_pose.pth
|         ├── motion_module_pose.pth
|         ├── reference_unet_pose.pth

Pose-driven inference, accelerated (ACC) version

├── ComfyUI/models/echo_mimic
|         ├── denoising_unet_pose_acc.pth
|         ├── face_locator_pose.pth
|         ├── motion_module_pose_acc.pth
|         ├── reference_unet_pose.pth

3.2 The V2 version uses the models below. They are downloaded automatically, or you can add them manually:
Model address: huggingface

├── ComfyUI/models/echo_mimic/v2
|         ├── denoising_unet.pth
|         ├── motion_module.pth
|         ├── pose_encoder.pth
|         ├── reference_unet.pth
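
Before launching a workflow, it can be useful to confirm that the files in the trees above are in place. The following is a minimal sketch, not part of the node: the file lists are copied from the trees above, while the group names, the EXPECTED table, and missing_models are hypothetical. The "audio" label is an assumption, since that group appears above without a heading.

```python
from pathlib import Path

# Paths are relative to ComfyUI/models/echo_mimic and copied from the trees above.
# Group names are illustrative labels only.
EXPECTED = {
    "shared": [
        "unet/diffusion_pytorch_model.bin",
        "unet/config.json",
        "audio_processor/whisper_tiny.pt",
        "vae/diffusion_pytorch_model.safetensors",
        "vae/config.json",
    ],
    # Assumed label: this group is listed without a heading.
    "audio": ["denoising_unet.pth", "face_locator.pth",
              "motion_module.pth", "reference_unet.pth"],
    "audio_acc": ["denoising_unet_acc.pth", "face_locator.pth",
                  "motion_module_acc.pth", "reference_unet.pth"],
    "pose": ["denoising_unet_pose.pth", "face_locator_pose.pth",
             "motion_module_pose.pth", "reference_unet_pose.pth"],
    "pose_acc": ["denoising_unet_pose_acc.pth", "face_locator_pose.pth",
                 "motion_module_pose_acc.pth", "reference_unet_pose.pth"],
    "v2": ["v2/denoising_unet.pth", "v2/motion_module.pth",
           "v2/pose_encoder.pth", "v2/reference_unet.pth"],
}


def missing_models(root, group):
    """Return the files of `group` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED[group] if not (root / rel).is_file()]


if __name__ == "__main__":
    root = Path("ComfyUI/models/echo_mimic")
    for group in EXPECTED:
        missing = missing_models(root, group)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Only the groups for the modes you actually use need to report "ok"; adjust the root path if ComfyUI is installed elsewhere.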

4. Example



5. Function Description


--infer_mode: audio-driven video generation, "audio_drived" and "audio_drived_acc";
--infer_mode: pose video generation from a reference pkl file, "pose_normal" and "pose_acc";
----motion_sync: if enabled and video_file contains a video, a pkl file is generated along with a video based on the reference video. The pkl file is saved in the input\tensorrt_lite directory; ComfyUI must be restarted before the file can be used again.
----motion_sync: if disabled and pose_mode is not "none", the pkl files in the directory named by the selected pose_mode are read and a pose video is generated. If pose_mode is empty, a video is generated from the default assets\test_pose_demo_pose.
----audio_from_video: only available when motion_sync is enabled and video_file contains a video. It extracts the audio from that video; make sure the video has sound and is in mp4 format.

Special options:
--save_video: enable this if you do not want to use VH nodes; off by default;
--draw_mouse: you can try it out;
--length: number of frames; the duration equals length/fps;
--acc models: 6 steps are enough, but quality drops slightly;
--lowvram: low-VRAM users can enable it;
--Built-in proportional image cropping.
Special attention should be paid to:
--cfg set to 1 is only valid in turbo mode; any other mode reports an error.
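
The length/fps relation and the motion_sync rules described above can be sketched as two small helpers. This is an illustration only, not the node's actual code: the function names and returned labels are hypothetical, and the case where motion_sync is enabled but no video is supplied is not covered by the description, so the sketch simply falls through to the pose_mode rules.

```python
def clip_duration_seconds(length, fps):
    """Duration in seconds of a clip with `length` frames at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return length / fps


def pose_source(motion_sync, video_file, pose_mode):
    """Return a label for the pose input implied by the options (hypothetical labels)."""
    if motion_sync and video_file:
        # A pkl file is generated from the reference video.
        return "generate_pkl_from_video"
    # Assumption: with no video supplied, fall through to the pose_mode rules.
    if pose_mode and pose_mode != "none":
        return f"read_pkl_from:{pose_mode}"
    return "default:assets/test_pose_demo_pose"


if __name__ == "__main__":
    print(clip_duration_seconds(120, 24))
    print(pose_source(False, "", "none"))
```

For example, a length of 120 frames at 24 fps gives a 5-second clip.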


Previous updates:


6. Citation

EchoMimic

@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

EchoMimicV2

@misc{meng2024echomimic,
  title={EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation},
  author={Rang Meng and Xingyu Zhang and Yuming Li and Chenguang Ma},
  year={2024},
  eprint={2411.10061},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Hallo2

@misc{cui2024hallo2,
	title={Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation},
	author={Jiahao Cui and Hui Li and Yao Yao and Hao Zhu and Hanlin Shang and Kaihui Cheng and Hang Zhou and Siyu Zhu and Jingdong Wang},
	year={2024},
	eprint={2410.07718},
	archivePrefix={arXiv},
	primaryClass={cs.CV}
}