
<h1 align='center'>HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models</h1> <div align='center'> <a href='https://github.com/songkey' target='_blank'>Shengkai Zhang</a>, <a href='https://github.com/RhythmJnh' target='_blank'>Nianhong Jiao</a>, <a href='https://github.com/Shelton0215' target='_blank'>Tian Li</a>, <a href='https://github.com/chaojie12131243' target='_blank'>Chaojie Yang</a>, <a href='https://github.com/xchgit' target='_blank'>Chenhui Xue</a><sup>*</sup>, <a href='https://github.com/boya34' target='_blank'>Boya Niu</a><sup>*</sup>, <a href='https://github.com/HelloVision/HelloMeme' target='_blank'>Jun Gao</a> </div> <div align='center'> HelloVision | HelloGroup Inc. </div> <div align='center'> <small><sup>*</sup> Intern</small> </div> <br> <div align='center'> <a href='https://github.com/HelloVision/HelloMeme'><img src='https://img.shields.io/github/stars/HelloVision/HelloMeme'></a> <a href='https://songkey.github.io/hellomeme/'><img src='https://img.shields.io/badge/Project-HomePage-Green'></a> <a href='https://arxiv.org/pdf/2410.22901'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://huggingface.co/songkey'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> <a href='https://github.com/HelloVision/ComfyUI_HelloMeme'><img src='https://img.shields.io/badge/ComfyUI-UI-blue'></a> </div> <p align="center"> <img src="data/demo.gif" alt="showcase"> </p>

πŸ”† New Features/Updates

Introduction

This repository contains the official code implementation of the paper HelloMeme. Any updates related to the code or models from the paper will be posted here. The code for the ablation experiments discussed in the paper will be added to the ExperimentsOnSKAttentions section. Additionally, we plan to release a ComfyUI interface for HelloMeme, with updates posted here as well.

Getting Started

1. Create a Conda Environment

conda create -n hellomeme python=3.10.11
conda activate hellomeme

2. Install PyTorch and FFmpeg

To install the latest version of PyTorch, please refer to the official PyTorch website for detailed installation instructions. Additionally, the code will invoke the system's ffmpeg command for video and audio editing, so the runtime environment must have ffmpeg pre-installed. For installation guidance, please refer to the official FFmpeg website.
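Since the scripts shell out to the system `ffmpeg`, a quick way to confirm it is available before running anything is a small check like the following (a sketch; the function name is our own, not part of this repo):

```python
import shutil
import subprocess

def check_ffmpeg():
    """Return the first line of `ffmpeg -version` if ffmpeg is on PATH, else None."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    out = subprocess.run([path, "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0] if out.stdout else None

print(check_ffmpeg() or "ffmpeg not found -- install it before running the scripts")
```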

3. Install dependencies

pip install diffusers transformers einops scipy opencv-python tqdm pillow onnxruntime onnx safetensors accelerate peft

[!IMPORTANT]

Pin the diffusers version: frequent updates to diffusers can introduce breaking API changes and dependency conflicts. We periodically verify this repo against the latest diffusers release; the currently tested and supported version is diffusers==0.31.0, which you can install with `pip install diffusers==0.31.0`.
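To verify which diffusers version is installed in your environment, a stdlib-only check like this works (the helper name is ours, for illustration):

```python
from importlib import metadata

def installed_diffusers_version():
    """Return the installed diffusers version string, or None if not installed."""
    try:
        return metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return None

version = installed_diffusers_version()
if version != "0.31.0":
    print(f"Warning: tested with diffusers==0.31.0, found: {version}")
```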

4. Clone the repository

git clone https://github.com/HelloVision/HelloMeme
cd HelloMeme

5. Run the code

python inference_image.py # for image generation
python inference_video.py # for video generation

6. Install for Gradio App

We recommend setting up the environment with conda.

pip install gradio
pip install imageio[ffmpeg]

Then run the app:

python app.py

When the app is first run, all required models are downloaded automatically. Note that longer driving videos require more VRAM.

Examples

Image Generation

The input for the image generation script inference_image.py consists of a reference image and a drive image, as shown in the figure below:

<table> <tr> <td><img src="./data/reference_images/harris.jpg" width="256" height="256"> <br> Reference Image</td> <td ><img src="./data/drive_images/yao.jpg" width="192" height="256"> <br> Drive Image </td> </tr> </table>

The output of the image generation script is shown below:

<table> <tr> <td><img src="./data/harris_yao.jpg" width="256" height="256"> <br> Based on SD1.5 </td> <td ><img src="./data/harris_yao_toon.jpg" width="256" height="256"> <br> Based on <a href="https://civitai.com/models/75650/disney-pixar-cartoon-type-b">disneyPixarCartoon</a> </td> </tr> </table>

Video Generation

The input for the video generation script inference_video.py consists of a reference image and a drive video, as shown in the figure below:

<table> <tr> <td><img src="./data/reference_images/trump.jpg" width="256" height="256"> <br> Reference Image</td> <td ><img src="./data/jue.gif" width="256" height="256"> <br> Drive Video </td> </tr> </table>

The output of the video generation script is shown below:

<table> <tr> <td><img src="./data/trump_jue.gif" width="256" height="256"> <br> Based on <a href="https://civitai.com/models/25694/epicrealism">epicrealism</a> </td> <td ><img src="./data/trump_jue-toon.gif" width="256" height="256"> <br> Based on <a href="https://civitai.com/models/75650/disney-pixar-cartoon-type-b">disneyPixarCartoon</a> </td> </tr> </table>

[!Note]

If the face in the driving video exhibits significant movement (e.g., noticeable camera motion), it is recommended to set the trans_ratio parameter to 0 to prevent distorted outputs:

inference_video(engines, ref_img_path, drive_video_path, save_path, trans_ratio=0.0)

Pretrained Models

Our models are all hosted on πŸ€—, and the startup script will download them automatically. The specific model information is as follows:

| Model | Size | URL | Info |
| --- | --- | --- | --- |
| songkey/hm_reference | 312M | <a href='https://huggingface.co/songkey/hm_reference'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Weights of the ReferenceAdapter module |
| songkey/hm_control | 149M | <a href='https://huggingface.co/songkey/hm_control'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Weights of the HMControlNet module |
| songkey/hm_animatediff | 835M | <a href='https://huggingface.co/songkey/hm_animatediff'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Weights of the tuned Animatediff (patch size 16) |
| songkey/hm_animatediff_frame12 | 835M | <a href='https://huggingface.co/songkey/hm_animatediff_frame12'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Weights of the tuned Animatediff (patch size 12) |
| hello_3dmm.onnx | 311M | <a href='https://huggingface.co/songkey/hello_group_facemodel'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Face RT extractor |
| hello_arkit_blendshape.onnx | 9.11M | <a href='https://huggingface.co/songkey/hello_group_facemodel'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Extracts ARKit blendshape parameters |
| hello_face_det.onnx | 317K | <a href='https://huggingface.co/songkey/hello_group_facemodel'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Face detector |
| hello_face_landmark.onnx | 2.87M | <a href='https://huggingface.co/songkey/hello_group_facemodel'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> | Face landmarks (222 points) |

Our pipeline also supports loading stylized base models (safetensors). For video generation tasks, customized models tuned for portrait generation, such as Realistic Vision V6.0 B1, can produce better results. Place checkpoints in the pretrained_models/ directory and LoRA files in pretrained_models/loras/.
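As a sketch of how such a checkpoint might be picked up (the helper and the checkpoint filename below are our own illustrative assumptions, not part of this repo's API), diffusers can load a single safetensors file via `StableDiffusionPipeline.from_single_file`:

```python
from pathlib import Path

def checkpoint_path(name, root="pretrained_models"):
    # Stylized SD1.5 checkpoints (.safetensors) go under pretrained_models/,
    # LoRA files under pretrained_models/loras/.
    return str(Path(root) / name)

# Hypothetical usage with diffusers (downloads/loads weights, so not executed here):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_single_file(
#     checkpoint_path("realisticVisionV60B1.safetensors"))
```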

Acknowledgements

Thanks to πŸ€— for providing diffusers, which has greatly enhanced development efficiency in diffusion-related work. We also drew considerable inspiration from MagicAnimate and EMO, and Animatediff allowed us to implement the video version at a very low cost. Finally, we thank our colleagues Shengjie Wu and Zemin An, whose foundational modules played a significant role in this work.

Citation

@misc{zhang2024hellomemeintegratingspatialknitting,
  title={HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models},
  author={Shengkai Zhang and Nianhong Jiao and Tian Li and Chaojie Yang and Chenhui Xue and Boya Niu and Jun Gao},
  year={2024},
  eprint={2410.22901},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.22901},
}

Contact

Shengkai Zhang (songkey@pku.edu.cn)