SpecVQGAN | Taming visually guided sound generation by shrinking a training dataset to a set of representative vectors | <ul><li>Vladimir Iashin</li> <li>Esa Rahtu</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/wiki.svg" alt="wiki" height=20/>, <img src="images/wiki.svg" alt="wiki" height=20/>, <img src="images/wiki.svg" alt="wiki" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 12.07.2024 |
LivePortrait | Video-driven portrait animation framework with a focus on better generalization, controllability, and efficiency for practical usage | <ul><li>Jianzhu Guo</li> <li>Dingyun Zhang</li> <li>Xiaoqiang Liu</li> <li>Zhizhou Zhong</li><details><summary>others</summary><li>Yuan Zhang</li> <li>Pengfei Wan</li> <li>Di Zhang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 10.07.2024 |
TAPIR | Tracking Any Point with per-frame Initialization and temporal Refinement | <ul><li>Carl Doersch</li> <li>Yi Yang</li> <li>Mel Vecerik</li> <li>Dilara Gokay</li><details><summary>others</summary><li>Ankush Gupta</li> <li>Yusuf Aytar</li> <li>Joao Carreira</li> <li>Andrew Zisserman</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post, blog post</li><li><img src="images/deepmind.svg" alt="deepmind" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.07.2024 |
Wav2Lip | A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild | <ul><li>Prajwal Renukanand</li> <li>Rudrabha Mukhopadhyay</li> <li>Vinay Namboodiri</li> <li>C. V. Jawahar</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li>demo</li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 27.06.2024 |
DeepLabCut | Efficient method for markerless pose estimation based on transfer learning with deep neural networks that achieves excellent results with minimal training data | <ul><li>Alexander Mathis</li> <li>Pranav Mamidanna</li> <li>Kevin Cury</li> <li>Taiga Abe</li><details><summary>others</summary><li>Venkatesh Murthy</li> <li>Mackenzie Mathis</li> <li>Matthias Bethge</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/docker.svg" alt="docker" height=20/></li><li>forum</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li>website</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.06.2024 |
PoolFormer | MetaFormer Is Actually What You Need for Vision | <ul><li>Weihao Yu</li> <li>Mi Luo</li> <li>Pan Zhou</li> <li>Chenyang Si</li><details><summary>others</summary><li>Yichen Zhou</li> <li>Xinchao Wang</li> <li>Jiashi Feng</li> <li>Shuicheng Yan</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.06.2024 |
StoryDiffusion | A self-attention mechanism, termed Consistent Self-Attention, that significantly boosts consistency between generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner | <ul><li>Yupeng Zhou</li> <li>Daquan Zhou</li> <li>Ming-Ming Cheng</li> <li>Jiashi Feng</li> <li>Qibin Hou</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.05.2024 |
FILM | A frame interpolation algorithm that synthesizes multiple intermediate frames from two input images with large in-between motion | <ul><li>Fitsum Reda</li> <li>Janne Kontkanen</li> <li>Eric Tabellion</li> <li>Deqing Sun</li><details><summary>others</summary><li>Caroline Pantofaru</li> <li>Brian Curless</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data, data</li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/tf.svg" alt="tf" height=20/>, <img src="images/tf.svg" alt="tf" height=20/>, <img src="images/tf.svg" alt="tf" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.05.2024 |
PuLID | Pure and Lightning ID customization, a tuning-free ID customization method for text-to-image generation | <ul><li>Zinan Guo</li> <li>Yanze Wu</li> <li>Zhuowei Chen</li> <li>Lang Chen</li> <li>Qian He</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.05.2024 |
VoiceCraft | Token-infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech on audiobooks, internet videos, and podcasts | <ul><li>Puyuan Peng</li> <li>Po-Yao Huang</li> <li>Shang-Wen Li</li> <li>Abdelrahman Mohamed</li> <li>David Harwath</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.04.2024 |
ZeST | Method for zero-shot material transfer to an object in the input image given a material exemplar image | <ul><li>Ta-Ying Cheng</li> <li>Prafull Sharma</li> <li>Andrew Markham</li> <li>Niki Trigoni</li> <li>Varun Jampani</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 16.04.2024 |
InstantMesh | Feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability | <ul><li>Jiale Xu</li> <li>Weihao Cheng</li> <li>Yiming Gao</li> <li>Xintao Wang</li><details><summary>others</summary><li>Shenghua Gao</li> <li>Ying Shan</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 16.04.2024 |
AlphaFold | Highly accurate protein structure prediction | <ul><li>John Jumper</li> <li>Richard Evans</li> <li>Alexander Pritzel</li> <li>Tim Green</li><details><summary>others</summary><li>Michael Figurnov</li> <li>Olaf Ronneberger</li> <li>Kathryn Tunyasuvunakool</li> <li>Russ Bates</li> <li>Augustin Žídek</li> <li>Anna Potapenko</li> <li>Alex Bridgland</li> <li>Clemens Meyer</li> <li>Simon Kohl</li> <li>Andrew Ballard</li> <li>Bernardino Romera-Paredes</li> <li>Stanislav Nikolov</li> <li>Rishub Jain</li></ul></details> | <ul><li>blog post, blog post</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>paper</li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/wiki.svg" alt="wiki" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.04.2024 |
Würstchen | Architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models | <ul><li>Pablo Pernias</li> <li>Dominic Rampas</li> <li>Mats Richter</li> <li>Christopher Pal</li> <li>Marc Aubreville</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.04.2024 |
AQLM | Extreme Compression of Large Language Models via Additive Quantization | <ul><li>Vage Egiazarian</li> <li>Andrei Panferov</li> <li>Denis Kuznedelev</li> <li>Elias Frantar</li><details><summary>others</summary><li>Artem Babenko</li> <li>Dan Alistarh</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 08.03.2024 |
YOLOv9 | Learning What You Want to Learn Using Programmable Gradient Information | <ul><li>Chien-Yao Wang</li> <li>I-Hau Yeh</li> <li>Hong-Yuan Mark Liao</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.03.2024 |
Multi-LoRA Composition | LoRA Switch and LoRA Composite, approaches that aim to surpass traditional techniques in terms of accuracy and image quality, especially in complex compositions | <ul><li>Ming Zhong</li> <li>Yelong Shen</li> <li>Shuohang Wang</li> <li>Yadong Lu</li><details><summary>others</summary><li>Yizhu Jiao</li> <li>Siru Ouyang</li> <li>Donghan Yu</li> <li>Jiawei Han</li> <li>Weizhu Chen</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li>website</li></ul> | ![Open In Colab](images/colab.svg) | 03.03.2024 |
AMARETTO | Multiscale and multimodal inference of regulatory networks to identify cell circuits and their drivers shared and distinct within and across biological systems of human disease | <ul><li>Nathalie Pochet</li> <li>Olivier Gevaert</li> <li>Mohsen Nabian</li> <li>Jayendra Shinde</li><details><summary>others</summary><li>Celine Everaert</li> <li>Thorin Tabor</li></ul></details> | <ul><li>bioconductor</li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 28.02.2024 |
LIDA | Tool for generating grammar-agnostic visualizations and infographics | Victor Dibia | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.02.2024 |
ViT | Vision Transformer and MLP-Mixer Architectures | <ul><li>Alexey Dosovitskiy</li> <li>Lucas Beyer</li> <li>Alexander Kolesnikov</li> <li>Dirk Weissenborn</li><details><summary>others</summary><li>Xiaohua Zhai</li> <li>Thomas Unterthiner</li> <li>Mostafa Dehghani</li> <li>Matthias Minderer</li> <li>Georg Heigold</li> <li>Sylvain Gelly</li> <li>Jakob Uszkoreit</li> <li>Neil Houlsby</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/kaggle.svg" alt="kaggle" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.02.2024 |
3D Ken Burns | A reference implementation of 3D Ken Burns Effect from a Single Image using PyTorch - given a single input image, it animates this still image with a virtual camera scan and zoom subject to motion parallax | Manuel Romero | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.01.2024 |
VALL-E X | Cross-lingual neural codec language model for cross-lingual speech synthesis | <ul><li>Ziqiang Zhang</li> <li>Long Zhou</li> <li>Chengyi Wang</li> <li>Sanyuan Chen</li><details><summary>others</summary><li>Yu Wu</li> <li>Shujie Liu</li> <li>Zhuo Chen</li> <li>Yanqing Liu</li> <li>Huaming Wang</li> <li>Jinyu Li</li> <li>Lei He</li> <li>Sheng Zhao</li> <li>Furu Wei</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/discord.svg" alt="discord" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.01.2024 |
PhotoMaker | Efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stacked ID embedding for preserving ID information | <ul><li>Zhen Li</li> <li>Mingdeng Cao</li> <li>Xintao Wang</li> <li>Zhongang Qi</li><details><summary>others</summary><li>Ming-Ming Cheng</li> <li>Ying Shan</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.01.2024 |
DDColor | End-to-end method with dual decoders for image colorization | <ul><li>Xiaoyang Kang</li> <li>Tao Yang</li> <li>Wenqi Ouyang</li> <li>Peiran Ren</li><details><summary>others</summary><li>Lingzhi Li</li> <li>Xuansong Xie</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.01.2024 |
PASD | Pixel-aware stable diffusion network that achieves robust real-world image super-resolution (Real-ISR) as well as personalized stylization | <ul><li>Tao Yang</li> <li>Peiran Ren</li> <li>Xuansong Xie</li> <li>Lei Zhang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 12.01.2024 |
HandRefiner | Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting | <ul><li>Wenquan Lu</li> <li>Yufei Xu</li> <li>Jing Zhang</li> <li>Chaoyue Wang</li> <li>Dacheng Tao</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 08.01.2024 |
audio2photoreal | Framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction | <ul><li>Evonne Ng</li> <li>Javier Romero</li> <li>Timur Bagautdinov</li> <li>Shaojie Bai</li><details><summary>others</summary><li>Trevor Darrell</li> <li>Angjoo Kanazawa</li> <li>Alexander Richard</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 08.01.2024 |
GraphCast | Learning skillful medium-range global weather forecasting | <ul><li>Rémi Lam</li> <li>Alvaro Sanchez-Gonzalez</li> <li>Matthew Willson</li> <li>Peter Wirnsberger</li><details><summary>others</summary><li>Meire Fortunato</li> <li>Ferran Alet</li> <li>Suman Ravuri</li> <li>Timo Ewalds</li> <li>Zach Eaton-Rosen</li> <li>Weihua Hu</li> <li>Alexander Merose</li> <li>Stephan Hoyer</li> <li>George Holland</li> <li>Oriol Vinyals</li> <li>Jacklynn Stott</li> <li>Alexander Pritzel</li> <li>Shakir Mohamed</li> <li>Peter Battaglia</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/deepmind.svg" alt="deepmind" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.01.2024 |
ESM | Evolutionary Scale Modeling: Pretrained language models for proteins | <ul><li>Zeming Lin</li> <li>Roshan Rao</li> <li>Brian Hie</li> <li>Zhongkai Zhu</li><details><summary>others</summary><li>Allan dos Santos Costa</li> <li>Maryam Fazel-Zarandi</li> <li>Tom Sercu</li> <li>Salvatore Candido</li> <li>Alexander Rives</li> <li>Joshua Meier</li> <li>Robert Verkuil</li> <li>Jason Liu</li> <li>Chloe Hsu</li> <li>Adam Lerer</li></ul></details> | <ul><li>ESM Atlas</li><li>FSDP</li><li>ICML</li><li>data</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>paper, paper, paper, paper</li><li>pubmed</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 28.12.2023 |
CoTracker | Architecture that jointly tracks multiple points throughout an entire video | <ul><li>Nikita Karaev</li> <li>Ignacio Rocco</li> <li>Benjamin Graham</li> <li>Natalia Neverova</li><details><summary>others</summary><li>Andrea Vedaldi</li> <li>Christian Rupprecht</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 28.12.2023 |
LLaVA | Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding | <ul><li>Haotian Liu</li> <li>Chunyuan Li</li> <li>Qingyang Wu</li> <li>Yong Jae Lee</li> <li>Yuheng Li</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.12.2023 |
Background Matting V2 | Real-time, high-resolution background replacement technique that operates at 30 fps at 4K resolution and 60 fps at HD on a modern GPU | <ul><li>Shanchuan Lin</li> <li>Andrey Ryabtsev</li> <li>Soumyadip Sengupta</li> <li>Brian Curless</li><details><summary>others</summary><li>Steve Seitz</li> <li>Ira Kemelmacher-Shlizerman</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.12.2023 |
Gaussian Splatting | Achieves state-of-the-art visual quality while maintaining competitive training times and, importantly, allows high-quality real-time (≥ 100 fps) novel-view synthesis at 1080p resolution | <ul><li>Bernhard Kerbl</li> <li>Georgios Kopanas</li> <li>Thomas Leimkühler</li> <li>George Drettakis</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.12.2023 |
SMPLer-X | Scaling up expressive human pose and shape estimation (EHPS) towards the first generalist foundation model, with up to ViT-Huge as the backbone, trained on up to 4.5M instances from diverse data sources | <ul><li>Zhongang Cai</li> <li>Wanqi Yin</li> <li>Ailing Zeng</li> <li>Chen Wei</li><details><summary>others</summary><li>Qingping Sun</li> <li>Yanjun Wang</li> <li>Hui En Pang</li> <li>Haiyi Mei</li> <li>Mingyuan Zhang</li> <li>Lei Zhang</li> <li>Chen Change Loy</li> <li>Lei Yang</li> <li>Ziwei Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.12.2023 |
DeepCache | Training-free paradigm that accelerates diffusion models from the perspective of model architecture | <ul><li>Xinyin Ma</li> <li>Gongfan Fang</li> <li>Xinchao Wang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.12.2023 |
MagicAnimate | Diffusion-based framework that aims at enhancing temporal consistency, preserving the reference image faithfully, and improving animation fidelity | <ul><li>Zhongcong Xu</li> <li>Jianfeng Zhang</li> <li>Jun Hao Liew</li> <li>Hanshu Yan</li><details><summary>others</summary><li>Jiawei Liu</li> <li>Chenxu Zhang</li> <li>Jiashi Feng</li> <li>Mike Shou</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li>website</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.12.2023 |
DiffBIR | Towards Blind Image Restoration with Generative Diffusion Prior | <ul><li>Xinqi Lin</li> <li>Jingwen He</li> <li>Ziyan Chen</li> <li>Zhaoyang Lyu</li><details><summary>others</summary><li>Ben Fei</li> <li>Bo Dai</li> <li>Wanli Ouyang</li> <li>Yu Qiao</li> <li>Chao Dong</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.12.2023 |
SPIN | Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop | <ul><li>Nikos Kolotouros</li> <li>Georgios Pavlakos</li> <li>Michael Black</li> <li>Kostas Daniilidis</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/docker.svg" alt="docker" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 12.12.2023 |
AudioLDM | Text-to-audio system built on a latent space that learns continuous audio representations from contrastive language-audio pretraining (CLAP) latents | <ul><li>Haohe Liu</li> <li>Zehua Chen</li> <li>Yi Yuan</li> <li>Xinhao Mei</li><details><summary>others</summary><li>Xubo Liu</li> <li>Danilo Mandic</li> <li>Wenwu Wang</li> <li>Mark Plumbley</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 02.12.2023 |
TabPFN | Neural network that learns to perform prediction on tabular data in a single forward pass | <ul><li>Noah Hollmann</li> <li>Samuel Müller</li> <li>Katharina Eggensperger</li> <li>Frank Hutter</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.11.2023 |
Concept Sliders | Plug-and-play low-rank adapters applied on top of pretrained diffusion models | <ul><li>Rohit Gandikota</li> <li>Joanna Materzyńska</li> <li>Tingrui Zhou</li> <li>Antonio Torralba</li> <li>David Bau</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.11.2023 |
Qwen-VL | Set of large-scale vision-language models designed to perceive and understand both text and images | <ul><li>Jinze Bai</li> <li>Shuai Bai</li> <li>Shusheng Yang</li> <li>Shijie Wang</li><details><summary>others</summary><li>Sinan Tan</li> <li>Peng Wang</li> <li>Junyang Lin</li> <li>Chang Zhou</li> <li>Jingren Zhou</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/discord.svg" alt="discord" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.11.2023 |
AnimeGANv3 | Double-tail generative adversarial network for fast photo animation | <ul><li>Gang Liu</li> <li>Xin Chen</li></ul> | <ul><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 23.11.2023 |
PixArt-Σ | Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | <ul><li>Junsong Chen</li> <li>Chongjian Ge</li> <li>Enze Xie</li> <li>Yue Wu</li><details><summary>others</summary><li>Lewei Yao</li> <li>Xiaozhe Ren</li> <li>Zhongdao Wang</li> <li>Ping Luo</li> <li>Huchuan Lu</li> <li>Zhenguo Li</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/discord.svg" alt="discord" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.11.2023 |
Zero123++ | Image-conditioned diffusion model for generating 3D-consistent multi-view images from a single input view | <ul><li>Ruoxi Shi</li> <li>Hansheng Chen</li> <li>Zhuoyang Zhang</li> <li>Minghua Liu</li><details><summary>others</summary><li>Chao Xu</li> <li>Xinyue Wei</li> <li>Linghao Chen</li> <li>Chong Zeng</li> <li>Hao Su</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.10.2023 |
Show-1 | Hybrid model that marries pixel-based and latent-based VDMs for text-to-video generation | <ul><li>David Junhao Zhang</li> <li>Jay Zhangjie Wu</li> <li>Jiawei Liu</li> <li>Rui Zhao</li><details><summary>others</summary><li>Lingmin Ran</li> <li>Yuchao Gu</li> <li>Difei Gao</li> <li>Mike Zheng Shou</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 15.10.2023 |
AudioSep | Foundation model for open-domain audio source separation with natural language queries | <ul><li>Xubo Liu</li> <li>Qiuqiang Kong</li> <li>Yan Zhao</li> <li>Haohe Liu</li><details><summary>others</summary><li>Yi Yuan</li> <li>Yuzhuo Liu</li> <li>Rui Xia</li> <li>Yuxuan Wang</li> <li>Mark Plumbley</li> <li>Wenwu Wang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 12.10.2023 |
DA-CLIP | Degradation-aware vision-language model that better transfers pretrained vision-language models to low-level vision tasks, serving as a universal framework for image restoration | <ul><li>Ziwei Luo</li> <li>Fredrik Gustafsson</li> <li>Zheng Zhao</li> <li>Jens Sjölund</li> <li>Thomas Schön</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 11.10.2023 |
SadTalker | Generates 3D motion coefficients of the 3DMM from audio and implicitly modulates a novel 3D-aware face renderer for talking head generation | <ul><li>Wenxuan Zhang</li> <li>Xiaodong Cun</li> <li>Xuan Wang</li> <li>Yong Zhang</li><details><summary>others</summary><li>Xi Shen</li> <li>Yu Guo</li> <li>Ying Shan</li> <li>Fei Wang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/discord.svg" alt="discord" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 10.10.2023 |
Musika | Music generation system that can be trained on hundreds of hours of music using a single consumer GPU, and that allows for much faster than real-time generation of music of arbitrary length on a consumer CPU | <ul><li>Marco Pasini</li> <li>Jan Schlüter</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 09.10.2023 |
YOLOv6 | Single-stage object detection framework dedicated to industrial applications | <ul><li>Kaiheng Weng</li> <li>Meng Cheng</li> <li>Yiduo Li</li> <li>Xiangxiang Chu</li> <li>Xiaolin Wei</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>data</li><li><img src="images/docs.svg" alt="docs" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 08.10.2023 |
DreamGaussian | Algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details | <ul><li>Jiaxiang Tang</li> <li>Jiawei Ren</li> <li>Hang Zhou</li> <li>Ziwei Liu</li> <li>Gang Zeng</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 04.10.2023 |
ICON | Given a set of images, the method estimates a detailed 3D surface from each image and then combines these into an animatable avatar | <ul><li>Yuliang Xiu</li> <li>Jinlong Yang</li> <li>Dimitrios Tzionas</li> <li>Michael Black</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.08.2023 |
DINOv2 | Produces high-performance visual features that can be employed directly with classifiers as simple as linear layers on a variety of computer vision tasks; these features are robust and perform well across domains without fine-tuning | <ul><li>Maxime Oquab</li> <li>Timothée Darcet</li> <li>Théo Moutakanni</li> <li>Huy Vo</li><details><summary>others</summary><li>Marc Szafraniec</li> <li>Vasil Khalidov</li> <li>Pierre Fernandez</li> <li>Daniel Haziza</li> <li>Francisco Massa</li> <li>Alaaeldin El-Nouby</li> <li>Mahmoud Assran</li> <li>Nicolas Ballas</li> <li>Wojciech Galuba</li> <li>Russell Howes</li> <li>Po-Yao Huang</li> <li>Shang-Wen Li</li> <li>Ishan Misra</li> <li>Michael Rabbat</li> <li>Vasu Sharma</li> <li>Gabriel Synnaeve</li> <li>Hu Xu</li> <li>Hervé Jegou</li> <li>Julien Mairal</li> <li>Patrick Labatut</li> <li>Armand Joulin</li> <li>Piotr Bojanowski</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>demo</li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/>, <img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.08.2023 |
Neuralangelo | Framework for high-fidelity 3D surface reconstruction from RGB video captures | <ul><li>Zhaoshuo Li</li> <li>Thomas Müller</li> <li>Alex Evans</li> <li>Russell Taylor</li><details><summary>others</summary><li>Mathias Unberath</li> <li>Ming-Yu Liu</li> <li>Chen-Hsuan Lin</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 27.08.2023 |
OWL-ViT | Simple Open-Vocabulary Object Detection with Vision Transformers | <ul><li>Matthias Minderer</li> <li>Alexey Gritsenko</li> <li>Austin Stone</li> <li>Maxim Neumann</li><details><summary>others</summary><li>Dirk Weissenborn</li> <li>Alexey Dosovitskiy</li> <li>Aravindh Mahendran</li> <li>Anurag Arnab</li> <li>Mostafa Dehghani</li> <li>Zhuoran Shen</li> <li>Xiao Wang</li> <li>Xiaohua Zhai</li> <li>Thomas Kipf</li> <li>Neil Houlsby</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.08.2023 |
StyleGAN3 | Alias-Free Generative Adversarial Networks | <ul><li>Tero Karras</li> <li>Miika Aittala</li> <li>Samuli Laine</li> <li>Erik Härkönen</li><details><summary>others</summary><li>Janne Hellsten</li> <li>Jaakko Lehtinen</li> <li>Timo Aila</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 13.08.2023 |
FateZero | Zero-shot text-based editing method for real-world videos without per-prompt training or user-specified masks | <ul><li>Chenyang Qi</li> <li>Xiaodong Cun</li> <li>Yong Zhang</li> <li>Chenyang Lei</li><details><summary>others</summary><li>Xintao Wang</li> <li>Ying Shan</li> <li>Qifeng Chen</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li>video</li></ul> | ![Open In Colab](images/colab.svg) | 12.08.2023 |
BigGAN | Large Scale GAN Training for High Fidelity Natural Image Synthesis | <ul><li>Andrew Brock</li> <li>Jeff Donahue</li> <li>Karen Simonyan</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.08.2023 |
LaMa | Resolution-robust Large Mask Inpainting with Fourier Convolutions | <ul><li>Roman Suvorov</li> <li>Elizaveta Logacheva</li> <li>Anton Mashikhin</li> <li>Anastasia Remizova</li><details><summary>others</summary><li>Arsenii Ashukha</li> <li>Aleksei Silvestrov</li> <li>Naejin Kong</li> <li>Harshith Goka</li> <li>Kiwoong Park</li> <li>Victor Lempitsky</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 01.08.2023 |
MakeItTalk | A method that generates expressive talking-head videos from a single facial image with audio as the only input | <ul><li>Yang Zhou</li> <li>Xintong Han</li> <li>Eli Shechtman</li> <li>Jose Echevarria</li><details><summary>others</summary><li>Evangelos Kalogerakis</li> <li>Dingzeyu Li</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 27.07.2023 |
HiDT | A generative image-to-image model and a new upsampling scheme that allows applying image translation at high resolution | <ul><li>Denis Korzhenkov</li> <li>Gleb Sterkin</li> <li>Sergey Nikolenko</li> <li>Victor Lempitsky</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.07.2023 |
CutLER | Simple approach for training unsupervised object detection and segmentation models | <ul><li>Xudong Wang</li> <li>Rohit Girdhar</li> <li>Stella Yu</li> <li>Ishan Misra</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/docs.svg" alt="docs" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 24.07.2023 |
Recognize Anything & Tag2Text | Vision-language pre-training framework, which introduces image tagging into vision-language models to guide the learning of visual-linguistic features | <ul><li>Xinyu Huang</li> <li>Youcai Zhang</li> <li>Jinyu Ma</li> <li>Zhaoyang Li</li><details><summary>others</summary><li>Yanchun Xie</li> <li>Yuzhuo Qin</li> <li>Tong Luo</li> <li>Yaqian Li</li> <li>Yandong Guo</li> <li>Lei Zhang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project, project</li></ul> | ![Open In Colab](images/colab.svg) | 09.07.2023 |
Thin-Plate Spline Motion Model | End-to-end unsupervised motion transfer framework | <ul><li>Jian Zhao</li> <li>Hui Zhang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>supp</li></ul> | ![Open In Colab](images/colab.svg) | 07.07.2023 |
DragGAN | Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | <ul><li>Xingang Pan</li> <li>Ayush Tewari</li> <li>Thomas Leimkühler</li> <li>Lingjie Liu</li><details><summary>others</summary><li>Abhimitra Meka</li> <li>Christian Theobalt</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/twitter.svg" alt="twitter" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.07.2023 |
Fast Segment Anything | CNN-based Segment Anything Model trained on only 2% of the SA-1B dataset published by the SAM authors | <ul><li>Xu Zhao</li> <li>Wenchao Ding</li> <li>Yongqi An</li> <li>Yinglong Du</li><details><summary>others</summary><li>Tao Yu</li> <li>Min Li</li> <li>Ming Tang</li> <li>Jinqiao Wang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.06.2023 |
MobileSAM | Towards Lightweight SAM for Mobile Applications | <ul><li>Chaoning Zhang</li> <li>Dongshen Han</li> <li>Yu Qiao</li> <li>Jung Uk Kim</li><details><summary>others</summary><li>Sung-Ho Bae</li> <li>Seungkyu Lee</li> <li>Choong Seon Hong</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.06.2023 |
Grounding DINO | Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | <ul><li>Shilong Liu</li> <li>Zhaoyang Zeng</li> <li>Tianhe Ren</li> <li>Feng Li</li><details><summary>others</summary><li>Hao Zhang</li> <li>Jie Yang</li> <li>Chunyuan Li</li> <li>Jianwei Yang</li> <li>Hang Su</li> <li>Jun Zhu</li> <li>Lei Zhang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/>, <img src="images/pwc.svg" alt="pwc" height=20/>, <img src="images/pwc.svg" alt="pwc" height=20/>, <img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 28.06.2023 |
T5X | Modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of sequence models at many scales | <ul><li>Adam Roberts</li> <li>Hyung Won Chung</li> <li>Anselm Levskaya</li> <li>Gaurav Mishra</li><details><summary>others</summary><li>James Bradbury</li> <li>Daniel Andor</li> <li>Sharan Narang</li> <li>Brian Lester</li> <li>Colin Gaffney</li> <li>Afroz Mohiuddin</li> <li>Curtis Hawthorne</li> <li>Aitor Lewkowycz</li> <li>Alex Salcianu</li> <li>Marc van Zee</li> <li>Jacob Austin</li> <li>Sebastian Goodman</li> <li>Livio Baldini Soares</li> <li>Haitang Hu</li> <li>Sasha Tsvyashchenko</li> <li>Aakanksha Chowdhery</li> <li>Jasmijn Bastings</li> <li>Jannis Bulian</li> <li>Xavier Garcia</li> <li>Jianmo Ni</li> <li>Kathleen Kenealy</li> <li>Jonathan Clark</li> <li>Dan Garrette</li> <li>James Lee-Thorp</li> <li>Colin Raffel</li> <li>Noam Shazeer</li> <li>Marvin Ritter</li> <li>Maarten Bosma</li> <li>Alexandre Passos</li> <li>Jeremy Maitin-Shepard</li> <li>Noah Fiedel</li> <li>Brennan Saeta</li> <li>Ryan Sepassi</li> <li>Alexander Spiridonov</li> <li>Joshua Newlan</li> <li>Andrea Gesmundo</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/docs.svg" alt="docs" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/tf.svg" alt="tf" height=20/>, <img src="images/tf.svg" alt="tf" height=20/>, <img src="images/tf.svg" alt="tf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 27.06.2023 |
First Order Motion Model for Image Animation | Transferring facial movements from video to image | Aliaksandr Siarohin | <ul><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.06.2023 |
Parallel WaveGAN | State-of-the-art non-autoregressive models for building your own vocoder | Tomoki Hayashi | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.06.2023 |
ECON | Designed for human digitization from a color image; combines the best properties of implicit and explicit representations to infer high-fidelity 3D clothed humans from in-the-wild images, even with loose clothing or in challenging poses | <ul><li>Yuliang Xiu</li> <li>Jinlong Yang</li> <li>Xu Cao</li> <li>Dimitrios Tzionas</li> <li>Michael Black</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/discord.svg" alt="discord" height=20/></li><li><img src="images/docker.svg" alt="docker" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.05.2023 |
MMS | The Massively Multilingual Speech project expands speech technology from about 100 to over 1,000 languages: a single multilingual speech recognition model supporting over 1,100 languages, language identification models covering over 4,000 languages, pretrained models for over 1,400 languages, and text-to-speech models for over 1,100 languages | <ul><li>Vineel Pratap</li> <li>Andros Tjandra</li> <li>Bowen Shi</li> <li>Paden Tomasello</li><details><summary>others</summary><li>Arun Babu</li> <li>Sayani Kundu</li> <li>Ali Elkahky</li> <li>Zhaoheng Ni</li> <li>Apoorv Vyas</li> <li>Maryam Fazel-Zarandi</li> <li>Alexei Baevski</li> <li>Yossi Adi</li> <li>Xiaohui Zhang</li> <li>Wei-Ning Hsu</li> <li>Alexis Conneau</li> <li>Michael Auli</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.05.2023 |
DFL-Colab | Provides an IPython Notebook for using DeepFaceLab | chervonij | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>guide</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.04.2023 |
FAB | Flow AIS Bootstrap uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes | <ul><li>Laurence Midgley</li> <li>Vincent Stimper</li> <li>Gregor N. C. Simm</li> <li>Bernhard Schölkopf</li> <li>José Miguel Hernández-Lobato</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.04.2023 |
CodeFormer | Transformer-based prediction network to model global composition and context of the low-quality faces for code prediction, enabling the discovery of natural faces that closely approximate the target faces even when the inputs are severely degraded | <ul><li>Shangchen Zhou</li> <li>Kelvin Chan</li> <li>Chongyi Li</li> <li>Chen Change Loy</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.04.2023 |
Text2Video-Zero | Text-to-Image Diffusion Models are Zero-Shot Video Generators | <ul><li>Levon Khachatryan</li> <li>Andranik Movsisyan</li> <li>Vahram Tadevosyan</li> <li>Roberto Henschel</li><details><summary>others</summary><li>Zhangyang Wang</li> <li>Shant Navasardyan</li> <li>Humphrey Shi</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li>video</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.04.2023 |
Segment Anything | The Segment Anything Model produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image | <ul><li>Alexander Kirillov</li> <li>Eric Mintun</li> <li>Nikhila Ravi</li> <li>Hanzi Mao</li><details><summary>others</summary><li>Chloé Rolland</li> <li>Laura Gustafson</li> <li>Tete Xiao</li> <li>Spencer Whitehead</li> <li>Alex Berg</li> <li>Wan-Yen Lo</li> <li>Piotr Dollar</li> <li>Ross Girshick</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post, blog post</li><li>data</li><li>website</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 10.04.2023 |
FollowYourPose | Two-stage training scheme that can utilize image-pose pairs, pose-free video datasets, and a pre-trained text-to-image model to obtain pose-controllable character videos | <ul><li>Yue Ma</li> <li>Yingqing He</li> <li>Xiaodong Cun</li> <li>Xintao Wang</li><details><summary>others</summary><li>Siran Chen</li> <li>Ying Shan</li> <li>Xiu Li</li> <li>Qifeng Chen</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li>video</li></ul> | ![Open In Colab](images/colab.svg) | 07.04.2023 |
EVA3D | High-quality unconditional 3D human generative model that only requires 2D image collections for training | <ul><li>Fangzhou Hong</li> <li>Zhaoxi Chen</li> <li>Yushi Lan</li> <li>Liang Pan</li> <li>Ziwei Liu</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.04.2023 |
Stable Dreamfusion | Uses a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis | <ul><li>Jiaxiang Tang</li> <li>Ben Poole</li> <li>Ajay Jain</li> <li>Jon Barron</li> <li>Ben Mildenhall</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/pt.svg" alt="pt" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.04.2023 |
UniFormer | Unified Transformer for Efficient Spatiotemporal Representation Learning | <ul><li>Kunchang Li</li> <li>Yali Wang</li> <li>Peng Gao</li> <li>Guanglu Song</li><details><summary>others</summary><li>Yu Liu</li> <li>Hongsheng Li</li> <li>Yu Qiao</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.03.2023 |
PIFuHD | Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization | <ul><li>Shunsuke Saito</li> <li>Tomas Simon</li> <li>Jason Saragih</li> <li>Hanbyul Joo</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.03.2023 |
VideoReTalking | System to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion | <ul><li>Kun Cheng</li> <li>Xiaodong Cun</li> <li>Yong Zhang</li> <li>Menghan Xia</li><details><summary>others</summary><li>Fei Yin</li> <li>Mingrui Zhu</li> <li>Xuan Wang</li> <li>Jue Wang</li> <li>Nannan Wang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.03.2023 |
Visual ChatGPT | Connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting | <ul><li>Chenfei Wu</li> <li>Shengming Yin</li> <li>Weizhen Qi</li> <li>Xiaodong Wang</li><details><summary>others</summary><li>Zecheng Tang</li> <li>Nan Duan</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.03.2023 |
Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | <ul><li>Jay Zhangjie Wu</li> <li>Yixiao Ge</li> <li>Xintao Wang</li> <li>Stan Weixian Lei</li><details><summary>others</summary><li>Yuchao Gu</li> <li>Yufei Shi</li> <li>Wynne Hsu</li> <li>Ying Shan</li> <li>Xiaohu Qie</li> <li>Mike Zheng Shou</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 23.02.2023 |
GPEN | GAN Prior Embedded Network for Blind Face Restoration in the Wild | <ul><li>Tao Yang</li> <li>Peiran Ren</li> <li>Xuansong Xie</li> <li>Lei Zhang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.02.2023 |
PyMAF-X | Regression-based approach to recovering parametric full-body models from monocular images | <ul><li>Hongwen Zhang</li> <li>Yating Tian</li> <li>Yuxiang Zhang</li> <li>Mengcheng Li</li><details><summary>others</summary><li>Liang An</li> <li>Zhenan Sun</li> <li>Yebin Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 14.02.2023 |
Disco Diffusion | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations | <ul><li>Max Ingham</li> <li>Adam Letts</li> <li>Daniel Russell</li> <li>Chigozie Nri</li></ul> | <ul><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.02.2023 |
Open-Unmix | A deep neural network reference implementation for music source separation, applicable for researchers, audio engineers and artists | <ul><li>Fabian-Robert Stöter</li> <li>Antoine Liutkus</li></ul> | <ul><li>data</li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 09.02.2023 |
GrooVAE | Some applications of machine learning for generating and manipulating beats and drum performances | <ul><li>Jon Gillick</li> <li>Adam Roberts</li> <li>Jesse Engel</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>data</li><li>web app</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.02.2023 |
Multitrack MusicVAE | The models in this notebook are capable of encoding and decoding single measures of up to 8 tracks, optionally conditioned on an underlying chord | <ul><li>Ian Simon</li> <li>Adam Roberts</li> <li>Colin Raffel</li> <li>Jesse Engel</li><details><summary>others</summary><li>Curtis Hawthorne</li> <li>Douglas Eck</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li></ul> | ![Open In Colab](images/colab.svg) | 01.02.2023 |
MusicVAE | A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music | <ul><li>Adam Roberts</li> <li>Jesse Engel</li> <li>Colin Raffel</li> <li>Curtis Hawthorne</li> <li>Douglas Eck</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.02.2023 |
Learning to Paint | Learning to Paint With Model-based Deep Reinforcement Learning | Manuel Romero | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.02.2023 |
VALL-E | Language modeling approach for text to speech synthesis | <ul><li>Chengyi Wang</li> <li>Sanyuan Chen</li> <li>Yu Wu</li> <li>Ziqiang Zhang</li><details><summary>others</summary><li>Long Zhou</li> <li>Shujie Liu</li> <li>Zhuo Chen</li> <li>Yanqing Liu</li> <li>Huaming Wang</li> <li>Jinyu Li</li> <li>Lei He</li> <li>Sheng Zhao</li> <li>Furu Wei</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.01.2023 |
Instant-NGP | Instant Neural Graphics Primitives with a Multiresolution Hash Encoding | <ul><li>Thomas Müller</li> <li>Alex Evans</li> <li>Christoph Schied</li> <li>Alexander Keller</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li>tutorial</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.01.2023 |
Fourier Feature Networks | Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains | <ul><li>Matthew Tancik</li> <li>Pratul Srinivasan</li> <li>Ben Mildenhall</li> <li>Sara Fridovich-Keil</li><details><summary>others</summary><li>Nithin Raghavan</li> <li>Utkarsh Singhal</li> <li>Ravi Ramamoorthi</li> <li>Jon Barron</li> <li>Ren Ng</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/>, <img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 17.01.2023 |
AlphaPose | Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time | <ul><li>Hao-Shu Fang</li> <li>Jiefeng Li</li> <li>Hongyang Tang</li> <li>Chao Xu</li><details><summary>others</summary><li>Haoyi Zhu</li> <li>Yuliang Xiu</li> <li>Yong-Lu Li</li> <li>Cewu Lu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.01.2023 |
HybrIK | Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation | <ul><li>Jiefeng Li</li> <li>Chao Xu</li> <li>Zhicun Chen</li> <li>Siyuan Bian</li><details><summary>others</summary><li>Lixin Yang</li> <li>Cewu Lu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li>supp</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.01.2023 |
Demucs | Hybrid Spectrogram and Waveform Source Separation | Alexandre Défossez | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.11.2022 |
StyleCLIP | Text-Driven Manipulation of StyleGAN Imagery | <ul><li>Or Patashnik</li> <li>Zongze Wu</li> <li>Eli Shechtman</li> <li>Daniel Cohen-Or</li> <li>Dani Lischinski</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.10.2022 |
MotionDiffuse | The first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods | <ul><li>Mingyuan Zhang</li> <li>Zhongang Cai</li> <li>Liang Pan</li> <li>Fangzhou Hong</li><details><summary>others</summary><li>Xinying Guo</li> <li>Lei Yang</li> <li>Ziwei Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.10.2022 |
VToonify | Leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits from multi-scale content features extracted by an encoder, better preserving frame details | <ul><li>Shuai Yang</li> <li>Liming Jiang</li> <li>Ziwei Liu</li> <li>Chen Change Loy</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.10.2022 |
PyMAF | Pyramidal Mesh Alignment Feedback loop in a regression network for well-aligned body mesh recovery, extended to recover expressive full-body models | <ul><li>Hongwen Zhang</li> <li>Yating Tian</li> <li>Yuxiang Zhang</li> <li>Mengcheng Li</li><details><summary>others</summary><li>Liang An</li> <li>Zhenan Sun</li> <li>Yebin Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.10.2022 |
AlphaTensor | Discovering faster matrix multiplication algorithms with reinforcement learning | <ul><li>Alhussein Fawzi</li> <li>Matej Balog</li> <li>Aja Huang</li> <li>Thomas Hubert</li><details><summary>others</summary><li>Bernardino Romera-Paredes</li> <li>Mohammadamin Barekatain</li> <li>Alexander Novikov</li> <li>Francisco Ruiz</li> <li>Julian Schrittwieser</li> <li>Grzegorz Swirszcz</li> <li>David Silver</li> <li>Demis Hassabis</li> <li>Pushmeet Kohli</li></ul></details> | <ul><li>blog post</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.10.2022 |
Swin2SR | Novel Swin Transformer V2 to improve SwinIR for image super-resolution, particularly in the compressed-input scenario | <ul><li>Marcos Conde</li> <li>Ui-Jin Choi</li> <li>Maxime Burchi</li> <li>Radu Timofte</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/kaggle.svg" alt="kaggle" height=20/>, <img src="images/kaggle.svg" alt="kaggle" height=20/>, <img src="images/kaggle.svg" alt="kaggle" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.10.2022 |
Functa | From data to functa: Your data point is a function and you can treat it like one | <ul><li>Emilien Dupont</li> <li>Hyunjik Kim</li> <li>Ali Eslami</li> <li>Danilo Rezende</li> <li>Dan Rosenbaum</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/tf.svg" alt="tf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.09.2022 |
Whisper | Automatic speech recognition system trained on 680,000 hours of multilingual and multitask supervised data collected from the web | <ul><li>Alec Radford</li> <li>Jong Wook Kim</li> <li>Tao Xu</li> <li>Greg Brockman</li><details><summary>others</summary><li>Christine McLeavey</li> <li>Ilya Sutskever</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.09.2022 |
DeOldify (video) | Colorize your own videos! | Jason Antic | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>model</li><li><img src="images/reddit.svg" alt="reddit" height=20/>, <img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li>website</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.09.2022 |
DeOldify (photo) | Colorize your own photos! | <ul><li>Jason Antic</li> <li>Matt Robinson</li> <li>María Benavente</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>model</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li><li>website</li></ul> | ![Open In Colab](images/colab.svg) | 19.09.2022 |
Real-ESRGAN | Extends the powerful ESRGAN to a practical restoration application trained with purely synthetic data | <ul><li>Xintao Wang</li> <li>Liangbin Xie</li> <li>Chao Dong</li> <li>Ying Shan</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.09.2022 |
IDE-3D | Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis | <ul><li>Jingxiang Sun</li> <li>Xuan Wang</li> <li>Yichun Shi</li> <li>Lizhen Wang</li><details><summary>others</summary><li>Jue Wang</li> <li>Yebin Liu</li></ul></details> | <ul><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 08.09.2022 |
Decision Transformers | An architecture that casts the problem of RL as conditional sequence modeling | <ul><li>Lili Chen</li> <li>Kevin Lu</li> <li>Aravind Rajeswaran</li> <li>Kimin Lee</li><details><summary>others</summary><li>Aditya Grover</li> <li>Michael Laskin</li> <li>Pieter Abbeel</li> <li>Aravind Srinivas</li> <li>Igor Mordatch</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/wiki.svg" alt="wiki" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.09.2022 |
Dream Fields | Zero-Shot Text-Guided Object Generation | <ul><li>Ajay Jain</li> <li>Ben Mildenhall</li> <li>Jon Barron</li> <li>Pieter Abbeel</li> <li>Ben Poole</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.09.2022 |
GANgealing | Framework for learning discriminative models and their GAN-generated training data jointly end-to-end | <ul><li>William Peebles</li> <li>Jun-Yan Zhu</li> <li>Richard Zhang</li> <li>Antonio Torralba</li><details><summary>others</summary><li>Alexei Efros</li> <li>Eli Shechtman</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.09.2022 |
textual-inversion | An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | <ul><li>Rinon Gal</li> <li>Yuval Alaluf</li> <li>Yuval Atzmon</li> <li>Or Patashnik</li><details><summary>others</summary><li>Amit Bermano</li> <li>Gal Chechik</li> <li>Daniel Cohen-Or</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.08.2022 |
StyleGAN-Human | A Data-Centric Odyssey of Human Generation | <ul><li>Jianglin Fu</li> <li>Shikai Li</li> <li>Yuming Jiang</li> <li>Kwan-Yee Lin</li><details><summary>others</summary><li>Chen Qian</li> <li>Chen Change Loy</li> <li>Wayne Wu</li> <li>Ziwei Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.08.2022 |
Make-A-Scene | Scene-Based Text-to-Image Generation with Human Priors | <ul><li>Oran Gafni</li> <li>Adam Polyak</li> <li>Oron Ashual</li> <li>Shelly Sheynin</li><details><summary>others</summary><li>Devi Parikh</li> <li>Yaniv Taigman</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 12.08.2022 |
StyleGAN-NADA | Zero-Shot non-adversarial domain adaptation of pre-trained generators | <ul><li>Rinon Gal</li> <li>Or Patashnik</li> <li>Haggai Maron</li> <li>Gal Chechik</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 09.08.2022 |
YOLOv7 | Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | <ul><li>Chien-Yao Wang</li> <li>Alexey Bochkovskiy</li> <li>Mark Liao</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data, data, data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 09.08.2022 |
GLIP | Grounded language-image pre-training model for learning object-level, language-aware, and semantic-rich visual representations | <ul><li>Liunian Harold Li</li> <li>Pengchuan Zhang</li> <li>Haotian Zhang</li> <li>Jianwei Yang</li><details><summary>others</summary><li>Chunyuan Li</li> <li>Yiwu Zhong</li> <li>Lijuan Wang</li> <li>Lu Yuan</li> <li>Lei Zhang</li> <li>Jenq-Neng Hwang</li> <li>Kai-Wei Chang</li> <li>Jianfeng Gao</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/>, <img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.07.2022 |
Anycost GAN | Interactive natural image editing | <ul><li>Ji Lin</li> <li>Richard Zhang</li> <li>Frieder Ganz</li> <li>Song Han</li> <li>Jun-Yan Zhu</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 20.07.2022 |
GFPGAN | Towards Real-World Blind Face Restoration with Generative Facial Prior | <ul><li>Xintao Wang</li> <li>Yu Li</li> <li>Honglun Zhang</li> <li>Ying Shan</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 13.07.2022 |
EPro-PnP | Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation | <ul><li>Hansheng Chen</li> <li>Pichao Wang</li> <li>Fan Wang</li> <li>Wei Tian</li><details><summary>others</summary><li>Lu Xiong</li> <li>Hao Li</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>nuScenes</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 12.07.2022 |
VQ-Diffusion | Based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model | <ul><li>Shuyang Gu</li> <li>Dong Chen</li> <li>Jianmin Bao</li> <li>Fang Wen</li><details><summary>others</summary><li>Bo Zhang</li> <li>Dongdong Chen</li> <li>Lu Yuan</li> <li>Baining Guo</li> <li>Zhicong Tang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.06.2022 |
OPT | Open Pre-trained Transformers is a family of NLP models trained on billions of tokens of text obtained from the internet | <ul><li>Susan Zhang</li> <li>Stephen Roller</li> <li>Naman Goyal</li> <li>Mikel Artetxe</li><details><summary>others</summary><li>Moya Chen</li> <li>Christopher Dewan</li> <li>Mona Diab</li> <li>Xi Victoria Lin</li> <li>Todor Mihaylov</li> <li>Myle Ott</li> <li>Sam Shleifer</li> <li>Kurt Shuster</li> <li>Daniel Simig</li> <li>Punit Singh Koura</li> <li>Anjali Sridhar</li> <li>Tianlu Wang</li> <li>Luke Zettlemoyer</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.06.2022 |
Customizing a Transformer Encoder | Learn how to customize the Transformer encoder to employ new network architectures | Chen Chen | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.06.2022 |
MTTR | End-to-End Referring Video Object Segmentation with Multimodal Transformers | <ul><li>Adam Botach</li> <li>Evgenii Zheltonozhskii</li> <li>Chaim Baskin</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 20.06.2022 |
SwinIR | Image Restoration Using Swin Transformer | <ul><li>Jingyun Liang</li> <li>Jiezhang Cao</li> <li>Guolei Sun</li> <li>Kai Zhang</li><details><summary>others</summary><li>Luc Van Gool</li> <li>Radu Timofte</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 17.06.2022 |
VRT | A Video Restoration Transformer | <ul><li>Jingyun Liang</li> <li>Jiezhang Cao</li> <li>Yuchen Fan</li> <li>Kai Zhang</li><details><summary>others</summary><li>Yawei Li</li> <li>Radu Timofte</li> <li>Luc Van Gool</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.06.2022 |
Omnivore | A single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters | <ul><li>Rohit Girdhar</li> <li>Mannat Singh</li> <li>Nikhila Ravi</li> <li>Laurens van der Maaten</li><details><summary>others</summary><li>Armand Joulin</li> <li>Ishan Misra</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/pwc.svg" alt="pwc" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 14.06.2022 |
Detic | Detecting Twenty-thousand Classes using Image-level Supervision | <ul><li>Xingyi Zhou</li> <li>Rohit Girdhar</li> <li>Armand Joulin</li> <li>Philipp Krähenbühl</li> <li>Ishan Misra</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.06.2022 |
T0 | Multitask Prompted Training Enables Zero-Shot Task Generalization | <ul><li>Victor Sanh</li> <li>Albert Webson</li> <li>Colin Raffel</li> <li>Stephen Bach</li><details><summary>others</summary><li>Lintang Sutawika</li> <li>Zaid Alyafeai</li> <li>Antoine Chaffin</li> <li>Arnaud Stiegler</li> <li>Teven Scao</li> <li>Arun Raja</li> <li>Manan Dey</li> <li>M Saiful Bari</li> <li>Canwen Xu</li> <li>Urmish Thakker</li> <li>Shanya Sharma</li> <li>Eliza Szczechla</li> <li>Taewoon Kim</li> <li>Gunjan Chhablani</li> <li>Nihal Nayak</li> <li>Debajyoti Datta</li> <li>Jonathan Chang</li> <li>Mike Tian-Jian Jiang</li> <li>Matteo Manica</li> <li>Sheng Shen</li> <li>Zheng Xin Yong</li> <li>Harshit Pandey</li> <li>Rachel Bawden</li> <li>Trishala Neeraj</li> <li>Jos Rozen</li> <li>Abheesht Sharma</li> <li>Andrea Santilli</li> <li>Thibault Fevry</li> <li>Jason Alan Fries</li> <li>Ryan Teehan</li> <li>Stella Biderman</li> <li>Leo Gao</li> <li>Tali Bers</li> <li>Thomas Wolf</li> <li>Alexander M. Rush</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.05.2022 |
AvatarCLIP | A zero-shot text-driven framework for 3D avatar generation and animation | <ul><li>Fangzhou Hong</li> <li>Mingyuan Zhang</li> <li>Liang Pan</li> <li>Zhongang Cai</li><details><summary>others</summary><li>Lei Yang</li> <li>Ziwei Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.05.2022 |
Text2Mesh | Text-Driven Neural Stylization for Meshes | <ul><li>Oscar Michel</li> <li>Roi Bar-On</li> <li>Richard Liu</li> <li>Sagie Benaim</li> <li>Rana Hanocka</li></ul> | <ul><li>CLIP</li><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/kaggle.svg" alt="kaggle" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 14.05.2022 |
T5 | Text-To-Text Transfer Transformer | <ul><li>Colin Raffel</li> <li>Noam Shazeer</li> <li>Adam Roberts</li> <li>Katherine Lee</li><details><summary>others</summary><li>Sharan Narang</li> <li>Michael Matena</li> <li>Yanqi Zhou</li> <li>Wei Li</li> <li>Peter J. Liu</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/tf.svg" alt="tf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.05.2022 |
XLS-R | Self-supervised Cross-lingual Speech Representation Learning at Scale | <ul><li>Arun Babu</li> <li>Changhan Wang</li> <li>Andros Tjandra</li> <li>Kushal Lakhotia</li><details><summary>others</summary><li>Qiantong Xu</li> <li>Naman Goyal</li> <li>Kritika Singh</li> <li>Patrick von Platen</li> <li>Yatharth Saraf</li> <li>Juan Pino</li> <li>Alexei Baevski</li> <li>Alexis Conneau</li> <li>Michael Auli</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 10.05.2022 |
DiffCSE | Unsupervised contrastive learning framework for learning sentence embeddings | <ul><li>Yung-Sung Chuang</li> <li>Rumen Dangovski</li> <li>Hongyin Luo</li> <li>Yang Zhang</li><details><summary>others</summary><li>Shiyu Chang</li> <li>Marin Soljačić</li> <li>Shang-Wen Li</li> <li>Scott Wen-tau Yih</li> <li>Yoon Kim</li> <li>James Glass</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/twitter.svg" alt="twitter" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.04.2022 |
ViDT+ | An Extendable, Efficient and Effective Transformer-based Object Detector | <ul><li>Hwanjun Song</li> <li>Deqing Sun</li> <li>Sanghyuk Chun</li> <li>Varun Jampani</li><details><summary>others</summary><li>Dongyoon Han</li> <li>Byeongho Heo</li> <li>Wonjae Kim</li> <li>Ming-Hsuan Yang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 20.04.2022 |
NAFNet | Nonlinear Activation Free Network for Image Restoration | <ul><li>Liangyu Chen</li> <li>Xiaojie Chu</li> <li>Xiangyu Zhang</li> <li>Jian Sun</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/>, <img src="images/pwc.svg" alt="pwc" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.04.2022 |
Panini-Net | GAN Prior based Degradation-Aware Feature Interpolation for Face Restoration | <ul><li>Yinhuai Wang</li> <li>Yujie Hu</li> <li>Jian Zhang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.04.2022 |
Deep Painterly Harmonization | An algorithm that produces significantly better results than photo compositing or global stylization techniques and enables creative painterly edits that would otherwise be difficult to achieve | <ul><li>Fujun Luan</li> <li>Sylvain Paris</li> <li>Eli Shechtman</li> <li>Kavita Bala</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.04.2022 |
E2FGVI | An End-to-End framework for Flow-Guided Video Inpainting built from three elaborately designed trainable modules: flow completion, feature propagation, and content hallucination | <ul><li>Zhen Li</li> <li>Cheng-Ze Lu</li> <li>Jianhua Qin</li> <li>Chun-Le Guo</li> <li>Ming-Ming Cheng</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.04.2022 |
LDM | High-Resolution Image Synthesis with Latent Diffusion Models | <ul><li>Robin Rombach</li> <li>Andreas Blattmann</li> <li>Dominik Lorenz</li> <li>Patrick Esser</li> <li>Björn Ommer</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 04.04.2022 |
GP-UNIT | A novel framework, Generative Prior-guided UNsupervised Image-to-image Translation, that improves the overall quality and applicability of the translation algorithm | <ul><li>Shuai Yang</li> <li>Liming Jiang</li> <li>Ziwei Liu</li> <li>Chen Change Loy</li></ul> | <ul><li>ImageNet</li><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 02.04.2022 |
DualStyleGAN | Tackles the more challenging task of exemplar-based high-resolution portrait style transfer by introducing a novel DualStyleGAN with flexible control of dual styles from the original face domain and the extended artistic portrait domain | <ul><li>Shuai Yang</li> <li>Liming Jiang</li> <li>Ziwei Liu</li> <li>Chen Change Loy</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/>, <img src="images/hf.svg" alt="hf" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.03.2022 |
CLIPasso | Semantically-Aware Object Sketching | <ul><li>Yael Vinker</li> <li>Ehsan Pajouheshgar</li> <li>Jessica Y. Bo</li> <li>Roman Bachmann</li><details><summary>others</summary><li>Amit Bermano</li> <li>Daniel Cohen-Or</li> <li>Amir Zamir</li> <li>Ariel Shamir</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 21.03.2022 |
StyleSDF | A high resolution, 3D-consistent image and shape generation technique | <ul><li>Roy Or-El</li> <li>Xuan Luo</li> <li>Mengyi Shan</li> <li>Eli Shechtman</li><details><summary>others</summary><li>Jeong Joon Park</li> <li>Ira Kemelmacher-Shlizerman</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 05.03.2022 |
Disentangled Lifespan Face Synthesis | An LFS model that disentangles the key face characteristics, including shape, texture, and identity, so that the unique shape and texture age transformations can be modeled effectively | <ul><li>Sen He</li> <li>Wentong Liao</li> <li>Michael Yang</li> <li>Yi-Zhe Song</li><details><summary>others</summary><li>Bodo Rosenhahn</li> <li>Tao Xiang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.02.2022 |
ClipCap | CLIP Prefix for Image Captioning | <ul><li>Ron Mokady</li> <li>Amir Hertz</li> <li>Amit Bermano</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.02.2022 |
ROMP | Monocular, One-stage, Regression of Multiple 3D People | <ul><li>Yu Sun</li> <li>Qian Bao</li> <li>Wu Liu</li> <li>Yili Fu</li><details><summary>others</summary><li>Michael Black</li> <li>Tao Mei</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.02.2022 |
Mask2Former | Masked-attention Mask Transformer for Universal Image Segmentation | <ul><li>Bowen Cheng</li> <li>Ishan Misra</li> <li>Alexander Schwing</li> <li>Alexander Kirillov</li> <li>Rohit Girdhar</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 09.02.2022 |
JoJoGAN | One Shot Face Stylization | <ul><li>Min Jin Chong</li> <li>David Forsyth</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 02.02.2022 |
Pose with Style | Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN | <ul><li>Badour AlBahar</li> <li>Jingwan Lu</li> <li>Jimei Yang</li> <li>Zhixin Shu</li><details><summary>others</summary><li>Eli Shechtman</li> <li>Jia-Bin Huang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.01.2022 |
ConvNeXt | A pure ConvNet model constructed entirely from standard ConvNet modules | <ul><li>Zhuang Liu</li> <li>Hanzi Mao</li> <li>Chao-Yuan Wu</li> <li>Christoph Feichtenhofer</li><details><summary>others</summary><li>Trevor Darrell</li> <li>Saining Xie</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.01.2022 |
diffsort | Differentiable Sorting Networks | <ul><li>Felix Petersen</li> <li>Christian Borgelt</li> <li>Hilde Kuehne</li> <li>Oliver Deussen</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 17.01.2022 |
Taming Transformers for High-Resolution Image Synthesis | We combine the efficiency of convolutional approaches with the expressivity of transformers by introducing a convolutional VQGAN, which learns a codebook of context-rich visual parts, whose composition is modeled with an autoregressive transformer | <ul><li>Patrick Esser</li> <li>Robin Rombach</li> <li>Björn Ommer</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 13.01.2022 |
GLIDE | Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | <ul><li>Alex Nichol</li> <li>Prafulla Dhariwal</li> <li>Aditya Ramesh</li> <li>Pranav Shyam</li><details><summary>others</summary><li>Pamela Mishkin</li> <li>Bob McGrew</li> <li>Ilya Sutskever</li> <li>Mark Chen</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.12.2021 |
Nerfies | First method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones | <ul><li>Keunhong Park</li> <li>Utkarsh Sinha</li> <li>Jon Barron</li> <li>Sofien Bouaziz</li><details><summary>others</summary><li>Dan Goldman</li> <li>Steve Seitz</li> <li>Ricardo Martin-Brualla</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.12.2021 |
HyperStyle | A hypernetwork that learns to modulate StyleGAN's weights to faithfully express a given image in editable regions of the latent space | <ul><li>Yuval Alaluf</li> <li>Omer Tov</li> <li>Ron Mokady</li> <li>Rinon Gal</li> <li>Amit Bermano</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 03.12.2021 |
encoder4editing | Designing an Encoder for StyleGAN Image Manipulation | <ul><li>Omer Tov</li> <li>Yuval Alaluf</li> <li>Yotam Nitzan</li> <li>Or Patashnik</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 02.12.2021 |
StyleCariGAN | Caricature Generation via StyleGAN Feature Map Modulation | <ul><li>Wonjong Jang</li> <li>Gwangjin Ju</li> <li>Yucheol Jung</li> <li>Jiaolong Yang</li><details><summary>others</summary><li>Xin Tong</li> <li>Seungyong Lee</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.11.2021 |
CartoonGAN | An implementation of the CartoonGAN model in PyTorch | Tobias Sunderdiek | <ul><li><img src="images/kaggle.svg" alt="kaggle" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 24.11.2021 |
SimSwap | An efficient framework, Simple Swap, for generalized and high-fidelity face swapping | <ul><li>Xuanhong Chen</li> <li>Bingbing Ni</li> <li>Yanhao Ge</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.11.2021 |
RVM | Robust High-Resolution Video Matting with Temporal Guidance | <ul><li>Shanchuan Lin</li> <li>Linjie Yang</li> <li>Imran Saleemi</li> <li>Soumyadip Sengupta</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.11.2021 |
RVM | Robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance | <ul><li>Shanchuan Lin</li> <li>Linjie Yang</li> <li>Imran Saleemi</li> <li>Soumyadip Sengupta</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.11.2021 |
AnimeGANv2 | An improved version of AnimeGAN that prevents the generation of high-frequency artifacts by simply changing the normalization of features in the network | <ul><li>Xin Chen</li> <li>Gang Liu</li> <li>bryandlee</li></ul> | <ul><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 17.11.2021 |
SOAT | StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN | <ul><li>Min Jin Chong</li> <li>Hsin-Ying Lee</li> <li>David Forsyth</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.11.2021 |
Arnheim | Generative Art Using Neural Visual Grammars and Dual Encoders | <ul><li>Chrisantha Fernando</li> <li>Ali Eslami</li> <li>Jean-Baptiste Alayrac</li> <li>Piotr Mirowski</li><details><summary>others</summary><li>Dylan Banarse</li> <li>Simon Osindero</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/wiki.svg" alt="wiki" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.11.2021 |
StyleGAN 2 | Generation of faces, cars, etc. | Mikael Christensen | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.11.2021 |
ByteTrack | Multi-Object Tracking by Associating Every Detection Box | <ul><li>Yifu Zhang</li> <li>Peize Sun</li> <li>Yi Jiang</li> <li>Dongdong Yu</li><details><summary>others</summary><li>Ping Luo</li> <li>Xinggang Wang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.10.2021 |
GPT-2 | Retrain an advanced text-generating neural network on any text dataset using gpt-2-simple! | Max Woolf | <ul><li>blog post, blog post</li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.10.2021 |
ConvMixer | An extremely simple model that is similar in spirit to the ViT and the even-more-basic MLP-Mixer in that it operates directly on patches as input, separates the mixing of spatial and channel dimensions, and maintains equal size and resolution throughout the network | <ul><li>Asher Trockman</li> <li>Zico Kolter</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.10.2021 |
IC-GAN | Instance-Conditioned GAN | <ul><li>Arantxa Casanova</li> <li>Marlène Careil</li> <li>Jakob Verbeek</li> <li>Michał Drożdżal</li> <li>Adriana Romero-Soriano</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.10.2021 |
Skillful Precipitation Nowcasting Using Deep Generative Models of Radar | Open-sourced dataset and model snapshot for precipitation nowcasting | <ul><li>Suman Ravuri</li> <li>Karel Lenc</li> <li>Matthew Willson</li> <li>Dmitry Kangin</li><details><summary>others</summary><li>Rémi Lam</li> <li>Piotr Mirowski</li> <li>Maria Athanassiadou</li> <li>Sheleem Kashem</li> <li>Rachel Prudden</li> <li>Amol Mandhane</li> <li>Aidan Clark</li> <li>Andrew Brock</li> <li>Karen Simonyan</li> <li>Raia Hadsell</li> <li>Niall Robinson</li> <li>Ellen Clancy</li> <li>Shakir Mohamed</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>local kernel</li><li><img src="images/tf.svg" alt="tf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.09.2021 |
Live Speech Portraits | Real-Time Photorealistic Talking-Head Animation | <ul><li>Yuanxun Lu</li> <li>Jinxiang Chai</li> <li>Xun Cao</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 26.09.2021 |
StylEx | Training a GAN to explain a classifier in StyleSpace | <ul><li>Oran Lang</li> <li>Yossi Gandelsman</li> <li>Michal Yarom</li> <li>Yoav Wald</li><details><summary>others</summary><li>Gal Elidan</li> <li>Avinatan Hassidim</li> <li>William Freeman</li> <li>Phillip Isola</li> <li>Amir Globerso</li> <li>Michal Irani</li> <li>Inbar Mosseri</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>project</li><li>supplementary</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 25.08.2021 |
VITS | Parallel end-to-end TTS method that generates more natural-sounding audio than current two-stage models | <ul><li>Jaehyeon Kim</li> <li>Jungil Kong</li> <li>Juhee Son</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li></ul> | ![Open In Colab](images/colab.svg) | 23.08.2021 |
Bringing Old Photos Back to Life | Restoring old photos that suffer from severe degradation through a deep learning approach | <ul><li>Ziyu Wan</li> <li>Bo Zhang</li> <li>Dongdong Chen</li> <li>Pan Zhang</li><details><summary>others</summary><li>Dong Chen</li> <li>Jing Liao</li> <li>Fang Wen</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.07.2021 |
PTI | Pivotal Tuning Inversion enables employing off-the-shelf latent based semantic editing techniques on real images using StyleGAN | <ul><li>Daniel Roich</li> <li>Ron Mokady</li> <li>Amit Bermano</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.07.2021 |
TediGAN | Framework for multi-modal image generation and manipulation with textual descriptions | <ul><li>Weihao Xia</li> <li>Yujiu Yang</li> <li>Jing-Hao Xue</li> <li>Baoyuan Wu</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.06.2021 |
SCALE | Modeling Clothed Humans with a Surface Codec of Articulated Local Elements | <ul><li>Qianli Ma</li> <li>Shunsuke Saito</li> <li>Jinlong Yang</li> <li>Siyu Tang</li> <li>Michael Black</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>poster</li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.06.2021 |
CogView | Mastering Text-to-Image Generation via Transformers | <ul><li>Ming Ding</li> <li>Zhuoyi Yang</li> <li>Wenyi Hong</li> <li>Wendi Zheng</li><details><summary>others</summary><li>Chang Zhou</li> <li>Junyang Lin</li> <li>Xu Zou</li> <li>Zhou Shao</li> <li>Hongxia Yang</li> <li>Jie Tang</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>demo</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li><img src="images/reddit.svg" alt="reddit" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 21.06.2021 |
GANs N' Roses | Stable, Controllable, Diverse Image to Image Translation | <ul><li>Min Jin Chong</li> <li>David Forsyth</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 19.06.2021 |
Rethinking Style Transfer: From Pixels to Parameterized Brushstrokes | A method to stylize images by optimizing parameterized brushstrokes instead of pixels | <ul><li>Dmytro Kotovenko</li> <li>Matthias Wright</li> <li>Arthur Heimbrecht</li> <li>Björn Ommer</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 02.06.2021 |
Pixel2Style2Pixel | Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation | <ul><li>Elad Richardson</li> <li>Yuval Alaluf</li> <li>Yotam Nitzan</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.06.2021 |
Fine-tuning a BERT | We will work through fine-tuning a BERT model using the tensorflow-models pip package | <ul><li>Chen Chen</li> <li>Claire Yao</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/tf.svg" alt="tf" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.05.2021 |
ReStyle | A Residual-Based StyleGAN Encoder via Iterative Refinement | <ul><li>Yuval Alaluf</li> <li>Or Patashnik</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 21.05.2021 |
Motion Representations for Articulated Animation | Novel motion representations for animating articulated objects consisting of distinct parts | <ul><li>Aliaksandr Siarohin</li> <li>Oliver Woodford</li> <li>Jian Ren</li> <li>Menglei Chai</li> <li>Sergey Tulyakov</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 29.04.2021 |
SAM | Age Transformation Using a Style-Based Regression Model | <ul><li>Yuval Alaluf</li> <li>Or Patashnik</li> <li>Daniel Cohen-Or</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 26.04.2021 |
Geometry-Free View Synthesis | Is a geometric model required to synthesize novel views from a single image? | <ul><li>Robin Rombach</li> <li>Patrick Esser</li> <li>Björn Ommer</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.04.2021 |
NeRViS | An algorithm for full-frame video stabilization that first estimates dense warp fields and then fuses warped content from neighboring frames | <ul><li>Yu-Lun Liu</li> <li>Wei-Sheng Lai</li> <li>Ming-Hsuan Yang</li> <li>Yung-Yu Chuang</li> <li>Jia-Bin Huang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 11.04.2021 |
NeX | View synthesis based on enhancements of multiplane image that can reproduce NeXt-level view-dependent effects in real time | <ul><li>Suttisak Wizadwongsa</li> <li>Pakkapon Phongthawee</li> <li>Jiraphon Yenphraphai</li> <li>Supasorn Suwajanakorn</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data, data</li><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li>vistec</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 25.03.2021 |
Score SDE | Score-Based Generative Modeling through Stochastic Differential Equations | <ul><li>Yang Song</li> <li>Jascha Sohl-Dickstein</li> <li>Diederik Kingma</li> <li>Abhishek Kumar</li><details><summary>others</summary><li>Stefano Ermon</li> <li>Ben Poole</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.03.2021 |
Talking Head Anime from a Single Image | The network takes as input an image of an anime character's face and a desired pose, and it outputs another image of the same character in the given pose | Pramook Khungurn | <ul><li><img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/wiki.svg" alt="wiki" height=20/>, <img src="images/wiki.svg" alt="wiki" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 23.02.2021 |
NFNet | An adaptive gradient clipping technique and a significantly improved class of Normalizer-Free ResNets | <ul><li>Andrew Brock</li> <li>Soham De</li> <li>Samuel L. Smith</li> <li>Karen Simonyan</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 17.02.2021 |
RITM | Simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps | <ul><li>Konstantin Sofiiuk</li> <li>Ilia Petrov</li> <li>Anton Konushin</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/>, <img src="images/pwc.svg" alt="pwc" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.02.2021 |
CLIP | A neural network which efficiently learns visual concepts from natural language supervision | <ul><li>Jong Wook Kim</li> <li>Alec Radford</li> <li>Ilya Sutskever</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li>paper</li><li>project</li><li>slides</li></ul> | ![Open In Colab](images/colab.svg) | 29.01.2021 |
Adversarial Patch | A method to create universal, robust, targeted adversarial image patches in the real world | Tom Brown | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 27.01.2021 |
MSG-Net | Multi-style Generative Network with a novel Inspiration Layer, which retains the functionality of optimization-based approaches and has the fast speed of feed-forward networks | <ul><li>Hang Zhang</li> <li>Kristin Dana</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 25.01.2021 |
f-BRS | Feature backpropagating refinement scheme that solves an optimization problem with respect to auxiliary variables instead of the network inputs, and requires running the forward and backward passes for only a small part of the network | <ul><li>Konstantin Sofiiuk</li> <li>Ilia Petrov</li> <li>Olga Barinova</li> <li>Anton Konushin</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 25.01.2021 |
Neural Style Transfer | Implementation of Neural Style Transfer in Keras 2.0+ | Somshubra Majumdar | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 22.01.2021 |
SkyAR | A vision-based method for video sky replacement and harmonization, which can automatically generate realistic and dramatic sky backgrounds in videos with controllable styles | Zhengxia Zou | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 18.01.2021 |
MusicXML Documentation | The goal of this notebook is to explore one of the Magenta libraries for music | <ul><li>Prakruti Joshi</li> <li>Falak Shah</li> <li>Twisha Naik</li></ul> | <ul><li>magenta</li><li>music theory</li><li>musicXML</li></ul> | ![Open In Colab](images/colab.svg) | 08.01.2021 |
SVG VAE | A Colab demo for the SVG VAE model | Raphael Gontijo Lopes | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li></ul> | ![Open In Colab](images/colab.svg) | 08.01.2021 |
Neural Magic Eye | Learning to See and Understand the Scene Behind an Autostereogram | <ul><li>Zhengxia Zou</li> <li>Tianyang Shi</li> <li>Yi Yuan</li> <li>Zhenwei Shi</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.01.2021 |
FGVC | A method that first extracts and completes motion edges, then uses them to guide piecewise-smooth flow completion with sharp edges | <ul><li>Chen Gao</li> <li>Ayush Saraf</li> <li>Johannes Kopf</li> <li>Jia-Bin Huang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.12.2020 |
VIBE | Video Inference for Body Pose and Shape Estimation, which makes use of an existing large-scale motion capture dataset together with unpaired, in-the-wild, 2D keypoint annotations | <ul><li>Muhammed Kocabas</li> <li>Nikos Athanasiou</li> <li>Michael Black</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/pwc.svg" alt="pwc" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 23.12.2020 |
SeFa | A closed-form approach for unsupervised latent semantic factorization in GANs | <ul><li>Yujun Shen</li> <li>Bolei Zhou</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.12.2020 |
Stylized Neural Painting | An image-to-painting translation method that generates vivid and realistic painting artworks with controllable styles | <ul><li>Zhengxia Zou</li> <li>Tianyang Shi</li> <li>Yi Yuan</li> <li>Zhenwei Shi</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 01.12.2020 |
BiT | Big Transfer: General Visual Representation Learning | <ul><li>Alexander Kolesnikov</li> <li>Lucas Beyer</li> <li>Xiaohua Zhai</li> <li>Joan Puigcerver</li><details><summary>others</summary><li>Jessica Yung</li> <li>Sylvain Gelly</li> <li>Neil Houlsby</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/hf.svg" alt="hf" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 12.11.2020 |
LaSAFT | Latent Source Attentive Frequency Transformation for Conditioned Source Separation | Woosung Choi | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 01.11.2020 |
Lifespan Age Transformation Synthesis | Multi-domain image-to-image generative adversarial network architecture, whose learned latent space models a continuous bi-directional aging process | <ul><li>Roy Or-El</li> <li>Soumyadip Sengupta</li> <li>Ohad Fried</li> <li>Eli Shechtman</li> <li>Ira Kemelmacher-Shlizerman</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.10.2020 |
HiGAN | Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis | <ul><li>Ceyuan Yang</li> <li>Yujun Shen</li> <li>Bolei Zhou</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 14.10.2020 |
InterFaceGAN | Interpreting the Latent Space of GANs for Semantic Face Editing | <ul><li>Yujun Shen</li> <li>Jinjin Gu</li> <li>Xiaoou Tang</li> <li>Bolei Zhou</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 13.10.2020 |
Instance-aware Image Colorization | Novel deep learning framework to achieve instance-aware colorization | Jheng-Wei Su | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 30.08.2020 |
MoCo | Momentum Contrast for unsupervised visual representation learning | <ul><li>Kaiming He</li> <li>Haoqi Fan</li> <li>Yuxin Wu</li> <li>Saining Xie</li> <li>Ross Girshick</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 20.08.2020 |
CAPE | Learning to Dress 3D People in Generative Clothing | <ul><li>Qianli Ma</li> <li>Jinlong Yang</li> <li>Anurag Ranjan</li> <li>Sergi Pujades</li><details><summary>others</summary><li>Gerard Pons-Moll</li> <li>Siyu Tang</li> <li>Michael Black</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li><img src="images/medium.svg" alt="medium" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.08.2020 |
Rewriting a Deep Generative Model | We ask if a deep network can be reprogrammed to follow different rules, by enabling a user to directly change the weights instead of training with a dataset | <ul><li>David Bau</li> <li>Steven Liu</li> <li>Tongzhou Wang</li> <li>Jun-Yan Zhu</li> <li>Antonio Torralba</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 31.07.2020 |
SIREN | Implicit Neural Representations with Periodic Activation Functions | <ul><li>Vincent Sitzmann</li> <li>Julien Martel</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li><li><img src="images/neurips.svg" alt="neurips" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 24.06.2020 |
PIFu | Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization | <ul><li>Ryota Natsume</li> <li>Shunsuke Saito</li> <li>Zeng Huang</li> <li>Angjoo Kanazawa</li> <li>Hao Li</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 17.06.2020 |
3D Photo Inpainting | Method for converting a single RGB-D input image into a 3D photo, i.e., a multi-layer representation for novel view synthesis that contains hallucinated color and depth structures in regions occluded in the original view | <ul><li>Meng-Li Shih</li> <li>Shih-Yang Su</li> <li>Johannes Kopf</li> <li>Jia-Bin Huang</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 04.05.2020 |
Motion Supervised co-part Segmentation | A self-supervised deep learning method for co-part segmentation | <ul><li>Aliaksandr Siarohin</li> <li>Subhankar Roy</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/git.svg" alt="git" height=20/></li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 07.04.2020 |
Onsets and Frames | An automatic music transcription framework with piano and drums models | <ul><li>Curtis Hawthorne</li> <li>Erich Elsen</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>data, data</li></ul> | ![Open In Colab](images/colab.svg) | 02.04.2020 |
BERTScore | An automatic evaluation metric for text generation | Tianyi Zhang | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 05.03.2020 |
Generating Piano Music with Transformer | This Colab notebook lets you play with pretrained Transformer models for piano music generation, based on the Music Transformer | <ul><li>Ian Simon</li> <li>Anna Huang</li> <li>Jesse Engel</li> <li>Curtis Hawthorne</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li></ul> | ![Open In Colab](images/colab.svg) | 16.09.2019 |
HMR | End-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image | <ul><li>Angjoo Kanazawa</li> <li>Michael Black</li> <li>David Jacobs</li> <li>Jitendra Malik</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li><img src="images/docker.svg" alt="docker" height=20/></li><li><img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/>, <img src="images/git.svg" alt="git" height=20/></li><li>project</li><li><img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 15.03.2019 |
GANSynth | This notebook is a demo of GANSynth, which generates audio with Generative Adversarial Networks | Jesse Engel | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/>, <img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>project</li></ul> | ![Open In Colab](images/colab.svg) | 25.02.2019 |
Latent Constraints | Conditional Generation from Unconditional Generative Models | <ul><li>Jesse Engel</li> <li>Matthew Hoffman</li> <li>Adam Roberts</li></ul> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>data</li></ul> | ![Open In Colab](images/colab.svg) | 27.11.2017 |
Performance RNN | This notebook shows you how to generate new performed compositions from a trained model | <ul><li>Ian Simon</li> <li>Sageev Oore</li> <li>Curtis Hawthorne</li></ul> | <ul><li>blog post</li><li>data</li></ul> | ![Open In Colab](images/colab.svg) | 11.07.2017 |
NSynth | This Colab notebook has everything you need to upload your own sounds and use NSynth models to reconstruct and interpolate between them | <ul><li>Jesse Engel</li> <li>Cinjon Resnick</li> <li>Adam Roberts</li> <li>Sander Dieleman</li><details><summary>others</summary><li>Karen Simonyan</li> <li>Mohammad Norouzi</li> <li>Douglas Eck</li></ul></details> | <ul><li><img src="images/arxiv.svg" alt="arxiv" height=20/></li><li>blog post</li><li>data</li><li>tutorial</li><li><img src="images/yt.svg" alt="yt" height=20/>, <img src="images/yt.svg" alt="yt" height=20/></li></ul> | ![Open In Colab](images/colab.svg) | 06.04.2017 |