A Survey of Generative AI for de novo Drug Design
Update: our paper has been accepted for publication in Briefings in Bioinformatics!
Repository for the survey paper "A Survey of Generative AI for de novo Drug Design: New Frontiers in Molecule and Protein Generation".
<p align="center"> Xiangru Tang<sup>1</sup>*, Howard Dai<sup>1</sup>*, Elizabeth Knight<sup>1</sup>*, Yunyang Li<sup>1</sup>, Fang Wu<sup>2</sup>, Tianxiao Li<sup>1</sup>, Mark Gerstein<sup>1</sup> </p> <p align="center"> 1. Yale University; 2. Stanford University<br> (*: Equal Contribution) </p>
Table of Contents
[**] denotes appendix sections.
Cite us
@article{tang2024survey,
title={A survey of generative ai for de novo drug design: new frontiers in molecule and protein generation},
author={Tang, Xiangru and Dai, Howard and Knight, Elizabeth and Wu, Fang and Li, Yunyang and Li, Tianxiao and Gerstein, Mark},
journal={Briefings in Bioinformatics},
volume={25},
number={4},
year={2024},
publisher={Oxford Academic}
}
Overview of Topics
An overview of topics covered in our paper. Sections highlighted in blue can be found in the main text, while purple sections are extended sections found in the appendix.
<p align="center"> <br> <!-- <img src="GenAIOutline_New.png" alt="generative AI for drug design" width="500"> --> <img src="GenAIOutline_New.png" alt="generative AI for drug design"> </p> <!--- # Technical Background TODO: update the photo with a new screenshot (changed RFDiffusion citation) INSERT TABLE W/ TECH PAPERS * **Paper Title** (Model name) Author1, Author2, ... Conference (Year) -->
Molecule
Target-Agnostic Generation
Datasets
-
Quantum chemistry structures and properties of 134 kilo molecules (QM9)
Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, O. Anatole von Lilienfeld
Scientific Data (2014) -
GEOM, energy-annotated molecular conformations for property prediction and molecular generation (GEOM)
Simon Axelrod, Rafael Gómez-Bombarelli
Scientific Data (2022)
Metrics
- Quantifying the chemical beauty of drugs (QED)
G Richard Bickerton, Gaia V Paolini, Jérémy Besnard, Sorel Muresan, Andrew L Hopkins
Nature Chemistry (2012)
Models
-
Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules (CVAE)
Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik
ACS Central Science (2018) -
Grammar Variational Autoencoder (GVAE)
Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato
ICML 2017 -
Syntax-Directed Variational Autoencoder for Structured Data (SD-VAE)
Hanjun Dai, Yingtao Tian, Bo Dai, Steven Skiena, Le Song
ICLR 2018 -
Junction Tree Variational Autoencoder for Molecular Graph Generation (JT-VAE)
Wengong Jin, Regina Barzilay, Tommi Jaakkola
ICML 2018 -
E(n) Equivariant Normalizing Flows (E-NF)
Victor Garcia Satorras, Emiel Hoogeboom, Fabian Fuchs, Ingmar Posner, Max Welling
NeurIPS 2021 -
Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules (G-SchNet)
Niklas Gebauer, Michael Gastegger, Kristof Schütt
NeurIPS 2019 -
Equivariant Diffusion for Molecule Generation in 3D (EDM)
Emiel Hoogeboom, Víctor Garcia Satorras, Clément Vignac, Max Welling
ICML 2022 -
Geometry-Complete Diffusion for 3D Molecule Generation and Optimization (GCDM)
Alex Morehead, Jianlin Cheng
arXiv:2302.04313 (2023) -
MDM: Molecular Diffusion Model for 3D Molecule Generation (MDM)
Lei Huang, Hengtong Zhang, Tingyang Xu, Ka-Chun Wong
AAAI 2023 -
Geometric Latent Diffusion Models for 3D Molecule Generation (GeoLDM)
Minkai Xu, Alexander S Powers, Ron O. Dror, Stefano Ermon, Jure Leskovec
ICML 2023 -
Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation (JODO)
Han Huang, Leilei Sun, Bowen Du, Weifeng Lv
arXiv:2305.12347 (2023) -
MiDi: Mixed Graph and 3D Denoising Diffusion for Molecule Generation (MiDi)
Clement Vignac, Nagham Osman, Laura Toni, Pascal Frossard
arXiv:2302.09048 (2023)
Target-Aware Generation
Datasets
-
Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design (CrossDocked2020)
Paul G. Francoeur, Tomohide Masuda, Jocelyn Sunseri, Andrew Jia, Richard B. Iovanisci, Ian Snyder, David R. Koes
ACS JCIM 2020 -
ZINC20—A Free Ultralarge-Scale Chemical Database for Ligand Discovery (ZINC20)
John J. Irwin, Khanh G. Tang, Jennifer Young, Chinzorig Dandarchuluun, Benjamin R. Wong, Munkhzul Khurelbaatar, Yurii S. Moroz, John Mayfield, Roger A. Sayle
ACS JCIM 2020 -
Binding MOAD (Mother Of All Databases) (Binding MOAD)
Liegi Hu, Mark L. Benson, Richard D. Smith, Michael G. Lerner, Heather A. Carlson
Proteins 2005
Metrics
-
AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading (AutoDock Vina)
Oleg Trott, Arthur J. Olson
JCC 2010 -
Quantifying the chemical beauty of drugs (QED)
G Richard Bickerton, Gaia V Paolini, Jérémy Besnard, Sorel Muresan, Andrew L Hopkins
Nature Chemistry (2012) -
Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions (SAScore)
Peter Ertl, Ansgar Schuffenhauer
Journal of Cheminformatics 2009
Models
-
DrugGPT: A GPT-based Strategy for Designing Potential Ligands Targeting Specific Proteins (DrugGPT)
Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, Suxia Han
bioRxiv (2023) -
Generating 3D Molecular Structures Conditional on a Receptor Binding Site with Deep Generative Models (LiGAN)
Tomohide Masuda, Matthew Ragoza, David Ryan Koes
arXiv:2010.14442 (2020) -
Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets (Pocket2Mol)
Xingang Peng, Shitong Luo, Jiaqi Guan, Qi Xie, Jian Peng, Jianzhu Ma
ICML 2022 -
A 3D Generative Model for Structure-Based Drug Design
Shitong Luo, Jiaqi Guan, Jianzhu Ma, Jian Peng
NeurIPS 2021 -
3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction (TargetDiff)
Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, Jianzhu Ma
ICLR 2023 -
Structure-based Drug Design with Equivariant Diffusion Models (DiffSBDD)
Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia
arXiv:2210.13695 (2022)
Conformation Generation (appendix)
Datasets
-
GEOM, energy-annotated molecular conformations for property prediction and molecular generation (GEOM)
Simon Axelrod, Rafael Gómez-Bombarelli
Scientific Data 2022 -
SchNet: A continuous-filter convolutional neural network for modeling quantum interactions (ISO17)
Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, Klaus-Robert Müller
NeurIPS 2017
Metrics
- Learning Neural Generative Dynamics for Molecular Conformation Generation (Coverage, Matching)
Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang
ICLR 2021
Models
-
Molecular Geometry Prediction using a Deep Generative Graph Neural Network (CVGAE)
Elman Mansimov, Omar Mahmood, Seokho Kang, Kyunghyun Cho
Scientific Reports 2019 -
A Generative Model for Molecular Distance Geometry (GraphDG)
Gregor N. C. Simm, Jose Miguel Hernandez-Lobato
ICML 2020 -
Learning Neural Generative Dynamics for Molecular Conformation Generation (CGCF)
Minkai Xu, Shitong Luo, Yoshua Bengio, Jian Peng, Jian Tang
ICLR 2021 -
GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles (GeoMol)
Octavian Ganea, Lagnajit Pattanaik, Connor Coley, Regina Barzilay, Klavs Jensen, William Green, Tommi Jaakkola
NeurIPS 2021 -
Learning Gradient Fields for Molecular Conformation Generation (ConfGF)
Chence Shi, Shitong Luo, Minkai Xu, Jian Tang
ICML 2021 -
Predicting Molecular Conformation via Dynamic Graph Score Matching (DGSM)
Shitong Luo, Chence Shi, Minkai Xu, Jian Tang
NeurIPS 2021 -
GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation (GeoDiff)
Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, Jian Tang
ICLR 2022
Protein
Representation Learning (appendix)
Datasets
-
UniProt: the Universal Protein knowledgebase (UniProt)
Rolf Apweiler, Amos Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria J. Martin, Darren A. Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L. Yeh
Nucleic Acids Research 2004 -
OntoProtein: Protein Pretraining With Gene Ontology Embedding (ProteinKG)
Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, Huajun Chen
ICLR 2022 -
The Protein Data Bank (PDB)
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
Nucleic Acids Research 2000 -
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models (AlphaFoldDB)
Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, Sameer Velankar
Nucleic Acids Research 2022 -
Pfam: The protein families database in 2021 (Pfam)
Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A Salazar, Erik L L Sonnhammer, Silvio C E Tosatto, Lisanna Paladin, Shriya Raj, Lorna J Richardson, Robert D Finn, Alex Bateman
Nucleic Acids Research 2021
Models
-
Unified rational protein engineering with sequence-based deep representation learning (UniRep)
Ethan C. Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, George M. Church
Nature Methods 2019 -
ProtTrans: Toward understanding the language of life through self-supervised learning (ProtBERT)
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost
IEEE PAMI 2021 -
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences (ESM-1b)
Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus
PNAS 2021 -
MSA Transformer (MSA Transformer)
Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, Alexander Rives
ICML 2021 -
Retrieved Sequence Augmentation for Protein Representation Learning (RSA)
Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, Lingpeng Kong
bioRxiv (2023) -
OntoProtein: Protein Pretraining With Gene Ontology Embedding (OntoProtein)
Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, Huajun Chen
ICLR 2022 -
Protein Representation Learning via Knowledge Enhanced Primary Structure Modeling (KeAP)
Hong-Yu Zhou, Yunxiang Fu, Zhicheng Zhang, Cheng Bian, Yizhou Yu
bioRxiv (2023) -
Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures (IEConv)
Pedro Hermosilla, Marco Schäfer, Matěj Lang, Gloria Fackelmann, Pere Pau Vázquez, Barbora Kozlíková, Michael Krone, Tobias Ritschel, Timo Ropinski
ICLR 2021 -
Structure-based protein function prediction using graph convolutional networks (DeepFRI)
Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kosciolek, Julia Koehler Leman, Daniel Berenberg, Tommi Vatanen, Chris Chandler, Bryn C. Taylor, Ian M. Fisk, Hera Vlamakis, Ramnik J. Xavier, Rob Knight, Kyunghyun Cho, Richard Bonneau
Nature Communications 2021 -
Protein Representation Learning by Geometric Structure Pretraining (GearNet)
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
arXiv:2203.06125 (2022)
Structure Prediction
Datasets
-
The Protein Data Bank (PDB)
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
Nucleic Acids Research 2000 -
Critical assessment of methods of protein structure prediction (CASP)—Round XIV (CASP14)
Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, John Moult
Proteins 2021 -
Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12 (CAMEO)
Jürgen Haas, Alessandro Barbato, Dario Behringer, Gabriel Studer, Steven Roth, Martino Bertoni, Khaled Mostaguir, Rafal Gumienny, Torsten Schwede
Proteins 2017
Metrics
-
LGA: a method for finding 3D similarities in protein structures (GDT-TS)
Adam Zemla
Nucleic Acids Research 2003 -
Scoring function for automated assessment of protein structure template quality (TM-score)
Yang Zhang, Jeffrey Skolnick
Proteins 2004 -
lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests (lDDT)
Valerio Mariani, Marco Biasini, Alessandro Barbato, Torsten Schwede
Bioinformatics 2013
Models
-
Highly accurate protein structure prediction with AlphaFold (AlphaFold)
John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, Alex Bridgland, Clemens Meyer, Simon A. A. Kohl, Andrew J. Ballard, Andrew Cowie, Bernardino Romera-Paredes, Stanislav Nikolov, Rishub Jain, Jonas Adler, Trevor Back, Stig Petersen, David Reiman, Ellen Clancy, Michal Zielinski, Martin Steinegger, Michalina Pacholska, Tamas Berghammer, Sebastian Bodenstein, David Silver, Oriol Vinyals, Andrew W. Senior, Koray Kavukcuoglu, Pushmeet Kohli, Demis Hassabis
Nature 2021 -
The trRosetta server for fast and accurate protein structure prediction (trRosetta)
Zongyang Du, Hong Su, Wenkai Wang, Lisha Ye, Hong Wei, Zhenling Peng, Ivan Anishchenko, David Baker, Jianyi Yang
Nature Protocols 2021 -
Accurate prediction of protein structures and interactions using a three-track neural network (RoseTTAFold)
Minkyung Baek, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, Qian Cong, Lisa N. Kinch, R. Dustin Schaeffer, Claudia Millán, Hahnbeom Park, Carson Adams, Caleb R. Glassman, Andy DeGiovanni, Jose H. Pereira, Andria V. Rodrigues, Alberdina A. van Dijk, Ana C. Ebrecht, Diederik J. Opperman, Theo Sagmeister, Christoph Buhlheller, Tea Pavkov-Keller, Manoj K. Rathinaswamy, Udit Dalwadi, Calvin K. Yip, John E. Burke, K. Christopher Garcia, Nick V. Grishin, Paul D. Adams, Randy J. Read, David Baker
Science 2021 -
Evolutionary-scale prediction of atomic-level protein structure with a language model (ESMFold)
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
Science 2023 -
EigenFold: Generative Protein Structure Prediction with Diffusion Models (EigenFold)
Bowen Jing, Ezra Erives, Peter Pao-Huang, Gabriele Corso, Bonnie Berger, Tommi Jaakkola
arXiv:2304.02198 (2023)
Sequence Generation
Datasets
-
The Protein Data Bank (PDB)
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
Nucleic Acids Research 2000 -
UniProt: the Universal Protein knowledgebase (UniRef/UniParc)
Rolf Apweiler, Amos Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria J. Martin, Darren A. Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L. Yeh
Nucleic Acids Research 2004 -
CATH: comprehensive structural and functional annotations for genome sequences (CATH)
Ian Sillitoe, Tony E. Lewis, Alison Cuff, Sayoni Das, Paul Ashford, Natalie L. Dawson, Nicholas Furnham, Roman A. Laskowski, David Lee, Jonathan G. Lees, Sonja Lehtinen, Romain A. Studer, Janet Thornton, Christine A. Orengo
Nucleic Acids Research 2015 -
Direct prediction of profiles of sequences compatible to a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles (TS500)
Zhixiu Li, Yuedong Yang, Eshel Faraggi, Jian Zhan, and Yaoqi Zhou
Proteins 2014
Models
-
ProteinVAE: Variational AutoEncoder for Translational Protein Design (ProteinVAE)
Suyue Lyu, Shahin Sowlati-Hashjin, Michael Garton
bioRxiv (2023) -
ProT-VAE: Protein Transformer Variational AutoEncoder for Functional Protein Design (ProT-VAE)
Emre Sevgen, Joshua Moller, Adrian Lange, John Parker, Sean Quigley, Jeff Mayer, Poonam Srivastava, Sitaram Gayatri, David Hosfield, Maria Korshunova, Micha Livne, Michelle Gill, Rama Ranganathan, Anthony B. Costa, Andrew L. Ferguson
bioRxiv (2023) -
Expanding functional protein sequence spaces using generative adversarial networks (ProteinGAN)
Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Irmantas Rokaitis, Jan Zrimec, Simona Poviloniene, Audrius Laurynenas, Sandra Viknander, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist, Aleksej Zelezniak
Nature Machine Intelligence (2021) -
Fast and flexible protein design using deep graph neural networks (ProteinSolver)
Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim
Cell Systems 2020 -
PiFold: Toward effective and efficient protein inverse folding (PiFold)
Zhangyang Gao, Cheng Tan, Stan Z. Li
ICLR 2023 -
Protein sequence design with a learned potential
Namrata Anand, Raphael Eguchi, Irimpan I. Mathews, Carla P. Perez, Alexander Derry, Russ B. Altman, Po-Ssu Huang
Nature Communications 2022 -
Rotamer-free protein sequence design based on deep learning and self-consistency (ABACUS-R)
Yufeng Liu, Lu Zhang, Weilun Wang, Min Zhu, Chenchen Wang, Fudong Li, Jiahai Zhang, Houqiang Li, Quan Chen, Haiyan Liu
Nature Computational Science 2022 -
ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention (ProRefiner)
Xinyi Zhou, Guangyong Chen, Junjie Ye, Ercheng Wang, Jun Zhang, Cong Mao, Zhanwei Li, Jianye Hao, Xingxu Huang, Jin Tang, Pheng Ann Heng
Nature Communications 2023 -
Graphormer supervised de novo protein design method and function validation (GPD)
Junxi Mu, Zhengxin Li, Bo Zhang, Qi Zhang, Jamshed Iqbal, Abdul Wadood, Ting Wei, Yan Feng, Hai-Feng Chen
Briefings in Bioinformatics 2024 -
Learning from Protein Structure with Geometric Vector Perceptrons (GVP-GNN)
Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael John Lamarre Townshend, Ron Dror
ICLR 2021 -
Learning inverse folding from millions of predicted structures (ESM-IF1)
Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, Alexander Rives
ICML 2022 -
Robust deep learning–based protein sequence design using ProteinMPNN (ProteinMPNN)
J Dauparas, I Anishchenko, N Bennett, H Bai, R J Ragotte, L F Milles, B I M Wicky, A Courbet, R J de Haas, N Bethel, P J Y Leung, T F Huddy, S Pellock, D Tischer, F Chan, B Koepnick, H Nguyen, A Kang, B Sankaran, A K Bera, N P King, D Baker
Science 2022
Backbone Design
Datasets
-
The Protein Data Bank (PDB)
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne
Nucleic Acids Research 2000 -
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models (AlphaFoldDB)
Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, Sameer Velankar
Nucleic Acids Research 2022 -
SCOP: A structural classification of proteins database for the investigation of sequences and structures (SCOP)
Alexey G. Murzin, Steven E. Brenner, Tim Hubbard, Cyrus Chothia
JMB 1995 -
SCOPe: improvements to the structural classification of proteins – extended database to facilitate variant interpretation and machine learning (SCOPe)
John-Marc Chandonia, Lindsey Guan, Shiangyi Lin, Changhua Yu, Naomi K Fox, Steven E Brenner
Nucleic Acids Research 2022 -
CATH: comprehensive structural and functional annotations for genome sequences (CATH)
Ian Sillitoe, Tony E. Lewis, Alison Cuff, Sayoni Das, Paul Ashford, Natalie L. Dawson, Nicholas Furnham, Roman A. Laskowski, David Lee, Jonathan G. Lees, Sonja Lehtinen, Romain A. Studer, Janet Thornton, Christine A. Orengo
Nucleic Acids Research 2015
Metrics
- Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem (scTM)
Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, Tommi Jaakkola
ICLR 2023
Models
-
Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem (ProtDiff)
Brian L. Trippe, Jason Yim, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, Tommi Jaakkola
ICLR 2023 -
Protein structure generation via folding diffusion (FoldingDiff)
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, Sarah Alamdari, James Y. Zou, Alex X. Lu, Ava P. Amini
Nature Communications 2024 -
A Latent Diffusion Model for Protein Structure Generation (LatentDiff)
Cong Fu, Keqiang Yan, Limei Wang, Wing Yee Au, Michael McThrow, Tao Komikado, Koji Maruhashi, Kanji Uchino, Xiaoning Qian, Shuiwang Ji
LoG 2023 -
Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds (Genie)
Yeqing Lin, Mohammed AlQuraishi
arXiv:2301.12485 (2023) -
SE(3) diffusion model with application to protein backbone generation (FrameDiff)
Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola
ICML 2023 -
De novo design of protein structure and function with RFdiffusion (RFdiffusion)
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Sergey Ovchinnikov, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker
Nature 2023 -
Protein Language Model Supervised Precise and Efficient Protein Backbone Design Method (GPDL)
Bo Zhang, Kexin Liu, Zhuoqi Zheng, Yunfeiyang Liu, Junxi Mu, Ting Wei, Hai-Feng Chen
bioRxiv (2023) -
Joint Design of Protein Sequence and Structure based on Motifs (GeoPro)
Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li
arXiv:2310.02546 (2023) -
An all-atom protein generative model (Protpardelle)
Alexander E. Chu, Lucy Cheng, Gina El Nesr, Minkai Xu, Po-Ssu Huang
bioRxiv (2023) -
Protein Sequence and Structure Co-Design with Equivariant Translation (ProtSeed)
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang
ICLR 2023
Antibody
Representation Learning (appendix)
Datasets
- Observed Antibody Space: A diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences (OAS)
Tobias H. Olsen, Fergus Boyles, Charlotte M. Deane
Protein Science 2022
Models
-
Antibody Representation Learning for Drug Discovery (BERTTransformer)
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Tristan Bepler, Rajmonda Sulo Caceres
arXiv:2210.02881 (2022) -
Deciphering antibody affinity maturation with language models and weakly supervised learning (AntiBERTy)
Jeffrey A. Ruffolo, Jeffrey J. Gray, Jeremias Sulam
arXiv:2112.07782 (2021) -
Deciphering the language of antibodies using self-supervised learning (AntiBERTa)
Jinwoo Leem, Laura S. Mitchell, James H.R. Farmery, Justin Barton, Jacob D. Galson
Patterns 2022 -
AbLang: an antibody language model for completing antibody sequences (AbLang)
Tobias H Olsen, Iain H Moal, Charlotte M Deane
Bioinformatics Advances 2022 -
Pre-training with A rational approach for antibody (PARA)
Xiangrui Gao, Changling Cao, Lipeng Lai
bioRxiv (2023)
Structure Prediction (appendix)
Datasets
-
SAbDab: the structural antibody database (SAbDab)
James Dunbar, Konrad Krawczyk, Jinwoo Leem, Terry Baker, Angelika Fuchs, Guy Georges, Jiye Shi, Charlotte M. Deane
Nucleic Acids Research 2014 -
RosettaAntibodyDesign (RAbD): A general framework for computational antibody design (RAB)
Jared Adolf-Bryfogle, Oleks Kalyuzhniy, Michael Kubitz, Brian D. Weitzner, Xiaozhen Hu, Yumiko Adachi, William R. Schief, Roland L. Dunbrack, Jr.
PLOS Computational Biology 2018
Metrics
- Improved prediction of antibody VL–VH orientation (OCD)
Nicholas A. Marze, Sergey Lyskov, Jeffrey J. Gray
PEDS 2016
Models
-
tFold-Ab: Fast and Accurate Antibody Structure Prediction without Sequence Homologs (tFold-Ab)
Jiaxiang Wu, Fandi Wu, Biaobin Jiang, Wei Liu, Peilin Zhao
bioRxiv (2022) -
xTrimoABFold: De novo Antibody Structure Prediction without MSA (xTrimoABFold)
Yining Wang, Xumeng Gong, Shaochuan Li, Bing Yang, YiWu Sun, Chuan Shi, Yangang Wang, Cheng Yang, Hui Li, Le Song
arXiv:2212.00735 (2022) -
ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins (ABodyBuilder)
Brennan Abanades, Wing Ki Wong, Fergus Boyles, Guy Georges, Alexander Bujotzek, Charlotte M. Deane
Communications Biology 2023 -
ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation (ABlooper)
Brennan Abanades, Guy Georges, Alexander Bujotzek, Charlotte M Deane
Bioinformatics 2022 -
Geometric potentials from deep learning improve prediction of CDR H3 loop structures (DeepH3)
Jeffrey A Ruffolo, Carlos Guerra, Sai Pooja Mahajan, Jeremias Sulam, Jeffrey J Gray
Bioinformatics 2020 -
Simple End-to-end Deep Learning Model for CDR-H3 Loop Structure Prediction (SimpleDH3)
Natalia Zenkova, Ekaterina Sedykh, Tatiana Shugaeva, Vladislav Strashko, Timofei Ermak, Aleksei Shpilman
arXiv:2111.10656 (2021) -
Antibody structure prediction using interpretable deep learning (DeepAB)
Jeffrey A Ruffolo, Jeremias Sulam, Jeffrey J Gray
Patterns 2021 -
Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies (IgFold)
Jeffrey A Ruffolo, Lee-Shin Chu, Sai Pooja Mahajan, Jeffrey J Gray
Nature Communications 2023
CDR Generation (appendix)
Datasets
-
SAbDab: the structural antibody database (SAbDab)
James Dunbar, Konrad Krawczyk, Jinwoo Leem, Terry Baker, Angelika Fuchs, Guy Georges, Jiye Shi, Charlotte M. Deane
Nucleic Acids Research 2014 -
RosettaAntibodyDesign (RAbD): A general framework for computational antibody design (RAB)
Jared Adolf-Bryfogle, Oleks Kalyuzhniy, Michael Kubitz, Brian D. Weitzner, Xiaozhen Hu, Yumiko Adachi, William R. Schief, Roland L. Dunbrack, Jr.
PLOS Computational Biology 2018 -
SKEMPI 2.0: an updated benchmark of changes in protein-protein binding energy, kinetics and thermodynamics upon mutation (SKEMPI)
Justina Jankauskaite, Brian Jiménez-García, Justas Dapkunas, Juan Fernández-Recio, Iain H Moal
Bioinformatics 2019
Metrics
- Scoring function for automated assessment of protein structure template quality (TM-score)
Yang Zhang, Jeffrey Skolnick
Proteins 2004
Models
-
In silico proof of principle of machine learning-based antibody design at unconstrained scale
Rahmad Akbar, Philippe A. Robert, Cédric R. Weber, Michael Widrich, Robert Frank, Milena Pavlović, Lonneke Scheffer, Maria Chernigovskaya, Igor Snapkov, Andrei Slabodkin, Brij Bhushan Mehta, Enkelejda Miho, Fridtjof Lund-Johansen, Jan Terje Andersen, Sepp Hochreiter, Ingrid Hobæk Haff, Günter Klambauer, Geir Kjetil Sandve, Victor Greiff
mAbs 2022, https://www.tandfonline.com/doi/full/10.1080/19420862.2022.2031482 -
Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design (RefineGNN)
Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola
ICLR 2022 -
Conditional Antibody Design as 3D Equivariant Graph Translation (MEAN)
Xiangzhe Kong, Wenbing Huang, Yang Liu
ICLR 2023 -
Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer (ADesigner)
Cheng Tan, Zhangyang Gao, Lirong Wu, Jun Xia, Jiangbin Zheng, Xihong Yang, Yue Liu, Bozhen Hu, Stan Z. Li
AAAI 2024 -
Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures (DiffAb)
Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma
NeurIPS 2022 -
Deep Learning for Flexible and Site-Specific Protein Docking and Design (DockGPT)
Matt McPartlon, Jinbo Xu
bioRxiv (2023) -
Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement (HERN)
Wengong Jin, Regina Barzilay, Tommi Jaakkola
ICML 2022 -
End-to-End Full-Atom Antibody Design (dyMEAN)
Xiangzhe Kong, Wenbing Huang, Yang Liu
ICML 2023
Peptide
Misc. Tasks
Models
-
A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation (MMCD)
Yongkang Wang, Xuan Liu, Feng Huang, Zhankun Xiong, Wen Zhang
AAAI 2024 -
PepGB: Facilitating peptide drug discovery via graph neural networks (PepGB)
Yipin Lei, Xu Wang, Meng Fang, Han Li, Xiang Li, Jianyang Zeng
arXiv:2401.14665 (2024) -
PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding (PepHarmony)
Ruochi Zhang, Haoran Wu, Chang Liu, Huaping Li, Yuqian Wu, Kewei Li, Yifan Wang, Yifan Deng, Jiahui Chen, Fengfeng Zhou, Xin Gao
arXiv:2401.11360 (2024) -
PEFT-SP: Parameter-Efficient Fine-Tuning on Large Protein Language Models Improves Signal Peptide Prediction (PEFT-SP)
Shuai Zeng, Duolin Wang, Dong Xu
bioRxiv (2023) -
AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information (AdaNovo)
Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li
arXiv:2403.07013 (2024)