# JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models

## Introduction
Welcome to JailbreakZoo, a dedicated repository focused on the jailbreaking of large models (LMs), encompassing both large language models (LLMs) and vision language models (VLMs). This project aims to explore the vulnerabilities, exploit methods, and defense mechanisms associated with these advanced AI models. Our goal is to foster a deeper understanding and awareness of the security aspects surrounding large-scale AI systems.
Our website can be found here.

Our paper can be found [here](https://arxiv.org/abs/2407.01599).
## Timeline

This repository is organized chronologically by publication date.

:fire::fire::fire: <span style="font-size:xx-large;">Latest update: September 01, 2024</span> :fire::fire::fire:
## Contents

- **Jailbreaks of LLMs**: Discover the techniques and case studies related to the jailbreaking of large language models.
- **Defenses of LLMs**: Explore the strategies and methods employed to defend large language models against various types of attacks.
- **Jailbreaks of VLMs**: Learn about the vulnerabilities and jailbreaking approaches specific to vision language models.
- **Defenses of VLMs**: Understand the defense mechanisms designed for vision language models, including the most recent advancements and strategies.
## Contributing
We welcome contributions from the community! Whether you're interested in adding new research, improving existing documentation, or sharing your own jailbreak or defense strategies, your insights are valuable to us. Please check our Contribution Guidelines for more information on how you can get involved.
## License and Citation
This project is available under the MIT License. Please refer to our citation guidelines if you wish to reference our work in your research or publications.
Thank you for visiting JailbreakZoo. We hope this repository serves as a valuable resource in your exploration of large model security.
## Acknowledgement
Special thanks to our notable contributors: Haibo Jin, Leyang Hu, Xinuo Li, Peiyan Zhang, Chonghan Chen, Jun Zhuang, and Haohan Wang.
*Contributors are listed in partial order.
## Reference

```bibtex
@article{jin2024jailbreakzoo,
  title={JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models},
  author={Jin, Haibo and Hu, Leyang and Li, Xinuo and Zhang, Peiyan and Chen, Chonghan and Zhuang, Jun and Wang, Haohan},
  journal={arXiv preprint arXiv:2407.01599},
  year={2024}
}
```