Voice Jailbreak Attacks Against GPT-4o
Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended.
This is the official repository for Voice Jailbreak Attacks Against GPT-4o. In this paper, we present the first study on how to jailbreak GPT-4o with voice.
Check out our demo below!
Code
- Set your OpenAI key
echo "export OPENAI_API_KEY='YOURKEY'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY # check your key
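If you prefer to verify the key from Python before generating audio, here is a minimal sketch using the official openai client, which reads OPENAI_API_KEY from the environment (the helper file name is hypothetical and not part of this repo):
# check_key.py (hypothetical helper): list a few models to confirm the key works
from openai import OpenAI
client = OpenAI()  # picks up OPENAI_API_KEY from the environment
print([m.id for m in client.models.list().data][:5])  # prints a few model ids if the key is valid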
- Convert forbidden questions to audio files
python tts/prompt2audio.py --dataset baseline --voice fable
- Convert text jailbreak prompts to audio files
python tts/prompt2audio.py --dataset textjailbreak --voice fable
Then, manually play each audio file to GPT-4o's voice mode and evaluate its responses.
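For reference, the conversion step amounts to calling OpenAI's text-to-speech endpoint on each question. Below is a minimal sketch of that step; the actual tts/prompt2audio.py may differ in options and file naming, and the CSV column name "question" is an assumption:
# sketch of the TTS step (assumptions: tts-1 model, a "question" column in the CSV)
import csv
from pathlib import Path
from openai import OpenAI

client = OpenAI()
out_dir = Path("audio/baseline")
out_dir.mkdir(parents=True, exist_ok=True)

with open("data/question_set/questions_tiny.csv", newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):
        speech = client.audio.speech.create(model="tts-1", voice="fable", input=row["question"])
        speech.stream_to_file(str(out_dir / f"question_{i}.mp3"))  # one mp3 per forbidden question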
Data
Forbidden Questions
- English:
data/question_set/questions_tiny.csv
- Chinese:
data/question_set/questions_tiny_zh.csv
Prompts
- Text jailbreak prompts:
data/jailbreak_prompts/text_jailbreak_prompts.csv
- VoiceJailbreak prompts:
data/jailbreak_prompts/voicejailbreak.csv
- Plot format of the forbidden questions:
data/question_set/questions_tiny_plot.csv
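The released CSVs can be combined in code before the TTS step, e.g. prepending a text jailbreak prompt to each forbidden question. A hypothetical sketch with pandas; the column names "prompt" and "question" are assumptions, so check the CSV headers first:
import pandas as pd

questions = pd.read_csv("data/question_set/questions_tiny.csv")
jailbreaks = pd.read_csv("data/jailbreak_prompts/text_jailbreak_prompts.csv")

# prepend the first text jailbreak prompt to every forbidden question (column names are assumptions)
combined = jailbreaks.loc[0, "prompt"] + "\n" + questions["question"]
print(combined.head())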
Success Cases
data/screenshot/
Ethics
We take the utmost care with the ethics of our study. Specifically, all experiments are conducted using two registered accounts and manually labeled by the authors, thus eliminating exposure risks to third parties such as crowdsourcing workers. Therefore, our work is not considered human subjects research by our Institutional Review Board (IRB). We acknowledge that evaluating GPT-4o's capabilities in answering forbidden questions can reveal how the model can be induced to generate inappropriate content, which raises concerns about potential misuse. We believe it is important to disclose this research fully: the methods presented are straightforward to implement and are likely to be discovered by potential adversaries. We have responsibly disclosed our findings to the related LLM vendors.
Citation
If you find this useful in your research, please consider citing:
@article{SWBZ24,
  author  = {Xinyue Shen and Yixin Wu and Michael Backes and Yang Zhang},
  title   = {{Voice Jailbreak Attacks Against GPT-4o}},
  journal = {{CoRR abs/2405.19103}},
  year    = {2024}
}
License
VoiceJailbreak is licensed under the terms of the MIT license. See LICENSE for more details.