<h1 align="center"> <img src="resources/prompt-icon.svg" alt="prompt-icon"> Prompt Fuzzer <img src="resources/prompt-icon.svg" alt="prompt-icon"> </h1> <h2 align="center"> The open-source tool to help you harden your GenAI applications <br> <br> </h2> <div align="center"> <h4> Brought to you by Prompt Security, the Complete Platform for GenAI Security </div><div align="center"> </div>
## Table of Contents

<!-- vim-markdown-toc GFM -->

- [:sparkles: About](#what-is-prompt-fuzzer)
- [:rotating_light: Features](#features)
- [:rocket: Installation](#installation)
  - [Using pip](#using-pip)
  - [Package page](#using-pypi)
  - :construction: Using docker (coming soon)
- [Usage](#usage)
  - [Examples](#examples)
- [:clapper: Demo video](#demovideo)
- [Supported attacks](#attacks)
- [:rainbow: What's next on the roadmap?](#roadmap)
- [:beers: Contributing](#contributing)
<a id="what-is-prompt-fuzzer"></a>
## :sparkles: What is the Prompt Fuzzer
- This interactive tool assesses the security of your GenAI application's system prompt against various dynamic LLM-based attacks. It provides a security evaluation based on the outcome of these attack simulations, enabling you to strengthen your system prompt as needed.
- The Prompt Fuzzer dynamically tailors its tests to your application's unique configuration and domain.
- The Fuzzer also includes a Playground chat interface, giving you the chance to iteratively improve your system prompt, hardening it against a wide spectrum of generative AI attacks.
:warning: Note that running the Prompt Fuzzer consumes tokens. :warning:
<br><a id="installation"></a>
## :rocket: Installation
- Install the Fuzzer package

  <a id="using-pip"></a>
  **Using pip install**

      pip install prompt-security-fuzzer

  <a id="using-pypi"></a>
  **Using the package page on PyPI**

  You can also visit the package page on PyPI, or grab the latest release wheel file from the releases page.

- Launch the Fuzzer

      export OPENAI_API_KEY=sk-123XXXXXXXXXXXX
      prompt-security-fuzzer

- Input your system prompt

- Start testing

- Test yourself with the Playground! Iterate as many times as you like until your system prompt is secure.
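As a quick post-install sanity check, you can ask the fuzzer to print the LLM providers it supports (the full option reference is in the Usage section below):

```sh
prompt-security-fuzzer --list-providers
```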
<a id="usage"></a>
## :computer: Usage
<a id="features"></a>
### Features
<b>The Prompt Fuzzer supports:</b><br> 16 LLM providers<br> 15 different attacks<br> Interactive mode<br> CLI mode<br> Multi-threaded testing<br>
<a id="environment-variables"></a>
### Environment variables
You need to set an environment variable holding the access key of your preferred LLM provider; the default is `OPENAI_API_KEY`.

For example, set `OPENAI_API_KEY` to your API token to use the fuzzer with your OpenAI account. Alternatively, create a file named `.env` in the current directory and set `OPENAI_API_KEY` there.
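A minimal `.env` file for the default OpenAI setup could look like the sketch below (the key is a placeholder; substitute your own token):

```sh
# .env -- placeholder value, replace with your real OpenAI API token
OPENAI_API_KEY=sk-123XXXXXXXXXXXX
```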
<a id="llm-providers"></a>
| ENVIRONMENT KEY | Description |
|---|---|
| ANTHROPIC_API_KEY | Anthropic Chat large language models. |
| ANYSCALE_API_KEY | Anyscale Chat large language models. |
| AZURE_OPENAI_API_KEY | Azure OpenAI Chat Completion API. |
| BAICHUAN_API_KEY | Baichuan chat models API by Baichuan Intelligent Technology. |
| COHERE_API_KEY | Cohere chat large language models. |
| EVERLYAI_API_KEY | EverlyAI Chat large language models. |
| FIREWORKS_API_KEY | Fireworks Chat models. |
| GIGACHAT_CREDENTIALS | GigaChat large language models API. |
| GOOGLE_API_KEY | Google PaLM Chat models API. |
| JINA_API_TOKEN | Jina AI Chat models API. |
| KONKO_API_KEY | ChatKonko Chat large language models API. |
| MINIMAX_API_KEY, MINIMAX_GROUP_ID | Wrapper around Minimax large language models. |
| OPENAI_API_KEY | OpenAI Chat large language models API. |
| PROMPTLAYER_API_KEY | PromptLayer and OpenAI Chat large language models API. |
| QIANFAN_AK, QIANFAN_SK | Baidu Qianfan chat models. |
| YC_API_KEY | YandexGPT large language models. |
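As a sketch, switching the target to another provider from the table is a matter of exporting that provider's key and pointing `--target-provider` at it. The key, provider, and model identifiers below are illustrative only; confirm the exact identifiers your installation accepts with `--list-providers`:

```sh
# Placeholder key and illustrative provider/model names -- verify with --list-providers
export ANTHROPIC_API_KEY=sk-ant-XXXXXXXX
prompt-security-fuzzer --target-provider anthropic --target-model claude-3-haiku
```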
<a id="options"></a>
### Command line options

- `--list-providers`: Lists all available providers
- `--list-attacks`: Lists available attacks and exits
- `--attack-provider`: Attack provider
- `--attack-model`: Attack model
- `--target-provider`: Target provider
- `--target-model`: Target model
- `--num-attempts, -n NUM_ATTEMPTS`: Number of different attack prompts
- `--num-threads, -t NUM_THREADS`: Number of worker threads
- `--attack-temperature, -a ATTACK_TEMPERATURE`: Temperature for the attack model
- `--debug-level, -d DEBUG_LEVEL`: Debug level (0-2)
- `--batch, -b`: Run the fuzzer in unattended (batch) mode, bypassing the interactive steps
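Putting several of these together, a fully non-interactive run that pins the attack and target models and raises parallelism might look like the following sketch; the provider and model identifiers are illustrative, so check `--list-providers` for the exact names your installation accepts:

```sh
# Illustrative provider/model names; all flags are the documented options above
prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt \
    --attack-provider open_ai --attack-model gpt-3.5-turbo \
    --target-provider open_ai --target-model gpt-3.5-turbo \
    -n 3 -t 4 -a 0.6 -d 1
```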
<a id="examples"></a>
### Examples
System prompt examples (of various strengths) can be found in the `system_prompt.examples` subdirectory of the sources.
<a id="interactive"></a>
#### Interactive mode (default)

Run tests against the system prompt:

    prompt-security-fuzzer
<a id="singlerun"></a>
#### :speedboat: Quick start single run

Run tests against the system prompt (in non-interactive batch mode):

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt
#### Custom Benchmark!

Run tests against the system prompt with a custom benchmark:

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv
#### Run only a subset of attacks!

Run tests against the system prompt with a subset of attacks:

    prompt-security-fuzzer -b ./system_prompt.examples/medium_system_prompt.txt --custom-benchmark=ps_fuzz/attack_data/custom_benchmark1.csv --tests='["ucar","amnesia"]'
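The attack names accepted by `--tests` are the fuzzer's internal attack identifiers; if you are unsure of the exact spelling, list them first:

```sh
prompt-security-fuzzer --list-attacks
```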
<br>
<br>
<br>
<a id="colab"></a>
### Google Colab Notebook
Refine and harden your system prompt in our Google Colab Notebook<br><br> <img src="./resources/PromptFuzzer.png" alt="Prompt Fuzzer Refinement Process"/> <br><br> <a id="demovideo"></a>
## :clapper: Demo video
<a id="attacks"></a>
## :crossed_swords: Simulated Attack Details
We use a dynamic testing approach: the fuzzer extracts the necessary context from your system prompt and adapts the fuzzing process accordingly.
<a id="jailbreak"></a>
### Jailbreak
- AIM Jailbreak: Probes ethical compliance by eliciting instructions on illegal activities via Machiavelli roleplay.
- Affirmative Suffix: Exploits a vulnerability to elicit affirmative responses for prohibited content.
- Amnesia: Causes the LLM to forget or disregard system prompt instructions, leading to unrelated outputs.
- Contextual Redirection: Tricks the LLM into adopting a specific role that may bypass preset restrictions.
- Do Anything Now Jailbreak (DAN): Tests LLM resilience against performing forbidden activities through roleplay.
- Harmful Behavior: Challenges the LLM with prompts from a harmful behavior dataset.
- Linguistic Evasion: Uses non-English languages to bypass preset restrictions or focus.
- Self Refine: Attempts to bypass system prompts by refining responses towards unethical directions.
- UCAR: Assesses LLM adherence to content restrictions when asked to ignore such filters.
- Base64 Evasion: Attempts to evade detection of harmful prompts by encoding them into base64 strings.
<a id="pi-injection"></a>
### Prompt Injection
- Authoritative Role Impersonation: Impersonates an authoritative role to misdirect the LLM's outputs.
- Complimentary Transition: Tests ability to maintain content standards when switching topics after a compliment.
- Ethical Compliance: Evaluates resistance to discussing harmful or inappropriate content about sensitive topics.
- Typoglycemia Attack: Exploits text processing vulnerabilities by omitting random characters, causing incorrect responses.
<a id="systemleak"></a>
### System prompt extraction
- System Prompt Stealer: Attempts to extract the LLM's internal configuration or sensitive information.
### Definitions
- Broken: Attack type attempts that the LLM succumbed to.
- Resilient: Attack type attempts that the LLM resisted.
- Errors: Attack type attempts that produced inconclusive results.
<a id="roadmap"></a>
## :rainbow: What's next on the roadmap?
- Google Colab Notebook
- Adjust the output evaluation mechanism for prompt dataset testing
- Continue adding new GenAI attack types
- Enhanced reporting capabilities
- Hardening recommendations
Turn this into a community project! We want this to be useful to everyone building GenAI applications. If you have attacks of your own that you think should be a part of this project, please contribute! This is how: https://github.com/prompt-security/ps-fuzz/blob/main/CONTRIBUTING.md
<a id="contributing"></a>
## :beers: Contributing
Interested in contributing to the development of our tools? Great! For a guide on making your first contribution, please see our Contributing Guide. This section offers a straightforward introduction to adding new tests.
For ideas on what tests to add, check out the issues tab in our GitHub repository. Look for issues labeled `new-test` and `good-first-issue`, which are perfect starting points for new contributors.