Awesome-ML-Security
A curated list of awesome machine learning security references, guidance, tools, and more.
Relevant work, standards, literature
CIA of the model
Membership inference attacks, model inversion attacks, model extraction, adversarial perturbation, prompt injection, etc.
- Towards the Science of Security and Privacy in Machine Learning
- SoK: Machine Learning Governance
- Not with a Bug, But with a Sticker: Attacks on Machine Learning Systems and What To Do About Them
- On the Impossible Safety of Large AI Models
Confidentiality
Reconstruction (model inversion; attribute inference; gradient and information leakage), theft of data, membership inference and re-identification of data, model extraction (model theft), property inference (leakage of dataset properties), etc. A minimal membership-inference sketch follows the list below.
- awesome-ml-privacy-attacks
- Privacy Side Channels in Machine Learning Systems
- Beyond Labeling Oracles: What does it mean to steal ML models?
- Text Embeddings Reveal (Almost) As Much As Text
- Language Model Inversion
- Extracting Training Data from ChatGPT
- Recovering the Pre-Fine-Tuning Weights of Generative Models
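
As a concrete illustration of membership inference (referenced above), here is a minimal sketch of the classic loss-thresholding heuristic in plain PyTorch. The `model`, `xs`, `ys`, and the threshold-calibration step are illustrative assumptions, not taken from any of the listed papers.

```python
import torch
import torch.nn.functional as F

def membership_scores(model, xs, ys):
    """Per-example cross-entropy loss of the target model on candidate records."""
    model.eval()
    with torch.no_grad():
        return F.cross_entropy(model(xs), ys, reduction="none")

def guess_membership(model, xs, ys, threshold):
    # Training members tend to have lower loss than unseen records, so predict
    # "member" when the loss falls below a threshold the attacker calibrates on
    # records whose membership status is already known.
    return membership_scores(model, xs, ys) < threshold
```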
Integrity
Backdoors/neural trojans (same as for non-ML systems), adversarial evasion (perturbing an input to evade a particular classification or output), and data poisoning and ordering attacks (supplying malicious data or changing the order in which data flows into an ML model). An adversarial-evasion (FGSM) sketch follows the list below.
- A Systematic Survey of Backdoor Attack, Weight Attack and Adversarial Examples
- Poisoning Web-Scale Training Datasets is Practical
- Planting Undetectable Backdoors in Machine Learning Models
- Motivating the Rules of the Game for Adversarial Example Research
- On Evaluating Adversarial Robustness
- Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- Manipulating SGD with Data Ordering Attacks
- Adversarial reprogramming - repurposing a model for a different task than its original intended purpose
- Model spinning attacks (meta backdoors) - forcing a model to produce output that adheres to a meta task (for ex. making a general LLM produce propaganda)
- LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?
- Securing LLM Systems Against Prompt Injection & Mitigating Stored Prompt Injection Attacks Against LLM Applications
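
As referenced in the section description above, adversarial evasion in its simplest form is a single gradient step on the input. Below is a minimal fast gradient sign method (FGSM) sketch in plain PyTorch; `model`, `x`, `y`, and `epsilon` are assumed inputs, and the example is illustrative rather than any listed paper's implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """One-step adversarial evasion: nudge the input to increase the model's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each input element by +/- epsilon in the direction that increases the
    # loss, then clamp back into the valid input range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
```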
Availability
- Energy-latency attacks - denial of service for neural networks
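
To make the idea concrete, here is a toy sketch of the sponge-example approach behind energy-latency attacks: search for an input that maximizes activation magnitude and density, a rough proxy for the compute and energy consumed on hardware that exploits sparsity. The two-layer network and the norm-based objective are illustrative assumptions only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

acts = []
model[1].register_forward_hook(lambda mod, inp, out: acts.append(out))

x = torch.randn(1, 64, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    acts.clear()
    opt.zero_grad()
    model(x)
    # Maximize the post-ReLU activation norm: larger, denser activations mean
    # more work (and energy) on accelerators that skip zeros.
    (-acts[0].norm()).backward()
    opt.step()

print("fraction of non-zero ReLU activations:", (acts[0] > 0).float().mean().item())
```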
Degraded model performance
- Trail of Bits's Audit of YOLOv7
- Robustness Testing of Autonomy Software
- Can robot navigation bugs be found in simulation? An exploratory study
- Bugs can optimize for bad behavior (OpenAI GPT-2)
- You Only Look Once runtime errors
ML-Ops
- Incubated ML Exploits: Backdooring ML Pipelines using Input-Handling Bugs
- Auditing the Ask Astro LLM Q&A app
- Exploiting ML models with pickle file attacks: Part 1 & Exploiting ML models with pickle file attacks: Part 2
- PCC: Bold step forward, not without flaws
- Trail of Bits's Audit of the Safetensors Library
- Facebook’s LLaMA being openly distributed via torrents
- Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning
- DeepPayload: Black-box Backdoor Attack on Deep Learning Models through Neural Payload Injection
- Weaponizing Machine Learning Models with Ransomware (and Machine Learning Threat Roundup)
- Bug Characterization in Machine Learning-based Systems
- LeftoverLocals: Listening to LLM responses through leaked GPU local memory
- Offensive ML Playbook
AI’s effect on attacks/security elsewhere
- How AI will affect cybersecurity: What we told the CFTC
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants
- Examining Zero-Shot Vulnerability Repair with Large Language Models
- Do Users Write More Insecure Code with AI Assistants?
- Learned Systems Security
- Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning-Based Malware Detection
- Data-Driven Offense from Infiltrate 2015
- Codex (and GPT-4) can’t beat humans on smart contract audits
Self-driving cars
LLM Alignment
Regulatory actions
US
- FTC: Keep your AI claims in check
- FAA - Unmanned Aircraft Systems (UAS)
- NHTSA - Automated Vehicle Safety
- Blueprint for an AI Bill of Rights
- Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence
EU
- The Artificial Intelligence Act (proposed)
Other
- TIME Ideas: How AI Can Be Regulated Like Nuclear Energy
- Trail of Bits’s Response to OSTP National Priorities for AI RFI
- Trail of Bits’s Response to NTIA AI Accountability RFC
Safety standards
- Toward Comprehensive Risk Assessments and Assurance of AI-Based Systems
- ISO/IEC 42001 — Artificial intelligence — Management system
- ISO/IEC 22989 — Artificial intelligence — Concepts and terminology
- ISO/IEC 38507 — Governance of IT — Governance implications of the use of artificial intelligence by organizations
- ISO/IEC 23894 — Artificial Intelligence — Guidance on Risk Management
- ANSI/UL 4600 Standard for Safety for the Evaluation of Autonomous Products — addresses fully autonomous systems that move, such as self-driving cars and other vehicles, including lightweight unmanned aerial vehicles (UAVs). Covers safety case construction, risk analysis, design process, verification and validation, tool qualification, data integrity, human-machine interaction, metrics, and conformance assessment.
- European Commission High-Level Expert Group on AI — Ethics Guidelines for Trustworthy Artificial Intelligence
Taxonomies and frameworks
- NIST AI 100-2 E2023, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
- MITRE ATLAS
- AI Incident Database
- OWASP Top 10 for LLMs
- Guidelines for secure AI system development
Security tools and techniques
API probing
- PrivacyRaven: runs a variety of privacy attacks against ML models; limited to black-box, label-only attacks (a generic sketch of this setting follows the list)
- Counterfit: runs a variety of adversarial ML attacks against ML models
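
For context on what black-box, label-only probing looks like in practice, here is a generic sketch (not PrivacyRaven's or Counterfit's actual APIs, which differ): send attacker-chosen inputs to a prediction endpoint, keep only the returned labels, and fit a local surrogate on the query/label pairs. `query_api` and all parameters are hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_surrogate(query_api, n_queries=1000, n_features=20):
    """Label-only model extraction: train a local stand-in from the victim's answers.

    query_api(X) stands in for the victim's prediction endpoint and is assumed
    to return hard labels only (no confidence scores or gradients).
    """
    X = np.random.randn(n_queries, n_features)  # attacker-chosen probe inputs
    y = query_api(X)                            # black-box, label-only responses
    return LogisticRegression(max_iter=1000).fit(X, y)
```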
Model backdoors
- Fickling: a decompiler, static analyzer, and bytecode rewriter for Python pickle files; can inject backdoors into ML model files (see the pickle sketch after this list)
- Semgrep rules for ML
- API Rate Limiting
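
To make the pickle risk concrete: Python pickle files can execute arbitrary code at load time via the `__reduce__` hook, which is exactly the attack class Fickling analyzes and can inject. The snippet below is a generic, self-contained illustration and does not use Fickling's interface.

```python
import os
import pickle

class MaliciousPayload:
    def __reduce__(self):
        # Unpickling reconstructs this object by calling os.system("echo pwned").
        return (os.system, ("echo pwned",))

# An attacker ships this blob as "model weights"...
blob = pickle.dumps(MaliciousPayload())

# ...and the victim runs the payload simply by loading the file.
pickle.loads(blob)  # prints "pwned"
```

Weights-only formats such as safetensors (whose audit is listed above) avoid this class of issue by storing tensor data without executable serialization opcodes.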
Other
- Awesome Large Language Model Tools for Cybersecurity Research
- Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Background information
- Building A Generative AI Platform (Chip Huyen)
- Machine Learning Glossary | Google Developers
- Hugging Face NLP course
- Making Large Language Models work for you
- Andrej Karpathy's Intro to Large Language Models and Neural Networks: Zero to Hero
- Normcore LLM Reading List, especially Building LLM applications for production
- 3blue1brown's Guide to Neural Networks
- Licensing:
DeepFakes, disinformation, and abuse
- How to Prepare for the Deluge of Generative AI on Social Media
- Generative ML and CSAM: Implications and Mitigations
Notable incidents
Notable harms
| Incident | Type | Loss |
|---|---|---|
| Google Photos Gorillas | Algorithmic bias | Reputational |
| Uber hits a pedestrian | Model failure | |
| Facebook mistranslation leads to arrest | Algorithmic bias | |