1. AI Red Teaming vs. Traditional Red Teaming & Key Risks
- Traditional Red Teaming : Double-blind, stealthy, mimics nation-state attacks.
- AI Red Teaming : Typically single-blind, simulates both adversarial and benign users, evolves rapidly.
Key Generative AI Risks:
- Fabrication : Confident but incorrect outputs.
- Alignment Gaps : Learned behavior diverges from intended goals.
- Prompt Injection : Models treat all input as a single instruction stream, making them vulnerable to both direct and indirect attacks.
The Microsoft AI red team, founded in 2018, tests models like Copilot and OpenAI systems pre-release. Their diverse team ensures coverage across safety, abuse, privacy, and adversarial threats.
2. How Generative AI Works
- Generative vs. Traditional AI : Generative models produce content; traditional models classify or score inputs.
- Model Scale : Large models (e.g., GPT-3, GPT-4o) have billions of parameters; SLMs are smaller, cheaper, and more task-specific.
Training Phases:
- Pre-training : Massive datasets teach general patterns—biases and harmful content can be embedded.
- Post-training : Adds interactivity and safety instructions—often fragile and easily broken.
- Red Teaming : Stress tests before deployment in a break-fix cycle.
- App-side Mitigation : Developers layer in custom filters and tests.
How Text Is Processed:
- Tokenization : Input is split into tokens, which are turned into vectors (embeddings).
- Transformer Architecture : Models learn context across token sequences—powerful but vulnerable to manipulation.