GENERATIVE AI - RED TEAMING

1. AI Red Teaming vs. Traditional Red Teaming & Key Risks

Traditional Red Teaming : Double-blind, stealthy, mimics nation-state attacks.
AI Red Teaming : Typically single-blind, simulates both adversarial and benign users, evolves rapidly.

Key Generative AI Risks:

Fabrication : Confident but incorrect outputs.
Alignment Gaps : Learned behavior diverges from intended goals.
Prompt Injection : Models treat all input as a single instruction stream, making them vulnerable to both direct and indirect attacks.

The Microsoft AI red team, founded in 2018, tests models like Copilot and OpenAI systems pre-release. Their diverse team ensures coverage across safety, abuse, privacy, and adversarial threats.

2. How Generative AI Works

Generative vs. Traditional AI : Generative models produce content; traditional models classify or score inputs.
Model Scale : Large models (e.g., GPT-3, GPT-4o) have billions of parameters; SLMs are smaller, cheaper, and more task-specific.

Training Phases:

Pre-training : Massive datasets teach general patterns—biases and harmful content can be embedded.
Post-training : Adds interactivity and safety instructions—often fragile and easily broken.
Red Teaming : Stress tests before deployment in a break-fix cycle.
App-side Mitigation : Developers layer in custom filters and tests.

How Text Is Processed:

Tokenization : Input is split into tokens, which are turned into vectors (embeddings).
Transformer Architecture : Models learn context across token sequences—powerful but vulnerable to manipulation.