LLM01 — Prompt Injection¶
User prompts alter the intended behavior of the LLM system.
Modality: black-box. Status: covered.
Prompt injection is the canonical LLM application risk: attacker-controlled text (direct, or indirect via a document/web page/tool output the app ingests) overrides the application's intended instructions, making it leak data, take unauthorized actions, or produce attacker-chosen output. In one study, 86% of tested LLM applications were vulnerable.
How LLMSecTest tests it¶
LLMSecTest sends a battery of injection techniques to your application and checks whether the app obeys the injected instruction. Detection uses an attacker-side marker: the attack asks the app to emit a unique sentinel; if the reply contains it, the injection succeeded. This needs nothing from inside your app, so it works black-box against any endpoint.
Techniques in the corpus include:
- Direct instruction override — "Ignore previous instructions and …".
- Forged system delimiter — fake
system:/role markers to impersonate the app's own instructions. - Prefix / compliance priming — coaxing a "Sure, …" prefix that carries the injected payload.
- Role-play jailbreak (DAN-style) and task context switch.
Reading a finding¶
A finding names the technique, the OWASP category (LLM01), a severity, and the evidence — for example "target obeyed the injected instruction and emitted marker: PWNED-LLM01-DELIM", alongside the prompt and the app's response. In SARIF it lands in your code-scanning view mapped to LLM01.
Remediation¶
- Treat all model input as untrusted — including tool outputs, retrieved documents, and prior turns.
- Keep privileged instructions out of reach of user content; don't concatenate user text into the system prompt.
- Constrain outputs and validate them before acting (see LLM05); require explicit, out-of-band authorization for state-changing actions (see LLM06).
- Add a guardrail/classifier layer and re-run LLMSecTest to confirm it closes the finding.
See the OWASP LLM01 entry for the full guidance.