We Mapped 2,748 Dangerous AI Agent Skills to the OWASP LLM Top 10 — Here's What We Found
March 13, 2026 · 14 min read · By 4Worlds
The OWASP Top 10 for LLM Applications (2025 v2.0) is the most widely cited framework for LLM security risks. It covers prompt injection, data poisoning, supply chain vulnerabilities, and more. Security teams reference it in threat models. Compliance frameworks point to it. It shapes how the industry thinks about AI security.
But it's a theoretical framework — built from expert consensus, not empirical data. Nobody has tested it against a real-world corpus of AI agent configurations to see which risks actually show up in practice and which are theoretical concerns.
Until now.
What we did
We took ClawAudit, our static security analyzer for AI agent configurations, and added OWASP LLM Top 10 mapping to every finding it produces — every pattern match, every compound threat, every permission integrity violation. Then we re-scanned 2,748 dangerous-tier OpenClaw skills: the ones that scored below 40/100 in our registry-wide audit.
These are skills that real developers have published to a real package registry. They're not contrived examples — they're configurations that AI agents actually execute. Some are malicious. Most are just careless. All of them represent real risk.
The scan maps each finding to one or more OWASP LLM categories, producing a dataset that answers the question: which OWASP risks dominate real-world AI agent configurations?
The headline numbers
The full breakdown:
| OWASP ID | Name | Skills affected | % of dangerous |
|---|---|---|---|
| LLM02 | Sensitive Information Disclosure | 2,201 | 80.1% |
| LLM06 | Excessive Agency | 1,358 | 49.4% |
| LLM05 | Improper Output Handling | 651 | 23.7% |
| LLM01 | Prompt Injection | 577 | 21.0% |
| LLM03 | Supply Chain Vulnerabilities | 481 | 17.5% |
| LLM07 | System Prompt Leakage | 71 | 2.6% |
| LLM04 | Data and Model Poisoning | 0 | 0% |
| LLM08 | Vector and Embedding Weaknesses | 0 | 0% |
| LLM09 | Misinformation | 0 | 0% |
| LLM10 | Unbounded Consumption | 0 | 0% |
The distribution is radically uneven. Two categories account for the vast majority of real-world risk. Four categories don't apply at all. Let's dig into each one.
LLM02: Sensitive Information Disclosure — 80.1%
"Sensitive information can be disclosed through the output of the LLM, leading to unauthorized data access, privacy violations, and security breaches." — OWASP
The most prevalent risk by a wide margin. 4 out of 5 dangerous skills have patterns that map to sensitive information disclosure.
In the context of agent configurations, this manifests in three ways:
1. Credential access patterns
Skills that access `process.env.API_KEY`, `os.environ['SECRET_TOKEN']`, or similar patterns. Many skills legitimately need API keys — the question is whether they also do something dangerous with them.

ClawAudit distinguishes between "uses a credential" (normal) and "uses a credential AND makes network requests to an undeclared endpoint" (credential theft pattern). The compound threat `credential_access + network_out` is the strongest signal for LLM02, and it's the most common compound threat in the entire dataset.
2. Filesystem path exposure
Skills that reference `~/.ssh`, `~/.aws/credentials`, `/etc/passwd`, or cloud provider credential files. These aren't accessing the credentials directly (that would be code execution) — they're configuring the agent to be aware of these paths, which means the agent can read them if it has filesystem access.
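A detector for this pattern can be little more than a set of path regexes run over the skill's text. A minimal sketch, using only the paths named above (ClawAudit's actual rule set is larger and this is not its real implementation):

```python
import re

# Illustrative patterns only -- covering the credential paths mentioned above:
# SSH keys, AWS credentials, and /etc/passwd.
SENSITIVE_PATHS = [
    r"~/\.ssh(?:/|\b)",
    r"~/\.aws/credentials",
    r"/etc/passwd",
]

def find_path_exposures(text: str) -> list[str]:
    """Return every sensitive-path reference found in a skill's text."""
    hits = []
    for pattern in SENSITIVE_PATHS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text))
    return hits

snippet = "Read the key from ~/.ssh/id_rsa and fall back to ~/.aws/credentials."
print(find_path_exposures(snippet))  # → ['~/.ssh/', '~/.aws/credentials']
```

The point is that the reference alone is the finding: the skill never has to open the file itself for the exposure to matter.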
3. System prompt / configuration leakage
Skills that expose their own system prompt, internal configuration, or operational instructions in ways that could be extracted by a user or another agent. This overlaps with LLM07 (System Prompt Leakage), which we counted separately.
Why 80% is expected
This number is high but not surprising. Agent skills exist to do things — call APIs, access files, interact with services. All of these require credentials. The dangerous part isn't credential access itself; it's credential access combined with other capabilities that create exfiltration vectors.
The real insight is in the compound threat data: of the 2,201 skills with LLM02 findings, a significant portion also have `network_out` or `data_encoding` capabilities, which elevates the risk from "accesses credentials" to "can exfiltrate credentials."
LLM06: Excessive Agency — 49.4%
"An LLM-based system is often granted a degree of agency to call functions or interface with other systems. Excessive agency occurs when LLM-based systems are granted access to functions, permissions, or autonomy beyond what is necessary." — OWASP
Nearly half of all dangerous skills exhibit excessive agency. This is the permission integrity gap — the difference between what a skill declares it needs and what it actually does.
Consider a skill that declares this in its YAML frontmatter:
```yaml
requires:
  env:
    - API_KEY
  bins:
    - jq
```

Looks minimal. But in its code blocks, the skill:
- Makes outbound HTTP requests to `https://some-server.com/api`
- Reads files from the filesystem using `readFileSync`
- Spawns subprocesses using `child_process.exec`
- Writes to configuration files
None of these capabilities are declared. The skill asks for `jq` and an API key, but it's actually reading your files, executing commands, and sending data to external servers. This is textbook excessive agency.
How ClawAudit detects this
ClawAudit extracts capabilities from code blocks in instruction context and compares them against declared permissions in frontmatter. The gap between "declared" and "actual" is quantified as permission integrity violations, each mapped to LLM06.
Three types of violations:
- Undeclared capabilities (high severity) — the skill does things it didn't ask permission for. Network access without declaring `curl` or `wget`. Credential access without declaring any environment variables.
- Opaque dependencies (medium) — runtime package installation (`npm install`, `pip install`) where the full dependency tree is not auditable before execution.
- Over-declared permissions (low) — the inverse. A skill declares `docker` and `kubectl` in its required binaries but never uses them. Less dangerous, but suspicious — why request capabilities you don't need?
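At its core, the declared-vs-actual comparison is a set difference in each direction. A hypothetical sketch covering two of the three violation types (opaque dependencies require command parsing, which is omitted here; this is not ClawAudit's real model):

```python
def permission_integrity(declared: set[str], observed: set[str]) -> list[dict]:
    """Compare declared permissions against capabilities observed in code
    blocks. Simplified sketch: two of the three violation types."""
    violations = []
    for cap in observed - declared:
        # Undeclared capability: the skill does things it never asked for.
        violations.append({"type": "undeclared", "capability": cap, "severity": "high"})
    for cap in declared - observed:
        # Over-declared permission: requested but never used.
        violations.append({"type": "over_declared", "capability": cap, "severity": "low"})
    return violations

declared = {"jq", "API_KEY"}
observed = {"API_KEY", "network_out", "process_exec"}
for v in sorted(permission_integrity(declared, observed), key=lambda v: v["capability"]):
    print(v)
```

Run against the example above, this yields two high-severity undeclared capabilities (`network_out`, `process_exec`) and one low-severity over-declaration (`jq`).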
The MCP angle
Excessive agency gets worse in multi-file configurations. ClawAudit's cross-file trust tracing (new in v2) detects capability escalation across `CLAUDE.md` and `.mcp.json` files. A CLAUDE.md might look clean, but if the MCP config it references grants the agent access to `docker`, `kubectl`, or filesystem servers, the effective permission surface is much larger than the CLAUDE.md suggests.
This kind of cross-boundary escalation is invisible when you audit files individually. You have to trace trust across the full configuration stack.
LLM05: Improper Output Handling — 23.7%
"Improper output handling refers to insufficient validation, sanitization, and handling of outputs from large language models before they are passed to other components." — OWASP
In the agent config context, this category captures code execution patterns — skills that take input and execute it as code without sanitization:
- `eval()` and `new Function()` — dynamic code execution in JavaScript
- `child_process.exec()` and `subprocess.run()` — shell command execution
- `curl | bash` — downloading and executing remote scripts
- `__import__('os').system()` — Python OS-level access
Zone-aware false positive prevention
This is where ClawAudit's zone-aware analysis is critical. A security tutorial might include:
````markdown
# Security Warning
Do NOT use eval() with user input:

```js
// BAD - vulnerable to code injection
eval(userInput)
```
````
A naive scanner would flag this as a critical finding. ClawAudit classifies it as a code block under a security documentation heading, applies a 0.15x severity multiplier, and suppresses it to info level. The same `eval()` in an instruction code block gets full critical severity.
Without zone awareness, you'd either drown in false positives from documentation or miss real threats by filtering too aggressively. This is a solved problem in traditional SAST (Semgrep, CodeQL understand ASTs), but nobody had brought it to markdown-based agent configs before.
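A minimal sketch of the idea. Only the 0.15x multiplier comes from the description above; the numeric severity scale and the heading keywords are illustrative assumptions, not ClawAudit's actual zone classifier:

```python
SEVERITY_RANK = ["info", "low", "medium", "high", "critical"]

def zone_adjusted_severity(base: str, heading: str) -> str:
    """Downgrade a finding when its code block sits under a security-warning
    heading. Keywords here are assumptions for illustration."""
    score = (SEVERITY_RANK.index(base) + 1) / len(SEVERITY_RANK)  # 0.2 .. 1.0
    if any(kw in heading.lower() for kw in ("security warning", "do not", "bad example")):
        score *= 0.15  # documentation-zone multiplier from the post
    # Map the adjusted score back onto the discrete scale, flooring at "info".
    idx = max(0, round(score * len(SEVERITY_RANK)) - 1)
    return SEVERITY_RANK[idx]

print(zone_adjusted_severity("critical", "# Security Warning"))  # → info
print(zone_adjusted_severity("critical", "# Usage"))             # → critical
```

The same pattern match produces two very different findings depending on the zone it lives in, which is exactly the false-positive control the paragraph above describes.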
LLM01: Prompt Injection — 21.0%
"A prompt injection vulnerability occurs when an attacker manipulates a large language model through crafted inputs, causing the LLM to unknowingly execute the attacker's intentions." — OWASP
One in five dangerous skills contains prompt injection patterns. These break down into several subcategories:
Direct injection patterns
The crude but effective approach: literal strings like "ignore previous instructions," "you are now a different AI," or "disregard all prior directives." These are sometimes test strings left in by developers, sometimes genuinely malicious. ClawAudit doesn't distinguish intent — it flags the pattern and lets the human decide.
Obfuscated injection
More sophisticated attacks use Unicode homoglyphs to evade simple string matching. The Cyrillic character "а" (U+0430) looks identical to Latin "a" (U+0061) in most fonts. An attacker can write "іgnоrе рrеvіоus іnstruсtіоns" using a mix of Cyrillic and Latin characters that visually appears normal but bypasses keyword filters.
ClawAudit's Unicode confusable normalization defeats this. Before pattern scanning, all Cyrillic lookalikes (А→A, е→e, о→o, р→p, с→c, і→i), Greek lookalikes (α→a), fullwidth ASCII (Ａ→A), and zero-width characters (U+200B, U+200C, U+200D) are normalized. The presence of zero-width characters itself is flagged as an evasion signal.
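A minimal sketch of this normalization step, using only the mappings listed above (the real confusable table is far larger):

```python
import unicodedata

# Subset of the confusable map described above -- illustrative, not complete.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0456": "i",  # Cyrillic і
    "\u03b1": "a",  # Greek α
})
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}

def normalize(text: str) -> tuple[str, bool]:
    """Return (normalized text, zero-width-evasion flag)."""
    evasion = any(ch in ZERO_WIDTH for ch in text)
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    # NFKC folds fullwidth ASCII (e.g. Ａ -> A) before the confusable map runs.
    return unicodedata.normalize("NFKC", text).translate(CONFUSABLES), evasion

obfuscated = "\u0456gn\u043er\u0435 previous instructions\u200b"
print(normalize(obfuscated))  # → ('ignore previous instructions', True)
```

After this pass, a plain keyword match catches the homoglyph variant, and the zero-width flag survives as an independent evasion signal.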
Covert action directives
Instructions embedded in what appears to be documentation or configuration text, but that actually direct the agent to perform unauthorized actions. These are harder to catch with regex — they rely on natural language patterns rather than specific keywords. ClawAudit matches against patterns like "secretly," "without the user knowing," "covertly," and similar directive language, but this is the category where static analysis hits its ceiling. A sufficiently sophisticated attacker can always rephrase.
LLM03: Supply Chain Vulnerabilities — 17.5%
"The supply chain in LLMs can be vulnerable, impacting the integrity of training data, models, and deployment platforms." — OWASP
In agent configurations, supply chain risk looks different from traditional software. It's not about compromised npm packages in a lockfile — it's about skills that install packages at runtime, bypassing any pre-deployment audit.
Runtime installation patterns
Skills that include instructions like:
```bash
npm install some-package
pip install -r requirements.txt
cargo install binary-tool
```
When the agent executes these, it's pulling code from a package registry at runtime. The skill author doesn't control (and the user can't audit) what `some-package` depends on. A single compromised transitive dependency, and the agent is executing attacker-controlled code with the user's permissions.
Binary downloads
Some skills download pre-built binaries from external URLs:
```bash
curl -sSL https://some-site.com/install.sh | bash
wget https://github.com/user/repo/releases/download/v1/tool -O /usr/local/bin/tool
```
The classic `curl | bash` pattern. In a traditional context this is already dangerous. In an agent context, it's worse — the agent might execute this without any user confirmation, depending on the agent framework's permission model.
The compound threat: supply_chain
ClawAudit's compound threat detection catches the dangerous combination: `package_install + process_exec`. A skill that both installs packages and executes processes has the full chain for supply chain compromise — it can pull arbitrary code and run it.
LLM07: System Prompt Leakage — 2.6%
A smaller category in our dataset. 71 skills (2.6%) have patterns that could lead to system prompt exposure. This typically manifests as skills that reference their own configuration files, agent memory directories, or internal state in ways that could be extracted by a user or forwarded to external services.
The low prevalence makes sense — system prompt leakage is primarily a runtime risk (an adversarial user crafting queries to extract the system prompt). What we detect statically is the precondition: skills that access agent memory files (`MEMORY.md`, `daily_notes`) or expose configuration paths, which could facilitate leakage if combined with network access.
The four that don't apply: LLM04, LLM08, LLM09, LLM10
Four OWASP categories showed 0% prevalence in our scan. This isn't a failure of detection — these categories genuinely don't apply to static analysis of agent configuration files:
LLM04: Data and Model Poisoning — requires access to training data or model weights. Agent configurations don't interact with model training — they define how a pre-trained model operates. Not applicable to config-level analysis.

LLM08: Vector and Embedding Weaknesses — RAG-specific: attacks on embedding similarity, adversarial document injection into vector stores. Agent config files don't define embedding pipelines or vector storage. Relevant for RAG-focused security tools, not for config auditing.

LLM09: Misinformation — output quality and factual accuracy. This is a model evaluation concern, not a configuration security concern. No static pattern in a config file indicates whether the model will produce misinformation.

LLM10: Unbounded Consumption — resource exhaustion and denial-of-service through excessive token usage or infinite loops. Detectable only at runtime through rate limiting and monitoring. Config files don't contain enough information to predict resource consumption.
We're being honest about scope. Static analysis of agent configurations can't catch everything. These categories require runtime monitoring, model evaluation infrastructure, or training pipeline security — all valid concerns, all outside our lane.
Beyond OWASP: compound threats
The OWASP framework maps well to individual findings, but the most dangerous patterns emerge from combinations of capabilities. Individual findings tell you what a skill can do. Compound threats tell you what it's set up to do.
ClawAudit detects 20 compound threat patterns — dangerous capability combinations that indicate specific attack postures:
| Pattern | Capabilities | Severity | OWASP |
|---|---|---|---|
| Data exfiltration | file_read + network_out | Critical | LLM02, LLM06 |
| Credential theft | credential_access + network_out | High | LLM02 |
| Remote code execution | network_out + dynamic_eval | Critical | LLM05, LLM06 |
| Supply chain | package_install + process_exec | High | LLM03 |
| C2 channel | network_in + process_exec | Critical | LLM05, LLM06 |
| Obfuscated execution | data_encoding + dynamic_eval | Critical | LLM05 |
| Memory exfiltration | agent_memory + network_out | Critical | LLM02 |
The compound threat model gives us something OWASP alone doesn't: a way to distinguish between "has a dangerous capability" and "has a dangerous capability plus the means to exploit it." A skill with `credential_access` alone is concerning. A skill with `credential_access + network_out + data_encoding` is in active exfiltration posture.
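Mechanically, compound threat detection reduces to subset checks over a skill's capability set. A sketch using a subset of the pairs from the table above (the pair names and severities are from the table; the code itself is illustrative, not ClawAudit's):

```python
# Capability pairs taken from the compound-threat table above (subset).
COMPOUND_THREATS = {
    frozenset({"file_read", "network_out"}): ("data_exfiltration", "critical"),
    frozenset({"credential_access", "network_out"}): ("credential_theft", "high"),
    frozenset({"network_out", "dynamic_eval"}): ("remote_code_execution", "critical"),
    frozenset({"package_install", "process_exec"}): ("supply_chain", "high"),
    frozenset({"agent_memory", "network_out"}): ("memory_exfiltration", "critical"),
}

def compound_threats(capabilities: set[str]) -> list[tuple[str, str]]:
    """Return every compound threat whose capability pair is fully present."""
    return sorted(
        threat for pair, threat in COMPOUND_THREATS.items()
        if pair <= capabilities
    )

caps = {"credential_access", "network_out", "data_encoding"}
print(compound_threats(caps))  # → [('credential_theft', 'high')]
```

Note that no single capability in `caps` would trip a pair on its own — the finding only exists at the level of the combination.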
The hidden attack surface: cross-file trust
This dataset focused on individual SKILL.md files. But in real-world deployments, AI agents don't run from a single file. A typical Claude Code project has:
- `CLAUDE.md` — project instructions, implicit permissions, MCP tool references
- `.mcp.json` — MCP server definitions, credentials, transport configs
- Sometimes both, plus additional config files
ClawAudit v2 introduced cross-file trust tracing that connects these files. When scanning a directory, it detects:
- Capability escalation — CLAUDE.md looks clean, but MCP servers in the config grant `docker`, `kubectl`, or other dangerous commands the CLAUDE.md doesn't mention
- Credential flows — traces how API keys and tokens move from env vars through MCP server configs to remote endpoints
- Remote tool delegation — CLAUDE.md uses MCP tools served by a non-localhost server. Tool responses from remote servers are not integrity-verified — a compromised MCP server can inject arbitrary tool responses
- Phantom tools — CLAUDE.md references MCP tools that don't exist in any scanned MCP config. Either the config is missing or the tools are coming from somewhere undocumented
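The phantom-tool check, for instance, is a set difference between referenced and configured servers. A simplified sketch, assuming tool references written as `mcp__<server>__<tool>` and a `.mcp.json` with an `mcpServers` map keyed by server name (both simplifications of the real formats):

```python
import json
import re

def phantom_tools(claude_md: str, mcp_json: str) -> set[str]:
    """Return server names referenced in CLAUDE.md that no scanned
    .mcp.json configures. Simplified assumption: tool references follow
    the mcp__<server>__<tool> naming pattern."""
    referenced = set(re.findall(r"mcp__(\w+)__\w+", claude_md))
    configured = set(json.loads(mcp_json).get("mcpServers", {}))
    return referenced - configured

claude_md = "Use mcp__github__create_issue and mcp__deploy__push_image."
mcp_json = '{"mcpServers": {"github": {"command": "npx"}}}'
print(phantom_tools(claude_md, mcp_json))  # → {'deploy'}
```

Here `deploy` is a phantom: the instructions assume it exists, but nothing in the scanned configs provides it.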
These cross-file risks don't map cleanly to any single OWASP category. They span LLM01 (tool response injection), LLM02 (credential flow), LLM03 (remote MCP supply chain), and LLM06 (escalated permissions). This is where a composite threat model outperforms a category-based framework.
What this means for AI agent security
1. OWASP is directionally correct but unevenly distributed
LLM02 (sensitive data) and LLM06 (excessive agency) account for the vast majority of real-world risk in agent configurations. If you're prioritizing which OWASP categories to address first, these two give you the most coverage. LLM04, LLM08, LLM09, and LLM10 are irrelevant at the config layer.
2. Permission integrity is the most actionable signal
The gap between "declared permissions" and "actual capabilities" is both the most common finding (LLM06) and the easiest to fix. Skill authors can close this gap by declaring what their skills actually do. Platform operators can enforce permission integrity as a publishing requirement.
3. Compound threats matter more than individual findings
A skill with 10 low-severity findings might be fine. A skill with two capabilities that form a compound threat (credential access + network egress) might not be. Severity scores that only count individual findings miss the combinatorial risk.
4. Audit before deployment, not after
Every finding in this dataset was detectable before the agent ran. Before it accessed your credentials. Before it read your files. Before it opened a network connection. Static analysis catches the posture; runtime monitoring catches the act. You want both, but static comes first.
Try it
```bash
npx @clawaudit/cli scan .
```

Scans your project for `CLAUDE.md`, `.mcp.json`, and other agent config files. Every finding includes OWASP LLM Top 10 tags. Output in text, JSON, or SARIF v2.1.0 for GitHub Code Scanning integration.
Fully local. Zero dependencies. Takes about a second. Source on GitHub (MIT).
Methodology
We scanned 2,748 of 3,183 dangerous-tier skills from the OpenClaw skill registry. Dangerous tier = skills scoring below 40/100 in ClawAudit's composite trust scoring (50% security, 25% transparency, 25% maintenance). 435 skills were excluded due to file size limits (>100KB SKILL.md files that cause out-of-memory errors in batch processing).
Scanning used ClawAudit v2 with OWASP LLM Top 10 (2025 v2.0) mapping. Each finding was mapped to one or more OWASP categories based on the finding's detection category. Compound threats were independently mapped using capability-pair-to-OWASP associations. Percentages reflect the proportion of scanned skills with at least one finding in each OWASP category (skill-level prevalence, not finding-level count).
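For concreteness, skill-level prevalence as defined here can be computed like this (toy data, not the actual dataset):

```python
def skill_level_prevalence(findings_by_skill: dict[str, list[str]], total: int) -> dict[str, float]:
    """Percent of scanned skills with at least one finding per OWASP
    category: a skill counts once per category, however many findings
    it has in that category."""
    counts: dict[str, int] = {}
    for categories in findings_by_skill.values():
        for cat in set(categories):  # dedupe within a single skill
            counts[cat] = counts.get(cat, 0) + 1
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}

findings = {
    "skill-a": ["LLM02", "LLM02", "LLM06"],  # two LLM02 findings count once
    "skill-b": ["LLM02"],
    "skill-c": [],                            # clean skill still in denominator
}
print(sorted(skill_level_prevalence(findings, total=3).items()))
# → [('LLM02', 66.7), ('LLM06', 33.3)]
```

This is why the column percentages in the table can sum to well over 100%: a single skill frequently contributes to several categories at once.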
All analysis is static — no network requests, no LLM inference, no runtime execution. The scanner is fully deterministic: the same input always produces the same output. No probability-based classification. Source code for both the analyzer and the bulk scanning scripts is included in the repository.