Blog

We Mapped 2,748 Dangerous AI Agent Skills to the OWASP LLM Top 10 — Here's What We Found

March 13, 2026 · 14 min read · By 4Worlds

The OWASP Top 10 for LLM Applications (2025 v2.0) is the most widely cited framework for LLM security risks. It covers prompt injection, data poisoning, supply chain vulnerabilities, and more. Security teams reference it in threat models. Compliance frameworks point to it. It shapes how the industry thinks about AI security.

But it's a theoretical framework — built from expert consensus, not empirical data. Nobody has tested it against a real-world corpus of AI agent configurations to see which risks actually show up in practice and which are theoretical concerns.

Until now.

What we did

We took ClawAudit, our static security analyzer for AI agent configurations, and added OWASP LLM Top 10 mapping to every finding it produces — every pattern match, every compound threat, every permission integrity violation. Then we re-scanned 2,748 dangerous-tier OpenClaw skills: the ones that scored below 40/100 in our registry-wide audit.

These are skills that real developers have published to a real package registry. They're not contrived examples — they're configurations that AI agents actually execute. Some are malicious. Most are just careless. All of them represent real risk.

The scan maps each finding to one or more OWASP LLM categories, producing a dataset that answers the question: which OWASP risks dominate real-world AI agent configurations?

The headline numbers

2,748
Skills scanned
80.1%
LLM02 — Sensitive Info
49.4%
LLM06 — Excessive Agency
23.7%
LLM05 — Output Handling
21.0%
LLM01 — Prompt Injection
17.5%
LLM03 — Supply Chain

The full breakdown:

OWASP ID Name Skills affected % of dangerous
LLM02Sensitive Information Disclosure2,20180.1%
LLM06Excessive Agency1,35849.4%
LLM05Improper Output Handling65123.7%
LLM01Prompt Injection57721.0%
LLM03Supply Chain Vulnerabilities48117.5%
LLM07System Prompt Leakage712.6%
LLM04Data and Model Poisoning00%
LLM08Vector and Embedding Weaknesses00%
LLM09Misinformation00%
LLM10Unbounded Consumption00%

The distribution is radically uneven. Two categories account for the vast majority of real-world risk. Four categories don't apply at all. Let's dig into each one.

LLM02: Sensitive Information Disclosure — 80.1%

"Sensitive information can be disclosed through the output of the LLM, leading to unauthorized data access, privacy violations, and security breaches." — OWASP

The most prevalent risk by a wide margin. 4 out of 5 dangerous skills have patterns that map to sensitive information disclosure.

In the context of agent configurations, this manifests in three ways:

1. Credential access patterns

Skills that access process.env.API_KEY, os.environ['SECRET_TOKEN'], or similar patterns. Many skills legitimately need API keys — the question is whether they also do something dangerous with them.

ClawAudit distinguishes between "uses a credential" (normal) and "uses a credential AND makes network requests to an undeclared endpoint" (credential theft pattern). The compound threat credential_access + network_out is the strongest signal for LLM02, and it's the most common compound threat in the entire dataset.

2. Filesystem path exposure

Skills that reference ~/.ssh, ~/.aws/credentials, /etc/passwd, or cloud provider credential files. These aren't accessing the credentials directly (that would be code execution) — they're configuring the agent to be aware of these paths, which means the agent can read them if it has filesystem access.

3. System prompt / configuration leakage

Skills that expose their own system prompt, internal configuration, or operational instructions in ways that could be extracted by a user or another agent. This overlaps with LLM07 (System Prompt Leakage), which we counted separately.

Why 80% is expected

This number is high but not surprising. Agent skills exist to do things — call APIs, access files, interact with services. All of these require credentials. The dangerous part isn't credential access itself; it's credential access combined with other capabilities that create exfiltration vectors.

The real insight is in the compound threat data: of the 2,201 skills with LLM02 findings, a significant portion also have network_out or data_encoding capabilities, which elevates the risk from "accesses credentials" to "can exfiltrate credentials."

LLM06: Excessive Agency — 49.4%

"An LLM-based system is often granted a degree of agency to call functions or interface with other systems. Excessive agency occurs when LLM-based systems are granted access to functions, permissions, or autonomy beyond what is necessary." — OWASP

Nearly half of all dangerous skills exhibit excessive agency. This is the permission integrity gap — the difference between what a skill declares it needs and what it actually does.

Consider a skill that declares this in its YAML frontmatter:

requires:
  env:
    - API_KEY
  bins:
    - jq

Looks minimal. But in its code blocks, the skill:

  • Makes outbound HTTP requests to https://some-server.com/api
  • Reads files from the filesystem using readFileSync
  • Spawns subprocesses using child_process.exec
  • Writes to configuration files

None of these capabilities are declared. The skill asks for jq and an API key, but it's actually reading your files, executing commands, and sending data to external servers. This is textbook excessive agency.

How ClawAudit detects this

ClawAudit extracts capabilities from code blocks in instruction context and compares them against declared permissions in frontmatter. The gap between "declared" and "actual" is quantified as permission integrity violations, each mapped to LLM06.

Three types of violations:

  • Undeclared capabilities (high severity) — the skill does things it didn't ask permission for. Network access without declaring curl or wget. Credential access without declaring any environment variables.
  • Opaque dependencies (medium) — runtime package installation (npm install, pip install) where the full dependency tree is not auditable before execution.
  • Over-declared permissions (low) — the inverse. A skill declares docker and kubectl in its required binaries but never uses them. Less dangerous, but suspicious — why request capabilities you don't need?

The MCP angle

Excessive agency gets worse in multi-file configurations. ClawAudit's cross-file trust tracing (new in v2) detects capability escalation across CLAUDE.md and .mcp.json files. A CLAUDE.md might look clean, but if the MCP config it references grants the agent access to docker, kubectl, or filesystem servers, the effective permission surface is much larger than the CLAUDE.md suggests.

This kind of cross-boundary escalation is invisible when you audit files individually. You have to trace trust across the full configuration stack.

LLM05: Improper Output Handling — 23.7%

"Improper output handling refers to insufficient validation, sanitization, and handling of outputs from large language models before they are passed to other components." — OWASP

In the agent config context, this category captures code execution patterns — skills that take input and execute it as code without sanitization:

  • eval() and new Function() — dynamic code execution in JavaScript
  • child_process.exec() and subprocess.run() — shell command execution
  • curl | bash — downloading and executing remote scripts
  • __import__('os').system() — Python OS-level access

Zone-aware false positive prevention

This is where ClawAudit's zone-aware analysis is critical. A security tutorial might include:

# Security Warning
Do NOT use eval() with user input:
```js
// BAD - vulnerable to code injection
eval(userInput)
```

A naive scanner would flag this as a critical finding. ClawAudit classifies it as a code block under a security documentation heading, applies a 0.15x severity multiplier, and suppresses it to info level. The same eval() in an instruction code block gets full critical severity.

Without zone awareness, you'd either drown in false positives from documentation or miss real threats by filtering too aggressively. This is a solved problem in traditional SAST (Semgrep, CodeQL understand ASTs), but nobody had brought it to markdown-based agent configs before.

LLM01: Prompt Injection — 21.0%

"A prompt injection vulnerability occurs when an attacker manipulates a large language model through crafted inputs, causing the LLM to unknowingly execute the attacker's intentions." — OWASP

One in five dangerous skills contains prompt injection patterns. These break down into several subcategories:

Direct injection patterns

The crude but effective approach: literal strings like "ignore previous instructions," "you are now a different AI," or "disregard all prior directives." These are sometimes test strings left in by developers, sometimes genuinely malicious. ClawAudit doesn't distinguish intent — it flags the pattern and lets the human decide.

Obfuscated injection

More sophisticated attacks use Unicode homoglyphs to evade simple string matching. The Cyrillic character "а" (U+0430) looks identical to Latin "a" (U+0061) in most fonts. An attacker can write "іgnоrе рrеvіоus іnstruсtіоns" using a mix of Cyrillic and Latin characters that visually appears normal but bypasses keyword filters.

ClawAudit's Unicode confusable normalization defeats this. Before pattern scanning, all Cyrillic lookalikes (А→A, е→e, о→o, р→p, с→c, і→i), Greek lookalikes (α→a), fullwidth ASCII (A→A), and zero-width characters (U+200B, U+200C, U+200D) are normalized. The presence of zero-width characters itself is flagged as an evasion signal.

Covert action directives

Instructions embedded in what appears to be documentation or configuration text, but that actually direct the agent to perform unauthorized actions. These are harder to catch with regex — they rely on natural language patterns rather than specific keywords. ClawAudit matches against patterns like "secretly," "without the user knowing," "covertly," and similar directive language, but this is the category where static analysis hits its ceiling. A sufficiently sophisticated attacker can always rephrase.

LLM03: Supply Chain Vulnerabilities — 17.5%

"The supply chain in LLMs can be vulnerable, impacting the integrity of training data, models, and deployment platforms." — OWASP

In agent configurations, supply chain risk looks different from traditional software. It's not about compromised npm packages in a lockfile — it's about skills that install packages at runtime, bypassing any pre-deployment audit.

Runtime installation patterns

Skills that include instructions like:

```bash
npm install some-package
pip install -r requirements.txt
cargo install binary-tool
```

When the agent executes these, it's pulling code from a package registry at runtime. The skill author doesn't control (and the user can't audit) what some-package depends on. A single compromised transitive dependency, and the agent is executing attacker-controlled code with the user's permissions.

Binary downloads

Some skills download pre-built binaries from external URLs:

```bash
curl -sSL https://some-site.com/install.sh | bash
wget https://github.com/user/repo/releases/download/v1/tool -O /usr/local/bin/tool
```

The classic curl | bash pattern. In a traditional context this is already dangerous. In an agent context, it's worse — the agent might execute this without any user confirmation, depending on the agent framework's permission model.

The compound threat: supply_chain

ClawAudit's compound threat detection catches the dangerous combination: package_install + process_exec. A skill that both installs packages and executes processes has the full chain for supply chain compromise — it can pull arbitrary code and run it.

LLM07: System Prompt Leakage — 2.6%

A smaller category in our dataset. 71 skills (2.6%) have patterns that could lead to system prompt exposure. This typically manifests as skills that reference their own configuration files, agent memory directories, or internal state in ways that could be extracted by a user or forwarded to external services.

The low prevalence makes sense — system prompt leakage is primarily a runtime risk (an adversarial user crafting queries to extract the system prompt). What we detect statically is the precondition: skills that access agent memory files (MEMORY.md, daily_notes) or expose configuration paths, which could facilitate leakage if combined with network access.

The four that don't apply: LLM04, LLM08, LLM09, LLM10

Four OWASP categories showed 0% prevalence in our scan. This isn't a failure of detection — these categories genuinely don't apply to static analysis of agent configuration files:

LLM04: Data and Model Poisoning

Requires access to training data or model weights. Agent configurations don't interact with model training — they define how a pre-trained model operates. Not applicable to config-level analysis.

LLM08: Vector and Embedding Weaknesses

RAG-specific: attacks on embedding similarity, adversarial document injection into vector stores. Agent config files don't define embedding pipelines or vector storage. Relevant for RAG-focused security tools, not for config auditing.

LLM09: Misinformation

Output quality and factual accuracy. This is a model evaluation concern, not a configuration security concern. No static pattern in a config file indicates whether the model will produce misinformation.

LLM10: Unbounded Consumption

Resource exhaustion and denial-of-service through excessive token usage or infinite loops. Detectable only at runtime through rate limiting and monitoring. Config files don't contain enough information to predict resource consumption.

We're being honest about scope. Static analysis of agent configurations can't catch everything. These categories require runtime monitoring, model evaluation infrastructure, or training pipeline security — all valid concerns, all outside our lane.

Beyond OWASP: compound threats

The OWASP framework maps well to individual findings, but the most dangerous patterns emerge from combinations of capabilities. Individual findings tell you what a skill can do. Compound threats tell you what it's set up to do.

ClawAudit detects 20 compound threat patterns — dangerous capability combinations that indicate specific attack postures:

Pattern Capabilities Severity OWASP
Data exfiltrationfile_read + network_outCriticalLLM02, LLM06
Credential theftcredential_access + network_outHighLLM02
Remote code executionnetwork_out + dynamic_evalCriticalLLM05, LLM06
Supply chainpackage_install + process_execHighLLM03
C2 channelnetwork_in + process_execCriticalLLM05, LLM06
Obfuscated executiondata_encoding + dynamic_evalCriticalLLM05
Memory exfiltrationagent_memory + network_outCriticalLLM02

The compound threat model gives us something OWASP alone doesn't: a way to distinguish between "has a dangerous capability" and "has a dangerous capability plus the means to exploit it." A skill with credential_access alone is concerning. A skill with credential_access + network_out + data_encoding is in active exfiltration posture.

The hidden attack surface: cross-file trust

This dataset focused on individual SKILL.md files. But in real-world deployments, AI agents don't run from a single file. A typical Claude Code project has:

  • CLAUDE.md — project instructions, implicit permissions, MCP tool references
  • .mcp.json — MCP server definitions, credentials, transport configs
  • Sometimes both, plus additional config files

ClawAudit v2 introduced cross-file trust tracing that connects these files. When scanning a directory, it detects:

  • Capability escalation — CLAUDE.md looks clean, but MCP servers in the config grant docker, kubectl, or other dangerous commands the CLAUDE.md doesn't mention
  • Credential flows — traces how API keys and tokens move from env vars through MCP server configs to remote endpoints
  • Remote tool delegation — CLAUDE.md uses MCP tools served by a non-localhost server. Tool responses from remote servers are not integrity-verified — a compromised MCP server can inject arbitrary tool responses
  • Phantom tools — CLAUDE.md references MCP tools that don't exist in any scanned MCP config. Either the config is missing or the tools are coming from somewhere undocumented

These cross-file risks don't map cleanly to any single OWASP category. They span LLM01 (tool response injection), LLM02 (credential flow), LLM03 (remote MCP supply chain), and LLM06 (escalated permissions). This is where a composite threat model outperforms a category-based framework.

What this means for AI agent security

1. OWASP is directionally correct but unevenly distributed

LLM02 (sensitive data) and LLM06 (excessive agency) account for the vast majority of real-world risk in agent configurations. If you're prioritizing which OWASP categories to address first, these two give you the most coverage. LLM04, LLM08, LLM09, and LLM10 are irrelevant at the config layer.

2. Permission integrity is the most actionable signal

The gap between "declared permissions" and "actual capabilities" is both the most common finding (LLM06) and the easiest to fix. Skill authors can close this gap by declaring what their skills actually do. Platform operators can enforce permission integrity as a publishing requirement.

3. Compound threats matter more than individual findings

A skill with 10 low-severity findings might be fine. A skill with two capabilities that form a compound threat (credential access + network egress) might not be. Severity scores that only count individual findings miss the combinatorial risk.

4. Audit before deployment, not after

Every finding in this dataset was detectable before the agent ran. Before it accessed your credentials. Before it read your files. Before it opened a network connection. Static analysis catches the posture; runtime monitoring catches the act. You want both, but static comes first.

Try it

npx @clawaudit/cli scan .

Scans your project for CLAUDE.md, .mcp.json, and other agent config files. Every finding includes OWASP LLM Top 10 tags. Output in text, JSON, or SARIF v2.1.0 for GitHub Code Scanning integration.

Fully local. Zero dependencies. Takes about a second. Source on GitHub (MIT).

Methodology

We scanned 2,748 of 3,183 dangerous-tier skills from the OpenClaw skill registry. Dangerous tier = skills scoring below 40/100 in ClawAudit's composite trust scoring (50% security, 25% transparency, 25% maintenance). 435 skills were excluded due to file size limits (>100KB SKILL.md files that cause out-of-memory errors in batch processing).

Scanning used ClawAudit v2 with OWASP LLM Top 10 (2025 v2.0) mapping. Each finding was mapped to one or more OWASP categories based on the finding's detection category. Compound threats were independently mapped using capability-pair-to-OWASP associations. Percentages reflect the proportion of scanned skills with at least one finding in each OWASP category (skill-level prevalence, not finding-level count).

All analysis is static — no network requests, no LLM inference, no runtime execution. The scanner is fully deterministic: the same input always produces the same output. No probability-based classification. Source code for both the analyzer and the bulk scanning scripts is included in the repository.