ClawAudit's scoring moved from v5 to v6 to correct a class of false positives. For a security tool, the honest accounting of what that fix gains and what it trades away is the product — so this page documents both: the false positives v6 corrects, and the false-negative surfaces the corrections introduce or leave open, each with its planned close.
What v6 corrects (false positives)
Each of these was a case where ClawAudit penalized declared or documented behavior — the opposite of the hidden behavior the tool exists to surface.
- Declared credentials in MCP configs. An .mcp.json that declares an environment variable like ${GITHUB_TOKEN} was scored as credential access. Declaring that a server needs a token is the audited, transparent path, not a threat. v6 treats a declared placeholder as informational. Hardcoded literal secrets in a config are still flagged (that path is unchanged).
- Zero-env SaaS skills. A skill that reads its API key at runtime and posts only to its own vendor API (e.g. a skill reading AGENTMAIL_API_KEY and calling api.agentmail.to) was scored as credential theft, because the old downgrade keyed on frontmatter-declared environment variables that runtime-getenv skills don't have. v6 suppresses the credential compound when there is no evidence the credential reaches a destination distinct from the skill's own declared API.
- Example-payload security tools. Linters and auditors that document attack payloads (so they can teach you to recognize them) were scored as if they ran them. v6 honors a structural opt-in only: a fenced code block tagged ```bash-example, or frontmatter documentation_only: true. It is never inferred from a skill's name or description (no "this looks like a security tool" heuristic, which would be trivially abused).
- Placeholder and documentation URLs. Template hosts (${VAR}, your-domain.com, RFC 2606 example.com) were being sent to VirusTotal. v6 skips them, matched on the parsed hostname at dot-label boundaries, never as a substring, so a real host that merely contains a placeholder token is not skipped.
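The dot-label matching in the last item can be sketched as follows. This is a minimal illustration, not ClawAudit's actual code: the function name is_placeholder_url and the contents of PLACEHOLDER_DOMAINS are assumptions made for the example (the list combines the hosts named above with the other RFC 2606 example domains).

```python
from urllib.parse import urlsplit

# Hypothetical placeholder list for illustration; the real v6 list is not shown on this page.
PLACEHOLDER_DOMAINS = {"example.com", "example.net", "example.org", "your-domain.com"}


def is_placeholder_url(url: str) -> bool:
    """Return True when the URL's host is a template or documentation host."""
    parts = urlsplit(url)
    # An unexpanded template variable in the authority means there is no real host.
    if "${" in parts.netloc:
        return True
    host = (parts.hostname or "").lower().rstrip(".")
    if not host:
        return False
    labels = host.split(".")
    # Compare whole trailing label sequences, never substrings.
    for domain in PLACEHOLDER_DOMAINS:
        domain_labels = domain.split(".")
        if labels[-len(domain_labels):] == domain_labels:
            return True
    return False
```

Under these assumptions, https://api.example.com/x is skipped because its trailing labels are example.com, while https://notexample.com/x and https://example.com.evil.io/x are not, since neither ends in a placeholder domain at a label boundary.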
What v6 adds (a backstop)
Every correction above lowers a score. To keep a correction from ever moving a real payload into a clean tier, v6 adds a Hard Floor: if a skill's raw bytes contain an execution sink (pipe-to-shell such as curl ... | bash, fetch-then-eval such as eval(atob(...)), PowerShell iex, download-then-run staging, and similar), the final tier is capped at Caution, meaning it can be rated no cleaner than Caution. The floor is evaluated last and cannot be lifted by any downgrade, example-payload opt-in, or allowlist. The corrections are only safe because the floor backstops them.
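A rough sketch of how such a floor might be applied as the last scoring step. Everything here is an assumption for illustration: the tier name "Clean", the three regular expressions, and the function names are hypothetical, and the real sink detection is certainly broader than these patterns.

```python
import re

# Tiers ordered from least to most severe. "Clean" is a hypothetical name;
# "Caution" and "Dangerous" are the names used on this page.
TIERS = ["Clean", "Caution", "Dangerous"]

# Illustrative sink patterns only, matched against the skill's raw bytes.
SINK_PATTERNS = [
    re.compile(rb"\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"),  # pipe-to-shell
    re.compile(rb"eval\s*\(\s*atob\s*\("),                                   # fetch-then-eval
    re.compile(rb"\b(?:iex|Invoke-Expression)\b", re.IGNORECASE),            # PowerShell iex
]


def has_execution_sink(raw: bytes) -> bool:
    """Return True if any illustrative sink pattern appears in the raw bytes."""
    return any(pattern.search(raw) for pattern in SINK_PATTERNS)


def apply_hard_floor(tier: str, raw: bytes) -> str:
    """Run after every downgrade: a sink means the tier is never cleaner than Caution."""
    if has_execution_sink(raw) and TIERS.index(tier) < TIERS.index("Caution"):
        return "Caution"
    return tier
```

Because the function takes the already-downgraded tier as input and looks only at raw bytes, no earlier opt-in or allowlist decision can influence its result, which is the property the paragraph above describes.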
Known limitations (false-negative surfaces)
A security tool that hides its blind spots is worse than one that names them. v6's corrections introduce or leave open three false-negative surfaces. We validated the affected populations by hand before shipping and found no missed threats in them — but the mechanisms are real, so they're documented here with their planned closes.
- Look-alike exfil host. The zero-env credential suppression decides "is this the skill's own API?" by matching the destination host against the skill's identity. An attacker who names their exfiltration host after the skill could evade it. Sampled clean (no true positives found), but the mechanism stands. Planned close: match the registrable domain (eTLD+1 via the public suffix list) instead of token overlap.
- Unresolvable destination. When a skill builds its network destination dynamically (a concatenated or config-supplied URL with no static value), the suppressor can't see where the credential goes, so it suppresses blind. We hand-read every affected skill in our sample and each routed to a user-configured or first-party endpoint — but "we couldn't see the destination" is a real limit. Planned close: stay conservative (don't suppress) when no destination is statically resolvable.
- Documentation opt-in abuse. The example-payload opt-in suppresses a skill's findings, and the Hard Floor only backstops execution sinks, not non-execution network exfiltration. A skill self-declaring documentation_only while quietly POSTing a credential to a distinct host could have escaped. Closed by a contradiction guard: the opt-in is revoked the moment credential exfiltration to a distinct destination is detected, because the self-declaration is itself an audit-visible fingerprint.
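The planned closes for the first two surfaces can be sketched together. This is a hedged illustration of the idea, not the shipped logic: MULTI_LABEL_SUFFIXES is a tiny stand-in for the full public suffix list, and the function names registrable_domain and should_suppress are hypothetical.

```python
from typing import Optional

# Tiny stand-in for the public suffix list; a real implementation would load the full list.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "github.io"}


def registrable_domain(host: str) -> str:
    """Return the eTLD+1 of a host under the simplified suffix set above."""
    labels = host.lower().rstrip(".").split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    # Keep one label more than the public suffix.
    return ".".join(labels[-(suffix_len + 1):])


def should_suppress(own_api_host: str, destination_host: Optional[str]) -> bool:
    """Suppress the credential compound only for a resolved, first-party destination."""
    if destination_host is None:
        # No statically resolvable destination: stay conservative, do not suppress.
        return False
    return registrable_domain(destination_host) == registrable_domain(own_api_host)
```

With this comparison, a look-alike such as agentmail.evil.io has the registrable domain evil.io, so it no longer matches a skill whose own API is api.agentmail.to, even though the two hosts share a token.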
Registry-wide counts
Because v6 lowers a class of scores, the registry-wide "Dangerous" count under v6 will be lower than the v5 figure our earlier posts cite (1,555 of 19,461 skills in the March 17, 2026 scan). The full-registry rescan that produces the authoritative v6 count is in progress; cited counts will be updated to v6 when it completes. Until then, figures in existing posts are the v5 numbers, dated as such.
Validation
Before shipping v6 we ran a regression against a hand-labeled sample: known-bad skills must stay flagged (zero false negatives among them is a hard stop), known-good skills must clear. v6 passed — no known-bad skill escaped, and the corrected false positives cleared — on a population where every entry was either fetched and judged or explicitly accounted for, not silently skipped. The three surfaces above are the residual, disclosed tradeoff.
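The shape of that regression gate might look like the sketch below. The LabeledSkill record, its field names, and the report keys are assumptions for illustration; the actual harness and sample are not described on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LabeledSkill:
    name: str
    known_bad: bool  # hand label
    flagged: bool    # verdict from the candidate scoring version


def regression_report(sample: List[LabeledSkill]) -> Dict[str, object]:
    """Summarize a hand-labeled sample; any escaped known-bad skill is a hard stop."""
    escaped_bad = [s.name for s in sample if s.known_bad and not s.flagged]
    good_still_flagged = [s.name for s in sample if not s.known_bad and s.flagged]
    return {
        "escaped_bad": escaped_bad,
        "good_still_flagged": good_still_flagged,
        "hard_stop": bool(escaped_bad),
        "passed": not escaped_bad and not good_still_flagged,
    }
```

Keeping the two lists separate makes the asymmetry explicit: a known-good skill that stays flagged is a failed correction, while a known-bad skill that escapes blocks the release outright.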
Questions or a disagreement with a verdict? [email protected].