ClawAudit verdict
self-safety-guard
sx-self-safety-guard
Implements an AI self-safety guard that monitors for and protects against malicious requests.
⚠ Flagged for review — coarse, uncorroborated signal, not a confirmed exploit. Review the config yourself before installing.
Automated static analysis, not a human review. ClawAudit flags capabilities, not confirmed intent, and can produce false positives.
Findings (4)
Prompt injection — tries to override agent instructions
SKILL.md · prose · downgraded · Ignore previous instructions
Raw model control tokens — prompt injection via token manipulation
references/prompt-injection-patterns.md · prose · downgraded · [INST]
Possible prompt injection — attempts to redefine agent identity
SKILL.md · prose · downgraded · You are now
Fake system prompt — attempts to inject instructions
references/prompt-injection-patterns.md · prose · downgraded · System: You are
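To illustrate why a defensive skill can trip all four findings, here is a minimal sketch of a phrase-based scanner. It is not ClawAudit's implementation: the rule labels and matched phrases are taken from the findings above, while the regexes, the `Finding` type, the `scan` function, and the rule that matches outside fenced code count as "prose" and are downgraded are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical rule table. Labels and phrases come from the findings listed
# above; the regular expressions are assumptions, not ClawAudit's real rules.
RULES = [
    ("Prompt injection", re.compile(r"ignore previous instructions", re.IGNORECASE)),
    ("Raw model control tokens", re.compile(r"\[INST\]")),
    ("Possible prompt injection", re.compile(r"\byou are now\b", re.IGNORECASE)),
    ("Fake system prompt", re.compile(r"\bsystem: you are\b", re.IGNORECASE)),
]


@dataclass
class Finding:
    path: str
    label: str
    match: str
    context: str      # "prose" or "code"
    downgraded: bool  # assumed: prose matches are downgraded


def scan(path: str, text: str) -> list[Finding]:
    """Flag every rule match, line by line, tracking fenced code blocks."""
    findings = []
    in_fence = False
    for line in text.splitlines():
        # Toggle on Markdown fence lines; the fence line itself is not scanned.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        for label, pattern in RULES:
            m = pattern.search(line)
            if m:
                context = "code" if in_fence else "prose"
                findings.append(
                    Finding(path, label, m.group(0), context, context == "prose")
                )
    return findings


# A skill that merely documents attack phrases is still flagged.
doc = "Reject inputs such as 'Ignore previous instructions' or 'You are now'."
for f in scan("SKILL.md", doc):
    print(f"{f.path} · {f.context} · "
          f"{'downgraded' if f.downgraded else 'full'} · {f.match}")
```

A scanner of this kind sees only the text, so documentation that lists attack phrases in order to block them matches the same rules as a file that uses them. That is consistent with the verdict's description of the signal as coarse and uncorroborated.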
Permissions & capabilities
No declared permissions — minimal attack surface.