Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it | VentureBeat

Overview


A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required.

Details

Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.
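As a rough illustration of the trigger distinction above, the following sketch lists workflow files that use pull_request_target. It is a text heuristic, not a YAML parser, and the function name `find_pull_request_target` is hypothetical; it assumes workflows live under `.github/workflows/` with a `.yml` or `.yaml` extension.

```python
import re
from pathlib import Path

# Matches the trigger name as a whole word; `pull_request` alone does not match.
TARGET_TRIGGER = re.compile(r"\bpull_request_target\b")


def find_pull_request_target(root):
    """Return (file name, line number) pairs where pull_request_target appears.

    Assumption: anything after '#' on a line is a YAML comment and is ignored.
    This is a heuristic and can misread '#' inside quoted strings.
    """
    hits = []
    workflow_dir = Path(root) / ".github" / "workflows"
    for path in sorted(workflow_dir.glob("*.y*ml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            code = line.split("#", 1)[0]
            if TARGET_TRIGGER.search(code):
                hits.append((path.name, lineno))
    return hits
```

Calling `find_pull_request_target(".")` from a repository root gives a starting list of workflows to review by hand; a hit does not by itself mean the workflow is exploitable.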

Per Guan’s disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday.

Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack plausibly operates beneath OpenAI’s safeguard layer at the agent runtime; that is an inference from what its system card does not document, not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect.

OpenAI and Google did not respond to requests for comment by publication time.

“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.”

Anthropic’s Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card explains to readers that the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope — Claude Code Auto Mode, for example, applies runtime-level protections — but the system card itself does not document these runtime safeguards or their coverage.

OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales access to thousands. The system card tells you what red teamers tested. It does not tell you how resistant the model is to the attacks they found.

Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program.

Anthropic: 232 pages. Quantified hack rates, classifier scores, and injection resistance metrics.

OpenAI: Extensive. Red teaming hours documented. No injection resistance rates published.

Google: Few pages. Defers to older Gemini 3 Pro card. No quantified results.

Anthropic: CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented.

Anthropic: Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed.

OpenAI: No restricted model. Full capability released, access gated.

Anthropic: The system card states that Claude Code Security Review is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card.

OpenAI: Not documented. TAC governs access, not agent operations.

OpenAI: Not directly exploited. Structural gap inferred from TAC design, not demonstrated.

Google: $1,337 bounty per Guan disclosure. Patched. No CVE.

OpenAI: Model-layer injection evals published. No agent-runtime or tool-execution resistance rates.

Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”

Seven threat classes neither safeguard approach closes

Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.

Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither.

The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place.

Email your Anthropic and OpenAI reps today. One question, in writing: ‘Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.’ File the response in your vendor risk register.

ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents.

The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets.

The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub’s API. No attacker-controlled infrastructure required. Because exfiltration ran through GitHub’s own API, the platform itself became the C2 channel.

Run grep -rn 'secrets\.' .github/workflows/ in every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (supported by GitHub Actions, GitLab CI, and CircleCI).
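For teams that want the grep output as a per-secret inventory, a minimal sketch follows. It assumes secrets are referenced with the usual `secrets.NAME` expression syntax inside files under `.github/workflows/`; the function name `secret_inventory` is hypothetical, and the scan cannot see secrets an agent reaches by other routes, such as the token GitHub injects automatically.

```python
import re
from collections import defaultdict
from pathlib import Path

# Captures NAME from expressions such as ${{ secrets.NAME }}.
SECRET_REF = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def secret_inventory(root):
    """Map each referenced secret name to the (file, line) places that use it."""
    inventory = defaultdict(list)
    workflow_dir = Path(root) / ".github" / "workflows"
    for path in sorted(workflow_dir.glob("*.y*ml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in SECRET_REF.findall(line):
                inventory[name].append((path.name, lineno))
    return dict(inventory)
```

The resulting dictionary can be pasted into a rotation checklist: every key is a credential that at least one workflow step can read.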

AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do.

Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task.

The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely.

Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step.
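One way to start the repo-by-repo audit is a text check for the workflow `permissions` key, sketched below. It is a heuristic over raw workflow text rather than a YAML parser, the function and constant names are hypothetical, and it says nothing about tool-level grants such as bash access, which are configured in each agent's own settings.

```python
import re

NO_PERMISSIONS_KEY = "no top-level permissions key"

# A `permissions:` key at column zero, i.e. applied to the whole workflow.
TOP_LEVEL_PERMISSIONS = re.compile(r"^permissions:", re.MULTILINE)
# `permissions: write-all` at any indentation.
WRITE_ALL = re.compile(r"^[ \t]*permissions:[ \t]*write-all[ \t]*$", re.MULTILINE)
# Any `scope: write` line, for example `pull-requests: write`.
WRITE_SCOPE = re.compile(r"^[ \t]*([a-z-]+):[ \t]*write[ \t]*$", re.MULTILINE)


def permission_findings(workflow_text):
    """Return a list of human-readable findings for one workflow file's text."""
    findings = []
    if not TOP_LEVEL_PERMISSIONS.search(workflow_text):
        findings.append(NO_PERMISSIONS_KEY)
    if WRITE_ALL.search(workflow_text):
        findings.append("permissions: write-all")
    for scope in WRITE_SCOPE.findall(workflow_text):
        # Note: `id-token: write` is also reported, although OIDC needs it.
        findings.append(f"write scope: {scope}")
    return findings
```

An empty list is not a clean bill of health; it only means this particular heuristic found nothing to flag.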

CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green.

No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for.

A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously.

Create a new category in your supply chain risk register: ‘AI agent runtime.’ Assign a 48-hour check-in cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure.

Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation.

Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined.

The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered.

Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer.
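To make "evaluate this action before execution" concrete, here is a minimal sketch of an outbound check that a wrapper could run before an agent posts a comment. It is an illustration under stated assumptions, not any vendor's safeguard: the function names are hypothetical, the secrets are passed in as a plain dictionary, and the check only catches a secret that appears verbatim or base64-encoded on its own.

```python
import base64

# Assumption: values shorter than this are skipped to avoid noisy matches.
MIN_SECRET_LENGTH = 8


def leaked_secret_names(body, secrets):
    """Return the names of secrets whose value appears in `body`.

    Checks the raw value and its standalone base64 encoding (padding stripped,
    so both padded and unpadded forms match). Other encodings are not covered.
    """
    leaked = []
    for name, value in secrets.items():
        if len(value) < MIN_SECRET_LENGTH:
            continue
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii").rstrip("=")
        if value in body or encoded in body:
            leaked.append(name)
    return sorted(leaked)


def allow_comment(body, secrets):
    """Allow the comment only if no known secret value is found in it."""
    return not leaked_secret_names(body, secrets)
```

In a runner, `secrets` might be built from the environment, for example `{k: os.environ[k] for k in ("ANTHROPIC_API_KEY",) if k in os.environ}`. A determined attacker can pick an encoding this check does not know, so it complements least-privilege permissions rather than replacing them.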

PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions.

No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk.

A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation.

Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions.
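A sanitization layer of the kind described above might look like the following sketch. The patterns and labels are illustrative assumptions, not a vetted ruleset, and for the reason the text gives (injections are non-deterministic) a clean result should never be read as proof that a field is safe.

```python
import re

# Hypothetical, deliberately small pattern set for screening untrusted fields
# such as PR titles and comments before they reach an agent.
SUSPICIOUS_PATTERNS = {
    "override": re.compile(
        r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE
    ),
    "exfil_verb": re.compile(
        r"\b(print|post|echo|reveal|output)\b.{0,40}\b(api[_ ]?key|token|secret)\b",
        re.IGNORECASE,
    ),
    "secret_reference": re.compile(
        r"\$\{?[A-Z][A-Z0-9_]*(_KEY|_TOKEN|_SECRET)\}?"
    ),
}


def screen_untrusted(text):
    """Return the sorted labels of every pattern that matches `text`."""
    return sorted(
        label for label, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)
    )
```

A workflow could skip the agent step, or route the PR to a human, whenever `screen_untrusted(pr_title)` returns a non-empty list.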

No comparable injection resistance data across vendors

Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model.

No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one.

Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production.

Write one sentence for your next vendor meeting: ‘Show me your quantified injection resistance rate for my model version on my platform.’ Document refusals for EU AI Act high-risk compliance. Deadline: August 2026.

OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.

Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research (it removes cyber safeguards for vetted actors) and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).

Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.

“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”

Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)

Audit every runner for secret exposure. Run grep -rn 'secrets\.' .github/workflows/ in every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)

Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)

Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation)
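To check the "minutes, not hours" target during a migration, a team could inspect the lifetime claims of an issued token. The sketch below assumes the token is a standard three-part JWT carrying `iat` and `exp` claims; it decodes the payload without verifying the signature, so it is suitable only for auditing lifetimes, never for authentication. The function names and the 15-minute default are assumptions.

```python
import base64
import json


def token_lifetime_seconds(jwt):
    """Return exp - iat from a JWT payload. The signature is NOT verified."""
    payload_b64 = jwt.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["exp"] - claims["iat"]


def is_short_lived(jwt, max_seconds=900):
    """True if the token's lifetime is at most `max_seconds` (assumed: 15 minutes)."""
    return token_lifetime_seconds(jwt) <= max_seconds
```

Running this against a sample token from each pipeline gives a quick view of which integrations still issue long-lived credentials.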

Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.

Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability.

Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)

Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.

“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
