Comparisons

GitHub now has several agent surfaces. Anthropic ships anthropics/claude-code-action. OpenAI ships Codex on GitHub (with a Copilot-bundled preview since February 2026). Google ships google-github-actions/run-gemini-cli. anomalyco/opencode ships its own action. GitHub itself ships Copilot cloud agent (formerly Copilot coding agent), which lives directly inside github.com. Pullfrog uses those same agents — claude-code and opencode are the runtimes the Pullfrog harness invokes — and adds an open-source orchestration layer on top: a curated toolbelt the agent can actually use in CI, a permission model that isolates secrets and git credentials from the agent process, state that compounds across runs, lifecycle hooks your repo controls, a programmable trigger matrix, and a managed dashboard. This page covers how Pullfrog compares to each surface.

Native vendor agent actions

The vendor actions (claude-code-action, Codex on GitHub, run-gemini-cli, the opencode action) all do the same thing well: drop a single agent into a GitHub workflow, listen for a mention, run an agent on a runner, and get you a code change. If you have one repo, one model, one developer, and you don’t need PR review or anything to persist between invocations, those actions are great and you should use them. Pullfrog is built for the next step up.

Batteries-included toolbelt. The moment you run an agent in CI, you discover it needs to do things vendor actions don’t ship: read CI logs without grepping plain text, check out a PR with a diff already formatted, install dependencies in the background, upload a screenshot or build artifact to S3, browse the live app to verify a fix, take a screenshot to attach to a review comment. With claude-code-action you wire each of those up yourself (write the MCP server, handle auth, gate permissions) for every repo. Pullfrog ships them in the harness, pre-authed and permission-gated against the active mode. Out of the box: structured GitHub API tools, get_check_suite_logs that returns parsed CI failures instead of raw text dumps, checkout_pr with a formatted diff, upload_file that streams to S3 and returns a public URL, a background dependency installer, report_progress that mirrors the agent’s to-do list to a sticky GitHub comment, and a preconfigured agent-browser skill for headless Chromium plus screenshots. The browser daemon is parked outside the agent’s PID namespace so it survives across tool calls, a small detail that turns out to matter when the agent wants to take a “before” and “after”.

Lifecycle scripts your repo controls. Four code-enforced hooks: setupScript, postCheckoutScript, prepushScript, and stopScript. They run inside the harness’s permission boundary and can install dependencies, prime caches, run the test suite, or gate a push on lint. Uniquely, stopScript exiting non-zero resumes the agent with the failure as context, so the agent fixes its own broken push instead of opening a red PR for a human to clean up. Vendor actions support exactly one extension point: more YAML steps in your workflow, with no agent-aware feedback loop.

State that compounds across runs. Every Pullfrog run writes a structured snapshot of what it did, and the next run on the same PR loads that snapshot before the agent starts thinking. Follow-up reviews, CI re-runs, and “address the feedback” comments don’t restart from a blank context: the agent picks up where the previous run left off, with knowledge of what’s already changed and why. Repo-level learnings accrue across PRs, so the agent gets better at your codebase the more it works in it. Vendor actions are amnesiac: each @claude or @codex mention is a fresh process with a fresh prompt, which is why they keep making the same suggestion you rejected last time.

Incremental PR review, handled elegantly. PRs aren’t reviewed once. They’re reviewed every time the author pushes a fixup, every time a teammate leaves feedback, every time CI fails and the agent retries. Vendor actions handle this by re-reviewing the whole PR from scratch on every event, which produces duplicate comments on every push and forces the agent to re-derive context it already had. Pullfrog handles it the way a careful human reviewer would: each subsequent review uses git range-diff against the previous review’s snapshot so the agent only looks at what’s actually new, comments are stable across re-runs, and addressed comments resolve themselves automatically the moment the diff makes them moot. Underneath, the review mode runs as a parallel fan-out of subagents, one lens per concern (correctness, security, style, tests, docs), executing concurrently rather than serially crawling the diff. Actionable comments expose a one-click Fix that spawns an address-review run on the spot.
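As a rough illustration of the stopScript behavior described in this section (a non-zero exit resumes the agent with the failure as context), the loop can be pictured like this. This is a minimal sketch, not Pullfrog's implementation: the function name, the `agent_turn` callback, and the `max_turns` limit are all hypothetical, and the real harness may capture and pass failure output differently.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_until_stop_script_passes(
    agent_turn: Callable[[Optional[str]], None],
    stop_script: Sequence[str],
    max_turns: int = 3,  # hypothetical cap; the source does not state one
) -> int:
    """Run the agent, then the stop script; on failure, resume the agent.

    `agent_turn` receives None on the first turn and the stop script's
    combined output on later turns. Returns the number of turns used.
    """
    failure_context: Optional[str] = None
    for turn in range(1, max_turns + 1):
        agent_turn(failure_context)
        result = subprocess.run(stop_script, capture_output=True, text=True)
        if result.returncode == 0:
            return turn  # stop script passed, the run can finish
        # Non-zero exit: feed the script's output into the next agent turn.
        failure_context = (result.stdout + result.stderr).strip()
    raise RuntimeError(f"stop script still failing after {max_turns} turns")
```

The point of the sketch is the feedback edge: the stop script's output becomes input to the next turn, rather than only failing the job.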
Eight built-in modes plus custom modes. The built-in modes include implement, review, plan, address-review, triage, resolve-conflicts, headless, and more, each with its own playbook, tool permission set, and output shape. The router picks the right mode automatically from the trigger; you can pin a mode with a slash command, define custom modes per repo, or disable any built-in mode via config. Vendor actions have one mode: “do what the prompt says”.

Triggers configured in a dashboard, not workflow YAML. There are seven trigger types (@pullfrog mentions, new PRs, new issues, PR review submitted, CI failure, scheduled cron, and console-dispatched runs), each routable to a different mode and configurable per repo through the dashboard. Edit the comment that triggered a run and the in-flight run cancels and restarts on the new prompt. Submit a review while an implement run is mid-flight and a post-review safety-net dispatch queues an address-review so the feedback isn’t lost. Vendor actions express triggers as workflow event filters in YAML; richer policy means more YAML.

Default-secure permissions. The vendor actions take a binary approach to security: the agent runs as a normal process on your runner, sees every environment variable, and uses GITHUB_TOKEN for git the same way any other CI step does. That’s fine when you trust the prompt entirely. It falls apart the moment a non-collaborator can trigger a run, the moment the agent processes untrusted issue text, or the moment you have an org-wide secret you don’t want every repo’s agent to see. Pullfrog’s agent gets the same comprehensive set of tools (real shell, real git, real network, real filesystem) but with no direct access to your secrets or git credentials. The shell tool runs inside a Linux PID namespace with a default-deny env allowlist; secrets in the runner env simply aren’t visible to bash. Git operations authenticate through a single-use, per-call ASKPASS code; the GitHub token is never in the agent’s shell env. Pushes to the default branch are blocked at the harness level. Because every potentially dangerous action goes through a Pullfrog tool rather than raw shell, the harness can enforce a shell × push permission matrix per mode, with the intersection of repo policy and mode policy winning. Non-collaborator triggers are capped at restricted regardless of repo config; the schema rejects override smuggling. So a “review-only” mode can never accidentally push to your branch, and a non-collaborator’s @pullfrog can never escalate to unrestricted even on a public repo with the trigger opened up. See Security for the full layer breakdown.

Open source. The action is open source under pullfrog/pullfrog. Audit it, fork it, contribute. Active open source projects can apply to our free-for-OSS program.
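To make the permission rule under “Default-secure permissions” concrete (the intersection of repo policy and mode policy wins, and non-collaborator triggers are capped at restricted), here is a minimal sketch. The `Level` and `Policy` types, the `DISABLED` level, and the choice to apply the cap to both the shell and push axes are assumptions made for illustration; only the names "restricted" and "unrestricted" come from the text above, and Pullfrog's actual schema may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a lower value is always the more restrictive one.
    DISABLED = 0       # hypothetical level, e.g. push in a review-only mode
    RESTRICTED = 1
    UNRESTRICTED = 2


@dataclass(frozen=True)
class Policy:
    shell: Level
    push: Level


def effective_policy(repo: Policy, mode: Policy, is_collaborator: bool) -> Policy:
    """Intersect repo and mode policy, then cap non-collaborator triggers."""
    # Intersection: on each axis the more restrictive setting wins.
    shell = min(repo.shell, mode.shell)
    push = min(repo.push, mode.push)
    if not is_collaborator:
        # Assumed here: the cap applies to both axes, whatever the repo config says.
        shell = min(shell, Level.RESTRICTED)
        push = min(push, Level.RESTRICTED)
    return Policy(shell=shell, push=push)
```

Under these assumptions, a review-style mode declared as `Policy(Level.RESTRICTED, Level.DISABLED)` can never push even if the repo policy is fully unrestricted, because the minimum on the push axis stays `DISABLED`.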

GitHub Copilot cloud agent

Copilot cloud agent (renamed from “Copilot coding agent” in early 2026) is the closest competitor to Pullfrog. It ships a deeply integrated first-party agent that runs in ephemeral GitHub-managed runners, with hooks, MCP support, custom subagents, and a copilot-setup-steps.yml configuration. The wedges Pullfrog has against Copilot are different from the ones it has against claude-code-action.

Pricing. Copilot cloud agent is always more expensive than Pullfrog, and it’s billed per seat, a model that doesn’t align incentives between the platform and the team using it. A lot of what Copilot ships is gated behind specific paid tiers: access to the latest Claude Opus and GPT-Codex models, Copilot Memory (still in public preview and only on Pro / Pro+), enterprise governance features, and so on. Pullfrog is pay-as-you-go from day one and exposes the latest models, PR-level memory, and repo learnings without tier gating.

Curated toolbelt vs DIY MCP JSON. Copilot speaks MCP but ships zero opinionated tools. Every server is a per-repo JSON entry, every secret is a COPILOT_MCP_* Agents secret you provision, every tools allowlist is trial-and-error. There’s no first-party browser tool, no S3 helper, no structured CI-log fetcher (the closest equivalent is the GitHub MCP server’s actions toolset, which returns raw workflow logs). Pullfrog’s S3 / headless browser / structured CI logs / formatted PR diffs / background dep installer are pre-wired in the harness. Equivalent capability is achievable on Copilot; the activation energy is much higher. (Copilot cloud agent also doesn’t support OAuth-authenticated remote MCP servers, a documented limitation.)

Lifecycle hooks designed for iteration, not just observability. Copilot’s hooks (preToolUse, postToolUse, sessionStart, agentStop, errorOccurred, etc.) can deny individual tool calls and log what the agent did. But they can’t feed output back into a new agent turn. Pullfrog’s stopScript non-zero → resume-with-failure-context loop has no Copilot equivalent. Pullfrog’s hooks are a feedback mechanism, not only an observability mechanism.

PR-level memory and repo learnings. Pullfrog persists a structured snapshot of every run and feeds it to the next run on the same PR, plus durable repo-level learnings the team can see and edit. Copilot Memory is the closest analog and shipped recently, but it’s in public preview, only available on Pro and Pro+, and per-user scoped rather than team-shared.

Programmable triggers, not just human entry points. Copilot has many human entry points. It has zero programmable triggers: no cron, no “auto-review every new external PR”, no per-event policy beyond enable/disable. The one real overlap with Pullfrog is the “ask Copilot to fix this failing workflow run” button in the Actions UI. Pullfrog’s dashboard-configured trigger matrix (mentions, new PRs, new issues, PR reviews, CI failures, cron) is a clean win for any team that wants the agent to react to events without a human in the loop.

Agent and model agnostic. Copilot’s model set is closed: Auto, Claude Sonnet 4.5, Claude Opus 4.7, GPT-5.2-Codex. No BYOK, no Bedrock, no Vertex, no other providers, no swapping out the agent runtime itself. Pullfrog runs claude-code or opencode underneath, with BYOK supported and every OpenRouter-routed provider available.

Restricted runs from non-collaborators. Copilot’s stance on outside contributors is strict: “Copilot cloud agent will only respond to interactions from users with repository write access.” Non-collaborator issue or PR comments cannot trigger the agent at all, even on public repos. Pullfrog takes a different posture: non-collaborator triggers are allowed but capped at restricted mode with no override. This is a feature for OSS maintainers who want drive-by issue triage without granting write access, not a security advantage over Copilot. Pick the default that matches your workflow.

There are two things Copilot does better.

First-party affordances. Copilot offers verified bot commit identity, automatic security scanning on agent PRs, and server-side enterprise governance enforced by GitHub itself. These matter most to large GitHub Enterprise shops; weigh accordingly.

Entry-point breadth. Copilot is reachable from GitHub Mobile, Raycast, Copilot Chat in five IDEs, and native Jira / Slack / Teams / Linear / Azure Boards integrations. Pullfrog has GitHub plus the dashboard.

Feature matrix

Feature	`claude-code-action`	Codex on GitHub	`run-gemini-cli`	Copilot cloud agent	Pullfrog
Pay-as-you-go (no per-seat lock-in)	Yes (BYOK)	No	Yes (BYOK)	No	Yes
Latest Opus / Codex models without paid-tier gating	Yes (BYOK)	No	n/a	No	Yes
PR-level memory + repo learnings	No	No	No	Preview, Pro/Pro+ only	Yes
Curated MCP toolbelt out of the box	No	No	No	No (DIY MCP JSON)	Yes
Lifecycle hooks with stop-failure resume	No	No	No	No	Yes
Incremental PR review with auto-resolve	No	Preview	No	No (manual re-request)	Yes
Multiple built-in modes routed automatically	No	No	No	Custom subagents (DIY)	Yes
Programmable triggers (cron, auto on new PR/issue, CI failure)	No	No	No	No (CI-failure UI only)	Yes
Per-mode × per-trigger permission matrix	No	No	No	No	Yes
Shell sandbox: secrets and git creds isolated from agent	Partial	Yes (vendor cloud)	No	Yes (firewall + sandbox)	Yes
Multi-provider model routing / BYOK	BYOK only (single vendor)	No	BYOK only (single vendor)	No (closed model set)	Yes
Native GitHub-verified bot identity	No	Yes (vendor)	No	Yes	No (uses GitHub App)
Native integrations (Mobile, Raycast, Jira, Slack, Linear)	No	No	No	Yes	No
Open source	Yes (MIT)	No	Yes (Apache-2.0)	No	Yes

When each option is the right call

Use a vendor agent action when you have one repo, you’re committed to one model, you want the agent only when you mention it, and you don’t need PR review or anything that persists between invocations. The simplest tool wins. Use Copilot cloud agent when you’re already on a paid Copilot plan, you want the deepest possible GitHub-native integration (verified bot identity, every IDE, Mobile, Raycast, Jira, Slack), you need GHEC governance enforced server-side, and your workflow is human-driven (assign issue, mention agent) rather than event-driven. If you’re a GitHub Enterprise shop with strict content-exclusion or SSO requirements, this is often the right answer. Use Pullfrog when you want pay-as-you-go pricing instead of per-seat commitments, you want curated tools out of the box (browser, screenshots, S3, structured CI logs), you want the agent to react to events automatically (new PRs, CI failures, cron) instead of waiting for a human to assign it, you want state to compound across runs, you want to BYOK or run multiple models, or you want non-collaborator contributors to be able to trigger restricted-scope runs without granting write access. All three coexist fine in the same repo. Use what fits each workflow. Ready to try it? Head to getting started.
