Claude Code vs Codex: How I Split the Work

I keep two agentic coding CLIs running: Claude Code and Codex. For a while that felt like I couldn't pick one. What fixed it wasn't picking one. It was finding a single question that tells me which to use, every time, without thinking about it.

The question is: is this task a conversation, or a straight line?

A conversation is open-ended. I poke at the problem, change direction, follow a thread, back up. A straight line is the opposite: I know exactly what I want done, I just don't want to do it by hand again. Once I started sorting tasks that way, the two tools stopped overlapping and started covering for each other. This post is that split, the routine automation that made it obvious, and the cost logic that keeps it honest. I build WordPress plugins, so the examples lean that way.

The straight-line work goes to Codex

The piece that made this practical is codex exec, Codex's non-interactive mode. You hand it one instruction, it runs once, and it prints the result. That is the part you can drop into a script.

I set the model and reasoning once, in ~/.codex/config.toml:

model = "gpt-5.5"
model_reasoning_effort = "medium"
approval_policy = "on-request"
sandbox_mode = "workspace-write"

Medium reasoning is on purpose. Routine work is mechanical edits and summaries, not hard design, and heavier reasoning just makes the run slower and more expensive for the same result. I raise it in the moment only when a task turns genuinely hard. The on-request policy makes Codex ask before it writes or runs anything, and workspace-write keeps it inside the working folder. Project conventions live in AGENTS.md, which is Codex's version of a CLAUDE.md.

Here is the work I actually hand it. Commit messages from the staged diff:

git add -A
codex exec "read git diff --staged and output a single-line commit message. No preamble, message only."

And the one that earns its keep, version bumps. A WordPress plugin holds its version in two places that have to match: the Version: header in the main PHP file and the Stable tag: in readme.txt. Miss one and the release breaks.

codex exec "bump this plugin's version from 1.0.9.10 to 1.0.9.11. Change two places: the Version: header in the main PHP file and the Stable tag in readme.txt. Nothing else."

With on-request, I see the diff before it applies. Then the release chores go into one script:

#!/usr/bin/env bash
set -euo pipefail
NEW_VERSION="$1"
codex exec "bump the plugin version to $NEW_VERSION in the PHP header and readme.txt Stable tag, nothing else."
codex exec "add a $NEW_VERSION section to the top of CHANGELOG.md from recent commits, matching the existing format."
git diff

A careful five-minute ritual became one command and a diff review.

The conversation work, and why I keep the second tool

Exploration and building stay in Claude Code. It holds the whole project and is comfortable going back and forth while something takes shape. If Codex handles the straight lines, why keep Claude Code around? Because a few patterns only work with two tools.

The one I lean on most is cross-model review. I build with Claude Code, then have Codex review the diff:

codex exec "review git diff for security issues and bugs. Cite file and line for each. Findings only, not praise."

A model reviewing its own output tends to like what it wrote. A different model trips on what the first one walked past. The "findings only" part matters more than it looks, because without it you get "this looks solid" and a soft non-answer.

The second pattern is extract-the-repeat. I explore a feature in Claude Code, notice a step I'll repeat every time, and pull just that step into a codex exec line in a script. Thinking stays in the conversational tool, repetition moves to the straight-line one.

The third is for changes I can't get wrong: run the same request through both and diff the answers. If they agree, I relax. If they diverge, that is where a human decision belongs. It's heavy, so I save it for the scary ones.

Keeping context in sync

Claude Code reads CLAUDE.md, Codex reads AGENTS.md. I keep both in the repo with the same conventions, so either tool behaves the same. The trap is editing one and forgetting the other, so a convention change means touching both. When I move a long task across, I have the first tool summarize where things stand and hand over the summary, not the whole transcript. These tools hold only so much at once, so moving the gist keeps the second one sharp.

The cost logic, including the June 15 change

Money is part of why two beats one here. The rule: do long, exploratory work where it's flat-rate, and short, mechanical work where metered is cheap anyway. Interactive Claude Code runs inside the subscription. A codex exec call is small, so metered costs little per run.

This sharpened on June 15, 2026, when Anthropic moved programmatic Claude use, the claude -p headless path and the Agent SDK, onto separate metered credit, while interactive Claude Code stayed on the plan. So scripting Claude with claude -p is no longer flat. Which fits the split I already had: explore interactively on the flat plan, automate through codex exec where metered is cheap. Pricing shifts, so check current numbers, but the shape holds.

What goes wrong, and the one rule I keep

Two tools have their own friction. The two convention files drift if you edit one and not the other. Pointing both at the same files at once lets one edit stomp the other, so I work one at a time. And reviewers slide into agreement without the "findings only" instruction. Two is not always better than one. If one tool covers it, I use one.

The rule I don't break: treat anything either tool writes as untrusted input until I've read it. Version numbers, config, anything touching user input gets a human diff review before it commits or ships, no matter which model produced it. The point of automating the boring work is to free up attention, so I spend some of it on the review.

Where this leaves me

The split held up because it maps to something real. Some work is a conversation and some is a straight line, and the tools are honestly better at one or the other. Build and explore in the conversational one, run and repeat in the straight-line one, let a different model check the first model's work, and put the release chores behind a single command. Judgment and the last diff stay with me. The day one tool covers the job, I'll use one.

I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.

Conversation or straight line: how I split work between Claude Code and Codex

The straight-line work goes to Codex

The conversation work, and why I keep the second tool

Keeping context in sync

The cost logic, including the June 15 change

What goes wrong, and the one rule I keep

Where this leaves me

Comments

More from this blog

Before you disconnect MCP servers, read the context breakdown

Prompt injection isn't the whole story: secure what the model hands back

A WordPress chatbot that answers from your own content: the design decisions behind it

Claude Fable 5 lasted three days. Then the US government pulled it.

Command Palette

The straight-line work goes to Codex

The conversation work, and why I keep the second tool

Keeping context in sync

The cost logic, including the June 15 change

What goes wrong, and the one rule I keep

Where this leaves me

Comments

More from this blog