Codex Major Upgrades: Time to Switch from Claude Code?

Over the past five years, AI in software development has evolved from simple code completion and pair programming to solutions capable of supporting full-scale software architecture decisions. A whole wave of AI-powered tools, ranging from plugins for mainstream IDEs and standalone AI-driven IDEs to full-scale AI engineering platforms, has entered the market to accelerate coding, reduce repetitive tasks, and elevate what it means to build software. Claude Code has been among the leading tools on the market so far.

But in spring 2025, OpenAI relaunched Codex as a replacement for its earlier, API-only version, which had been deprecated. OpenAI claims the upgraded Codex is not just a model that turns text into code, but a full-fledged AI software engineer that works autonomously on programming tasks in the cloud. Since then, OpenAI has introduced several significant updates, culminating in the GPT-5-Codex model, now accessible virtually everywhere developers work. With the latest upgrades announced at OpenAI DevDay, Codex has gained even more powerful capabilities, and many developers are now switching to it, or considering a switch, from other popular alternatives.

Let’s unpack what Codex is, how it has evolved over the past six months, its latest updates, and whether it’s time to switch from tools like Claude Code.

What is Codex?

In May 2025, OpenAI introduced a research preview of its upgraded Codex, a cloud-based software engineering agent that can work on multiple tasks in parallel, such as writing new features, fixing bugs, and creating pull requests in isolated environments. The tool was powered by the codex-1 model, a version of OpenAI o3 optimized for programming. Initially available to ChatGPT Pro, Team, and Enterprise users, access was gradually expanded to Plus and Edu subscribers.

Users could try it through a sidebar in the ChatGPT interface. To submit a task, they had to enter a text query and click "Code," and to ask questions about existing code, click "Ask." Each task ran in an isolated cloud sandbox preloaded with the user's repository. The agent could read and edit files, as well as run commands, including test harnesses, linters, and type checkers.

At the same time, OpenAI released Codex CLI, its lightweight open-source assistant that developers could use directly in their terminal. The CLI version allowed using OpenAI’s coding models like o3 and o4-mini locally, so it was possible to “pair program” with them to write, fix, or edit code faster without needing to open ChatGPT or a web app.

OpenAI Launches Its Major Upgrade - GPT-5-Codex

In September, OpenAI released its major upgrade, GPT-5-Codex: an engineering-focused variant of GPT-5, fine-tuned for programming and agentic workflows in Codex. The new model has become the default in Codex and is designed to tackle large, real-world engineering challenges: building complete projects from scratch, adding features and tests, debugging, refactoring, and running code reviews, all while interacting with external tools and test suites. When running in the cloud, GPT-5-Codex can accept images and screenshots, visually check its progress, and attach visual artifacts to tasks.

OpenAI positions it as a successor to earlier Codex efforts that builds on GPT-5 to improve its analysis of large codebases and its handling of multi-step engineering tasks. As an agent-based system, it can not only generate code but also autonomously plan tasks, run tests, fix errors, iterate, and bring projects to completion without constant human intervention. At this level, AI is perceived not as a passive auto-completion tool, but as a full-fledged member of the development team.

What’s more, the system can be used by people with no programming skills, as it supports a "vibe coding" style of work. Instead of typing actual code, users describe the app, features, or behavior they want, and Codex generates the program or web service from that description.

Adaptive Thinking

One of the defining innovations of the product is its adaptive thinking time. GPT-5-Codex adjusts the amount of hidden reasoning it performs: simple tasks are solved almost instantly, while complex refactorings or long-running tasks allow the model to "think" for much longer. Unlike GPT-5, which uses a router to switch between models, GPT-5-Codex itself adjusts how long it spends on a task, anywhere from a few seconds to several hours. In some cases, the model runs continuously for up to seven hours, refactoring, fixing errors, and finalizing the solution. At the same time, for small interactive steps, the model consumes far fewer tokens than a general-purpose GPT-5 instance. This adaptive reasoning strategy is designed to produce quick answers where possible and deep, thorough execution when needed.

Integration with IDEs, GitHub, and ChatGPT interface

The model is now available virtually everywhere - via Codex CLI, Codex IDE extension, Codex Cloud, GitHub workflows, and ChatGPT mobile interfaces. Developers can access it both through their ChatGPT subscription and via an API key.

Specifically, with the September update, OpenAI added the Codex IDE extension and GitHub integration, making the Codex cloud agent accessible from developers’ primary workflows without leaving their editor or GitHub. The Codex IDE extension allows developers to interact with Codex directly in their editor, preview cloud tasks, and maintain context between local and cloud environments. The extension is compatible with VS Code and its forks, such as Cursor and Windsurf. Integration with GitHub enables automated code reviews and pull request generation.

Codex CLI and Cloud Update

The Codex CLI and Cloud tools were also updated, with improved image support, task tracking, security modes, and a refined user interface that makes development and code review easier. The Codex CLI update marks an important step toward bringing AI tools closer to the reality of everyday development. A key innovation is the codex resume command, which resumes earlier sessions. Now, instead of starting from scratch, the model returns to an unfinished project and continues working in the same context, much like a human developer who had stepped away from the computer for a while.

Code Review

Codex now features code review capabilities trained to find critical errors. Unlike static analysis tools, it compares the stated intent of a PR with the actual changes in the diff, analyzes the entire codebase and its dependencies, and runs code and tests to verify behavior. This level of thoroughness helps teams find issues earlier, reduce reviewer workload, and release with greater confidence. Once it is enabled for a GitHub repository, Codex automatically reviews PRs as they move from draft to ready for review, publishing its analysis directly in the PR. If it recommends edits, you can stay in the same discussion thread and ask Codex to implement the changes. You can also request a review manually by mentioning @codex review in the PR and adding instructions, such as @codex review for security vulnerabilities or @codex review for outdated dependencies.
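The manual review request described above is just a PR comment, so it can in principle be scripted. The following is a minimal Python sketch, assuming the comment is posted through GitHub's REST endpoint for issue comments (PR conversation comments use the same endpoint). The repository names, the token placeholder, and the helper name build_review_request are hypothetical, and whether Codex reacts to comments posted by a script rather than a person is an assumption worth checking in your own setup. The function only builds the request; nothing is sent.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_review_request(owner, repo, pr_number, token, focus=""):
    """Build (but do not send) a request that comments '@codex review' on a PR.

    `focus` is an optional instruction, e.g. "security vulnerabilities".
    All argument values used below are placeholders, not real repositories.
    """
    comment = "@codex review" + (f" for {focus}" if focus else "")
    # PR conversation comments are created via the issue-comments endpoint.
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": comment}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_review_request(
        "example-org", "example-repo", 42, "<token>", "security vulnerabilities"
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # Calling urllib.request.urlopen(req) would actually post the comment.
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be posted before wiring this into any automation.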

Benchmarks

The most interesting part is the benchmark results. OpenAI claims that GPT-5-Codex outperforms GPT-5 on agentic coding benchmarks like SWE-bench Verified and shows clear gains when refactoring code in large repositories. On SWE-bench Verified, a set of 500 software engineering tasks drawn from real-world open-source projects, GPT-5-Codex achieved a success rate of 74.5%, compared with 72.8% for GPT-5, which points to the agent's improved capabilities. Previously, only 477 of the tasks could be run, but now all 500 can be tested, which makes the results more comprehensive.

Code refactoring scores increased even more between GPT-5 and GPT-5-Codex. While the baseline GPT-5 achieved 33.9%, GPT-5-Codex achieved 51.3% on the same set of tasks. These gains reflect better handling of large codebases and show that the model not only writes code but is also significantly better at adapting and improving existing projects. The model also produced fewer erroneous or irrelevant code review comments and is more effective at detecting bugs.
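To put the two comparisons side by side, it helps to separate the absolute gain (in percentage points) from the relative gain over the GPT-5 baseline. The short Python sketch below uses only the four scores quoted above; the helper name gain is just for illustration.

```python
def gain(baseline, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - baseline
    relative = absolute / baseline * 100
    return round(absolute, 1), round(relative, 1)


# (GPT-5 score, GPT-5-Codex score), as quoted in the text above.
scores = {
    "SWE-bench Verified": (72.8, 74.5),
    "Code refactoring": (33.9, 51.3),
}

for name, (gpt5, gpt5_codex) in scores.items():
    points, percent = gain(gpt5, gpt5_codex)
    print(f"{name}: +{points} points ({percent}% relative)")

# SWE-bench Verified: +1.7 points (2.3% relative)
# Code refactoring: +17.4 points (51.3% relative)
```

The SWE-bench Verified improvement is modest, while the refactoring score is roughly half again as high as the baseline, which is where the upgrade is most visible.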

What has OpenAI added to Codex at its latest DevDay?

Slack Integration

Codex can now be integrated directly into Slack, the chat app many teams use for daily work. Many engineers already work this way, discussing code and debugging in Slack, so you can talk to Codex right there, just like you’d message a teammate. The interaction style is conversational, not command-line or IDE-based. You can type @codex and ask something like “Can you help fix this bug?” or “Generate a test for this function.” Codex automatically gets context from the conversation, selects the appropriate environment, and replies in the Slack thread, generating or explaining code or suggesting fixes, with a link to the completed task in Codex cloud. You can then merge its changes and keep iterating, or export the task to your computer for local editing.

Codex SDK

The next major feature announced by OpenAI is the Codex SDK, which lets you integrate Codex directly into your own software, such as internal tools, apps, or CI/CD pipelines. OpenAI also announced a new GitHub Action, a workflow step that GitHub triggers automatically on events such as a push, a new issue, or a pull request. If you connect Codex to your development tools through the SDK or GitHub Actions, you are essentially allowing it to run code and make decisions without human approval. Instead of a developer manually asking Codex to “fix this bug,” the system itself can trigger Codex whenever an event happens, such as a new bug report being filed. In other words, Codex is no longer just generating suggestions for a developer; it executes actions automatically inside your engineering pipeline. This is powerful but risky: the AI might misunderstand a bug, break something, or even expose sensitive data if it is not properly sandboxed. You need real confidence in the system’s reliability and safety before giving it that level of control.
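One way to limit the risk described above is to put a small policy gate in front of any event-triggered run, so that only low-risk events start the agent without a human in the loop. The Python sketch below is purely illustrative: the names Event, decide, AUTO_APPROVED_EVENTS, and SENSITIVE_PATHS are hypothetical and are not part of the Codex SDK or the GitHub Action, and the policy itself is an assumption about what a cautious team might choose.

```python
from dataclasses import dataclass

# Hypothetical policy, not taken from any Codex or GitHub API.
AUTO_APPROVED_EVENTS = {"pull_request"}          # e.g. review-only runs
HUMAN_GATED_EVENTS = {"issues", "push"}          # runs that may change code
SENSITIVE_PATHS = ("secrets/", ".env", "infra/")  # prefixes that force a human check


@dataclass
class Event:
    kind: str                  # event name, e.g. "push", "issues", "pull_request"
    changed_paths: tuple = ()  # file paths touched by the event, if any


def decide(event):
    """Return 'run', 'needs_approval', or 'skip' for an incoming event."""
    # Anything touching sensitive paths always goes to a human first.
    if any(path.startswith(SENSITIVE_PATHS) for path in event.changed_paths):
        return "needs_approval"
    if event.kind in AUTO_APPROVED_EVENTS:
        return "run"
    if event.kind in HUMAN_GATED_EVENTS:
        return "needs_approval"
    # Unknown event types never trigger the agent.
    return "skip"


if __name__ == "__main__":
    print(decide(Event("pull_request", ("src/app.py",))))
    print(decide(Event("issues")))
```

The design choice here is to default to the safest outcome: unknown events are skipped, and sensitive paths override every other rule.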

New Management Features

When Codex works with your repository, it doesn’t touch your real code directly. Instead, it creates an isolated container to work in. Many developers assumed that Codex deleted these containers as soon as the session ended, but in fact Codex keeps them for a while so you can look at logs and diffs. These containers can, of course, contain sensitive information, so ChatGPT administrators can now delete them proactively. Administrators can also set more secure defaults for the Codex CLI and IDE extension, for example by defining overrides through managed configuration, and can monitor the actions Codex takes. The company also introduced new analytics for managers and admins that report on usage across different interaction modes and track the quality of code reviews and changes generated by Codex. These administrative features are available on the Business, Edu, and Enterprise plans.

General Availability

Codex was already widely used: it was working well, had paying users, and its usage had even grown 10× between August and September. Now, with the “Generally Available” announcement, OpenAI is saying the product has officially left the preview and feedback stage and is considered a complete, stable product integrated into professional workflows.

So, Should You Switch to It if You Are Already Using Claude Code?

Until Codex’s latest upgrades, Claude Code held the lead in this area, and Codex was perceived as a less mature solution. Now OpenAI is attempting to shift the balance of power, and the two tools have become quite similar in what they can do. One difference is reasoning control: because over-reasoning on basic tasks was annoying, Codex now lets you set the reasoning effort for a task to minimal, low, medium, or high. Claude Code offers fewer options for controlling reasoning depth, so it tends to slow down and provide thoughtful guidance by default. In essence, Codex gives more control over how fast or detailed the output should be, while Claude leans toward careful, step-by-step explanations.

Codex is highly platform-agnostic (local, cloud, IDE, web), while Claude Code is mainly terminal- and IDE-focused and local-first. When it comes to features, Claude Code provides additional options such as sub-agents, custom hooks, and slash commands. Codex, while more limited in features, supports the AGENTS.md standard for instructions, which makes it compatible with other tools such as Cursor, Windsurf, and GitHub Copilot. Claude Code uses its own proprietary, Anthropic-specific format, prioritizing deep customization within its ecosystem.

GitHub integration is a notable strength for Codex, enabling automated pull request reviews, inline comments, and actionable recommendations that streamline code review workflows. Claude Code also offers GitHub integrations, but developers report that they require manual invocation and lack Codex's native automation, which makes them less seamless for team workflows.

Where Codex really shines is pricing and usage. Codex uses your existing ChatGPT subscription. Plus at $20/mo offers access with certain usage limits, while Pro at $200/mo and Team/Enterprise at custom pricing provide expanded usage. Pricing plans are similar to those of Claude Code, but Codex seems to be more efficient under the hood, so you get more done for less money, and the limits on the lower tiers are generous. Claude Code can feel restrictive if you hit usage caps.

So, based on this brief, high-level comparison, Codex offers more advanced functionality. At the same time, the Claude vs Codex discussion isn’t just about choosing between OpenAI and Anthropic. It’s about the evolving role of AI agents, whose capabilities are only expected to grow. The choice between Claude and Codex reflects a broader question about how AI will integrate into workflows and influence the way people work and interact with technology.

Claude Code leans toward collaboration. It is like a friendly teammate that works with you, not just for you. It responds quickly and explains code clearly, which makes it a good fit for frontend work, documentation, and small fixes. Codex focuses on efficient, task-focused execution. It is more like a brilliant but silent expert: sometimes slower on an individual task, but more meticulous, and great for complex refactors, backend logic, and critical production tasks. It needs little supervision and uses fewer resources, which is ideal if you just want a working result.

Final Thoughts

The upgraded Codex represents a significant leap in AI-powered software engineering. It combines GPT-5’s general reasoning strength with deep, autonomous programming capabilities, supporting complex multi-step workflows, from planning and coding to debugging and reviewing. Its integration across IDEs, GitHub, and cloud environments makes it more than a code assistant: it’s becoming a full-scale AI engineering platform. But with all these advantages, should you switch from Claude Code? Right now, both competitors are attracting billions of dollars in funding. As discussed earlier, Claude Code excels as a collaborative partner for exploration, while Codex prioritizes autonomous, task-focused execution. There's no universal winner. The choice comes down to which approach fits your workflow and priorities better, so it’s worth giving both a try.

FAQ

What is Codex?

Codex is a software engineering agent powered by GPT-5-Codex, a model optimized specifically for programming tasks. It can work autonomously on real coding projects: reading repositories, writing code, running tests, fixing bugs, and creating pull requests in isolated cloud sandboxes.

What’s new in GPT-5-Codex?

GPT-5-Codex, released in September 2025, is fine-tuned for programming and agentic workflows. It can handle multi-step tasks, interact with external tools, review code, accept screenshots/images, and generate complete projects from descriptions.

Who can use Codex?

Both experienced developers and non-programmers can use it. Instead of writing code manually, users can describe what they want in plain English, and Codex can generate the app, including frontend, backend, and database setup. However, developers get the most value from its advanced features, such as code reviews and large codebase refactoring.

How can developers access Codex?

Codex is available via CLI, IDE extensions (like VS Code and its forks), cloud-based workflows, GitHub integration, and the ChatGPT interface. This availability makes it versatile for local development, cloud execution, or integrated CI/CD pipelines.

What's Codex pricing? Is it more expensive than Claude Code?

Codex uses your existing ChatGPT subscription. Plus for $20/mo offers access with certain usage limits, while Pro at $200/mo and Team/Enterprise at custom pricing provide expanded usage. Pricing plans are similar to those of Claude Code, but Codex offers more generous usage limits even on lower plans. Claude Code can feel more restrictive on heavy use, especially for large repositories or long sessions.

Can Codex act autonomously?

Yes. Codex can plan tasks, run tests, fix errors, and iterate on projects without constant human input. Features like the Codex SDK and the GitHub Action allow automated task execution based on events in development workflows.

How does Codex handle private code repositories?

Codex runs in isolated containers that never touch your actual repository. Your code is copied into temporary sandboxes. Codex tracks actions through logs and diffs, and administrators can delete data, set secure defaults, and monitor activity.

How does Codex handle code review?

Codex analyzes the full codebase, compares the stated intent of a pull request with the actual changes, runs tests, and flags critical issues. It can provide inline suggestions on GitHub PRs, implement fixes, and generate reports for better collaboration and faster releases.

Should I switch to Codex right now?

It depends on your workflow. If your focus is speed, automation, and scalability, Codex might be the upgrade you’ve been waiting for. If you rely on explanations, reasoning, and guided collaboration, Claude Code is still the better choice. The best way to determine which approach better fits your development style is to try both for a week or two.