RiverCore
OpenAI Codex Hits 3M Weekly Users, Pivots to Superapp


19 Apr 2026 · 6 min read · Sarah Chen

Codex is at 3 million weekly users and growing 70 percent month over month. That compounding rate, if it holds even two more months, puts the product above 8 million weekly actives by mid-summer, which would make it the fastest-scaling developer-adjacent surface OpenAI has ever shipped. The April 17 update also quietly stops describing Codex as a coding agent. Codex head Thibault Sottiaux said the company is "building the super app out in the open," and the feature list backs that framing.

The Numbers

The headline metric is the growth curve, not the absolute user count. 70 percent month over month is the kind of figure you normally see in the first 90 days of a consumer launch, not on a product that started life as a CLI coding tool. For comparison inside OpenAI's own portfolio, ChatGPT's growth curve flattened into the low single digits month over month long before it reached this scale. Codex is behaving like a new product, not a mature one, which is what makes the repositioning credible.

As The Rundown AI reported, the April 17 release folds five things into a single surface: background computer use across any Mac app, parallel agents, an Atlas-powered in-app browser, inline image generation via gpt-image-1.5, and a memory feature currently in preview. Automations extend that memory across days, so a long-running task can be resumed later without re-priming context.

The competitive frame matters. Anthropic released Claude Opus 4.7 in the same news cycle, jumping from 53.4 percent to 64.3 percent on SWE-bench Pro, priced identically to 4.6 at the API tier. Anthropic's unreleased Mythos Preview sits at 77.8 percent but is gated to exclusive partners. That's a 13.5-point delta on the same benchmark between what the public can buy and what Anthropic is sitting on. Anthropic is also running a roughly two-month public cadence, per its own disclosure. What we don't know: OpenAI has not disclosed the underlying model mix powering Codex's new features, nor whether background computer use runs on a variant of GPT-5.4 or something newer. The bound: if OpenAI's pricing stays consistent with current platform docs, inference cost per parallel agent session is the load-bearing variable for whether this scales past heavy users.
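Since OpenAI hasn't disclosed pricing for these features, any cost model is guesswork. Still, a minimal sketch makes the load-bearing variable concrete: every number below is a placeholder assumption, not disclosed pricing, but the structure shows why parallel agents multiply the per-session bill linearly.

```python
# Back-of-envelope cost model for a parallel-agent session.
# ALL figures are placeholder assumptions, not disclosed OpenAI pricing.

def session_cost(agents: int, tokens_per_agent: int,
                 usd_per_million_tokens: float) -> float:
    """Inference cost for one session running several agents in parallel."""
    return agents * tokens_per_agent * usd_per_million_tokens / 1_000_000

# Hypothetical: 4 parallel agents, 500k tokens each, $10 per million tokens.
cost = session_cost(agents=4, tokens_per_agent=500_000,
                    usd_per_million_tokens=10.0)
print(f"${cost:.2f} per session")  # $20.00 per session under these assumptions
```

The point of the toy model: doubling the agent count doubles the bill, so whatever tier subsidizes heavy users absorbs that multiplier directly.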

Separately, OpenAI launched GPT-Rosalind, its first domain-specialized reasoning model, within three days of GPT-5.4-Cyber. On a blind RNA test from Dyno Therapeutics, Rosalind beat 95 percent of human scientists on prediction tasks. Amgen, Moderna, and the Allen Institute are already using it.

What's Actually New

Strip away the marketing and three things are genuinely different this cycle.

First, background computer use on any Mac app is a categorical change, not an incremental one. Previous agent frameworks required API access or a sanctioned automation surface. Codex now operates apps that have no API at all, which means the addressable surface for automation just expanded to roughly every piece of desktop software a developer touches. Anthropic has been shipping computer use primitives (documented in the Anthropic docs) for a while, but the packaging here (background execution with parallel agents and persistent memory) is the first time this has felt like an operating layer rather than a demo.

Second, the in-app browser with page markup is the piece engineers should pay attention to. Letting a developer annotate a DOM to direct an agent collapses the prompt engineering problem into a UI problem. You point, the agent acts. That's a fundamentally different interaction model from structured tool-use via something like MCP, and it's going to bifurcate how teams build agent integrations: programmatic for backend workflows, annotation-based for anything with a screen.
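Neither the release notes nor the article documents how page markup is represented, so the contrast is best shown with an invented shape. Everything below (`Annotation`, `to_instruction`, the dict schema) is hypothetical, meant only to illustrate the two integration styles side by side.

```python
# Hypothetical illustration of annotation-based vs programmatic agent direction.
# None of these names come from an actual OpenAI API; they are invented for contrast.
from dataclasses import dataclass

@dataclass
class Annotation:
    """A user-drawn markup on a rendered page: point at an element, say what to do."""
    css_selector: str   # element the user highlighted
    intent: str         # free-text instruction attached to the highlight

def to_instruction(note: Annotation) -> dict:
    """Collapse the annotation into the structured step an agent would execute."""
    return {"action": "interact", "target": note.css_selector, "goal": note.intent}

# Annotation-based: the user points at the screen, the agent acts.
ui_step = to_instruction(Annotation("#export-btn", "download the monthly report"))

# Programmatic (MCP-style): the developer wires the same step as an explicit tool call.
api_step = {"tool": "reports.export", "args": {"period": "monthly"}}

print(ui_step)  # {'action': 'interact', 'target': '#export-btn', 'goal': 'download the monthly report'}
```

The asymmetry is the point: the annotation path needs no schema agreed in advance, while the tool-call path needs one but survives UI changes.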

Third, memory plus automations changes the planning horizon. Most agent products today die at the session boundary. If Codex can genuinely pick up a task "days later" with retained context and user preferences, the unit of work stops being a prompt and becomes a project. That's the precondition for any agent to do anything economically meaningful. Whether it actually holds state reliably at 3 million weekly users is the open question. Memory is in preview, which is OpenAI's usual signal that retention quality is not yet at GA bar.
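OpenAI hasn't published how Codex persists task state, but the session-boundary point can be made concrete with a generic sketch: serialize the task's context and preferences, reload them days later, and the unit of work survives the session. The schema here is an assumption for illustration, not Codex's actual memory format.

```python
# Generic sketch of task state that outlives a session.
# The JSON schema is an assumption for illustration, not Codex's real memory format.
import json
from pathlib import Path

STATE_FILE = Path("task_state.json")

def save_task(task_id: str, context: list, preferences: dict) -> None:
    """Persist everything the agent needs to resume without re-priming."""
    STATE_FILE.write_text(json.dumps(
        {"task_id": task_id, "context": context, "preferences": preferences}))

def resume_task() -> dict:
    """Days later: reload state so the unit of work is the project, not the prompt."""
    return json.loads(STATE_FILE.read_text())

save_task("migrate-ci",
          context=["audited existing workflows", "drafted matrix config"],
          preferences={"style": "minimal diffs"})
state = resume_task()
print(state["task_id"])  # migrate-ci
```

Whether Codex does anything like this durably at scale is exactly the open question the preview label signals.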

Prediction: if background computer use scales without obvious failure modes, we should see at least one major IDE vendor (likely Windsurf, which just shipped 2.0 with an Agent Command Center and Devin integration) pivot its agent story to match within 60 days.

What's Priced In for AI Development

The market already expected OpenAI to push toward a superapp. That has been telegraphed for months and the acquisition of Atlas made it structurally inevitable. What's priced in: OpenAI bundling ChatGPT, browser, and agent surfaces into one app. Also priced in: Anthropic and OpenAI trading the SWE-bench crown back and forth every quarter. Opus 4.7 taking the top public slot from GPT-5.4 on agentic coding surprises nobody.

What's not priced in, in my read: the speed at which domain-specialized models are arriving. GPT-Rosalind and GPT-5.4-Cyber shipped within three days of each other. That's two vertical models in one week from a company that until recently insisted general-purpose frontier models would subsume everything. A life sciences model that beats 95 percent of human scientists on a specific RNA prediction task, shipped to Amgen, Moderna, and the Allen Institute during a test phase, implies the calculus has changed. Vertical models with gated enterprise access is an Anthropic-flavored strategy, not the OpenAI strategy of 18 months ago.

Also underpriced: the Mythos Preview gap. 77.8 percent vs 64.3 percent on SWE-bench Pro means Anthropic is holding back roughly a generation of capability from the public API. If OpenAI has a similar gated tier, we don't know about it, and that asymmetry matters for anyone building on top of public APIs expecting parity with frontier capability.

Contrarian View

The consensus take is that Codex at 3 million weekly with 70 percent growth validates the superapp thesis. I'd push back. 70 percent month over month on a base of 3 million is an impressive number in isolation, but we don't know the user mix. The source doesn't disclose how many of those weekly actives are paid, how many are in trial, or what retention looks like at day 30. That matters because background computer use and parallel agents are inference-heavy features, and if the growth is being pulled by a free or heavily subsidized tier, the unit economics could look very different from ChatGPT's.

Early reactions to Opus 4.7 are also, per the source, "divided on capabilities despite benchmarks." Benchmarks and lived developer experience are diverging. The same risk applies to Codex: a feature list that reads like a superapp can still produce a daily workflow that feels like a beta. The testable bound: if Codex retention at day 30 is below 40 percent (a reasonable floor for productivity tools), the superapp framing is marketing, not product reality. We'll know within two quarters.

Key Takeaways

  • Codex at 3 million weekly users and 70 percent month-over-month growth is the fastest curve OpenAI has shown on a developer-adjacent product, but the user mix and retention aren't disclosed.
  • Background computer use on any Mac app, not just API-enabled software, is the categorical shift. It expands the automation surface to essentially all desktop software.
  • Opus 4.7 at 64.3 percent on SWE-bench Pro tops GPT-5.4 and Gemini 3.1 Pro publicly, but Anthropic's gated Mythos Preview at 77.8 percent shows a 13.5-point frontier gap the public cannot access.
  • Domain-specialized models (GPT-Rosalind, GPT-5.4-Cyber, three days apart) signal OpenAI is abandoning the pure generalist stance. Rosalind beat 95 percent of human scientists on a blind Dyno Therapeutics RNA task.
  • Watch for IDE vendors (Windsurf 2.0 already shipped an Agent Command Center with Devin) to mirror Codex's parallel-agent pattern inside 60 days. If they don't, the moat is real.

Frequently Asked Questions

Q: What changed in the April 17, 2026 Codex update?

OpenAI added background computer use across any Mac app, parallel agents, an Atlas-powered in-app browser, inline image generation via gpt-image-1.5, and a memory feature in preview. Automations extend memory across sessions so long-running tasks can resume days later. Codex head Thibault Sottiaux described it as "building the super app out in the open."

Q: How does Claude Opus 4.7 compare to OpenAI's models on coding benchmarks?

Opus 4.7 scores 64.3 percent on SWE-bench Pro, up from Opus 4.6's 53.4 percent, and tops both GPT-5.4 and Gemini 3.1 Pro on agentic coding. Anthropic's unreleased Mythos Preview, accessible only to exclusive partners, scores 77.8 percent on the same benchmark. Opus 4.7 is priced identically to 4.6 at the API level.

Q: What is GPT-Rosalind and who has access to it?

GPT-Rosalind is OpenAI's first domain-specialized reasoning model, targeting life sciences, drug discovery, and biological research. It can read papers, query lab databases, design experiments, and generate hypotheses. On a blind RNA prediction test from Dyno Therapeutics it beat 95 percent of human scientists. Amgen, Moderna, and the Allen Institute are using it during the current enterprise test phase.

Sarah Chen
RiverCore Analyst · Dublin, Ireland