ZDNET's key takeaways
- OpenAI targets "conversational" coding, not slow, batch-style agents.
- Big latency wins: 80% faster roundtrip, 50% faster time-to-first-token.
- Runs on Cerebras WSE-3 chips for a latency-first Codex serving tier.
The Codex team at OpenAI is on fire. Less than two weeks after releasing a dedicated agent-based Codex app for Macs, and only a week after releasing the faster and more steerable GPT-5.3-Codex language model, OpenAI is counting on lightning striking a third time.
Also: OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new
Today, the company announced a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding in Codex. The company reports that it generates code 15 times faster while "remaining highly capable for real-world coding tasks." There is a catch, and I'll talk about that in a minute.
Also: OpenAI's Codex just got its own Mac app - and anyone can try it for free now
Codex-Spark will initially be available only to $200/mo Pro tier users, with separate rate limits during the preview period. If it follows OpenAI's usual release strategy for Codex, Plus users will be next, with other tiers gaining access fairly quickly.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Expanding the Codex family for real-time collaboration
OpenAI says Codex-Spark is its "first model designed specifically for working with Codex in real-time -- making targeted edits, reshaping logic, or refining interfaces and seeing results immediately."
Let's deconstruct this briefly. Most agentic AI programming tools take a while to respond to instructions. In my programming work, I can give an instruction (and this applies to both Codex and Claude Code) and go off and work on something else for a while. Sometimes it's just a few minutes. Other times, it can be long enough to get lunch.
Also: I got 4 years of product development done in 4 days for $200, and I'm still stunned
Codex-Spark is apparently able to respond much faster, allowing for quick and continuous work. This could speed up development considerably, especially for simpler prompts and queries.
I know that I've been occasionally frustrated when I've asked an AI a super simple question that should have generated an immediate response, but instead I still had to wait 5 minutes for an answer.
By making responsiveness a core feature, the model supports more fluid, conversational coding. Sometimes, using coding agents feels more like old-school batch-style coding. This is designed to overcome that feeling.
GPT-5.3-Codex-Spark isn't intended to replace the base GPT-5.3-Codex. Instead, Spark was designed to complement high-performance AI models built for long-running, autonomous tasks lasting hours, days, or weeks.
Performance
The Codex-Spark model is intended for work where responsiveness matters as much as intelligence. It supports interruption and redirection mid-task, enabling tight iteration loops.
This is something that appeals to me, because I always think of something more to tell the AI 10 seconds after I've given it an assignment.
The Spark model defaults to lightweight, targeted edits, making quick tweaks rather than taking big swings. It also doesn't automatically run tests unless requested.
OpenAI has been able to reduce latency (faster turnaround) across the full request-response pipeline. It says that overhead per client/server roundtrip has been reduced by 80%. Per-token overhead has been reduced by 30%. Time-to-first-token has been reduced by 50% through session initialization and streaming optimizations.
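As a back-of-the-envelope illustration of how those three reductions stack up, here is a crude latency model with made-up baseline numbers (mine, not OpenAI's). It treats the 30% per-token overhead cut as if it applied to the whole per-token time, which overstates the benefit, but it shows why the fixed-cost reductions matter most for short, interactive requests:

```python
# Crude model of perceived response time for one streamed reply.
# All baseline figures below are hypothetical, for illustration only.
def response_latency(roundtrip_overhead_ms, ttft_ms, per_token_ms, tokens):
    """Fixed roundtrip overhead + time-to-first-token + streaming time."""
    return roundtrip_overhead_ms + ttft_ms + per_token_ms * tokens

# Hypothetical baseline for a small 200-token edit.
before = response_latency(roundtrip_overhead_ms=500, ttft_ms=2000,
                          per_token_ms=20, tokens=200)

# Apply the reductions OpenAI cites: 80% less roundtrip overhead,
# 50% faster time-to-first-token, 30% less per-token overhead
# (simplified here to 30% off the whole per-token time).
after = response_latency(roundtrip_overhead_ms=500 * 0.2, ttft_ms=2000 * 0.5,
                         per_token_ms=20 * 0.7, tokens=200)

print(round(before), round(after))  # 6500 3900
```

Under these assumed numbers, the request finishes in roughly 60% of the original time, and the savings are proportionally larger the shorter the reply, which is exactly the interactive, small-edit case Spark targets.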
Another mechanism that improves responsiveness during iteration is the introduction of a persistent WebSocket connection, so the connection doesn't have to be continually renegotiated.
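The principle is easy to demonstrate. Python's standard library has no WebSocket client, so this sketch uses HTTP keep-alive instead -- it is not Codex's actual protocol, just the same idea: reuse one connection and you pay the connection-setup cost once instead of once per request. The counting server below tallies how many TCP connections it accepts:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tally of TCP connections the server has accepted.
connections = {"count": 0}

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 enables keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

class CountingServer(HTTPServer):
    def get_request(self):
        sock, addr = super().get_request()
        connections["count"] += 1  # one increment per accepted connection
        return sock, addr

server = CountingServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Anti-pattern: a fresh connection (fresh handshake) for every request.
for _ in range(5):
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()
fresh = connections["count"]

# Persistent connection: one handshake, reused for all five requests.
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(5):
    conn.request("GET", "/")
    conn.getresponse().read()
conn.close()
server.shutdown()

print(fresh, connections["count"] - fresh)  # 5 1
```

Five requests cost five handshakes the naive way and one handshake with a persistent connection. A WebSocket pushes the same idea further: one long-lived, full-duplex connection that also avoids per-request headers entirely.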
Powered by Cerebras AI chips
In January, OpenAI announced a partnership with AI chipmaker Cerebras. We've been covering Cerebras for a while. We've covered its inference service, its work with DeepSeek, its work boosting the performance of Meta's Llama model, and Cerebras' announcement of a really large AI chip, meant to double LLM performance.
GPT-5.3-Codex-Spark is the first milestone for the OpenAI/Cerebras partnership announced last month. The Spark model runs on Cerebras' Wafer Scale Engine 3, a high-performance AI chip architecture that boosts speed by putting all the compute resources on a single wafer-scale processor the size of a pancake.
Also: 7 ChatGPT settings tweaks that I can no longer work without - and I'm a power user
Usually, a semiconductor wafer contains a whole bunch of processors, which later in the production process get cut apart and put into their own packaging. The Cerebras wafer contains just one chip, making it a very, very large processor with very, very closely coupled connections.
According to Sean Lie, CTO and co-founder of Cerebras, "What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible -- new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning."
The gotchas
Now, here are the gotchas.
First, OpenAI says that "when demand is high, you may see slower access or temporary queuing as we balance reliability across users." So, fast, unless too many people want to go fast.
Here's the kicker. The company says, "On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT-5.3-Codex-Spark underperforms GPT-5.3-Codex, but can complete the tasks in a fraction of the time."
Last week, in the GPT-5.3-Codex announcement, OpenAI said that GPT-5.3-Codex was the first model it classifies as "high capability" for cybersecurity, according to its published Preparedness Framework. On the other hand, the company admitted that GPT-5.3-Codex-Spark "does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity."
Think about these statements, dear reader. This AI isn't as smart, but it does do those not-as-smart things a lot faster. 15x speed is certainly nothing to sneeze at. But do you really want an AI to make coding mistakes 15 times faster and produce code that is less secure?
Let me tell you this. "Eh, it's good enough" isn't really good enough when you have thousands of pissed-off users coming at you with torches and pitchforks because you suddenly broke their software with a new release. Ask me how I know.
Last week, we learned that OpenAI uses Codex to write Codex. We also know that it uses it to build code much faster. So the company clearly has a use case for something that's way faster, but not as smart. As I get a better grip on what that is and where Spark fits, I'll let you know.
What's next?
OpenAI shared that it is working toward dual modes of reasoning and real-time interaction for its Codex models.
The company says, "Codex-Spark is the first step toward a Codex with two complementary modes: longer-horizon reasoning and execution, and real-time collaboration for fast iteration. Over time, the modes will blend."
The workflow model it envisions is interesting. According to OpenAI, the intent is that eventually "Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don't have to choose a single mode up front."
Also: I tried a Claude Code rival that's local, open source, and totally free - how it went
Essentially, it's moving toward the best of both worlds. But for now, you can choose fast or accurate. That's a tough choice. But the fast option is getting more accurate, and now, at least, you can opt for fast when you want it (as long as you keep the trade-offs in mind and you're paying for the Pro tier).
What about you? Would you trade some intelligence and security capability for 15x faster coding responses? Does the idea of a real-time, interruptible AI collaborator appeal to you, or do you prefer a more deliberate, higher-accuracy model for serious development work?
How concerned are you about the cybersecurity distinction between Codex-Spark and the full GPT-5.3-Codex model? And if you're a Pro user, do you see yourself switching between "fast" and "smart" modes depending on the task? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
