Google's Gemini 3.1 Pro is here, and it just doubled its reasoning score



ZDNET's key takeaways

  • Gemini 3.1 Pro is now available.
  • It builds on the benchmark progress Gemini 3 established for Google.
  • Model capabilities are still relative, one expert said.

Another week, another "smarter" model -- this time from Google, which just released Gemini 3.1 Pro.

Gemini 3 has outperformed several rival models since its release in November, beating Copilot in a few of our in-house task tests, and has mostly received praise from users. Google said this latest Gemini model, announced Thursday, achieved "more than double the reasoning performance of 3 Pro" in testing, based on its 77.1% score on the ARC-AGI-2 benchmark for "entirely new logic patterns."


The latest model follows a "major upgrade" to Gemini 3 Deep Think last week, which boasted new capabilities in chemistry and physics alongside new accomplishments in mathematics and coding, according to Google. The company said the Gemini 3 Deep Think upgrade was built to address "tough research challenges -- where problems often lack clear guardrails or a single correct solution and data is often messy or incomplete." Google said Gemini 3.1 Pro undergirds that science-heavy investment, calling the model the "upgraded core intelligence that makes those breakthroughs possible."

Late last year, Gemini 3 scored a new high of 38.3% across all currently available models on the Humanity's Last Exam (HLE) benchmark test. Developed to combat increasingly beatable industry-standard benchmarks and better measure model progress against human ability, HLE is meant to be a more rigorous test, though benchmarks alone aren't enough to determine performance.

According to Google, Gemini 3.1 Pro now bests that score at 44.4% -- though the Deep Think upgrade technically scored higher at 48.4%. Similarly, the Deep Think update scored 84.6% -- higher than 3.1 Pro's aforementioned 77.1% -- on the ARC-AGI-2 logic benchmark.


All that said, Anthropic's Claude Opus 4.6 still tops the Center for AI Safety (CAIS) text capability leaderboard (for reasoning and other text-based queries), which averages other relevant benchmark scores outside of HLE. Anthropic's Opus 4.5, Sonnet 4.5, and Opus 4.6 also beat Gemini 3 in terms of safety, according to the CAIS risk assessment leaderboard.

Hype management 

Benchmark records aside, the lifecycle of a model doesn't end with a splashy release. At the current rate of AI development, new models are impressive only in relative terms to their competition -- time and testing will tell where the 3.1 Pro excels or fails. Gemini 3 gives the new model a strong foundation, but that may only last until the next lab releases a state-of-the-art upgrade.


"The test numbers seem to imply that it's got significant improvement over Gemini 3, and Gemini 3 was pretty good, but I don't think we're really going to know right away, and it's not available except to the more expensive plans yet," said ZDNET senior contributing editor David Gewirtz of the release. "The shoe hasn't yet fallen on GPT 5.3 either, and I think when it does, we'll have a more universal set of upgrades that we can readdress."

While we wait for that model to drop, Gewirtz looked into GPT-5.3-Codex, OpenAI's most recent coding-specific release, which famously helped build itself.

Try it yourself

Developers can access Gemini 3.1 Pro in preview today through the API in Google's AI Studio, Android Studio, Google Antigravity, and Gemini CLI. Enterprise customers can try it in Vertex AI and Gemini Enterprise, and regular users can find it in NotebookLM and the Gemini app.
