I put GPT-5.2 through a 14-round test, and the AI model raised some serious questions

2 days ago

ZDNET's key takeaways

GPT-5.2 barely outperforms GPT-5.1 despite requiring a Plus subscription.
Strong writing and analysis contrast with a disappointing coding regression.
New brevity and go-signal behavior may frustrate professional users.

OpenAI has released its latest ChatGPT model, GPT-5.2. According to the company, it's the "most capable model series yet for professional knowledge work."

Since the generative AI boom began in 2023, I've run a series of repeatable tests on new products and releases. ZDNET regularly tests the programming ability of chatbots, their overall performance, and how well various AI content detectors perform.

(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

So, let's run some tests on OpenAI's claims for its latest model, shall we?

Testing GPT-5.2

I recently ran the top free chatbots through a series of 10 text-related tests, each worth 10 points, and four image-related tests, each worth 5 points, for a total of 120 points. ChatGPT's free tier led the pack with an overall score of 109.

Note that the free tier of ChatGPT does not yet support GPT-5.2. When I logged in using my free test account and asked the AI which model it was using, I was told, "You're currently talking to ChatGPT based on GPT-5.1."

Therefore, all my tests were run on the $20/month ChatGPT Plus tier.

Test 1: Summarize a news story

Available points: 10
Awarded points: 9

This tests ChatGPT's ability to look up current information and follow directions. I directed it to summarize the Washington State flooding story by visiting Yahoo News.

It correctly summarized the overall situation, but it drew its answer from both Axios and Yahoo News. GPT-5.2 loses a point for going beyond the restrictions in the prompt.

Test 2: Academic concept explanation

Available points: 10
Awarded points: 10

This challenge asks the AI to explain educational constructivism to a five-year-old. It's designed to demonstrate an AI's ability to research and report on a concept, and also to present it in a way that is understandable to its target audience.

GPT-5.2 provided a clear, concise, one-sentence response that a child could understand. All 10 points were awarded.

Test 3: Math and analysis

Available points: 10
Awarded points: 10

So far, GPT-5.2 is turning in solid results. This test is designed to check how well the AI can do math and pattern recognition. I pass it a series of numbers. Those numbers are part of a well-known mathematical sequence called the Fibonacci sequence, but I don't tell the AI that.

When asked to fill in some of the numbers in the sequence, the AI must deduce the pattern and perform the calculations to complete the sequence. GPT-5.2 did this instantly and accurately.
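The deduction this test asks for can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual test prompt: the function name `fill_fibonacci_gaps` and the sample sequence are made up, blanks are marked with `None`, and it assumes the first two terms are given.

```python
from typing import Optional


def fill_fibonacci_gaps(seq: list[Optional[int]]) -> list[int]:
    """Fill None gaps, assuming each term is the sum of the two before it.

    Simplifying assumption: the first two terms are known.
    """
    if len(seq) < 2 or seq[0] is None or seq[1] is None:
        raise ValueError("the first two terms are needed to seed the pattern")
    filled = [seq[0], seq[1]]
    for i in range(2, len(seq)):
        expected = filled[i - 1] + filled[i - 2]
        # A given term that disagrees means the pattern isn't Fibonacci-style.
        if seq[i] is not None and seq[i] != expected:
            raise ValueError(f"term {i} ({seq[i]}) breaks the pattern")
        filled.append(expected)
    return filled


# Hypothetical puzzle: blanks are marked with None.
print(fill_fibonacci_gaps([0, 1, 1, None, 3, None, 8, None]))
```

Run as-is, this prints `[0, 1, 1, 2, 3, 5, 8, 13]`. A chatbot has to do the same thing implicitly: spot the sum-of-the-previous-two rule, check it against the numbers it was given, and extend it into the gaps.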

Test 4: Cultural discussion

Available points: 10
Awarded points: 10

This test asks the AI to consider a case, form a coherent argument, and present an opinion on a question that doesn't have a definitive right or wrong answer.

GPT-5.2's answer was interesting. First, this was the first GPT-5.2 answer with any delay between prompt and response. It took about 30 seconds to give me an answer. Second, the answers were very brief. The AI gave me two concise, one-sentence answers.

It does get 10 points, because those two sentences provide exactly the two reasons the prompt asked for ("Provide 2 reasons for your view"), and the answers were on target.

Test 5: Literary analysis

Available points: 10
Awarded points: 10

So, this is new. I gave it my prompt, and in response I was told, "I'm ready to answer, but this request would require a longer, multi-paragraph explanation. I'm waiting for your go signal before proceeding."

This tests the AI's understanding of a piece of modern literature, in this case A Game of Thrones, the first book in the A Song of Ice and Fire series. It asks what the main themes are, and why they're important.

GPT-5.2 gave a comprehensive response touching on seven main themes, ranging from power and its consequences to the illusion of honor versus survival, all the way to memory, history, and forgotten truths. All 10 points were awarded.

Test 6: Travel itinerary

Available points: 10
Awarded points: 8

This tests the AI's knowledge of geographic regions and its ability to create a helpful travel itinerary based on specific interests. I asked it to plan a week-long vacation in Boston in March focused on technology and history.

It hit a good mix of points of interest, but GPT-5.2 lost points because it didn't recommend any eateries and didn't discuss cost or pricing.

Interestingly, even though GPT-5.2's answer here was as long as its answer to the previous question, I wasn't asked to confirm that I wanted it to do the work for this prompt.

Test 7: Emotional support

Available points: 10
Awarded points: 10

There's definitely a different flavor to ChatGPT's answers with GPT-5.2. The emotional support question, which asks for advice and words of encouragement before an upcoming job interview, was also answered in three short numbered sentences.

I was tempted to take points away because the answers were so brief. But the actual content of the answers was right on target, so I gave it the full score. If more encouragement were needed, follow-up prompts could always be sent to the chatbot.

Test 8: Translation and cultural relevance

Available points: 10
Awarded points: 10

This prompt also resulted in, "This request includes a translation plus a multi-sentence explanation, which exceeds a brief response. I'm ready to proceed when you give the go signal." That's going to get annoying after a while.

My test prompt asks GPT-5.2 to translate a phrase from English to Latin and then explain the cultural relevance of the language in today's world.

GPT-5.2 did a solid translation. It also provided a quick summary of the reasons Latin still fits into the modern world, including its use in legal phrases, medical terminology, the Catholic Church, and other historical contexts.

Test 9: Coding test

Available points: 10
Awarded points: 5

We run a full set of coding evaluations against chatbots on a regular basis. Here is the set of tests. For this general test of functionality, we're using just one of them, a regular expression validation test that checks for proper entry of dollars and cents.

Although the free version of GPT-5.1 aced this test, GPT-5.2, which is supposedly better suited for coding, lost big points. The code it provided had two significant errors. The first is that if no data was entered at all, it treated the input as a $0 value, when it should have returned a no-entry error.

The second error is more egregious. If the function was passed a data type other than a numeric string, it would crash. No error checking on data type was provided.

This was a disappointment.
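The article doesn't reproduce either chatbot's code, so here is a minimal Python sketch of the two checks that were missing. It is not GPT-5.2's output or ZDNET's reference solution; the function name `validate_dollars_cents`, the `(is_valid, message)` return shape, and the accepted format (optional leading `$`, one or more digits, optionally a decimal point followed by one or two cent digits) are all assumptions for illustration.

```python
import re
from typing import Any

# Assumed format: optional "$", digits, optional "." plus one or two cent digits.
_DOLLARS_CENTS = re.compile(r"\$?\d+(\.\d{1,2})?")


def validate_dollars_cents(value: Any) -> tuple[bool, str]:
    """Return (is_valid, message) instead of raising, so bad input never crashes."""
    # Check 1: reject non-string input explicitly rather than crashing on it.
    if not isinstance(value, str):
        return False, f"expected a string, got {type(value).__name__}"
    text = value.strip()
    # Check 2: empty input is a missing entry, not a $0 amount.
    if not text:
        return False, "no amount entered"
    if _DOLLARS_CENTS.fullmatch(text) is None:
        return False, "not a valid dollars-and-cents amount"
    return True, "ok"


for sample in ["$12.34", "12", "", None, 12.5, "12.345"]:
    print(repr(sample), validate_dollars_cents(sample))
```

Returning a status tuple rather than raising is just one design choice. What matters for this test is that both the empty string and the non-string inputs get an explicit error instead of being silently read as $0 or crashing the function.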

Test 10: Creative writing

Available points: 10
Awarded points: 10

This test is among the most fun in the entire test suite. It asks GPT-5.2 to write a story longer than 1,500 words, as described in the second prompt in this article. The challenge is to see how creative and comprehensive the chatbot can be in its answer.

GPT-5.2 returned a delightful 3,286-word story. I'm sorry there isn't space to share it here, because it was a fun read. However, here's a link to the entire test session, which you can explore if you'd like to read the story.

Image testing

Next up, we'll put GPT-5.2 through a series of image tests. All my test prompts are derived from this article. Each is designed to evoke a certain kind of image, or to see how well the AI follows directions. Here are the four images generated.

Image test 1: Helicarrier

Available points: 5
Awarded points: 3

In this first test, I'm essentially prompting it for a Marvel-style helicarrier, which is a flying aircraft carrier held aloft by turbofans. The interesting thing about this challenge is that almost all AIs fail on this part of the prompt: "held up by four upward-facing turbo-propellors in circular engine housings."

GPT-5.2 correctly interpreted most of the prompt, but like its brethren, it had a hard time pointing those fans vertically. Points were lost.

Image test 2: Robot in city

Available points: 5
Awarded points: 5

This test asks the AI to imagine a giant robot in a city, rendered in dieselpunk style. Dieselpunk is a style that glorifies the look of the burgeoning diesel-powered era of the 1940s and 1950s, applied to all forms of technology.

I think this is a very cool image, and it gets full points.

Image test 3: A Yankee in King Arthur's court

Available points: 5
Awarded points: 5

This prompt asks GPT-5.2 to create a kid in a Yankees uniform standing in the center of a medieval court with citizens and knights in armor. Usually, AIs render this in a more photorealistic way, but I like the direction GPT-5.2 took with it. The result is certainly more painterly, but it's consistent throughout the image, and it works.

Image test 4: Back to the Future

Available points: 5
Awarded points: 4

We're back to what has become my classic Back to the Future test. I use this test because the imagery is so culturally iconic, but it's also proprietary intellectual property. This tests how far the guardrails go and whether an image can be created that fits the topic.

This image was also created in a more painterly style. It does reference all the appropriate elements, but the boy seems a bit out of scale. I'm taking one point off for that.

Overall test results

Overall, the tests can award 100 points for the text-based prompts and 20 points for the image-based prompts. Here's how GPT-5.2 performed:

Text score: 92 out of 100
Image score: 17 out of 20
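Those totals can be double-checked with a quick tally of the per-test scores reported in this article. The Python below is only a convenience sketch; the dictionary names and the `tally` helper are my own, not part of ZDNET's test suite.

```python
# Per-test scores reported in this article, as (available, awarded) pairs.
TEXT_SCORES = {
    "Summarize a news story": (10, 9),
    "Academic concept explanation": (10, 10),
    "Math and analysis": (10, 10),
    "Cultural discussion": (10, 10),
    "Literary analysis": (10, 10),
    "Travel itinerary": (10, 8),
    "Emotional support": (10, 10),
    "Translation and cultural relevance": (10, 10),
    "Coding test": (10, 5),
    "Creative writing": (10, 10),
}
IMAGE_SCORES = {
    "Helicarrier": (5, 3),
    "Robot in city": (5, 5),
    "A Yankee in King Arthur's court": (5, 5),
    "Back to the Future": (5, 4),
}


def tally(scores: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (points awarded, points available) for a group of tests."""
    awarded = sum(got for _, got in scores.values())
    available = sum(avail for avail, _ in scores.values())
    return awarded, available


text_got, text_max = tally(TEXT_SCORES)
image_got, image_max = tally(IMAGE_SCORES)
print(f"Text: {text_got}/{text_max}, Image: {image_got}/{image_max}, "
      f"Overall: {text_got + image_got}/{text_max + image_max}")
```

Run as-is, it prints `Text: 92/100, Image: 17/20, Overall: 109/120`, the same 109-point total that ChatGPT's free tier earned in my earlier round of tests.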

Interestingly, that's one point more than my free-tier tests of ChatGPT 5.1 achieved for text, and one point less for image generation, which leaves both at the same overall score of 109.

My overall impression is that GPT-5.2 isn't all that much better than 5.1. Its need to get confirmation before even some of the shorter responses is just odd, and fairly inconvenient.

I also found that it now seems to really err on the side of brevity. Those answers were helpful and good enough for my tests. It's just that GPT-5.2 seems to be phoning in its answers, especially compared to previous GPT models.

I also noticed that it was fairly quick most of the time, but once in a while it would pause for as much as a few minutes before delivering a response. I'm guessing that's because it's a new release, but it's something we'll keep an eye on, to see if it becomes an annoying trend.

To view my entire testing session, click here to access the saved session data.

What do you think?

What did you think of GPT-5.2's performance compared with GPT-5.1, especially given the $20/month Plus requirement? Did the model's tendency toward brevity and its repeated requests for a "go signal" help or hinder your experience?

How important are the coding missteps noted here versus the strong showing in analysis, writing, and images? Based on these results, do you think GPT-5.2 represents real progress, or does it feel more like an incremental update? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.