In the years since ChatGPT’s launch in late 2022, it’s been hard not to get swept up in feelings of euphoria or dread about the looming impacts of generative AI. This reaction has been fueled, in part, by the confident declarations of tech CEOs, who have veered toward increasingly bombastic rhetoric.
“AI is starting to get better than humans at almost all intellectual tasks,” Anthropic CEO Dario Amodei recently told Anderson Cooper. He added that half of entry-level white collar jobs might be “wiped out” in the next one to five years, creating unemployment levels as high as 20%—a peak last seen during the Great Depression.
Meanwhile, OpenAI’s Sam Altman said that AI can now rival the abilities of a job seeker with a PhD, leading one publication to plaintively ask, “So what’s left for grads?”
Not to be outdone, Mark Zuckerberg claimed that superintelligence is “now in sight.” (His shareholders hope he’s right, as he’s reportedly offering compensation packages worth up to $300 million to lure top AI talent to Meta.)
Then, two weeks ago, OpenAI finally released its long-awaited GPT-5, a large language model that many had hoped would offer a leap in capabilities comparable to the head-turning advances of previous major releases such as GPT-3 and GPT-4. The resulting product, however, seemed to be just fine.
GPT-5 was marginally better than previous models in certain use cases, but worse in others. It introduced some nice usability updates, along with others that users found annoying. (Within days, more than 4,000 ChatGPT users signed a change.org petition asking OpenAI to make the previous model, GPT-4o, available again, as they preferred it to the new release.) An early YouTube reviewer concluded that GPT-5 was a product that “was hard to complain about,” which is the type of thing you’d say about the iPhone 16, not a generation-defining technology. AI commentator Gary Marcus, who had been predicting this outcome for years, summed up his early impressions succinctly when he called GPT-5 “overdue, overhyped, and underwhelming.”
This all points to a critical question that, until recently, few would have considered: Is it possible that the AI we are currently using is basically as good as it’s going to be for a while?
In my most recent article for The New Yorker, which came out last week, I sought to answer this question. In doing so, I ended up reporting on a technical narrative that’s not widely understood outside of the AI community. The breakthrough performance of the GPT-3 and GPT-4 language models was due to improvements in a process called pretraining, in which a model digests an astonishingly large amount of text, effectively teaching itself to become smarter. Both of these models’ acclaimed improvements were caused by increasing their size as well as the amount of text on which they were pretrained.
At some point after GPT-4’s release, however, the AI companies began to realize that this approach was no longer as effective as it once was. They continued to scale up model size and training intensity, but saw diminishing returns in capability gains.
In response, starting around last fall, these companies turned their attention to post-training techniques, a form of training that takes a model that has already been pretrained and then refines it to do better on specific types of tasks. This allowed AI companies to continue to report progress on their products’ capabilities, but these new improvements were now much more focused than before.
Here’s how I explained this shift in my article:
“A useful metaphor here is a car. Pre-training can be said to produce the vehicle; post-training soups it up. [AI researchers had] predicted that as you expand the pre-training process you increase the power of the cars you produce; if GPT-3 was a sedan, GPT-4 was a sports car. Once this progression faltered, however, the industry turned its attention to helping the cars that they’d already built to perform better.”
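To make the two stages a bit more concrete, here is a deliberately toy sketch of that pipeline. The function names and return values are hypothetical placeholders for illustration only, not any lab’s actual training code.

```python
# A toy, illustrative sketch of the two training stages discussed above.
# The functions and their return values are hypothetical placeholders,
# not a real training pipeline.

def pretrain(corpus: list[str]) -> dict:
    """Stage 1: the model digests an enormous amount of text, teaching
    itself to predict the next token. Scaling this stage up is what drove
    the leaps from GPT-3 to GPT-4."""
    return {"capability": "general-purpose", "documents_seen": len(corpus)}

def post_train(base_model: dict, task_data: list[str]) -> dict:
    """Stage 2: the already-pretrained model is refined on much narrower
    data so it performs better on specific kinds of tasks -- the
    "souping up" in the car metaphor."""
    return {**base_model, "tuned_for": task_data}

base = pretrain(["<a very large slice of the public internet>"])
tuned = post_train(base, ["math benchmarks", "coding tasks", "step-by-step reasoning"])
print(tuned)
```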
The result was a confusing series of inscrutably named models—o1, o3-mini, o3-mini-high, o4-mini-high—each with bespoke post-training upgrades. These models boasted widely publicized gains on specific benchmarks, but no longer the large leaps in practical capabilities we once expected. “I don’t hear a lot of companies using AI saying that 2025 models are a lot more useful to them than 2024 models, even though the 2025 models perform better on benchmarks,” Gary Marcus told me.
The post-training approach, it seems, can lead to incrementally better products, but not to the continued large leaps in ability that would be necessary to fulfill the tech CEOs’ more outlandish predictions.
None of this, of course, implies that generative AI tools are worthless. They can be very cool, especially when used to help with computer programming (though maybe not as much as some thought), or to conduct smart searches, or to power custom tools for making sense of large quantities of text. But this paints a very different picture from one in which AI is “better than humans at almost all intellectual tasks.”
For more details on this narrative, including a concrete prediction for what to actually expect from this technology in the near future, read the full article. But in the meantime, I think it’s safe, at least for now, to turn your attention away from the tech titans’ increasingly hyperbolic claims and focus instead on things that matter more in your life.
I have a PhD in a tiny corner of a hyperspecialised field, and ChatGPT is terrible at pretty much everything I have trained and learned to do, leading me to be very suspicious of its performance in all but the most basic of tasks. I have found a few genuinely helpful use cases, e.g. troubleshooting simple code (which I am terrible at and hate) or checking for basic spelling and grammar errors when I have to write texts in other languages. (It is *ok* at translation, but it sometimes wildly misjudges register.)
My bigger worries are that the internet is going to become even more horrible to use, that AI will employ the capabilities it does have to seriously harm people, and that chatbots are going to invade education in ways that will be really hard to extract ourselves from.
I agree with you. In my tech area, which is not even obscure or too heavily specialized, AI is usually terrible. All it does is repeat some of the falsehoods of the internet or mention deprecated libraries and features. The worst part is that it does so with incredible confidence. When you tell it it’s wrong, it answers, “oh yes, sorry, you’re right,” and then keeps “hallucinating,” which is just a fancy term for when AI starts lying and making things up.
I mean, it could certainly replace the current president of the US (behaves the same) but for real qualified jobs, I’m not buying it.
I have the same experience as a software engineer.
If you are focusing on a narrow branch of something as a specialist, these LLMs probably are not helpful. Say you want to create a frontend app using React (probably one of the most used frontend libraries): it is capable of producing some dirty code (like the majority of code on the internet), but if you want to use a less common framework or library, it hallucinates and you have to rewrite everything.
In my opinion, at the moment, AI is used mostly for generating garbage posts on LinkedIn and random YouTube Shorts and TikTok videos with an imaginary story in the background.
Search engines should also find a way to filter this AI-generated content; it is becoming impossible to find authentic and valuable content.
If you’re limiting yourself and your analysis to OpenAI, then you’re damaging your own credibility.
OpenAI is the only thing we are allowed to use for the work tasks I am referring to, because my employer has a subscription (4.1, 4o) and it allegedly does not retain our data, which are often sensitive. That is why I referred to ChatGPT specifically. But when I have personally experimented with other openly available systems, they have generally been the same or worse. And AI-generated google search results are definitely unreliable. I don’t claim any expertise outside my own field (and barely any within it!), and maybe I’m simply bad at generating prompts, but I’m just reporting my experiences.
I’m not talking about bespoke AI models, which I know can be useful for dealing with specific tasks, as Cal mentions above. Unless I have a specific task where AI is needed AND a team to help me set it up, I’m not going to waste my time, which is better spent reading, writing, thinking, and collaborating with humans (and, I suppose, commenting on this blog…).
I also edit a journal where we are getting more and more submissions of AI-generated articles. When the submissions fall within my own areas of expertise, it very quickly becomes clear that they are complete rubbish. But when the topics are outside those areas, it becomes more difficult to discern their quality – they often seem good at a glance, and we wind up wasting time on them, although the writing style is usually a tell that they’re not human-made. I take this as another signal that we need to be VERY careful using generative AI for tasks that we don’t understand and can’t independently verify.
You are in for a treat then. Give the latest ChatGPT (5 Thinking) one of your typical difficult tasks – you’ll see a huge improvement. If you go with the $200-per-month plan (GPT-5 Pro), you’ll be blown away.
Well, I tried “5 Thinking” with a work-related puzzle I’ve been trying to figure out, and its answer, while possibly better than what 4.1 would’ve said, was a lot of confident BS. So I’m not yet blown away. But as I said, what I work on is obscure, and the model doesn’t have access to all the data I have access to.
It finally said “you’re right to push on this”, apologized, and provided me with some fake references I could look to for further information. (And I know they are fake, because the person they cited is the very person who posed the question to me in the first place.)
I’m sure this model is great at many tasks, and better than me at most, but I’m not ready to trust it just yet.
Honestly, if AI capabilities level out at this point for a while, this would be the ideal scenario. These tools are already very useful in their own way, and it’s going to take a while for our society to come to terms with how best to use this new technology. If it can be mundanely useful without posing major existential risks, then that is the most positive result.
AI capabilities can level out but AI use cases will continue to evolve.
We haven’t utilized the current AI technology for what it’s capable of, because we don’t know what to do with it.
It’s kind of like saying electricity is only electricity. True, but since discovering we could use it in appliances, we’ve found thousands of different use cases throughout the years.
So it doesn’t really matter if AI doesn’t improve much past this point. It’s revolutionary enough to have more future use cases than we can fathom. Think of it like any other invention, with the internet as an example: new internet-related businesses spring up every day.
I for one have been blown away by AI’s capabilities (I use paid OpenAI and Grok models). The “reasoning” models – o3 and Grok 4 (both, incidentally, from 2025) – are very smart, able to do incredibly deep searches and think logically.
I’ve been using these in my work for more than a year and have seen large productivity gains.
Or maybe I’m just too dumb to figure out the stupidity in these models; we’ll see.
I understand that the reasoning and logical thinking are in fact not happening, as the model just predicts the next token(s) based on the one(s) before (roughly the loop in the sketch below). But there is a point where the faking of the thinking is so good that it’s irrelevant whether the process has actually occurred. It might not be able to win a Fields Medal, but I know several software devs whose ability to reason about code is already overshadowed by agentic AI. Yes, the reasoning is faked, but does that matter?
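A minimal sketch of what I mean, using the Hugging Face transformers library with GPT-2 as a small stand-in model; real chatbots sample from the distribution rather than always taking the single most likely token, and layer much more on top:

```python
# Minimal sketch of autoregressive next-token prediction (greedy decoding),
# using GPT-2 purely as a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The proof proceeds by", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(15):
        logits = model(ids).logits          # a score for every vocabulary token
        next_id = logits[0, -1].argmax()    # pick the single most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # condition on it and repeat

print(tokenizer.decode(ids[0]))
```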
I think your title is extremely misleading. (Maybe you didn’t write your own title for the New Yorker, but you did for this post.)
Reasonable people should have no doubt whatsoever that AI will get much much much better than this. After all, human brains (and bodies, and civilizations) work by algorithms and science, not magic. And human civilizations can sure do lots of things that AIs cannot (yet). For example, humans autonomously invented language, science, technology, and an entire $80T global economy, entirely from scratch. Today’s autonomous AI agents sure can’t do that! But it’s clearly a possible thing for AI to be able to do—we have an existence proof.
There’s a possibility that LLMs specifically “won’t get much better than this”. But that’s a very different claim! “AI” is much broader than “LLMs”; LLMs didn’t even exist until 2018.