Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
https://www.tomshardware.com/tech-industry/artificial-intelligence/apple-says-generative-ai-cannot-think-like-a-human-research-paper-pours-cold-water-on-reasoning-models
Apple researchers discovered that LRMs perform differently depending on problem complexity. On simple tasks, standard LLMs without explicit reasoning mechanisms were more accurate and more efficient, delivering better results with fewer compute resources. However, as problem complexity increased to a moderate level, models equipped with structured reasoning, such as Chain-of-Thought prompting, gained the advantage and outperformed their non-reasoning counterparts. When the complexity grew further, both types of models failed completely: their accuracy dropped to zero regardless of the available compute resources. (Keep in mind that the Claude 3.7 Sonnet Thinking and DeepSeek-R1 LRMs have limitations when it comes to their training.)
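For readers unfamiliar with this kind of evaluation, the sketch below shows roughly how such a comparison can be set up: the same puzzle is posed at increasing sizes with a plain prompt and a Chain-of-Thought prompt, and answers are scored against a reference solution. This is only an illustration under stated assumptions, not the paper's actual harness: Tower of Hanoi is used here purely as a puzzle whose difficulty scales cleanly with size, and query_model is a hypothetical stand-in for whatever model API is being evaluated.

```python
# Minimal sketch of a complexity-scaled prompting comparison.
# Assumptions: `query_model(prompt: str) -> str` is a hypothetical model call,
# and exact match against the optimal move sequence is a crude accuracy proxy.

def hanoi_moves(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Reference solution: the optimal 2**n - 1 move sequence."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)   # move n-1 disks onto the spare peg
            + [(src, dst)]                      # move the largest disk
            + hanoi_moves(n - 1, aux, src, dst))  # move n-1 disks onto the target peg

def direct_prompt(n: int) -> str:
    # Standard prompt: ask only for the answer.
    return (f"Solve Tower of Hanoi with {n} disks. "
            "List the moves as 'peg->peg', one per line, and nothing else.")

def cot_prompt(n: int) -> str:
    # Chain-of-Thought variant: ask the model to reason step by step first.
    return (f"Solve Tower of Hanoi with {n} disks. "
            "Think step by step, explaining each move before you make it, "
            "then finish with the full move list as 'peg->peg', one per line.")

def score(answer: str, n: int) -> bool:
    """Exact-match check against the optimal move sequence."""
    expected = [f"{a}->{b}" for a, b in hanoi_moves(n)]
    produced = [line.strip() for line in answer.splitlines() if "->" in line]
    return produced == expected

if __name__ == "__main__":
    for n in (3, 7, 12):  # low / moderate / high complexity
        print(f"n={n}: optimal solution length {2**n - 1}")
        # answer = query_model(cot_prompt(n))     # hypothetical model call
        # print("correct:", score(answer, n))
```

The point of scaling n is that the minimum solution length grows as 2**n - 1, so puzzle difficulty can be dialed up smoothly while the scoring rule stays fixed, which is the kind of setup that lets the three regimes described above show up at all.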
This has upset LANL managers, who have an insane zeal for AI in the hope of getting rid of the scientists.
Comments
I guess AI can think like a simple animal in that it can mimic, but it cannot think like a human. You may well have a point about certain humans. There seem to be some humans with no self-awareness or reasoning skills at all; they just do what they do. I have wondered if more intelligent people have fooled themselves into thinking that they "think". The question is whether there is some real way to distinguish true thinking from just very complex mimicry. If the most intelligent humans are just like dogs but with more connections, I think AI can eventually get there. However, if there is some other aspect coming into play, our current AI models will never really get there and we will have to actually build artificial brains.
Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. This augmentation-not-automation effect persists despite the fact that, AFAICT, there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is incomplete. Paraphrasing something
@MelMitchell1
pointed out to me, if you define jobs in terms of tasks maybe you're actually defining away the most nuanced and hardest-to-automate aspects of jobs, which are at the boundaries between tasks.
Can you break up your own job into a set of well-defined tasks such that if each of them is automated, your job as a whole can be automated? I suspect most people will say no. But when we think about *other people's jobs* that we don't understand as well as our own, the task model seems plausible because we don't appreciate all the nuances.
If this is correct, it is irrelevant how good AI gets at task-based capability benchmarks. If you need to specify things precisely enough to be amenable to benchmarking, you will necessarily miss the fact that the lack of precise specification is often what makes jobs messy and complex in the first place. So benchmarks can tell us very little about automation vs augmentation.
* Hinton insists that he was directionally correct but merely wrong in terms of timing. This is a classic motte-and-bailey retreat of forecasters who get it wrong. It has the benefit of being unfalsifiable! It's always possible to claim that we simply haven't waited long enough for the claimed prediction to come true.
It is literally the opposite of what LANL management thinks. They are hoping
that AI can help get rid of scientists and techs. As someone pointed out, there is this manic LANL zeal for AI: that it will code better, be smarter, and will not question management like the scientists do. As many have said before, LANL management has an uncanny way of utterly overestimating and underestimating AI at the same time. They never seem excited by the science that AI can do, just by the idea that it can do science and hence get rid of the workforce.