Apple says generative AI cannot think like a human - research paper pours cold water on reasoning models
https://www.tomshardware.com/tech-industry/artificial-intelligence/apple-says-generative-ai-cannot-think-like-a-human-research-paper-pours-cold-water-on-reasoning-models
Apple researchers discovered that LRMs perform differently depending on problem complexity. On simple tasks, standard LLMs without explicit reasoning mechanisms were more accurate and efficient, delivering better results with less compute. However, as problem complexity increased to a moderate level, models equipped with structured reasoning, like Chain-of-Thought prompting, gained the advantage and outperformed their non-reasoning counterparts. When the complexity grew further, both types of models failed completely: their accuracy dropped to zero regardless of the available compute resources. (Keep in mind that the Claude 3.7 Sonnet Thinking and DeepSeek-R1 LRMs have limitations when it comes to their training.)
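For readers wondering what "structured reasoning, like Chain-of-Thought prompting" looks like in practice, here is a minimal sketch, assuming Python and a hypothetical send_to_model() client (not a real API). Tower of Hanoi is used as the complexity-scalable puzzle, in the spirit of the controlled puzzles the Apple paper scales up; the only difference between the two setups is whether the prompt asks the model to work through intermediate steps before answering.

# Minimal sketch: the same Tower of Hanoi instance posed two ways.
# "send_to_model" is a hypothetical stand-in for whatever LLM client you use;
# it is not a real library call.

def direct_prompt(n_disks: int) -> str:
    """Plain prompt: ask for the answer with no reasoning scaffold."""
    return (
        f"Solve the Tower of Hanoi puzzle with {n_disks} disks on pegs A, B, C. "
        "List the moves needed to transfer all disks from A to C."
    )

def chain_of_thought_prompt(n_disks: int) -> str:
    """Structured-reasoning prompt: ask the model to reason step by step first."""
    return (
        f"Solve the Tower of Hanoi puzzle with {n_disks} disks on pegs A, B, C. "
        "Think step by step: explain the recursive strategy, track the state of "
        "each peg after every move, and only then list the final move sequence."
    )

if __name__ == "__main__":
    for n in (3, 7, 12):  # low, moderate, and high complexity
        print(f"--- {n} disks ---")
        print(direct_prompt(n))
        print(chain_of_thought_prompt(n))
        # response = send_to_model(chain_of_thought_prompt(n))  # hypothetical call

The paper's point is that the second style of prompt helps in the middle range of complexity but stops helping once the puzzle gets hard enough.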
This has upset LANL managers, who have an insane zeal for AI in hopes of getting rid of the scientists.
7 comments:
There was a recent report from some LANL Fellows about AI. I heard management was very disappointed in the report in that it says pretty much what the paper from Apple said. Also, LANL management is very big on the idea that LANL will be like the next OpenAI or something and that AI will do the scientific work.
On a 14-hour flight I sat next to a college student who bought Wi-Fi to have Claude summarize research papers into an essay, which he then fed into an "AI detection" website. He repeated this process with Claude over and over until the output cleared the website's detection.
AI can't think like a human? I've encountered people who can't think like a human.
I guess AI can think like a simple animal in that it can mimic, but it cannot think like a human. You may well have a point about certain humans. There seem to be some humans with no self-awareness or reasoning skills at all; they just do what they do. I have wondered if more intelligent people have fooled themselves into thinking that they "think". The question is whether there is some real way to distinguish true thinking from just very complex mimicry. If the most intelligent humans are just like dogs but with more connections, I think AI can eventually get there. However, if there is some other aspect coming into play, our current AI models will never really get there and we will have to actually build artificial brains.
What choice do we have? It's impossible to get talented scientists to move to Los Alamos to spend 90% of their time filling out paperwork and writing internal funding proposals--and then watching their program fail as it gets stuck in limbo, waiting four years for LANL to install an electrical outlet in a lab to plug in a piece of equipment. Most sane people would say f*ck it and get a job anywhere else. LANL might as well cash in its chips on ChatGPT. It's better than a bunch of third-rate people with degrees from ITT Tech putting organic kitty litter in the nuke waste drums.
I find the story of AI and radiology fascinating. Of course, Hinton's prediction was wrong* and tech advances don't automatically and straightforwardly cause job replacement — that's not the interesting part.
Radiology has embraced AI enthusiastically, and the labor force is growing nevertheless. This augmentation-not-automation effect holds despite the fact that, AFAICT, there is no identified "task" at which human radiologists beat AI. So maybe the "jobs are bundles of tasks" model in labor economics is incomplete. Paraphrasing something @MelMitchell1 pointed out to me: if you define jobs in terms of tasks, maybe you're actually defining away the most nuanced and hardest-to-automate aspects of jobs, which are at the boundaries between tasks.
Can you break up your own job into a set of well-defined tasks such that if each of them is automated, your job as a whole can be automated? I suspect most people will say no. But when we think about *other people's jobs* that we don't understand as well as our own, the task model seems plausible because we don't appreciate all the nuances.
If this is correct, it is irrelevant how good AI gets at task-based capability benchmarks. If you need to specify things precisely enough to be amenable to benchmarking, you will necessarily miss the fact that the lack of precise specification is often what makes jobs messy and complex in the first place. So benchmarks can tell us very little about automation vs augmentation.
* Hinton insists that he was directionally correct but merely wrong in terms of timing. This is a classic motte-and-bailey retreat of forecasters who get it wrong. It has the benefit of being unfalsifiable! It's always possible to claim that we simply haven't waited long enough for the claimed prediction to come true.
AI will likely increase the number of workers in many areas. It will allow for new techniques in certain fields, opening new directions or creating new fields altogether, both of which would lead to more opportunities for people, not fewer.
It is literally the opposite of what LANL management thinks. They are hoping that AI can help get rid of scientists and techs. As someone pointed out, there is this manic LANL zeal for AI: that it will code better, be smarter, and will not question management like the scientists do. As many have said before, LANL management has an uncanny way of utterly overestimating and underestimating AI at the same time. They never seem excited by the science that AI can do, just by the idea that it can do science and hence get rid of the workforce.