There wasn't a post yet on this somehow, but Google's latest AI is reportedly quite a bit ahead of others in performance:
https://www.thealgorithmicbridge.com/p/google-gemini-3-just-killed-every
Comments
Second, there are so many metrics that companies can simply pick the few their models excel at and claim to be the best. Beyond that, they can game the tests by training the model specifically to do well on certain benchmarks.
Lastly, since we are in an AI bubble, the media is in a frenzy over every minor, incremental improvement. If one model is 10% better than the others, it is hailed as a revolution and AGI is around the corner. But we were promised exponential scaling of performance, and since GPT-4 all we've gotten is AI video generators and higher hallucination rates.
I am not sure any of this is true. I predict that at LANL the opposite will happen. I am not saying that AI will not be super intelligent and could do the job of 5 people. What will happen at the NNSA labs is someone will say, "AI, tell me how to make more paperwork, more crazy rules, and absurd procedures, and tell us how we can justify hiring 5 more people for every one person we actually need." AI in all its power will figure out how to do it. In other words, the NNSA labs will use AI to SUPERCHARGE inefficiencies: what used to take only a week of paperwork will now take months, and things that could be done in a day or two will take three more weeks. AI will make the paperwork so insanely complex that the only way to handle it will be to use another AI! In the end we will be 10 times slower thanks to AI, yet at the same time twice as fast with AI, and it will be spun as a great success.
I am serious. In the past two to three years I have seen the inefficiencies actually grow; things are slower and far more inefficient than before. Some claim this is due to a backlash against DOGE, or to Covid; I am not sure.
"That thing is scary. While it isn’t 100% accurate, it has amazing insights into complex problems."
I hear this every once in a while, but (1) I have yet to hear it from someone I think is very smart or an actual accomplished scientist, (2) I have yet to hear it from anyone who can actually write very well, and (3) I have yet to hear it from anyone who is an expert coder. I know lots of people who have played with AI but stopped after a couple of weeks because it was simply not fast enough, not good enough, or they ran into major problems. They all stopped using it or only use it for super simple stuff. You cannot trust it for real code; it seems fine for trivial stuff. In fact, even 5 years ago we had AI that could do this, but it was not user friendly; now it is user friendly, so my hairdresser aunt can use it and claims it is the greatest thing ever.

Also, if you look at Nature, Science, the Nature family, and Physical Review, you will see less than 5% of the papers use AI. Oddly, it seems to be used 10 times more in low-impact or bad journals, but you do not see much of it in upper-level journals, whether as a technique or otherwise. My academic colleagues also tell me that the good students seem to never use it, or use it only for spell checking, and hate it for writing or projects. The bad students, on the other hand, only use AI. Their papers are all boring, identical, and simple-minded. The same thing happens with coding: the good students can come up with new stuff, while the bad students all hand in the exact same AI code that does not work very well. If you ask them to improve it, they cannot, unless the AI does it for them. I have had a few summer students use it, and the output is pure crap.
That being said, I have used it and even have a few publications using AI techniques. The truth is that, at least for me, it does not add much. I suppose if you had no clue about statistics, mathematics, or correlations, the AI is better than nothing, but so far I have gotten nothing all that interesting out of it except funding. I even got some fairly big "highlights" out of our work, but the honest summary is "see, AI can do a bad-to-OK job at reproducing some known results and a pretty bad job at finding new ones." We of course do not say it that way; it comes out more like "AI can recover 90% of the previously known solutions 10 times faster than before and finds tantalizing hints at things no one has ever dreamed of!"
To be fair, looking at other papers on AI in my field, they are all doing the same thing. If there is money, you can always apply the method, get a paper, and get a highlight.
Also to be fair, the methods that won the Nobel Prize last year in "AI" are interesting, but the dirty secret is that anyone in the know knows these are not the LLM-style AI methods so hyped up by everybody, but far older ML-type methods that no one would call AI today.
Like everyone else I own a ton of stock in the big 7 companies, so I want the hype train to keep going; AI has been pretty amazing for me in terms of money.
In other words, it could be that both observations are right: the "amazing insights" claim, and the uselessness for producing original content in the hands of someone who does not already have expert-level knowledge of the particular area. For someone who does, it becomes a time-saving tool.
What I do not really understand is that AI at present seems more capable of critique than of contribution. A critique saying projects are not worthwhile does not really support any further work at the base of the pyramid.
However, the latest models from Anthropic are great at software development, where there is a ton of training data and the opportunity for automated reinforcement learning with a compiler or interpreter in the loop.
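To make concrete what "an interpreter in the loop" means here, below is a minimal sketch in Python: generated code is executed against fixed tests, and passing or failing becomes a binary reward signal. The function, candidate snippet, and tests are hypothetical illustrations under that assumption, not any vendor's actual training setup.

    # Minimal sketch of interpreter-in-the-loop scoring: run generated
    # code plus its tests, and turn pass/fail into a reward an RL loop
    # could train against. Everything here is a hypothetical illustration.
    import subprocess
    import sys
    import tempfile

    def reward(candidate_code: str, test_code: str, timeout_s: int = 10) -> float:
        """Return 1.0 if the interpreter runs the code and tests cleanly, else 0.0."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code + "\n\n" + test_code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path],
                capture_output=True,
                timeout=timeout_s,  # kill runaway generations
            )
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0

    # Example: score a (made-up) model completion against fixed tests.
    candidate = "def add(a, b):\n    return a + b"
    tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
    print(reward(candidate, tests))  # prints 1.0 -- the tests pass

The point is only that code has a cheap, automatic correctness check, which is exactly what makes large-scale reinforcement learning feasible there and hard in fields without one.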
I personally use AI to sketch out an idea and make a quick prototype, and using it as a kind of super-capable tool has led to real breakthroughs. After that step, though, a manual implementation is necessary, because even with the rough sketch in place there are usually errors and problems.
But don't look to LLMs to take a 75-year-old scientific field like actinide chemistry and discover the tiny incremental improvement that achieves 0.1% better results (an NNSA "breakthrough").