When tested with a classic psychological assessment, advanced AI models experienced a total breakdown in focus. A new PNAS Nexus study suggests these systems lack the human-like executive control necessary to override automatic responses and maintain complex goals.
AFAIK that is handled through tool use in modern models (i.e. they didn’t learn to do maths, they learnt to use a calculator). Assuming that’s true and I haven’t missed some advance, their conclusions are likely still relevant.
Edit: though the article does seem to discard chain-of-thought techniques a little readily. It feels like they could come close to filling the role of executive control, but perhaps that’s just the article lacking detail from the original work.
My high school math teachers would be so disappointed in them.
If I could wire a calculator into my brain I would have cheated on all the maths tests tbf
This was surprisingly hard to find in an easily shareable form.
What I see in modern models is that you can often ask them to write a program or script for a task, and they do that much more successfully than they do the task itself directly. Once they have debugged the program, it is usually 100% reliable for the specified tasks. Ask them to do those simple tasks directly and you get all kinds of creatively wrong answers.
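To make that concrete, here is a rough sketch of the kind of throwaway script I mean. The two tasks (counting letters, exact integer multiplication) and the function names `count_letter` and `multiply_exact` are my own illustrative picks, not anything from the study or the article.

```python
# Hypothetical examples of "simple tasks" that are deterministic once scripted.
# The task choice and function names are illustrative assumptions.

def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    if len(letter) != 1:
        raise ValueError("letter must be exactly one character")
    return text.lower().count(letter.lower())


def multiply_exact(a: int, b: int) -> int:
    """Multiply two integers exactly; Python ints have arbitrary precision."""
    return a * b


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))
    print(multiply_exact(12345, 6789))
```

The point is only that once a script like this exists and has been checked, it gives the same answer every time, which is not something you can count on when the model answers directly.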