When tested with a classic psychological assessment, advanced AI models experienced a total breakdown in focus. A new PNAS Nexus study suggests these systems lack the human-like executive control necessary to override automatic responses and maintain complex goals.
‘Classic psychology test’ covers a lot of ground. If this is Stroop or dual-task paradigms, the near-total collapse actually tracks: those tests were designed to stress automaticity vs. controlled processing, and LLMs don’t have anything like automaticity in the human sense, since every token is deliberate. So ‘collapse’ might be the wrong word; it’s more like the architecture was never built for that cognitive mode. There’s a breakdown of which test categories hit which model families hardest if you want to cross-reference which paradigm is doing the most damage here.
Given that the LLMs could follow the short lists of words well but not the longer lists, and that they were processing images, not text, I think it’s more likely that their context just filled up and they forgot the original instructions (or the instructions were given a lower weight in the computation).
Thanks for the explanation. I just repost the most popular content from Reddit.