Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 3 days ago

rockSlayer@lemmy.blahaj.zone · 1 day ago

It’s interesting to see it build the context necessary to answer the question, but this seems like a lot of text just to come up with a simple answer.

Buffy@libretechni.ca · 22 hours ago

They’re showing the thinking the model did, the actual response is the sentence at the end.

Schadrach@lemmy.sdf.org · 1 day ago

The whole premise of deep think and similar modes in other models is to come up with an answer, then ask itself whether the answer is right and how it could be wrong, repeating until the result is stable.

The seahorse emoji question is one that trips up a lot of models (it’s a Mandela effect thing: the emoji doesn’t exist, but lots of people remember it and as a consequence are firm that it’s real). I asked GLM 4.7 about it with deep think on, and it wrote about two dozen paragraphs trying to think of everywhere a seahorse emoji could be hiding: whether it was in a previous or upcoming standard, whether there was another emoji that might be mistaken for a seahorse, etc. It eventually decided that it didn’t exist, double-checked that it wasn’t missing anything, and gave an answer.
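For anyone who wants to check the claim without asking a model, here is a minimal sketch using Python’s standard `unicodedata` module. It only asks whether a character with that exact name exists in the Unicode database bundled with your Python (see `unicodedata.unidata_version`), so a `False` means "not in that Unicode version", not "never will be". The helper name `has_character` is just made up for illustration.

```python
import unicodedata


def has_character(name: str) -> bool:
    """Return True if Python's bundled Unicode database has a character with this exact name."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        # lookup() raises KeyError when no character has that name
        return False


print("Unicode version:", unicodedata.unidata_version)
for name in ["TROPICAL FISH", "SEAHORSE"]:
    print(name, has_character(name))
# expected: TROPICAL FISH True, SEAHORSE False
```

A name-based lookup is much cheaper than the model’s approach of searching its memory for everywhere a seahorse might be hiding, but it also won’t catch an emoji that people might *mistake* for a seahorse.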

It was startlingly like the stream of consciousness of someone experiencing the Mandela effect trying desperately to find evidence they were right, except it eventually gave up and realized the truth.

Pup Biru@aussie.zone · 23 hours ago

yeah i find the thinking fascinating with maths too… like LLMs are horrible at maths, but so am i if i have to do it in my head… the way it breaks a problem down into tiny bits that are certainly in its training data, and then combines those bits, is an impressive emergent behaviour imo, given it’s just doing statistical next-token prediction

mirshafie@europe.pub · 20 hours ago

Your verbal faculties are bad at math. Other parts of your brain do calculations.

LLMs are a computer’s verbal faculties. But guess what, computers are just really big calculators. So when LLMs realize that they’re doing a math problem and launch a calculator/equation solver, they’re not so bad after all.

Pup Biru@aussie.zone · 20 hours ago

that solver would be tool use though… i’m talking about just the “thinking” LLMs. it’s fascinating to read the thinking block, because it breaks the problem down into basic chunks, solves the basic chunks (which would have been in its training data, so they’re easy), solves the whole thing with multiple methods, and then compares the results to check itself
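To make the "break it into chunks, solve two ways, compare" idea concrete, here’s a rough human-style analogy in Python, not a claim about what the model actually does internally. The function names are made up for illustration, and 47 × 38 is just an arbitrary example problem.

```python
def by_rounding(a: int, b: int) -> int:
    # Method 1: round b up to a multiple of 10, then subtract the overshoot.
    # e.g. 47 * 38 -> 47 * 40 - 47 * 2 = 1880 - 94
    rounded = -(-b // 10) * 10  # ceiling to the next multiple of 10
    return a * rounded - a * (rounded - b)


def by_place_value(a: int, b: int) -> int:
    # Method 2: split a into tens and ones and multiply each chunk.
    # e.g. 47 * 38 -> 40 * 38 + 7 * 38 = 1520 + 266
    tens, ones = divmod(a, 10)
    return tens * 10 * b + ones * b


def cross_checked(a: int, b: int) -> int:
    # Solve with both methods and only accept the answer if they agree.
    first, second = by_rounding(a, b), by_place_value(a, b)
    if first != second:
        raise ValueError(f"methods disagree: {first} vs {second}")
    return first


print(cross_checked(47, 38))  # 1786
```

Each step is a small, easy sub-problem; the confidence comes from two independent routes landing on the same number.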

Opper