Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.
Also includes outtakes on the ‘reasoning’ models.
Screenshot of this question was making the rounds last week. But this article covers testing against all the well-known models out there.
Also includes outtakes on the ‘reasoning’ models.
So the chat bots getting it right consistently probably have it in their system prompt temporarily until they can be retrained with it incorporated into the training data. 😆
Edit:
Oh, I see the linked article is part of a marketing campaign to promote this company’s paid cloud service that has source available SDKs as a solution to the problem being outlined here:
I can see where this approach might be helpful, but why is it necessary to pay them per API call as opposed to using an open source solution that runs locally (aside from the fact that it’s better for their monetization this way)? Good chance they’re running it through yet another LLM and charging API fees to cover their inference costs with a profit. What happens when that LLM returns the wrong example?
There are models with open weights, and you can run those locally on your GPU. It can be a bit slower depending on model and GPU. For example, GLM has an open version, both full and pruned, but it’s not the newest version. A bunch of image generation models have local versions too.