A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

Also includes outtakes on the ‘reasoning’ models.

    • Zos_Kia@jlai.lu · 16 hours ago

      I’m sorry but no, models are definitely not collapsing. They still have a million issues and are subject to a variety of local optima, but they are not collapsing in any way. It is not known whether collapse can even happen in large models, and if it can, it would require months of active effort to generate the toxic data and fine-tune models on it. Nobody is gonna spend that kind of money to shoot themselves in the foot.
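      To be clear about what “collapse” even means here: the papers on it (e.g. Shumailov et al.’s “curse of recursion”) demonstrate the effect by repeatedly fitting a model to its own samples. Here is a toy sketch of that feedback loop using a Gaussian as a stand-in for the model; it is not an LLM pipeline, and all the numbers are made up for illustration:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      # "Generation 0" trains on real data; every later generation trains
      # only on samples drawn from the previous generation's fitted model.
      mu, sigma = 0.0, 1.0   # ground-truth distribution (illustrative)
      n = 50                 # training-set size per generation (made up)

      for gen in range(1, 21):
          data = rng.normal(mu, sigma, n)        # previous model's outputs
          mu, sigma = data.mean(), data.std()    # refit by maximum likelihood
          print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")

      # Finite-sample error compounds each round: sigma tends to drift
      # downward over generations, i.e. the tails quietly disappear.
      ```

      The point of the toy is that collapse needs a closed loop of training on your own outputs, generation after generation. Whether that loop actually closes at production scale is exactly the open question.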

      • kescusay@lemmy.world · 2 hours ago

        Then why are newer versions of the major models performing so poorly? For instance, GPT 5.2 is definitely not an improvement over 4.5. What’s the root cause?