"Good enough" open weights models were "almost there" since 2022. I distrust the...

ivan_gammel · 2026-02-22T18:06:21 1771783581

The generation of frontier models from H1 2025 is the good enough benchmark.

ACCount37 · 2026-02-22T18:46:35 1771785995

Flash forward one year and it'll be H1 2026.

ivan_gammel · 2026-02-22T20:32:25 1771792345

I don’t see why. Today's frontier models are already 2 generations ahead of good enough. For many users they did not offer substantial improvement; sometimes things even got worse. What is going to happen within 1 year that will make users desire something beyond an already working solution? LLMs are reaching maturity faster than smartphones, which are now good enough that you can stay on the same model for at least 5-6 years.

ACCount37 · 2026-02-23T14:25:50 1771856750

Any considerable bump in model capability craters my willingness to tolerate the ineptitude of less capable models. And I'm far from being alone in this.

Ever wondered why those stupid "they secretly nerfed the model!" myths persist? Why users report that "model got dumber", even if benchmarks stay consistent, even if you're on the inference side yourself and know with certainty that they are actually being served the same inference over the same exact weights on the same hardware quantized the same way?

Because user demands rise over time, always.

Users get a flashy new model, and it impresses them. It can do things the old model couldn't. Then they push it, and learn its limitations and quirks as they use it. And then it feels like it "got dumber" - because they got more aggressive about using it, and got better at spotting all the ways it had always been dumb.

It's a treadmill, and you pretty much have to keep improving the models just to stay ahead of user expectations.

ivan_gammel · 2026-02-24T12:03:07 1771934587

> users report that "model got dumber"

I have seen this across ChatGPT's progression from 4o to 5.2, with the complaint applied to the newest model each time. Old prompts stop working reliably, the hallucination modes are different, etc.