It might be saturated for smaller scopes of work, but it’s not hard to see the cracks when you scale up what you ask of SOTA models/agents.
One example: try to single-shot prompt a ChatGPT-equivalent chatbot.
Sure, it will spit something out, but the feature depth, UX subtleties, backend integration, and lots of pragmatic engineering decisions along the way will just not be fully baked.
Another example is building a C compiler from scratch, which Anthropic showed is still a struggle.
Not that these specific examples are important; the point is just that scaling up expectations shows the cracks.
It’s not just a model problem, of course. Better agents and orchestration features (like the Dynamic Workflows mentioned in the post) all need to continue to evolve.
At what point my CS degree becomes totally useless is an open question.
This is the kind of design wisdom that’s both true and difficult to win an argument over.
It reminds me of arguments related to over-engineering and complexity. The principles are super important to having a codebase that scales and continues to be efficient to work in as the team grows, but they are hard to objectively measure.
Locally or in isolation, something may sound like a great idea. Being able to step back and see the greater ripple effects requires some experience and intuition that can’t always be used to convince people otherwise.
It’s a bizarre feeling isn’t it? Sorry you’re having to defend the act of thinking.
The problem is you can’t defend it right? Someone could say your evidence came from a prompt:
“Take this article and reverse engineer a hypothetical unpolished first draft written in a mix of Russian and English”
I’m not sure what the right answer is here. Fwiw I have no doubt you wrote it unassisted.
I've seen many people on reddit use AIs to translate their text. Given that it clearly puts the "default AI voice" on top of their text, it makes me think that it is a fairly inaccurate translation. I suspect something like Google Translate is still better for most people, because it seems to do better at maintaining the voice. Of course in the limit, what I'm calling "voice" simply can't be translated between languages, but you can certainly do much better than slamming "default AI voice" on top of people's writing. I'm sure under the hood Google Translate is a whole bunch of LLMs too now, but special-purpose translation LLMs without the agent refinement can do a lot better. It's unfortunate that people think this is an easy way to translate but the chatbot LLMs, while capable of understanding multiple languages and superficially translating them, probably shouldn't be used for this purpose.
It may be possible to prompt the chatbots to also use a certain style in the target language to preserve the voice, but I'm not fluent enough in a second language to know if it worked, and I've yet to see any of the several people I've suggested this to try it, so I'd be interested if anyone knows whether this works.
In my experience, Google Translate is still so much worse than even free ChatGPT at translating that it is unusable for anything you want to put out and have it seem at least somewhat professional.
Especially the voice, ChatGPT seems to infer the formality and overall tone much better than Google Translate. YMMV.
Bummer. I'm sure someone could build a fantastic translator with our current AI stack but I can't argue with them that the return would not be worth it compared to training another general-purpose AI. But the general-purpose AIs impose way too much of their own voice on the translated results.
Chain of trust from RFID chips embedded in their fingertips that authenticated to their keyboard, proving that at least their fingers grazed the keys that formed the message.
But what if they're reading off of a pre-written message?
The other day I was criticized for posting a comment people thought was AI but was actually not.
I’m starting to notice that more often with others as well. It happens sometimes to those who were always using em dashes, and sometimes to those who happen to have the very traits these machines learned to write from, so now they sound suspicious.
I don’t think this means we should never call out slop or lazy writing, but it does seem our ability to detect this stuff is on a spectrum. Some of it is obvious. But beyond a certain point, for example with this article, the signals can become too weak to make any strong claims.
It’s disconcerting to admit that we’ve come to a point where it’s possible to be completely fooled one way or the other by what’s human or AI. Lots of stuff we can still detect, and sometimes it’s obvious, but at the margins we can no longer reliably discriminate.
Wow, this is rough. Gemini CLI was already losing and it’s now being replaced by something they’re saying doesn’t yet have feature parity. Doesn’t seem likely to inspire defections from competitors.
One could argue coding is only a use case and that their models are still killing it overall. However, agents are strategic across the board and coding agents are at the forefront. They’ve already led to new products like CoWork, and it’s easy to understand why Google should be doing everything possible to catch up.
Surprised they’re not trying to entice developers away with more heavily subsidized subscription plans. Maybe it’s because as some say those days are ending and soon we’ll all be paying per token. Or maybe it would just put too much of a strain on available compute.
Say what you want about Cursor but they don’t lack for ambition.
Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.
And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.
It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").
EDIT: As others have pointed out, the comment above contains hallucinations (like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments, but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.
Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.
My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part of it is personal preference; the other part is that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in-person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.
1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.
2. The numbers are not hallucinations. It’s word-on-the-street reporting, so yes, it’s speculative, but a model did not make it up, unless that’s where TechCrunch got it, which is not on me.
Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.
My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.
If Cursor (and every other commercial VSCode fork) hadn't used the MS extension store in the beginning and violated the TOS, this might not have happened.
Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.
> now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.
To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-source labs.
They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.
Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode only adding a chat interface, isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, but no.
Less hot air and more substance please. It’s easy to deconstruct a company as an armchair quarterback. It’s much harder to build a viable one. Until you have something constructive, kick rocks. Hot air is boring.
I realize you’re a troll account but at least be a fun troll.
I think that the product is easy to build, that's what I think because in my gathered experience it's easy. What more do you want?
This is the last time I'm responding. Good luck on whatever journey you're on. I'm sure it's an interesting journey since you're having realizations about troll accounts, very interesting.
As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.
Composer, their in house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.
they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.
Good point, could be a solid benchmark. Sites are adversarially built to resist automation and success is verifiable later when records actually disappear, so harder to game than WebArena.