Hacker News | WhitneyLand's comments

“Maybe my own tastes are saturated now”

It might be saturated for smaller scopes of work, but it’s not hard to see the cracks when you scale up what you ask of SOTA models/agents.

One example: try to single-shot prompt a ChatGPT-equivalent chatbot.

Sure it will spit something out, but the feature depth, UX subtleties, backend integration, and lots of pragmatic engineering decisions along the way will just not be fully baked.

Another example is building a C compiler from scratch which Anthropic showed is still a struggle to do.

Not that these specific examples are important, but just to point out that scaling up expectations shows the cracks.

It’s not just a model problem, of course; better agents and orchestration features (like Dynamic Workflows mentioned in the post) all need to continue to evolve.

At what point does my CS degree become totally useless is an open question.


> At what point does my CS degree become totally useless is an open question.

Why are you people saying all these things?

We'll probably see long-distance space travel long before a degree in generic problem identification and solving becomes totally useless.


Every STEM field regards itself as "generic problem identification and solving" though

And they're all correct in that assessment.

“Like meditation, journaling, and other contemplative practices”

The big difference is that meditation and journaling do not require a belief that you are communicating with supernatural beings.

“I don't think intelligence and spiritual practice are mutually exclusive.”

That’s a low bar. At the least we know supernatural/religious beliefs are negatively correlated with scientific training and scientific eminence.


So in other words if the research had tried to assign a severity to the mistakes models made the entire paper may collapse as uninteresting?

Agreed.

This is the kind of design wisdom that’s both true and difficult to win an argument over.

It reminds me of arguments related to over-engineering and complexity. The principles are super important to having a codebase that scales and continues to be efficient to work in as the team grows, but they are hard to objectively measure.

Locally or in isolation something may sound like a great idea. Being able to step back and see the greater ripple effects requires some experience and intuition that can’t always be used to convince people otherwise.


It’s a bizarre feeling isn’t it? Sorry you’re having to defend the act of thinking.

The problem is you can’t defend it right? Someone could say your evidence came from a prompt: “Take this article and reverse engineer a hypothetical unpolished first draft written in a mix of Russian and English”

I’m not sure what the right answer is here. Fwiw I have no doubt you wrote it unassisted.


I've seen many people on reddit use AIs to translate their text. Given that it clearly puts the "default AI voice" on top of their text, it makes me think that it is a fairly inaccurate translation. I suspect something like Google Translate is still better for most people, because it seems to do better at maintaining the voice. Of course in the limit, what I'm calling "voice" simply can't be translated between languages, but you can certainly do much better than slamming "default AI voice" on top of people's writing. I'm sure under the hood Google Translate is a whole bunch of LLMs too now, but special-purpose translation LLMs without the agent refinement can do a lot better. It's unfortunate that people think this is an easy way to translate but the chatbot LLMs, while capable of understanding multiple languages and superficially translating them, probably shouldn't be used for this purpose.

It may be possible to prompt the chatbots to also use a certain style in the target language to get it out, but I'm not fluent enough in a second language to know if it worked and I'm yet to see any of the several people I've suggested this to try it, so I'd be interested if anyone knows if this works.


In my experience, Google Translate is still so much worse than even free ChatGPT at translating that it is unusable for anything you want to put out and have it seem at least somewhat professional.

Especially the voice, ChatGPT seems to infer the formality and overall tone much better than Google Translate. YMMV.


Bummer. I'm sure someone could build a fantastic translator with our current AI stack but I can't argue with them that the return would not be worth it compared to training another general-purpose AI. But the general-purpose AIs impose way too much of their own voice on the translated results.

Chain of trust from RFID chips embedded in their fingertips that authenticated to their keyboard, proving that at least their fingers grazed the keys that formed the message.

But what if they're reading off of a pre-written message?


And is the firmware on those RFID chips signed by a big tech overlord that we trust? And with kernel-level anti-cheat? Cause if not...

Are those RFID chips preventing me (or a physical robot) from typing generated text?

Proving a negative is nearly impossible. "Prove you didn't use AI"... it's a common argument tactic used all the time.

But you don’t really know that do you?

The other day I was criticized for posting a comment people thought was AI but was actually not.

I’m starting to notice that more often with others as well. It happens sometimes to those who were always using em dashes, and sometimes to those whose writing happens to share traits with the text these machines learned from, so now they sound suspicious.

I don’t think this means we should never call out slop or lazy writing, but it does seem our ability to detect this stuff is on a spectrum. Some of it is obvious. But beyond a certain point, for example with this article, the signals can become too weak to make any strong claims.

It’s disconcerting to admit that we’ve come to a point where it’s possible to be completely fooled one way or the other by what’s human or AI. Lots of stuff we can still detect, and sometimes it’s obvious, but at the margins we can no longer reliably discriminate.


Wow, this is rough. Gemini CLI was already losing and it’s now being replaced by something they’re saying doesn’t yet have feature parity. Doesn’t seem likely to inspire defections from competitors.

One could argue coding is only a use case and that their models are still killing it overall. However, agents are strategic across the board and coding agents are at the forefront. They’ve already led to new products like Cowork and it’s easy to understand why Google should be doing everything possible to catch up.

Surprised they’re not trying to entice developers away with more heavily subsidized subscription plans. Maybe it’s because as some say those days are ending and soon we’ll all be paying per token. Or maybe it would just put too much of a strain on available compute.


Their rationale might be that its size and intelligence are growing relative to the market.

Fwiw it’s beating Claude Sonnet in most benchmarking (benchmaxxing?), yet they’ve priced it almost half off on a per token basis.

Question is are you going to persuade anyone with this argument?

Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.


> Are there many devs at Google who legit prefer Gemini over Claude and Codex? Would love to hear about that.

A few weeks ago, Steve Yegge claimed he'd heard that Google employees are banned from using Claude & Codex.

https://x.com/Steve_Yegge/status/2046260541912707471

A number of Googlers replied to say that was totally false, including Demis Hassabis, but they were all on the DeepMind team.

https://x.com/demishassabis/status/2043867486320222333

This person here claims they left Google because of the ban, and because the ban applied outside of Google work as well:

https://x.com/mihaimaruseac/status/2046272726881693960


> and because the ban applied outside of Google work as well

I think false (or hasn't filtered to everyone lol)


Say what you want about Cursor but they don’t lack for ambition.

Forking VS Code, going big on bleeding edge features like cloud agents, and now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

They’ve been highly successful so far. Raised $50B, $2B in revenue, forecast to end 2026 above $6B. But even at these heights, they’re just not in the same league as OpenAI/Anthropic/Google.

And if building a state of the art multitrillion parameter model is not challenging enough, it’s a mountain you don’t climb just once. Every few months you need to push it farther with a new release. Fall off for a couple cycles and like Facebook you may never catch up again.

Not for the faint of heart.


Why is this comment upvoted?

It is most likely AI generated with a nice "Raised $50B" hallucination and filled with cliches ("thrown down the gauntlet", "mountain you don’t climb just once", "not for the faint of heart").


Good catch. I didn’t even notice it at first, but the hallucinations on top of cliches give it away.

The account doesn’t have a history of other comments that have too much of an AI vibe, but this one does. Even if it wasn’t AI, it’s misinformation.


Please see reply to your other comment on this thread.

I wrote this 100% off the top of my head on my phone while eating a sandwich.

Ffs.

edit: removed cursing you out. Sorry but this is frustrating. I don’t leave AI generated comments here (or anywhere else).


EDIT: As others have pointed out, the comment above contains hallucinations (like the $50 billion number) and a lot of AI tells. The account doesn’t have a history of AI-like comments but the hallucinations and structure in this one are suspicious. If anything, don’t trust the numbers it cites because they’re made up.

Cursor is a team that I want to see succeed. They have stacked their company with very smart people and they’re going hard at a highly competitive market. We all win when there is more competition and more innovation.

My problem is that every few months I look at Cursor’s product offerings and maybe retry it, but it never feels like something I want to use. Part is personal preference, the other part is the fact that my combination of other tools and services just does a better job. Their biggest advantage felt like first-mover advantage when they came out early and captured market share, but at in-person meetups I hear stories about companies switching away from Cursor or trying to convince their management to let them switch away. They need to come up with a compelling advantage fast, which is a hard thing to do against the other companies with their virtually unlimited budgets by comparison.


So, you’re wrong on two counts.

1. Evidently you’re no longer able to distinguish AI from people as the whole comment was written by a human off the cuff.

2. The numbers are not hallucinations. It’s word on the street reporting, so yes it’s speculative, but a model did not make it up unless that’s where TechCrunch got it, which is not on me.

https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to...


Quoting directly from your comment:

> They’ve been highly successful so far. Raised $50B,

They have not raised $50B. The article you linked says they're raising $2B, not $50B.

The valuation is not the amount raised.


So I made a mistake reading the article? So what?

The point is you made two brigade style comments about my posts sounding suspiciously like an LLM and having hallucinations.

Neither turned out to be true and I think a better response would concede the point.

It may be more helpful for us to stick together as humans since we can’t always recognize each other so easily anymore.


What do you mean neither turned out to be true?

Your comment DOES sound like an LLM and it DOES have hallucinations!

Please make your humanness more recognizable next time, don't waste readers' time with posh fanboying and lazy fact checking.


Same, I kick the tires on Cursor every several weeks wanting to find they've finally crossed some chasm I can't quite explain. But every time, I bounce off the ground-truth that they're forked off vscode, which just isn't for me. I think moving agents to the center of their experience and developing a model that focuses on speed/efficiency over maximum depth is a promising step away from being a spicy vscode fork.

My company is heavy on Cursor and I still ask them to provide me GitHub Copilot, for the sole reason that Cursor is probably the reason Microsoft had to implement technical enforcement of their TOS on proprietary plugins. Previously, you could use PyLance on VSCodium but now those plugins do not work outside VSCode anymore.

If Cursor (and every other commercial VSCode fork) hadn't used the MS extension store in the beginning and violated the TOS, this might not have happened.


Cursor 3 is a full rewrite. No VS Code

Yeah I want them to do well. I find Cursor to be a much better tool for actually working with the code the agent writes than whatever the big vendors provide.

> now they’ve thrown down the gauntlet directly challenging frontier labs by training their own model (“much larger” than Kimi 2.5’s 1T parameters) from scratch.

To clarify, the model Composer 2.5 announced in this post is not that; it uses Kimi 2.5 as a strong starting point. This is not to discount Cursor's work or future ambitions, but one of the most striking things about the last 6 months is that multiple open-source models/labs are now within striking distance of the frontier closed-source labs.

See eg Kimi 2.6 benchmarks: https://www.kimi.com/blog/kimi-k2-6


They have no choice but to train their own model to try and survive. They're paying API pricing for the top tier models but competing against subsidized subscriptions.

Them raising this much money doesn't mean they're successful, it only means they know how to fool the investors well. A project that is basically an extension to VSCode that only adds a chat interface isn't really worth this much money. Obviously, it's the users, but people think it's something genius and revolutionary, and it isn't.

This is rsync all over again. Go create it yourself if you think it’s just a simple extension.

You're right, I regret I didn't have the sense to do the same as them at the time.

Nope you are blowing hot air. Take it elsewhere.

You can take yourself elsewhere. Good luck.

Less hot air and more substance please. It’s easy to deconstruct a company as an armchair quarterback. It’s much harder to build a viable one. Until you have something constructive, kick rocks. Hot air is boring.

I realize you’re a troll account but at least be a fun troll.


I think that the product is easy to build, that's what I think because in my gathered experience it's easy. What more do you want?

This is the last time I'm responding. Good luck on whatever journey you're on. I'm sure it's an interesting journey since you're having realizations about troll accounts, very interesting.


As a heavy user, I don't think the model is their product. Cursor is primarily a harness and lately, a specialized agent dashboard.

Composer, their in-house model, is dispatched by other models like Claude Opus for individual items on a task list. No one is suggesting you write your main prompt to Composer 2.


they aren't "throwing down the gauntlet", they're trying to find ways to eke margin out of their product by owning a commodity-level coding model. it's an impressive engineering task but it's not particularly ambitious.

AI comment... BOO!

Could this task be a nice benchmark for computer use models?

Would be interesting to see the success rate for Claude Cowork or Codex’s equivalent feature.


Good point, could be a solid benchmark. Sites are adversarially built to resist automation and success is verifiable later when records actually disappear, so harder to game than WebArena.
