
Just this morning, I had Claude come up with a C++ solution that would have undefined behavior that even a mid-level C++ dev could have easily caught (assuming iterator stability in a vector that was being modified) just by reading the code.
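
For concreteness, here's a minimal sketch of that class of bug (a simplified illustration with made-up function names, not the code Claude actually produced): std::vector::erase invalidates iterators at and after the erased position, so continuing the loop with the old iterator is undefined behavior.

    #include <vector>

    // Buggy: erase() invalidates `it`, so the loop's next ++it is UB.
    void drop_negatives(std::vector<int>& v) {
        for (auto it = v.begin(); it != v.end(); ++it) {
            if (*it < 0) {
                v.erase(it);
            }
        }
    }

    // The usual fix: erase() returns the next valid iterator, so reuse it.
    void drop_negatives_fixed(std::vector<int>& v) {
        for (auto it = v.begin(); it != v.end(); ) {
            if (*it < 0) {
                it = v.erase(it);
            } else {
                ++it;
            }
        }
    }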

These AI solutions are great, but I have yet to see any solution that makes me fear for my career. It just seems pretty clear that no LLM actually has a "mental model" of how things work that can avoid the obvious pitfalls amongst the reams of buggy C++ code.

Maybe this is different for JS and Python code?



This is exactly right. LLMs do not build appropriate world models. And no, Python and JS have similar failure cases.

Still, sometimes it can solve a problem like magic. But since it does not have a world model, it is very unreliable, and you need to be able to fall back on real intelligence (i.e., yourself).


> assuming iterator stability in a vector that was being modified

This is the crux of an interview question I ask, and you'd be amazed how many experienced C++ devs need heavy hints to get it.


I agree, but I think the thing we often miss in these discussions is how much potential LLMs have to be productivity multipliers.

Yeah, they still need to improve a bit, but I suspect there will be a point at which individual devs are getting 1.5x more work done in aggregate. If everyone is doing that much more work, that has the potential to "take the job" of someone else.

Yeah, demand for software keeps growing, so perhaps it'll just make us that much more dependent on devs and software. But I do think it's important to remember that productivity gains always have the potential to replace devs, and LLMs IMO have huge potential here.


Oh I agree it can be a multiplier for sure. I think it's not "AI will take your job" but rather "someone who uses AI well will take your job if you don't learn it".

At least for C++, I've found it does a very mediocre job of suggesting project code (it has a tendency to drop subtle bugs all over the place, so you basically have to review it carefully instead of just writing it yourself). But for asking things in Copilot like "Is there any UB in this file?" (not that it will be perfect, but sometimes it'll point something out), and especially for writing tests, I absolutely love it.


Yeah, I'm a big fan of using it in Rust for that same reason. I watch it work through compile errors constantly; I can't imagine what it would be like in JS or Python.


Sonnet or Opus? Well, I guess they can both still do that. But I just keep asking it to review all of its code, to make sure it works. Eventually it'll catch its errors.

Now, this isn't a viable way of working if you're paying for it token by token, but with the Claude Code $200 plan ... this thing can work for the entire day, and you will get a benefit from it. But you will have to hold its hand quite a bit.


A difference emerges when an agent can run code and examine the results. Most platforms are very cautious about this capability. The recent MCP spec does define toolsets and can enable these feedback loops in a way that markets and software ecosystems can adopt.


(not trolling) Would that undefined behavior have occurred in idiomatic Rust?

Will the ability to use AI to write such a solution correctly be enough motivation to push C++ shops to adopt Rust? (Or perhaps a new language that somehow caters to the blind spots of AI.)

There will absolutely be a tipping point where the potential benefits outweigh the costs of such a migration.


I agree. That lack of a mental model is precisely why I don't use LLMs for programming.


It's another league for JS and Python, yes.


This is where one can notice that LLMs are, after all, just stochastic parrots. If we don't have a reliable way to systematically test their outputs, I don't see many jobs being replaced by AI either.


> just stochastic parrots

This is flatly false for two reasons. One is that all LLMs are not equal; the models and capacities are quite different, by design. Second, a large amount of standardized LLM testing probes sequences of logic or other "reasoning" capacity. Repeating the stochastic-parrot claim is basically proof of not having looked at the battery of standardized tests that are common in LLM development.


Even if not all LLMs are equal, almost all of them are based on the same architecture: transformers. So the general idea is always the same: predict the next token. It becomes more obvious when you try to use LLMs to solve things you can't find on the internet (even simple ones).

And the testing does not always work. You can only be sure it will be truly correct maybe 80% of the time, and that forces you to check everything. Of course, using LLMs makes you faster at some tasks, and the fact that they are able to do so much is super impressive, but that's it.


> undefined behavior that even a mid-level C++ dev could have easily caught (assuming iterator stability in a vector that was being modified)

This is not an AI thing; plenty of "mid-level" C++ developers could have made that same mistake. New code should not be written in C++.

(I do wonder how Claude AI does when coding Rust, where at least you can be pretty sure that your code will work once it compiles successfully. Or Safe C++, if that ever becomes a thing.)


It does alright with Rust, but you can't assume it works as intended just because it compiles successfully. The issue with current AI on complex or large-scale coding problems is usually not syntax; it's logic errors and poor abstraction. Rust is great, but the borrow checker doesn't protect you from that.

I'm able to use AI for Rust code a lot more now than six months ago, but it's still common for it to spit out something decent-looking that's not quite there. Sometimes re-prompting fixes all the issues, but it's pretty frustrating when it doesn't.


That's why I said "work" (i.e., it will probably do something and won't crash), not "work as intended". Big difference there!


I haven't tried with the most recent Claude models, but as of the last iteration, Gemini was far better at Rust, and it's what I still use to write anything in it. As an experiment, I even fed it a whole ebook on Rust design patterns and a small script (500 lines), and it was able to refactor the script to use the correct patterns, with some minor back-and-forth to fix build errors!


I use Claude Code with Rust regularly and am very happy with it.


Go ahead and modify a Python dict while iterating over it, then.



