Every time some production environment can be simplified, it is good news in my opinion. The ideal situation with Rails would be if there is a simple way to switch back to Redis, so that you can start simple, and as soon as you hit some fundamental issue with using SolidQueue (mostly scalability, I guess, in environments where the queue is truly stressed -- and you don't want to have a Postgres scalability problem because of your queue), you have a simple upgrade path. But I bet a lot of Rails apps don't have high volumes, and having to maintain two systems can be just more complexity.
> The ideal situation with Rails would be if there is a simple way to switch back to Redis
That's largely the case.
Rails provides an abstracted API for jobs (Active Job). Of course some applications depend on queue-implementation-specific features, but for the general case you just need to update your config to switch over (and of course handle draining the old queue).
The primary pain point I see here is if devs lean into transactions such that their jobs are only created together with everything else that happened in the same transaction.
Losing that guarantee can make the eventual migration harder, even if that migration is to a different postgres instance than the primary db.
Using the database as a queue, you no longer need to set up transaction triggers to fire your tasks: you get an atomic guarantee that either the data and the task were both created successfully, or nothing was created.
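As a minimal sketch of what that looks like with Active Job on a database-backed queue sharing the app's database (the model, job and helper names here are hypothetical):

```ruby
# Sketch only: assumes a Rails app where the queue tables live in the same
# database as the application data. Order, OrderConfirmationJob and
# charge_payment! are hypothetical names.
ActiveRecord::Base.transaction do
  order = Order.create!(user: current_user, total: cart.total)
  OrderConfirmationJob.perform_later(order)  # job row written to the same DB
  charge_payment!(order)
  # If anything above raises, the order, the payment and the enqueued job
  # all roll back together; there is no window where one exists without the others.
end
```

With a separate broker you would instead need an outbox table or after-commit hooks to get the same guarantee.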
the problem i see here is that we end up treating the background job/task processor as part of the production system (e.g. the server that responds to requests, in the case of a web application) instead of a separate standalone thing. rails doesn’t make this distinction clear enough. it’s okay to back your tasks processor with a pg database (e.g. river[0]) but, as you indirectly pointed out, it shouldn’t be the same as the production database. this is why redis was preferred anyways: it was a lightweight database for the task processor to store state, etc. there’s still great arguments in favor of this setup. from what i’ve seen so far, solidqueue doesn’t make this separation.
It does not scale forever, and as you grow in throughput and job table size you will probably need to do some tuning to keep things running smoothly. But after the amount of time I've spent in my career tracking down those numerous distributed systems issues arising from a non-transactional queue, I've come to believe this model is the right starting point for the vast majority of applications. That's especially true given how high the performance ceiling is on newer / more modern job queues and hardware relative to where things were 10+ years ago.
If you are lucky enough to grow into the range of many thousands of jobs per second then you can start thinking about putting in all that extra work to build a robust multi-datastore queueing system, or even just move specific high-volume jobs into a dedicated system. Most apps will never hit this point, but if you do you'll have deferred a ton of complexity and pain until it's truly justified.
I don't disagree with that call out. However, we've been through these discussions many times over the years. The solid queue of yesteryear was delayed_job which was originally created by Shopify's CEO.
Shopify however grew (as many others did) and we saw a host of blog posts and talks about moving away from DB queues to Redis, RabbitMQ, Kafka etc. We saw posts about moving from Resque to Sidekiq etc. All this to say, storing a task queue in the db has always been the naive approach. Engineers absolutely shouldn't be shocked that the approach isn't viable at higher workloads.
It's not like I'll get a choice between the task database going down and not going down. If my task database goes down, I'm either losing jobs or duplicating jobs, and I have to pick which one I want. Whether the downtime is at the same time as the production database or not is irrelevant.
In fact, I'd rather it did happen at the same time as production, so I don't have to reconcile a bunch of data on top of the tasks.
Frequently you have to couple the transactional state of the queue db and the app db; colocating them is the simplest way to achieve that without resorting to distributed transactions or patterns that involve orchestrated compensation actions.
solid_queue by default prefers that you use a different db than the app db, and will generate that out of the box (also by default with sqlite3, which is a separate discussion), but it makes it possible, and fairly smooth, to configure it to use the same db.
Personally, I prefer the same db unless I were at a traffic scale where splitting them is necessary for load.
One advantage of the same db is that you can use db transaction control over enqueuing jobs and app logic too, when they are dependent. But that's not the main advantage to me; I don't actually need that. I just prefer the simplicity, and as someone else said above, prefer not having to reconcile app db state with queue state if they are separate and only ONE goes down. Fewer moving parts are better in the apps I work on, which are relatively small-scale, often "enterprise", etc.
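For anyone curious, roughly what the two setups look like in a recent Rails app (a sketch from memory; the exact keys may differ between versions):

```ruby
# config/environments/production.rb -- a sketch, not the exact generated file.
Rails.application.configure do
  # Separate queue database (roughly the out-of-the-box layout):
  config.active_job.queue_adapter = :solid_queue
  config.solid_queue.connects_to = { database: { writing: :queue } }
  # ...paired with a "queue" entry (and its own migrations path) in config/database.yml.

  # Same database as the app: keep the adapter line, drop connects_to, and load
  # the Solid Queue tables into the primary schema, so job rows can share
  # transactions with ordinary application writes.
end
```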
You misunderstood my stock market remarks. I don't care, since the technology has a value that is not connected to the economy or the stock market anyway. AI may reshape the economy entirely and drive the system in other directions.
Thanks for reading / commenting on this post. Initially it seemed like I had received a bunch of very negative comments, but now that I have read most of the thread, there are very good points, articulated thoughtfully. Thank you.
I wanted to provide some more context that is not part of the blog post, since somebody may believe I don't enjoy / love the act of writing code.
1. I care a lot about programming, I love creating something from scratch, line by line. But: at this point, I want to do programming in a way that makes me special, compared to machines. When the LLM hits a limit, and I write a function in a way it can't compete, that is good.
2. If I write a very small program that is like a small piece of poetry, this is good human expression. I'll keep doing this as well.
3. But, if I need to develop a feature, and I have a clear design idea, and I can do it in 2 hours instead of 2 weeks, how do I justify to myself that, just for what I love, I will use a lot more time? That would be too ego-centric a POV, I believe.
4. For me too this is painful, as a transition, but I need to adapt. Fortunately I also enjoyed a lot the design / ideas process, so I can focus on that. And write code myself when needed.
5. The reason why I wrote this piece is because I believe there are still a lot of people that are unprepared for the fact we are going to be kind of obsolete in what defined us, as a profession: the ability to write code. A complicated ability requiring a number of skills at the same time: language skills, algorithms, problem decomposition. Since this is painful, and I believe we are headed in a certain direction, I want to tell the other folks in programming to accept reality. It will be easier, this way.
I think there is a danger hiding inside these excellent points, amid the enthusiasm for AI: namely that the skills that make a good programmer are not inherent, they are learned.
The comparison would be a guy who is an excellent journeyman electrician. This guy has visual-spatial skills that make bending and installing conduit a kind of art. He has a deep and intuitive understanding of how circuits are balanced in a panel, so he does not overload a phase. But he was not born with them. These are acquired over many years of labor and tutelage.
If AI removes these barriers--and I think it will, as AI-enhanced programmers will out-perform and out-compete those who are not in today's employment market--then the programmer will learn different skills that may or may not be in keeping with language skills, algorithms, problem decomposition, etc. They may in fact be orthogonal to these skills.
The effect of this may be an improvement, of course. It's hard to say for sure as I left my crystal ball in my other jacket. But it will certainly be different. And those who are predisposed for programming in the old-school way may not find the field as attractive because it is no longer the same sort of engineering, something like the difference between the person that designs a Lego set and the person that assembles a Lego set. It could, in fact, mean that the very best programmers become a kind of elite, able to solve most problems with just a handful of those elite programmers. I'm sure that's the dream of Google and Microsoft. However this will centralize the industry in a way not seen since perhaps IBM, only with a much smaller chance of outside disruption.
Maybe solving the more trivial problems with AI will leave novice programmers free to work on deeper problems and will make them better faster, because they will spend their time solving problems that matter.
That is possible, for sure. But think of it like a person learning the piano. You could practice your arpeggios on a Steinway, or you can buy a Casio with an arpeggiator button.
At a certain point, the professional piano player can make much better use of the arpeggiator button. But the novice piano player benefits greatly from all the slogging arpeggio practice. It's certainly possible that skipping all that grunt work will improve and/or advance music, but it's hardly a sure thing. That's the experiment we're running right now with AI programming. I suppose we'll see soon enough, and I hope I'm utterly wrong about the concerns I have.
That's really interesting, but I'm wondering if this is as rational as it looks.
> we are going to be kinda of obsolete in what defined us, as a profession: the ability to write code
Is it a fact, really? I don't think "writing code" is a defining factor; maybe it's a prerequisite, just as being able to write words hardly defines "a novelist".
Anyway, prompt writing skills might become obsolete quite soon. So the main question might be to know which trend of technological evolution to pick and when, in order not to be considered obsolete. A crystal ball might still be more relevant than LLMs for that.
I call it "the ability to communicate intent [using a programming language]" and suddenly building with AI looks at lot more like the natural extension of what we used to do writing code by ourselves.
I don't think our profession was writing code to begin with (and this may be a bit uuhh. rewriting history?); what we do is take an idea, requirements, an end goal and make it reality. Often times that involves writing code, but that's only one aspect of the software developer's job.
Analogy time because comment sections love analogies. A carpenter can hammer nails, screw screws, make holes, saw wood to size. If they then use machines to make that work easier, do they stop being carpenters?
It's good if not essential to be able to write code. It's more important to know what to write and when. The best thing to do at this point is to stop attaching one's self-worth to the ability to write code. That's like a novelist (more analogies) who praises their ability to type at 100wpm. The 50 Shades books proved you don't need to either touch type (the first book was mostly written on a BlackBerry apparently) or be good at writing to be successful, lol.
Agreed - as I see it, it's akin to the transitions from machine code -> assembly language -> C -> Javascript. As time went by, knowing the deep internals of the machine became less and less necessary, even though having that knowledge still gives an engineer a useful insight into their work and often makes them better at their job. The goal remains the same - make the computer do the thing; only the mechanism changes as the tools evolve.
"-> AI" is just the next step along that journey. Maybe it will end at "-> AGI" and then humans will engage in programming mostly for the craft and the pleasure of it, like other crafts that were automated away over the ages.
As a specific example of this, the U.S. 18F team had helped the Forest Service a decade ago with implementing a requirement to help people get a permit to cut down a Christmas tree.
Although there was a software component for the backend, the thing that the actual user ended up with was a printed-out form rather than a mobile app or QR code. This was a deliberate design decision (https://greacen.com/media/guides/2019/02/12/open-forest-laun...), not due to a limitation of software.
I still really, really, really struggle to see how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program. Any thoughts?
Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think the same concept is true here.
Once coding agents become trivial to use, few people will know the details of the programming language or make sure intent is correctly transformed into code, and the majority will focus on different objectives and take LLM programming for granted.
No, that's a completely different concept, because we have faultless machines which perfectly and deterministically translate high-level code into byte-level machine code. This is another case of (nearly) perfect abstraction.
On the other hand, the whole deal of the LLM is that it does so stochastically and unpredictably.
The unpredictable part isn't new - from a project manager's point of view, what's the difference between an LLM and a team of software engineers? Both, from that POV, are a black box. The "how" is not important to them, the details aren't important. What's important is that what they want is made a reality, and that customers can press on a button to add a product to their shopping cart (for example).
LLMs mean software developers let go of some control of how something is built, which makes one feel uneasy because a lot of the appeal of software development is control and predictability. But this is the same process that people go through as they go from coder to lead developer or architect or project manager - letting go of control. Some thrive in their new position, having a higher overview of the job, while some really can't handle it.
"But this is the same process that people go through as they go from coder to lead developer or architect or project manager - letting go of control."
In those circumstances, it's delegating control. And it's difficult to judge whether the authority you delegated is being misused if you lose touch with how to do the work itself. This comparison shouldn't be pushed too far, but it's not entirely unlike a compiler developer needing to retain the ability to understand machine code instructions.
As someone that started off with assembly issues for a large corporation: assembly code may sometimes contain issues very similar to those in more high-level code; the perfection of the abstraction is not guaranteed.
But yeah, there's currently a wide gap between that and a stochastic LLM.
We also have machines that can perfectly and deterministically check written code for correctness.
And the stochastic LLM can use those tools to check whether its work was sufficient; if not, it will try again - without human intervention. It will repeat this loop until the deterministic checks pass.
You can make analysers that check for deeply nested code, people calling methods in the wrong order and whatever you want to check. At work we've added multiple Roslyn analysers to our build pipeline to check for invalid/inefficient code, no human will be pinged by a PR until the tests pass. And an LLM can't claim "Job's Done" before the analysers say the code is OK.
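Roughly, the loop being described might look like this sketch (`generate_patch` / `apply_patch` are placeholders for whatever agent API is in use, and the check commands are just examples):

```ruby
# Sketch of the check-and-retry loop: run deterministic checks, feed failures
# back to the model, repeat until everything passes or we give up.
# generate_patch / apply_patch are placeholders for whatever agent API is used.
CHECKS = ["bundle exec rubocop", "bundle exec rspec"] # linters, analyzers, tests
MAX_ATTEMPTS = 5

def run_checks
  CHECKS.map do |cmd|
    output = `#{cmd} 2>&1`          # run the tool, capture its output
    [cmd, $?.success?, output]      # exit status decides pass/fail
  end
end

attempt = 0
loop do
  failures = run_checks.reject { |_cmd, passed, _out| passed }
  break if failures.empty?                      # all deterministic checks green
  raise "giving up" if (attempt += 1) >= MAX_ATTEMPTS

  feedback = failures.map { |cmd, _p, out| "#{cmd} failed:\n#{out}" }.join("\n\n")
  apply_patch(generate_patch(feedback))         # hand the failures back to the LLM
end
```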
And you don't need to make one yourself, there are tons you can just pick from:
> It's not like testing code is a new thing. Junit is almost 30 years old today.
Unit tests check whether code behaves in specific ways. They certainly are useful to weed out bugs and to ensure that changes don't have unintended side effects.
> And code correctness:
These are tools to check for syntactic correctness. That is, of course, not what I meant.
Algorithmic correctness? Unit tests are great for quickly poking holes in obviously algorithmically incorrect code, but far from good enough to ensure correctness. Passing unit tests is necessary, not sufficient.
Syntactic correctness is more or less a solved problem, as you say. Doesn't matter if the author is a human or an LLM.
It depends on the algorithm of course. If your code is trying to prove P=NP, of course you can't test for it.
But it's disingenuous to claim that even the majority of code written in the world is so difficult algorithmically that it can't be unit-tested to a sufficient degree.
Suppose you're right and the "majority of code" is fully specified by unit testing (I doubt it). The remaining body of code is vast, and the comments in this thread seem to overlook that.
> Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think same concept is true here.
That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.
> That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.
Compilers are not used directly, they are used by human software developers who are also not deterministic.
From the perspective of an organization with a business or service-based mission, they already know how to supervise non-deterministic LLMs because they already know how to supervise non-deterministic human developers.
Why does it matter if LLMs are not deterministic? Who cares?
There should be tests covering meaningful functionality, as long as the code passes the tests, ie. the externally observable behaviour is the same, I don't care. (Especially, if many tests can also be autogenerated with the LLM.)
>>> Very few people have the expertise to write efficient assembly code, yet everyone relies on compilers and assemblers to translate high-level code to byte-level machine code. I think same concept is true here
>> That's a poor analogy which gets repeated in every discussion: compilers are deterministic, LLMs are not.
> Why does it matter if LLMs are not deterministic? Who cares?
In the context of this analogy, it matters. If you're not using this analogy, then sure, only the result matters. But when the analogy being used is deterministic, then, yes, it matters.
You can't very well claim "We'll compare this non-deterministic process to this other deterministic process that we know works."
The difference is that if you write in C you can debug in C. You don't have to debug the assembly. You can write an english wish list for an LLM but you will still have to debug the generated code. To debug it you will need to understand it.
> how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program
Short answer: we wouldn’t be able to. Slightly-less short answer: unlikely to happen.
Most programmers today can’t explain the physics of computation. That’s fine. Someone else can. And if nobody can, someone else can work backwards to it.
> > how humans are going to maintain and validate the programs written by LLMs if we no longer know (intimately) how to program
> Short answer: we wouldn’t be able to.
That's a huge problem! A showstopper for many kinds of programs!
> Slightly-less short answer: unlikely to happen.
Could you elaborate?
> Most programmers today can’t explain the physics of computation. That’s fine. Someone else can. And if nobody can, someone else can work backwards to it.
That's not the same at all. We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way. Only a very few (and crucially, known and understood!!) physical circumstances can make the physics influence the computations. A layperson does not need to know how those circumstances work, only roughly what their boundaries are.
This is wildly different from the "abstraction" to programming that LLMs provide.
> That's a huge problem! A showstopper for many kinds of programs!
We have automated validation and automated proofs.
Proof is necessary. Do you validate the theorem prover, or trust that it works? Do you prove the compiler is correctly compiling the program (when it matters, you should, given they do sometimes re-write things incorrectly) or trust the compiler?
> We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.
You trust the hardware the code is running on? You shouldn't.
Rowhammer comes to mind, but it's hardly the only case. US banned some Chinese chips for unspecified potential that this was going on.
For some people it's OK to run a few simple tests on the chip's output to make sure it doesn't have something like the Pentium FDIV bug, for others they remove the silicon wafer from the packaging and scan it with an electron microscope, verify not just each transistor is in the right place but also that the wires aren't close enough to have currents quantum tunnelling or act as an antenna that leaks out some part of a private key.
Some people will go all the way down to the quantum mechanics. Exploits are possible at any level, domains where the potential losses exceed the cost of investigation do exist, e.g. big countries and national security.
Proof is necessary. The abstraction of hardware is good enough for most of us, and given the excessive trust already given to NPM and other package management tools, LLM output that passes automated tests is already sufficient for most.
People like me who don't trust package management tools, or who filed bugs with Ubuntu for not using https enough and think that Ubuntu's responses and keeping the bug open for years smelled like "we have a court order requiring this but can't admit it" (https://bugs.launchpad.net/ubuntu-website-content/+bug/15349...)… well, I can't speak for the paranoid, but I'm also the curious type who learned how to program just because the book was there next to the C64 game tapes.
> We have automated validation and automated proofs.
Example?
> Proof is necessary. Do you validate the theorem prover, or trust that it works? Do you prove the compiler is correctly compiling the program (when it matters, you should, given they do sometimes re-write things incorrectly) or trust the compiler?
I trust that the people who wrote the compiler and use it will fix mistakes. I trust the same people to discover compiler backdoors.
As for the rest of what you wrote: you're missing the point entirely. Rowhammer, the fdiv bug, they're all mistakes. And sure, malevolence also exists. But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes. With an LLM you don't even know how it's supposed to behave.
> you're missing the point entirely. Rowhammer, the fdiv bug, they're all mistakes. And sure, malevolence also exists.
Rowhammer was a thing because the physics was ignored. Calling it a mistake is missing the point, it demonstrates the falseness of the previous claim:
> We have properly abstracted away the physics of computation. A modern computer operates in a way where, if you use it the way you've been instructed to, the physics underlying the computations cannot affect the computation in any undocumented way.
Rowhammer *is* the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.
> But when mistakes or malevolence are found, they're fixed, or worked around, or at least documented as mistakes.
If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.
I only had a limited time to experiment with this before Christmas (last few days of a free trial, thought I'd give it a go to see what the limits were), and what I found it doing wrong was piling up technical debt, not that it was a mysterious ball of mud beyond its own ability to rectify.
> With an LLM you don't even know how it's supposed to behave.
LLM-generated source code: if you've forgotten how to read the source code it made for you to solve your problem, can't learn how to read that source code, and can't run its tests, then at that point it's as interpretable as psychology.
The LLMs themselves: yes, this is the "interpretability" problem, people are working on that.
I am of course aware. Any malevolent backdoor in your compiler could also exist in your LLM. Or the compiler that compiled the LLM. So you can never do better.
> Rowhammer is the physics underlying the computations affecting the computation in a way that was undocumented prior to it getting discovered and, well, documented. Issues like this exist before they're documented, and by definition nobody knows how many unknown things like this have yet to be found.
Yep. But it's a bug. It's a mistake. The unreliability of LLMs is not.
> If you vibe code (as in: never look at the code), then find an error with the resulting product, you can still just ask the LLM to fix that error.
Of course. But you need skills to verify that it did.
> LLM generated source code: if you've forgotten how to read the source code it made for you to solve your problem and can't learn how to read that source code and can't run the tests of that source code, at which point it's as interpretable as psychology.
Reading source code is such a minute piece of the task of understanding code that I can barely understand what you mean.
At the current time this is essentially science fiction though. This is something that the best-funded companies on the planet (as well as many, many others) are working on and seem to be completely unable to achieve despite trying their best for years now, despite incredible hype.
It feels like if those resources were poured into nuclear fusion, for example, we'd have it production-ready by now.
The field is also not a couple of years old; this has been tried for decades. Sure, only now have companies decided to put essentially "unlimited" resources into it, but while that showed that certain things are possible and work extremely well, it also strongly hinted that at least the current approach will not get us there, especially not without significant trade-offs (that whole overtraining vs "creativity" and hallucination topic).
Doesn't mean it won't come, but it doesn't appear to be a "we just need a bit more development" topic. The state hasn't changed much. Models became bigger and bigger and people added that "thinking" hack and agents and agents for agents, but none of it changed much about the initial approach and its limitations, given that they haven't cracked these problems after years of hyped funding.
Would be amazing if we would have AIs that automate research and maybe help us fix all the huge problems the world is facing. I'd absolutely love that. I'd also love it if people could easily create tools, games, art. However that's not the reality we live in. Sadly.
Tutoring – whether AI or human – does not provide the in-depth understanding necessary for validation and long-term maintenance. It can be a very useful step on the way there, but only a step.
Same how we do it now - look at the end result, test it. Testers never went away.
Besides, your comment goes by the assumption that we no longer know (intimately) how to program - is that true? I don't know C or assembly or whatever very well, but I'm still a valuable worker because I know other things.
I mean it could be partially true - but it's like having years of access to Google to quickly find just what I need, meaning I never learned how to read e.g. books on software development or scientific papers end to end. Never felt like I needed to have that skill, but it's a skill that a preceding generation did have.
> Besides, your comment goes by the assumption that we no longer know (intimately) how to program - is that true? I don't know C or assembly or whatever very well, but I'm still a valuable worker because I know other things.
The proposal seems to be for LLMs to take over the task of coding. I posit that if you do not code, you will not gain the skills to do so well.
> I mean it could be partially true - but it's like having years of access to Google to quickly find just what I need, meaning I never learned how to read e.g. books on software development or scientific paper end to end.
I think you've misunderstood what papers are for or what "the previous generation" used them for. It is certainly possible to extract something useful from a paper without understanding what's going on. Googling can certainly help you. That's good. And useful. But not the main point of the paper.
Fair question but haven't we been doing this for decades? Very few people know how to write assembly and yet software has proliferated. This is just another abstraction.
> Fair question but haven't we been doing this for decades? Very few people know how to write assembly and yet software has proliferated. This is just another abstraction.
Not at all. Given any "layperson input", the expert who wrote the compiler that is supposed to turn it into assembly can describe in excruciating detail what the compiler will do and why. Not so with LLMs.
Said differently: If I perturb a source code file with a few bytes here and there, anyone with a modicum of understanding of the compiler used can understand why the assembly changed the way it did as a result. Not so with LLMs.
But there's a limit to that. There are (relatively) very few people who can explain the details of e.g. a compiler, compared to, for example, React front-end developers that build B2C software (...like me). And these software projects grow, ultimately to the limit of what one person can fit in their head.
Which is why we have lots of "rules" and standards on communication, code style, commenting, keeping history, tooling, regression testing, etc. And I'm afraid those will be the first to suffer when code projects are primarily written by LLMs - do they even write unit tests if you don't tell them to?
Really enjoyed your article, and it reflects a lot of the pain points I experience with models. I tend to still write and review LLM code after creation, but there is definitely a shift in how much code I create "artisanally" and how much I review, in terms of scale.
If I need to implement a brand new feature for the project, I will find myself needing to force a view into an LLM because it will help me achieve 80% of the feature in 1% of the time; even if the end result requires a scale of refactoring, it rarely takes the time that the original feature would've taken me.
But, I think that's also because I have solid refactoring foundations, I know what makes good code, and I think if I had access to these tools 5 years ago, I would not be able to achieve that same result, as LLMs typically steer towards junior level coding as a consequence of their architecture.
This is by far the best summary of the state of affairs, or rather, the most sensible perspective that one should have on the state of affairs, that I've read so far.
Of the four coding examples you describe, I find none of them compelling either in their utility or as a case for firing a dev (with one important caveat [0]).
In each example, you were already very familiar with the problem at hand, and gaining that familiarity probably took far longer than any additional time savings AI could offer.
0. Perhaps I consider your examples as worthless simply because you gloss over them so quickly, in which case that greatly increases the odds in most companies that you would be fired.
All of that makes a lot of sense. And unlike a lot of both pro-AI and anti-AI people I would find it great if it was the case. Unlike maybe a lot of people here I am less attached to this profession as a profession. I'd also love it if I could have some LLM do the projects I always wanted to finish. It would be essentially Christmas.
However your experiences really clash with mine, and I am trying to work out why, because so far I haven't been able to copy your workflow with success. It would be great if I could write a proper spec and the output of the LLM would be good (not excellent, not poetry, but just good). However, the output for anything that isn't "stack overflow autocomplete" style is abysmal. Honestly I'd be happy if good output were even on the horizon.
And given that "new code" is a lot better than working on an existing project and an existing LLM generated project being better than a human made project and it still being largely bad, often with subtle "insanity" I have a hard time to apply what you say to reality.
I do not understand the disconnect. I am used to writing specs. I tried a lot of prompting changes, to a degree where it almost feels like a new programming language. Sure there are things that help, but the sad reality is that I usually spend more time dealing with the LLM than I'd need to write that code myself. And worse still, I will have to fix it and understand it, etc. to be able to keep on working on it and "refining" it, something that simply isn't needed at least to that extent if I wrote that code myself.
I really wished LLMs would provide that. And don't get me wrong, I do think there are really good applications for LLMs. Eg anything that needs a transform where even a complex regex won't do. Doing very very basic stuff where one uses LLMs essentially as an IDE-integrated search engine, etc.
However the idea that it's enough to write a spec for something even semi-novel currently appears to be out of reach. For trivial generic code it essentially saves you from either writing it yourself or copy-pasting it off some open source project.
That was a lot of context; now for the question that hopefully explains a lot of this. Those 2 hours that you use instead of two weeks: how do you spend them? Is it refining prompts, fixing the LLM output, writing/adapting specs, or something else?
Also could it be that there is a bias on "time spent" because of it being different work or even just a general focus on productivity, more experience, etc.?
I am trying to understand where that huge gap in experience that people have really stems from. I read your posts, I watch videos on YouTube, etc. I just haven't seen "I write a spec [that is shorter/less effort than the actual code] and get good output". Every time I read claims about it in blog posts and so on, there appear to be parts missing to reproduce the experience.
I know that there is a lot of "ego-centric POV" style AI "fear". People of course have worries about their jobs, and I understand. However, personally I really don't, and as mentioned I'd absolutely love to use it like that on some projects. But whenever I try to replicate experiences that aren't just "toying" - anything that has even basic reliability requirements and is a bit more complex - I fail to do so. It's probably me, but I have tried for at least a year to replicate such things and it's failure after failure, even for simpler things.
That said, there are productivity gains with autocomplete, transforming stuff, and what people largely call "boilerplate", as well as more quickly writing small helpers that I'd otherwise have copied off some older project. Those things work well enough, just like how autocomplete is good enough. For bigger and more novel things where a search engine is also not the right approach, it fails, but this is where the interesting bits are: having topics that haven't been solved a hundred times over.
Thanks for the post. I found it very interesting and I agree with most of what you said. Things are changing, regardless of our feelings on the matter.
While I agree that there is something tragic about watching what we know (and have dedicated significant time and energy to learning) being devalued, I'm still excited for the future, and for the potential this has. I'm sure that given enough time this will result in amazing things that we cannot even imagine today. The fact that the open models and research are keeping up is incredibly important, and probably the main thing that keeps me optimistic for the future.
I care a lot about programming, but I want to do programming in a way that makes me special compared to machines. When the LLM hits a limit, and I write a function in a way it can't compete, that is good. If I write a very small program that is like a small piece of poetry, this is good human expression. But if I need to develop a feature, and I have a clear design idea, and I can do it in 2 hours instead of 2 weeks, how do I justify to myself that, just for what I love, I use a lot more time? That would be too much ego, I believe. So even if for me too this is painful, as a transition, I need to adapt. Fortunately I also enjoyed a lot the design / ideas process, so I can focus on that. And write code myself when needed.
I like programming, and I do write code all the time. But when there is something productive to do, it is very hard to justify that, for my ego or passion, I don't leverage AI and go N times faster only because I'm used to enjoying a given process. So I try to also enjoy the other processes not related to writing code: ideas and design.
Force it to have clear metrics / observability on what it is doing. For instance, the other day I wanted Claude to modify a Commodore 64 emulator, and I started by telling it to implement an observability framework where, as the emulator runs, it can connect to a socket and ask for registers, read/write memory areas, check the custom chips' status, set breakpoints, ... As you can guess, after this the work is of a different kind.
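To give an idea of the shape of such an interface, here is a rough sketch of a client for that kind of debug socket (the port, command names and reply format are invented for illustration, not the actual protocol):

```ruby
# Rough sketch of a client for a debug socket like the one described above.
# The port, command names and reply format are invented for illustration.
require "socket"

sock = TCPSocket.new("127.0.0.1", 6510)

def query(sock, command)
  sock.puts(command)   # send one line-based command
  sock.gets&.chomp     # read the single-line reply
end

puts query(sock, "REGS")         # e.g. "A=00 X=0A Y=00 PC=C000 SP=FD"
puts query(sock, "READ D020 1")  # read one byte: the border color register
query(sock, "WRITE D020 02")     # poke memory / chip registers
query(sock, "BREAK C000")        # set a breakpoint at $C000
sock.close
```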
After you review, instead of rewriting 70% of the code, have you tried to follow up with a message with a list of things to fix?
Also: in my experience 1. and 2. are not needed for you to have bad results. The existing code base is a fundamental variable. The more complex / convoluted it is, the worse the result. Also, in my experience LLMs are consistently better at producing C code than anything else (Python included).
I have the feeling that the simplicity of the code bases I produced over the years, and that now I modify with LLMs, and the fact they are mostly in C, is a big factor why LLMs appear to work so well for me.
Another thing: Opus 4.5 for me is bad on the web, compared to Gemini 3 PRO / GPT 5.2, and very good if used with Claude Code, since it needs to iterate to reach the solution, while the others are sometimes better at first-shotting. If you generate code via the web interface, this could be another cause.
> After you review, instead of rewriting 70% of the code, have you tried to follow up with a message with a list of things to fix?
This is one of my problems with the whole thing, at least from a programming PoV. Even though superficially it seems like the ST:TNG approach to using an intelligent but not aware computer as a tool to collaboratively solve a problem, it is really more like guiding a junior through something complex. While guiding a junior (or even some future AGI) in that way is definitely a good thing, if I am a good guide they will learn from the experience so it will be a useful knowledge sharing process, that isn't a factor for an LLM (at least not the current generations). But if I understand the issue well enough to be a good guide, and there is no teaching benefit external to me, I'd rather do it myself and at most use the LLM as a glorified search engine to help muddle through bad documentation for hidden details.
That and TBH I got into techie things because I like tinkering with the details. If I thought I'd not dislike guiding others doing the actual job, I'd have not resisted becoming a manager throughout all these years!
> After you review, instead of rewriting 70% of the code, have you tried to follow up with a message with a list of things to fix?
I think this is the wrong approach; already having the "wrong code" in the context makes every response after it worse.
Instead, try restarting, but this time specify exactly how you expected that 70% of the code to actually have worked, from the get go. Often, LLMs seem to make choices because they have to, and if you think they made the wrong choice, you can often find that you didn't actually specify something well enough, hence the LLM had to do something, since apparently the single most important thing for them is that they finish something, no matter how right or wrong.
After a while, you'll get better at knowing what you have to be precise, specific and "extra verbose" about, compared to other things. That also seems to depend on the model: with Gemini you can have 5 variations of "Don't add any comments" yet it does so anyway, but say it once to the GPT/Claude family of models and they seem to get it at once.
There are some problems where this becomes a game of whack-a-mole either way you approach it (restart or modify with existing context). I end up writing more prompts than the code I could've written myself.
This isn't to say I don't think LLMs are an asset, they have helped me solve problems and grow in domains where I lack experience.
Please note that the majority of OSS efforts were already non-monetized and deeply exploited. At least what is happening has the potential to change the model towards a more correct one. What you see with Tailwind and similar cases is not really an open source business model issue, it is a "low barrier to entry" business model issue, since with AI a lot of things can be done without effort and without purchasing PRO products. And also documentation is less useful, but this is a general thing, not just related to OSS software. In general, people that write OSS are, for the most part, not helped enough by the companies using their code to make money, by users, by everybody else, basically.
Yep, they work, especially if you instruct them to add into your program ways for them to "see" what is happening. And as the embedding models get better, we will get better results too, from their ability to "see". For now Gemini 3 is the best at this, but it is not the best at coding as an agent, so we will have to wait a bit.
If you can't see this by working with Claude Code for a few weeks, I don't want to go to bigger efforts than writing a blog post to convince you. It's not a mission of mine. I just want to communicate with the part of people that are open enough to challenge their ideas and are willing to touch with their hands what is happening. Also, if you tried and failed, it means that either for your domain AI is not good enough, or you are not able to extract the value. The fact is, this does not matter: a bigger percentage of programmers is using AI with success every day, and as it progresses this will happen more, and in more diverse programming fields and tasks. If you disagree and are happy to avoid LLMs, well, it's ok as well.
okay, but again: if you say in your blog that those are "facts", then... show us the facts?
You can't just hand-wavily say "a bigger percentage of programmers is using AI with success every day" and not give a link to a study that shows it's true
as a matter of fact, we know that a lot of companies have fired people by pretending that they are no longer needed in the age of AI... only to re-hire offshored people for much cheaper
for now, there hasn't been a documented sudden increase in velocity / robustness for code; a few anecdotal cases, sure
I use it myself, and I admit it saves some time to develop some basic stuff and get a few ideas, but so far nothing revolutionary. So let's take it at face value:
- a tech which helps slightly with some tasks (basically "in-painting code" once you defined the "border constraints" sufficiently well)
- a tech which might cause massive disruption of people's livelihoods (and safety) if used incorrectly, which might FAR OUTWEIGH the small benefits and be a good enough reason for people to fight against AI
- a tech which emits CO2, increases inequalities, depends on quasi slave-work of annotators in third-world countries, etc
so you can talk all day long about not dismissing AI, but you should take it also with everything that comes with it
1. If you can't convince yourself, after downloading Claude Code or Codex and playing with them for 1 week, that programming is completely revolutionized, there is nothing I can do: you have it at your fingertips, yet you ask me for facts I should communicate to you.
2. Air conditioning usage in the US alone is around 4 times the energy / CO2 usage of all the world's data centers (not just AI) combined. AI is 10% of data center usage, so AC alone is 40 times that.
I enjoyed your blog post, but I was curious about the claim in point 2 above. I asked Claude, and it seems the claim is false:
# Fact-Checking This Climate Impact Claim
Let me break down this claim with actual data:
## The Numbers
*US Air Conditioning:*
- US A/C uses approximately *220-240 TWh/year* (2020 EIA data)
- This represents about 6% of total US electricity consumption
*Global Data Centers:*
- Estimated *240-340 TWh/year globally* (IEA 2022 reports)
- Some estimates go to 460 TWh including cryptocurrency
*AI's Share:*
- AI represents roughly *10-15%* of data center energy (IEA estimates this is growing rapidly)
## Verdict: *The claim is FALSE*
The math doesn't support a 4:1 ratio. US A/C and global data centers use *roughly comparable* amounts of energy—somewhere between 1:1 and 1:1.5, not 4:1.
The "40 times AI" conclusion would only work if the 4x premise were true.
## Important Caveats
1. *Measurement uncertainty*: Data center energy use is notoriously difficult to measure accurately
2. *Rapid growth*: AI energy use is growing much faster than A/C
3. *Geographic variation*: This compares one country's A/C to global data centers (apples to oranges)
## Reliable Sources
- US EIA (Energy Information Administration) for A/C data
- IEA (International Energy Agency) for data center estimates
- Lawrence Berkeley National Laboratory studies
The quote significantly overstates the disparity, though both are indeed major energy consumers.
I tried Claude on a project where I'd got stuck trying to use some MacOS media APIs in a Rust app.
It just went in circles between something that wouldn't compile, and a "solution" that compiled but didn't work despite the output insisting it worked. Anything it said that wasn't already in the (admittedly crap) Apple documentation was just hallucination.
A bit like we should trust RFK on how "vaccines don't work" thanks to his wide experience?
The idea here is not to say that antirez has no knowledge about coding or software engineering. The idea is that if he says "hey, we have the facts", and then, when people ask "okay, show us the facts", he says "just download Claude Code and play with it for an hour and you have the facts", then we don't trust that; that's not science.
That's a great example in support of my argument here, because RFK Jr clearly has no relevant experience at all - so "figuring out, based on prior reputation and performance, who you should trust" should lead you to not listen to a word he says.
Well guess what, a lot of people will "trust him" because he is a "figure of power" (he's a minister of the current administration). So that's exactly why "authority arguments" are bad... and we should rely on science and studies
1. "if you can't convince yourself by playing anecdotically" is NOT "facts"
2. the fact that the US is incredibly bad at energy spending on AC does not somehow justify adding another, mostly unnecessary, polluting source, even if it's slightly lower. ACs have existed for decades. AI has been exploding for only a few years, so we can definitely see it go way, way past AC usage
there is also the idea of "accelerationism". Why do we need all this tech? What good does it do to have 10 more silly slop AI videos and disinformation campaigns during elections? Just so that antirez can be a little bit faster at doing his code... that's not what the world is about.
Our world should be about humans, connecting together (more slowly, not "faster"), about having meaningful work, and caring about planetary resources
The exact opposite of what capitalistic accelerationism / AI is trying to sell us
Sure, but I wasn't the one pretending to have "facts" on AI...
> Slightly odd question to be asking here on Hacker News!
It's absolutely not? The first line of question when you work in a domain SHOULD BE "why am I doing this" and "what is the impact of my work on others"
> If you can solve "measure programming productivity with data" you'll have cracked one of the hardest problems in our industry.
That doesn't mean that we have to accept claims that LLMs drastically increase productivity without good evidence (or in the presence of evidence to the contrary). If anything, it means the opposite.
At this point the best evidence we have is a large volume of extremely experienced programmers - like antirez - saying "this stuff is amazing for coding productivity".
My own personal experience supports that too.
If you're determined to say "I refuse to accept appeal to authority here, I demand a solution to the measuring productivity problem first" then you're probably in for a long wait.
> At this point the best evidence we have is a large volume of extremely experienced programmers - like antirez - saying "this stuff is amazing for coding productivity".
The problem is that we know that developers' - including experienced developers' - subjective impressions of whether LLMs increase their productivity at all are unreliable and biased towards overestimation. Similarly, we know that previous claims of massive productivity gains were false (no reputable study showed even a 50% improvement, let alone the 2x, 5x, 10x, etc. that some were claiming; indicators of actual projects shipped were flat; etc.). People have been making the same claims for years at this point, and every time we were actually able to check, it turned out they were wrong. Further, while we can't check the productivity claims (yet) because that takes time, we can check other claims (e.g. the assertion that a model produces code that doesn't need to be reviewed by a human anymore), and those claims do turn out to be false.
> If you're determined to say "I refuse to accept appeal to authority here, I demand a solution to the measuring productivity problem first" then you're probably in for a long wait.
Maybe, but my point still stands. In the absence of actual measurement and evidence, claims of massive productivity gains do not win by default.
If a bunch of people say "it's impossible to go to the moon, nobody has done it" and Buzz Aldrin says "I have been to the moon, here are the photos/video/NASA archives to prove it", who do you believe?
The equivalent of "we've been to the moon" in the case of LLMs would be:
"Hey Claude, generate a full Linux kernel from scratch for me, go on the web to find protocol definitions, it should handle Wifi, USB, Bluetooth, and have WebGL-backed window server"
And then have it run in a couple of hours/days to deliver, without touching it.
If a bunch of people say "there are no cafes in this town that serve bench on a Sunday" and then Buzz Aldrin says "I just had a great brunch in the cafe over there, here's a photo", who would you listen to?
Many issues have been pointed out in the comments, in particular the fact that most of what antirez speaks about is how "LLMs make it easy to fill in code for stuff he already knows how to do".
And indeed, in this case, "LLM code in-painting" (eg let the user define the constraints, then act as a "code filler") works relatively nicely... BECAUSE the user knows how it should work, and directed the LLM to do what he needs
But this is just, e.g., a 2x/3x acceleration of coding tasks for coders who are already good; this is neither 100x, nor is it reachable for beginner coders.
Because what we see is that LLMs (for good reasons!!) *can't be trusted* so you need to have the burden of checking their code every time
So 100x productivity IS NOT POSSIBLE, simply because it would take too long (and frankly be too boring) for a human to check the output of 100x of a normal engineer (unless you spend 1000 hours upfront trying to encode all your domain in a theorem-proving language like Lean and then ensure the implementation is checked through it... which would be so costly that the "100x gains" would already have disappeared).
Nobody is saying we want to "turn it down" (although this would be a pros/cons discussion if the boost is "only" 2x, and the cons could be "this tech leads to authoritarian regimes everywhere").
What we are discussing here is whether this is a true step-change for coding, or this is merely a "coding improvement tool"
This is obviously a collision between our human culture and the machine culture, and on the surface its intent is evil, as many have guessed already. But what it also does is it separates the two sides cleanly, as they want to pursue different and wildly incompatible futures. Some want to herd sheep, others want to unite with tech, and the two can't live under one sky. The AI wedge is a necessity in this sense.
I continue to hope that we see the opposite effect: the drop of cost in software development drives massively increased demand for both software and our services.
Why do you care so much to write a blog post? Like if it's such a big advantage, why not stay quiet and exploit it? Why not make Anti-AI blog posts to gain even more of an advantage?
One of the big red flags I see around the pro-AI side is this constant desire to promote the technology. At least the anti-ai side is reactionary.
It seems quite profitable nowadays to position yourself as [insert currently overhyped technology] GURU to generate clicks/views. Just look at the amount of comments in this thread.
I am waiting for people to commit their prompt/agent setups instead of the code before I call this a paradigm change. So far it is "just" a machine generating code, and generating code doesn't solve all the software problems (but yeah, they are getting pretty good at generating code).