I think AI arrived just as the industry was maturing: most frameworks/software had previous incarnations that mostly did the same thing, or could be done ad hoc anyway. The need for libraries probably declines as the models get better, too.
Not all open source, but a lot of it, is fundamentally for humans to consume. If AI can, at its extreme (still remains to be seen), just magic up the software, then the value of libraries and a lot of open source software will decline. In some ways it's a fundamentally different paradigm of computing, and we don't yet understand what that looks like.
As AI gets better, OSS still contributes to it, but through its source code feeding the training data, not as a direct framework dependency. If the LLMs continue to get better, I can see the whole concept of frameworks being less and less necessary.
Mostly. I had the "AI bot tsunami" problem on my own personal site and blocked a bunch of bot user agents in robots.txt. Most of them were from companies I had never heard of before. The only big AI name I recognized was GPTBot from OpenAI.
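For anyone wanting to sanity-check a robots.txt like that, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents are an assumption for illustration: `GPTBot` is the only agent named above, and `ExampleAIBot` and the example.com URL are made up. Keep in mind robots.txt is only advisory, so this checks what a well-behaved crawler would do, not what a bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. GPTBot is the agent mentioned in the comment;
# ExampleAIBot is a made-up stand-in for the lesser-known crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ExampleAIBot", "Mozilla/5.0"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/post"))
```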
https://www.anthropic.com/careers/jobs/5025624008 - "Research Engineer – Cybersecurity RL" - "This role blends research and engineering, requiring you to both develop novel approaches and realize them in code. Your work will include designing and implementing RL environments, conducting experiments and evaluations, delivering your work into production training runs, and collaborating with other researchers, engineers, and cybersecurity specialists across and outside Anthropic."
https://www.anthropic.com/careers/jobs/4924308008 - "Research Engineer / Research Scientist, Biology & Life Sciences" - "As a founding member of our team, you'll work at the intersection of cutting-edge AI and the biological sciences, developing rigorous methods to measure and improve model performance on complex scientific tasks."
The key trend in 2025 was a new emphasis on reinforcement learning - models are no longer just trained by dumping in a ton of scraped text, there's now a TON of work involved designing reinforcement learning loops that teach them how to do specific useful things - and designing those loops requires subject-matter expertise.
That's why they got so much better at code over the past six months - code is the perfect target for RL because you can run generated code and see if it works or not.
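To make the "run it and see if it works" idea concrete, here is a rough sketch of a binary reward signal for generated code. This is an illustration under assumptions, not how any lab actually does it: the function name `execution_reward`, the pass/fail scoring, and the timeout are all hypothetical, and a real setup would need proper sandboxing rather than a bare subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_reward(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Return 1.0 if candidate_code passes test_code, else 0.0.

    The candidate and its tests are written to one temporary script and run
    in a separate Python process. NOTE: a subprocess is not a sandbox.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_with_tests.py"
        script.write_text(candidate_code + "\n\n" + test_code)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs earns no reward.
            return 0.0
    # A failing assert or any uncaught exception gives a non-zero exit code.
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    print(execution_reward("def add(a, b):\n    return a + b", tests))
    print(execution_reward("def add(a, b):\n    return a - b", tests))
```

The point of the sketch is only that the reward is cheap and objective to compute, which is much harder to arrange for, say, essay quality.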
The funny part is how they think this will give them the power to take control of what is the de facto standard and circumvent standards.
Instead, it will further mark out AI slop as the code that doesn't work, siloed off among people who don't care about the code and so can't fix it.
If people want good interoperable production ready code that can be deployed instantly and just works and meets all current standards and ongoing discussions, we've had it for many decades and it's called open source.
I am yet to see a vibe coded success that isn't a small program that already exists in multiple forms in the training data. Let's see something ground-breaking. If AI coding is so great and is going to take us to 10x or 100x productivity let's see it generate a new, highly efficient compression algorithm or a state-of-the-art travelling salesman solution.
Why? People don't ask hammers to do much more than bash in nails into walls.
AI coding tools can be incredibly powerful -- but shouldn't that power be focused on what the tool is actually good at?
There are many, many times that AI coding tools can and should be used to create a "small program that already exists in multiple forms in the training data."
I do things like this very regularly for my small business. It's allowed me to do things that I simply would not have been able to do previously.
People keep asking AI coding tools to be something other than what they currently are. Sure, that would be cool. But they absolutely have increased my productivity 10x for exactly the type of work they're good at assisting with.
>People don't ask hammers to do much more than bash in nails into walls.
“It resembles a normal hammer but is outfitted with an little motor and an flexible head part which moves back and forth in a hammering motion, sparing the user from moving his or her own hand to hammer something by their own force and by so making their job easier”
Good reference and a funny scene but doesn't quite hit home because we have invented improved hammers in the form of pneumatic nail guns and even cordless nailers (some pneumatic and some motorized) which could truly be called an "electric hammer".
With this context the example may support the quote: nail guns do make driving nails much faster and easier, but that's all they do. You can't pull a nail with a nail gun and you can't use it for any of the other things that a regular hammer can do. They do 10x your ability to drive nails though.
On the other hand, LLMs are significantly more multi-purpose than a nail gun.
> People keep asking AI coding tools to be something other than what they currently are.
I think it's for a very reasonable reason: the AI coding tool salespeople are often selling the tools as something other than what they currently are.
I think you're right that if you calibrate your expectations to what the tools are capable of, there's definitely value there. It would be nice if the marketing around AI also did the same thing.
AI sales seems to be very much aligned with productivity improvement - "do more of the same but faster" or "do the same with fewer people". No one is selling "do more".
> I think it's for a very reasonable reason: the AI coding tool salespeople are often selling the tools as something other than what they currently are.
And if this submission was an AI salesperson trying to sell something, the comment/concern would be pertinent. It is otherwise irrelevant here.
Yes! I can't tell you the number of times I thought to myself "If only there was a way for this problem to be solved once instead of being solved over and over again". If that is the only thing AI is good at, then it's still a big step up for software IMO.
Because that's the vision of many companies trying to sell AI. Saying that what it can do now is actually already good enough might be true, but it's also moving the goalposts compared to what was promised (or feared, depending who you're asking).
One of the many important skills needed to navigate our weird new LLM landscape is ignoring what the salespeople say and listening to the non-incentivized practitioners instead.
I've been thinking a lot about the fact that so much of our software has become engagement driven. E.g. Duolingo isn't optimized for learning a language, Facebook isn't optimized for connecting with your friends and family.
I wonder if AI coding tools might get out of this for some cases at least. Make an app that is clearly derivative but actually is optimized to do the thing it actually is supposed to do.
Harder with network effect apps but might be possible for others.
To be clear, I see a lot of "magical thinking" among people who promote AI. They imagine a "perfect" AI tool that can basically do everything better than a human can.
Maybe this is possible. Maybe not.
However, it's a fantasy. Granted, it is a compelling fantasy. But it's not one based on reality.
A good example:
"AI will probably be smarter than any single human next year. By 2029, AI is probably smarter than all humans combined." -- Elon Musk
This is, of course, ridiculous. But, why should we let reality get in the way of a good fantasy?
> AI will probably be smarter than any single human next year.
Arguably that's already so. There's no clear single dimension for "smart"; even within exact sciences, I wouldn't know how to judge e.g. "Who was smarter, Einstein or Von Neumann?". But for any particular "smarts competition", especially if it's time limited, I'd expect Claude 4.5 Opus and Gemini 3 Pro to get higher scores than any single human.
Hear me out: let's say that generating a new and better compression algorithm is something that might take a dedicated researcher about a year of their life, and that person is being paid to work on it, in the industry or via a grant. Is there anyone who has been running Claude Code instances for a human-year in a loop with the instruction to try different approaches until it has a better compression algorithm?
Because I keep wondering: if AI is here and our output is charged up, then why do I keep seeing more of the same products but with an "AI" sticker slapped on top of them? From a group of technologists like HN and the startup world, who live on the edge of evolution and revolution, maybe my expectations were a bit too high.
All I see is the equivalent of a "look how fast my new car made me go to the supermarket, when I'm not too demanding on the supermarket I want to end up with, and all I want is milk and eggs". Which is 100% fine, but at the end of the day I eat the same omelette as always. In this metaphor, I don't feel the slightest bit behind, or have any sense of FOMO if I cook my omelette slowly. I guess I have more time for my kids if I see the culinary arts as just a job. And it's not like restaurants suddenly get all their tables booked faster just because everyone cooks omelettes faster.
>It's allowed me to do things that I simply would not have been able to do previously.
You're not the one doing them. Me barking orders to John Carmack himself doesn't make me a Quake co-creator, and even if I micromanage his output like the world's most toxic micromanager who knows better I'm still not Carmack.
On top of that, you would have been able to do it previously, if you cared enough to upskill to the point where token feeding isn't needed for you to feel productive. Tons of programmers broke barriers, and solved problems that haven't been solved by anyone in their companies before.
I don't see why everyone claiming that they previously couldn't do something is a bragging point. The LLMs that you're using were trained on the Google results you could've gotten if you Google searched.
> Why? People don't ask hammers to do much more than bash in nails into walls.
No one is propping up a multi-billion dollar tech bubble by promising hammers that do more than bash nails. As a point of comparison that makes no sense.
The software development market is measured in tens of billions to hundreds of billions of dollars depending on which parts you're looking at so inventing a better hammer (development tool) can be expected to drive billions of dollars of value. How many billions depends on how good of a tool it turns out to be in the end. That's only counting software, it's also directly applicable to all media (image, video, audio, text) and some scientific domains (genetics, medicine, materials, etc.)
You’re right, but at the same time, 99% of software people need has already been done in some form. This gets back to the article on “perfect software” [1] posted last week. This bookshelf is perfect for the guy who wrote it and there isn’t anything exactly like it out there. The common tools on the App Store (Goodreads) don’t fit his needs. But he was able to create a piece of “perfect software” that exactly meets his own goals and his own design preferences. And it was very easy to accomplish with LLMs, just by putting together pieces of things that have been done before.
Yes, that's an excellent framing of where we're at and the role that LLM generated software is excelling in. Custom software has been out of reach for many people who would benefit from it due to requiring either a lot of money to pay someone to build it or a lot of time to learn how to build it yourself and execute on that process. Right now you can essentially use services like Claude as a custom software "app store", although I'd really call it a service, where you can say "I'd like an app that does X" and depending on the scope you can get that app as a Claude Artifact in a few minutes or, if you're familiar with software development and build/deployment processes, in a few hours to days as a more traditional software artifact which you can host somewhere or install locally. Google is working hard to make this even more achievable for non-developers with Google AI Studio https://aistudio.google.com/ and Firebase Studio https://firebase.studio/
Exactly. LLMs aren't taking the jobs of developers - unless they're selling some micro-SaaS stuff that anyone with enough domain knowledge can duplicate with a Claude subscription over a weekend.
Like the DVD catalogue software I was using[0] became subscription based, and I'm not paying 50€/year for that.
Just rewrote the bits that I need this afternoon with Claude. It definitely doesn't have the same features as the CLZ Movies app had, but it has all the ones I need specifically (adding movies easily, quickly seeing if I already own a movie).
And the same will happen to more and more SaaS style things unless they offer something unique a self-made and self-hosted one can't provide.
Much of the coding we do is repetitive and exists in the training data, so I think it's pretty great if AI can eliminate that toil and liberate the meat to focus on the creative work.
There’s a reason they call working at Google “shuffling protobufs” for the vast majority of engineers. Most software work isn’t innovative compression algorithms. It’s moving data around, which is a well understood problem.
And to add to this, for some reason people really bristle if you say that many LLMs are just search with extra steps. This feels like an extension of that. It’s just reinventing the wheel over and over again based on a third party’s (admittedly often a solid approximation but still not exact) educated guess of what a wheel may be. It all seems like a rather circuitous way to accomplish things unless your goal isn’t to build a wheel but rather tinker and experiment with the concept of a wheel and learn something in the process. Totally valid, but I’m pretty sure that’s not what OpenAI et al are pitching lol
I find this type of comment depressing. This is a time for exploration and learning new things. This is a perfect way to do so. It’s a small project that solves the problem. Better time spent vibe coding it than evaluating existing alternatives.
Forget utterly groundbreaking things, I want to hear maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start touting a massive uptick in positive contributions, pointing to a concrete influx of high-quality AI-assisted commits. If AI is indeed a 10x force multiplier, shouldn't these projects have seen 10 years' worth of development in the last year?
Don't get me wrong, AI is at least as game-changing for programming as StackOverflow and Google were back in the day. Being able to not only look up but automatically integrate things into your codebase that already exist in some form in the training data is incredibly useful. I use it every day, and it's saved me hours of work for certain specific tasks [0]. For tasks like that, it is indeed a 10x productivity multiplier. But since these tasks only comprise a small fraction of the full software development process, the rest of which cannot be so easily automated, AI is not the overall 10x force multiplier that some claim.
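The argument above is essentially Amdahl's law applied to a developer's time. A quick sketch of the arithmetic, where the 20% automatable share is purely an assumed number for illustration (the comment only says "a small fraction"):

```python
def overall_speedup(automatable_fraction: float, task_speedup: float) -> float:
    """Amdahl's law: overall speedup when only part of the work gets faster.

    automatable_fraction: share of total time spent on tasks the tool speeds up.
    task_speedup: how much faster those tasks become (10.0 means 10x).
    """
    if not 0.0 <= automatable_fraction <= 1.0:
        raise ValueError("automatable_fraction must be between 0 and 1")
    if task_speedup <= 0:
        raise ValueError("task_speedup must be positive")
    untouched = 1.0 - automatable_fraction
    return 1.0 / (untouched + automatable_fraction / task_speedup)


if __name__ == "__main__":
    # Assumed: 20% of the job is the kind of task that gets a 10x boost.
    print(round(overall_speedup(0.2, 10.0), 2))  # 1 / (0.8 + 0.02)
    # Even an infinitely fast tool is capped at 1 / 0.8 = 1.25x here.
    print(round(overall_speedup(0.2, 1e12), 2))
```

Under that assumption a 10x boost on the automatable part works out to roughly a 1.2x boost overall, which is how "10x for these tasks" and "not 10x overall" can both be true.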
> I want to hear maintainers of complex, actively developed, and widely used open-source projects (e.g. ffmpeg, curl, openssh, sqlite) start touting a massive uptick in positive contributions
That's obviously not going to happen, because AI tools can't solve for taste. Just because a developer can churn out working code with an LLM doesn't mean they have the skills to figure out what the right working code to contribute to a project is, and how to do so in a way that makes the maintainers' lives easier and not harder.
That skill will remain rare.
(Also SQLite famously refuses to accept external contributions, but that's a different issue.)
No, Simon, we don't "refuse". We are just very selective and there is a lot of paperwork involved to confirm the contribution is in the public domain and does not contaminate the SQLite core with licensed code. Please put the false narrative that "SQLite refuses outside contributions" to rest. The bar is high to get there, but the SQLite code base does contain contributed code.
Dr. Hipp, I love SQLite but also had simonw's misapprehension that the project did not accept contributions. The SQLite copyright page says:
> Contributed Code
> In order to keep SQLite completely free and unencumbered by copyright, the project does not accept patches. If you would like to suggest a change and you include a patch as a proof-of-concept, that would be great. However, please do not be offended if we rewrite your patch from scratch.
I realize that the section, "Open-Source, not Open-Contribution" says that the project accepts contributions, but I'm having trouble understanding how that section and the "Contributed Code" section can both be accurate. Is there a distinction between accepting a "patch" vs. accepting a "contribution?"
If you're planning to update this page to reduce confusion of the contribution policy, I humbly suggest a rewrite of this sentence to eliminate the single and double negatives, which make it harder to understand:
> In order to keep SQLite in the public domain and ensure that the code does not become contaminated with proprietary or licensed content, the project does not accept patches from people who have not submitted an affidavit dedicating their contribution into the public domain.
Could be rewritten as:
> In order to keep SQLite in the public domain and prevent contamination of the code from proprietary or licensed content, the project only accepts patches from people who have submitted an affidavit dedicating their contribution into the public domain.
I will make sure not to spread that misinformation further in the future!
Update: I had a look in fossil and counted 38 contributors:
    brew install fossil
    fossil clone https://www.sqlite.org/src sqlite.fossil
    fossil sql -R sqlite.fossil "
      SELECT user, COUNT(*) as commits
      FROM event WHERE type='ci'
      GROUP BY user ORDER BY commits DESC
    "
Man, your behavior when you realize you got something wrong is something the rest of us can aspire to. This is one of the things I like the best about you.
I learned it from newspapers: papers that publish prompt and clear corrections when they publish mistakes are more credible than papers that don't acknowledge their errors.
> Being able to not only look up but automatically integrate things into your codebase that already exist in some form in the training data is incredibly useful.
Until it decides to include code it gathered from a Stack Overflow post from 15 years ago, probably introducing security issues, or makes up libraries on the go, or, even worse, tries to make you install libs that were part of a data poisoning attack.
It's no different from supervising a naïve junior engineer who also copy/pastes from 15 year old SO posts (a tale as old as time): you need to carefully review and actually grok the code the junior/AI writes. Sometimes this ends up taking longer than writing it yourself, sometimes it doesn't. As with all decisions in delegating work, the trick is knowing ahead of time whether this will be the case.
Naive junior engineers eventually learn and become competent senior engineers. LLMs forget everything they "learn" as soon as the context window gets too big.
Might the creator of Claude Code have some … incentives … to develop like that, or at least claim that he does?
As someone who frequently uses Claude Code, I cannot say that a year's worth of features/improvements have been added in the last month. It bears repeating: if AI is truly a 10x force multiplier, you should expect to see a ~year's worth of progress in a month.
Nobody here claimed that Boris wasn't a biased source.
I do however think he is not an actively dishonest source. When he says "In the last thirty days, I landed 259 PRs -- 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5." I believe he is telling the truth.
That's what dogfooding your own product looks like!
curl in particular is being plagued by AI-slop security reports, which are actively slowing development by forcing the maintainers to triage crap when they could be working on new features (or, you know, enjoying their lives), e.g. https://www.theregister.com/2025/07/15/curl_creator_mulls_ni...
I have worked on out of sample problems, and AI absolutely struggles, but it dramatically accelerates the research process. Testing ideas is cheap, support tools are quick to write, and the LLM is itself a tremendous research tool.
More generally, I do think LLMs grant 10x+ performance for most common work: most of what people do manually is in the training data (which is why there's so much of it in the first place.) 10x+ in those domains can in theory free up more brain space to solve the problems you're talking about.
My advice to you is to tone down the cynicism, and see how it could help you. I'll admit, AI makes me incredibly anxious about my future, but it's still fun to use.
I am yet to see an "AI doesn't impress me" comment that added anything to the discussion. Yes, there's always going to be a state of the art and things that are as of yet beyond the state of the art.
You're just asking for the opposite of what AI does.
90-99% of an engineer's work isn't entirely novel coding that has never existed before, so by succeeding at what "already exists", it can take us to 10x-100x productivity.
The automation of all that work is groundbreaking in and of itself.
I think that, for a while into the future at least, humans will be relegated to generating that groundbreaking work, and the AI will increasingly handle the rest.
Every two months, I run a very simple experiment to decide whether I should stop shorting NVDA. Think of it as my personal Pelican on a Bike test. :-)
Here is how it works: I take the latest state of the art model, usually one of the two or three currently being hyped, and ask it to create a short document that teaches Java, Python, or Rust in 30 to 60 min, complete with code examples. Then I ask the same model to review its own artifact for correctness and best practices.
What happens next is remarkably consistent. The model produces a glowing review, confidently declaring the document “production ready”… while the code either does not compile, contains obvious bugs, or relies on outright bad practices.
When I point this out, the model apologizes profusely and generates a “fixed” version which still contains errors. I rinse and repeat until I give up.
This is still true today, including with models like Opus 4.5 and ChatGPT 5.2.
So whenever I read comments about these models being historic breakthroughs, I can’t help but imagine they are mostly coming from teams proudly generating technical debt at 100× the usual speed.
Things get even worse when you ask the model to review a cloud architecture...
Ok, but if you wrote some massive corpus of code with no testing it probably would not compile either.
I think if you want to make this a useful experiment you should use one of the coding assistants that can test and iterate on its code, not some chatbot which is optimized to impress nontechnical people while being as cheap as possible to run.
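Roughly, the test-and-iterate loop those assistants run looks like the sketch below. Everything here is a simplified assumption: `generate` stands in for a model call (no real API is used), `fake_model` is a toy stand-in so the example runs on its own, and `exec` in the current process is used for brevity where a real tool would isolate execution.

```python
import traceback
from typing import Callable, Optional

# Hypothetical signature for a model call: (task, previous_error) -> code.
Generator = Callable[[str, Optional[str]], str]


def run_and_capture(code: str) -> Optional[str]:
    """Run code; return None on success or the traceback text on failure."""
    try:
        exec(compile(code, "<candidate>", "exec"), {"__name__": "__candidate__"})
    except Exception:
        # Covers syntax errors from compile() and runtime errors from exec().
        return traceback.format_exc()
    return None


def iterate_until_passing(generate: Generator, task: str, max_rounds: int = 3) -> Optional[str]:
    """Ask for code, run it, and feed any error back until it runs cleanly."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        code = generate(task, feedback)
        feedback = run_and_capture(code)
        if feedback is None:
            return code
    return None  # Gave up: every attempt still failed.


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Toy stand-in for a model: buggy first draft, fixed once it sees an error."""
    if feedback is None:
        return "total = undefined_name + 1"
    return "total = 1 + 1"


if __name__ == "__main__":
    print(iterate_until_passing(fake_model, "add two numbers"))
```

A plain chat interface stops after the first draft, which is the main difference being pointed out.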
That depends a lot on the system prompt and the tooling available to the model. Are you trying this in Claude Code or Factory.ai, or are you using a chat interface? The difference in the outcome can be large.
The name of the model is not the end of the story. There is a Pareto frontier of performance vs computational cost, and the companies have various knobs and dials they can tune to trade off performance for cost. This is why OpenAI reports costs of $1k/problem when they test their models on the math/coding benchmarks, yet charge you only $15/month for a subscription to their web interface.
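For anyone unfamiliar with the term, here is a small sketch of what a Pareto frontier over cost and performance means. The configurations and their numbers are entirely made up for illustration; they are not measurements of any real model.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    cost: float   # e.g. dollars per problem; lower is better
    score: float  # e.g. benchmark accuracy; higher is better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep configs that no other config beats on both cost and score."""
    frontier: List[Config] = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive (best score first among ties);
    # a config survives only if it outscores everything that is cheaper.
    for cfg in sorted(configs, key=lambda c: (c.cost, -c.score)):
        if cfg.score > best_score:
            frontier.append(cfg)
            best_score = cfg.score
    return frontier


if __name__ == "__main__":
    # Hypothetical settings of the same underlying model.
    candidates = [
        Config("chat-default", cost=0.01, score=0.60),
        Config("more-thinking", cost=0.50, score=0.75),
        Config("wasteful", cost=2.00, score=0.70),
        Config("benchmark-mode", cost=1000.0, score=0.90),
    ]
    for cfg in pareto_frontier(candidates):
        print(cfg.name, cfg.cost, cfg.score)
```

In this toy data "wasteful" drops out because "more-thinking" is both cheaper and better; the rest are all defensible trade-offs, which is why the same model name can behave so differently depending on where the provider sets the dial.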
I'm sorry but I don't quite believe you because I've done exactly this for learning much more complicated topics. For fun I've been learning about video game programming in the Odin programming language using a Claude project where I have Opus 4.5 write tutorials, including working code examples that are designed to be integrated with each other into a larger project. We've covered maze generation, Delaunay triangulation, MSTs, state machines, rendering via Raylib and RayGUI, and tweening for animations. All of those worked quite well with only very minor corrections which Opus was also very helpful for diagnosing and fixing. I also had it produce a full tutorial on implementing a relational database in Odin but I haven't had time to work my way through all of it yet. This is all with a somewhat niche language like Odin that I wouldn't expect there to be a lot of training data for so you'll excuse my incredulity that you couldn't get usable introductory code for much more commonly used languages like Java and Python.
I'm wondering if your test includes allowing the models to run their code in order to validate it and then fix it using the error output? Would you be willing to share the prompts and maybe some examples of the errors?
I haven't had many problems working in Claude Code even with full on "vibe coding". One notable recent exception was in writing integration tests for a p2p app that uses WebRTC, XTerm.js, and Yjs where it ran into some difficulty creating a testing framework that involved a headless client and a local MQTT broker where we had to fork a few child processes to test communication between them. Opus got bogged down working on its own so I stepped in and got things set up correctly (while chatting with Opus through the web interface instead of CC). The problem seemed to be due to overfilling the context since the test suite files were too long so I could have probably avoided the manual work by just having Opus break those up first.
It's good that way, right? Let me as a human do the interesting thinking my brain is meant for, while you AI do what the chips were built for.
I am happy as is tbh, not even looking for AGI and all. Just that the LLM be close enough to my thinking scale so that it does not feel "why am I talking with this robot".
That's about as far removed from vibe coding as you can get. It's the result of an algorithm developed for a specific purpose by researchers at one of the most advanced machine learning companies.
Who really cares? The goalpost of "AI is useless because I can't vibe code novel discoveries" is a strawman. AI and vibe coding are transformational. So are AI-enhanced efforts to solve longstanding, difficult scientific problems. If cancer is cured with AI assistance, does it really matter if it was vibe-cured or state-of-the-art-lab-cured?
Microsoft is currently hiring engineers to rewrite their entire codebase in Rust via vibecoding. Something to the tune of a million lines of code per developer per month.
… or, let’s see humans who are now 10-100x more productive (due to automation of mundane tasks that are already part of the training data) do the things you’re asking for.
1. Current LLMs do much better than produce "small programs that already exist in multiple forms in the training data". Of course the knowledge they use does need to exist somewhere in training data, but they operate at a higher level of abstraction than simply spitting out programs they've already seen whole cloth. Way higher.
2. Inventing a new compression algorithm is beyond the expectations of all but the most wild-eyed LLM proponents, today.
"the knowledge they use does need to exist somewhere in training data", I'm not too sure about that. The current coding environments for AI give the models a lot of reasoning power with tooling to test, iterate and web search. They frequently look at the results of their code runs now and try different approaches to get the desired result. It's common for them to write their own tests unprompted and re-evaluate accordingly.
trifling.org is an entire Python coding site, offline first (localstorage after first load), with docs, turtle graphics, canvas, and avatar editor, vibe coded from start to finish, with all conversations in the GitHub repo here: https://github.com/zellyn/trifling/tree/main/docs/sessions
This is going to destroy my home network, since I never moved it off the little Lenovo box sitting in my laundry room beside the Eero waypoint, but I’m out of town for three days, so
Granted, the seed of the idea was someone posting about how they wired Pyodide to Ace in 400 lines of JavaScript, so I can’t truly argue it’s non-trivial.
As a light troll to hackernews, only AI-written contributions are accepted
[Edit: the true inception of this project was my kid learning Python at school and trinket.io inexplicably putting Python 3 but not 2 behind the paywall. Alas, Securely will not let him and his classmates actually access it ]
> If AI coding is so great and is going to take us to 10x or 100x productivity
That seems to be a strawman here, no? Sure, there exist people/companies claiming 10x-100x productivity improvements. I agree it's bullshit.
But the article doesn't seem to be claiming anything like this - it's showing the use of vibe-coding for a small personalized side-project, something that's completely valid, sensible, and a perfect use-case for vibe-coding.
> let's see it generate a new, highly efficient compression algorithm or a state-of-the-art travelling salesman solution.
This is the "promise" that was being sold here, and in reality we haven't yet seen anything innovative or even a sophisticated, original, groundbreaking discovery from an LLM, with most of the claims being faked or unverified.
Most of the 'vibe-coding' uses here are quite frankly performative or used for someone's blog for 'content'.
It’s bizarre to me that anyone would get confused by this. I can only imagine it’s a USA thing. Indicators flash on the side you’re turning towards. I’d never expect it to be an arrow pointing in any direction.
"Even if you could push through the bitterness, it’s unlikely you’d be able to stomach the bucketfuls of tea required to get enough salicin from willow bark (or similar plants) to ease your discomfort."
So, rather than killing pain, they probably just stopped complaining about it to save them from having to drink any more bitter willow tea.
In a manner of speaking, the grid is already the storage mechanism. In summer you sell the excess to the grid; in winter you buy it back. Obviously you pay more to buy than you get for selling but that's the premium for using someone else's infrastructure. You'd spend a load more buying a battery the size of a small house.