I asked about this in previous HN discussions. The old rule: You had to wait three months. Is there a meaningful economic difference between waiting 15 days or ~90 days? I don't see it. (For transparency: I own ETFs that track the S&P 500, which has lots of overlap with Nasdaq 100.) To be clear: OpenAI and Anthropic will surely IPO this year or next and have a similar effect -- they will have trillion-dollar (or nearly trillion-dollar) market caps upon listing.
> SpaceX has successfully lobbied the Nasdaq stock exchange to loosen rules governing how and when it adds companies to its Nasdaq 100 index – a group of large-cap companies that it bills as “fundamentally sound and innovative.”
"They" aren't a single group. Broadly speaking, publishers are the ones suing anna's archive, and they're involved in suits against AI companies as well. I'm not aware of any efforts by AI companies to take down anna's archive.
The magazine was bought, I think. There's always some interesting background to the stories.
SolarCity was clearly a great example of Elon's "no investor left behind" philosophy: if he promotes a company and gets investors to invest in it, he does whatever he can to make sure that they at least don't lose their money (by merging it into a bigger company he controls), even if it wouldn't be the best financial decision.
So far this strategy has been working quite well for both him and the investors.
It would be better for investors if he didn't destroy all his products with both his obscene politics and also his little fantasy pet projects that no one wants. Eventually the house of cards will collapse. At this point all that investors can hope for is something akin to a government bond. Unfortunately one for a government that is currently excelling at depreciating anything related to it.
"a beginner is pushed toward using AI before they have built the instincts the AI is replacing. That is an anti-pattern."
The same article talks about CTF skills as a way to learn security best practices and, separately, as a sport.
In reality it was all about learning an extremely important skillset (securing/attacking software and systems) that is getting automated.
The real thing the author seems to be frustrated about is that AGI is coming to computationally verifiable domains first, and a large part of his skillset has been taken over.
I'm actually excited that somebody is experimenting with automated translation, but I'm afraid there will be lots of backwards-compatibility issues.
I started looking at the commits, and it's basically solving the "tests not pass" problem by changing the tests themselves. The real work of making it work on programs that are already deployed is only starting now.
The only silver lining I see is that the server-side JS community for some reason is already used to breakages all the time.
> it's basically solving the "tests not pass" problem by changing the tests themselves.
False.
0 test files were deleted. 0 pre-existing tests were skipped, todo’d, or had assertions removed. 5 new tests were added in test.skip/test.todo state to track known not-yet-fixed bugs in the port that lacked test coverage before.
The merge changed 28 test files in total.
+1,312 lines
−141 lines
Most of that +1,312 is new tests.
The depth-of-recursion tests for TOML/JSONC parsers went from 25_000 -> 200_000 because Rust’s smaller stack frames (LLVM lifetime annotations let the optimizer reuse stack slots) mean 25k levels no longer reaches the 18 MB stack on Windows.
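For readers unfamiliar with depth-of-recursion tests, here is a minimal sketch of the idea. It is not Bun's code: the bracket-only grammar, the function names, and the explicit `max_depth` parameter are all assumptions made for illustration. The real tests rely on the actual native stack size, whereas this sketch stands in an explicit limit so that the behaviour is deterministic.

```python
class DepthLimitExceeded(Exception):
    """Raised when the input nests deeper than the parser allows."""


def nested_array(depth: int) -> str:
    """Build an input like '[[[]]]' with the given nesting depth."""
    return "[" * depth + "]" * depth


def parse_array(text: str, pos: int, depth: int, max_depth: int) -> tuple[int, int]:
    """Recursively parse one array starting at text[pos].

    Returns (deepest nesting level seen, position after the closing bracket).
    `max_depth` is a hypothetical stand-in for the real stack limit.
    """
    if depth > max_depth:
        raise DepthLimitExceeded(f"nesting exceeds {max_depth} levels")
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    deepest = depth
    # Each child array costs one more level of recursion.
    while pos < len(text) and text[pos] == "[":
        child_deepest, pos = parse_array(text, pos, depth + 1, max_depth)
        deepest = max(deepest, child_deepest)
    if pos >= len(text) or text[pos] != "]":
        raise ValueError(f"expected ']' at position {pos}")
    return deepest, pos + 1


def max_nesting(text: str, max_depth: int) -> int:
    """Parse the whole input and return its maximum nesting depth."""
    deepest, pos = parse_array(text, 0, 1, max_depth)
    if pos != len(text):
        raise ValueError("trailing characters after the top-level array")
    return deepest
```

A depth test then feeds the parser an input nested deeper than the limit and checks that it fails cleanly instead of crashing. If each recursion level uses less stack, the same stack holds more levels, so the input has to nest deeper to reach the limit at all, which is the reason given above for raising the constant.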
It's too bad you haven't structured the commits and pull requests a bit differently so that it's easier to review the exact changes, but I hope it goes well.
For example, doing the test refactorings in a first pull request, and using something like test.xfail that first fails and then succeeds after the merge (while the test code itself doesn't change).
Also, I have seen some tests getting stricter, which is again not a problem, but separating them into a different pull request would have improved the reviewability significantly for a runtime that many people and companies depend on.
I'm sorry you were downvoted by HN and your comment got "dead"; that's not the way to review things.
The whole idea that my RUNTIME contains code that a single human hasn't looked at does make me uncomfortable, but if this actually works without a ton of issues it's pretty remarkable.
The speed of the change did. This is the “climate has always been changing” argument climate deniers make. It is a true statement which is still a lie by omission. Climate deniers purposely ignore that the climate has never changed at the current rate, and AI-stans neglect to mention that before AI nobody was merging 1M+ lines of code in one go.
No, that's my point: Jarred didn't write the code. Before AI, at least the person who wrote the code "reviewed" it (as being aware of the code you wrote was a necessary part of the process of writing code).
On the other hand, the sleep fits the test description better: "should allow reading stdout after a few milliseconds". Even if 1 != 'a few'. It's possible the part of the commit reverted here, https://github.com/oven-sh/bun/commit/a42bf70139980c4d13cc55..., defeated the purpose of the test by removing the sleep. I don't think adding the sleep back is an example of AI cheating.
> I started looking at the commits, and it's basically solving the "tests not pass" problem by changing the tests themselves
Not sure if these decisions were made by the LLM, but I've always felt that Claude is more prone to doing "shady stuff" like modifying tests than finding correct solutions to problems.
Yeah, Claude is very creative in finding ways of "solving" problems that go against what the user probably intended.
Having said that, after looking at some of the test changes, they seem to be minor things, like changing timeouts, not changing the actual intended semantics of the tests. But it's too much code to review everything, so I might be completely wrong about that, and in real-world usage, even minor changes like these will cause issues.
I doubt it will end up as a stable release very soon, but I'm happy to be proven wrong. I have some skepticism about this whole rewrite: Jarred Sumner has an enormous internet following and it feels like an ad.
How do you wish to define an ad, and why does it matter? If I tell you I had lunch, I mean, okay, great. If I tell you I had a delicious Coca-Cola with my lunch, sure. If I happen to work at Coca-Cola, does that now become an ad? At what level does it become an issue? And what is the issue?
If you work for Coca-Cola then yea there’s reason to question your intent even if simply because you aren’t objective due to your proximity to Coca-Cola.
> I started looking at the commits, and it's basically solving the "tests not pass" problem by changing the tests themselves. The real work of making it work on programs that are already deployed is only starting now.
Wow, this is definitely quite something.
Can Jarred comment on whether he has read the commits, or respond to your comment? If this turns out to be correct, it has basically made me lose the small faith I had in what Bun is doing.
It's OK, we'll see how it goes. He and Anthropic are giving it to us for free, and nowadays just forking the old version is easy if a project needs that. Even maintenance is much easier using LLMs.
I'm happy it's not a project I'm depending on, but a large enough project had to try this at some point so that we all can learn from how it goes.
I think this is why Anthropic bought Bun: so that they can sell large-scale code translation as a feature to all the banks with COBOL code that they have wanted to get rid of for a long time.
Still, those banks / enterprises won't appreciate the number of unit test changes.
And I agree with another comment that Codex xhigh is much better for these kinds of tasks, but it's still hard at this kind of scale.
Jarred has commented on this elsewhere in the thread, basically claiming the parent you replied to is outright lying: it has removed no tests and has not meaningfully changed annotations to reduce coverage or effectiveness. It added additional tests and made a few changes to hard-coded values due to differences in, as an example, how LLVM and Zig handle stack frames.
The MR is right there, linked at the top of this page. You can check who is telling the truth.
That said, I don't know how anyone is actually claiming to have done that. All day, the size of the MR makes the diff take too long to load and GitHub dies. I'll have to pull it later to check myself.
In tsz[0], 100% of tests pass, yet I have a ton of bugs. I don't think any software out there is really fully tested. I'm experimenting with this idea as well. So far I've learned a ton.
I'm convinced the future of writing code is heavily LLM-assisted.
I think Andrej Karpathy's quote summarizes well what all software engineers are going through:
"you can outsource your thinking but not your understanding"
There's just no way to avoid generating much more code with LLMs than we would as humans, so structuring code well becomes more important than ever before.
I don't fully agree with the quote. You can't really outsource thinking nor understanding. You can outsource the generation of streams of tokens that may or may not be appropriate for what you're looking for. But you absolutely have to know what you're looking for, or have a very solid intuition of what it should look like and behave, otherwise you're just digging your own grave.
The skill is in making the LLMs reliably generate useful and pertinent streams of tokens. That takes work, reading the output, intuition, experience, rigor, real commitment to doing good work, not falling prey to laziness, etc.
The compounding issue is that understanding atrophies without thinking constantly reinforcing it. Shed no tears for the boilerplate, but the exercise is useful.
Google needs to beat OpenAI and Anthropic in coding models because that's where the big money is going. I love using the Gemini Pro model for quick questions, but that's not where I'm spending the real money.
They have so many great software engineers but are unable to use them to speed up coding AI research. Hopefully with Sergey's focus it will get better.
This cursor thing is just another experiment nobody cares about.
Token cost started increasing exponentially for frontier LLMs, and they have improved incredibly over the last half year, mostly on coding tasks, while lagging behind in non-verifiable tasks.
The main social problem with automation in general was that less intelligent people have been left behind as only boring physical tasks are left for them to do, and people generally don't want to go back to destroying their bodies once they have had the prospect of an office job.
At some point frontier AI will only be worthwhile to use for highly intelligent and motivated AI researchers, who are a tiny part of the population.
May I also add that this isn't just (or at all) about intelligence.
I'm lucky enough to be at a company where I have a large budget in terms of what I can spend in tokens. This gives me an enormous advantage over someone who is just as intelligent as me and who has the same experience as me minus the interaction I have with LLMs.
In this case the crucial difference is not intelligence, it's that I found myself in the right place to be able to go up, whereas a lot of people who are otherwise like me didn't get that opportunity through no fault of their own.
People tend to attribute their successes to their own merit and their failures to happenstance, but if we're honest with ourselves the real world has a lot of randomness in it.
You're totally right, I probably simplified the problem too much. At the same time, people don't just get randomly assigned to companies, and I know I would quickly switch if I were working at a company that didn't have this policy.