RE 2: It's not that far down the road either. Lazily reviewed or unreviewed LLM code rapidly turns your codebase into an absolute mess that LLMs can't maintain either. Very quickly you find yourself with lots of redundant code and duplicated logic, random unused code that's called by other unused code that gets called inside a branch that only tests will trigger, stuff like that. Eventually LLMs start fixing the code that isn't used and then confidently report that they solved the problem, filling up a context window with redundant nonsense every prompt, so they can't get anywhere. Yolo AI coding is like the payday loan of tech debt.
This can happen sooner than you think, too. I asked for what I thought was a simple feature, and the AI wrote and rewrote it a number of times trying to get it right; eventually (not making this up) it told me the file was corrupt and asked if I could please restore it from backup. This happened within about 20-30 minutes of asking for the change.
Sure thing. I work on a few projects for my company. The main one is an Android and iOS audiobook-media-player app. I had to update the Android side to use the Google Media3 library instead of the legacy ExoPlayer library. In a typical app this would be pretty straightforward, but we’ve mixed in a lot of custom elements that made the transition quite challenging. I actually attempted it manually back in the day and probably spent three or four days on it, but I simply didn’t have time to finish, so I shelved it. A few months ago I tried again in Codex; within two prompts it was done flawlessly, starting from the beginning.
Another example: I also maintain the back-end API for these apps. There was a lot of old utility code that was messy, unorganized, and frankly gross. I had Codex completely reorganize the code into a sensible folder structure and strip out any functions that weren’t being used anymore—something like 300-plus functions. Doing that by hand would easily have taken a day or more.
I could go on, but those were the two initial “aha” moments that showed me how powerful this tool can be when it’s used for the right tasks.
30% of code written by AI, but 100% of tools must be enshittified with the terrible, lagging-behind Microsoft Copilot, even if it means blowing up the goodwill for VS Code in a matter of months.
The title of this article leads with "Jewish, Pro-Israel MIT Professor..." so I think they've already decided to go with the "victim of antisemitism" default until proven otherwise.
Another data point is that Jews are getting killed and assaulted around the world. With that said, I agree that for now there's no actual evidence supporting this allegation. But I wouldn't be totally shocked to learn that his ethnicity or zionist beliefs had something to do with this, if indeed he was Jewish (which hasn't been confirmed).
The problem is that most people have many parts of their identities and you don’t know which factored into the attack. It certainly wouldn’t be a shock if it was anti-Semitism but it’s unclear why he would have been singled out from the many thousands of other Jews in the Boston area.
This is problematic because most of the sources saying he was Jewish and pro-Israel seem to be quoting each other. The Wikipedia reference was added yesterday and removed today because the linked sources didn’t say anything about his religion, and I haven’t seen any sources about pro-Israel stances, which I’d think would be easier to find if he was outspoken enough to be targeted. It’s still quite possible that he was the unfortunate victim of a stalker (most of the professors I know have had to work with security to keep someone off campus because colleges attract a certain brand of mentally ill people), but it seems odd that these sources are so confident about this assertion without citing sources.
Based on e.g. https://news.mit.edu/2018/nuno-loureiro-faculty-physics-1016 it really seems like his passion was physics and I think we should commemorate someone who tried to improve humanity’s understanding of the universe. If new details emerge, I’m sure they’ll be posted here.
There is no global hate movement against lefties that encourages and revels in their pain. There is for Jews, who despite accounting for only 2% of the US population are victims of 69% of religious-based hate crimes. Doing the math, Jews are 35x more likely to be the victim of a hate crime. This is not true for lefties.
(This is a general statement responding to your analogy. As I mentioned in my earlier comment, I don't even know if this professor was Jewish or why he was killed.)
No, I don't. I include only FBI statistics of hate crimes targeting religious groups in the United States. Jews are 35x more likely to be victims of hate crimes in the US. Nothing to do with Gaza.
Some people refuse to acknowledge this reality and others attempt to justify it. Many resort to sarcasm as a defense mechanism, revealing their own biases in written records on major public forums.
First of all, that’s not true. Your statistic is probably based on one that indicates Jews account for 69% of religious-based hate crimes, while being 2% of the population. That would be about 35 times more likely if religiously motivated hate crimes were the only type of hate crime. But they’re not, so you’re just misrepresenting the data. The most generous stat you could use would be the one from 2023-2024, which has Jews as 16% of all hate crimes in the US, so an 8x multiplier. But this was a dramatic uptick, which came along with the genocide being committed in their name.
Also, there is a massively asymmetric application of hate crime laws, as you can see from the automatic “hate crime” conclusion already being drawn here simply because the victim was Jewish. This asymmetry is glaringly obvious when you look at the handling of these two stabbings.
In one case, the perpetrator stabbed a white woman to death, and said on camera "got the white bitch." In the other case, the subway stabbing happened "blocks from" a synagogue following an argument. Which one do you think gets the hate crime treatment?
This asymmetry makes it impossible to gain much insight from the statistics on this. It’s very likely that 8x is a generous upper bound, reached only in an exceptional year in which those stats coincided with a genocide committed in their name, which has been a cause for global outrage and disgust.
You’re not “correcting” me. You’re swapping denominators and then accusing me of misrepresentation.
The ~69% figure is not “probably based on” anything. It’s directly from the FBI’s 2023 hate crime data as summarized by DOJ: 2,699 religion-based incidents, 1,832 anti-Jewish. That is 1,832 / 2,699 = 67.9% (call it ~68–69%).
Source: https://www.justice.gov/crs/news/2023-hate-crime-statistics
Now you try to “debunk” that by quietly switching the denominator to all hate crimes. Fine. Do that math too: 1,832 / 11,862 total incidents = 15.4% of all reported hate crime incidents in 2023. For a ~2% population, that’s still about 7–8x disproportionate. So no, it’s not “not true.” You’re just changing the question and hoping nobody notices. You even implicitly concede the underlying statistic (“69% of religion-based hate crimes”) and then pretend it’s false by changing denominators mid-argument.
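For anyone who wants to check the arithmetic themselves, the figures come straight from the DOJ page above; a quick sanity check:

```python
anti_jewish = 1832        # anti-Jewish hate crime incidents, FBI 2023
religion_based = 2699     # all religion-based hate crime incidents, 2023
all_incidents = 11862     # all reported hate crime incidents, 2023
jewish_pop_share = 0.02   # roughly 2% of the US population

print(anti_jewish / religion_based)                       # ~0.679 -> ~68%
print(anti_jewish / all_incidents)                        # ~0.154 -> ~15.4%
print((anti_jewish / all_incidents) / jewish_pop_share)   # ~7.7x disproportionate
```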
Your “only if religion-based hate crimes were the only type” line is nonsense. I explicitly restricted the claim to religion-based incidents, and the DOJ/FBI table does the same. You’re arguing with a strawman you invented.
As for “overreported” and “asymmetric” enforcement: that’s vibes plus two cherry-picked links about a specific incident. If you think the FBI/DOJ figures are inflated, show a dataset and a method, not anecdotes and insinuation.
Also, plenty of incidents never get reported at all. I’ve personally been assaulted for being Jewish and didn’t report it. That is what undercount looks like in real life.
Finally, please stop misrepresenting what I wrote. I explicitly said “religious-based hate crimes.” Your comment only makes sense if you pretend I didn’t.
Switched the denominator? So you were specifically talking about religious-based hate crimes? Why were you talking about that very specific subset, and why wouldn’t you mention that or imply it anywhere in your comment? You wouldn’t be… a liar, would you?
Also, nice AI slop - I stopped reading at the first angle quotes.
You’re accusing me of “not mentioning the subset” while quoting a thread where I literally wrote “religious-based hate crimes.” So either you missed it or you’re pretending you missed it. But it's here in this exact thread for anyone to see. Permalink: https://news.ycombinator.com/item?id=46304753
If you want to change the denominator to “all hate crimes,” say so up front. That gives 15.4% of all incidents, still massively disproportionate.
It's common to use angle quotes on HN, but either way, you accusing me of "AI slop" because you don't like the way I quote things doesn't change the arithmetic and is not a rebuttal.
I see it now, you said that in a different comment which wasn’t the one I replied to. My bad for not noticing.
Still, restricting it to “religious-based” hate crimes is transparently misleading. Using a statistic from a narrow category to imply a claim about the whole is a classic substitution error. Either you are lacking in statistical literacy, or you are being intentionally misleading.
And let’s not forget the massive, undeniable asymmetry here that makes the entire point meaningless. None of this is sufficient to assume that a crime against a Jew is automatically a hate crime until proven otherwise.
Thanks for the correction, I appreciate you owning it.
But the rest is just another goalpost move. Quoting a clearly labeled subset is not “transparently misleading,” as you put it. It’s how statistics work. I said “religious-based hate crimes” explicitly, because we were discussing hostility toward Jews. The DOJ/FBI table is explicit: 2023 had 2,699 religion-based hate crime incidents; 1,832 were anti‑Jewish.
And I already gave the “whole” denominator too: those same 1,832 incidents are 15.4% of all 11,862 hate crime incidents in 2023. For a 2% population, that is still ~7–8x disproportionate, as I've mentioned. So the “substitution error” accusation doesn’t apply here, because I didn’t imply 69% of all hate crimes. I stated the subset and then did the math for the broader denominator as well.
On the “asymmetry makes it meaningless” claim: I see you're asserting that, but you haven’t demonstrated it. FBI hate crime data is not “crime against a Jew = hate crime until proven otherwise.” It’s incidents agencies specifically classify as bias-motivated based on evidence. The well-known problem in this space is underreporting and incomplete reporting, not some magical inflation that conveniently zeros out anti‑Jewish bias. I can attest to the underreporting, having not reported an assault in which I was beaten up on the NYC subway and told "they should have burned you all" while minding my own business.
Finally, none of this was me calling any specific crime a hate crime. I explicitly said we don’t know the motive in the professor’s case. This thread started because you challenged a statistical claim. The numbers stand. Given you opened with "liar" and "AI slop," you might want to recalibrate before accusing others of ‘statistical illiteracy’.
Using a religion-specific hate-crime metric to argue about hate crimes in general is not valid inference. It’s a case of category substitution amplified by base-rate neglect, and is misleading even if every quoted number is technically true.
You’re still arguing with a sentence I did not write.
I did not use a “religion-specific metric to argue about hate crimes in general.” I said, explicitly, “69% of religious-based hate crimes.” Then, when you insisted on the “general” denominator, I gave that too: anti‑Jewish incidents are 15.4% of all hate crime incidents in 2023, still ~7–8x disproportionate for a ~2% population. Both numbers come from the same DOJ/FBI table.
https://www.justice.gov/crs/news/2023-hate-crime-statistics
So the “category substitution / base-rate neglect” lecture is just a rhetorical reset button. You keep pretending I implied “69% of all hate crimes” because that’s the only way your critique has a target.
At this point the pattern is clear:
1) miss what I actually wrote,
2) accuse me of lying/AI,
3) admit you missed it,
4) reframe anyway by inventing a broader claim I never made,
5) argue against your invention.
I’m not doing more laps of that. If you want to dispute the DOJ/FBI numbers or show actual evidence of systematic inflation, present a dataset and method. Otherwise we’re done here.
Contrary to the consensus opinion, losing a war one started is not genocide. For any doubts you can use comparables for civilian deaths in various theatres of war throughout history.
He's saying Hamas lost a war, that's all that happened. You're making an unrelated point, which is that genocide is often carried out in the context of war. That may be true, but that doesn't make the hoax that Israel's war against Hamas was a genocide any less false.
Israel's war against Hamas was not a genocide. (Nor was it a distinct war, but merely part of the much longer war against the Palestinian people.)
Israel's war against Hamas was part of a campaign of genocide against the Palestinian people that has been conducted through much of that longer war, a campaign that it started decades before Hamas existed (and it fostered the creation of Hamas, during the more intense period of its occupation of Gaza, as a tactic to facilitate that campaign, both by dividing its opposition and by making it less internationally sympathetic, since the primary constraint on the campaign has always been international, and particularly US, tolerance).
Of course it's not and never was a genocide. But jimbo808 wishes it were, because he thinks that will help him justify the very rise in hate crimes against Jews that he also tries to downplay.
jimbo808 wrote: "The most generous stat you could use would be the one from 2023-2024, which has Jews as 16%... which came along with the genocide being committed in their name."
What kind of worldview motivates such a comment? He invents a genocide and says it's being committed in the name of American Jews? This is a novel claim even by the low standards of the antizionist crowd. Laughable.
They don't have to micromanage companies. A company's activities must align with the goals of the CCP, or it will not continue to exist. This produces companies that will micromanage themselves in accordance with the CCP's strategic vision.
I don't believe that they believe it; I believe they're all in on doing all the things you'd do if your goal were to demonstrate to investors that you truly believe it.
The safety-focused labs are the marketing department.
An AI that can actually think and reason, and not just pretend to by regurgitating/paraphrasing text that humans wrote, is not something we're on any path to building right now. They keep telling us these things are going to discover novel drugs and do all sorts of important science, but internally, they are well aware that these LLM architectures fundamentally can't do that.
A transformer-based LLM can't do any of the things you'd need to be able to do as an intelligent system. It has no truth model, and lacks any mechanism of understanding its own output. It can't learn and apply new information, especially not if it doesn't fit within one context window. It has no way to evaluate if a particular sequence of tokens is likely to be accurate, because it only selects them based on the probability of appearing in a similar sequence, based on the training data. It can't internally distinguish "false but plausible" from "true but rare." Many things that would be obviously wrong to a human would appear to be "obviously" correct when viewed from the perspective of an LLM's math.
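To make that concrete, here's a toy sketch of what a single next-token step amounts to (the candidates and scores are made up, not from any real model):

```python
import numpy as np

# Toy next-token step: all the model has at this point is a score per candidate
# token, reflecting how often each one followed similar contexts in training.
vocab = ["1887", "1889", "1901"]      # hypothetical candidate completions for a date
logits = np.array([2.3, 2.1, 0.4])    # hypothetical scores; nothing here encodes truth
probs = np.exp(logits) / np.exp(logits).sum()
print(vocab[int(np.argmax(probs))])   # picks the most probable continuation
# A false-but-common completion beats a true-but-rare one purely on probability.
```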
These flaws are massive, and IMO, insurmountable. It doesn't matter if it can do 50% of a person's work effectively, because you can't reliably predict which 50% it will do. Given this unpredictability, its output has to be very carefully reviewed by an expert in order to be used for any work that matters. Even worse, the mistakes it makes are precisely the kind that are difficult to spot, because it will always generate the text that looks the most right. Spotting the fuckup in something that was optimized not to look like a fuckup is much more difficult than reviewing work done by a well-intentioned human.
No, Anthropic and OpenAI definitely actually believe what they're saying. If you believe companies only care about their shareholders, then you shouldn't believe this about them because they don't even have that corporate structure - they're PBCs.
There doesn't seem to be a reason to believe the rest of this critique either; sure those are potential problems, but what do any of them have to do with whether a system has a transformer model in it? A recording of a human mind would have the same issues.
> It has no way to evaluate if a particular sequence of tokens is likely to be accurate, because it only selects them based on the probability of appearing in a similar sequence, based on the training data.
This in particular is obviously incorrect if you think about it, because the critique is so strong that if it was true, the system wouldn't be able to produce coherent sentences. Because that's actually the same problem as producing true sentences.
(It's also not true because the models are grounded via web search/coding tools.)
> if it was true, the system wouldn't be able to produce coherent sentences. Because that's actually the same problem as producing true sentences
It is...not at all the same? Like they said, you can create perfectly coherent statements that are just wrong. Just look at Elon's ridiculously hamfisted attempts at editing Grok's system prompts.
Also, a lot of information on the web is just wrong or out of date, and coding tools only get you so far.
"Paris is the capital of France" is a coherent sentence, just like "Paris dates back to Gaelic settlements in 1200 BC", or "France had a population of about 97,24 million in 2024".
The coherence of sentences generated by LLMs is "emergent" from the unbelievable amount of data and training, just like the correct factoids ("Paris is the capital of France").
It shows that Artificial Neural Networks using this architecture and training process can learn to fluently use language, which was the goal? Because language is tied to the real world, being able to make true statements about the world is to some degree part of being fluent in a language, which is never just syntax but also semantics.
I get what you mean by "miracle", but your argument revolving around this doesn't seem logical to me, apart from the question: what is the "other miracle" supposed to be?
Zooming out, this seems to be part of the issue: semantics (concepts and words) neatly map the world, and have emergent properties that help to not just describe, but also sometimes predict or understand the world.
But logic seems to exist outside of language to a degree, being described by it. Just like the physical world.
Humans are able to reason logically, not always correctly, but language allows for peer review and refinement. Humans can observe the physical world. And then put all of this together using language.
But applying logic or being able to observe the physical world doesn't emerge from language. Language seems like an artifact of doing these things and a tool to do them in collaboration, but it only carries logic and knowledge because humans left these traces in "correct language".
> But applying logic or being able to observe the physical world doesn't emerge from language. Language seems like an artifact of doing these things and a tool to do them in collaboration, but it only carries logic and knowledge because humans left these traces in "correct language".
That's not the only element that went into producing the models. There's also the anthropic principle - they test them with benchmarks (that involve knowledge and truthful statements) and then don't release the ones that fail the benchmarks.
And there is Reinforcement Learning, which is essential to make models act "conversational" and coherent, right?
But I wanted to stay abstract and not go into too much detail outside my knowledge and experience.
With the GPT-2 and GPT-3 base models, you were easily able to produce "conversations" by writing fitting preludes (e.g. Interview style), but these went off the rails quickly, in often comedic ways.
Part of that surely is also due to model size.
But RLHF seems more important.
I enjoyed the rambling and even that was impressive at the time.
I guess the "anthropic principle" you are referring to works in a similar direction, although in a different way (selection, not training).
The only context in which I've heard details about post-training selection processes so far was this article about OpenAI's model updates from GPT-4o onwards, discussed earlier here:
The parts about A/B-Testing are pretty interesting.
The focus is ChatGPT as an enticing consumer product and maximizing engagement, not so much the benchmarks and usefulness of models. It briefly addresses the friction between usefulness and sycophancy though.
Anyway, it's pretty clever to use the wording "anthropic principle" here, I only knew the metaphysical usage (why do humans exist).
Because it's not a miracle? I'm not being difficult here, it's just true. It's neat and fun to play with, and I use it, but in order to use anything well, you have to look critically at the results and not get blinded by the glitter.
Saying "Why can't you be amazed that a horse can do math?" [0] means you'll miss a lot of interesting phenomena.
If you don't believe they believe it, you haven't paid any attention to the company. Maybe Dario is lying, although that would be an extremely long con, but the rank and file 100% believe it.
It's good we are building all this excess capacity, which can be used for applications in other fields, for research, or to open up new fields.
I think the dilemma with building so many data centers so fast is exactly like deciding whether I should buy the latest iPhone now or wait a few years until the specs or form factor improve. The thing is, we have proven tech with current AI models, so waiting for better tech to develop on a small scale before scaling up is a bad strategy.
Claude is pretty good at totally disregarding most of what’s in your CLAUDE.md, so I’m not optimistic. For example, a project I work on gives it specific scripts to run when it runs automated tests, because the project is set up in a way that requires some special things to happen before tests will work correctly. I’ve never once seen it actually call those scripts on the first try. It always tries to run them using the typical command that doesn’t work with our setup, and I have to remind it what the correct thing to run is.
I've had a similar experience with Gemini ignoring things I've explicitly told it (sometimes more than once). It's probably context rot. LLMs give you a huge advertised number of tokens in the context, but the more stuff you put in there, the less reliably it remembers everything, which makes sense given how transformer attention blocks work internally.
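A toy way to see the dilution effect (one query, one head, made-up scores; real attention is far more complicated, but the direction is the same):

```python
import numpy as np

def instruction_weight(n_distractors, instr_score=2.0, distractor_score=0.0):
    # Softmax weight the single "instruction" token gets when it competes
    # with n_distractors other tokens for the same attention budget.
    scores = np.concatenate(([instr_score], np.full(n_distractors, distractor_score)))
    w = np.exp(scores)
    return w[0] / w.sum()

for n in (10, 1_000, 100_000):
    print(n, f"{instruction_weight(n):.5f}")   # ~0.42, then ~0.0073, then ~0.00007
```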
That's kind of the opposite of what I mean. CLAUDE.md is (ostensibly) always loaded into the context window so it affects everything the model does.
I'm suggesting a POTENTIAL_TOOLS.md file that is not loaded into the context, but which Claude knows the existence of. That file would be an exhaustive list of all the tools you use, but which would be too many tokens to have perpetually in the context.
Finally, Claude would know - while it's planning - to invoke a sub-agent to read that file with a high level idea of what it wants to do, and let the sub-agent identify the subset of relevant tools and return those to the main agent. Since it was the sub-agent that evaluated the huge file, the main agent would only have the handful of relevant tools in its context.
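A rough sketch of the shape I mean; POTENTIAL_TOOLS.md is the hypothetical file from above, and ask_subagent is a made-up stand-in, not a real Claude Code API:

```python
def ask_subagent(prompt: str) -> str:
    # Stand-in for dispatching a prompt to a cheap sub-agent model; not a real API.
    raise NotImplementedError

def pick_relevant_tools(goal: str, catalog_path: str = "POTENTIAL_TOOLS.md") -> str:
    """Sub-agent step: read the full tool catalog and return only the entries
    relevant to the goal, so the main agent never holds the whole file."""
    with open(catalog_path) as f:
        full_catalog = f.read()  # large; lives only in the sub-agent's context
    return ask_subagent(
        f"Goal: {goal}\n\n"
        "From the catalog below, list only the tools relevant to this goal, "
        "each with its one-line usage note.\n\n" + full_catalog
    )

# The main agent only ever sees the short string pick_relevant_tools() returns.
```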
Claude is pretty good at forgetting to run maven with the -am flag, to write bash heredocs that its interpreter doesn't weird out on, or to use the != operator in jq. Maybe Claude has early onset dementia.
I had the same problem. My CLAUDE.md eventually gets forgotten, along with the best practices I put in there. I've switched to using hooks that run it through a variety of things, like requiring testing. That seems to work better than CLAUDE.md because it has to run the hook every time it makes changes.
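For anyone curious, the hooks live in .claude/settings.json; mine is shaped roughly like this (the test script path is a placeholder, and I'm going from memory, so check the schema against the docs):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/run-tests.sh" }
        ]
      }
    ]
  }
}
```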
I really need something like this set up for tasks I want Claude to run before handing a task back to me as "complete". It routinely ignores my instructions about checklist items that need to be satisfied to be considered successful. I have a helper script documented in CLAUDE.md that lets Claude or me get specific build/log outputs with a few one-liner commands, yet Claude can't be bothered to remember to run them half the time.
Way too frequently Claude goes, "The task is fully implemented, error-free, with tests passing and no bugs or issues!" and I have to reply "did you verify server build/log outputs with run-dev per CLAUDE.md". It immediately knows the command I am referencing from the instructions buried in its context already, notices an issue, and then goes back and fixes it correctly the second time. Whenever that happens it instantly makes an agentic coding session go from feeling like breezy, effortless fun to pulling teeth.
I've started to design a subagent to handle chores after every task to avoid context pollution but it sounds like hooks are the missing piece I need to deterministically guarantee it will run every time instead of just when Claude feels the vibes are right.
Instead of including all these instructions in CLAUDE.md, have you considered using custom Skills? I’ve implemented something similar, and Skills work really well. The only downside is that they may consume more tokens.
Yes, sometimes skills are more reliable, but not always. That is the biggest culprit to me so far. The fact that you cannot reliably trust these LLMs to follow steps or instructions makes them unsuitable for my applications.
Another thing that helps is adding a session hook that triggers on startup|resume|clear|compact to remind Claude about your custom skills. Keeps things consistent, especially when you're using it for a long time without clearing context.
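Something like this in .claude/settings.json, if I'm remembering the schema right (the reminder command is just a placeholder):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume|clear|compact",
        "hooks": [
          { "type": "command", "command": "cat .claude/skills-reminder.md" }
        ]
      }
    ]
  }
}
```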
The matching logic for a skill is pretty strict. I wonder whether mentioning ‘git’ in the front matter and then asking about ‘gitlab’ would be enough of a match for the skill to get triggered.