vikramkr's comments

this model is whack. Exclamation marks everywhere, sycophantic - not producing working code on prompts the other models handle fine.

"The reason it is echoing back your messages is because gpt-5.4-nano is a fictional model name!"

"Everything is in perfect order! Let's-Go-ready for the next phase, which will connect this durable infrastructure to the user-facing UI!"

It's like they RLed it on thumbs up and downs on AI overview responses and forgot to make it not be a sycophantic echo chamber machine. And like, the thing it built doesn't work because it's not actually in perfect order, but it doesn't seem to be able to figure out what's wrong because everything is clearly remarkably engineered.


That's a list of like 6 things. And each of those is a less complicated question than the seven thousand questions people throw at you when you complain about something not working right on a Linux distro, or about speeding up build times for a new tool, or configuring webpack, or like pretty much any software tool. What lint rules are you using? Are you using poetry or uv? Are you running on Mac, Windows, Linux, or WSL? How are your security groups configured in AWS? Some tools are more plug and play, but it's quite the stretch to say that asking "how is your code organized, do you have your agents.md config file set up, do you have tests, and how large is the codebase" is some sort of unmanageable list of questions for a software engineer to think through when figuring out wtf is going on with some new tooling they're using.

My take is there was one big inflection point around opus 4.5 when they got the agentic stuff working and now whether or not it works depends on whether your use case/area of software engineering is profitable enough for the companies to have spent a bunch of money generating synthetic data to RL on, or if it's similar enough to areas that they've done that for. With similar enough being a very loose constraint given how much overlap there is in a lot of coding fundamentals. Tbh if the models aren't working for you now I don't think they're gonna be working for you in 6 months

It's very real but probably very domain specific. It got really good at a lot of traditional web dev stuff, bash, sql, and writing one off scripts to accomplish random tasks (hence all the agent stuff taking off). And they got good at staying on task. That may not translate to game dev because from what I understand a lot of these gains are basically around post training methods driven by synthetic data generation etc (with potential caveats on how synthetic that data actually is lol). I wouldn't be surprised if the areas of code the llms are good at now are straight up just product decisions of where to allocate budget for generating those synthetic data sets, and game dev stuff might not be at the top of the list because the customer base for that might not be as big

If I had to guess - lumber costs might be dominated by labor costs? If they don't have guest worker programs it might not be cost effective anymore as wages go up

That's not actually a thing. Very few trees we plant have separate male and female plants. One of the few that does and that gets brought up in this context, ginkgo, tends to have male trees preferred because the fruit kind of reeks. Ginkgo fruit is also toxic, so you really don't want masses of it getting washed into local waterways in ecosystems the tree isn't native to; not a great time for the local wildlife. A significant supermajority of all the rest of the trees that you plant in cities are gonna have male and female flowers on the same plant, or male and female structures within the same flower.

Cool, I did not know that this is so disputed a quasi factoid. Thanks for cleaning my brain!

Germany has “Baumkataster” which are databases for public trees in cities, they save all kind of tree metadata but gender is missing …

https://hub.arcgis.com/search?tags=baumkataster


I do think you mean sex, not gender - trees don't really have a gender or gender expression. Either way, it would be rather irrelevant, as most trees planted in cities have both male and female flowers (oak, birch, most conifers), or even hermaphroditic flowers (citrus).

Great, now I have to hate trees for being gender-fluid, too?

I just spent $500 on gay conversion therapy for my dog! /s


It's not really disputed. It's something that happened in one small place or two that people insist on repeating on the internet as if it's some universal thing.

thanks for this clarification. until today i was under the impression that they planted male trees only because they looked prettier and weren't as messy as the female ones (to reduce the cleaning bill of the local municipality)

For examples from more recent history, think giant scissors used for ribbon cutting ceremonies, the golden spike used to signal completion of the transcontinental railroad, or like all the stuff related to militaries, like changing of the guard, that is ceremonial. Or even something like leaving a celebratory emoji on the first PR merged by a new hire, or a box of donuts on someone's birthday. Or bringing out the champagne after closing on a house. There's just a ton of ritual in day to day life, and even more surrounding big high impact moments that might leave behind a bunch of artifacts, like weddings and funerals.


> TanStack is no way saver than npm. No one understands TanStack.

Pandas is also in no way safer than pip. Because pandas is a library and pip is a package manager, and that comparison makes no sense lmao. It sounds like you maybe don't really get or use TypeScript, and don't even really use like basic mypy style types in Python (or don't get the difference between what a zod/pydantic validator does vs what a mypy/TypeScript type system does; zod is also only on the boundary). Which is OK, but there's a difference between not getting why a stack is useful or not having experience with it, versus confidently and comically declaring that nobody else understands types either while seemingly not understanding what any of the parts here do.
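A minimal sketch of that distinction in Python, standard library only. The `User` TypedDict and the `parse_user` function are hypothetical, and the hand-written validator just stands in for what pydantic or zod would do at the boundary. The point is that the static annotation is only checked by a tool like mypy, while the validator is what actually rejects bad data at runtime.

```python
from typing import TypedDict


class User(TypedDict):
    # Static type: only a checker like mypy looks at this, at analysis time.
    name: str
    age: int


def parse_user(raw: object) -> User:
    """Hypothetical boundary validator, standing in for pydantic/zod."""
    if not isinstance(raw, dict):
        raise ValueError("expected an object")
    name = raw.get("name")
    age = raw.get("age")
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    return User(name=name, age=age)


# Annotations are not enforced at runtime: this runs fine,
# even though mypy would flag the wrong value type for "age".
unchecked: User = {"name": "ada", "age": "not a number"}

# The validator is what actually rejects bad input from outside.
try:
    parse_user({"name": "ada", "age": "not a number"})
except ValueError as err:
    print(err)  # age must be an integer
```

In practice you'd use both: the type system keeps the code internally consistent, and the validator guards the places where untyped data (JSON, form input, env vars) comes in.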


For models that reveal reasoning traces I've seen their inner nature as a word calculator show up as they spend way too many tokens complaining about the typo (and AI code review bots also seem obsessed with typos to the point where in a mid harness a few too many irrelevant typos means the model fixates on them and doesn't catch other errors). I don't know if they've gotten better at that recently but why bother. Plus there's probably something to the model trying to match the user's style (it is auto complete with many extra steps) resulting in sloppier output if you give it a sloppier prompt.


Just a tier list I think

