
Bumping into those limits is trivial, and those 5-hour windows are anxiety-inducing. I guess the idea is to have a credit card on tap to pay for overages, but…

I’m just messing around with document production; I can’t imagine being in a crunch, facing a deadline or dealing with a production issue, and 1) seeing some random fuck-up eat my budget with no take-backs (‘sure thing, I’ll make a custom docx editor to open that…’), 2) having to explain to my boss why Thursday cost $500 more than expected because of some library mismatch, or 3) trying to decide whether we’re gonna spend or wait while stressing over some major issue (the LLM got us into it, so we kinda need the LLM to get us out).

That’s a lot of extra shizz on top of already tricky situations.


Just a thought: the timeline of the vibe tech rolling out overlaps with the timeline of increasing product rot, sloppiness, and user-hostile “has anyone ever actually used this shit!?!” design coming out of MS.

Vibing won’t help out at all, and years from now we’re gonna have project math on why 10x-LLM-ing mediocre devs on a busted project that’s behind schedule isn’t the play (like how adding more devs to a late project generally makes it later). But it takes years for those failures to aggregate and spread up the stack.

I believe the vibing is highlighting the missteps of the wave right before it: cloud-first, cloud-integrated, cloud-upselling strategies that cannibalized MS’s core products, multiplied by the massive MS layoff waves. MS used to have a lot of devs who made a lot of culture, and they’re simply gone. The weakened offerings, breakdown of vision, and platform enshittification have been obvious for a while. And then ChatGPT came.

Stock price reflects how attractive stocks are for stock purchasers on the stock market, not how good something is. MS has been doing great things for their stock price.

LLMs make getting into emacs and Linux and OSS and OCaml easier than ever. SteamOS is maturing. Windows Subsystem for Linux is a mature bridge. It’s a bold time for MS to be betting on brand loyalty and product love, even if their shit worked.


I’ve also felt like these kinds of efforts at instructions and agent files were worthwhile, but I’m increasingly of the opinion that the feeling is self-delusion: seeing what I expect to see, aided by a tool that always agrees with my (or its own) take on utility. The agent.md file looks like it’d work, it looks how you’d expect, but then it fails over and over. And the tweaking process is pleasant chat full of supportive supposed insights and solutions, which means hours of fiddling with meta-documentation with no clear reward, because adherence is only ever partial.

The paper’s conclusions align with my personal experiments managing a small knowledge base with LLM rules. The application of rules was inconsistent, their execution fickle, and fundamental changes in processing would happen from week to week as model usage was tweaked. But rule tweaking always felt good. The LLM said it would work better, and the LLM said it had read and understood the instructions, and the LLM said it would apply them… I felt like I understood how best to deliver data to the LLMs, only to see recurrent failures.

LLMs lie. They have no idea, no data, and no insight into specific areas, but they’ll make pleasant reality-adjacent fiction. Since chatting is seductive, and our sense of time is affected by talking, I think the normal time-versus-productivity sense gets pulled even further out of whack. Devs are notoriously bad at estimating where their time goes, and long feedback loops filled with phone time and slow-ass conversation don’t help.


Ham & pirate radio vs. corporate broadcasting.

Hams, by and large, despise pirate radio.

Parallel hypothesis: the competition between models is so intense that any high-engagement, high-relevance web discussion about any LLM/AI generation is gonna hit the self-guided, self-reinforcing model training and result in de facto benchmaxxing.

Which is only to say: if we HN-front-page it, they will come (generate).


LLM bots are gonna start backdating commits to look more legit.

Yep, I absolutely expect this to happen. The quality signals that humans use are going to be forever in flux now as humans try to stay ahead of the bots.
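
Backdating isn't even hard: git stamps a commit with whatever dates you hand it, no questions asked. A rough sketch in Python; the message is made up, and it assumes you're inside a repo:

  import os
  import subprocess

  # git trusts these env vars completely, so a commit's timestamp
  # is a claim, not evidence.
  fake = "2020-01-15T12:00:00"
  env = dict(os.environ, GIT_AUTHOR_DATE=fake, GIT_COMMITTER_DATE=fake)
  subprocess.run(
      ["git", "commit", "--allow-empty", "-m", "battle-tested since 2020"],
      env=env, check=True)

About the only timestamp with any weight is one the server records itself, like a push time.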

The last letter.

[Did I pass the interview? No? Understandable.]


They could add “Verified Human” checkmarks to GitHub.

You know, charge a small premium and make recurring millions solving problems your corporate overlords are helping create.

I think that counts as vertical integration, even. The board’s gonna love it.


Already browsing boat builder websites...

Microsoft is deeply entwined with OpenAI and has obvious reasons to dogfood, yet their people are using Anthropic solutions.

Valuation behemoth OpenAI has been forced by the market to adopt Anthropic standards a couple of times, having no comparable solutions of its own.

… I can see it.


Anthropic's marketing somehow punches hard. Not sure why, but the stuff they do sticks. Not because the products are great, but because the way they communicate about them gives people the right feeling. They do legitimately have the best coding model now for most tasks, and for narrative prose, but the marketing stuck and people stanned them even when they were trailing.

Anthropic develops tools for developers and power users, who are the actual people doing the evangelizing and marketing for them.

> Anthropic's marketing somehow punches hard. Not sure why

The fish rots from the head and marketing depends on being relatable.

https://www.youtube.com/watch?v=qMAg8_yf9zA

Take a scroll through the comments.


LoC desirability also depends on project stage.

Early on we should see huge chunky contributions and bursts. LoC means things are being realized.

In a mature product shipping at a sustained and increasing velocity, seeing LoC decrease or grow glacially year-on-year is a warm fuzzy feeling.

By my estimation, aircraft designs should grow a lot for a bit (from 0 to not-0), churn for a while, then aim for specified performance windows in periods of punctuated stability.

Reuse scenarios create some nice bubbles where LoC growth in highly validated frameworks/components is amazing, as surrounding systems obviate big chunks of themselves. Local explosions, global densification and refinement.


  > Early on we should see huge chunky contributions and bursts. LoC means things are being realized.

There is nothing more permanent than a temporary fix that works.

This is a common way for tech debt to build. You're right that a strategy like "move fast and break things" is very useful, but it only really works if it's followed by "clean up, everybody do your share."

LoC as a measurement is nothing without context. But that context is constantly changing and even depends on people's coding styles. I like to write my first iteration of even small programs pretty dirty before I refine. I'll even commit them, but they generally won't show up in a PR because I quickly distill.

I think activity or productivity is an extremely difficult thing to measure, and an extremely easy thing to fool yourself into believing you're measuring accurately. A first-order approximation is going to look fine for a little while, but that is the trap. That is how you fool yourself. In a relatively local timeframe it'll keep looking like it's working, but you have no idea if it's accurate over the timeframes that actually matter. The measure is too simple, and adding more first-order approximations only makes the measurement worse, not better. Context allows you to progress, but complexity increases exponentially while accuracy may not.
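
To make that concrete, here's the kind of first-order approximation I mean, sketched in Python (the four-week window is arbitrary). It counts churn, not value, and happily rewards my dirty first drafts over the distillation that follows:

  import subprocess

  def added_lines(since="4 weeks ago", repo="."):
      # Naive "productivity": total lines added in recent commits.
      # git log --numstat emits "added<TAB>deleted<TAB>path" per file;
      # binary files show "-", which isdigit() filters out.
      out = subprocess.run(
          ["git", "-C", repo, "log", f"--since={since}",
           "--numstat", "--format="],
          capture_output=True, text=True, check=True).stdout
      total = 0
      for line in out.splitlines():
          parts = line.split("\t")
          if len(parts) == 3 and parts[0].isdigit():
              total += int(parts[0])
      return total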

