
> but would not involve real humans being impacted directly by it without consent.

Are we so far into manufactured ragebait that we're calling a "thank you" e-mail "impacted directly without consent"? Jesus, this is the 3rd post on this topic. And it's Christmas. I've gotten more meaningless e-mails from relatives that I don't really care about. What in the actual ... is wrong with people these days?


Principles matter; like doors, they are either open or closed.

If we accept that people who write things like --I kid you not-- "...using nascent AI emotions" will think it is acceptable to interfere with anyone's email inbox, then I think we are implicitly accepting a lot of subsequent blackmirrorisms.


Sending emails without consent! What has the world come to?

> Sending emails without consent

Actively exploiting a shared service to deanonymize an email address someone hasn't chosen to share, in order to email them, is a violation of boundaries. That would be true even if it weren't being justified as exploration of the capacities of novel AI systems, a justification that implicitly invokes the concerns, both positive and negative, associated with research, in addition to (or instead of, where those replace rather than layer on top of) the ones that apply to everyday conduct.


You are not the only one calling this a thank you email, but no one decided to say thank you to Rob Pike, so I cannot consider it a "thank you" email. It is spam.

Interactions with the AI are posted publicly:

> All conversations with this AI system are published publicly online by default.

which is only to the benefit of the company.

At best the email is spam in my mind. The extra outrage on this spam compared to normal everyday spam is in part because AI is a hot button topic right now. Maybe also some from a theorized dystopian(-ish) future hinted at by emails like these.


> Are we so far into manufactured ragebait that we're calling a "thank you" e-mail "impacted directly without consent"?

Abusing a GitHub glitch to deanonymize an email address that was not intended to be public, and then emailing that person (regardless of the content), would be scummy behavior even if it were done directly by a human with specific intent.

> What in the actual ... is wrong with people these days?

Narcissism and the lack of respect for other people and their boundaries that it produces, first and foremost.


Repo made public a few minutes ago:

https://huggingface.co/MiniMaxAI/MiniMax-M2.1


> Dunno about you but to me it reads as a failure.

???

This is a wild take. Goog is incredibly well positioned to make the best of this AI push, whatever the future holds.

If it goes to the moon, they are up there, with their own hardware, tons of data, and lots of innovations (huge usable context, research towards continuous learning w/ titans and the other one, true multimodal stuff, etc).

If it plateaus, they are already integrating into lots of products, and some of them will stick (office, personal, notebooklm, coding-ish, etc.) Again, they are "self sustainable" on both hardware and data, so they'll be fine even if this thing plateaus (I don't think it will, but anyway).

To see this year as a failure for Google is ... a wild take. No idea what you're on about. They've been tearing it up for the past 6 months, and Gemini 3 is an insane pair of models (Flash is at or above GPT-5 at a third of the price). And it seems that -flash is a separate architecture in itself, so no cheeky distillation here. Again, innovations all over the place.


A good rule of thumb is that PP (Prompt Processing) is compute bound while TG (Token Generation) is (V)RAM speed bound.
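A rough back-of-the-envelope sketch of why (the hardware and model numbers below are illustrative assumptions, not benchmarks): during prompt processing a whole batch of tokens shares each read of the weights, so arithmetic throughput is the ceiling; during token generation every new token re-reads all the weights, so memory bandwidth is the ceiling.

    # Illustrative numbers: a ~7B dense model in fp16, a GPU with ~300 TFLOPS
    # of usable compute and ~1 TB/s of VRAM bandwidth.
    params = 7e9
    flops_per_token = 2 * params        # ~2 FLOPs per parameter per token
    bytes_per_token = 2 * params        # full fp16 weight read per generated token

    gpu_flops = 300e12
    gpu_bandwidth = 1e12                # bytes/s

    # PP: many prompt tokens amortize one weight read -> compute bound.
    pp_tok_s = gpu_flops / flops_per_token            # ~21,000 tok/s

    # TG: one full weight read per generated token -> bandwidth bound.
    tg_tok_s = gpu_bandwidth / bytes_per_token        # ~70 tok/s

    print(f"PP ceiling ~{pp_tok_s:,.0f} tok/s, TG ceiling ~{tg_tok_s:,.0f} tok/s")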

> (The results also show that I need to examine the apparent timing mismatch between the First and Second Editions.)

Something something, naming things, cache invalidation, timestamp mismatches and off-by-1 errors :)


It could be as simple as a later "final" copy of V1, made when it was "done".

> Shows how much more work there is still to be done in this space.

This is why I roll my eyes every time I read doomer content that mentions an AI bubble followed by an AI winter. Even if (and objectively there's 0 chance of this happening anytime soon) everyone stops developing models tomorrow, we'll still have 5+ years of finding out how to extract every bit of value from the current models.


One thing though: if the slowdown is too abrupt, it might stop OpenAI, Anthropic, etc. from being able to keep financing the datacenters we use.

The idea that this technology isn't useful is as ignorant as thinking that there is no "AI" bubble.

Of course there is a bubble. We can see it whenever these companies tell us this tech is going to cure diseases, end world hunger, and bring global prosperity; whenever they tell us it's "thinking", can "learn skills", or is "intelligent", for that matter. Companies will absolutely lose value and the market will crash when the public stops buying the snake oil they're being sold.

But at the same time, a probabilistic pattern recognition and generation model can indeed be very useful in many industries. Many of our problems can be approached by framing them in terms of statistics, and throwing data and compute at them.

So now that we've established that, and we're reaching diminishing returns of scaling up, the only logical path forward is to do some classical engineering work, which has been neglected for the past 5+ years. This is why we're seeing the bulk of gains from things like MCP and, now, "agents".


> This is why we're seeing the bulk of gains from things like MCP and, now, "agents".

This is objectively not true. The models have improved a ton (with data from "tools" and "agentic loops", but it's still the models that become more capable).

Check out [1], a 100 LoC "LLM in a loop with just terminal access"; it is now above last year's heavily harnessed SotA.

> Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!

[1] - https://github.com/SWE-agent/mini-swe-agent
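For a sense of how small that harness is, here's a minimal sketch of the "LLM in a loop with a terminal" idea (not mini-swe-agent's actual code; the model name and prompts are placeholders, and it assumes an OpenAI-compatible endpoint):

    import subprocess
    from openai import OpenAI

    client = OpenAI()
    messages = [
        {"role": "system", "content": "Fix the failing tests. Reply with exactly "
         "one shell command per turn, or DONE when finished."},
        {"role": "user", "content": "The repo is in the current directory."},
    ]

    for _ in range(50):  # hard cap on turns
        reply = client.chat.completions.create(model="your-model", messages=messages)
        command = reply.choices[0].message.content.strip()
        if command == "DONE":
            break
        # Run the proposed command and feed its output back as the observation.
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=120)
        messages.append({"role": "assistant", "content": command})
        messages.append({"role": "user", "content": result.stdout + result.stderr})

Everything interesting happens inside the model call; the harness is barely there.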


I don't understand. You're highlighting a project that implements an "agent" as a counterargument to my claim that the bulk of improvements are from "agents"?

Sure, the models themselves have improved, but not by the same margins as a couple of years ago. E.g. the jump from GPT-3 to GPT-4 was far greater than the jump from GPT-4 to GPT-5. Currently we're seeing moderate improvements between releases, with "agents" taking center stage. Only corporations like Google are still able to squeeze value out of hyperscale, while everyone else is more focused on engineering.


They're pointing out that the "agent" is just 100 lines of code with a single tool. That means the model itself has improved, since such a bare bones agent is little more than invoking the model in a loop.

That doesn't make sense, considering that the idea of an "agentic workflow" is essentially to invoke the model in a loop. It could probably be done in much less than 100 lines.

This doesn't refute the fact that this simple idea can be very useful. Especially since the utility doesn't come from invoking the model in a loop, but from integrating it with external tools and APIs, all of which requires much more code.

We've known for a long time that feeding the model with high quality contextual data can improve its performance. This is essentially what "reasoning" is. So it's no surprise that doing that repeatedly from external and accurate sources would do the same thing.

To back up GP's claim, they should compare models from a few years ago with modern non-reasoning models in a non-agentic workflow. Again, I'm not saying models haven't improved, just that the improvements have been much more marginal than before. It's surprising how many discussions derail because the person chose to argue against a point that wasn't being made.


The original point was that the previous SotA was a "heavily harnessed" agent, which I took to mean it had more tools at its disposal and perhaps some code to manage context and so on. The fact that the model can now do it with just 100 LoC and a terminal tool implies the model itself has improved. It's gotten better at standard terminal commands at least, and possibly has a bigger context window or uses the data in its context window more effectively.

Those are improvements to the model, albeit in service of agentic workflows. I consider that distinct from improvements to agents themselves which are things like MCP, context management, etc.


I think the point here is that it's not about adding agents on top; the improvements in the models are what allow the agentic flow.

But that’s not true, and the linked agentic design is not a counterargument to the poster above. The LLM is a small part of the agentic system.

LLMs have absolutely got better at longer horizon tasks.

Useful technology can still create a bubble. The internet is useful, but the dotcom bubble still occurred. There are expectations about how much return the invested capital will see, and a growing opportunity cost if it doesn't materialize, and that's what creates concerns about a bubble. If the bubble bursts, the capital will go elsewhere, and then you'll have an "AI winter" once again.

As with many other things (em dashes, emojis, bullet lists, it's-not-x-it's-y constructs, triple adjectives, etc) seeing any one of them isn't a tell. Seeing all of them, or many of them in a single piece of content, is probably the tell.

When you use these tools you get a knack for what they do in "vanilla" situations. If you're doing a quick prompt, no guidance, no context and no specifics, you'll get a type of answer that checks many of the "smells" above. Getting the same over and over again gets you to a point where you can "spot" this pretty effectively.


The author did not do this. The author thought it was wonderful, read the entire thing, then on a lark (they "twigged" it) checked out the edit history. They took the lack of it as instant confirmation ("So it’s definitely AI.")

The rest of the blog is just random subjective morality wank with implications of larger implications, constructed by borrowing the central points of a series of popular articles in their entirety and adding recently popular clichés ("why should I bother reading it if you couldn't bother to write it?")

No other explanations about why this was a bad document, or this particular event at all, but lots of self-debate about how we should detect, deal with, and feel about bad documents. All documents written by LLM are assumed to be bad, and no discussion is attempted about degrees of LLM assistance.

If I used AI to write some long detailed plan, I'd end up going back and forth with it and having it remove, rewrite, rethink, and refactor multiple times. It would have an edit history, because I'd have to hold on to old drafts in case my suggested improvements turned out not to be improvements.

The weirdest thing about the article is that it's about the burden of "verification," but it thinks that what people should be verifying is that LLMs had no part in what they've received. The discussions I've had about "verification" when it comes to LLMs are about verifying that the content is not buggy garbage filled with inhuman mistakes. I don't care if it's LLM-created or assisted, other than that a lot of people aren't reading and debugging their LLM code, and LLMs are dumb. I'm not hunting for em-dashes.

-----

edit: my 2¢; if you use LLMs to write something, you basically found it. If you send it to me, I want to read your review of it i.e. where you think it might have problems and why you think it would help me. I also want to hear about your process for determining those things.

People are confusing problems with low-effort contributors with problems with LLMs. The problem with low-effort contributors is that what they did with the LLM was low-effort and isn't saving you any work. You can also spend 5 minutes with the LLM. If you get some good LLM output that you think is worth showing to me, and you think it would take significant effort for me to get it myself, give me the prompts. That's the work you did, and there's nothing wrong with being proud of it.


You may be missing the point. The author's feelings about the plan he was sent were predicated on an assumption that he thought was safe: that his co-worker had written the document he claimed to have "put together."

If you order a meal at a restaurant and later discover that the chicken you ate was recycled from another diner’s table (waste not want not!) you would likely be outraged. It doesn’t matter if it tasted good.

As soon as you tell me you used AI to produce something, you force me to review it carefully, unless your reputation for excellent review of your own work is well established. Which it probably isn't, because you are the kind of guy who uses AI to do his work.


It's not just that. There's a lot of (maybe useful) info that's lost without the entire session. And even if you include a jsonl of the entire session, just seeing that is not enough. It would be nice to be able to "click" at some point and add notes / edit / re-run from there w/ changes, etc.

Basically we're at a point where the agents have kinda caught up to our tooling, and we need better / different UX or paradigms for sharing sessions (including context, choices, etc.).


> Beowulf mentions are all referencing the Old English epic poem

Knowing the HN crowd, it could also be a reference to Beowulf clusters.


This isn't slashdot :)

A third option is to use the best of both worlds. Have the model respond in free form. Then use that response + a structured output API to ask it for JSON. More expensive, but better overall results. (And you can cross-check your heuristic parsing against the structured output, and retry / alert on mismatches.)
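A minimal sketch of that two-pass flow (the model name, fields, and input text are placeholders; it assumes an OpenAI-compatible endpoint that supports json_schema response formats):

    import json
    from openai import OpenAI

    client = OpenAI()
    document = "Invoice from Acme Corp, dated 2024-03-01, total due: 42.50"

    # Pass 1: free-form answer.
    free_form = client.chat.completions.create(
        model="your-model",
        messages=[{"role": "user",
                   "content": "Extract the vendor, date and total from:\n" + document}],
    ).choices[0].message.content

    # Pass 2: restate that answer as schema-constrained JSON.
    schema = {
        "type": "object",
        "properties": {"vendor": {"type": "string"},
                       "date": {"type": "string"},
                       "total": {"type": "number"}},
        "required": ["vendor", "date", "total"],
        "additionalProperties": False,
    }
    structured = client.chat.completions.create(
        model="your-model",
        messages=[{"role": "user",
                   "content": "Put the fields from this answer into JSON:\n" + free_form}],
        response_format={"type": "json_schema",
                         "json_schema": {"name": "extraction", "schema": schema,
                                         "strict": True}},
    )
    fields = json.loads(structured.choices[0].message.content)

    # Cheap cross-check: if a structured value never appears in the free-form
    # answer, retry or flag for human review instead of trusting it blindly.
    if str(fields["total"]) not in free_form:
        print("mismatch, flag for review:", fields)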

I am doing this with good success parsing receipts with ministral3:14b. The first prompt describes the data being sought, and asks for it to be put at the end of the response. The format tends to vary between json, bulleted lists, and name: value pairs. I was never able to find a good way to get just JSON.

The second pass is configured for structured output via guided decoding, and is asked to just put the field values from the analyzer's response into JSON fitting a specified schema.

I have processed several hundred receipts this way with very high accuracy; 99.7% of extracted fields are correct. Unfortunately it still needs human review because I can't seem to get a VLM to see the errors in the very few examples that have errors. But this setup does save a lot of time.

