jmalicki's comments

The title is wrong.

The title of the paper is "Silicon Formalism: Rules, Standards, and Judge AI"

When they say "legally correct," they are clear that they mean under a surface, formal reading of the law. They use it to characterize how judges vs. GPT-5 treat legal decisions, and leave it as an open question which is better.

The conclusion of the paper is "Whatever may explain such behavior in judges and some LLMs, however, certainly does not apply to GPT-5 and Gemini 3 Pro. Across all conditions, regardless of doctrinal flexibility, both models followed the law without fail. To the extent that LLMs are evolving over time, the direction is clear: error-free allegiance to formalism rather than the humans’ sometimes-bumbling discretion that smooths away the sharper edges of the law. And does that mean that LLMs are becoming better than human judges or worse?"


> We find the LLM to be perfectly formalistic, applying the legally correct outcome in 100% of cases; this was significantly higher than judges, who followed the law a mere 52% of the time.

A game engine is sort of just a UI toolkit for interacting with 3D objects. If you want a 3D model you can interact with, the tool you reach for is called a game engine. Should they invent a new term just because they dislike the word "game"?

Worked for a shop that had the hardest time grasping this. We had to tell them UE4 was a "3D Rendering Engine" because they couldn't get over the term "game engine" in planning meetings...

I feel like Nvidia Omniverse's main innovation is that they call themselves an 'industrial metaverse platform' even though, in practice, their product is not much different from a game engine.

At the time YouTube was acquired, their infrastructure costs were quite high. Not as crazy as today's AI companies', but in the same way, a lot of people were questioning whether they could ever make money because of it.

Ah, I see. I think I was still in my teens when that purchase happened, so I had no idea about the concerns or anything; I was just shocked at the sheer amount of money.

The first version of YT was based on Flash/MM, IIRC.

They ran out of passively collected data. RLHF allows them to gather deeper, more targeted data.

There is a lot of RLHF effort around this.

> A programmer might write a function, notice it becoming too long or doing too much, and then decide to break it down into smaller subroutines. I've never seen an LLM really do this; they seem biased towards being additive.

The nice thing is that a programmer with an LLM can just step in here and course-correct, and still have that value-add, without taking all the time to write the boilerplate in between.

And in general, the cleaner your codebase, the cleaner the LLM's modifications will be; it does pick up on coding style.


>The nice thing is a programmer with an LLM just steps in here, and course-corrects

This does not seem to be the direction things are going. People are talking about shipping code they haven't edited, most notably the author of Claude Code. Sometimes they haven't even read the code at all. With LLMs the path of least resistance is to take your hands off the wheel completely. Only programmers taking particular care are still playing an editorial role.

When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually. This stifles the ability to see opportunities to refactor. It is widely considered to be harder to read code than to write it.

>And in general, the cleaner your codebase the cleaner LLM modifications will be

Whilst true, this is a kind of "you're holding it wrong" argument. If LLMs had a model of what differentiates good code from bad code, whatever they pull into their context should make no difference.


> Whilst true, this is a kind of "you're holding it wrong" argument. If LLMs had a model of what differentiates good code from bad code, whatever they pull into their context should make no difference.

Good code is in the eye of the beholder. What reviewers in one shop would consider good code is dramatically different from what reviewers in another would.

Conforming to the existing codebase's style is good in and of itself; if the context it pulls in made no difference, that would make it useless.


> When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually

I'm asking the LLM for alternatives and options constantly, to test different models. It can give me a written description of the options, or spin up subagents to try 4 different things at once.

> It is widely considered to be harder to read code than to write it

Even more than writing code, I think LLMs are exceptional at reading code. They can review huge amounts of code incredibly fast, to understand very complex systems. And then you can just ask it questions! Don't understand? Ask more questions!

I have mcp-neovim-server open, so I just ask it to open the relevant pieces of code at those lines, and it can then show me. CodeCompanion makes it easy to ask questions about a line. It's amazing how well it works.

Reading code has always been one of the extremely hard parts of programming, and the machine is far, far better at it than we are!

> When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually.

Here's one way to tell me you haven't tried the thing without saying you haven't tried the thing. The ability to do deep inquiry into topics & to test & try different models is far, far better than it has ever been. We aren't stuck with what we write; we can keep iterating & trying at vastly lower cost, to do the hard work of discovering what a good model is. Programmers have rarely had the luxury of time and space to keep working on a problem again and again, to adjust and change and tweak until the architecture truly sings. Now you can try a week's worth of architectures in an afternoon. There is no better time for those who want to understand to do so.

I feel like one thing missing from this thread is that most people adopting AI at a serious level are building really strong AGENTS.md files that refine tastes, practices, and forms. The AI is pretty tasteless and isn't deliberate. It is up to us to explore the possibility space when working on problems, and to create good context that steers towards good solutions. And our ability to get information out, to probe into systems, to assess, to test hypotheses, is vastly higher, which we can keep using to become far better steersfolk.
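For a concrete picture, a hypothetical AGENTS.md might look something like this (contents are purely illustrative, not a recommended standard):

    # AGENTS.md

    ## Code style
    - Prefer small, pure functions; split anything past ~40 lines.
    - Match the surrounding module's layout and naming.

    ## Workflow
    - Run the test suite before reporting a task as done.
    - When refactoring, keep behavior unchanged and say so.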


What do you think the entire issue was with the supply-chain attacks from the skills Moltbook was installing? Those skills were downloading rootkits to steal crypto.

I find that AI lets me get more into algorithm design, and more into the intersection of math and programming, by avoiding boilerplate.

You can indulge even more by letting AI take care of the easy stuff so you can focus on the hard stuff.


What happens when the AI does the hard stuff as well?

As described above, I think with AI coding our role shifts from "programmer" to "project manager", but even as a project manager, you can still choose to delegate some tasks to yourself, whether that's the hard stuff, the easy stuff, or the stuff that happens on Thursdays. It's not about what AI is capable of doing, but rather what you choose to have it do.

Skynet. When it can do the hard stuff, why do you think we'll still be around for project management and prompting? At that point, we are livestock.

Look around. We have been livestock for at least a decade now.

In fact, we are worse. At least livestock are cared for.


Here's an example from my recent experience: I've been building a bunch of mostly throwaway TUIs with AI (in Python, using Rich), and a lot of the stuff just works trivially.

But there are some things where the AI just does not understand how to do proper boundary checks to prevent busted layouts, and so I can either argue with it for an hour while it goes back and forth, breaking the code in the process of trying to fix my layout issues, or I can just go in and fix it myself.
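For example, the kind of boundary check it keeps fumbling is a one-liner by hand; a minimal sketch with Rich (assuming a simple full-width panel, names are illustrative):

    from rich.console import Console
    from rich.panel import Panel

    console = Console()

    def clamped_panel(text: str) -> Panel:
        # Truncate each line to the live console width so one long
        # row can't blow out the panel and bust the layout.
        inner = max(console.width - 4, 1)  # room for borders/padding
        lines = [line[:inner] for line in text.splitlines()]
        return Panel("\n".join(lines), width=console.width)

    console.print(clamped_panel("a row of cells that might be too long..."))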


In Cursor you highlight and hit Ctrl-L, and use voice prompting - I can do this today!

All you have to do is record a table of fixup locations that you fill in during a second pass, once the labels are resolved.
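A minimal sketch of that backpatching scheme in Python (structure is illustrative, not any particular assembler):

    # First pass: emit code, recording placeholder locations
    # for labels that aren't defined yet.
    code = []     # encoded words
    labels = {}   # label name -> address
    fixups = []   # (index into code, label name)

    def emit_jump(label):
        fixups.append((len(code), label))
        code.append(0)  # placeholder operand

    def define_label(name):
        labels[name] = len(code)

    emit_jump("loop")      # forward reference: "loop" not yet known
    define_label("loop")
    code.append(0x2A)      # some instruction at the label

    # Second pass: patch every placeholder now that labels are resolved.
    for index, label in fixups:
        code[index] = labels[label]

    print(code)  # [1, 42] -> the jump target has been filled in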

In practice, one of the difficulties in getting _clang_ to assemble the Linux kernel (as opposed to GNU `as`, aka GAS) was having clang implement support for "fragments" in more places.

https://eli.thegreenplace.net/2013/01/03/assembler-relaxatio...

There were a few cases, IIRC, around usage of the `.` operator, which means something to the effect of "the current point in the program." It can be used in complex expressions, and sometimes resolving those requires multiple passes. So supporting GAS-compatible syntax in more than just the basic cases forces the architecture of your assembler to be multi-pass.
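A toy illustration of why that forces a multi-pass design: encodings depend on distances, and distances depend on encodings, so you iterate until the layout stops changing (a simplified relaxation loop in Python, not GAS's actual algorithm):

    # A jump is 2 bytes if its target is within 127 bytes, else 5 --
    # but the distance depends on the sizes we choose, so we iterate
    # to a fixed point.
    instrs = [("jmp", "end")] + [("nop", None)] * 200 + [("label", "end")]

    def size_of(op, dist):
        if op == "nop":
            return 1
        if op == "jmp":
            return 2 if abs(dist) <= 127 else 5
        return 0  # labels occupy no space

    sizes = [1] * len(instrs)  # optimistic initial guess
    changed = True
    while changed:
        changed = False
        addrs, pc = [], 0
        for s in sizes:            # lay out with current guesses
            addrs.append(pc)
            pc += s
        labels = {arg: addrs[i] for i, (op, arg) in enumerate(instrs)
                  if op == "label"}
        for i, (op, arg) in enumerate(instrs):
            dist = labels[arg] - addrs[i] if op == "jmp" else 0
            if size_of(op, dist) != sizes[i]:
                sizes[i], changed = size_of(op, dist), True

    print(sum(sizes))  # 205: the jump had to take the 5-byte form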


I mean, no, it's more than that.

You also need to choose optimal instruction encodings, and you need to understand how relocations work: which references you can resolve now vs. which require you to encode info for the linker (or the loader, once the program is launched) to fill in, etc. etc.

Not sure why I'm on this little micro-rant about this; I'm sure Claude could write a workable assembler. I'm more like... I've written one assembler and many, many parsers, and the parsers were way simpler, yet this thread is littered with people who seem to think assemblers are just lookup tables from ASCII to machine code with a loop slapped on top of them.
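To make the reloc point concrete, here's roughly the shape of that decision for a PC-relative branch (a hypothetical sketch, not any real object format):

    relocations = []  # (section offset, symbol, reloc kind)

    def encode_branch(offset, symbol, local_symbols):
        # Target defined in this section: the PC-relative displacement
        # is fully known, so the assembler resolves it immediately.
        if symbol in local_symbols:
            return local_symbols[symbol] - (offset + 4)
        # External symbol: emit a placeholder plus a relocation record
        # for the linker (or loader) to fill in later.
        relocations.append((offset, symbol, "REL32"))
        return 0

    local_symbols = {"loop_start": 0x40}
    print(encode_branch(0x10, "loop_start", local_symbols))  # resolved: 44
    print(encode_branch(0x20, "printf", local_symbols))      # 0 + reloc
    print(relocations)  # [(32, 'printf', 'REL32')]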

