
You said AI: https://github.com/stassa/louise

  Louise (Patsantzis & Muggleton 2021) is a machine learning system that learns Prolog programs.

  Louise is a Meta-Interpretive Learning (MIL) system. MIL (Muggleton et al. 2014), (Muggleton et al. 2015), is a new setting for Inductive Logic Programming (ILP) (Muggleton, 1991). ILP is a form of weakly-supervised machine learning of logic programs from examples of program behaviour (meaning examples of the inputs and outputs of the programs to be learned). Unlike conventional, statistical machine learning algorithms, ILP approaches do not need to see examples of programs to learn new programs and instead rely on background knowledge, a library of pre-existing logic programs that they reuse to compose new programs.
This is what Douglas Lenat was doing from the late 1970s on [1]. He did his work in Lisp; this system does something similar in Prolog.

[1] https://en.wikipedia.org/wiki/Eurisko


If we're going down that path: Ehud Shapiro got there back in 1984 [1]. His PhD thesis is excellent and shows what logic programming could do (/could have been).

He viewed the task of learning predicates (programs/relations) as a debugging task. The magic is in a refinement operator that enumerates new programs. The diagnostic part was wildly insightful -- he showed how to operationalise Popper's notion of falsification. There are plenty of more modern accounts of that aspect but sadly the learning part was broadly neglected.

There are more recent probabilistic accounts of this approach to learning from the 1990s.

... and if you want to go all the way back you can dig up Gordon Plotkin's PhD thesis on antiunification from the early 1970s.

[1] https://en.wikipedia.org/wiki/Algorithmic_program_debugging


SQL is not a pipeline, it is a graph.

Imagine three joins of three queries A, B and C, where the first join J1 joins A and B, the second join J2 joins A and C, and the third join J3 joins J1 and J2. Note that I said "queries," not "tables": A, B and C can be complex things one would not want, or be able, to compute more than once. Even leaving compute aside, A, B and C can be quite complex just to write down, and the user may really not want to repeat themselves. Look at TPC-DS: there are subqueries in the "with" sections that are quite complex.
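The J1/J2/J3 shape above can be sketched in plain SQL with CTEs (here via SQLite from Python; the table and column names are made up for illustration). The CTE `a` is referenced by both J1 and J2, so the query is a DAG rather than a straight pipeline:

```python
# Hypothetical illustration: A, B, C as named subqueries, A reused twice.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t(id INTEGER, grp TEXT, val INTEGER);
INSERT INTO t VALUES (1,'x',10),(1,'y',20),(2,'x',30),(2,'y',40);
""")

query = """
WITH a  AS (SELECT id, SUM(val) AS s FROM t GROUP BY id),  -- query A
     b  AS (SELECT id, val FROM t WHERE grp = 'x'),        -- query B
     c  AS (SELECT id, val FROM t WHERE grp = 'y'),        -- query C
     j1 AS (SELECT a.id AS id, a.s, b.val AS bx
             FROM a JOIN b ON a.id = b.id),                -- J1 = A JOIN B
     j2 AS (SELECT a.id AS id, c.val AS cy
             FROM a JOIN c ON a.id = c.id)                 -- J2 = A JOIN C
SELECT j1.id, j1.s, j1.bx, j2.cy
FROM j1 JOIN j2 ON j1.id = j2.id                           -- J3 = J1 JOIN J2
ORDER BY j1.id
"""
print(con.execute(query).fetchall())   # [(1, 30, 10, 20), (2, 70, 30, 40)]
```

A pipeline syntax handles the straight-line parts well; it is the `a` feeding both J1 and J2 that forces either naming (as `WITH` does) or repeating the subquery.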

This is why pipeline replacements for SQL are more or less futile efforts. They simplify the simple part and avoid touching the complex one.

I think that something like Verse [1] is more or less the way to go. Not Verse itself, but functional logic programming as an idea, where you can have first-class data producers and an effect system to specify transactions.

[1] https://en.wikipedia.org/wiki/Unreal_Engine#Verse


Thanks, I'll check out Verse.

I haven't seen anyone make the point about graphs before. FWIW PRQL allows defining named subqueries that can be reused, like J1 and J2 in your example.


TIL about Verse. Looks cool, I'll have to check it out.

> SQL is not a pipeline, it is a graph.

Maybe it's both? And maybe there will always be hard-to-express queries in SQL, and that's OK?

The RDBMS's relational model is certainly a graph, and joins accordingly introduce complexity.

For me, just as creators of the internet regret that subdomains come before domains, I really wish we could go back in time and have `FROM` be the first clause and not `SELECT`. That order is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).
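That FROM-first reading can be illustrated with each clause as a stage in a generator pipeline (the data and column names here are made up):

```python
# Toy rows standing in for a table.
rows = [
    {"name": "ada", "score": 91},
    {"name": "bob", "score": 58},
    {"name": "eve", "score": 77},
]

scan      = iter(rows)                                  # FROM rows
kept      = (r for r in scan if r["score"] >= 60)       # WHERE score >= 60
projected = (r["name"] for r in kept)                   # SELECT name

print(list(projected))  # ['ada', 'eve']
```

Read top to bottom, each stage consumes the previous one, which is exactly the order the optimizer (and the reader) thinks in.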


A pipeline is a specific kind of graph.

Yes, there will always be hard-to-express queries; the question is how far we can go.


Crazy to think that Fortnite might unleash a new population of people who toyed with functional-logic as their first paradigm.

Does it really help to call SQL a graph?

right? like it's a graph and a relational model query and a pipeline and a language and an abstract syntax tree and a declarative logical plan

It does. Just like any other programming language.

May as well call everything a graph at that point; meaningless.

  > meaningless.
No.

You present "programs are graphs" as a trivial truth. Trivial truths are, as you pointed out, meaningless. But you leave out the degree of applicability: the information in the dependence graph differs between programming languages.

Dependencies form a graph, and the analyses needed to optimize execution of that graph differ wildly between languages. Look at C++ aliasing rules and C's "restrict" keyword.

One can't escape the dependence graph. But one can execute the dependence graph better or worse, depending (pun intended) on the programming language.


800 mm2, about 90mm per side, if imagined as a square. Also, 250 W of power consumption.

The form factor should be anything but thumbdrive.


mmmhhhhh 800mm2 ~= (30mm)2, which is more like a (biggish) thumb drive.

Thanks!

I haven't had my coffee yet. ;)


Shit happens :D

always after the coffee :)

the radiator wouldn't be though

8B coefficients are packed into 53B transistors, about 6.5 transistors per coefficient. A two-input NAND gate takes 4 transistors, and a register takes about the same. So one coefficient gets processed (multiplied, and the result added to a sum) with fewer than two two-input NAND gates' worth of transistors.

I think they used block quantization: one can enumerate all possible blocks of (sorted) coefficient values and, for each layer, place only those blocks that are needed there. For 3-bit coefficients and a block size of 4 coefficients, only 330 different blocks are needed.

Matrices in Llama 3.1 are 4096x4096, 16M coefficients. They can be compressed into only 330 blocks, if we assume that all coefficient permutations occur, plus a network of the correct permutations of inputs and outputs.

Assuming that blocks are the most area-consuming part, we have a transistor budget of about 250 thousand transistors, or 30 thousand two-input NAND gates, per block.

250K transistors per block * 330 blocks / 16M coefficients = about 5 transistors per coefficient.

Looks very, very doable.
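The counting above can be checked directly (assumptions taken from the comment: 3-bit coefficients, blocks of 4 stored in sorted order, so only the multiset of values in a block matters):

```python
from itertools import combinations_with_replacement
from math import comb

values = 2 ** 3        # 3-bit coefficients: 8 possible values
block_size = 4
# a sorted block is a multiset of 4 values drawn from 8
blocks = sum(1 for _ in combinations_with_replacement(range(values), block_size))
print(blocks)                                        # 330
print(comb(values + block_size - 1, block_size))     # 330, closed form

transistors_per_block = 250_000
coefficients = 4096 * 4096                           # one such matrix
per_coefficient = transistors_per_block * blocks / coefficients
print(round(per_coefficient, 1))                     # 4.9, i.e. "about 5"
```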

It does look doable even for FP4 - these are 3-bit coefficients in disguise.


I'm looking forward to the model.toVHDL() method in PyTorch.

Ugh, quick, everyone start panic-buying FPGAs now.

The largest FPGAs have on the order of tens of millions of logic cells/elements. They're not even remotely big enough to emulate these designs, except to validate small parts at a time, and unlike memory chips or GPUs, companies don't need millions of them to scale infrastructure.

(The chips also cost tens of thousands of dollars each)


They also aren't power friendly.

Pretty close to what you describe: https://github.com/fastmachinelearning/hls4ml

Deep Differentiable Logic Gate Networks

I see you and I raise approximate logic synthesis [1] [2].

[1] https://www.sciencedirect.com/science/article/pii/S138376212...

[2] https://arxiv.org/abs/2506.22772

You can synthesize a logic circuit that is only as complex as it needs to be to hit a certain accuracy.

Deep differentiable logic networks, in my experience, do not scale well to larger (more inputs) logic elements. One still has to apply logic optimization and synthesis afterwards. So why not synthesize one's own approximate circuit to the desired accuracy?


Is this a thing?

I gave a short talk about compiling PyTorch to Verilog at Latte '22. Back then we were just looking at a simple dot product operation, but the approach could theoretically scale up to whole models.

https://capra.cs.cornell.edu/latte22/paper/2.pdf

https://www.youtube.com/watch?v=QxwZpYfD60g


They mentioned that they are using strong quantization (IIRC 3-bit) and that the model was degraded by it. Also, they don't have to use transistors to store the bits.

I think they are talking about the transistors that apply the weights to the inputs.

gpt-oss is FP4. They're saying they'll next try a mid-size one, I'm guessing gpt-oss-20b, then a large one, I'm guessing gpt-oss-120b, as their hardware is FP4-friendly.

What's the theoretical full-wafer-scale model they could produce?

  > what's your alternative?
Brain derived neurotrophic factor (BDNF) serum levels are inversely associated with depression's severity [1].

[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC3188695/

Give yourself a good BDNF boost through diet and/or exercise.

A ketogenic diet even improves schizophrenia, to the point that patients go off medication [2] [3].

[2] https://pmc.ncbi.nlm.nih.gov/articles/PMC12237970/

[3] https://www.frontiersin.org/journals/nutrition/articles/10.3...

That's my alternative.


  > Assume FP64 units are ~2-4x bigger.
This is a wrong assumption. FP64 usually reuses the same circuitry as two FP32 units, adding not that much ((de)normalization, mostly).

Off the top of my head, the overhead is around 10% or so.

  > Why would gamers want to pay for any features they don't use?
https://www.youtube.com/watch?v=lEBQveBCtKY

Apparently FP80, which is even wider than FP64, is beneficial for pathfinding algorithms in games.

Pathfinding for hundreds of units is a task worth putting on a GPU.


Has FP80 ever existed anywhere other than x87?

Then you get to definitions like ": open ( string -- handle 1 | 0) ... ;", which describes returning the algebraic type Maybe Handle, unboxed on the stack. Algebraic types are fun: they can easily represent Peano arithmetic and get us into the realm of Gödel's incompleteness theorem very quickly.

Or you can deduce a signature for the sequence EXEC EXEC. EXEC's stack effect can be described as ( \alpha (\alpha -- \beta) -- \beta ), where each greek letter is a placeholder for a stack part of arbitrary length. Notice that this type comment has nested brackets and does not adhere to the Forth stack-effect comment convention.

When I thought about this more than fifteen years ago, I got at least two equally valid types for EXEC EXEC: one where the xt at the top of the stack consumes all its input and leaves no output, ( \alpha (\alpha -- \gamma) \beta (\beta -- ) -- \gamma ), and one where the first EXEC produces something for the second to execute upon, ( \alpha \beta (\beta -- \gamma (\alpha \gamma -- \theta)) -- \theta ).

One can argue that the second type of EXEC EXEC subsumes the first, if the greek-letter-named stack parts are allowed to be empty.

Still, it shows that typing Forth, at the very least, needs unification at the level of Peano arithmetic, implementing deduction over stack parts from length zero to unbounded length.

So, in my opinion, for an LLM to dependably combine typed Forth/concatenative definitions, it needs to call an external tool like Prolog to properly deduce the type(s) of a sequence of Forth (or concatenative language) definitions.
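For the fixed-arity fragment, that deduction is ordinary unification. Here is a minimal sketch of mine (not from the comment) that deliberately omits the greek-letter row variables, which are the genuinely hard part discussed above. Stacks are Python lists, top of stack first; "?name" strings are type variables, anything else is a type constant:

```python
import itertools

_fresh = itertools.count()

def is_var(x):
    return isinstance(x, str) and x.startswith("?")

def walk(x, subst):
    # follow variable bindings to their current value
    while is_var(x) and x in subst:
        x = subst[x]
    return x

def unify(a, b, subst):
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    return None  # two different type constants: type error

def rename(effect):
    # each use of a word gets a fresh copy of its variables
    table = {}
    def r(x):
        if is_var(x) and x not in table:
            table[x] = "?v%d" % next(_fresh)
        return table.get(x, x)
    ins, outs = effect
    return [r(x) for x in ins], [r(x) for x in outs]

def compose(e1, e2):
    # type of the sequence "word1 word2": unify what word1 leaves
    # on the stack with what word2 consumes (lists are top-first)
    (in1, out1), (in2, out2) = rename(e1), rename(e2)
    while len(out1) < len(in2):       # word2 reaches below word1's results,
        v = "?v%d" % next(_fresh)     # so the combined word consumes them too
        in1.append(v)
        out1.append(v)
    subst = {}
    for a, b in zip(out1, in2):
        subst = unify(a, b, subst)
        if subst is None:
            raise TypeError("stack effects do not compose")
    leftover = out1[len(in2):]        # results word2 did not touch
    return ([walk(x, subst) for x in in1],
            [walk(x, subst) for x in out2 + leftover])

DUP  = (["?x"], ["?x", "?x"])         # ( x -- x x )
PLUS = (["int", "int"], ["int"])      # ( a b -- c )
print(compose(DUP, PLUS))             # (['int'], ['int']): DUP + is ( int -- int )
```

Extending this to row variables, so that one \alpha can stand for an unbounded stack part, is exactly where a Prolog-style engine starts to earn its keep.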

And here we enter a realm interesting in itself.

Here it is: https://github.com/stassa/louise

This is a Prolog system that learns programs in polynomial time. For example, it can one-shot-learn a grammar, without being "trained" on millions of samples.

So, should one use an LLM that either needs paid access or is just slow to run, or freely go where "old school" systems like Eurisko [1] and Cyc went?

[1] https://en.wikipedia.org/wiki/Eurisko

Eurisko demonstrated superhuman abilities in 1982-83. It also demonstrated knowledge transfer at the time: rules from VLSI place-and-route algorithms were used to design a winning Traveller TCS fleet.


How do several wrong attempts add up to a right one with increasing trials?

You can ask Opus 4.6 to do a task and leave it running for 30 min or more to attempt to one-shot it. Imagine doing this with three agents in parallel in three separate worktrees. Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one. If no consensus after N runs, reframe to provide directions for a 4th attempt. Continue until a clear winning approach is found.

This is one example of an orchestration workflow. There are others.
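A hypothetical sketch of such a consensus loop; the `judge` callable stands in for a fresh-context agent run, and the names and thresholds are my own, not from the comment:

```python
from collections import Counter

def pick_by_consensus(judge, candidates, threshold=0.75, min_runs=3, max_runs=9):
    # sample the judge repeatedly; stop once one candidate holds a
    # clear majority of the votes cast so far
    votes = Counter()
    for run in range(1, max_runs + 1):
        votes[judge(candidates)] += 1          # one fresh-context judgement
        leader, count = votes.most_common(1)[0]
        if run >= min_runs and count / run >= threshold:
            return leader                      # clear consensus
    return None  # no consensus: reframe the task and add a 4th attempt

# a judge that always prefers the first candidate converges immediately
print(pick_by_consensus(lambda cs: cs[0], ["A", "B", "C"]))  # A
```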


  > Then spin up a new agent to decide which approach of the three is best on the merits. Repeat this analysis in fresh contexts and sample until there is clear consensus on one.
If there are several agents analyzing the solutions, how do you define a consensus? Should it be unanimous, or above some threshold? Are the agents' scores soft or hard? How is the threshold defined if the scores are soft? There is a whole lot of science in voting approaches; which voting approach is best here?

Is it possible for the analyzing agents to choose the best of several wrong solutions? E.g., the longest memorized table of FizzBuzz answers amongst memorized tables of FizzBuzz answers.


We have a voting algorithm that we use, but we'd be at the level of confidential disclosure if we proceeded further in this discussion. There's lots of research out there into unbiased voting algorithms for consensus systems.

You conveniently decided not to answer my question about the quality of the solutions to vote on (ranking FizzBuzz memorization).

To me, our discussion shows that what you presented as a simple thing is not simple at all: even the voting is complex, and actually getting a good result is so hard that it warrants omitting an answer altogether.


Yeah, you've got unrealistic expectations if you expect me to divulge my company's confidential IP in a HN comment.

I had no such expectations; I just asked questions, expecting answers. At the very beginning, the tone of your comment, as I read it, was "agentic coding is nothing but simple; look, they vote." Now the answers to simple but important questions are "confidential IP."

Okay then: agentic coding is nothing but a complex task, requiring knowledge of unbiased voting (what is that thing, really?) and, apparently, the use of a necessarily heavy test suite and/or theorem provers.


It was like a scene from a sci-fi movie (I mean the Claude demo to CTOs).

  > It will accelerate the work and change your role from what it is today to something different;
We have yet to see if different is good.

My short experience with LLM reviewing my code is that LLM's output is overly explanatory and it slows me down.

  > something that takes time and experience to work with.
So you invite us to participate in the sunk cost fallacy.

Tell it to summarize?

I cannot, because these reviews are comments in GitHub PRs. I have to read them.

  > You should have test coverage, type checking, and integration tests that catch the edge cases automatically.
You should assume that if you are going to cover edge cases, your tests will be tens to hundreds of times as big as the code tested. That is the case for several database engines (MariaDB has 24M of C++ in the sql directory and 288M of tests in the mysql-test directory), and it was the case when I developed a VHDL/Verilog simulator. And not everything can be covered with type checking: many things, but not all.

AMD had hundreds of millions of test cases for its FPU, and formal modeling still caught several errors [1].

[1] https://www.cs.utexas.edu/~moore/acl2/v6-2/INTERESTING-APPLI...

SQLite used to have 1100 lines of tests per line of C code; the multiplier is smaller now, but still big.

