Some valid points, but I wish the authors had developed them more.
On the semantic gap between the original software and its representation in the ITP, program extraction as in Rocq probably deserves some discussion: there the software is written natively in the ITP, and you have to prove the extraction itself sound. For example, MetaRocq did this for Rocq.
For the how-far-down-the-stack problem, there are some efforts from https://deepspec.org/, but it's inherently a difficult problem and often gets less love than the lab-environment projects.
This specific example seems to me less a consequence of model collapse than of a "personality" adjustment in how aggressively the model should read into the user's intention.
From time to time, I enjoy the model guessing what I meant rather than what I wrote. For example, "Find the backend.py" can be auto-corrected into "find the app.py".
> But let's hit the random button on wikipedia and pick a sentence, see if you can draw a picture to convey it, mm?
The inverse is also difficult. Pick a random 15-second movie clip: how do you describe it in text without losing much of its essence? Can one really port a random game to a text version? Can a pilot fly a plane with a text-based instrument panel?
Text is not a superset of all communication media. They are just different.
Most of the time, commercial aviation involves textual interaction[1] to determine what the aircraft does. Aviation is rife with plain text, usually upper case for better legibility[2].
The program used to check the validity of a proof is called a kernel. It just needs to check one step at a time, and the possible steps it can take are just basic logic rules. People can gain more confidence in its validity by:
- Reading it very carefully (doable since it's very small)
- Having multiple independent implementations and comparing their results
- Proving it in some meta-theory. Here the result is not correctness per se, but relative consistency. (Although it can be argued all other points are about relative consistency as well.)
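To make the "check one step at a time" idea concrete, here is a toy kernel for a propositional Hilbert-style system with modus ponens as its only rule. This is an illustrative sketch, not any real ITP's kernel; the encoding (strings for atoms, `("->", A, B)` tuples for implications) is made up for the example.

```python
def check(proof, axioms):
    """Return True iff every proof line is an axiom or follows from two
    earlier lines by modus ponens: from A and A -> B, conclude B."""
    derived = []
    for step in proof:
        ok = step in axioms or any(
            imp == ("->", prem, step)      # some earlier line implies this step
            for prem in derived
            for imp in derived
        )
        if not ok:
            return False
        derived.append(step)
    return True

# Tiny example: from axioms P and P -> Q, derive Q.
axioms = {"P", ("->", "P", "Q")}
print(check(["P", ("->", "P", "Q"), "Q"], axioms))  # True
print(check(["Q"], axioms))                          # False: Q isn't justified
```

The point is how little the checker has to do: each step is verified against earlier steps by a fixed rule, which is why a real kernel can stay small enough to audit by hand.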
Checking the validity of a given proof is deterministic, but filling in the proof in the first place is hard.
It's like chess: checking who has won for a given board state is easy, but coming up with the next move is hard.
Of course, one can try all possible moves and see what happens. Similar to chess AIs based on search methods (e.g. minimax), there are proof search methods. See the related work section of the paper.
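The "try all possible moves" idea can be sketched as a brute-force search: repeatedly apply the inference rule to everything known and see whether the goal ever appears. Again a toy encoding (strings for atoms, `("->", A, B)` for implications), not a real prover.

```python
from itertools import product

def search(axioms, goal, max_rounds=10):
    """Brute-force proof search: saturate the known facts with modus
    ponens, like exhaustively exploring moves in a game tree."""
    known = set(axioms)
    for _ in range(max_rounds):
        new = {
            imp[2]                              # conclusion B of A -> B
            for prem, imp in product(known, known)
            if isinstance(imp, tuple) and imp[:2] == ("->", prem)
        }
        if goal in known | new:
            return True
        if new <= known:
            return False  # fixed point reached, goal unreachable
        known |= new
    return False

axioms = {"P", ("->", "P", "Q"), ("->", "Q", "R")}
print(search(axioms, "R"))  # True: P => Q => R
print(search(axioms, "S"))  # False
```

Real proof search is vastly harder because the space of possible steps explodes, which is exactly why heuristics (and now LLMs) are interesting here.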
> imagine a folder full of skills that covers tasks like the following:
> Where to get US census data from and how to understand its structure
Reminds me of my first time using Wolfram Alpha and getting blown away by its ability to use actual structured tools to solve the problem, compared to a normal search engine.
tbh wolfram alpha was the craziest thing ever. haven't done much research on how this was implemented back in the day but to achieve what they did for such complex mathematical problems without AI was kind of nuts
I doubt that, if the underlying parts changed, anyone outside the industry or enthusiast circles would know what that is. How many people know what kind of engine is in their car? I stomp on the floor of my Corolla and away we go! Others might know that their Dodge Challenger has a Hemi. What even is that? Thankfully we have the Internet these days, and someone who's interested can just select the word and right-click to Google for the Wikipedia article on it. AI is just such an entirely undefined term colloquially that any attempt to define it will be wrong.
I think the difference now is that traditional software ultimately comes down to a long series of if/then statements (also the old AI's like Wolfram), whereas the new AI (mainly LLM's) have a fundamentally different approach.
Look into something like Prolog (~50 years old) to see how systems can be built from rules rather than if/else statements. It wasn't all imperative programming before LLMs.
If you mean that it all breaks down to if/else at some level then, yeah, but that goes for LLMs too. LLMs aren't the quantum leap people seem to think they are.
Yeah, the result is pretty cool. It's probably how it felt to eat pizza for the first time. People had been grinding grass seeds into flour, mixing with water and putting it on hot stones for millennia. Meanwhile others had been boiling fruits into pulp and figuring out how to make milk curdle in just the right way. Bring all of that together and, boom, you have the most popular food in the world.
We're still at the stage of eating pizza for the first time. It'll take a little while to remember that you can do other things with bread and wheat, or even other foods entirely.
Would really like something self-hosted that does the basic Wolfram Alpha math things.
Doesn't need the craziest math capability, but standard symbolic stuff: expression simplification, differentiation and integration of common expressions, plotting, unit wrangling.
All with an easy-to-use text interface that doesn't require learning a special syntax.
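SymPy covers a lot of this wishlist and is trivially self-hostable (assumes `pip install sympy`); `sympify` even parses plain-text input, though not quite as forgivingly as Wolfram Alpha's natural-language front end.

```python
import sympy as sp

x = sp.symbols("x")

# Parse a plain-text expression and simplify it.
expr = sp.sympify("sin(x)**2 + cos(x)**2 + x**2")
print(sp.simplify(expr))           # x**2 + 1

# Differentiation and integration of common expressions.
print(sp.diff(x * sp.sin(x), x))   # x*cos(x) + sin(x)
print(sp.integrate(sp.exp(x), x))  # exp(x)
```

Plotting (`sympy.plotting`) and unit wrangling (`sympy.physics.units`) exist as well, so a small REPL wrapper around this gets surprisingly close to the "basic Wolfram Alpha" experience.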
TI-89 has surprisingly good symbolics tools and solvers for something that runs all year on a single set of AAA batteries. Feels like magic alien tech.
I used it a lot for calc, as it would show you how it got the answer, if I remember right. I also liked how it understands symbols, which is obvious in hindsight but still cool: you can paste an integral sign right in there.
> Some Chinese language source claims that it's a reaction to the Pakistan-US rare earth deal.
Maybe they approached India for a deal that was too lopsided in favour of the US for the former to accept, so the US did the show-and-tell cozying up to Pakistan to get a better deal while publicly shitting on India? Just follow the money?
Xi is getting China ready to attack Taiwan in 2026 or 2027, and the now mutual unwinding of economic relations between the US and China is underway. Still frenemies at this point, but Trump is aiming for more enemy status sooner because it causes media drama and draws attention to him. The US will be screwed because domestic production takes years to happen and it has lost most of its machine tool suppliers, knowledge, and workers. Manufacturing productivity is essential for any sort of war, as evidenced by the history of the American Civil War and WW II.
If the US doesn't impeach and remove Trump and Vance, and get a real, war-time leader who isn't a celebrity reality star ASAP, it will be doomed as China will rapidly seize Taiwan, disrupt Western chip production and plunge the West into an economic armageddon, and likely widen to a war with Japan who would definitely intervene militarily to defend economic technological resources in Taiwan. No more incompetent, self-destructive, corrupt, ideologue chaos can be tolerated.
From the title I thought they had solved math! Turns out to be a framework for using SMT solvers in decision-procedure-based proofs. For additional types, you still need to write the bridging part yourself. Interesting nonetheless.
Nice. When using OpenAI Codex CLI, I find the /compact command very useful for large tasks. In a way it's similar to the context editing tool. Maybe I can ask it to use a dedicated directory to simulate the memory tool.
Claude Code has had this /compact command for a long time; you can even specify your preferences for compaction after the slash command. But this is quite limited, and to get the best results out of your agent you need to do more than rely on how the tool decides to prune your context. I ask it explicitly to write down the important parts of our conversation into an md file, and I review and iterate over the doc until I'm happy with it. Then I /clear the context and give it instructions to continue based on the md doc.
Duolingo is useful, but not efficient. When people say they want to learn a language, they often mean they want to learn it efficiently, e.g. to be able to write an essay, like the post says, after a realistic period of time.
I personally don't believe its pedagogical deficiency is mere incompetence. The whole business model is to keep you on the platform as long as possible, so why would they make you learn faster rather than just fast enough to keep you there?
As a former long-time user, I observed a lot of mechanism changes that bear out this observation.