
I often wonder how people can look at a log like this and still confidently state that this isn't reasoning.


The thinking steps have moments of brilliance, generally look convincing, and do improve the outputs. Whether that is reasoning seems to be a matter of interpretation.

From skimming the log: > After popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, because pushing adds to the top. So after popping 2, the stack is [X1], then pushing X2 adds it on top → [X2, X1].

Wait, no, when you push, you add to the top. So after popping the 2, the stack is [X1], then pushing X2 would make it [X2, X1]? No, wait, the stack is LIFO. So pushing X2 would put it on top of X1 → stack becomes [X2, X1]? No, no. Wait, after popping the 2, the stack is [X1]. Then pushing X2 would make the stack [X2, X1]? No, no. Wait, when you push, the new element is added to the top. So after popping the 2 (so stack is [X1]), then pushing X2 gives [X2, X1]? No, no. Wait, the stack was [X1], then pushing X2 would make it [X2] on top of X1 → so stack is [X2, X1]? Yes, exactly.
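For reference, the behaviour the model is second-guessing itself about is just ordinary LIFO semantics. A minimal Python sketch (the X1/X2 names are placeholders from the log; the list's right-hand end is treated as the top of the stack):

  stack = ["X1", 2]              # top of the stack is the right-hand end
  stack.pop()                    # pop the 2 -> stack is ["X1"]
  stack.append("X2")             # push X2 on top -> stack is ["X1", "X2"]
  print(list(reversed(stack)))   # printed top-first, as in the log: ['X2', 'X1']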


Depends on the definition of reasoning:

1) think, understand, and form judgments by a process of logic.

—- LLMs do not think, nor do they understand; they also cannot form ‘judgments’ in any human-relatable way. They’re just providing results in the most statistically relevant way their training data permits.

2) find an answer to a problem by considering various possible solutions

—- LLMs can produce a result that may be an answer, after generating various candidate results that a human must verify as accurate, but they don’t do this in any human-relatable way either.

—-

So: while LLMs continue to be amazing mimics, and thus APPEAR to be great at ‘reasoning’, they aren’t doing anything of the sort today.


Exposure to our language is sufficient to teach the model how to form human-relatable judgements. The ability to execute tool calls and evaluate the results takes care of the rest. It's reasoning.
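The loop being described is structurally simple: the model proposes a step, a tool runs it, and the result goes back into the context for the model to evaluate. A toy sketch, with propose_step as a stub standing in for the actual model call (nothing here is a real API):

  # minimal tool loop; propose_step is a stub standing in for an LLM call
  def propose_step(history):
      if "result:" not in history[-1]:
          return ("calculate", "2+2")      # model asks for a tool call
      return ("answer", history[-1])       # model has seen the result

  TOOLS = {"calculate": lambda expr: str(eval(expr))}  # toy calculator, toy use of eval

  history = ["user: what is 2+2?"]
  while True:
      kind, payload = propose_step(history)
      if kind == "answer":
          print(payload)                   # -> result: 4
          break
      history.append("result: " + TOOLS[kind](payload))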


SELECT next_word, likelihood_stat FROM context ORDER BY 2 DESC LIMIT 1

is not reasoning; it just appears that way due to Clarke’s third law.
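In non-parody form, that query is roughly greedy decoding: take whatever per-token probabilities the model produced for this context and return the argmax. A minimal sketch, assuming the (token, probability) pairs already exist:

  # greedy next-token selection over already-computed probabilities
  def next_word(token_probs):
      # token_probs: list of (token, probability) pairs for this context
      return max(token_probs, key=lambda pair: pair[1])[0]

  print(next_word([("cat", 0.1), ("mat", 0.7), ("hat", 0.2)]))  # -> mat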


Sure, at the end of the day it selects the most probable token - but it has to compute the token probabilities first, and that's the part where it's hard to see how it could possibly produce a meaningful log like this without some form of reasoning (and a world model to base that reasoning on).

So, no, this doesn't actually answer the question in a meaningful way.


(Shrug) You've already had to move your goalposts to the far corner of the parking garage down the street from the stadium. Argument from ignorance won't help.



