
Leary · 2025-09-09T18:46:11 1757443571

“Large frontier developer”: a frontier developer that together with its affiliates collectively had annual gross revenues in excess of five hundred million dollars ($500,000,000) in the preceding calendar year.

So any large company training LLMs, no matter the capability, is considered a frontier developer?

Leary · 2025-09-02T20:48:57 1756846137

Except that does make sense, because the new country's per-capita emissions would be the population-weighted average of the American and Nigerian per-capita CO2 figures.
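The arithmetic behind the merged-country argument can be sketched as follows. The combined per-capita figure is total emissions divided by total population, which works out to the population-weighted average of the two per-capita figures. The populations and per-capita values below are made-up round numbers for illustration, not real US or Nigerian statistics.

```python
def combined_per_capita(groups):
    """Per-capita emissions of a merged population.

    groups: list of (population, per_capita_emissions) pairs.
    Total emissions / total population, which equals the
    population-weighted average of the groups' per-capita figures.
    """
    total_emissions = sum(pop * per_cap for pop, per_cap in groups)
    total_population = sum(pop for pop, _ in groups)
    return total_emissions / total_population


# Hypothetical numbers only: equal populations, one country at
# 15 t/person and the other at 1 t/person.
merged = combined_per_capita([(100, 15.0), (100, 1.0)])
print(merged)  # 8.0
```

Total emissions stay at 1,600 units before and after the merger; only the denominator grows, which is why the per-capita metric drops without any actual reduction.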

ACCount37 · 2025-09-02T20:58:22 1756846702

Clearly, that means annexing Nigeria is the best way for US to reduce its GHG emissions.

Just add poor people to your country! Per capita emission metrics HATE this one weird trick!

triceratops · 2025-09-02T21:34:16 1756848856

Jeez, the lengths some people will go to in order to avoid installing solar.

yongjik · 2025-09-03T04:28:59 1756873739

I'm fine with that. So go ahead, annex Nigeria, you coward.

Leary · 2025-08-28T23:17:00 1756423020

Amazing analysis, except for the fact China was dirt poor before communism.

Crestwave · 2025-08-29T12:16:15 1756469775

That's essentially a blip in time in China's 5,000-year history. They quite literally invented paper, printing, gunpowder, etc.

It's not like their society randomly collapsed by itself, either; they had plenty of help with the Opium Wars...

aurareturn · 2025-08-29T03:26:04 1756437964

Sure, but China was historically wealthy and advanced well before communism and the Century of Humiliation.

Leary · 2025-08-28T19:30:44 1756409444

No way. I read Noah Smith (number 1 economist on Substack) every week, and he says one must measure China in per capita terms for all the good stuff (GDP, preferably nominal) and in aggregate terms for all the bad stuff (pollution, carbon dioxide emissions).

Also, it's logically impossible for China to be good. I have found a mathematical proof:

1: Democracy is good

2: China is not a democracy

Therefore, obviously China is not good.

layman51 · 2025-08-28T19:51:46 1756410706

You committed a logical fallacy called “denying the antecedent.” From your first two propositions, it doesn’t follow that anything that isn’t a democracy isn’t good.
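The reply's point can be checked with a brute-force truth table: an argument form is valid only if no assignment of truth values makes every premise true while the conclusion is false. This sketch reads "Democracy is good" as the conditional "if X is a democracy, then X is good", which is a modeling assumption, and contrasts the argument with the valid form modus tollens.

```python
from itertools import product


def implies(p, q):
    # Material conditional: "if p then q" is false only when p is true and q is false.
    return (not p) or q


def is_valid(premises, conclusion):
    """Return (True, None) if the form is valid, else (False, counterexample)."""
    for p, q in product([True, False], repeat=2):
        if all(prem(p, q) for prem in premises) and not conclusion(p, q):
            return False, (p, q)
    return True, None


# P = "X is a democracy", Q = "X is good" (assumed formalization).
denying_antecedent = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not p],
    conclusion=lambda p, q: not q,
)
modus_tollens = is_valid(
    premises=[lambda p, q: implies(p, q), lambda p, q: not q],
    conclusion=lambda p, q: not p,
)
print(denying_antecedent)  # (False, (False, True))
print(modus_tollens)       # (True, None)
```

The counterexample P = False, Q = True is a non-democracy that is good, which is exactly the case the two premises fail to rule out.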

Leary · 2025-08-28T17:53:44 1756403624

But how are you gonna foresee the total annihilation of the Chinese semi industry without Jordan the necromancer interpreting party documents for you?

maxglute · 2025-08-28T18:13:34 1756404814

TBH Jordan paid his dues in the PRC and did great interviews with interesting people. But TFW he's still... literally brain-injured/damaged, and now he's in Washington and seems to be doing more policy/analysis work.

Leary · 2025-08-07T19:50:23 1754596223

Does anyone know which technology on this tree has the most descendants?

croddin · 2025-08-07T21:11:32 1754601092

I vibe coded with GPT-5 and the source JSON (https://www.historicaltechtree.com/api/inventions) to get this list:

Top 10 inventions by number of direct descendants

1: High-vacuum tube — 13

2: Automobile — 12

3: Stored-program computer — 12

4: Voltaic pile — 11

5: High-pressure steam engine — 11

6: Glass blowing — 10

7: Papermaking — 10

8: Bipolar junction transistor — 10

9: Writing (Mesopotamia) — 9

10: MOSFET — 8
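Counting direct descendants amounts to counting, for each invention, how many other inventions list it as an immediate predecessor. The sketch below assumes a hypothetical record shape where each invention carries a `parents` list of IDs; the real API's field names may differ, and the four-node graph is toy data, not the tech tree.

```python
from collections import Counter

# Hypothetical record shape (not the real API schema): each invention lists
# the IDs of the inventions it directly builds on under "parents".
inventions = [
    {"id": "a", "parents": []},
    {"id": "b", "parents": ["a"]},
    {"id": "c", "parents": ["a"]},
    {"id": "d", "parents": ["b", "c"]},
]


def direct_descendant_counts(items):
    """For each invention, count how many inventions list it as a direct parent."""
    counts = Counter({item["id"]: 0 for item in items})  # keep zero-count leaves
    for item in items:
        for parent in set(item["parents"]):  # ignore accidental duplicate edges
            counts[parent] += 1
    return counts


counts = direct_descendant_counts(inventions)
print(counts.most_common(10))  # [('a', 2), ('b', 1), ('c', 1), ('d', 0)]
```

`Counter.most_common` orders equal counts by first insertion, so ties (like the two entries at 12 above) depend on the input order unless an explicit tie-break is added.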

croddin · 2025-08-07T21:22:05 1754601725

Top 10 by total descendants (direct + indirect)

1: Control of fire — 585

2: Charcoal — 444

3: Iron — 422

4: Iron smelting and wrought iron — 419

5: Ceramic — 404

6: Pottery — 402

7: Induction coil — 389

8: Raft — 365

9: Boat — 363

10: Alcohol fermentation — 353

Top 10 by total ancestors (direct + indirect)

1: Robotaxi — 253

2: Moon landing — 242

3: Space telescope — 238

4: Lidar — 236

5: Satellite television — 231

6: Space station — 228

7: Stealth aircraft — 228

8: Reusable spacecraft — 224

9: Satellite navigation system — 224

10: Communications satellite — 224
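"Direct + indirect" counts correspond to the set of nodes reachable by repeatedly following child links (descendants) or parent links (ancestors). This sketch assumes each invention is counted once even when it is reachable by several paths; that is an assumption about how the lists above were produced. The graph is the same kind of hypothetical toy data, with a diamond (`a` reaches `d` through both `b` and `c`).

```python
from collections import defaultdict

# Toy DAG (not real data): node -> list of direct parents.
parents = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
    "e": ["d"],
}


def build_children(parent_map):
    """Invert the parent map into node -> list of direct children."""
    children = defaultdict(list)
    for node, ps in parent_map.items():
        for p in ps:
            children[p].append(node)
    return children


def reachable(start, edges):
    """Every node reachable from start (excluding start) by iterative DFS."""
    seen = set()
    stack = list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


children = build_children(parents)
total_descendants = {n: len(reachable(n, children)) for n in parents}
total_ancestors = {n: len(reachable(n, parents)) for n in parents}
print(total_descendants)  # {'a': 4, 'b': 2, 'c': 2, 'd': 1, 'e': 0}
print(total_ancestors)    # {'a': 0, 'b': 1, 'c': 1, 'd': 3, 'e': 4}
```

Using a `seen` set is what keeps `d` and `e` from being counted twice under `a`; a naive path count would inflate totals for heavily shared ancestors such as fire or charcoal.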

Leary · 2025-08-07T17:26:29 1754587589

METR of only 2 hours and 15 minutes. Fast takeoff less likely.

kqr · 2025-08-07T18:31:56 1754591516

Seems like it's on the line that's scaring people like AI 2027, isn't it? https://aisafety.no/img/articles/length-of-tasks-log.png

FergusArgyll · 2025-08-07T20:04:56 1754597096

It's above the exponential line and right around the super-exponential line.

Davidzheng · 2025-08-08T04:45:21 1754628321

I actually think there's a high chance that this curve becomes almost vertical at some point around a few hours. In the sub-1-hour regime, I think scaling the task time scales the complexity the agent must internalize. After a few hours, though, human limitations mean we have to divide the work into subtasks/abstractions, each of which is bounded in the complexity that must be internalized. And a separate category of skills is needed, like abstraction, subgoal creation, and error correction. It's a flimsy argument, but I don't see scaling the time of tasks for humans as a very reliable metric at all.
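The difference between the exponential and super-exponential lines discussed above can be sketched with a toy model. Under a plain exponential every doubling of the time horizon takes the same number of months; in the simple super-exponential variant assumed here, each doubling takes a fixed fraction (`shrink`) of the previous one. The doubling time and shrink factor are hypothetical parameters, not the values used by METR or AI 2027.

```python
import math


def months_to_reach(h0, target, first_doubling_months, shrink=1.0):
    """Months until a time horizon grows from h0 to target.

    shrink == 1.0: plain exponential, every doubling takes the same time.
    shrink < 1.0: assumed super-exponential model where each doubling takes
    `shrink` times as long as the previous one.
    """
    n = math.log2(target / h0)  # doublings needed (may be fractional)
    d = first_doubling_months
    if shrink == 1.0:
        return n * d
    # Geometric series d * (1 + r + r^2 + ...), extended to fractional n.
    return d * (1 - shrink ** n) / (1 - shrink)


# Hypothetical parameters: 7-month first doubling, 10% shorter each time.
print(months_to_reach(1, 8, 7))               # 21.0
print(months_to_reach(1, 8, 7, 0.9))          # about 18.97
print(months_to_reach(1, 2 ** 40, 7, 0.9))    # still under 70
```

With `shrink < 1` the total time for any number of doublings is bounded by `d / (1 - shrink)` (70 months here), which is one way to formalize a curve that "becomes almost vertical".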

qsort · 2025-08-07T17:33:21 1754588001

Isn't that pretty much in line with what people were expecting? Is it surprising?

usaar333 · 2025-08-07T18:33:28 1754591608

No, this is below expectations on both Manifold and LessWrong (https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_green...). The median was ~2.75 hours on both (which already represented a bearish slowdown).

Not massively off: yesterday Manifold implied odds of ~35% for a result this low, up from 30% before Claude Opus 4.1 came out, which updated expected agentic coding abilities downward.

qsort · 2025-08-07T18:41:03 1754592063

Thanks for sharing, that was a good thread!

dingnuts · 2025-08-07T18:07:18 1754590038

It's not surprising to AI critics, but go back to 2022, open r/singularity, and then answer: what "people" were expecting? Which people?

SamA has been promising AGI next year for three years like Musk has been promising FSD next year for the last ten years.

IDK what "people" are expecting but with the amount of hype I'd have to guess they were expecting more than we've gotten so far.

The fact that "fast takeoff" is a term I recognize indicates that some people believed OpenAI when they said this technology (transformers) would lead to sci-fi-style AI, and that is most certainly not happening.

ToValueFunfetti · 2025-08-07T19:08:10 1754593690

>SamA has been promising AGI next year for three years like Musk has been promising FSD next year for the last ten years.

Has he said anything about it since last September, when he wrote this?

>It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.

This is, at an absolute minimum, 2000 days, or roughly 5.5 years. And he says it may take longer.
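Converting "a few thousand days" into a calendar date makes the comparison with "early 2030s" concrete. The start date is an assumption: the comment only says "last September" (2024), so the 1st of that month is a placeholder rather than the actual post date, and 2,000 days is the minimum reading of "a few thousand".

```python
from datetime import date, timedelta

# Assumed placeholder start: the comment gives only "last September" (2024).
start = date(2024, 9, 1)
min_days = 2000  # reading "a few thousand days" as at least 2,000

earliest = start + timedelta(days=min_days)
years = min_days / 365.25  # average Gregorian year length, leap years included

print(earliest)         # 2030-02-22
print(round(years, 1))  # 5.5
```

Even the most aggressive reading lands in early 2030, consistent with the "early 2030s" characterization rather than "AGI next year".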

Did he even say AGI next year any time before this? It looks like his predictions were all pointing at the late 2020s, and now he's thinking early 2030s. Which you could still make fun of, but it just doesn't match up with your characterization at all.

falcor84 · 2025-08-07T18:56:06 1754592966

I would say that there are quite a lot of roles where you need to do a lot of planning to effectively manage an ~8-hour shift, but then there are good protocols for handing over to the next person. So once AIs get to that level (in 2027?), we'll be much closer to AIs taking on "economically valuable work".

umanwizard · 2025-08-07T17:45:05 1754588705

What is METR?

ravendug · 2025-08-07T18:26:34 1754591194

https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measu...

tunesmith · 2025-08-07T18:21:32 1754590892

The 2h 15m is the length of tasks the model can complete with 50% probability. So longer is better in that sense. Or at least, "more advanced" and potentially "more dangerous".
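A 50% time horizon can be read off a fitted success curve. METR's reports describe fitting a logistic curve of agent success against the (log) length of tasks as measured by human completion time; the exact functional form and fitting details here are assumptions, and the parameters `a` and `b` are hypothetical, not METR's fitted values.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def success_prob(a, b, t):
    """Assumed model: P(success | task length t) = sigmoid(a + b * log2(t))."""
    return 1 / (1 + math.exp(-(a + b * math.log2(t))))


def time_horizon(a, b, p=0.5):
    """Task length at which predicted success equals p (requires b < 0)."""
    return 2 ** ((logit(p) - a) / b)


# Hypothetical parameters, task lengths in minutes.
a, b = 7.0, -1.0
h50 = time_horizon(a, b)       # 2**7 = 128 minutes
h80 = time_horizon(a, b, 0.8)  # stricter reliability bar, shorter horizon
print(h50, round(h80, 1))
```

Because success falls as tasks get longer, raising the required probability from 50% to 80% always yields a shorter horizon, which is why the 50% figure is the headline number.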

Leary · 2025-08-07T18:07:49 1754590069

https://metr.github.io/autonomy-evals-guide/gpt-5-report/

wisemang · 2025-08-08T00:10:57 1754611857

To maybe save others some time: METR is a group called Model Evaluation and Threat Research, who

> propose measuring AI performance in terms of the length of tasks AI agents can complete.

Not that hard to figure out, but the way people were referring to them made me think it stood for an actual metric.

Leary · 2025-08-05T17:18:00 1754414280

GPQA Diamond: gpt-oss-120b: 80.1%, Qwen3-235B-A22B-Thinking-2507: 81.1%

Humanity’s Last Exam: gpt-oss-120b (tools): 19.0%, gpt-oss-120b (no tools): 14.9%, Qwen3-235B-A22B-Thinking-2507: 18.2%

jasonjmcghee · 2025-08-05T17:19:18 1754414358

Wow, I will give it a try then. I'm cynical about OpenAI min-maxing benchmarks, but I'm still trying to be optimistic, as this in 8-bit is such a nice fit for Apple Silicon.

modeless · 2025-08-05T17:41:49 1754415709

Even better, it's 4-bit.
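The 8-bit vs 4-bit point comes down to simple arithmetic: weight memory is roughly parameter count times bits per weight. This sketch takes the nominal 120B parameters from the model's name (the real count differs somewhat) and ignores KV cache, activations, and the fact that released checkpoints may keep some layers at higher precision.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Nominal parameter count taken from the name "gpt-oss-120b".
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(120e9, bits):.0f} GB")
```

Halving the bits halves the footprint, from about 120 GB at 8-bit to about 60 GB at 4-bit, which is the difference between needing a top-end unified-memory machine and fitting on a more common configuration.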

amarcheschi · 2025-08-05T17:20:13 1754414413

GLM-4.5 seems on par as well.

thegeomaster · 2025-08-05T17:25:45 1754414745

GLM-4.5 seems to outperform it on TauBench, too. And it's suspicious OAI is not sharing numbers for quite a few useful benchmarks (nothing related to coding, for example).

One positive thing I see is the number of parameters and size: it will provide more economical inference than the current open-source SOTA.

lcnPylGDnU4H9OF · 2025-08-05T17:38:52 1754415532

Was the Qwen model using tools for Humanity's Last Exam?

Leary · 2025-08-03T23:43:03 1754264583

Did it with https://sites.google.com/site/jiaxiongyao16/nighttime-lights...

USA (2013-2023 CAGR: 2.3%) 2014: 6.2% 2015: -5.3% 2016: -1.8% 2017: 15.2% 2018: -4.9% 2019: 4.5% 2020: -5.4% 2021: 6.7% 2022: 14.5% 2023: -3.6%

China (2013-2023 CAGR: 7.9%) 2014: -1.7% 2015: -1.2% 2016: -5.1% 2017: 53.3% 2018: -1.0% 2019: 7.5% 2020: 6.5% 2021: 11.4% 2022: 4.2% 2023: 10.8%
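A compound annual growth rate can be recovered from a run of year-over-year changes by chaining the growth factors and taking the n-th root. Applied to the USA figures above (ten changes, 2014 through 2023), this sketch gives roughly the quoted 2.3%.

```python
import math


def cagr_from_yearly(rates_pct):
    """Compound annual growth rate implied by year-over-year % changes."""
    growth = math.prod(1 + r / 100 for r in rates_pct)  # cumulative factor
    return growth ** (1 / len(rates_pct)) - 1


# USA yearly changes, 2014-2023, as listed above.
usa = [6.2, -5.3, -1.8, 15.2, -4.9, 4.5, -5.4, 6.7, 14.5, -3.6]
usa_cagr = cagr_from_yearly(usa)
print(f"{usa_cagr:.1%}")  # 2.3%
```

Because the CAGR depends only on the cumulative factor, large single-year swings like the 2017 jumps affect it exactly as much as a steady rise of the same total size would.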

neuroelectron · 2025-08-04T01:06:23 1754269583

Wow, 2017 was a good year

potato3732842 · 2025-08-04T11:55:32 1754308532

That "feels about right" IMO

golem14 · 2025-08-04T04:04:00 1754280240

Well, how does it compare with published numbers?

abdullahkhalids · 2025-08-04T00:15:52 1754266552

Individual yearly numbers are unlikely to be useful. You can likely only predict long-term trends with the help of fits.
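One common form of such a fit is a log-linear least-squares regression: take logs of the level series, fit a straight line against time, and read the trend growth rate from the slope. This is a generic technique offered as an illustration, not the method the commenter had in mind; the index levels are rebuilt from the USA yearly changes above starting from an arbitrary base of 100.

```python
import math


def levels_from_changes(base, rates_pct):
    """Rebuild an index series from year-over-year % changes."""
    levels = [base]
    for r in rates_pct:
        levels.append(levels[-1] * (1 + r / 100))
    return levels


def log_linear_trend(levels):
    """Fit ln(level) = c + g * t by ordinary least squares; return exp(g) - 1."""
    n = len(levels)
    xs = range(n)
    ys = [math.log(v) for v in levels]
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return math.exp(sxy / sxx) - 1


usa = [6.2, -5.3, -1.8, 15.2, -4.9, 4.5, -5.4, 6.7, 14.5, -3.6]
trend = log_linear_trend(levels_from_changes(100, usa))
print(f"fitted trend growth: {trend:.1%}")
```

Unlike the endpoint-based CAGR, the regression uses every year, so a single noisy first or last year has less leverage over the estimated trend.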

Leary · 2025-06-12T18:09:51 1749751791

0:10 into the video, that's light-skinned to you?!