Hacker News | Leary's comments

“Large frontier developer”: a frontier developer that together with its affiliates collectively had annual gross revenues in excess of five hundred million dollars ($500,000,000) in the preceding calendar year.

So any large company training LLMs, no matter the capability, is considered a frontier developer?


Except that does make sense, because the new country's per-capita figure would be the population-weighted average of American and Nigerian per-capita CO2 emissions.


Clearly, that means annexing Nigeria is the best way for US to reduce its GHG emissions.

Just add poor people to your country! Per capita emission metrics HATE this one weird trick!


Jeez, the lengths some people will go to avoid installing solar.


I'm fine with that. So go ahead, annex Nigeria, you coward.


Amazing analysis, except for the fact China was dirt poor before communism.


That's essentially a blip in time for China's 5000 year history. They quite literally invented paper, printing, gunpowder, etc.

It's not like their society randomly collapsed by itself, either, they had plenty of help with the opium wars...


Sure but China was historically wealthy and advanced well before communism and century of humiliation.


No way. I read Noah Smith (number 1 economist on substack) every week and he says one must measure China in per capita terms for all the good stuff: GDP (preferably nominal), and in aggregate terms for all the bad stuff (pollution, carbon dioxide emissions).

Also, it's logically impossible for China to be good. I have found a mathematical proof:

1: Democracy is good

2: China is not a democracy

Therefore, obviously China is not good.


You committed a logical fallacy called “denying the antecedent.” From your first two propositions, it doesn’t follow that anything that isn’t a democracy isn’t good.
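The invalidity is easy to check mechanically. A minimal truth-table sketch (purely illustrative, with D = "is a democracy" and G = "is good"):

```python
# "Denying the antecedent": from (D -> G) and (not D), (not G) does not follow.
# Enumerate all truth assignments and keep those where both premises hold
# but the conclusion fails -- any survivor is a countermodel.
from itertools import product

counterexamples = [
    (d, g)
    for d, g in product([True, False], repeat=2)
    if ((not d) or g)  # premise 1: D -> G
    and (not d)        # premise 2: not D
    and g              # conclusion "not G" fails here (G is true)
]
print(counterexamples)  # [(False, True)]
```

The single countermodel (D false, G true) is exactly the "not a democracy, yet good" case the premises leave open.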


But how are you gonna foresee the total annihilation of the Chinese semi industry without Jordan the necromancer interpreting party documents for you?


TBH Jordan paid his dues in the PRC and did great interviews with interesting people. But he literally suffered a brain injury, and now he's in Washington and seems to be doing more policy/analysis work.


Does anyone know which technology on this tree has the most descendants?


I vibe coded with gpt-5 and the source json (https://www.historicaltechtree.com/api/inventions) to get this list:

Top 10 inventions by number of direct descendants

1: High-vacuum tube — 13

2: Automobile — 12

3: Stored-program computer — 12

4: Voltaic pile — 11

5: High-pressure steam engine — 11

6: Glass blowing — 10

7: Papermaking — 10

8: Bipolar junction transistor — 10

9: Writing (Mesopotamia) — 9

10: MOSFET — 8


Top 10 by total descendants (direct + indirect)

1: Control of fire — 585

2: Charcoal — 444

3: Iron — 422

4: Iron smelting and wrought iron — 419

5: Ceramic — 404

6: Pottery — 402

7: Induction coil — 389

8: Raft — 365

9: Boat — 363

10: Alcohol fermentation — 353

Top 10 by total ancestors (direct + indirect)

1: Robotaxi — 253

2: Moon landing — 242

3: Space telescope — 238

4: Lidar — 236

5: Satellite television — 231

6: Space station — 228

7: Stealth aircraft — 228

8: Reusable spacecraft — 224

9: Satellite navigation system — 224

10: Communications satellite — 224
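For anyone who wants to reproduce counts like these, here's a minimal sketch of the counting logic. The actual schema of the /api/inventions JSON is an assumption (the field names there may differ), so a toy edge list stands in for it:

```python
# Count direct and total (direct + indirect) descendants in a DAG of
# (parent, child) edges, as in the tech-tree lists above.
from collections import defaultdict

def count_descendants(edges):
    """edges: list of (parent, child) pairs. Returns (direct, totals) dicts."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    nodes = list(children)  # snapshot: defaultdict grows on missing-key reads

    direct = {node: len(children[node]) for node in nodes}

    def reachable(node, seen):
        # Depth-first walk; `seen` prevents double-counting a node that is
        # reachable along several paths.
        for kid in children[node]:
            if kid not in seen:
                seen.add(kid)
                reachable(kid, seen)
        return seen

    totals = {node: len(reachable(node, set())) for node in nodes}
    return direct, totals

# Toy graph: fire -> charcoal -> iron, fire -> pottery
edges = [("fire", "charcoal"), ("charcoal", "iron"), ("fire", "pottery")]
direct, totals = count_descendants(edges)
print(direct["fire"], totals["fire"])  # 2 3
```

The `seen` set matters: in a dense tech tree most inventions are reachable along many paths, so naive recursive summing would overcount badly.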


A METR time horizon of only 2 hours and 15 minutes. Fast takeoff looks less likely.


Seems like it's on the line that's scaring people like AI 2027, isn't it? https://aisafety.no/img/articles/length-of-tasks-log.png


It's above the exponential line and right around the superexponential line.


I actually think there's a high chance that this curve becomes almost vertical at some point around a few hours. In the less-than-one-hour regime, scaling the time scales the complexity the agent must internalize, while after a few hours, human limitations mean we have to divide work into subtasks/abstractions, each bounded in the complexity that must be internalized. And there's a separate category of skills needed there, like abstraction, subgoal creation, and error correction. It's a flimsy argument, but I don't see scaling task length for humans as a very reliable metric at all.


Isn't that pretty much in line with what people were expecting? Is it surprising?


No, this is below expectations on both Manifold and lesswrong (https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_green...). Median was ~2.75 hours on both (which already represented a bearish slowdown).

Not massively off -- Manifold's implied odds yesterday of a result this low were ~35%, and 30% before Claude Opus 4.1 came out, which updated expected agentic coding abilities downward.


Thanks for sharing, that was a good thread!


It's not surprising to AI critics but go back to 2022 and open r/singularity and then answer: what "people" were expecting? Which people?

SamA has been promising AGI next year for three years like Musk has been promising FSD next year for the last ten years.

IDK what "people" are expecting but with the amount of hype I'd have to guess they were expecting more than we've gotten so far.

The fact that "fast takeoff" is a term I recognize indicates that some people believed OpenAI when they said this technology (transformers) would lead to sci-fi-style AI, and that is most certainly not happening.


>SamA has been promising AGI next year for three years like Musk has been promising FSD next year for the last ten years.

Has he said anything about it since last September:

>It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.

This is, at an absolute minimum, 2,000 days, which is about 5.5 years. And he says it may take longer.

Did he even say AGI next year any time before this? It looks like his predictions were all pointing at the late 2020s, and now he's thinking early 2030s. Which you could still make fun of, but it just doesn't match up with your characterization at all.


I would say that there are quite a lot of roles where you need to do a lot of planning to effectively manage an ~8 hour shift, but then there are good protocols for handing over to the next person. So once AIs get to that level (in 2027?), we'll be much closer to AIs taking on "economically valuable work".


What is METR?



The 2h 15m is the length of tasks the model can complete with 50% probability. So longer is better in that sense. Or at least, "more advanced" and potentially "more dangerous".
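As a rough illustration of how a 50% horizon can be read off (METR's actual methodology is more involved, and the data here is invented): fit a logistic curve of success probability against log task length, then solve for where it crosses 0.5.

```python
# Toy sketch: fit p(success) = sigmoid(a + b * log(minutes)) by plain
# gradient ascent on the log-likelihood, then report the 50% crossing.
import math

# (task length in minutes, 1 = agent succeeded) -- made-up data
data = [(1, 1), (4, 1), (15, 1), (60, 1), (120, 0), (240, 0), (480, 0)]

a, b = 0.0, 0.0
for _ in range(20000):
    ga = gb = 0.0
    for minutes, y in data:
        x = math.log(minutes)
        p = 1 / (1 + math.exp(-(a + b * x)))
        ga += y - p            # gradient w.r.t. intercept a
        gb += (y - p) * x      # gradient w.r.t. slope b
    a += 0.01 * ga
    b += 0.01 * gb

horizon = math.exp(-a / b)  # minutes where sigmoid(a + b*log(t)) = 0.5
print(f"50% horizon ~ {horizon:.0f} minutes")
```

On this toy data the crossing lands between the longest success (60 min) and the shortest failure (120 min), which is the intuition behind "can complete with 50% probability."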



To maybe save others some time METR is a group called Model Evaluation and Threat Research who

> propose measuring AI performance in terms of the length of tasks AI agents can complete.

Not that hard to figure out, but the way people were referring to them made me think it stood for an actual metric.


GPQA Diamond: gpt-oss-120b: 80.1%, Qwen3-235B-A22B-Thinking-2507: 81.1%

Humanity’s Last Exam: gpt-oss-120b (tools): 19.0%, gpt-oss-120b (no tools): 14.9%, Qwen3-235B-A22B-Thinking-2507: 18.2%


Wow - I will give it a try then. I'm cynical about OpenAI minmaxing benchmarks, but still trying to be optimistic, as this in 8-bit is such a nice fit for Apple silicon.


Even better, it's 4 bit


GLM-4.5 seems on par as well.


GLM-4.5 seems to outperform it on TauBench, too. And it's suspicious OAI is not sharing numbers for quite a few useful benchmarks (nothing related to coding, for example).

One positive thing I see is the number of parameters and size --- it will provide more economical inference than current open source SOTA.


Was the Qwen model using tools for Humanity's Last Exam?


Did it with https://sites.google.com/site/jiaxiongyao16/nighttime-lights...

USA (2013-2023 CAGR: 2.3%) 2014: 6.2% 2015: -5.3% 2016: -1.8% 2017: 15.2% 2018: -4.9% 2019: 4.5% 2020: -5.4% 2021: 6.7% 2022: 14.5% 2023: -3.6%

China (2013-2023 CAGR 7.9%) 2014: -1.7% 2015: -1.2% 2016: -5.1% 2017: 53.3% 2018: -1.0% 2019: 7.5% 2020: 6.5% 2021: 11.4% 2022: 4.2% 2023: 10.8%
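The CAGR arithmetic behind those figures is easy to check: compound the ten yearly growth rates and take the tenth root. A sketch using the USA series above:

```python
# Compound the yearly growth rates 2014-2023, then annualize:
# CAGR = (product of (1 + g_i)) ** (1/n) - 1
usa = [6.2, -5.3, -1.8, 15.2, -4.9, 4.5, -5.4, 6.7, 14.5, -3.6]

growth = 1.0
for pct in usa:
    growth *= 1 + pct / 100

cagr = growth ** (1 / len(usa)) - 1
print(f"{cagr:.1%}")  # 2.3%
```

This reproduces the quoted 2013-2023 CAGR of 2.3% for the USA; swapping in the China series recovers its figure the same way.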


Wow, 2017 was a good year


That "feels about right" IMO


Well, how does it compare with published numbers?


Individual yearly numbers are unlikely to be useful. You can probably only predict long-term trends with the help of fits.


0:10 into the video, that's light skinned to you?!

