> The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if you offload all MoE layers to system RAM (or a fast SSD). With ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is 630GB and typically requires at least 4× H200 GPUs.
If the model fits, you will get >40 tokens/s when using a B200.
To run the model in near full precision, use the 4-bit or 5-bit quants. Any higher quant also works if you want extra headroom.
For strong performance, aim for >240GB of unified memory (or combined RAM+VRAM) to reach 10+ tokens/s. Below that it still works (llama.cpp can run via mmap/disk offload), but speed may fall from ~10 tokens/s to <2 tokens/s.
We recommend UD-Q2_K_XL (375GB) as a good size/quality balance. Best rule of thumb: RAM+VRAM ≈ the quant size; otherwise it’ll still work, just slower due to offloading.
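That rule of thumb can be sketched in a few lines of Python (a toy estimate, not an official sizing tool; sizes are the ones quoted in this thread, and real overhead like KV cache and context is ignored):

```python
# Toy check of the "RAM + VRAM ≈ quant size" rule of thumb.
def runs_without_disk_offload(quant_gb: float, vram_gb: float, ram_gb: float) -> bool:
    """True if the quant fits in combined memory, so llama.cpp
    shouldn't need to stream weights from disk via mmap."""
    return quant_gb <= vram_gb + ram_gb

# UD-Q2_K_XL (375GB) on a 24GB GPU + 256GB RAM box:
print(runs_without_disk_offload(375, 24, 256))  # False -> expect well under 10 tok/s
# Same quant with 512GB of system RAM:
print(runs_without_disk_offload(375, 24, 512))  # True
```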
I'm running the Q4_K_M quant on a Xeon with 7x A4000s and I'm getting about 8 tok/s with small context (16k). I need to do more tuning, I think I can get more out of it, but it's never gonna be fast on this suboptimal machine.
You can add one more GPU so you can take advantage of tensor parallel. I get the same speed with five 3090s with most of the model on 2400MHz DDR4 RAM, 8.5 tok/s almost constant. I don't really do agents, just chat, and it holds up to 64k.
That is a very good point and I would love to do it, but I built this machine in a desktop case and the motherboard has seven slots. I did a custom water cooling manifold just to make it work with all the cards.
I'm trying to figure out how to add another card on a riser hanging off a SlimSAS port, or maybe I could turn the bottom slot into two vertical slots. The case (Fractal Meshify 2 XL) has room for a vertically mounted card that wouldn't interfere with the others, but I'd need to make a custom riser with two slots on it to make it work. I dunno, it's possible!
I also have an RTX Pro 6000 Blackwell and an RTX 5000 Ada. I'd be better off pulling all the A4000s and throwing both of those cards in this machine, but then I wouldn't have anything for my desktop. Decisions, decisions!
Both making and removing regulation boosts my business, as my clients care about changes. That said, I assure you that one regulation getting made out of millions has no effect on my bottom line.
The drone thing is a personal opinion. If the US ends up in a war (whether it’s one I agree with or not, likely not), I don’t want millions of drones to be remote controllable by the folks we’re fighting.
I'm honestly much more worried about the fact that China has access to production lines for zillions of the things than what they'd do with existing ones, but I did make the comment so I'll run with it =).
Let's put on our fun James Bond villain hats for a bit.
The US has around 1.75MM drones that people have bothered to register. DJI has around 75% of that, so call it 1.25MM. This registration program is relatively new so let's say 750K of those are still operable.
How many of those are in the air at any given time? Keep in mind many of these bigger registered drones are used by businesses.
Let's say it's 1%, so 7,500 drones suddenly open some backdoor and get commanded to do a nose dive for the nearest power line. Now add in the smaller ones that are less likely to do damage, but there are 10x as many. Now combine it with a simultaneous cyber attack on infrastructure, and some pre-planned terror attacks.
Is it going to end the country? Of course not. Is there potential for that to cause huge chaos? I think so.
Is that more absurd than the Hezbollah pager bombings? I don't think so.
So yeah, I'd pay more for my drones, my cars, my cell phone towers, etc etc to avoid them being controlled by a country that we might end up in a stupid war with. I'm not saying you can make everything locally in the modern world, that's absurd. But there are valid strategic and natsec concerns about the US/China trade relationship in 2025.
> Is that more absurd than the Hezbollah pager bombings? I don't think so.
OMG, get serious. DJI can't blow up the drones. Mine is in my closet, not my pocket.
Again, this is just silly. Even for James Bond! ;p
It's more of a "We have to do something!!" reaction that reminds me of cities in California having moratoriums on building new housing -- who would have thought that people would want to live somewhere with a nice climate? But really it was about a new kind of neighbor...
Ratings are heavily criticized by artists, e.g. as being fueled by conservative moms. For example, in the USA, movies with guns and explosions can be shown to younger audiences than movies with nudity, which seems very illogical.
Also, some anecdotes: lots of my friends were into GTA as kids, i.e. early teens, and turned out fine. Comparing them to the kids who didn't do so well, I consider the most important factors to be family, education, and finances, not violent media.
That being said, I'm sympathetic to limiting internet access because of communication with strangers and extreme content (e.g. violent rhetoric that incites real action, as opposed to fantasy violence).
Okay. Society isn’t asking you to police how parents choose to parent. Not like this. It is reasonable for someone to want to be able to buy something advertised as having a certain feature without it being implemented with malicious deception. Nobody wants to have the “are video games good or bad?” debate again.
So many parental opinions on here. Not every kid is the same. Trying to apply blanket parental strategies speaks of ignorance. I have neurodivergent kids and this could be great for them.
I bought some ebooks from other vendors to avoid lock-in and side-loaded them onto my Kindle. Last year, if Amazon also sold one of those titles, my copy would disappear whenever I turned on wifi. I now have a Kobo.
I've done live demos of AI. Even with the same queries, I got different answers than in my four previous practice attempts. Demos keep me on my toes, and I limit the scope much more now.
(I didn't have control over temperature settings.)
> (I didn't have control over temperature settings.)
That's... interesting. You'd think they'd dial the temperature down to 0 for you before the demo at least. Regardless, if the tech is good, I'd hope all the answers are at least decent and you could roll with it. If not... then maybe it needs to stay in R&D.
Reducing temperature to 0 doesn't make LLMs deterministic. There are still other issues, such as floating-point results depending on the order in which you perform operations that are commutative and associative in exact arithmetic.
It gets more complicated with things like batch processing. Depending on where in the batch your query gets placed, how the underlying hardware works, and how the software stack was implemented, you might get small differences that compound over many generated tokens. (vLLM, a popular inference engine, has this problem as well.)
The associative property of multiplication breaks down with floating-point math because of rounding error. If the engine is multithreaded, it's pretty easy to see how the ordering of operations can change, which can change the output.
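The order-dependence described above is easy to see in plain Python with ordinary IEEE 754 doubles, no LLM needed:

```python
# Floating-point addition is commutative but not associative:
# regrouping the same three numbers changes how rounding accumulates.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False

# The same effect with large magnitude differences: evaluation order
# decides whether the small term survives at all.
print(1e16 + 1.0 - 1e16)  # 0.0 -- the 1.0 is lost to rounding
print(1e16 - 1e16 + 1.0)  # 1.0
```

A multithreaded reduction (e.g. summing dot-product partials across threads) effectively picks one of these orderings nondeterministically, which is how identical prompts can diverge.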
For me it was the lack of confirmation from the backend. Back when it was the next big thing, it sent changes to the backend without waiting for a response. That made the interface crazy fast, but I just couldn't take the risk of the frontend being out of sync with the backend. I hope they grew out of that model, but I never took it seriously for that one reason.
Yeah, I built my first startup on Meteor, and the prototype for my second one, but there were so many weird state bugs after the app got more complicated that we eventually had to switch back to normal patterns to scale it.
Requirements are listed.