Be sure to turn on "pedantic mode" to get the footnotes that make this easier to follow. Some examples of what's meant by "applications" here would help. I don't think the prediction is that Excel's main event loop is going to run on the GPU, but I can see that its calculation engine might.
With current GPU architectures, this seems unlikely. Like, you would need a ton of cells with almost perfectly aligned inputs before the speedup even pays for the DMA bus roundtrip.
We’re talking at least hundreds of thousands of cells, depending on the calculation; in any case, a number that will make the UI very sad long before you’d notice the calculation itself slowing down.
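For a rough sense of where that crossover sits, here's a back-of-envelope sketch. Every constant in it is an assumption (roughly 25 GB/s of effective PCIe bandwidth, ~100 µs of fixed launch/sync/readback overhead, a couple of nanoseconds of CPU work per cell, a few dozen bytes moved per cell), and the crossover moves wildly if you change any of them:

```cuda
// Illustrative break-even estimate for shipping a spreadsheet
// recalculation across the bus. All constants are assumptions,
// not measurements; the point is the shape of the comparison.
#include <cstdio>

int main() {
    const double bus_bw         = 25e9;    // assumed effective PCIe 4.0 x16 bandwidth, bytes/s
    const double fixed_cost     = 100e-6;  // assumed launch + sync + readback overhead, seconds
    const double cpu_per_cell   = 2e-9;    // assumed CPU cost per simple formula, seconds
    const double bytes_per_cell = 32;      // assumed input/output bytes moved per cell

    for (long long n = 1000; n <= 10000000; n *= 10) {
        const double cpu_time = n * cpu_per_cell;
        // Treat the GPU's own arithmetic as ~free; the roundtrip dominates.
        const double gpu_time = fixed_cost + n * bytes_per_cell / bus_bw;
        std::printf("%9lld cells: CPU %9.1f us   GPU roundtrip %9.1f us%s\n",
                    n, cpu_time * 1e6, gpu_time * 1e6,
                    gpu_time < cpu_time ? "   <- offload starts to pay" : "");
    }
    return 0;
}
```

With those particular assumptions the offload only starts to pay somewhere past the hundred-thousand-cell mark, which is already deep into "UI is very sad" territory.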
There isn't always a DMA roundtrip; unified memory is a thing. But programming for the GPU is very awkward at a systems level. Even with unified memory, there is generally no real equivalent to virtual memory or mmap(), so you have to shuffle your working set in and out of VRAM by hand anyway (i.e. backing and residency are managed explicitly, even with the "sparse" allocation APIs that might otherwise be expected to ease some of that work). Better GPU drivers may be enough to mitigate this, along with broad-based standardization of some currently vendor-specific extensions (it's not clear that real HW changes are needed), but this creates a very real limitation on the scale of software (including the AI kind) you can realistically run on any given device.
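To make the "shuffle it by hand" part concrete, here's a minimal sketch of the usual pattern with the CUDA runtime API (`scale_chunk` is a hypothetical stand-in for the real work): carve the dataset into chunks, copy a chunk into a fixed-size device buffer, run on it, copy the results back, and repeat, because backing and residency are your problem rather than something a pager handles for you.

```cuda
// Minimal sketch of hand-managed residency: the dataset is larger than
// what we want resident in VRAM, so we stream it through a fixed-size
// device buffer chunk by chunk.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

__global__ void scale_chunk(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder work
}

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

int main() {
    const size_t total_elems = 1u << 26;   // "working set": 64M floats (~256 MB) in host memory
    const size_t chunk_elems = 1u << 22;   // only ~16 MB is ever resident in VRAM at once

    std::vector<float> host(total_elems, 1.0f);

    float* dev = nullptr;
    CHECK(cudaMalloc(&dev, chunk_elems * sizeof(float)));

    for (size_t off = 0; off < total_elems; off += chunk_elems) {
        const size_t n = std::min(chunk_elems, total_elems - off);

        // Residency is managed explicitly: copy the chunk in ...
        CHECK(cudaMemcpy(dev, host.data() + off, n * sizeof(float), cudaMemcpyHostToDevice));

        // ... run on it ...
        const int threads = 256;
        const int blocks  = (int)((n + threads - 1) / threads);
        scale_chunk<<<blocks, threads>>>(dev, n);
        CHECK(cudaGetLastError());

        // ... and copy results out before the next chunk reuses the buffer.
        CHECK(cudaMemcpy(host.data() + off, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    }

    CHECK(cudaFree(dev));
    std::printf("host[0] = %f\n", host[0]);   // expect 2.0
    return 0;
}
```

It works, but it's exactly the kind of bookkeeping the virtual memory system does for free on the CPU side, and it's what caps how big a working set you can practically push through any given device.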