Be sure to turn on "pedantic mode" to get the footnotes that make this easier to follow. Some examples of what's meant by "applications" here would help. I don't think the prediction is that Excel's main event loop is going to run on the GPU, but I can see that its calculation engine might.
With current GPU architectures, this seems unlikely. Like, you would need a ton of cells with almost perfectly aligned inputs before the speedup even pays for the DMA bus roundtrip.
We’re talking at least hundreds of thousands of cells, depending on the calculation; in any case, a number that will make the UI very sad long before you’d notice the calculation itself slowing down.
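For a rough sense of where that crossover sits, here's a back-of-envelope sketch. Every constant in it is an assumption (roughly 25 GB/s of effective PCIe bandwidth, ~100 µs of fixed launch/sync/readback overhead, a couple of nanoseconds of CPU work per cell, a few dozen bytes moved per cell), and the crossover moves wildly if you change any of them:

```cuda
// Illustrative break-even estimate for shipping a spreadsheet
// recalculation across the bus. All constants are assumptions,
// not measurements; the point is the shape of the comparison.
#include <cstdio>

int main() {
    const double bus_bw         = 25e9;    // assumed effective PCIe 4.0 x16 bandwidth, bytes/s
    const double fixed_cost     = 100e-6;  // assumed launch + sync + readback overhead, seconds
    const double cpu_per_cell   = 2e-9;    // assumed CPU cost per simple formula, seconds
    const double bytes_per_cell = 32;      // assumed input/output bytes moved per cell

    for (long long n = 1000; n <= 10000000; n *= 10) {
        const double cpu_time = n * cpu_per_cell;
        // Treat the GPU's own arithmetic as ~free; the roundtrip dominates.
        const double gpu_time = fixed_cost + n * bytes_per_cell / bus_bw;
        std::printf("%9lld cells: CPU %9.1f us   GPU roundtrip %9.1f us%s\n",
                    n, cpu_time * 1e6, gpu_time * 1e6,
                    gpu_time < cpu_time ? "   <- offload starts to pay" : "");
    }
    return 0;
}
```

With those particular assumptions the offload only starts to pay somewhere past the hundred-thousand-cell mark, which is already deep into "UI is very sad" territory.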
There isn't always a DMA roundtrip; unified memory is a thing. But programming for the GPU is very awkward at a systems level. Even with unified memory, there is generally no real equivalent to virtual memory or mmap(), so you have to shuffle your working set in and out of VRAM by hand anyway (i.e. backing and residency are managed explicitly, even with the "sparse" allocation APIs that might otherwise be expected to ease some of that work). Better GPU drivers may be enough to mitigate this, along with broad-based standardization of some currently vendor-specific extensions (it's not clear that real HW changes are needed), but this creates a very real limitation on the scale of software (including the AI kind) you can realistically run on any given device.
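To make the "shuffle it by hand" part concrete, here's a minimal sketch of the usual pattern with the CUDA runtime API (`scale_chunk` is a hypothetical stand-in for the real work): carve the dataset into chunks, copy a chunk into a fixed-size device buffer, run on it, copy the results back, and repeat, because backing and residency are your problem rather than something a pager handles for you.

```cuda
// Minimal sketch of hand-managed residency: the dataset is larger than
// what we want resident in VRAM, so we stream it through a fixed-size
// device buffer chunk by chunk.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <vector>

__global__ void scale_chunk(float* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // placeholder work
}

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e)); return 1; } } while (0)

int main() {
    const size_t total_elems = 1u << 26;   // "working set": 64M floats (~256 MB) in host memory
    const size_t chunk_elems = 1u << 22;   // only ~16 MB is ever resident in VRAM at once

    std::vector<float> host(total_elems, 1.0f);

    float* dev = nullptr;
    CHECK(cudaMalloc(&dev, chunk_elems * sizeof(float)));

    for (size_t off = 0; off < total_elems; off += chunk_elems) {
        const size_t n = std::min(chunk_elems, total_elems - off);

        // Residency is managed explicitly: copy the chunk in ...
        CHECK(cudaMemcpy(dev, host.data() + off, n * sizeof(float), cudaMemcpyHostToDevice));

        // ... run on it ...
        const int threads = 256;
        const int blocks  = (int)((n + threads - 1) / threads);
        scale_chunk<<<blocks, threads>>>(dev, n);
        CHECK(cudaGetLastError());

        // ... and copy results out before the next chunk reuses the buffer.
        CHECK(cudaMemcpy(host.data() + off, dev, n * sizeof(float), cudaMemcpyDeviceToHost));
    }

    CHECK(cudaFree(dev));
    std::printf("host[0] = %f\n", host[0]);   // expect 2.0
    return 0;
}
```

It works, but it's exactly the kind of bookkeeping the virtual memory system does for free on the CPU side, and it's what caps how big a working set you can practically push through any given device.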