badlibrarian's comments | Hacker News

Author uses a lot of odd, confusing terminology and brings CPU baggage to the GPU, creating the worst of both worlds. Shader hacks, CPU-bound partitioning, and choosing the Greek letter alpha as your accumulator in a graphics article? Oh my.

NV_path_rendering solved this in 2011. https://developer.nvidia.com/nv-path-rendering

It never became a standard but was a compile-time option in Skia for a long time. Skia of course solved this the right way.

https://skia.org/


> NV_path_rendering solved this in 2011.

By no means is this a solved problem.

NV_path_rendering is an implementation of the "stencil, then cover" (STC) method with a lot of CPU preprocessing.

It's also only available on OpenGL, not on any other graphics API.

The STC method scales very badly with increasing resolution, as it uses a lot of fill rate and memory bandwidth.

It mostly uses the GPU's fixed-function units (rasterizer and stencil test), leaving the "shader cores" practically idle.

There's a lot of room for improvement to get more performance and better GPU utilization.
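For anyone who hasn't seen the extension, the two passes look roughly like this - a sketch based on NVIDIA's whitepaper example, written from memory and not tested; you still need a GL context, extension loading, and a framebuffer with stencil bits:

    // "Stencil, then cover" with NV_path_rendering (sketch, not a drop-in program).
    #include <GL/glew.h>
    #include <cstring>

    void drawStarSTC() {
        // Build a path object from an SVG path string (the classic five-point star).
        GLuint path = glGenPathsNV(1);
        const char* svg = "M100,180 L40,10 L190,120 L10,120 L160,10 Z";
        glPathStringNV(path, GL_PATH_FORMAT_SVG_NV, (GLsizei)std::strlen(svg), svg);

        glEnable(GL_STENCIL_TEST);

        // Pass 1 ("stencil"): rasterize winding numbers into the stencil buffer.
        // Pure fixed-function work; no fragment shading happens here.
        glStencilFillPathNV(path, GL_COUNT_UP_NV, 0x1F);

        // Pass 2 ("cover"): shade conservative bounding geometry, keep only the
        // pixels whose stencil value marks them as inside, then reset the stencil.
        glStencilFunc(GL_NOTEQUAL, 0, 0x1F);
        glStencilOp(GL_KEEP, GL_KEEP, GL_ZERO);
        glColor3f(0.0f, 1.0f, 0.0f);
        glCoverFillPathNV(path, GL_BOUNDING_BOX_NV);

        glDeletePathsNV(path, 1);
    }

Both passes burn fill rate over the path's covered area, which is where the bad scaling with resolution comes from.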


While the author doesn't seem to be aware of the state of the art in the field, vector rendering is absolutely NOT a solved problem, whether on CPU or GPU.

Vello by Raph Levien seems to be a nice combination of what is required to pull this off on GPUs. https://www.youtube.com/watch?v=_sv8K190Zps


Yeah, I have high hopes for Vello to take off. I could throw away lots of hacks and caching and whatnot if I could do fast vector rendering reliably on the GPU.

I think Rive also does vector rendering on the GPU

https://rive.app/renderer

But it is not really meant (yet?) as a general graphics library, just a renderer for the Rive design tools.


AFAIK you can use the Rive renderer in your C++ app.

http://github.com/rive-app/rive-runtime


> While the author doesn't seem to be aware of state of the art in the field

The blog post is from 2022, though.


You know nothing.

Skia is definitely not a good example at all. Skia started as a CPU renderer, and added GPU rendering later, which heavily relies on caching. Vello, for example, takes a completely different approach compared to Skia.

NV path rendering is a joke. NVIDIA thought that ALL graphics would be rendered on the GPU within 2 years of that presentation; two decades later, 2D CPU renderers still shine.


I believe Skia's new Graphite architecture is much more similar to Vello

Right. The question is: does Skia grow its broad and useful toolkit with an eye toward further GPU optimization? Or does Vello (broadened and perhaps burdened by Rust and the shader-obsessive crowd) grow a broad and useful API?

There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution, but I'll leave those discussions to dark-gray Discord forums rendered by Skia in a browser.


> There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution

IMO, one of the biggest benefits of a high-performance renderer would be power savings (very important for laptops and phones). If I can run the same work but use half the power, then by all means I'd be happy to deal with the complications that the GPU brings. AFAIK though, no one really cares about that, and even efforts like Vello are just targeting fps gains, which do correlate with reduced power consumption, but only indirectly.


Adding power draw into the mix is pretty interesting. Just because a GPU can render something 2x faster in a particular test doesn't mean you have consumed 50% less power, especially when we talk about dedicated GPUs that can draw hundreds of watts.

Historically, 2D rendering on the CPU was pretty much single-threaded. Skia is single-threaded, Cairo too, Qt mostly (they offload gradient rendering to threads, but it's painfully slow for small gradients, worse than single-threaded), AGG is single-threaded, etc...

In the end, only Blend2D, Blaze, and now Vello can use multiple threads on the CPU, so finally CPU vs GPU comparisons can be made more fairly - and power draw is definitely a nice property for a benchmark. BTW Blend2D was probably the first library to offer multi-threaded rendering on CPU (just an option to pass to the rendering context, same API).
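To show what I mean by "just an option", here's a minimal sketch assuming the current Blend2D C++ API (error handling omitted, not benchmarked here):

    // Blend2D: the only change for multi-threaded rendering is the threadCount
    // passed at context creation - the drawing API stays exactly the same.
    #include <blend2d.h>

    int main() {
        BLImage img(1024, 1024, BL_FORMAT_PRGB32);

        BLContextCreateInfo info {};
        info.threadCount = 4;              // 0 = classic single-threaded rendering

        BLContext ctx(img, info);
        ctx.clearAll();
        ctx.setFillStyle(BLRgba32(0xFF3366CCu));
        ctx.fillCircle(512.0, 512.0, 300.0);
        ctx.end();                         // finalizes rendering (waits for workers)

        img.writeToFile("blend2d_mt.png");
        return 0;
    }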

As far as I know, nobody has done good benchmarking between CPU and GPU 2D renderers - it's very hard to do a completely unbiased comparison, and you would be surprised how good the CPU is in this mix. A modern CPU core consumes maybe a few watts, and you can render to a 4K framebuffer with that single core. Put text rendering into the mix and the numbers start to get very interesting. GPU memory allocation should also be included, because rendering fonts on the GPU means pre-processing them as well, etc...

2D is just very hard. On CPU and GPU you would be solving slightly different problems, but doing it right takes an insane amount of work, research, and experimentation.


It's not a formal benchmark, but my browser engine / webview (https://github.com/DioxusLabs/blitz/) has pluggable rendering backends (via https://github.com/DioxusLabs/anyrender), with Vello (GPU), Vello CPU, and Skia (various backends incl. Vulkan, Metal, OpenGL, and CPU) currently implemented.

On my Apple M1 Pro, the Vello CPU renderer is competitive with the GPU renderers on simple scenes, but falls behind on more complex ones, and it especially seems to struggle with large raster images. This is also without a glyph cache (so it re-rasterizes every glyph every time, although there is a hinting cache), which isn't implemented yet. It depends on multi-threading being enabled and can consume largish portions of all cores while it runs. Skia raster (CPU) gets similarish numbers, which is quite impressive if that is single-threaded.


I think Vello CPU would always struggle with raster images, because it does a bounds check for every pixel fetched from a source image. They have at least described this behavior somewhere in Vello PRs.

The obsession with memory safety just doesn't pay off in some cases - if you can batch 64 pixels at once with SIMD, it just cannot be compared to a per-pixel loop that has a branch in its hot path.
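Roughly the difference between these two loops (a made-up illustration of the general point, not Vello's actual code):

    // Copying a horizontal span of source pixels into a destination scanline.
    #include <algorithm>
    #include <cstdint>
    #include <cstring>

    // Per-pixel variant: a bounds check (branch) on every single fetch.
    void fetch_checked(uint32_t* dst, const uint32_t* src, int srcWidth, int x0, int count) {
        for (int i = 0; i < count; i++) {
            int sx = x0 + i;
            dst[i] = (sx >= 0 && sx < srcWidth) ? src[sx] : 0u;
        }
    }

    // Span variant: clamp the range once up front; the inner copy is branch-free
    // and trivially vectorizable (or just a memcpy), many pixels per iteration.
    void fetch_clamped(uint32_t* dst, const uint32_t* src, int srcWidth, int x0, int count) {
        std::memset(dst, 0, static_cast<std::size_t>(count) * sizeof(uint32_t)); // out-of-bounds -> 0
        int begin = std::max(x0, 0);
        int end = std::min(x0 + count, srcWidth);
        if (begin < end)
            std::memcpy(dst + (begin - x0), src + begin,
                        static_cast<std::size_t>(end - begin) * sizeof(uint32_t));
    }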


It's an argument you can make in any performance effort. But I think the "let's save power using GPUs" ship sailed even before Microsoft started buying nuclear reactors to power them.

So what is the right way that Skia uses? Why is there still discussion on how to do vector graphics on the GPU right if Skia's approach is good enough?

Not being sarcastic, genuinely curious.


The major unsolved problem is real-time, high-quality text rendering on the GPU. Skia just renders fonts on the CPU with all kinds of hacks (https://skia.org/docs/dev/design/raster_tragedy/) and then draws them as textures.

Ideally, we want as much as possible rendered on the GPU, with support for glyph layout too. This is not at all trivial, especially for complex scripts like Devanagari.

In a perfect world, we want to be able to create a 3D cube and just have the renderer put text on one of its faces, and have it rendered perfectly as you rotate the cube.


I know lots of broke-ass people who manage to travel and have a cup of coffee while there. It's choices, not privilege. Author of the piece sure is insufferable, though.

Buffett's restraint was legendary and his transparency even more so.

Bill Gates also initially dismissed him, thinking he had nothing to learn.

General Electric also tried to "make a number go up" and effed up the insurance part despite having Buffett as a model and putting 10,000 people through their custom management training facility every year.

Raise a glass to the man and read his letters.


Almost. A modem sometimes had a phone jack as well as a coupler, for those cases when the handset was hardwired into the phone and the phone was hardwired into the wall.

We tapped where we could and we were happy. Bonus points if the rotary phone had a lock on it and you dialed out by pulsing the hangup switch.


Often, one could dial out by pulsing the on-hook switch on any phone. Ask me how I know. That was such a fun discovery! I did it frequently from many different phones.


I thought I was being clever by coining the term "non-invasive phrenology" but it appears people are already using it non-ironically.


In many ways old-school bump measurement is actually less invasive


("wallet biopsy" is another fun term if you haven't encountered it)


Cashectomy.


I saw Parvizi say this in a talk back in 2019!


The Fourier Transform isn't even Fourier's deepest insight. Unless we're now ranking scientific discoveries based on whether or not they get a post every weekend on HN.

The FFT is nifty but that's FINO. The Google boys also had a few O(N^2) to O(N log N) moments. Those seemed to move the needle a bit as well.

But even if we restrict to "things that made Nano Banana Pro possible" Shannon and Turing leapfrog Fourier.


>Unless we're now ranking scientific discoveries based on whether or not they get a post every weekend on HN.

Glad I'm not the only one who noticed there is a weekly (or more) post on what the Fourier transform is.


It's really getting in the way of all the daily AI opinion pieces I come here to read.

More seriously, there are tens of thousands of people who come to HN. If Fourier stuff gets upvoted, it's because people find it informative. I happen to know the theory, but I wouldn't gatekeep.


I'm reminded of the 1983 deal to corner the market on Frozen Concentrated Orange Juice.


Or the Hunt brothers and silver which was just a few years before that.

How'd that turn out? https://en.wikipedia.org/wiki/Silver_Thursday#:~:text=On%20J...


That's a great comparison. The consequences are pretty universal too. History implies this won't end well for OpenAI.


I remember it didn’t work out well for Randolph and Mortimer. Sam may pull it out, though, if he just sells the DRAM now while the market is still hot.


"Mortimer ... we're back!"


Sell! Sell! Get back in there and sell!


That's not the point of the Identity. You exponentiated the beauty right out of it.


Beauty is in the eye of the beholder.

Instead of shoehorning it into an arbitrary symbol salad by gimping its generality, I prefer the one which makes a statement: "What does it mean to apply inversion partially?"


Which would be e^(i*tau) - 1 = 0 if you wanted to honor the spirit of the Identity.


It's not the 19th Century. You don't need to punch holes in cards to help the machine "think" any more.


> You don't need to punch holes in cards to help the machine "think" any more.

That's literally what "prompt engineering" is, though.


"Transpose this MIDI file down a third" requires neither a specialized data format nor fancy prompt engineering. ChatGPT asked: "A) Major third up (+4 semitones) or B) Minor third up (+3 semitones)" then did it.


I still don't understand how this or the top-level comment is related to the post.

I also don't get how you can claim we don't have to 'punch holes in cards to help the machine "think"', and also mention a MIDI file in your next comment. MIDI is much closer to punch cards than the proposed file format in the post.

