2020-global/talks/03_LATAM/02-glowcoil.txt

      Ochre: Highly portable GPU accelerated vector graphics
      Micah
      
       Inaki: Welcome back, everyone, for our second talk. I would like to present Micah. Micah is a software engineer who works in the game industry by day and is interested in Rust's potential for applications from game development to music software and visual art. He will be presenting Ochre, a vector graphics and text rendering library that's designed to be portable and GPU accelerated.
      >> Glowcoil shows how vectors can act to create a great UI. In fact, they are easy to do on a slow GPU! And they won't fall together with stacked.
      >> Okay. Looks like we have a...
      >> Glowcoil shows how vectors can ask, and in fact they are easy to do on a slow GPU and they won't fall together when stacked.
      >> So, I'm really sorry about this little hiccup. Let me finish the introduction. So, yes. As he was saying, Micah works as a software engineer in the games industry, does music and also GPU accelerated software. And if I understand correctly, you do also portable stuff. So, you work on mobile phones.
      Micah: Portable is more intended in the sense of just working on different operating systems, potentially mobile, potentially in the browser as well.
      >> Cool. Well, with the intro, thank you very much and take it away.
      Micah: Thanks. Hi, I'm Micah Johnston. I also go by the username glowcoil. Today I'm presenting a project I have been working on called Ochre, which is a GPU accelerated vector graphics and text rendering library for Rust. And the primary use case that I've intended Ochre for is UI rendering. So, first, I'm gonna try to answer the question of why I'm making a vector graphics renderer in the first place. So, I would make the claim that vector graphics is by far the dominant type of representation for graphical content in user interfaces today. We use it for our font formats as well as emoji. And HTML and CSS basically comprise a vector format, and a pretty massive amount of UIs use HTML, CSS and the browser rendering engine today.
      And then beyond that, most OS platform UI toolkits and cross platform UI toolkits pretty much all use vector graphics at this point. And that's for some important reasons. So, first of all, it's just more space efficient to store the vector form of something than the image form, especially at higher resolutions. But beyond that, it's a resolution independent format. So, suppose you want your app to run on both a retina MacBook and an older 1080p monitor. If you're using images, you have to export a new image for the retina MacBook, whereas vector graphics can be rendered from the same source at different pixel densities. That's a powerful benefit. Beyond that, it's just a good toolkit if you have anything that's layout dependent based on the size of your window, or if you're doing something like a waveform visualizer or a line graph in a data visualization program. Vector graphics is a really good toolkit for procedural visualizations like that.
      So, we would like to have a vector graphics renderer for our UIs. Now I will try to answer the question, why does it need to be GPU accelerated? There are two trends over the past 15, 20 years or so that are the reason I would say GPU acceleration is important for UIs. The first is that the resolution of screens is going up. This is a visualization of iPhone sizes over time. This is a drastic increase, and it doesn't include the latest iPhone, which is like 2700x1100 pixels. That's a lot of pixels that we need to render every frame. If you want your app to run at a smooth 60 frames per second, you have to render a lot more pixels every second. And that means using more computing power.
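      To put a rough number on that, here is a back-of-the-envelope sketch of the pixel throughput involved. The function name is made up for illustration, and the resolution is the approximate figure quoted above.

```rust
/// Number of pixels that have to be painted every second to redraw a
/// full screen of `width` x `height` pixels at `refresh_hz` frames per second.
fn pixels_per_second(width: u64, height: u64, refresh_hz: u64) -> u64 {
    width * height * refresh_hz
}
```

      For a 2700x1100 screen at 60 frames per second, `pixels_per_second(2700, 1100, 60)` comes out to 178,200,000 pixels per second, and a 1920x1080 monitor at the same rate needs 124,416,000.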
      So, at the same time we don't want to drain the battery of a laptop or a mobile phone. And you need to hit that 60 frames per second deadline if you want to have a smooth app. So, the other important trend is that GPUs have become really ubiquitous in consumer hardware. So, you can get a lot more computation done both per second and per watt with a GPU than with a single consumer CPU core. So, this lets you be both more efficient, so you hit your frame deadlines, and also more power efficient, so that you don't drain the battery or use too much power just for rendering something simple like a UI, because presumably you would like to use the rest of your CPU for other applications. Your app is presumably not just a UI.
      So, that's because of increasing resolutions, and you should note that refresh rates are also starting to become an issue, because 120 hertz and 144 hertz monitors are starting to enter the market. So, that's even more pixels you have to paint per second. GPUs are really good for highly parallel tasks. And a lot of aspects of 2D rendering happen to be highly parallel, since in a lot of tasks you do the same operation per pixel. So, it's a good fit for GPUs. The set of GPUs available in consumer hardware is a good fit for UIs. And as kind of evidence of this, both macOS and Windows have been using GPU hardware to accelerate basically the step of taking the windows you have open and painting them onto one frame on your monitor, which is an operation called compositing. Both have been doing that for a while: Mac since 2004 and Windows since 2007 with Vista. In addition to that, browsers are also increasingly taking advantage of the GPU.
      So, there's evidence that it is a good idea to use GPU acceleration to render the UI for both efficiency and power efficiency reasons. So, hopefully I've convinced you that a GPU accelerated vector renderer is a desirable thing to have for a UI. But using the GPU comes with a catch. That is that you can't write a single program and run it on every GPU out in the wild. There is a big variety of manufacturers and then of APIs and platforms which give you some subset of the APIs, but not all of them. So, I have this table here showing which APIs are available on which operating systems.
      And there isn't really an API that has full coverage of all the operating systems you might want to target. OpenGL looks promising. But it's officially deprecated on Apple platforms. And even before that, only an older version with more limited features was supported. And this table actually looks a bit more rosy than the real situation, because Vulkan and Metal are only available on sufficiently recent hardware: if you have a 10 year old desktop, it may not be able to run them. When you pick a GPU API, you have to look at which platforms you are willing to target and which users you are going to exclude from your application. So, portability is kind of a hard question. You have to figure out how you're going to make your application use different APIs on different platforms if you do want to be cross platform.
      Probably the hardest part of this is that when you write code that runs on the GPU, you use what's called a shading language. And each of those different APIs has a different shading language. So, either you're going to have to rewrite your shaders for each platform, or you're going to have to figure out some cross compilation setup where your build system includes a shader compiler, which increases complexity, and you have to negotiate platform specific features when you're doing so. It's a big headache. The approach I have taken with Ochre is just to choose a small subset, the smallest subset of GPU features that are going to be available pretty much anywhere, that are still going to let us leverage GPU performance advantages and still let us do the UI graphics that we would like to do.
      So, to get into a little bit more detail about what a vector renderer has to do, there are two aspects. The first aspect is that you take the shapes. For instance, a font glyph, a letter in a font, basically defines a solid region bounded by a curve, and the renderer has to determine which pixels are inside or outside that curve and then fill them in with the appropriate color, whether that's a solid color or a gradient.
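      One common way to make the inside/outside decision concrete is a winding number test. The sketch below assumes the curve has already been flattened into a closed polygon and uses the nonzero fill rule; the type and function names are invented for illustration and are not taken from Ochre.

```rust
#[derive(Clone, Copy)]
struct Point {
    x: f32,
    y: f32,
}

/// Winding number of the closed polygon `poly` around the point `p`:
/// how many times the outline wraps around `p`, counted with sign.
fn winding_number(poly: &[Point], p: Point) -> i32 {
    let mut winding = 0;
    for i in 0..poly.len() {
        let a = poly[i];
        let b = poly[(i + 1) % poly.len()];
        // Positive when `p` lies to the left of the directed edge a -> b.
        let is_left = (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
        if a.y <= p.y {
            // Edge crosses the scanline through `p` going upward.
            if b.y > p.y && is_left > 0.0 {
                winding += 1;
            }
        } else if b.y <= p.y && is_left < 0.0 {
            // Edge crosses the scanline through `p` going downward.
            winding -= 1;
        }
    }
    winding
}

/// Nonzero fill rule: a point is inside when the winding number is not zero.
fn is_inside(poly: &[Point], p: Point) -> bool {
    winding_number(poly, p) != 0
}
```

      A renderer would evaluate something like this per pixel (or, for anti aliasing, compute fractional coverage instead of a yes/no answer) and then write the fill color to the pixels that pass.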
      And the other aspect is taking multiple shapes like that and compositing them in order using what's called the painter's algorithm. It's called that because later things that you draw paint over earlier things. The second step, compositing, on the left here, GPUs are really good at it. And as I mentioned before, this is what operating systems and browsers have been making a lot of use of GPUs for, for a long time now. It's an interesting thing to do with GPUs, they're good at it. It's much more power efficient than a CPU. On the other hand, the operation which I will call painting, determining inside and outside of the shape like this doesn't come naturally to GPUs since they kind of only natively speak in triangles. So, you have to somehow translate these types of shapes into triangles in one way or another for the GPU to understand.
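      The compositing step can be sketched in a few lines. This assumes premultiplied alpha colors and the standard "source over" blend; the names are hypothetical, and a GPU would run the same arithmetic per pixel in its blending hardware.

```rust
/// A color with premultiplied alpha: r, g and b are already scaled by a.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgba {
    r: f32,
    g: f32,
    b: f32,
    a: f32,
}

/// "Source over" blend: paint `src` on top of `dst`.
fn over(src: Rgba, dst: Rgba) -> Rgba {
    let k = 1.0 - src.a; // how much of the destination still shows through
    Rgba {
        r: src.r + dst.r * k,
        g: src.g + dst.g * k,
        b: src.b + dst.b * k,
        a: src.a + dst.a * k,
    }
}

/// Painter's algorithm for one pixel: layers are applied in draw order,
/// so later layers paint over earlier ones.
fn composite(background: Rgba, layers: &[Rgba]) -> Rgba {
    layers.iter().fold(background, |dst, &src| over(src, dst))
}
```

      With two opaque layers the last one wins outright; with a half transparent layer on top, half of whatever was underneath shows through.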
      So, that's the hard part. There are a lot of different approaches to doing so. So, I have this kind of spectrum here from renderers that use more CPU to renderers that use more GPU. This is a huge oversimplification. This is super subjective. And in different ways, you could argue that they should be in different orders. So, this is just intended to kind of give a broad overview. So, on one end we have doing the rendering entirely on the CPU, and then there's tessellation, which busts the shape apart into triangles and shovels them over to the GPU to be rendered. It's robust, it's simple. It does work well for performance. But there are some big downsides. Namely that GPUs aren't capable of rendering triangles with the kind of anti aliasing that you need for small text. You have to do a hybrid approach, with text rendered on the CPU and maybe the bigger shapes rendered on the GPU. And there's another approach called stencil and cover. It uses a feature of the GPU called the stencil buffer. Rather than doing the kind of mathematically hard work of breaking a curve into triangles, it picks a point and draws a fan of triangles out from that point in such a way that they all cancel each other out to leave only the points inside the shape filled and the points outside the shape empty.
      And it has the same disadvantages as tessellation. It's less work on the CPU, but it uses more of the GPU's rasterization. It's a tradeoff, it's not a pure win. There's an example of a library that does that called NanoVG. It uses the stencil and cover approach and takes a hybrid approach with text rendering: text on the CPU and shapes on the GPU. It's similar to Ochre in that it's a minimal library focused on portability and working in as many situations as possible. Moving further down, there are more complicated approaches such as Pathfinder, which is a Rust library written by Patrick Walton. It was probably the number one inspiration for Ochre. I'm not sure I would have written Ochre without Pathfinder, so I have to give big acknowledgments to Patrick for that.
      And it's a refinement of the stencil and cover approach that does a lot of the work on the CPU, so you only have to do work near the edges rather than over the big parts of the shape. It's a CPU/GPU hybrid, with work in both places. It can outperform rasterizers like Cairo. But it's a hybrid approach like that, it's not pure GPU. And then the next item on my list here is a vector textures architecture.
      It works kind of differently. The way it works is you have a CPU pre process, and then you can render the result many times on the GPU from many angles, for instance in a 3D scene. So, it offloads a lot of the work to the GPU. But it has performance tradeoffs more appropriate to a game than a UI. The reason for that is the end to end render time, from loading a font or generating a scene from scratch to processing it, uploading it, and rendering it on the GPU, is sometimes not even really faster than full on CPU rasterization. So, I don't consider this a good approach for UIs. But it is a good approach for other situations such as games.
      And then finally, last on the list, we have a pure GPU compute renderer, which uploads the scene as a data structure to the GPU and uses modern GPU compute features to render it from scratch there. You'll notice Pathfinder also appears here, because Patrick Walton in recent months developed another renderer for Pathfinder that uses GPU compute. And there's another Rust library, piet-gpu by Raph Levien. And they have impressive performance with a high end GPU. They scale up to really using the GPU. But there's kind of this central tradeoff here. As you use the GPU better, you can scale up better with bigger GPUs. But it makes it harder to work on older hardware and makes it harder to port between different APIs. And, as I understand it, this is why Pathfinder has both renderers, because the non compute renderer is trying to achieve more portability. Ochre is trying to strike a balance on this tradeoff. It's an attempt to strike a balance closer to tessellation, but also get higher quality so you can render text with it.
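      The cancelling-out trick in stencil and cover can be simulated on the CPU for a single sample point. This is only a sketch under some assumptions: the outline is already a polygon, the fan pivot is simply the first vertex, and the stencil operation is "invert", which gives the even-odd fill rule. All names are hypothetical. On a real GPU the rasterizer does the triangle test and the stencil buffer holds one such bit per pixel.

```rust
#[derive(Clone, Copy)]
struct Point {
    x: f32,
    y: f32,
}

/// Signed area test: positive when `p` is to the left of the edge a -> b.
fn edge(a: Point, b: Point, p: Point) -> f32 {
    (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

/// True when `p` lies inside triangle (a, b, c), for either winding order.
/// Zero-area triangles cover nothing.
fn in_triangle(a: Point, b: Point, c: Point, p: Point) -> bool {
    if edge(a, b, c) == 0.0 {
        return false;
    }
    let (d0, d1, d2) = (edge(a, b, p), edge(b, c, p), edge(c, a, p));
    let has_neg = d0 < 0.0 || d1 < 0.0 || d2 < 0.0;
    let has_pos = d0 > 0.0 || d1 > 0.0 || d2 > 0.0;
    !(has_neg && has_pos)
}

/// "Stencil" pass for one sample: draw a triangle fan from `poly[0]` and
/// invert the stencil bit for every fan triangle that covers `p`.
/// Overlapping triangles cancel, so the bit ends up set only inside the shape.
fn stencil_inside(poly: &[Point], p: Point) -> bool {
    let mut stencil = false;
    for i in 1..poly.len().saturating_sub(1) {
        if in_triangle(poly[0], poly[i], poly[i + 1], p) {
            stencil = !stencil;
        }
    }
    stencil
}
```

      The "cover" pass would then draw one rectangle over the shape's bounding box and only write color where the stencil bit is set. Samples that fall exactly on a shared fan edge are counted twice in this sketch; GPUs avoid that with their fill rules.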
      Let me go over a little bit of my earlier process that led me to the current state of Ochre. So, the first thing I worked on was Gouache. I was working on this for about a year, starting about a year and a half ago. And it works like the vector textures architecture I mentioned earlier. So, it has the tradeoffs that I mentioned, where it's better suited to a 3D game where you render the same thing many times from different angles. And I put a lot of work into this and then I decided it wasn't the appropriate approach for UIs. So, I've kind of shelved it for now. But Gouache will return eventually. But I was searching for something that struck that tradeoff better for UIs.
      And this was the first thing I got really excited about. I call it sparse scanline rendering. And the way it works is that, on the CPU, it only renders the pixels that are intersected by the outlines. And then it uploads those, along with horizontal lines for the solid spans, to the GPU. I like to call it the GL_LINES go brrr architecture. You can see on this diagram, anything that's not a pixel intersected by the curve doesn't have to be processed on the CPU and gets filled in by the GPU, which is good at doing kind of simple, highly parallel tasks like filling in a bunch of solid pixels.
      So, yeah. You can see here, this is what the horizontal lines look like that get uploaded to the GPU. And you can think of this architecture as run length encoding the image. So, you have to upload less data and compute less data in the first place, because you inherently skip all the work of the solid spans. And this had much better performance than I expected, given how kind of weird a design it is and also how simple it is. So, for complex scenes, like the tiger scene that this is a clipping of, I was getting times that were 10x as fast as Cairo, which is a single threaded CPU renderer. That was really promising.
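      The run length encoding view can be illustrated with a small sketch. For simplicity it starts from an already computed row of 8-bit coverage values, which a real sparse scanline renderer would not do, since the whole point is to never touch the interior pixels on the CPU. The types and names are invented for illustration.

```rust
/// One piece of an encoded scanline.
#[derive(Debug, PartialEq)]
enum Run {
    /// A single partially covered pixel on the outline.
    Edge { x: usize, coverage: u8 },
    /// A run of fully covered pixels, drawable as one horizontal line.
    Span { x: usize, len: usize },
}

/// Encodes one row of coverage values (0 = empty, 255 = fully covered)
/// as edge pixels plus solid spans. Empty pixels produce no output at all.
fn encode_row(coverage: &[u8]) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut x = 0;
    while x < coverage.len() {
        match coverage[x] {
            0 => x += 1,
            255 => {
                let start = x;
                while x < coverage.len() && coverage[x] == 255 {
                    x += 1;
                }
                runs.push(Run::Span { x: start, len: x - start });
            }
            c => {
                runs.push(Run::Edge { x, coverage: c });
                x += 1;
            }
        }
    }
    runs
}
```

      A wide solid region costs a single `Span` no matter how many pixels it covers, which is where the savings in upload and CPU work come from.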
      But it has some downsides. Basically, it doesn't handle humongous solid spaces very well. So, if you're rendering a full screen rectangle, the GPU gets stressed out by how many lines you're trying to shovel through it, and it is about 5x slower than doing it with two triangles. I tweaked this approach and ended up with something similar to what Pathfinder does, except it does more on the CPU, whereas Pathfinder does more on the GPU. Basically I break it down into edge tiles and spans. Rather than edge pixels and solid spans, I have 8x8 edge tiles and then solid 8xN spans in the middle. And I pack these into an atlas texture, upload that to the GPU and render it all using triangles to make up those rectangles. This is an example texture atlas; this is what it looks like for the tiger. This is all of those little 8x8 chunks put in order in the atlas.
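      Packing fixed size tiles "in order" into an atlas needs very little bookkeeping. Here is a minimal sketch of such an allocator, assuming 8x8 tiles laid out left to right and top to bottom; the struct and its methods are hypothetical and not Ochre's actual code.

```rust
const TILE_SIZE: u32 = 8;

/// Hands out 8x8 slots in an atlas texture, row by row.
struct Atlas {
    width: u32,
    height: u32,
    next: u32, // index of the next free tile slot
}

impl Atlas {
    fn new(width: u32, height: u32) -> Self {
        Atlas { width, height, next: 0 }
    }

    /// Returns the pixel origin of the next free tile slot,
    /// or `None` once the atlas is full.
    fn alloc(&mut self) -> Option<(u32, u32)> {
        let per_row = self.width / TILE_SIZE;
        let rows = self.height / TILE_SIZE;
        if per_row == 0 || self.next >= per_row * rows {
            return None;
        }
        let (col, row) = (self.next % per_row, self.next / per_row);
        self.next += 1;
        Some((col * TILE_SIZE, row * TILE_SIZE))
    }
}
```

      Each edge tile's coverage is written at the returned origin, and the rectangle drawn for that tile samples the atlas at the same coordinates. Solid spans need no atlas space at all.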
      So, that's how Ochre works. And I'm gonna get a little bit more into how it works from an API design standpoint. So, basically, I wanted Ochre to be usable as a component rather than kind of taking over the design of your program if you use it. So, when it builds these tiles and spans, what it does is it just builds that data for you, and then you can take that data and upload it yourself to the GPU. So, whether you're on DirectX or OpenGL or Metal, whether you have an existing game engine you want to use, or you're using some proprietary console API that I couldn't have foreseen or added to Ochre as an API, you can just add the support yourself very straightforwardly using some simple operations.
      And it will still use the GPU efficiently. But all the work to build this data is done for you by Ochre. So, I like to think of this as humble library design, where you respect what the user wants to do. The user knows best their platform and performance constraints. And you just make a library that can serve as a component alongside the other components of an application, rather than kind of trying to take over and insist on things being done a certain way. I think that's the best way to be portable because it allows many different situations to make use of your library.
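      To make the shape of that design concrete, here is a sketch of what "the library builds plain data, the application uploads it" could look like. Every type, field and method below is an assumption made up for illustration and should not be read as Ochre's real API.

```rust
/// One vertex of a textured, colored triangle (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vertex {
    pos: [f32; 2],
    uv: [f32; 2],
    color: [u8; 4],
}

/// Everything needed to draw a frame, as plain CPU-side data.
#[derive(Default)]
struct DrawData {
    vertices: Vec<Vertex>,
    indices: Vec<u32>,
}

impl DrawData {
    /// Appends an axis-aligned rectangle as two triangles sharing four vertices.
    fn push_rect(
        &mut self,
        min: [f32; 2],
        max: [f32; 2],
        uv_min: [f32; 2],
        uv_max: [f32; 2],
        color: [u8; 4],
    ) {
        let base = self.vertices.len() as u32;
        let corners = [
            ([min[0], min[1]], [uv_min[0], uv_min[1]]),
            ([max[0], min[1]], [uv_max[0], uv_min[1]]),
            ([max[0], max[1]], [uv_max[0], uv_max[1]]),
            ([min[0], max[1]], [uv_min[0], uv_max[1]]),
        ];
        for &(pos, uv) in corners.iter() {
            self.vertices.push(Vertex { pos, uv, color });
        }
        self.indices
            .extend_from_slice(&[base, base + 1, base + 2, base, base + 2, base + 3]);
    }
}

/// Implemented by the application for whatever graphics API it uses.
trait Backend {
    fn draw_triangles(&mut self, vertices: &[Vertex], indices: &[u32]);
}

/// Stand-in backend that only counts triangles instead of talking to a GPU.
struct CountingBackend {
    triangles: usize,
}

impl Backend for CountingBackend {
    fn draw_triangles(&mut self, _vertices: &[Vertex], indices: &[u32]) {
        self.triangles += indices.len() / 3;
    }
}
```

      The library side never calls into a graphics API. An application on OpenGL, Metal, or a console API would replace `CountingBackend` with a few lines that copy the two buffers into GPU memory and issue one indexed draw call.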
      And I guess one more shoutout for inspiration here is the Dear ImGui library, a UI rendering library for C++ which takes a very similar approach, and it's been used in very diverse scenarios including the codebase for the Large Hadron Collider. Being humble like this gets you a long way. Anyway, that's how Ochre works and that's why I made it the way I made it. It's on GitHub. I'm hoping to release it on crates.io. Thank you for listening. I don't think I have very much time left. I think I have a minute or two left.
      Inaki: Thank you, Micah. I think we'll take questions in the chat.
      Micah: Okay. Well, thank you, everyone.
      Inaki: Thank you for that great talk. It's... graphics is on everyone's minds lately.