diff --git a/404.html b/404.html new file mode 100644 index 0000000..243d771 --- /dev/null +++ b/404.html @@ -0,0 +1 @@ + Not found NOT FOUND \ No newline at end of file diff --git a/boot.html b/boot.html new file mode 100644 index 0000000..b980981 --- /dev/null +++ b/boot.html @@ -0,0 +1,305 @@ + + + + + + + + + Boot + + + +
+

Boot-rs securing a Linux bootloader

+

I recently dug into a previously unfamiliar part of Linux: the bootloader.

+

This is a medium-length write-up of how the Linux boot-process works and how to modify it, told through +the process of me writing my own janky bootloader.

+

I wanted the boot process to be understandable, ergonomic, and secure.

+

Notes about distributions

+

I did what's described in this write-up on Gentoo, although it would work the same on any
+Linux machine. Depending on the distribution, this setup might not be feasible, and these steps would likely have to
+be modified to fit the circumstances.

+

Preamble, Security keys

+

I got some Yubikeys recently. Yubikeys are security keys, which is essentially a fancy
+name for a drive (USB in this case) created to store secrets securely.

+

Some secrets that are loaded into the key cannot escape at all; they can even be created on the key, never having seen
+the light of day.
+Other secrets can escape and can therefore be injected as part of a pipeline in other security processes. An example
+of this is storing a cryptodisk secret which is then passed to cryptsetup
+in the case of Linux disk encryption.

+

I did some programming against the Yubikeys; I published a small runner to sign data with a Yubikey here,
+but got a bit discouraged by the need to connect through pcscd, a daemon with an accompanying C library to
+interface with it.
+Later I managed to do a pure Rust integration against the Linux USB interface, which I will publish pretty soon.

+

I started thinking about ways to integrate Yubikeys into my workflow more, I started +examining my boot process, I got derailed.

+

Bootloader woes

+

I have used GRUB as my bootloader since I started using Linux, it has generally +worked well, but it does feel old.

+

When I ran grub-mkconfig -o ..., updating my boot configuration, and ran into
+this issue, I figured it
+was time to survey other options (after burning another ISO to get back into my system).

+

Bootloader alternatives

+

I was looking into alternatives, finding EFI stub (compiling the kernel
+into its own bootable EFI image) to be the most appealing option.
+If the kernel can boot itself, why even have a bootloader?

+

With Gentoo, integrating that was fairly easy assuming no disk encryption.

+

Before getting into this, a few paragraphs about the Linux boot process may be appropriate.

+

Boot in short

+

The boot process, in my opinion, starts on the motherboard firmware and ends when the kernel hands over execution to /sbin/init.

+

UEFI

+

The motherboard powers on and starts running UEFI firmware (I'm pretending BIOS doesn't exist because I'm not stuck in the past).
+UEFI can run images, such as disk, keyboard, and basic display drivers, as well as kernels and Rust binaries.

+

Usually, this stage of the process will be short, as the default task to perform is to check if the user wants to enter +setup and interface with the UEFI system, or continue with the highest priority boot-image.

+

That boot image could be a grub.efi-program, which may perform some work, such as decrypting your boot partition and then +handing execution over to the kernel image.
+It could also be an efi stub kernel image that gets loaded directly, or some other bootloader.

+

Kernel boot

+

The kernel process starts, initializing the memory it needs, starting tasks, and whatever else the kernel does.

+

Initramfs

+

When the kernel has performed its initialization, early userspace starts in the initramfs.
+Initramfs, also called early userspace, is the first place a Linux user +is likely to spread their bash-spaghetti in the boot-process.

+

The initramfs is a RAM-contained (in-memory) file system; it can be baked into the kernel,
+or provided where the kernel can find it during the boot process. Its purpose is to set up user-space so that it's ready
+enough for init to take over execution. This is where disk decryption happens in the case of cryptsetup.

+
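For the "provided where the kernel can find it" route, the directory has to be packed into the gzipped cpio (newc) archive format the kernel knows how to unpack. A minimal sketch, assuming GNU cpio and gzip are available; the paths are illustrative:

```shell
# Build a toy initramfs directory, then pack it into the cpio(newc)
# archive format the kernel expects for external initramfs images.
set -e
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/mnt/root"
: > "$demo/init"
chmod +x "$demo/init"
cd "$demo"
# -H newc selects the archive format the kernel's unpacker understands.
find . -print0 | cpio --null -o -H newc | gzip -9 > /tmp/initramfs.cpio.gz
```

Baking the directory straight into the kernel (the route taken in this post) is done with the kernel's CONFIG_INITRAMFS_SOURCE option instead, which accepts a directory and does the packing for you.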

The Initramfs-stage ends by handing over execution to init:

+

exec switch_root <root-partition> <init>, an example could be exec switch_root /mnt/root /sbin/init, +by convention, init is usually found at /sbin/init.

+

The initramfs prepares user-space, while init "starts" it, e.g. processes, such as dhcpcd, +are taken care of by init.

+

Init

+

Init is the first userspace process to be started and the parent of all other processes; it has PID 1, and if it dies,
+the kernel panics.
+Init could be any executable, like Bash.

+

In an example system where bash is init, the user will be dropped into the command line, in a bash shell, at the destination that the
+initramfs specified in switch_root. From a common user's perspective this is barely a functional system: it has no internet,
+likely no connection to most peripheral devices, and no login management.

+

Init daemon

+

Usually Linux systems have an init daemon. Some common init-daemons are systemd, openrc, +and runit.
+The init daemon's job is to start processes that make the system usable, up to the user's specification. Usually it +will start udev to get device events and populate /dev with device interfaces, as well as ready internet interfaces +and start login management.

+

DIY initramfs

+

I wanted at least basic security, which means encrypted disks: if I lose my computer, or it gets stolen, I can be fairly sure that
+the culprits won't get access to my data without considerable effort.
+Looking back over the steps, this means that I need to create an initramfs so that my disks can be decrypted on boot.
+There are tools to create an initramfs, dracut being
+one example; mkinitcpio, which Arch Linux uses, is another.

+

Taking things to the most absurd level, I figured I'd write my own initramfs instead.

+

The process

+

The most basic decrypting initramfs is just a directory which could be created like this:

+
[gramar@grentoo /home/gramar/misc/initramfs]# touch init
+[gramar@grentoo /home/gramar/misc/initramfs]# chmod +x init
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir -p mnt/root
+[gramar@grentoo /home/gramar/misc/initramfs]# ls -lah
+total 12K
+drwxr-xr-x 3 gramar gramar 4.0K Mar 21 15:11 .
+drwxr-xr-x 4 gramar gramar 4.0K Mar 21 15:11 ..
+-rwxr-xr-x 1 gramar gramar    0 Mar 21 15:11 init
+drwxr-xr-x 3 gramar gramar 4.0K Mar 21 15:11 mnt
+
+

The init contents being this:

+
#!/bin/bash
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

If we point the kernel at this directory, build it, and then try to boot it, we'll find out that this doesn't work at all, +and if you somehow ended up here through Googling and copied that, I'm sorry.

+

One reason for this is that /bin/bash does not exist on the initramfs; we can't call it to execute the commands in the script.

+

If we add it, for example by:

+
[gramar@grentoo /home/gramar/misc/initramfs]# mkdir bin
+[gramar@grentoo /home/gramar/misc/initramfs]# cp /bin/bash bin/bash
+
+

Then try again; it still won't work and will result in a kernel panic.
+The reason is that bash (unless you built it yourself using dark magic) is dynamically
+linked. We can see that this is indeed the case using ldd
+to list dynamic dependencies.

+
[gramar@grentoo /home/gramar/misc/initramfs]# ldd bin/bash
+        linux-vdso.so.1 (0x00007ffc7f9a1000)
+        libreadline.so.8 => /lib64/libreadline.so.8 (0x00007fd040f06000)
+        libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007fd040ec6000)
+        libc.so.6 => /lib64/libc.so.6 (0x00007fd040cf3000)
+        libtinfow.so.6 => /lib64/libtinfow.so.6 (0x00007fd040cb2000)
+        /lib64/ld-linux-x86-64.so.2 (0x00007fd04104f000)
+
+

Now we could try to appease Bash and copy these dependencies into the initramfs at the appropriate places,
+but there are quite a few files, and we risk cascading dependencies: what if we need to update, and the dependencies have changed?

+
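For illustration, chasing the dependencies could even be scripted. This sketch copies a binary plus every shared object ldd resolves into a staging tree, mirroring the absolute paths; it assumes a glibc-style ldd output, and is exactly the kind of fragility the static-linking route below avoids:

```shell
# Copy /bin/bash and each library ldd reports into a staging directory,
# e.g. /lib64/libc.so.6 -> ./lib64/libc.so.6.
set -e
staging=$(mktemp -d)
cd "$staging"
mkdir -p bin
cp /bin/bash bin/bash
# grep -o '/[^ )]*' pulls the absolute paths out of ldd's output and
# skips entries without one (like linux-vdso).
for lib in $(ldd /bin/bash | grep -o '/[^ )]*'); do
    mkdir -p ".$(dirname "$lib")"
    cp "$lib" ".$lib"
done
```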

And how about cryptsetup, mount, swapon, and switch_root?

+

Static linking and BusyBox

+

Many of the tools used to interface with Linux (usually) come from GNU coreutils.
+There are other sources, however, like the Rust port, but the most popular alternative is likely
+BusyBox.

+

BusyBox is a single binary, which on my machine is 2.2M big; it contains most of the coreutils.
+One benefit of using BusyBox is that it can easily be statically linked,
+which means that copying that single binary is enough, no dependencies required.
+Likewise, cryptsetup can easily be statically linked.

+

Busybox initramfs

+

The binaries are placed in the initramfs. (I realized that I need a tty, console, and null device to run the shell,
+so I copy those too).

+
[gramar@grentoo /home/gramar/misc/initramfs]# cp /bin/busybox bin/busybox
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir sbin        
+[gramar@grentoo /home/gramar/misc/initramfs]# cp /sbin/cryptsetup sbin/cryptsetup
+[gramar@grentoo /home/gramar/misc/initramfs]# cp -a /dev/{null,console,tty} dev
+
+

And then change the script's shebang, exporting a PATH so that the copied binaries can be found:

+
#!/bin/busybox sh
+export PATH="/bin:/sbin:$PATH"
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

Finally, we can execute the init script at boot time, and immediately panic again: cryptsetup can't find the disk.

+

Udev

+

There are multiple ways to address disks. We could, for example, copy the device node we need into the initramfs as it shows up
+under /dev: cp -a /dev/sda2 dev. But the regular disk naming convention isn't stable; /dev/sda might be tomorrow's
+/dev/sdb, causing an unbootable system. Ideally we would specify the disk by UUID.

+
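Finding the UUIDs to hard-code into the init script can be done with blkid, which prints one line per block device; a sketch (the output depends entirely on your machine, and blkid typically needs root):

```shell
# List block devices with their stable UUIDs; LUKS containers show up
# with TYPE="crypto_LUKS".
blkid
# Inspect the by-uuid symlinks that udev/mdev create.
ls -l /dev/disk/by-uuid/
```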

Udev is a tool that finds devices, listens to device events, and a bit more. What we need it for, is to populate +/dev with the devices that we expect.

+

I call it Udev because it's ubiquitous; it's actually a systemd project.
+There is a fork that used to be maintained by the Gentoo maintainers, Eudev.
+Neither of the above is ideal for an initramfs; what we'd really like is to just one-shot generate /dev.
+Luckily for us, there is a perfect implementation that does just that contained within BusyBox: Mdev.

+

To save us from further panics, I will fast-forward through discovering that we need to mount three pseudo-filesystems
+to make mdev work: proc, sys, and dev (dev shouldn't be that surprising). We also need to create the mount points.

+
[gramar@grentoo /home/gramar/misc/initramfs]# mkdir proc
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir dev
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir sys
+
+

Working initramfs

+
#!/bin/busybox sh
+export PATH="/bin:/sbin:$PATH"
+# Mount pseudo filesystems
+mount -t proc none /proc
+mount -t sysfs none /sys
+mount -t devtmpfs none /dev
+# Mdev populates /dev with symlinks
+mdev -s
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Unmount the pseudo filesystems, except dev which is now busy.  
+umount /proc
+umount /sys
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

Ergonomics

+

This setup requires me to enter my password three times, which is easily fixed by saving it in a variable and piping +it into cryptsetup.

+
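A sketch of that fix, assuming busybox sh's read supports -s, and relying on cryptsetup reading the passphrase from stdin when stdin isn't a terminal (the UUIDs are placeholders, as before):

```shell
# Prompt once, silently, then feed the same passphrase to each call.
printf 'Passphrase: '
read -rs pw
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> croot
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> cswap
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> chome
unset pw
```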

Reflections on security

+

While the above setup works, it has less security than my last one.
+I boot directly into my kernel, which now must be unencrypted and could therefore be tampered with.
+This is a different attack surface than the one last considered (I lose my laptop). It's: someone tampers with my
+boot process to get access to my data on subsequent uses.

+

Bootloader tampering

+

Depending on your setup, your bootloader (kernel in this case) may be more or less subject to tampering.
+Usually, one would have the bootloader in a /boot directory, which may or may not be on a separate partition.

+

If that directory is writable only by root, it doesn't really matter whether it's on an unmounted partition or not.
+Someone with root access to your machine could edit the contents (or mount the partition and then edit the contents).
+That means that if someone has root access to your machine, your bootloader could be tampered with remotely.

+

Evil maids

+

Another possible avenue of compromise is if someone has physical access to the disk on which you store your bootloader.
+I am not a high-value target, as far as I know at least, and that kind of attack, also known as an evil maid attack,
+is fairly high-effort to pull off. The attacker needs to modify my kernel without me noticing, which for me as a target,
+again, is pretty far-fetched.

+

But this is not about being reasonable, it's never been about that, it's about taking things to the extreme.

+

Encrypting the kernel

+

The problem with encrypting the kernel is that something has to decrypt it, we need to move further down the boot-chain.
+I need to, at the UEFI level, decrypt and then hand over execution to the kernel image.

+

Writing a bootloader

+

I hinted earlier at UEFI being able to run Rust binaries; indeed, there is a UEFI target
+and library for Rust.

+

Encrypt and Decrypt without storing secrets

+

We can't have the bootloader encrypted; it needs to be a ready UEFI image.
+This means that we can't store decryption keys in the bootloader; it needs to ask the user for input
+and deterministically derive the decryption key from that input.

+

Best practice for secure symmetric encryption is AES.
+Since I want the beefiest encryption, I opt for AES-256, which means that the decryption key is 32 bytes long.

+

Brute forcing a random set of 32 bytes is currently not feasible, but passwords generally are not random, and random brute forcing
+would likely not be the method anyone would use to attack this encryption scheme.
+What is more likely is that a password list would be used to try leaked passwords,
+or dictionary-generated passwords would be used.

+

To increase security a bit, the 32 bytes will be generated by a good key derivation function; at the moment Argon2
+is the best tool for that as far as I know. This achieves two objectives:

+
    +
  1. Whatever the length of your password, it will end up being 32 random(-ish) bytes long. +
  2. The time and computational cost of brute forcing a password will be extended by the time it takes to
+run argon2 to derive a key from each password that is attempted.
+
+

This leaves the attacker with two options:

+
    +
  1. Randomly try to brute force every 32 byte combination, which is unfeasible. +
  2. Use a password list and try every known or generated password after running argon2 on it. +
+

Option 2 may or may not be feasible, depending on the strength of the password; transforming a bad password
+into 32 bytes doesn't do much if the password doesn't take enough attempts to guess.

+
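To get a feel for the derivation step, the Argon2 reference implementation ships a CLI; a sketch of deriving 32 raw bytes from a passphrase (the passphrase, salt, and cost parameters here are purely illustrative, not a recommendation):

```shell
# argon2id: -t iterations, -m log2(memory in KiB), -p lanes,
# -l output length in bytes, -r prints the raw hash as hex.
printf '%s' "hunter2" | argon2 somesalt -id -t 3 -m 16 -p 4 -l 32 -r
```

The same passphrase, salt, and parameters always yield the same 32 bytes, which is what makes the "derive the key from user input, store nothing" scheme work.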

UEFI development

+

I fire up a new virtual machine with UEFI support and start iterating. The development process was less painful than
+I thought it would be. The caveat being that I am writing an extremely simple bootloader: it finds the kernel
+on disk, asks the user for a password, derives a key from it using Argon2, decrypts the kernel with that key, and
+then hands over execution to the decrypted kernel. The code for it can be found at this repo.

+

New reflections on security

+

All post-boot content, as well as the kernel, is now encrypted. The kernel itself is read straight into RAM and then executed;
+the initramfs decrypts the disks after getting password input, deletes itself, and then hands over execution to init.

+

Bootloader compromise

+

There is still one surface for attack: the unencrypted bootloader.
+A malicious actor could replace my bootloader with something else, take my keyboard input, and decrypt my kernel.
+Or an attacker could replace my bootloader, take my keyboard input (possibly just discarding it), then boot into a malicious kernel where I enter
+my decryption keys, and decrypt my disks.

+

Moving cryptodisk secrets into the initramfs

+

Since the initramfs is now encrypted, an ergonomic move is to create a new decryption key for my disks, +move that into the initramfs, then use those secrets to decrypt the disks automatically during that stage.

+
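Sketched below using cryptsetup's standard key-slot mechanism; the keyfile name is illustrative, and the keyfile must only ever live inside the encrypted initramfs image:

```shell
# One-time setup: generate a random keyfile and enroll it in a spare
# LUKS key slot (prompts for an existing passphrase).
dd if=/dev/urandom of=crypto.key bs=64 count=1
cryptsetup luksAddKey /dev/disk/by-uuid/<xxxx> crypto.key

# In the initramfs init script: unlock without prompting.
cryptsetup open /dev/disk/by-uuid/<xxxx> croot --key-file crypto.key
```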

The "boot into malicious kernel" attack becomes more difficult to pull off:
+I'd notice if my disks aren't being automatically decrypted.

+

Secure boot

+

Some people think Secure Boot, and UEFI in general, is a cynical push by Microsoft to force Linux desktop user share
+down to zero (from close to zero). Perhaps, but Secure Boot can be used to add some security to the most sensitive part
+of our now fairly secured boot process.

+

Secure Boot works by only allowing the UEFI firmware to boot from images that are signed by its stored cryptographic keys.
+Microsoft's keys are (almost) always vendored and exist in the store by default, but they can be removed (kind of) and +replaced by your own keys.

+

The process for adding your own keys to Secure Boot, as well as signing your bootloader, will be left out of this write-up.

+

Final reflections on security

+

Now my boot-process is about as secure as I am capable of making it while retaining some sense of ergonomics.
+The disks are encrypted and can't easily be decrypted. The kernel itself is encrypted, and through the auto-decryption I would notice if it were replaced
+by something else.
+The bootloader cannot be exchanged without extracting my setup password.

+

The main causes of concern are now BUGS, and still, evil maids.

+
    +
  1. Bugs in secure boot. +
  2. Bugs in my implementation. +
  3. Bugs in the AES library that I'm using. +
  4. Bugs in the Argon2 library that I'm using. +
  5. Bugs in cryptsetup. +
  6. Bugs everywhere. +
+

But those are hard to get away from.

+

Epilogue

+

I'm currently using this setup, and I will for as long as I use Gentoo, I would guess.
+Once set up, it's pretty easy to re-compile and re-encrypt the kernel when it's time to upgrade.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 0000000..cc3e045 --- /dev/null +++ b/index.html @@ -0,0 +1 @@ + Marcus Grass' pages

About

This site is a place where I intend to store things I've learned so that I won't forget them.

This page

There's not supposed to be a web 1.0 vibe to it, but I'm horrible at front-end styling so here we are.
The site is constructed in JavaScript, but as with all things in my free time, I make things more complicated than they need to be.
There is a Rust runner that takes the md-files, generates html and javascript, and then minifies that.
The markdown styling is ripped from this project, it's GitHub's markdown CSS, I don't want to stray too far out of my comfort zone...

The highlighting is done with the use of starry-night.

All page content except for some glue is just rendered markdown contained in the repo.

Content

See the menu bar at the top left to navigate to the table of contents, if I end up writing a lot of stuff here I'm going to have to look into better navigation and search.

License

The license for this page's code can be found in the repo here.
The license for the styling is under that repo here.
The license for starry-night is for some reason kept in this 1MB file in their repo here (TL;DR: it's MIT/Apache2 licensed under MIT).

\ No newline at end of file diff --git a/kbd-smp.html b/kbd-smp.html new file mode 100644 index 0000000..d0b4391 --- /dev/null +++ b/kbd-smp.html @@ -0,0 +1,189 @@ + + + + + + + + + KbdSmp + + + +
+

Symmetric multiprocessing in your keyboard

+

While my daughter sleeps during my parental leave I manage to get up to +more than I thought I would. This time, a deep-dive into QMK.

+

Overview

+

This write-up is about how I enabled multicore processing on my keyboard;
+the structure is as follows:

+
    +
  1. A short intro to QMK. +
  2. A dive into keyboards, briefly how they function. +
  3. Microcontrollers and how they interface with the keyboard. +
  4. Threading on Chibios. +
  5. Multithread vs multicore, concurrency vs parallelism. +
  6. Tying it together. +
+

QMK and custom keyboards

+

QMK contains open source firmware for keyboards, it provides implementations for most custom keyboard functionality, +like key presses (that one's obvious), rotary encoders, and oled screens.

+

It can be thought of as an OS for your keyboard, which can be configured by plain json, +with online tools, and other +simple tools that you don't need to be able to program to use.

+

But, you can also get right into it if you want, which is where it gets interesting.

+

QMK structure

+

Saying that QMK is like an OS for your keyboard might drive some pedants mad, since QMK packages
+an OS and installs it, configured with your additions, on your keyboard.

+

Most features are toggled by defining constants in different make or header files, like:

+
#pragma once
+// Millis
+#define OLED_UPDATE_INTERVAL 50
+#define OLED_SCROLL_TIMEOUT 0
+#define ENCODER_RESOLUTION 2
+// Need to propagate oled data to right side
+#define SPLIT_TRANSACTION_IDS_USER OLED_DATA_SYNC
+
+

It also exposes some APIs which provide curated functionality;
+here's an example from the oled driver:

+
// Writes a string to the buffer at current cursor position
+// Advances the cursor while writing, inverts the pixels if true
+void oled_write(const char *data, bool invert);
+
+

Above is an API that allows you to write text to an oled screen, very convenient.

+

Crucially, QMK does actually ship an OS, in my case chibios. +Chibios is a full-featured RTOS. That OS contains +the drivers for my microcontrollers, and from my custom code I can interface with +the operating system.

+

Keyboards keyboards keyboards

+

I have been building keyboards since I started working as a programmer. +There is much that can be said about them, but not a lot of it is particularly interesting. I'll give a brief +explanation of how they work.

+

Keyboard internals

+

A keyboard is like a tiny computer that tells the OS (The other one, the one not in the keyboard) +what keys are being pressed.

+

Here are three arbitrarily chosen important components to a keyboard:

+
    +
  1. The Printed Circuit Board (PCB), a large
+board that connects all the keyboard components. If you're thinking: "Hey, that's a motherboard!", then you
+aren't far off. Split keyboards (usually) have two PCBs working in tandem, connected by (usually) an aux cable.
+
  2. The microcontroller, the actual computer part that you program. It can be integrated directly with the PCB, +or soldered on to it. +
  3. The switches, +the things that when pressed connects circuits on the PCB, which the microcontroller can see +and interpret as a key being pressed. +
+

Back to the story

+

I used an Iris for years and loved it, but since some pretty impressive microcontrollers that aren't AVR
+but ARM have come out, surpassing the AVR ones in cost-efficiency, memory, and speed while remaining compatible,
+I felt I needed an upgrade.

+

A colleague tipped me off about the lily58, which takes any pro-micro-compatible microcontroller,
+so I bought it, alongside a couple of RP2040-based microcontrollers.

+

RP2040 and custom microcontrollers

+

Another slight derailment: the RP2040 is a microcontroller with an
+Arm Cortex-M0+ CPU. Keyboard-makers take this kind
+of microcontroller and customize it to fit keyboards. Since pro-micro microcontrollers have influenced a lot
+of the keyboard PCBs, many new microcontroller designs fit onto a PCB the same way that a pro-micro does. Meaning,
+you can often use many combinations of microcontrollers with many combinations of PCBs.

+

The RP2040 is pretty fast, cheap, and has two Cortex-M0+ cores. TWO CORES, why would someone even need that?
+But if there are two cores on there, then they should both definitely be used.

+

Back to the story, pt2

+

I was finishing up my keyboard and realized that oled-rendering is by default set to 50ms, to not impact +matrix scan rate. (The matrix scan rate is when the microcontroller checks the PCB for what keys are being held down, +if it takes too long it may impact the core functionality of key-pressing and releasing being registered correctly).

+

Now I found the purpose of multicore: if rendering to the oled takes time,
+then that job could (and therefore should) be shoveled onto a
+different thread. My keyboard has two cores; I should parallelize this by using a thread!

+

Chibios and threading

+

Chibios is very well documented; it even
+has a section on threading, and a
+convenience function for
+spawning a static thread.

+

It can be used like this:

+
static THD_WORKING_AREA(my_thread_area, 512);
+static THD_FUNCTION(my_thread_fn, arg) {
+    // Cool function body
+}
+void start_worker(void) {
+    thread_t *thread_ptr = chThdCreateStatic(my_thread_area, sizeof(my_thread_area), NORMALPRIO, my_thread_fn, NULL);
+}
+
+

Since my CPU has two cores, if I spawn a thread, work will be parallelized, I thought, so I went for it. (This is +foreshadowing).

+

After wrangling some mutex locks, and messing
+with the firmware to remove race conditions, I had a multithreaded implementation that could offload rendering
+to the oled display on a separate thread. Great! Now why is performance so bad?

+

Multithread != Multicore, an RTOS is not the same as a desktop OS

+

When I printed the core-id of the thread rendering to the oled-display, it was 0. I wasn't +actually using the extra core which would have core-id 1.

+

The assumption that:

+
+

If I have two cores and I have two threads, the two threads should be running +or at least be available to accept tasks almost 100% of the time.

+
+

does not hold here. +It would hold up better on a regular OS like Linux, but on Chibios it's a bit more explicit.

+

Note:
+This disregards that Chibios spawns both a main-thread and an idle-thread (on the same core) by default, so it's not just one thread,
+although that's not particularly important to performance.

+

On concurrency vs parallelism

+

Threading without multiprocessing can produce concurrency, like in Python with +the GIL enabled. A programmer can run multiple tasks at the same time and if those tasks don't +require CPU-time, such as waiting for some io, the tasks can make progress at the same time, which +is why Python with the GIL can run webservers pretty well. However, tasks that require CPU-time to make +progress will not benefit from having more threads in the single-core case.

+

One more caveat is blocking tasks that do not park the thread; this will come down to how the OS decides to schedule
+things. In a single-core scenario, the main thread offloads some io-work to a separate thread;
+the OS schedules (simplified) 1 millisecond to the io-thread, but that thread is stuck waiting for io to complete, so
+the application will make no progress for that millisecond.
+One way to mitigate this is to park the waiting thread inside the
+io-api, then wake it up on some condition; in that case the blocking io won't hang the application.

+

In my case, SMP not being enabled meant that the oled-drawer-thread just got starved of CPU-time, resulting in
+drawing to the oled being painfully slow. But even if it hadn't been, there may have been a performance hit, because
+it could have interfered with the regular key-processing.

+

Parallelism

+

I know I have two cores, so parallelism should be possible; I'll just have to enable
+Symmetric multiprocessing (SMP).
+SMP means that the processor can actually do things in parallel.
+It's not enabled by default; Chibios has some documentation on this.

+

Enabling SMP is not trivial, as it turns out; it needs a config flag for chibios,
+a make-flag when building for the platform (rp2040), and some other fixing.
+So I had to mess with the firmware once more,
+but checking some flags in the code, and some internal structures, I can see that Chibios is now compiled
+ready to use SMP. It even has a reference that I can use to my other core's context, &ch1 (&ch0 is core 0).

+

On Linux, multicore and multithreading are opaque: you spawn a thread and it runs on some core (also assuming that
+SMP is enabled, but it generally is for servers and desktops). On Chibios, if you
+spawn a thread, it runs on the core that spawned it by default.

+

Back to the docs, I see that I can instead create a thread from a thread descriptor,
+which takes a reference to the instance context, &ch1. Perfect: now I'll spawn a thread on the other core, happily ever
+after.

+

WRONG!

+

It still draws from core-0 on the oled.

+

Checking the chibios source code, I see that it falls back to &ch0 if &ch1 is null. Now why is it null?

+

Main 2, a single main function is for suckers

+

Browsing through the chibios repo, I find the next piece of the puzzle:
+a demo someone made of SMP on the RP2040. It needs a separate main function where the instance context (&ch1)
+for the new core is initialized. I write some shim-code, struggle with some more configuration, and finally,
+core 1 is doing the oled work.

+

Performance is magical; it's all worth it in the end.

+

Conclusion

+

My keyboard now runs multicore and I've offloaded all non-trivial +work to core 1 so that core 0 can do the time-sensitive matrix scanning, +and I can draw as much and often as I want to the oled display.

+

I had to mess a bit with the firmware to specify that there is an extra
+core on the RP2040, and to keep QMK's hands off of the oled state, since
+that code isn't thread-safe.

+

In reality, this kind of optimization probably isn't necessary for most users.
+But if there is work that the keyboard is doing
+that's triggered by key processing, such as rgb-animations, oled-animations, and similar, offloading that
+to a separate core could improve performance, allowing more of that kind of work for a given keyboard.

+

The code is in my fork here, +with commits labeled [FIRMWARE] being the ones messing with the firmware.

+

The keyboard-specific code is contained +here, +on the same branch.

+

I hope this was interesting to someone!

+
+
\ No newline at end of file diff --git a/meta.html b/meta.html new file mode 100644 index 0000000..adad4a4 --- /dev/null +++ b/meta.html @@ -0,0 +1,178 @@ + + + + + + + + + Meta + + + +
+

Writing these pages

+

I did a number of rewrites of this web application, some of which could probably be +found in the repository's history.
+The goal has changed over time, but as with all things I wanted to create something that's as small as possible, +and as fast as possible, taken to a ridiculous and counterproductive extent.

+

Rust for frontend

+

Rust can target WebAssembly through its target +wasm32-unknown-unknown, which can then be run on the web. Whether this is a good idea or not remains to be seen.

+

I've been working with Rust for a while now, even written code targeting wasm, but hadn't yet written anything +to be served through a browser using Rust.

+

After thinking that I should start writing things down more, I decided to make a blog to collect my thoughts.
+Since I'm a disaster at front-end styling I decided that if I could get something to format markdown, that's good +enough.
+I could have just kept them as .md files in a git-repo, and that would have been the reasonable thing to do, but the concept of a dedicated page for them spoke to me, so with GitHub's free hosting available I started looking at alternatives for a web framework.

+

SPA

+

An SPA (Single Page Application) is a web application where the user doesn't have to follow a link and load a new page from the server to navigate to different pages of the application; it dynamically injects html based on the path. This saves the user an http round trip when switching pages within the application, making the application feel more responsive.

+

I've worked with SPAs a bit in the past with the Angular framework, and I wanted to see if I +could implement an SPA using Rust.

+

Yew

+

I didn't search for long before finding yew, it's a framework for developing front-end applications +in Rust. It looked pretty good so I started up.

+

I like how Yew does things: you construct Components that pass messages and react to them, changing their state and maybe causing a rerender. I do have a personal beef with macros though, and especially since 0.20 Yew uses them a lot, but we'll get back to that.

+

My first shot was using pulldown-cmark directly from the Component.
+I included the .md-files as include_str!(...) and then converted those to html within the component at view-time.

+

How the page worked

+

The page output is built using Trunk, a wasm web application bundler.

+

trunk takes my wasm and assets, generates some glue javascript to serve it, and moves it into a dist directory along +with my index.html. From the dist directory, the web application can be loaded.

+

The code had included my .md-files in the binary, a const String inserted into the wasm. When a +page was to be loaded through navigation, my component checked the path of the url, if for example it was +/ it would select the hardcoded string from the markdown of Home.md, convert that to html and then inject +that html into the page.
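A rough sketch of that view-time lookup; the page names, paths, and content here are simplified stand-ins (the real code used include_str! on the actual .md files), inlined so the sketch stands alone:

```rust
// Hypothetical sketch of the per-path lookup: each path maps to a
// compiled-in markdown string, which would then be converted to html
// and injected at view-time.
const HOME_MD: &str = "# Home\nWelcome!";
const NAV_MD: &str = "# Navigation\n- [Home](/)";

fn page_markdown(path: &str) -> Option<&'static str> {
    match path {
        "/" => Some(HOME_MD),
        "/nav" => Some(NAV_MD),
        _ => None, // would render a not-found page
    }
}

fn main() {
    // The component would pass the selected string through pulldown-cmark here.
    assert_eq!(page_markdown("/"), Some(HOME_MD));
    assert!(page_markdown("/missing").is_none());
}
```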

+

Convert at compile time

+

While not necessarily problematic, this seemed unnecessary: since the .md-content doesn't change and is just going to be converted, I might as well only do that once. The alternatives are doing it at compile time or at application load time, as opposed to what I was currently doing, which I guess would be called render time or view-time (in other words, every time content was to be injected).

+

I decided to write a build script which takes my .md-pages and converts them to html; my application could then load that const String instead of the old one, skipping the conversion step and the added binary dependency on pulldown-cmark.
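A minimal sketch of what such a build-time generation step could look like. The function name and the idea of emitting consts are illustrative, and the html strings stand in for pulldown-cmark's real output, which the actual build script would produce:

```rust
// Hypothetical sketch of the generation step: take (page name,
// already-converted html) pairs and emit Rust source defining consts,
// so the application loads pre-converted html instead of converting
// markdown at view-time.
fn generate_page_consts(pages: &[(&str, &str)]) -> String {
    let mut out = String::from("// @generated by build.rs\n");
    for (name, html) in pages {
        out.push_str(&format!(
            "pub const {}: &str = r#\"{}\"#;\n",
            name.to_uppercase(),
            html
        ));
    }
    out
}

fn main() {
    let src = generate_page_consts(&[("home", "<h1>Home</h1>")]);
    // A real build.rs would write `src` into $OUT_DIR and the app
    // would pull it in with include!.
    assert!(src.contains("pub const HOME: &str"));
    assert!(src.contains("<h1>Home</h1>"));
}
```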

+

It was fairly easily done, and now the loading was (theoretically) faster.

+

Styling

+

I wanted my markdown to look nice, the default markdown-to-html conversion rightfully doesn't apply any styling. +As someone who is artistically challenged I needed to find some off-the-shelf styling to apply.

+

I thought GitHub's css for their markdown rendering looked nice and wondered if I could find the source for it. After just a bit of searching I found github-markdown-css, which contains a generator for that css, as well as already-generated copies of it. I added that to my page.

+

Code highlighting

+

Code highlighting was difficult, there are a few alternatives for highlighting.
+If I understood it correctly, GitHub uses something similar to starry-night.
+Other alternatives are highlight.js and Prism.
+After a brief look, highlight.js seemed easy to work with and produces some nice styling, so I went with that.

+

The easiest way of implementing highlight.js (or prism.js, they work essentially the same), is to load a
+<script src="highlight.js"></script> at the bottom of the page body. Loading the script calls the +highlightAll() function, which takes code elements and highlights them.
+This turned out to not be that easy the way I was doing things.
+Since I was rendering the body dynamically, previously highlighted elements would be de-highlighted on navigation, since the highlightAll() function had already been called. While I'm sure that you can call js-functions from Yew, finding out how to do that in the documentation is difficult. Knowing when to call them is difficult as well; like many comprehensive frameworks, Yew sometimes works as a black box. While it's easy to look at page-html with javascript and understand what's happening and when, it's difficult to view the corresponding Rust code and know when an extern javascript function would be called, even if I could figure out how to insert such a call in the component.
+I settled for not having highlighting and continued building.

+

Navigation

+

I wanted a nav-bar, some hamburger menu which would unfold and +give the user access to navigation around the page. Constructing that with my knowledge of css was a disaster.
+It never scaled well, it was difficult to put in the correct place, and eventually I just gave up and created a navigation page in .md-style, like all other pages in the application.
+I kept a menu button for going back to home, or to the navigation page, depending on the current page.

+

An issue with this is that links in an .md-file, when converted to html, become regular <a href=".." links, which will cause a new page-load. My internal navigation was done using Yew callbacks, swapping out page content on navigation, which meant I'd have to replace those href links with Yew templating. I decided to make my build script more complex: instead of serving raw converted html, I would generate small rust-files which would convert the html into Yew's html! macro. This was ugly in practice; html that looked like this

+
+<div>
+    Content here
+</div>
+
+

Would have to be converted to this:

+
yew::html! {
+    <div>
+        {{"Content here"}}
+    </div>
+}
+
+

Any raw string had to be double bracketed then quoted.
+Additionally, to convert to links, raw html that looked like this:

+
<a href="/test">Test!</a>
+
+

Would have to be converted to this:

+
yew::html! {
+    <a onclick={move |_| scope.navigator.unwrap().replace(&Location::Test)}>Test!</a>
+}
+
+

On top of that, the css specifies special styling for <a> which contains href vs <a> which doesn't.
+That was a fairly easy change: from this .markdown-body a:not([href]) to this .markdown-body a:not([href]):not(.self-link), as well as adding the class self-link to the links that were replaced.
+Some complexity was left out, such as the scope being moved into the function, so I had to generate a bunch of +scope_n at the top of the generated function from which the html was returned.
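A naive sketch of that internal-link rewrite; the Location enum and navigator call are taken from the example above, while the parsing is simplified to one hardcoded `<a href="/...">` pattern (the real generator also handled the scope bindings and more cases):

```rust
// Hypothetical sketch: rewrite a converted `<a href="/test">` link
// into a Yew onclick-navigation call, as the build script did.
fn rewrite_link(html: &str) -> String {
    let Some(start) = html.find("<a href=\"/") else {
        return html.to_string(); // no internal link, leave as-is
    };
    let path_start = start + "<a href=\"/".len();
    let path_end = path_start + html[path_start..].find('"').unwrap();
    let path = &html[path_start..path_end];
    // "/test" -> Location::Test (the Location enum is hypothetical)
    let mut chars = path.chars();
    let variant = match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    };
    format!(
        "{}<a class=\"self-link\" onclick={{move |_| scope.navigator.unwrap().replace(&Location::{})}}>{}",
        &html[..start],
        variant,
        &html[path_end + 2..] // skip the closing `">`
    )
}

fn main() {
    let rewritten = rewrite_link("<a href=\"/test\">Test!</a>");
    assert_eq!(
        rewritten,
        "<a class=\"self-link\" onclick={move |_| scope.navigator.unwrap().replace(&Location::Test)}>Test!</a>"
    );
}
```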

+

In the end it worked, an internal link was replaced by a navigation call, and navigation worked from my .md +navigation page.

+

The page was exactly how I wanted.

+

Yew page retrospective

+

Looking at only the wasm for this fairly minimal page, it was more than 400K. To make the page work +I had to build a complex build script that generated Rust code that was valid with the Yew framework.
+And to be honest, after bumping Yew from 0.19 to 0.20 during this process and seeing a turn towards even heavier use of macros for functionality, I didn't see this as maintainable even in the medium term.
+I had a big, slow page which probably wouldn't be maintainable, and where highlighting was tricky to integrate.

+

RIIJS

+

I decided to rewrite the page in javascript, or rather generate javascript from a Rust build script and skip +Yew entirely.
+It took less than two hours, and the application was now 68K in total and much less complex.

+

The only dependency now was pulldown-cmark for the build script. I wondered if I could get this to be even smaller.
+I found a css and js minifier written in Rust: minifier-rs.

+

After integrating that, the page was down to 60K, about 7 times smaller than before.
+Doing it in javascript also made it easy to apply highlighting again. I went back and had another look, finding that Prism.js was fairly tiny; integrating it made highlighting work, bringing the page size to a bit over 70K.

+

I wasn't completely content with highlighting being done after the fact on a static page, and if that was to be +off-loaded +I might as well go with the massive starry-night library.
+Sadly this meant creating a build-dependency on npm and the dependency swarm that it brings. But in the end my page was just as small as with prism, did slightly less work at view-time, and had some nice highlighting.

+

In defense of Yew

+

Yew is not a bad framework, and that's not the point of this post. The point is rather the importance of using the best tool for the job. wasm is not necessarily faster than javascript on the web, and if you're not doing heavy operations that can be offloaded to the wasm, the complexity and size of a framework that utilizes it may not be worth it. This page is just a simple collection of html with some highlighting; anything dynamic on the page falls almost entirely in the scope of DOM manipulation, which wasm just can't handle at the moment.

+

CI

+

Lastly, I wanted my page to be rebuilt and published in CI, and I wanted to not have to check in the dist folder, +so I created a pretty gnarly bash-script. The complexity isn't the bad part, the bad part is the +chained operations where each is more dangerous than the last.
+In essence, it checks out a temporary branch from main, builds a new dist, creates a commit, and then +force pushes that to the gh-pages branch. If this repo's history grows further in the future, +I'll look into making it even more destructive by just compacting the repo's entire history into one commit and +pushing that to that branch. But I don't think that will be necessary.

+

Rants on macros and generics

+

I like some of the philosophies of Yew, separating things into Components that pass messages. But seeing the rapid changes, and the increasing use of proc-macros that do the same things as structs and traits, only more opaquely, makes me fear that web development in Rust will follow the same churn-cycle as javascript. What I may appreciate most about statically, strongly typed languages is that you know the type of any given object. Macros and generics dilute this strength, and in my opinion should be used sparingly when creating libraries, although I realize their respective strengths and occasional necessity. I believe that adding macros creates a maintenance trap: if what you're trying to do can already be done without macros, using them is a bad decision by the authors. Macros hide away internals; you don't get to see the objects and functions that you're calling, and if a breaking change occurs, knowing how to fix it can become a lot more difficult, as you may have to re-learn both how the library used to work internally and how it currently works in order to preserve the old functionality.
+</rant>

+
+
\ No newline at end of file diff --git a/pgwm03.html b/pgwm03.html new file mode 100644 index 0000000..e8d3514 --- /dev/null +++ b/pgwm03.html @@ -0,0 +1,267 @@ + + + + + + + + + Pgwm03 + + + +
+

PGWM 0.3, tiny-std, and xcb-parse

+

I recently made a substantial rewrite of my (now) pure rust x11 window manager and want to collect my thoughts on it +somewhere.

+

X11 and the Linux desktop

+

PGWM is an educational experience into Linux desktop environments, +the x11 specification +first came about in 1984 and has for a long time been the only mainstream way for gui-applications on Linux to +show what they need on screen for their users.

+

When working on desktop applications for Linux, the intricacies of that protocol are mostly hidden by the desktop frameworks a developer might encounter. In Rust, the cross-platform library winit can be used for this purpose, and applications written in Rust, like the terminal emulator Alacritty, use winit.

+

At the core of the Linux desktop experience lies the Window Manager, either alone or accompanied by a Desktop Environment (DE). The Window Manager makes decisions on how windows are displayed.

+

The concept of a Window

+

Window is a loose term often used to describe some surface that can be drawn to on screen.
+In X11, a window is a u32 id that the xorg-server keeps information about. It has properties, such as a height and +width, it can be visible or not visible, and it enables the developer to ask the server to subscribe to events.

+

WM inner workings and X11 (no compositor)

+

X11 works by starting the xorg-server, the xorg-server takes care of collecting input +from HIDs +like the keyboard and mouse, collecting information about device state, +such as when a screen is connected or disconnected, +and coordinates messages from running applications including the Window Manager.
+This communication goes over a socket, TCP or Unix. The default is /tmp/.X11-unix/X0 for a single-display desktop +Linux environment.
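The default socket path follows from the DISPLAY environment variable, and can be sketched as a small helper; the assumption here is the standard hostless DISPLAY form like :0 or :0.0:

```rust
// Sketch: derive the default X11 unix-socket path from a hostless
// DISPLAY string (":<display>[.<screen>]"), per the /tmp/.X11-unix/X<n>
// convention mentioned above.
fn x11_socket_path(display: &str) -> Option<String> {
    let num = display.strip_prefix(':')?.split('.').next()?;
    if num.is_empty() || !num.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(format!("/tmp/.X11-unix/X{}", num))
}

fn main() {
    assert_eq!(x11_socket_path(":0").as_deref(), Some("/tmp/.X11-unix/X0"));
    assert_eq!(x11_socket_path(":1.0").as_deref(), Some("/tmp/.X11-unix/X1"));
    // A DISPLAY with a hostname implies TCP, not the unix socket.
    assert!(x11_socket_path("remote:0").is_none());
}
```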

+

The details of the communication are specified in xml files in Xorg's gitlab repo xcbproto. The repo contains xml schemas that specify how an object passed over the socket should be structured to be recognized by the xorg-server, and language bindings are generated from those schemas. The name for the C-language bindings is XCB, for 'X protocol C-language Binding'.
+Having this kind of protocol means that a developer who can't or won't directly link to and use the xlib C-library +can instead construct their own representations of those objects and send those over the socket.

+

In PGWM, a Rust language representation of these objects is used, containing serialization and deserialization methods that turn Rust structs into raw bytes that can be transmitted on the socket.
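A sketch of the idea, not the real X11 wire format: a struct with a serialize method that appends its fields as bytes onto an outgoing buffer (field names, the opcode, and little-endian byte order are all illustrative here; X11 actually uses the byte order the client declares at handshake):

```rust
// Illustrative request struct, serialized the way the generated code
// writes structs onto the socket buffer. Not the real protocol layout.
struct ConfigureWindow {
    opcode: u8,
    window: u32, // the u32 window id described above
    width: u16,
    height: u16,
}

impl ConfigureWindow {
    fn serialize(&self, buf: &mut Vec<u8>) {
        buf.push(self.opcode);
        buf.extend_from_slice(&self.window.to_le_bytes());
        buf.extend_from_slice(&self.width.to_le_bytes());
        buf.extend_from_slice(&self.height.to_le_bytes());
    }
}

fn main() {
    let req = ConfigureWindow { opcode: 12, window: 0x00AB_CDEF, width: 800, height: 600 };
    let mut buf = Vec::new();
    req.serialize(&mut buf);
    assert_eq!(buf.len(), 1 + 4 + 2 + 2);
    assert_eq!(buf[0], 12);
    assert_eq!(&buf[1..5], &0x00AB_CDEFu32.to_le_bytes());
}
```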

+

If launching PGWM through xinit, an xorg-server is started at the beginning of that script; if PGWM is launched inside that script, it will try to become that server's Window Manager.

+

When an application starts within the context of X11, a handshake takes place. The application asks for setup +information from the server, and if the server replies with a success the application can start interfacing +with the server.
+In a WM's case, it will request to set the SubstructureRedirectMask on the root X11 window.
+Only one application can have that mask on the root window at a given time. Therefore, there can only be one WM active +for a running xorg-server.
+If the change is granted, layout change requests will be sent to the WM. From then on the WM can make decisions on the +placements of windows.

+

When an application wants to be displayed on screen it will send a MapRequest, when the WM gets that request it will +make a decision whether that window will be shown, and its dimensions, and forward that decision to the xorg-server +which is responsible for drawing it on screen. Changing window dimensions works much the same way.

+

A large part of the trickiness of writing a WM, apart from the plumbing of getting the socket communication right, is +handling focus.
+In X11, focus determines which window will receive user input, aside from the WM making the decision of what should +be focused at some given time, some Events will by default trigger focus changes, making careful reading of the +protocol an important part of finding maddening bugs.
+What is currently focused can be requested from the xorg-server by any application, and notifications on focus changes +are produced if requested. In PGWM, focus becomes a state that needs to be kept on both the WM's and X11's side to +enable swapping between workspaces and having previous windows re-focused, and has been a constant source of bugs.

+

Apart from that, the pure WM responsibilities are not that difficult: wait for events, respond by changing focus or layout, rinse and repeat. The hard parts of PGWM have been removing all C-library dependencies, and taking optimization to a stupid extent.

+

Remove C library dependencies, statically link PGWM 0.2

+

I wanted PGWM to be statically linked, small and have no C-library dependencies for 0.2. I had one problem.

+

Drawing characters on screen

+

At 0.1, PGWM used language bindings to the XFT (X FreeType interface library) C-library, through the Rust libx11 bindings library X11. XFT handles font rendering; it was used to draw characters on the status bar.

+

XFT provides a fairly nice interface, and comes with the added bonus of Fontconfig integration. Maybe you've encountered something like this: JetBrainsMono Nerd Font Mono:size=12:antialias=true. It's an excerpt from my ~/.Xresources file and configures the font for Xterm; Xterm uses fontconfig to figure out where that font is located on my machine. Removing XFT, and fontconfig with it, means that fonts have to be specified by path, so now something like this is necessary to find a font: /usr/share/fonts/JetBrains\ Mono\ Medium\ Nerd\ Font\ Complete\ Mono.ttf, oof. I still haven't found a non-C replacement for finding fonts without specifying an absolute path.

+

One step in drawing a font is taking the font data and creating a vector of light intensities; this process is called rasterization. Rust has a font rasterization library, fontdue, that at least at one point claimed to be the fastest font rasterizer available. Since I needed to turn the fonts into something that could be displayed as a vector of bytes, I integrated that into PGWM. The next part was drawing it in the correct place. But, instead of looking at how XFT did it, I went for a search around the protocol and found the shm (shared memory) extension (this maneuver cost me about a week).

+

SHM

+

The X11 shm extension allows an application to share memory with X11, and request the xorg-server to draw what's in +that shared memory at some chosen location. +So I spent some time encoding what should be displayed, pixel by pixel from the background color, with the characters as +bitmaps rasterized by fontdue on top, into a shared memory segment, then having the xorg-server draw from that +segment. +It worked, but it took a lot of memory, increased CPU usage, and was slow.

+

Render

+

I finally went to look at XFT's code and found that it uses +the render +extension, an extension that can register byte representations as glyphs, and then draw those glyphs at specified +locations, by glyph-id. This is the sane way to do +it. After implementing that, font rendering was again working, and the performance was good.

+

PGWM 0.3 how can I make this smaller and faster?

+

I wanted PGWM to be as resource efficient as possible, so I decided to dig into the library that I used to do serialization and deserialization of the Rust structs that were to go over the socket to the xorg-server.

+

The library I was using was X11rb, an excellent, safe, and performant library for doing just that. However, I was taking optimization to a ridiculous extent, so I decided to make that library optimized for my specific use case.

+

PGWM runs single threaded

+

X11rb can handle multithreading, making the execution path for single threaded applications longer than necessary.
+I first rewrote the connection logic from interior mutability (the connection handles synchronization) to exterior +mutability (user handles synchronization, by for example wrapping it in an Arc<RwLock<Connection>>).
+This meant a latency decrease of about 5%, which was pretty good. However, it did mean that RAII no longer applied, and the risk of memory leaks went up. To handle that, I set the WM to panic on leaks in debug builds and cleaned them up where I found them.
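A hypothetical sketch of the difference between the two approaches, using a sequence-number field to stand in for the connection state (the types and fields are illustrative, not X11rb's actual API):

```rust
use std::sync::{Arc, Mutex};

// Interior mutability: the connection hides a lock inside, so every
// call pays for synchronization even in a single-threaded program.
struct InteriorConn {
    seq: Mutex<u16>,
}

impl InteriorConn {
    fn next_seq(&self) -> u16 {
        let mut s = self.seq.lock().unwrap();
        *s = s.wrapping_add(1);
        *s
    }
}

// Exterior mutability: `&mut self` proves exclusive access, so no lock
// is needed; a multithreaded user wraps it in Arc<Mutex<..>> themselves.
struct ExteriorConn {
    seq: u16,
}

impl ExteriorConn {
    fn next_seq(&mut self) -> u16 {
        self.seq = self.seq.wrapping_add(1);
        self.seq
    }
}

fn main() {
    let ic = InteriorConn { seq: Mutex::new(0) };
    assert_eq!(ic.next_seq(), 1);

    let mut ec = ExteriorConn { seq: 0 };
    assert_eq!(ec.next_seq(), 1);
    // Synchronization is now opt-in, handled by the caller:
    let shared = Arc::new(Mutex::new(ec));
    assert_eq!(shared.lock().unwrap().next_seq(), 2);
}
```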

+

Optimize generated code

+

In X11rb, structs were serialized into owned allocated buffers of bytes, which were then sent over the socket. +This means a lot of allocations. Instead, I created a connection which holds an out-buffer, structs are always +serialized directly into it, that buffer is then flushed over the socket. Meaning no allocations are necessary during +serialization.

+

The main drawback of that method is management of that buffer. If it's growable then the largest unflushed batch +will take up memory for the WM's runtime, or shrink-logic needs to be inserted after each flush. +If the buffer isn't growable, some messages might not fit depending on how the +buffer is proportioned. It's pretty painful in edge-cases. I chose to have a fixed-size buffer of 64kb.
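A toy sketch of such a fixed-size out-buffer, shrunk to 16 bytes here with a Vec standing in for the socket, including the painful edge-case where a message can never fit:

```rust
// Sketch of the fixed-size out-buffer: structs are serialized directly
// into `buf`, and the buffer is flushed to the socket when a write
// wouldn't fit. The real buffer in the WM is 64kb.
struct OutBuf {
    buf: [u8; 16],
    len: usize,
    flushed: Vec<u8>, // stands in for the socket
}

impl OutBuf {
    fn write(&mut self, bytes: &[u8]) -> Result<(), &'static str> {
        if bytes.len() > self.buf.len() {
            // The edge-case: a message larger than the whole buffer
            // can never fit, no matter how often we flush.
            return Err("message larger than buffer");
        }
        if self.len + bytes.len() > self.buf.len() {
            self.flush();
        }
        self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
        self.len += bytes.len();
        Ok(())
    }

    fn flush(&mut self) {
        self.flushed.extend_from_slice(&self.buf[..self.len]);
        self.len = 0;
    }
}

fn main() {
    let mut out = OutBuf { buf: [0; 16], len: 0, flushed: Vec::new() };
    out.write(&[1; 10]).unwrap();
    out.write(&[2; 10]).unwrap(); // doesn't fit: flushes the first 10 bytes
    assert_eq!(out.flushed.len(), 10);
    assert_eq!(out.len, 10);
    assert!(out.write(&[3; 17]).is_err());
}
```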

+

At this point I realized that the code generation was hard to understand and needed a lot of changes to support my +needs. Additionally, making my WM no_std and removing all traces of libc was starting to look achievable.

+

Extreme yak-shaving, generate XCB from scratch

+

This was by far the dumbest part of the process: reworking the entire library to support no_std and generating the structures and code from scratch. While probing the Wayland specification I had written a very basic Rust code generation library, codegen-rs, and I decided to use that for the code generation.

+

After a few weeks I had managed to write a parser for the xproto.xsd, a parser for the actual protocol files, and a +code generator that I could work with.
+A few more weeks followed, and I had finally generated a no_std, fairly optimized library for interfacing with X11 over a socket, mostly by looking at how x11rb does it.

+

Extreme yak-shaving, pt 2, no libc allowed

+

In Rust, libc is the most common way that the standard library interfaces with the OS, with some direct syscalls where necessary. There are many good reasons for using libc, even when not building cross-platform/cross-architecture libraries, but I wanted something pure Rust, so that went out the window.

+

Libc

+

libc does a vast number of things; on Linux there are two implementations that dominate: glibc and musl. I won't go into the details of the differences between them, but simplified, they are C-libraries that make your C-code run as you expect on Linux.
+As libraries they expose methods to interface with the OS, for example reading or writing to a file, +or connecting to a socket.
+Some functions are essentially just proxies for syscalls, but some do more things behind the scenes, like synchronization of shared application resources, such as access to the environment pointer.

+

Removing the std-library functions and replacing them with syscalls

+

I decided to set PGWM to #![no_std] and see what compiled. Many things in std::* are just re-exports from core::* and were easily replaced. For other things, like talking to a socket, I used raw syscalls through the excellent syscall crate and some glue-code to approximate what libc does. It was a bit messy, but not too much work replacing it. PGWM is now 100% not cross-platform, although it wasn't really before either...
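A sketch of what such a raw syscall boils down to on x86_64 Linux, written with inline assembly rather than the syscall crate's macros (so this is an approximation of the technique, not that crate's API): the syscall number goes in rax, arguments in rdi/rsi/rdx, and the `syscall` instruction clobbers rcx and r11. x86_64-only:

```rust
use std::arch::asm;

// Raw `write(2)` on x86_64 Linux, no libc involved.
// Sketch only: a real wrapper would handle errno-style negative returns.
fn sys_write(fd: usize, buf: &[u8]) -> isize {
    let ret: usize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 1usize => ret, // 1 = SYS_write on x86_64
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            lateout("rcx") _, // clobbered by the syscall instruction
            lateout("r11") _,
        );
    }
    ret as isize
}

fn main() {
    let msg = b"written without libc\n";
    let n = sys_write(1, msg); // fd 1 = stdout
    assert_eq!(n, msg.len() as isize);
}
```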

+

No allocator

+

Since the standard library provides the allocator, I had to find a new one. I decided to use dlmalloc; it works in no_std, and it was a fairly simple change.

+

Still libc

+

I looked into my crate graph and saw that quite a few dependencies still pulled in libc:

+
    +
  1. time.rs +
  2. toml.rs +
  3. dlmalloc-rs +
  4. smallmap +
+

I got to work forking these libraries and replacing libc with direct syscalls.
+time was easy, just some Cargo.toml magic that could easily be upstreamed.
+toml was a bit trickier, the solution was ugly and I decided not to upstream it.
+dlmalloc-rs was even harder, it used the pthread-api to make the allocator synchronize, and implementing that +was beyond even my yak-shaving. Since PGWM is single threaded anyway I left it as-is and unsafe impl'd +Send and Sync.
+smallmap was fairly simple; upstreaming is in progress.

+

The ghost of libc, time for nightly

+

With no traces of libc I try to compile the WM. It can't start; it doesn't know how to start. The reason is that libc provides the application's entrypoint, _start; without linking libc, Rust doesn't know how to create an entrypoint. As always, the amazing fasterthanli.me has a write-up about how to get around that issue. The solution required nightly and some assembly. Now the application won't compile, but for a different reason: I have no global alloc error handler. When running a no_std binary with an allocator, Rust needs to know what to do if allocation fails, but there is at present no way to provide a handler without another nightly feature, default_alloc_error_handler, which looks like it's about to be stabilized soon (TM). Now the WM works, no_std, no libc, life is good.

+

Tiny-std

+

I was looking at terminal emulator performance. Many new terminal emulators seem to have very poor input performance. I had noticed this on one of the many times PGWM crashed and sent me back to the cold hard tty and its comforting speed. alacritty is noticeably sluggish at rendering keyboard input to the screen, so I went back to xterm, but now that PGWM worked I was toying with the idea of writing a fast, small terminal emulator in pure rust.
+I wanted to share the code I used for that in PGWM with this new application, and clean it up in the process: tiny-std +.

+

The goal of tiny-std is to make a std-compatible no_std library with no libc dependencies available for use with +Linux Rust applications on x86_64 and aarch64, which are the platforms I'm interested in. Additionally, all +functionality +that can work without an allocator should. You shouldn't need to pull in alloc to read/write from a file, just +provide your own buffer.

+

The nightmare of cross-architecture

+

Almost immediately I realize why libc is so well-used. After a couple of hours of debugging a segfault that turned out to be caused by field ordering that differs between architectures, one tends to see the light. Never mind the third time that happens.
+I'm unsure of the best way to handle this, perhaps by doing some libgen straight from the kernel source, but we'll see.

+

Start, what's this on my stack?

+

I wanted to be able to get arguments and preferably environment variables +into tiny-std. Fasterthanli.me +helped with the args, but for the rest I had to go to the musl source.
+When an application starts on Linux, the first 8 bytes of the stack contain argc, the number of input arguments. Following that are the pointers to the null-terminated argument strings (argv), then a null pointer, and then come the pointers to the environment variables.
+musl takes the address of that first environment entry, puts it into a global mutable variable, and that's the environment.
+I buckle under and do the same. I see a world where arguments and environment are passed to main, and where it's the application's job, not the library's, to decide how to handle them in a thread-safe way (although you can use env_p as an argument to main in C).
+Being no better than my predecessors, I store the environment pointer in a static variable; things like spawning processes become a lot simpler that way. C owns the world, we just live in it.
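The layout described above can be illustrated by parsing a faked initial stack; the array of usizes below stands in for the real stack (assumption: 64-bit, so each slot is 8 bytes), and the walk mirrors what the entrypoint glue does:

```rust
// Fake "strings" for the simulated stack to point at.
static ARG0: &[u8] = b"pgwm\0";
static ENV0: &[u8] = b"HOME=/home/user\0";

// Walk an argc/argv/envp layout: argc, then argv pointers, a null,
// then envp pointers, then another null.
// Sketch: `sp` must point at a well-formed layout like the one in main.
fn parse_stack(sp: *const usize) -> (usize, Vec<*const u8>, Vec<*const u8>) {
    unsafe {
        let argc = *sp;
        let mut p = sp.add(1);
        let mut argv = Vec::new();
        while *p != 0 {
            argv.push(*p as *const u8);
            p = p.add(1);
        }
        p = p.add(1); // skip argv's null terminator; envp follows
        let mut envp = Vec::new();
        while *p != 0 {
            envp.push(*p as *const u8);
            p = p.add(1);
        }
        (argc, argv, envp)
    }
}

fn main() {
    let fake_stack: [usize; 5] = [
        1,                      // argc
        ARG0.as_ptr() as usize, // argv[0]
        0,                      // argv terminator
        ENV0.as_ptr() as usize, // envp[0]
        0,                      // envp terminator
    ];
    let (argc, argv, envp) = parse_stack(fake_stack.as_ptr());
    assert_eq!(argc, 1);
    assert_eq!(argv.len(), 1);
    assert_eq!(envp.len(), 1);
    assert_eq!(unsafe { *envp[0] }, b'H'); // first byte of "HOME=..."
}
```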

+

vDSO (virtual dynamic shared object), what, there's more on the stack?

+

Through some coincidence, when trying to make sure all the processes that I spawn don't become zombies, I encounter the vDSO.
+ldd has whispered the words, but I never looked it up.

+
[gramar@grarch marcusgrass.github.io]$ ldd $(which cat)
+        linux-vdso.so.1 (0x00007ffc0f59c000)
+        libc.so.6 => /usr/lib/libc.so.6 (0x00007ff14e93d000)
+        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff14eb4f000)
+
+

It turns out to be a shared library between the Linux kernel and a running program, mapped into that program's memory.
+When I read that it provides faster ways to interface with the kernel, I immediately stopped reading and started implementing; I could smell the nanoseconds.

+

Aux values

+

To find out where the VDSO is mapped into memory for an application, the application needs to inspect the +AUX values at runtime. +After the environment variable pointer comes another null pointer, following that are the AUX values. +The AUX values are key-value(like) pairs of information sent to the process. +Among them are 16 random bytes, the pid of the process, the gid, and about two dozen more entries of +possibly useful values.
+I write some more code into the entrypoint to save these values.
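The aux values can be sketched the same way: (key, value) pairs of usizes, terminated by an AT_NULL key. The constants below are from Linux's auxvec.h; the fake array stands in for the real region after envp's null:

```rust
// A few real aux-value keys from Linux's auxvec.h.
const AT_NULL: usize = 0; // end of the aux vector
const AT_PAGESZ: usize = 6; // system page size
const AT_SYSINFO_EHDR: usize = 33; // address of the vDSO ELF header

// Scan (key, value) pairs until AT_NULL, returning the value for `key`.
// Sketch: `auxv` must point at a well-formed, AT_NULL-terminated vector.
fn find_aux(auxv: *const usize, key: usize) -> Option<usize> {
    unsafe {
        let mut p = auxv;
        while *p != AT_NULL {
            if *p == key {
                return Some(*p.add(1));
            }
            p = p.add(2);
        }
        None
    }
}

fn main() {
    // Fake aux vector as it would appear on the stack after envp.
    let fake: [usize; 5] = [AT_PAGESZ, 4096, AT_SYSINFO_EHDR, 0xdead_beef, AT_NULL];
    assert_eq!(find_aux(fake.as_ptr(), AT_PAGESZ), Some(4096));
    assert_eq!(find_aux(fake.as_ptr(), AT_SYSINFO_EHDR), Some(0xdead_beef));
    assert_eq!(find_aux(fake.as_ptr(), 99), None);
}
```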

+

A memory mapped elf-file

+

Among the aux-values is AT_SYSINFO_EHDR, a pointer to the start of the vDSO which is a full +ELF-file mapped into the process' memory.
+Through the Linux vDSO docs I knew that this file contains a function pointer for the clock_gettime function. I had benchmarked tiny-std's Instant::now() vs the standard library's, and found it to be almost seven times slower. I needed to find this function pointer.

+

After reading more Linux documentation, and ELF-documentation, and Linux-ELF-documentation, +I managed to write some code that parses the ELF-file to find the address of the function. +Of course that goes into another global variable, you know, C-world and all that.

+

I created a feature that does the vDSO parsing and, if clock_gettime is found, uses that instead of the syscall. This increased the performance of Instant::now() from ~std * 7 to < std * 0.9. In other words, it now outperforms standard by taking around 12% less time to get the current time from the system.

+

Conclusion

+

I do a lot of strange yak-shaving, mostly for my own learning; I hope that this write-up might have given you something too.
+The experience of taking PGWM to no_std and no libc has been incredibly rewarding, although I think PGWM is mostly +the same, a bit more efficient, a bit less stable.
+I'll keep working out the bugs and the API of tiny-std. Plans to make a minimal terminal emulator are still in the back of my mind; we'll see if I can find the time.

+
+
\ No newline at end of file diff --git a/pgwm04.html b/pgwm04.html new file mode 100644 index 0000000..046027b --- /dev/null +++ b/pgwm04.html @@ -0,0 +1,208 @@ + + + + + + + + + Pgwm04 + + + +
+

PGWM 0.4, io-uring, stability, and static pie linking

+

A while back I decided to look into io-uring for an event-loop for +pgwm, I should have written +about it when I implemented it, but couldn't find the time then.

+

Now that I finally got pgwm to compile +using the stable toolchain, I'm going to write a bit about the way there.

+

Io-uring

+

Io-uring is a linux syscall interface +that allows you to submit io-tasks, and later collect the results of those tasks. +It does so by providing two ring buffers, one for submissions, and one for completions.

+

In the simplest possible terms, you put some tasks on one queue, and later collect them on some other +queue. In practice, it's a lot less simple than that.

+

As I've written about in previous entries on this website, I decided to scrap the std-lib and libc, and write +my own syscall interface in tiny-std.
+Therefore I had to look into the gritty details of how to set up these buffers, you can see those details +here. +Or, look at the c-implementation which I ripped off here.

+

Why io-uring?

+

I've written before about my x11-wm pgwm, but in short: it's an x11-wm based on async socket communication, where the wm reacts to incoming messages, like a key-press, and responds with some set of outgoing messages on that same socket.
+When the WM had nothing to do it used the poll interface to await another message.

+

So the loop could be summed up as:

+
1. Poll until there's a message on the socket.
+2. Read from the socket.
+3. Handle the message.
+
+

With io-uring that could be compacted to:

+
1. Read from the socket when there are bytes available.
+2. Handle the message.
+
+

io-uring sounded cool, and this seemed efficient, so off I went.

+

Why not io-uring?

+

Io-uring is complex: the set-up is complex, and there are quite a few considerations that need to be made. Ring-buffers need to be set up; how big should they be? What if we get an incoming message pile-up? What if we get an outgoing message pile-up? When is the best time to flush the buffers? What settings should I put on the uring?

+

There are more considerations than that, but I didn't really need to tackle most of these issues, since I'm not shipping +a production-ready lib that I'll support indefinitely; I'm just messing around with my WM. I cranked up the buffer +size to more than necessary, and it works fine.

+

Something that I did consider, however, was whether to use SQ-poll; we'll get more into what that is shortly.

+

Sharing memory with the kernel

+

Something that theoretically makes io-uring more efficient than other io-alternatives is that the ring-buffers +are shared with the kernel. There is no need to make a separate syscall for each sent message: if you put a message +on the buffer and update its offset through an atomic operation, it becomes available for the kernel to use.
+But the kernel does need to find out about the submission beyond just the updated state. +There are two ways of doing this:

+
    +
  1. Make a syscall. Write an arbitrary number of tasks to the submission queue, then tell the kernel about them through +a syscall. That same syscall can be used to wait until there are completions available as well; it's very flexible. 
  2. Have the kernel poll the shared memory for changes in the queue-offset and pick tasks up as they're added. Potentially, +this is a large latency-decrease as well as a throughput increase, no more waiting for syscalls! +
+
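As a rough illustration of option 1 — a conceptual sketch of my own, not tiny-std's actual ring layout — publishing a submission amounts to writing the entry and then storing the new tail with Release ordering; without SQ-poll, an io_uring_enter(2) syscall would then tell the kernel how many new entries exist:

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

// Conceptual stand-in for the shared submission ring. N must be a power of two.
struct SubmissionQueue<const N: usize> {
    entries: [AtomicU64; N], // stand-in for real submission queue entries
    tail: AtomicU32,
}

impl<const N: usize> SubmissionQueue<N> {
    fn new() -> Self {
        Self {
            entries: std::array::from_fn(|_| AtomicU64::new(0)),
            tail: AtomicU32::new(0),
        }
    }

    fn push(&self, sqe: u64) {
        let tail = self.tail.load(Ordering::Relaxed);
        // Write the entry into the slot the current tail points at.
        self.entries[tail as usize & (N - 1)].store(sqe, Ordering::Relaxed);
        // Publish: a consumer loading `tail` with Acquire now also sees the entry.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        // Without SQ-poll, an io_uring_enter(2) syscall would follow here, telling
        // the kernel how many new submissions to consume (and optionally waiting
        // for completions). With SQ-poll, the kernel notices the new tail on its own.
    }
}
```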

I thought this sounded great; in practice however, SQPoll resulted in a massive cpu-usage increase. I couldn't +tolerate that, so I'll have to save that setting for a different project. +In the end, io-uring didn't change much about pgwm.

+

Stable

+

Since I ripped out libc, pgwm has required nightly to build, and this has bothered me quite a bit. +The nightly compiler was necessary because tiny-std used the #[naked] feature to create +the assembly entrypoint (the _start function), where the application starts execution.

+

Asm to global_asm

+

To be able to get aux-values, the environment variable pointer, and the arguments passed to the binary, access to +the stack-pointer at its start-position is required. Therefore, a function that doesn't mess up the stack needs to be +injected, passing that pointer to a normal function that can extract what's necessary.

+

An example:

+
/// Binary entrypoint
+#[naked]
+#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+pub unsafe extern "C" fn _start() {
+    // Naked function making sure that main gets the first stack address as an arg
+    #[cfg(target_arch = "x86_64")]
+    {
+        core::arch::asm!("mov rdi, rsp", "call __proxy_main", options(noreturn))
+    }
+    #[cfg(target_arch = "aarch64")]
+    {
+        core::arch::asm!("MOV X0, sp", "bl __proxy_main", options(noreturn))
+    }
+}
+/// Called with a pointer to the top of the stack
+#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+unsafe fn __proxy_main(stack_ptr: *const u8) {
+    // First 8 bytes are a u64 with the number of arguments
+    let argc = *(stack_ptr as *const u64);
+    // Directly followed by those arguments, bump pointer by 8
+    let argv = stack_ptr.add(8) as *const *const u8;
+    let ptr_size = core::mem::size_of::<usize>();
+    // Directly followed by the environment variable pointers, a null-terminated array where each entry points to a null-terminated string.
+    // This isn't specified in Posix and is not great for portability, but we're targeting Linux so it's fine
+    let env_offset = 8 + argc as usize * ptr_size + ptr_size;
+    // Bump pointer by combined offset
+    let envp = stack_ptr.add(env_offset) as *const *const u8;
+    unsafe {
+        ENV.arg_c = argc;
+        ENV.arg_v = argv;
+        ENV.env_p = envp;
+    }
+    ...etc
+
+

I got this from an article by fasterthanli.me, but later realized that +you can use the global_asm macro to generate the full function instead:

+
// Binary entrypoint
+#[cfg(all(feature = "symbols", feature = "start", target_arch = "x86_64"))]
+core::arch::global_asm!(
+    ".text",
+    ".global _start",
+    ".type _start,@function",
+    "_start:",
+    "mov rdi, rsp",
+    "call __proxy_main"
+);
+
+

Symbols

+

While this means that tiny-std itself could potentially be part of a binary compiled with stable, +if one would like to use, for example, alloc to have an allocator, then rustc would start emitting references to symbols +like memcpy, which Rust doesn't provide for some reason.

+

The solution to the missing symbols is simple enough: these symbols are provided in the external +compiler-builtins library, but that uses a whole host of features +that require nightly. So I copied the implementation (and license), removed the dependencies on nightly features, and +exposed the symbols in tiny-std.
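For illustration, the simplest of those symbols looks roughly like the sketch below. The real implementations in compiler-builtins are far more optimized; the function name here is deliberately not `memcpy`, since exporting it as such with `#[no_mangle] extern "C"` (which is what the linker actually needs) would clash with the host libc's symbol in a normal build:

```rust
/// Naive byte-by-byte copy. In a no-libc build, the real thing is exported as
/// `memcpy` with `#[no_mangle] pub unsafe extern "C"` so that the calls rustc
/// emits can be resolved at link time.
unsafe fn naive_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // Plain forward copy; memcpy is allowed to assume non-overlapping buffers.
        *dest.add(i) = *src.add(i);
        i += 1;
    }
    dest
}
```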

+

Now an application (like pgwm), can be built with the stable toolchain using tiny-std.

+

Static

+

In my boot-writeup I wrote about creating a minimal Rust bootloader. A problem I encountered was that it needed +an interpreter. You can't see it with ldd:

+
[21:55:04 gramar@grarch marcusgrass.github.io]$ ldd ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+        statically linked
+
+

Ldd lies (or maybe technically not), using file:

+
file ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9b54c91e5e84a8d3c90fdb9523f46e09cbf5c6e2, stripped
+
+

Or readelf -S:

+
+[21:57:21 gramar@grarch marcusgrass.github.io]$ readelf -S ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+There are 18 section headers, starting at offset 0x16a0b0:
+Section Headers:
+  [Nr] Name              Type             Address           Offset
+       Size              EntSize          Flags  Link  Info  Align
+  [ 0]                   NULL             0000000000000000  00000000
+       0000000000000000  0000000000000000           0     0     0
+  [ 1] .interp           PROGBITS         00000000000002a8  000002a8
+       000000000000001c  0000000000000000   A       0     0     1
+  [ 2] .note.gnu.bu[...] NOTE             00000000000002c4  000002c4
+       0000000000000024  0000000000000000   A       0     0     4
+  [ 3] .gnu.hash         GNU_HASH         00000000000002e8  000002e8
+       000000000000001c  0000000000000000   A       4     0     8
+  [ 4] .dynsym           DYNSYM           0000000000000308  00000308
+       0000000000000018  0000000000000018   A       5     1     8
+  [ 5] .dynstr           STRTAB           0000000000000320  00000320
+       0000000000000001  0000000000000000   A       0     0     1
+  [ 6] .rela.dyn         RELA             0000000000000328  00000328
+       0000000000008310  0000000000000018   A       4     0     8
+  [ 7] .text             PROGBITS         0000000000009000  00009000
+       000000000013d5a4  0000000000000000  AX       0     0     16
+  [ 8] .rodata           PROGBITS         0000000000147000  00147000
+       000000000000eb20  0000000000000000   A       0     0     32
+  [ 9] .eh_frame_hdr     PROGBITS         0000000000155b20  00155b20
+       0000000000001a8c  0000000000000000   A       0     0     4
+  [10] .eh_frame         PROGBITS         00000000001575b0  001575b0
+       000000000000c1dc  0000000000000000   A       0     0     8
+  [11] .gcc_except_table PROGBITS         000000000016378c  0016378c
+       000000000000000c  0000000000000000   A       0     0     4
+  [12] .data.rel.ro      PROGBITS         0000000000164e28  00163e28
+       0000000000006088  0000000000000000  WA       0     0     8
+  [13] .dynamic          DYNAMIC          000000000016aeb0  00169eb0
+       0000000000000110  0000000000000010  WA       5     0     8
+  [14] .got              PROGBITS         000000000016afc0  00169fc0
+       0000000000000040  0000000000000008  WA       0     0     8
+  [15] .data             PROGBITS         000000000016b000  0016a000
+       0000000000000008  0000000000000000  WA       0     0     8
+  [16] .bss              NOBITS           000000000016b008  0016a008
+       0000000000000458  0000000000000000  WA       0     0     8
+  [17] .shstrtab         STRTAB           0000000000000000  0016a008
+       00000000000000a8  0000000000000000           0     0     1
+
+

Both file and readelf (the .interp section) show that this binary needs an interpreter, namely +/lib64/ld-linux-x86-64.so.2. If the binary is run in an environment without it, it +will immediately crash.

+

If compiled statically with RUSTFLAGS='-C target-feature=+crt-static' the application segfaults, oof.

+

I haven't found out the reason why tiny-std cannot run as a +position-independent executable. +Or rather, I know why: all the addresses of symbols (like static variables) are wrong. What I don't know yet is +how to fix it.

+

There is a no-code way of fixing it though: RUSTFLAGS='-C target-feature=+crt-static -C relocation-model=static'.
+This way the application will be statically linked, without requiring an interpreter, but it will not be +position independent.

+

If you know how to make that work, please tell me, because figuring that out isn't easy.

+

Future plans

+

I'm tentatively looking into making threading work, but that is a lot of work and a +lot of segfaults on the way.

+
+
\ No newline at end of file diff --git a/rust-kbd.html b/rust-kbd.html new file mode 100644 index 0000000..6f7efe7 --- /dev/null +++ b/rust-kbd.html @@ -0,0 +1,818 @@ + + + + + + + + + RustKbd + + + +
+

Building keyboard firmware in Rust, an embedded journey

+

Last time, I wrote about enabling Symmetric Multiprocessing on a keyboard using +QMK (and Chibios).
+This turned out to be a bad idea, or at least the way I was doing it was, as I was told by a maintainer: QMK +is not made for multithreading (yet).

+

My daughter sleeps a lot during the days, so I decided to step up the level of ambition a bit: +Can keyboard firmware be reasonably written from "scratch" using Rust, I asked myself, and found out that it can.

+

Overview

+

This writeup is about how I wrote multicore firmware using Rust for a lily58 PCB, +and a Liatris (rp2040-based) +microcontroller. The code for it is here.

+
    +
  1. Callback to the last writeup +
  2. Embedded on Rust +
  3. Development process (Serial interfaces) +
  4. Figuring out the MCU<->PCB interplay using QMK +
  5. Split keyboard communication woes +
  6. Keymaps +
  7. USB HID Protocol +
  8. OLED displays +
  9. BUUUUGS +
  10. Performance +
  11. Epilogue +
+

On the last episode of 'Man wastes time reinventing wheel'

+

Last time I did a pretty thorough dive into QMK, explaining keyboard basics, and most of the jargon used.
+I'm not going to be as thorough this time, but briefly:

+

Enthusiast keyboards

+

There are communities building enthusiast keyboards, often soldering components together themselves, and tailoring their +own firmware to fit their needs (or wants).

+

Generally, a keyboard consists of the PCB, microcontroller (sometimes integrated with the PCB), switches that go on the PCB, +and keycaps that go on the switches. Split keyboards are also fairly popular, those keyboards generally have two separate PCBs +that are connected to each other by wire, I've been using the split keyboard +iris for a long time. +There are also peripherals, such as rotary encoders, +oled displays, sound emitters, RGB lights and many more that can be integrated +with the keyboard. Pretty much any peripheral that the microcontroller can interface with is a possible add-on to +a user's keyboard.

+

QMK

+

To get the firmware together, an open source firmware repo called QMK can be used. There are a few others but to my +knowledge QMK is the most popular and mature alternative. You can make a keymap without writing any code at all, +but if you want to interface with peripherals, or execute advanced logic, some C-code will be necessary.

+

Back to last time

+

I bought a microcontroller which has dual cores, and I wanted to use them to offload oled-drawing to the core that +doesn't handle latency-sensitive activities, and did a deep dive into enabling that for my setup. +While it worked, it was not thread-safe and was generally discouraged by the maintainers.

+

That's when I decided to write my own firmware in Rust.

+

Embedded on Rust

+

I hadn't written code for embedded targets before my last foray into keyboard firmware; I had some tangential experience +with the heapless library, which exposes stack-allocated collections. +These can be useful for performance in some cases, and essential if you haven't got a heap at all, as you +often won't on embedded devices.

+

I searched for rp2040 Rust and found rp-hal; HAL stands for Hardware +Abstraction Layer, and the crate exposes high-level code to interface with low-level processor and peripheral functionality.

+

For example, spawning a task on the second core, resetting to bootloader, reading GPIO +pins, and more. This was a good starting point, when I found this project I had already soldered together +the keyboard and was ready to write firmware for it.

+

CPU and board

+

rp-hal provides access to the basic CPU-functionality, but that CPU is mounted on a board, which has its own +peripherals, in this case the Liatris. The mapping of the outputs +of the board to code is called a Board Support Package (BSP), and can be put in the +rp-hal-boards repo so that it can be shared. +I haven't made a PR for my fork yet; +I'm planning to do it when I've worked out all remaining bugs in my code, but it's very much based on the +rp-pico BSP.

+

Starting development

+

Now I wanted to get any firmware running just to see that it's working.

+

USB serial

+

The Liatris MCU has an integrated USB-port. I figured that the easiest way to see whether the firmware boots and works +at all was to implement some basic communication over that port; until I can get some information out of the MCU, +I'm flying completely blind.

+

The rp-pico BSP examples +were excellent, using them I could set up a serial interface which just echoed back what was written to it to the OS.

+

Hooking the serial interface up to the OS was another matter though. I compiled the firmware and +flashed it to the keyboard by holding down the onboard boot-button and pressing reset, then went to +figure out the OS parts.

+

USB CDC ACM

+

After some searching I realize that I need some drivers to connect to the serial device: +USB CDC ACM, USB and two meaningless letter combinations. Together they stand for

+
+

Universal Serial Bus Communication Device Class Abstract Control Model

+
+

When the correct drivers are installed, and the keyboard plugged in, dmesg +tells me that there's a new device under /dev/ttyACM0.

+
echo "Hello!" >> /dev/ttyACM0
+
+

No response.

+

I do some more searching and find out that two-way communication with serial devices over the CDC-ACM-driver +isn't as easy as echoing and cat-ing a file. minicom is a program +that can interface with this kind of device, but the UX was obtuse; looking for alternatives I found +picocom, which serves the same purpose but is slightly nicer to use:

+
[root@grentoo /home/gramar]# picocom -b 115200 -l /dev/ttyACM0
+picocom v3.1
+port is        : /dev/ttyACM0
+flowcontrol    : none
+baudrate is    : 115200
+parity is      : none
+databits are   : 8
+stopbits are   : 1
+escape is      : C-a
+local echo is  : no
+noinit is      : no
+noreset is     : no
+hangup is      : no
+nolock is      : yes
+send_cmd is    : sz -vv
+receive_cmd is : rz -vv -E
+imap is        : 
+omap is        : 
+emap is        : crcrlf,delbs,
+logfile is     : none
+initstring     : none
+exit_after is  : not set
+exit is        : no
+Type [C-a] [C-h] to see available commands
+Terminal ready
+
+

There's a connection! Enabling echo and writing hello gives the output hHeElLlLoO, the Liatris responding +with a capitalized echo.

+

Making DevEx nicer

+

I write some code that checks the last entered characters and executes commands depending on what they are. +First off, making a reboot easier:

+
if last_chars.ends_with(b"boot") {
+    reset_to_usb_boot(0, 0);
+}
+
+

Great, now I can connect to the device and type boot, and it'll boot into flash-mode so that I can load new firmware +onto it. This made iterating much faster. Before this, since everything was soldered and mounted, I had to use a (wooden) skewer +to reach under the oled and press the boot button on the microcontroller. I recommend not soldering on +components that block access to the boot-button if doing this kind of programming.

+

Developing actual keyboard functionality

+

There are schematics for the pcb +online, as well as a schematic of the pinout of the elite-c MCU, +which the developers told me is the same as for the Liatris; this seems to be true.

+

Rows and columns are connected to GPIO-pins on the MCU; switches connect rows and columns, and when a switch is pressed, a current can flow between them. +My first thought was that if the switch that sits between row0 and col0 is pressed, the pins for row0 and col0 would read +high (or low); that's not the case.

+

PullUp and PullDown resistors

+

Here is where my complete ignorance of embedded came to haunt me: GPIO pins can be configured to be either PullUp or +PullDown. What that meant was beyond me, and it still is to a large extent. The crux of it is that +there's a resistor connected to either power or ground, up or down respectively.

+

That made some sense to me; I figured either the rows or the columns should be PullUp while the other is PullDown. +This did not produce any reasonable results either. +At this point, I had written some debug-code which scanned all GPIO-pins and printed when their state changed, and +I was mashing keyboard buttons with strange output as a result.

+

I was getting frustrated with the non-progress and decided to look into QMK. There's a lot of __weak__-linkage, +the abstract class of C, so actually following the code in QMK +can be difficult, which is why I hadn't browsed it in more depth earlier.

+

But I did find the problem. All pins, rows and columns, should be pulled high (PullUp), +then the column that should be checked is set low, and then all rows are checked; if any row goes low, the switch +connecting the checked column and that row is being pressed. In other words:

+

Set col0 low; if row0 is still high, switch (0, 0), the top-left for example, is not pressed. +If row1 is now low, it means that switch (1, 0), the first key on the second row, is being pressed.

+

Now I can detect which keys are being pressed, useful functionality for a keyboard.
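The scan described above can be sketched like this — a hedged model of my own where the GPIO reads are replaced by a closure, so the logic runs off-hardware; the real firmware drives and reads actual pins:

```rust
const ROWS: usize = 5;
const COLS: usize = 6;

/// `switch_closed(row, col)` models whether the switch joining that row and
/// column is currently conducting; on hardware this is a row-pin read.
fn scan_matrix(switch_closed: &dyn Fn(usize, usize) -> bool) -> [[bool; COLS]; ROWS] {
    let mut state = [[false; COLS]; ROWS];
    for col in 0..COLS {
        // On hardware: drive this column's pin low here.
        for row in 0..ROWS {
            // A pulled-up row pin only reads low if a closed switch connects
            // it to the driven (low) column.
            state[row][col] = switch_closed(row, col);
        }
        // On hardware: release the column back to pulled-up high, and wait
        // for the row pins to settle before scanning the next column.
    }
    state
}
```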

+

Split keyboards

+

Looking back at the schematic I see that there's a pin +labeled side-indicator, that either goes to ground or voltage. After a brief check it reads, as expected, high on the left +side, and low on the right side.

+

Now that I can detect which keys are being pressed, by coordinates, and which side is being run, +it's time to transmit key-presses from the right-side to the left.

+

The reason to do it that way is that the left is the side that I'm planning on connecting to the computer with a +usb-cable. Now, I could have written the code to be side-agnostic, checking whether a USB-cable is connected and choosing +whether to send key-presses over the wire connecting the sides, or over the USB-cable. However, that approach increases both +complexity and binary size, so I opted not to.

+

Stupid note

+

I could also have made each side a separate independent keyboard, which would have been pretty fun, but problematic +for a lot of reasons, like using left shift while pressing a right-side key; I'd have to have software on the computer to patch them +together.

+

Bits over serial

+

Looking at the schematics again, I see that one pin is labeled DATA; that pin is the one connected to the +pad that the TRRS cable connects the sides with.
+However, there is only one pin on each side, which means that all communication is limited to setting/reading +high/low on a single pin. Transfer is therefore limited to one bit at a time.

+

Looking over the default configuration for my keyboard in QMK the BitBang +driver is used since nothing else is specified, there are also USART, single- and full-duplex available.

+

UART/USART

+

UART stands for Universal Asynchronous +Receiver-Transmitter, and is a protocol (although the wiki says a peripheral device, terminology unclear) +to send bits over a wire.

+

There is a UART-implementation for the rp2040 in the rp-hal-crate, but it +assumes usage of the built-in uart-peripheral, which uses both an RX- and a TX-pin in pre-defined positions. In my case, +I want to either have half-duplex communication (one side communicates at a time), or simplex communication from right +to left. That means that the DATA-pin on the left side should be UART-RX (receiver), while the DATA-pin on +the right is UART-TX (transmitter).

+

I search further for single-pin UART and find out about PIO.

+

PIO

+

The rp2040 has blocks with state-machines which can run like a separate processor, manipulating and reading +pin-states. These can be programmed with specific assembly, and there just happens to be someone +who programmed a uart-implementation in that assembly here.

+

It also turns out that someone ported that implementation to a Rust library here.

+

I hooked up the RX-part to the left side, and the TX to the right, and it worked!

+

Note

+

You could probably make a single-pin half-duplex uart implementation by modifying the above pio-asm, and not by that much. +You'd just have to figure out how to wait on either data in the input register from the user program, or on communication +starting from the other side. There's a race-condition there though; maybe I'll get to that later.

+

Byte-protocol

+

Since I'm using hardware to send data bit-by-bit, I made a slimmed-down protocol. The right side has 28 buttons and a +rotary-encoder. A delta can fit into a single byte.

+
+

Edit 2024-04-17

+

Changed this to two bytes, where the content is sandwiched between a header and footer like this:

+
const HEADER: u16 = 0b0101_0000_0000_0000;
+const FOOTER: u16 = 0b0000_0000_0000_0101;
+// convert 8 bit msg into 16 bits, shift it 4 to the left
+// Then OR with header and footer to create 16 bits with the actual message at the middle
+let msg = ((byte_to_send as u16) << 4) | HEADER | FOOTER;
+
+

The reason is that if the right-side is disconnected and reconnected, the lowering and then +raising of the uart-pin becomes a valid message, but it'll be wrong. Either it will be all 0s or all 1s +at the head or tail of the message, which these bit-patterns eliminate.
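The receive side — my sketch of it, not necessarily the exact code in the repo — can then reject any frame whose outer bits don't match before unshifting the payload:

```rust
const HEADER: u16 = 0b0101_0000_0000_0000;
const FOOTER: u16 = 0b0000_0000_0000_0101;
// The header occupies the top 4 bits, the footer the bottom 4.
const FRAME_MASK: u16 = 0b1111_0000_0000_1111;

/// Returns the payload byte if the frame is well-formed, otherwise None.
/// The all-zero and all-one junk frames produced by a reconnect fail the
/// mask check, which is the point of the header/footer bit-patterns.
fn decode_frame(frame: u16) -> Option<u8> {
    if frame & FRAME_MASK != HEADER | FOOTER {
        return None;
    }
    Some(((frame >> 4) & 0xFF) as u8)
}
```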

+
+

Visualizing the keyboard's keys as a matrix with 5 rows and 6 columns, there are at most 30 keys. +Each key can be translated into a matrix-index, where (0, 0) -> 0, (1, 0) -> 6, and (2, 3) -> 15, by rolling out +the 2d-array into a 1d one.

+

In the protocol, the first 5 bits gives the matrix-index of the key that changed. The 6th bit is whether +that key was pressed or released, the 7th bit indicates whether the rotary-encoder has a change, and the 8th +bit indicates whether that change was clock- or counter-clockwise.

+

For better or worse, almost all bit-patterns are valid; some may represent keys that do not exist, since there are +28 keys but 32 slots for the 5 bits indicating the matrix-index.
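A sketch of that layout — the exact bit positions here are my assumption of one consistent ordering, not necessarily the ones used in the repo:

```rust
const COLS: u8 = 6;

/// Roll the (row, col) coordinate out into a 1d matrix-index.
fn matrix_index(row: u8, col: u8) -> u8 {
    row * COLS + col
}

/// Bits 0-4: matrix-index, bit 5: pressed/released,
/// bit 6: encoder changed, bit 7: encoder direction.
fn encode_delta(matrix_index: u8, pressed: bool, encoder: bool, clockwise: bool) -> u8 {
    (matrix_index & 0b1_1111)
        | ((pressed as u8) << 5)
        | ((encoder as u8) << 6)
        | ((clockwise as u8) << 7)
}

fn decode_delta(byte: u8) -> (u8, bool, bool, bool) {
    (
        byte & 0b1_1111,
        byte & (1 << 5) != 0,
        byte & (1 << 6) != 0,
        byte & (1 << 7) != 0,
    )
}
```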

+

I used the bitvec crate for bit-manipulation when prototyping, +that library is excellent.
+I warmly recommend it, even though I went with a more custom solution for performance reasons (I made some +specific optimizations to my use-case, see 'Performance').

+

Keymap

+

Now, to send key-presses to the OS, of course there's a crate for that.

+

It helps with the plumbing and exposes the struct that I've got to send to the OS (and the API to do the sending), +I just have to fill it with reasonable values:

+
/// Struct that the OS wants
+pub struct KeyboardReport {
+    pub modifier: u8,
+    pub reserved: u8,
+    pub leds: u8,
+    pub keycodes: [u8; 6],
+}
+
+

I found this pdf from usb.org, which specifies keycode and modifier +values. I encoded those as a struct.

+
#[repr(transparent)]
+#[derive(Copy, Clone, Debug, Eq, PartialEq)]
+pub struct KeyCode(pub u8);
+#[allow(dead_code)]
+impl KeyCode {
+    //Keyboard = 0x01; //ErrorRollOver1 Sel N/A 3 3 3 4/101/104
+    //Keyboard = 0x02; //POSTFail1 Sel N/A 3 3 3 4/101/104
+    //Keyboard = 0x03; //ErrorUndefined1 Sel N/A 3 3 3 4/101/104
+    pub const A: Self = Self(0x04); //a and A2 Sel 31 3 3 3 4/101/104
+    pub const B: Self = Self(0x05); //b and B Sel 50 3 3 3 4/101/104
+    // ... etc etc etc
+
+

Now I know which button is pressed by coordinates, and how to translate those to values that the OS can understand.

+

And it works! Kind of...

+

USB HID Protocol?

+

I will admit that I did not read the entire PDF. What I did find out was that there's a poll-rate that the OS specifies; +I set that to the lowest possible value, 1ms. Every 1ms, the OS triggers an interrupt:

+
/// Interrupt handler
+/// Safety: Called from the same core that publishes
+#[interrupt]
+#[allow(non_snake_case)]
+#[cfg(feature = "hiddev")]
+unsafe fn USBCTRL_IRQ() {
+    crate::runtime::shared::usb::hiddev_interrupt_poll();
+}
+
+

Oh right, interrupts

+

Interrupts are a way for the processor to interrupt currently executing code +and execute something else; interrupt handlers are similar to Linux signal handlers.

+

In this specific case, the USB-peripheral generates an interrupt when polled, the core that registered an interrupt +handler for that specific interrupt (USBCTRL_IRQ) will pause current execution and run the code contained in +the interrupt-handler.

+

This has the potential of triggering UB with unsafe code (depending on where the core was stopped, it may have been holding +a mutable reference which the interrupt handler needs), and of deadlocking code that guards against multiple mutable +references through locking.

+

One way to handle this, if using mutable statics (which you almost certainly have to without an allocator), +is to execute sensitive code within a critical_section; of course, +there's a library for that.
+The critical-section, when entered, causes the core to ignore interrupts until exited.

+
// Both of these functions use the same static mut variable
+#[cfg(feature = "hiddev")]
+pub unsafe fn try_push_report(keyboard_report: &usbd_hid::descriptor::KeyboardReport) -> bool {
+    // This core won't be interrupted while handling the mutable reference.
+    // A regular lock without a critical section here would cause a deadlock in the below interrupt handling procedure 
+    // if timing is unfortunate.
+    critical_section::with(|_cs| {
+        USB_HIDDEV
+            .as_mut()
+            .is_some_and(|hid| hid.try_submit_report(keyboard_report))
+    })
+}
+#[cfg(feature = "hiddev")]
+pub unsafe fn hiddev_interrupt_poll() {
+    // This core won't be interrupted, because there's only one interrupt registered, so there's nothing to interrupt this.
+    // Since it's already interrupted the core that handles the other mutable reference to this variable 
+    // we can be certain that this is the only mutable reference active without a critical section or other lock.
+    if let Some(hid) = USB_HIDDEV.as_mut() {
+        hid.poll();
+    }
+}
+
+

USB HID protocol

+

Back to the protocol: the API has two ends, one for polling the OS and one for submitting HID-reports.
+It turns out that even if you don't expect any data from the OS, the device needs to be polled to communicate.

+

In my first shot I just pushed keyboard reports on every diff, polling immediately after. This caused +key-actions to disappear; they didn't reach the OS.

+

I still haven't quite figured out why, since I'm not overflowing the buffer; digging into the code, which was pretty opaque, didn't help me +understand much either.

+

I settled for pushing at most one keyboard report per poll, which means at most one per ms. +That gives a worst-case latency of 1ms on a key-action, assuming there's no queue-backup; I keep any unpublishable +reports in a queue that's drained one entry per poll. Again, there may be something written in the specifications +about this, but it's good enough for now.
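That queue can be sketched as a plain fixed-capacity FIFO — a minimal stand-in of my own, not the exact structure in the repo — drained one entry per poll:

```rust
/// Fixed-capacity FIFO of pending reports (modeled here as raw 8-byte
/// arrays, the size of a standard boot-keyboard report).
struct ReportQueue<const N: usize> {
    buf: [[u8; 8]; N],
    head: usize,
    len: usize,
}

impl<const N: usize> ReportQueue<N> {
    const fn new() -> Self {
        Self { buf: [[0; 8]; N], head: 0, len: 0 }
    }

    /// Enqueue a report produced by a key-state diff.
    fn push(&mut self, report: [u8; 8]) -> bool {
        if self.len == N {
            return false; // full: caller decides whether to drop or retry
        }
        self.buf[(self.head + self.len) % N] = report;
        self.len += 1;
        true
    }

    /// Called once per 1ms USB poll: submit at most one queued report.
    fn pop(&mut self) -> Option<[u8; 8]> {
        if self.len == 0 {
            return None;
        }
        let report = self.buf[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(report)
    }
}
```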

+

Follow-up

+

I did try to find more information about the USB HID protocol but was unable to. +I also tried to figure out how to do keyrollover, specifically NKRO, +but could not figure out how to have more registered keys than the keyboard_report-struct +can fit (6), so the keyboard is 6KRO, which is fine by me.

+

Oled displays

+

One of the motivators for using multiple cores was the ability to render to the oled on-demand with low latency.

+

Drawing to an oled display is comparatively slow, so offloading that to a separate core was something that I was interested +in doing.

+

I created a shared message queue guarded by a spin-lock:

+
#[derive(Debug, Copy, Clone)]
+pub enum KeycoreToAdminMessage {
+    // Notify on any user action
+    Touch,
+    // Send loop count to calculate scan latency
+    Loop(LoopCount),
+    // Output which layer is active
+    LayerChange(KeymapLayer),
+    // Output bytes received over UART
+    Rx(u16),
+    // Write a boot message then trigger usb-boot
+    Reboot,
+}
+
+

When displayed it looks like this:

+

oleds

+

Setting it up was pretty trivial, there's a library for SSD1306 oleds +which works great!

+

Now I have a keyboard that can submit key-presses to the OS, and display some debug information on its oleds, +time to get into the bugs.

+

BUUUUUUUGS

+

Almost immediately when trying to type, I discovered that keys would be repeated; pressing t would result in +19 t's, for example.

+

Spooky electrons, debounce!

+

I looked into QMK once more, since my keyboard with QMK firmware doesn't have these issues (i.e. it's not a hardware problem).
+All excerpts of C below are from QMK, license here.

+

Here's the function that reads pins:

+
/// quantum/matrix.c
+__attribute__((weak)) void matrix_read_rows_on_col(matrix_row_t current_matrix[], uint8_t current_col, matrix_row_t row_shifter) {
+    bool key_pressed = false;
+    // Select col
+    if (!select_col(current_col)) { // select col
+        return;                     // skip NO_PIN col
+    }
+    matrix_output_select_delay();
+    // For each row...
+    for (uint8_t row_index = 0; row_index < ROWS_PER_HAND; row_index++) {
+        // Check row pin state
+        if (readMatrixPin(row_pins[row_index]) == 0) {
+            // Pin LO, set col bit
+            current_matrix[row_index] |= row_shifter;
+            key_pressed = true;
+        } else {
+            // Pin HI, clear col bit
+            current_matrix[row_index] &= ~row_shifter;
+        }
+    }
+    // Unselect col
+    unselect_col(current_col);
+    matrix_output_unselect_delay(current_col, key_pressed); // wait for all Row signals to go HIGH
+}
+
+

I had looked at it previously, but disregarded those delays (matrix_output_select_delay() and +matrix_output_unselect_delay(current_col, key_pressed); // wait for all Row signals to go HIGH), because +we're trying to be speedy here. Thread.sleep() isn't speedy, everyone knows that.

+

However, it turns out that they are important. Again I have to follow weak functions, a nightmare:

+
/// quantum/matrix_common.c
+__attribute__((weak)) void matrix_output_select_delay(void) {
+    waitInputPinDelay();
+}
+// Found implementation in -> 
+/// platform/chibios/_wait.h
+#ifndef GPIO_INPUT_PIN_DELAY
+#    define GPIO_INPUT_PIN_DELAY (CPU_CLOCK / 1000000L / 4)
+#endif
+#define waitInputPinDelay() wait_cpuclock(GPIO_INPUT_PIN_DELAY)
+
+

I get no editor support in this project, so I had to grep through countless board implementations until I found +the correct one, which isn't exactly easy to tell. But: after setting the col-pin low, there's a 250ns wait.

+

I implement it, and it changes nothing. On to the next!

+
/// quantum/matrix_common.c
+__attribute__((weak)) void matrix_output_unselect_delay(uint8_t line, bool key_pressed) {
+    matrix_io_delay();
+}
+/// quantum/matrix_common.c
+/* `matrix_io_delay ()` exists for backwards compatibility. From now on, use matrix_output_unselect_delay(). */
+__attribute__((weak)) void matrix_io_delay(void) {
+    wait_us(MATRIX_IO_DELAY);
+}
+// quantum/matrix_common.c
+#ifndef MATRIX_IO_DELAY
+#    define MATRIX_IO_DELAY 30
+#endif
+
+

For all of the above symbols, I needed to check that they weren't specifically overridden by my keyboard implementation; none were. matrix_output_unselect_delay(current_col, key_pressed) therefore waits 30μs.

+

I add the delay and the number of t's goes from 19 to sometimes many: good, not great. But my scan time, which directly influences press latency, goes from around 40μs to 200μs+ (6 columns, each with a 30μs sleep), which is unacceptable. The above code did come with a comment though: it waits for the row-pins to settle back into high, so I could just check for that instead!

+
// Wait for all rows to settle
+for row in rows {
+    while matches!(row.0.is_low(), Ok(true)) {}
+}
+
+

Now latency lands around 50μs. I still have that issue of the many t's, but at least the problem didn't get worse.

+

I hook up the keyboard to picocom and start reading output lines.
+I output each state-delta as M0, R0, C0 -> true [90237], matrix index, row_index, column index, and whether the key +is pressed or not, followed by the number of microseconds since the last state-change.

+

I can see that the activation-behavior is strange: sometimes, immediately (generally around 250μs) after a legitimate key-action, the state flips unexpectedly and holds in the ghost-state for 100-2500μs. It's not a rogue flip; the state actually changes as if the switch were pressed (or released) for quite some time.

+

However much I tried, I could not get these ghosts out of my keyboard, I had to learn to live with them.

+

Debouncing

+

Debouncing is a way to regulate signals (I think, this really isn't my field, don't roast me on the definitions), and +is a broad concept which can be applied to noisy signals in all kinds of areas.

+

I wanted to implement debouncing in a way that affected latency minimally. Luckily this behaviour is only triggered after legitimate key-actions, and on a per-key basis. I.e., I only have to regulate keys after the first signal, which I know is good, and only for the same key that produced the good signal.

+

I record the last key-action and set up quarantine logic, it goes like this:

+
+

If a key has a delta shortly (implemented with a constant, 10_000 micros at writing) after the previous delta, +require that the new state is repeated for a short (same as above) time before producing a signal.

+
+

My fastest repeated key-pressing of a single key is around 40_000μs between presses, so this should not activate +on good presses. Furthermore, if it does and that state is held for long enough the key comes through anyway.

+

This worked like a charm, on a given keypress it should not increase latency at all, but it killed the noise.
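As a standalone sketch, the quarantine logic might look something like this; the names, tick representation, and exact bookkeeping are illustrative, not the firmware's actual code:

```rust
// Deltas within this window of the last accepted state-change are quarantined.
const QUARANTINE_MICROS: u64 = 10_000;

#[derive(Default)]
struct KeyDebounce {
    last_emitted_at: u64,         // micros of the last accepted state-change
    reported: bool,               // last state we emitted
    pending: Option<(bool, u64)>, // (candidate state, micros first seen)
}

impl KeyDebounce {
    /// Feed a raw pin sample at `now` micros; returns Some(state) when a
    /// state-change should actually be emitted.
    fn update(&mut self, raw: bool, now: u64) -> Option<bool> {
        if raw == self.reported {
            // Raw agrees with what we've emitted: drop any pending ghost.
            self.pending = None;
            return None;
        }
        let quarantined = now - self.last_emitted_at < QUARANTINE_MICROS;
        if quarantined || self.pending.is_some() {
            // Delta shortly after the previous delta: require the new state
            // to be held for the whole window before letting it through.
            let since = match self.pending {
                Some((state, since)) if state == raw => since,
                _ => {
                    self.pending = Some((raw, now));
                    now
                }
            };
            if now - since < QUARANTINE_MICROS {
                return None;
            }
            self.pending = None;
        }
        self.reported = raw;
        self.last_emitted_at = now;
        Some(raw)
    }
}

fn main() {
    let mut key = KeyDebounce::default();
    // A clean press goes straight through...
    let first = key.update(true, 100_000);
    // ...while a delta right after it gets quarantined.
    let ghost = key.update(false, 100_250);
    println!("{first:?} {ghost:?}");
}
```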

+

Mysterious halting

+

At some point while developing the keymap, the keyboard started freezing on boot, not producing any output. I couldn't understand why; core1, which handles key-presses, wouldn't report anything. Once more I had to get the dedicated boot-skewer out to flash new firmware.

+

I started removing the latest changes and realized that scanning 5 columns for changes but not 6 on the left side +would work fine. Adding back scanning 6 columns would freeze immediately again.

+

I took a break and when doing something else it suddenly struck me, here!

+
#[allow(static_mut_refs)]
+if let Err(_e) = mc.cores()[1].spawn(unsafe { &mut CORE_1_STACK_AREA }, move || {
+    run_core1(
+        receiver,
+        left_buttons,
+        timer,
+        #[cfg(feature = "hiddev")]
+        usb_bus,
+    )
+})
+
+

Can you see it?

+

Well?

+

The unsafe draws the attention, but I'm manually setting the stack area for core1:

+

static mut CORE_1_STACK_AREA: [usize; 1024] = [0; 1024];

+

When the code for scanning a 6th column was added, the stack overflowed and the core halted; increasing the stack area immediately solved the issue.

+

Performance

+

Now the keyboard is actually usable, time for the fun part, performance. This is my first real embedded project, +and I learned a lot programming for a different target.

+

Real time

+

First off, since there's not much of a scheduler running (disregarding interrupts), the displayed scan rate on the oleds gives very direct feedback on changes in performance. Usually it's much more difficult to see how code-changes impact performance, but here it's immediate and easy to spot.

+

Priorities

+

Measurement is the key to performance, and the measurements of interest are, in order: scan rate, key-processing rate, and binary size. Scan rate is important because it determines the key-press -> OS latency; key-processing can't be too slow since it tacks directly onto that latency; and lastly there's a 2MB size restriction on the produced image.

+

Methodology

+

The oled displays scan rate, so that's easy. Key-processing rate can't be measured as easily, so jamming the keyboard at max speed and watching the scan rate served as a proxy. Binary size can be inspected at compilation.
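The loop-counter idea behind the scan-rate number can be sketched like this; LoopCount here is illustrative, not the firmware's actual type:

```rust
// Count scan-loop iterations over a time window and derive the average
// microseconds per scan; this is the number shown on the oled.
struct LoopCount {
    iterations: u64,
    window_start_micros: u64,
}

impl LoopCount {
    fn new(now_micros: u64) -> Self {
        Self { iterations: 0, window_start_micros: now_micros }
    }

    /// Call once per scan-loop iteration.
    fn increment(&mut self) {
        self.iterations += 1;
    }

    /// Average microseconds per scan over the current window.
    fn micros_per_scan(&self, now_micros: u64) -> u64 {
        (now_micros - self.window_start_micros) / self.iterations.max(1)
    }
}

fn main() {
    let mut lc = LoopCount::new(0);
    for _ in 0..50_000 {
        lc.increment();
    }
    // 50 000 iterations over one second -> 20 μs per scan
    println!("{} µs/scan", lc.micros_per_scan(1_000_000));
}
```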

+

Inlining

+

When people talk about performance inlining often comes up.

+

Briefly, inlining is replacing a function call with the code from that function at the call-site, here's an example.

+
+fn my_add(a: i32, b: i32) -> i32 {
+    a + b
+}
+fn not_inlined_caller() {
+    // Not inlined the function is called, moving 1, and 2 into the correct ABI-defined registers
+    // then invoking the function.
+    my_add(1, 2); 
+}
+fn inlined_caller_after_inlining() {
+    // my_add(1, 2) <- disappears
+    1 + 2 // <- `my_add` function body copied into this function
+}
+
+

Inlining reduces some overhead, such as shuffling around values to registers, and invoking functions, +but all that copying of code can produce a lot of instructions, which may thrash the CPU's instruction cache.

+

Here's an example of how that could become problematic:

+
#[inline]
+fn my_very_long_fn() {
+    // 1000 lines of spooky code
+}
+fn my_caller(rarely_true: bool) {
+    if rarely_true {
+        my_very_long_fn();
+    }
+}
+
+

Depending on the CPU, it might, on entering my_caller, have to fetch all the instructions contained in my_very_long_fn, draining space in the instruction cache and causing re-fetches that may take a long time. If rarely_true is rarely true this is unnecessary overhead, and if the function is long enough, the eventual savings from inlining pale in comparison to the execution time of the inlined function. That means there's little upside in the rarely_true == true case, and a huge downside in the rarely_true == false case.

+

It's hard to draw general conclusions, however; you have to measure to be sure. Luckily, I measured!

+

Inlining in practice

+

There weren't huge surprises on where inlining made the most difference, but I was surprised with how much it mattered.

+

The general logic of core1 is this:

+
    +
  1. Check for changes (uart, gpio, usb). +
  2. On a change, execute some logic (left side sends a keypress to the OS, right side sends it to the left). +
  3. Report changes to core0. +
+

The vast majority of the time each loop produces no change, here's an excerpt from left side core1:

+
loop {
+    let mut any_change = false;
+    if let Some(update) = receiver.try_read() {
+        // Right side sent an update
+        rx += 1;
+        // Update report state
+        kbd.update_right(update, &mut report_state);
+        any_change = true;
+    }
+    // Check left side gpio and update report state
+    if kbd.scan_left(&mut left_buttons, &mut report_state, timer) {
+        any_change = true;
+    }
+    if any_change {
+        push_touch_to_admin();
+    }
+    #[cfg(feature = "hiddev")]
+    {
+        let mut pop = false;
+        if let Some(next_update) = report_state.report() {
+            // Publish the next update on queue if present
+            unsafe {
+                pop = crate::runtime::shared::usb::try_push_report(next_update);
+            }
+        }
+        if pop {
+            // Remove the sent report (it's down here because of the borrow checker)
+            report_state.accept();
+        }
+    }
+    if let Some(change) = report_state.layer_update() {
+        push_layer_change(change);
+    }
+    if rx > 0 && push_rx_change(rx) {
+        rx = 0;
+    }
+    if loop_count.increment() {
+        let now = timer.get_counter();
+        let lc = loop_count.value(now);
+        if push_loop_to_admin(lc) {
+            loop_count.reset(now);
+        }
+    }
+}
+
+

Some of the code in that loop is only triggered in certain cases, I followed the philosophy of inlining most of what +always runs, and refusing to inline things that are conditionally called, Rust has facilities for this:

+

#[inline], #[inline(never)], and #[inline(always)]; the compiler is usually smart enough to make the correct call whether or not #[inline] is specified, so #[inline(never)] and #[inline(always)] aren't often necessary.

+

More information here on cross-crate stuff, but I'm compiling +with fat-lto anyway, so it doesn't really matter to me here.
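As a minimal illustration of that philosophy, hot-path code gets hinted inline while rarely-taken branches are kept out of line; the functions are contrived stand-ins, not the firmware's:

```rust
// Hot path: small, runs every iteration, hinted inline.
#[inline]
fn scan_step(state: &mut u32) -> bool {
    *state = state.wrapping_add(1);
    *state % 1024 == 0 // stand-in for "change detected"
}

// Cold path: keeping this out of line keeps the scan loop's
// instruction footprint small.
#[inline(never)]
fn handle_rare_change(state: u32) -> u32 {
    state.rotate_left(3) ^ 0xDEAD_BEEF
}

fn main() {
    let mut state = 0u32;
    let mut handled = 0u32;
    for _ in 0..4096 {
        if scan_step(&mut state) {
            handled = handle_rare_change(state);
        }
    }
    println!("{handled:#x}");
}
```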

+

The most impressive change was removing #[inline] from kbd.update_right(update, &mut report_state); inside the +if-statement above, that took the current scan latency from 80μs to around 36μs. Not inlining it halved the +scan latency.

+

Last notes on inlining: the compiler makes decisions about inlining that can be very hard to understand. You change something seemingly irrelevant, and suddenly the binary grows by 25% and latency increases by about the same amount, because the compiler decided to inline something that doesn't fit your performance goals. I want the scan-loop to be fast, but the compiler saw an opportunity to make something else fast at the expense of the scan-loop, for example. It's not a bad decision, but it's a bad fit. Making small changes and testing them is therefore important, and interesting!

+

Const evaluation, bounds checking

+

Fewer instructions are often better: they're generally faster to execute, take up less space in the instruction cache, and can therefore tip an inlining tradeoff toward making sense.

+

This change to get_unchecked, which elides the bounds-check, made a massive difference in performance.

+
/// self.buffer[self.tail] -> unsafe {self.buffer.get_unchecked_mut(self.tail)};
+
+

The improvement came in two parts: it caused the compiler to inline the function, and that in itself did a lot. I then manually marked the function inline and reverted the change, and eliding the check still provided a several-microsecond benefit. Since I do bounds-checking elsewhere, I was confident keeping this unsafe.

+

To further improve performance I wanted to evaluate as much as possible at compilation time, so that things are accessed efficiently: if I can assert that indices are in bounds at comptime, I can safely use unchecked index accesses. Rust's type system provides tools for that, and since I know how many keys my keyboard has, I don't need any dynamically sized arrays.

+

Here's an example:

+
#[repr(transparent)]
+#[derive(Debug, Copy, Clone)]
+pub struct RowIndex(pub u8);
+impl RowIndex {
+    #[must_use]
+    #[allow(clippy::missing_panics_doc)]
+    pub const fn from_value(ind: u8) -> Self {
+        assert!(
+            ind < NUM_ROWS,
+            "Tried to construct row index from a bad value"
+        );
+        Self(ind)
+    }
+    #[inline]
+    #[must_use]
+    pub const fn index(self) -> usize {
+        self.0 as usize
+    }
+}
+
+

The RowIndex-struct only accepts indices that are valid, therefore it's always safe to use to index into +structures with NUM_ROWS length or more.

+

Using this strategy to elide bounds-checking shaved more microseconds off the loop-times. Since pin-indexing is done on the gpio pin-scan on each loop, these improvements make quite the difference.
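Putting the two pieces together, a hypothetical use of RowIndex to elide the bounds check could look like this; the array and NUM_ROWS value are illustrative:

```rust
const NUM_ROWS: u8 = 5;

#[repr(transparent)]
#[derive(Debug, Copy, Clone)]
pub struct RowIndex(pub u8);

impl RowIndex {
    // Panics at construction if out of bounds, so a RowIndex that exists
    // is always a valid index.
    pub const fn from_value(ind: u8) -> Self {
        assert!(ind < NUM_ROWS, "Tried to construct row index from a bad value");
        Self(ind)
    }

    pub const fn index(self) -> usize {
        self.0 as usize
    }
}

fn read_row(states: &[bool; NUM_ROWS as usize], row: RowIndex) -> bool {
    // Safety: RowIndex can only hold values < NUM_ROWS, so the access is
    // always in bounds and the runtime check can be skipped.
    unsafe { *states.get_unchecked(row.index()) }
}

fn main() {
    let states = [false, true, false, false, true];
    println!("{}", read_row(&states, RowIndex::from_value(1)));
}
```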

+

Macros to avoid branching

+

I abhor macros, they're difficult to follow and understand, and professionally I try to avoid them like the plague. +But, here in my private life it's all about the performance, and they can be useful to avoid branching.

+

Consider the connection of the actual GPIO-pin, and the struct that I use to keep a pin's state in memory.

+

They have different types; all the GPIO-pins have different types, and all the keys as well, so they can't be kept in a collection together without using a v-table. This, in my opinion, is fixable in Rust. The reason that the buttons, for example, can't be kept together is that each button may have a different memory layout.
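For illustration, here's what the v-table route would look like: differently-typed buttons stored behind trait-object references, with every call dynamically dispatched. The types and key codes here are made up for the example:

```rust
// The shared interface; a simple Vec<u8> stands in for the report state.
trait KeyboardButton {
    fn on_press(&mut self, report: &mut Vec<u8>);
}

struct TabKey;
struct EscKey;

impl KeyboardButton for TabKey {
    fn on_press(&mut self, report: &mut Vec<u8>) {
        report.push(0x2B); // HID usage id for Tab
    }
}

impl KeyboardButton for EscKey {
    fn on_press(&mut self, report: &mut Vec<u8>) {
        report.push(0x29); // HID usage id for Escape
    }
}

fn main() {
    let mut tab = TabKey;
    let mut esc = EscKey;
    // The heterogeneous collection is only possible behind pointers plus a
    // v-table; each on_press call below is dynamically dispatched.
    let keys: [&mut dyn KeyboardButton; 2] = [&mut tab, &mut esc];
    let mut report = Vec::new();
    for key in keys {
        key.on_press(&mut report);
    }
    println!("{report:?}");
}
```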

+

In my case they all have the same layout and all expose the same function, here's an example:

+
impl KeyboardButton for LeftRow0Col0 {
+    fn on_press(&mut self, keyboard_report_state: &mut KeyboardReportState) {
+        keyboard_report_state.push_key(KeyCode::TAB);
+    }
+    fn on_release(
+        &mut self,
+        _last_press_state: LastPressState,
+        keyboard_report_state: &mut KeyboardReportState,
+    ) {
+        keyboard_report_state.pop_key(KeyCode::TAB);
+    }
+}
+
+

I generate the key-structs from a macro, and they all have the exact same layout, so I should be able to store them in an array (assuming that the function addresses of each respective button's methods are knowable, which, thinking about it, they might not be).

+

Macros are a way around this though:

+
macro_rules! impl_read_pin_col {
+    ($($structure: expr, $row: tt,)*, $col: tt) => {
+        paste! {
+            pub fn [<read_col _ $col _pins>]($([< $structure:snake >]: &mut $structure,)* left_buttons: &mut LeftButtons, keyboard_report_state: &mut KeyboardReportState, timer: Timer) -> bool {
+                // Safety: Make sure this is properly initialized and restored
+                // at the end of this function, makes a noticeable difference in performance
+                let col = unsafe {left_buttons.cols.$col.take().unwrap_unchecked()};
+                let col = col.into_push_pull_output_in_state(PinState::Low);
+                // Just pulling chibios defaults of 0.25 micros, could probably be 0
+                crate::timer::wait_nanos(timer, 250);
+                let mut any_change = false;
+                $(
+                    {
+                        if [< $structure:snake >].check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value($row)), keyboard_report_state, timer) {
+                            any_change = true;
+                        }
+                    }
+                )*
+                left_buttons.cols.$col = Some(col.into_pull_up_input());
+                $(
+                    {
+                        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value($row)) {}
+                    }
+                )*
+                any_change
+            }
+        }
+    };
+}
+
+

Here's how it's used:

+
impl_read_pin_col!(
+    LeftRow0Col1, 0,
+    LeftRow1Col1, 1,
+    LeftRow2Col1, 2,
+    LeftRow3Col1, 3,
+    LeftRow4Col1, 4,
+    ,1
+); 
+// Produces function `read_col_1_pins` with proper typechecking
+let col1_change = read_col_1_pins(
+    &mut self.left_row0_col1,
+    &mut self.left_row1_col1,
+    &mut self.left_row2_col1,
+    &mut self.left_row3_col1,
+    &mut self.left_row4_col1,
+    left_buttons,
+    keyboard_report_state,
+    timer,
+);
+
+

In practice the macro code is inlined like this:

+
pub fn read_col_1_pins(left_row0_col1: &mut LeftRow0Col1, left_row1_col1: &mut LeftRow1Col1, left_row2_col1: &mut LeftRow2Col1, left_row3_col1: &mut LeftRow3Col1, left_row4_col1: &mut LeftRow4Col1, left_buttons: &mut LeftButtons, keyboard_report_state: &mut KeyboardReportState, timer: Timer) -> bool {
+    let col = unsafe {
+        left_buttons.cols.1
+            .take().unwrap_unchecked()
+    };
+    let col = col.into_push_pull_output_in_state(PinState::Low);
+    crate::timer::wait_nanos(timer, 250);
+    let mut any_change = false;
+    {
+        if left_row0_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(0)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row1_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(1)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row2_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(2)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row3_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(3)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row4_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(4)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    left_buttons.cols.1
+        = Some(col.into_pull_up_input());
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(0)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(1)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(2)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(3)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(4)) {}
+    }
+    any_change
+}
+
+

There is no access by index for the pins here, they are manually checked one-by-one.

+

Performance summary

+

In the end I took 4 measurements on the left side:

+
    +
  1. Scan latency +
  2. Change originating from left scan loop latency +
  3. Change originating from right scan loop latency +
  4. Inter-core message queue capacity +
+

And 3 on the right:

+
    +
  1. Scan latency +
  2. Change loop latency +
  3. Inter-core message queue capacity +
+

The scan latency has been talked about, it ended up at about 20μs after optimizations, +that is, each pin is checked every 20μs if the keyboard is idle (on both sides).

+

Changes originating from the left measures the loop latency, the time it takes before discovering a change +to completely processing it, when a change comes from the left side gpio pins. That landed on about +60μs. In other words, from starting to check for changes, to discovering and handling a change is +60μs.

+

Changes originating from the right measures the same as above but from the right side, that takes about +70μs.

+

Inter-core message queue capacity sits firmly at 0 on both sides, even though the consumer-core writes messages to oled, +it doesn't get overwhelmed.

+

On the right-side the latency on changes is only 25μs however, since the +left side handles all the logic contained in the keymap, this makes sense.

+

Rough calculation of worst case latency

+

This means that the keyboard should at most add a 70μs latency overhead from the left, and 25μs on the right, +and be able to detect a change lasting for 20μs or more on both sides.

+

The transfer rate between sides is set by the uart baud-rate which is 781 250 bits per second.
This works out to 10.24μs per byte sent; all messages are at most 1 byte.
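The arithmetic can be checked with a quick sketch, assuming 8 bits per byte (the 10.24μs figure implies framing bits are ignored):

```rust
// Per-byte transfer latency at a given baud-rate, in hundredths of a
// microsecond so everything stays in integer arithmetic.
const BAUD: u64 = 781_250;

fn micros_per_byte_x100(baud: u64) -> u64 {
    // 8 bits per byte, scaled: (8 / baud) seconds -> hundredths of µs
    8 * 1_000_000 * 100 / baud
}

fn main() {
    // 1024 hundredths = 10.24 µs per byte at 781 250 baud
    println!("{} hundredths of a µs", micros_per_byte_x100(BAUD));
}
```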

+
+

Edit 2024-04-17

+

I changed the protocol to two bytes for robustness, but also increased the baud-rate 20x.

+

This puts one message at 1.024μs of latency with better robustness.

+
+

Worst case scenario should therefore be os_poll_latency + left_side_right_change_latency + right_side_latency + transfer_latency, which would be 1000μs + 70μs + 25μs + 10μs = 1105μs when a single key is pressed on the right side, and os_poll_latency + left_side_left_change_latency = 1060μs on the left.

+

Caveat

+

This only holds for single presses, not when the keymap outputs sequences. For example, on EU keyboards ^ is a dead key that needs a second press to activate, so that you can compose symbols like â. However, I don't want that; I want ^ to go out immediately, so when ^ is pressed I send KeyDown ^ + KeyUp ^ + KeyDown ^, which makes the OS-latency alone 3000μs.

+

End

+

This has been my longest writeup yet, it was my first real foray into embedded development, and it ended with +me writing this on a keyboard running my own firmware.

+

There's still stuff to iron out with the keymap, but I'm really happy with the result.
The firmware is fast and it works, the two things I care about; the code can be found here.

+

Thoughts on QMK

+

I went on a bit of a rant about QMK, but it's a great, robust codebase. It could probably be reimplemented in Rust if one really wanted to, but that seems unnecessary, and my firmware does not attempt to do it at all. Mostly the macro-parts would need some thinking over, because the way I did keymaps was a real mess of boilerplate-code that is not nice to work with.

+
+
\ No newline at end of file diff --git a/rust-linux-kernel-module.html b/rust-linux-kernel-module.html new file mode 100644 index 0000000..bcd952b --- /dev/null +++ b/rust-linux-kernel-module.html @@ -0,0 +1,1004 @@ + + + + + + + + + RustLinuxKernelModule + + + +
+

Rust for Linux, how hard is it to write a Kernel module in Rust at present?

+

Once again I'm back on parental leave, I've been lazily following the Rust for Linux +effort but finally decided to get into it and write a simple kernel module in Rust.

+

Contents

+

This write-up is about writing a kernel module in Rust which will expose a file under /proc/rust-proc-file, +the file is going to function as a regular file, but backed by pure ram.

+

It'll go through zero-cost abstractions and how one can safely wrap unsafe extern "C" fn's, hiding away the gritty details of C-APIs.

+

It'll also go through numerous ways of causing and avoiding UB, as well as some kernel internals.

+

Objective

+

I've been a Linux user for quite a while but have never tried my hand at contributing to the codebase, +the reason is that I generally spend my free time writing things that I myself would use. Having +that as a guide leads to me finishing my side-projects. There hasn't been something that I've wanted or needed +that I've been unable to implement in user-space, so it just hasn't happened.

+

Sadly, that's still the case, so I had to contrive something: A proc-file that works just like a regular file.

+

The /proc Filesystem

+

The stated purpose of the /proc filesystem is to "provide information about the running Linux System", read +more about it here.

+

On a Linux machine with the /proc filesystem you can find process information e.g. under /proc/<pid>/.., +like memory usage, mounts, cpu-usage, fd's, etc. With the above stated purpose, and how the /proc filesystem is +used, the purpose of this module doesn't quite fit, but for simplicity that's what I chose.

+

A proc 'file'

+

Proc files can be created through the kernel's proc_fs-api, which lives here.

+

The function, proc_create, looks like this:

+
struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
+
+

When properly invoked it will create a file under /proc/<name> (if no parent is provided).

+

That file is an interface to the kernel: a pseudo-file that the user interacts with as a regular file on one end, while the kernel provides handlers for regular file-functionality on the other (like open, read, write, lseek, etc.).

+

That interface is provided through the last argument ...,proc_ops *proc_ops);...

+

proc_ops is a struct defined like this:

+
struct proc_ops {
+	unsigned int proc_flags;
+	int	(*proc_open)(struct inode *, struct file *);
+	ssize_t	(*proc_read)(struct file *, char __user *, size_t, loff_t *);
+	ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
+	ssize_t	(*proc_write)(struct file *, const char __user *, size_t, loff_t *);
+	/* mandatory unless nonseekable_open() or equivalent is used */
+	loff_t	(*proc_lseek)(struct file *, loff_t, int);
+	int	(*proc_release)(struct inode *, struct file *);
+	__poll_t (*proc_poll)(struct file *, struct poll_table_struct *);
+	long	(*proc_ioctl)(struct file *, unsigned int, unsigned long);
+#ifdef CONFIG_COMPAT
+	long	(*proc_compat_ioctl)(struct file *, unsigned int, unsigned long);
+#endif
+	int	(*proc_mmap)(struct file *, struct vm_area_struct *);
+	unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+} __randomize_layout;
+
+

proc_open

+

When a user tries to open the proc-file, the handler int (*proc_open)(struct inode *, struct file *); +will be invoked.

+

A perfectly functional C-implementation of that, in the case that no work needs to be done specifically when a +user invokes open is:

+
int proc_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+

It just returns 0 for success.

+

There are cases where one would like to do something when the file is opened, in that case, +the *file pointer could be modified, for example by editing the void *private_data-field to add some data +that will follow the file into its coming operations. +Read some more about the file structure here, +or check out its definition here.

+

proc_read

+

Now it's getting into some logic, when a user wants to read from the file +it provides a buffer and an offset pointer, the signature looks like +this:

+
ssize_t	proc_read(struct file *f, char __user *buf, size_t buf_len, loff_t *offset);
+
+

Again there's the file structure-pointer which could contain data that +was put there in an open-implementation, as well as a suspiciously annotated +char __user *buf.

+

The kernel should write data into the user buffer, return the number of +bytes written, and update the offset through the pointer.

+

proc_write

+

When a user tries to write to the file, it enters through proc_write.

+

Which looks like this:

+
ssize_t	(*proc_write)(struct file *f, const char __user *buf, size_t buf_len, loff_t *offset);
+
+

The user provides the buffer it wants to write into the file along with its length, and +a pointer to update the offset. Again suspiciously annotating the buffer with __user.

+

The kernel should write data from the user buffer into the backing storage.
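A matching sketch of the write contract, again with plain Rust types standing in for the kernel's:

```rust
// Sketch of the proc_write contract: copy the user's bytes into backing
// storage at *offset, growing it as needed, advance the offset, and
// return the number of bytes consumed. Names are illustrative.
fn proc_write_sketch(backing: &mut Vec<u8>, buf: &[u8], offset: &mut usize) -> usize {
    let end = *offset + buf.len();
    if backing.len() < end {
        backing.resize(end, 0); // grow the ram-backed storage
    }
    backing[*offset..end].copy_from_slice(buf);
    *offset = end;
    buf.len()
}

fn main() {
    let mut backing = Vec::new();
    let mut offset = 0;
    proc_write_sketch(&mut backing, b"hello ", &mut offset);
    proc_write_sketch(&mut backing, b"proc", &mut offset);
    println!("{}", String::from_utf8_lossy(&backing));
}
```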

+

proc_lseek

+

Lastly, if the file is to be seekable to an offset proc_lseek +has to be implemented.

+

Its signature looks like this:

+
loff_t (*proc_lseek)(struct file *f, loff_t offset, int whence);
+
+

Again the file is provided, along with the offset to seek to and whence to seek. whence is an int which should hold one of 5 values; those are described in more detail in the docs here. The most intuitive one is SEEK_SET, which means that the file's offset should be set to the offset the user provided.

+

Assuming that the offset makes sense, the kernel should return the new offset.
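A sketch of those semantics for the three classic whence values; the constants mirror the kernel's numbering, and the error type is simplified:

```rust
// Whence values as defined by the uapi headers.
const SEEK_SET: u32 = 0;
const SEEK_CUR: u32 = 1;
const SEEK_END: u32 = 2;

// Compute the new offset, or report an error for an unknown whence or a
// seek before the start of the file (the offset stays unchanged on error).
fn lseek_sketch(file_len: i64, current: i64, offset: i64, whence: u32) -> Result<i64, ()> {
    let new = match whence {
        SEEK_SET => offset,             // absolute
        SEEK_CUR => current + offset,   // relative to current position
        SEEK_END => file_len + offset,  // relative to end of file
        _ => return Err(()),
    };
    if new < 0 {
        return Err(()); // seeking before the start is invalid
    }
    Ok(new)
}

fn main() {
    println!("{:?}", lseek_sketch(100, 10, 5, SEEK_CUR));
}
```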

+

Implementing it in Rust

+

That's it! With those 4 functions implemented and passed as members of the proc_ops-struct, there should be a fairly complete working file. Time to start!

+

Generating bindings

+

Rust for Linux uses Rust-bindings generated from the kernel headers. They're conveniently added when building, as long as the correct headers are added here; for this module only proc_fs.h is needed.

+

unsafe extern "C" fn

+

Since Rust is compatible with C by jumping through some hoops, +theoretically the module could be implemented by just using the C-api +directly as-is through the functions provided by the bindings.

+

The power of Rust is being able to take unsafe code and build safe abstractions on top of it. But it's a good start to figure out how the APIs work.

+

The generated Rust function-pointer definitions look like this:

+
unsafe extern "C" fn proc_open(
+        inode: *mut kernel::bindings::inode,
+        file: *mut kernel::bindings::file,
+    ) -> i32 {
+    ...
+}
+unsafe extern "C" fn proc_read(
+    file: *mut kernel::bindings::file,
+    buf: *mut core::ffi::c_char,
+    buf_cap: usize,
+    read_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+}
+unsafe extern "C" fn proc_write(
+    file: *mut kernel::bindings::file,
+    buf: *const core::ffi::c_char,
+    buf_cap: usize,
+    write_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+}
+unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    ...
+}
+
+

One key difference between these C-style function declarations and something like Rust's Fn-traits is that these functions cannot capture any state.

+

This necessitates using global statics for persistent state that has to be shared between user-calls into the proc-file. (For modifications that do not have to be shared or persisted after the interaction ends, the file's private data could be used.)
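A standalone sketch of why that is: an extern "C" fn has no environment to capture, so shared state has to live in a static. A real module would need proper synchronization; a relaxed atomic keeps the example small:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Shared, persistent state: there is no `self` and no closure environment
// available to a C-style callback, so a static is the only place for it.
static OPEN_COUNT: AtomicU64 = AtomicU64::new(0);

// Shaped like a C callback: no captured state, C ABI.
unsafe extern "C" fn proc_open_sketch() -> i32 {
    OPEN_COUNT.fetch_add(1, Ordering::Relaxed);
    0 // success
}

fn main() {
    unsafe {
        proc_open_sketch();
        proc_open_sketch();
    }
    println!("opened {} times", OPEN_COUNT.load(Ordering::Relaxed));
}
```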

+

Another key difference is that that pesky __user-annotation is finally gone, let's not +think more about that, the problem solved itself.

+

Abstraction

+

As mentioned previously, one key point of Rust is being able to abstract away unsafety; ideally an API would consist of Rust function-signatures containing references instead of C-style function-signatures containing raw pointers. It's a bit tricky, but it can be done.

+

Here's an example of how to do the conversion in a way with zero-cost:

+

Without any conversion, calling a rust-function within a C-style function:

+
fn rust_fn() -> i32 {
+    std::hint::black_box(5) * std::hint::black_box(15)
+}
+pub unsafe extern "C" fn my_callback2() -> i32 {
+    rust_fn()
+}
+pub fn main() -> i32{
+    unsafe {
+        my_callback2()
+    }
+}
+
+

This allows the user to define rust_fn, and then wrap it with C-style function.

+

Through godbolt it produces this assembly:

+
example::my_callback2::h381eee3be316e700:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+example::main::h11eebe12cad5e117:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+
+

The above shows that the entire function my_callback2 was inlined into main. A zero-cost abstraction should produce the same code, so any abstraction should produce the same assembly.

+

Here is an example of such an abstraction:

+
+fn rust_fn() -> i32 {
+    std::hint::black_box(5) * std::hint::black_box(15)
+}
+pub trait MyTrait<'a> {
+    const CALLBACK_1: &'a dyn Fn() -> i32;
+}
+pub struct MyStruct;
+impl<'a> MyTrait<'a> for MyStruct {
+    const CALLBACK_1: &'a dyn Fn() -> i32 = &rust_fn;
+}
+pub struct Container<'a, T>(core::marker::PhantomData<&'a T>);
+impl<'a, T> Container<'a, T> where T: MyTrait<'a> {
+    pub unsafe extern "C" fn proxy_callback() -> i32 {
+        T::CALLBACK_1()
+    }
+}
+pub fn main() -> i32 {
+    unsafe {
+        Container::<'_, MyStruct>::proxy_callback()
+    }
+}
+
+

Which produces this assembly:

+
example::main::h11eebe12cad5e117:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+
+

Again, the entire function was inlined; even though a dyn-trait is used, +the compiler can figure out that it can be inlined.

+

This may seem a bit useless, since the only difference between the pre- and post-abstraction +code is having the function connected to a struct, but that connection is what allows better abstractions to be built on top.

+

Better function signatures

+

Looking again at the function pointer that will be invoked for lseek:

+
unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    ...
+}
+
+

It can be described as a pure-rust-function like this:

+
fn proc_lseek(file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int) -> kernel::bindings::loff_t;
+
+

Or even better like this:

+
/// lseek valid variants [See the lseek docs for more detail](https://man7.org/linux/man-pages/man2/lseek.2.html)
+#[repr(u32)]
+pub enum Whence {
+    /// See above doc link
+    SeekSet = kernel::bindings::SEEK_SET,
+    /// See above doc link
+    SeekCur = kernel::bindings::SEEK_CUR,
+    /// See above doc link
+    SeekEnd = kernel::bindings::SEEK_END,
+    /// See above doc link
+    SeekData = kernel::bindings::SEEK_DATA,
+    /// See above doc link
+    SeekHole = kernel::bindings::SEEK_HOLE,
+}
+impl TryFrom<u32> for Whence {
+    type Error = kernel::error::Error;
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        Ok(match value {
+            kernel::bindings::SEEK_SET => Self::SeekSet,
+            kernel::bindings::SEEK_CUR => Self::SeekCur,
+            kernel::bindings::SEEK_END => Self::SeekEnd,
+            kernel::bindings::SEEK_DATA => Self::SeekData,
+            kernel::bindings::SEEK_HOLE => Self::SeekHole,
+            _ => return Err(EINVAL),
+        })
+    }
+}
+fn proc_lseek(file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence) -> kernel::bindings::loff_t;
+
+

Or better still: even though the bindings specify a *mut, converting that to a mutable reference +is likely to cause UB, while converting it to +an immutable reference should be safe.

+
fn proc_lseek(file: &kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence) -> kernel::bindings::loff_t;
+
+

Making a safer abstraction over the bindings struct file would be even better, but that's out of scope. +The Rust API now communicates that lseek takes a reference to a file that should not be mutated +(it can safely be mutated with synchronization, again out of scope), an offset, and a Whence-enum which +can only be one of five variants.

+

However, something needs to wrap this Rust-function: validate that Whence can be converted from the int +provided by the C-style function, check that the file-pointer is non-null, and turn it into a reference.

+

Here's an example of how that could look:

+
/// Raw C-entrypoint
+unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    // Take the `c_int` and Convert to a `Whence`-enum, return an error if invalid
+    let Ok(whence_u32) = u32::try_from(whence) else {
+        return EINVAL.to_errno().into();
+    };
+    let Ok(whence) = Whence::try_from(whence_u32) else {
+        return EINVAL.to_errno().into();
+    };
+    // Take the file-pointer, convert to a reference if not null
+    let file_ref = unsafe {
+        let Some(file_ref) = file.as_ref() else {
+            return EINVAL.to_errno().into();
+        };
+        file_ref
+    };
+    // Execute the rust-function `T:LSEEK` with the converted arguments, and return the result, or error as an errno
+    match (T::LSEEK)(file_ref, offset, whence) {
+        core::result::Result::Ok(offs) => offs,
+        core::result::Result::Err(e) => {
+            return e.to_errno().into();
+        }
+    }
+}
+
+

The T::LSEEK comes from a generic bound, as with the minimal example, this function-pointer comes from +a struct, which is bounded on a struct implementing a trait.

+

The definition of the generated proc_ops looks like this:

+
pub struct proc_ops {
+    pub proc_flags: core::ffi::c_uint,
+    pub proc_open: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut inode, arg2: *mut file) -> core::ffi::c_int,
+    >,
+    pub proc_read: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: *mut core::ffi::c_char,
+            arg3: usize,
+            arg4: *mut loff_t,
+        ) -> isize,
+    >,
+    pub proc_read_iter: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut kiocb, arg2: *mut iov_iter) -> isize,
+    >,
+    pub proc_write: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: *const core::ffi::c_char,
+            arg3: usize,
+            arg4: *mut loff_t,
+        ) -> isize,
+    >,
+    pub proc_lseek: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: loff_t, arg3: core::ffi::c_int) -> loff_t,
+    >,
+    pub proc_release: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut inode, arg2: *mut file) -> core::ffi::c_int,
+    >,
+    pub proc_poll: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: *mut poll_table_struct) -> __poll_t,
+    >,
+    pub proc_ioctl: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_uint,
+            arg3: core::ffi::c_ulong,
+        ) -> core::ffi::c_long,
+    >,
+    pub proc_compat_ioctl: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_uint,
+            arg3: core::ffi::c_ulong,
+        ) -> core::ffi::c_long,
+    >,
+    pub proc_mmap: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: *mut vm_area_struct) -> core::ffi::c_int,
+    >,
+    pub proc_get_unmapped_area: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_ulong,
+            arg3: core::ffi::c_ulong,
+            arg4: core::ffi::c_ulong,
+            arg5: core::ffi::c_ulong,
+        ) -> core::ffi::c_ulong,
+    >,
+}
+
+

It's a struct containing a bunch of optional function-pointers. Here's what it looks like after abstracting most of the C-parts away +(only implementing open, read, write, and lseek).

+
/// Type alias for open function signature
+pub type ProcOpen<'a> = &'a dyn Fn(&inode, &file) -> Result<i32>;
+/// Type alias for read function signature
+pub type ProcRead<'a> = &'a dyn Fn(&file, UserSliceWriter, &loff_t) -> Result<(usize, usize)>;
+/// Type alias for write function signature
+pub type ProcWrite<'a> = &'a dyn Fn(&file, UserSliceReader, &loff_t) -> Result<(usize, usize)>;
+/// Type alias for lseek function signature
+pub type ProcLseek<'a> = &'a dyn Fn(&file, loff_t, Whence) -> Result<loff_t>;
+/// Proc file ops handler
+pub trait ProcHandler<'a> {
+    /// Open handler
+    const OPEN: ProcOpen<'a>;
+    /// Read handler
+    const READ: ProcRead<'a>;
+    /// Write handler
+    const WRITE: ProcWrite<'a>;
+    /// Lseek handler
+    const LSEEK: ProcLseek<'a>;
+}
+/// Wrapper for the kernel type `proc_ops`
+/// Roughly a translation of the expected `extern "C"`-function pointers that
+/// the kernel expects into Rust-functions with a few more helpful types.
+pub struct ProcOps<'a, T>
+where
+    T: ProcHandler<'a>,
+{
+    ops: bindings::proc_ops,
+    _pd: PhantomData<&'a T>,
+}
+impl<'a, T> ProcOps<'a, T>
+where
+    T: ProcHandler<'a>,
+{
+    /// Create new ProcOps from a handler and flags
+    pub const fn new(proc_flags: u32) -> Self {
+        Self {
+            ops: proc_ops {
+                proc_flags,
+                proc_open: Some(ProcOps::<'a, T>::proc_open),
+                proc_read: Some(ProcOps::<'a, T>::proc_read),
+                proc_read_iter: None,
+                proc_write: Some(ProcOps::<'a, T>::proc_write),
+                proc_lseek: Some(ProcOps::<'a, T>::proc_lseek),
+                proc_release: None,
+                proc_poll: None,
+                proc_ioctl: None,
+                proc_compat_ioctl: None,
+                proc_mmap: None,
+                proc_get_unmapped_area: None,
+            },
+            _pd: PhantomData,
+        }
+    }
+    unsafe extern "C" fn proc_open(
+        inode: *mut kernel::bindings::inode,
+        file: *mut kernel::bindings::file,
+    ) -> i32 {
+        ...
+    }
+    unsafe extern "C" fn proc_read(
+        file: *mut kernel::bindings::file,
+        buf: *mut core::ffi::c_char,
+        buf_cap: usize,
+        read_offset: *mut kernel::bindings::loff_t,
+    ) -> isize {
+        ...
+    }
+    unsafe extern "C" fn proc_write(
+        file: *mut kernel::bindings::file,
+        buf: *const core::ffi::c_char,
+        buf_cap: usize,
+        write_offset: *mut kernel::bindings::loff_t,
+    ) -> isize {
+        ...
+    }
+    unsafe extern "C" fn proc_lseek(
+        file: *mut kernel::bindings::file,
+        offset: kernel::bindings::loff_t,
+        whence: core::ffi::c_int,
+    ) -> kernel::bindings::loff_t {
+        ...
+    }
+}
+
+

Some details are elided for brevity. The above code defines a trait ProcHandler, which contains +constants for each of the functions to be provided. Those constants are 'static-references to Rust functions.

+

Then it defines the ProcOps-struct, which is generic over ProcHandler. It defines the correct C-style +functions, which do the conversions, call the provided ProcHandler's &'static-functions, and return their results.

+

Using this, the C-style proc_create function can get a Rust-abstraction taking that ProcOps-struct:

+
/// Create a proc entry with the filename `name`
+pub fn proc_create<'a, T>(
+    name: &'static kernel::str::CStr,
+    mode: bindings::umode_t,
+    dir_entry: Option<&ProcDirEntry<'a>>,
+    proc_ops: &'a ProcOps<'a, T>,
+) -> Result<ProcDirEntry<'a>>
+where
+    T: ProcHandler<'a>,
+{
+    // ProcOps contains the c-style struct, give the kernel a pointer to the address of that struct
+    let pops = core::ptr::addr_of!(proc_ops.ops);
+    let pde = unsafe {
+        let dir_ent = dir_entry
+            .map(|de| de.ptr.as_ptr())
+            .unwrap_or_else(core::ptr::null_mut);
+        bindings::proc_create(
+            name.as_ptr() as *const core::ffi::c_char,
+            mode,
+            dir_ent,
+            pops,
+        )
+    };
+    match core::ptr::NonNull::new(pde) {
+        None => Err(ENOMEM),
+        Some(nn) => Ok(ProcDirEntry {
+            ptr: nn,
+            _pd: core::marker::PhantomData::default(),
+        }),
+    }
+}
+
+

Getting to work

+

Now it's time to use the abstraction, it looks like this:

+
struct ProcHand;
+/// Implement `ProcHandler`, providing static references to rust-functions
+impl ProcHandler<'static> for ProcHand {
+    const OPEN: kernel::proc_fs::ProcOpen<'static> = &popen;
+    const READ: kernel::proc_fs::ProcRead<'static> = &pread;
+    const WRITE: kernel::proc_fs::ProcWrite<'static> = &pwrite;
+    const LSEEK: kernel::proc_fs::ProcLseek<'static> = &plseek;
+}
+#[inline]
+fn popen(_inode: &kernel::bindings::inode, _file: &kernel::bindings::file) -> Result<i32> {
+    Ok(0)
+}
+fn pread(
+    _file: &kernel::bindings::file,
+    mut user_slice: UserSliceWriter,
+    offset: &kernel::bindings::loff_t,
+) -> Result<(usize, usize)> {
+    ...
+}
+fn pwrite(
+    file: &kernel::bindings::file,
+    user_slice_reader: UserSliceReader,
+    offset: &kernel::bindings::loff_t,
+) -> Result<(usize, usize)> {
+    ...
+}
+fn plseek(
+    file: &kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence,
+) -> Result<kernel::bindings::loff_t> {
+    ...
+}
+
+

Oh right, the __user-part.

+

In the first iterations of this module I conveniently ignored it. When the kernel is passed a buffer from a user +that is marked __user, it needs to copy that memory from the user to be able to use it; it can't directly read from +the provided buffer. The same goes for writing: it needs to copy memory into the buffer rather than use +the buffer directly.

+

On the C-side, this is done by the functions exposed by linux/uaccess.h +copy_from_user +and copy_to_user.

+

The functions will:

+
    +
  1. Check if the operation should fault, a bit complicated and I don't fully understand where faults may be injected, +but the documentation is here. +
  2. Check that the memory is a valid user space address +
  3. Check that the object has space to be written into/read from a valid address (no OOB reads into memory the user +doesn't have access to). +
  4. Do the actual copying +
+
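The steps above can be sketched in userspace Rust as a loose analogy (not the kernel implementation: copy_from_user_sim and its error type are made up, and the real functions also validate address-space ranges and may inject faults):

```rust
/// Simulate `copy_from_user`: validate the "user" range, then copy the
/// bytes into memory the "kernel" owns. Only the bounds checks (steps 2-3,
/// simplified) and the actual copy (step 4) are modeled here.
fn copy_from_user_sim(user_buf: &[u8], offset: usize, len: usize) -> Result<Vec<u8>, ()> {
    // Reject ranges that overflow or run past the user buffer (no OOB reads)
    let end = offset.checked_add(len).ok_or(())?;
    if end > user_buf.len() {
        return Err(());
    }
    // The actual copy into kernel-owned memory
    Ok(user_buf[offset..end].to_vec())
}

fn main() {
    let user = b"hello kernel";
    assert_eq!(copy_from_user_sim(user, 0, 5).unwrap(), b"hello");
    // An out-of-bounds request fails instead of reading stray memory
    assert!(copy_from_user_sim(user, 8, 100).is_err());
}
```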

The Rust kernel code fairly conveniently wraps this into an API here.

+

The API is used in the wrapper for ProcOps; it looks like this:

+
unsafe extern "C" fn proc_read(
+    file: *mut kernel::bindings::file,
+    buf: *mut core::ffi::c_char,
+    buf_cap: usize,
+    read_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+    let buf = buf as *mut u8 as usize;
+    let buf_ref = UserSlice::new(buf, buf_cap);
+    let buf_writer = buf_ref.writer();
+    ...
+    match (T::READ)(file_ref, buf_writer, offset) {
+        ...
+    }
+}
+
+

The code takes the raw buf-ptr, which lost its __user-annotation through bindgen, turns it into +a raw address, and makes a UserSlice out of it. It then turns that slice into a UserSliceWriter (the user reads +data, so the kernel needs to write data) and passes that into the module's supplied READ-function, +which again has a signature that looks like this:

+
pub type ProcRead<'a> = &'a dyn Fn(&file, UserSliceWriter, &loff_t) -> Result<(usize, usize)>;
+
+

Writing the module

+

The module is defined by this convenient module!-macro:

+
struct RustProcRamFile;
+module! {
+    type: RustProcRamFile,
+    name: "rust_proc_ram_file",
+    author: "Rust for Linux Contributors",
+    description: "Rust proc ram file example",
+    license: "GPL",
+}
+
+

Most of that is metadata, but the name will be the name that can be modprobe'd +to load the module, e.g. modprobe rust_proc_ram_file.

+

All that remains is implementing kernel::Module for RustProcRamFile, which is an arbitrary struct to represent +module data.

+
impl kernel::Module for RustProcRamFile {
+    fn init(_module: &'static ThisModule) -> Result<Self> {
+        // Initialization-code
+        ...
+        Ok(Self)
+    }
+}
+
+

One hitch is that the module needs to be safe for concurrent access; it needs to be both Send + Sync.

+

Remembering that the objective is to build a file that is backed by just bytes (a Vec<u8> being most convenient), +creating a RustProcRamFile(Vec<u8>) won't cut it.

+

There's a need for shared mutable state and that's where this gets tricky.

+

Mutex

+

One of the simplest ways of creating shared mutable state (simplest by mental model, at least) is wrapping the state in a mutual-exclusion +lock, a Mutex.

+

Through the Kernel's C-API it's trivial to do that statically.

+
static DEFINE_MUTEX(my_mutex);
+
+

It statically defines a mutex (definition here) +which can be interacted with, by e.g. +mutex_lock, +mutex_unlock, +etc.

+

In Rust-land there's a safe API for creating mutexes, it looks like this:

+
let pin_init_lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+
+

pin_init_lock is something that implements PinInit, +the most important function of which is __pinned_init(self, slot: *mut T), which takes uninitialized memory that fits a T and initializes the value there.

+

For reasons that will become clearer later, the mutex will be initialized into static memory.

+

Finally, to initialize the data that the file will be backed by, the code looks like this:

+
mod backing_data {
+    use core::cell::UnsafeCell;
+    use kernel::sync::lock::{mutex::MutexBackend, Lock};
+    use super::*;
+    static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> =
+        MaybeUninit::uninit();
+    ...
+    /// Initialize the backing data of this module, letting new
+    /// users access it.
+    /// # Safety
+    /// Safe if only called once during the module's lifetime
+    pub(super) unsafe fn init_data(
+        lock_ready: impl PinInit<Lock<Option<alloc::vec::Vec<u8>>, MutexBackend>>,
+    ) -> Result<()> {
+        unsafe {
+            let slot = MAYBE_UNINIT_DATA_SLOT.as_mut_ptr();
+            lock_ready.__pinned_init(slot)?;
+        }
+        Ok(())
+    }
+    ...
+    /// Gets the initialized data as a static reference
+    /// # Safety
+    /// Safe only if called after initialization, otherwise
+    /// it will return a pointer to uninitialized memory.  
+    pub(super) unsafe fn get_initialized_data() -> &'static Mutex<Option<alloc::vec::Vec<u8>>> {
+        unsafe { MAYBE_UNINIT_DATA_SLOT.assume_init_ref() }
+    }
+    ...
+}
+impl kernel::Module for RustProcRamFile {
+    fn init(_module: &'static ThisModule) -> Result<Self> {
+        ...
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        ...
+    }
+}
+
+

That's quite a lot.

+

First off, the static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> = MaybeUninit::uninit(); +creates static uninitialized memory, represented by the MaybeUninit. +The memory has space for a Mutex containing an Option<alloc::vec::Vec<u8>>.

+

The reason for having the inner data be Option is to be able to remove it on module-unload and properly clean it up. +The Drop-code will show how that cleanup works in more detail, and it's likely a bit pedantic.

+

Second, in the module's init, a Vec is created and wrapped in a PinInit that will produce a Mutex once it's given memory to initialize into.
+That PinInit is passed to init_data, which takes a pointer to the static memory MAYBE_UNINIT_DATA_SLOT and writes +the mutex into it.

+

Now there's an initialized Mutex.

+

Storing the ProcDirEntry

+

Now a proc_create can be called which will create a proc-file.

+
mod backing_data {
+    ...
+    struct SingleAccessPdeStore(UnsafeCell<Option<ProcDirEntry<'static>>>);
+    unsafe impl Sync for SingleAccessPdeStore {}
+    static ENTRY: SingleAccessPdeStore = SingleAccessPdeStore(UnsafeCell::new(None));
+    ...
+    /// Write PDE into static memory
+    /// # Safety
+    /// Any concurrent access is unsafe.  
+    pub(super) unsafe fn set_pde(pde: ProcDirEntry<'static>) {
+        unsafe {
+            ENTRY.0.get().write(Some(pde));
+        }
+    }
+    /// Remove the PDE
+    /// # Safety
+    /// While safe to invoke regardless of PDE initialization,
+    /// any concurrent access is unsafe.  
+    pub(super) unsafe fn take_pde() -> Option<ProcDirEntry<'static>> {
+        unsafe {
+            let mut_ref = ENTRY.0.get().as_mut()?;
+            mut_ref.take()
+        }
+    }
+}
+fn init(_module: &'static ThisModule) -> Result<Self> {
+        const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+        // Struct defined inline since this is the only safe place for it to be used
+        struct ProcHand;
+        impl ProcHand {
+            ...
+        }
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        // This is technically unsound, e.g. READ is not safe to invoke until
+        // `init_data` has been called, but could theoretically be invoked in a safe context before
+        // then, so don't, it's ordered like this for a reason.
+        impl ProcHandler<'static> for ProcHand {
+            const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+            const READ: kernel::proc_fs::ProcRead<'static> =
+                &|f, u, o| unsafe { Self::pread(f, u, o) };
+            const WRITE: kernel::proc_fs::ProcWrite<'static> =
+                &|f, u, o| unsafe { Self::pwrite(f, u, o) };
+            const LSEEK: kernel::proc_fs::ProcLseek<'static> =
+                &|f, o, w| unsafe { Self::plseek(f, o, w) };
+        }
+        let pde = proc_create(c_str!("rust-proc-file"), 0666, None, &POPS)?;
+        unsafe {
+            // Safety: Only place this is called, no concurrent access
+            backing_data::set_pde(pde);
+        }
+        pr_info!("Loaded /proc/rust-proc-file\n");
+        Ok(Self)
+    }
+
+

That's also quite a lot.

+

Now the code is encountering issues with unsoundness (an API that is not marked as unsafe but is unsafe under some conditions).

+

Starting from the top:

+

Calling proc_create returns a ProcDirEntry, which removes the proc-file when dropped. The entry should be kept alive +until the module is dropped. Therefore, a static variable ENTRY is created to house it; it gets removed in +the module's Drop.
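The RAII mechanic can be sketched in userspace Rust (FakeEntry and fake_proc_create are made-up stand-ins; an atomic flag stands in for the kernel's registration state):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stands in for "the proc-file exists in the kernel"
static REGISTERED: AtomicBool = AtomicBool::new(false);

/// Stand-in for `ProcDirEntry`: creating it registers, dropping it removes.
struct FakeEntry;

fn fake_proc_create() -> FakeEntry {
    REGISTERED.store(true, Ordering::SeqCst);
    FakeEntry
}

impl Drop for FakeEntry {
    fn drop(&mut self) {
        // Analogous to `proc_remove` running in the destructor
        REGISTERED.store(false, Ordering::SeqCst);
    }
}

fn main() {
    let entry = fake_proc_create();
    assert!(REGISTERED.load(Ordering::SeqCst));
    // Without storing the entry somewhere 'static, the file vanishes here
    drop(entry);
    assert!(!REGISTERED.load(Ordering::SeqCst));
}
```

This is why the module stashes the entry in a static: letting it drop at the end of init would immediately unregister the file.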

+

static-entries need to be Sync, i.e. shareable between threads. +UnsafeCell is not Sync, so it needs to be wrapped in the newtype +SingleAccessPdeStore. It is indeed safe to share between threads under some conditions, so +Sync is unsafely implemented through:

+
unsafe impl Sync for SingleAccessPdeStore {}
+
+

It tells the compiler that even though it doesn't look Sync, it should be treated as Sync. +(Sync and Send are examples of automatically implemented traits: if a struct contains only types that implement +Send and/or Sync, that struct will also implement Send and/or Sync; a bit more on that here).
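That auto-trait mechanic can be demonstrated in a small userspace sketch (Store and GLOBAL are made-up names, not the module's types): a struct containing an UnsafeCell is not automatically Sync, so it can't be a static until the unsafe impl vouches for it.

```rust
use core::cell::UnsafeCell;

// UnsafeCell<T> is not Sync, so this wrapper is not automatically Sync,
// and without the impl below `static GLOBAL` would not compile.
struct Store(UnsafeCell<Option<u32>>);

// Safety: a promise to the compiler, not a proof. The sketch is
// single-threaded; real code must rule out concurrent mutable access.
unsafe impl Sync for Store {}

static GLOBAL: Store = Store(UnsafeCell::new(None));

fn main() {
    let ptr = GLOBAL.0.get();
    // Safety: no other thread touches GLOBAL in this sketch
    unsafe {
        *ptr = Some(7);
        assert_eq!(*ptr, Some(7));
    }
}
```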

+

Next come two unsafe functions. One sets ENTRY to a provided ProcDirEntry<'static>; +the operation is safe as long as it doesn't happen concurrently, which would create a data-race.

+

The other takes the ProcDirEntry from ENTRY; this is done on module teardown, when the module is unloaded, for example +through rmmod: rmmod rust_proc_ram_file.

+

Entering the init-function, there are struct definitions and trait-implementations inside the function.
+The reason for this is to make some inherent unsoundness in the memory-lifecycle less dangerous; it's worth getting +into why that is, and what the trade-offs of keeping some unsoundness are.

+

Memory lifecycle, you, me, and C

+

Again, the C-api looks like this:

+
struct proc_ops {
+	unsigned int proc_flags;
+	int	(*proc_open)(struct inode *, struct file *);
+	ssize_t	(*proc_read)(struct file *, char __user *, size_t, loff_t *);
+	ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
+	ssize_t	(*proc_write)(struct file *, const char __user *, size_t, loff_t *);
+	/* mandatory unless nonseekable_open() or equivalent is used */
+	loff_t	(*proc_lseek)(struct file *, loff_t, int);
+	int	(*proc_release)(struct inode *, struct file *);
+	__poll_t (*proc_poll)(struct file *, struct poll_table_struct *);
+	long	(*proc_ioctl)(struct file *, unsigned int, unsigned long);
+#ifdef CONFIG_COMPAT
+	long	(*proc_compat_ioctl)(struct file *, unsigned int, unsigned long);
+#endif
+	int	(*proc_mmap)(struct file *, struct vm_area_struct *);
+	unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+} __randomize_layout;
+struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
+
+

So, the module needs to call the function proc_create supplying a pointer const struct proc_ops *proc_ops +which itself contains function pointers. What are the lifetime requirements?

+

const struct proc_ops *proc_ops has a requirement to live until proc_remove is called on the returned proc_dir_entry*, +that's easily represented in Rust, we could model the API to accept something with the lifetime 'a and return +a ProcDirEntry<'a>, taking ownership of the reference to ProcOps and calling proc_remove in the destructor.

+

But how long do the function pointers that are themselves contained in proc_ops need to live?

+

One could assume it's the same, 'a, but let's consider how the kernel 'routes' a user through the module and the +lifecycle of an interaction.

+
A user interaction
+

A user wants to open the file, by name.

+
    +
  1. The user issues the open syscall. +
  2. The kernel accepts the open syscall, and finds this *proc_dir_entry. +
  3. The kernel enters the proc_open-function. +
  4. The kernel sets the correct register return address value. +
  5. The kernel yields execution. +
+

The kernel handles two pointers from the module, non-atomically, in separate steps; multiple users could trigger +this interaction concurrently (the reason for the lock).

+

Consider the case where a *proc_dir_entry exists but the proc_open-function pointer is +dangling, +because its lifetime is shorter than the *proc_dir_entry's, or they have the same lifetime but the frees +happen in an unfavourable order. In that case, the kernel will try to access a dangling pointer, +which may or may not cause chaos. A dangling pointer is worse than a null-pointer here, since a +null-pointer is generally checked for and acceptable.

+

In another case, the proc_dir_entry may definitely be removed first, but some process may have read the +function pointer proc_open from it without yet having started executing it (a race), so proc_open can theoretically +never be safely destroyed: in a time-sharing OS, +no guarantees are made about the timeliness of operations. Therefore, the lifetime requirement of +proc_open is 'static, as represented by:

+
...
+const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+...
+
+
Constraints caused by 'static-lifetimes
+

'static (sloppily expressed) means 'for the duration of the program'; if there's a 'static-requirement for a variable, +it means that variable's memory needs to be allocated in the binary.

+

An example would be a string literal

+
const MY_STR: &'static str = "hello";
+static MY_STR2: &'static str = "hello";
+// or 
+fn my_fn() {
+    let my_str = "hello";
+}
+
+

In all cases the string-literal exists in the binary. The difference between these cases is that for the +const-variable, some space is allocated in the binary that fits a reference to a str, +which may point to data in the data-section of the binary (or somewhere else, implementation-dependent).
+const also dictates that this value may never change.

+

static also makes sure that the binary has space for the variable (still a reference to a string), it will also +point to some data that is likely to be in the data-section, but it is theoretically legal to change the data that +it's pointing to (with some constraints).

+

In the function, space is made available on the stack for the reference, but the actual hello is likely again in +the data-section.

+
Using static data for the backing storage
+

Looking back at the purpose of the module, data needs to be stored with a static lifetime. There are multiple ways +to achieve this in Rust; the data could be owned directly, as a member of the module-struct RustProcRamFile. +However, this means that when the module is dropped, the data is dropped as well. Since the function-pointers +have a 'static-requirement, that doesn't work.

+

Even if the data is wrapped in a Box or an Arc, the RustProcRamFile-module can't own it, for the above reason: +the functions need to live (and be valid) for the duration of the program, so a global static is necessary (sigh).

+

Here is where the globals come in:

+
...
+static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> =
+        MaybeUninit::uninit();
+...
+static ENTRY: SingleAccessPdeStore = SingleAccessPdeStore(UnsafeCell::new(None));
+...
+const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+
+


+

Looking at the definitions: two of these contain data that can (and will) be changed, so they are static; +one (the container of the functions that are passed through the C-API) is marked const, since it will never change.

+

MAYBE_UNINIT_DATA_SLOT is MaybeUninit, so that when the program starts, there is already space made available in +the binary for the data it will contain; on module-initialization, data will be written into it.

+

The same goes for ENTRY; UnsafeCell does essentially the same thing. There's a reason both aren't wrapped in +UnsafeCell<Option>, and it's partially performance.

+
MaybeUninit vs UnsafeCell<Option>
+

MaybeUninit contains potentially uninitialized data. +Accessing that data, for example by creating a reference to it, is UB if the data is not yet initialized.
+This means safe access is only possible if:

+
    +
  1. Non-modifying access happens after initialization. +
  2. Modifying access happens in a non-concurrent context. +
+

UnsafeCell<Option> does not contain potentially +uninitialized data, the uninitialized state is represented by the Option. +Safe access only requires that there is no concurrent access (of any kind) at the same time as mutable access. +It's a bit easier to make safe.

+

I would prefer UnsafeCell<Option<T>> in both cases, but as the PinInit-API (which is needed for +the Mutex) is constructed, a slot of type T (being the Mutex) needs to be provided. It would therefore have to be +static UnsafeCell<Lock<..>>, which cannot be instantiated at compile-time the way an UnsafeCell<Option<T>> +can (static MY_VAR: UnsafeCell<Option<String>> = UnsafeCell::new(None), for example).
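The difference between the two access disciplines can be shown in plain userspace Rust (SLOT and OptSlot are made-up stand-ins for the module's statics; the sketch is single-threaded, so the concurrency rules above hold trivially):

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;

// Pattern 1: truly uninitialized memory. Reading it before `write` +
// `assume_init_ref` would be UB, so the *order* of operations carries
// the safety argument.
static mut SLOT: MaybeUninit<String> = MaybeUninit::uninit();

// Pattern 2: the "empty" state is a real value (None), so any
// non-concurrent access is defined, initialized or not.
struct OptSlot(UnsafeCell<Option<String>>);
// Safety: single-threaded sketch only
unsafe impl Sync for OptSlot {}
static OPT: OptSlot = OptSlot(UnsafeCell::new(None));

fn main() {
    unsafe {
        // MaybeUninit: initialize first, only then create a reference
        let slot = core::ptr::addr_of_mut!(SLOT);
        (*slot).write(String::from("init"));
        assert_eq!((*slot).assume_init_ref(), "init");

        // UnsafeCell<Option>: reading the None state would also be fine
        *OPT.0.get() = Some(String::from("init"));
        assert_eq!((*OPT.0.get()).as_deref(), Some("init"));
    }
}
```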

+

That is the reason why the variables look like they do.

+
Global POPS and an unsound API
+

Back again to POPS, the init-function and unsoundness:

+
fn init(_module: &'static ThisModule) -> Result<Self> {
+        const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+        // Struct defined inline since this is the only safe place for it to be used
+        struct ProcHand;
+        impl ProcHand {
+            ...
+        }
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        // This is technically unsound, e.g. READ is not safe to invoke until
+        // `init_data` has been called, but could theoretically be invoked in a safe context before
+        // then, so don't, it's ordered like this for a reason.
+        impl ProcHandler<'static> for ProcHand {
+            const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+            const READ: kernel::proc_fs::ProcRead<'static> =
+                &|f, u, o| unsafe { Self::pread(f, u, o) };
+            const WRITE: kernel::proc_fs::ProcWrite<'static> =
+                &|f, u, o| unsafe { Self::pwrite(f, u, o) };
+            const LSEEK: kernel::proc_fs::ProcLseek<'static> =
+                &|f, o, w| unsafe { Self::plseek(f, o, w) };
+        }
+        let pde = proc_create(c_str!("rust-proc-file"), 0666, None, &POPS)?;
+        unsafe {
+            // Safety: Only place this is called, no concurrent access
+            backing_data::set_pde(pde);
+        }
+        pr_info!("Loaded /proc/rust-proc-file\n");
+        Ok(Self)
+    }
+
+

ProcHand::pread, ProcHand::pwrite, and ProcHand::plseek all access data that is not safe to access before initialization, but is safe to access after it; they are therefore marked as unsafe.

+

However, since the API (that I wrote...) takes a safe function, they are wrapped in 'static closures that are safe and use an unsafe-block internally.

+

This wrapping is implemented AFTER the code that initializes the backing data. However, the API is still unsound, since the functions could theoretically be called before that initialization, even though they are defined after it.

+

One note on the wrapping: running it through godbolt again shows that it's still being inlined.

+

This problem can be worked around by, for example, creating a static INITIALIZED: AtomicBool = AtomicBool::new(false); and setting it during initialization. But that requires an atomic read on each access for something that is set once at initialization. This is a tradeoff of soundness vs performance; in this case performance is chosen, because this code is not planned to be distributed to someone else's production environment, or maintained by someone else. In those circumstances opting for soundness may be preferable, although the 'window' for creating UB here is quite slim.

+
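A minimal sketch of that atomic-guard alternative, in plain Rust rather than kernel code (names like `Racy`, `init_data`, and `get_data` are hypothetical, and the real module stores its data behind a kernel Mutex):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Sketch of the sound alternative: guard the runtime-initialized static with an
// atomic flag instead of relying on call ordering. Costs one atomic load per access.
struct Racy<T>(UnsafeCell<Option<T>>);
// Safety: the single write happens before INITIALIZED is set; reads happen after.
unsafe impl<T> Sync for Racy<T> {}

static INITIALIZED: AtomicBool = AtomicBool::new(false);
static DATA: Racy<Vec<u8>> = Racy(UnsafeCell::new(None));

fn init_data(data: Vec<u8>) {
    unsafe { *DATA.0.get() = Some(data) };
    // Release pairs with the Acquire load below, publishing the write.
    INITIALIZED.store(true, Ordering::Release);
}

// Safe to call at any time: returns None instead of touching uninitialized data.
fn get_data() -> Option<&'static Vec<u8>> {
    if INITIALIZED.load(Ordering::Acquire) {
        unsafe { (*DATA.0.get()).as_ref() }
    } else {
        None
    }
}
```

The Release/Acquire pair is what makes the flag meaningful: a reader that observes `true` is guaranteed to also observe the write to the data.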

Deallocation

+

Finally, the data is set up, and can be used with some constraints, now the teardown.

+
impl Drop for RustProcRamFile {
+    fn drop(&mut self) {
+        // Remove the PDE if initialized
+        // Drop it to remove the proc entry
+        unsafe {
+            // Safety:
+            // Runs at most once, no concurrent access
+            backing_data::take_pde();
+        }
+        // Remove and deallocate the data
+        unsafe {
+            // Safety:
+            // This module is only instantiated if data is initialized, therefore
+            // the data is initialized when this destructor is run.
+            backing_data::get_initialized_data().lock().take();
+        }
+        // There is theoretically a race condition where module users are still inside a
+        // proc handler when this runs. The handlers themselves are 'static, so the kernel
+        // is trusted to keep function-related memory initialized until it's no longer needed.
+        // It is however impossible to guarantee both that the file is removed and that all
+        // users get a 'graceful' exit, i.e. that every user who can see the file and starts
+        // a proc-op gets to finish it. This is because recording that a user has entered a
+        // handler and removing the proc-entry can't happen atomically together: there is
+        // always a gap between a user entering the proc-handler and recording its presence,
+        // in which the proc-entry can be removed without that user's presence being seen.
+        // In that case, the user gets an EBUSY.
+    }
+}
+
+

First, the ProcDirEntry is dropped, invoking the kernel's proc_remove removing the proc-file.
+After that, a reference to the initialized data is taken, and the mutex is locked to remove the backing data for the 'file'. When that data is dropped, it is deallocated. With that, all runtime-created data is removed; the only things that may remain are function pointers, which were 'static anyway, and accessing them produces a safe error.

+

Summing up

+

All the important parts are now covered. The actual implementations of pread, pwrite, and plseek are fairly boring and straightforward; the full code can be found here if that, or the rest of the implementation, is of interest.

+

Generating bindings

+

First off, bindings for the Linux C-API for creating a proc-file had to be generated; that only required adding a header to the list here.

+

Wrapping the API with reasonable lifetimes

+

The C-API has some lifetime requirements; those are encoded in proc_fs.rs.

+

The C-API-parts that take function-pointers can be wrapped by a Rust-fn with zero cost (as was shown here), allowing a more Rust-y API to be exposed.

+

Dealing with static data in a concurrent context

+

Some static data needs to be initialized at runtime but is never mutably accessed concurrently; that was represented by a MaybeUninit.

+

Some static data does not need to be initialized at runtime, but cannot be mutably accessed concurrently; that was represented by an UnsafeCell<Option<T>>.

+

Some static data was also constant, never mutable, and safe for all non-mutable access; that was represented by a regular const <VAR>.

+
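The first two patterns can be sketched in plain Rust (simplified and with hypothetical names like `SyncCell`, `init_lock`, and `set_pde`; the kernel module additionally upholds the no-concurrent-mutation invariants described above):

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

// Shared wrapper to make the statics usable; the caller promises the
// no-concurrent-mutation invariants in the Safety comments below.
struct SyncCell<T>(UnsafeCell<T>);
unsafe impl<T> Sync for SyncCell<T> {}

// Pattern 1: runtime-initialized, never mutated after init -> MaybeUninit.
static LOCK: SyncCell<MaybeUninit<u64>> = SyncCell(UnsafeCell::new(MaybeUninit::uninit()));
// Pattern 2: no runtime init needed, mutated non-concurrently -> UnsafeCell<Option<T>>.
static PDE: SyncCell<Option<u64>> = SyncCell(UnsafeCell::new(None));

// Safety: must be called exactly once, before any `get_lock` call.
unsafe fn init_lock(v: u64) {
    (*LOCK.0.get()).write(v);
}
// Safety: only valid after `init_lock` has run.
unsafe fn get_lock() -> u64 {
    (*LOCK.0.get()).assume_init()
}
// Safety: no concurrent access to PDE.
unsafe fn set_pde(v: u64) {
    *PDE.0.get() = Some(v);
}
// Safety: no concurrent access to PDE.
unsafe fn take_pde() -> Option<u64> {
    (*PDE.0.get()).take()
}
```

`u64` stands in for the real `Mutex<..>` and `ProcDirEntry` payloads, which is what keeps the sketch self-contained.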

Tradeoff between soundness and performance

+

Lastly, there was a tradeoff where some functions were arbitrarily marked as safe, even though they are unsafe under some conditions. Whether that tradeoff is justified is up to the programmer.

+
+
\ No newline at end of file diff --git a/static-pie.html b/static-pie.html new file mode 100644 index 0000000..dead529 --- /dev/null +++ b/static-pie.html @@ -0,0 +1,332 @@ + + + + + + + + + StaticPie + + + +
+

Static pie linking a nolibc Rust binary

+

Something has been bugging me for a while with tiny-std: if I try to compile executables created with it with -C target-feature=+crt-static (statically link the C-runtime), they segfault.

+

The purpose of creating tiny-std was to avoid C, but to get Rust to link a binary statically, that flag needs to be passed. -C target-feature=+crt-static -C relocation-model=static does produce a valid binary though. The default relocation-model for static binaries is -C relocation-model=pie (at least for the target x86_64-unknown-linux-gnu), so something about PIE-executables created with tiny-std fails; in this write-up I'll go into the solution for that.

+

Static pie linking

+

Static pie linking is a combination of two concepts.

+
    +
  1. Static linking, putting everything in the same place at compile time. +As opposed to dynamic linking, where library dependencies can be found and used at runtime. +Statically linking an executable gives it the property that it can be run on any system +that can handle the executable type, i.e. I can start a statically linked elf-executable on any platform that can run +elf-executables. Whereas a dynamically linked executable will not start if its dynamic dependencies cannot be found +at application start. +
  2. Position-independent code is able to run properly regardless of where in memory it is placed. The benefits, as I understand it, are security- and platform-compatibility-related. +
+

When telling rustc to create a static-pie linked executable through -C target-feature=+crt-static -C relocation-model=pie +(relocation-model defaults to pie, could be omitted), it creates an elf-executable which has a header that marks it as +DYN. Here's what an example readelf -h looks like:

+
ELF Header:
+  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
+  Class:                             ELF64
+  Data:                              2's complement, little endian
+  Version:                           1 (current)
+  OS/ABI:                            UNIX - System V
+  ABI Version:                       0
+  Type:                              DYN (Position-Independent Executable file)
+  Machine:                           Advanced Micro Devices X86-64
+  Version:                           0x1
+  Entry point address:               0x24b8
+  Start of program headers:          64 (bytes into file)
+  Start of section headers:          1894224 (bytes into file)
+  Flags:                             0x0
+  Size of this header:               64 (bytes)
+  Size of program headers:           56 (bytes)
+  Number of program headers:         9
+  Size of section headers:           64 (bytes)
+  Number of section headers:         32
+  Section header string table index: 20
+
+

This signals to the OS that the executable can be run position-independently, but since tiny-std assumes that memory addresses are absolute (the ones they were at compile time), the executable segfaults as soon as it tries to get the address of any symbol, like a function or static variable, since those have been moved.

+

Where are my symbols?

+

This seems like a tricky problem: as a programmer, I have a bunch of variables and function calls, some that the Rust-language emits for me, and now the addresses of all those variables and functions are in another place in memory.
+Before using any of them I need to remap them, which means that I need to have remapping code in place before making any function calls (kinda).

+

The start function

+

The executable enters through the _start function, this is defined in asm for tiny-std:

+
// Binary entrypoint
+#[cfg(all(feature = "symbols", feature = "start", target_arch = "x86_64"))]
+core::arch::global_asm!(
+    ".text",
+    ".global _start",
+    ".type _start,@function",
+    "_start:",
+    "xor rbp,rbp", // Zero the stack-frame pointer
+    "mov rdi, rsp", // Move the stack pointer into rdi, c-calling convention arg 1
+    ".weak _DYNAMIC", // Elf dynamic symbol
+    ".hidden _DYNAMIC",
+    "lea rsi, [rip + _DYNAMIC]", // Load the rip-relative address of _DYNAMIC into rsi, c-calling convention arg 2
+    "and rsp,-16", // Align the stack pointer
+    "call __proxy_main" // Call our rust start function
+);
+
+

The assembly prepares the stack by aligning it and putting the stack pointer into arg 1 for the coming function-call, then computes the address of _DYNAMIC relative to the special-purpose rip-register and puts it in rsi, which becomes the called function's arg 2.

+

After that __proxy_main is called, the signature looks like this:

+

unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize)
It takes the stack_ptr and the dynv dynamic-vector pointer as arguments, which were provided in the above assembly.

+

I wrote more about the _start-function in pgwm03 and fasterthanli.me +wrote more about it at their great blog, but in short:

+

Before running the user's main, some setup is required: collecting arguments, environment variables, and aux-values, mapping in faster functions from the vdso (see pgwm03 for more on that), and setting up some thread-state (see the thread writeup for that).

+

All these variables come off the executable's stack, which is why the stack pointer needs to be passed as an argument to the setup-function, so that they can be read before the stack is polluted by that function.

+

The first extraction looks like this:

+
#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize) {
+    // First 8 bytes are a u64 containing the number of arguments
+    let argc = *(stack_ptr as *const u64);
+    // Directly followed by those arguments, bump pointer by 8 bytes
+    let argv = stack_ptr.add(8) as *const *const u8;
+    let ptr_size = core::mem::size_of::<usize>();
+    // Following the argv-pointers (and a null terminator) come the environment variable
+    // pointers, a null-terminated array of pointers to null-terminated strings.
+    // This isn't specified in POSIX and isn't great for portability, but this isn't meant to be portable outside of Linux.
+    let env_offset = 8 + argc as usize * ptr_size + ptr_size;
+    // Bump pointer by combined offset
+    let envp = stack_ptr.add(env_offset) as *const *const u8;
+    let mut null_offset = 0;
+    loop {
+        let val = *(envp.add(null_offset));
+        if val as usize == 0 {
+            break;
+        }
+        null_offset += 1;
+    }
+    // We now know how long the envp is
+    // ... 
+}
+
+
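The offset arithmetic above can be simulated on a fabricated "stack" of usize slots, with tag values standing in for real pointers (`parse_stack` is a hypothetical helper for illustration, not tiny-std code):

```rust
// Simulation of the stack walk above: slot 0 holds argc, followed by `argc`
// argv-pointers, a null terminator, then envp-pointers up to another null.
// Returns (argc, number of environment entries).
fn parse_stack(stack: &[usize]) -> (usize, usize) {
    let argc = stack[0];
    // argv occupies the next `argc` slots, followed by a null terminator.
    let env_start = 1 + argc + 1;
    let mut env_count = 0;
    while stack[env_start + env_count] != 0 {
        env_count += 1;
    }
    (argc, env_count)
}
```

The real code does the same walk with byte offsets and raw pointer bumps instead of slice indexing.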

This all works the same in a PIE because:

+

Prelude, inline

+

There will be trouble when trying to find a symbol contained in the binary, such as a function.
+Up to here, that hasn't been a problem, because even though ptr::add() and core::mem::size_of::<T>() are invoked, no addresses are needed for those. This is because of inlining.

+

Looking at core::mem::size_of::<T>():

+
#[inline(always)]
+#[must_use]
+#[stable(feature = "rust1", since = "1.0.0")]
+#[rustc_promotable]
+#[rustc_const_stable(feature = "const_mem_size_of", since = "1.24.0")]
+#[cfg_attr(not(test), rustc_diagnostic_item = "mem_size_of")]
+pub const fn size_of<T>() -> usize {
+    intrinsics::size_of::<T>()
+}
+
+

It has the #[inline(always)] attribute, the same goes for ptr::add(). Since that code is inlined, +an address to a function isn't necessary, and therefore it works even though all of the addresses are off.

+

To be able to debug, I would like to be able to print variables, since I haven't been able to hook a debugger up to tiny-std executables yet. But printing to the terminal requires code, code that usually isn't #[inline(always)].

+

So I wrote a small print:

+
#[inline(always)]
+unsafe fn print_labeled(msg: &[u8], val: usize) {
+    print_label(msg);
+    print_val(val);
+}
+#[inline(always)]
+unsafe fn print_label(msg: &[u8]) {
+    syscall!(WRITE, 1, msg.as_ptr(), msg.len());
+}
+#[inline(always)]
+unsafe fn print_val(u: usize) {
+    syscall!(WRITE, 1, num_to_digits(u).as_ptr(), 21);
+}
+#[inline(always)]
+unsafe fn num_to_digits(mut u: usize) -> [u8; 22] {
+    let mut base = *b"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n";
+    let mut ind = base.len() - 2;
+    if u == 0 {
+        base[ind] = 48;
+    }
+    while u > 0 {
+        let md = u % 10;
+        base[ind] = md as u8 + 48;
+        ind -= 1;
+        u = u / 10;
+    }
+    base
+}
+
+

Printing to the terminal can be done through the syscall WRITE on fd 1 (STDOUT).
+It takes a buffer of bytes and a length. The call through syscall!() is always inlined.

+

Since I primarily need to look at addresses, I just print usize, and I wrote a beautifully stupid number-to-digits function.
+Since a usize on a 64-bit machine has at most 20 digits, I allocate an array on the stack filled with null-bytes; these won't be displayed. Then I add digit by digit from the back, which means that the number is formatted without leading zeroes.

+

Invoking it looks like this:

+
fn test() {
+    print_labeled(b"My msg as bytes: ", 15);
+}
+
+
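The buffer trick can be checked in isolation. Below is a plain-Rust copy of num_to_digits, plus a hypothetical `visible` helper that strips the null padding a terminal wouldn't display:

```rust
// Copy of the digit-formatting trick above: fill a fixed buffer back-to-front,
// leaving leading NUL bytes, with a trailing newline in the last slot.
fn num_to_digits(mut u: usize) -> [u8; 22] {
    let mut base = *b"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n";
    let mut ind = base.len() - 2;
    if u == 0 {
        base[ind] = b'0';
    }
    while u > 0 {
        base[ind] = (u % 10) as u8 + b'0';
        ind -= 1;
        u /= 10;
    }
    base
}

// Test helper: drop the NUL padding to see what would actually be displayed.
fn visible(buf: &[u8; 22]) -> String {
    buf.iter().filter(|&&b| b != 0).map(|&b| b as char).collect()
}
```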

Relocation

+

Now that basic debug-printing is possible, work to relocate the addresses can begin.

+

I had previously written some code to extract aux-values, but now that code needs to run without using any non-inlined functions or variables.

+

Aux values

+

A good description of aux-values comes from the docs here; in short, the kernel puts some data in the memory of a program when it's loaded.
+This data points to other data that is needed to do relocation. It also has an insane layout, for reasons I haven't yet been able to find any motivation for.
+A pointer to the aux-values is put after the envp on the stack.

+

Before this change, the aux-values were collected and stored pretty sloppily in a global static variable. This time they need to be collected onto the stack, used for finding the dynamic relocation addresses, and only then put into a static variable (since the address of a static variable can't be found before remapping).

+

The dyn-values, which are essentially the same as aux-values but provided for DYN-objects, are also required.

+

In musl, the aux-values that are put on the stack look like this:

+
size_t i, aux[AUX_CNT], dyn[DYN_CNT];
+
+

So I replicated the aux-vec on the stack like this:

+
// There are 32 aux values.
+let mut aux = [0usize; 32];
+
+

And then initialized it with the aux-pointer provided by the OS.

+

The OS supplies some values in the aux-vector (more info here); the ones necessary for remapping are:

+
    +
  1. AT_BASE the base address of the program interpreter, 0 if no interpreter (static-pie). +
  2. AT_PHNUM, the number of program headers. +
  3. AT_PHENT, the size of one program header entry. +
  4. AT_PHDR, the address of the program headers in the executable. +
+
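The aux-vector itself is a sequence of (key, value) usize pairs terminated by an AT_NULL key. A sketch of walking it for the entries listed above (the AT_* constants come from elf.h; the struct is a hypothetical stand-in for tiny-std's AuxValues, and the pointer values in the test are fake):

```rust
// Keys from elf.h / getauxval(3).
const AT_NULL: usize = 0;
const AT_PHDR: usize = 3;
const AT_PHENT: usize = 4;
const AT_PHNUM: usize = 5;
const AT_BASE: usize = 7;

#[derive(Default, Debug, PartialEq)]
struct RelocAux {
    at_base: usize,
    at_phdr: usize,
    at_phent: usize,
    at_phnum: usize,
}

// Walk (key, value) pairs until AT_NULL, keeping only the entries needed
// for remapping.
fn collect_aux(auxv: &[usize]) -> RelocAux {
    let mut out = RelocAux::default();
    let mut i = 0;
    while auxv[i] != AT_NULL {
        match auxv[i] {
            AT_BASE => out.at_base = auxv[i + 1],
            AT_PHDR => out.at_phdr = auxv[i + 1],
            AT_PHENT => out.at_phent = auxv[i + 1],
            AT_PHNUM => out.at_phnum = auxv[i + 1],
            _ => {}
        }
        i += 2;
    }
    out
}
```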

First, the virtual address held by the program header that has the DYNAMIC type must be found.

+

The program header is laid out in memory as this struct:

+
#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct elf64_phdr {
+    pub p_type: Elf64_Word,
+    pub p_flags: Elf64_Word,
+    pub p_offset: Elf64_Off,
+    pub p_vaddr: Elf64_Addr,
+    pub p_paddr: Elf64_Addr,
+    pub p_filesz: Elf64_Xword,
+    pub p_memsz: Elf64_Xword,
+    pub p_align: Elf64_Xword,
+}
+
+

The address of the AT_PHDR can be treated as an array declared as:

+
let phdr: &[elf64_phdr; AT_PHNUM] = ...
+
+

That array can be walked until finding a program header with p_type = PT_DYNAMIC; that header holds an offset at p_vaddr which can be subtracted from the dynv pointer to get the correct base address.

+
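That walk and subtraction can be sketched with a pared-down header struct (only the two fields needed here; the real layout is the elf64_phdr shown above, and the addresses in the test are fabricated):

```rust
// From elf.h: p_type value marking the dynamic segment.
const PT_DYNAMIC: u32 = 2;

// Pared-down program header: just the fields the base computation needs.
struct Phdr {
    p_type: u32,
    p_vaddr: usize,
}

// base = runtime address of the dynamic section - its compile-time p_vaddr.
fn compute_base(phdrs: &[Phdr], dynv_addr: usize) -> Option<usize> {
    phdrs
        .iter()
        .find(|p| p.p_type == PT_DYNAMIC)
        .map(|p| dynv_addr - p.p_vaddr)
}
```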

Initialize the dyn section

+

The dynv pointer supplied by the OS is, as previously stated, analogous to the aux-pointer, but trying to stack-allocate its value mappings like this:

+
let dyn_values = [0usize; 37];
+
+

Will cause a segfault.

+

SYMBOLS!!!

+

It took me a while to figure out what's happening: when a zeroed array larger than [0usize; 32] is allocated in Rust (256 bytes of zeroes seems to be the exact breakpoint), rustc, instead of using SSE instructions, emits a call to memset to zero the memory it just took off the stack.

+

The asm will look like this:

+
        ...
+        mov edx, 296
+        mov rdi, rbx
+        xor esi, esi
+        call qword ptr [rip + memset@GOTPCREL]
+        ...
+
+

Accessing that memset symbol is what causes the segfault.
+I tried a myriad of ways to get the compiler to not emit that symbol, among them posting this help request.

+

It seems that there is no reliable way to stop rustc from emitting unwanted symbols without doing it all in assembly, and since that seems a bit much, at least right now, I opted to instead restructure the code, unpacking both the aux- and dyn-values and keeping just what tiny-std needs.
+The unpacked aux-values now look like this:

+
/// Some selected aux-values, needs to be kept small since they're collected
+/// before symbol relocation on static-pie-linked binaries, which means rustc
+/// will emit memset on a zeroed allocation of over 256 bytes, which we won't be able
+/// to find and thus will result in an immediate segfault on start.
+/// See [docs](https://man7.org/linux/man-pages/man3/getauxval.3.html)
+#[derive(Debug)]
+pub(crate) struct AuxValues {
+    /// Base address of the program interpreter
+    pub(crate) at_base: usize,
+    /// Real group id of the main thread
+    pub(crate) at_gid: usize,
+    /// Real user id of the main thread
+    pub(crate) at_uid: usize,
+    /// Address of the executable's program headers
+    pub(crate) at_phdr: usize,
+    /// Size of program header entry
+    pub(crate) at_phent: usize,
+    /// Number of program headers
+    pub(crate) at_phnum: usize,
+    /// Address pointing to 16 bytes of a random value
+    pub(crate) at_random: usize,
+    /// Executable should be treated securely
+    pub(crate) at_secure: usize,
+    /// Address of the vdso
+    pub(crate) at_sysinfo_ehdr: usize,
+}
+
+

It only contains the aux-values that are actually used by tiny-std.

+

The dyn-values are only used for relocations so far, so they were packed into this much smaller struct:

+
pub(crate) struct DynSection {
+    rel: usize,
+    rel_sz: usize,
+    rela: usize,
+    rela_sz: usize,
+}
+
+

Now that rustc's memset emissions have been sidestepped, the DynSection struct can be filled with the values from the dynv-pointer, and then, finally, the symbols can be relocated:

+
#[inline(always)]
+pub(crate) unsafe fn relocate(&self, base_addr: usize) {
+    // Relocate all rel-entries
+    for i in 0..(self.rel_sz / core::mem::size_of::<Elf64Rel>()) {
+        let rel_ptr = ((base_addr + self.rel) as *const Elf64Rel).add(i);
+        let rel = ptr_unsafe_ref(rel_ptr);
+        if rel.0.r_info == relative_type(REL_RELATIVE) {
+            let rel_addr = (base_addr + rel.0.r_offset as usize) as *mut usize;
+            *rel_addr += base_addr;
+        }
+    }
+    // Relocate all rela-entries
+    for i in 0..(self.rela_sz / core::mem::size_of::<Elf64Rela>()) {
+        let rela_ptr = ((base_addr + self.rela) as *const Elf64Rela).add(i);
+        let rela = ptr_unsafe_ref(rela_ptr);
+        if rela.0.r_info == relative_type(REL_RELATIVE) {
+            let rel_addr = (base_addr + rela.0.r_offset as usize) as *mut usize;
+            *rel_addr = base_addr + rela.0.r_addend as usize;
+        }
+    }
+    // Skip implementing relr-entries for now
+}
+
+
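The arithmetic of a single relative rela-entry can be simulated on a slice standing in for mapped memory (a hypothetical sketch: offsets are counted in usize slots rather than bytes, and only the `*(base + r_offset) = base + r_addend` rule from the loop above is modeled):

```rust
// Pared-down rela-entry: only the fields the relative-relocation rule uses,
// with the r_info == R_X86_64_RELATIVE check assumed to have already passed.
struct Rela {
    r_offset: usize,
    r_addend: usize,
}

// Apply each entry: the slot at r_offset gets the rebased address base + addend.
fn apply_relative(memory: &mut [usize], base: usize, relas: &[Rela]) {
    for rela in relas {
        // In the real code this slot lives at base + r_offset in virtual memory.
        memory[rela.r_offset] = base + rela.r_addend;
    }
}
```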

After the relocate-section runs, symbols can again be used, and tiny-std can continue with the setup.

+

Outro

+

The commit that added the functionality can be found here.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/static/github-markdown.css b/static/github-markdown.css new file mode 100644 index 0000000..e451c64 --- /dev/null +++ b/static/github-markdown.css @@ -0,0 +1 @@ +@media (prefers-color-scheme:dark){:root{color-scheme:dark;--color-prettylights-syntax-comment:#8b949e;--color-prettylights-syntax-constant:#79c0ff;--color-prettylights-syntax-entity:#d2a8ff;--color-prettylights-syntax-storage-modifier-import:#c9d1d9;--color-prettylights-syntax-entity-tag:#7ee787;--color-prettylights-syntax-keyword:#ff7b72;--color-prettylights-syntax-string:#a5d6ff;--color-prettylights-syntax-variable:#ffa657;--color-prettylights-syntax-brackethighlighter-unmatched:#f85149;--color-prettylights-syntax-invalid-illegal-text:#f0f6fc;--color-prettylights-syntax-invalid-illegal-bg:#8e1519;--color-prettylights-syntax-carriage-return-text:#f0f6fc;--color-prettylights-syntax-carriage-return-bg:#b62324;--color-prettylights-syntax-string-regexp:#7ee787;--color-prettylights-syntax-markup-list:#f2cc60;--color-prettylights-syntax-markup-heading:#1f6feb;--color-prettylights-syntax-markup-italic:#c9d1d9;--color-prettylights-syntax-markup-bold:#c9d1d9;--color-prettylights-syntax-markup-deleted-text:#ffdcd7;--color-prettylights-syntax-markup-deleted-bg:#67060c;--color-prettylights-syntax-markup-inserted-text:#aff5b4;--color-prettylights-syntax-markup-inserted-bg:#033a16;--color-prettylights-syntax-markup-changed-text:#ffdfb6;--color-prettylights-syntax-markup-changed-bg:#5a1e02;--color-prettylights-syntax-markup-ignored-text:#c9d1d9;--color-prettylights-syntax-markup-ignored-bg:#1158c7;--color-prettylights-syntax-meta-diff-range:#d2a8ff;--color-prettylights-syntax-brackethighlighter-angle:#8b949e;--color-prettylights-syntax-sublimelinter-gutter-mark:#484f58;--color-prettylights-syntax-constant-other-reference-link:#a5d6ff;--color-fg-default:#c9d1d9;--color-fg-muted:#8b949e;--color-fg-subtle:#484f58;--color-canvas-default:#0d1117;--color-canvas-subtle:#161b22;--color-b
order-default:#30363d;--color-border-muted:#21262d;--color-neutral-muted:rgba(110,118,129,0.4);--color-accent-fg:#58a6ff;--color-accent-emphasis:#1f6feb;--color-attention-subtle:rgba(187,128,9,0.15);--color-danger-fg:#f85149;}}@media (prefers-color-scheme:light){:root{color-scheme:light;--color-prettylights-syntax-comment:#6e7781;--color-prettylights-syntax-constant:#0550ae;--color-prettylights-syntax-entity:#8250df;--color-prettylights-syntax-storage-modifier-import:#24292f;--color-prettylights-syntax-entity-tag:#116329;--color-prettylights-syntax-keyword:#cf222e;--color-prettylights-syntax-string:#0a3069;--color-prettylights-syntax-variable:#953800;--color-prettylights-syntax-brackethighlighter-unmatched:#82071e;--color-prettylights-syntax-invalid-illegal-text:#f6f8fa;--color-prettylights-syntax-invalid-illegal-bg:#82071e;--color-prettylights-syntax-carriage-return-text:#f6f8fa;--color-prettylights-syntax-carriage-return-bg:#cf222e;--color-prettylights-syntax-string-regexp:#116329;--color-prettylights-syntax-markup-list:#3b2300;--color-prettylights-syntax-markup-heading:#0550ae;--color-prettylights-syntax-markup-italic:#24292f;--color-prettylights-syntax-markup-bold:#24292f;--color-prettylights-syntax-markup-deleted-text:#82071e;--color-prettylights-syntax-markup-deleted-bg:#FFEBE9;--color-prettylights-syntax-markup-inserted-text:#116329;--color-prettylights-syntax-markup-inserted-bg:#dafbe1;--color-prettylights-syntax-markup-changed-text:#953800;--color-prettylights-syntax-markup-changed-bg:#ffd8b5;--color-prettylights-syntax-markup-ignored-text:#eaeef2;--color-prettylights-syntax-markup-ignored-bg:#0550ae;--color-prettylights-syntax-meta-diff-range:#8250df;--color-prettylights-syntax-brackethighlighter-angle:#57606a;--color-prettylights-syntax-sublimelinter-gutter-mark:#8c959f;--color-prettylights-syntax-constant-other-reference-link:#0a3069;--color-fg-default:#24292f;--color-fg-muted:#57606a;--color-fg-subtle:#6e7781;--color-canvas-default:#ffffff;--color-canva
s-subtle:#f6f8fa;--color-border-default:#d0d7de;--color-border-muted:hsla(210,18%,87%,1);--color-neutral-muted:rgba(175,184,193,0.2);--color-accent-fg:#0969da;--color-accent-emphasis:#0969da;--color-attention-subtle:#fff8c5;--color-danger-fg:#cf222e;}}.markdown-body{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;margin:0;color:var(--color-fg-default);background-color:var(--color-canvas-default);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji";font-size:16px;line-height:1.5;word-wrap:break-word;}.markdown-body .octicon{display:inline-block;fill:currentColor;vertical-align:text-bottom;}.markdown-body h1:hover .anchor .octicon-link:before,.markdown-body h2:hover .anchor .octicon-link:before,.markdown-body h3:hover .anchor .octicon-link:before,.markdown-body h4:hover .anchor .octicon-link:before,.markdown-body h5:hover .anchor .octicon-link:before,.markdown-body h6:hover .anchor .octicon-link:before{width:16px;height:16px;content:' ';display:inline-block;background-color:currentColor;-webkit-mask-image:url("data:image/svg+xml,");mask-image:url("data:image/svg+xml,");}.markdown-body details,.markdown-body figcaption,.markdown-body figure{display:block;}.markdown-body summary{display:list-item;}.markdown-body[hidden]{display:none !important;}.markdown-body a{background-color:transparent;color:var(--color-accent-fg);text-decoration:none;}.markdown-body a:active,.markdown-body a:hover{outline-width:0;}.markdown-body abbr[title]{border-bottom:none;text-decoration:underline dotted;}.markdown-body b,.markdown-body strong{font-weight:600;}.markdown-body dfn{font-style:italic;}.markdown-body h1{margin:.67em 0;font-weight:600;padding-bottom:.3em;font-size:2em;border-bottom:1px solid var(--color-border-muted);}.markdown-body mark{background-color:var(--color-attention-subtle);color:var(--color-text-primary);}.markdown-body small{font-size:90%;}.markdown-body sub,.markdown-body 
sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline;}.markdown-body sub{bottom:-0.25em;}.markdown-body sup{top:-0.5em;}.markdown-body img{border-style:none;max-width:100%;box-sizing:content-box;background-color:var(--color-canvas-default);}.markdown-body code,.markdown-body kbd,.markdown-body pre,.markdown-body samp{font-family:monospace,monospace;font-size:1em;}.markdown-body figure{margin:1em 40px;}.markdown-body hr{box-sizing:content-box;overflow:hidden;background:transparent;border-bottom:1px solid var(--color-border-muted);height:.25em;padding:0;margin:24px 0;background-color:var(--color-border-default);border:0;}.markdown-body input{font:inherit;margin:0;overflow:visible;font-family:inherit;font-size:inherit;line-height:inherit;}.markdown-body[type=button],.markdown-body[type=reset],.markdown-body[type=submit]{-webkit-appearance:button;}.markdown-body[type=button]::-moz-focus-inner,.markdown-body[type=reset]::-moz-focus-inner,.markdown-body[type=submit]::-moz-focus-inner{border-style:none;padding:0;}.markdown-body[type=button]:-moz-focusring,.markdown-body[type=reset]:-moz-focusring,.markdown-body[type=submit]:-moz-focusring{outline:1px dotted ButtonText;}.markdown-body[type=checkbox],.markdown-body[type=radio]{box-sizing:border-box;padding:0;}.markdown-body[type=number]::-webkit-inner-spin-button,.markdown-body[type=number]::-webkit-outer-spin-button{height:auto;}.markdown-body[type=search]{-webkit-appearance:textfield;outline-offset:-2px;}.markdown-body[type=search]::-webkit-search-cancel-button,.markdown-body[type=search]::-webkit-search-decoration{-webkit-appearance:none;}.markdown-body::-webkit-input-placeholder{color:inherit;opacity:.54;}.markdown-body::-webkit-file-upload-button{-webkit-appearance:button;font:inherit;}.markdown-body a:hover{text-decoration:underline;}.markdown-body hr::before{display:table;content:"";}.markdown-body hr::after{display:table;clear:both;content:"";}.markdown-body 
table{border-spacing:0;border-collapse:collapse;display:block;width:max-content;max-width:100%;overflow:auto;}.markdown-body td,.markdown-body th{padding:0;}.markdown-body details summary{cursor:pointer;}.markdown-body details:not([open])>*:not(summary){display:none !important;}.markdown-body kbd{display:inline-block;padding:3px 5px;font:11px ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation Mono,monospace;line-height:10px;color:var(--color-fg-default);vertical-align:middle;background-color:var(--color-canvas-subtle);border:solid 1px var(--color-neutral-muted);border-bottom-color:var(--color-neutral-muted);border-radius:6px;box-shadow:inset 0 -1px 0 var(--color-neutral-muted);}.markdown-body h1,.markdown-body h2,.markdown-body h3,.markdown-body h4,.markdown-body h5,.markdown-body h6{margin-top:24px;margin-bottom:16px;font-weight:600;line-height:1.25;}.markdown-body h2{font-weight:600;padding-bottom:.3em;font-size:1.5em;border-bottom:1px solid var(--color-border-muted);}.markdown-body h3{font-weight:600;font-size:1.25em;}.markdown-body h4{font-weight:600;font-size:1em;}.markdown-body h5{font-weight:600;font-size:.875em;}.markdown-body h6{font-weight:600;font-size:.85em;color:var(--color-fg-muted);}.markdown-body p{margin-top:0;margin-bottom:10px;}.markdown-body blockquote{margin:0;padding:0 1em;color:var(--color-fg-muted);border-left:.25em solid var(--color-border-default);}.markdown-body ul,.markdown-body ol{margin-top:0;margin-bottom:0;padding-left:2em;}.markdown-body ol ol,.markdown-body ul ol{list-style-type:lower-roman;}.markdown-body ul ul ol,.markdown-body ul ol ol,.markdown-body ol ul ol,.markdown-body ol ol ol{list-style-type:lower-alpha;}.markdown-body dd{margin-left:0;}.markdown-body tt,.markdown-body code{font-family:ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation Mono,monospace;font-size:12px;}.markdown-body pre{margin-top:0;margin-bottom:0;font-family:ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation 
Mono,monospace;font-size:12px;word-wrap:normal;}.markdown-body .octicon{display:inline-block;overflow:visible !important;vertical-align:text-bottom;fill:currentColor;}.markdown-body::placeholder{color:var(--color-fg-subtle);opacity:1;}.markdown-body input::-webkit-outer-spin-button,.markdown-body input::-webkit-inner-spin-button{margin:0;-webkit-appearance:none;appearance:none;}.markdown-body .pl-c{color:var(--color-prettylights-syntax-comment);}.markdown-body .pl-c1,.markdown-body .pl-s .pl-v{color:var(--color-prettylights-syntax-constant);}.markdown-body .pl-e,.markdown-body .pl-en{color:var(--color-prettylights-syntax-entity);}.markdown-body .pl-smi,.markdown-body .pl-s .pl-s1{color:var(--color-prettylights-syntax-storage-modifier-import);}.markdown-body .pl-ent{color:var(--color-prettylights-syntax-entity-tag);}.markdown-body .pl-k{color:var(--color-prettylights-syntax-keyword);}.markdown-body .pl-s,.markdown-body .pl-pds,.markdown-body .pl-s .pl-pse .pl-s1,.markdown-body .pl-sr,.markdown-body .pl-sr .pl-cce,.markdown-body .pl-sr .pl-sre,.markdown-body .pl-sr .pl-sra{color:var(--color-prettylights-syntax-string);}.markdown-body .pl-v,.markdown-body .pl-smw{color:var(--color-prettylights-syntax-variable);}.markdown-body .pl-bu{color:var(--color-prettylights-syntax-brackethighlighter-unmatched);}.markdown-body .pl-ii{color:var(--color-prettylights-syntax-invalid-illegal-text);background-color:var(--color-prettylights-syntax-invalid-illegal-bg);}.markdown-body .pl-c2{color:var(--color-prettylights-syntax-carriage-return-text);background-color:var(--color-prettylights-syntax-carriage-return-bg);}.markdown-body .pl-sr .pl-cce{font-weight:bold;color:var(--color-prettylights-syntax-string-regexp);}.markdown-body .pl-ml{color:var(--color-prettylights-syntax-markup-list);}.markdown-body .pl-mh,.markdown-body .pl-mh .pl-en,.markdown-body .pl-ms{font-weight:bold;color:var(--color-prettylights-syntax-markup-heading);}.markdown-body 
.pl-mi{font-style:italic;color:var(--color-prettylights-syntax-markup-italic);}.markdown-body .pl-mb{font-weight:bold;color:var(--color-prettylights-syntax-markup-bold);}.markdown-body .pl-md{color:var(--color-prettylights-syntax-markup-deleted-text);background-color:var(--color-prettylights-syntax-markup-deleted-bg);}.markdown-body .pl-mi1{color:var(--color-prettylights-syntax-markup-inserted-text);background-color:var(--color-prettylights-syntax-markup-inserted-bg);}.markdown-body .pl-mc{color:var(--color-prettylights-syntax-markup-changed-text);background-color:var(--color-prettylights-syntax-markup-changed-bg);}.markdown-body .pl-mi2{color:var(--color-prettylights-syntax-markup-ignored-text);background-color:var(--color-prettylights-syntax-markup-ignored-bg);}.markdown-body .pl-mdr{font-weight:bold;color:var(--color-prettylights-syntax-meta-diff-range);}.markdown-body .pl-ba{color:var(--color-prettylights-syntax-brackethighlighter-angle);}.markdown-body .pl-sg{color:var(--color-prettylights-syntax-sublimelinter-gutter-mark);}.markdown-body .pl-corl{text-decoration:underline;color:var(--color-prettylights-syntax-constant-other-reference-link);}.markdown-body[data-catalyst]{display:block;}.markdown-body g-emoji{font-family:"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:1em;font-style:normal !important;font-weight:400;line-height:1;vertical-align:-0.075em;}.markdown-body g-emoji img{width:1em;height:1em;}.markdown-body::before{display:table;content:"";}.markdown-body::after{display:table;clear:both;content:"";}.markdown-body>*:first-child{margin-top:0 !important;}.markdown-body>*:last-child{margin-bottom:0 !important;}.markdown-body a:not([href]):not(.self-link){color:inherit;text-decoration:none;}.markdown-body .absent{color:var(--color-danger-fg);}.markdown-body .anchor{float:left;padding-right:4px;margin-left:-20px;line-height:1;}.markdown-body .anchor:focus{outline:none;}.markdown-body p,.markdown-body blockquote,.markdown-body 
ul,.markdown-body ol,.markdown-body dl,.markdown-body table,.markdown-body pre,.markdown-body details{margin-top:0;margin-bottom:16px;}.markdown-body blockquote>:first-child{margin-top:0;}.markdown-body blockquote>:last-child{margin-bottom:0;}.markdown-body sup>a::before{content:"[";}.markdown-body sup>a::after{content:"]";}.markdown-body h1 .octicon-link,.markdown-body h2 .octicon-link,.markdown-body h3 .octicon-link,.markdown-body h4 .octicon-link,.markdown-body h5 .octicon-link,.markdown-body h6 .octicon-link{color:var(--color-fg-default);vertical-align:middle;visibility:hidden;}.markdown-body h1:hover .anchor,.markdown-body h2:hover .anchor,.markdown-body h3:hover .anchor,.markdown-body h4:hover .anchor,.markdown-body h5:hover .anchor,.markdown-body h6:hover .anchor{text-decoration:none;}.markdown-body h1:hover .anchor .octicon-link,.markdown-body h2:hover .anchor .octicon-link,.markdown-body h3:hover .anchor .octicon-link,.markdown-body h4:hover .anchor .octicon-link,.markdown-body h5:hover .anchor .octicon-link,.markdown-body h6:hover .anchor .octicon-link{visibility:visible;}.markdown-body h1 tt,.markdown-body h1 code,.markdown-body h2 tt,.markdown-body h2 code,.markdown-body h3 tt,.markdown-body h3 code,.markdown-body h4 tt,.markdown-body h4 code,.markdown-body h5 tt,.markdown-body h5 code,.markdown-body h6 tt,.markdown-body h6 code{padding:0 .2em;font-size:inherit;}.markdown-body ul.no-list,.markdown-body ol.no-list{padding:0;list-style-type:none;}.markdown-body ol[type="1"]{list-style-type:decimal;}.markdown-body ol[type=a]{list-style-type:lower-alpha;}.markdown-body ol[type=i]{list-style-type:lower-roman;}.markdown-body div>ol:not([type]){list-style-type:decimal;}.markdown-body ul ul,.markdown-body ul ol,.markdown-body ol ol,.markdown-body ol ul{margin-top:0;margin-bottom:0;}.markdown-body li>p{margin-top:16px;}.markdown-body li+li{margin-top:.25em;}.markdown-body dl{padding:0;}.markdown-body dl 
dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:600;}.markdown-body dl dd{padding:0 16px;margin-bottom:16px;}.markdown-body table th{font-weight:600;}.markdown-body table th,.markdown-body table td{padding:6px 13px;border:1px solid var(--color-border-default);}.markdown-body table tr{background-color:var(--color-canvas-default);border-top:1px solid var(--color-border-muted);}.markdown-body table tr:nth-child(2n){background-color:var(--color-canvas-subtle);}.markdown-body table img{background-color:transparent;}.markdown-body img[align=right]{padding-left:20px;}.markdown-body img[align=left]{padding-right:20px;}.markdown-body .emoji{max-width:none;vertical-align:text-top;background-color:transparent;}.markdown-body span.frame{display:block;overflow:hidden;}.markdown-body span.frame>span{display:block;float:left;width:auto;padding:7px;margin:13px 0 0;overflow:hidden;border:1px solid var(--color-border-default);}.markdown-body span.frame span img{display:block;float:left;}.markdown-body span.frame span span{display:block;padding:5px 0 0;clear:both;color:var(--color-fg-default);}.markdown-body span.align-center{display:block;overflow:hidden;clear:both;}.markdown-body span.align-center>span{display:block;margin:13px auto 0;overflow:hidden;text-align:center;}.markdown-body span.align-center span img{margin:0 auto;text-align:center;}.markdown-body span.align-right{display:block;overflow:hidden;clear:both;}.markdown-body span.align-right>span{display:block;margin:13px 0 0;overflow:hidden;text-align:right;}.markdown-body span.align-right span img{margin:0;text-align:right;}.markdown-body span.float-left{display:block;float:left;margin-right:13px;overflow:hidden;}.markdown-body span.float-left span{margin:13px 0 0;}.markdown-body span.float-right{display:block;float:right;margin-left:13px;overflow:hidden;}.markdown-body span.float-right>span{display:block;margin:13px auto 0;overflow:hidden;text-align:right;}.markdown-body code,.markdown-body 
tt{padding:.2em .4em;margin:0;font-size:85%;background-color:var(--color-neutral-muted);border-radius:6px;}.markdown-body code br,.markdown-body tt br{display:none;}.markdown-body del code{text-decoration:inherit;}.markdown-body pre code{font-size:100%;}.markdown-body pre>code{padding:0;margin:0;word-break:normal;white-space:pre;background:transparent;border:0;}.markdown-body .highlight{margin-bottom:16px;}.markdown-body .highlight pre{margin-bottom:0;word-break:normal;}.markdown-body .highlight pre,.markdown-body pre{padding:16px;overflow:auto;font-size:85%;line-height:1.45;background-color:var(--color-canvas-subtle);border-radius:6px;}.markdown-body pre code,.markdown-body pre tt{display:inline;max-width:auto;padding:0;margin:0;overflow:visible;line-height:inherit;word-wrap:normal;background-color:transparent;border:0;}.markdown-body .csv-data td,.markdown-body .csv-data th{padding:5px;overflow:hidden;font-size:12px;line-height:1;text-align:left;white-space:nowrap;}.markdown-body .csv-data .blob-num{padding:10px 8px 9px;text-align:right;background:var(--color-canvas-default);border:0;}.markdown-body .csv-data tr{border-top:0;}.markdown-body .csv-data th{font-weight:600;background:var(--color-canvas-subtle);border-top:0;}.markdown-body .footnotes{font-size:12px;color:var(--color-fg-muted);border-top:1px solid var(--color-border-default);}.markdown-body .footnotes ol{padding-left:16px;}.markdown-body .footnotes li{position:relative;}.markdown-body .footnotes li:target::before{position:absolute;top:-8px;right:-8px;bottom:-8px;left:-24px;pointer-events:none;content:"";border:2px solid var(--color-accent-emphasis);border-radius:6px;}.markdown-body .footnotes li:target{color:var(--color-fg-default);}.markdown-body .footnotes .data-footnote-backref g-emoji{font-family:monospace;}.markdown-body .task-list-item{list-style-type:none;}.markdown-body .task-list-item label{font-weight:400;}.markdown-body .task-list-item.enabled label{cursor:pointer;}.markdown-body 
.task-list-item+.task-list-item{margin-top:3px;}.markdown-body .task-list-item .handle{display:none;}.markdown-body .task-list-item-checkbox{margin:0 .2em .25em -1.6em;vertical-align:middle;}.markdown-body .contains-task-list:dir(rtl) .task-list-item-checkbox{margin:0 -1.6em .25em .2em;}.markdown-body::-webkit-calendar-picker-indicator{filter:invert(50%);} \ No newline at end of file diff --git a/static/rust-kbd-oled.jpg b/static/rust-kbd-oled.jpg new file mode 100644 index 0000000..8aadec7 Binary files /dev/null and b/static/rust-kbd-oled.jpg differ diff --git a/static/starry_night.css b/static/starry_night.css new file mode 100644 index 0000000..fb038cf --- /dev/null +++ b/static/starry_night.css @@ -0,0 +1 @@ + :root{--color-prettylights-syntax-comment:#6e7781;--color-prettylights-syntax-constant:#0550ae;--color-prettylights-syntax-entity:#8250df;--color-prettylights-syntax-storage-modifier-import:#24292f;--color-prettylights-syntax-entity-tag:#116329;--color-prettylights-syntax-keyword:#cf222e;--color-prettylights-syntax-string:#0a3069;--color-prettylights-syntax-variable:#953800;--color-prettylights-syntax-brackethighlighter-unmatched:#82071e;--color-prettylights-syntax-invalid-illegal-text:#f6f8fa;--color-prettylights-syntax-invalid-illegal-bg:#82071e;--color-prettylights-syntax-carriage-return-text:#f6f8fa;--color-prettylights-syntax-carriage-return-bg:#cf222e;--color-prettylights-syntax-string-regexp:#116329;--color-prettylights-syntax-markup-list:#3b2300;--color-prettylights-syntax-markup-heading:#0550ae;--color-prettylights-syntax-markup-italic:#24292f;--color-prettylights-syntax-markup-bold:#24292f;--color-prettylights-syntax-markup-deleted-text:#82071e;--color-prettylights-syntax-markup-deleted-bg:#ffebe9;--color-prettylights-syntax-markup-inserted-text:#116329;--color-prettylights-syntax-markup-inserted-bg:#dafbe1;--color-prettylights-syntax-markup-changed-text:#953800;--color-prettylights-syntax-markup-changed-bg:#ffd8b5;--color-prettylights-syntax-mar
kup-ignored-text:#eaeef2;--color-prettylights-syntax-markup-ignored-bg:#0550ae;--color-prettylights-syntax-meta-diff-range:#8250df;--color-prettylights-syntax-brackethighlighter-angle:#57606a;--color-prettylights-syntax-sublimelinter-gutter-mark:#8c959f;--color-prettylights-syntax-constant-other-reference-link:#0a3069;}@media (prefers-color-scheme:dark){:root{--color-prettylights-syntax-comment:#8b949e;--color-prettylights-syntax-constant:#79c0ff;--color-prettylights-syntax-entity:#d2a8ff;--color-prettylights-syntax-storage-modifier-import:#c9d1d9;--color-prettylights-syntax-entity-tag:#7ee787;--color-prettylights-syntax-keyword:#ff7b72;--color-prettylights-syntax-string:#a5d6ff;--color-prettylights-syntax-variable:#ffa657;--color-prettylights-syntax-brackethighlighter-unmatched:#f85149;--color-prettylights-syntax-invalid-illegal-text:#f0f6fc;--color-prettylights-syntax-invalid-illegal-bg:#8e1519;--color-prettylights-syntax-carriage-return-text:#f0f6fc;--color-prettylights-syntax-carriage-return-bg:#b62324;--color-prettylights-syntax-string-regexp:#7ee787;--color-prettylights-syntax-markup-list:#f2cc60;--color-prettylights-syntax-markup-heading:#1f6feb;--color-prettylights-syntax-markup-italic:#c9d1d9;--color-prettylights-syntax-markup-bold:#c9d1d9;--color-prettylights-syntax-markup-deleted-text:#ffdcd7;--color-prettylights-syntax-markup-deleted-bg:#67060c;--color-prettylights-syntax-markup-inserted-text:#aff5b4;--color-prettylights-syntax-markup-inserted-bg:#033a16;--color-prettylights-syntax-markup-changed-text:#ffdfb6;--color-prettylights-syntax-markup-changed-bg:#5a1e02;--color-prettylights-syntax-markup-ignored-text:#c9d1d9;--color-prettylights-syntax-markup-ignored-bg:#1158c7;--color-prettylights-syntax-meta-diff-range:#d2a8ff;--color-prettylights-syntax-brackethighlighter-angle:#8b949e;--color-prettylights-syntax-sublimelinter-gutter-mark:#484f58;--color-prettylights-syntax-constant-other-reference-link:#a5d6ff;}}.pl-c{color:var(--color-prettylights-syntax-co
mment);}.pl-c1,.pl-s .pl-v{color:var(--color-prettylights-syntax-constant);}.pl-e,.pl-en{color:var(--color-prettylights-syntax-entity);}.pl-smi,.pl-s .pl-s1{color:var(--color-prettylights-syntax-storage-modifier-import);}.pl-ent{color:var(--color-prettylights-syntax-entity-tag);}.pl-k{color:var(--color-prettylights-syntax-keyword);}.pl-s,.pl-pds,.pl-s .pl-pse .pl-s1,.pl-sr,.pl-sr .pl-cce,.pl-sr .pl-sre,.pl-sr .pl-sra{color:var(--color-prettylights-syntax-string);}.pl-v,.pl-smw{color:var(--color-prettylights-syntax-variable);}.pl-bu{color:var(--color-prettylights-syntax-brackethighlighter-unmatched);}.pl-ii{color:var(--color-prettylights-syntax-invalid-illegal-text);background-color:var(--color-prettylights-syntax-invalid-illegal-bg);}.pl-c2{color:var(--color-prettylights-syntax-carriage-return-text);background-color:var(--color-prettylights-syntax-carriage-return-bg);}.pl-sr .pl-cce{font-weight:bold;color:var(--color-prettylights-syntax-string-regexp);}.pl-ml{color:var(--color-prettylights-syntax-markup-list);}.pl-mh,.pl-mh 
.pl-en,.pl-ms{font-weight:bold;color:var(--color-prettylights-syntax-markup-heading);}.pl-mi{font-style:italic;color:var(--color-prettylights-syntax-markup-italic);}.pl-mb{font-weight:bold;color:var(--color-prettylights-syntax-markup-bold);}.pl-md{color:var(--color-prettylights-syntax-markup-deleted-text);background-color:var(--color-prettylights-syntax-markup-deleted-bg);}.pl-mi1{color:var(--color-prettylights-syntax-markup-inserted-text);background-color:var(--color-prettylights-syntax-markup-inserted-bg);}.pl-mc{color:var(--color-prettylights-syntax-markup-changed-text);background-color:var(--color-prettylights-syntax-markup-changed-bg);}.pl-mi2{color:var(--color-prettylights-syntax-markup-ignored-text);background-color:var(--color-prettylights-syntax-markup-ignored-bg);}.pl-mdr{font-weight:bold;color:var(--color-prettylights-syntax-meta-diff-range);}.pl-ba{color:var(--color-prettylights-syntax-brackethighlighter-angle);}.pl-sg{color:var(--color-prettylights-syntax-sublimelinter-gutter-mark);}.pl-corl{text-decoration:underline;color:var(--color-prettylights-syntax-constant-other-reference-link);} \ No newline at end of file diff --git a/static/styles.css b/static/styles.css new file mode 100644 index 0000000..71c8a4a --- /dev/null +++ b/static/styles.css @@ -0,0 +1 @@ +@import "github-markdown.css";@media (prefers-color-scheme:dark){body{background-color:#07090d;}}@media (prefers-color-scheme:light){body{background-color:#e6eaed;}}.menu-item{text-align:center;font-size:2em;font-weight:600;font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji";color:var(--color-fg-default);background-color:var(--color-canvas-subtle);padding:5px;border-radius:3px;margin-right:5px;margin-top:5px;}.self-link{cursor:pointer;}button{all:unset;cursor:pointer;}#markdown-text{padding-top:10px;padding-bottom:10px;margin-left:5%;margin-right:5%;}#content{margin-left:calc((100vw - 980px - 45px) / 2);}@media 
(max-width:1025px){#content{margin:0 auto;}}.markdown-body{box-sizing:border-box;min-width:200px;max-width:980px;margin:0 auto;padding:45px;}@media (max-width:767px){.markdown-body{padding:15px;}} \ No newline at end of file diff --git a/table-of-contents.html b/table-of-contents.html new file mode 100644 index 0000000..6d5340e --- /dev/null +++ b/table-of-contents.html @@ -0,0 +1 @@ + Nav

Table of contents

Because I'm terrible at web-dev and unable to make a side menu scale properly, I made things easier for myself and made navigation happen through this md-page instead.

Top level navigation

Projects

Retrospectives

\ No newline at end of file diff --git a/test.html b/test.html new file mode 100644 index 0000000..7f0f46f --- /dev/null +++ b/test.html @@ -0,0 +1,24 @@ + + + + + + + + + Test + + + +
+

Here's a test write-up

+

I always test in prod.

+
fn main() {
+    panic!("Finally highlighting works");
+}
+
+

Test some change here!

+
+
\ No newline at end of file diff --git a/threads.html b/threads.html new file mode 100644 index 0000000..26697f1 --- /dev/null +++ b/threads.html @@ -0,0 +1,517 @@ + + + + + + + + + Threads + + + +
+

Threads, some assembly required.

+

Lately I've been thinking about adding threads to tiny-std, my Linux-only, x86_64/aarch64-only tiny standard library for Rust.

+

Now I've finally done that, with some jankiness. In this write-up I'll go through that process.

+

Parallelism

+

Sometimes in programming, parallelism (doing multiple things at the same time) is desirable. For example, say that completing some task requires two different long-running calculations. If those can be run in parallel, our latency becomes that of the slowest of the two tasks (plus some overhead).

+

Some ways of achieving parallelism in your program are:

+
    +
  1. SIMD, hopefully +your compiler does that for you. But here we're talking about singular processor operations, +not arbitrary tasks. +
  2. Offloading tasks to the OS. If your OS has asynchronous apis then you could ask it to do multiple things at once +and achieve parallelism that way. +
  3. Running tasks in other processes. +
  4. Running tasks in threads. +
+

Threads

+

Wikipedia says of threads:

+
+

"In computer science, a thread of execution is the smallest sequence of programmed instructions that can be +managed independently by a scheduler, which is typically a part of the operating system."

+
+

Threads, from a programming perspective, are managed by the OS, and how threads work is highly OS-dependent. I'll only go into Linux here, and only from an API consumer's perspective.

+

Spawning a thread with a minimal task

+

In the Rust std-library, a thread can be spawned with:

+
fn main() {
+    let handle = std::thread::spawn(|| {
+        std::thread::sleep(std::time::Duration::from_millis(500));
+        println!("Hello from my thread");
+    });
+    // Suspends execution of the calling thread until the child-thread completes.  
+    handle.join().unwrap();   
+}
+
+

In the above program, some setup runs before the main-function, some of it delegated to libc, which sets up what it deems appropriate. Rust sets up a panic handler and the miscellaneous things the program needs to run correctly, then the main-thread starts executing the main function.
In the main function, the main thread spawns a child, which at the point of spawn starts executing the task provided by the supplied closure: wait 500 millis, then print a message. The main thread then waits for that child-thread to complete.

+

I wanted to replicate this API, without using libc.

+

Clone, the Swiss Army syscall

+

The Linux clone syscall can be used for a lot of things.
So many, in fact, that it's extremely difficult to use correctly: it's very easy to cause security issues through various memory-management mistakes, many of which I discovered on this journey.

+

The signature for the glibc clone wrapper function looks like:

+
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
+/* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
+
+

Right away I can tell that calling this is not going to be easy from Rust; we've got varargs in there, which is problematic because:

+
    +
  1. Rust doesn't have varargs, so porting some C-functionality from, for example, musl won't be straightforward.
  2. Varargs are not readable (objectively true opinion). +
+

Skipping down to the Notes-section of the documentation shows the actual syscall interface (for x86_64; in a conspiracy to ruin my life, the last two args are switched on aarch64):

+
long clone(unsigned long flags, void *stack,
+                      int *parent_tid, int *child_tid,
+                      unsigned long tls);
+
+

Very disconcerting, since the C-api, which accepts varargs, seems to do quite a bit of work before making the syscall and handing over the task to the OS.

+

In simple terms, clone is a way to "clone" the current process. If you have experience with +fork, that's an example of clone. +Here's an equivalent fork using the clone syscall from tiny-std:

+
/// Fork isn't implemented for aarch64, we're substituting with a clone call here
+/// # Errors
+/// See above
+/// # Safety
+/// See above
+#[cfg(target_arch = "aarch64")]
+pub unsafe fn fork() -> Result<PidT> {
+    // SIGCHLD is mandatory on aarch64 if mimicking fork it seems
+    let cflgs = crate::platform::SignalKind::SIGCHLD;
+    let res = syscall!(CLONE, cflgs.bits().0, 0, 0, 0, 0);
+    bail_on_below_zero!(res, "CLONE syscall failed");
+    #[allow(clippy::cast_possible_truncation, clippy::cast_possible_wrap)]
+    Ok(res as i32)
+}
+
+

What happens immediately after this call is that our process is cloned and starts executing past the code which called clone. Following the above Rust example:

+
fn parallelism_through_multiprocess() {
+    let pid = unsafe { rusl::process::fork().unwrap() };
+    if pid == 0 {
+        println!("In child!");
+        rusl::process::exit(0);
+    } else {
+        println!("In parent, spawned child {pid}");
+    }
+}
+
+

This program will print, in non-deterministic order, In parent, spawned child 24748 and In child!, and then return to the caller.

+

Here we achieved parallelism by spawning another process, doing work there, separately scheduled by the OS, and then exiting that process. At the same time, our caller returns as usual, only stopping briefly to make the syscall.

+

Achieving parallelism in this way can be fine. If you want to run a command, forking/cloning then executing +another binary through the execve-syscall +is usually how that's done.
+Multiprocessing can be a bad choice if the task is small, because setting up an entire other process can be resource +intensive, and communicating between processes can be slower than communicating through shared memory.

+

Threads: Cloning intra-process with shared memory

+

What we think of as threads in Linux are sometimes called Light-Weight Processes. The above clone call spawned a regular process, which got a full copy of the parent-process' memory with copy-on-write semantics.

+

To reduce overhead in both spawning, and communicating between the cloned process and the rest of the processes in the application, a combination of flags is used:

+
let flags = CloneFlags::CLONE_VM
+        | CloneFlags::CLONE_FS
+        | CloneFlags::CLONE_FILES
+        | CloneFlags::CLONE_SIGHAND
+        | CloneFlags::CLONE_THREAD
+        | CloneFlags::CLONE_SYSVSEM
+        | CloneFlags::CLONE_CHILD_CLEARTID
+        | CloneFlags::CLONE_SETTLS;
+
+

Clone flags are tricky to explain, and they interact with each other as well, but in short:

+
    +
  1. CLONE_VM, clone memory without copy-on-write semantics, meaning the parent and child share a memory space and can modify each other's memory.
  2. CLONE_FS, the parent and child share the same filesystem information, such as current directory. +
  3. CLONE_FILES, the parent and child share the same file-descriptor table, +(if one opens an fd, that fd is available to the other). +
  4. CLONE_SIGHAND, the parent and child share signal handlers. +
  5. CLONE_THREAD, the child-process is placed in the same thread-group as the parent. +
  6. CLONE_SYSVSEM, the parent and child share System V semaphores.
  7. CLONE_CHILD_CLEARTID, wake up waiters for the supplied child_tid futex pointer when the child exits +(we'll get into this). +
  8. CLONE_SETTLS, set the thread-local storage to the data pointed at by the tls-variable (architecture specific, +we'll get into this as well). +
+

The crucial flags to run some tasks in a thread are only:

+
    +
  1. CLONE_VM +
  2. CLONE_THREAD +
+

The rest are there for usability and to meet expectations, as well as for cleanup reasons.

+

Implementation

+

Now towards the actual implementation of a minimal threading API.

+

API expectation

+

The std library in Rust provides an interface that could be used like this:

+
let join_handle = std::thread::spawn(|| println!("Hello from my thread!"));
+join_handle.join().unwrap();
+
+

A closure to run on another thread is supplied, and a JoinHandle<T> is returned. The join handle can be awaited by calling its join-method, which will block the calling thread until the thread executing the closure has completed. If the closure panics, the Result will be an Err; if it succeeds, it will be an Ok(T), where T is the return value of the closure, which in this case is nothing (()).

+

Executing a clone call

+

If CLONE_VM is specified, a stack should be supplied. CLONE_VM means sharing mutable memory; if we didn't supply the stack, both threads would continue mutating the same stack area. Here's an excerpt from the docs about that:

+
+

[..] (If the +child shares the parent's memory because of the use of the +CLONE_VM flag, then no copy-on-write duplication occurs and chaos +is likely to result.) - "C library/kernel differences"-section

+
+

Allocating the stack

+

Therefore, setting up a stack is required. There are a few options for that; the kernel only needs a chunk of correctly aligned memory, with the alignment depending on what platform we're targeting. We could even just take some memory off our own stack if we wanted to.

+
Use the caller's stack
+
fn clone() {
+    // 16 KiB stack allocation
+    let mut my_stack = [0u8; 16384];
+    let stack_ptr = my_stack.as_mut_ptr();
+    // pass through to the syscall
+    syscall!(CLONE, ..., stack_ptr, ...);
+}
+
+

This is bad for a generic API for a multitude of reasons. It restricts the user to threads that complete before the caller has popped the stack frame in which they were created, since the part of the stack that was used in this frame will be reused by the caller later, possibly while the child-thread still uses it for its own stack. That, we now know, would result in chaos.

+

Additionally, if this API were exposed to users, we would have to have stack space available on the calling thread for an arbitrary number of children. In the case of heap allocations, the assumption that we will have enough memory for reasonable thread-usage is valid. Rust's default thread stack size is 2MiB. On a system with 16GiB of RAM, with 8GiB available at a given time, that's 4096 threads; spawning that many is likely not intentional or performant.

+

Keeping child-thread stacks on the main-thread's stack significantly reduces our memory availability, and adds the risk of chaos.

+

There is a case to be made for reusing the caller's stack in some very specific application which spawns some threads in scope, does some work, then exits. But I have yet to encounter that kind of use-case in practice, so let's move on to something more reasonable.

+
Mmap more stack-space
+

This is what musl does. We allocate the memory that we want to use as a stack from new OS pages via mmap. We could potentially do a regular malloc as well, although that would mean less control over the allocated memory.

+

Communicating with the started thread

+

Now, mmap-ing some stack-memory is enough for the OS to start a thread with its own stack, but then what?
The thread needs to know what to do, and we can't provide it with any arguments; we need to put all the data it needs on its stack before starting execution of the task.

+

This means that we'll need some assembly. Since using the clone syscall and then continuing in Rust relinquishes control that we need over the stack, we need to put almost the entire child-thread's lifetime in assembly.

+

The structure of the call is mostly stolen from musl, with some changes for this more minimal use-case. The Rust declaration will look like this:

+
extern "C" {
+    fn __clone(
+        start_fn: usize,
+        stack_ptr: usize,
+        flags: i32,
+        args_ptr: usize,
+        tls_ptr: usize,
+        child_tid_ptr: usize,
+        stack_unmap_ptr: usize,
+        stack_sz: usize,
+    ) -> i32;
+}
+
+
    +
  1. It takes a pointer to a start_fn, which is a C calling convention function pointer, where our thread will pick up. +
  2. It also takes a pointer to the stack, stack_ptr. +
  3. It takes clone-flags which we send onto the OS in the syscall. +
  4. It takes an args_ptr, which is the closure we want to run, converted to a C calling convention function pointer. +
  5. It takes a tls_ptr, a pointer to some thread local storage, which we'll need to deallocate the thread's stack, and +communicate with the calling thread. +
  6. It takes a child_tid_ptr, which will be used to synchronize with the calling thread. +
  7. It takes a stack_unmap_ptr, which is the base address that we allocated for the stack at its original 0 offset. +
  8. It takes the stack_sz, stack-size, which we'll need to deallocate the stack later. +
+

Syscalls

+

x86_64 and aarch64 assembly each have an instruction to execute a syscall (syscall and svc, respectively).

+

A syscall is like a function call to the kernel. We'll need to make three syscalls using assembly:

+
    +
  1. CLONE, nr 56 on x86_64 +
  2. MUNMAP, nr 11 on x86_64 +
  3. EXIT, nr 60 on x86_64 +
+

The interface for the syscall is as follows:

+
/// Syscall convention on 5 args (arg -> register, per arch):
+/// - nr -> x86_64: rax, aarch64: x8
+/// - a1 -> x86_64: rdi, aarch64: x0
+/// - a2 -> x86_64: rsi, aarch64: x1
+/// - a3 -> x86_64: rdx, aarch64: x2
+/// - a4 -> x86_64: r10, aarch64: x3
+/// - a5 -> x86_64: r8,  aarch64: x4
+/// Pseudocode syscall as an extern function:
+extern "C" {
+    fn syscall(nr: usize, a1: usize, a2: usize, a3: usize, a4: usize, a5: usize);
+}
+
+

Onto the assembly, it can be boiled down to this:

+
    +
  1. Prepare arguments to go in the right registers for the syscall. +
  2. Put what the thread needs into its stack. +
  3. Execute the clone syscall, return directly to the caller (parent-thread). +
  4. Pop data from the spawned thread's stack into registers. +
  5. Execute the function we wanted to run in the spawned thread. +
  6. Unmap the spawned thread's own stack +
  7. Exit 0 +
+
// Boilerplate to expose the symbol
+.text
+.global __clone
+.hidden __clone
+.type   __clone,@function
+// Actual declaration
+__clone:
+// tls_ptr already in r8, syscall arg 5 register, due to C calling convention on this function, same with stack_ptr in rsi
+// Zero syscall nr register ax (eax = 32bit ax)
+xor eax, eax
+// Move 56 into the lower 8 bits of ax (al = 8bit ax), 56 is the CLONE syscall nr for x86_64, will become: syscall(56, .., stack_ptr, .., tls_ptr)
+mov al, 56
+// Move start function into r11, scratch register, save it there since we need to shuffle stuff around
+mov r11, rdi
+// Move flags into rdi, syscall arg 1 register, will become: syscall(56, flags, stack_ptr, .., .., tls_ptr)
+mov rdi, rdx
+// Zero parent_tid_ptr from syscall arg 3 register (not using), will become: syscall(56, flags, stack_ptr, 0, .., tls_ptr)
+xor rdx, rdx
+// Move child_tid_ptr into syscall arg 4 register (our arg 6), will become: syscall(56, flags, stack_ptr, 0, child_tid_ptr, tls_ptr)
+mov r10, r9
+// Move start function into r9
+mov r9, r11
+// Align stack ptr to -16
+and rsi, -16
+// Move down 8 bytes on the stack ptr
+sub rsi, 8
+// Move args onto the top of the stack
+mov [rsi], rcx
+// Move down 8 bytes more on the stack ptr
+sub rsi, 8
+// Move the first arg that went on the stack into rcx (stack_unmap_ptr)
+mov rcx, [8 + rsp]
+// Move stack_unmap_ptr onto our new stack
+mov [rsi], rcx
+// Move the second arg that went on the stack into rcx (stack_sz)
+mov rcx, [16 + rsp]
+// Move down stack ptr
+sub rsi, 8
+// Move stack_sz onto the new stack
+mov [rsi], rcx
+// Make clone syscall
+syscall
+// Check if the syscall return value is 0
+test eax, eax
+// if not zero, return (we're the calling thread)
+jnz 1f
+// Child:
+// Zero the base pointer
+xor ebp, ebp
+// Pop the stack_sz off the provided stack into callee saved register
+pop r13
+// Pop the stack_ptr off the provided stack into another callee saved register
+pop r12
+// Pop the start fn args off the provided stack into rdi
+pop rdi
+// Call the function we saved in r9, rdi first arg
+call r9
+// Zero rax (function return, we don't care)
+xor rax, rax
+// Move MUNMAP syscall into ax
+mov al, 11
+// Stack ptr as the first arg
+mov rdi, r12
+// Stack len as the second arg
+mov rsi, r13
+// Syscall, unmapping the stack
+syscall
+// Clear the output register, we can't use the return value anyway
+xor eax,eax
+// Move EXIT syscall nr into ax
+mov al, 60
+// Set exit code for the thread to 0
+mov rdi, 0
+// Make exit syscall
+syscall
+1: ret
+
+

And that's it, kinda: with some code wrapping this we can run an arbitrary closure on a separate thread!

+

Race conditions

+

We're far from done: in the happy case we start a thread, it completes, and it deallocates its own stack. +But we still need to get its return value, and we need to know when it's done.

+

Unlike with a process, we cannot use the wait-syscall to wait +for the thread to complete, but there is another way, alluded to in the note on CLONE_CHILD_CLEARTID.

+

Futex messaging

+

If CLONE_CHILD_CLEARTID is supplied in the clone-flags along with a pointer to a futex variable, something with a u32-layout +(in Rust, most reasonably an AtomicU32), then the OS will set that futex-value to 0 (not null) when the thread exits, +successfully or not.

+

This means that if the caller wants to join, i.e. blocking-wait for the child-thread to finish, it can use the +futex-syscall.

+
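To make that concrete, here's a minimal sketch of a futex-based join, assuming x86_64 Linux. `futex` and `futex_join_demo` are hypothetical helper names, and a plain `std::thread` stands in for the raw clone-thread, doing the store-and-wake itself (in the real code the kernel performs that store-and-wake for us via CLONE_CHILD_CLEARTID):

```rust
use std::arch::asm;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

const SYS_FUTEX: u64 = 202; // futex syscall number on x86_64
const FUTEX_WAIT: u64 = 0;
const FUTEX_WAKE: u64 = 1;

// Raw futex(2): futex(uaddr, op, val, timeout=NULL, uaddr2=NULL, val3=0)
unsafe fn futex(uaddr: *const u32, op: u64, val: u32) -> i64 {
    let ret: u64;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_FUTEX => ret,
            in("rdi") uaddr,
            in("rsi") op,
            in("rdx") val as u64,
            in("r10") 0u64,
            in("r8") 0u64,
            in("r9") 0u64,
            // rcx and r11 are clobbered by the syscall instruction
            out("rcx") _,
            out("r11") _,
            options(nostack),
        );
    }
    ret as i64
}

fn futex_join_demo() -> u32 {
    // 1 = "thread alive"; the exiting side sets it to 0 and wakes waiters,
    // mimicking what the kernel does for us with CLONE_CHILD_CLEARTID.
    let tid_futex = Arc::new(AtomicU32::new(1));
    let child = Arc::clone(&tid_futex);
    let t = std::thread::spawn(move || {
        child.store(0, Ordering::SeqCst);
        unsafe { futex(child.as_ptr(), FUTEX_WAKE, 1) };
    });
    // "join": sleep until the value is no longer 1, re-checking on every wake
    while tid_futex.load(Ordering::SeqCst) == 1 {
        unsafe { futex(tid_futex.as_ptr(), FUTEX_WAIT, 1) };
    }
    t.join().unwrap();
    tid_futex.load(Ordering::SeqCst)
}
```

Note that the kernel checks the expected value atomically inside FUTEX_WAIT, so there's no lost-wakeup race between our load and the wait: if the store already happened, the syscall returns immediately.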

Getting the returned value

+

The return value is fairly simple: we need to allocate space for it, for example through a pointer to an UnsafeCell<Option<T>>, +and then have the child-thread update it. The catch is that we can't hold &-references to that value while the child-thread +may be writing to it, since that's UB. We share with the child a pointer to the value, and we need to be +absolutely certain that the child-thread is done with +its modification before we try to read it, for example by waiting for it to exit by join-ing.

+
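As a sketch of that shared-slot pattern (`spawn_with_ret` and `RetSlot` are made-up names, and std::thread stands in for the raw clone-thread): the parent heap-allocates the `UnsafeCell<Option<T>>`, hands the child a raw pointer to it, and only reads it back after joining.

```rust
use std::cell::UnsafeCell;
use std::thread;

// Wrapper so the raw pointer can cross the thread boundary; we promise the
// child is the only writer until the parent has joined.
struct RetSlot<T>(*mut UnsafeCell<Option<T>>);
unsafe impl<T: Send> Send for RetSlot<T> {}

fn spawn_with_ret<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> T {
    // Heap-allocate the shared return slot; both sides hold a raw pointer.
    let slot: *mut UnsafeCell<Option<T>> = Box::into_raw(Box::new(UnsafeCell::new(None)));
    let child_slot = RetSlot(slot);
    let handle = thread::spawn(move || {
        let child_slot = child_slot; // move the whole Send wrapper into the closure
        // The child writes through the raw pointer; no &-references exist
        // on the parent side while this write may be happening.
        unsafe { *(*child_slot.0).get() = Some(f()) };
    });
    // Joining guarantees the child's write happened-before our read.
    handle.join().unwrap();
    let boxed = unsafe { Box::from_raw(slot) };
    (*boxed).into_inner().unwrap()
}
```

Joining is what makes the final read sound: it establishes the happens-before edge between the child's write and the parent's read, and it's also what makes it safe for exactly one side to free the box.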

Memory leaks, who deallocates what?

+

We don't necessarily keep our JoinHandle<T> around after spawning a thread. A perfectly valid use-case is to +just spawn some long-running thread and then forget about it. This causes a problem: if the calling thread doesn't have +sole responsibility for deallocating the shared memory (the futex variable and the return value), then we need a way +to signal to the child-thread that it's that thread's responsibility to deallocate those variables before exiting.

+

Enter the third shared variable, an AtomicBool called should_dealloc, both threads share a pointer to this variable +as well.

+

Now there are three deallocation-scenarios:

+
    +
1. Caller joins the child thread by waiting for the futex-variable to change value to 0. +In this case the caller deallocates the futex, takes the return value off the heap, freeing its memory, and +deallocates the should_dealloc pointer. +
2. Caller drops the JoinHandle<T>. This is racy: we need to read should_dealloc to see whether the child thread has +already completed its work. If it has, we wait on the futex to make sure the child thread is completely done, then +deallocate as above. +
  3. The child thread tries to set should_dealloc to true and fails, meaning that the calling thread has already +dropped the JoinHandle<T>. In this case, the child thread needs to signal to the OS that the futex is no longer +to be updated on thread exit through the +set_tid_address-syscall (forgetting to do this results in a +use after free, oof. Here's a Linux-code-comment calling me a dumbass that I found when trying to find the source of the segfaults: +
+
// 929ed21dfdb6ee94391db51c9eedb63314ef6847, kernel/fork.c#L1634, written by Linus himself
+if (tsk->clear_child_tid) {
+		if (atomic_read(&mm->mm_users) > 1) {
+			/*
+			 * We don't check the error code - if userspace has
+			 * not set up a proper pointer then tough luck.
+			 */
+			put_user(0, tsk->clear_child_tid);
+			do_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0, 0);
+		}
+		tsk->clear_child_tid = NULL;
+	}
+
+

). Then it can safely deallocate the shared variables.

+
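The heart of that handshake can be reduced to a single atomic operation: both sides try to flip `should_dealloc`, and whoever flips it second (the "failed" set in scenario 3) knows the other side is already gone and takes cleanup duty. A sketch with std threads (`last_one_cleans_up` is a made-up name):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Returns how many of the two sides decided they own cleanup; must be exactly 1.
fn last_one_cleans_up() -> usize {
    let should_dealloc = Arc::new(AtomicBool::new(false));
    let child_flag = Arc::clone(&should_dealloc);
    // Child: on exit, try to claim the flag.
    let child = thread::spawn(move || child_flag.swap(true, Ordering::AcqRel));
    // Caller: on JoinHandle drop, try to claim the flag.
    let caller_was_last = should_dealloc.swap(true, Ordering::AcqRel);
    let child_was_last = child.join().unwrap();
    // Exactly one side observes `true` (meaning the other got there first)
    // and is therefore responsible for deallocating the shared variables.
    usize::from(caller_was_last) + usize::from(child_was_last)
}
```

However the race resolves, exactly one `swap` returns `true`, so the shared memory is freed exactly once.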

Oh, right. Panics...

+

I imagine a world where Rust doesn't contain panics. Sadly, we don't live in that world, and thus we need to handle them.
+If the thread panics and we try to join, it's no issue: we'll get a None return value and can continue with +the regular cleanup from the caller.
+However, if the thread panics after the caller has dropped the JoinHandle<T>, the shared memory is leaked +and the stack isn't deallocated.

+

A Rust panic handler could look like this:

+
/// Dummy panic handler
+#[panic_handler]
+pub fn on_panic(_info: &core::panic::PanicInfo) -> ! {
+    loop {}
+}
+
+

The signature shows that it gets a PanicInfo and never returns.
+When a thread panics, it enters that function and never returns; it's here that we need to handle cleanup for +a panicking thread.

+
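In a hosted std program, the analogous interception point would be a panic hook rather than a `#[panic_handler]`. This sketch (with a hypothetical `CLEANED_UP` flag standing in for the real TLS-based cleanup) just demonstrates that the hook runs on the panicking thread before it dies:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

static CLEANED_UP: AtomicBool = AtomicBool::new(false);

fn panicking_thread_runs_hook() -> bool {
    // The hook runs on whichever thread panics; the real no_std handler
    // would read the TLS pointer here and decide what to deallocate.
    panic::set_hook(Box::new(|_info| {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }));
    // The spawned thread panics; join returns Err, and by then the hook has run.
    let res = std::thread::spawn(|| panic!("boom")).join();
    assert!(res.is_err());
    CLEANED_UP.load(Ordering::SeqCst)
}
```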

What we need:

+
    +
  1. A pointer to the futex +
  2. A pointer to the return value +
  3. A pointer to the should_dealloc variable +
  4. The address at which we allocated this thread's stack +
  5. The size of that allocated stack +
+

We could stash those in registers that shouldn't be touched by the user-supplied function, but that's fairly brittle; +instead we'll use the dreaded tls.

+

Thread-local storage

+

Thread-local storage, or tls, is a way to store thread-specific data.
+For x86_64 and aarch64 there is a specific register in which we can store a pointer to some arbitrary data; +we can read that data at any time, from any place. In other words, the data is global to the thread.

+

In practice:

+
#[repr(C)]
+#[derive(Copy, Clone)]
+pub(crate) struct ThreadLocalStorage {
+    // First arg needs to be a pointer to this struct, it's immediately dereferenced
+    pub(crate) self_addr: usize,
+    // Info on spawned threads that allow us to unmap the stack later
+    pub(crate) stack_info: Option<ThreadDealloc>,
+}
+#[repr(C)]
+#[derive(Copy, Clone)]
+pub(crate) struct ThreadDealloc {
+    // For the stack dealloc
+    stack_addr: usize,
+    stack_sz: usize,
+    // For the return value dealloc
+    payload_ptr: usize,
+    payload_layout: Layout,
+    // Futex, signalled by the OS on thread-exit
+    futex_ptr: usize,
+    // Sync who deallocs
+    sync_ptr: usize,
+}
+#[inline]
+#[must_use]
+fn get_tls_ptr() -> *mut ThreadLocalStorage {
+    let mut output: usize;
+    #[cfg(target_arch = "x86_64")]
+    unsafe {
+        core::arch::asm!("mov {x}, fs:0", x = out(reg) output);
+    }
+    #[cfg(target_arch = "aarch64")]
+    unsafe {
+        core::arch::asm!("mrs {x}, tpidr_el0", x = out(reg) output);
+    }
+    output as _
+}
+
+

This takes us to another of our clone-flags, CLONE_SETTLS: we can now allocate and supply a pointer to a +ThreadLocalStorage-struct, and that pointer will be put into the thread's thread-local storage register by the OS. +Which register is used can be seen in get_tls_ptr.

+

Now, when entering the panic_handler, we can get_tls_ptr and see if there is a ThreadDealloc associated with the +thread that's currently panicking. If there isn't, we're on the main thread, and we'll just bail out by exiting with +code 1, terminating the program. +If there is a ThreadDealloc, we first check whether the caller has dropped the JoinHandle<T>, which tells us whether +we have exclusive access to the shared memory. If we do, we deallocate it; if we don't, we let the caller handle it. +Then, again, we have to exit with some asm:

+
// We need to be able to unmap the thread's own stack, we can't use the stack anymore after that
+// so it needs to be done in asm.
+// With the stack_ptr and stack_len in rdi/x0 and rsi/x1, respectively, we can call munmap and then
+// exit the thread
+#[cfg(target_arch = "x86_64")]
+core::arch::asm!(
+// Call munmap, all args are provided in this macro call.
+"syscall",
+// Zero eax from munmap ret value
+"xor eax, eax",
+// Move exit into ax
+"mov al, 60",
+// Exit code 0 from thread.
+"mov rdi, 0",
+// Call exit, no return
+"syscall",
+in("rax") MUNMAP,
+in("rdi") map_ptr,
+in("rsi") map_len,
+options(nostack, noreturn)
+);
+
+

We also need to remember to deallocate the ThreadLocalStorage, what we keep in the register is just a pointer to +that allocated heap-memory. This needs to be done both in successful and panicking thread-exits.

+

Final thoughts

+

I've been dreading reinventing this particular wheel, but I'm glad I did. +I learnt a lot, and it was interesting to dig into how threading works in practice on Linux, plus tiny-std now has +threads!

+

The code for threads in tiny-std can be found here. +With a huge amount of comments, it's 500 lines.

+

I believe that it doesn't contain UB or leakage, but it's incredibly hard to test. What I know is lacking is signal +handling, which is something else that I have been dreading getting into.

+

Next up

+

I've ordered a Pinephone explorer edition, I'll probably try doing stuff with that next.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/x11-to-xcb.html b/x11-to-xcb.html new file mode 100644 index 0000000..aa7f8ac --- /dev/null +++ b/x11-to-xcb.html @@ -0,0 +1,140 @@ + + + + + + + + + X11ToXcb + + + +
+

Rewrite it in Rust, a cautionary tale

+

RIIR (Rewrite It In Rust) is a pretty fun joke; at my current workplace my team writes +essentially everything in Rust, for better or worse. We like to have a bit of +fun with it, pushing the RIIR-agenda around the company.

+

But, this short retrospective is about when porting something from C-bindings to Rust +just made life harder.

+

Security advisory on Rust XCB-bindings

+

I've written a lot about XCB and +X11 in my project write-ups +about my x11-wm, I'm not going +to get into it here, but for these purposes XCB can be summarized as a +library to handle displaying things on a desktop.

+

One day when building a project, a security advisory comes up on Rust's XCB bindings.

+

Bindings

+

Generally if you want to use an existing big library, you can take the approach of reinventing the wheel, +or creating bindings to a C-library that already exists. For example, +Rust has a zstd crate which contains bindings +to libzstd. If you want to use that, +you need to have libzstd available to the binary. Sometimes, it's built as +part of a build-script and statically compiled into the binary, then you don't +have to worry about it at all (Rocksdb does this I think). +There's also a pure Rust implementation of zstd decompression, +which is the other approach, same algorithm, different implementation.

+

Why not?

+

There are some good reasons to RIIR, all the good things about using Rust can go here. +But, there are some very good reasons not to, apart from the effort.
+The one this retrospective is about is maturity, and the robustness that can come from it.

+

Porting x11-clipboard from C-bindings to Rust implementation

+

The security advisory comes up, transitively through x11-clipboard, +but the advisory is on the XCB-bindings.
+As I mentioned, my previous work on my WM had made me familiar with a Rust +library that replaces the bindings: x11rb.

+

To be clear, x11rb is a great library, and the story is not about how it contained some unexpected bug, +it didn't, it was the act of replacement that became the issue.

+

I made a PR on June 16, 2022 to replace the usage of the bindings with +x11rb in x11-clipboard. The PR is fairly large, but very procedural: the Rust API +is essentially the same as the C one, so it was mostly a matter of changing the types.

+

Creeping issues

+

x11-clipboard is a library that handles copying and pasting within x11-sessions. It's used by a lot of Rust's +gui-applications, so people are likely to run into your mistakes if you make them, and there were mistakes.

+

Bug report through alacritty

+

9 months later, alacritty gets a bug report, where +when things are pasted FROM alacritty into other applications, they hang.

+

The bug report is floated to x11-clipboard's issue tracker after +a bisection shows that the problem comes from the version update caused by my change.

+

Debugging it was medium-difficult: it was easy to reproduce, but difficult to understand. In the end it was +resolved by a +1 -1 change.

+

From this:

+
        time: event.time,
+        requestor: event.requestor,
+        selection: event.selection,
+        target,
+        property: event.property
+    }
+);
+
+

To this:

+
        time: event.time,
+        requestor: event.requestor,
+        selection: event.selection,
+        target: event.target,
+        property: event.property
+    }
+);
+
+

The error was interesting, it caused some clients (a client in this context is an application like +Brave browser) to hang waiting for a notification that the application never sent.

+

A funny note about X11 is that the protocol has been around for so long, and seen so much misuse, that a lot of +clients are built to handle this kind of mistake, so the error doesn't show up in, for example, +Firefox.

+

Bug report through pot-app

+

On Jan 17, 2024 a bug report comes in from pot-app/Selection.

+

Pot App is:

+
+

🌈一个跨平台的划词翻译和OCR软件 | A cross-platform software for text translation and recognition.

+
+

To be fair, I think this was a pre-existing bug, but I was kind of on the hook at this point, and it was interesting.

+

The clipboard library spawns a thread that listens for events; this thread holds a claim to the connection to the +x-server, blocking while waiting for a reply. Even if the handle that's given to the user is dropped, that thread stays alive, keeping +the connection alive. This means that if you're recreating the structure in a loop, for example, you start leaking +connections until the connection-pool is drained, which means that no new clients can connect. Or in other words, +no more applications can start, because you clogged up the server.

+

A problem here is that the thread needs to know, from the structure that spawned it, that it's done and should quit. +There are not many nice ways of signalling threads like that while they are blocked waiting for something.

+

The thread waits like this:

+
while let Ok(event) = context.connection.wait_for_event() {
+
+

The API doesn't have any facilities for waiting other than polling in a loop, and ideally one doesn't want to +run the thread at 100% CPU just waiting.

+

However, you can get the underlying file descriptor for the connection like this:

+
let stream_fd = context.connection.stream().as_fd();
+
+

And if you have an FD, you can use Linux's APIs to check for readiness, instead of what's exposed through the +x11rb API. This is only running on Linux anyway, so why not? (This is foreshadowing).

+

In the end I make a PR that uses libc, +the Linux Poll API, and an, +eventfd. +If the struct is dropped, it'll write an event on the eventfd. On the other side, the thread polls for +either a new message on the stream, or an event on the eventfd, if an event arrives on the stream, it'll handle that +like before, if it arrives through the eventfd it just quits. That solved the issue.

+
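The pure-std analogue of that eventfd wakeup is a channel disconnect: dropping the user-facing handle closes the sender, which wakes the worker out of its blocking receive. A sketch (names hypothetical, and std's mpsc standing in for the poll+eventfd pair the actual PR uses):

```rust
use std::sync::mpsc;
use std::thread;

fn drop_wakes_worker() -> &'static str {
    let (tx, rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        // Blocks like wait_for_event; recv returns Err once every sender
        // (held by the user-facing struct) has been dropped.
        match rx.recv() {
            Ok(()) => "got event",
            Err(_) => "handle dropped, quitting",
        }
    });
    drop(tx); // what the user struct's Drop impl effectively does
    worker.join().unwrap()
}
```

The shape is the same as the eventfd solution: the worker blocks on two possible wake sources, one carrying real events and one that only ever fires to say "quit".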

Bug report through the regular issue tracker

+

On Feb 28, 2024 a bug report is posted on x11-clipboard.

+

Now, I figured X11 was only used on Linux, since Mac and Windows have their own display systems. But I forgot about +the BSDs; those operating systems can run X11, and I should have thought about that before picking the +Linux-specific eventfd.

+

POSIX

+

POSIX is an OS-compatibility standard: if you use POSIX-compliant OS-APIs, you +can generally get away with using APIs that interface with the OS on Linux and they'll still work on the BSDs. +Some examples: poll, read, write, +pipe. eventfd is a counter-example.

+

What my bugfix was trying to achieve was a drop of the struct exposed through the x11-clipboard API causing something +pollable to happen in the running thread. I thought eventfd was a good fit, but something POSIX-compliant would be +to create a pipe, two fds, a read-end and a write-end, put the write-end in the user struct, the read-end in the +thread, and poll for a POLLHUP (hangup), that gets sent to one end when the other end's FD is closed.

+

Now I could use the existing RAII-closing of the write-end on the user struct, and just listen for a hangup on the running +thread, and it works on the BSDs!

+
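The hangup mechanics can be shown with a socketpair from std (same idea as the pipe: closing one end makes the other end readable with zero bytes, which is what poll surfaces as POLLHUP; `raii_close_signals_hangup` is a made-up name):

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

fn raii_close_signals_hangup() -> usize {
    // The worker thread keeps one end, the user-facing struct the other.
    let (mut thread_end, user_end) = UnixStream::pair().unwrap();
    // The user struct going out of scope closes its fd via RAII...
    drop(user_end);
    // ...and the blocked read on the other end wakes with 0 bytes: hangup.
    let mut buf = [0u8; 8];
    thread_end.read(&mut buf).unwrap()
}
```

Unlike eventfd, socketpair and pipe are POSIX, so the same drop-to-wake trick works unchanged on the BSDs.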

Conclusion

+

For now that's been it; I'll update this if more stuff comes in. I think the lesson learned here is that there's a +maintenance cost to any change. While RIIR might be fun, it's good to think twice about how reasonable it is.

+

Of course, there may be lurking bugs in the C-implementation that aren't seen because of selection bias, but I don't have +any basis for that claim.

+

Last of all, I'm sorry for the hassle quininer, I know you don't want to maintain this +project anymore, and I made your life a bit more difficult.

+
+
\ No newline at end of file