diff --git a/404.html b/404.html new file mode 100644 index 0000000..243d771 --- /dev/null +++ b/404.html @@ -0,0 +1 @@ + Not found NOT FOUND \ No newline at end of file diff --git a/boot.html b/boot.html new file mode 100644 index 0000000..b980981 --- /dev/null +++ b/boot.html @@ -0,0 +1,305 @@ + + + + + + + + + Boot + + + +
+

Boot-rs securing a Linux bootloader

+

I recently dug into a previously unfamiliar part of Linux: the bootloader.

+

This is a medium-length write-up of how the Linux boot-process works and how to modify it, told through +the process of me writing my own janky bootloader.

+

I wanted the boot process to be understandable, ergonomic, and secure.

+

Notes about distributions

+

I did what's described in this write-up on Gentoo, although it would work the same on any
+Linux machine. Depending on the distribution, this setup might not be feasible, and these steps would likely have to
+be modified to fit the circumstances.

+

Preamble, Security keys

+

I got some Yubikeys recently. Yubikeys are security keys, which is essentially a fancy
+name for a drive (USB in this case) created to store secrets securely.

+

Some secrets that are loaded into the key cannot escape at all; they can even be created on the key, never having seen
+the light of day.
+Other secrets can escape and can therefore be injected as part of a pipeline in other security processes. An example
+of this is storing a cryptodisk secret which is then passed to cryptsetup
+in the case of Linux disk encryption.

+

I did some programming against the Yubikeys; I published a small runner to sign data with a Yubikey here,
+but got a bit discouraged by the need to connect through pcscd, a daemon with an accompanying C library to
+interface with it.
+Later I managed to do a pure Rust integration against the Linux USB interface, which I will publish pretty soon.

+

I started thinking about ways to integrate Yubikeys into my workflow more, I started +examining my boot process, I got derailed.

+

Bootloader woes

+

I have used GRUB as my bootloader since I started using Linux, it has generally +worked well, but it does feel old.

+

When I ran grub-mkconfig -o ..., updating my boot configuration, and ran into
+this issue, I figured it
+was time to survey other options (after burning another ISO to get back into my system).

+

Bootloader alternatives

+

I was looking into alternatives, finding EFI stub (compiling the kernel
+into its own bootable EFI image) to be the most appealing option.
+If the kernel can boot itself, why even have a bootloader?

+

With Gentoo, integrating that was fairly easy assuming no disk encryption.

+

Before getting into this, a few paragraphs about the Linux boot process may be appropriate.

+

Boot in short

+

The boot process, in my opinion, starts on the motherboard firmware and ends when the kernel hands over execution to /sbin/init.

+

UEFI

+

The motherboard powers on and starts running UEFI firmware (I'm pretending BIOS doesn't exist because I'm not stuck in the past).
+UEFI can run images, such as disk, keyboard, and basic display drivers, as well as kernels and Rust binaries.

+

Usually, this stage of the process will be short, as the default task to perform is to check if the user wants to enter +setup and interface with the UEFI system, or continue with the highest priority boot-image.

+

That boot image could be a grub.efi-program, which may perform some work, such as decrypting your boot partition and then +handing execution over to the kernel image.
+It could also be an efi stub kernel image that gets loaded directly, or some other bootloader.

+

Kernel boot

+

The kernel process starts, initializing the memory it needs, starting tasks, and whatever else the kernel does.

+

Initramfs

+

When the kernel has performed its initialization, early userspace starts in the initramfs.
+Initramfs, also called early userspace, is the first place a Linux user +is likely to spread their bash-spaghetti in the boot-process.

+

The initramfs is a RAM-contained (in-memory) file system; it can be baked into the kernel,
+or provided where the kernel can find it during the boot process. Its purpose is to set up user-space so that it's ready
+enough for init to take over execution. This is where disk decryption happens in the case of cryptsetup.

+
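For the "provided where the kernel can find it" route, the directory has to be packed into the gzipped cpio (newc) archive format the kernel knows how to unpack. A minimal sketch, assuming GNU cpio and gzip are available; the paths are illustrative:

```shell
# Build a toy initramfs directory, then pack it into the cpio(newc)
# archive format the kernel expects for external initramfs images.
set -e
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/mnt/root"
: > "$demo/init"
chmod +x "$demo/init"
cd "$demo"
# -H newc selects the archive format the kernel's unpacker understands.
find . -print0 | cpio --null -o -H newc | gzip -9 > /tmp/initramfs.cpio.gz
```

Baking the directory straight into the kernel (the route taken in this post) is done with the kernel's CONFIG_INITRAMFS_SOURCE option instead, which accepts a directory and does the packing for you.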

The Initramfs-stage ends by handing over execution to init:

+

exec switch_root <root-partition> <init>, an example could be exec switch_root /mnt/root /sbin/init, +by convention, init is usually found at /sbin/init.

+

The initramfs prepares user-space, while init "starts" it, e.g. processes, such as dhcpcd, +are taken care of by init.

+

Init

+

Init is the first userspace process to be started and the parent of all other processes; it has PID 1, and if it dies,
+the kernel panics.
+Init could be any executable, like Bash.

+

In an example system where bash is init, the user will be dropped into the command line, in a bash shell, at the destination that the
+initramfs specified in switch_root. From a common user's perspective this is barely a functional system: it has no internet,
+likely no connection to most peripheral devices, and no login management.

+

Init daemon

+

Usually Linux systems have an init daemon. Some common init-daemons are systemd, openrc, +and runit.
+The init daemon's job is to start processes that make the system usable, up to the user's specification. Usually it +will start udev to get device events and populate /dev with device interfaces, as well as ready internet interfaces +and start login management.

+

DIY initramfs

+

I wanted at least basic security, which means encrypted disks: if I lose my computer, or it gets stolen, I can be fairly sure that
+the culprits won't get access to my data without considerable effort.
+Looking back over the steps, this means that I need to create an initramfs so that my disks can be decrypted on boot.
+There are tools to create an initramfs, dracut being
+one example; mkinitcpio, which Arch Linux uses, is another.

+

Taking things to the most absurd level, I figured I'd write my own initramfs instead.

+

The process

+

The most basic decrypting initramfs is just a directory which could be created like this:

+
[gramar@grentoo /home/gramar/misc/initramfs]# touch init
+[gramar@grentoo /home/gramar/misc/initramfs]# chmod +x init
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir -p mnt/root
+[gramar@grentoo /home/gramar/misc/initramfs]# ls -lah
+total 12K
+drwxr-xr-x 3 gramar gramar 4.0K Mar 21 15:11 .
+drwxr-xr-x 4 gramar gramar 4.0K Mar 21 15:11 ..
+-rwxr-xr-x 1 gramar gramar    0 Mar 21 15:11 init
+drwxr-xr-x 3 gramar gramar 4.0K Mar 21 15:11 mnt
+
+

The init contents being this:

+
#!/bin/bash
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

If we point the kernel at this directory, build it, and then try to boot it, we'll find out that this doesn't work at all, +and if you somehow ended up here through Googling and copied that, I'm sorry.

+

One reason for this is that /bin/bash does not exist on the initramfs; we can't call it to execute the commands in the script.

+

If we add it, for example by:

+
[gramar@grentoo /home/gramar/misc/initramfs]# mkdir bin
+[gramar@grentoo /home/gramar/misc/initramfs]# cp /bin/bash bin/bash
+
+

Then try again; it still won't work and will result in a kernel panic.
+The reason is that bash (unless you built it yourself using dark magic) is dynamically
+linked. We can see that this is indeed the case using ldd
+to list dynamic dependencies.

+
[gramar@grentoo /home/gramar/misc/initramfs]# ldd bin/bash
+        linux-vdso.so.1 (0x00007ffc7f9a1000)
+        libreadline.so.8 => /lib64/libreadline.so.8 (0x00007fd040f06000)
+        libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007fd040ec6000)
+        libc.so.6 => /lib64/libc.so.6 (0x00007fd040cf3000)
+        libtinfow.so.6 => /lib64/libtinfow.so.6 (0x00007fd040cb2000)
+        /lib64/ld-linux-x86-64.so.2 (0x00007fd04104f000)
+
+

Now we could try to appease Bash and copy these dependencies into the initramfs at the appropriate places,
+but there are quite a few files, and we risk cascading dependencies: what if we need to update, and the dependencies have changed?

+
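For illustration, chasing the dependencies could even be scripted. This sketch copies a binary plus every shared object ldd resolves into a staging tree, mirroring the absolute paths; it assumes a glibc-style ldd output, and is exactly the kind of fragility the static-linking route below avoids:

```shell
# Copy /bin/bash and each library ldd reports into a staging directory,
# e.g. /lib64/libc.so.6 -> ./lib64/libc.so.6.
set -e
staging=$(mktemp -d)
cd "$staging"
mkdir -p bin
cp /bin/bash bin/bash
# grep -o '/[^ )]*' pulls the absolute paths out of ldd's output and
# skips entries without one (like linux-vdso).
for lib in $(ldd /bin/bash | grep -o '/[^ )]*'); do
    mkdir -p ".$(dirname "$lib")"
    cp "$lib" ".$lib"
done
```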

And how about cryptsetup, mount, swapon, and switch_root?

+

Static linking and BusyBox

+

Many of the tools used to interface with Linux (usually) come from GNU coreutils.
+There are other sources, however, like the Rust port, but the most popular alternative is likely
+BusyBox.

+

BusyBox is a single binary, which on my machine is 2.2M big; it contains most of the coreutils.
+One benefit of using BusyBox is that it can easily be statically linked,
+which means that copying that single binary is enough, no dependencies required.
+Likewise, cryptsetup can easily be statically linked.

+

Busybox initramfs

+

The binaries are placed in the initramfs. (I realized that I need a tty, console, and null device to run the shell,
+so I copy those too).

+
[gramar@grentoo /home/gramar/misc/initramfs]# cp /bin/busybox bin/busybox
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir sbin        
+[gramar@grentoo /home/gramar/misc/initramfs]# cp /sbin/cryptsetup sbin/cryptsetup
+[gramar@grentoo /home/gramar/misc/initramfs]# cp -a /dev/{null,console,tty} dev
+
+

And then change the script's shebang, exporting a PATH so that the copied binaries can be found:

+
#!/bin/busybox sh
+export PATH="/bin:/sbin:$PATH"
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

Finally, we can execute the init script at boot time, and immediately panic again: cryptsetup can't find the disk.

+

Udev

+

There are multiple ways to address disks. We could, for example, copy the device node we need into the initramfs as it shows up
+under /dev: cp -a /dev/sda2 dev. But the regular disk naming convention isn't stable; /dev/sda might be tomorrow's
+/dev/sdb, causing an unbootable system. Ideally we would specify the disk by UUID.

+
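Finding the UUIDs to hard-code into the init script can be done with blkid, which prints one line per block device; a sketch (the output depends entirely on your machine, and blkid typically needs root):

```shell
# List block devices with their stable UUIDs; LUKS containers show up
# with TYPE="crypto_LUKS".
blkid
# Inspect the by-uuid symlinks that udev/mdev create.
ls -l /dev/disk/by-uuid/
```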

Udev is a tool that finds devices, listens to device events, and a bit more. What we need it for, is to populate +/dev with the devices that we expect.

+

I call it Udev because it's ubiquitous; it's actually a systemd project.
+There is a fork that used to be maintained by the Gentoo maintainers, Eudev.
+Neither of the above is ideal for an initramfs; what we'd really like is to just one-shot generate /dev.
+Luckily for us, there is a perfect implementation that does just that contained within BusyBox: Mdev.

+

To save us from further panics, I will fast-forward through discovering that we need to mount three pseudo-filesystems
+to make mdev work: proc, sys, and dev (dev shouldn't be that surprising). We also need to create the mount points.

+
[gramar@grentoo /home/gramar/misc/initramfs]# mkdir proc
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir dev
+[gramar@grentoo /home/gramar/misc/initramfs]# mkdir sys
+
+

Working initramfs

+
#!/bin/busybox sh
+export PATH="/bin:/sbin:$PATH"
+# Mount pseudo filesystems
+mount -t proc none /proc
+mount -t sysfs none /sys
+mount -t devtmpfs none /dev
+# Mdev populates /dev with symlinks
+mdev -s
+cryptsetup open /dev/disk/by-uuid/<xxxx> croot # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> cswap # Enter password
+cryptsetup open /dev/disk/by-uuid/<xxxx> chome # Enter password
+# Mount filesystem
+mount /dev/mapper/croot /mnt/root
+mount /dev/mapper/chome /mnt/root/home
+swapon /dev/mapper/cswap 
+# Unmount the pseudo filesystems, except dev which is now busy.  
+umount /proc
+umount /sys
+# Hand over execution to init
+exec switch_root /mnt/root /sbin/init
+
+

Ergonomics

+

This setup requires me to enter my password three times, which is easily fixed by saving it in a variable and piping +it into cryptsetup.

+
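A sketch of that fix, assuming busybox sh's read supports -s, and relying on cryptsetup reading the passphrase from stdin when stdin isn't a terminal (the UUIDs are placeholders, as before):

```shell
# Prompt once, silently, then feed the same passphrase to each call.
printf 'Passphrase: '
read -rs pw
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> croot
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> cswap
printf '%s' "$pw" | cryptsetup open /dev/disk/by-uuid/<xxxx> chome
unset pw
```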

Reflections on security

+

While the above setup works, it has less security than my last one.
+I boot directly into my kernel, which now must be unencrypted and could therefore be tampered with.
+This is a different attack surface than the one last considered (I lose my laptop). It's: someone tampers with my
+boot process to get access to my data on subsequent uses.

+

Bootloader tampering

+

Depending on your setup, your bootloader (kernel in this case) may be more or less subject to tampering.
+Usually, one would have the bootloader in a /boot directory, which may or may not be on a separate partition.

+

If that directory is writable only by root, it doesn't really matter whether it's on an unmounted partition or not.
+Someone with root access to your machine could edit the contents (or mount the partition and then edit the contents).
+That means that if someone has root access to your machine, your bootloader could be tampered with remotely.

+

Evil maids

+

Another possible avenue of compromise is if someone has physical access to the disk on which you store your bootloader.
+I am not a high-value target, as far as I know at least, and that kind of attack, also known as an evil maid attack,
+is fairly high-effort to pull off. The attacker needs to modify my kernel without me noticing, which for me as a target,
+again, is pretty far-fetched.

+

But this is not about being reasonable, it's never been about that, it's about taking things to the extreme.

+

Encrypting the kernel

+

The problem with encrypting the kernel is that something has to decrypt it, we need to move further down the boot-chain.
+I need to, at the UEFI level, decrypt and then hand over execution to the kernel image.

+

Writing a bootloader

+

I hinted earlier at UEFI being able to run Rust binaries; indeed, there is a UEFI target
+and library for Rust.

+

Encrypt and Decrypt without storing secrets

+

We can't have the bootloader encrypted; it needs to be a ready UEFI image.
+This means that we can't store decryption keys in the bootloader; it needs to ask the user for input
+and deterministically derive the decryption key from that input.

+

Best practice for secure symmetric encryption is AES.
+Since I want the beefiest encryption, I opt for AES-256, which means that the decryption key is 32 bytes long.

+

Brute forcing a random set of 32 bytes is currently not feasible, but passwords generally are not random, and random brute forcing
+would likely not be the method anyone would use to attack this encryption scheme.
+What is more likely is that a password list would be used to try leaked passwords,
+or dictionary-generated passwords would be used.

+

To increase security a bit, the 32 bytes will be generated by a good key derivation function; at the moment Argon2
+is the best tool for that as far as I know. This achieves two objectives:

+
    +
  1. Whatever the length of your password, it will end up being 32 random(-ish) bytes long. +
  2. The time and computational cost of brute forcing a password will be extended by the time it takes to
+run argon2 to derive a key from each password that is attempted.
+
+

This leaves the attacker with two options:

+
    +
  1. Randomly try to brute force every 32 byte combination, which is unfeasible. +
  2. Use a password list and try every known or generated password after running argon2 on it. +
+

Option 2 may or may not be feasible, depending on the strength of the password; transforming a bad password
+into 32 bytes doesn't do much if the password doesn't take enough attempts to guess.

+
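To get a feel for the derivation step, the Argon2 reference implementation ships a CLI; a sketch of deriving 32 raw bytes from a passphrase (the passphrase, salt, and cost parameters here are purely illustrative, not a recommendation):

```shell
# argon2id: -t iterations, -m log2(memory in KiB), -p lanes,
# -l output length in bytes, -r prints the raw hash as hex.
printf '%s' "hunter2" | argon2 somesalt -id -t 3 -m 16 -p 4 -l 32 -r
```

The same passphrase, salt, and parameters always yield the same 32 bytes, which is what makes the "derive the key from user input, store nothing" scheme work.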

UEFI development

+

I fire up a new virtual machine with UEFI support and start iterating. The development process was less painful than
+I thought it would be. The caveat being that I am writing an extremely simple bootloader: it finds the kernel
+on disk, asks the user for a password, derives a key from it using Argon2, decrypts the kernel with that key, and
+then hands over execution to the decrypted kernel. The code for it can be found at this repo.

+

New reflections on security

+

All post-boot content, as well as the kernel, is now encrypted. The kernel itself is read straight into RAM and then executed;
+the initramfs decrypts the disks after getting password input, deletes itself, and then hands over execution to init.

+

Bootloader compromise

+

There is still one surface for attack: the unencrypted bootloader.
+A malicious actor could replace my bootloader with something else, take my keyboard input, and decrypt my kernel.
+Or an attacker could replace my bootloader, take my keyboard input (possibly just discarding it), then boot into a malicious kernel where I enter
+my decryption keys, and decrypt my disks.

+

Moving cryptodisk secrets into the initramfs

+

Since the initramfs is now encrypted, an ergonomic move is to create a new decryption key for my disks, +move that into the initramfs, then use those secrets to decrypt the disks automatically during that stage.

+
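Sketched below using cryptsetup's standard key-slot mechanism; the keyfile name is illustrative, and the keyfile must only ever live inside the encrypted initramfs image:

```shell
# One-time setup: generate a random keyfile and enroll it in a spare
# LUKS key slot (prompts for an existing passphrase).
dd if=/dev/urandom of=crypto.key bs=64 count=1
cryptsetup luksAddKey /dev/disk/by-uuid/<xxxx> crypto.key

# In the initramfs init script: unlock without prompting.
cryptsetup open /dev/disk/by-uuid/<xxxx> croot --key-file crypto.key
```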

The "boot into malicious kernel" attack becomes more difficult to pull off:
+I'd notice if my disks aren't being automatically decrypted.

+

Secure boot

+

Some people think Secure Boot, and UEFI in general, is a cynical push by Microsoft to force Linux desktop user share
+down to zero (from close to zero). Perhaps, but Secure Boot can be used to add some security to the most sensitive part
+of our now fairly secured boot process.

+

Secure Boot works by only allowing the UEFI firmware to boot from images that are signed by its stored cryptographic keys.
+Microsoft's keys are (almost) always vendored and exist in the store by default, but they can be removed (kind of) and +replaced by your own keys.

+

The process for adding your own keys to Secure Boot, as well as signing your bootloader, will be left out of this write-up.

+

Final reflections on security

+

Now my boot-process is about as secure as I am capable of making it while retaining some sense of ergonomics.
+The disks are encrypted and can't easily be decrypted. The kernel itself is encrypted, and through the auto-decryption I would notice if it were replaced
+by something else.
+The bootloader cannot be exchanged without extracting my setup password.

+

The main causes of concern are now BUGS, and still, evil maids.

+
    +
  1. Bugs in secure boot. +
  2. Bugs in my implementation. +
  3. Bugs in the AES library that I'm using. +
  4. Bugs in the Argon2 library that I'm using. +
  5. Bugs in cryptsetup. +
  6. Bugs everywhere. +
+

But those are hard to get away from.

+

Epilogue

+

I'm currently using this setup, and I will for as long as I use Gentoo, I would guess.
+Once set up, it's pretty easy to re-compile and re-encrypt the kernel when it's time to upgrade.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 0000000..cc3e045 --- /dev/null +++ b/index.html @@ -0,0 +1 @@ + Marcus Grass' pages

About

This site is a place where I intend to store things I've learned so that I won't forget them.

This page

There's not supposed to be a web 1.0 vibe to it, but I'm horrible at front-end styling so here we are.
The site is constructed in JavaScript, but as with all things in my free time, I make things more complicated than they need to be.
There is a Rust runner that takes the md-files, generates html and javascript, and then minifies that.
The markdown styling is ripped from this project, it's GitHub's markdown CSS, I don't want to stray too far out of my comfort zone...

The highlighting is done with the use of starry-night.

All page content except for some glue is just rendered markdown contained in the repo.

Content

See the menu bar at the top left to navigate to the table of contents, if I end up writing a lot of stuff here I'm going to have to look into better navigation and search.

License

The license for this page's code can be found in the repo here.
The license for the styling is under that repo here.
The license for starry-night is for some reason kept in this 1MB file in their repo here (TL;DR: it's MIT/Apache2 licensed under MIT).

\ No newline at end of file diff --git a/kbd-smp.html b/kbd-smp.html new file mode 100644 index 0000000..d0b4391 --- /dev/null +++ b/kbd-smp.html @@ -0,0 +1,189 @@ + + + + + + + + + KbdSmp + + + +
+

Symmetric multiprocessing in your keyboard

+

While my daughter sleeps during my parental leave I manage to get up to +more than I thought I would. This time, a deep-dive into QMK.

+

Overview

+

This write-up is about how I enabled multicore processing on my keyboard;
+the structure is as follows:

+
    +
  1. A short intro to QMK. +
  2. A dive into keyboards, briefly how they function. +
  3. Microcontrollers and how they interface with the keyboard. +
  4. Threading on Chibios. +
  5. Multithread vs multicore, concurrency vs parallelism. +
  6. Tying it together. +
+

QMK and custom keyboards

+

QMK contains open source firmware for keyboards, it provides implementations for most custom keyboard functionality, +like key presses (that one's obvious), rotary encoders, and oled screens.

+

It can be thought of as an OS for your keyboard, which can be configured by plain json, +with online tools, and other +simple tools that you don't need to be able to program to use.

+

But, you can also get right into it if you want, which is where it gets interesting.

+

QMK structure

+

Saying that QMK is like an OS for your keyboard might drive some pedants mad, since QMK packages
+an OS and installs it, configured with your additions, on your keyboard.

+

Most features are toggled by defining constants in different make or header files, like:

+
#pragma once
+// Millis
+#define OLED_UPDATE_INTERVAL 50
+#define OLED_SCROLL_TIMEOUT 0
+#define ENCODER_RESOLUTION 2
+// Need to propagate oled data to right side
+#define SPLIT_TRANSACTION_IDS_USER OLED_DATA_SYNC
+
+

It also exposes some APIs which provide curated functionality;
+here's an example from the oled driver:

+
// Writes a string to the buffer at current cursor position
+// Advances the cursor while writing, inverts the pixels if true
+void oled_write(const char *data, bool invert);
+
+

Above is an API that allows you to write text to an oled screen, very convenient.

+

Crucially, QMK does actually ship an OS, in my case chibios. +Chibios is a full-featured RTOS. That OS contains +the drivers for my microcontrollers, and from my custom code I can interface with +the operating system.

+

Keyboards keyboards keyboards

+

I have been building keyboards since I started working as a programmer. +There is much that can be said about them, but not a lot of it is particularly interesting. I'll give a brief +explanation of how they work.

+

Keyboard internals

+

A keyboard is like a tiny computer that tells the OS (The other one, the one not in the keyboard) +what keys are being pressed.

+

Here are three arbitrarily chosen important components to a keyboard:

+
    +
  1. The Printed Circuit Board (PCB), a large
+board that connects all the keyboard components. If you're thinking: "Hey, that's a motherboard!", then you
+aren't far off. Split keyboards (usually) have two PCBs working in tandem, connected by (usually) an aux cable.
+
  2. The microcontroller, the actual computer part that you program. It can be integrated directly with the PCB, +or soldered on to it. +
  3. The switches, +the things that when pressed connects circuits on the PCB, which the microcontroller can see +and interpret as a key being pressed. +
+

Back to the story

+

I used an Iris for years and loved it, but since some pretty impressive microcontrollers that aren't AVR
+but ARM have come out, surpassing the AVR ones in cost-efficiency, memory, and speed while remaining compatible,
+I felt I needed an upgrade.

+

A colleague tipped me off about the lily58, which takes any pro-micro-compatible microcontroller,
+so I bought it, alongside a couple of RP2040-based microcontrollers.

+

RP2040 and custom microcontrollers

+

Another slight derailment: the RP2040 is a microcontroller with an
+Arm Cortex-M0+ CPU. Keyboard-makers take this kind
+of microcontroller and customize it to fit keyboards. Since pro-micro microcontrollers have influenced a lot
+of the keyboard PCBs, many new microcontroller designs fit onto a PCB the same way that a pro-micro does. Meaning,
+you can often use many combinations of microcontrollers with many combinations of PCBs.

+

The RP2040 is pretty fast, cheap, and has two Cortex-M0+ cores. TWO CORES, why would someone even need that?
+But if there are two cores on there, then they should both definitely be used.

+

Back to the story, pt2

+

I was finishing up my keyboard and realized that oled-rendering is by default set to 50ms, to not impact +matrix scan rate. (The matrix scan rate is when the microcontroller checks the PCB for what keys are being held down, +if it takes too long it may impact the core functionality of key-pressing and releasing being registered correctly).

+

Now I found the purpose of multicore: if rendering to the oled takes time,
+then that job could (and therefore should) be shoveled onto a
+different thread. My keyboard has two cores; I should parallelize this by using a thread!

+

Chibios and threading

+

Chibios is very well documented; it even
+has a section on threading, and a
+convenience function for
+spawning a static thread.

+

It can be used like this:

+
static THD_WORKING_AREA(my_thread_area, 512);
+static THD_FUNCTION(my_thread_fn, arg) {
+    // Cool function body
+}
+void start_worker(void) {
+    thread_t *thread_ptr = chThdCreateStatic(my_thread_area, sizeof(my_thread_area), NORMALPRIO, my_thread_fn, NULL);
+}
+
+

Since my CPU has two cores, if I spawn a thread, work will be parallelized, I thought, so I went for it. (This is +foreshadowing).

+

After wrangling some mutex locks, and messing
+with the firmware to remove race conditions, I had a multithreaded implementation that could offload rendering
+to the oled display on a separate thread. Great! Now why is performance so bad?

+

Multithread != Multicore, an RTOS is not the same as a desktop OS

+

When I printed the core-id of the thread rendering to the oled-display, it was 0. I wasn't +actually using the extra core which would have core-id 1.

+

The assumption that:

+
+

If I have two cores and I have two threads, the two threads should be running +or at least be available to accept tasks almost 100% of the time.

+
+

does not hold here. +It would hold up better on a regular OS like Linux, but on Chibios it's a bit more explicit.

+

Note:
+This disregards that Chibios spawns both a main-thread and an idle-thread (on the same core) by default, so it's not just one thread,
+although that's not particularly important to performance.

+

On concurrency vs parallelism

+

Threading without multiprocessing can produce concurrency, like in Python with +the GIL enabled. A programmer can run multiple tasks at the same time and if those tasks don't +require CPU-time, such as waiting for some io, the tasks can make progress at the same time, which +is why Python with the GIL can run webservers pretty well. However, tasks that require CPU-time to make +progress will not benefit from having more threads in the single-core case.

+

One more caveat is blocking tasks that do not park the thread; this will come down to how the OS decides to schedule
+things. In a single-core scenario, the main thread offloads some io-work to a separate thread;
+the OS schedules (simplified) 1 millisecond to the io-thread, but that thread is stuck waiting for io to complete, so
+the application will make no progress for that millisecond.
+One way to mitigate this is to park the waiting thread inside the
+io-api, then wake it up on some condition; in that case the blocking io won't hang the application.

+

In my case, SMP not being enabled meant that the oled-drawer-thread just got starved of CPU-time, resulting in
+drawing to the oled being painfully slow. But even if it hadn't been, there may have been a performance hit, because
+it could have interfered with the regular key-processing.

+

Parallelism

+

I know I have two cores, so parallelism should be possible; I'll just have to enable
+Symmetric multiprocessing (SMP).
+SMP means that the processor can actually do things in parallel.
+It's not enabled by default; Chibios has some documentation on this.

+

Enabling SMP is not trivial, as it turns out; it needs a config flag for chibios,
+a make-flag when building for the platform (rp2040), and some other fixing.
+So I had to mess with the firmware once more,
+but checking some flags in the code, and some internal structures, I can see that Chibios is now compiled
+ready to use SMP. It even has a reference that I can use to my other core's context, &ch1 (&ch0 is core 0).

+

On Linux, multicore and multithreading are opaque: you spawn a thread and it runs on some core (also assuming that
+SMP is enabled, but it generally is for servers and desktops). On Chibios, if you
+spawn a thread, it runs on the core that spawned it by default.

+

Back to the docs, I see that I can instead create a thread from a thread descriptor,
+which takes a reference to the instance context, &ch1. Perfect: now I'll spawn a thread on the other core, happily ever
+after.

+

WRONG!

+

It still draws from core-0 on the oled.

+

Checking the chibios source code, I see that it falls back to &ch0 if &ch1 is null. Now why is it null?

+

Main 2, a single main function is for suckers

+

Browsing through the chibios repo, I find the next piece of the puzzle:
+a demo someone made of SMP on the RP2040. It needs a separate main function where the instance context (&ch1)
+for the new core is initialized. I write some shim-code, struggle with some more configuration, and finally,
+core 1 is doing the oled work.

+

Performance is magical; it's all worth it in the end.

+

Conclusion

+

My keyboard now runs multicore and I've offloaded all non-trivial +work to core 1 so that core 0 can do the time-sensitive matrix scanning, +and I can draw as much and often as I want to the oled display.

+

I had to mess a bit with the firmware to specify that there is an extra
+core on the RP2040, and to keep QMK's hands off of the oled state, since
+that code isn't thread-safe.

+

In reality, this kind of optimization probably isn't necessary for most users.
+But if there is work that the keyboard is doing
+that's triggered by key processing, such as rgb-animations, oled-animations, and similar, offloading that
+to a separate core could improve performance, allowing more of that kind of work for a given keyboard.

+

The code is in my fork here, +with commits labeled [FIRMWARE] being the ones messing with the firmware.

+

The keyboard-specific code is contained +here, +on the same branch.

+

I hope this was interesting to someone!

+
+
\ No newline at end of file diff --git a/meta.html b/meta.html new file mode 100644 index 0000000..adad4a4 --- /dev/null +++ b/meta.html @@ -0,0 +1,178 @@ + + + + + + + + + Meta + + + +
+

Writing these pages

+

I did a number of rewrites of this web application, some of which could probably be +found in the repository's history.
+The goal has changed over time, but as with all things I wanted to create something that's as small as possible, +and as fast as possible, taken to a ridiculous and counterproductive extent.

+

Rust for frontend

+

Rust can target WebAssembly through its target +wasm32-unknown-unknown, which can then be run on the web. Whether this is a good idea or not remains to be seen.

+

I've been working with Rust for a while now, even written code targeting wasm, but hadn't yet written anything +to be served through a browser using Rust.

+

After thinking that I should start writing things down more, I decided to make a blog to collect my thoughts.
+Since I'm a disaster at front-end styling I decided that if I could get something to format markdown, that's good +enough.
+I could have just kept them as .md files in a git-repo, and that would have been the reasonable thing to do, but the concept of a dedicated page for them spoke to me, so with GitHub's free hosting available I started looking at alternatives for a web framework.

+

SPA

+

An SPA (Single Page Application) is a web application where the user doesn't have to follow a link and load a new page from the server to navigate to different pages of the application; it dynamically injects html based on the path. This saves the user an http round trip when switching pages within the application, making the application feel more responsive.

+

I've worked with SPAs a bit in the past with the Angular framework, and I wanted to see if I +could implement an SPA using Rust.

+

Yew

+

I didn't search for long before finding yew, it's a framework for developing front-end applications +in Rust. It looked pretty good so I started up.

+

I like how Yew does things: you construct Components that pass messages and react to them, changing their state and maybe causing a rerender. I do have a personal beef with macros though, and especially since 0.20 Yew uses them a lot, but we'll get back to that.

+

My first shot was using pulldown-cmark directly from the Component.
+I included the .md-files as include_str!(...) and then converted those to html within the component at view-time.

+

How the page worked

+

The page output is built using Trunk, a wasm web application bundler.

+

trunk takes my wasm and assets, generates some glue javascript to serve it, and moves it into a dist directory along +with my index.html. From the dist directory, the web application can be loaded.

+

The code had included my .md-files in the binary, a const String inserted into the wasm. When a +page was to be loaded through navigation, my component checked the path of the url, if for example it was +/ it would select the hardcoded string from the markdown of Home.md, convert that to html and then inject +that html into the page.
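A rough sketch of that view-time lookup; the page names, paths, and content here are simplified stand-ins (the real code used include_str! on the actual .md files), inlined so the sketch stands alone:

```rust
// Hypothetical sketch of the per-path lookup: each path maps to a
// compiled-in markdown string, which would then be converted to html
// and injected at view-time.
const HOME_MD: &str = "# Home\nWelcome!";
const NAV_MD: &str = "# Navigation\n- [Home](/)";

fn page_markdown(path: &str) -> Option<&'static str> {
    match path {
        "/" => Some(HOME_MD),
        "/nav" => Some(NAV_MD),
        _ => None, // would render a not-found page
    }
}

fn main() {
    // The component would pass the selected string through pulldown-cmark here.
    assert_eq!(page_markdown("/"), Some(HOME_MD));
    assert!(page_markdown("/missing").is_none());
}
```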

+

Convert at compile time

+

While not necessarily problematic, this seemed unnecessary: since the .md-content doesn't change and is just going to be converted, I might as well only do that once. The alternatives are doing it at compile time or at application load time, as opposed to what I was currently doing, which I guess would be called render time or view-time (in other words, every time content was to be injected).

+

I decided to write a build script which takes my .md-pages and converts them to html; my application could then load that const String instead of the old one, skipping the conversion step and the added binary dependency on pulldown-cmark.
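A minimal sketch of what such a build-time generation step could look like. The function name and the idea of emitting consts are illustrative, and the html strings stand in for pulldown-cmark's real output, which the actual build script would produce:

```rust
// Hypothetical sketch of the generation step: take (page name,
// already-converted html) pairs and emit Rust source defining consts,
// so the application loads pre-converted html instead of converting
// markdown at view-time.
fn generate_page_consts(pages: &[(&str, &str)]) -> String {
    let mut out = String::from("// @generated by build.rs\n");
    for (name, html) in pages {
        out.push_str(&format!(
            "pub const {}: &str = r#\"{}\"#;\n",
            name.to_uppercase(),
            html
        ));
    }
    out
}

fn main() {
    let src = generate_page_consts(&[("home", "<h1>Home</h1>")]);
    // A real build.rs would write `src` into $OUT_DIR and the app
    // would pull it in with include!.
    assert!(src.contains("pub const HOME: &str"));
    assert!(src.contains("<h1>Home</h1>"));
}
```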

+

It was fairly easily done, and now the loading was (theoretically) faster.

+

Styling

+

I wanted my markdown to look nice, the default markdown-to-html conversion rightfully doesn't apply any styling. +As someone who is artistically challenged I needed to find some off-the-shelf styling to apply.

+

I thought GitHub's css for their markdown rendering looked nice and wondered if I could find the source for it. After just a bit of searching I found github-markdown-css, which contains a generator for that css, as well as already-generated copies of it. I added that to my page.

+

Code highlighting

+

Code highlighting was difficult, there are a few alternatives for highlighting.
+If I understood it correctly, GitHub uses something similar to starry-night.
+Other alternatives are highlight.js and Prism.
+After a brief look, highlight.js seemed easy to work with and produces some nice styling, so I went with that.

+

The easiest way of implementing highlight.js (or prism.js, they work essentially the same), is to load a
+<script src="highlight.js"></script> at the bottom of the page body. Loading the script calls the +highlightAll() function, which takes code elements and highlights them.
+This turned out to not be that easy the way I was doing things.
+Since I was rendering the body dynamically, previously highlighted elements would be de-highlighted on navigation, since the highlightAll() function had already been called. While I'm sure that you can call js-functions from Yew, finding out how to do that in the documentation is difficult. Knowing when to call them is difficult as well; like many comprehensive frameworks, Yew sometimes works as a black box. While it's easy to look at page-html with javascript and understand what's happening and when, it's difficult to view the corresponding Rust code and know when an extern javascript function would be called, even if I could figure out how to insert such a call in the component.
+I settled for not having highlighting and continued building.

+

Navigation

+

I wanted a nav-bar, some hamburger menu which would unfold and +give the user access to navigation around the page. Constructing that with my knowledge of css was a disaster.
+It never scaled well, it was difficult to put in the correct place, and eventually I just gave up and created a navigation page in .md-style, like all other pages in the application.
+I kept a menu button for going back to home, or to the navigation page, depending on the current page.

+

An issue with this is that links in an .md-file, when converted to html, become regular <a href=".." links, which will cause a new page-load. My internal navigation was done using Yew callbacks, swapping out page content on navigation, which meant I'd have to replace those href links with Yew templating. I decided to make my build script more complex: instead of serving raw converted html, I would generate small rust-files which would convert the html into Yew's html! macro. This was ugly in practice; html that looked like this

+
+<div>
+    Content here
+</div>
+
+

Would have to be converted to this:

+
yew::html! {
+    <div>
+        {{"Content here"}}
+    </div>
+}
+
+

Any raw string had to be double bracketed then quoted.
+Additionally, to convert to links, raw html that looked like this:

+
<a href="/test">Test!</a>
+
+

Would have to be converted to this:

+
yew::html! {
+    <a onclick={move |_| scope.navigator.unwrap().replace(&Location::Test)}>Test!</a>
+}
+
+

On top of that, the css specifies special styling for <a> which contains href vs <a> which doesn't.
+That was a fairly easy change: from this .markdown-body a:not([href]) to this .markdown-body a:not([href]):not(.self-link), as well as adding the class self-link to the links that were replaced.
+Some complexity was left out, such as the scope being moved into the function, so I had to generate a bunch of +scope_n at the top of the generated function from which the html was returned.
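A naive sketch of that internal-link rewrite; the Location enum and navigator call are taken from the example above, while the parsing is simplified to one hardcoded `<a href="/...">` pattern (the real generator also handled the scope bindings and more cases):

```rust
// Hypothetical sketch: rewrite a converted `<a href="/test">` link
// into a Yew onclick-navigation call, as the build script did.
fn rewrite_link(html: &str) -> String {
    let Some(start) = html.find("<a href=\"/") else {
        return html.to_string(); // no internal link, leave as-is
    };
    let path_start = start + "<a href=\"/".len();
    let path_end = path_start + html[path_start..].find('"').unwrap();
    let path = &html[path_start..path_end];
    // "/test" -> Location::Test (the Location enum is hypothetical)
    let mut chars = path.chars();
    let variant = match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    };
    format!(
        "{}<a class=\"self-link\" onclick={{move |_| scope.navigator.unwrap().replace(&Location::{})}}>{}",
        &html[..start],
        variant,
        &html[path_end + 2..] // skip the closing `">`
    )
}

fn main() {
    let rewritten = rewrite_link("<a href=\"/test\">Test!</a>");
    assert_eq!(
        rewritten,
        "<a class=\"self-link\" onclick={move |_| scope.navigator.unwrap().replace(&Location::Test)}>Test!</a>"
    );
}
```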

+

In the end it worked, an internal link was replaced by a navigation call, and navigation worked from my .md +navigation page.

+

The page was exactly how I wanted.

+

Yew page retrospective

+

Looking at only the wasm for this fairly minimal page, it was more than 400K. To make the page work +I had to build a complex build script that generated Rust code that was valid with the Yew framework.
+And to be honest, after bumping Yew from 0.19 to 0.20 during this process and seeing a turn towards even heavier use of macros for functionality, I didn't see this as maintainable even in the medium term.
+I had a big, slow page which probably wouldn't be maintainable, and where highlighting was tricky to integrate.

+

RIIJS

+

I decided to rewrite the page in javascript, or rather generate javascript from a Rust build script and skip +Yew entirely.
+It took less than two hours, and the application was now 68K in total and much less complex.

+

The only dependency now was pulldown-cmark for the build script. I wondered if I could get this to be even smaller.
+I found a css and js minifier written in Rust: minifier-rs.

+

After integrating that, the page was down to 60K, about 7 times smaller than before.
+Doing it in javascript also made it easy to apply highlighting again. I went back and had another look, finding that Prism.js was fairly tiny; integrating it made highlighting work, bringing the page size to a bit over 70K.

+

I wasn't completely content with highlighting being done after the fact on a static page, and if that was to be +off-loaded +I might as well go with the massive starry-night library.
+Sadly this meant creating a build-dependency on npm and the dependency swarm that it brings. But in the end my page was just as small as with prism, did slightly less work at view-time, and had some nice highlighting.

+

In defense of Yew

+

Yew is not a bad framework, and that's not the point of this post. The point is rather the importance of using the best tool for the job. wasm is not necessarily faster than javascript on the web, and if you're not doing heavy operations that can be offloaded to the wasm, the complexity and size of a framework that utilizes it may not be worth it. This page is just a simple collection of html with some highlighting; anything dynamic on the page falls almost entirely in the scope of DOM manipulation, which wasm just can't handle at the moment.

+

CI

+

Lastly, I wanted my page to be rebuilt and published in CI, and I wanted to not have to check in the dist folder, +so I created a pretty gnarly bash-script. The complexity isn't the bad part, the bad part is the +chained operations where each is more dangerous than the last.
+In essence, it checks out a temporary branch from main, builds a new dist, creates a commit, and then +force pushes that to the gh-pages branch. If this repo's history grows further in the future, +I'll look into making it even more destructive by just compacting the repo's entire history into one commit and +pushing that to that branch. But I don't think that will be necessary.

+

Rants on macros and generics

+

I like some of the philosophies of Yew, separating things into Components that pass messages. But seeing the rapid changes, and the increasing use of proc-macros that do the same things as structs and traits, only more opaquely, makes me fear that web development in Rust will follow the same churn-cycle as javascript. What I may appreciate most about statically, strongly typed languages is that you know the type of any given object. Macros and generics dilute this strength, and in my opinion should be used sparingly when creating libraries, although I realize their respective strengths and occasional necessity. I believe that adding macros creates a maintenance trap: if what you're trying to do can already be done without macros, using them is a bad decision by the authors. Macros hide away internals; you don't get to see the objects and functions that you're calling, and if a breaking change occurs, knowing how to fix it can become a lot more difficult, as you may have to re-learn both how the library used to work internally and how it currently works in order to preserve the old functionality.
+</rant>

+
+
\ No newline at end of file diff --git a/pgwm03.html b/pgwm03.html new file mode 100644 index 0000000..e8d3514 --- /dev/null +++ b/pgwm03.html @@ -0,0 +1,267 @@ + + + + + + + + + Pgwm03 + + + +
+

PGWM 0.3, tiny-std, and xcb-parse

+

I recently made a substantial rewrite of my (now) pure rust x11 window manager and want to collect my thoughts on it +somewhere.

+

X11 and the Linux desktop

+

PGWM is an educational experience into Linux desktop environments, +the x11 specification +first came about in 1984 and has for a long time been the only mainstream way for gui-applications on Linux to +show what they need on screen for their users.

+

When working on desktop applications for Linux, the intricacies of that protocol are mostly hidden by the desktop frameworks a developer might encounter. In Rust, the cross-platform library winit can be used for this purpose, and applications written in Rust, like the terminal emulator Alacritty, use winit.

+

At the core of the Linux desktop experience lies the Window Manager, either alone or accompanied by a Desktop Environment (DE). The Window Manager makes decisions on how windows are displayed.

+

The concept of a Window

+

Window is a loose term often used to describe some surface that can be drawn to on screen.
+In X11, a window is a u32 id that the xorg-server keeps information about. It has properties, such as a height and +width, it can be visible or not visible, and it enables the developer to ask the server to subscribe to events.

+

WM inner workings and X11 (no compositor)

+

X11 works by starting the xorg-server, the xorg-server takes care of collecting input +from HIDs +like the keyboard and mouse, collecting information about device state, +such as when a screen is connected or disconnected, +and coordinates messages from running applications including the Window Manager.
+This communication goes over a socket, TCP or Unix. The default is /tmp/.X11-unix/X0 for a single-display desktop +Linux environment.
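The default socket path follows from the DISPLAY environment variable, and can be sketched as a small helper; the assumption here is the standard hostless DISPLAY form like :0 or :0.0:

```rust
// Sketch: derive the default X11 unix-socket path from a hostless
// DISPLAY string (":<display>[.<screen>]"), per the /tmp/.X11-unix/X<n>
// convention mentioned above.
fn x11_socket_path(display: &str) -> Option<String> {
    let num = display.strip_prefix(':')?.split('.').next()?;
    if num.is_empty() || !num.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(format!("/tmp/.X11-unix/X{}", num))
}

fn main() {
    assert_eq!(x11_socket_path(":0").as_deref(), Some("/tmp/.X11-unix/X0"));
    assert_eq!(x11_socket_path(":1.0").as_deref(), Some("/tmp/.X11-unix/X1"));
    // A DISPLAY with a hostname implies TCP, not the unix socket.
    assert!(x11_socket_path("remote:0").is_none());
}
```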

+

The details of the communication are specified in xml files in Xorg's gitlab repo xcbproto. The repo contains xml schemas that specify how an object passed over the socket should be structured to be recognized by the xorg-server, and language bindings are generated from those schemas. The name for the C-language bindings is XCB, for 'X protocol C-language Binding'.
+Having this kind of protocol means that a developer who can't or won't directly link to and use the xlib C-library +can instead construct their own representations of those objects and send those over the socket.

+

In PGWM, a Rust language representation of these objects is used, containing serialization and deserialization methods that turn Rust structs into raw bytes that can be transmitted on the socket.
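A sketch of the idea, not the real X11 wire format: a struct with a serialize method that appends its fields as bytes onto an outgoing buffer (field names, the opcode, and little-endian byte order are all illustrative here; X11 actually uses the byte order the client declares at handshake):

```rust
// Illustrative request struct, serialized the way the generated code
// writes structs onto the socket buffer. Not the real protocol layout.
struct ConfigureWindow {
    opcode: u8,
    window: u32, // the u32 window id described above
    width: u16,
    height: u16,
}

impl ConfigureWindow {
    fn serialize(&self, buf: &mut Vec<u8>) {
        buf.push(self.opcode);
        buf.extend_from_slice(&self.window.to_le_bytes());
        buf.extend_from_slice(&self.width.to_le_bytes());
        buf.extend_from_slice(&self.height.to_le_bytes());
    }
}

fn main() {
    let req = ConfigureWindow { opcode: 12, window: 0x00AB_CDEF, width: 800, height: 600 };
    let mut buf = Vec::new();
    req.serialize(&mut buf);
    assert_eq!(buf.len(), 1 + 4 + 2 + 2);
    assert_eq!(buf[0], 12);
    assert_eq!(&buf[1..5], &0x00AB_CDEFu32.to_le_bytes());
}
```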

+

If launching PGWM through xinit, an xorg-server is started at the beginning of that script; if PGWM is launched inside that script, it will try to become that server's Window Manager.

+

When an application starts within the context of X11, a handshake takes place. The application asks for setup +information from the server, and if the server replies with a success the application can start interfacing +with the server.
+In a WM's case, it will request to set the SubstructureRedirectMask on the root X11 window.
+Only one application can have that mask on the root window at a given time. Therefore, there can only be one WM active +for a running xorg-server.
+If the change is granted, layout change requests will be sent to the WM. From then on the WM can make decisions on the +placements of windows.

+

When an application wants to be displayed on screen it will send a MapRequest, when the WM gets that request it will +make a decision whether that window will be shown, and its dimensions, and forward that decision to the xorg-server +which is responsible for drawing it on screen. Changing window dimensions works much the same way.

+

A large part of the trickiness of writing a WM, apart from the plumbing of getting the socket communication right, is +handling focus.
+In X11, focus determines which window will receive user input, aside from the WM making the decision of what should +be focused at some given time, some Events will by default trigger focus changes, making careful reading of the +protocol an important part of finding maddening bugs.
+What is currently focused can be requested from the xorg-server by any application, and notifications on focus changes +are produced if requested. In PGWM, focus becomes a state that needs to be kept on both the WM's and X11's side to +enable swapping between workspaces and having previous windows re-focused, and has been a constant source of bugs.

+

Apart from that, the pure WM responsibilities are not that difficult: wait for events, respond by changing focus or layout, rinse and repeat. The hard parts of PGWM have been removing all C-library dependencies, and taking optimization to a stupid extent.

+

Remove C library dependencies, statically link PGWM 0.2

+

I wanted PGWM to be statically linked, small and have no C-library dependencies for 0.2. I had one problem.

+

Drawing characters on screen

+

At 0.1, PGWM used language bindings to the XFT (X FreeType interface library) C-library, through the Rust libx11 bindings library X11. XFT handles font rendering; it was used to draw characters on the status bar.

+

XFT provides a fairly nice interface, and comes with the added bonus of Fontconfig integration. Maybe you've encountered something like this: JetBrainsMono Nerd Font Mono:size=12:antialias=true. It's an excerpt from my ~/.Xresources file and configures the font for Xterm; Xterm uses fontconfig to figure out where that font is located on my machine. Removing XFT, and fontconfig with it, means that fonts have to be specified by path, so now something like this is necessary to find a font: /usr/share/fonts/JetBrains\ Mono\ Medium\ Nerd\ Font\ Complete\ Mono.ttf, oof. I still haven't found a non-C replacement for finding fonts without specifying an absolute path.

+

One step in drawing a font is taking the font data and creating a vector of light intensities; this process is called rasterization. Rust has a font rasterization library, fontdue, that at least at one point claimed to be the fastest font rasterizer available. Since I needed to turn the fonts into something that could be displayed as a vector of bytes, I integrated that into PGWM. The next part was drawing it in the correct place. But, instead of looking at how XFT did it, I went for a search around the protocol and found the shm (shared memory) extension (this maneuver cost me about a week).

+

SHM

+

The X11 shm extension allows an application to share memory with X11, and request the xorg-server to draw what's in +that shared memory at some chosen location. +So I spent some time encoding what should be displayed, pixel by pixel from the background color, with the characters as +bitmaps rasterized by fontdue on top, into a shared memory segment, then having the xorg-server draw from that +segment. +It worked, but it took a lot of memory, increased CPU usage, and was slow.

+

Render

+

I finally went to look at XFT's code and found that it uses +the render +extension, an extension that can register byte representations as glyphs, and then draw those glyphs at specified +locations, by glyph-id. This is the sane way to do +it. After implementing that, font rendering was again working, and the performance was good.

+

PGWM 0.3 how can I make this smaller and faster?

+

I wanted PGWM to be as resource efficient as possible, so I decided to dig into the library that I used to do serialization and deserialization of the Rust structs that were to go over the socket to the xorg-server.

+

The library I was using was X11rb, an excellent, safe, and performant library for doing just that. However, I was taking optimization to a ridiculous extent, so I decided to make that library optimized for my specific use case.

+

PGWM runs single threaded

+

X11rb can handle multithreading, making the execution path for single threaded applications longer than necessary.
+I first rewrote the connection logic from interior mutability (the connection handles synchronization) to exterior +mutability (user handles synchronization, by for example wrapping it in an Arc<RwLock<Connection>>).
+This meant a latency decrease of about 5%, which was pretty good. However, it did mean that RAII no longer applied, and the risk of memory leaks went up. To handle that, I set the WM to panic on leaks in debug builds and cleaned them up where I found them.
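A hypothetical sketch of the difference between the two approaches, using a sequence-number field to stand in for the connection state (the types and fields are illustrative, not X11rb's actual API):

```rust
use std::sync::{Arc, Mutex};

// Interior mutability: the connection hides a lock inside, so every
// call pays for synchronization even in a single-threaded program.
struct InteriorConn {
    seq: Mutex<u16>,
}

impl InteriorConn {
    fn next_seq(&self) -> u16 {
        let mut s = self.seq.lock().unwrap();
        *s = s.wrapping_add(1);
        *s
    }
}

// Exterior mutability: `&mut self` proves exclusive access, so no lock
// is needed; a multithreaded user wraps it in Arc<Mutex<..>> themselves.
struct ExteriorConn {
    seq: u16,
}

impl ExteriorConn {
    fn next_seq(&mut self) -> u16 {
        self.seq = self.seq.wrapping_add(1);
        self.seq
    }
}

fn main() {
    let ic = InteriorConn { seq: Mutex::new(0) };
    assert_eq!(ic.next_seq(), 1);

    let mut ec = ExteriorConn { seq: 0 };
    assert_eq!(ec.next_seq(), 1);
    // Synchronization is now opt-in, handled by the caller:
    let shared = Arc::new(Mutex::new(ec));
    assert_eq!(shared.lock().unwrap().next_seq(), 2);
}
```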

+

Optimize generated code

+

In X11rb, structs were serialized into owned allocated buffers of bytes, which were then sent over the socket. +This means a lot of allocations. Instead, I created a connection which holds an out-buffer, structs are always +serialized directly into it, that buffer is then flushed over the socket. Meaning no allocations are necessary during +serialization.

+

The main drawback of that method is management of that buffer. If it's growable then the largest unflushed batch +will take up memory for the WM's runtime, or shrink-logic needs to be inserted after each flush. +If the buffer isn't growable, some messages might not fit depending on how the +buffer is proportioned. It's pretty painful in edge-cases. I chose to have a fixed-size buffer of 64kb.
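A toy sketch of such a fixed-size out-buffer, shrunk to 16 bytes here with a Vec standing in for the socket, including the painful edge-case where a message can never fit:

```rust
// Sketch of the fixed-size out-buffer: structs are serialized directly
// into `buf`, and the buffer is flushed to the socket when a write
// wouldn't fit. The real buffer in the WM is 64kb.
struct OutBuf {
    buf: [u8; 16],
    len: usize,
    flushed: Vec<u8>, // stands in for the socket
}

impl OutBuf {
    fn write(&mut self, bytes: &[u8]) -> Result<(), &'static str> {
        if bytes.len() > self.buf.len() {
            // The edge-case: a message larger than the whole buffer
            // can never fit, no matter how often we flush.
            return Err("message larger than buffer");
        }
        if self.len + bytes.len() > self.buf.len() {
            self.flush();
        }
        self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
        self.len += bytes.len();
        Ok(())
    }

    fn flush(&mut self) {
        self.flushed.extend_from_slice(&self.buf[..self.len]);
        self.len = 0;
    }
}

fn main() {
    let mut out = OutBuf { buf: [0; 16], len: 0, flushed: Vec::new() };
    out.write(&[1; 10]).unwrap();
    out.write(&[2; 10]).unwrap(); // doesn't fit: flushes the first 10 bytes
    assert_eq!(out.flushed.len(), 10);
    assert_eq!(out.len, 10);
    assert!(out.write(&[3; 17]).is_err());
}
```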

+

At this point I realized that the code generation was hard to understand and needed a lot of changes to support my +needs. Additionally, making my WM no_std and removing all traces of libc was starting to look achievable.

+

Extreme yak-shaving, generate XCB from scratch

+

This was by far the dumbest part of the process: reworking the entire library to support no_std and generating the structures and code from scratch. While probing the Wayland specification I had written a very basic Rust code generation library, codegen-rs, and I decided to use that for the code generation.

+

After a few weeks I had managed to write a parser for the xproto.xsd, a parser for the actual protocol files, and a +code generator that I could work with.
+A few more weeks followed, and I had finally generated a no_std, fairly optimized library for interfacing with X11 over a socket, mostly by looking at how x11rb does it.

+

Extreme yak-shaving, pt 2, no libc allowed

+

In Rust, libc is the most common way that the standard library interfaces with the OS, with some direct syscalls where necessary. There are many good reasons for using libc, even when not building cross-platform/cross-architecture libraries, but I wanted something pure Rust, so that went out the window.

+

Libc

+

libc does a vast number of things; on Linux there are two implementations that dominate: glibc and musl. I won't go into the details of the differences between them, but simplified, they are C-libraries that make your C-code run as you expect on Linux.
+As libraries they expose methods to interface with the OS, for example reading or writing to a file, +or connecting to a socket.
+Some functions are essentially just proxies for syscalls, but some do more things behind the scenes, like synchronization of shared application resources, such as access to the environment pointer.

+

Removing the std-library functions and replacing them with syscalls

+

I decided to set PGWM to #![no_std] and see what compiled. Many things in std::* are just re-exports from core::* and were easily replaced. For other things, like talking to a socket, I used raw syscalls through the excellent syscall crate and some glue-code to approximate what libc does. It was a bit messy, but not too much work replacing it. PGWM is now 100% not cross-platform, although it wasn't really before either...
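A sketch of what such a raw syscall boils down to on x86_64 Linux, written with inline assembly rather than the syscall crate's macros (so this is an approximation of the technique, not that crate's API): the syscall number goes in rax, arguments in rdi/rsi/rdx, and the `syscall` instruction clobbers rcx and r11. x86_64-only:

```rust
use std::arch::asm;

// Raw `write(2)` on x86_64 Linux, no libc involved.
// Sketch only: a real wrapper would handle errno-style negative returns.
fn sys_write(fd: usize, buf: &[u8]) -> isize {
    let ret: usize;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") 1usize => ret, // 1 = SYS_write on x86_64
            in("rdi") fd,
            in("rsi") buf.as_ptr(),
            in("rdx") buf.len(),
            lateout("rcx") _, // clobbered by the syscall instruction
            lateout("r11") _,
        );
    }
    ret as isize
}

fn main() {
    let msg = b"written without libc\n";
    let n = sys_write(1, msg); // fd 1 = stdout
    assert_eq!(n, msg.len() as isize);
}
```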

+

No allocator

+

Since the standard library provides the allocator, I had to find a new one. I decided to use dlmalloc; it works in no_std, and it was a fairly simple change.

+

Still libc

+

I looked into my crate graph and saw that quite a few dependencies still pulled in libc:

+
    +
  1. time.rs +
  2. toml.rs +
  3. dlmalloc-rs +
  4. smallmap +
+

I got to work forking these libraries and replacing libc with direct syscalls.
+time was easy, just some Cargo.toml magic that could easily be upstreamed.
+toml was a bit trickier, the solution was ugly and I decided not to upstream it.
+dlmalloc-rs was even harder, it used the pthread-api to make the allocator synchronize, and implementing that +was beyond even my yak-shaving. Since PGWM is single threaded anyway I left it as-is and unsafe impl'd +Send and Sync.
+smallmap was fairly simple; upstreaming is in progress.

+

The ghost of libc, time for nightly

+

With no traces of libc I try to compile the WM. It can't start; it doesn't know how to start. The reason is that libc provides the application's entrypoint, _start; without linking libc, Rust doesn't know how to create an entrypoint. As always, the amazing fasterthanli.me has a write-up about how to get around that issue. The solution required nightly and some assembly. Now the application won't compile, but for a different reason: I have no global alloc error handler. When running a no_std binary with an allocator, Rust needs to know what to do if allocation fails, but there is at present no way to provide a handler without another nightly feature, default_alloc_error_handler, which looks like it's about to be stabilized soon (TM). Now the WM works, no_std, no libc, life is good.

+

Tiny-std

+

I was looking at terminal emulator performance. Many new terminal emulators seem to have very poor input performance. I had noticed this on one of the many times PGWM crashed and sent me back to the cold hard tty and its comforting speed. alacritty is noticeably sluggish at rendering keyboard input to the screen, so I went back to xterm, but now that PGWM worked I was toying with the idea of writing a fast, small terminal emulator in pure rust.
+I wanted to share the code I used for that in PGWM with this new application, and clean it up in the process: tiny-std +.

+

The goal of tiny-std is to make a std-compatible no_std library with no libc dependencies available for use with +Linux Rust applications on x86_64 and aarch64, which are the platforms I'm interested in. Additionally, all +functionality +that can work without an allocator should. You shouldn't need to pull in alloc to read/write from a file, just +provide your own buffer.

+

The nightmare of cross-architecture

+

Almost immediately I realize why libc is so well-used. After a couple of hours of debugging a segfault that turned out to be caused by field ordering that differs between architectures, one tends to see the light. Never mind the third time that happens.
+I'm unsure of the best way to handle this, perhaps by doing some libgen straight from the kernel source, but we'll see.

+

Start, what's this on my stack?

+

I wanted to be able to get arguments and preferably environment variables +into tiny-std. Fasterthanli.me +helped with the args, but for the rest I had to go to the musl source.
+When an application starts on Linux, the first 8 bytes of the stack contain argc, the number of input arguments. Following that are the pointers to the null-terminated argument strings (argv), then a null pointer, and then come the pointers to the environment variables.
+musl takes the address of that first environment entry, puts it into a global mutable variable, and that's the environment.
+I buckle under and do the same. I see a world where arguments and environment are passed to main, and where it's the application's job, not the library's, to decide how to handle them in a thread-safe way (although you can use env_p as an argument to main in C).
+Being no better than my predecessors, I store the environment pointer in a static variable; things like spawning processes become a lot simpler that way. C owns the world, we just live in it.
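The layout described above can be illustrated by parsing a faked initial stack; the array of usizes below stands in for the real stack (assumption: 64-bit, so each slot is 8 bytes), and the walk mirrors what the entrypoint glue does:

```rust
// Fake "strings" for the simulated stack to point at.
static ARG0: &[u8] = b"pgwm\0";
static ENV0: &[u8] = b"HOME=/home/user\0";

// Walk an argc/argv/envp layout: argc, then argv pointers, a null,
// then envp pointers, then another null.
// Sketch: `sp` must point at a well-formed layout like the one in main.
fn parse_stack(sp: *const usize) -> (usize, Vec<*const u8>, Vec<*const u8>) {
    unsafe {
        let argc = *sp;
        let mut p = sp.add(1);
        let mut argv = Vec::new();
        while *p != 0 {
            argv.push(*p as *const u8);
            p = p.add(1);
        }
        p = p.add(1); // skip argv's null terminator; envp follows
        let mut envp = Vec::new();
        while *p != 0 {
            envp.push(*p as *const u8);
            p = p.add(1);
        }
        (argc, argv, envp)
    }
}

fn main() {
    let fake_stack: [usize; 5] = [
        1,                      // argc
        ARG0.as_ptr() as usize, // argv[0]
        0,                      // argv terminator
        ENV0.as_ptr() as usize, // envp[0]
        0,                      // envp terminator
    ];
    let (argc, argv, envp) = parse_stack(fake_stack.as_ptr());
    assert_eq!(argc, 1);
    assert_eq!(argv.len(), 1);
    assert_eq!(envp.len(), 1);
    assert_eq!(unsafe { *envp[0] }, b'H'); // first byte of "HOME=..."
}
```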

+

vDSO (virtual dynamic shared object), what, there's more on the stack?

+

Through some coincidence, when trying to make sure all the processes that I spawn don't become zombies, I encounter the vDSO.
+ldd has whispered the words, but I never looked it up.

+
[gramar@grarch marcusgrass.github.io]$ ldd $(which cat)
+        linux-vdso.so.1 (0x00007ffc0f59c000)
+        libc.so.6 => /usr/lib/libc.so.6 (0x00007ff14e93d000)
+        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007ff14eb4f000)
+
+

It turns out to be a shared library between the Linux kernel and a running program, mapped into that program's memory.
+When I read that it provides faster ways to interface with the kernel, I immediately stopped reading and started implementing; I could smell the nanoseconds.

+

Aux values

+

To find out where the VDSO is mapped into memory for an application, the application needs to inspect the +AUX values at runtime. +After the environment variable pointer comes another null pointer, following that are the AUX values. +The AUX values are key-value(like) pairs of information sent to the process. +Among them are 16 random bytes, the pid of the process, the gid, and about two dozen more entries of +possibly useful values.
+I write some more code into the entrypoint to save these values.
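The aux values can be sketched the same way: (key, value) pairs of usizes, terminated by an AT_NULL key. The constants below are from Linux's auxvec.h; the fake array stands in for the real region after envp's null:

```rust
// A few real aux-value keys from Linux's auxvec.h.
const AT_NULL: usize = 0; // end of the aux vector
const AT_PAGESZ: usize = 6; // system page size
const AT_SYSINFO_EHDR: usize = 33; // address of the vDSO ELF header

// Scan (key, value) pairs until AT_NULL, returning the value for `key`.
// Sketch: `auxv` must point at a well-formed, AT_NULL-terminated vector.
fn find_aux(auxv: *const usize, key: usize) -> Option<usize> {
    unsafe {
        let mut p = auxv;
        while *p != AT_NULL {
            if *p == key {
                return Some(*p.add(1));
            }
            p = p.add(2);
        }
        None
    }
}

fn main() {
    // Fake aux vector as it would appear on the stack after envp.
    let fake: [usize; 5] = [AT_PAGESZ, 4096, AT_SYSINFO_EHDR, 0xdead_beef, AT_NULL];
    assert_eq!(find_aux(fake.as_ptr(), AT_PAGESZ), Some(4096));
    assert_eq!(find_aux(fake.as_ptr(), AT_SYSINFO_EHDR), Some(0xdead_beef));
    assert_eq!(find_aux(fake.as_ptr(), 99), None);
}
```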

+

A memory mapped elf-file

+

Among the aux-values is AT_SYSINFO_EHDR, a pointer to the start of the vDSO which is a full +ELF-file mapped into the process' memory.
+Through the Linux vDSO docs I knew that this file contains a function pointer for the clock_gettime function. I had benchmarked tiny-std's Instant::now() vs the standard library's, and found it to be almost seven times slower. I needed to find this function pointer.

+

After reading more Linux documentation, and ELF-documentation, and Linux-ELF-documentation, +I managed to write some code that parses the ELF-file to find the address of the function. +Of course that goes into another global variable, you know, C-world and all that.

+

I created a feature that does the vDSO parsing and, if clock_gettime is found, uses that instead of the syscall. This increased the performance of Instant::now() from ~std * 7 to < std * 0.9. In other words, it now outperforms standard by taking around 12% less time to get the current time from the system.

+

Conclusion

+

I do a lot of strange yak-shaving, mostly for my own learning; I hope that this write-up might have given you something too.
+The experience of taking PGWM to no_std and no libc has been incredibly rewarding, although I think PGWM is mostly +the same, a bit more efficient, a bit less stable.
+I'll keep working out the bugs and the API of tiny-std. Plans to make a minimal terminal emulator are still in the back of my mind; we'll see if I can find the time.

+
+
\ No newline at end of file diff --git a/pgwm04.html b/pgwm04.html new file mode 100644 index 0000000..046027b --- /dev/null +++ b/pgwm04.html @@ -0,0 +1,208 @@ + + + + + + + + + Pgwm04 + + + +
+

PGWM 0.4, io-uring, stability, and static pie linking

+

A while back I decided to look into io-uring for an event-loop for +pgwm, I should have written +about it when I implemented it, but couldn't find the time then.

+

Now that I finally got pgwm to compile +using the stable toolchain, I'm going to write a bit about the way there.

+

Io-uring

+

Io-uring is a linux syscall interface +that allows you to submit io-tasks, and later collect the results of those tasks. +It does so by providing two ring buffers, one for submissions, and one for completions.

+

In the simplest possible terms, you put some tasks on one queue, and later collect them on some other +queue. In practice, it's a lot less simple than that.

+

As I've written about in previous entries on this website, I decided to scrap the std-lib and libc, and write +my own syscall interface in tiny-std.
+Therefore I had to look into the gritty details of how to set up these buffers, you can see those details +here. +Or, look at the c-implementation which I ripped off here.

+

Why io-uring?

+

I've written before about my x11-wm pgwm, but in short: it's an x11-wm based on async socket communication, where the wm reacts to incoming messages, like a key-press, and responds with some set of outgoing messages on that same socket.
+When the WM had nothing to do it used the poll interface to await another message.

+

So the loop could be summed up as:

+
1. Poll until there's a message on the socket.
+2. Read from the socket.
+3. Handle the message.
+
+

With io-uring that could be compacted to:

+
1. Read from the socket when there are bytes available.
+2. Handle the message.
+
+

io-uring sounded cool, and this seemed efficient, so off I went.

+

Why not io-uring?

+

Io-uring is complex: the set-up is complex, and there are quite a few considerations that need to be made. Ring-buffers need to be set up; how big should they be? What if we get an incoming message pile-up? What if we get an outgoing message pile-up? When is the best time to flush the buffers? What settings should I put on the uring?

+

There are more considerations than that, but I didn't really need to tackle most of these issues, since I'm not shipping +a production-ready lib that I'll support indefinitely; I'm just messing around with my WM. I cranked up the buffer +size to more than necessary, and it works fine.

+

Something that I did consider, however, was whether to use SQ-poll; we'll get more into what that is shortly.

+

Sharing memory with the kernel

+

Something that theoretically makes io-uring more efficient than other io-alternatives is that the ring-buffers +are shared with the kernel. There is no need to make a separate syscall for each sent message: if you put a message +on the buffer and update its offset through an atomic operation, it becomes available for the kernel to use.
+But the kernel does need to find out about the submission beyond just the updated state. +There are two ways of doing this:

+
    +
  1. Make a syscall. Write an arbitrary number of tasks to the submission queue, then tell the kernel about them through +a syscall. That same syscall can be used to wait until there are completions available as well; it's very flexible. 
  2. Have the kernel poll the shared memory for changes in the queue-offset and pick tasks up as they're added. Potentially, +this is a large latency-decrease as well as a throughput increase, no more waiting for syscalls! +
+
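As a rough illustration of option 1 — a conceptual sketch of my own, not tiny-std's actual ring layout — publishing a submission amounts to writing the entry and then storing the new tail with Release ordering; without SQ-poll, an io_uring_enter(2) syscall would then tell the kernel how many new entries exist:

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

// Conceptual stand-in for the shared submission ring. N must be a power of two.
struct SubmissionQueue<const N: usize> {
    entries: [AtomicU64; N], // stand-in for real submission queue entries
    tail: AtomicU32,
}

impl<const N: usize> SubmissionQueue<N> {
    fn new() -> Self {
        Self {
            entries: std::array::from_fn(|_| AtomicU64::new(0)),
            tail: AtomicU32::new(0),
        }
    }

    fn push(&self, sqe: u64) {
        let tail = self.tail.load(Ordering::Relaxed);
        // Write the entry into the slot the current tail points at.
        self.entries[tail as usize & (N - 1)].store(sqe, Ordering::Relaxed);
        // Publish: a consumer loading `tail` with Acquire now also sees the entry.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        // Without SQ-poll, an io_uring_enter(2) syscall would follow here, telling
        // the kernel how many new submissions to consume (and optionally waiting
        // for completions). With SQ-poll, the kernel notices the new tail on its own.
    }
}
```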

I thought this sounded great; in practice however, SQPoll resulted in a massive cpu-usage increase. I couldn't +tolerate that, so I'll have to save that setting for a different project. +In the end, io-uring didn't change much about pgwm.

+

Stable

+

Since I ripped out libc, pgwm has required nightly to build, and this has bothered me quite a bit. +The nightly compiler was necessary because tiny-std used the #[naked] feature to create +the assembly entrypoint (the _start function), where the application starts execution.

+

Asm to global_asm

+

To be able to get aux-values, the environment variable pointer, and the arguments passed to the binary, access to +the stack-pointer at its start-position is required. Therefore, a function that doesn't mess up the stack needs to be +injected, passing that pointer to a normal function that can extract what's necessary.

+

An example:

+
/// Binary entrypoint
+#[naked]
+#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+pub unsafe extern "C" fn _start() {
+    // Naked function making sure that main gets the first stack address as an arg
+    #[cfg(target_arch = "x86_64")]
+    {
+        core::arch::asm!("mov rdi, rsp", "call __proxy_main", options(noreturn))
+    }
+    #[cfg(target_arch = "aarch64")]
+    {
+        core::arch::asm!("MOV X0, sp", "bl __proxy_main", options(noreturn))
+    }
+}
+/// Called with a pointer to the top of the stack
+#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+unsafe fn __proxy_main(stack_ptr: *const u8) {
+    // First 8 bytes are a u64 with the number of arguments
+    let argc = *(stack_ptr as *const u64);
+    // Directly followed by those arguments, bump pointer by 8
+    let argv = stack_ptr.add(8) as *const *const u8;
+    let ptr_size = core::mem::size_of::<usize>();
+    // Directly followed by the environment variable pointers, a null-terminated array where each entry points to a null-terminated string.
+    // This isn't specified in Posix and is not great for portability, but we're targeting Linux so it's fine
+    let env_offset = 8 + argc as usize * ptr_size + ptr_size;
+    // Bump pointer by combined offset
+    let envp = stack_ptr.add(env_offset) as *const *const u8;
+    unsafe {
+        ENV.arg_c = argc;
+        ENV.arg_v = argv;
+        ENV.env_p = envp;
+    }
+    ...etc
+
+

I got this from an article by fasterthanli.me, but later realized that +you can use the global_asm macro to generate the full function instead:

+
// Binary entrypoint
+#[cfg(all(feature = "symbols", feature = "start", target_arch = "x86_64"))]
+core::arch::global_asm!(
+    ".text",
+    ".global _start",
+    ".type _start,@function",
+    "_start:",
+    "mov rdi, rsp",
+    "call __proxy_main"
+);
+
+

Symbols

+

While this means that tiny-std itself could potentially be part of a binary compiled with stable, +if one would like to use, for example, alloc to have an allocator, then rustc would start emitting references to symbols +like memcpy, which Rust doesn't provide for some reason.

+

The solution to the missing symbols is simple enough: these symbols are provided in the external +compiler-builtins library, but that uses a whole host of features +that require nightly. So I copied the implementation (and license), removed the dependencies on nightly features, and +exposed the symbols in tiny-std.
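For illustration, the simplest of those symbols looks roughly like the sketch below. The real implementations in compiler-builtins are far more optimized; the function name here is deliberately not `memcpy`, since exporting it as such with `#[no_mangle] extern "C"` (which is what the linker actually needs) would clash with the host libc's symbol in a normal build:

```rust
/// Naive byte-by-byte copy. In a no-libc build, the real thing is exported as
/// `memcpy` with `#[no_mangle] pub unsafe extern "C"` so that the calls rustc
/// emits can be resolved at link time.
unsafe fn naive_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    let mut i = 0;
    while i < n {
        // Plain forward copy; memcpy is allowed to assume non-overlapping buffers.
        *dest.add(i) = *src.add(i);
        i += 1;
    }
    dest
}
```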

+

Now an application (like pgwm), can be built with the stable toolchain using tiny-std.

+

Static

+

In my boot-writeup I wrote about creating a minimal Rust bootloader. A problem I encountered was that it needed +an interpreter. You can't see it with ldd:

+
[21:55:04 gramar@grarch marcusgrass.github.io]$ ldd ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+        statically linked
+
+

Ldd lies (or maybe technically not), using file:

+
file ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9b54c91e5e84a8d3c90fdb9523f46e09cbf5c6e2, stripped
+
+

Or readelf -S:

+
+[21:57:21 gramar@grarch marcusgrass.github.io]$ readelf -S ../pgwm/target/x86_64-unknown-linux-gnu/lto/pgwm
+There are 18 section headers, starting at offset 0x16a0b0:
+Section Headers:
+  [Nr] Name              Type             Address           Offset
+       Size              EntSize          Flags  Link  Info  Align
+  [ 0]                   NULL             0000000000000000  00000000
+       0000000000000000  0000000000000000           0     0     0
+  [ 1] .interp           PROGBITS         00000000000002a8  000002a8
+       000000000000001c  0000000000000000   A       0     0     1
+  [ 2] .note.gnu.bu[...] NOTE             00000000000002c4  000002c4
+       0000000000000024  0000000000000000   A       0     0     4
+  [ 3] .gnu.hash         GNU_HASH         00000000000002e8  000002e8
+       000000000000001c  0000000000000000   A       4     0     8
+  [ 4] .dynsym           DYNSYM           0000000000000308  00000308
+       0000000000000018  0000000000000018   A       5     1     8
+  [ 5] .dynstr           STRTAB           0000000000000320  00000320
+       0000000000000001  0000000000000000   A       0     0     1
+  [ 6] .rela.dyn         RELA             0000000000000328  00000328
+       0000000000008310  0000000000000018   A       4     0     8
+  [ 7] .text             PROGBITS         0000000000009000  00009000
+       000000000013d5a4  0000000000000000  AX       0     0     16
+  [ 8] .rodata           PROGBITS         0000000000147000  00147000
+       000000000000eb20  0000000000000000   A       0     0     32
+  [ 9] .eh_frame_hdr     PROGBITS         0000000000155b20  00155b20
+       0000000000001a8c  0000000000000000   A       0     0     4
+  [10] .eh_frame         PROGBITS         00000000001575b0  001575b0
+       000000000000c1dc  0000000000000000   A       0     0     8
+  [11] .gcc_except_table PROGBITS         000000000016378c  0016378c
+       000000000000000c  0000000000000000   A       0     0     4
+  [12] .data.rel.ro      PROGBITS         0000000000164e28  00163e28
+       0000000000006088  0000000000000000  WA       0     0     8
+  [13] .dynamic          DYNAMIC          000000000016aeb0  00169eb0
+       0000000000000110  0000000000000010  WA       5     0     8
+  [14] .got              PROGBITS         000000000016afc0  00169fc0
+       0000000000000040  0000000000000008  WA       0     0     8
+  [15] .data             PROGBITS         000000000016b000  0016a000
+       0000000000000008  0000000000000000  WA       0     0     8
+  [16] .bss              NOBITS           000000000016b008  0016a008
+       0000000000000458  0000000000000000  WA       0     0     8
+  [17] .shstrtab         STRTAB           0000000000000000  0016a008
+       00000000000000a8  0000000000000000           0     0     1
+
+

Both file and readelf (the .interp section) show that this binary needs an interpreter, namely +/lib64/ld-linux-x86-64.so.2. If the binary is run in an environment without it, it +will immediately crash.

+

If compiled statically with RUSTFLAGS='-C target-feature=+crt-static' the application segfaults, oof.

+

I haven't found out the reason why tiny-std cannot run as a +position-independent executable. +Or rather, I know why: all the addresses of symbols (like static variables) are wrong. What I don't know yet is +how to fix it.

+

There is a no-code way of fixing it though: RUSTFLAGS='-C target-feature=+crt-static -C relocation-model=static'.
+This way the application will be statically linked, without requiring an interpreter, but it will not be +position independent.

+

If you know how to make that work, please tell me, because figuring that out isn't easy.

+

Future plans

+

I'm tentatively looking into making threading work, but that is a lot of work and a +lot of segfaults on the way.

+
+
\ No newline at end of file diff --git a/rust-kbd.html b/rust-kbd.html new file mode 100644 index 0000000..6f7efe7 --- /dev/null +++ b/rust-kbd.html @@ -0,0 +1,818 @@ + + + + + + + + + RustKbd + + + +
+

Building keyboard firmware in Rust, an embedded journey

+

Last time, I wrote about enabling Symmetric Multiprocessing on a keyboard using +QMK (and Chibios).
+This turned out to be a bad idea, or at least the way I was doing it was, as I was told by a maintainer: QMK +is not made for multithreading (yet).

+

My daughter sleeps a lot during the days, so I decided to step up the level of ambition a bit: +Can keyboard firmware be reasonably written from "scratch" using Rust, I asked myself, and found out that it can.

+

Overview

+

This writeup is about how I wrote multicore firmware using Rust for a lily58 PCB, +and a Liatris (rp2040-based) +microcontroller. The code for it is here.

+
    +
  1. Callback to the last writeup +
  2. Embedded on Rust +
  3. Development process (Serial interfaces) +
  4. Figuring out the MCU<->PCB interplay using QMK +
  5. Split keyboard communication woes +
  6. Keymaps +
  7. USB HID Protocol +
  8. OLED displays +
  9. BUUUUGS +
  10. Performance +
  11. Epilogue +
+

On the last episode of 'Man wastes time reinventing wheel'

+

Last time I did a pretty thorough dive into QMK, explaining keyboard basics, and most of the jargon used.
+I'm not going to be as thorough this time, but briefly:

+

Enthusiast keyboards

+

There are communities building enthusiast keyboards, often soldering components together themselves, and tailoring their +own firmware to fit their needs (or wants).

+

Generally, a keyboard consists of the PCB, microcontroller (sometimes integrated with the PCB), switches that go on the PCB, +and keycaps that go on the switches. Split keyboards are also fairly popular, those keyboards generally have two separate PCBs +that are connected to each other by wire, I've been using the split keyboard +iris for a long time. +There are also peripherals, such as rotary encoders, +oled displays, sound emitters, RGB lights and many more that can be integrated +with the keyboard. Pretty much any peripheral that the microcontroller can interface with is a possible add-on to +a user's keyboard.

+

QMK

+

To get the firmware together, an open source firmware repo called QMK can be used. There are a few others but to my +knowledge QMK is the most popular and mature alternative. You can make a keymap without writing any code at all, +but if you want to interface with peripherals, or execute advanced logic, some C-code will be necessary.

+

Back to last time

+

I bought a microcontroller which has dual cores, and I wanted to use them to offload oled-drawing to the core that +doesn't handle latency-sensitive activities, and did a deep dive into enabling that for my setup. +While it worked, it was not thread-safe and was generally discouraged by the maintainers.

+

That's when I decided to write my own firmware in Rust.

+

Embedded on Rust

+

I hadn't written code for embedded targets before my last foray into keyboard firmware; I had some tangential experience +with the heapless library, which exposes stack-allocated collections. +These can be useful for performance in some cases, and essential if you haven't got a heap at all, as you +often won't on embedded devices.

+

I searched for rp2040 Rust and found rp-hal; HAL stands for Hardware +Abstraction Layer, and the crate exposes high-level code to interface with low-level processor and peripheral functionality.

+

For example, spawning a task on the second core, resetting to bootloader, reading GPIO +pins, and more. This was a good starting point, when I found this project I had already soldered together +the keyboard and was ready to write firmware for it.

+

CPU and board

+

rp-hal provides access to the basic CPU-functionality, but that CPU is mounted on a board, which has its own +peripherals, in this case the Liatris. The mapping of the outputs +of the board to code is called a Board Support Package (BSP), and can be put in the +rp-hal-boards repo so that it can be shared. +I haven't made a PR for my fork yet; +I'm planning to do it when I've worked out all remaining bugs in my code, but it's very much based on the +rp-pico BSP.

+

Starting development

+

Now I wanted to get any firmware running just to see that it's working.

+

USB serial

+

The Liatris MCU has an integrated USB-port. I figured that the easiest way to see whether the firmware boots and works +at all was to implement some basic communication over that port; until I can get some information out of the MCU, +I'm flying completely blind.

+

The rp-pico BSP examples +were excellent, using them I could set up a serial interface which just echoed back what was written to it to the OS.

+

Hooking the serial interface up to the OS was another matter though. I compiled the firmware and +flashed it to the keyboard by holding down the onboard boot-button and pressing reset, then went to +figure out the OS parts.

+

USB CDC ACM

+

After some searching I realize that I need some drivers to connect to the serial device: +USB CDC ACM, USB and two meaningless letter combinations. Together they stand for

+
+

Universal Serial Bus Communication Device Class Abstract Control Model

+
+

When the correct drivers are installed, and the keyboard plugged in, dmesg +tells me that there's a new device under /dev/ttyACM0.

+
echo "Hello!" >> /dev/ttyACM0
+
+

No response.

+

I do some more searching and find out that two-way communication with serial devices over the CDC-ACM-driver +isn't as easy as echoing and cat-ing a file. minicom is a program +that can interface with this kind of device, but the UX was obtuse; looking for alternatives I found +picocom, which serves the same purpose but is slightly nicer to use:

+
[root@grentoo /home/gramar]# picocom -b 115200 -l /dev/ttyACM0
+picocom v3.1
+port is        : /dev/ttyACM0
+flowcontrol    : none
+baudrate is    : 115200
+parity is      : none
+databits are   : 8
+stopbits are   : 1
+escape is      : C-a
+local echo is  : no
+noinit is      : no
+noreset is     : no
+hangup is      : no
+nolock is      : yes
+send_cmd is    : sz -vv
+receive_cmd is : rz -vv -E
+imap is        : 
+omap is        : 
+emap is        : crcrlf,delbs,
+logfile is     : none
+initstring     : none
+exit_after is  : not set
+exit is        : no
+Type [C-a] [C-h] to see available commands
+Terminal ready
+
+

There's a connection! Enabling echo and writing hello gives the output hHeElLlLoO, the Liatris responding +with a capitalized echo.

+

Making DevEx nicer

+

I write some code that checks the last entered characters and executes commands depending on what they are. +First off, making a reboot easier:

+
if last_chars.ends_with(b"boot") {
+    reset_to_usb_boot(0, 0);
+}
+
+

Great, now I can connect to the device and type boot, and it'll boot into flash-mode so that I can load new firmware +onto it. This made iterating much faster. Before this, since everything was soldered and mounted, I had to use a (wooden) skewer +to reach under the oled and press the boot button on the microcontroller. I recommend not soldering on +components that block access to the boot-button if doing this kind of programming.

+

Developing actual keyboard functionality

+

There are schematics for the pcb +online, as well as a schematic of the pinout of the elite-c MCU, +which the developers told me is the same as for the Liatris; this seems to be true.

+

Rows and columns are connected to GPIO-pins on the MCU; switches connect rows and columns, and when a switch is pressed, a current can flow between them. +My first thought was that if the switch that sits between row0 and col0 is pressed, the pins for row0 and col0 would read +high (or low); that's not the case.

+

PullUp and PullDown resistors

+

Here is where my complete ignorance of embedded came to haunt me: GPIO pins can be configured to be either PullUp or +PullDown. What that meant was beyond me, and it still is to a large extent. The crux of it is that +there's a resistor connected to either power or ground, up or down respectively.

+

That made some sense to me; I figured either the rows or the columns should be PullUp while the other is PullDown. +This did not produce any reasonable results either. +At this point, I had written some debug-code which scanned all GPIO-pins and printed when their state changed, and +I was mashing keyboard buttons with strange output as a result.

+

I was getting frustrated with the non-progress and decided to look into QMK. There's a lot of __weak__-linkage, +the abstract class of C, so actually following the code in QMK +can be difficult, which is why I hadn't browsed it in more depth earlier.

+

But I did find the problem. All pins, rows and columns, should be pulled high (PullUp), +then the column that should be checked is set low, and then all rows are checked; if any row goes low, the switch +connecting the checked column and that row is being pressed. In other words:

+

Set col0 low; if row0 is still high, switch (0, 0), the top-left for example, is not pressed. +If row1 is now low, it means that switch (1, 0), the first key on the second row, is being pressed.

+

Now I can detect which keys are being pressed, useful functionality for a keyboard.
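The scan described above can be sketched like this — a hedged model of my own where the GPIO reads are replaced by a closure, so the logic runs off-hardware; the real firmware drives and reads actual pins:

```rust
const ROWS: usize = 5;
const COLS: usize = 6;

/// `switch_closed(row, col)` models whether the switch joining that row and
/// column is currently conducting; on hardware this is a row-pin read.
fn scan_matrix(switch_closed: &dyn Fn(usize, usize) -> bool) -> [[bool; COLS]; ROWS] {
    let mut state = [[false; COLS]; ROWS];
    for col in 0..COLS {
        // On hardware: drive this column's pin low here.
        for row in 0..ROWS {
            // A pulled-up row pin only reads low if a closed switch connects
            // it to the driven (low) column.
            state[row][col] = switch_closed(row, col);
        }
        // On hardware: release the column back to pulled-up high, and wait
        // for the row pins to settle before scanning the next column.
    }
    state
}
```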

+

Split keyboards

+

Looking back at the schematic I see that there's a pin +labeled side-indicator, that either goes to ground or voltage. After a brief check it reads, as expected, high on the left +side, and low on the right side.

+

Now that I can detect which keys are being pressed, by coordinates, and which side is being run, +it's time to transmit key-presses from the right-side to the left.

+

The reason to do it that way is that the left is the side that I'm planning on connecting to the computer with a +usb-cable. Now, I could have written the code to be side-agnostic, checking whether a USB-cable is connected and choosing +whether to send key-presses over the wire connecting the sides, or over the USB-cable. However, that approach increases both +complexity and binary size, so I opted not to.

+

Stupid note

+

I could also have made each side a separate independent keyboard, which would have been pretty fun, but problematic +for a lot of reasons, like using left shift while pressing a right-side key; I'd have to have software on the computer to patch them +together.

+

Bits over serial

+

Looking at the schematics again, I see that one pin is labeled DATA; that pin is the one connected to the +pad that the TRRS cable connects the sides with.
+However, there is only one pin on each side, which means that all communication is limited to setting/reading +high/low on a single pin. Transfer is therefore limited to one bit at a time.

+

Looking over the default configuration for my keyboard in QMK the BitBang +driver is used since nothing else is specified, there are also USART, single- and full-duplex available.

+

UART/USART

+

UART stands for Universal Asynchronous +Receiver-Transmitter, and is a protocol (although the wiki says a peripheral device, terminology unclear) +to send bits over a wire.

+

There is a UART-implementation for the rp2040 in the rp-hal-crate, but it +assumes usage of the built-in uart-peripheral, which uses both an RX- and a TX-pin in pre-defined positions. In my case, +I want to either have half-duplex communication (one side communicates at a time), or simplex communication from right +to left. That means that the DATA-pin on the left side should be UART-RX (receiver), while the DATA-pin on +the right is UART-TX (transmitter).

+

I search further for single-pin UART and find out about PIO.

+

PIO

+

The rp2040 has blocks with state-machines which can run like a separate processor, manipulating and reading +pin-states. These can be programmed with specific assembly, and there just happens to be someone +who programmed a uart-implementation in that assembly here.

+

It also turns out that someone ported that implementation to a Rust library here.

+

I hooked up the RX-part to the left side, and the TX to the right, and it worked!

+

Note

+

You could probably make a single-pin half-duplex uart implementation by modifying the above pio-asm, and not by that much. +You'd just have to figure out how to wait on either data in the input register from the user program, or on communication +starting from the other side. There's a race-condition there though; maybe I'll get to that later.

+

Byte-protocol

+

Since I'm using hardware to send data bit-by-bit, I made a slimmed-down protocol. The right side has 28 buttons and a +rotary-encoder. A delta can fit into a single byte.

+
+

Edit 2024-04-17

+

Changed this to two bytes, where the content is sandwiched between a header and footer like this:

+
const HEADER: u16 = 0b0101_0000_0000_0000;
+const FOOTER: u16 = 0b0000_0000_0000_0101;
+// convert 8 bit msg into 16 bits, shift it 4 to the left
+// Then OR with header and footer to create 16 bits with the actual message at the middle
+let msg = ((byte_to_send as u16) << 4) | HEADER | FOOTER;
+
+

The reason is that if the right-side is disconnected and reconnected, the lowering and then +raising of the uart-pin becomes a valid message, but it'll be wrong. Either it will be all 0s or all 1s +at the head or tail of the message, which these bit-patterns eliminate.
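The receive side — my sketch of it, not necessarily the exact code in the repo — can then reject any frame whose outer bits don't match before unshifting the payload:

```rust
const HEADER: u16 = 0b0101_0000_0000_0000;
const FOOTER: u16 = 0b0000_0000_0000_0101;
// The header occupies the top 4 bits, the footer the bottom 4.
const FRAME_MASK: u16 = 0b1111_0000_0000_1111;

/// Returns the payload byte if the frame is well-formed, otherwise None.
/// The all-zero and all-one junk frames produced by a reconnect fail the
/// mask check, which is the point of the header/footer bit-patterns.
fn decode_frame(frame: u16) -> Option<u8> {
    if frame & FRAME_MASK != HEADER | FOOTER {
        return None;
    }
    Some(((frame >> 4) & 0xFF) as u8)
}
```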

+
+

Visualizing the keyboard's keys as a matrix with 5 rows and 6 columns, there are at most 30 keys. +Each key can be translated into a matrix-index, where (0, 0) -> 0, (1, 0) -> 6, and (2, 3) -> 15, by rolling out +the 2d-array into a 1d one.

+

In the protocol, the first 5 bits gives the matrix-index of the key that changed. The 6th bit is whether +that key was pressed or released, the 7th bit indicates whether the rotary-encoder has a change, and the 8th +bit indicates whether that change was clock- or counter-clockwise.

+

For better or worse, almost all bit-patterns are valid; some may represent keys that do not exist, since there are +28 keys but 32 slots for the 5 bits indicating the matrix-index.
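A sketch of that layout — the exact bit positions here are my assumption of one consistent ordering, not necessarily the ones used in the repo:

```rust
const COLS: u8 = 6;

/// Roll the (row, col) coordinate out into a 1d matrix-index.
fn matrix_index(row: u8, col: u8) -> u8 {
    row * COLS + col
}

/// Bits 0-4: matrix-index, bit 5: pressed/released,
/// bit 6: encoder changed, bit 7: encoder direction.
fn encode_delta(matrix_index: u8, pressed: bool, encoder: bool, clockwise: bool) -> u8 {
    (matrix_index & 0b1_1111)
        | ((pressed as u8) << 5)
        | ((encoder as u8) << 6)
        | ((clockwise as u8) << 7)
}

fn decode_delta(byte: u8) -> (u8, bool, bool, bool) {
    (
        byte & 0b1_1111,
        byte & (1 << 5) != 0,
        byte & (1 << 6) != 0,
        byte & (1 << 7) != 0,
    )
}
```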

+

I used the bitvec crate for bit-manipulation when prototyping, +that library is excellent.
+I warmly recommend it, even though I went with a more custom solution for performance reasons (I made some +specific optimizations to my use-case, see 'Performance').

+

Keymap

+

Now, to send key-presses to the OS, of course there's a crate for that.

+

It helps with the plumbing and exposes the struct that I've got to send to the OS (and the API to do the sending), +I just have to fill it with reasonable values:

+
/// Struct that the OS wants
+pub struct KeyboardReport {
+    pub modifier: u8,
+    pub reserved: u8,
+    pub leds: u8,
+    pub keycodes: [u8; 6],
+}
+
+

I found this pdf from usb.org, which specifies keycode and modifier +values. I encoded those as a struct.

+
#[repr(transparent)]
+#[derive(Copy, Clone, Debug, Eq, PartialEq)]
+pub struct KeyCode(pub u8);
+#[allow(dead_code)]
+impl KeyCode {
+    //Keyboard = 0x01; //ErrorRollOver1 Sel N/A 3 3 3 4/101/104
+    //Keyboard = 0x02; //POSTFail1 Sel N/A 3 3 3 4/101/104
+    //Keyboard = 0x03; //ErrorUndefined1 Sel N/A 3 3 3 4/101/104
+    pub const A: Self = Self(0x04); //a and A2 Sel 31 3 3 3 4/101/104
+    pub const B: Self = Self(0x05); //b and B Sel 50 3 3 3 4/101/104
+    // ... etc etc etc
+
+

Now I know which button is pressed by coordinates, and how to translate those to values that the OS can understand.

+

And it works! Kind of...

+

USB HID Protocol?

+

I will admit that I did not read the entire PDF. What I did find out was that there's a poll-rate that the OS specifies; +I set that to the lowest possible value, 1ms. Every 1ms, the OS triggers an interrupt:

+
/// Interrupt handler
+/// Safety: Called from the same core that publishes
+#[interrupt]
+#[allow(non_snake_case)]
+#[cfg(feature = "hiddev")]
+unsafe fn USBCTRL_IRQ() {
+    crate::runtime::shared::usb::hiddev_interrupt_poll();
+}
+
+

Oh right, interrupts

+

Interrupts are a way for the processor to interrupt currently executing code +and execute something else; interrupt handlers are similar to Linux signal handlers.

+

In this specific case, the USB-peripheral generates an interrupt when polled, the core that registered an interrupt +handler for that specific interrupt (USBCTRL_IRQ) will pause current execution and run the code contained in +the interrupt-handler.

+

This has the potential of triggering UB with unsafe code (depending on where the core was stopped, it may have been holding +a mutable reference which the interrupt handler needs), and of deadlocking code that guards against multiple mutable +references through locking.

+

One way to handle this, if using mutable statics (which you almost certainly have to without an allocator), +is to execute sensitive code within a critical_section; of course, +there's a library for that.
+The critical-section, when entered, causes the core to ignore interrupts until exited.

+
// Both of these functions use the same static mut variable
+#[cfg(feature = "hiddev")]
+pub unsafe fn try_push_report(keyboard_report: &usbd_hid::descriptor::KeyboardReport) -> bool {
+    // This core won't be interrupted while handling the mutable reference.
+    // A regular lock without a critical section here would cause a deadlock in the below interrupt handling procedure 
+    // if timing is unfortunate.
+    critical_section::with(|_cs| {
+        USB_HIDDEV
+            .as_mut()
+            .is_some_and(|hid| hid.try_submit_report(keyboard_report))
+    })
+}
+#[cfg(feature = "hiddev")]
+pub unsafe fn hiddev_interrupt_poll() {
+    // This core won't be interrupted, because there's only one interrupt registered, so there's nothing to interrupt this.
+    // Since it's already interrupted the core that handles the other mutable reference to this variable 
+    // we can be certain that this is the only mutable reference active without a critical section or other lock.
+    if let Some(hid) = USB_HIDDEV.as_mut() {
+        hid.poll();
+    }
+}
+
+

USB HID protocol

+

Back to the protocol: the API has two ends, one for polling the OS and one for submitting HID-reports.
+It turns out that even if you don't expect any data from the OS, the device needs to be polled to communicate.

+

In my first shot I just pushed keyboard reports on every diff, polling immediately after. This caused +key-actions to disappear; they didn't reach the OS.

+

I still haven't quite figured out why, since I'm not overflowing the buffer; digging into the code, which was pretty opaque, didn't help me +understand much either.

+

I settled for pushing at most one keyboard report per poll, which means at most one per ms. +That gives a worst-case latency of 1ms on a key-action, assuming there's no queue-backup; I keep any unpublishable +reports in a queue that's drained one entry per poll. Again, there may be something written in the specifications +about this, but it's good enough for now.
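That queue can be sketched as a plain fixed-capacity FIFO — a minimal stand-in of my own, not the exact structure in the repo — drained one entry per poll:

```rust
/// Fixed-capacity FIFO of pending reports (modeled here as raw 8-byte
/// arrays, the size of a standard boot-keyboard report).
struct ReportQueue<const N: usize> {
    buf: [[u8; 8]; N],
    head: usize,
    len: usize,
}

impl<const N: usize> ReportQueue<N> {
    const fn new() -> Self {
        Self { buf: [[0; 8]; N], head: 0, len: 0 }
    }

    /// Enqueue a report produced by a key-state diff.
    fn push(&mut self, report: [u8; 8]) -> bool {
        if self.len == N {
            return false; // full: caller decides whether to drop or retry
        }
        self.buf[(self.head + self.len) % N] = report;
        self.len += 1;
        true
    }

    /// Called once per 1ms USB poll: submit at most one queued report.
    fn pop(&mut self) -> Option<[u8; 8]> {
        if self.len == 0 {
            return None;
        }
        let report = self.buf[self.head];
        self.head = (self.head + 1) % N;
        self.len -= 1;
        Some(report)
    }
}
```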

+

Follow-up

+

I did try to find more information about the USB HID protocol but was unable to. +I also tried to figure out how to do keyrollover, specifically NKRO, +but could not figure out how to have more registered keys than the keyboard_report-struct +can fit (6), so the keyboard is 6KRO, which is fine by me.

+

Oled displays

+

One of the motivators for using multiple cores was the ability to render to the oled on-demand with low latency.

+

Drawing to an oled display is comparatively slow, so offloading that to a separate core was something that I was interested +in doing.

+

I created a shared message queue guarded by a spin-lock:

+
#[derive(Debug, Copy, Clone)]
+pub enum KeycoreToAdminMessage {
+    // Notify on any user action
+    Touch,
+    // Send loop count to calculate scan latency
+    Loop(LoopCount),
+    // Output which layer is active
+    LayerChange(KeymapLayer),
+    // Output bytes received over UART
+    Rx(u16),
+    // Write a boot message then trigger usb-boot
+    Reboot,
+}
+
+

When displayed it looks like this:

+

oleds

+

Setting it up was pretty trivial, there's a library for SSD1306 oleds +which works great!

+

Now I have a keyboard that can submit key-presses to the OS, and display some debug information on its oleds, +time to get into the bugs.

+

BUUUUUUUGS

+

Almost immediately when trying to type, I discovered that keys would be repeated; pressing t would result in +19 t's, for example.

+

Spooky electrons, debounce!

+

I looked into QMK once more, since my keyboard with QMK firmware doesn't have these issues (i.e. it's not a hardware problem).
+All excerpts of C below are from QMK, license here.

+

Here's the function that reads pins:

+
/// quantum/matrix.c
+__attribute__((weak)) void matrix_read_rows_on_col(matrix_row_t current_matrix[], uint8_t current_col, matrix_row_t row_shifter) {
+    bool key_pressed = false;
+    // Select col
+    if (!select_col(current_col)) { // select col
+        return;                     // skip NO_PIN col
+    }
+    matrix_output_select_delay();
+    // For each row...
+    for (uint8_t row_index = 0; row_index < ROWS_PER_HAND; row_index++) {
+        // Check row pin state
+        if (readMatrixPin(row_pins[row_index]) == 0) {
+            // Pin LO, set col bit
+            current_matrix[row_index] |= row_shifter;
+            key_pressed = true;
+        } else {
+            // Pin HI, clear col bit
+            current_matrix[row_index] &= ~row_shifter;
+        }
+    }
+    // Unselect col
+    unselect_col(current_col);
+    matrix_output_unselect_delay(current_col, key_pressed); // wait for all Row signals to go HIGH
+}
+
+

I had looked at it previously, but disregarded those delays (matrix_output_select_delay() and +matrix_output_unselect_delay(current_col, key_pressed); // wait for all Row signals to go HIGH), because +we're trying to be speedy here. Thread.sleep() isn't speedy, everyone knows that.

+

However, it turns out that they are important. Again I have to follow weak functions, a nightmare:

+
/// quantum/matrix_common.c
+__attribute__((weak)) void matrix_output_select_delay(void) {
+    waitInputPinDelay();
+}
+// Found implementation in -> 
+/// platform/chibios/_wait.h
+#ifndef GPIO_INPUT_PIN_DELAY
+#    define GPIO_INPUT_PIN_DELAY (CPU_CLOCK / 1000000L / 4)
+#endif
+#define waitInputPinDelay() wait_cpuclock(GPIO_INPUT_PIN_DELAY)
+
+

I get no editor support in this project, so I had to grep through countless board implementations until I found +the correct one, which isn't exactly easy to tell. But: after setting the col-pin low, there's a 250ns wait.

+

I implement it, and it changes nothing. On to the next!

+
/// quantum/matrix_common.c
+__attribute__((weak)) void matrix_output_unselect_delay(uint8_t line, bool key_pressed) {
+    matrix_io_delay();
+}
+/// quantum/matrix_common.c
+/* `matrix_io_delay ()` exists for backwards compatibility. From now on, use matrix_output_unselect_delay(). */
+__attribute__((weak)) void matrix_io_delay(void) {
+    wait_us(MATRIX_IO_DELAY);
+}
+// quantum/matrix_common.c
+#ifndef MATRIX_IO_DELAY
+#    define MATRIX_IO_DELAY 30
+#endif
+
+

For all of the above symbols, I needed to check that they weren't specifically overridden by my keyboard implementation; none were. matrix_output_unselect_delay(current_col, key_pressed) therefore waits 30μs.

+

I add the delay and the number of t's goes from 19 to sometimes many: good, not great. But my scan time, which directly influences press latency, goes from around 40μs to 200μs+ (6 columns, each with a 30μs sleep), which is unacceptable. The above code did come with a comment though: it waits for the row-pins to settle back into high, so I could just check for that instead!

+
// Wait for all rows to settle
+for row in rows {
+    while matches!(row.0.is_low(), Ok(true)) {}
+}
+
+

Now latency lands around 50μs. I still have that issue of the many t's, but at least the problem didn't get worse.

+

I hook up the keyboard to picocom and start reading output lines.
+I output each state-delta as M0, R0, C0 -> true [90237], matrix index, row_index, column index, and whether the key +is pressed or not, followed by the number of microseconds since the last state-change.

+

I can see that the activation-behavior is strange: sometimes, immediately (generally around 250μs) after a legitimate key-action, the state flips unexpectedly and holds in the ghost-state for 100-2500μs. It's not a rogue flip; the state actually changes as if the switch were pressed (or released) for quite some time.

+

However much I tried, I could not get these ghosts out of my keyboard, I had to learn to live with them.

+

Debouncing

+

Debouncing is a way to regulate signals (I think, this really isn't my field, don't roast me on the definitions), and +is a broad concept which can be applied to noisy signals in all kinds of areas.

+

I wanted to implement debouncing in a way that affected latency minimally. Luckily this behaviour is only triggered after legitimate key-actions, and on a per-key basis. I.e., I only have to regulate keys after the first signal, which I know is good, and only for the same key that produced the good signal.

+

I record the last key-action and set up quarantine logic, it goes like this:

+
+

If a key has a delta shortly (implemented with a constant, 10_000 micros at writing) after the previous delta, +require that the new state is repeated for a short (same as above) time before producing a signal.

+
+

My fastest repeated key-pressing of a single key is around 40_000μs between presses, so this should not activate +on good presses. Furthermore, if it does and that state is held for long enough the key comes through anyway.

+

This worked like a charm, on a given keypress it should not increase latency at all, but it killed the noise.
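As a standalone sketch, the quarantine logic might look something like this; the names, tick representation, and exact bookkeeping are illustrative, not the firmware's actual code:

```rust
// Deltas within this window of the last accepted state-change are quarantined.
const QUARANTINE_MICROS: u64 = 10_000;

#[derive(Default)]
struct KeyDebounce {
    last_emitted_at: u64,         // micros of the last accepted state-change
    reported: bool,               // last state we emitted
    pending: Option<(bool, u64)>, // (candidate state, micros first seen)
}

impl KeyDebounce {
    /// Feed a raw pin sample at `now` micros; returns Some(state) when a
    /// state-change should actually be emitted.
    fn update(&mut self, raw: bool, now: u64) -> Option<bool> {
        if raw == self.reported {
            // Raw agrees with what we've emitted: drop any pending ghost.
            self.pending = None;
            return None;
        }
        let quarantined = now - self.last_emitted_at < QUARANTINE_MICROS;
        if quarantined || self.pending.is_some() {
            // Delta shortly after the previous delta: require the new state
            // to be held for the whole window before letting it through.
            let since = match self.pending {
                Some((state, since)) if state == raw => since,
                _ => {
                    self.pending = Some((raw, now));
                    now
                }
            };
            if now - since < QUARANTINE_MICROS {
                return None;
            }
            self.pending = None;
        }
        self.reported = raw;
        self.last_emitted_at = now;
        Some(raw)
    }
}

fn main() {
    let mut key = KeyDebounce::default();
    // A clean press goes straight through...
    let first = key.update(true, 100_000);
    // ...while a delta right after it gets quarantined.
    let ghost = key.update(false, 100_250);
    println!("{first:?} {ghost:?}");
}
```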

+

Mysterious halting

+

At some point while developing the keymap, the keyboard started freezing on boot, not producing any output. I couldn't understand why; core1, which handles key-presses, wouldn't report anything. Once more I had to get the dedicated boot-skewer out to flash new firmware.

+

I started removing the latest changes and realized that scanning 5 columns for changes but not 6 on the left side +would work fine. Adding back scanning 6 columns would freeze immediately again.

+

I took a break and when doing something else it suddenly struck me, here!

+
#[allow(static_mut_refs)]
+if let Err(_e) = mc.cores()[1].spawn(unsafe { &mut CORE_1_STACK_AREA }, move || {
+    run_core1(
+        receiver,
+        left_buttons,
+        timer,
+        #[cfg(feature = "hiddev")]
+        usb_bus,
+    )
+})
+
+

Can you see it?

+

Well?

+

The unsafe draws the attention, but I'm manually setting the stack area for core1:

+

static mut CORE_1_STACK_AREA: [usize; 1024] = [0; 1024];

+

When the code for scanning a 6th column was added, the stack overflowed and the core halted; increasing the stack area immediately solved the issue.

+

Performance

+

Now the keyboard is actually usable, time for the fun part, performance. This is my first real embedded project, +and I learned a lot programming for a different target.

+

Real time

+

First off, since there's not much of a scheduler running (disregarding interrupts), the displayed scan rate on the oleds gives very direct feedback on changes in performance. Usually it's much more difficult to see how code-changes impact performance, but here it's immediate and easy to spot.

+

Priorities

+

Measurement is the key to performance, and the measurements of interest are, in order: scan rate, key-processing rate, and binary size. Scan rate is important because it determines the key-press -> OS latency; key-processing can't be too slow since it tacks directly onto that latency; and lastly there's a 2MB size restriction on the produced image.

+

Methodology

+

The oled displays scan rate, so that's easy. Key-processing rate can't be measured as easily, so jamming the keyboard at max speed and watching the scan rate served as a proxy. Binary size can be inspected at compilation.
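The loop-counter idea behind the scan-rate number can be sketched like this; LoopCount here is illustrative, not the firmware's actual type:

```rust
// Count scan-loop iterations over a time window and derive the average
// microseconds per scan; this is the number shown on the oled.
struct LoopCount {
    iterations: u64,
    window_start_micros: u64,
}

impl LoopCount {
    fn new(now_micros: u64) -> Self {
        Self { iterations: 0, window_start_micros: now_micros }
    }

    /// Call once per scan-loop iteration.
    fn increment(&mut self) {
        self.iterations += 1;
    }

    /// Average microseconds per scan over the current window.
    fn micros_per_scan(&self, now_micros: u64) -> u64 {
        (now_micros - self.window_start_micros) / self.iterations.max(1)
    }
}

fn main() {
    let mut lc = LoopCount::new(0);
    for _ in 0..50_000 {
        lc.increment();
    }
    // 50 000 iterations over one second -> 20 μs per scan
    println!("{} µs/scan", lc.micros_per_scan(1_000_000));
}
```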

+

Inlining

+

When people talk about performance inlining often comes up.

+

Briefly, inlining is replacing a function call with the code from that function at the call-site, here's an example.

+
+fn my_add(a: i32, b: i32) -> i32 {
+    a + b
+}
+fn not_inlined_caller() {
+    // Not inlined the function is called, moving 1, and 2 into the correct ABI-defined registers
+    // then invoking the function.
+    my_add(1, 2); 
+}
+fn inlined_caller_after_inlining() {
+    // my_add(1, 2) <- disappears
+    1 + 2 // <- `my_add` function body copied into this function
+}
+
+

Inlining reduces some overhead, such as shuffling around values to registers, and invoking functions, +but all that copying of code can produce a lot of instructions, which may thrash the CPU's instruction cache.

+

Here's an example of how that could become problematic:

+
#[inline]
+fn my_very_long_fn() {
+    // 1000 lines of spooky code
+}
+fn my_caller(rarely_true: bool) {
+    if rarely_true {
+        my_very_long_fn();
+    }
+}
+
+

Depending on the CPU, it might, on entering my_caller, have to fetch all the instructions contained in my_very_long_fn, draining space in the instruction cache and causing re-fetches that may take a long time. If rarely_true is rarely true this is unnecessary overhead, and if the function is long enough, the eventual savings from inlining pale in comparison to the execution time of the inlined function. That means there's little upside in the rarely_true == true case, and a huge downside in the rarely_true == false case.

+

It's hard to draw general conclusions, however; you have to measure to be sure. Luckily, I measured!

+

Inlining in practice

+

There weren't huge surprises on where inlining made the most difference, but I was surprised with how much it mattered.

+

The general logic of core1 is this:

+
    +
  1. Check for changes (uart, gpio, usb). +
  2. On a change, execute some logic (left side sends a keypress to the OS, right side sends it to the left). +
  3. Report changes to core0. +
+

The vast majority of the time each loop produces no change, here's an excerpt from left side core1:

+
loop {
+    let mut any_change = false;
+    if let Some(update) = receiver.try_read() {
+        // Right side sent an update
+        rx += 1;
+        // Update report state
+        kbd.update_right(update, &mut report_state);
+        any_change = true;
+    }
+    // Check left side gpio and update report state
+    if kbd.scan_left(&mut left_buttons, &mut report_state, timer) {
+        any_change = true;
+    }
+    if any_change {
+        push_touch_to_admin();
+    }
+    #[cfg(feature = "hiddev")]
+    {
+        let mut pop = false;
+        if let Some(next_update) = report_state.report() {
+            // Publish the next update on queue if present
+            unsafe {
+                pop = crate::runtime::shared::usb::try_push_report(next_update);
+            }
+        }
+        if pop {
+            // Remove the sent report (it's down here because of the borrow checker)
+            report_state.accept();
+        }
+    }
+    if let Some(change) = report_state.layer_update() {
+        push_layer_change(change);
+    }
+    if rx > 0 && push_rx_change(rx) {
+        rx = 0;
+    }
+    if loop_count.increment() {
+        let now = timer.get_counter();
+        let lc = loop_count.value(now);
+        if push_loop_to_admin(lc) {
+            loop_count.reset(now);
+        }
+    }
+}
+
+

Some of the code in that loop is only triggered in certain cases, I followed the philosophy of inlining most of what +always runs, and refusing to inline things that are conditionally called, Rust has facilities for this:

+

#[inline], #[inline(never)], and #[inline(always)]; the compiler is usually smart enough to make the correct call whether or not #[inline] is specified, so #[inline(never)] and #[inline(always)] aren't often necessary.

+

More information here on cross-crate stuff, but I'm compiling +with fat-lto anyway, so it doesn't really matter to me here.
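As a minimal illustration of that philosophy, hot-path code gets hinted inline while rarely-taken branches are kept out of line; the functions are contrived stand-ins, not the firmware's:

```rust
// Hot path: small, runs every iteration, hinted inline.
#[inline]
fn scan_step(state: &mut u32) -> bool {
    *state = state.wrapping_add(1);
    *state % 1024 == 0 // stand-in for "change detected"
}

// Cold path: keeping this out of line keeps the scan loop's
// instruction footprint small.
#[inline(never)]
fn handle_rare_change(state: u32) -> u32 {
    state.rotate_left(3) ^ 0xDEAD_BEEF
}

fn main() {
    let mut state = 0u32;
    let mut handled = 0u32;
    for _ in 0..4096 {
        if scan_step(&mut state) {
            handled = handle_rare_change(state);
        }
    }
    println!("{handled:#x}");
}
```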

+

The most impressive change was removing #[inline] from kbd.update_right(update, &mut report_state); inside the +if-statement above, that took the current scan latency from 80μs to around 36μs. Not inlining it halved the +scan latency.

+

Last notes on inlining: the compiler makes decisions about inlining that can be very hard to understand. You change something seemingly irrelevant, and suddenly the binary grows by 25% and latency increases by about the same amount, because the compiler decided to inline something that doesn't fit your performance goals. I want the scan-loop to be fast, but the compiler saw an opportunity to make something else fast at the expense of the scan-loop, for example. It's not a bad decision, but it's a bad fit. Making small changes and testing them is therefore important, and interesting!

+

Const evaluation, bounds checking

+

Fewer instructions are often better: they're generally faster to execute, take up less space in the instruction cache, and can therefore tip an inlining tradeoff toward making sense.

+

This change to get_unchecked, which elides the bounds-check, made a massive difference in performance.

+
/// self.buffer[self.tail] -> unsafe {self.buffer.get_unchecked_mut(self.tail)};
+
+

The improvement came in two parts: it caused the compiler to inline the function, and that in itself did a lot. I then manually marked the function inline and reverted the change, and eliding the check still provided a several-microsecond benefit. Since I do bounds-checking elsewhere, I was confident keeping this unsafe.

+

To further improve performance I wanted to evaluate as much as possible at compilation time, so that things are accessed efficiently: if I can assert that indices are in bounds at comptime, I can safely use unchecked index accesses. Rust's type system provides tools for that, and since I know how many keys my keyboard has, I don't need any dynamically sized arrays.

+

Here's an example:

+
#[repr(transparent)]
+#[derive(Debug, Copy, Clone)]
+pub struct RowIndex(pub u8);
+impl RowIndex {
+    #[must_use]
+    #[allow(clippy::missing_panics_doc)]
+    pub const fn from_value(ind: u8) -> Self {
+        assert!(
+            ind < NUM_ROWS,
+            "Tried to construct row index from a bad value"
+        );
+        Self(ind)
+    }
+    #[inline]
+    #[must_use]
+    pub const fn index(self) -> usize {
+        self.0 as usize
+    }
+}
+
+

The RowIndex-struct only accepts indices that are valid, therefore it's always safe to use to index into +structures with NUM_ROWS length or more.

+

Using this strategy to elide bounds-checking shaved more microseconds off the loop-times. Since pin-indexing is done on the gpio pin-scan on each loop, these improvements make quite the difference.
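Putting the two pieces together, a hypothetical use of RowIndex to elide the bounds check could look like this; the array and NUM_ROWS value are illustrative:

```rust
const NUM_ROWS: u8 = 5;

#[repr(transparent)]
#[derive(Debug, Copy, Clone)]
pub struct RowIndex(pub u8);

impl RowIndex {
    // Panics at construction if out of bounds, so a RowIndex that exists
    // is always a valid index.
    pub const fn from_value(ind: u8) -> Self {
        assert!(ind < NUM_ROWS, "Tried to construct row index from a bad value");
        Self(ind)
    }

    pub const fn index(self) -> usize {
        self.0 as usize
    }
}

fn read_row(states: &[bool; NUM_ROWS as usize], row: RowIndex) -> bool {
    // Safety: RowIndex can only hold values < NUM_ROWS, so the access is
    // always in bounds and the runtime check can be skipped.
    unsafe { *states.get_unchecked(row.index()) }
}

fn main() {
    let states = [false, true, false, false, true];
    println!("{}", read_row(&states, RowIndex::from_value(1)));
}
```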

+

Macros to avoid branching

+

I abhor macros, they're difficult to follow and understand, and professionally I try to avoid them like the plague. +But, here in my private life it's all about the performance, and they can be useful to avoid branching.

+

Consider the connection of the actual GPIO-pin, and the struct that I use to keep a pin's state in memory.

+

They have different types; all the GPIO-pins have different types, and all the keys as well, so they can't be kept in a collection together without using a v-table. This, in my opinion, is fixable in Rust. The reason that the buttons, for example, can't be kept together is that each button may have a different memory layout.
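For illustration, here's what the v-table route would look like: differently-typed buttons stored behind trait-object references, with every call dynamically dispatched. The types and key codes here are made up for the example:

```rust
// The shared interface; a simple Vec<u8> stands in for the report state.
trait KeyboardButton {
    fn on_press(&mut self, report: &mut Vec<u8>);
}

struct TabKey;
struct EscKey;

impl KeyboardButton for TabKey {
    fn on_press(&mut self, report: &mut Vec<u8>) {
        report.push(0x2B); // HID usage id for Tab
    }
}

impl KeyboardButton for EscKey {
    fn on_press(&mut self, report: &mut Vec<u8>) {
        report.push(0x29); // HID usage id for Escape
    }
}

fn main() {
    let mut tab = TabKey;
    let mut esc = EscKey;
    // The heterogeneous collection is only possible behind pointers plus a
    // v-table; each on_press call below is dynamically dispatched.
    let keys: [&mut dyn KeyboardButton; 2] = [&mut tab, &mut esc];
    let mut report = Vec::new();
    for key in keys {
        key.on_press(&mut report);
    }
    println!("{report:?}");
}
```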

+

In my case they all have the same layout and all expose the same function, here's an example:

+
impl KeyboardButton for LeftRow0Col0 {
+    fn on_press(&mut self, keyboard_report_state: &mut KeyboardReportState) {
+        keyboard_report_state.push_key(KeyCode::TAB);
+    }
+    fn on_release(
+        &mut self,
+        _last_press_state: LastPressState,
+        keyboard_report_state: &mut KeyboardReportState,
+    ) {
+        keyboard_report_state.pop_key(KeyCode::TAB);
+    }
+}
+
+

I generate the key-structs from a macro, and they all have the exact same layout, so I should be able to store them in an array (assuming that the function addresses of each respective button's methods are knowable, which, thinking about it, they might not be).

+

Macros are a way around this though:

+
macro_rules! impl_read_pin_col {
+    ($($structure: expr, $row: tt,)*, $col: tt) => {
+        paste! {
+            pub fn [<read_col _ $col _pins>]($([< $structure:snake >]: &mut $structure,)* left_buttons: &mut LeftButtons, keyboard_report_state: &mut KeyboardReportState, timer: Timer) -> bool {
+                // Safety: Make sure this is properly initialized and restored
+                // at the end of this function, makes a noticeable difference in performance
+                let col = unsafe {left_buttons.cols.$col.take().unwrap_unchecked()};
+                let col = col.into_push_pull_output_in_state(PinState::Low);
+                // Just pulling chibios defaults of 0.25 micros, could probably be 0
+                crate::timer::wait_nanos(timer, 250);
+                let mut any_change = false;
+                $(
+                    {
+                        if [< $structure:snake >].check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value($row)), keyboard_report_state, timer) {
+                            any_change = true;
+                        }
+                    }
+                )*
+                left_buttons.cols.$col = Some(col.into_pull_up_input());
+                $(
+                    {
+                        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value($row)) {}
+                    }
+                )*
+                any_change
+            }
+        }
+    };
+}
+
+

Here's how it's used:

+
impl_read_pin_col!(
+    LeftRow0Col1, 0,
+    LeftRow1Col1, 1,
+    LeftRow2Col1, 2,
+    LeftRow3Col1, 3,
+    LeftRow4Col1, 4,
+    ,1
+); 
+// Produces function `read_col_1_pins` with proper typechecking
+let col1_change = read_col_1_pins(
+    &mut self.left_row0_col1,
+    &mut self.left_row1_col1,
+    &mut self.left_row2_col1,
+    &mut self.left_row3_col1,
+    &mut self.left_row4_col1,
+    left_buttons,
+    keyboard_report_state,
+    timer,
+);
+
+

In practice the macro code is inlined like this:

+
pub fn read_col_1_pins(left_row0_col1: &mut LeftRow0Col1, left_row1_col1: &mut LeftRow1Col1, left_row2_col1: &mut LeftRow2Col1, left_row3_col1: &mut LeftRow3Col1, left_row4_col1: &mut LeftRow4Col1, left_buttons: &mut LeftButtons, keyboard_report_state: &mut KeyboardReportState, timer: Timer) -> bool {
+    let col = unsafe {
+        left_buttons.cols.1
+            .take().unwrap_unchecked()
+    };
+    let col = col.into_push_pull_output_in_state(PinState::Low);
+    crate::timer::wait_nanos(timer, 250);
+    let mut any_change = false;
+    {
+        if left_row0_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(0)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row1_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(1)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row2_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(2)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row3_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(3)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    {
+        if left_row4_col1.check_update_state(left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(4)), keyboard_report_state, timer) {
+            any_change = true;
+        }
+    }
+    left_buttons.cols.1
+        = Some(col.into_pull_up_input());
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(0)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(1)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(2)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(3)) {}
+    }
+    {
+        while left_buttons.row_pin_is_low(rp2040_kbd_lib::matrix::RowIndex::from_value(4)) {}
+    }
+    any_change
+}
+
+

There is no access by index for the pins here, they are manually checked one-by-one.

+

Performance summary

+

In the end I took 4 measurements on the left side:

+
    +
  1. Scan latency +
  2. Change originating from left scan loop latency +
  3. Change originating from right scan loop latency +
  4. Inter-core message queue capacity +
+

And 3 on the right:

+
    +
  1. Scan latency +
  2. Change loop latency +
  3. Inter-core message queue capacity +
+

The scan latency has been talked about, it ended up at about 20μs after optimizations, +that is, each pin is checked every 20μs if the keyboard is idle (on both sides).

+

Changes originating from the left measures the loop latency, the time it takes before discovering a change +to completely processing it, when a change comes from the left side gpio pins. That landed on about +60μs. In other words, from starting to check for changes, to discovering and handling a change is +60μs.

+

Changes originating from the right measures the same as above but from the right side, that takes about +70μs.

+

Inter-core message queue capacity sits firmly at 0 on both sides, even though the consumer-core writes messages to oled, +it doesn't get overwhelmed.

+

On the right-side the latency on changes is only 25μs however, since the +left side handles all the logic contained in the keymap, this makes sense.

+

Rough calculation of worst case latency

+

This means that the keyboard should at most add a 70μs latency overhead from the left, and 25μs on the right, +and be able to detect a change lasting for 20μs or more on both sides.

+

The transfer rate between sides is set by the uart baud-rate which is 781 250 bits per second.
This works out to 10.24μs per byte sent; all messages are at most 1 byte.
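The arithmetic can be checked with a quick sketch, assuming 8 bits per byte (the 10.24μs figure implies framing bits are ignored):

```rust
// Per-byte transfer latency at a given baud-rate, in hundredths of a
// microsecond so everything stays in integer arithmetic.
const BAUD: u64 = 781_250;

fn micros_per_byte_x100(baud: u64) -> u64 {
    // 8 bits per byte, scaled: (8 / baud) seconds -> hundredths of µs
    8 * 1_000_000 * 100 / baud
}

fn main() {
    // 1024 hundredths = 10.24 µs per byte at 781 250 baud
    println!("{} hundredths of a µs", micros_per_byte_x100(BAUD));
}
```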

+
+

Edit 2024-04-17

+

I changed the protocol to two bytes for robustness, but also increased the baud-rate 20x.

+

This puts one message at 1.024μs of latency with better robustness.

+
+

Worst case scenario should therefore be os_poll_latency + left_side_right_change_latency + right_side_latency + transfer_latency, which would be 1000μs + 70μs + 25μs + 10μs = 1105μs when a single key is pressed on the right side, and os_poll_latency + left_side_left_change_latency = 1060μs on the left.

+

Caveat

+

This only holds for single presses, not when the keymap outputs sequences. For example, on EU keyboards ^ is a dead key that needs a second press to activate, so that you can compose symbols like â. However, I don't want that; I want ^ to go out immediately, so when ^ is pressed I send KeyDown ^ + KeyUp ^ + KeyDown ^, which makes the OS-latency alone 3000μs.

+

End

+

This has been my longest writeup yet, it was my first real foray into embedded development, and it ended with +me writing this on a keyboard running my own firmware.

+

There's still stuff to iron out with the keymap, but I'm really happy with the result.
The firmware is fast and it works, the two things I care about; the code can be found here.

+

Thoughts on QMK

+

I went on a bit of a rant about QMK, but it's a great, robust codebase. It could probably be reimplemented in Rust if one really wanted to, but that seems unnecessary, and my firmware does not attempt to do it at all. Mostly the macro-parts would need some thinking over, because the way I did keymaps was a real mess of boilerplate-code that is not nice to work with.

+
+
\ No newline at end of file diff --git a/rust-linux-kernel-module.html b/rust-linux-kernel-module.html new file mode 100644 index 0000000..bcd952b --- /dev/null +++ b/rust-linux-kernel-module.html @@ -0,0 +1,1004 @@ + + + + + + + + + RustLinuxKernelModule + + + +
+

Rust for Linux, how hard is it to write a Kernel module in Rust at present?

+

Once again I'm back on parental leave, I've been lazily following the Rust for Linux +effort but finally decided to get into it and write a simple kernel module in Rust.

+

Contents

+

This write-up is about writing a kernel module in Rust which will expose a file under /proc/rust-proc-file, +the file is going to function as a regular file, but backed by pure ram.

+

It'll go through zero-cost abstractions and how one can safely wrap unsafe extern "C" fn's, hiding away the gritty details of C-APIs.

+

It'll also go through numerous ways of causing and avoiding UB, as well as some kernel internals.

+

Objective

+

I've been a Linux user for quite a while but have never tried my hand at contributing to the codebase, +the reason is that I generally spend my free time writing things that I myself would use. Having +that as a guide leads to me finishing my side-projects. There hasn't been something that I've wanted or needed +that I've been unable to implement in user-space, so it just hasn't happened.

+

Sadly, that's still the case, so I had to contrive something: A proc-file that works just like a regular file.

+

The /proc Filesystem

+

The stated purpose of the /proc filesystem is to "provide information about the running Linux System", read +more about it here.

+

On a Linux machine with the /proc filesystem you can find process information e.g. under /proc/<pid>/.., +like memory usage, mounts, cpu-usage, fd's, etc. With the above stated purpose, and how the /proc filesystem is +used, the purpose of this module doesn't quite fit, but for simplicity that's what I chose.

+

A proc 'file'

+

Proc files can be created through the kernel's proc_fs-api, which lives here.

+

The function, proc_create, looks like this:

+
struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
+
+

When properly invoked it will create a file under /proc/<name> (if no parent is provided).

+

That file is an interface to the kernel: a pseudo-file that the user interacts with as a regular file on one end, while the kernel provides handlers for regular file-functionality on the other (like open, read, write, lseek, etc.).

+

That interface is provided through the last argument ...,proc_ops *proc_ops);...

+

proc_ops is a struct defined like this:

+
struct proc_ops {
+	unsigned int proc_flags;
+	int	(*proc_open)(struct inode *, struct file *);
+	ssize_t	(*proc_read)(struct file *, char __user *, size_t, loff_t *);
+	ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
+	ssize_t	(*proc_write)(struct file *, const char __user *, size_t, loff_t *);
+	/* mandatory unless nonseekable_open() or equivalent is used */
+	loff_t	(*proc_lseek)(struct file *, loff_t, int);
+	int	(*proc_release)(struct inode *, struct file *);
+	__poll_t (*proc_poll)(struct file *, struct poll_table_struct *);
+	long	(*proc_ioctl)(struct file *, unsigned int, unsigned long);
+#ifdef CONFIG_COMPAT
+	long	(*proc_compat_ioctl)(struct file *, unsigned int, unsigned long);
+#endif
+	int	(*proc_mmap)(struct file *, struct vm_area_struct *);
+	unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+} __randomize_layout;
+
+

proc_open

+

When a user tries to open the proc-file, the handler int (*proc_open)(struct inode *, struct file *); +will be invoked.

+

A perfectly functional C-implementation of that, in the case that no work needs to be done specifically when a +user invokes open is:

+
int proc_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+

It just returns 0 for success.

+

There are cases where one would like to do something when the file is opened, in that case, +the *file pointer could be modified, for example by editing the void *private_data-field to add some data +that will follow the file into its coming operations. +Read some more about the file structure here, +or check out its definition here.

+

proc_read

+

Now it's getting into some logic, when a user wants to read from the file +it provides a buffer and an offset pointer, the signature looks like +this:

+
ssize_t	proc_read(struct file *f, char __user *buf, size_t buf_len, loff_t *offset);
+
+

Again there's the file structure-pointer which could contain data that +was put there in an open-implementation, as well as a suspiciously annotated +char __user *buf.

+

The kernel should write data into the user buffer, return the number of +bytes written, and update the offset through the pointer.

+

proc_write

+

When a user tries to write to the file, it enters through proc_write.

+

Which looks like this:

+
ssize_t	(*proc_write)(struct file *f, const char __user *buf, size_t buf_len, loff_t *offset);
+
+

The user provides the buffer it wants to write into the file along with its length, and +a pointer to update the offset. Again suspiciously annotating the buffer with __user.

+

The kernel should write data from the user buffer into the backing storage.
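A matching sketch of the write contract, again with plain Rust types standing in for the kernel's:

```rust
// Sketch of the proc_write contract: copy the user's bytes into backing
// storage at *offset, growing it as needed, advance the offset, and
// return the number of bytes consumed. Names are illustrative.
fn proc_write_sketch(backing: &mut Vec<u8>, buf: &[u8], offset: &mut usize) -> usize {
    let end = *offset + buf.len();
    if backing.len() < end {
        backing.resize(end, 0); // grow the ram-backed storage
    }
    backing[*offset..end].copy_from_slice(buf);
    *offset = end;
    buf.len()
}

fn main() {
    let mut backing = Vec::new();
    let mut offset = 0;
    proc_write_sketch(&mut backing, b"hello ", &mut offset);
    proc_write_sketch(&mut backing, b"proc", &mut offset);
    println!("{}", String::from_utf8_lossy(&backing));
}
```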

+

proc_lseek

+

Lastly, if the file is to be seekable to an offset proc_lseek +has to be implemented.

+

Its signature looks like this:

+
loff_t (*proc_lseek)(struct file *f, loff_t offset, int whence);
+
+

Again the file is provided, along with the offset to seek to and whence to seek. whence is an int which should hold one of 5 values; those are described in more detail in the docs here. The most intuitive one is SEEK_SET, which means that the file's offset should be set to the offset the user provided.

+

Assuming that the offset makes sense, the kernel should return the new offset.
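A sketch of those semantics for the three classic whence values; the constants mirror the kernel's numbering, and the error type is simplified:

```rust
// Whence values as defined by the uapi headers.
const SEEK_SET: u32 = 0;
const SEEK_CUR: u32 = 1;
const SEEK_END: u32 = 2;

// Compute the new offset, or report an error for an unknown whence or a
// seek before the start of the file (the offset stays unchanged on error).
fn lseek_sketch(file_len: i64, current: i64, offset: i64, whence: u32) -> Result<i64, ()> {
    let new = match whence {
        SEEK_SET => offset,             // absolute
        SEEK_CUR => current + offset,   // relative to current position
        SEEK_END => file_len + offset,  // relative to end of file
        _ => return Err(()),
    };
    if new < 0 {
        return Err(()); // seeking before the start is invalid
    }
    Ok(new)
}

fn main() {
    println!("{:?}", lseek_sketch(100, 10, 5, SEEK_CUR));
}
```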

+

Implementing it in Rust

+

That's it! With those 4 functions implemented and passed as members of the proc_ops-struct, there should be a fairly complete working file. Time to start!

+

Generating bindings

+

Rust for Linux uses Rust-bindings generated from the kernel headers. They're conveniently added when building, as long as the correct headers are added here; for this module only proc_fs.h is needed.

+

unsafe extern "C" fn

+

Since Rust is compatible with C by jumping through some hoops, +theoretically the module could be implemented by just using the C-api +directly as-is through the functions provided by the bindings.

+

The power of Rust is being able to take unsafe code and build safe abstractions on top of it. But it's a good start to figure out how the APIs work.

+

The generated Rust function-pointer definitions look like this:

+
unsafe extern "C" fn proc_open(
+        inode: *mut kernel::bindings::inode,
+        file: *mut kernel::bindings::file,
+    ) -> i32 {
+    ...
+}
+unsafe extern "C" fn proc_read(
+    file: *mut kernel::bindings::file,
+    buf: *mut core::ffi::c_char,
+    buf_cap: usize,
+    read_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+}
+unsafe extern "C" fn proc_write(
+    file: *mut kernel::bindings::file,
+    buf: *const core::ffi::c_char,
+    buf_cap: usize,
+    write_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+}
+unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    ...
+}
+
+

One key difference between these C-style function declarations and something like Rust's Fn-traits is that these functions cannot capture any state.

+

This necessitates using global statics for persistent state that has to be shared between user-calls into the proc-file. (For modifications that do not have to be shared or persisted after the interaction ends, the file's private data could be used.)
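A standalone sketch of why that is: an extern "C" fn has no environment to capture, so shared state has to live in a static. A real module would need proper synchronization; a relaxed atomic keeps the example small:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Shared, persistent state: there is no `self` and no closure environment
// available to a C-style callback, so a static is the only place for it.
static OPEN_COUNT: AtomicU64 = AtomicU64::new(0);

// Shaped like a C callback: no captured state, C ABI.
unsafe extern "C" fn proc_open_sketch() -> i32 {
    OPEN_COUNT.fetch_add(1, Ordering::Relaxed);
    0 // success
}

fn main() {
    unsafe {
        proc_open_sketch();
        proc_open_sketch();
    }
    println!("opened {} times", OPEN_COUNT.load(Ordering::Relaxed));
}
```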

+

Another key difference is that that pesky __user-annotation is finally gone, let's not +think more about that, the problem solved itself.

+

Abstraction

+

As mentioned previously, one key point of Rust is being able to abstract away unsafety; ideally an API would consist of Rust function-signatures containing references instead of C-style function-signatures containing raw pointers. It's a bit tricky, but it can be done.

+

Here's an example of how to do the conversion in a way with zero-cost:

+

Without any conversion, calling a rust-function within a C-style function:

+
fn rust_fn() -> i32 {
+    std::hint::black_box(5) * std::hint::black_box(15)
+}
+pub unsafe extern "C" fn my_callback2() -> i32 {
+    rust_fn()
+}
+pub fn main() -> i32{
+    unsafe {
+        my_callback2()
+    }
+}
+
+

This allows the user to define rust_fn, and then wrap it with C-style function.

+

Through godbolt it produces this assembly:

+
example::my_callback2::h381eee3be316e700:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+example::main::h11eebe12cad5e117:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+
+

The above shows that the entire function my_callback2 was inlined into main. A zero-cost abstraction should produce the same code, so any abstraction should produce the same assembly.

+

Here is an example of such an abstraction:

+
+fn rust_fn() -> i32 {
+    std::hint::black_box(5) * std::hint::black_box(15)
+}
+pub trait MyTrait<'a> {
+    const CALLBACK_1: &'a dyn Fn() -> i32;
+}
+pub struct MyStruct;
+impl<'a> MyTrait<'a> for MyStruct {
+    const CALLBACK_1: &'a dyn Fn() -> i32 = &rust_fn;
+}
+pub struct Container<'a, T>(core::marker::PhantomData<&'a T>);
+impl<'a, T> Container<'a, T> where T: MyTrait<'a> {
+    pub unsafe extern "C" fn proxy_callback() -> i32 {
+        T::CALLBACK_1()
+    }
+}
+pub fn main() -> i32 {
+    unsafe {
+        Container::<'_, MyStruct>::proxy_callback()
+    }
+}
+
+

Which produces this assembly:

+
example::main::h11eebe12cad5e117:
+        mov     dword ptr [rsp - 8], 5
+        lea     rax, [rsp - 8]
+        mov     eax, dword ptr [rsp - 8]
+        mov     dword ptr [rsp - 4], 15
+        lea     rcx, [rsp - 4]
+        imul    eax, dword ptr [rsp - 4]
+        ret
+
+

Again, the entire function was inlined; even though a dyn-trait is used, +the compiler can figure out that it can be inlined.

+

This may seem a bit useless, since the only difference between the pre- and post-abstraction +code is having the function connected to a struct, but that connection is what allows better abstractions to be built on top.

+

Better function signatures

+

Looking again at the function pointer that will be invoked for lseek:

+
unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    ...
+}
+
+

It can be described as a pure-rust-function like this:

+
fn proc_lseek(file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int) -> kernel::bindings::loff_t;
+
+

Or even better like this:

+
/// lseek valid variants [See the lseek docs for more detail](https://man7.org/linux/man-pages/man2/lseek.2.html)
+#[repr(u32)]
+pub enum Whence {
+    /// See above doc link
+    SeekSet = kernel::bindings::SEEK_SET,
+    /// See above doc link
+    SeekCur = kernel::bindings::SEEK_CUR,
+    /// See above doc link
+    SeekEnd = kernel::bindings::SEEK_END,
+    /// See above doc link
+    SeekData = kernel::bindings::SEEK_DATA,
+    /// See above doc link
+    SeekHole = kernel::bindings::SEEK_HOLE,
+}
+impl TryFrom<u32> for Whence {
+    type Error = kernel::error::Error;
+    fn try_from(value: u32) -> core::result::Result<Self, Self::Error> {
+        Ok(match value {
+            kernel::bindings::SEEK_SET => Self::SeekSet,
+            kernel::bindings::SEEK_CUR => Self::SeekCur,
+            kernel::bindings::SEEK_END => Self::SeekEnd,
+            kernel::bindings::SEEK_DATA => Self::SeekData,
+            kernel::bindings::SEEK_HOLE => Self::SeekHole,
+            _ => return Err(EINVAL),
+        })
+    }
+}
+fn proc_lseek(file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence) -> kernel::bindings::loff_t;
+
+

Or better still: even though the bindings specify a *mut, converting that to a mutable reference +is likely to cause UB, while converting it to +an immutable reference should be safe.

+
fn proc_lseek(file: &kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence) -> kernel::bindings::loff_t;
+
+

Making a safer abstraction over the bindings struct file would be even better, but that's out of scope. +The Rust API now communicates that lseek takes a reference to a file that should not be mutated +(it can safely be mutated with synchronization, again out of scope), an offset, and a Whence-enum which +can only be one of five variants.

+

However, something needs to wrap this Rust-function: validate that Whence can be converted from the int +provided by the C-style function, check that the file-pointer is non-null, and turn it into a reference.

+

Here's an example of how that could look:

+
/// Raw C-entrypoint
+unsafe extern "C" fn proc_lseek(
+    file: *mut kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: core::ffi::c_int,
+) -> kernel::bindings::loff_t {
+    // Take the `c_int` and Convert to a `Whence`-enum, return an error if invalid
+    let Ok(whence_u32) = u32::try_from(whence) else {
+        return EINVAL.to_errno().into();
+    };
+    let Ok(whence) = Whence::try_from(whence_u32) else {
+        return EINVAL.to_errno().into();
+    };
+    // Take the file-pointer, convert to a reference if not null
+    let file_ref = unsafe {
+        let Some(file_ref) = file.as_ref() else {
+            return EINVAL.to_errno().into();
+        };
+        file_ref
+    };
+    // Execute the rust-function `T:LSEEK` with the converted arguments, and return the result, or error as an errno
+    match (T::LSEEK)(file_ref, offset, whence) {
+        core::result::Result::Ok(offs) => offs,
+        core::result::Result::Err(e) => {
+            return e.to_errno().into();
+        }
+    }
+}
+
+

The T::LSEEK comes from a generic bound, as with the minimal example, this function-pointer comes from +a struct, which is bounded on a struct implementing a trait.

+

The definition of the generated proc_ops looks like this:

+
pub struct proc_ops {
+    pub proc_flags: core::ffi::c_uint,
+    pub proc_open: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut inode, arg2: *mut file) -> core::ffi::c_int,
+    >,
+    pub proc_read: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: *mut core::ffi::c_char,
+            arg3: usize,
+            arg4: *mut loff_t,
+        ) -> isize,
+    >,
+    pub proc_read_iter: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut kiocb, arg2: *mut iov_iter) -> isize,
+    >,
+    pub proc_write: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: *const core::ffi::c_char,
+            arg3: usize,
+            arg4: *mut loff_t,
+        ) -> isize,
+    >,
+    pub proc_lseek: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: loff_t, arg3: core::ffi::c_int) -> loff_t,
+    >,
+    pub proc_release: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut inode, arg2: *mut file) -> core::ffi::c_int,
+    >,
+    pub proc_poll: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: *mut poll_table_struct) -> __poll_t,
+    >,
+    pub proc_ioctl: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_uint,
+            arg3: core::ffi::c_ulong,
+        ) -> core::ffi::c_long,
+    >,
+    pub proc_compat_ioctl: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_uint,
+            arg3: core::ffi::c_ulong,
+        ) -> core::ffi::c_long,
+    >,
+    pub proc_mmap: ::core::option::Option<
+        unsafe extern "C" fn(arg1: *mut file, arg2: *mut vm_area_struct) -> core::ffi::c_int,
+    >,
+    pub proc_get_unmapped_area: ::core::option::Option<
+        unsafe extern "C" fn(
+            arg1: *mut file,
+            arg2: core::ffi::c_ulong,
+            arg3: core::ffi::c_ulong,
+            arg4: core::ffi::c_ulong,
+            arg5: core::ffi::c_ulong,
+        ) -> core::ffi::c_ulong,
+    >,
+}
+
+

It's a struct containing a bunch of optional function-pointers. Here's what it looks like after abstracting most of the C-parts away +(only implementing open, read, write, and lseek).

+
/// Type alias for open function signature
+pub type ProcOpen<'a> = &'a dyn Fn(&inode, &file) -> Result<i32>;
+/// Type alias for read function signature
+pub type ProcRead<'a> = &'a dyn Fn(&file, UserSliceWriter, &loff_t) -> Result<(usize, usize)>;
+/// Type alias for write function signature
+pub type ProcWrite<'a> = &'a dyn Fn(&file, UserSliceReader, &loff_t) -> Result<(usize, usize)>;
+/// Type alias for lseek function signature
+pub type ProcLseek<'a> = &'a dyn Fn(&file, loff_t, Whence) -> Result<loff_t>;
+/// Proc file ops handler
+pub trait ProcHandler<'a> {
+    /// Open handler
+    const OPEN: ProcOpen<'a>;
+    /// Read handler
+    const READ: ProcRead<'a>;
+    /// Write handler
+    const WRITE: ProcWrite<'a>;
+    /// Lseek handler
+    const LSEEK: ProcLseek<'a>;
+}
+/// Wrapper for the kernel type `proc_ops`
+/// Roughly a translation of the expected `extern "C"`-function pointers that
+/// the kernel expects into Rust-functions with a few more helpful types.
+pub struct ProcOps<'a, T>
+where
+    T: ProcHandler<'a>,
+{
+    ops: bindings::proc_ops,
+    _pd: PhantomData<&'a T>,
+}
+impl<'a, T> ProcOps<'a, T>
+where
+    T: ProcHandler<'a>,
+{
+    /// Create new ProcOps from a handler and flags
+    pub const fn new(proc_flags: u32) -> Self {
+        Self {
+            ops: proc_ops {
+                proc_flags,
+                proc_open: Some(ProcOps::<'a, T>::proc_open),
+                proc_read: Some(ProcOps::<'a, T>::proc_read),
+                proc_read_iter: None,
+                proc_write: Some(ProcOps::<'a, T>::proc_write),
+                proc_lseek: Some(ProcOps::<'a, T>::proc_lseek),
+                proc_release: None,
+                proc_poll: None,
+                proc_ioctl: None,
+                proc_compat_ioctl: None,
+                proc_mmap: None,
+                proc_get_unmapped_area: None,
+            },
+            _pd: PhantomData,
+        }
+    }
+    unsafe extern "C" fn proc_open(
+        inode: *mut kernel::bindings::inode,
+        file: *mut kernel::bindings::file,
+    ) -> i32 {
+        ...
+    }
+    unsafe extern "C" fn proc_read(
+        file: *mut kernel::bindings::file,
+        buf: *mut core::ffi::c_char,
+        buf_cap: usize,
+        read_offset: *mut kernel::bindings::loff_t,
+    ) -> isize {
+        ...
+    }
+    unsafe extern "C" fn proc_write(
+        file: *mut kernel::bindings::file,
+        buf: *const core::ffi::c_char,
+        buf_cap: usize,
+        write_offset: *mut kernel::bindings::loff_t,
+    ) -> isize {
+        ...
+    }
+    unsafe extern "C" fn proc_lseek(
+        file: *mut kernel::bindings::file,
+        offset: kernel::bindings::loff_t,
+        whence: core::ffi::c_int,
+    ) -> kernel::bindings::loff_t {
+        ...
+    }
+}
+
+

Some details are elided for brevity. The above code defines a trait ProcHandler, which contains +constants for each of the functions to be provided. Those constants are 'static-references to Rust functions.

+

Then it defines the ProcOps-struct, which is generic over ProcHandler. It defines the correct C-style +functions, which do the conversions, call the provided ProcHandler's &'static-functions, and return their results.

+

Using this, the C-style proc_create function can get a Rust-abstraction taking that ProcOps-struct:

+
/// Create a proc entry with the filename `name`
+pub fn proc_create<'a, T>(
+    name: &'static kernel::str::CStr,
+    mode: bindings::umode_t,
+    dir_entry: Option<&ProcDirEntry<'a>>,
+    proc_ops: &'a ProcOps<'a, T>,
+) -> Result<ProcDirEntry<'a>>
+where
+    T: ProcHandler<'a>,
+{
+    // ProcOps contains the c-style struct, give the kernel a pointer to the address of that struct
+    let pops = core::ptr::addr_of!(proc_ops.ops);
+    let pde = unsafe {
+        let dir_ent = dir_entry
+            .map(|de| de.ptr.as_ptr())
+            .unwrap_or_else(core::ptr::null_mut);
+        bindings::proc_create(
+            name.as_ptr() as *const core::ffi::c_char,
+            mode,
+            dir_ent,
+            pops,
+        )
+    };
+    match core::ptr::NonNull::new(pde) {
+        None => Err(ENOMEM),
+        Some(nn) => Ok(ProcDirEntry {
+            ptr: nn,
+            _pd: core::marker::PhantomData::default(),
+        }),
+    }
+}
+
+

Getting to work

+

Now it's time to use the abstraction, it looks like this:

+
struct ProcHand;
+/// Implement `ProcHandler`, providing static references to rust-functions
+impl ProcHandler<'static> for ProcHand {
+    const OPEN: kernel::proc_fs::ProcOpen<'static> = &popen;
+    const READ: kernel::proc_fs::ProcRead<'static> = &pread;
+    const WRITE: kernel::proc_fs::ProcWrite<'static> = &pwrite;
+    const LSEEK: kernel::proc_fs::ProcLseek<'static> = &plseek;
+}
+#[inline]
+fn popen(_inode: &kernel::bindings::inode, _file: &kernel::bindings::file) -> Result<i32> {
+    Ok(0)
+}
+fn pread(
+    _file: &kernel::bindings::file,
+    mut user_slice: UserSliceWriter,
+    offset: &kernel::bindings::loff_t,
+) -> Result<(usize, usize)> {
+    ...
+}
+fn pwrite(
+    file: &kernel::bindings::file,
+    user_slice_reader: UserSliceReader,
+    offset: &kernel::bindings::loff_t,
+) -> Result<(usize, usize)> {
+    ...
+}
+fn plseek(
+    file: &kernel::bindings::file,
+    offset: kernel::bindings::loff_t,
+    whence: Whence,
+) -> Result<kernel::bindings::loff_t> {
+    ...
+}
+
+

Oh right, the __user-part.

+

In the first iterations of this module I conveniently ignored it. When the kernel is passed a buffer from a user +that is marked __user, it needs to copy that memory from the user to be able to use it; it can't directly read from +the provided buffer. The same goes for writing: it needs to copy memory into the buffer rather than use +the buffer directly.

+

On the C-side, this is done by the functions exposed by linux/uaccess.h +copy_from_user +and copy_to_user.

+

The functions will:

+
    +
  1. Check if the operation should fault, a bit complicated and I don't fully understand where faults may be injected, +but the documentation is here. +
  2. Check that the memory is a valid user space address +
  3. Check that the object has space to be written into/read from a valid address (no OOB reads into memory the user +doesn't have access to). +
  4. Do the actual copying +
+
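The steps above can be sketched in userspace Rust as a loose analogy (not the kernel implementation: copy_from_user_sim and its error type are made up, and the real functions also validate address-space ranges and may inject faults):

```rust
/// Simulate `copy_from_user`: validate the "user" range, then copy the
/// bytes into memory the "kernel" owns. Only the bounds checks (steps 2-3,
/// simplified) and the actual copy (step 4) are modeled here.
fn copy_from_user_sim(user_buf: &[u8], offset: usize, len: usize) -> Result<Vec<u8>, ()> {
    // Reject ranges that overflow or run past the user buffer (no OOB reads)
    let end = offset.checked_add(len).ok_or(())?;
    if end > user_buf.len() {
        return Err(());
    }
    // The actual copy into kernel-owned memory
    Ok(user_buf[offset..end].to_vec())
}

fn main() {
    let user = b"hello kernel";
    assert_eq!(copy_from_user_sim(user, 0, 5).unwrap(), b"hello");
    // An out-of-bounds request fails instead of reading stray memory
    assert!(copy_from_user_sim(user, 8, 100).is_err());
}
```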

The Rust kernel code fairly conveniently wraps this into an API here.

+

The API is used in the wrapper for ProcOps; it looks like this:

+
unsafe extern "C" fn proc_read(
+    file: *mut kernel::bindings::file,
+    buf: *mut core::ffi::c_char,
+    buf_cap: usize,
+    read_offset: *mut kernel::bindings::loff_t,
+) -> isize {
+    ...
+    let buf = buf as *mut u8 as usize;
+    let buf_ref = UserSlice::new(buf, buf_cap);
+    let buf_writer = buf_ref.writer();
+    ...
+    match (T::READ)(file_ref, buf_writer, offset) {
+        ...
+    }
+}
+
+

The code takes the raw buf-ptr, which lost its __user-annotation through bindgen, turns it into +a raw address, and makes a UserSlice out of it. It then turns that slice into a UserSliceWriter (the user reads +data, so the kernel needs to write data) and passes that into the module's supplied READ-function, +which again has a signature that looks like this:

+
pub type ProcRead<'a> = &'a dyn Fn(&file, UserSliceWriter, &loff_t) -> Result<(usize, usize)>;
+
+

Writing the module

+

The module is defined by this convenient module!-macro:

+
struct RustProcRamFile;
+module! {
+    type: RustProcRamFile,
+    name: "rust_proc_ram_file",
+    author: "Rust for Linux Contributors",
+    description: "Rust proc ram file example",
+    license: "GPL",
+}
+
+

Most of that is metadata, but the name will be the name that can be modprobe'd +to load the module, e.g. modprobe rust_proc_ram_file.

+

All that remains is implementing kernel::Module for RustProcRamFile, which is an arbitrary struct to represent +module data.

+
impl kernel::Module for RustProcRamFile {
+    fn init(_module: &'static ThisModule) -> Result<Self> {
+        // Initialization-code
+        ...
+        Ok(Self)
+    }
+}
+
+

One hitch is that the module needs to be safe for concurrent access; it needs to be both Send + Sync.

+

Remembering that the objective is to build a file that is backed by just bytes (a Vec<u8> being most convenient), +creating a RustProcRamFile(Vec<u8>) won't cut it.

+

There's a need for shared mutable state and that's where this gets tricky.

+

Mutex

+

One of the simplest ways of creating shared mutable state (simplest by mental model, at least) is wrapping the state in a mutual-exclusion +lock, a Mutex.

+

Through the Kernel's C-API it's trivial to do that statically.

+
static DEFINE_MUTEX(my_mutex);
+
+

It statically defines a mutex (definition here) +which can be interacted with, by e.g. +mutex_lock, +mutex_unlock, +etc.

+

In Rust-land there's a safe API for creating mutexes, it looks like this:

+
let pin_init_lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+
+

pin_init_lock is something that implements PinInit, +the most important function of which is __pinned_init(self, slot: *mut T), which takes uninitialized memory that fits a T and initializes the value there.

+

For reasons that will become clearer later, the mutex will be initialized into static memory.

+

Finally, to initialize the data that the file will be backed by, the code looks like this:

+
mod backing_data {
+    use core::cell::UnsafeCell;
+    use kernel::sync::lock::{mutex::MutexBackend, Lock};
+    use super::*;
+    static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> =
+        MaybeUninit::uninit();
+    ...
+    /// Initialize the backing data of this module, letting new
+    /// users access it.
+    /// # Safety
+    /// Safe if only called once during the module's lifetime
+    pub(super) unsafe fn init_data(
+        lock_ready: impl PinInit<Lock<Option<alloc::vec::Vec<u8>>, MutexBackend>>,
+    ) -> Result<()> {
+        unsafe {
+            let slot = MAYBE_UNINIT_DATA_SLOT.as_mut_ptr();
+            lock_ready.__pinned_init(slot)?;
+        }
+        Ok(())
+    }
+    ...
+    /// Gets the initialized data as a static reference
+    /// # Safety
+    /// Safe only if called after initialization, otherwise
+    /// it will return a pointer to uninitialized memory.  
+    pub(super) unsafe fn get_initialized_data() -> &'static Mutex<Option<alloc::vec::Vec<u8>>> {
+        unsafe { MAYBE_UNINIT_DATA_SLOT.assume_init_ref() }
+    }
+    ...
+}
+impl kernel::Module for RustProcRamFile {
+    fn init(_module: &'static ThisModule) -> Result<Self> {
+        ...
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        ...
+    }
+}
+
+

That's quite a lot.

+

First off, the static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> = MaybeUninit::uninit(); +creates static uninitialized memory, represented by the MaybeUninit. +The memory has space for a Mutex containing an Option<alloc::vec::Vec<u8>>.

+

The reason for having the inner data be Option is to be able to remove it on module-unload and properly clean it up. +The Drop-code will show how that cleanup works in more detail, and it's likely a bit pedantic.

+

Second, in the module's init, a Vec is created and wrapped in a PinInit that will produce a Mutex once it's given memory to initialize into.
+That PinInit is passed to init_data, which takes a pointer to the static memory MAYBE_UNINIT_DATA_SLOT and writes +the mutex into it.

+

Now there's an initialized Mutex.

+

Storing the ProcDirEntry

+

Now a proc_create can be called which will create a proc-file.

+
mod backing_data {
+    ...
+    struct SingleAccessPdeStore(UnsafeCell<Option<ProcDirEntry<'static>>>);
+    unsafe impl Sync for SingleAccessPdeStore {}
+    static ENTRY: SingleAccessPdeStore = SingleAccessPdeStore(UnsafeCell::new(None));
+    ...
+    /// Write PDE into static memory
+    /// # Safety
+    /// Any concurrent access is unsafe.  
+    pub(super) unsafe fn set_pde(pde: ProcDirEntry<'static>) {
+        unsafe {
+            ENTRY.0.get().write(Some(pde));
+        }
+    }
+    /// Remove the PDE
+    /// # Safety
+    /// While safe to invoke regardless of PDE initialization,
+    /// any concurrent access is unsafe.  
+    pub(super) unsafe fn take_pde() -> Option<ProcDirEntry<'static>> {
+        unsafe {
+            let mut_ref = ENTRY.0.get().as_mut()?;
+            mut_ref.take()
+        }
+    }
+}
+fn init(_module: &'static ThisModule) -> Result<Self> {
+        const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+        // Struct defined inline since this is the only safe place for it to be used
+        struct ProcHand;
+        impl ProcHand {
+            ...
+        }
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        // This is technically unsound, e.g. READ is not safe to invoke until
+        // `init_data` has been called, but could theoretically be invoked in a safe context before
+        // then, so don't, it's ordered like this for a reason.
+        impl ProcHandler<'static> for ProcHand {
+            const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+            const READ: kernel::proc_fs::ProcRead<'static> =
+                &|f, u, o| unsafe { Self::pread(f, u, o) };
+            const WRITE: kernel::proc_fs::ProcWrite<'static> =
+                &|f, u, o| unsafe { Self::pwrite(f, u, o) };
+            const LSEEK: kernel::proc_fs::ProcLseek<'static> =
+                &|f, o, w| unsafe { Self::plseek(f, o, w) };
+        }
+        let pde = proc_create(c_str!("rust-proc-file"), 0666, None, &POPS)?;
+        unsafe {
+            // Safety: Only place this is called, no concurrent access
+            backing_data::set_pde(pde);
+        }
+        pr_info!("Loaded /proc/rust-proc-file\n");
+        Ok(Self)
+    }
+
+

That's also quite a lot.

+

Now the code is encountering issues with unsoundness (an API that is not marked as unsafe but is unsafe under some conditions).

+

Starting from the top:

+

Calling proc_create returns a ProcDirEntry, which removes the proc-file when dropped. The entry should be kept alive +until the module is dropped. Therefore, a static variable ENTRY is created to house it; it gets removed in +the module's Drop.
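The RAII mechanic can be sketched in userspace Rust (FakeEntry and fake_proc_create are made-up stand-ins; an atomic flag stands in for the kernel's registration state):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stands in for "the proc-file exists in the kernel"
static REGISTERED: AtomicBool = AtomicBool::new(false);

/// Stand-in for `ProcDirEntry`: creating it registers, dropping it removes.
struct FakeEntry;

fn fake_proc_create() -> FakeEntry {
    REGISTERED.store(true, Ordering::SeqCst);
    FakeEntry
}

impl Drop for FakeEntry {
    fn drop(&mut self) {
        // Analogous to `proc_remove` running in the destructor
        REGISTERED.store(false, Ordering::SeqCst);
    }
}

fn main() {
    let entry = fake_proc_create();
    assert!(REGISTERED.load(Ordering::SeqCst));
    // Without storing the entry somewhere 'static, the file vanishes here
    drop(entry);
    assert!(!REGISTERED.load(Ordering::SeqCst));
}
```

This is why the module stashes the entry in a static: letting it drop at the end of init would immediately unregister the file.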

+

static-entries need to be Sync, i.e. shareable between threads. +UnsafeCell is not Sync, so it needs to be wrapped in the newtype +SingleAccessPdeStore. It is indeed safe to share between threads under some conditions, so +Sync is unsafely implemented through:

+
unsafe impl Sync for SingleAccessPdeStore {}
+
+

It tells the compiler that even though it doesn't look Sync, it should be treated as Sync. +(Sync and Send are examples of automatically implemented traits: if a struct contains only types that implement +Send and/or Sync, that struct will also implement Send and/or Sync; a bit more on that here).
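That auto-trait mechanic can be demonstrated in a small userspace sketch (Store and GLOBAL are made-up names, not the module's types): a struct containing an UnsafeCell is not automatically Sync, so it can't be a static until the unsafe impl vouches for it.

```rust
use core::cell::UnsafeCell;

// UnsafeCell<T> is not Sync, so this wrapper is not automatically Sync,
// and without the impl below `static GLOBAL` would not compile.
struct Store(UnsafeCell<Option<u32>>);

// Safety: a promise to the compiler, not a proof. The sketch is
// single-threaded; real code must rule out concurrent mutable access.
unsafe impl Sync for Store {}

static GLOBAL: Store = Store(UnsafeCell::new(None));

fn main() {
    let ptr = GLOBAL.0.get();
    // Safety: no other thread touches GLOBAL in this sketch
    unsafe {
        *ptr = Some(7);
        assert_eq!(*ptr, Some(7));
    }
}
```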

+

Next come two unsafe functions. One sets ENTRY to a provided ProcDirEntry<'static>; +the operation is safe as long as it doesn't happen concurrently, which would create a data-race.

+

The other takes the ProcDirEntry from ENTRY; this is done on module teardown, when the module is unloaded, for example +through rmmod: rmmod rust_proc_ram_file.

+

Entering the init-function, there are struct definitions and trait-implementations inside the function.
+The reason for this is to make some inherent unsoundness in the memory-lifecycle less dangerous; it's worth getting +into why that is, and what the trade-offs of keeping some unsoundness are.

+

Memory lifecycle, you, me, and C

+

Again, the C-api looks like this:

+
struct proc_ops {
+	unsigned int proc_flags;
+	int	(*proc_open)(struct inode *, struct file *);
+	ssize_t	(*proc_read)(struct file *, char __user *, size_t, loff_t *);
+	ssize_t (*proc_read_iter)(struct kiocb *, struct iov_iter *);
+	ssize_t	(*proc_write)(struct file *, const char __user *, size_t, loff_t *);
+	/* mandatory unless nonseekable_open() or equivalent is used */
+	loff_t	(*proc_lseek)(struct file *, loff_t, int);
+	int	(*proc_release)(struct inode *, struct file *);
+	__poll_t (*proc_poll)(struct file *, struct poll_table_struct *);
+	long	(*proc_ioctl)(struct file *, unsigned int, unsigned long);
+#ifdef CONFIG_COMPAT
+	long	(*proc_compat_ioctl)(struct file *, unsigned int, unsigned long);
+#endif
+	int	(*proc_mmap)(struct file *, struct vm_area_struct *);
+	unsigned long (*proc_get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+} __randomize_layout;
+struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops);
+
+

So, the module needs to call the function proc_create supplying a pointer const struct proc_ops *proc_ops +which itself contains function pointers. What are the lifetime requirements?

+

const struct proc_ops *proc_ops has a requirement to live until proc_remove is called on the returned proc_dir_entry*, +that's easily represented in Rust, we could model the API to accept something with the lifetime 'a and return +a ProcDirEntry<'a>, taking ownership of the reference to ProcOps and calling proc_remove in the destructor.

+

But how long do the function pointers that are themselves contained in proc_ops need to live?

+

One could assume it's the same, 'a, but let's consider how the kernel 'routes' a user through the module and the +lifecycle of an interaction.

+
A user interaction
+

A user wants to open the file, by name.

+
    +
  1. The user issues the open syscall. +
  2. The kernel accepts the open syscall, and finds this *proc_dir_entry. +
  3. The kernel enters the proc_open-function. +
  4. The kernel sets the correct register return address value. +
  5. The kernel yields execution. +
+

The kernel handles two pointers from the module, non-atomically, in separate steps; multiple users could trigger +this interaction concurrently (the reason for the lock).

+

Consider the case where a *proc_dir_entry exists but the proc_open-function pointer is +dangling, +because its lifetime is shorter than the *proc_dir_entry's, or they have the same lifetime but the frees +happen in an unfavourable order. In that case, the kernel will try to access a dangling pointer, +which may or may not cause chaos. A dangling pointer is worse than a null-pointer here, since a +null-pointer is generally checked for and acceptable.

+

In another case, the proc_dir_entry may definitely be removed first, but some process may have read the +function pointer proc_open from it without yet having started executing it (a race), so proc_open can theoretically +never be safely destroyed: in a time-sharing OS, +no guarantees are made about the timeliness of operations. Therefore, the lifetime requirement of +proc_open is 'static, as represented by:

+
...
+const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+...
+
+
Constraints caused by 'static-lifetimes
+

'static (sloppily expressed) means 'for the duration of the program'; if there's a 'static-requirement for a variable, +it means that variable's memory needs to be allocated in the binary.

+

An example would be a string literal

+
const MY_STR: &'static str = "hello";
+static MY_STR2: &'static str = "hello";
+// or 
+fn my_fn() {
+    let my_str = "hello";
+}
+
+

In all cases the string-literal exists in the binary. The difference between these cases is that for the +const-variable, some space is allocated in the binary that fits a reference to a str, +which may point to data in the data-section of the binary (or somewhere else, implementation-dependent).
+const also dictates that this value may never change.

+

static also makes sure that the binary has space for the variable (still a reference to a string), it will also +point to some data that is likely to be in the data-section, but it is theoretically legal to change the data that +it's pointing to (with some constraints).

+

In the function, space is made available on the stack for the reference, but the actual hello is likely again in +the data-section.

+
Using static data for the backing storage
+

Looking back at the purpose of the module, data needs to be stored with a static lifetime. There are multiple ways +to achieve this in Rust; the data could be owned directly, as a member of the module-struct RustProcRamFile. +However, this means that when the module is dropped, the data is dropped as well. Since the function-pointers +have a 'static-requirement, that doesn't work.

+

Even if the data is wrapped in a Box or an Arc, the RustProcRamFile-module can't own it, for the above reason: +the functions need to live (and be valid) for the duration of the program, so a global static is necessary (sigh).

+

Here is where the globals come in:

+
...
+static mut MAYBE_UNINIT_DATA_SLOT: MaybeUninit<Mutex<Option<alloc::vec::Vec<u8>>>> =
+        MaybeUninit::uninit();
+...
+static ENTRY: SingleAccessPdeStore = SingleAccessPdeStore(UnsafeCell::new(None));
+...
+const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+
+


+

Looking at the definitions: two of these contain data that can (and will) be changed, so they are static; +one (the container of the functions that are passed through the C-API) is marked const, since it will never change.

+

MAYBE_UNINIT_DATA_SLOT is MaybeUninit, so that when the program starts, there is already space made available in +the binary for the data it will contain; on module-initialization, data will be written into it.

+

The same goes for ENTRY; UnsafeCell does essentially the same thing. There's a reason both aren't wrapped in +UnsafeCell<Option>, and it's partially performance.

+
MaybeUninit vs UnsafeCell<Option>
+

MaybeUninit contains potentially uninitialized data. +Accessing that data, for example by creating a reference to it, is UB if the data is not yet initialized.
+This means safe access is only possible if:

+
    +
  1. Non-modifying access happens after initialization. +
  2. Modifying access happens in a non-concurrent context. +
+

UnsafeCell<Option> does not contain potentially +uninitialized data, the uninitialized state is represented by the Option. +Safe access only requires that there is no concurrent access (of any kind) at the same time as mutable access. +It's a bit easier to make safe.

+

I would prefer UnsafeCell<Option<T>> in both cases, but as the PinInit-API (which is needed for +the Mutex) is constructed, a slot of type T (being the Mutex) needs to be provided. It would therefore have to be +static UnsafeCell<Lock<..>>, which cannot be instantiated at compile-time the way an UnsafeCell<Option<T>> +can (static MY_VAR: UnsafeCell<Option<String>> = UnsafeCell::new(None), for example).
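The difference between the two access disciplines can be shown in plain userspace Rust (SLOT and OptSlot are made-up stand-ins for the module's statics; the sketch is single-threaded, so the concurrency rules above hold trivially):

```rust
use core::cell::UnsafeCell;
use core::mem::MaybeUninit;

// Pattern 1: truly uninitialized memory. Reading it before `write` +
// `assume_init_ref` would be UB, so the *order* of operations carries
// the safety argument.
static mut SLOT: MaybeUninit<String> = MaybeUninit::uninit();

// Pattern 2: the "empty" state is a real value (None), so any
// non-concurrent access is defined, initialized or not.
struct OptSlot(UnsafeCell<Option<String>>);
// Safety: single-threaded sketch only
unsafe impl Sync for OptSlot {}
static OPT: OptSlot = OptSlot(UnsafeCell::new(None));

fn main() {
    unsafe {
        // MaybeUninit: initialize first, only then create a reference
        let slot = core::ptr::addr_of_mut!(SLOT);
        (*slot).write(String::from("init"));
        assert_eq!((*slot).assume_init_ref(), "init");

        // UnsafeCell<Option>: reading the None state would also be fine
        *OPT.0.get() = Some(String::from("init"));
        assert_eq!((*OPT.0.get()).as_deref(), Some("init"));
    }
}
```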

+

That is the reason why the variables look like they do.

+
Global POPS and an unsound API
+

Back again to POPS, the init-function and unsoundness:

+
fn init(_module: &'static ThisModule) -> Result<Self> {
+        const POPS: ProcOps<'static, ProcHand> = ProcOps::<'static, ProcHand>::new(0);
+        // Struct defined inline since this is the only safe place for it to be used
+        struct ProcHand;
+        impl ProcHand {
+            ...
+        }
+        let data = alloc::vec::Vec::new();
+        let lock = kernel::new_mutex!(Some(data), "proc_ram_mutex");
+        unsafe {
+            // Safety: Only place this is called, has to be invoked before `proc_create`
+            backing_data::init_data(lock)?
+        }
+        // This is technically unsound, e.g. READ is not safe to invoke until
+        // `init_data` has been called, but could theoretically be invoked in a safe context before
+        // then, so don't, it's ordered like this for a reason.
+        impl ProcHandler<'static> for ProcHand {
+            const OPEN: kernel::proc_fs::ProcOpen<'static> = &Self::popen;
+            const READ: kernel::proc_fs::ProcRead<'static> =
+                &|f, u, o| unsafe { Self::pread(f, u, o) };
+            const WRITE: kernel::proc_fs::ProcWrite<'static> =
+                &|f, u, o| unsafe { Self::pwrite(f, u, o) };
+            const LSEEK: kernel::proc_fs::ProcLseek<'static> =
+                &|f, o, w| unsafe { Self::plseek(f, o, w) };
+        }
+        let pde = proc_create(c_str!("rust-proc-file"), 0666, None, &POPS)?;
+        unsafe {
+            // Safety: Only place this is called, no concurrent access
+            backing_data::set_pde(pde);
+        }
+        pr_info!("Loaded /proc/rust-proc-file\n");
+        Ok(Self)
+    }
+
+

ProcHand::pread, ProcHand::pwrite, and ProcHand::plseek all access data that is not safe to access before initialization, but is safe to access after it; they are therefore marked as unsafe.

+

However, since the API (that I wrote...) takes a safe function, they are wrapped in 'static closures that are safe and use an unsafe-block internally.

+

This wrapping is implemented AFTER the code that initializes the backing data. However, the API is still unsound, since the functions could theoretically be called before that initialization, even though they are defined after it.

+

One note on the wrapping: running it through godbolt again shows that it's still being inlined.

+

This problem can be worked around by, for example, creating a static INITIALIZED: AtomicBool = AtomicBool::new(false); and setting it during initialization. But that requires an atomic read on each access for something that is set once at initialization. This is a tradeoff of soundness vs performance; in this case performance is chosen, because this code is not planned to be distributed to someone else's production environment, or maintained by someone else. In those circumstances opting for soundness may be preferable, although the 'window' for creating UB here is quite slim.

+
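A minimal sketch of that atomic-guard alternative, in plain Rust rather than kernel code (names like `Racy`, `init_data`, and `get_data` are hypothetical, and the real module stores its data behind a kernel Mutex):

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

// Sketch of the sound alternative: guard the runtime-initialized static with an
// atomic flag instead of relying on call ordering. Costs one atomic load per access.
struct Racy<T>(UnsafeCell<Option<T>>);
// Safety: the single write happens before INITIALIZED is set; reads happen after.
unsafe impl<T> Sync for Racy<T> {}

static INITIALIZED: AtomicBool = AtomicBool::new(false);
static DATA: Racy<Vec<u8>> = Racy(UnsafeCell::new(None));

fn init_data(data: Vec<u8>) {
    unsafe { *DATA.0.get() = Some(data) };
    // Release pairs with the Acquire load below, publishing the write.
    INITIALIZED.store(true, Ordering::Release);
}

// Safe to call at any time: returns None instead of touching uninitialized data.
fn get_data() -> Option<&'static Vec<u8>> {
    if INITIALIZED.load(Ordering::Acquire) {
        unsafe { (*DATA.0.get()).as_ref() }
    } else {
        None
    }
}
```

The Release/Acquire pair is what makes the flag meaningful: a reader that observes `true` is guaranteed to also observe the write to the data.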

Deallocation

+

Finally, the data is set up, and can be used with some constraints, now the teardown.

+
impl Drop for RustProcRamFile {
+    fn drop(&mut self) {
+        // Remove the PDE if initialized
+        // Drop it to remove the proc entry
+        unsafe {
+            // Safety:
+            // Runs at most once, no concurrent access
+            backing_data::take_pde();
+        }
+        // Remove and deallocate the data
+        unsafe {
+            // Safety:
+            // This module is only instantiated if data is initialized, therefore
+            // the data is initialized when this destructor is run.
+            backing_data::get_initialized_data().lock().take();
+        }
+        // There is theoretically a race condition where module users are still inside a
+        // proc handler when this runs. The handlers themselves are 'static, so the kernel
+        // is trusted to keep function-related memory initialized until it's no longer needed.
+        // It is however impossible to guarantee both that the file is removed and that all
+        // users get a 'graceful' exit, i.e. that every user who can see the file and starts
+        // a proc-op gets to finish it. This is because recording that a user has entered a
+        // handler and removing the proc-entry can't happen atomically together: there is
+        // always a gap between a user entering the proc-handler and recording its presence,
+        // in which the proc-entry can be removed without that user's presence being seen.
+        // In that case, the user gets an EBUSY.
+    }
+}
+
+

First, the ProcDirEntry is dropped, invoking the kernel's proc_remove removing the proc-file.
+After that, a reference to the initialized data is taken, and the mutex is locked to remove the backing data for the 'file'. When that data is dropped, it is deallocated. With that, all runtime-created data is removed; the only things that may remain are function pointers, which were 'static anyway, and accessing them produces a safe error.

+

Summing up

+

All the important parts are now covered. The actual implementations of pread, pwrite, and plseek are fairly boring and straightforward; the full code can be found here if that, or the rest of the implementation, is of interest.

+

Generating bindings

+

First off, bindings for the Linux C-API for creating a proc-file had to be generated; that only required adding a header to the list here.

+

Wrapping the API with reasonable lifetimes

+

The C-API has some lifetime requirements; those are encoded in proc_fs.rs.

+

The C-API-parts that take function-pointers can be wrapped by a Rust-fn with zero cost (as was shown here), allowing a more Rust-y API to be exposed.

+

Dealing with static data in a concurrent context

+

Some static data needs to be initialized at runtime but is never mutably accessed concurrently; that was represented by a MaybeUninit.

+

Some static data does not need to be initialized at runtime, but cannot be mutably accessed concurrently; that was represented by an UnsafeCell<Option<T>>.

+

Some static data was also constant, never mutable, and safe for all non-mutable access; that was represented by a regular const <VAR>.

+
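The first two patterns can be sketched in plain Rust (simplified and with hypothetical names like `SyncCell`, `init_lock`, and `set_pde`; the kernel module additionally upholds the no-concurrent-mutation invariants described above):

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

// Shared wrapper to make the statics usable; the caller promises the
// no-concurrent-mutation invariants in the Safety comments below.
struct SyncCell<T>(UnsafeCell<T>);
unsafe impl<T> Sync for SyncCell<T> {}

// Pattern 1: runtime-initialized, never mutated after init -> MaybeUninit.
static LOCK: SyncCell<MaybeUninit<u64>> = SyncCell(UnsafeCell::new(MaybeUninit::uninit()));
// Pattern 2: no runtime init needed, mutated non-concurrently -> UnsafeCell<Option<T>>.
static PDE: SyncCell<Option<u64>> = SyncCell(UnsafeCell::new(None));

// Safety: must be called exactly once, before any `get_lock` call.
unsafe fn init_lock(v: u64) {
    (*LOCK.0.get()).write(v);
}
// Safety: only valid after `init_lock` has run.
unsafe fn get_lock() -> u64 {
    (*LOCK.0.get()).assume_init()
}
// Safety: no concurrent access to PDE.
unsafe fn set_pde(v: u64) {
    *PDE.0.get() = Some(v);
}
// Safety: no concurrent access to PDE.
unsafe fn take_pde() -> Option<u64> {
    (*PDE.0.get()).take()
}
```

`u64` stands in for the real `Mutex<..>` and `ProcDirEntry` payloads, which is what keeps the sketch self-contained.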

Tradeoff between soundness and performance

+

Lastly, there was a tradeoff where some functions were arbitrarily marked as safe, even though they are unsafe under some conditions. Whether that tradeoff is justified is up to the programmer.

+
+
\ No newline at end of file diff --git a/static-pie.html b/static-pie.html new file mode 100644 index 0000000..dead529 --- /dev/null +++ b/static-pie.html @@ -0,0 +1,332 @@ + + + + + + + + + StaticPie + + + +
+

Static pie linking a nolibc Rust binary

+

Something has been bugging me for a while with tiny-std: if I try to compile executables created with it with -C target-feature=+crt-static (statically link the C-runtime), they segfault.

+

The purpose of creating tiny-std was to avoid C, but to get Rust to link a binary statically, that flag needs to be passed. -C target-feature=+crt-static -C relocation-model=static does produce a valid binary though. The default relocation-model for static binaries is -C relocation-model=pie (at least for the target x86_64-unknown-linux-gnu), so something about PIE-executables created with tiny-std fails; in this write-up I'll go into the solution for that.

+

Static pie linking

+

Static pie linking is a combination of two concepts.

+
    +
  1. Static linking, putting everything in the same place at compile time. +As opposed to dynamic linking, where library dependencies can be found and used at runtime. +Statically linking an executable gives it the property that it can be run on any system +that can handle the executable type, i.e. I can start a statically linked elf-executable on any platform that can run +elf-executables. Whereas a dynamically linked executable will not start if its dynamic dependencies cannot be found +at application start. +
  2. Position-independent code is able to run properly regardless of where in memory it is placed. The benefits, as I understand it, are security- and platform-compatibility-related. +
+

When telling rustc to create a static-pie linked executable through -C target-feature=+crt-static -C relocation-model=pie +(relocation-model defaults to pie, could be omitted), it creates an elf-executable which has a header that marks it as +DYN. Here's what an example readelf -h looks like:

+
ELF Header:
+  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
+  Class:                             ELF64
+  Data:                              2's complement, little endian
+  Version:                           1 (current)
+  OS/ABI:                            UNIX - System V
+  ABI Version:                       0
+  Type:                              DYN (Position-Independent Executable file)
+  Machine:                           Advanced Micro Devices X86-64
+  Version:                           0x1
+  Entry point address:               0x24b8
+  Start of program headers:          64 (bytes into file)
+  Start of section headers:          1894224 (bytes into file)
+  Flags:                             0x0
+  Size of this header:               64 (bytes)
+  Size of program headers:           56 (bytes)
+  Number of program headers:         9
+  Size of section headers:           64 (bytes)
+  Number of section headers:         32
+  Section header string table index: 20
+
+

This signals to the OS that the executable can be run position-independently, but since tiny-std assumes that memory addresses are absolute (the ones they were at compile time), the executable segfaults as soon as it tries to get the address of any symbol, like a function or static variable, since those have been moved.

+

Where are my symbols?

+

This seems like a tricky problem: as a programmer, I have a bunch of variables and function calls, some that the Rust-language emits for me, and now the addresses of all those variables and functions are in another place in memory.
+Before using any of them I need to remap them, which means that I need to have remapping code in place before making any function calls (kinda).

+

The start function

+

The executable enters through the _start function, this is defined in asm for tiny-std:

+
// Binary entrypoint
+#[cfg(all(feature = "symbols", feature = "start", target_arch = "x86_64"))]
+core::arch::global_asm!(
+    ".text",
+    ".global _start",
+    ".type _start,@function",
+    "_start:",
+    "xor rbp,rbp", // Zero the stack-frame pointer
+    "mov rdi, rsp", // Move the stack pointer into rdi, c-calling convention arg 1
+    ".weak _DYNAMIC", // Elf dynamic symbol
+    ".hidden _DYNAMIC",
+    "lea rsi, [rip + _DYNAMIC]", // Load the rip-relative address of _DYNAMIC into rsi, c-calling convention arg 2
+    "and rsp,-16", // Align the stack pointer
+    "call __proxy_main" // Call our rust start function
+);
+
+

The assembly prepares the stack by aligning it and putting the stack pointer into arg 1 for the coming function-call, then computes the address of _DYNAMIC relative to the special-purpose rip-register and puts it in rsi, which becomes the called function's arg 2.

+

After that __proxy_main is called, the signature looks like this:

+

unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize)
It takes the stack_ptr and the dynv dynamic-vector pointer as arguments, which were provided in the above assembly.

+

I wrote more about the _start-function in pgwm03 and fasterthanli.me +wrote more about it at their great blog, but in short:

+

Before running the user's main, some setup is required: collecting arguments, environment variables, and aux-values, mapping in faster functions from the vdso (see pgwm03 for more on that), and setting up some thread-state (see the thread writeup for that).

+

All these variables come off the executable's stack, which is why the stack pointer needs to be passed as an argument to the setup-function, so that they can be read before the stack is polluted by that function.

+

The first extraction looks like this:

+
#[no_mangle]
+#[cfg(all(feature = "symbols", feature = "start"))]
+unsafe extern "C" fn __proxy_main(stack_ptr: *const u8, dynv: *const usize) {
+    // First 8 bytes are a u64 containing the number of arguments
+    let argc = *(stack_ptr as *const u64);
+    // Directly followed by those arguments, bump pointer by 8 bytes
+    let argv = stack_ptr.add(8) as *const *const u8;
+    let ptr_size = core::mem::size_of::<usize>();
+    // Following the argv-pointers (and a null terminator) come the environment variable
+    // pointers, a null-terminated array of pointers to null-terminated strings.
+    // This isn't specified in POSIX and isn't great for portability, but this isn't meant to be portable outside of Linux.
+    let env_offset = 8 + argc as usize * ptr_size + ptr_size;
+    // Bump pointer by combined offset
+    let envp = stack_ptr.add(env_offset) as *const *const u8;
+    let mut null_offset = 0;
+    loop {
+        let val = *(envp.add(null_offset));
+        if val as usize == 0 {
+            break;
+        }
+        null_offset += 1;
+    }
+    // We now know how long the envp is
+    // ... 
+}
+
+
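The offset arithmetic above can be simulated on a fabricated "stack" of usize slots, with tag values standing in for real pointers (`parse_stack` is a hypothetical helper for illustration, not tiny-std code):

```rust
// Simulation of the stack walk above: slot 0 holds argc, followed by `argc`
// argv-pointers, a null terminator, then envp-pointers up to another null.
// Returns (argc, number of environment entries).
fn parse_stack(stack: &[usize]) -> (usize, usize) {
    let argc = stack[0];
    // argv occupies the next `argc` slots, followed by a null terminator.
    let env_start = 1 + argc + 1;
    let mut env_count = 0;
    while stack[env_start + env_count] != 0 {
        env_count += 1;
    }
    (argc, env_count)
}
```

The real code does the same walk with byte offsets and raw pointer bumps instead of slice indexing.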

This all works the same in a PIE because:

+

Prelude, inline

+

There will be trouble when trying to find a symbol contained in the binary, such as a function.
+Up to here, that hasn't been a problem, because even though ptr::add() and core::mem::size_of::<T>() are invoked, no addresses are needed for those. This is because of inlining.

+

Looking at core::mem::size_of::<T>():

+
#[inline(always)]
+#[must_use]
+#[stable(feature = "rust1", since = "1.0.0")]
+#[rustc_promotable]
+#[rustc_const_stable(feature = "const_mem_size_of", since = "1.24.0")]
+#[cfg_attr(not(test), rustc_diagnostic_item = "mem_size_of")]
+pub const fn size_of<T>() -> usize {
+    intrinsics::size_of::<T>()
+}
+
+

It has the #[inline(always)] attribute, the same goes for ptr::add(). Since that code is inlined, +an address to a function isn't necessary, and therefore it works even though all of the addresses are off.

+

To be able to debug, I would like to be able to print variables, since I haven't been able to hook a debugger up to tiny-std executables yet. But printing to the terminal requires code, code that usually isn't #[inline(always)].

+

So I wrote a small print:

+
#[inline(always)]
+unsafe fn print_labeled(msg: &[u8], val: usize) {
+    print_label(msg);
+    print_val(val);
+}
+#[inline(always)]
+unsafe fn print_label(msg: &[u8]) {
+    syscall!(WRITE, 1, msg.as_ptr(), msg.len());
+}
+#[inline(always)]
+unsafe fn print_val(u: usize) {
+    syscall!(WRITE, 1, num_to_digits(u).as_ptr(), 21);
+}
+#[inline(always)]
+unsafe fn num_to_digits(mut u: usize) -> [u8; 22] {
+    let mut base = *b"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n";
+    let mut ind = base.len() - 2;
+    if u == 0 {
+        base[ind] = 48;
+    }
+    while u > 0 {
+        let md = u % 10;
+        base[ind] = md as u8 + 48;
+        ind -= 1;
+        u = u / 10;
+    }
+    base
+}
+
+

Printing to the terminal can be done through the syscall WRITE on fd 1 (STDOUT).
+It takes a buffer of bytes and a length. The call through syscall!() is always inlined.

+

Since I primarily need to look at addresses, I just print usize, and I wrote a beautifully stupid number-to-digits function.
+Since a usize on a 64-bit machine has at most 20 digits, I allocate an array on the stack filled with null-bytes; these won't be displayed. Then I add digit by digit from the back, which means that the number is formatted without leading zeroes.

+

Invoking it looks like this:

+
fn test() {
+    print_labeled(b"My msg as bytes: ", 15);
+}
+
+
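The buffer trick can be checked in isolation. Below is a plain-Rust copy of num_to_digits, plus a hypothetical `visible` helper that strips the null padding a terminal wouldn't display:

```rust
// Copy of the digit-formatting trick above: fill a fixed buffer back-to-front,
// leaving leading NUL bytes, with a trailing newline in the last slot.
fn num_to_digits(mut u: usize) -> [u8; 22] {
    let mut base = *b"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\n";
    let mut ind = base.len() - 2;
    if u == 0 {
        base[ind] = b'0';
    }
    while u > 0 {
        base[ind] = (u % 10) as u8 + b'0';
        ind -= 1;
        u /= 10;
    }
    base
}

// Test helper: drop the NUL padding to see what would actually be displayed.
fn visible(buf: &[u8; 22]) -> String {
    buf.iter().filter(|&&b| b != 0).map(|&b| b as char).collect()
}
```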

Relocation

+

Now that basic debug-printing is possible, work to relocate the addresses can begin.

+

I had previously written some code to extract aux-values, but now that code needs to run without using any non-inlined functions or variables.

+

Aux values

+

A good description of aux-values comes from the docs here; in short, the kernel puts some data in the memory of a program when it's loaded.
+This data points to other data that is needed to do relocation. It also has an insane layout, for reasons I haven't yet been able to find any motivation for.
+A pointer to the aux-values is put after the envp on the stack.

+

Before this change, the aux-values were collected and stored pretty sloppily in a global static variable. This time they need to be collected onto the stack, used for finding the dynamic relocation addresses, and only then put into a static variable (since the address of a static variable can't be found before remapping).

+

The dyn-values, which are essentially the same as aux-values but provided for DYN-objects, are also required.

+

In musl, the aux-values that are put on the stack look like this:

+
size_t i, aux[AUX_CNT], dyn[DYN_CNT];
+
+

So I replicated the aux-vec on the stack like this:

+
// There are 32 aux values.
+let mut aux = [0usize; 32];
+
+

And then initialized it with the aux-pointer provided by the OS.

+

The OS supplies some values in the aux-vector (more info here); the ones necessary for remapping are:

+
    +
  1. AT_BASE the base address of the program interpreter, 0 if no interpreter (static-pie). +
  2. AT_PHNUM, the number of program headers. +
  3. AT_PHENT, the size of one program header entry. +
  4. AT_PHDR, the address of the program headers in the executable. +
+
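The aux-vector itself is a sequence of (key, value) usize pairs terminated by an AT_NULL key. A sketch of walking it for the entries listed above (the AT_* constants come from elf.h; the struct is a hypothetical stand-in for tiny-std's AuxValues, and the pointer values in the test are fake):

```rust
// Keys from elf.h / getauxval(3).
const AT_NULL: usize = 0;
const AT_PHDR: usize = 3;
const AT_PHENT: usize = 4;
const AT_PHNUM: usize = 5;
const AT_BASE: usize = 7;

#[derive(Default, Debug, PartialEq)]
struct RelocAux {
    at_base: usize,
    at_phdr: usize,
    at_phent: usize,
    at_phnum: usize,
}

// Walk (key, value) pairs until AT_NULL, keeping only the entries needed
// for remapping.
fn collect_aux(auxv: &[usize]) -> RelocAux {
    let mut out = RelocAux::default();
    let mut i = 0;
    while auxv[i] != AT_NULL {
        match auxv[i] {
            AT_BASE => out.at_base = auxv[i + 1],
            AT_PHDR => out.at_phdr = auxv[i + 1],
            AT_PHENT => out.at_phent = auxv[i + 1],
            AT_PHNUM => out.at_phnum = auxv[i + 1],
            _ => {}
        }
        i += 2;
    }
    out
}
```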

First, the virtual address held by the program header that has the DYNAMIC type must be found.

+

The program header is laid out in memory as this struct:

+
#[repr(C)]
+#[derive(Debug, Copy, Clone)]
+pub struct elf64_phdr {
+    pub p_type: Elf64_Word,
+    pub p_flags: Elf64_Word,
+    pub p_offset: Elf64_Off,
+    pub p_vaddr: Elf64_Addr,
+    pub p_paddr: Elf64_Addr,
+    pub p_filesz: Elf64_Xword,
+    pub p_memsz: Elf64_Xword,
+    pub p_align: Elf64_Xword,
+}
+
+

The address of the AT_PHDR can be treated as an array declared as:

+
let phdr: &[elf64_phdr; AT_PHNUM] = ...
+
+

That array can be walked until finding a program header with p_type = PT_DYNAMIC; that header holds an offset at p_vaddr which can be subtracted from the dynv pointer to get the correct base address.

+
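That walk and subtraction can be sketched with a pared-down header struct (only the two fields needed here; the real layout is the elf64_phdr shown above, and the addresses in the test are fabricated):

```rust
// From elf.h: p_type value marking the dynamic segment.
const PT_DYNAMIC: u32 = 2;

// Pared-down program header: just the fields the base computation needs.
struct Phdr {
    p_type: u32,
    p_vaddr: usize,
}

// base = runtime address of the dynamic section - its compile-time p_vaddr.
fn compute_base(phdrs: &[Phdr], dynv_addr: usize) -> Option<usize> {
    phdrs
        .iter()
        .find(|p| p.p_type == PT_DYNAMIC)
        .map(|p| dynv_addr - p.p_vaddr)
}
```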

Initialize the dyn section

+

The dynv pointer supplied by the OS is, as previously stated, analogous to the aux-pointer, but trying to stack-allocate its value mappings like this:

+
let dyn_values = [0usize; 37];
+
+

Will cause a segfault.

+

SYMBOLS!!!

+

It took me a while to figure out what's happening: when a zeroed array larger than [0usize; 32] is allocated in Rust (256 bytes of zeroes seems to be the exact breakpoint), rustc, instead of using SSE instructions, emits a call to memset to zero the memory it just took off the stack.

+

The asm will look like this:

+
        ...
+        mov edx, 296
+        mov rdi, rbx
+        xor esi, esi
+        call qword ptr [rip + memset@GOTPCREL]
+        ...
+
+

Accessing that memset symbol is what causes the segfault.
+I tried a myriad of ways to get the compiler to not emit that symbol, among them posting this help request.

+

It seems that there is no reliable way to stop rustc from emitting unwanted symbols without doing it all in assembly, and since that seems a bit much, at least right now, I opted to instead restructure the code, unpacking both the aux- and dyn-values and keeping just what tiny-std needs.
+The unpacked aux-values now look like this:

+
/// Some selected aux-values, needs to be kept small since they're collected
+/// before symbol relocation on static-pie-linked binaries, which means rustc
+/// will emit memset on a zeroed allocation of over 256 bytes, which we won't be able
+/// to find and thus will result in an immediate segfault on start.
+/// See [docs](https://man7.org/linux/man-pages/man3/getauxval.3.html)
+#[derive(Debug)]
+pub(crate) struct AuxValues {
+    /// Base address of the program interpreter
+    pub(crate) at_base: usize,
+    /// Real group id of the main thread
+    pub(crate) at_gid: usize,
+    /// Real user id of the main thread
+    pub(crate) at_uid: usize,
+    /// Address of the executable's program headers
+    pub(crate) at_phdr: usize,
+    /// Size of program header entry
+    pub(crate) at_phent: usize,
+    /// Number of program headers
+    pub(crate) at_phnum: usize,
+    /// Address pointing to 16 bytes of a random value
+    pub(crate) at_random: usize,
+    /// Executable should be treated securely
+    pub(crate) at_secure: usize,
+    /// Address of the vdso
+    pub(crate) at_sysinfo_ehdr: usize,
+}
+
+

It only contains the aux-values that are actually used by tiny-std.

+

The dyn-values are only used for relocations so far, so they were packed into this much smaller struct:

+
pub(crate) struct DynSection {
+    rel: usize,
+    rel_sz: usize,
+    rela: usize,
+    rela_sz: usize,
+}
+
+

Now that rustc's memset emissions have been sidestepped, the DynSection struct can be filled with the values from the dynv-pointer, and then, finally, the symbols can be relocated:

+
#[inline(always)]
+pub(crate) unsafe fn relocate(&self, base_addr: usize) {
+    // Relocate all rel-entries
+    for i in 0..(self.rel_sz / core::mem::size_of::<Elf64Rel>()) {
+        let rel_ptr = ((base_addr + self.rel) as *const Elf64Rel).add(i);
+        let rel = ptr_unsafe_ref(rel_ptr);
+        if rel.0.r_info == relative_type(REL_RELATIVE) {
+            let rel_addr = (base_addr + rel.0.r_offset as usize) as *mut usize;
+            *rel_addr += base_addr;
+        }
+    }
+    // Relocate all rela-entries
+    for i in 0..(self.rela_sz / core::mem::size_of::<Elf64Rela>()) {
+        let rela_ptr = ((base_addr + self.rela) as *const Elf64Rela).add(i);
+        let rela = ptr_unsafe_ref(rela_ptr);
+        if rela.0.r_info == relative_type(REL_RELATIVE) {
+            let rel_addr = (base_addr + rela.0.r_offset as usize) as *mut usize;
+            *rel_addr = base_addr + rela.0.r_addend as usize;
+        }
+    }
+    // Skip implementing relr-entries for now
+}
+
+
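The arithmetic of a single relative rela-entry can be simulated on a slice standing in for mapped memory (a hypothetical sketch: offsets are counted in usize slots rather than bytes, and only the `*(base + r_offset) = base + r_addend` rule from the loop above is modeled):

```rust
// Pared-down rela-entry: only the fields the relative-relocation rule uses,
// with the r_info == R_X86_64_RELATIVE check assumed to have already passed.
struct Rela {
    r_offset: usize,
    r_addend: usize,
}

// Apply each entry: the slot at r_offset gets the rebased address base + addend.
fn apply_relative(memory: &mut [usize], base: usize, relas: &[Rela]) {
    for rela in relas {
        // In the real code this slot lives at base + r_offset in virtual memory.
        memory[rela.r_offset] = base + rela.r_addend;
    }
}
```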

After the relocate-section runs, symbols can again be used, and tiny-std can continue with the setup.

+

Outro

+

The commit that added the functionality can be found here.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/static/github-markdown.css b/static/github-markdown.css new file mode 100644 index 0000000..e451c64 --- /dev/null +++ b/static/github-markdown.css @@ -0,0 +1 @@ +@media (prefers-color-scheme:dark){:root{color-scheme:dark;--color-prettylights-syntax-comment:#8b949e;--color-prettylights-syntax-constant:#79c0ff;--color-prettylights-syntax-entity:#d2a8ff;--color-prettylights-syntax-storage-modifier-import:#c9d1d9;--color-prettylights-syntax-entity-tag:#7ee787;--color-prettylights-syntax-keyword:#ff7b72;--color-prettylights-syntax-string:#a5d6ff;--color-prettylights-syntax-variable:#ffa657;--color-prettylights-syntax-brackethighlighter-unmatched:#f85149;--color-prettylights-syntax-invalid-illegal-text:#f0f6fc;--color-prettylights-syntax-invalid-illegal-bg:#8e1519;--color-prettylights-syntax-carriage-return-text:#f0f6fc;--color-prettylights-syntax-carriage-return-bg:#b62324;--color-prettylights-syntax-string-regexp:#7ee787;--color-prettylights-syntax-markup-list:#f2cc60;--color-prettylights-syntax-markup-heading:#1f6feb;--color-prettylights-syntax-markup-italic:#c9d1d9;--color-prettylights-syntax-markup-bold:#c9d1d9;--color-prettylights-syntax-markup-deleted-text:#ffdcd7;--color-prettylights-syntax-markup-deleted-bg:#67060c;--color-prettylights-syntax-markup-inserted-text:#aff5b4;--color-prettylights-syntax-markup-inserted-bg:#033a16;--color-prettylights-syntax-markup-changed-text:#ffdfb6;--color-prettylights-syntax-markup-changed-bg:#5a1e02;--color-prettylights-syntax-markup-ignored-text:#c9d1d9;--color-prettylights-syntax-markup-ignored-bg:#1158c7;--color-prettylights-syntax-meta-diff-range:#d2a8ff;--color-prettylights-syntax-brackethighlighter-angle:#8b949e;--color-prettylights-syntax-sublimelinter-gutter-mark:#484f58;--color-prettylights-syntax-constant-other-reference-link:#a5d6ff;--color-fg-default:#c9d1d9;--color-fg-muted:#8b949e;--color-fg-subtle:#484f58;--color-canvas-default:#0d1117;--color-canvas-subtle:#161b22;--color-b
order-default:#30363d;--color-border-muted:#21262d;--color-neutral-muted:rgba(110,118,129,0.4);--color-accent-fg:#58a6ff;--color-accent-emphasis:#1f6feb;--color-attention-subtle:rgba(187,128,9,0.15);--color-danger-fg:#f85149;}}@media (prefers-color-scheme:light){:root{color-scheme:light;--color-prettylights-syntax-comment:#6e7781;--color-prettylights-syntax-constant:#0550ae;--color-prettylights-syntax-entity:#8250df;--color-prettylights-syntax-storage-modifier-import:#24292f;--color-prettylights-syntax-entity-tag:#116329;--color-prettylights-syntax-keyword:#cf222e;--color-prettylights-syntax-string:#0a3069;--color-prettylights-syntax-variable:#953800;--color-prettylights-syntax-brackethighlighter-unmatched:#82071e;--color-prettylights-syntax-invalid-illegal-text:#f6f8fa;--color-prettylights-syntax-invalid-illegal-bg:#82071e;--color-prettylights-syntax-carriage-return-text:#f6f8fa;--color-prettylights-syntax-carriage-return-bg:#cf222e;--color-prettylights-syntax-string-regexp:#116329;--color-prettylights-syntax-markup-list:#3b2300;--color-prettylights-syntax-markup-heading:#0550ae;--color-prettylights-syntax-markup-italic:#24292f;--color-prettylights-syntax-markup-bold:#24292f;--color-prettylights-syntax-markup-deleted-text:#82071e;--color-prettylights-syntax-markup-deleted-bg:#FFEBE9;--color-prettylights-syntax-markup-inserted-text:#116329;--color-prettylights-syntax-markup-inserted-bg:#dafbe1;--color-prettylights-syntax-markup-changed-text:#953800;--color-prettylights-syntax-markup-changed-bg:#ffd8b5;--color-prettylights-syntax-markup-ignored-text:#eaeef2;--color-prettylights-syntax-markup-ignored-bg:#0550ae;--color-prettylights-syntax-meta-diff-range:#8250df;--color-prettylights-syntax-brackethighlighter-angle:#57606a;--color-prettylights-syntax-sublimelinter-gutter-mark:#8c959f;--color-prettylights-syntax-constant-other-reference-link:#0a3069;--color-fg-default:#24292f;--color-fg-muted:#57606a;--color-fg-subtle:#6e7781;--color-canvas-default:#ffffff;--color-canva
s-subtle:#f6f8fa;--color-border-default:#d0d7de;--color-border-muted:hsla(210,18%,87%,1);--color-neutral-muted:rgba(175,184,193,0.2);--color-accent-fg:#0969da;--color-accent-emphasis:#0969da;--color-attention-subtle:#fff8c5;--color-danger-fg:#cf222e;}}.markdown-body{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;margin:0;color:var(--color-fg-default);background-color:var(--color-canvas-default);font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji";font-size:16px;line-height:1.5;word-wrap:break-word;}.markdown-body .octicon{display:inline-block;fill:currentColor;vertical-align:text-bottom;}.markdown-body h1:hover .anchor .octicon-link:before,.markdown-body h2:hover .anchor .octicon-link:before,.markdown-body h3:hover .anchor .octicon-link:before,.markdown-body h4:hover .anchor .octicon-link:before,.markdown-body h5:hover .anchor .octicon-link:before,.markdown-body h6:hover .anchor .octicon-link:before{width:16px;height:16px;content:' ';display:inline-block;background-color:currentColor;-webkit-mask-image:url("data:image/svg+xml,");mask-image:url("data:image/svg+xml,");}.markdown-body details,.markdown-body figcaption,.markdown-body figure{display:block;}.markdown-body summary{display:list-item;}.markdown-body[hidden]{display:none !important;}.markdown-body a{background-color:transparent;color:var(--color-accent-fg);text-decoration:none;}.markdown-body a:active,.markdown-body a:hover{outline-width:0;}.markdown-body abbr[title]{border-bottom:none;text-decoration:underline dotted;}.markdown-body b,.markdown-body strong{font-weight:600;}.markdown-body dfn{font-style:italic;}.markdown-body h1{margin:.67em 0;font-weight:600;padding-bottom:.3em;font-size:2em;border-bottom:1px solid var(--color-border-muted);}.markdown-body mark{background-color:var(--color-attention-subtle);color:var(--color-text-primary);}.markdown-body small{font-size:90%;}.markdown-body sub,.markdown-body 
sup{font-size:75%;line-height:0;position:relative;vertical-align:baseline;}.markdown-body sub{bottom:-0.25em;}.markdown-body sup{top:-0.5em;}.markdown-body img{border-style:none;max-width:100%;box-sizing:content-box;background-color:var(--color-canvas-default);}.markdown-body code,.markdown-body kbd,.markdown-body pre,.markdown-body samp{font-family:monospace,monospace;font-size:1em;}.markdown-body figure{margin:1em 40px;}.markdown-body hr{box-sizing:content-box;overflow:hidden;background:transparent;border-bottom:1px solid var(--color-border-muted);height:.25em;padding:0;margin:24px 0;background-color:var(--color-border-default);border:0;}.markdown-body input{font:inherit;margin:0;overflow:visible;font-family:inherit;font-size:inherit;line-height:inherit;}.markdown-body[type=button],.markdown-body[type=reset],.markdown-body[type=submit]{-webkit-appearance:button;}.markdown-body[type=button]::-moz-focus-inner,.markdown-body[type=reset]::-moz-focus-inner,.markdown-body[type=submit]::-moz-focus-inner{border-style:none;padding:0;}.markdown-body[type=button]:-moz-focusring,.markdown-body[type=reset]:-moz-focusring,.markdown-body[type=submit]:-moz-focusring{outline:1px dotted ButtonText;}.markdown-body[type=checkbox],.markdown-body[type=radio]{box-sizing:border-box;padding:0;}.markdown-body[type=number]::-webkit-inner-spin-button,.markdown-body[type=number]::-webkit-outer-spin-button{height:auto;}.markdown-body[type=search]{-webkit-appearance:textfield;outline-offset:-2px;}.markdown-body[type=search]::-webkit-search-cancel-button,.markdown-body[type=search]::-webkit-search-decoration{-webkit-appearance:none;}.markdown-body::-webkit-input-placeholder{color:inherit;opacity:.54;}.markdown-body::-webkit-file-upload-button{-webkit-appearance:button;font:inherit;}.markdown-body a:hover{text-decoration:underline;}.markdown-body hr::before{display:table;content:"";}.markdown-body hr::after{display:table;clear:both;content:"";}.markdown-body 
table{border-spacing:0;border-collapse:collapse;display:block;width:max-content;max-width:100%;overflow:auto;}.markdown-body td,.markdown-body th{padding:0;}.markdown-body details summary{cursor:pointer;}.markdown-body details:not([open])>*:not(summary){display:none !important;}.markdown-body kbd{display:inline-block;padding:3px 5px;font:11px ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation Mono,monospace;line-height:10px;color:var(--color-fg-default);vertical-align:middle;background-color:var(--color-canvas-subtle);border:solid 1px var(--color-neutral-muted);border-bottom-color:var(--color-neutral-muted);border-radius:6px;box-shadow:inset 0 -1px 0 var(--color-neutral-muted);}.markdown-body h1,.markdown-body h2,.markdown-body h3,.markdown-body h4,.markdown-body h5,.markdown-body h6{margin-top:24px;margin-bottom:16px;font-weight:600;line-height:1.25;}.markdown-body h2{font-weight:600;padding-bottom:.3em;font-size:1.5em;border-bottom:1px solid var(--color-border-muted);}.markdown-body h3{font-weight:600;font-size:1.25em;}.markdown-body h4{font-weight:600;font-size:1em;}.markdown-body h5{font-weight:600;font-size:.875em;}.markdown-body h6{font-weight:600;font-size:.85em;color:var(--color-fg-muted);}.markdown-body p{margin-top:0;margin-bottom:10px;}.markdown-body blockquote{margin:0;padding:0 1em;color:var(--color-fg-muted);border-left:.25em solid var(--color-border-default);}.markdown-body ul,.markdown-body ol{margin-top:0;margin-bottom:0;padding-left:2em;}.markdown-body ol ol,.markdown-body ul ol{list-style-type:lower-roman;}.markdown-body ul ul ol,.markdown-body ul ol ol,.markdown-body ol ul ol,.markdown-body ol ol ol{list-style-type:lower-alpha;}.markdown-body dd{margin-left:0;}.markdown-body tt,.markdown-body code{font-family:ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation Mono,monospace;font-size:12px;}.markdown-body pre{margin-top:0;margin-bottom:0;font-family:ui-monospace,SFMono-Regular,SF Mono,Menlo,Consolas,Liberation 
Mono,monospace;font-size:12px;word-wrap:normal;}.markdown-body .octicon{display:inline-block;overflow:visible !important;vertical-align:text-bottom;fill:currentColor;}.markdown-body::placeholder{color:var(--color-fg-subtle);opacity:1;}.markdown-body input::-webkit-outer-spin-button,.markdown-body input::-webkit-inner-spin-button{margin:0;-webkit-appearance:none;appearance:none;}.markdown-body .pl-c{color:var(--color-prettylights-syntax-comment);}.markdown-body .pl-c1,.markdown-body .pl-s .pl-v{color:var(--color-prettylights-syntax-constant);}.markdown-body .pl-e,.markdown-body .pl-en{color:var(--color-prettylights-syntax-entity);}.markdown-body .pl-smi,.markdown-body .pl-s .pl-s1{color:var(--color-prettylights-syntax-storage-modifier-import);}.markdown-body .pl-ent{color:var(--color-prettylights-syntax-entity-tag);}.markdown-body .pl-k{color:var(--color-prettylights-syntax-keyword);}.markdown-body .pl-s,.markdown-body .pl-pds,.markdown-body .pl-s .pl-pse .pl-s1,.markdown-body .pl-sr,.markdown-body .pl-sr .pl-cce,.markdown-body .pl-sr .pl-sre,.markdown-body .pl-sr .pl-sra{color:var(--color-prettylights-syntax-string);}.markdown-body .pl-v,.markdown-body .pl-smw{color:var(--color-prettylights-syntax-variable);}.markdown-body .pl-bu{color:var(--color-prettylights-syntax-brackethighlighter-unmatched);}.markdown-body .pl-ii{color:var(--color-prettylights-syntax-invalid-illegal-text);background-color:var(--color-prettylights-syntax-invalid-illegal-bg);}.markdown-body .pl-c2{color:var(--color-prettylights-syntax-carriage-return-text);background-color:var(--color-prettylights-syntax-carriage-return-bg);}.markdown-body .pl-sr .pl-cce{font-weight:bold;color:var(--color-prettylights-syntax-string-regexp);}.markdown-body .pl-ml{color:var(--color-prettylights-syntax-markup-list);}.markdown-body .pl-mh,.markdown-body .pl-mh .pl-en,.markdown-body .pl-ms{font-weight:bold;color:var(--color-prettylights-syntax-markup-heading);}.markdown-body 
.pl-mi{font-style:italic;color:var(--color-prettylights-syntax-markup-italic);}.markdown-body .pl-mb{font-weight:bold;color:var(--color-prettylights-syntax-markup-bold);}.markdown-body .pl-md{color:var(--color-prettylights-syntax-markup-deleted-text);background-color:var(--color-prettylights-syntax-markup-deleted-bg);}.markdown-body .pl-mi1{color:var(--color-prettylights-syntax-markup-inserted-text);background-color:var(--color-prettylights-syntax-markup-inserted-bg);}.markdown-body .pl-mc{color:var(--color-prettylights-syntax-markup-changed-text);background-color:var(--color-prettylights-syntax-markup-changed-bg);}.markdown-body .pl-mi2{color:var(--color-prettylights-syntax-markup-ignored-text);background-color:var(--color-prettylights-syntax-markup-ignored-bg);}.markdown-body .pl-mdr{font-weight:bold;color:var(--color-prettylights-syntax-meta-diff-range);}.markdown-body .pl-ba{color:var(--color-prettylights-syntax-brackethighlighter-angle);}.markdown-body .pl-sg{color:var(--color-prettylights-syntax-sublimelinter-gutter-mark);}.markdown-body .pl-corl{text-decoration:underline;color:var(--color-prettylights-syntax-constant-other-reference-link);}.markdown-body[data-catalyst]{display:block;}.markdown-body g-emoji{font-family:"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";font-size:1em;font-style:normal !important;font-weight:400;line-height:1;vertical-align:-0.075em;}.markdown-body g-emoji img{width:1em;height:1em;}.markdown-body::before{display:table;content:"";}.markdown-body::after{display:table;clear:both;content:"";}.markdown-body>*:first-child{margin-top:0 !important;}.markdown-body>*:last-child{margin-bottom:0 !important;}.markdown-body a:not([href]):not(.self-link){color:inherit;text-decoration:none;}.markdown-body .absent{color:var(--color-danger-fg);}.markdown-body .anchor{float:left;padding-right:4px;margin-left:-20px;line-height:1;}.markdown-body .anchor:focus{outline:none;}.markdown-body p,.markdown-body blockquote,.markdown-body 
ul,.markdown-body ol,.markdown-body dl,.markdown-body table,.markdown-body pre,.markdown-body details{margin-top:0;margin-bottom:16px;}.markdown-body blockquote>:first-child{margin-top:0;}.markdown-body blockquote>:last-child{margin-bottom:0;}.markdown-body sup>a::before{content:"[";}.markdown-body sup>a::after{content:"]";}.markdown-body h1 .octicon-link,.markdown-body h2 .octicon-link,.markdown-body h3 .octicon-link,.markdown-body h4 .octicon-link,.markdown-body h5 .octicon-link,.markdown-body h6 .octicon-link{color:var(--color-fg-default);vertical-align:middle;visibility:hidden;}.markdown-body h1:hover .anchor,.markdown-body h2:hover .anchor,.markdown-body h3:hover .anchor,.markdown-body h4:hover .anchor,.markdown-body h5:hover .anchor,.markdown-body h6:hover .anchor{text-decoration:none;}.markdown-body h1:hover .anchor .octicon-link,.markdown-body h2:hover .anchor .octicon-link,.markdown-body h3:hover .anchor .octicon-link,.markdown-body h4:hover .anchor .octicon-link,.markdown-body h5:hover .anchor .octicon-link,.markdown-body h6:hover .anchor .octicon-link{visibility:visible;}.markdown-body h1 tt,.markdown-body h1 code,.markdown-body h2 tt,.markdown-body h2 code,.markdown-body h3 tt,.markdown-body h3 code,.markdown-body h4 tt,.markdown-body h4 code,.markdown-body h5 tt,.markdown-body h5 code,.markdown-body h6 tt,.markdown-body h6 code{padding:0 .2em;font-size:inherit;}.markdown-body ul.no-list,.markdown-body ol.no-list{padding:0;list-style-type:none;}.markdown-body ol[type="1"]{list-style-type:decimal;}.markdown-body ol[type=a]{list-style-type:lower-alpha;}.markdown-body ol[type=i]{list-style-type:lower-roman;}.markdown-body div>ol:not([type]){list-style-type:decimal;}.markdown-body ul ul,.markdown-body ul ol,.markdown-body ol ol,.markdown-body ol ul{margin-top:0;margin-bottom:0;}.markdown-body li>p{margin-top:16px;}.markdown-body li+li{margin-top:.25em;}.markdown-body dl{padding:0;}.markdown-body dl 
dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:600;}.markdown-body dl dd{padding:0 16px;margin-bottom:16px;}.markdown-body table th{font-weight:600;}.markdown-body table th,.markdown-body table td{padding:6px 13px;border:1px solid var(--color-border-default);}.markdown-body table tr{background-color:var(--color-canvas-default);border-top:1px solid var(--color-border-muted);}.markdown-body table tr:nth-child(2n){background-color:var(--color-canvas-subtle);}.markdown-body table img{background-color:transparent;}.markdown-body img[align=right]{padding-left:20px;}.markdown-body img[align=left]{padding-right:20px;}.markdown-body .emoji{max-width:none;vertical-align:text-top;background-color:transparent;}.markdown-body span.frame{display:block;overflow:hidden;}.markdown-body span.frame>span{display:block;float:left;width:auto;padding:7px;margin:13px 0 0;overflow:hidden;border:1px solid var(--color-border-default);}.markdown-body span.frame span img{display:block;float:left;}.markdown-body span.frame span span{display:block;padding:5px 0 0;clear:both;color:var(--color-fg-default);}.markdown-body span.align-center{display:block;overflow:hidden;clear:both;}.markdown-body span.align-center>span{display:block;margin:13px auto 0;overflow:hidden;text-align:center;}.markdown-body span.align-center span img{margin:0 auto;text-align:center;}.markdown-body span.align-right{display:block;overflow:hidden;clear:both;}.markdown-body span.align-right>span{display:block;margin:13px 0 0;overflow:hidden;text-align:right;}.markdown-body span.align-right span img{margin:0;text-align:right;}.markdown-body span.float-left{display:block;float:left;margin-right:13px;overflow:hidden;}.markdown-body span.float-left span{margin:13px 0 0;}.markdown-body span.float-right{display:block;float:right;margin-left:13px;overflow:hidden;}.markdown-body span.float-right>span{display:block;margin:13px auto 0;overflow:hidden;text-align:right;}.markdown-body code,.markdown-body 
tt{padding:.2em .4em;margin:0;font-size:85%;background-color:var(--color-neutral-muted);border-radius:6px;}.markdown-body code br,.markdown-body tt br{display:none;}.markdown-body del code{text-decoration:inherit;}.markdown-body pre code{font-size:100%;}.markdown-body pre>code{padding:0;margin:0;word-break:normal;white-space:pre;background:transparent;border:0;}.markdown-body .highlight{margin-bottom:16px;}.markdown-body .highlight pre{margin-bottom:0;word-break:normal;}.markdown-body .highlight pre,.markdown-body pre{padding:16px;overflow:auto;font-size:85%;line-height:1.45;background-color:var(--color-canvas-subtle);border-radius:6px;}.markdown-body pre code,.markdown-body pre tt{display:inline;max-width:auto;padding:0;margin:0;overflow:visible;line-height:inherit;word-wrap:normal;background-color:transparent;border:0;}.markdown-body .csv-data td,.markdown-body .csv-data th{padding:5px;overflow:hidden;font-size:12px;line-height:1;text-align:left;white-space:nowrap;}.markdown-body .csv-data .blob-num{padding:10px 8px 9px;text-align:right;background:var(--color-canvas-default);border:0;}.markdown-body .csv-data tr{border-top:0;}.markdown-body .csv-data th{font-weight:600;background:var(--color-canvas-subtle);border-top:0;}.markdown-body .footnotes{font-size:12px;color:var(--color-fg-muted);border-top:1px solid var(--color-border-default);}.markdown-body .footnotes ol{padding-left:16px;}.markdown-body .footnotes li{position:relative;}.markdown-body .footnotes li:target::before{position:absolute;top:-8px;right:-8px;bottom:-8px;left:-24px;pointer-events:none;content:"";border:2px solid var(--color-accent-emphasis);border-radius:6px;}.markdown-body .footnotes li:target{color:var(--color-fg-default);}.markdown-body .footnotes .data-footnote-backref g-emoji{font-family:monospace;}.markdown-body .task-list-item{list-style-type:none;}.markdown-body .task-list-item label{font-weight:400;}.markdown-body .task-list-item.enabled label{cursor:pointer;}.markdown-body 
.task-list-item+.task-list-item{margin-top:3px;}.markdown-body .task-list-item .handle{display:none;}.markdown-body .task-list-item-checkbox{margin:0 .2em .25em -1.6em;vertical-align:middle;}.markdown-body .contains-task-list:dir(rtl) .task-list-item-checkbox{margin:0 -1.6em .25em .2em;}.markdown-body::-webkit-calendar-picker-indicator{filter:invert(50%);} \ No newline at end of file diff --git a/static/rust-kbd-oled.jpg b/static/rust-kbd-oled.jpg new file mode 100644 index 0000000..8aadec7 Binary files /dev/null and b/static/rust-kbd-oled.jpg differ diff --git a/static/starry_night.css b/static/starry_night.css new file mode 100644 index 0000000..fb038cf --- /dev/null +++ b/static/starry_night.css @@ -0,0 +1 @@ + :root{--color-prettylights-syntax-comment:#6e7781;--color-prettylights-syntax-constant:#0550ae;--color-prettylights-syntax-entity:#8250df;--color-prettylights-syntax-storage-modifier-import:#24292f;--color-prettylights-syntax-entity-tag:#116329;--color-prettylights-syntax-keyword:#cf222e;--color-prettylights-syntax-string:#0a3069;--color-prettylights-syntax-variable:#953800;--color-prettylights-syntax-brackethighlighter-unmatched:#82071e;--color-prettylights-syntax-invalid-illegal-text:#f6f8fa;--color-prettylights-syntax-invalid-illegal-bg:#82071e;--color-prettylights-syntax-carriage-return-text:#f6f8fa;--color-prettylights-syntax-carriage-return-bg:#cf222e;--color-prettylights-syntax-string-regexp:#116329;--color-prettylights-syntax-markup-list:#3b2300;--color-prettylights-syntax-markup-heading:#0550ae;--color-prettylights-syntax-markup-italic:#24292f;--color-prettylights-syntax-markup-bold:#24292f;--color-prettylights-syntax-markup-deleted-text:#82071e;--color-prettylights-syntax-markup-deleted-bg:#ffebe9;--color-prettylights-syntax-markup-inserted-text:#116329;--color-prettylights-syntax-markup-inserted-bg:#dafbe1;--color-prettylights-syntax-markup-changed-text:#953800;--color-prettylights-syntax-markup-changed-bg:#ffd8b5;--color-prettylights-syntax-mar
kup-ignored-text:#eaeef2;--color-prettylights-syntax-markup-ignored-bg:#0550ae;--color-prettylights-syntax-meta-diff-range:#8250df;--color-prettylights-syntax-brackethighlighter-angle:#57606a;--color-prettylights-syntax-sublimelinter-gutter-mark:#8c959f;--color-prettylights-syntax-constant-other-reference-link:#0a3069;}@media (prefers-color-scheme:dark){:root{--color-prettylights-syntax-comment:#8b949e;--color-prettylights-syntax-constant:#79c0ff;--color-prettylights-syntax-entity:#d2a8ff;--color-prettylights-syntax-storage-modifier-import:#c9d1d9;--color-prettylights-syntax-entity-tag:#7ee787;--color-prettylights-syntax-keyword:#ff7b72;--color-prettylights-syntax-string:#a5d6ff;--color-prettylights-syntax-variable:#ffa657;--color-prettylights-syntax-brackethighlighter-unmatched:#f85149;--color-prettylights-syntax-invalid-illegal-text:#f0f6fc;--color-prettylights-syntax-invalid-illegal-bg:#8e1519;--color-prettylights-syntax-carriage-return-text:#f0f6fc;--color-prettylights-syntax-carriage-return-bg:#b62324;--color-prettylights-syntax-string-regexp:#7ee787;--color-prettylights-syntax-markup-list:#f2cc60;--color-prettylights-syntax-markup-heading:#1f6feb;--color-prettylights-syntax-markup-italic:#c9d1d9;--color-prettylights-syntax-markup-bold:#c9d1d9;--color-prettylights-syntax-markup-deleted-text:#ffdcd7;--color-prettylights-syntax-markup-deleted-bg:#67060c;--color-prettylights-syntax-markup-inserted-text:#aff5b4;--color-prettylights-syntax-markup-inserted-bg:#033a16;--color-prettylights-syntax-markup-changed-text:#ffdfb6;--color-prettylights-syntax-markup-changed-bg:#5a1e02;--color-prettylights-syntax-markup-ignored-text:#c9d1d9;--color-prettylights-syntax-markup-ignored-bg:#1158c7;--color-prettylights-syntax-meta-diff-range:#d2a8ff;--color-prettylights-syntax-brackethighlighter-angle:#8b949e;--color-prettylights-syntax-sublimelinter-gutter-mark:#484f58;--color-prettylights-syntax-constant-other-reference-link:#a5d6ff;}}.pl-c{color:var(--color-prettylights-syntax-co
mment);}.pl-c1,.pl-s .pl-v{color:var(--color-prettylights-syntax-constant);}.pl-e,.pl-en{color:var(--color-prettylights-syntax-entity);}.pl-smi,.pl-s .pl-s1{color:var(--color-prettylights-syntax-storage-modifier-import);}.pl-ent{color:var(--color-prettylights-syntax-entity-tag);}.pl-k{color:var(--color-prettylights-syntax-keyword);}.pl-s,.pl-pds,.pl-s .pl-pse .pl-s1,.pl-sr,.pl-sr .pl-cce,.pl-sr .pl-sre,.pl-sr .pl-sra{color:var(--color-prettylights-syntax-string);}.pl-v,.pl-smw{color:var(--color-prettylights-syntax-variable);}.pl-bu{color:var(--color-prettylights-syntax-brackethighlighter-unmatched);}.pl-ii{color:var(--color-prettylights-syntax-invalid-illegal-text);background-color:var(--color-prettylights-syntax-invalid-illegal-bg);}.pl-c2{color:var(--color-prettylights-syntax-carriage-return-text);background-color:var(--color-prettylights-syntax-carriage-return-bg);}.pl-sr .pl-cce{font-weight:bold;color:var(--color-prettylights-syntax-string-regexp);}.pl-ml{color:var(--color-prettylights-syntax-markup-list);}.pl-mh,.pl-mh 
.pl-en,.pl-ms{font-weight:bold;color:var(--color-prettylights-syntax-markup-heading);}.pl-mi{font-style:italic;color:var(--color-prettylights-syntax-markup-italic);}.pl-mb{font-weight:bold;color:var(--color-prettylights-syntax-markup-bold);}.pl-md{color:var(--color-prettylights-syntax-markup-deleted-text);background-color:var(--color-prettylights-syntax-markup-deleted-bg);}.pl-mi1{color:var(--color-prettylights-syntax-markup-inserted-text);background-color:var(--color-prettylights-syntax-markup-inserted-bg);}.pl-mc{color:var(--color-prettylights-syntax-markup-changed-text);background-color:var(--color-prettylights-syntax-markup-changed-bg);}.pl-mi2{color:var(--color-prettylights-syntax-markup-ignored-text);background-color:var(--color-prettylights-syntax-markup-ignored-bg);}.pl-mdr{font-weight:bold;color:var(--color-prettylights-syntax-meta-diff-range);}.pl-ba{color:var(--color-prettylights-syntax-brackethighlighter-angle);}.pl-sg{color:var(--color-prettylights-syntax-sublimelinter-gutter-mark);}.pl-corl{text-decoration:underline;color:var(--color-prettylights-syntax-constant-other-reference-link);} \ No newline at end of file diff --git a/static/styles.css b/static/styles.css new file mode 100644 index 0000000..71c8a4a --- /dev/null +++ b/static/styles.css @@ -0,0 +1 @@ +@import "github-markdown.css";@media (prefers-color-scheme:dark){body{background-color:#07090d;}}@media (prefers-color-scheme:light){body{background-color:#e6eaed;}}.menu-item{text-align:center;font-size:2em;font-weight:600;font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji";color:var(--color-fg-default);background-color:var(--color-canvas-subtle);padding:5px;border-radius:3px;margin-right:5px;margin-top:5px;}.self-link{cursor:pointer;}button{all:unset;cursor:pointer;}#markdown-text{padding-top:10px;padding-bottom:10px;margin-left:5%;margin-right:5%;}#content{margin-left:calc((100vw - 980px - 45px) / 2);}@media 
(max-width:1025px){#content{margin:0 auto;}}.markdown-body{box-sizing:border-box;min-width:200px;max-width:980px;margin:0 auto;padding:45px;}@media (max-width:767px){.markdown-body{padding:15px;}} \ No newline at end of file diff --git a/table-of-contents.html b/table-of-contents.html new file mode 100644 index 0000000..6d5340e --- /dev/null +++ b/table-of-contents.html @@ -0,0 +1 @@ + Nav

Table of contents

Because I'm terrible at web-dev and unable to make a side menu scale properly, I made things easier for myself and made navigation happen through this md-page instead.

Top level navigation

Projects

Retrospectives

\ No newline at end of file diff --git a/test.html b/test.html new file mode 100644 index 0000000..7f0f46f --- /dev/null +++ b/test.html @@ -0,0 +1,24 @@ + + + + + + + + + Test + + + +
+

Here's a test write-up

+

I always test in prod.

+
fn main() {
+    panic!("Finally highlighting works");
+}
+
+

Test some change here!

+
+
\ No newline at end of file diff --git a/threads.html b/threads.html new file mode 100644 index 0000000..26697f1 --- /dev/null +++ b/threads.html @@ -0,0 +1,517 @@ + + + + + + + + + Threads + + + +
+

Threads, some assembly required.

+

Lately I've been thinking about adding threads to tiny-std, my Linux-only, x86_64/aarch64-only tiny standard library for Rust.

+

Now I've finally done that, with some jankiness. In this write-up I'll go through that process.

+

Parallelism

+

Sometimes in programming, parallelism (doing multiple things at the same time) is desirable. For example, say that completing some task requires two different long-running calculations. If those can be run in parallel, our latency becomes that of the slowest of the two tasks (plus some overhead).

+

Some ways of achieving parallelism in your program are:

+
    +
  1. SIMD, hopefully +your compiler does that for you. But here we're talking about singular processor operations, +not arbitrary tasks. +
  2. Offloading tasks to the OS. If your OS has asynchronous apis then you could ask it to do multiple things at once +and achieve parallelism that way. +
  3. Running tasks in other processes. +
  4. Running tasks in threads. +
+

Threads

+

Wikipedia says of threads:

+
+

"In computer science, a thread of execution is the smallest sequence of programmed instructions that can be +managed independently by a scheduler, which is typically a part of the operating system."

+
+

Threads, from a programming perspective, are managed by the OS, and how threads work is highly OS-dependent. I'll only go into Linux here, and only from an API consumer's perspective.

+

Spawning a thread with a minimal task

+

In the Rust std-library, a thread can be spawned with:

+
fn main() {
+    let handle = std::thread::spawn(|| {
+        std::thread::sleep(std::time::Duration::from_millis(500));
+        println!("Hello from my thread");
+    });
+    // Suspends execution of the calling thread until the child-thread completes.  
+    handle.join().unwrap();   
+}
+
+

In the above program, some setup runs before the main-function, some of it delegated to libc, which sets up what it deems appropriate. Rust sets up a panic handler and the miscellaneous things the program needs to run correctly, then the main-thread starts executing the main function.
In the main function, the main thread spawns a child, which at the point of spawn starts executing the task provided by the supplied closure: wait 500 millis, then print a message. The main thread then waits for that child-thread to complete.

+

I wanted to replicate this API, without using libc.

+

Clone, the Swiss Army syscall

+

The Linux clone syscall can be used for a lot of things.
So many, in fact, that it's extremely difficult to use correctly: it's very easy to cause security issues through various memory-management mistakes, many of which I discovered on this journey.

+

The signature for the glibc clone wrapper function looks like:

+
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...
+/* pid_t *parent_tid, void *tls, pid_t *child_tid */ );
+
+

Right away I can tell that calling this is not going to be easy from Rust; we've got varargs in there, which is problematic because:

+
    +
  1. Rust doesn't have varargs, so porting some C-functionality from, for example, musl won't be straightforward.
  2. Varargs are not readable (objectively true opinion). +
+

Skipping down to the Notes-section of the documentation shows the actual syscall interface (for x86_64; in a conspiracy to ruin my life, the last two args are switched on aarch64):

+
long clone(unsigned long flags, void *stack,
+                      int *parent_tid, int *child_tid,
+                      unsigned long tls);
+
+

Very disconcerting, since the C-api, which accepts varargs, seems to do quite a bit of work before making the syscall and handing over the task to the OS.

+

In simple terms, clone is a way to "clone" the current process. If you have experience with +fork, that's an example of clone. +Here's an equivalent fork using the clone syscall from tiny-std:

+
/// Fork isn't implemented for aarch64, we're substituting with a clone call here
+/// # Errors
+/// See above
+/// # Safety
+/// See above
+#[cfg(target_arch = "aarch64")]
+pub unsafe fn fork() -> Result<PidT> {
+    // SIGCHLD is mandatory on aarch64 if mimicking fork it seems
+    let cflgs = crate::platform::SignalKind::SIGCHLD;
+    let res = syscall!(CLONE, cflgs.bits().0, 0, 0, 0, 0);
+    bail_on_below_zero!(res, "CLONE syscall failed");
+    #[allow(clippy::cast_possible_truncation, clippy::cast_possible_wrap)]
+    Ok(res as i32)
+}
+
+

What happens immediately after this call is that our process is cloned and starts executing past the code which called clone. Following the above Rust example:

+
fn parallelism_through_multiprocess() {
+    let pid = unsafe { rusl::process::fork().unwrap() };
+    if pid == 0 {
+        println!("In child!");
+        rusl::process::exit(0);
+    } else {
+        println!("In parent, spawned child {pid}");
+    }
+}
+
+

This program will print, in non-deterministic order, In parent, spawned child 24748 and In child!, and then return to the caller.

+

Here we achieved parallelism by spawning another process, doing work there, separately scheduled by the OS, and then exiting that process. At the same time, our caller returns as usual, only stopping briefly to make the syscall.

+

Achieving parallelism in this way can be fine. If you want to run a command, forking/cloning then executing +another binary through the execve-syscall +is usually how that's done.
+Multiprocessing can be a bad choice if the task is small, because setting up an entire other process can be resource +intensive, and communicating between processes can be slower than communicating through shared memory.

+

Threads: Cloning intra-process with shared memory

+

What we think of as threads in Linux are sometimes called Light-Weight Processes. The above clone call spawned a regular process, which got a full copy of the parent-process' memory with copy-on-write semantics.

+

To reduce overhead in both spawning, and communicating between the cloned process and the rest of the processes in the application, a combination of flags is used:

+
let flags = CloneFlags::CLONE_VM
+        | CloneFlags::CLONE_FS
+        | CloneFlags::CLONE_FILES
+        | CloneFlags::CLONE_SIGHAND
+        | CloneFlags::CLONE_THREAD
+        | CloneFlags::CLONE_SYSVSEM
+        | CloneFlags::CLONE_CHILD_CLEARTID
+        | CloneFlags::CLONE_SETTLS;
+
+

Clone flags are tricky to explain, and they interact with each other as well, but in short:

+
    +
  1. CLONE_VM, clone memory without copy-on-write semantics, meaning the parent and child share a memory space and can modify each other's memory.
  2. CLONE_FS, the parent and child share the same filesystem information, such as current directory. +
  3. CLONE_FILES, the parent and child share the same file-descriptor table, +(if one opens an fd, that fd is available to the other). +
  4. CLONE_SIGHAND, the parent and child share signal handlers. +
  5. CLONE_THREAD, the child-process is placed in the same thread-group as the parent. +
  6. CLONE_SYSVSEM, the parent and child share System V semaphores.
  7. CLONE_CHILD_CLEARTID, wake up waiters for the supplied child_tid futex pointer when the child exits +(we'll get into this). +
  8. CLONE_SETTLS, set the thread-local storage to the data pointed at by the tls-variable (architecture specific, +we'll get into this as well). +
+

The crucial flags to run some tasks in a thread are only:

+
    +
  1. CLONE_VM +
  2. CLONE_THREAD +
+

The rest are there for usability and to meet expectations, as well as for cleanup reasons.

+

Implementation

+

Now towards the actual implementation of a minimal threading API.

+

API expectation

+

The std library in Rust provides an interface that could be used like this:

+
let join_handle = std::thread::spawn(|| println!("Hello from my thread!"));
+join_handle.join().unwrap();
+
+

A closure to run on another thread is supplied, and a JoinHandle<T> is returned. The join handle can be awaited by calling its join-method, which will block the calling thread until the thread executing the closure has completed. If the closure panics, the Result will be an Err; if it succeeds, it will be an Ok(T), where T is the return value of the closure, which in this case is nothing (()).

+

Executing a clone call

+

If CLONE_VM is specified, a stack should be supplied. CLONE_VM means sharing mutable memory; if we didn't supply the stack, both threads would continue mutating the same stack area. Here's an excerpt from the docs about that:

+
+

[..] (If the +child shares the parent's memory because of the use of the +CLONE_VM flag, then no copy-on-write duplication occurs and chaos +is likely to result.) - "C library/kernel differences"-section

+
+

Allocating the stack

+

Therefore, setting up a stack is required. There are a few options for that; the kernel only needs a chunk of correctly aligned memory, with the alignment depending on what platform we're targeting. We could even just take some memory off our own stack if we wanted to.

+
Use the caller's stack
+
fn clone() {
+    // 16 KiB stack allocation
+    let mut my_stack = [0u8; 16384];
+    let stack_ptr = my_stack.as_mut_ptr();
+    // pass through to the syscall
+    syscall!(CLONE, ..., stack_ptr, ...);
+}
+
+

This is bad for a generic API for a multitude of reasons. It restricts the user to threads that complete before the caller has popped the stack frame in which they were created, since the part of the stack that was used in this frame will be reused by the caller later, possibly while the child-thread still uses it for its own stack. That, we now know, would result in chaos.

+

Additionally, if this API were exposed to users, we would have to have stack space available on the calling thread for an arbitrary number of children. In the case of heap allocations, the assumption that we will have enough memory for reasonable thread-usage is valid. Rust's default thread stack size is 2MiB. On a system with 16GiB of RAM, with 8GiB available at a given time, that's 4096 threads; spawning that many is likely not intentional or performant.

+

Keeping child-thread stacks on the main-thread's stack significantly reduces our memory availability, and adds the risk of chaos.

+

There is a case to be made for reusing the caller's stack in some very specific application which spawns some threads in scope, does some work, then exits. But I have yet to encounter that kind of use-case in practice, so let's move on to something more reasonable.

+
Mmap more stack-space
+

This is what musl does. We allocate the memory that we want to use as a stack from new OS pages via mmap. We could potentially do a regular malloc as well, although that would mean less control over the allocated memory.

+

Communicating with the started thread

+

Now, mmap-ing some stack-memory is enough for the OS to start a thread with its own stack, but then what?
The thread needs to know what to do, and we can't provide it with any arguments; we need to put all the data it needs on its stack before starting execution of the task.

+

This means that we'll need some assembly. Since using the clone syscall and then continuing in Rust relinquishes control that we need over the stack, we need to put almost the entire child-thread's lifetime in assembly.

+

The structure of the call is mostly stolen from musl, with some changes for this more minimal use-case. The Rust declaration will look like this:

+
extern "C" {
+    fn __clone(
+        start_fn: usize,
+        stack_ptr: usize,
+        flags: i32,
+        args_ptr: usize,
+        tls_ptr: usize,
+        child_tid_ptr: usize,
+        stack_unmap_ptr: usize,
+        stack_sz: usize,
+    ) -> i32;
+}
+
+
    +
  1. It takes a pointer to a start_fn, which is a C calling convention function pointer, where our thread will pick up. +
  2. It also takes a pointer to the stack, stack_ptr. +
  3. It takes clone-flags which we send onto the OS in the syscall. +
  4. It takes an args_ptr, which is the closure we want to run, converted to a C calling convention function pointer. +
  5. It takes a tls_ptr, a pointer to some thread local storage, which we'll need to deallocate the thread's stack, and +communicate with the calling thread. +
  6. It takes a child_tid_ptr, which will be used to synchronize with the calling thread. +
  7. It takes a stack_unmap_ptr, which is the base address that we allocated for the stack at its original 0 offset. +
  8. It takes the stack_sz, stack-size, which we'll need to deallocate the stack later. +
+

Syscalls

+

x86_64 and aarch64 assembly each have an instruction to execute a syscall (syscall and svc, respectively).

+

A syscall is like a function call to the kernel. We'll need to make three syscalls using assembly:

+
    +
  1. CLONE, nr 56 on x86_64 +
  2. MUNMAP, nr 11 on x86_64 +
  3. EXIT, nr 60 on x86_64 +
+

The interface for the syscall is as follows:

+
/// Syscall convention on 5 args (arg -> register, per arch):
+/// - nr -> x86_64: rax, aarch64: x8
+/// - a1 -> x86_64: rdi, aarch64: x0
+/// - a2 -> x86_64: rsi, aarch64: x1
+/// - a3 -> x86_64: rdx, aarch64: x2
+/// - a4 -> x86_64: r10, aarch64: x3
+/// - a5 -> x86_64: r8,  aarch64: x4
+/// Pseudocode syscall as an extern function:
+extern "C" {
+    fn syscall(nr: usize, a1: usize, a2: usize, a3: usize, a4: usize, a5: usize);
+}
+
+

Onto the assembly, it can be boiled down to this:

+
    +
  1. Prepare arguments to go in the right registers for the syscall. +
  2. Put what the thread needs into its stack. +
  3. Execute the clone syscall, return directly to the caller (parent-thread). +
  4. Pop data from the spawned thread's stack into registers. +
  5. Execute the function we wanted to run in the spawned thread. +
  6. Unmap the spawned thread's own stack +
  7. Exit 0 +
+
// Boilerplate to expose the symbol
+.text
+.global __clone
+.hidden __clone
+.type   __clone,@function
+// Actual declaration
+__clone:
+// tls_ptr already in r8, syscall arg 5 register, due to C calling convention on this function, same with stack_ptr in rsi
+// Zero syscall nr register ax (eax = 32bit ax)
+xor eax, eax
+// Move 56 into the lower 8 bits of ax (al = 8bit ax), 56 is the CLONE syscall nr for x86_64, will become: syscall(56, .., stack_ptr, .., tls_ptr)
+mov al, 56
+// Move start function into r11, scratch register, save it there since we need to shuffle stuff around
+mov r11, rdi
+// Move flags into rdi, syscall arg 1 register, will become: syscall(56, flags, stack_ptr, .., .., tls_ptr)
+mov rdi, rdx
+// Zero parent_tid_ptr from syscall arg 3 register (not using), will become: syscall(56, flags, stack_ptr, 0, .., tls_ptr)
+xor rdx, rdx
+// Move child_tid_ptr into syscall arg 4 register (our arg 6), will become: syscall(56, flags, stack_ptr, 0, child_tid_ptr, tls_ptr)
+mov r10, r9
+// Move start function into r9
+mov r9, r11
+// Align stack ptr to -16
+and rsi, -16
+// Move down 8 bytes on the stack ptr
+sub rsi, 8
+// Move args onto the top of the stack
+mov [rsi], rcx
+// Move down 8 bytes more on the stack ptr
+sub rsi, 8
+// Move the first arg that went on the stack into rcx (stack_unmap_ptr)
+mov rcx, [8 + rsp]
+// Move stack_unmap_ptr onto our new stack
+mov [rsi], rcx
+// Move the second arg that went on the stack into rcx (stack_sz)
+mov rcx, [16 + rsp]
+// Move down stack ptr
+sub rsi, 8
+// Move stack_sz onto the new stack
+mov [rsi], rcx
+// Make clone syscall
+syscall
+// Check if the syscall return value is 0
+test eax, eax
+// if not zero, return (we're the calling thread)
+jnz 1f
+// Child:
+// Zero the base pointer
+xor ebp, ebp
+// Pop the stack_sz off the provided stack into callee saved register
+pop r13
+// Pop the stack_ptr off the provided stack into another callee saved register
+pop r12
+// Pop the start fn args off the provided stack into rdi
+pop rdi
+// Call the function we saved in r9, rdi first arg
+call r9
+// Zero rax (function return, we don't care)
+xor rax, rax
+// Move MUNMAP syscall into ax
+mov al, 11
+// Stack ptr as the first arg
+mov rdi, r12
+// Stack len as the second arg
+mov rsi, r13
+// Syscall, unmapping the stack
+syscall
+// Clear the output register, we can't use the return value anyway
+xor eax,eax
+// Move EXIT syscall nr into ax
+mov al, 60
+// Set exit code for the thread to 0
+mov rdi, 0
+// Make exit syscall
+syscall
+1: ret
+
+

And that's it, kinda: with some code wrapping this we can run an arbitrary closure on a separate thread!

+

Race conditions

+

We're far from done: in the happy case we start a thread, it completes, and it deallocates its own stack. +But we still need to get its return value, and we need to know when it's done.

+

Unlike with a process, we cannot use the wait-syscall to wait +for the thread to complete, but there is another way, alluded to in the note on CLONE_CHILD_CLEARTID.

+

Futex messaging

+

If CLONE_CHILD_CLEARTID is supplied in the clone-flags along with a pointer to a futex variable, something with a u32-layout +(in Rust, most reasonably an AtomicU32), then the OS will set that futex-value to 0 (not null) when the thread exits, +successfully or not.

+

This means that if the caller wants to join, i.e. blocking-wait for the child-thread to finish, it can use the +futex-syscall.

+
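To make that concrete, here's a minimal sketch of a futex-based join, assuming x86_64 Linux. `futex` and `futex_join_demo` are hypothetical helper names, and a plain `std::thread` stands in for the raw clone-thread, doing the store-and-wake itself (in the real code the kernel performs that store-and-wake for us via CLONE_CHILD_CLEARTID):

```rust
use std::arch::asm;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

const SYS_FUTEX: u64 = 202; // futex syscall number on x86_64
const FUTEX_WAIT: u64 = 0;
const FUTEX_WAKE: u64 = 1;

// Raw futex(2): futex(uaddr, op, val, timeout=NULL, uaddr2=NULL, val3=0)
unsafe fn futex(uaddr: *const u32, op: u64, val: u32) -> i64 {
    let ret: u64;
    unsafe {
        asm!(
            "syscall",
            inlateout("rax") SYS_FUTEX => ret,
            in("rdi") uaddr,
            in("rsi") op,
            in("rdx") val as u64,
            in("r10") 0u64,
            in("r8") 0u64,
            in("r9") 0u64,
            // rcx and r11 are clobbered by the syscall instruction
            out("rcx") _,
            out("r11") _,
            options(nostack),
        );
    }
    ret as i64
}

fn futex_join_demo() -> u32 {
    // 1 = "thread alive"; the exiting side sets it to 0 and wakes waiters,
    // mimicking what the kernel does for us with CLONE_CHILD_CLEARTID.
    let tid_futex = Arc::new(AtomicU32::new(1));
    let child = Arc::clone(&tid_futex);
    let t = std::thread::spawn(move || {
        child.store(0, Ordering::SeqCst);
        unsafe { futex(child.as_ptr(), FUTEX_WAKE, 1) };
    });
    // "join": sleep until the value is no longer 1, re-checking on every wake
    while tid_futex.load(Ordering::SeqCst) == 1 {
        unsafe { futex(tid_futex.as_ptr(), FUTEX_WAIT, 1) };
    }
    t.join().unwrap();
    tid_futex.load(Ordering::SeqCst)
}
```

Note that the kernel checks the expected value atomically inside FUTEX_WAIT, so there's no lost-wakeup race between our load and the wait: if the store already happened, the syscall returns immediately.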

Getting the returned value

+

The return value is fairly simple: we need to allocate space for it, for example through a pointer to an UnsafeCell<Option<T>>, +and then have the child-thread update it. The catch is that we can't hold &-references to that value while the child-thread +may be writing to it, since that's UB. We share with the child a pointer to the value, and we need to be +absolutely certain that the child-thread is done with +its modification before we try to read it, for example by waiting for it to exit by join-ing.

+
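As a sketch of that shared-slot pattern (`spawn_with_ret` and `RetSlot` are made-up names, and std::thread stands in for the raw clone-thread): the parent heap-allocates the `UnsafeCell<Option<T>>`, hands the child a raw pointer to it, and only reads it back after joining.

```rust
use std::cell::UnsafeCell;
use std::thread;

// Wrapper so the raw pointer can cross the thread boundary; we promise the
// child is the only writer until the parent has joined.
struct RetSlot<T>(*mut UnsafeCell<Option<T>>);
unsafe impl<T: Send> Send for RetSlot<T> {}

fn spawn_with_ret<T: Send + 'static>(f: impl FnOnce() -> T + Send + 'static) -> T {
    // Heap-allocate the shared return slot; both sides hold a raw pointer.
    let slot: *mut UnsafeCell<Option<T>> = Box::into_raw(Box::new(UnsafeCell::new(None)));
    let child_slot = RetSlot(slot);
    let handle = thread::spawn(move || {
        let child_slot = child_slot; // move the whole Send wrapper into the closure
        // The child writes through the raw pointer; no &-references exist
        // on the parent side while this write may be happening.
        unsafe { *(*child_slot.0).get() = Some(f()) };
    });
    // Joining guarantees the child's write happened-before our read.
    handle.join().unwrap();
    let boxed = unsafe { Box::from_raw(slot) };
    (*boxed).into_inner().unwrap()
}
```

Joining is what makes the final read sound: it establishes the happens-before edge between the child's write and the parent's read, and it's also what makes it safe for exactly one side to free the box.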

Memory leaks, who deallocates what?

+

We don't necessarily keep our JoinHandle<T> around after spawning a thread. A perfectly valid use-case is to +just spawn some long-running thread and then forget about it. This causes a problem: if the calling thread doesn't have +sole responsibility for deallocating the shared memory (the futex variable and the return value), then we need a way +to signal to the child-thread that it's that thread's responsibility to deallocate those variables before exiting.

+

Enter the third shared variable, an AtomicBool called should_dealloc, both threads share a pointer to this variable +as well.

+

Now there are three deallocation-scenarios:

+
    +
1. Caller joins the child thread by waiting for the futex-variable to change value to 0. +In this case the caller deallocates the futex, takes the return value off the heap, freeing its memory, and +deallocates the should_dealloc pointer. +
2. Caller drops the JoinHandle<T>. This is racy: we need to read should_dealloc to see whether the child thread has +already completed its work. If it has, we wait on the futex to make sure the child thread is completely done, then +deallocate as above. +
  3. The child thread tries to set should_dealloc to true and fails, meaning that the calling thread has already +dropped the JoinHandle<T>. In this case, the child thread needs to signal to the OS that the futex is no longer +to be updated on thread exit through the +set_tid_address-syscall (forgetting to do this results in a +use after free, oof. Here's a Linux-code-comment calling me a dumbass that I found when trying to find the source of the segfaults: +
+
// 929ed21dfdb6ee94391db51c9eedb63314ef6847, kernel/fork.c#L1634, written by Linus himself
+if (tsk->clear_child_tid) {
+		if (atomic_read(&mm->mm_users) > 1) {
+			/*
+			 * We don't check the error code - if userspace has
+			 * not set up a proper pointer then tough luck.
+			 */
+			put_user(0, tsk->clear_child_tid);
+			do_futex(tsk->clear_child_tid, FUTEX_WAKE,
+					1, NULL, NULL, 0, 0);
+		}
+		tsk->clear_child_tid = NULL;
+	}
+
+

). Then it can safely deallocate the shared variables.

+
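The heart of that handshake can be reduced to a single atomic operation: both sides try to flip `should_dealloc`, and whoever flips it second (the "failed" set in scenario 3) knows the other side is already gone and takes cleanup duty. A sketch with std threads (`last_one_cleans_up` is a made-up name):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Returns how many of the two sides decided they own cleanup; must be exactly 1.
fn last_one_cleans_up() -> usize {
    let should_dealloc = Arc::new(AtomicBool::new(false));
    let child_flag = Arc::clone(&should_dealloc);
    // Child: on exit, try to claim the flag.
    let child = thread::spawn(move || child_flag.swap(true, Ordering::AcqRel));
    // Caller: on JoinHandle drop, try to claim the flag.
    let caller_was_last = should_dealloc.swap(true, Ordering::AcqRel);
    let child_was_last = child.join().unwrap();
    // Exactly one side observes `true` (meaning the other got there first)
    // and is therefore responsible for deallocating the shared variables.
    usize::from(caller_was_last) + usize::from(child_was_last)
}
```

However the race resolves, exactly one `swap` returns `true`, so the shared memory is freed exactly once.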

Oh, right. Panics...

+

I imagine a world where Rust doesn't contain panics. Sadly, we don't live in that world, and thus we need to handle them.
+If the thread panics and we try to join, it's no issue: we'll get a None return value and can continue with +the regular cleanup from the caller.
+However, if the thread panics after the caller has dropped the JoinHandle<T>, the shared memory is leaked +and the stack isn't deallocated.

+

A Rust panic handler could look like this:

+
/// Dummy panic handler
+#[panic_handler]
+pub fn on_panic(_info: &core::panic::PanicInfo) -> ! {
+    loop {}
+}
+
+

The signature shows that it gets a PanicInfo and never returns.
+When a thread panics, it enters that function and never returns; it's here that we need to handle cleanup for +a panicking thread.

+
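In a hosted std program, the analogous interception point would be a panic hook rather than a `#[panic_handler]`. This sketch (with a hypothetical `CLEANED_UP` flag standing in for the real TLS-based cleanup) just demonstrates that the hook runs on the panicking thread before it dies:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

static CLEANED_UP: AtomicBool = AtomicBool::new(false);

fn panicking_thread_runs_hook() -> bool {
    // The hook runs on whichever thread panics; the real no_std handler
    // would read the TLS pointer here and decide what to deallocate.
    panic::set_hook(Box::new(|_info| {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }));
    // The spawned thread panics; join returns Err, and by then the hook has run.
    let res = std::thread::spawn(|| panic!("boom")).join();
    assert!(res.is_err());
    CLEANED_UP.load(Ordering::SeqCst)
}
```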

What we need:

+
    +
  1. A pointer to the futex +
  2. A pointer to the return value +
  3. A pointer to the should_dealloc variable +
  4. The address at which we allocated this thread's stack +
  5. The size of that allocated stack +
+

We could stash those in registers that shouldn't be touched by the user-supplied function, but that's fairly brittle; +instead we'll use the dreaded tls.

+

Thread-local storage

+

Thread-local storage, or tls, is a way to store thread-specific data.
+For x86_64 and aarch64 there is a specific register in which we can store a pointer to some arbitrary data; +we can read that data at any time, from any place. In other words, the data is global to the thread.

+

In practice:

+
#[repr(C)]
+#[derive(Copy, Clone)]
+pub(crate) struct ThreadLocalStorage {
+    // First arg needs to be a pointer to this struct, it's immediately dereferenced
+    pub(crate) self_addr: usize,
+    // Info on spawned threads that allow us to unmap the stack later
+    pub(crate) stack_info: Option<ThreadDealloc>,
+}
+#[repr(C)]
+#[derive(Copy, Clone)]
+pub(crate) struct ThreadDealloc {
+    // For the stack dealloc
+    stack_addr: usize,
+    stack_sz: usize,
+    // For the return value dealloc
+    payload_ptr: usize,
+    payload_layout: Layout,
+    // Futex, signalled by the OS on thread-exit
+    futex_ptr: usize,
+    // Sync who deallocs
+    sync_ptr: usize,
+}
+#[inline]
+#[must_use]
+fn get_tls_ptr() -> *mut ThreadLocalStorage {
+    let mut output: usize;
+    #[cfg(target_arch = "x86_64")]
+    unsafe {
+        core::arch::asm!("mov {x}, fs:0", x = out(reg) output);
+    }
+    #[cfg(target_arch = "aarch64")]
+    unsafe {
+        core::arch::asm!("mrs {x}, tpidr_el0", x = out(reg) output);
+    }
+    output as _
+}
+
+

This takes us to another of our clone-flags, CLONE_SETTLS: we can now allocate and supply a pointer to a +ThreadLocalStorage-struct, and that pointer will be put into the thread's thread-local storage register by the OS. +Which register is used can be seen in get_tls_ptr.

+

Now, when entering the panic_handler, we can get_tls_ptr and see if there is a ThreadDealloc associated with the +thread that's currently panicking. If there isn't, we're on the main thread, and we'll just bail out by exiting with +code 1, terminating the program. +If there is a ThreadDealloc, we first check whether the caller has dropped the JoinHandle<T>, which tells us whether +we have exclusive access to the shared memory. If we do, we deallocate it; if we don't, we let the caller handle it. +Then, again, we have to exit with some asm:

+
// We need to be able to unmap the thread's own stack, we can't use the stack anymore after that
+// so it needs to be done in asm.
+// With the stack_ptr and stack_len in rdi/x0 and rsi/x1, respectively, we can call munmap and then
+// exit the thread
+#[cfg(target_arch = "x86_64")]
+core::arch::asm!(
+// Call munmap, all args are provided in this macro call.
+"syscall",
+// Zero eax from munmap ret value
+"xor eax, eax",
+// Move exit into ax
+"mov al, 60",
+// Exit code 0 from thread.
+"mov rdi, 0",
+// Call exit, no return
+"syscall",
+in("rax") MUNMAP,
+in("rdi") map_ptr,
+in("rsi") map_len,
+options(nostack, noreturn)
+);
+
+

We also need to remember to deallocate the ThreadLocalStorage, what we keep in the register is just a pointer to +that allocated heap-memory. This needs to be done both in successful and panicking thread-exits.

+

Final thoughts

+

I've been dreading reinventing this particular wheel, but I'm glad I did. +I learnt a lot, and it was interesting to dig into how threading works in practice on Linux, plus tiny-std now has +threads!

+

The code for threads in tiny-std can be found here. +With a huge amount of comments, it's 500 lines.

+

I believe that it doesn't contain UB or leakage, but it's incredibly hard to test. What I know is lacking is signal +handling, which is something else that I have been dreading getting into.

+

Next up

+

I've ordered a Pinephone explorer edition, I'll probably try doing stuff with that next.

+

Thanks for reading!

+
+
\ No newline at end of file diff --git a/x11-to-xcb.html b/x11-to-xcb.html new file mode 100644 index 0000000..aa7f8ac --- /dev/null +++ b/x11-to-xcb.html @@ -0,0 +1,140 @@ + + + + + + + + + X11ToXcb + + + +
+

Rewrite it in Rust, a cautionary tale

+

RIIR (Rewrite It In Rust) is a pretty fun joke; at my current workplace my team writes +essentially everything in Rust, for better or worse. We like to have a bit of +fun with it, pushing the RIIR-agenda around the company.

+

But, this short retrospective is about when porting something from C-bindings to Rust +just made life harder.

+

Security advisory on Rust XCB-bindings

+

I've written a lot about XCB and +X11 in my project write-ups +about my x11-wm, I'm not going +to get into it here, but for these purposes XCB can be summarized as a +library to handle displaying things on a desktop.

+

One day when building a project, a security advisory comes up on Rust's XCB bindings.

+

Bindings

+

Generally if you want to use an existing big library, you can take the approach of reinventing the wheel, +or creating bindings to a C-library that already exists. For example, +Rust has a zstd crate which contains bindings +to libzstd. If you want to use that, +you need to have libzstd available to the binary. Sometimes, it's built as +part of a build-script and statically compiled into the binary, then you don't +have to worry about it at all (Rocksdb does this I think). +There's also a pure Rust implementation of zstd decompression, +which is the other approach, same algorithm, different implementation.

+

Why not?

+

There are some good reasons to RIIR, all the good things about using Rust can go here. +But, there are some very good reasons not to, apart from the effort.
+The one this retrospective is about is maturity, and the robustness that can come from it.

+

Porting x11-clipboard from C-bindings to Rust implementation

+

The security advisory comes up, transitively through x11-clipboard, +but the advisory is on the XCB-bindings.
+As I mentioned, my previous work on my WM had made me familiar with a Rust +library that replaces the bindings: x11rb.

+

To be clear, x11rb is a great library, and the story is not about how it contained some unexpected bug, +it didn't, it was the act of replacement that became the issue.

+

I made a PR on June 16, 2022 to replace the usage of the bindings with +x11rb in x11-clipboard. The PR is fairly large, but very procedural: the Rust API +is essentially the same as the C one, so it was mostly a matter of changing the types.

+

Creeping issues

+

x11-clipboard is a library that handles copying and pasting within x11-sessions. It's used by a lot of Rust's +gui-applications, so people are likely to run into your mistakes if you make them, and there were mistakes.

+

Bug report through alacritty

+

9 months later, alacritty gets a bug report, where +when things are pasted FROM alacritty into other applications, they hang.

+

The bug report is floated to x11-clipboard's issue tracker after +a bisection shows that the problem comes from the version update caused by my change.

+

Debugging it was medium-difficult: it was easy to reproduce, but difficult to understand. In the end it was +resolved by a +1 -1 change.

+

From this:

+
        time: event.time,
+        requestor: event.requestor,
+        selection: event.selection,
+        target,
+        property: event.property
+    }
+);
+
+

To this:

+
        time: event.time,
+        requestor: event.requestor,
+        selection: event.selection,
+        target: event.target,
+        property: event.property
+    }
+);
+
+

The error was interesting, it caused some clients (a client in this context is an application like +Brave browser) to hang waiting for a notification that the application never sent.

+

A funny note about X11 is that the protocol has been around for so long, and seen so much misuse, that a lot of +clients are built to handle this kind of mistake, so the error doesn't show up in, for example, +Firefox.

+

Bug report through pot-app

+

On Jan 17, 2024 a bug report comes in from pot-app/Selection.

+

Pot App is:

+
+

🌈一个跨平台的划词翻译和OCR软件 | A cross-platform software for text translation and recognition.

+
+

To be fair, I think this was a pre-existing bug, but I was kind of on the hook at this point, and it was interesting.

+

The clipboard library spawns a thread that listens for events; this thread holds a claim to the connection to the +x-server, blocking while waiting for a reply. Even if the handle that's given to the user is dropped, that thread stays alive, keeping +the connection alive. This means that if you're recreating the structure in a loop, for example, you start leaking +connections until the connection-pool is drained, which means that no new clients can connect. Or in other words, +no more applications can start, because you clogged up the server.

+

A problem here is that the thread needs to know, from the structure that spawned it, that it's done and should quit. +There are not many nice ways of signalling threads like that while they are blocked waiting for something.

+

The thread waits like this:

+
while let Ok(event) = context.connection.wait_for_event() {
+
+

The API doesn't have any facilities for waiting other than polling in a loop, and ideally one doesn't want to +run the thread at 100% CPU just waiting.

+

However, you can get the underlying file descriptor for the connection like this:

+
let stream_fd = context.connection.stream().as_fd();
+
+

And if you have an FD, you can use Linux's APIs to check for readiness, instead of what's exposed through the +x11rb API. This is only running on Linux anyway, so why not? (This is foreshadowing).

+

In the end I make a PR that uses libc, +the Linux Poll API, and an, +eventfd. +If the struct is dropped, it'll write an event on the eventfd. On the other side, the thread polls for +either a new message on the stream, or an event on the eventfd, if an event arrives on the stream, it'll handle that +like before, if it arrives through the eventfd it just quits. That solved the issue.

+
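The pure-std analogue of that eventfd wakeup is a channel disconnect: dropping the user-facing handle closes the sender, which wakes the worker out of its blocking receive. A sketch (names hypothetical, and std's mpsc standing in for the poll+eventfd pair the actual PR uses):

```rust
use std::sync::mpsc;
use std::thread;

fn drop_wakes_worker() -> &'static str {
    let (tx, rx) = mpsc::channel::<()>();
    let worker = thread::spawn(move || {
        // Blocks like wait_for_event; recv returns Err once every sender
        // (held by the user-facing struct) has been dropped.
        match rx.recv() {
            Ok(()) => "got event",
            Err(_) => "handle dropped, quitting",
        }
    });
    drop(tx); // what the user struct's Drop impl effectively does
    worker.join().unwrap()
}
```

The shape is the same as the eventfd solution: the worker blocks on two possible wake sources, one carrying real events and one that only ever fires to say "quit".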

Bug report through the regular issue tracker

+

On Feb 28, 2024 a bug report is posted on x11-clipboard.

+

Now, I figured X11 was only used on Linux, since Mac and Windows have their own display systems. But I forgot about +the BSDs; those operating systems can run X11, and I should have thought about that before picking the +Linux-specific eventfd.

+

POSIX

+

POSIX is an OS-compatibility standard: if you use POSIX-compliant OS-APIs, you +can generally get away with using APIs that interface with the OS on Linux and they'll still work on the BSDs. +Some examples: poll, read, write, +pipe. eventfd is a counter-example.

+

What my bugfix was trying to achieve was a drop of the struct exposed through the x11-clipboard API causing something +pollable to happen in the running thread. I thought eventfd was a good fit, but something POSIX-compliant would be +to create a pipe, two fds, a read-end and a write-end, put the write-end in the user struct, the read-end in the +thread, and poll for a POLLHUP (hangup), that gets sent to one end when the other end's FD is closed.

+

Now I could use the existing RAII-closing of the write-end on the user struct, and just listen for a hangup on the running +thread, and it works on the BSDs!

+
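The hangup mechanics can be shown with a socketpair from std (same idea as the pipe: closing one end makes the other end readable with zero bytes, which is what poll surfaces as POLLHUP; `raii_close_signals_hangup` is a made-up name):

```rust
use std::io::Read;
use std::os::unix::net::UnixStream;

fn raii_close_signals_hangup() -> usize {
    // The worker thread keeps one end, the user-facing struct the other.
    let (mut thread_end, user_end) = UnixStream::pair().unwrap();
    // The user struct going out of scope closes its fd via RAII...
    drop(user_end);
    // ...and the blocked read on the other end wakes with 0 bytes: hangup.
    let mut buf = [0u8; 8];
    thread_end.read(&mut buf).unwrap()
}
```

Unlike eventfd, socketpair and pipe are POSIX, so the same drop-to-wake trick works unchanged on the BSDs.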

Conclusion

+

For now that's been it; I'll update this if more stuff comes in. I think the lesson learned here is that there's a +maintenance cost to any change. While RIIR might be fun, it's good to think twice about how reasonable it is.

+

Of course, there may be lurking bugs in the C-implementation that aren't seen because of selection bias, but I don't have +any basis for that claim.

+

Last of all, I'm sorry for the hassle quininer, I know you don't want to maintain this +project anymore, and I made your life a bit more difficult.

+
+
\ No newline at end of file