This repository has been archived by the owner on Mar 7, 2021. It is now read-only.

Add RCU bindings #143

Open
geofft opened this issue Aug 24, 2019 · 5 comments
Labels
enhancement New feature or request

Comments

@geofft
Collaborator

geofft commented Aug 24, 2019

There are some APIs like for_each_process that want you to hold an RCU read lock when calling them. We should support this, at minimum. We probably want to also have full support for RCU-protected data, but maybe we can put that on hold.

Upstream RCU docs: https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt

The rough idea of RCU is to implement a reader-writer-lock-ish pattern where readers are low-overhead, under the assumption that all readers are short-lived - a valid assumption for kernels, where a "reader" is the execution of a single syscall/interrupt; anything that persists state beyond this counts as a "writer". In particular, readers should not be slowed down when a writer is trying to do an update, so ideally reads happen concurrently with writes. The way this is done is by having writers allocate and fill in a new structure, atomically update the pointer to it, synchronize memory and wait for existing readers to complete, and then deallocate the old structure. (This pointer can either be a single pointer to some single structure, or one of the pointers in a linked list, or something.) That way, existing readers see a consistent view - either the old or new version - and you're guaranteed that once you've finished synchronizing, any readers in progress are only seeing the new version.

(Multiple writers are not protected by RCU in any way. They should use a conventional mutex, or something.)

The basic RCU API in Rusty pseudocode is

extern {
    fn rcu_read_lock();
    fn rcu_read_unlock();
    fn synchronize_rcu();
    fn rcu_assign_pointer<T>(p: &mut *const T, v: *const T);
    fn rcu_dereference<T>(p: &*const T) -> *const T;
}

(Note the latter two in C are macros, into which p is syntactically passed by reference. If we want to bind them directly, we probably want to expose C helpers that take void ** and handle the type safety in the Rust wrapper.)

rcu_read_lock and rcu_read_unlock take no context info; they open an RCU read-side critical section for any RCU use in the kernel. Between those you may call rcu_dereference on anything. (That said, there are two other RCU "flavors" in the kernel, rcu_read_lock_bh etc. and rcu_read_lock_sched etc. They appear to have been consolidated last year, but I think from the API they should still be treated as separate? In any case, the other flavors are rarely used and we can probably get away with ignoring them for now.)

The rules as I understand them are:

  • You can only read an RCU pointer between rcu_read_lock and rcu_read_unlock (aka "within a read-side critical section"), and you can only read it with rcu_dereference (though it compiles out on all architectures besides Alpha). rcu_dereference doesn't actually do the dereferencing itself; it simply gives you a pointer you can safely dereference until rcu_read_unlock.
  • You cannot block in a read-side critical section.
  • You can only write an RCU pointer with rcu_assign_pointer, though you can do so at any point without advance notice.
  • You must keep the old object (the one the pointer referenced before rcu_assign_pointer) valid until you've called synchronize_rcu.

You may nest / overlap read-side critical sections, though. It's a reader lock: you can have more than one of them, and writers can't do anything to invalidate the data you're reading until all reader locks are dropped. The only difference with RCU is that a writer's synchronize_rcu waits only for the reader locks that existed when it started; new reader locks taken afterwards don't affect it (new reads are guaranteed to be ordered after any previous rcu_assign_pointers).
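As a userspace sanity check of the nesting rule, here's a minimal Rust sketch (all names hypothetical; the real rcu_read_lock()/rcu_read_unlock() are replaced by a thread-local nesting counter, so this only exercises the RAII shape, not real RCU):

```rust
use std::cell::Cell;

// Stand-in for the kernel's per-CPU read-side nesting state.
thread_local! {
    static READ_DEPTH: Cell<usize> = Cell::new(0);
}

struct RcuRead(());

impl RcuRead {
    fn new() -> Self {
        // Real code: unsafe { rcu_read_lock() };
        READ_DEPTH.with(|d| d.set(d.get() + 1));
        RcuRead(())
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        // Real code: unsafe { rcu_read_unlock() };
        READ_DEPTH.with(|d| d.set(d.get() - 1));
    }
}

fn current_depth() -> usize {
    READ_DEPTH.with(|d| d.get())
}

// Nested / overlapping read-side critical sections are fine: each guard
// just brackets its own lock/unlock pair.
fn nested_demo() -> (usize, usize, usize) {
    let outer = RcuRead::new();
    let depth_outer = current_depth();
    let inner = RcuRead::new();
    let depth_inner = current_depth();
    drop(inner);
    drop(outer);
    (depth_outer, depth_inner, current_depth())
}
```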

I think the rough way to handle this is to

  • create an Rcu<T> pointer type
  • have an empty RcuRead object whose constructor calls rcu_read_lock(), whose destructor calls rcu_read_unlock(), and which is required to read an Rcu<T>
  • have ... some sort of operation to assign to an Rcu<T> that keeps the old value alive. Perhaps it returns an RcuDropGuard<T> object that holds on to a pointer to the old object, and that object's destructor calls synchronize_rcu() and then frees it?

I think it's memory-safe that we're using RAII / destructors here. The worst that can happen is that you deadlock (if you forget an RcuRead) or you keep the old value alive forever (if you forget RcuDropGuard<T>). The unsafety in std::thread::scoped::JoinGuard was that there were operations that were unsafe to do before the JoinGuard was dropped and safe after (like, deallocate data used by the thread). There are no operations that are enabled by either an RcuRead or an RcuDropGuard<T> going out of scope. (The RcuDropGuard<T> itself needs to handle deallocating the old object, for this reason, to guarantee that synchronize_rcu() was in fact called.)

I'm a little less sure about making sure you don't hold the result of reading an Rcu<T> after the originating RcuRead goes out of scope / after you call rcu_read_unlock. Can we use lifetimes here, by giving you a pointer that's constrained to the lifetime of RcuRead? Or do we need to insist on making you pass a callback/lambda?
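For what it's worth, lifetimes do seem to be enough here. A userspace sketch (all names hypothetical; the kernel calls are stubbed out, and a plain field stands in for the RCU-protected pointer) where read ties the returned reference to the guard's borrow, so the borrow checker rejects holding the result after the guard is dropped:

```rust
// Stand-in for the RAII guard; the real constructor/destructor would call
// rcu_read_lock()/rcu_read_unlock().
struct RcuReadGuard(());

impl RcuReadGuard {
    fn new() -> Self {
        // Real code: unsafe { rcu_read_lock() };
        RcuReadGuard(())
    }
}

impl Drop for RcuReadGuard {
    fn drop(&mut self) {
        // Real code: unsafe { rcu_read_unlock() };
    }
}

struct Rcu<T> {
    // Real code: a raw pointer read via rcu_dereference; a plain field is
    // enough to show what the borrow checker enforces.
    data: T,
}

impl<T> Rcu<T> {
    // The returned reference borrows from the guard (and, in this userspace
    // model, from self), so it cannot outlive either.
    fn read<'g>(&'g self, _guard: &'g RcuReadGuard) -> &'g T {
        // Real code: unsafe { &*rcu_dereference(self.ptr) }
        &self.data
    }
}

fn read_under_guard(rcu: &Rcu<u32>) -> u32 {
    let guard = RcuReadGuard::new();
    let val = *rcu.read(&guard);
    // This would not compile:
    //     let r = rcu.read(&guard);
    //     drop(guard);
    //     *r  // error: `r` cannot outlive `guard`
    val
}
```

So a lifetime-constrained pointer looks viable, and we wouldn't need to force a callback/lambda style.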

For the short term, I'd like to at least introduce RcuRead and have it be a required parameter for binding things like for_each_process, and we can handle doing RCU pointer reads and writes in Rust later. (Which also lets us defer figuring out how to integrate the mutex needed for avoiding concurrent writes.)

More interesting RCU docs:

More description of things that RCU does and does not guarantee: https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

An RCU home page, more or less: http://www.rdrop.com/~paulmck/RCU/

"RCU's first-ever CVE, and how I lived to tell the tale": https://youtu.be/hZX1aokdNiY http://www.rdrop.com/~paulmck/RCU/cve.2019.01.23e.pdf (tl;dr: use-after-free because they locked the wrong RCU flavor, but the solution is a bit more complicated than that)

@geofft geofft changed the title Support calling RCU-protected functions Add RCU bindings Aug 24, 2019
@geofft
Collaborator Author

geofft commented Aug 24, 2019

Thinking a bit harder about the long-term API and how to handle writers... the interesting part of RCU is that it separates readers and writers. You don't need to block readers while doing a write; you only need to block writers at some point after they've done a write.

Also (as the name and some of the terminology imply) "writer" isn't quite accurate. You can't write to an RCU-protected structure. You can only update it to a new one, and then reclaim the old structure later. Since there's no acquire step on the write side, just an atomic update, if a writer needs to read the old structure to copy and modify it, it needs to ensure that no other writers are conflicting with it. (I suppose this could be done with RCU itself, but in practice it seems to be more useful to hold a spinlock across both the read and the update. There's a helper macro rcu_dereference_protected(p, lockdep_is_held(&some_mutex)) for this, which just asserts that its second argument holds and skips the barrier.)

So... I think the way to do this is to treat the read capability and the update capability as two different Rust objects that contain the same pointer. Something roughly like

pub struct Rcu<T> { ptr: *const T }
pub struct RcuWrite<T> { ptr: Box<T> }

pub struct RcuReadGuard(());
impl RcuReadGuard {
    fn new() -> Self { unsafe { rcu_read_lock() }; RcuReadGuard(()) }
}
impl Drop for RcuReadGuard {
    fn drop(&mut self) { unsafe { rcu_read_unlock() }; }
}

impl<T> Rcu<T> {
    fn init(data: T) -> (Rcu<T>, RcuWrite<T>) {
        let w = RcuWrite { ptr: Box::new(data) };
        let r = Rcu { ptr: &*w.ptr };
        (r, w)
    }
    fn read<'g>(&self, _: &'g RcuReadGuard) -> &'g T {
        unsafe { &*rcu_dereference(self.ptr) }
    }
}

pub struct RcuWriteGuard<T>(Box<T>);
impl<T> Drop for RcuWriteGuard<T> {
    fn drop(&mut self) { unsafe { synchronize_rcu() }; }
}

impl<T> RcuWrite<T> {
    fn update(&mut self, data: T) -> RcuWriteGuard<T> {
        let new = Box::new(data);
        // protected because we have &mut self
        let old = unsafe { Box::from_raw(rcu_dereference_protected(&*self.ptr, 1) as *mut T) };
        unsafe { rcu_assign_pointer(&mut self.ptr, Box::into_raw(new)) };
        RcuWriteGuard(old)
    }
}

or something. Which you could use, then, by storing both a Mutex<RcuWrite<T>> and an Rcu<T> in statics. The Mutex gets you safety around concurrent updates, and reads can happen from the Rcu<T> while you are updating.

@geofft
Collaborator Author

geofft commented Aug 25, 2019

Er, obviously that implementation isn't quite right: you need Rcu and RcuWrite to overlap and contain the same pointer, not just have two independent pointers to the same value. (Otherwise RcuWrite::update() will not affect Rcu::read()!) I think it's sound to literally just transmute one to the other; the whole point of RCU is to make sure that updating the pointer value is sound.

Also perhaps a better design, instead of spitting out a tuple of the read and write objects, is to construct the RcuWrite first, then have a method that gets you an Rcu<T, 'a> from an &'a RcuWrite<T>. (Add a PhantomData<&'a ()> to Rcu to ensure that it does not outlive the underlying RcuWrite, which owns the real data.)

In any case, I'm fairly confident about the design of RcuReadGuard: use it as an RAII guard and pass a reference to it into functions that need it (which ensures they give the reference back; you can't store a reference). And I think that's the right thing to implement first. Note that this matches the design of crossbeam_epoch::Guard: you allocate a guard in a local, immediately take a reference to it, and pass that reference around to functions that need it. (AIUI, crossbeam_epoch is very similar to RCU in design; it just doesn't have the kernel-style guarantee of "epochs" terminating in a timely fashion, so it doesn't want writers to block until all read-side critical sections are done and instead uses deferred work. I think you can get that in the kernel, btw, by using call_rcu, which takes a deallocation callback, instead of synchronize_rcu, which blocks.)
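To make the "same pointer" point concrete, here's a userspace sketch of the corrected shape (hypothetical names; a shared AtomicPtr stands in for rcu_assign_pointer/rcu_dereference, and the synchronize_rcu() wait is stubbed out entirely, so this demonstrates only the Rust API shape, not real RCU):

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// One shared AtomicPtr plays the role of the RCU-protected pointer that both
// the read side and the write side see.
struct RcuCell<T> {
    ptr: AtomicPtr<T>,
}

impl<T> RcuCell<T> {
    fn new(data: T) -> Self {
        RcuCell { ptr: AtomicPtr::new(Box::into_raw(Box::new(data))) }
    }

    // Read side: real code would also demand an &RcuReadGuard; Acquire stands
    // in for rcu_dereference's ordering.
    fn read(&self) -> &T {
        unsafe { &*self.ptr.load(Ordering::Acquire) }
    }

    // Write side: &mut self models holding the writer mutex and, in this
    // single-threaded model, lets the borrow checker rule out freeing a value
    // a reader still borrows. Real code must call synchronize_rcu() before
    // freeing `old`.
    fn update(&mut self, data: T) {
        let new = Box::into_raw(Box::new(data));
        let old = self.ptr.swap(new, Ordering::Release);
        // Stub for synchronize_rcu(); in a real concurrent setting, freeing
        // here without waiting for readers would be a use-after-free.
        unsafe { drop(Box::from_raw(old)) };
    }
}

impl<T> Drop for RcuCell<T> {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(*self.ptr.get_mut())) };
    }
}
```

The split into Rcu<T>/RcuWrite<T> from the earlier comment would then be two views of one such cell, with the write view behind a Mutex.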

@nicowilliams-work

nicowilliams-work commented Aug 29, 2019

[Deleted. This issue is about RCU bindings for Rust, for Rust-coded Linux kernel modules. My earlier comment is not relevant.]

@geofft
Collaborator Author

geofft commented Aug 29, 2019

@nicowilliams-work Unfortunately we need to bind the running kernel's implementation of RCU since we need to access exported RCU-protected data structures.

(Although, now that I think about it, for the use case of things that live entirely in Rust-space, I vaguely wonder if crossbeam-epoch would work... it does have no_std support. Might be an interesting thing to benchmark RCU in C vs. RCU Rust bindings vs. crossbeam-epoch.)

@alex alex added the enhancement New feature or request label Sep 1, 2019
@geofft
Collaborator Author

geofft commented Aug 17, 2020

A working version of read support (only) is up at #250.

geofft added a commit that referenced this issue Aug 19, 2020
Add an RcuReadGuard object for calling functions that expect to be
RCU-protected, and add a Rust equivalent of for_each_process and a very
light task_struct bindings sufficient to write a simple test/demo.
geofft added a commit that referenced this issue Aug 22, 2020
Add an RcuReadGuard object for calling functions that expect to be
RCU-protected, and add a Rust equivalent of for_each_process and a very
light task_struct bindings sufficient to write a simple test/demo.