Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for .so compartments #13

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

0152la
Copy link
Contributor

@0152la 0152la commented Jan 24, 2024

Implement support for compartments to be gives as dynamic libraries (i.e., so files). This includes looking for library dependencies, and loading those in the same compartment as the given user file. These libraries are expected to be found in the path given by the environment variable COMP_LIBRARY_PATH.

What this means is that we now also add a "local" libc to each compartment that needs it (likely all?). Thus, we can greatly reduce the number of intercepts we require. Perhaps we can overhaul the mechanism entirely to only intercept hard-coded functions we know we need (e.g., allocator calls).

The support for static binaries has not been removed completely, but likely has been broken due to this overhaul. Whether to leave as is, to remove completely, or to add support for it remains to be decided.

In terms of the PR, there might be scope to logically split it up. I kept the individual commits in a separate branch, to squash them into smaller commits, if need be.

Left TODOs:

  • check vDSO and system calls in the new system
  • fix all old tests
  • consider making stack / heap of compartments dynamic sized, rather than a static hardcoded variable
  • revisit logic for storing transition capabilities - I think there's something wrong there

@0152la 0152la requested a review from ltratt January 24, 2024 13:01
CMakeLists.txt Outdated Show resolved Hide resolved
include/compartment.h Outdated Show resolved Hide resolved
include/intercept.h Outdated Show resolved Hide resolved
include/intercept.h Outdated Show resolved Hide resolved
include/intercept.h Outdated Show resolved Hide resolved
src/compartment.c Show resolved Hide resolved
src/compartment.c Outdated Show resolved Hide resolved
src/compartment.c Outdated Show resolved Hide resolved
src/compartment.c Outdated Show resolved Hide resolved
src/compartment.c Outdated Show resolved Hide resolved
src/intercept.c Outdated Show resolved Hide resolved
src/intercept.c Show resolved Hide resolved
src/intercept.c Show resolved Hide resolved
src/manager.c Show resolved Hide resolved
src/manager.c Show resolved Hide resolved
src/manager.c Outdated Show resolved Hide resolved
tests/CMakeLists.txt Show resolved Hide resolved
tests/loader-lib.c Outdated Show resolved Hide resolved
@ltratt
Copy link
Contributor

ltratt commented Jan 25, 2024

I've left lots of detailed, but mostly shallow, comments. I have two bigger comments:

  1. There is a lot of use uintptr_t but nearly all (maybe all) of these felt like they were really just pointers and thus probably should be void *? I might have misunderstood something here!
  2. I've really lost sense of the overall structure of the system. For example a lot of manager_ functions are in intercept.c and a lot of compartment setup stuff is in manager.c as well as the more obviously "manager" stuff.

I think the first point might need to be thought about in the context of this PR.

I think the second point doesn't need to be addressed in this PR, but (assuming this isn't entirely me being stupid!) I think a follow-up PR that rethinks the structure might be worthwhile. I guess that some splitting up of some of the larger files might be worthwhile, as well as moving code around? One thing that might be useful at that point is to use the old trick of giving "public" functions in each module a "modulename_" prefix. So all public functions in "manager.c" might have the prefix m_ (or mg_ or ...) and so on. I think that might help me navigate things a little bit better. At the same time, perhaps a system overview in README.md might be useful?

I have one shallow comment: I keep misreading types formatted x* y as x y not x *y. The latter is so universal in C code that using the former really confuses my eyes. I think it might be worth reconsidering whether this unusual difference in formatting is useful or not. Perhaps there could be a future clang-format PR? Up to you, and I won't raise this as an issue again for this repo, but I thought I'd mention it.

And finally one very random idea: I wonder whether it might be worth running a static analyser over the code? Clearly the DDC manipulation is going to confuse any static analyser and it will give even more imperfect results than on "normal" C code. But I am partly curious whether it does anything useful! I have found LLVM's builtin in scan-build to be surprisingly useful (I remember when it found real bugs in my code such as ltratt/extsmail@85cf55f). This might be a terrible idea, so take it for what it's worth!

@0152la
Copy link
Contributor Author

0152la commented Jan 25, 2024

There is a lot of use uintptr_t but nearly all (maybe all) of these felt like they were really just pointers and thus probably should be void *? I might have misunderstood something here!

I think I'm using uintptr_t only when I need to do some pointer arithmetic. I can do another pass to ensure that is the case, and change appropriately to void * otherwise.

I've really lost sense of the overall structure of the system. For example a lot of manager_ functions are in intercept.c and a lot of compartment setup stuff is in manager.c as well as the more obviously "manager" stuff.

The idea with that re-organization was:

  • everything that the end-user should need is only in manager.c. This will, for the most part, mirror main compartment.c functions (e.g., comp_exec, comp_from_file), but are provided as a general use point
  • compartment.c contains stuff to do with compartment management, mapping, look-ups, all that stuff
  • intercept.c contains stuff to do with intercepts, and thus also with internal compartment memory management (mem_mng.c). I might've missed renaming some manager stuff when I moved them over to intercept. I'll double check.

If you agree with the layout above (which I can add to a README), then I can take care to rename stuff appropriately in this PR (I admit I was a bit sloppy with this). Otherwise, we can leave stuff as is, and discuss a better overall design.

I keep misreading types formatted x* y as x y not x *y. The latter is so universal in C code that using the former really confuses my eyes.

I personally prefer putting the pointer next to the type, but it seems 1 the other way is considered "C-style". I can of course (and probably should, unrelated to this particular point) add a clang-format that does whatever we prefer. If you heavily prefer the pointer next to the variable, I'll keep that in mind. Just adding a clang-format might be a good next PR.

I wonder whether it might be worth running a static analyser over the code?

I would love to run any asan, analyzer, or else I could get my hands on. I think it's been overdue. I wasn't aware of scan-build. I'll give it a look and see. That CHERI-fied Clang Static Analyzer might also be worth a thought.

Footnotes

  1. https://stackoverflow.com/questions/2660633/declaring-pointers-asterisk-on-the-left-or-right-of-the-space-between-the-type

@ltratt
Copy link
Contributor

ltratt commented Jan 25, 2024

I think I'm using uintptr_t only when I need to do some pointer arithmetic. I can do another pass to ensure that is the case, and change appropriately to void * otherwise.

In ye olde days, if you wanted to do byte-level pointer arithmetic on pointers whose underlying data you're unsure about was to use char *. These days uint8_t * or similar might make the intent clearer. Either way, I would reserve uintptr_t for those cases where you're showing the reader that you might really need to do "non-pointer integer operations" on something (e.g. "pointer tagging"). I didn't notice any of these in the PR but I might have missed them!

The idea with that re-organization was:

  • everything that the end-user should need is only in manager.c. This will, for the most part, mirror main compartment.c functions (e.g., comp_exec, comp_from_file), but are provided as a general use point
  • compartment.c contains stuff to do with compartment management, mapping, look-ups, all that stuff
  • intercept.c contains stuff to do with intercepts, and thus also with internal compartment memory management (mem_mng.c). I might've missed renaming some manager stuff when I moved them over to intercept. I'll double check.

If you agree with the layout above (which I can add to a README), then I can take care to rename stuff appropriately in this PR (I admit I was a bit sloppy with this). Otherwise, we can leave stuff as is, and discuss a better overall design.

Documenting this would definitely be good!

I think perhaps manager.c as the user-facing thing is an odd name? Perhaps that reflects an unstated intuition I have that C libraries have a "name" (e.g. "LibraryMcLibraryFace") and then the "main" user-facing file tends to be named after that library (librarymclibraryface.c), with other files tending to have the "internal" parts to the library. But, that said, that still might not make the structure quite as clear as it could be. But I don't think we should worry about for this PR: documenting the structure as it is is sufficient, and we can perhaps reorganise things in a subsequent PR.

Just adding a clang-format might be a good next PR.

That would work for me. The older the keener I am on autoformatters because it stops me having to develop or enforce formatting opinions on my own :p

@0152la
Copy link
Contributor Author

0152la commented Jan 25, 2024

In ye olde days, if you wanted to do byte-level pointer arithmetic on pointers whose underlying data you're unsure about was to use char *. These days uint8_t * or similar might make the intent clearer. Either way, I would reserve uintptr_t for those cases where you're showing the reader that you might really need to do "non-pointer integer operations" on something (e.g. "pointer tagging"). I didn't notice any of these in the PR but I might have missed them!

I got this idea from was page 12 of the CHERI Programming Guide 1, which has the following excerpt for intptr_t, uintptr_t:

(2) Where it is more convenient to place a pointer value in an integer type for the purposes
of arithmetic (which takes place on the capability’s address and in units of bytes, as if the
pointer had been cast to char *).

I'm not too familiar with pointer tagging, or can think of any other "non-pointer integer operations", but I think pointer arithmetic should fall under the above defined use for intptr_t? If so, I'd rather keep it as such. However, if you prefer we just stick to traditional types, I can replace uses of intptr_t as I intended to use them with char* casts (the document seems to suggest it'd be equivalent).

Footnotes

  1. https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-947.pdf

@ltratt
Copy link
Contributor

ltratt commented Jan 25, 2024

(2) Where it is more convenient to place a pointer value in an integer type for the purposes
of arithmetic (which takes place on the capability’s address and in units of bytes, as if the
pointer had been cast to char ).
I'm not too familiar with pointer tagging, or can think of any other "non-pointer integer operations", but I think pointer arithmetic should fall under the above defined use for intptr_t? If so, I'd rather keep it as such. However, if you prefer we just stick to traditional types, I can replace uses of intptr_t as I intended to use them with char
casts (the document seems to suggest it'd be equivalent).

Might be worth asking @jacobbramley for a second opinion here, but I think (especially in a hybrid context) doing pointer arithmetic on pointers is preferred, unless we're doing arithmetic stuff that can't be done on a pointer.

@jacobbramley
Copy link

Might be worth asking @jacobbramley for a second opinion here

Practical C/C+ code does it both ways and this tends to be a style choice, or similar.

I'd like to be a purist and say that you should always use T* (for some T, maybe templated in C++), but in practice it can be hard, especially if you need to do fancy arithmetic. In practice, it depends (sorry), but uintptr_t isn't unreasonable.

As an aside: I've never been terribly convinced that using uintptr_t for C pointer arithmetic is strictly correct. I think C and C++ guarantee that a round-trip (T* -> uintptr_t -> T*) gets you what you started with, but I don't recall a statement regarding pointer arithmetic on the intermediate uintptr_t. It obviously won't always result in a valid T* at the end, but I once looked for a statement suggesting that it might sometimes, and failed to find one. However, it's surely no worse than using char* in the same way, and sometimes you do need that level of control.

As a starting point, keeping the original T* is always useful, because you get implicit alignment constraints, linter help, and semantic information from the type. In C++, templates tend to let you keep T* quite far down. If you need pseudo-polymorphism in C then void* helps, and the CHERI API works well with this. However, you can't do arithmetic on void*.

If I can't do what I want with T* or void* then I'd normally use uintptr_t, but in CHERI it's often useful to use ptraddr_t (and cheri_address_set()). This greatly simplifies representability analysis, since it removes worries about transient unrepresentability. However, it'll be inefficient for simple operations (like a single addition, which has a dedicated instruction).

The only advantage I can see for using char* is that you can dereference it (like in memcpy); it has special treatment during aliasing analysis. In my opinion, uintptr_t and void* are both semantically better when you want "a pointer to something unspecified". char* is sometimes used for that, but char* does specify the target; it's a char!

@0152la
Copy link
Contributor Author

0152la commented Jan 29, 2024

I've been perusing the standard 1, and I think these are the relevant parts to extract, in my own words:

  • First of all, I couldn't find anything talking about provenance, or such. Pointers seem to be introduced in 6.2.5 25) (p39), but no mention of provenance;
  • Section 6.5.6, 9) says that we can perform integer arithmetic against any pointer type, and the usual actual offset is based on the pointer type; footnote 119 says to convert to char* basically
  • Section 6.5.6 2) says we can only add a pointer and an integer type together, not two pointer types (I think this might be an issue in some of the things I do, such as getting the concrete address of a function in a compartment by adding the compartment base and the function offset in the ELF file - that might be fixed by casting the former to a ptrdiff_t though?)
  • intptr_t only exists to be able to hold a pointer value; there is no mention of using it in pointer arithmetic
  • I think doing "pointer arithmetic" in intptr_t is kind of fine, as a consequence of just converting to integer types (probably intptr_t), doing the arithmetic, converting back, and ensuring that UB doesn't happen on either conversion

Overall, I think the most "C-like" way of tackling this would be char* casting, and I think intptr_t operations can be considered an acceptable (albeit much more "things can blow up") alternative. As long as we have a way to solve the "adding two pointers" issue (for which either intptr_t or ptrdiff_t should work I think), we should be find with converting stuff to char*, or we can leave stuff as is (although I'll double check all my uses of intptr_t to make sure I used them only in pointer arithmetic as I intended).

Footnotes

  1. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf

@ltratt
Copy link
Contributor

ltratt commented Jan 29, 2024

@0152la Thanks! My suggestion is that you change the "easy obvious" cases of uintptr_t (I just realised that intptr_t seems a lot more dangerous than uintptr_t so I hope we don't have any intptr_ts!) to char * here. If there are any that look tricky, we can punt on those to a separate PR.

@0152la 0152la mentioned this pull request Jan 29, 2024
7 tasks
@0152la
Copy link
Contributor Author

0152la commented Jan 29, 2024

So this means we decide to use char* for pointer arithmetic instead of (u)intptr_t, right?

@ltratt
Copy link
Contributor

ltratt commented Jan 29, 2024

I would say: at a minimum don't use intptr_t because I'm not sure of its correctness with arithmetic full stop; and, where possible, move from uintptr_t to char *, but let's not kill ourselves doing so.

@0152la
Copy link
Contributor Author

0152la commented Jan 29, 2024

I've made the changes to pointer arithmetic in cfe4bf9 and d82e312.

@ltratt
Copy link
Contributor

ltratt commented Jan 29, 2024

@0152la Thanks! We're almost there for this PR I think -- there are still a couple of open comments above though.

@ltratt
Copy link
Contributor

ltratt commented Feb 6, 2024

Please squash.

Implement support for compartments to be gives as dynamic libraries
(i.e., `so` files). This includes looking for library dependencies, and
loading those in the same compartment as the given user file. These
libraries are expected to be found in the path given by the environment
variable `COMP_LIBRARY_PATH`.

What this means is that we now also add a "local" `libc` to each
compartment that needs it (likely all?). Thus, we can greatly reduce the
number of intercepts we require. Perhaps we can overhaul the mechanism
entirely to only intercept hard-coded functions we know we need (e.g.,
allocator calls).

The support for static binaries has not been removed completely, but
likely has been broken due to this overhaul. Whether to leave as is,
to remove completely, or to add support for it remains to be decided.

Further small changes:
* Rework testing infrastructure to account for `so` compartments
* Replace `false` `assert`s and `exit`s with `err`s
* Replace all `pread`s with `do_pread` wrappers
* Rename intercept functions from `manager_` prefix to `intercepted_`
* Slight `README` update
* Replace `intptr_t` usage with `void*` where appopriate
* Fixed a bug where we would find symbols in dependency libraries with a
  value of 0x0 to be eagerly relocated. Assumption is these symbols are
  meant to be relocated in the dependency library itself. We now simply
  filter out 0x0 addresses for symbols to eagerly relocate
@0152la
Copy link
Contributor Author

0152la commented Feb 7, 2024

Squashed.

@ltratt ltratt merged commit db9539f into capablevms:master Feb 7, 2024
1 check passed
This was referenced Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants