Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null terminated slices for exec #358

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
344 changes: 343 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ pub mod unistd;

use libc::c_char;
use std::{ptr, result};
use std::ffi::{CStr, OsStr};
use std::ffi::{CStr, CString, OsStr};
use std::path::{Path, PathBuf};
use std::os::unix::ffi::OsStrExt;
use std::fmt;
Expand Down Expand Up @@ -193,6 +193,17 @@ impl NixPath for CStr {
}
}

impl NixPath for CString {
fn len(&self) -> usize {
self.to_bytes().len()
}

fn with_nix_path<T, F>(&self, f: F) -> Result<T>
where F: FnOnce(&CStr) -> T {
self.as_c_str().with_nix_path(f)
}
}

impl NixPath for [u8] {
fn len(&self) -> usize {
self.len()
Expand Down Expand Up @@ -254,3 +265,334 @@ impl<'a, NP: ?Sized + NixPath> NixPath for Option<&'a NP> {
}
}
}

/*
*
* ===== Null terminated slices for exec =====
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment header.

*
*/

use std::ops::{Deref, DerefMut};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this use statements to the top of the file.

use std::mem::transmute;
use std::{iter, io};

/// An error returned from the [`TerminatedSlice::from_slice`] family of
/// functions when the provided data is not terminated by `None`.
#[derive(Clone, PartialEq, Eq, Debug)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not Copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, missed that.

pub struct NotTerminatedError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this another enum variant within NixError?

Copy link
Contributor Author

@arcnmx arcnmx Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like unnecessary bloat to NixError for an error case specific to a single function call that no one will ever actually use or call in 99% of scenarios.

Also chances are if someone does use it, they'll already have created the data to be terminated and actually want the unsafe variant anyway (or if you're avoiding unsafe then failure of the call will be treated as an internal assertion failure that's just abort/unwrap()/unreachable!()). I can't really imagine a scenario where this error would ever be propagated up a method chain or actually used with try!(), which is partially why the original implementation simply used an Option - using an error type is neater, but also kind of overkill?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make the argument that it's less bloat than all the code implementing a new error type, but I understand your point.

As to why this should be an Error type is because the semantics behind that better match what's happening rather than Option. If there is a failure, it should be an error. Options aren't used to denote failure. There is a grey area where it can go either way, but I think this is a clear case for Error.

So let's go ahead and leave this error as is. A goal of mine is to unify the error handling within nix to use the new failure crate, so this will likely get cleaned up along with that change.

// private initializers only
_inner: (),
}

impl NotTerminatedError {
fn not_terminated() -> Self {
NotTerminatedError {
_inner: (),
}
}
}

impl error::Error for NotTerminatedError {
fn description(&self) -> &str { "data not terminated by None" }
}

impl fmt::Display for NotTerminatedError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", error::Error::description(self))
}
}

impl From<NotTerminatedError> for io::Error {
fn from(e: NotTerminatedError) -> io::Error {
io::Error::new(io::ErrorKind::InvalidInput,
error::Error::description(&e))
}
}

/// An error returned from [`TerminatedVec::from_vec`] when the provided data is
/// not terminated by `None`.
#[derive(Clone, PartialEq, Eq)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Copy and Debug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug is defined below this (it's manually implemented because we want to impl it regardless of whether T: Debug or not), and Copy cannot be derived as it contains a Vec.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, sorry I didn't read that closely enough!

pub struct NotTerminatedVecError<T> {
inner: Vec<Option<T>>,
}

impl<T> NotTerminatedVecError<T> {
fn not_terminated(vec: Vec<Option<T>>) -> Self {
NotTerminatedVecError {
inner: vec,
}
}

pub fn into_vec(self) -> Vec<Option<T>> {
self.inner
}
}

impl<T> fmt::Debug for NotTerminatedVecError<T> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_struct("NotTerminatedVecError").finish()
}
}

impl<T> error::Error for NotTerminatedVecError<T> {
fn description(&self) -> &str { "data not terminated by None" }
}

impl<T> fmt::Display for NotTerminatedVecError<T> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", error::Error::description(self))
}
}

impl<T> From<NotTerminatedVecError<T>> for io::Error {
fn from(e: NotTerminatedVecError<T>) -> io::Error {
io::Error::new(io::ErrorKind::InvalidInput,
error::Error::description(&e))
}
}

/// A conversion trait that may borrow or allocate memory depending on the input.
/// Used to convert between terminated slices and `Vec`s.
pub trait IntoRef<'a, T: ?Sized> {
type Target: 'a + AsRef<T> + Deref<Target=T>;

fn into_ref(self) -> Self::Target;
}

/// A slice of references terminated by `None`. Used by API calls that accept
/// null-terminated arrays such as the `exec` family of functions.
pub struct TerminatedSlice<T> {
inner: [Option<T>],
}

impl<T> TerminatedSlice<T> {
/// Instantiate a `TerminatedSlice` from a slice ending in `None`. Fails if
/// the provided slice is not properly terminated.
pub fn from_slice(slice: &[Option<T>]) -> result::Result<&Self, NotTerminatedError> {
if slice.last().map(Option::is_none).unwrap_or(false) {
Ok(unsafe { Self::from_slice_unchecked(slice) })
} else {
Err(NotTerminatedError::not_terminated())
}
}

/// Instantiate a `TerminatedSlice` from a mutable slice ending in `None`.
/// Fails if the provided slice is not properly terminated.
pub fn from_slice_mut(slice: &mut [Option<T>]) -> result::Result<&mut Self, NotTerminatedError> {
if slice.last().map(Option::is_none).unwrap_or(false) {
Ok(unsafe { Self::from_slice_mut_unchecked(slice) })
} else {
Err(NotTerminatedError::not_terminated())
}
}

/// Instantiate a `TerminatedSlice` from a slice ending in `None`.
///
/// ## Unsafety
///
/// This assumes that the slice is properly terminated, and can cause
/// undefined behaviour if that invariant is not upheld.
pub unsafe fn from_slice_unchecked(slice: &[Option<T>]) -> &Self {
transmute(slice)
}

/// Instantiate a `TerminatedSlice` from a mutable slice ending in `None`.
///
/// ## Unsafety
///
/// This assumes that the slice is properly terminated, and can cause
/// undefined behaviour if that invariant is not upheld.
pub unsafe fn from_slice_mut_unchecked(slice: &mut [Option<T>]) -> &mut Self {
transmute(slice)
}
}

impl<'a, U: Sized> TerminatedSlice<&'a U> {
pub fn as_ptr(&self) -> *const *const U {
self.inner.as_ptr() as *const _
}
}

impl<T> Deref for TerminatedSlice<T> {
type Target = [Option<T>];

fn deref(&self) -> &Self::Target {
&self.inner[..self.inner.len() - 1]
}
}

impl<T> DerefMut for TerminatedSlice<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
let len = self.inner.len();
&mut self.inner[..len - 1]
}
}

impl<T> AsRef<TerminatedSlice<T>> for TerminatedSlice<T> {
fn as_ref(&self) -> &Self {
self
}
}

/// Coercion into an argument that can be passed to `exec`.
impl<'a, T: 'a> IntoRef<'a, TerminatedSlice<&'a T>> for &'a TerminatedSlice<&'a T> {
type Target = &'a TerminatedSlice<&'a T>;

fn into_ref(self) -> Self::Target {
self
}
}

/// A `Vec` of references terminated by `None`. Used by API calls that accept
/// null-terminated arrays such as the `exec` family of functions. Owned variant
/// of [`TerminatedSlice`].
pub struct TerminatedVec<T> {
inner: Vec<Option<T>>,
}

impl<T> TerminatedVec<T> {
/// Instantiates a `TerminatedVec` from a `None` terminated `Vec`. Fails if
/// the provided `Vec` is not properly terminated.
pub fn from_vec(vec: Vec<Option<T>>) -> result::Result<Self, NotTerminatedVecError<T>> {
if vec.last().map(Option::is_none).unwrap_or(false) {
Ok(unsafe { Self::from_vec_unchecked(vec) })
} else {
Err(NotTerminatedVecError::not_terminated(vec))
}
}

/// Instantiates a `TerminatedVec` from a `None` terminated `Vec`.
///
/// ## Unsafety
///
/// This assumes that the `Vec` is properly terminated, and can cause
/// undefined behaviour if that invariant is not upheld.
pub unsafe fn from_vec_unchecked(vec: Vec<Option<T>>) -> Self {
TerminatedVec {
inner: vec,
}
}

/// Consume `self` to return the inner wrapped `Vec`.
pub fn into_inner(self) -> Vec<Option<T>> {
self.inner
}

/// Converts an iterator into a `None` terminated `Vec`.
pub fn terminate<I: IntoIterator<Item=T>>(iter: I) -> Self {
let terminated = iter.into_iter()
.map(Some).chain(iter::once(None)).collect();

unsafe {
Self::from_vec_unchecked(terminated)
}
}
}

impl<'a> TerminatedVec<&'a c_char> {
/// Converts an iterator of `AsRef<CStr>` into a `TerminatedVec` to
/// be used by the `exec` functions. This allows for preallocation of the
/// array when memory allocation could otherwise cause problems (such as
/// when combined with `fork`).
///
/// ```
/// use std::iter;
/// use std::ffi::CString;
/// use nix::{TerminatedVec, unistd};
/// use nix::sys::wait;
/// use nix::libc::_exit;
///
/// # #[cfg(target_os = "android")]
/// # fn exe_path() -> CString {
/// # CString::new("/system/bin/sh").unwrap()
/// # }
/// # #[cfg(not(target_os = "android"))]
/// # fn exe_path() -> CString {
/// let exe = CString::new("/bin/sh").unwrap();
/// # exe
/// # }
/// # let exe = exe_path();
/// let args = [
/// exe.clone(),
/// CString::new("-c").unwrap(),
/// CString::new("echo hi").unwrap(),
/// ];
/// let args_p = TerminatedVec::terminate_cstr(&args);
/// let env = TerminatedVec::terminate(iter::empty());
///
/// match unistd::fork().unwrap() {
/// unistd::ForkResult::Child => {
/// match unistd::execve(exe.as_c_str(), &args_p, &env) {
/// Err(..) => {
/// // Panicking or returning will drop locals and
/// // deallocate memory, so let's avoid that here.
/// unsafe { _exit(1); }
/// },
/// Ok(..) => unreachable!(),
/// }
/// },
/// unistd::ForkResult::Parent { child } => {
/// let status = wait::waitpid(child, None).unwrap();
/// assert_eq!(status, wait::WaitStatus::Exited(child, 0));
/// },
/// }
/// ```
pub fn terminate_cstr<T: AsRef<CStr> + 'a, I: IntoIterator<Item=T>>(iter: I) -> Self {
fn cstr_char<'a, S: AsRef<CStr> + 'a>(s: S) -> &'a c_char {
unsafe {
&*s.as_ref().as_ptr()
}
}

Self::terminate(iter.into_iter().map(cstr_char))
}
}

impl<T> Deref for TerminatedVec<T> {
type Target = TerminatedSlice<T>;

fn deref(&self) -> &Self::Target {
unsafe {
TerminatedSlice::from_slice_unchecked(&self.inner)
}
}
}

impl<T> DerefMut for TerminatedVec<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
unsafe {
TerminatedSlice::from_slice_mut_unchecked(&mut self.inner)
}
}
}

impl<T> AsRef<TerminatedSlice<T>> for TerminatedVec<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to put this impl up with the Slice implementation code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by this. I tend to follow a type definition with the traits that it impls or provides/enables. You mean this should be moved before the definition of TerminatedVec?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, disregard this comment.

fn as_ref(&self) -> &TerminatedSlice<T> {
self
}
}

impl<'a, T: 'a> IntoRef<'a, TerminatedSlice<T>> for TerminatedVec<T> {
type Target = TerminatedVec<T>;

fn into_ref(self) -> Self::Target {
self
}
}

impl<'a, T> IntoRef<'a, TerminatedSlice<T>> for &'a TerminatedVec<T> {
type Target = &'a TerminatedSlice<T>;

fn into_ref(self) -> Self::Target {
self
}
}

/// Coercion of `CStr` iterators into an argument that can be passed to `exec`.
impl<'a, T: AsRef<CStr> + 'a, I: IntoIterator<Item=T>> IntoRef<'a, TerminatedSlice<&'a c_char>> for I {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that makes sense to have globally-defined? Or can this be integrated into the function definition instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl is pretty central to allowing exec to accept either:

  • &TerminatedSlice (usually actually a TerminatedVec created by terminate())
  • &[CStr] and similar (specifically any IntoIterator where Item: AsRef<CStr>)

... particularly the latter, which enables the old behaviour of allocate-at-call-site. One slightly less global approach might be to move the IntoRef trait and its impls into the unistd module and possibly give it a name that isn't as generic sounding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to simply remove the old style and require the use of TerminatedVec::terminate_cstr any time exec is called, whether the Vec is being preallocated or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of reducing options and improving consistency. This is why I've been asking for code examples, as I'm not familiar with how people use this API regularly. I want to make sure we know how people use it regularly and our API is easy for the 90% case and possible for the last 10% case. Or something along those lines.

All the examples I see involve a hard-coded array. And in those situations, I think a term_vec!() macro makes a lot of sense; this would look like a regular vec! call and avoid all of the wrapping in Option. But I don't know how people build up these arrays programmatically or if that's ever done in practice. Is that a common operation? I'd think that is handled nicely by the API provided here, but I don't know exactly.

Copy link
Contributor Author

@arcnmx arcnmx Dec 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, examples are going to use a hard-coded array because they're just examples. Real world usage varies of course, and can be hard-coded in simple cases, or generated from a config file or argv like in the use case that I originally wrote these changes for.

Most uses for exec I can imagine would use data coming from a dynamic source rather than being hard-coded. The convenient no-preallocation variant mostly comes into play when using exec in a program that doesn't fork and just intends to replace the whole process - I'm not sure how common that is, I guess if you're writing a tool like env(1) or something. A single-threaded program that does fork applies too, although that's not an assumption one should make if writing a library that doesn't know what kind of program might be using it.

type Target = TerminatedVec<&'a c_char>;

fn into_ref(self) -> Self::Target {
TerminatedVec::terminate_cstr(self)
}
}
Loading