Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add which function for finding executables in PATH #2440

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ dirs = "5.0.1"
dotenvy = "0.15"
edit-distance = "2.0.0"
heck = "0.5.0"
is_executable = "1.0.4"
lexiclean = "0.0.1"
libc = "0.2.0"
num_cpus = "1.15.0"
Expand Down
17 changes: 17 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -1675,6 +1675,23 @@ set unstable
foo := env('FOO') || 'DEFAULT_VALUE'
```

- `which(exe)`<sup>master</sup> — Retrieves the full path of `exe` according
to the `PATH`. Returns an empty string if no executable named `exe` exists.

```just
bash := which("bash")
nexist := which("does-not-exist")

@test:
echo "bash: '{{bash}}'"
echo "nexist: '{{nexist}}'"
```

```console
bash: '/bin/bash'
nexist: ''
```

#### Invocation Information

- `is_dependency()` - Returns the string `true` if the current recipe is being
Expand Down
49 changes: 49 additions & 0 deletions src/function.rs
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ pub(crate) fn get(name: &str) -> Option<Function> {
"uppercase" => Unary(uppercase),
"uuid" => Nullary(uuid),
"without_extension" => Unary(without_extension),
"which" => Unary(which),
_ => return None,
};
Some(function)
Expand Down Expand Up @@ -661,6 +662,54 @@ fn uuid(_context: Context) -> FunctionResult {
Ok(uuid::Uuid::new_v4().to_string())
}

fn which(context: Context, s: &str) -> FunctionResult {
0xzhzh marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about this version? I found the original logic a little hard to follow:

(Also included some comments which don't need to go into the final PR.)

fn which(context: Context, command: &str) -> FunctionResult {
  use std::path::Component;

  let command = Path::new(command);

  let relative = match command.components().next() {
    None => return Err("empty command".into()),
    // Any path that starts with `.` or `..` can't be joined to elements of `$PATH` and should be considered on its own. (Is this actually true? What about `C:foo` on windows? Is that a thing?
    Some(Component::CurDir) | Some(Component::ParentDir) => vec![command.into()],
    _ => {
      let paths = env::var_os("PATH").ok_or("`PATH` environment variable not set")?;

      env::split_paths(&paths)
        .map(|path| path.join(command))
        .collect()
    }
  };

  let working_directory = context.evaluator.context.working_directory();

  let absolute = relative
    .into_iter()
    // note that an path.join(absolute_path) winds up being absolute_path
    // lexiclean is hear to remove unnecessary `.` and `..`
    .map(|relative| working_directory.join(relative).lexiclean())
    .collect::<Vec<PathBuf>>();

  for candidate in absolute {
    if is_executable::is_executable(&candidate) {
      return candidate
        .to_str()
        .map(str::to_string)
        .ok_or_else(|| format!("Executable path not unicode: {}", candidate.display()));
    }
  }

  Ok(String::new())
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find my implementation easier to understand, but that might only be because I wrote it.

The main difference between our implementations seems to be how we determine whether to resolve via PATH. Mine checks command.components.count(): anything with more than one component is considered a relative path and skips PATH resolution. Meanwhile, yours checks command.components.next(): anything whose first element is ., .., or / is considered a relative path and skips PATH resolution.

Crucially, they differ on how they handle something like which("dir/cmd"), which has more than one component, but the first component is a Component::Normal("dir"). Thus, my implementation would consider it the same as ./dir/cmd (i.e., resolved relative to CWD), while yours would resolve it using the PATH variable.

I wrote a small test, and (at least on macOS with sh, bash, and zsh) it seems like my implementation has the correct behavior in this instance:

#!/bin/sh

set -e
tmpdir="$(mktemp -d)"
cd "$tmpdir"

mkdir -p path/dir dir

printf "#!%s\necho resolved via PATH" "$SHELL" > path/dir/cmd
chmod +x path/dir/cmd

printf "#!%s\necho resolved via CWD" "$SHELL" > dir/cmd
chmod +x dir/cmd

PATH="$(realpath path)" dir/cmd   # prints 'resolved via CWD'

rm -rf "$tmpdir"

The other differences are:

  • You use Path::new(command) rather than PathBuf::from(command). I've adopted this change since it's more efficient (and also
  • You use lexiclean() to remove extraneous .. or . components. However, at least the which implementation on my system (macOS) does not do this.
  • You handle absolute paths (those beginning with /) implicitly, through the behavior of Path::join() (TIL about this behavior!). But I personally think it's clearer to handle this case explicitly.
  • You make two passes through the candidates, one to prepend the working directory, and one to check whether it is an executable. I didn't do this to avoid making another copy.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, you're right, in addition to your test, I looked at the [POSIX shell standard](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html_ (search for "command search and execution"), and it says that for any path with a, the PATH variable isn't consulted.

So I think command.components.count() is best. (Which, if we're doing that, it makes sense to use that to check for an empty command, and not .is_empty())

GNU which does actually remove ..:

$ gwhich ../just/bin/forbid
/Users/rodarmor/src/just/bin/forbid

So I'm in favor of using lexiclean, since it makes the returned paths easier to read, in case they return .. or ..

I personally like using the relative path behavior to do joining, but I think that's kind of a wash, since it's a little weird.

Copy link
Author

@0xzhzh 0xzhzh Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU which does actually remove .. [...] So I'm in favor of using lexiclean, since it makes the returned paths easier to read, in case they return .. or ..

I wasn't aware of that, thanks for finding that behavior! But if someone really wants to clean up the path, can't they do something like clean(which("some/../funny/path"))? Or even canonicalize(which("some/../funny/path"))?

The argument against cleaning the path is that doing so removes information that can no longer be recovered. I'm wary of over-normalization for weird edge cases like the example mentioned in the Rust Path::components() docs.

Admittedly, I can't personally foresee any scenario where this relative path information might actually be useful, so I'm strongly against calling lexiclean() here. But since the user can opt into normalization with other Just functions like clean and canonicalize, shouldn't we leave this to the user?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably okay, and .. is likely to be relatively common in things passed to which(), since you're likely to want to go back to parent directories from the justfile, and it would be nice to get rid of them so the output reads better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I've added lexiclean().

use is_executable::IsExecutable;

let cmd = Path::new(s);

let candidates = match cmd.components().count() {
0 => return Err("empty command".into()),
1 => {
// cmd is a regular command
let path_var = env::var_os("PATH").ok_or("Environment variable `PATH` is not set")?;
env::split_paths(&path_var)
.map(|path| path.join(cmd))
.collect()
}
_ => {
// cmd contains a path separator, treat it as a path
vec![cmd.into()]
}
};

for mut candidate in candidates {
if candidate.is_relative() {
// This candidate is a relative path, either because the user invoked `which("rel/path")`,
// or because there was a relative path in `PATH`. Resolve it to an absolute path,
// relative to the working directory of the just invocation.
candidate = context
.evaluator
.context
.working_directory()
.join(candidate);
}

candidate = candidate.lexiclean();

if candidate.is_executable() {
return candidate.to_str().map(str::to_string).ok_or_else(|| {
format!(
"Executable path is not valid unicode: {}",
candidate.display()
)
});
}
}

// No viable candidates; return an empty string
Ok(String::new())
}

fn without_extension(_context: Context, path: &str) -> FunctionResult {
let parent = Utf8Path::new(path)
.parent()
Expand Down
1 change: 1 addition & 0 deletions tests/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ mod timestamps;
mod undefined_variables;
mod unexport;
mod unstable;
mod which_function;
#[cfg(windows)]
mod windows;
#[cfg(target_family = "windows")]
Expand Down
25 changes: 25 additions & 0 deletions tests/test.rs
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,31 @@ impl Test {
self
}

pub(crate) fn make_executable(self, path: impl AsRef<Path>) -> Self {
let file = self.tempdir.path().join(path);

// Make sure it exists first, as a sanity check.
assert!(file.exists(), "file does not exist: {}", file.display());

// Windows uses file extensions to determine whether a file is executable.
// Other systems don't care. To keep these tests cross-platform, just make
// sure all executables end with ".exe" suffix.
assert!(
file.extension() == Some("exe".as_ref()),
"executable file does not end with .exe: {}",
file.display()
);

println!("hi: {}", file.display());

if !cfg!(windows) {
let perms = std::os::unix::fs::PermissionsExt::from_mode(0o755);
fs::set_permissions(file, perms).unwrap();
}

self
}

pub(crate) fn expect_file(mut self, path: impl AsRef<Path>, content: impl AsRef<[u8]>) -> Self {
let path = path.as_ref();
self
Expand Down
203 changes: 203 additions & 0 deletions tests/which_function.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
use super::*;

const HELLO_SCRIPT: &str = "#!/usr/bin/env bash
echo hello
";

#[test]
fn finds_executable() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());

Test::with_tempdir(tmp)
.justfile("p := which('hello.exe')")
.args(["--evaluate", "p"])
.write("hello.exe", HELLO_SCRIPT)
.make_executable("hello.exe")
.env("PATH", path.to_str().unwrap())
.stdout(format!("{}", path.join("hello.exe").display()))
.run();
}

#[test]
fn prints_empty_string_for_missing_executable() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());

Test::with_tempdir(tmp)
.justfile("p := which('goodbye.exe')")
.args(["--evaluate", "p"])
.write("hello.exe", HELLO_SCRIPT)
.make_executable("hello.exe")
.env("PATH", path.to_str().unwrap())
.stdout("")
.run();
}

#[test]
fn skips_non_executable_files() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());

Test::with_tempdir(tmp)
.justfile("p := which('hi')")
.args(["--evaluate", "p"])
.write("hello.exe", HELLO_SCRIPT)
.make_executable("hello.exe")
.write("hi", "just some regular file")
.env("PATH", path.to_str().unwrap())
.stdout("")
.run();
}

#[test]
fn supports_multiple_paths() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());
let path_var = env::join_paths([
path.join("subdir1").to_str().unwrap(),
path.join("subdir2").to_str().unwrap(),
])
.unwrap();

Test::with_tempdir(tmp)
.justfile("p := which('hello1.exe') + '+' + which('hello2.exe')")
.args(["--evaluate", "p"])
.write("subdir1/hello1.exe", HELLO_SCRIPT)
.make_executable("subdir1/hello1.exe")
.write("subdir2/hello2.exe", HELLO_SCRIPT)
.make_executable("subdir2/hello2.exe")
.env("PATH", path_var.to_str().unwrap())
.stdout(format!(
"{}+{}",
path.join("subdir1").join("hello1.exe").display(),
path.join("subdir2").join("hello2.exe").display(),
))
.run();
}

#[test]
fn supports_shadowed_executables() {
enum Variation {
Dir1Dir2, // PATH=/tmp/.../dir1:/tmp/.../dir2
Dir2Dir1, // PATH=/tmp/.../dir2:/tmp/.../dir1
}

for variation in [Variation::Dir1Dir2, Variation::Dir2Dir1] {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());

let path_var = match variation {
Variation::Dir1Dir2 => env::join_paths([
path.join("dir1").to_str().unwrap(),
path.join("dir2").to_str().unwrap(),
]),
Variation::Dir2Dir1 => env::join_paths([
path.join("dir2").to_str().unwrap(),
path.join("dir1").to_str().unwrap(),
]),
}
.unwrap();

let stdout = match variation {
Variation::Dir1Dir2 => format!("{}", path.join("dir1").join("shadowed.exe").display()),
Variation::Dir2Dir1 => format!("{}", path.join("dir2").join("shadowed.exe").display()),
};

Test::with_tempdir(tmp)
.justfile("p := which('shadowed.exe')")
.args(["--evaluate", "p"])
.write("dir1/shadowed.exe", HELLO_SCRIPT)
.make_executable("dir1/shadowed.exe")
.write("dir2/shadowed.exe", HELLO_SCRIPT)
.make_executable("dir2/shadowed.exe")
.env("PATH", path_var.to_str().unwrap())
.stdout(stdout)
.run();
}
}

#[test]
fn ignores_nonexecutable_candidates() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());

let path_var = env::join_paths([
path.join("dummy").to_str().unwrap(),
path.join("subdir").to_str().unwrap(),
path.join("dummy").to_str().unwrap(),
])
.unwrap();

let dummy_exe = if cfg!(windows) {
"dummy/foo"
} else {
"dummy/foo.exe"
};

Test::with_tempdir(tmp)
.justfile("p := which('foo.exe')")
.args(["--evaluate", "p"])
.write("subdir/foo.exe", HELLO_SCRIPT)
.make_executable("subdir/foo.exe")
.write(dummy_exe, HELLO_SCRIPT)
.env("PATH", path_var.to_str().unwrap())
.stdout(format!("{}", path.join("subdir").join("foo.exe").display()))
.run();
}

#[test]
fn handles_absolute_path() {
let tmp = tempdir();
let path = PathBuf::from(tmp.path());
let abspath = path.join("subdir").join("foo.exe");

Test::with_tempdir(tmp)
.justfile(format!("p := which('{}')", abspath.display()))
.write("subdir/foo.exe", HELLO_SCRIPT)
.make_executable("subdir/foo.exe")
.write("pathdir/foo.exe", HELLO_SCRIPT)
.make_executable("pathdir/foo.exe")
.env("PATH", path.join("pathdir").to_str().unwrap())
.args(["--evaluate", "p"])
.stdout(format!("{}", abspath.display()))
.run();
}

#[test]
fn handles_dotslash() {
let tmp = tempdir();
let path = tmp.path().canonicalize().unwrap();
// canonicalize() is necessary here to account for the justfile prepending
// the canonicalized working directory to './foo.exe'.

Test::with_tempdir(tmp)
.justfile("p := which('./foo.exe')")
.args(["--evaluate", "p"])
.write("foo.exe", HELLO_SCRIPT)
.make_executable("foo.exe")
.write("pathdir/foo.exe", HELLO_SCRIPT)
.make_executable("pathdir/foo.exe")
.env("PATH", path.join("pathdir").to_str().unwrap())
.stdout(format!("{}", path.join("foo.exe").display()))
.run();
}

#[test]
fn handles_dir_slash() {
let tmp = tempdir();
let path = tmp.path().canonicalize().unwrap();
// canonicalize() is necessary here to account for the justfile prepending
// the canonicalized working directory to 'subdir/foo.exe'.

Test::with_tempdir(tmp)
.justfile("p := which('subdir/foo.exe')")
.args(["--evaluate", "p"])
.write("subdir/foo.exe", HELLO_SCRIPT)
.make_executable("subdir/foo.exe")
.write("pathdir/foo.exe", HELLO_SCRIPT)
.make_executable("pathdir/foo.exe")
.env("PATH", path.join("pathdir").to_str().unwrap())
.stdout(format!("{}", path.join("subdir").join("foo.exe").display()))
.run();
}