-
-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support @fromfile option values in the Rust options parser. #20634
Conversation
Reviewers - thanks for your attention and patience. The diff size seems scary, but the vast majority of it is in new tests, and the Cargo.lock changes. I recommend starting by looking at the changes in parse.rs (and my GH comments there), then lib.rs, then the changes in args.rs/env.rs/config.rs should be straightforward applications of those, and then there are the new tests, which are very uniform across args_tests.rs/env_tests.rs/config_tests.rs. |
@@ -274,49 +278,188 @@ fn format_parse_error( | |||
)) | |||
} | |||
|
|||
#[allow(dead_code)] | |||
pub(crate) fn parse_bool(value: &str) -> Result<bool, ParseError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we called these for spot-parsing in various places, or passed these around as function pointers in generic code. Instead, we now implement these via a new trait, Parseable
, which allows the straightforward bool::parse()
and bool::parse_list()
instead.
#[allow(dead_code)] | ||
pub(crate) fn parse_bool(value: &str) -> Result<bool, ParseError> { | ||
option_value_parser::bool(value).map_err(|e| format_parse_error("bool", value, e)) | ||
pub(crate) fn parse_dict(value: &str) -> Result<DictEdit, ParseError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do still need an explicit parse_dict()
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because dict values are heterogeneous, so there is no scalar type to attach their conversion to, like we do for lists. We could implement parse()
(but not parse_list()
) on HashMap<String, Val>
but that doesn't really buy us anything. The point of the other cleanups was to get rid of boilerplate via type parametrization, but that doesn't apply here.
src/rust/engine/options/src/parse.rs
Outdated
} | ||
} | ||
|
||
pub(crate) trait ListMember: Parseable + DeserializeOwned {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a tagging trait for list members, to ensure they are also deserializable into an owned instance by serde, when we read them from json/yaml fromfiles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know of three goals with this sort of trait:
- an alias that's more convenient than writing
T: Parseable + DeserializeOwned
- wrapping multiple traits up into a single one, for use in a trait object (i.e.
&dyn ListMember
works, but&dyn (Parseable + DeserializeOwned)
doesn't; the+
syntax only supports built-in traits likeSend
) - an "assertion" that particular types implement the parent traits
Is this usage fitting into one of those categories? If the first or second one, an option for the impl
might be a blanket impl, something like:
impl<T: Parseable + DeserializeOwned + ?Sized> ListMember for T {}
(With the ?Sized
bound only required to support trait objects well, I think. So not necessary.)
If the third one, an another option is something like:
const fn assert_is_list_member<T: Parseable + DeserializeOwned>() {}
const _ASSERT_BOOL: () = assert_is_list_member::<bool>();
...
but now that I write it out, that's not amazingly nicer. 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was #1.
But this code evolved while I was working on it, and now that I look at where it ended up, ListMember isn't actually needed anyway. Every Parseable can be a list member (because of its parse_list()), so it's easier to make Parseable a subtrait of DeserializeOwned and be done with it.
Good call!
pub(crate) fn parse_bool_list(value: &str) -> Result<Vec<ListEdit<bool>>, ParseError> { | ||
option_value_parser::bool_list_edits(value) | ||
.map_err(|e| format_parse_error("bool list", value, e)) | ||
fn maybe_expand(value: String) -> Result<ExpandedValue, ParseError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A helper function for the expand* functions below that are the ones actually used by args/env/config.
#[allow(dead_code)] | ||
pub(crate) fn parse_int_list(value: &str) -> Result<Vec<ListEdit<i64>>, ParseError> { | ||
option_value_parser::int_list_edits(value).map_err(|e| format_parse_error("int list", value, e)) | ||
pub(crate) fn expand(value: String) -> Result<Option<String>, ParseError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does no type conversion, just expands "path to @fromfile" -> "content of @fromfile". See usage in args/env/config.rs.
src/rust/engine/options/src/parse.rs
Outdated
if let Some(ext) = path.extension() { | ||
if ext == "json" { | ||
deserialized_items = | ||
Some(serde_json::from_str(&value).map_err(|e| mk_parse_err(e, &path))?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serde is freakin' magic! This (plus a minor annotation on the Val class in lib.rs) is all I needed to get Rust to parse json and yaml into our existing data structures!
src/rust/engine/options/src/parse.rs
Outdated
if let Some(ext) = path.extension() { | ||
if ext == "json" { | ||
deserialized_items = | ||
Some(serde_json::from_str(&value).map_err(|e| mk_parse_err(e, &path))?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a little bit of repetition here vis the previous function, but the differences are just subtle enough that it became more confusing than it was worth to try and unify them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about at least pulling out the file-type-identification? Maybe something like
enum FileType {
Json,
Yalm
}
impl FileType {
fn guess(path: &Path) -> Option<FileType> {
match path.extension() {
"json" => Some(FileType::Json),
"yaml" | "yml" => ...,
_ => None,
}
}
}
and then this section of the code in both could could be something like:
let deserialized_items = match path_opt.flat_map(|p| FileType::guess(&p)) {
Some(FileType::Json) => ...,
Some(FileType::Yaml) => ...,
None => None,
};
One could potentially then even move the "call serde_json
or serde_yaml
" decision to a generic method on FileType
, and have that be shared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I toyed with this some more and was able to create a generic deserialization helper function that works for lists and dicts, so hooray Rust type system!
@@ -38,7 +40,18 @@ fn check_with_arg<T: PartialEq + Debug>( | |||
#[test] | |||
fn test_parse_quoted_string() { | |||
fn check_str(expected: &str, input: &str) { | |||
check!(expected.to_string(), parse_quoted_string(input)); | |||
// This is slightly convoluted: quoted strings appear as list items, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so I could delete the otherwise dead code parse_quoted_string()
. It makes sense to specifically test that we handle quoted strings correctly, but now we do so where we actually encounter them - inside lists.
@@ -648,3 +661,190 @@ fn test_parse_heterogeneous_dict() { | |||
) | |||
); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here we have some very exhaustive tests of the fromfile mechanism. We also test this mechanism's application in args/env/config in the appropriate test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
In addition to this set, could we have a test or two covering some more error cases beyond just "not existing": the major class I can think of is when the file contents is invalid, both as a scalar, and, probably more interestingly, as a list or dict (e.g. where serde json/yaml can't even parse it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding some list/dict cases as you suggest. There is no such thing as an invalid scalar at this level, since we don't convert to the type but just expand string to string. Can test for this in args/config/env maybe. I'll look into adding something.
/// | ||
/// A source of option values. | ||
/// | ||
/// This is currently a subset of the types of options the Pants python option system handles. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is no longer a subset! So this whole comment can go.
pub source: Source, | ||
pub value: T, | ||
} | ||
|
||
#[derive(Debug)] | ||
pub struct ListOptionValue<T> { | ||
pub derivation: Option<Vec<(Source, Vec<ListEdit<T>>)>>, | ||
// The highest-priority source that provided edits for this value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly needed for fromfile, but will be needed for replacing the Python parser, and it was easy to sneak in here...
src/rust/engine/options/src/args.rs
Outdated
id: &OptionId, | ||
parse_list: fn(&str) -> Result<Vec<ListEdit<T>>, ParseError>, | ||
) -> Result<Option<Vec<ListEdit<T>>>, String> { | ||
fn get_list<T: ListMember>(&self, id: &OptionId) -> Result<Option<Vec<ListEdit<T>>>, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more need for the parse_list
function pointer - ListMember
has the functionality we need (via its Parseable
supertrait).
@@ -54,7 +56,8 @@ pub use types::OptionType; | |||
// We only use this for parsing values in dicts, as in other cases we know that the type must | |||
// be some scalar or string, or a uniform list of one type of scalar or string, so we can | |||
// parse as such. | |||
#[derive(Clone, Debug, PartialEq)] | |||
#[derive(Clone, Debug, PartialEq, Deserialize)] | |||
#[serde(untagged)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I needed to do to get json and yaml deserialization working!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice!
#[allow(dead_code)] | ||
pub(crate) fn parse_bool(value: &str) -> Result<bool, ParseError> { | ||
option_value_parser::bool(value).map_err(|e| format_parse_error("bool", value, e)) | ||
pub(crate) fn parse_dict(value: &str) -> Result<DictEdit, ParseError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
src/rust/engine/options/src/parse.rs
Outdated
} | ||
} | ||
|
||
pub(crate) trait ListMember: Parseable + DeserializeOwned {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know of three goals with this sort of trait:
- an alias that's more convenient than writing
T: Parseable + DeserializeOwned
- wrapping multiple traits up into a single one, for use in a trait object (i.e.
&dyn ListMember
works, but&dyn (Parseable + DeserializeOwned)
doesn't; the+
syntax only supports built-in traits likeSend
) - an "assertion" that particular types implement the parent traits
Is this usage fitting into one of those categories? If the first or second one, an option for the impl
might be a blanket impl, something like:
impl<T: Parseable + DeserializeOwned + ?Sized> ListMember for T {}
(With the ?Sized
bound only required to support trait objects well, I think. So not necessary.)
If the third one, an another option is something like:
const fn assert_is_list_member<T: Parseable + DeserializeOwned>() {}
const _ASSERT_BOOL: () = assert_is_list_member::<bool>();
...
but now that I write it out, that's not amazingly nicer. 🤷♂️
src/rust/engine/options/src/parse.rs
Outdated
|
||
fn mk_parse_err(err: impl Display, path: &Path) -> ParseError { | ||
ParseError::new(format!( | ||
"Problem reading {path} for {{name}}: {err_msg}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the {{name}}
mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A placeholder that ParseError.render() fills in with the option name, rendered suitably for the source type (e.g., --foo
vs PANTS_FOO
vs [GLOBAL] foo
. This was all preexisting before any of my recent changes.
Some(subsuffix) => { | ||
// @? means the path is allowed to not exist. | ||
let path = PathBuf::from(subsuffix); | ||
match fs::read_to_string(&path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I note that this is doing synchronous IO, but generally pants does async. Is the sync IO acceptable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably? We expect very few fromfile options, and they are read synchronously today in the Python code.
The trickier part is invalidating options when the fromfile changes (which we don't to today either). Making this async though would be a pretty big change that would be best done as its own project. For now we're no worse off than today.
src/rust/engine/options/src/parse.rs
Outdated
if let Some(ext) = path.extension() { | ||
if ext == "json" { | ||
deserialized_items = | ||
Some(serde_json::from_str(&value).map_err(|e| mk_parse_err(e, &path))?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about at least pulling out the file-type-identification? Maybe something like
enum FileType {
Json,
Yalm
}
impl FileType {
fn guess(path: &Path) -> Option<FileType> {
match path.extension() {
"json" => Some(FileType::Json),
"yaml" | "yml" => ...,
_ => None,
}
}
}
and then this section of the code in both could could be something like:
let deserialized_items = match path_opt.flat_map(|p| FileType::guess(&p)) {
Some(FileType::Json) => ...,
Some(FileType::Yaml) => ...,
None => None,
};
One could potentially then even move the "call serde_json
or serde_yaml
" decision to a generic method on FileType
, and have that be shared.
Ok(Some("FOO".to_string())), | ||
expand(format!("@{}", fromfile_path_str)) | ||
); | ||
assert_eq!(Ok(None), expand("@?/does/not/exist".to_string())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also having a test for an optional file that does exist, i.e. make sure that we're stripping off the ?
correctly? (I could imagine a future refactoring making a typo that causes us to pass in ?/does/not/exist
literally as the file path.)
In that vein, is it worth having "edge-case tests" that at least lock down the behaviour for "weird" things like @@path/to/file
reading @path/to/file
and similarly @??path/to/file
(optionally) reading ?path/to/file
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Note that @@
is already covered - that is the escape hatch for a literal value that starts with @
, so we never allow such a fromfile.
@@ -648,3 +661,190 @@ fn test_parse_heterogeneous_dict() { | |||
) | |||
); | |||
} | |||
|
|||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
In addition to this set, could we have a test or two covering some more error cases beyond just "not existing": the major class I can think of is when the file contents is invalid, both as a scalar, and, probably more interestingly, as a list or dict (e.g. where serde json/yaml can't even parse it).
Thanks for the helpful review! I've addressed all your comments. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome
As part of this, streamlines away some previous function pointer
clunkiness via a new
Parseable
trait that implements parsingof scalars and list of scalars for the scalar types bool/i64/f64/String.
While messing with this code: this change also implements the
"source" field for list- and dict-valued option values.
Previously this was only provided for scalar-valued options. However
the highest-priority source is still a valid and useful concept even
for compound values that might be composed across multiple sources.