Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Environment] Generic approach with process_env #63

Open
klemens-morgenstern opened this issue Oct 31, 2020 · 13 comments
Open

[Environment] Generic approach with process_env #63

klemens-morgenstern opened this issue Oct 31, 2020 · 13 comments

Comments

@klemens-morgenstern
Copy link
Contributor

I don't think we can rely on P1275R0 to be standardized (I think the global std::arguments will not be accepted) I think we should have a more generic approach to passing environments into a process, than relying on once class. We could add an initializer, e.g. process_env or have a full featured environment class.

The big need of environment functions arises on the side of the child process, i.e. when a child queries it's own environment. For example, searching for an executable that emulates the shell behaviour would be quite common for starting processes, but is quite different between platforms:

On windows, a certain file name is search in the PATH (or Path sometimes) variable with the addition of the extensions available in PATHEXT. On posix no such extensions exist, thus a function like std::vector<std::filesystem::path> std::this_process::env::find_executable(std::string & name).
Similarly, one might want to get the default shell of the OS, which on posix is stored in SHELL while it's deducable through windir on windows. This could be handled by a std::filesystem::path std::this_process::env::shell() function.

It is to now however, that this information is queried before process start and thus usually queried from the father environment. Thus adding those specialized functions to an environment that is used to initialize a child process, should be considered of little to no priority. If another paper does provide such a class, we should support it.

If we discount specific values in an environment, what are the semantics ?

  1. It's a list of strings, in which every string contains at least one =
  2. The part before the first = is a unique key
  3. The part after the first = is the value, that is a : (posix) or ; (windows) seperated list.

With these three characteristics we can define what types are acceptable as an environment to be set by the process_env:

  1. range<convertible_to<string_view>>
  2. range<convertible_to<pair<string_view, string_view>>
  3. range<convertible_to<pair<string_view, range<string_view>>>
struct process_env {

    //string range
    template<ranges::range T>
        requires (
            requires(T t) {{*ranges::begin(t)} -> convertible_to<string_view>;}
            )
    process_env(T && t);


    //string pair range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> convertible_to<string_view>;}
            )
    process_env(T && t);

    //string pair vector range
    template<ranges::range T>
        requires (
            requires(T t) {{get<0>(*ranges::begin(t))} -> convertible_to<string_view>;}
         && requires(T t) {{get<1>(*ranges::begin(t))} -> ranges::range;}
         && requires(T t) {{*ranges::begin(get<1>(*ranges::begin(t)))} -> ranges::range;}
            )
    process_env(T && t);

};

This this overloaded we can now use different containers, e.g. vector<string>, unordered_map<string, string> and unordered_map<string, vector<string>>.
The environment class proposed in P1275R0 does fulfill the first constraint as well.

@klemens-morgenstern
Copy link
Contributor Author

It might be a better idea to make process_env a template, which will allow more optimization for non-owning env, as might be given by a third party environment library.

@klemens-morgenstern
Copy link
Contributor Author

Version for a string list:

template<typename R>
        requires (
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<   string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<  wstring_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to< u8string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<u16string_view>;} ||
            requires(ranges::range_value_t<R> r) {{r} -> convertible_to<u32string_view>;}
            )
struct process_env {
    process_env( const R & data);
    process_env( const R & data, const std::locale & loc);


    template<typename T>
    process_env( initializer_list<T> data);// -> process_env<initializer_list<T>> ;
    template<typename T>
    process_env( initializer_list<T> data, const std::locale & loc);// -> process_env<initializer_list<T>>;

};

template<typename T>
process_env( initializer_list<T> data) -> process_env<initializer_list<T>> ;
template<typename T>
process_env( initializer_list<T> data, const locale & loc) -> process_env<initializer_list<T>>;

@klemens-morgenstern
Copy link
Contributor Author

The thing might work, but adding an initializer_list deduction guide for everthing is a bit intense. What we need is the capability to specialize for the native type, so there is overhead and maybe a initializer_list<string_view>. It does however make a log of sense to allow the env to be constructed from range<string>, map<string, string> and map<string, range<string>> . Furthermore, mixed forms to not need to be considered, beause the more complex can model the simple ones.

@klemens-morgenstern
Copy link
Contributor Author

Alright, made it readable, will adapt it for #67. It is turning out big: convertible_to_string_view is just for readability

template<typename T>
concept convertible_to_string_view = 
        convertible_to<T,    string_view> ||  convertible_to<T,   wstring_view> || convertible_to<T,  u8string_view> || convertible_to<T, u16string_view> || convertible_to<T, u32string_view> ;

template<typename T>
    requires convertible_to_string_view<T> || 
        (requires (T t)
        {
            {tuple_size<T>{}} -> convertible_to<integral_constant<size_t,2u>>;
            {get<0>(t)} -> convertible_to_string_view;
            {get<1>(t)} -> convertible_to_string_view;
        } || 
        requires (T t)
        {
            {tuple_size<T>{}} -> convertible_to<integral_constant<size_t,2u>>;
            {get<0>(t)} -> convertible_to_string_view;
            {get<1>(t)} -> ranges::range;
            {*ranges::begin(get<1>(t))} -> convertible_to_string_view;
        })
class process_env {
public:
    process_env();

    process_env(initializer_list<T> r);
    process_env(initializer_list<T> r, const std::locale & loc);

    template<ranges::range Range>
        requires(convertible_to<ranges::range_value_t<Range>, T>)
    process_env(Range r);

    template<ranges::range Range>
        requires(convertible_to<ranges::range_value_t<Range>, T>)
    process_env(Range r, const std::locale & loc);

    template<input_iterator InputIt>
        requires(convertible_to<typename iterator_traits<InputIt>::value_type, T>)
    process_env(InputIt first, InputIt last);

    template<input_iterator InputIt>
        requires(convertible_to<typename iterator_traits<InputIt>::value_type, T>)
    process_env(InputIt first, InputIt last,  const std::locale & loc);

};


template<ranges::range Range>
process_env(Range r) -> process_env<ranges::range_value_t<Range>>;

template<ranges::range Range>
process_env(Range && r, const std::locale & loc)  -> process_env<ranges::range_value_t<Range>>;


template<input_iterator InputIt>
process_env(InputIt first, InputIt last) -> process_env<typename iterator_traits<InputIt>::value_type>;

template<input_iterator InputIt>
process_env(InputIt first, InputIt last, const std::locale & loc) -> process_env<typename iterator_traits<InputIt>::value_type>;

process_env(initializer_list<pair<   string_view,    string_view>> r) -> process_env<pair<   string_view,    string_view>>;
process_env(initializer_list<pair<  wstring_view,   wstring_view>> r) -> process_env<pair<  wstring_view,   wstring_view>>;
process_env(initializer_list<pair< u8string_view,  u8string_view>> r) -> process_env<pair< u8string_view,  u8string_view>>;
process_env(initializer_list<pair<u16string_view, u16string_view>> r) -> process_env<pair<u16string_view, u16string_view>>;
process_env(initializer_list<pair<u32string_view, u32string_view>> r) -> process_env<pair<u32string_view, u32string_view>>;

process_env(initializer_list<pair<   string_view, initializer_list<   string_view>>> r) -> process_env<pair<   string_view, initializer_list<   string_view>>>;
process_env(initializer_list<pair<  wstring_view, initializer_list<  wstring_view>>> r) -> process_env<pair<  wstring_view, initializer_list<  wstring_view>>>;
process_env(initializer_list<pair< u8string_view, initializer_list< u8string_view>>> r) -> process_env<pair< u8string_view, initializer_list< u8string_view>>>;
process_env(initializer_list<pair<u16string_view, initializer_list<u16string_view>>> r) -> process_env<pair<u16string_view, initializer_list<u16string_view>>>;
process_env(initializer_list<pair<u32string_view, initializer_list<u32string_view>>> r) -> process_env<pair<u32string_view, initializer_list<u32string_view>>>;

@klemens-morgenstern
Copy link
Contributor Author

I got a rudimentary implementation of this process_env, which works, but I am starting to think it might be a stupid idea (probably because I spent the last three hours on it). All we really need would be to have the seperator of a list defined (i.e. : vs ;), so that a user can construct his own environment. That would be:

template<typename T>
concept convertible_to_string_view = 
        convertible_to<T,    string_view> ||  convertible_to<T,   wstring_view> || convertible_to<T,  u8string_view> || convertible_to<T, u16string_view> || convertible_to<T, u32string_view> ;

class process_env {
public:

    template<typename Char>
    constexpr static Char list_seperator = implementation-defined;
    
    template<convertible_to_string_view T>
    process_env(initializer_list<T> r);

    template<convertible_to_string_view T>
    process_env(initializer_list<T> r, const std::locale & loc);

    template<ranges::range Range>
        requires(convertible_to_string_view<ranges::range_value_t<Range>>)
    process_env(Range r);

    template<ranges::range Range>
        requires(convertible_to_string_view<ranges::range_value_t<Range>>)
    process_env(Range r, const std::locale & loc);

    template<input_iterator InputIt>
        requires(convertible_to_string_view<typename iterator_traits<InputIt>::value_type>)
    process_env(InputIt first, InputIt last);

    template<input_iterator InputIt>
        requires(convertible_to_string_view<typename iterator_traits<InputIt>::value_type>)
    process_env(InputIt first, InputIt last,  const std::locale & loc);

};

This is much simpler and should seamlessly work with a a third party environment library. Thoughts about that are highliy appreciated!

@JeffGarland
Copy link
Owner

apologies for being so far behind you on this. I think this is exactly the correct direction -- in particular the range based approach. Let me address a few points

  • P1275R0 -- the committee is expecting we will supplant this effort -- so for sure we can't count on it
  • P1275 takes a 'container approach' -- we should try to avoid that, which I think is what I see here
    1. containers are relatively complex to specify
    2. why invent a new one -- there's plenty of them already
    3. there's a proposal for ranges::to that takes a range an makes a container https://wg21.link/p1206
  • we can bikeshed the name, but i think process_env will be a hard sell because unfortunately 'process' sounds like a verb in this context - maybe just environment instead
  • would it be bad to just drop the iterator overloads to simplify more? Obviously a range can be made from the iterator pair.
  • can you explain the locale overload -- what is that used for?

@klemens-morgenstern
Copy link
Contributor Author

  • I have added the remark to the Rev2, that we can put every initializer as a nested type, e.g. process::env. This does make sense since all those things are just used to initialize a new process. That might address the issue.
  • I only added the iterator overloads to it reflects other common constructors, so yes we can remove them.
  • The env might need to perform some internal char-conversions (e.g. for wchar_t) which this locale can be used for. This reflects the locale overlaods in std::filesystem::path.

@JeffGarland
Copy link
Owner

* This reflects the `locale` overloads in `std::filesystem::path`.

Ok. I haven't followed in detail the discussions about filesystem::path on the committee level, but it seems like it is able to do some magic w.r.t. this -- but maybe that's in a proposed replacement, will have to check. The good news is that since Elias and I have already briefed this a couple times we can get SG16 to tell us what they'd like us to do with this stuff to be compatible with the upcoming unicode support directions.

@klemens-morgenstern
Copy link
Contributor Author

Right, whatever the current convention is works for me. We need to convert to wchar_t, i.e. UTF-16 on windows, I don't care how that's done.

@marijanp
Copy link
Contributor

comment 723703925 is a good insight, and the creation of a process_env should be possible in that manner. But what if the user wants to manipulate the environment variables definitions computationally before creating a child-process? I suspect that most users will at some point create containers to combine/manipulate the environment variables contents. That results in string operations like splitting (O(N)) e.g. the PATH variables paths. That might happen only once. But joining them again to a string (O(N)) might occur more often, frustrating the user.

Another thing that comes in my mind is the question, what should happen if multiple definitions of the same environment variable are passed to the constructor?

@klemens-morgenstern
Copy link
Contributor Author

If you manipulate the environment you will need to rebuild the environment for the child process anyhow. In linux it's a nullptr terminated char** list, on windows it's a string array, where the last element is an empty string. I.e. FOO=1\nBAR=2\n\n.

This means that you can get away without copies on linux, by just taking the pointer from a string_view, while you'll always copy a bunch on windows. The problem you talk about only occurs if the library accessing the current environment copies a lot. Passing everything in a string_views would allow preallocation though.

I did simplifiy the actually proposed version to just take a range of string_views and not the other stuff, so there would be no splitting going on unless explicitly requested by the users. That is unless he wants to modify the strings.

@klemens-morgenstern
Copy link
Contributor Author

I would leave the handling of multiple keys up to the implementation. Could be an overwrite or a system_error I reckon.

@marijanp
Copy link
Contributor

Reading all the comments I forgot what this issue was all about -> just the creation of a process env for the child...so you are right, the problem only occurs when the user accesses the environment of the current process and manipulates in some way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants