Support non-extern-C symbols #27

Open
eyalroz opened this issue May 13, 2022 · 16 comments

@eyalroz
Contributor

eyalroz commented May 13, 2022

This is a C++ library for working with shared objects, but it only supports unmangled, C-style functions. That means it doesn't serve its primary function. The library must support any C++ function one can load from a shared object. Naturally, this is ABI-specific, but that's either for the user to configure and build accordingly, or potentially a case for multi-ABI support. The latter is much more complicated, and would be a feature request in itself, but function symbols should definitely be looked up by their mangled name, if they're not extern-C.

@martin-olivier martin-olivier self-assigned this May 17, 2022
@martin-olivier martin-olivier added the enhancement New feature or request label May 17, 2022
@martin-olivier
Owner

martin-olivier commented May 30, 2022

Current status

Hello,
I'm currently working on that.
The goal is to add a feature to dylib that can load C++ symbols.

Linux and MacOS

Variables
I can now access a mangled variable within a namespace:

dylib lib("lib.so");
auto ver = lib.get_variable<double>("driver::infos::version");

Functions
To mangle functions within a namespace and/or with overloads, I need access to each of the function's parameter types. But currently, the template argument you pass to get_function looks like this:

dylib lib("lib.so");

// get_function<T> for T = [module *(const char *)]
auto mod = lib.get_function<module *(const char *)>("driver::factory");

To be able to iterate over variadic template arguments, I temporarily replaced the current syntax with the following one:

// old syntax
// get_function<T>
auto mod = lib.get_function<module *(const char *)>("driver::factory");

// temporary new syntax
// get_function<Ret, Args...>
auto mod = lib.get_function<module *, const char *>("driver::factory");

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

Windows

TODO

@martin-olivier
Owner

Update

Linux and MacOS

Variables
I can now access a mangled variable within a namespace:

dylib lib("lib");
auto ver = lib.get_variable<double>("driver::infos::version");

Functions
I can now access a mangled function within a namespace, with any argument types:

dylib lib("lib");

auto mod = lib.get_function<module *, const char *>("driver::factory");
auto set_inst = lib.get_function<void, module &&>("driver::instance::set");
auto print = lib.get_function<void, std::ostream &, const std::string &>("driver::tools::print");

Windows

TODO (Next step)

Question

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

@eyalroz
Contributor Author

eyalroz commented Jun 3, 2022

Do you know if there is a way to "decompose" a function template argument to get its return value as Ret and its arguments as Args... ?

Well, std::result_of for the return type; and you can use this hack for the parameters.
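For reference, one common way to split a function type into Ret and Args... is partial specialization of a class template. The following is only a minimal C++11 sketch: function_traits and module_type are hypothetical names used for illustration and are not part of dylib. (Note that std::result_of is deprecated in C++17 and removed in C++20.)

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template is intentionally left undefined: only function types
// match the partial specialization below.
template <typename T>
struct function_traits;

// Pattern-matches a function type Ret(Args...) and exposes its pieces.
template <typename Ret, typename... Args>
struct function_traits<Ret(Args...)> {
    using return_type = Ret;
    using argument_types = std::tuple<Args...>;
    static constexpr std::size_t arity = sizeof...(Args);
};

// Hypothetical stand-in for the `module` type used in this thread.
struct module_type;

using factory_t = module_type *(const char *);

static_assert(std::is_same<function_traits<factory_t>::return_type,
                           module_type *>::value,
              "return type is recovered");
static_assert(std::is_same<function_traits<factory_t>::argument_types,
                           std::tuple<const char *>>::value,
              "parameter types are recovered");
static_assert(function_traits<factory_t>::arity == 1, "one parameter");
```

With something like this, get_function<module *(const char *)> could keep its single-argument syntax and still forward Ret and Args... to the mangling code internally.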

But are you sure you're not going about this the wrong way? I mean, take the function's proper type, then apply name mangling (not yourself - there's an ABI library for that), then look for the symbol.

@martin-olivier
Owner

But are you sure you're not going about this the wrong way? I mean, take the function's proper type, then apply name mangling (not yourself - there's an ABI library for that), then look for the symbol.

This is what I'm doing, but I'm not sure there is an ABI library to mangle names (I'm currently using typeid(T).name() to do the mangling).

There is this ABI function to demangle a symbol, but I didn't see anything about re-mangling:

char *demangledName = abi::__cxa_demangle(av[i], NULL, NULL, &status);
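As a small illustration of the demangling direction, here is a sketch that wraps abi::__cxa_demangle. It assumes GCC or Clang with the Itanium C++ ABI (the header <cxxabi.h> is not available with MSVC); demangle and type_name are hypothetical helper names.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium ABI only)
#include <cstdlib>    // std::free
#include <string>
#include <typeinfo>

// Returns the demangled form of an Itanium-mangled name, or the input
// unchanged if it cannot be demangled.
std::string demangle(const char *mangled) {
    int status = 0;
    char *raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr)
        return mangled;
    std::string result(raw);
    std::free(raw);  // the buffer returned by __cxa_demangle is malloc()-allocated
    return result;
}

// typeid(T).name() yields the mangled encoding of the *type* only,
// not a linkable symbol name; demangling it gives a readable type name.
template <typename T>
std::string type_name() {
    return demangle(typeid(T).name());
}
```

For example, demangle("_Z3fooi") gives "foo(int)" and type_name<int>() gives "int". There is no standard counterpart that goes from a declaration to its mangled symbol.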

@eyalroz
Contributor Author

eyalroz commented Jun 4, 2022

Ah, right, abi:: is just for demangling. typeid(T).name() doesn't need an extra library; but then - it doesn't mangle names in the sense of getting you the symbol name to look for in an object.

@eyalroz
Contributor Author

eyalroz commented Jun 4, 2022

Also, this may be relevant for Windows.

@martin-olivier
Owner

martin-olivier commented Jun 4, 2022

typeid(T).name() doesn't need an extra library; but then - it doesn't mangle names in the sense of getting you the symbol name to look for in an object.

You are right. To get there, I wrote the following code, which produces the accurate mangled function symbol name in all situations (except pointers and namespaces) on Unix:

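    // Requires <string>, <typeinfo> and <type_traits>.

    // Mangles a single parameter type. typeid(T).name() ignores references
    // and top-level cv-qualifiers, so the R / O / K prefixes are added by hand.
    // Declared first so the variadic overload below can always see it.
    template <typename T>
    static std::string TemplateMangle()
    {
        std::string t = typeid(T).name();
        if (std::is_lvalue_reference<T>::value) {
            std::string tmp = "R";
            if (std::is_const<typename std::remove_reference<T>::type>::value)
                tmp += 'K';
            t = tmp + t;
        }
        else if (std::is_rvalue_reference<T>::value) {
            std::string tmp = "O";
            if (std::is_const<typename std::remove_reference<T>::type>::value)
                tmp += 'K';
            t = tmp + t;
        }
        return t;
    }

    // Mangles a parameter list by concatenating each mangled type in order.
    template <typename T, typename U, typename... Args>
    static std::string TemplateMangle()
    {
        return TemplateMangle<T>() + TemplateMangle<U, Args...>();
    }

    // Function with at least one parameter: _Z <length> <name> <parameter types>.
    template<typename ReturnType, typename Arg1, typename ...Args>
    static std::string mangle_function(const std::string &name) {
        return "_Z" + std::to_string(name.size()) + name + TemplateMangle<Arg1, Args...>();
    }

    // Function without parameters: the parameter list is encoded as void ("v").
    template<typename ReturnType>
    static std::string mangle_function(const std::string &name) {
        return "_Z" + std::to_string(name.size()) + name + typeid(void).name();
    }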

@eyalroz
Contributor Author

eyalroz commented Jun 4, 2022

Let me first note I've asked about this at StackOverflow.

Now, for your implementation.

  • I think all of this code should be made constexpr - since it's all information that we know at compile-time.
  • The T and U in one of your function variants are ambiguous. Give them more specific names?
  • I suggest we don't use std::string's, but rather a string_view (or a char* and size_t pair in C++11) as the target buffer. We have rather expensive string concatenations in our code, that's true - but that only happens when handling errors.
  • ... actually, we may want to have a "poor man's span" structure with just those two fields
  • Same point about the inputs. So, something like:
    template <typename Function>
    mangle_function(dylib::detail_::poor_span<char> mangled_name, dylib::detail_::poor_span<char> function_name)
    
  • TemplateMangle - what exactly does it mangle? It seems like it mangles the name of a type, right? Then better call it mangle_type(). Or perhaps just mangle().
  • Don't use the same string literal in multiple places.
  • You can probably have an outer mangle() function with a single template parameter, like I suggested above - since it can distinguish at compile-time between whether it was asked to mangle a function or a variable, and call inner code - possibly with a different function name - as necessary.
  • Have you checked this against the Itanium ABI document to make sure it's valid? That should also get you going with namespace and pointers.
  • What about mangling a variable?
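To make the "poor man's span" suggestion above concrete, here is a minimal C++11 sketch. The names poor_span and write_source_name are hypothetical and do not exist in dylib; the function only writes the length-prefixed identifier piece of a mangled name (the Itanium <source-name>) into a caller-provided buffer, without any std::string allocation.

```cpp
#include <cstddef>
#include <cstring>

// "Poor man's span": a non-owning pointer + length pair (hypothetical type).
template <typename T>
struct poor_span {
    T *data;
    std::size_t size;
};

// Writes the decimal length of `name` followed by `name` itself into `out`,
// e.g. "driver" becomes "6driver". Returns the number of characters written,
// or 0 if `out` is too small. No terminating '\0' is written.
inline std::size_t write_source_name(poor_span<char> out,
                                     poor_span<const char> name) {
    char digits[20];  // enough for a 64-bit size_t in decimal
    std::size_t digit_count = 0;
    std::size_t remaining = name.size;
    do {  // produce the digits in reverse order
        digits[digit_count++] = static_cast<char>('0' + remaining % 10);
        remaining /= 10;
    } while (remaining != 0);

    if (digit_count + name.size > out.size)
        return 0;
    for (std::size_t i = 0; i < digit_count; ++i)
        out.data[i] = digits[digit_count - 1 - i];
    std::memcpy(out.data + digit_count, name.data, name.size);
    return digit_count + name.size;
}
```

A full mangle() built this way would chain several such writes into one buffer and track the running offset. Making it constexpr in C++11 would need a recursive formulation, which this sketch does not attempt.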

@martin-olivier
Owner

  • I think all of this code should be made constexpr - since it's all information that we know at compile-time.
  • The T and U in one of your function variants are ambiguous. Give them more specific names?
  • I suggest we don't use std::string's, but rather a string_view (or a char* and size_t pair in C++11) as the target buffer. We have rather expensive string concatenations in our code, that's true - but that only happens when handling errors.
  • ... actually, we may want to have a "poor man's span" structure with just those two fields
  • Same point about the inputs. So, something like:
    template <typename Function>
    mangle_function(dylib::detail_::poor_span<char> mangled_name, dylib::detail_::poor_span<char> function_name)
  • TemplateMangle - what exactly does it mangle? It seems like it mangles the name of a type, right? Then better call it mangle_type(). Or perhaps just mangle().
  • Don't use the same string literal in multiple places.

You're right, but for now I prefer to focus on making the proof of concept work.

Have you checked this against the Itanium ABI document to make sure it's valid? That should also get you going with namespace and pointers.

Yes, I'm using this document to implement the feature.

What about mangling a variable?

The following code mangles namespaced variables on Unix:

class dylib { 
private:
    static std::vector<std::string> string_to_vector(const std::string &str, const char *delimiters) {
        std::vector<std::string> tokens;
        std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);
        while (std::string::npos != pos || std::string::npos != lastPos) {
            tokens.push_back(str.substr(lastPos, pos - lastPos));
            lastPos = str.find_first_not_of(delimiters, pos);
            pos = str.find_first_of(delimiters, lastPos);
        }
        return tokens;
    }

    static std::string mangle_variable(const std::string &name) {
        if (name.find("::") == std::string::npos)
            return name;
        auto ns_list = string_to_vector(name, "::");
        if (ns_list.size() == 1)
            return ns_list.front();
        std::string mangled = "_ZN";
        for (auto &ns : ns_list)
            mangled += std::to_string(ns.size()) + ns;
        return mangled + 'E';
    }
};

@eyalroz
Contributor Author

eyalroz commented Jun 5, 2022

I think you may be misusing the delimiters parameter... it takes several chars, each of which is a delimiter.
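To illustrate the difference: find_first_of("::") treats the argument as a set of characters, so it splits on any single ':'. A sketch that splits on the whole "::" token could look like the following. split_on is a hypothetical helper, and this mangle_variable is only an approximation of the Itanium nested-name encoding: it does not handle substitutions, the std:: abbreviation, or anything other than plain namespace-scope variables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `str` on the whole `separator` string, not on any of its characters.
std::vector<std::string> split_on(const std::string &str,
                                  const std::string &separator) {
    assert(!separator.empty());  // an empty separator would never advance
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = str.find(separator, start)) != std::string::npos) {
        tokens.push_back(str.substr(start, pos - start));
        start = pos + separator.size();
    }
    tokens.push_back(str.substr(start));  // trailing piece after the last separator
    return tokens;
}

// Builds _ZN <len><name>... E for a namespaced variable; a name without
// "::" is returned unchanged, since global-scope variables are not mangled.
std::string mangle_variable(const std::string &name) {
    const std::vector<std::string> parts = split_on(name, "::");
    if (parts.size() == 1)
        return name;
    std::string mangled = "_ZN";
    for (const std::string &part : parts)
        mangled += std::to_string(part.size()) + part;
    return mangled + 'E';
}
```

With this, mangle_variable("driver::infos::version") yields "_ZN6driver5infos7versionE", and a stray single ':' is no longer treated as a separator.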

@martin-olivier
Owner

I'm gonna release 2.0.0 without this remangling feature since I don't have much time to work on it at the moment.

@eyalroz
Contributor Author

eyalroz commented Jun 11, 2022

@martin-olivier : There's always version 3.0...

@eyalroz eyalroz changed the title No support for arbitrary functions Support non-extern-C symbols Jun 20, 2022
@eyalroz
Contributor Author

eyalroz commented Jun 20, 2022

Here are some outstanding SO questions about doing this:

@eyalroz
Contributor Author

eyalroz commented Jun 21, 2022

Good news - here's MSVC mangling code for you:

https://godbolt.org/z/nnW19qzYE

Right now, that code requires C++20, but with a little work you can bring that down to C++11 and integrate it into your own code.

@ericoporto

Good news - here's MSVC mangling code for you:

https://godbolt.org/z/nnW19qzYE

Right now, that code requires C++20, but with a little work you can bring that down to C++11 and integrate it into your own code.

Hey, did anyone do that "little work"? I kinda need it to build with some old compilers. :/

@stellarpower

Don't know if this is of any help. When demangling the other way, I always use Boost. There is also a nice static type_info that uses a string_view and the __PRETTY_FUNCTION__ macro in order to extract the unmangled names. There are a few around and I can't remember which I used, but here is one of them. Either way I'd probably buy, not build, and would've thought a compiler/support library builtin must be able to mangle names. Last time I looked at libsupcxx many years ago I think I saw one.
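For readers unfamiliar with that trick, here is a rough sketch of extracting a readable type name from the function signature string. It assumes GCC or Clang (MSVC spells the macro __FUNCSIG__ and formats it differently), returns a std::string rather than a string_view to stay within C++11, and pretty_type_name is a hypothetical name. The parsing is deliberately naive and breaks for types whose spelling contains ';' or ']'.

```cpp
#include <string>

// Extracts the spelling of T from the compiler-generated signature string.
// GCC produces "... [with T = int; ...]" and Clang produces "... [T = int]".
template <typename T>
std::string pretty_type_name() {
    const std::string signature = __PRETTY_FUNCTION__;
    const std::string marker = "T = ";
    const std::string::size_type found = signature.find(marker);
    if (found == std::string::npos)
        return signature;  // unknown format: return the raw string
    const std::string::size_type start = found + marker.size();
    // The type name ends at ';' (GCC, more aliases follow) or ']' (Clang).
    const std::string::size_type end = signature.find_first_of(";]", start);
    return signature.substr(start, end - start);
}
```

This only recovers unmangled names. It does not help with producing the mangled symbol to look up.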

Personally though I'd probably rather use the compiler and register something into my library. I appreciate that only works for some usecases though, where you can modify the sources and you're generating something more like a plugin, rather than just very late binding of an arbitrary function. Let the compiler do that work and pull symbols in using something more like a factory with my own key.
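A rough sketch of that registration idea, for plugins whose sources you control: the plugin exports a single extern "C" entry point, and everything else is handed over through a table keyed by names of your own choosing, so no C++ mangling is involved in the lookup. All names here (registry, register_plugin, driver::twice) are hypothetical, and the example runs in one process for simplicity instead of going through an actual shared object.

```cpp
#include <map>
#include <string>

// Host-side table mapping caller-chosen keys to function pointers.
class registry {
public:
    using function_ptr = int (*)(int);

    void add(const std::string &key, function_ptr fn) { table_[key] = fn; }

    // Returns nullptr when the key was never registered.
    function_ptr find(const std::string &key) const {
        const auto it = table_.find(key);
        return it == table_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, function_ptr> table_;
};

// Plugin side: an ordinary C++ function inside a namespace.
namespace driver {
int twice(int x) { return 2 * x; }
}

// The only symbol the host would need to look up by its unmangled name.
extern "C" void register_plugin(registry *reg) {
    reg->add("driver::twice", &driver::twice);
}
```

The host would load the library, look up register_plugin, call it with its registry, and then fetch functions by key. This trades generality (arbitrary symbols) for not depending on any particular ABI's mangling scheme.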
