Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tensorstore C++ Integration: How to Avoid Driver Registry Conflict #191

Open
sameeul opened this issue Aug 26, 2024 · 6 comments
Open

Tensorstore C++ Integration: How to Avoid Driver Registry Conflict #191

sameeul opened this issue Aug 26, 2024 · 6 comments

Comments

@sameeul
Copy link

sameeul commented Aug 26, 2024

We have two Python packages (bfiocpp and argolid) where we use the C++ API and follow the CMake integration instruction given in the documentation.

Both of the packages individually work without any issue. However, we noticed that when both the packages are present in the same environment, if we try to import both of them, we get the following error:

python: /home/samee/axle_dev/argolid/build/temp.linux-x86_64-cpython-311/_deps/absl-src/absl/container/internal
/raw_hash_set.h:3154: void absl::lts_20240116::container_internal::raw_hash_set<Policy, Hash, Eq, Alloc>::
emplace_at(size_t, Args&& ...) [with Args = {const tensorstore::serialization::Registry::Entry*}; Policy = 
absl::lts_20240116::container_internal::FlatHashSetPolicy<const tensorstore::serialization::Registry::Entry*>; Hash = 
tensorstore::internal::SupportsHeterogeneous<absl::lts_20240116::hash_internal::Hash<tensorstore::internal::KeyAdapter<const 
tensorstore::serialization::Registry::Entry*, std::basic_string_view<char>, &tensorstore::serialization::Registry::Entry::id> > >;
 Eq = tensorstore::internal::SupportsHeterogeneous<std::equal_to<tensorstore::internal::KeyAdapter<const 
tensorstore::serialization::Registry::Entry*, std::basic_string_view<char>, &tensorstore::serialization::Registry::Entry::id> > >; Alloc = 
std::allocator<const tensorstore::serialization::Registry::Entry*>; size_t = long unsigned int]: Assertion 
`PolicyTraits::apply(FindElement{*this}, *iterator_at(i)) == iterator_at(i) && "constructed value does not match the lookup key"' failed.

I am using v0.1.60. I would appreciate if you can shed some light on why such driver registry works on global level instead of per-package level.

@laramiel
Copy link
Collaborator

laramiel commented Sep 3, 2024

Both argolid and bfiocpp use very old versions of tensorstore; have you updated them?

Anyway, without looking at your package files, it seems likely that two versions of tensorstore are being built and linked into your project. There may be order problems with your CMake file.

It would be nice to provide a self-contained repro to understand better what's going on.

@sameeul
Copy link
Author

sameeul commented Sep 3, 2024

I have also tried with the newer tensorstore release (v1.63). Finally, explicitly modifying the symbol visibility in the CMake files (at the project level) seem to resolve the issue.

 set(CMAKE_CXX_VISIBILITY_PRESET hidden)
 set(CMAKE_VISIBILITY_INLINES_HIDDEN 1)

@laramiel
Copy link
Collaborator

laramiel commented Sep 3, 2024

I may need to add something like that to our CMake generation.

@jbms
Copy link
Collaborator

jbms commented Sep 3, 2024

It appears that at least some tensorstore symbols have somehow ended up with global visibility across the two extension modules. Symbol visibility for Python extension modules is tricky. Normally you want all symbols defined by the extension module to be private. Python in fact loads extension modules with RTLD_LOCAL by default, which means that symbols should be private.

Separately, you can build with -fvisibility=hidden for better performance and to ensure symbols are private even if loaded with RTLD_GLOBAL. The tensorstore python package that we provide is built this way.

You may want to still figure out why the global symbol visibility issue came up in the first place. Is tensorstore statically linked into your C extension module or is there an intermediate dynamic library by any chance?

@laramiel
Copy link
Collaborator

laramiel commented Sep 3, 2024

I think that this is during the C++ build. There could be two versions of tensorstore included in the build, and/or there a difference between the equivalent bazel build flags and the cmake build flags and/or bazel_to_cmake failed to generate the linker script here?

@laramiel
Copy link
Collaborator

laramiel commented Sep 5, 2024

Is there a github repository or some other reproducing example you can provide where this happens?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants
@jbms @sameeul @laramiel and others