not working with grpc-1.53.0 & server still waiting after finish training #63

Open
lidh15 opened this issue Mar 30, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@lidh15

lidh15 commented Mar 30, 2023

The documentation mentions that gRPC versions earlier than 1.50 may not work. I used the latest release, 1.53, and running make throws these errors:

[ 20%] Building CXX object src/FedTree/CMakeFiles/FedTree_DIST.dir/scikit_fedtree.cpp.o
In file included from /usr/local/include/absl/base/config.h:86,
                 from /usr/local/include/absl/base/const_init.h:25,
                 from /usr/local/include/absl/synchronization/mutex.h:67,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_party.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_party.cpp:5:
/usr/local/include/absl/base/policy_checks.h:79:2: error: #error "C++ versions less than C++14 are not supported."
   79 | #error "C++ versions less than C++14 are not supported."
      |  ^~~~~
In file included from /usr/local/include/absl/base/config.h:86,
                 from /usr/local/include/absl/base/const_init.h:25,
                 from /usr/local/include/absl/synchronization/mutex.h:67,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_server.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_server.cpp:5:
/usr/local/include/absl/base/policy_checks.h:79:2: error: #error "C++ versions less than C++14 are not supported."
   79 | #error "C++ versions less than C++14 are not supported."
      |  ^~~~~
In file included from /usr/local/include/absl/time/time.h:88,
                 from /usr/local/include/absl/time/clock.h:26,
                 from /usr/local/include/absl/synchronization/internal/kernel_timeout.h:35,
                 from /usr/local/include/absl/synchronization/mutex.h:74,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_party.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_party.cpp:5:
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:340:10: error: assignment of member ‘absl::lts_20230125::string_view::ptr_’ in read-only object
  340 |     ptr_ += n;
      |     ~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:341:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  341 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:338:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’
  338 |   constexpr void remove_prefix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:350:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  350 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:348:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’
  348 |   constexpr void remove_suffix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’:
/usr/local/include/absl/strings/string_view.h:358:13: error: passing ‘const absl::lts_20230125::string_view’ as ‘this’ argument discards qualifiers [-fpermissive]
  358 |     *this = s;
      |             ^
/usr/local/include/absl/strings/string_view.h:161:7: note:   in call to ‘absl::lts_20230125::string_view& absl::lts_20230125::string_view::operator=(const absl::lts_20230125::string_view&)’
  161 | class string_view {
      |       ^~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h:356:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’
  356 |   constexpr void swap(string_view& s) noexcept {
      |                  ^~~~
In file included from /usr/local/include/absl/time/time.h:88,
                 from /usr/local/include/absl/time/clock.h:26,
                 from /usr/local/include/absl/synchronization/internal/kernel_timeout.h:35,
                 from /usr/local/include/absl/synchronization/mutex.h:74,
                 from /usr/local/include/grpcpp/impl/sync.h:30,
                 from /usr/local/include/grpcpp/impl/codegen/sync.h:25,
                 from /usr/local/include/grpcpp/completion_queue.h:43,
                 from /usr/local/include/grpcpp/channel.h:25,
                 from /usr/local/include/grpcpp/grpcpp.h:52,
                 from /workspace/FedTree/include/FedTree/FL/distributed_server.h:8,
                 from /workspace/FedTree/src/FedTree/FL/distributed_server.cpp:5:
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:340:10: error: assignment of member ‘absl::lts_20230125::string_view::ptr_’ in read-only object
  340 |     ptr_ += n;
      |     ~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:341:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  341 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:338:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_prefix(absl::lts_20230125::string_view::size_type) const’
  338 |   constexpr void remove_prefix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’:
/usr/local/include/absl/strings/string_view.h:350:13: error: assignment of member ‘absl::lts_20230125::string_view::length_’ in read-only object
  350 |     length_ -= n;
      |     ~~~~~~~~^~~~
/usr/local/include/absl/strings/string_view.h:348:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::remove_suffix(absl::lts_20230125::string_view::size_type) const’
  348 |   constexpr void remove_suffix(size_type n) {
      |                  ^~~~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h: In member function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’:
/usr/local/include/absl/strings/string_view.h:358:13: error: passing ‘const absl::lts_20230125::string_view’ as ‘this’ argument discards qualifiers [-fpermissive]
  358 |     *this = s;
      |             ^
/usr/local/include/absl/strings/string_view.h:161:7: note:   in call to ‘absl::lts_20230125::string_view& absl::lts_20230125::string_view::operator=(const absl::lts_20230125::string_view&)’
  161 | class string_view {
      |       ^~~~~~~~~~~
/usr/local/include/absl/strings/string_view.h:356:18: error: invalid return type ‘void’ of ‘constexpr’ function ‘constexpr void absl::lts_20230125::string_view::swap(absl::lts_20230125::string_view&) const’
  356 |   constexpr void swap(string_view& s) noexcept {
      |                  ^~~~
[ 21%] Linking CXX shared library ../../lib/libFedTree.so
make[2]: *** [src/FedTree/CMakeFiles/FedTree_DIST.dir/build.make:146: src/FedTree/CMakeFiles/FedTree_DIST.dir/FL/distributed_party.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/usr/bin/ld: /usr/local/lib/libntl.a(ZZ.o): relocation R_X86_64_TPOFF32 against `_ZN3NTLL8iodigitsE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(fileio.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTL8UniqueIDB5cxx11EvE37_ntl_hidden_variable_tls_local_ptr_ID' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(lip.o): relocation R_X86_64_TPOFF32 against `_ZZ10_ntl_gswapPP17_ntl_gbigint_bodyS1_E36_ntl_hidden_variable_tls_local_ptr_t' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(tools.o): relocation R_X86_64_TPOFF32 against symbol `_ZN3NTL16ErrorMsgCallbackE' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(thread.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTL15CurrentThreadIDB5cxx11EvE37_ntl_hidden_variable_tls_local_ptr_ID' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(BasicThreadPool.o): relocation R_X86_64_TPOFF32 against `_ZZN3NTLL49_ntl_hidden_function_tls_access_NTLThreadPool_stgEvE52_ntl_hidden_variable_tls_local_ptr_NTLThreadPool_stg' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: /usr/local/lib/libntl.a(lip.o): warning: relocation against `_ZTV21_ntl_tmp_vec_crt_fast' in read-only section `.text'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree.dir/build.make:551: lib/libFedTree.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:154: src/FedTree/CMakeFiles/FedTree.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
make[2]: *** [src/FedTree/CMakeFiles/FedTree_DIST.dir/build.make:160: src/FedTree/CMakeFiles/FedTree_DIST.dir/FL/distributed_server.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:232: src/FedTree/CMakeFiles/FedTree_DIST.dir/all] Error 2
[ 23%] Linking CXX static library ../../lib/libft_grpc_proto.a
[ 24%] Built target ft_grpc_proto
make: *** [Makefile:91: all] Error 2
It seems that the errors come from the latest absl.
@lidh15
Author

lidh15 commented Mar 30, 2023

Okay, it's not about absl: updating CMakeLists.txt from C++11 to C++14 fixed that. The remaining problem is with zlib; the errors are:

/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_compress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x541): undefined reference to `deflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x58b): undefined reference to `deflate'
/usr/bin/ld: message_compress.cc:(.text+0x660): undefined reference to `deflateEnd'
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_decompress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x701): undefined reference to `inflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x747): undefined reference to `inflate'
/usr/bin/ld: message_compress.cc:(.text+0x7ee): undefined reference to `inflateEnd'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree-distributed-party.dir/build.make:164: bin/FedTree-distributed-party] Error 1
make[1]: *** [CMakeFiles/Makefile2:259: src/FedTree/CMakeFiles/FedTree-distributed-party.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_compress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x541): undefined reference to `deflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x58b): undefined reference to `deflate'
/usr/bin/ld: message_compress.cc:(.text+0x660): undefined reference to `deflateEnd'
/usr/bin/ld: /usr/local/lib/libgrpc.a(message_compress.cc.o): in function `zlib_decompress(grpc_slice_buffer*, grpc_slice_buffer*, int)':
message_compress.cc:(.text+0x701): undefined reference to `inflateInit2_'
/usr/bin/ld: message_compress.cc:(.text+0x747): undefined reference to `inflate'
/usr/bin/ld: message_compress.cc:(.text+0x7ee): undefined reference to `inflateEnd'
collect2: error: ld returned 1 exit status
make[2]: *** [src/FedTree/CMakeFiles/FedTree-distributed-server.dir/build.make:164: bin/FedTree-distributed-server] Error 1
make[1]: *** [CMakeFiles/Makefile2:286: src/FedTree/CMakeFiles/FedTree-distributed-server.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
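
A note on the first batch of errors, as a hedged sketch rather than anything taken from FedTree or absl: the `#error` at policy_checks.h:79 fires when the compiler reports a language standard older than C++14. The `string_view` errors are consistent with the same cause, because in C++11 a `constexpr` member function is implicitly `const` and may not return `void`. The helper names below (`meets_cxx14`, `standard_name`, `current_standard`) are made up for illustration. The block compiles in C++11 mode too, so it can be used to see which standard a given set of compiler flags actually selects.

```cpp
#include <string>

// Value of __cplusplus that corresponds to C++14.
constexpr long kCxx14 = 201402L;

// True when a compiler reporting `cplusplus_value` is in C++14 mode or newer.
// A single return statement keeps this function valid in C++11 as well.
constexpr bool meets_cxx14(long cplusplus_value) {
    return cplusplus_value >= kCxx14;
}

// Human-readable name for a __cplusplus value.
std::string standard_name(long cplusplus_value) {
    if (cplusplus_value >= 202002L) return "C++20 or later";
    if (cplusplus_value >= 201703L) return "C++17";
    if (cplusplus_value >= 201402L) return "C++14";
    if (cplusplus_value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Name of the standard this translation unit is being compiled with.
// Note: MSVC reports an old __cplusplus value unless /Zc:__cplusplus is set.
std::string current_standard() {
    return standard_name(__cplusplus);
}
```

Printing `current_standard()` from a small `main` built with the same flags as the failing targets would show whether the C++14 setting has actually reached the compiler.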

@QinbinLi
Member

Hi @lidh15 ,

We use gRPC 1.50.0 to generate the proto files. If you use a different version, you may need to go to the src/FedTree/grpc directory and run the following commands to regenerate them. Then you can try to compile the library again. Thank you!

protoc -I ./ --grpc_out=. --plugin=protoc-gen-grpc=`which grpc_cpp_plugin` ./fedtree.proto
protoc -I ./ --cpp_out=. ./fedtree.proto

@lidh15
Author

lidh15 commented Mar 30, 2023

okay, I'll try.

@lidh15
Author

lidh15 commented Mar 30, 2023

I don't know whether it is okay to discuss this here or whether I should open a new issue: why doesn't the distributed server exit after a vertical GBDT training process finishes?
I know that in the original horizontal federated learning architecture the server is meant to run as a service, but in vertical scenarios the "server" is usually also a "party", just the one that holds the labels. Would it be possible for "distributed-party" to take over the server's job and exit after a training task?

@QinbinLi
Member

Thank you for this great suggestion! Indeed it'd be better if the server stops automatically when the task is over. We'll fix it in the future.
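
One way the automatic exit suggested above could be approached, sketched under assumptions: the server counts completion reports from the parties and stops once all of them have reported. `CompletionTracker`, `report_done`, and the integer party ids are hypothetical names, not FedTree code. In gRPC C++, a `grpc::Server` blocked in `Wait()` returns after `Shutdown()` is called from another thread, so a tracker like this could be what decides when to make that call.

```cpp
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical helper (not part of FedTree): records which parties have
// reported that training is finished, so the server knows when it may stop.
class CompletionTracker {
public:
    explicit CompletionTracker(std::size_t n_parties) : n_parties_(n_parties) {}

    // Marks `party_id` as done. Returns true once every party has reported.
    // Repeated reports from the same party are counted only once.
    bool report_done(int party_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        done_.insert(party_id);
        return done_.size() >= n_parties_;
    }

    // True when all expected parties have reported completion.
    bool all_done() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return done_.size() >= n_parties_;
    }

private:
    const std::size_t n_parties_;
    mutable std::mutex mutex_;  // handlers may run on several threads
    std::set<int> done_;
};
```

The mutex is there because RPC handlers generally run concurrently. A set is used instead of a counter so that a retried report from the same party cannot trigger an early shutdown.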

@QinbinLi QinbinLi changed the title from "not working with grpc-1.53.0" to "not working with grpc-1.53.0 & server still waiting after finish training" Mar 30, 2023
@QinbinLi QinbinLi added the enhancement New feature or request label Mar 30, 2023
@lidh15
Author

lidh15 commented Mar 31, 2023

Regenerating the proto files with the protoc commands that @QinbinLi suggested above didn't help.

@lidh15
Author

lidh15 commented Mar 31, 2023

One more question: how many bits does the modulus N have in the Paillier HE used for vertical GBDT? Typically it is 2048, but I couldn't find this in the documentation.

QinbinLi added a commit that referenced this issue Apr 1, 2023
@QinbinLi
Member

QinbinLi commented Apr 1, 2023

512 bits are used in the default setting. I just added the parameter key_length so that users can control the bits. Please refer to https://fedtree.readthedocs.io/en/latest/Parameters.html for details.
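
To make "bits of N" concrete, here is a toy Paillier sketch with textbook parameters (g = n + 1). It is not FedTree's implementation: the names (`ToyPaillierKey`, `make_key`, `encrypt`, `decrypt`, `bit_length`) are illustrative, and the primes are deliberately tiny so everything fits in 64-bit integers. A real key of 512 or 2048 bits needs a big-integer library.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

struct ToyPaillierKey {
    u64 n;       // public modulus n = p * q (the "N" whose bit length is the key length)
    u64 n2;      // n squared; ciphertexts live modulo n2
    u64 lambda;  // private: lcm(p - 1, q - 1)
    u64 mu;      // private: inverse of lambda modulo n
};

u64 gcd_u64(u64 a, u64 b) {
    while (b != 0) { u64 t = a % b; a = b; b = t; }
    return a;
}

// Square-and-multiply; requires mod < 2^32 so products fit in 64 bits.
u64 pow_mod(u64 base, u64 exp, u64 mod) {
    u64 result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Modular inverse via the extended Euclidean algorithm; assumes gcd(a, m) == 1.
u64 inv_mod(u64 a, u64 m) {
    std::int64_t t = 0, new_t = 1;
    std::int64_t r = static_cast<std::int64_t>(m);
    std::int64_t new_r = static_cast<std::int64_t>(a % m);
    while (new_r != 0) {
        std::int64_t q = r / new_r;
        std::int64_t tmp = t - q * new_t; t = new_t; new_t = tmp;
        tmp = r - q * new_r; r = new_r; new_r = tmp;
    }
    if (t < 0) t += static_cast<std::int64_t>(m);
    return static_cast<u64>(t);
}

// p and q must be distinct primes with p * q < 2^16 and gcd(p*q, (p-1)*(q-1)) == 1.
ToyPaillierKey make_key(u64 p, u64 q) {
    ToyPaillierKey key;
    key.n = p * q;
    assert(key.n < 65536);  // keeps n^2 below 2^32 in this toy version
    key.n2 = key.n * key.n;
    key.lambda = (p - 1) * (q - 1) / gcd_u64(p - 1, q - 1);
    key.mu = inv_mod(key.lambda % key.n, key.n);
    return key;
}

// Encrypts m (0 <= m < n) using randomness r, which must be coprime to n.
u64 encrypt(const ToyPaillierKey& key, u64 m, u64 r) {
    u64 g_to_m = (1 + m * key.n) % key.n2;  // (n + 1)^m mod n^2
    return g_to_m * pow_mod(r, key.n, key.n2) % key.n2;
}

u64 decrypt(const ToyPaillierKey& key, u64 c) {
    u64 x = pow_mod(c, key.lambda, key.n2);
    u64 l = (x - 1) / key.n;  // L(x) = (x - 1) / n
    return l % key.n * key.mu % key.n;
}

// Number of bits needed to represent v.
int bit_length(u64 v) {
    int bits = 0;
    while (v != 0) { ++bits; v >>= 1; }
    return bits;
}
```

With `make_key(251, 241)` the modulus is 60491, a 16-bit N. Multiplying two ciphertexts modulo `n2` yields a ciphertext of the sum of the plaintexts, which is the additive property that schemes like vertical GBDT rely on. Larger key lengths raise the cost of factoring N, at the price of slower encryption and larger ciphertexts.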

For grpc 1.53.0, I have no idea why it fails. I'm considering adding a feature to automatically install a fixed version of grpc when compiling FedTree to avoid the grpc compatibility issue.
