v0.17~preview.129.36+325 fails to build on M2 Chip Mac w/ libtorch 2.3.0, OCaml 5.2.0, opam 2.2.1 #14

Open
ShunchiZhang opened this issue Oct 28, 2024 · 11 comments
Labels
forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system.

Comments

@ShunchiZhang

ShunchiZhang commented Oct 28, 2024

I followed the instructions in #2:

  • Download libtorch binaries (or build libtorch on your Mac). At the moment there are no official pre-built binaries. I downloaded my (unofficial) binaries from https://github.com/mlverse/libtorch-mac-m1/releases .
  • Install OCaml >= 4.14 (see here: https://opam.ocaml.org/packages/torch/)
  • Double-check which libtorch version is compatible with the current version of OCaml torch. Version 1.13.1 is the one you want with v0.16.0 of OCaml torch.
  • Set the LIBTORCH environment variable to the directory that includes the include and lib directories.

To install with opam:

opam install torch.v0.16.0 --ignore-constraints-on libtorch
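
The LIBTORCH requirement in the last bullet can be sanity-checked before invoking opam. The following is only a minimal sketch: `check_libtorch_dir` is a hypothetical helper, not part of ocaml-torch or opam, and it assumes that a usable libtorch directory is one that contains `include` and `lib` subdirectories, as the instructions state. Printing `build-version` assumes the archive ships that file, which not every build may do.

```shell
# Hypothetical helper (not part of ocaml-torch): verify that a directory
# looks like an unpacked libtorch, i.e. contains include/ and lib/.
check_libtorch_dir() {
  dir="$1"
  if [ -z "$dir" ]; then
    echo "LIBTORCH is empty" >&2
    return 1
  fi
  for sub in include lib; do
    if [ ! -d "$dir/$sub" ]; then
      echo "missing $dir/$sub" >&2
      return 1
    fi
  done
  # Some libtorch archives ship a build-version file; print it if present.
  if [ -f "$dir/build-version" ]; then
    cat "$dir/build-version"
  fi
  return 0
}

# Usage (the path is a placeholder):
#   check_libtorch_dir /path/to/libtorch && export LIBTORCH=/path/to/libtorch
```

A check like this only rules out a wrong path. It says nothing about whether the libtorch version matches the ocaml-torch release.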

I ran:

cd /opt/libtorch
wget https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.3.0.zip
unzip libtorch-macos-arm64-2.3.0.zip -d v2.3.0
cat /opt/libtorch/v2.3.0/libtorch/build-version # 2.3.0

LIBTORCH=/opt/libtorch/v2.3.0/libtorch/ opam install torch.v0.17.0 --ignore-constraints-on libtorch

and got the following error:

➜ LIBTORCH=/opt/libtorch/v2.3.0/libtorch/ opam install torch.v0.17.0 --ignore-constraints-on libtorch
The following actions will be performed:
=== install 1 package
  ∗ torch v0.17.0

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
⬇ retrieved torch.v0.17.0  (cached)
[ERROR] The compilation of torch.v0.17.0 failed at "dune build -p torch -j 7".

#=== ERROR while compiling torch.v0.17.0 ======================================#
# context     2.2.1 | macos/arm64 | ocaml-base-compiler.5.2.0 | https://opam.ocaml.org#6383bc5431ca714c10b4e29dbf7eda9572a4ac07
# path        ~/.opam/5.2.0/.opam-switch/build/torch.v0.17.0
# command     ~/.opam/opam-init/hooks/sandbox.sh build dune build -p torch -j 7
# exit-code   1
# env-file    ~/.opam/log/torch-12746-231689.env
# output-file ~/.opam/log/torch-12746-231689.out
### output ###
# torch_stubs_generated.c:31309:43: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
# [...]
#                                           ^~~~~~
# ./torch_api_generated.h:2014:73: note: passing argument to parameter 'reduce' here
# raw_tensor atg_segment_reduce_out(gc_tensor out, gc_tensor data, char * reduce, gc_tensor lengths, gc_tensor indices, gc_tensor offsets, int64_t axis, int unsafe, scalar initial);
#                                                                         ^
# torch_stubs_generated.c:35007:28: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
#                    x33586, x33589, x33590, x33593, x33596);
#                            ^~~~~~
# ./torch_api_generated.h:2336:182: note: passing argument to parameter 'pad_mode' here
# raw_tensor atg_stft_center(gc_tensor self, int64_t n_fft, int64_t hop_length_v, int hop_length_null, int64_t win_length_v, int win_length_null, gc_tensor window, int center, char * pad_mode, int normalized, int onesided, int return_complex);
#                                                                                                                                                                                      ^
# 125 warnings and 1 error generated.



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
┌─ The following actions failed
│ λ build torch v0.17.0
└─ 
╶─ No changes have been performed

<><> torch.v0.17.0 troubleshooting ><><><><><><><><><><><><><><><><><><><><>  🐫 
=> Installation of ocaml-torch failed. This likely happened
   because there is no system installation of libtorch to compile
   OCaml bindings against. Please instal the CPU version of libtorch
   through opam, or the appropriate version of libtorch for your GPU
   through the official distribution.

All of this was done on my MacBook Air with an M2 chip. The opam and OCaml versions are as follows:

➜ opam --version
2.2.1
➜ opam switch list  
#  switch   compiler                                           description
→  5.2.0    ocaml-base-compiler.5.2.0,ocaml-options-vanilla.1  ocaml-base-compiler = 5.2.0 | ocaml-system = 5.2.0
   default  ocaml-base-compiler.5.2.0,ocaml-options-vanilla.1  ocaml >= 4.05.0

Thank you for helping me to resolve this issue :)

@ShunchiZhang
Author

ShunchiZhang commented Oct 28, 2024

TL;DR: same error with libtorch v2.1.0

The current README indicates that the compatible version of libtorch is v2.3: "The current GitHub tip corresponds to PyTorch v2.3."

However, the opam package page for torch.v0.17.0 lists a conflict with libtorch<2.1.0 | >=2.2.0, so only libtorch 2.1.x avoids it. My attempt with libtorch v2.1.0 (from mlverse/libtorch-mac-m1, as there is no official build until v2.2.0) also fails with the same error as above.
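
To make the opam constraint concrete: a conflict with libtorch<2.1.0 | >=2.2.0 leaves only the 2.1.x releases. The sketch below encodes that reading in a small shell function. `libtorch_in_opam_range` is a hypothetical name, and the range is taken from the constraint as quoted in this comment, not re-checked against the opam repository.

```shell
# Returns 0 when VERSION (major.minor[.patch]) avoids the conflict
# "libtorch<2.1.0 | >=2.2.0", i.e. when it is a 2.1.x release.
libtorch_in_opam_range() {
  version="$1"
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # second component
  [ "$major" = "2" ] && [ "$minor" = "1" ]
}

# Usage:
#   libtorch_in_opam_range 2.1.0 && echo "accepted by the opam constraint"
#   libtorch_in_opam_range 2.3.0 || echo "needs --ignore-constraints-on libtorch"
```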

@ShunchiZhang
Author

ShunchiZhang commented Oct 28, 2024

TL;DR: same error installing v0.16.0 with libtorch v1.13.1

I just tried building v0.16.0 with libtorch v1.13.1, but hit the same error again:

[ERROR] The compilation of torch.v0.16.0 failed at "dune build -p torch -j 7".

#=== ERROR while compiling torch.v0.16.0 ======================================#
# context     2.2.1 | macos/arm64 | ocaml-base-compiler.5.2.0 | https://opam.ocaml.org#6383bc5431ca714c10b4e29dbf7eda9572a4ac07
# path        ~/.opam/5.2.0/.opam-switch/build/torch.v0.16.0
# command     ~/.opam/opam-init/hooks/sandbox.sh build dune build -p torch -j 7
# exit-code   1
# env-file    ~/.opam/log/torch-32153-10e3e4.env
# output-file ~/.opam/log/torch-32153-10e3e4.out
### output ###
# torch_stubs.c:34650:51: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
# [...]
#                                                   ^~~~~~
# ./torch_api_generated.h:1982:71: note: passing argument to parameter 'reduce' here
# void atg_segment_reduce_out(tensor *, tensor out, tensor data, char * reduce, tensor lengths, tensor indices, tensor offsets, int64_t axis, int unsafe, scalar initial);
#                                                                       ^
# torch_stubs.c:38971:36: warning: passing 'const char *' to parameter of type 'char *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers]
#                    x37501, x37502, x37505, x37506, x37509, x37512);
#                                    ^~~~~~
# ./torch_api_generated.h:2300:180: note: passing argument to parameter 'pad_mode' here
# void atg_stft_center(tensor *, tensor self, int64_t n_fft, int64_t hop_length_v, int hop_length_null, int64_t win_length_v, int win_length_null, tensor window, int center, char * pad_mode, int normalized, int onesided, int return_complex);
#                                                                                                                                                                                    ^
# 121 warnings and 1 error generated.



<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
┌─ The following actions failed
│ λ build torch v0.16.0

@github-iron github-iron added the forwarded-to-js-devs This report has been forwarded to Jane Street's internal review system. label Oct 29, 2024
@mwlon

mwlon commented Oct 30, 2024

I think you'll need 0.17.0 + libtorch 2.3 (see README for current libtorch version).

Since we don't publish mac releases of the libtorch opam package anymore, you'll need to uninstall opam torch (and opam libtorch if you have it), download the binaries manually (options 2-4 in README), set the corresponding environment variable, and reinstall opam torch. Do not install opam libtorch.
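
The sequence described above (uninstall, point the environment variable at manually downloaded binaries, reinstall) might look roughly like the sketch below. It is not an official procedure. `run` and `reinstall_torch` are hypothetical helpers, the install command is the one used earlier in this thread, and the script defaults to a dry run that only prints the commands.

```shell
# Print each command when DRY_RUN=1 (the default here); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reinstall_torch() {
  libtorch_dir="$1"
  # Step 1: remove any opam-installed torch and libtorch packages.
  run opam remove --yes torch libtorch
  # Steps 2-3: point LIBTORCH at the manually downloaded binaries and
  # reinstall torch without installing the opam libtorch package.
  run env LIBTORCH="$libtorch_dir" opam install torch.v0.17.0 --ignore-constraints-on libtorch
}

# Usage (dry run, placeholder path):
#   reinstall_torch /path/to/libtorch
# To actually execute the commands:
#   DRY_RUN=0 reinstall_torch /path/to/libtorch
```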

@ShunchiZhang
Author

> I think you'll need 0.17.0 + libtorch 2.3 (see README for current libtorch version).
>
> Since we don't publish mac releases of the libtorch opam package anymore, you'll need to uninstall opam torch (and opam libtorch if you have it), download the binaries manually (options 2-4 in README), set the corresponding environment variable, and reinstall opam torch. Do not install opam libtorch.

As I stated above, I have tried the three combinations below with option 4 and hit the same error each time.

ocaml-torch   libtorch   Reference
v0.17.0       v2.3.0     Current README (e4d20de)
v0.17.0       v2.1.0     opam Package Page
v0.16.0       v1.13.1    Solution in Issue #2

Also, there seems to be no "all" target in the Makefile.

@arbipher

arbipher commented Nov 1, 2024

Here is the error on my machine (Apple M3) with ocaml-torch v0.17, OCaml 5.2.0, opam 2.2.1.
LIBTORCH is set to /Users/<me>/Library/Python/3.12/lib/python/site-packages/torch, which is version 2.3.1.

The following actions will be performed:
=== install 1 package
  ∗ torch v0.17.0 (pinned)

Proceed with ∗ 1 installation? [y/n] y

<><> Processing actions <><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫
⬇ retrieved torch.v0.17.0  (no changes)
[ERROR] The compilation of torch.v0.17.0 failed at "dune build -p torch -j 15".

#=== ERROR while compiling torch.v0.17.0 ======================================#
# context     2.2.1 | macos/arm64 | ocaml.5.2.0 | pinned(git+https://github.com/janestreet/torch.git#e4d20dea8df4fedeabcf22fd32149ff58108a652)
# path        ~/.opam/default/.opam-switch/build/torch.v0.17.0
# command     ~/.opam/opam-init/hooks/sandbox.sh build dune build -p torch -j 15
# exit-code   1
# env-file    ~/.opam/log/torch-25252-f5afb6.env
# output-file ~/.opam/log/torch-25252-f5afb6.out
### output ###
# ./torch_api_generated.h:2385:182: note: passing argument to parameter 'pad_mode' here
# [...]
# 129 warnings and 1 error generated.
# (cd _build/default && /Users/ex/.opam/default/bin/ocamlc.opt -w -40 -g -bin-annot -bin-annot-occurrences -I src/torch/.torch.objs/byte -I /Users/ex/.opam/default/lib/base -I /Users/ex/.opam/default/lib/base/base_internalhash_types -I /Users/ex/.opam/default/lib/base/md5 -I /Users/ex/.opam/default/lib/base/shadow_stdlib -I /Users/ex/.opam/default/lib/base_bigstring -I /Users/ex/.opam/default/l[...]
# File "src/torch/optimizer.ml", line 148, characters 18-40:
# 148 |       let index = Option.value_local_exn index in
#                         ^^^^^^^^^^^^^^^^^^^^^^
# Error: Unbound value "Option.value_local_exn"
# (cd _build/default && /Users/ex/.opam/default/bin/ocamlopt.opt -w -40 -g -I src/torch/.torch.objs/byte -I src/torch/.torch.objs/native -I /Users/ex/.opam/default/lib/base -I /Users/ex/.opam/default/lib/base/base_internalhash_types -I /Users/ex/.opam/default/lib/base/md5 -I /Users/ex/.opam/default/lib/base/shadow_stdlib -I /Users/ex/.opam/default/lib/base_bigstring -I /Users/ex/.opam/default/l[...]
# File "src/torch/optimizer.ml", line 148, characters 18-40:
# 148 |       let index = Option.value_local_exn index in
#                         ^^^^^^^^^^^^^^^^^^^^^^
# Error: Unbound value "Option.value_local_exn"

I see the same error with both opam install torch and opam pin torch https://github.com/janestreet/torch.git.

@arbipher

arbipher commented Nov 1, 2024

OK, after refreshing my memory of this code from February and some trial and error, it now builds and runs on my Mac. It needs OCaml 5.1.1 because PyML needs 5.1.1 (due to stdcompat). My PR should be independent of this.

> dune exec examples/basics/basics.exe
cuda available: false                  
cudnn available: false
42
[ CPUFloatType{} ]

@mwlon

mwlon commented Nov 1, 2024

@ShunchiZhang Ah, I missed that you had already tried that combination. I've now been able to reproduce the error and will try @arbipher's fix.

@arbipher

arbipher commented Nov 1, 2024

Hi @mwlon

This post is now obsolete. See my newer reply.

I found another problem, present both with my fix and with the original code: dune build always works, but dune build -p torch (which opam install uses) raises fatal error: 'torch_api_generated.cpp' file not found.

dune build src/wrapper also works.

It seems torch_api is not specified in any dune file that the torch library can refer to. This is not a problem for plain dune build, perhaps because it tries all targets. I have no fix for this yet.

When using dune build -p torch, torch.install never triggers the gen_{bindings,stubs} aliases, so the build fails fast on the 4th subtask.

File "src/wrapper/dune", line 4, characters 9-18:
4 |   (names torch_api)
             ^^^^^^^^^
(cd _build/default/src/wrapper && /usr/bin/cc -std=c++17 -fPIC -D_GLIBCXX_USE_CXX11_ABI=1 -isystem /Users/ex/Library/Python/3.12/lib/python/site-packages/torch/include -isystem /Users/ex/Library/Python/3.12/lib/python/site-packages/torch/include/torch/csrc/api/include -g -I /Users/ex/.opam/5.1.1/lib/ocaml -I /Users/ex/.opam/5.1.1/lib/base -I /Users/ex/.opam/5.1.1/lib/base/base_internalhash_types -I /Users/ex/.opam/5.1.1/lib/base/md5 -I /Users/ex/.opam/5.1.1/lib/base/shadow_stdlib -I /Users/ex/.opam/5.1.1/lib/base_quickcheck -I /Users/ex/.opam/5.1.1/lib/base_quickcheck/ppx_quickcheck/runtime -I /Users/ex/.opam/5.1.1/lib/bigarray-compat -I /Users/ex/.opam/5.1.1/lib/bin_prot -I /Users/ex/.opam/5.1.1/lib/bin_prot/shape -I /Users/ex/.opam/5.1.1/lib/ctypes -I /Users/ex/.opam/5.1.1/lib/ctypes-foreign -I /Users/ex/.opam/5.1.1/lib/ctypes/stubs -I /Users/ex/.opam/5.1.1/lib/fieldslib -I /Users/ex/.opam/5.1.1/lib/integers -I /Users/ex/.opam/5.1.1/lib/jane-street-headers -I /Users/ex/.opam/5.1.1/lib/ocaml/str -I /Users/ex/.opam/5.1.1/lib/ocaml/threads -I /Users/ex/.opam/5.1.1/lib/ocaml/unix -I /Users/ex/.opam/5.1.1/lib/ocaml_intrinsics_kernel -I /Users/ex/.opam/5.1.1/lib/parsexp -I /Users/ex/.opam/5.1.1/lib/ppx_assert/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_bench/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_compare/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_enumerate/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_expect/config -I /Users/ex/.opam/5.1.1/lib/ppx_expect/config_types -I /Users/ex/.opam/5.1.1/lib/ppx_expect/make_corrected_file -I /Users/ex/.opam/5.1.1/lib/ppx_expect/runtime -I /Users/ex/.opam/5.1.1/lib/ppx_hash/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_here/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_inline_test/config -I /Users/ex/.opam/5.1.1/lib/ppx_inline_test/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_log/syntax -I /Users/ex/.opam/5.1.1/lib/ppx_log/types -I /Users/ex/.opam/5.1.1/lib/ppx_module_timer/runtime -I 
/Users/ex/.opam/5.1.1/lib/ppx_sexp_conv/runtime-lib -I /Users/ex/.opam/5.1.1/lib/ppx_stable_witness/runtime -I /Users/ex/.opam/5.1.1/lib/ppx_stable_witness/stable_witness -I /Users/ex/.opam/5.1.1/lib/ppx_string/runtime -I /Users/ex/.opam/5.1.1/lib/ppxlib/print_diff -I /Users/ex/.opam/5.1.1/lib/sexplib -I /Users/ex/.opam/5.1.1/lib/sexplib0 -I /Users/ex/.opam/5.1.1/lib/splittable_random -I /Users/ex/.opam/5.1.1/lib/stdio -I /Users/ex/.opam/5.1.1/lib/stdlib-shims -I /Users/ex/.opam/5.1.1/lib/time_now -I /Users/ex/.opam/5.1.1/lib/typerep -I /Users/ex/.opam/5.1.1/lib/variantslib -I ../bindings -o torch_api.o -c torch_api.cpp)
torch_api.cpp:903:10: fatal error: 'torch_api_generated.cpp' file not found
  903 | #include "torch_api_generated.cpp"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
-> required by _build/default/src/wrapper/torch_api.o
-> required by _build/default/src/wrapper/dlltorch_core_stubs.so
-> required by _build/install/default/lib/stublibs/dlltorch_core_stubs.so
-> required by _build/default/torch.install
-> required by alias install

@arbipher

arbipher commented Nov 2, 2024

I cannot use my M3 Mac over the weekend, but I tested it under WSL. Now both dune build -p torch and dune build compile without problems.

There is one subtle concern with my edit to src/wrapper/dune:

  (flags
   ;-Wincompatible-pointer-types ; if using gcc
   -Wno-error=incompatible-function-pointer-types ; if using clang
   )

However, I could not figure out how to write the correct stanza for these conditional flags. As written, it will only bother gcc users.
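
One possible way to pick between the two flags is to inspect the compiler's version banner from a script. This is only a sketch under stated assumptions: `pick_pointer_flag` is a hypothetical helper, it treats any compiler whose banner does not mention clang as gcc, and the flag strings are copied verbatim from the stanza above and not validated against either compiler.

```shell
# Map the first line of `cc --version` to one of the two flags
# from the src/wrapper/dune snippet above.
pick_pointer_flag() {
  case "$1" in
    *clang*) echo "-Wno-error=incompatible-function-pointer-types" ;;
    *)       echo "-Wincompatible-pointer-types" ;;  # assumed to be gcc
  esac
}

# Usage:
#   pick_pointer_flag "$(cc --version 2>/dev/null | head -n 1)"
```

Dune's ordered set language has an (:include <file>) form, so the output of such a script could in principle be written to a file by a rule and included in the flags field. Whether that fits this project's build setup is untested here.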

@arbipher

arbipher commented Nov 5, 2024

It also works for me with OCaml 5.2.0. PyML is only used in some examples, so it is not required for users who just install this package (or run dune build -p torch).

@mwlon

mwlon commented Nov 7, 2024

I've released a fix internally, borrowing from @arbipher's PR. It should propagate out later. I'll try to get a corrected version 0.17.1 released later as well.

4 participants