build pytorch BladeDISC in docker #1314

Open
aiyxxj opened this issue Aug 20, 2024 · 0 comments

Describe the bug
root@fcaab431d485:/home/codespace/BladeDISC/pytorch_blade# bash ./scripts/build_pytorch_blade.sh

  • export CUDA_HOME=/usr/local/cuda/
  • CUDA_HOME=/usr/local/cuda/
  • export TF_CUDA_HOME=/usr/local/cuda/
  • TF_CUDA_HOME=/usr/local/cuda/
  • export CUDACXX=/usr/local/cuda//bin/nvcc
  • CUDACXX=/usr/local/cuda//bin/nvcc
  • export PATH=/usr/local/cuda//bin/:/root/.vscode-server/bin/fee1edb8d6d72a0ddff41e5f71a671c23ed924b9/bin/remote-cli:/opt/cmake/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  • PATH=/usr/local/cuda//bin/:/root/.vscode-server/bin/fee1edb8d6d72a0ddff41e5f71a671c23ed924b9/bin/remote-cli:/opt/cmake/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  • export TENSORRT_INSTALL_PATH=/usr/local/TensorRT/
  • TENSORRT_INSTALL_PATH=/usr/local/TensorRT/
  • export LD_LIBRARY_PATH=/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/TensorRT/lib/:/usr/local/cuda/lib64/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  • LD_LIBRARY_PATH=/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/TensorRT/lib/:/usr/local/cuda/lib64/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  • export LIBRARY_PATH=/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/cuda/lib64/stubs
  • LIBRARY_PATH=/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/cuda/lib64/stubs
  • export TORCH_BLADE_BUILD_MLIR_SUPPORT=ON
  • TORCH_BLADE_BUILD_MLIR_SUPPORT=ON
  • export TORCH_BLADE_BUILD_WITH_CUDA_SUPPORT=ON
  • TORCH_BLADE_BUILD_WITH_CUDA_SUPPORT=ON
  • export TORCH_BLADE_RUN_EXAMPLES=OFF
  • TORCH_BLADE_RUN_EXAMPLES=OFF
  • ci_build
  • echo 'DO TORCH_BLADE CI_BUILD'
    DO TORCH_BLADE CI_BUILD
  • pip_install_deps
  • TORCH_BLADE_CI_BUILD_TORCH_VERSION=2.0.1+cu118
  • requirements=requirements-dev-2.0.1+cu118.txt
  • python3 -m pip install --upgrade pip
    Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (24.2)
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
  • python3 -m pip install virtualenv
    Requirement already satisfied: virtualenv in /usr/local/lib/python3.8/dist-packages (20.26.2)
    Requirement already satisfied: distlib<1,>=0.3.7 in /usr/local/lib/python3.8/dist-packages (from virtualenv) (0.3.8)
    Requirement already satisfied: filelock<4,>=3.12.2 in /usr/local/lib/python3.8/dist-packages (from virtualenv) (3.14.0)
    Requirement already satisfied: platformdirs<5,>=3.9.1 in /usr/local/lib/python3.8/dist-packages (from virtualenv) (4.2.2)
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
  • python3 -m pip install -r scripts/pip/requirements-dev-2.0.1+cu118.txt -f https://download.pytorch.org/whl/torch_stable.html
    Looking in links: https://download.pytorch.org/whl/torch_stable.html
    Requirement already satisfied: wget in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 1)) (3.2)
    Requirement already satisfied: pytest==5.0.0 in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (5.0.0)
    Requirement already satisfied: networkx in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 3)) (3.1)
    Requirement already satisfied: onnx==1.12.0 in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 4)) (1.12.0)
    Requirement already satisfied: hypothesis in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 5)) (6.111.1)
    Requirement already satisfied: expecttest in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 6)) (0.2.1)
    Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 7)) (9.0.0)
    Requirement already satisfied: aliyun-log-python-sdk==0.6.48.6 in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (0.6.48.6)
    Requirement already satisfied: cryptography in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 9)) (43.0.0)
    Requirement already satisfied: decorator in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 10)) (5.1.1)
    Requirement already satisfied: torch==2.0.1+cu118 in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (2.0.1+cu118)
    Requirement already satisfied: torchvision==0.15.2+cu118 in /usr/local/lib/python3.8/dist-packages (from -r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 12)) (0.15.2+cu118)
    Requirement already satisfied: py>=1.5.0 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (1.11.0)
    Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (24.1)
    Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (24.2.0)
    Requirement already satisfied: more-itertools>=4.0.0 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (10.4.0)
    Requirement already satisfied: atomicwrites>=1.0 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (1.4.1)
    Requirement already satisfied: pluggy<1.0,>=0.12 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (0.13.1)
    Requirement already satisfied: importlib-metadata>=0.12 in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (8.3.0)
    Requirement already satisfied: wcwidth in /usr/local/lib/python3.8/dist-packages (from pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (0.2.13)
    Requirement already satisfied: numpy>=1.16.6 in /usr/local/lib/python3.8/dist-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 4)) (1.24.4)
    Requirement already satisfied: protobuf<=3.20.1,>=3.12.2 in /usr/local/lib/python3.8/dist-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 4)) (3.20.1)
    Requirement already satisfied: typing-extensions>=3.6.2.1 in /usr/local/lib/python3.8/dist-packages (from onnx==1.12.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 4)) (4.12.2)
    Requirement already satisfied: dateparser in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (1.2.0)
    Requirement already satisfied: elasticsearch in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (8.15.0)
    Requirement already satisfied: jmespath in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (1.0.1)
    Requirement already satisfied: python-dateutil in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2.9.0.post0)
    Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2.32.3)
    Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (1.16.0)
    Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (3.14.0)
    Requirement already satisfied: sympy in /usr/local/lib/python3.8/dist-packages (from torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (1.13.2)
    Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (3.1.4)
    Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.8/dist-packages (from torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (2.0.0)
    Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.8/dist-packages (from torchvision==0.15.2+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 12)) (10.4.0)
    Requirement already satisfied: cmake in /usr/local/lib/python3.8/dist-packages (from triton==2.0.0->torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (3.30.2)
    Requirement already satisfied: lit in /usr/local/lib/python3.8/dist-packages (from triton==2.0.0->torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (18.1.8)
    Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /usr/local/lib/python3.8/dist-packages (from hypothesis->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 5)) (2.4.0)
    Requirement already satisfied: exceptiongroup>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from hypothesis->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 5)) (1.2.2)
    Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.8/dist-packages (from cryptography->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 9)) (1.17.0)
    Requirement already satisfied: pycparser in /usr/local/lib/python3.8/dist-packages (from cffi>=1.12->cryptography->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 9)) (2.22)
    Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=0.12->pytest==5.0.0->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 2)) (3.20.0)
    Requirement already satisfied: pytz in /usr/local/lib/python3.8/dist-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2024.1)
    Requirement already satisfied: regex!=2019.02.19,!=2021.8.27 in /usr/local/lib/python3.8/dist-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2024.7.24)
    Requirement already satisfied: tzlocal in /usr/local/lib/python3.8/dist-packages (from dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (5.2)
    Requirement already satisfied: elastic-transport<9,>=8.13 in /usr/local/lib/python3.8/dist-packages (from elasticsearch->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (8.15.0)
    Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (2.1.5)
    Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.8/dist-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (3.3.2)
    Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (3.7)
    Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2.2.2)
    Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (2024.7.4)
    Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.8/dist-packages (from sympy->torch==2.0.1+cu118->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 11)) (1.3.0)
    Requirement already satisfied: backports.zoneinfo in /usr/local/lib/python3.8/dist-packages (from tzlocal->dateparser->aliyun-log-python-sdk==0.6.48.6->-r scripts/pip/requirements-dev-2.0.1+cu118.txt (line 8)) (0.2.1)
    WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
  • COMMON_SETUP_ARGS=
  • '[' '' = ON ']'
  • '[' '' = ON ']'
  • '[' ON = ON ']'
  • export TORCH_BLADE_BUILD_TENSORRT=ON
  • TORCH_BLADE_BUILD_TENSORRT=ON
  • export TORCH_BLADE_BUILD_TENSORRT_STATIC=OFF
  • TORCH_BLADE_BUILD_TENSORRT_STATIC=OFF
  • python3 ../scripts/python/common_setup.py
    2024-08-20 03:19:27,828 INFO Execute shell command: rm -rf tf_community/.tf_configure.bazelrc, cwd: /home/codespace/BladeDISC
    2024-08-20 03:19:27,837 INFO linking via tao_compiler/file_map ...
    2024-08-20 03:19:27,838 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tf_community/tensorflow/compiler/decoupling && ln -s /home/codespace/BladeDISC/tao_compiler/decoupling /home/codespace/BladeDISC/tf_community/tensorflow/compiler/decoupling, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,851 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tf_community/tensorflow/compiler/mlir/disc && ln -s /home/codespace/BladeDISC/tao_compiler/mlir/disc /home/codespace/BladeDISC/tf_community/tensorflow/compiler/mlir/disc, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,864 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tf_community/mlir/util && ln -s /home/codespace/BladeDISC/tao_compiler/mlir/util /home/codespace/BladeDISC/tf_community/mlir/util, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,877 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tf_community/tensorflow/compiler/mlir/ral && ln -s /home/codespace/BladeDISC/tao_compiler/mlir/ral /home/codespace/BladeDISC/tf_community/tensorflow/compiler/mlir/ral, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,888 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tf_community/tensorflow/../.bazelrc.user && ln -s /home/codespace/BladeDISC/tao_compiler/.bazelrc.user /home/codespace/BladeDISC/tf_community/tensorflow/../.bazelrc.user, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,900 INFO linking ./tao to tao_compiler/tao
    2024-08-20 03:19:27,900 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tao_compiler/tao && ln -s /home/codespace/BladeDISC/tao /home/codespace/BladeDISC/tao_compiler/tao, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,913 INFO linking blade_gemm
    2024-08-20 03:19:27,913 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tao/blade_gemm && ln -s /home/codespace/BladeDISC/../platform_alibaba/blade_gemm /home/codespace/BladeDISC/tao/blade_gemm, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,925 INFO linking blade_service_common
    2024-08-20 03:19:27,925 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tao/third_party/blade_service_common && ln -s /home/codespace/BladeDISC/../platform_alibaba/third_party/blade_service_common /home/codespace/BladeDISC/tao/third_party/blade_service_common, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,938 INFO cleanup tao_compiler with XLA always...
    2024-08-20 03:19:27,938 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tao/tao_bridge/tao_launch_op, cwd: /home/codespace/BladeDISC/pytorch_blade
    2024-08-20 03:19:27,947 INFO Execute shell command: rm -rf /home/codespace/BladeDISC/tao/tao_bridge/gpu, cwd: /home/codespace/BladeDISC/pytorch_blade
    ++ python -c 'import torch; import os; print(os.path.dirname(os.path.abspath(torch.__file__)) + "/lib/")'
  • TORCH_LIB=/usr/local/lib/python3.8/dist-packages/torch/lib/
  • export LD_LIBRARY_PATH=:/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/TensorRT/lib/:/usr/local/cuda/lib64/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  • LD_LIBRARY_PATH=:/usr/local/TensorRT//lib/:/usr/local/TensorRT//lib64/:/usr/local/cuda//lib64:/usr/local/TensorRT/lib/:/usr/local/cuda/lib64/:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
  • DEBUG=1
  • python3 setup.py cpp_test
  • tee cpp_test.out
    version = '0.2.0+2.0.1.cu118'
    serialization_version = '0.0.3'
    debug = True
    cuda = '11.8'
    cuda_available = True
    build_tensorrt = True
    static_tensorrt = False
    git_version = 'fbe39bce9ae2d365d77842af38a33fa76d37237a'
    torch_version = '2.0.1+cu118'
    torch_git_version = 'e9ebda29d87ce0916ab08c06ab26fd3766a870e5'
    GLIBCXX_USE_CXX11_ABI = False

running cpp_test
ERROR: Config value 'nonccl' is not defined in any .rc file
INFO: Invocation ID: 46da1191-118d-455a-a6ce-5fdc0794eac0
Traceback (most recent call last):
  File "setup.py", line 151, in <module>
    setup(
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "setup.py", line 107, in run
    self.cpp_run()
  File "setup.py", line 91, in cpp_run
    build.test()
  File "/home/codespace/BladeDISC/pytorch_blade/bazel_build.py", line 284, in test
    subprocess.check_call(test_cmd, shell=True, env=env, executable="/bin/bash")
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -e; set -o pipefail; source .bazel_pyenv/bin/activate; bazel test --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env BAZEL_LINKLIBS=-lstdc++ --action_env CC=/usr/bin/gcc --action_env CXX=/usr/bin/g++ --action_env DISC_FOREIGN_MAKE_JOBS=32 --copt=-DPYTORCH_VERSION_STRING="2.0.1+cu118" --copt=-DPYTORCH_MAJOR_VERSION=2 --copt=-DPYTORCH_MINOR_VERSION=0 --copt=-DTORCH_BLADE_CUDA_VERSION=11.8 --action_env TORCH_BLADE_TORCH_INSTALL_PATH=/usr/local/lib/python3.8/dist-packages/torch --copt=-DPYBIND11_COMPILER_TYPE="_gcc" --copt=-DPYBIND11_STDLIB="_libstdcpp" --copt=-DPYBIND11_BUILD_ABI="_cxxabi1011" --config=torch_debug --config=torch_tensorrt --action_env TENSORRT_INSTALL_PATH=/usr/local/TensorRT/ --action_env NVCC=/usr/local/cuda//bin/nvcc --config=torch_enable_quantization --config=torch_cxx11abi_0 --config=torch_cuda //tests/mhlo/... //pytorch_blade:torch_blade_test_suite //tests/torch-disc-pdll/tests/... //tests/torchscript/...' returned non-zero exit status 2.
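
Bazel prints `Config value 'nonccl' is not defined in any .rc file` when `--config=nonccl` is requested but no rc file contains a `<command>:nonccl` line. The `bazel test` command above does not pass `--config=nonccl` itself, so the reference presumably comes from one of the rc files it loads. Below is a minimal sketch for narrowing that down. It assumes the relevant rc files sit under the checkout and have `bazelrc` in their file name. `find_config_mentions` is a hypothetical helper written for this report, not part of BladeDISC or Bazel.

```python
import os
import re


def find_config_mentions(root, name):
    """List rc-file lines that define or use the Bazel config `name`.

    Returns (path, line_number, kind, text) tuples, kind being "define" or "use".
    """
    # A definition looks like "build:nonccl --some_flag"; a use looks like "--config=nonccl".
    define_re = re.compile(r"^\s*[\w-]+:" + re.escape(name) + r"(\s|$)")
    use_re = re.compile(r"--config[=\s]" + re.escape(name) + r"(\s|$)")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if "bazelrc" not in filename:  # assumption: rc files are named *bazelrc*
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    lines = handle.readlines()
            except OSError:
                continue  # e.g. a dangling symlink
            for number, line in enumerate(lines, start=1):
                text = line.strip()
                if not text or text.startswith("#"):
                    continue  # skip blank lines and comments
                if define_re.search(text):
                    hits.append((path, number, "define", text))
                if use_re.search(text):
                    hits.append((path, number, "use", text))
    return hits


if __name__ == "__main__":
    # Path taken from the log above; adjust to the local checkout.
    for path, number, kind, text in find_config_mentions("/home/codespace/BladeDISC", "nonccl"):
        print(f"{kind:6} {path}:{number}: {text}")
```

If the scan reports only `use` lines, the rc file that references `nonccl` expects a definition that this checkout does not provide. `os.walk` does not descend into symlinked directories by default, so rc files reachable only through the links created by `common_setup.py` may need a separate scan.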

To Reproduce
cd pytorch_blade && bash ./scripts/build_pytorch_blade.sh

Additional context for PyTorch

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python3 collect_env.py

root@fcaab431d485:/home/codespace/BladeDISC/pytorch_blade# python3 collect_env.py
Collecting environment information...
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.20.0
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 22 2023, 10:22:35) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090

Nvidia driver version: 535.183.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.4
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6430
Stepping: 8
CPU MHz: 800.000
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 3 MiB
L1i cache: 2 MiB
L2 cache: 128 MiB
L3 cache: 120 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63,65,67,69,71,73,75,77,79,81,83,85,87,89,91,93,95,97,99,101,103,105,107,109,111,113,115,117,119,121,123,125,127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] onnx==1.12.0
[pip3] torch==2.0.1+cu118
[pip3] torchvision==0.15.2+cu118
[pip3] triton==2.0.0
[conda] Could not collect
