Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (pytorch#12899)

Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.
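
The locale handling listed above can be pictured with a small sketch. It is not taken from the commit: `pick_locale` is a hypothetical helper, and reading the distro ID from `/etc/os-release` is an assumption. The two locale values are the ones visible in the diff (`en_US.utf8` in the CentOS Dockerfile, `C.UTF-8` in the Ubuntu-only exports removed from the CI scripts).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): map a distro ID to the
# locale value that appears for it in this diff.
pick_locale() {
  case "$1" in
    centos) echo "en_US.utf8" ;;  # set via ENV in centos-rocm/Dockerfile
    *)      echo "C.UTF-8" ;;     # formerly exported by the Ubuntu CI scripts
  esac
}

# Assumption: /etc/os-release exists and defines ID; fall back to "unknown".
distro_id="unknown"
if [ -r /etc/os-release ]; then
  distro_id="$(. /etc/os-release && echo "${ID:-unknown}")"
fi
echo "LANG=$(pick_locale "$distro_id")"
```

In the commit itself the choice is made statically, per Dockerfile, rather than detected at run time.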

PyTorch/Caffe2 on ROCm passed tests: ROCm#280

For attention: bddppq ezyang

The Docker rebuild for Ubuntu is not urgent (removing the Thrust checkout and the package install is mainly cosmetic). If a Docker image for CentOS 7.5 is wanted, a build is necessary. I tested the PyTorch build in the CentOS docker. The PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to Python 2 on CentOS, as we have never seen it on Ubuntu. Hence, please do not enable unit tests.
Pull Request resolved: pytorch#12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
iotamudelta authored and facebook-github-bot committed Nov 12, 2018
1 parent 1caa341 commit 53a3c46
Showing 18 changed files with 200 additions and 135 deletions.
18 changes: 0 additions & 18 deletions .jenkins/caffe2/build.sh
@@ -4,7 +4,6 @@ set -ex

pip install --user --no-cache-dir hypothesis==3.59.0


# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
@@ -154,23 +153,6 @@ if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
CMAKE_ARGS+=("-USE_LMDB=ON")
# TODO: This is patching the official FindHip to properly handle
# cmake generator expression. A PR is opened in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
# The link time of libcaffe2_hip.so takes 40 minutes, according to
# https://github.com/RadeonOpenCompute/hcc#thinlto-phase-1---implemented
# using ThinLTO could significantly improve link-time performance.
export KMTHINLTO=1
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_pytorch_amd.py"
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_caffe2_amd.py"
14 changes: 0 additions & 14 deletions .jenkins/caffe2/test.sh
@@ -49,20 +49,6 @@ fi
mkdir -p $TEST_DIR/{cpp,python}
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# Pin individual runs to specific gpu so that we can schedule
# multiple jobs on machines that have multi-gpu.
NUM_AMD_GPUS=$(/opt/rocm/bin/rocminfo | grep 'Device Type.*GPU' | wc -l)
if (( $NUM_AMD_GPUS == 0 )); then
echo >&2 "No AMD GPU detected!"
exit 1
fi
export HIP_VISIBLE_DEVICES=$(($BUILD_NUMBER % $NUM_AMD_GPUS))
fi
cd "${WORKSPACE}"
# C++ tests
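
The GPU pinning in the hunk above takes the Jenkins build number modulo the number of detected GPUs. A minimal sketch of that arithmetic, assuming a hypothetical helper `pick_gpu` that receives both numbers as arguments instead of calling `rocminfo`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a GPU index for a given build number.
# Mirrors the BUILD_NUMBER % NUM_AMD_GPUS logic shown in the hunk above.
pick_gpu() {
  local build_number="$1" num_gpus="$2"
  if (( num_gpus == 0 )); then
    echo "No AMD GPU detected!" >&2
    return 1
  fi
  echo $(( build_number % num_gpus ))
}

pick_gpu 41 4   # prints 1
```

Consecutive build numbers land on consecutive devices, so concurrent jobs on a multi-GPU machine tend to spread out rather than collide.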
16 changes: 0 additions & 16 deletions .jenkins/pytorch/build.sh
@@ -43,22 +43,6 @@ cmake --version
pip install -q -r requirements.txt || true

if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# This is necessary in order to cross compile (or else we'll have missing GPU device).
export HCC_AMDGPU_TARGET=gfx900

# These environment variables are not set on CI when we were running as the Jenkins user.
# The HIP Utility scripts require these environment variables to be set in order to run without error.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8

# This environment variable enables HCC optimizations that speed up the linking stage.
# https://github.com/RadeonOpenCompute/hcc#hcc-with-thinlto-linking
export KMTHINLTO=1

# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
sudo apt-get -qq install libc++1
sudo apt-get -qq install libc++abi1

# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
1 change: 1 addition & 0 deletions .jenkins/pytorch/enabled-configs.txt
@@ -44,6 +44,7 @@ short-perf-test-cpu
short-perf-test-gpu
py2-clang7-rocmdeb-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-test
py2-devtoolset7-rocmrpm-centos7.5-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
13 changes: 13 additions & 0 deletions aten/src/ATen/native/cuda/ReduceOpsKernel.cu
@@ -18,6 +18,19 @@ void sum_kernel_impl(TensorIterator& iter) {
});
}

#ifdef __HIPCC__
template <>
void sum_kernel_impl<int16_t, int16_t>(TensorIterator& iter) {
// There is a register coalescing bug in LLVM that causes the hcc
// compiler to segfault:
// https://bugs.llvm.org/show_bug.cgi?id=39602
// To work around it, use int32 as the accumulate type.
gpu_reduce_kernel<int16_t>(iter, []GPU_LAMBDA(int32_t a, int32_t b) -> int32_t {
return a + b;
});
}
#endif

template <typename scalar_t, typename acc_t=scalar_t>
void prod_kernel_impl(TensorIterator& iter) {
gpu_reduce_kernel<scalar_t>(iter, []GPU_LAMBDA(acc_t a, acc_t b) -> acc_t {
17 changes: 7 additions & 10 deletions caffe2/CMakeLists.txt
@@ -382,7 +382,7 @@ if(USE_ROCM)
hip_add_library(caffe2_hip ${Caffe2_HIP_SRCS})

# Since PyTorch files contain HIP headers, these flags are required for the necessary definitions to be added.
target_compile_options(caffe2_hip PRIVATE ${HIP_HIPCC_FLAGS})
target_compile_options(caffe2_hip PRIVATE ${HIP_HCC_FLAGS})
target_link_libraries(caffe2_hip PUBLIC caffe2)
target_link_libraries(caffe2_hip PUBLIC ${Caffe2_HIP_DEPENDENCY_LIBS})

@@ -393,9 +393,6 @@ if(USE_ROCM)
# Set standard properties on the target
torch_set_target_props(caffe2_hip)

# When a library has object files that contain device code, it needs to use hipcc/hcc to link.
set_target_properties(caffe2_hip PROPERTIES LINKER_LANGUAGE HIP)

caffe2_interface_library(caffe2_hip caffe2_hip_library)
list(APPEND Caffe2_MAIN_LIBS caffe2_hip_library)
install(TARGETS caffe2_hip EXPORT Caffe2Targets DESTINATION lib)
@@ -441,10 +438,11 @@ if (BUILD_TEST)
foreach(test_src ${Caffe2_HIP_TEST_SRCS})
set_source_files_properties(${test_src} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
get_filename_component(test_name ${test_src} NAME_WE)
hip_add_executable(${test_name} "${test_src}")
add_executable(${test_name} "${test_src}")
target_link_libraries(${test_name} ${Caffe2_MAIN_LIBS} gtest_main)
target_include_directories(${test_name} PRIVATE $<INSTALL_INTERFACE:include>)
target_include_directories(${test_name} PRIVATE ${Caffe2_CPU_INCLUDE})
target_include_directories(${test_name} PRIVATE ${Caffe2_CPU_INCLUDE} ${Caffe2_HIP_INCLUDES})
target_compile_options(${test_name} PRIVATE ${HIP_CXX_FLAGS})
add_test(NAME ${test_name} COMMAND $<TARGET_FILE:${test_name}>)
if (INSTALL_TEST)
install(TARGETS ${test_name} DESTINATION test)
@@ -563,16 +561,15 @@ if (BUILD_PYTHON)
endif()

if(USE_ROCM)
hip_add_library(caffe2_pybind11_state_hip MODULE ${Caffe2_HIP_PYTHON_SRCS})
set_target_properties(caffe2_pybind11_state_hip PROPERTIES LINKER_LANGUAGE HIP)
target_compile_options(caffe2_pybind11_state_hip PRIVATE ${HIP_HIPCC_FLAGS} -fvisibility=hidden)
add_library(caffe2_pybind11_state_hip MODULE ${Caffe2_HIP_PYTHON_SRCS})
target_compile_options(caffe2_pybind11_state_hip PRIVATE ${HIP_CXX_FLAGS} -fvisibility=hidden)
set_target_properties(caffe2_pybind11_state_hip PROPERTIES PREFIX "")
set_target_properties(caffe2_pybind11_state_hip PROPERTIES SUFFIX ${PY_EXT_SUFFIX})
if (APPLE)
set_target_properties(caffe2_pybind11_state_hip PROPERTIES LINK_FLAGS "-undefined dynamic_lookup")
endif()
target_include_directories(caffe2_pybind11_state_hip PRIVATE $<INSTALL_INTERFACE:include>)
target_include_directories(caffe2_pybind11_state_hip PRIVATE ${Caffe2_CPU_INCLUDE})
target_include_directories(caffe2_pybind11_state_hip PRIVATE ${Caffe2_CPU_INCLUDE} ${Caffe2_HIP_INCLUDES})
target_link_libraries(
caffe2_pybind11_state_hip caffe2_library caffe2_hip_library)
if (WIN32)
47 changes: 21 additions & 26 deletions cmake/Dependencies.cmake
@@ -673,21 +673,27 @@ if(NOT BUILD_ATEN_MOBILE)
message(INFO "Compiling with HIP for AMD.")
caffe2_update_option(USE_ROCM ON)

list(APPEND HIP_HIPCC_FLAGS -fPIC)
list(APPEND HIP_HIPCC_FLAGS -D__HIP_PLATFORM_HCC__=1)
list(APPEND HIP_HIPCC_FLAGS -DCUDA_HAS_FP16=1)
list(APPEND HIP_HIPCC_FLAGS -D__HIP_NO_HALF_OPERATORS__=1)
list(APPEND HIP_HIPCC_FLAGS -D__HIP_NO_HALF_CONVERSIONS__=1)
list(APPEND HIP_HIPCC_FLAGS -DHIP_VERSION=${HIP_VERSION_MAJOR})
list(APPEND HIP_HIPCC_FLAGS -Wno-macro-redefined)
list(APPEND HIP_HIPCC_FLAGS -Wno-inconsistent-missing-override)
list(APPEND HIP_HIPCC_FLAGS -Wno-exceptions)
list(APPEND HIP_HIPCC_FLAGS -Wno-shift-count-negative)
list(APPEND HIP_HIPCC_FLAGS -Wno-shift-count-overflow)
list(APPEND HIP_HIPCC_FLAGS -Wno-unused-command-line-argument)
list(APPEND HIP_HIPCC_FLAGS -Wno-duplicate-decl-specifier)
list(APPEND HIP_HIPCC_FLAGS -DCAFFE2_USE_MIOPEN)
list(APPEND HIP_HIPCC_FLAGS -DROCBLAS_FP16=0)
list(APPEND HIP_CXX_FLAGS -fPIC)
list(APPEND HIP_CXX_FLAGS -D__HIP_PLATFORM_HCC__=1)
list(APPEND HIP_CXX_FLAGS -DCUDA_HAS_FP16=1)
list(APPEND HIP_CXX_FLAGS -D__HIP_NO_HALF_OPERATORS__=1)
list(APPEND HIP_CXX_FLAGS -D__HIP_NO_HALF_CONVERSIONS__=1)
list(APPEND HIP_CXX_FLAGS -DHIP_VERSION=${HIP_VERSION_MAJOR})
list(APPEND HIP_CXX_FLAGS -Wno-macro-redefined)
list(APPEND HIP_CXX_FLAGS -Wno-inconsistent-missing-override)
list(APPEND HIP_CXX_FLAGS -Wno-exceptions)
list(APPEND HIP_CXX_FLAGS -Wno-shift-count-negative)
list(APPEND HIP_CXX_FLAGS -Wno-shift-count-overflow)
list(APPEND HIP_CXX_FLAGS -Wno-unused-command-line-argument)
list(APPEND HIP_CXX_FLAGS -Wno-duplicate-decl-specifier)
list(APPEND HIP_CXX_FLAGS -DCAFFE2_USE_MIOPEN)
list(APPEND HIP_CXX_FLAGS -DROCBLAS_FP16=0)

set(HIP_HCC_FLAGS ${HIP_CXX_FLAGS})
# Ask hcc to generate device code during compilation so we can use
# host linker to link.
list(APPEND HIP_HCC_FLAGS -fno-gpu-rdc)
list(APPEND HIP_HCC_FLAGS -amdgpu-target=${HCC_AMDGPU_TARGET})

set(Caffe2_HIP_INCLUDES
${hip_INCLUDE_DIRS} ${hcc_INCLUDE_DIRS} ${hsa_INCLUDE_DIRS} ${rocrand_INCLUDE_DIRS} ${hiprand_INCLUDE_DIRS} ${rocblas_INCLUDE_DIRS} ${miopen_INCLUDE_DIRS} ${thrust_INCLUDE_DIRS} $<INSTALL_INTERFACE:include> ${Caffe2_HIP_INCLUDES})
@@ -726,17 +732,6 @@ if(USE_ROCM)
include_directories(SYSTEM ${HIPRAND_PATH}/include)
include_directories(SYSTEM ${ROCRAND_PATH}/include)
include_directories(SYSTEM ${THRUST_PATH})

# load HIP cmake module and load platform id
EXECUTE_PROCESS(COMMAND ${HIP_PATH}/bin/hipconfig -P OUTPUT_VARIABLE PLATFORM)
EXECUTE_PROCESS(COMMAND ${HIP_PATH}/bin/hipconfig --cpp_config OUTPUT_VARIABLE HIP_CXX_FLAGS)

# Link with HIPCC https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_porting_guide.md#linking-with-hipcc
# SET(CMAKE_CXX_LINK_EXECUTABLE ${HIP_HIPCC_EXECUTABLE})

# Show message that we're using ROCm.
MESSAGE(STATUS "ROCM TRUE:")
MESSAGE(STATUS "CMAKE_CXX_COMPILER: " ${CMAKE_CXX_COMPILER})
endif()

# ---[ NCCL
11 changes: 7 additions & 4 deletions cmake/public/LoadHIP.cmake
@@ -62,11 +62,8 @@ ENDIF()
# THRUST_PATH
IF(DEFINED ENV{THRUST_PATH})
SET(THRUST_PATH $ENV{THRUST_PATH})
ELSEIF(DEFINED ENV{THRUST_ROOT})
# TODO: Remove support of THRUST_ROOT environment variable
SET(THRUST_PATH $ENV{THRUST_ROOT})
ELSE()
SET(THRUST_PATH ${ROCM_PATH}/Thrust)
SET(THRUST_PATH ${ROCM_PATH}/include)
ENDIF()

# HIPRAND_PATH
@@ -97,6 +94,12 @@ ELSE()
SET(MIOPEN_PATH $ENV{MIOPEN_PATH})
ENDIF()

IF(NOT DEFINED ENV{HCC_AMDGPU_TARGET})
SET(HCC_AMDGPU_TARGET gfx900)
ELSE()
SET(HCC_AMDGPU_TARGET $ENV{HCC_AMDGPU_TARGET})
ENDIF()

# Add HIP to the CMAKE Module Path
set(CMAKE_MODULE_PATH ${HIP_PATH}/cmake ${CMAKE_MODULE_PATH})
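
The `HCC_AMDGPU_TARGET` block added above moves the `gfx900` default out of the CI scripts and into CMake, while still letting the environment override it. A shell rendering of the same fallback, as a sketch only: `resolve_target` is a hypothetical name, and `gfx803` is just an example override value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve the GPU target the way LoadHIP.cmake does.
# ${VAR-default} falls back only when VAR is unset, which is (to my
# understanding) the closest shell analogue of CMake's DEFINED ENV{VAR}.
resolve_target() {
  echo "${HCC_AMDGPU_TARGET-gfx900}"
}

( unset HCC_AMDGPU_TARGET; resolve_target )   # prints gfx900
HCC_AMDGPU_TARGET=gfx803 resolve_target       # prints gfx803
```

With this default in place, the CI scripts no longer need to export `HCC_AMDGPU_TARGET=gfx900` themselves, which is consistent with those exports being dropped in this commit.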

18 changes: 4 additions & 14 deletions cmake/public/utils.cmake
@@ -113,25 +113,15 @@ function(caffe2_binary_target target_name_or_src)
endfunction()

function(caffe2_hip_binary_target target_name_or_src)
caffe2_binary_target(${target_name_or_src})

if (ARGC GREATER 1)
set(__target ${target_name_or_src})
prepend(__srcs "${CMAKE_CURRENT_SOURCE_DIR}/" "${ARGN}")
else()
get_filename_component(__target ${target_name_or_src} NAME_WE)
prepend(__srcs "${CMAKE_CURRENT_SOURCE_DIR}/" "${target_name_or_src}")
endif()

# These two lines are the only differences between
# caffe2_hip_binary_target and caffe2_binary_target
set_source_files_properties(${__srcs} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(${__target} ${__srcs})

target_link_libraries(${__target} ${Caffe2_MAIN_LIBS})
# If we have Caffe2_MODULES defined, we will also link with the modules.
if (DEFINED Caffe2_MODULES)
target_link_libraries(${__target} ${Caffe2_MODULES})
endif()
install(TARGETS ${__target} DESTINATION bin)
target_compile_options(${__target} PRIVATE ${HIP_CXX_FLAGS})
target_include_directories(${__target} PRIVATE ${Caffe2_HIP_INCLUDES})
endfunction()

##############################################################################
8 changes: 8 additions & 0 deletions docker/caffe2/jenkins/build.sh
@@ -39,6 +39,8 @@ fi
if [[ "$image" == *rocm* ]]; then
ROCM_VERSION="$(echo "${image}" | perl -n -e'/rocm(\d+\.\d+\.\d+|nightly)/ && print $1')"
DOCKERFILE="${OS}-rocm/Dockerfile"
# newer cmake version needed
CMAKE_VERSION=3.6.3
fi

if [[ "$image" == *conda* ]]; then
@@ -66,6 +68,11 @@ if [[ "$image" == *-clang* ]]; then
CLANG_VERSION="$(echo "${image}" | perl -n -e'/clang(\d+(\.\d+)?)/ && print $1')"
fi


if [[ "$image" == *-devtoolset* ]]; then
DEVTOOLSET_VERSION="$(echo "${image}" | perl -n -e'/devtoolset(\d+(\.\d+)?)/ && print $1')"
fi

# Copy over common scripts to directory containing the Dockerfile to build
cp -a common/* "$(dirname ${DOCKERFILE})"
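
The `DEVTOOLSET_VERSION` line above pulls the version number out of the image name with a Perl one-liner. A bash-only sketch of the same pattern follows. The helper name `extract_devtoolset_version` is hypothetical, and the sample image name is an assumption derived from the `py2-devtoolset7-rocmrpm-centos7.5-build` entry in `enabled-configs.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same regex as the perl one-liner, using bash's =~.
# Prints the captured version, or nothing when the image name has no
# devtoolset component.
extract_devtoolset_version() {
  local image="$1"
  if [[ "$image" =~ devtoolset([0-9]+(\.[0-9]+)?) ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
}

extract_devtoolset_version "py2-devtoolset7-rocmrpm-centos7.5"   # prints 7
```

An empty result for non-devtoolset images means the `--build-arg` is passed with an empty value, and `install_devtoolset.sh` guards against that case with its `[ -n "$DEVTOOLSET_VERSION" ]` check.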

@@ -84,6 +91,7 @@ docker build \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CENTOS_VERSION=${CENTOS_VERSION}" \
--build-arg "DEVTOOLSET_VERSION=${DEVTOOLSET_VERSION}" \
--build-arg "PYTHON_VERSION=${PYTHON_VERSION}" \
--build-arg "ANACONDA_VERSION=${ANACONDA_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
1 change: 1 addition & 0 deletions docker/caffe2/jenkins/centos-rocm/.gitignore
@@ -0,0 +1 @@
*.sh
56 changes: 56 additions & 0 deletions docker/caffe2/jenkins/centos-rocm/Dockerfile
@@ -0,0 +1,56 @@
ARG CENTOS_VERSION
FROM centos:${CENTOS_VERSION}

# Install required packages to build Caffe2
ARG EC2
ADD ./install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh

# Install devtoolset
ARG DEVTOOLSET_VERSION
ADD ./install_devtoolset.sh install_devtoolset.sh
RUN bash ./install_devtoolset.sh
RUN rm install_devtoolset.sh
ENV BASH_ENV "/etc/profile"

# Install rocm
ARG ROCM_VERSION
ADD ./install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
ENV PATH /opt/rocm/bin:$PATH
ENV PATH /opt/rocm/hcc/bin:$PATH
ENV PATH /opt/rocm/hip/bin:$PATH
ENV PATH /opt/rocm/opencl/bin:$PATH
ENV MIOPEN_DISABLE_CACHE 1
ENV HIP_PLATFORM hcc
ENV LC_ALL en_US.utf8
ENV LANG en_US.utf8

# Install non-default CMake version
ARG CMAKE_VERSION
ADD ./install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh

# Compile/install ccache for faster builds
ADD ./install_ccache.sh install_ccache.sh
RUN bash ./install_ccache.sh && rm install_ccache.sh

# Install Python
ARG PYTHON_VERSION
ADD ./install_python.sh install_python.sh
RUN if [ -n "${PYTHON_VERSION}" ]; then bash ./install_python.sh; fi
RUN rm install_python.sh

# (optional) Add Jenkins user
ARG JENKINS
ARG JENKINS_UID
ARG JENKINS_GID
ADD ./add_jenkins_user.sh add_jenkins_user.sh
RUN if [ -n "${JENKINS}" ]; then bash ./add_jenkins_user.sh; fi
RUN rm add_jenkins_user.sh

# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
10 changes: 10 additions & 0 deletions docker/caffe2/jenkins/common/install_devtoolset.sh
@@ -0,0 +1,10 @@
#!/bin/bash

set -ex

[ -n "$DEVTOOLSET_VERSION" ]

yum install -y centos-release-scl
yum install -y devtoolset-$DEVTOOLSET_VERSION

echo "source scl_source enable devtoolset-$DEVTOOLSET_VERSION" > "/etc/profile.d/devtoolset-$DEVTOOLSET_VERSION.sh"
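
The script above writes a `profile.d` snippet, and the CentOS Dockerfile sets `BASH_ENV` to `/etc/profile`. The sketch below illustrates the mechanism this appears to rely on: a bash started non-interactively (for example `bash ./script.sh`) sources the file named by `BASH_ENV` before running anything. The temporary file and the `DEMO_TOOLSET` variable are stand-ins invented for this demo. That `/etc/profile` in turn sources `/etc/profile.d/*.sh` is the usual CentOS convention and is assumed here rather than shown.

```shell
#!/usr/bin/env bash
# Sketch: demonstrate that a non-interactive bash sources $BASH_ENV.
demo_bash_env() {
  local env_file
  env_file="$(mktemp)"
  # Stand-in for /etc/profile pulling in /etc/profile.d/devtoolset-N.sh
  echo 'export DEMO_TOOLSET=devtoolset-7' > "$env_file"
  # The child bash reads env_file first, so the variable is visible to it.
  BASH_ENV="$env_file" bash -c 'echo "${DEMO_TOOLSET:-unset}"'
  rm -f "$env_file"
}

demo_bash_env   # prints devtoolset-7
```

Note that bash invoked as `sh` does not honor `BASH_ENV`, so this only covers commands that are run through `bash` explicitly.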