Merge branch 'branch-0.14' into benchmark_updates
Conflicts:
	CHANGELOG.md
cjnolet committed Apr 28, 2020
2 parents 4b060f8 + 7a904ce commit d019d96
Showing 154 changed files with 5,690 additions and 3,765 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -19,6 +19,7 @@ cuml.egg-info/
dist/
python/cuml/**/*.cpp
python/external_repositories
python/record.txt
log
.ipynb_checkpoints
.DS_Store
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -3,7 +3,11 @@
## New Features
- PR #1980: prim: added a new write-only unary op prim
- PR #1867: C++: add logging interface support in cuML based spdlog
- PR #1902: Multi-class inference in FIL C++ and importing multi-class forests from treelite
- PR #1906: UMAP MNMG
- PR #2067: python: wrap logging interface in cython
- PR #2083: Added dtype, order, and use_full_low_rank to MNMG `make_regression`
- PR #2074: SG and MNMG `make_classification`

## Improvements
- PR #1931: C++: enabled doxygen docs for all of the C++ codebase
@@ -17,21 +21,33 @@
- PR #1974: Reduce ARIMA testing time
- PR #1984: Enable Ninja build
- PR #2005: Adding missing algorithms to cuml benchmarks and notebook
- PR #2016: Add capability to setup.py and build.sh to fully clean all cython build files and artifacts
- PR #2044: A cuda-memcheck helper wrapper for devs
- PR #2018: Using `cuml.dask.part_utils.extract_partitions` and removing similar, duplicated code
- PR #2019: Enable doxygen build in our nightly doc build CI script
- PR #1996: Cythonize in parallel
- PR #2032: Reduce number of tests for MBSGD to improve CI running time
- PR #2031: Encapsulating UCX-py interactions in singleton
- PR #2029: Add C++ ARIMA log-likelihood benchmark
- PR #2051: Reduce the time required to run dask pca and dask tsvd tests
- PR #1981: Using CumlArray in kNN and DistributedDataHandler in dask kNN
- PR #2053: Introduce verbosity level in C++ layer instead of boolean `verbose` flag
- PR #2047: Make internal streams non-blocking w.r.t. NULL stream
- PR #2048: Random forest testing speedup
- PR #2058: Use CumlArray in Random Projection
- PR #2068: Updating knn class probabilities to use make_monotonic instead of binary search
- PR #2062: Adding random state to UMAP mnmg tests
- PR #2064: Speed-up K-Means test
- PR #2015: Renaming .h to .cuh in solver, dbscan and svm
- PR #2080: Improved import of sparse FIL forests from treelite
- PR #2090: Upgrade C++ build to C++14 standard
- PR #2089: CI: enabled cuda-memcheck on ml-prims unit-tests during nightly build
- PR #2118: Updating SGD & mini-batch estimators to use CumlArray
- PR #2120: Speeding up dask RandomForest tests
- PR #1883: Use CumlArray in ARIMA
- PR #2135: A few optimizations to UMAP fuzzy simplicial set
- PR #2098: Renaming .h to .cuh in decision_tree, glm, pca
- PR #2146: Remove deprecated kalman filter

## Bug Fixes
- PR #1939: Fix syntax error in cuml.common.array
@@ -40,24 +56,34 @@
- PR #1969: Update libcumlprims to 0.14
- PR #1973: Add missing mg files for setup.py --singlegpu flag
- PR #1993: Set `umap_transform_reproducibility` tests to xfail
- PR #2004: Refactoring the arguments to `plant()` call
- PR #2017: Fixing memory issue in weak cc prim
- PR #2028: Skipping UMAP knn reproducibility tests until we figure out why it's failing in CUDA 10.2
- PR #2024: Fixed cuda-memcheck errors with sample-without-replacement prim
- PR #1540: prims: support for custom math-type used for computation inside adjusted rand index prim
- PR #2059: Make all Scipy imports conditional
- PR #2077: dask make_blobs arguments to match sklearn
- PR #2078: Ignore negative cache indices in get_vecs
- PR #2084: Fixed cuda-memcheck errors with COO unit-tests
- PR #2087: Fixed cuda-memcheck errors with dispersion prim
- PR #2096: Fixed syntax error with nightly build command for memcheck unit-tests
- PR #2115: Fixed contingency matrix prim unit-tests for computing correct golden values
- PR #2107: Fix PCA transform
- PR #2109: input_to_cuml_array __cuda_array_interface__ bugfix
- PR #2117: cuDF __array__ exception small fixes
- PR #2144: Remove GPU arch < 60 from CMake build
- PR #2153: Added missing namespaces to some Decision Tree files

# cuML 0.13.0 (Date TBD)

## New Features
- PR #1777: Python bindings for entropy
- PR #1742: Mean squared error implementation with cupy
- PR #1817: Confusion matrix implementation with cupy (SNSG and MNMG)
- PR #1766: Mean absolute error implementation with cupy
- PR #1766: Mean squared log error implementation with cupy
- PR #1635: cuML Array shim and configurable output added to cluster methods
- PR #1892: One hot encoder implementation with cupy
- PR #1586: Seasonal ARIMA
- PR #1683: cuml.dask make_regression
- PR #1689: Add framework for cuML Dask serializers
@@ -70,6 +96,7 @@
- PR #1738: cuml.dask refactor beginning and dask array input option for OLS, Ridge and KMeans
- PR #1874: Add predict_proba function to RF classifier
- PR #1815: Adding KNN parameter to UMAP
- PR #1978: Adding `predict_proba` function to dask RF

## Improvements
- PR #1644: Add `predict_proba()` for FIL binary classifier
@@ -108,6 +135,7 @@
- PR #1848: Rely on subclassing for cuML Array serialization
- PR #1866: Minimizing client memory pressure on Naive Bayes
- PR #1788: Removing complexity bottleneck in S-ARIMA
- PR #1873: Remove usage of nvstring and nvcat from LabelEncoder
- PR #1891: Additional improvements to naive bayes tree reduction

## Bug Fixes
19 changes: 13 additions & 6 deletions build.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2019, NVIDIA CORPORATION.
# Copyright (c) 2019-2020, NVIDIA CORPORATION.

# cuml build script

@@ -33,7 +33,7 @@ HELP="$0 [<target> ...] [<flag> ...]
-g - build for debug
-n - no install step
--allgpuarch - build for all supported GPU architectures
--singlegpu - Build cuml without multigpu support (multigpu requires libcumlprims)
--singlegpu - Build cuml without libcumlprims based multigpu algorithms.
--nvtx - Enable nvtx for profiling support
--show_depr_warn - show cmake deprecation warnings
-h - print this text
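A few hedged invocation examples based on the help text above; the target and flag combinations are illustrative assumptions, not commands prescribed by the repository:

    # Illustrative build.sh invocations (combinations are assumptions
    # drawn from the help text, not mandated by the repo).
    ./build.sh clean                      # remove all build artifacts and directories
    ./build.sh libcuml cuml --singlegpu   # build C++ and Python without libcumlprims-based multigpu algorithms
    ./build.sh -g --nvtx                  # debug build with nvtx profiling support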
@@ -114,11 +114,18 @@ if (( ${CLEAN} == 1 )); then
# The find removes all contents but leaves the dirs, the rmdir
# attempts to remove the dirs but can fail safely.
for bd in ${BUILD_DIRS}; do
      if [ -d ${bd} ]; then
        find ${bd} -mindepth 1 -delete
        rmdir ${bd} || true
      fi
done

cd ${REPODIR}/python
python setup.py clean --all
cd ${REPODIR}
fi

################################################################################
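The comment inside the hunk above explains the cleanup trick: find -mindepth 1 -delete empties each build directory while leaving the directory itself in place, and rmdir || true then removes the directory when possible without ever failing the script. A minimal standalone sketch of the pattern, with an illustrative directory name:

    # Minimal sketch of the clean pattern; "cpp/build" is an illustrative
    # name, not a path taken from the script.
    bd="cpp/build"
    if [ -d "${bd}" ]; then
      find "${bd}" -mindepth 1 -delete   # empty the tree but keep the directory
      rmdir "${bd}" || true              # try to remove the directory; ignore failure
    fi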
2 changes: 1 addition & 1 deletion ci/cpu/cuml/upload-anaconda.sh
@@ -29,5 +29,5 @@ if [ "$BUILD_CUML" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
fi
2 changes: 1 addition & 1 deletion ci/cpu/libcuml/upload-anaconda.sh
@@ -29,5 +29,5 @@ if [ "$BUILD_LIBCUML" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
fi
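Both upload scripts make the same change: --force overwrote a package already published on the channel, while --skip-existing leaves it untouched, so re-running a CI job becomes a no-op rather than an overwrite. A hedged sketch of the behavior, with placeholder file and channel names:

    # Placeholder names; only the flag behavior is the point here.
    UPLOADFILE=cuml-0.14.0-py37.tar.bz2
    anaconda -t ${MY_UPLOAD_KEY} upload -u rapidsai --skip-existing ${UPLOADFILE}
    # Re-running the same command now skips the already-published package,
    # where --force would have replaced it.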
3 changes: 2 additions & 1 deletion ci/gpu/build.sh
@@ -142,7 +142,8 @@ GTEST_OUTPUT="xml:${WORKSPACE}/test-results/prims/" ./test/prims
# TEST - Run GoogleTest for ml-prims, but with cuda-memcheck enabled
################################################################################

if [ "$BUILD_MODE" = "branch" && "$BUILD_TYPE" = "gpu" ]; then
if [ "$BUILD_MODE" = "branch" ] && [ "$BUILD_TYPE" = "gpu" ]; then
logger "GoogleTest for ml-prims with cuda-memcheck enabled..."
cd $WORKSPACE/cpp/build
python ../scripts/cuda-memcheck.py -tool memcheck -exe ./test/prims
fi
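The fix above corrects a genuine shell bug: && is not a valid operator inside the POSIX [ ... ] test command, so the original single-bracket line fails at runtime. The conjunction has to join two separate test commands, or use bash's [[ ... ]] compound form. A minimal sketch with illustrative values:

    # Minimal sketch of the conditional fix; variable values are illustrative.
    BUILD_MODE="branch"
    BUILD_TYPE="gpu"

    # POSIX-correct: two test commands joined by the shell's && operator.
    if [ "$BUILD_MODE" = "branch" ] && [ "$BUILD_TYPE" = "gpu" ]; then
      echo "running cuda-memcheck suite"
    fi

    # Equivalent bash-only form: && is legal inside [[ ... ]].
    if [[ "$BUILD_MODE" = "branch" && "$BUILD_TYPE" = "gpu" ]]; then
      echo "running cuda-memcheck suite"
    fi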
1 change: 0 additions & 1 deletion cpp/CMakeLists.txt
@@ -275,7 +275,6 @@ if(BUILD_CUML_CPP_LIBRARY)
src/fil/infer.cu
src/glm/glm.cu
src/holtwinters/holtwinters.cu
src/kalman_filter/lkf_py.cu
src/kmeans/kmeans.cu
src/knn/knn.cu
src/metrics/metrics.cu
2 changes: 1 addition & 1 deletion cpp/cmake/Dependencies.cmake
@@ -61,7 +61,7 @@ set(FAISS_DIR ${CMAKE_CURRENT_BINARY_DIR}/faiss CACHE STRING
"Path to FAISS source directory")
ExternalProject_Add(faiss
GIT_REPOSITORY https://github.com/facebookresearch/faiss.git
GIT_TAG v1.6.1
GIT_TAG v1.6.2
CONFIGURE_COMMAND LIBS=-pthread
CPPFLAGS=-w
LDFLAGS=-L${CMAKE_INSTALL_PREFIX}/lib
13 changes: 11 additions & 2 deletions cpp/cmake/EvalGpuArchs.cmake
@@ -54,6 +54,15 @@ int main(int argc, char** argv) {
${eval_file}
OUTPUT_VARIABLE __gpu_archs
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("Auto detection of gpu-archs: ${__gpu_archs}")
set(${gpu_archs} ${__gpu_archs} PARENT_SCOPE)
set(__gpu_archs_filtered "${__gpu_archs}")
foreach(arch ${__gpu_archs})
if (arch VERSION_LESS 60)
list(REMOVE_ITEM __gpu_archs_filtered ${arch})
endif()
endforeach()
if (NOT __gpu_archs_filtered)
message(FATAL_ERROR "No supported GPU arch found on this system")
endif()
message("Auto detection of gpu-archs: ${__gpu_archs_filtered}")
set(${gpu_archs} ${__gpu_archs_filtered} PARENT_SCOPE)
endfunction(evaluate_gpu_archs)
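The new CMake logic above filters out every detected compute capability below 60 and aborts when nothing survives the filter. A quick shell analogue of the same filtering logic, with a made-up detection result:

    # Shell analogue of the CMake gpu-arch filter; the detected list is
    # a made-up example, not real detection output.
    detected="37 52 70"
    filtered=""
    for arch in ${detected}; do
      if [ "${arch}" -ge 60 ]; then
        filtered="${filtered} ${arch}"
      fi
    done
    if [ -z "${filtered}" ]; then
      echo "No supported GPU arch found on this system" >&2
      exit 1
    fi
    echo "Auto detection of gpu-archs:${filtered}"   # prints " 70"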
39 changes: 19 additions & 20 deletions cpp/comms/std/src/cuML_std_comms_impl.cpp
@@ -204,12 +204,9 @@ cumlStdCommunicator_impl::cumlStdCommunicator_impl(
_ucp_eps(eps),
_size(size),
_rank(rank),
_next_request_id(0),
_ucp_handle(NULL) {
_next_request_id(0) {
initialize();

_ucp_handle = (void *)malloc(sizeof(struct comms_ucp_handle));
init_comms_ucp_handle((struct comms_ucp_handle *)_ucp_handle);
p2p_enabled = true;
}
#endif

@@ -231,10 +228,6 @@ cumlStdCommunicator_impl::~cumlStdCommunicator_impl() {

CUDA_CHECK_NO_THROW(cudaFree(_sendbuff));
CUDA_CHECK_NO_THROW(cudaFree(_recvbuff));

#ifndef WITH_UCX
close_ucp_handle((struct comms_ucp_handle *)_ucp_handle);
#endif
}

int cumlStdCommunicator_impl::getSize() const { return _size; }
@@ -279,6 +272,8 @@ void cumlStdCommunicator_impl::get_request_id(request_t *req) const {
void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
int tag, request_t *request) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
@@ -287,9 +282,10 @@ void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
get_request_id(request);
ucp_ep_h ep_ptr = (*_ucp_eps)[dest];

struct ucp_request *ucp_req =
ucp_isend((struct comms_ucp_handle *)_ucp_handle, ep_ptr, buf, size, tag,
default_tag_mask, getRank());
ucp_request *ucp_req = (ucp_request *)malloc(sizeof(ucp_request));

this->_ucp_handler.ucp_isend(ucp_req, ep_ptr, buf, size, tag,
default_tag_mask, getRank());

CUML_LOG_DEBUG(
"%d: Created send request [id=%llu], ptr=%llu, to=%llu, ep=%llu", getRank(),
@@ -303,6 +299,8 @@ void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
request_t *request) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
@@ -318,9 +316,9 @@ void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
tag_mask = any_rank_tag_mask;
}

struct ucp_request *ucp_req =
ucp_irecv((struct comms_ucp_handle *)_ucp_handle, _ucp_worker, ep_ptr, buf,
size, tag, tag_mask, source);
ucp_request *ucp_req = (ucp_request *)malloc(sizeof(ucp_request));
_ucp_handler.ucp_irecv(ucp_req, _ucp_worker, ep_ptr, buf, size, tag, tag_mask,
source);

CUML_LOG_DEBUG(
"%d: Created receive request [id=%llu], ptr=%llu, from=%llu, ep=%llu",
@@ -334,12 +332,14 @@ void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
void cumlStdCommunicator_impl::waitall(int count,
request_t array_of_requests[]) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
"ERROR: UCX comms not initialized on communicator.");

std::vector<struct ucp_request *> requests;
std::vector<ucp_request *> requests;
requests.reserve(count);

time_t start = time(NULL);
@@ -360,13 +360,12 @@ void cumlStdCommunicator_impl::waitall(int count,
// in 10 or more seconds.
ASSERT(now - start < 10, "Timed out waiting for requests.");

for (std::vector<struct ucp_request *>::iterator it = requests.begin();
for (std::vector<ucp_request *>::iterator it = requests.begin();
it != requests.end();) {
bool restart = false; // resets the timeout when any progress was made

// Causes UCP to progress through the send/recv message queue
while (ucp_progress((struct comms_ucp_handle *)_ucp_handle,
_ucp_worker) != 0) {
while (_ucp_handler.ucp_progress(_ucp_worker) != 0) {
restart = true;
}

@@ -396,7 +395,7 @@ void cumlStdCommunicator_impl::waitall(int count,
req->other_rank, req->is_send_request, !req->needs_release);

// perform cleanup
free_ucp_request((struct comms_ucp_handle *)_ucp_handle, req);
_ucp_handler.free_ucp_request(req);

// remove from pending requests
it = requests.erase(it);
30 changes: 4 additions & 26 deletions cpp/comms/std/src/cuML_std_comms_impl.hpp
@@ -26,31 +26,8 @@

#ifdef WITH_UCX
#include <ucp/api/ucp.h>

/**
* Standard UCX request object that will be passed
* around asynchronously. This object is really
* opaque and the comms layer only cares that it
* has been completed. Because cuml comms do not
* initialize the ucx application context, it doesn't
* own this object and thus it's important not to
* modify this struct.
*/
struct ucx_context {
int completed;
};

/**
* The ucp_request struct is owned by cuml comms. It
* wraps the `ucx_context` request and adds a few
* other fields for logging and cleanup.
*/
struct ucp_request {
struct ucx_context* req;
bool needs_release = true;
int other_rank = -1;
bool is_send_request = false;
};
#include <ucp/api/ucp_def.h>
#include "ucp_helper.h"
#endif

namespace ML {
@@ -149,7 +126,8 @@ class cumlStdCommunicator_impl : public MLCommon::cumlCommunicator_iface {
void get_request_id(request_t* req) const;

#ifdef WITH_UCX
void* _ucp_handle;
bool p2p_enabled = false;
comms_ucp_handler _ucp_handler;
ucp_worker_h _ucp_worker;
std::shared_ptr<ucp_ep_h*> _ucp_eps;
mutable request_t _next_request_id;