Merge branch 'branch-0.14' into benchmark_updates
Conflicts:
	CHANGELOG.md
cjnolet committed Apr 28, 2020
2 parents 4b060f8 + 7a904ce commit d019d96
Showing 154 changed files with 5,690 additions and 3,765 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -19,6 +19,7 @@ cuml.egg-info/
dist/
python/cuml/**/*.cpp
python/external_repositories
python/record.txt
log
.ipynb_checkpoints
.DS_Store
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -3,7 +3,11 @@
## New Features
- PR #1980: prim: added a new write-only unary op prim
- PR #1867: C++: add logging interface support in cuML based spdlog
- PR #1902: Multi-class inference in FIL C++ and importing multi-class forests from treelite
- PR #1906: UMAP MNMG
- PR #2067: python: wrap logging interface in cython
- PR #2083: Added dtype, order, and use_full_low_rank to MNMG `make_regression`
- PR #2074: SG and MNMG `make_classification`

## Improvements
- PR #1931: C++: enabled doxygen docs for all of the C++ codebase
@@ -17,21 +21,33 @@
- PR #1974: Reduce ARIMA testing time
- PR #1984: Enable Ninja build
- PR #2005: Adding missing algorithms to cuml benchmarks and notebook
- PR #2016: Add capability to setup.py and build.sh to fully clean all cython build files and artifacts
- PR #2044: A cuda-memcheck helper wrapper for devs
- PR #2018: Using `cuml.dask.part_utils.extract_partitions` and removing similar, duplicated code
- PR #2019: Enable doxygen build in our nightly doc build CI script
- PR #1996: Cythonize in parallel
- PR #2032: Reduce number of tests for MBSGD to improve CI running time
- PR #2031: Encapsulating UCX-py interactions in singleton
- PR #2029: Add C++ ARIMA log-likelihood benchmark
- PR #2051: Reduce the time required to run dask pca and dask tsvd tests
- PR #1981: Using CumlArray in kNN and DistributedDataHandler in dask kNN
- PR #2053: Introduce verbosity level in C++ layer instead of boolean `verbose` flag
- PR #2047: Make internal streams non-blocking w.r.t. NULL stream
- PR #2048: Random forest testing speedup
- PR #2058: Use CumlArray in Random Projection
- PR #2068: Updating knn class probabilities to use make_monotonic instead of binary search
- PR #2062: Adding random state to UMAP mnmg tests
- PR #2064: Speed-up K-Means test
- PR #2015: Renaming .h to .cuh in solver, dbscan and svm
- PR #2080: Improved import of sparse FIL forests from treelite
- PR #2090: Upgrade C++ build to C++14 standard
- PR #2089: CI: enabled cuda-memcheck on ml-prims unit-tests during nightly build
- PR #2118: Updating SGD & mini-batch estimators to use CumlArray
- PR #2120: Speeding up dask RandomForest tests
- PR #1883: Use CumlArray in ARIMA
- PR #2135: A few optimizations to UMAP fuzzy simplicial set
- PR #2098: Renaming .h to .cuh in decision_tree, glm, pca
- PR #2146: Remove deprecated kalman filter

## Bug Fixes
- PR #1939: Fix syntax error in cuml.common.array
@@ -40,24 +56,34 @@
- PR #1969: Update libcumlprims to 0.14
- PR #1973: Add missing mg files for setup.py --singlegpu flag
- PR #1993: Set `umap_transform_reproducibility` tests to xfail
- PR #2004: Refactoring the arguments to `plant()` call
- PR #2017: Fixing memory issue in weak cc prim
- PR #2028: Skipping UMAP knn reproducibility tests until we figure out why it's failing in CUDA 10.2
- PR #2024: Fixed cuda-memcheck errors with sample-without-replacement prim
- PR #1540: prims: support for custom math-type used for computation inside adjusted rand index prim
- PR #2059: Make all Scipy imports conditional
- PR #2077: dask make_blobs arguments to match sklearn
- PR #2078: Ignore negative cache indices in get_vecs
- PR #2084: Fixed cuda-memcheck errors with COO unit-tests
- PR #2087: Fixed cuda-memcheck errors with dispersion prim
- PR #2096: Fixed syntax error with nightly build command for memcheck unit-tests
- PR #2115: Fixed contingency matrix prim unit-tests for computing correct golden values
- PR #2107: Fix PCA transform
- PR #2109: input_to_cuml_array __cuda_array_interface__ bugfix
- PR #2117: cuDF __array__ exception small fixes
- PR #2144: Remove GPU arch < 60 from CMake build
- PR #2153: Added missing namespaces to some Decision Tree files

# cuML 0.13.0 (Date TBD)

## New Features
- PR #1777: Python bindings for entropy
- PR #1742: Mean squared error implementation with cupy
- PR #1817: Confusion matrix implementation with cupy (SNSG and MNMG)
- PR #1766: Mean absolute error implementation with cupy
- PR #1766: Mean squared log error implementation with cupy
- PR #1635: cuML Array shim and configurable output added to cluster methods
- PR #1892: One hot encoder implementation with cupy
- PR #1586: Seasonal ARIMA
- PR #1683: cuml.dask make_regression
- PR #1689: Add framework for cuML Dask serializers
@@ -70,6 +96,7 @@
- PR #1738: cuml.dask refactor beginning and dask array input option for OLS, Ridge and KMeans
- PR #1874: Add predict_proba function to RF classifier
- PR #1815: Adding KNN parameter to UMAP
- PR #1978: Adding `predict_proba` function to dask RF

## Improvements
- PR #1644: Add `predict_proba()` for FIL binary classifier
@@ -108,6 +135,7 @@
- PR #1848: Rely on subclassing for cuML Array serialization
- PR #1866: Minimizing client memory pressure on Naive Bayes
- PR #1788: Removing complexity bottleneck in S-ARIMA
- PR #1873: Remove usage of nvstring and nvcat from LabelEncoder
- PR #1891: Additional improvements to naive bayes tree reduction

## Bug Fixes
19 changes: 13 additions & 6 deletions build.sh
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright (c) 2019, NVIDIA CORPORATION.
# Copyright (c) 2019-2020, NVIDIA CORPORATION.

# cuml build script

@@ -33,7 +33,7 @@ HELP="$0 [<target> ...] [<flag> ...]
-g - build for debug
-n - no install step
--allgpuarch - build for all supported GPU architectures
--singlegpu - Build cuml without multigpu support (multigpu requires libcumlprims)
--singlegpu - Build cuml without libcumlprims based multigpu algorithms.
--nvtx - Enable nvtx for profiling support
--show_depr_warn - show cmake deprecation warnings
-h - print this text
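A few hedged invocation examples based on the help text above; the target and flag combinations are illustrative assumptions, not commands prescribed by the repository:

    # Illustrative build.sh invocations (combinations are assumptions
    # drawn from the help text, not mandated by the repo).
    ./build.sh clean                      # remove all build artifacts and directories
    ./build.sh libcuml cuml --singlegpu   # build C++ and Python without libcumlprims-based multigpu algorithms
    ./build.sh -g --nvtx                  # debug build with nvtx profiling support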
@@ -114,11 +114,18 @@ if (( ${CLEAN} == 1 )); then
# The find removes all contents but leaves the dirs, the rmdir
# attempts to remove the dirs but can fail safely.
for bd in ${BUILD_DIRS}; do
      if [ -d ${bd} ]; then
        find ${bd} -mindepth 1 -delete
        rmdir ${bd} || true
      fi
done

cd ${REPODIR}/python
python setup.py clean --all
cd ${REPODIR}
fi

################################################################################
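The comment inside the hunk above explains the cleanup trick: find -mindepth 1 -delete empties each build directory while leaving the directory itself in place, and rmdir || true then removes the directory when possible without ever failing the script. A minimal standalone sketch of the pattern, with an illustrative directory name:

    # Minimal sketch of the clean pattern; "cpp/build" is an illustrative
    # name, not a path taken from the script.
    bd="cpp/build"
    if [ -d "${bd}" ]; then
      find "${bd}" -mindepth 1 -delete   # empty the tree but keep the directory
      rmdir "${bd}" || true              # try to remove the directory; ignore failure
    fi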
2 changes: 1 addition & 1 deletion ci/cpu/cuml/upload-anaconda.sh
@@ -29,5 +29,5 @@ if [ "$BUILD_CUML" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
fi
2 changes: 1 addition & 1 deletion ci/cpu/libcuml/upload-anaconda.sh
@@ -29,5 +29,5 @@ if [ "$BUILD_LIBCUML" == "1" ]; then

echo "Upload"
echo ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --force ${UPLOADFILE}
anaconda -t ${MY_UPLOAD_KEY} upload -u ${CONDA_USERNAME:-rapidsai} ${LABEL_OPTION} --skip-existing ${UPLOADFILE}
fi
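Both upload scripts make the same change: --force overwrote a package already published on the channel, while --skip-existing leaves it untouched, so re-running a CI job becomes a no-op rather than an overwrite. A hedged sketch of the behavior, with placeholder file and channel names:

    # Placeholder names; only the flag behavior is the point here.
    UPLOADFILE=cuml-0.14.0-py37.tar.bz2
    anaconda -t ${MY_UPLOAD_KEY} upload -u rapidsai --skip-existing ${UPLOADFILE}
    # Re-running the same command now skips the already-published package,
    # where --force would have replaced it.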
3 changes: 2 additions & 1 deletion ci/gpu/build.sh
@@ -142,7 +142,8 @@ GTEST_OUTPUT="xml:${WORKSPACE}/test-results/prims/" ./test/prims
# TEST - Run GoogleTest for ml-prims, but with cuda-memcheck enabled
################################################################################

if [ "$BUILD_MODE" = "branch" && "$BUILD_TYPE" = "gpu" ]; then
if [ "$BUILD_MODE" = "branch" ] && [ "$BUILD_TYPE" = "gpu" ]; then
logger "GoogleTest for ml-prims with cuda-memcheck enabled..."
cd $WORKSPACE/cpp/build
python ../scripts/cuda-memcheck.py -tool memcheck -exe ./test/prims
fi
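The fix above corrects a genuine shell bug: && is not a valid operator inside the POSIX [ ... ] test command, so the original single-bracket line fails at runtime. The conjunction has to join two separate test commands, or use bash's [[ ... ]] compound form. A minimal sketch with illustrative values:

    # Minimal sketch of the conditional fix; variable values are illustrative.
    BUILD_MODE="branch"
    BUILD_TYPE="gpu"

    # POSIX-correct: two test commands joined by the shell's && operator.
    if [ "$BUILD_MODE" = "branch" ] && [ "$BUILD_TYPE" = "gpu" ]; then
      echo "running cuda-memcheck suite"
    fi

    # Equivalent bash-only form: && is legal inside [[ ... ]].
    if [[ "$BUILD_MODE" = "branch" && "$BUILD_TYPE" = "gpu" ]]; then
      echo "running cuda-memcheck suite"
    fi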
1 change: 0 additions & 1 deletion cpp/CMakeLists.txt
@@ -275,7 +275,6 @@ if(BUILD_CUML_CPP_LIBRARY)
src/fil/infer.cu
src/glm/glm.cu
src/holtwinters/holtwinters.cu
src/kalman_filter/lkf_py.cu
src/kmeans/kmeans.cu
src/knn/knn.cu
src/metrics/metrics.cu
2 changes: 1 addition & 1 deletion cpp/cmake/Dependencies.cmake
@@ -61,7 +61,7 @@ set(FAISS_DIR ${CMAKE_CURRENT_BINARY_DIR}/faiss CACHE STRING
"Path to FAISS source directory")
ExternalProject_Add(faiss
GIT_REPOSITORY https://github.com/facebookresearch/faiss.git
GIT_TAG v1.6.1
GIT_TAG v1.6.2
CONFIGURE_COMMAND LIBS=-pthread
CPPFLAGS=-w
LDFLAGS=-L${CMAKE_INSTALL_PREFIX}/lib
13 changes: 11 additions & 2 deletions cpp/cmake/EvalGpuArchs.cmake
@@ -54,6 +54,15 @@ int main(int argc, char** argv) {
${eval_file}
OUTPUT_VARIABLE __gpu_archs
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("Auto detection of gpu-archs: ${__gpu_archs}")
set(${gpu_archs} ${__gpu_archs} PARENT_SCOPE)
set(__gpu_archs_filtered "${__gpu_archs}")
foreach(arch ${__gpu_archs})
if (arch VERSION_LESS 60)
list(REMOVE_ITEM __gpu_archs_filtered ${arch})
endif()
endforeach()
if (NOT __gpu_archs_filtered)
message(FATAL_ERROR "No supported GPU arch found on this system")
endif()
message("Auto detection of gpu-archs: ${__gpu_archs_filtered}")
set(${gpu_archs} ${__gpu_archs_filtered} PARENT_SCOPE)
endfunction(evaluate_gpu_archs)
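The new CMake logic above filters out every detected compute capability below 60 and aborts when nothing survives the filter. A quick shell analogue of the same filtering logic, with a made-up detection result:

    # Shell analogue of the CMake gpu-arch filter; the detected list is
    # a made-up example, not real detection output.
    detected="37 52 70"
    filtered=""
    for arch in ${detected}; do
      if [ "${arch}" -ge 60 ]; then
        filtered="${filtered} ${arch}"
      fi
    done
    if [ -z "${filtered}" ]; then
      echo "No supported GPU arch found on this system" >&2
      exit 1
    fi
    echo "Auto detection of gpu-archs:${filtered}"   # prints " 70"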
39 changes: 19 additions & 20 deletions cpp/comms/std/src/cuML_std_comms_impl.cpp
@@ -204,12 +204,9 @@ cumlStdCommunicator_impl::cumlStdCommunicator_impl(
_ucp_eps(eps),
_size(size),
_rank(rank),
_next_request_id(0),
_ucp_handle(NULL) {
_next_request_id(0) {
initialize();

_ucp_handle = (void *)malloc(sizeof(struct comms_ucp_handle));
init_comms_ucp_handle((struct comms_ucp_handle *)_ucp_handle);
p2p_enabled = true;
}
#endif

@@ -231,10 +228,6 @@ cumlStdCommunicator_impl::~cumlStdCommunicator_impl() {

CUDA_CHECK_NO_THROW(cudaFree(_sendbuff));
CUDA_CHECK_NO_THROW(cudaFree(_recvbuff));

#ifndef WITH_UCX
close_ucp_handle((struct comms_ucp_handle *)_ucp_handle);
#endif
}

int cumlStdCommunicator_impl::getSize() const { return _size; }
@@ -279,6 +272,8 @@ void cumlStdCommunicator_impl::get_request_id(request_t *req) const {
void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
int tag, request_t *request) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
@@ -287,9 +282,10 @@ void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
get_request_id(request);
ucp_ep_h ep_ptr = (*_ucp_eps)[dest];

struct ucp_request *ucp_req =
ucp_isend((struct comms_ucp_handle *)_ucp_handle, ep_ptr, buf, size, tag,
default_tag_mask, getRank());
ucp_request *ucp_req = (ucp_request *)malloc(sizeof(ucp_request));

this->_ucp_handler.ucp_isend(ucp_req, ep_ptr, buf, size, tag,
default_tag_mask, getRank());

CUML_LOG_DEBUG(
"%d: Created send request [id=%llu], ptr=%llu, to=%llu, ep=%llu", getRank(),
@@ -303,6 +299,8 @@ void cumlStdCommunicator_impl::isend(const void *buf, int size, int dest,
void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
request_t *request) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
@@ -318,9 +316,9 @@ void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
tag_mask = any_rank_tag_mask;
}

struct ucp_request *ucp_req =
ucp_irecv((struct comms_ucp_handle *)_ucp_handle, _ucp_worker, ep_ptr, buf,
size, tag, tag_mask, source);
ucp_request *ucp_req = (ucp_request *)malloc(sizeof(ucp_request));
_ucp_handler.ucp_irecv(ucp_req, _ucp_worker, ep_ptr, buf, size, tag, tag_mask,
source);

CUML_LOG_DEBUG(
"%d: Created receive request [id=%llu], ptr=%llu, from=%llu, ep=%llu",
@@ -334,12 +332,14 @@ void cumlStdCommunicator_impl::irecv(void *buf, int size, int source, int tag,
void cumlStdCommunicator_impl::waitall(int count,
request_t array_of_requests[]) const {
ASSERT(UCX_ENABLED, "cuML Comms not built with UCX support");
ASSERT(p2p_enabled,
"cuML Comms instance was not initialized for point-to-point");

#ifdef WITH_UCX
ASSERT(_ucp_worker != nullptr,
"ERROR: UCX comms not initialized on communicator.");

std::vector<struct ucp_request *> requests;
std::vector<ucp_request *> requests;
requests.reserve(count);

time_t start = time(NULL);
@@ -360,13 +360,12 @@ void cumlStdCommunicator_impl::waitall(int count,
// in 10 or more seconds.
ASSERT(now - start < 10, "Timed out waiting for requests.");

for (std::vector<struct ucp_request *>::iterator it = requests.begin();
for (std::vector<ucp_request *>::iterator it = requests.begin();
it != requests.end();) {
bool restart = false; // resets the timeout when any progress was made

// Causes UCP to progress through the send/recv message queue
while (ucp_progress((struct comms_ucp_handle *)_ucp_handle,
_ucp_worker) != 0) {
while (_ucp_handler.ucp_progress(_ucp_worker) != 0) {
restart = true;
}

@@ -396,7 +395,7 @@ void cumlStdCommunicator_impl::waitall(int count,
req->other_rank, req->is_send_request, !req->needs_release);

// perform cleanup
free_ucp_request((struct comms_ucp_handle *)_ucp_handle, req);
_ucp_handler.free_ucp_request(req);

// remove from pending requests
it = requests.erase(it);
30 changes: 4 additions & 26 deletions cpp/comms/std/src/cuML_std_comms_impl.hpp
@@ -26,31 +26,8 @@

#ifdef WITH_UCX
#include <ucp/api/ucp.h>

/**
* Standard UCX request object that will be passed
* around asynchronously. This object is really
* opaque and the comms layer only cares that it
* has been completed. Because cuml comms do not
* initialize the ucx application context, it doesn't
* own this object and thus it's important not to
* modify this struct.
*/
struct ucx_context {
int completed;
};

/**
* The ucp_request struct is owned by cuml comms. It
* wraps the `ucx_context` request and adds a few
* other fields for logging and cleanup.
*/
struct ucp_request {
struct ucx_context* req;
bool needs_release = true;
int other_rank = -1;
bool is_send_request = false;
};
#include <ucp/api/ucp_def.h>
#include "ucp_helper.h"
#endif

namespace ML {
@@ -149,7 +126,8 @@ class cumlStdCommunicator_impl : public MLCommon::cumlCommunicator_iface {
void get_request_id(request_t* req) const;

#ifdef WITH_UCX
void* _ucp_handle;
bool p2p_enabled = false;
comms_ucp_handler _ucp_handler;
ucp_worker_h _ucp_worker;
std::shared_ptr<ucp_ep_h*> _ucp_eps;
mutable request_t _next_request_id;