
ParTI!

(Notice: for the code used in "Optimizing Sparse Tensor Times Matrix on GPUs", please go to the JPDC branch.)

The Parallel Tensor Infrastructure (ParTI!) supports fast, essential sparse tensor operations and tensor decompositions on multicore CPU and GPU architectures. These basic tensor operations are critical to the overall performance of tensor analysis algorithms such as tensor decomposition. ParTI! was formerly known as SpTOL.

Supported sparse tensor operations:

  • Scalar-tensor mul/div [CPU]
  • Kronecker product [CPU]
  • Khatri-Rao product [CPU]
  • Sparse tensor matricization [CPU]
  • Element-wise tensor add/sub/mul/div [CPU, Multicore, GPU]
  • Sparse tensor-times-dense matrix (SpTTM) [CPU, Multicore, GPU]
  • Sparse matricized tensor times Khatri-Rao product (SpMTTKRP) [CPU, Multicore, GPU]
  • Sparse CANDECOMP/PARAFAC decomposition
  • Sparse Tucker decomposition (refer to branch JPDC)

Supported sparse tensor formats:

  • Coordinate (COO) format
  • Hierarchical coordinate (HiCOO) format [Paper]

Build requirements:

  • C Compiler (GCC or ICC or Clang)

  • CMake (>v3.2)

  • CUDA SDK [Required for GPU algorithms]

  • OpenBLAS (or an alternative BLAS and LAPACK library) [Required for tensor decomposition]

  • MAGMA [Optional]

Build:

  1. Create a configuration file with `cp build.config build-sample.config` and change the settings appropriately

  2. Type ./build.sh

  3. Check the build directory for the resulting library

  4. Check build/tests for testing programs covering basic functionality

  5. Check build/examples for example programs, including MTTKRP, TTM, and CP decomposition
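
For reference, a typical build session (assuming a Unix-like shell; the exact configuration edits depend on your compilers and library paths) might look like:

    cp build.config build-sample.config
    # edit build-sample.config: compiler, CUDA, BLAS/LAPACK settings, etc.
    ./build.sh
    ls build            # the resulting library
    ls build/tests      # testing programs
    ls build/examples   # example programs (MTTKRP, TTM, CPD, ...)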

Build MATLAB interface (Not ready for new functions):

  1. cd matlab

  2. export LD_LIBRARY_PATH=../build:$LD_LIBRARY_PATH

  3. Type make to build all functions into a MEX library.

  4. matlab

    1. In the MATLAB environment, type addpath(pwd)

    2. Play with the ParTI! MATLAB interface.

Build docs:

  1. Install Doxygen

  2. Go to the docs directory

  3. Type make

Run examples:

Please refer to GettingStarted.md for more general cases. Only some major functions are shown below.

MTTKRP:

  1. COO-MTTKRP (CPU, Multicore)

    • Usage: ./build/examples/mttkrp [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (default -1: loop all modes, or specify a mode, e.g., 0 or 1 or 2 for third-order tensors.)
      • -s sortcase, --sortcase=SORTCASE (0:default,1,2,3,4. Different tensor sorting.)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (Only for sortcase=3)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (Only for sortcase=3)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (the number of matrix columns, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • -u use_reduce, --ur=use_reduce (use privatization or not)
      • --help
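    • An example (example.tns is a placeholder input file; flags as documented above): a multicore run over all modes with 8 threads and rank 16 might be
      ./build/examples/mttkrp -i example.tns -d -1 -t 8 -r 16 -o result.txt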
  2. COO-MTTKRP (GPU)

    • Usage: ./build/examples/mttkrp_gpu [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (specify a mode, e.g., 0 or 1 or 2 for third-order tensors. Default:0.)
      • -s sortcase, --sortcase=SORTCASE (0:default,1,2,3,4. Different tensor sorting.)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (Only for sortcase=3)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (Only for sortcase=3)
      • -d CUDA_DEV_ID, --cuda-dev-id=CUDA_DEV_ID (>=0: GPU device id)
      • -r RANK (the number of matrix columns, 16:default)
      • GPU options:
      • -p IMPL_NUM, --impl-num=IMPL_NUM (11, 12, 15, where 15 should be the best case)
      • --help
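    • An example (placeholder file name): running mode 0 on GPU device 0 with the recommended kernel variant might be
      ./build/examples/mttkrp_gpu -i example.tns -m 0 -d 0 -p 15 -r 16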
  3. HiCOO-MTTKRP (CPU, Multicore)

    • Usage: ./build/examples/mttkrp_hicoo [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -m MODE, --mode=MODE (default -1: loop all modes, or specify a mode, e.g., 0 or 1 or 2 for third-order tensors.)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (required)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (required)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (the number of matrix columns, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • --help
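    • An example (placeholder file name; good block/kernel sizes depend on the tensor): a multicore run with 2^7-entry blocks and 2^20-entry kernels might be
      ./build/examples/mttkrp_hicoo -i example.tns -b 7 -k 20 -d -1 -t 8 -r 16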

CPD:

  1. COO-CPD (CPU, Multicore)

    • Usage: ./build/examples/cpd [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (CPD rank, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • -u use_reduce, --ur=use_reduce (use privatization or not)
      • --help
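    • An example (placeholder file names): a rank-16 multicore decomposition with 8 threads might be
      ./build/examples/cpd -i example.tns -d -1 -t 8 -r 16 -o factors.txt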
  2. COO-CPD (GPU)

    • Usage: ./build/examples/cpd_gpu [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -d CUDA_DEV_ID, --cuda-dev-id=CUDA_DEV_ID (>=0:GPU device id)
      • -r RANK (CPD rank, 16:default)
      • GPU options:
      • -p IMPL_NUM, --impl-num=IMPL_NUM (11, 12, 15, where 15 should be the best case)
      • --help
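    • An example (placeholder file name): a rank-16 decomposition on GPU device 0 with the recommended kernel variant might be
      ./build/examples/cpd_gpu -i example.tns -d 0 -p 15 -r 16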
  3. HiCOO-CPD (CPU, Multicore)

    • Usage: ./build/examples/cpd_hicoo [options], Options:
      • -i INPUT, --input=INPUT (.tns file)
      • -o OUTPUT, --output=OUTPUT (output file name)
      • -b BLOCKSIZE, --blocksize=BLOCKSIZE (in bits) (required)
      • -k KERNELSIZE, --kernelsize=KERNELSIZE (in bits) (required)
      • -d DEV_ID, --dev-id=DEV_ID (-2:sequential,default; -1:OpenMP parallel)
      • -r RANK (CPD rank, 16:default)
      • OpenMP options:
      • -t NTHREADS, --nt=NT (1:default)
      • --help
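    • An example (placeholder file name): a rank-16 multicore run with 2^7-entry blocks and 2^20-entry kernels might be
      ./build/examples/cpd_hicoo -i example.tns -b 7 -k 20 -d -1 -t 8 -r 16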

TTM:

  1. COO-TTM (CPU, Multicore, GPU)

    • Usage: ./build/examples/ttm tsr mode impl_num smem_size [cuda_dev_id, R, output]
    • tsr: input sparse tensor
    • mode: specify tensor mode, e.g. (0, or 1, or 2) for third-order tensors
    • impl_num: 11, 12, 13, 14, 15, where either 14 or 15 should be the best case
    • smem_size: shared memory size in bytes (0, or 16000, or 32000, or 48000)
    • cuda_dev_id: -2, -1, or 0, 1, ... -2: sequential code; -1: OpenMP code; 0 or another valid integer: GPU device id. [Optional, -2 by default]
    • R: rank (the number of matrix columns), an integer. [Optional, 16 by default]
    • output: the file name for output. [Optional]
    • An example: ./build/examples/ttm example.tns 0 15 16000 0 16 result.txt
  2. SCOO-TTM (CPU, GPU)

    • Usage: ./build/examples/sttm tsr U Y mode [cuda_dev_id]
    • tsr: input semi-sparse tensor
    • U: input dense matrix
    • Y: output semi-sparse tensor
    • mode: specify tensor mode, e.g. (0, or 1, or 2) for third-order tensors
    • cuda_dev_id: -1, or 0, 1, ... -1: sequential code; 0 or another valid integer: GPU device id. [Optional, -1 by default]
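    • An example with placeholder file names (the actual file contents follow the usage above), running sequentially on mode 0:
      ./build/examples/sttm example.tns U.txt result.tns 0 -1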

Tucker Decomposition

  1. COO-Tucker (CPU, GPU)

    The code is in the jpdc branch, which is a completely different and incompatible codebase written in C++.

    • Usage: ./build/examples/tucker --dev device [options] tsr R1 R2 ... dimorder1 dimorder2 ...
    • device: CPU core ID or GPU ID; obtain it with ./build/examples/detect_devices (multicore CPU is currently not implemented)
    • tsr: input sparse tensor
    • R1, R2, ...: the shape of expected output core tensor
    • dimorder1, dimorder2, ...: the order of the TTM chain operation
    • Options:
    • -o, --output: output the core tensor into a text file
    • -d, --dense-format: print the result to screen in dense format, instead of sparse format
    • -l, --limit: limit how much of the result is printed to the screen
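    • An example (placeholder file name): decomposing a third-order tensor into a 16x16x16 core tensor on device 0, with the TTM chain ordered 0 1 2, might be
      ./build/examples/tucker --dev 0 example.tns 16 16 16 0 1 2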


The algorithms and details are described in the following publications.

Publications

  • Scalable Tensor Decompositions in High Performance Computing Environments. Jiajia Li. PhD Dissertation. Georgia Institute of Technology, Atlanta, GA, USA. July 2018. [pdf] [bib]

  • HiCOO: Hierarchical Storage of Sparse Tensors. Jiajia Li, Jimeng Sun, Richard Vuduc. ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC). 2018 (accepted). [pdf] [bib]

  • Optimizing Sparse Tensor Times Matrix on GPUs. Yuchen Ma, Jiajia Li, Xiaolong Wu, Chenggang Yan, Jimeng Sun, Richard Vuduc. Journal of Parallel and Distributed Computing (Special Issue on Systems for Learning, Inferencing, and Discovering). [pdf] [bib]

  • Optimizing Sparse Tensor Times Matrix on multi-core and many-core architectures. Jiajia Li, Yuchen Ma, Chenggang Yan, Richard Vuduc. The sixth Workshop on Irregular Applications: Architectures and Algorithms (IA^3), co-located with SC’16. 2016. [pdf]

  • ParTI!: a Parallel Tensor Infrastructure for Data Analysis. Jiajia Li, Yuchen Ma, Chenggang Yan, Jimeng Sun, Richard Vuduc. Tensor-Learn Workshop @ NIPS'16. [pdf]

Citation

@misc{parti,
  author = "Jiajia Li and Yuchen Ma and Richard Vuduc",
  title  = "{ParTI!}: A Parallel Tensor Infrastructure for multicore CPUs and GPUs",
  month  = "Oct",
  year   = "2018",
  url    = "https://github.com/hpcgarage/ParTI"
}

Contributors

License

ParTI! is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
