This is the CUDA extension for DeepWok/MASE.
Note: This project is still under development.
Note: The Beginner Guide documents my learning process and setup.
- C++17 (GCC < 14)
- CUDA 12.4/12.5/12.6
- CMake >= 3.20
- Python >= 3.11
- Tox for Python package building and testing.
- Torch >= 2.3.0 (`pip install torch` in a conda env), which includes LibTorch for wrapping CUDA kernels.
- Justfile
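
Before building, it can help to confirm that the requirements listed above are actually available. The following is a minimal sketch, not part of the project: the helper names (`version_tuple`, `meets_minimum`, `missing_tools`) are hypothetical, and it assumes the tools are exposed on `PATH` as `cmake`, `just`, `tox`, and `nvcc`.

```python
import re
import shutil
import sys

# Assumption: these are the executable names a typical install puts on PATH.
REQUIRED_TOOLS = ("cmake", "just", "tox", "nvcc")


def version_tuple(text: str) -> tuple:
    """Parse the leading dotted number of a version string, e.g. '3.20.1' -> (3, 20, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True when `found` is at least `minimum` (missing fields count as 0)."""
    a, b = version_tuple(found), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    status = "ok" if meets_minimum(python_version, "3.11") else "too old"
    print(f"Python {python_version}: {status}")
    for tool in missing_tools():
        print(f"not found on PATH: {tool}")
```

The sketch only checks the Python version and tool presence; checking the GCC, CUDA, and CMake versions would require parsing each tool's `--version` output, which is left out here.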
- Build tests: `just build-cu-test`
- Build profiling targets for Nsight Compute: `just build-cu-profile`
- Build `test_mxint8_dequantize1d` for debug with `just --set CU_BUILD_TARGETS test_mxint8_dequantize1d build-cu-test-debug`, then launch cuda-gdb with `cuda-gdb --args ./build/test/cu/mxint/dequantize/test_mxint8_dequantize1d 25600 256`
`tox` sets up the Python environment automatically.
- Build the `mase_cuda` package and run the Python tests: `tox` (this is slow, since CPU and GPU performance profiling is enabled)
- Run a quick test: `just test-py-fast`
- The package is built in the `dist/` directory.
- Create the dev environment: `tox -e dev`
- Build the `mase_cuda` package: `just build-py`
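
Since the built package ends up in `dist/`, a small helper can list what a build produced. This is a sketch under assumptions: the function name `built_artifacts` is hypothetical, and it assumes the build writes standard wheel (`.whl`) or source archive (`.tar.gz`) files, which the text above does not specify.

```python
from pathlib import Path


def built_artifacts(dist_dir: str = "dist") -> list:
    """Return wheels and source archives found in `dist_dir`, sorted by name."""
    root = Path(dist_dir)
    if not root.is_dir():
        # Nothing has been built yet (or the path is wrong).
        return []
    files = [
        path
        for path in root.iterdir()
        if path.suffix == ".whl" or path.name.endswith(".tar.gz")
    ]
    return sorted(files, key=lambda path: path.name)


if __name__ == "__main__":
    artifacts = built_artifacts()
    if not artifacts:
        print("nothing in dist/ yet; run the build first")
    for path in artifacts:
        # Print an install command for each artifact rather than running it.
        print(f"pip install {path}")
```

Run from the repository root after a build, this prints one `pip install` line per artifact, which can be copied into the target environment.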