Add two-stage algorithms #25

dmbates · 2020-10-16T19:00:02Z

Motivated in part by https://discourse.julialang.org/t/wraping-a-mkl-handle-in-julia/48241 I looked in more detail at the two-stage algorithms for recent MKL versions, They are of interest to me, specifically some of the multiplication and solutions. I could create a PR on this repository or create a separate Julia package to wrap these.

Because the two-stage algorithms go beyond some of the BLAS functions I am leaning toward creating a separate package. However, I can be convinced that they would fit in with this package. Any opinions?

pablosanjose · 2020-10-16T19:39:38Z

I don't have a strong opinion either way, but would very much love to see progress towards interfacing Julia with MKL's Inspector-executor routines. I think I would lean towards integrating them in this package, which was actually meant to go beyond BLAS, according to the README, right?

pablosanjose · 2020-10-16T19:44:28Z

Also, see #17

dmbates · 2020-10-18T15:31:52Z

To avoid cross-talk between the two formulations I will create a new package MKLsparseinspector to experiment with the inspector-executor routines.

dmbates · 2020-10-19T21:46:06Z

Having looked at this a bit, I think it may be possible to re-use much of the structure of the existing package and that it would be worthwhile working on this because there are many advantages to the new formulation.

Many of the execution routines can use the MKL CSC format so that there is no need to keep track of the adjoint when going from Julia's SparseMatrixCSC format to the MKL CSR format. (Unfortunately, mkl_sparse_?_syrkd, which is one that I am interested in, does not allow MKL CSC.)

The way I am approaching the easy cases is to define a mutable empty struct in Julia (MKLcsc) parameterized by element type with a conversion function that takes a SparseMatrixCSC and returns a pointer to a struct. The simple routines just create the pointer, pass the pointer to, e.g. mkl_sparse_d_mv and call :mkl_sparse_destroy on the pointer (to free the memory allocated in MKL) before returning.

I have taken the liberty of creating a branch of this repository to implement this approach.

dmbates · 2020-10-20T17:49:02Z

I have pushed some changes to the dmb/matrix_sparse branch implementing what I had suggested for the mv, mm, sv, and sm operations. Some of the tests in the triangular test set are failing and I am looking into that but I thought it would be best to get something on the repository for others to look at.

dmbates · 2020-10-20T21:51:49Z

The test failures are very weird. Multiplications of a vector by a triangular sparse matrix on the left that go through cscmv! and, hence the C function mkl_sparse_d_mv give incorrect answers whereas converting the vector to an n by 1 matrix so that the calculation goes through cscmm! and mkl_sparse_d_mm gives the correct answer. But in the matrix-vector multiplication test set the cscmv! -> mkl_sparse_d_mv path works. I'm having difficulty deciding what could be different in the triangular matrix case.

pablosanjose · 2020-10-21T05:52:59Z

I'm not familiar with MKL's mechanism of inspector-executor. Perhaps ping some of the other linalg Julia folk?

dmbates · 2020-10-21T17:57:22Z

General call for help given in https://discourse.julialang.org/t/help-with-debugging-some-code-for-the-mklsparse-package/48764

andreasnoack · 2020-10-21T21:38:40Z

I think there might be a bug in MKL here although it seems unlikely. This program

#include "mkl_spblas.h"
#include "stdio.h"

#define M 2
#define N 2
#define NNZ 3

int main(int argc, char const *argv[])
{
    struct matrix_descr descrA;
    sparse_matrix_t cscA;
    sparse_status_t err;

    MKL_INT cscColPtr[] = {0, 2, 3};
    MKL_INT cscRowInd[] = {0, 1, 1};
    double  cscVal[] = {1.0, 1.0, 1.0};

    double x[] = {1.0, 1.0};
    double y[] = {0.0, 0.0};

    mkl_sparse_d_create_csc (
        &cscA,
        SPARSE_INDEX_BASE_ZERO,
        N,  // number of rows
        M,  // number of cols
        cscColPtr,
        cscColPtr + 1,
        cscRowInd,
        cscVal );
    /* code */

    descrA.type = SPARSE_MATRIX_TYPE_GENERAL;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("general sparse-dense matvec\n");
    printf("%f %f\n", y[0], y[1]);

    descrA.type = SPARSE_MATRIX_TYPE_TRIANGULAR;
    descrA.mode = SPARSE_FILL_MODE_LOWER;
    descrA.diag = SPARSE_DIAG_NON_UNIT;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("\ntriangular sparse-dense matvec with SPARSE_FILL_MODE_LOWER\n");
    printf("%f %f\n", y[0], y[1]);


    descrA.mode = SPARSE_FILL_MODE_UPPER;

    err = mkl_sparse_d_mv (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        x,
        0.0,
        y );

    printf("\ntriangular sparse-dense matvec with SPARSE_FILL_MODE_UPPER\n");
    printf("%f %f\n", y[0], y[1]);


    descrA.mode = SPARSE_FILL_MODE_LOWER;

    mkl_sparse_d_mm (
        SPARSE_OPERATION_NON_TRANSPOSE,
        1.0,
        cscA,
        descrA,
        SPARSE_LAYOUT_COLUMN_MAJOR,
        x,
        1,
        1,
        0.0,
        y,
        1 );

    printf("\ntriangular sparse-dense matmat with SPARSE_FILL_MODE_LOWER\n");
    printf("%f %f\n", y[0], y[1]);

    return 0;
}

gives

general sparse-dense matvec
1.000000 2.000000

triangular sparse-dense matvec with SPARSE_FILL_MODE_LOWER
1.000000 1.000000

triangular sparse-dense matvec with SPARSE_FILL_MODE_UPPER
1.000000 2.000000

triangular sparse-dense matmat with SPARSE_FILL_MODE_LOWER
1.000000 2.000000

so it looks like they might have flipped the storage branches for the mv routine.

dmbates · 2020-10-22T00:50:02Z

Thanks for checking that by writing the C code, @andreasnoack . I was beginning to convince myself that the problem may be in MKL but I kept thinking that this was such a basic routine that it couldn't possibly have a bug that had gotten past their tests undetected. It may be that they haven't checked the CSC version as closely as the CSR version.

Would you be willing to write to the MKL discussion forums about this? Information is at
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-post-in-the-Intel-oneAPI-Math-Kernel-Library-forum/td-p/1209982

andreasnoack · 2020-10-22T09:55:06Z

Yes. I can do that once Intel's registration system works again. Just tried and it currently isn't working.

andreasnoack · 2020-10-22T10:59:49Z

Ref https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Possible-issue-with-mkl-sparse-x-mv-when-using-CSC/m-p/1220155#M30192

dmbates · 2020-10-22T17:57:28Z

Thank you very much for making that report, @andreasnoack . I haven't been able to view that link yet but I imagine that is because the moderator must release it.

I am rather disappointed in the quality of this release of MKL. I have been working on incorporating mkl_sparse_?_syrkd into this branch as the rankUpdate operation is where I was hoping to gain speed in a blocked Cholesky factorization. For some of my models one of the off-diagonal blocks will be a large, sparse, rectangular matrix and I need to downdate the diagonal block from it. I did get it working and my timings show that mkl_sparse_?_syrkd is about 3 times slower than the single-threaded, non-SIMD Julia code I am using. Even worse, the answers are not correct in that it evaluates the entire symmetric update but only multiplies the upper triangular part of C by β. So repeated calls end up with junk in the lower triangle of C.

dmbates · 2020-11-11T16:38:18Z

@andreasnoack Has there been any response from the MKL maintainers about the bug report? I have not succeeded in viewing that issue.

andreasnoack · 2020-11-11T19:39:00Z

Not really. On 30 October, I was told that

We will check the issue ASAP and get back to you.

and I haven't heard more. I've just asked for an update in the support request.

andreasnoack · 2020-12-17T15:17:36Z

Intel has just gotten back to me to tell me that they were sorry for not replying earlier. No further details. To be continued.

pablosanjose · 2021-01-22T21:16:48Z

No further news?

andreasnoack · 2021-01-23T20:40:35Z

Unfortunately not. Actually, I asked for an update four days ago but haven't heard back from Intel.

andreasnoack · 2021-01-26T08:22:05Z

Some progress. They just replied that they have run the reproducer I shared, that they are able to reproduce the bug, and that

The issue has been escalated to the MKL engineering and will be addressed.

I'll follow up.

andreasnoack · 2021-02-23T07:31:30Z

Just got the following update from Intel

The issue is in progress.

andreasnoack · 2021-03-08T08:33:06Z

Today's news from Intel

The Issue has been fixed and will be available with MKL 2021.U3

BacAmorim · 2021-10-22T14:23:20Z

MKL_jll.jl currently ships MKL v2021.1.1+2. For this issue to be solved, MKL_jll would have to ship the latest MKL version, correct?

BacAmorim · 2022-01-31T16:55:59Z

MKL_jll.jl has been updated in the lattest realease to MKL v2022.0.0+0. So this issue can now be solved, correct?

dmbates · 2022-01-31T17:11:40Z

By the way, I recently checked again the speed of their mkl_sparse_?_syrkd versus our single threaded code and the single threaded code is still faster. It appears that their inspector-executor code for sparse matrices didn't have quite the same level of effort invested in it as did the dense matrix code.

amontoison · 2024-04-28T04:53:26Z

I regenerated src.libmklsparse.jl with MKL_jll.jl v2024.1.0. Do you need additional routines?
I can easily add other headers if needed: https://github.com/JuliaSparse/MKLSparse.jl/blob/master/gen/wrapper.jl#L43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add two-stage algorithms #25

Add two-stage algorithms #25

dmbates commented Oct 16, 2020

pablosanjose commented Oct 16, 2020

pablosanjose commented Oct 16, 2020

dmbates commented Oct 18, 2020

dmbates commented Oct 19, 2020

dmbates commented Oct 20, 2020

dmbates commented Oct 20, 2020

pablosanjose commented Oct 21, 2020

dmbates commented Oct 21, 2020

andreasnoack commented Oct 21, 2020

dmbates commented Oct 22, 2020

andreasnoack commented Oct 22, 2020

andreasnoack commented Oct 22, 2020

dmbates commented Oct 22, 2020

dmbates commented Nov 11, 2020

andreasnoack commented Nov 11, 2020

andreasnoack commented Dec 17, 2020

pablosanjose commented Jan 22, 2021

andreasnoack commented Jan 23, 2021

andreasnoack commented Jan 26, 2021

andreasnoack commented Feb 23, 2021

andreasnoack commented Mar 8, 2021

BacAmorim commented Oct 22, 2021

BacAmorim commented Jan 31, 2022

dmbates commented Jan 31, 2022

amontoison commented Apr 28, 2024

Add two-stage algorithms #25

Add two-stage algorithms #25

Comments

dmbates commented Oct 16, 2020

pablosanjose commented Oct 16, 2020

pablosanjose commented Oct 16, 2020

dmbates commented Oct 18, 2020

dmbates commented Oct 19, 2020

dmbates commented Oct 20, 2020

dmbates commented Oct 20, 2020

pablosanjose commented Oct 21, 2020

dmbates commented Oct 21, 2020

andreasnoack commented Oct 21, 2020

dmbates commented Oct 22, 2020

andreasnoack commented Oct 22, 2020

andreasnoack commented Oct 22, 2020

dmbates commented Oct 22, 2020

dmbates commented Nov 11, 2020

andreasnoack commented Nov 11, 2020

andreasnoack commented Dec 17, 2020

pablosanjose commented Jan 22, 2021

andreasnoack commented Jan 23, 2021

andreasnoack commented Jan 26, 2021

andreasnoack commented Feb 23, 2021

andreasnoack commented Mar 8, 2021

BacAmorim commented Oct 22, 2021

BacAmorim commented Jan 31, 2022

dmbates commented Jan 31, 2022

amontoison commented Apr 28, 2024