Feedback Needed - Refactored SpMxSpM #158
I'm checking it right now.
I think the way to multiply two sparse matrices here is way too complicated. I understand that the issue stems from the concept that the user is responsible for buffer allocation. Here are my general comments.

The explicit call for SpGEMM should be split into at most two separate functions. The first calculates the memory requirements for the result matrix; the second does the actual spmm:

```
clsparseSpGEMM spgemmInfo;

clsparseStatus status = clsparse<S/D>SpmmAnalyse(&A, &B, &spgemmInfo, control);
if (status == clsparseSuccess)
{
    // check the spgemmInfo additional information, i.e. whether there is
    // enough memory to complete the spmm operation
    // allocate the C matrix according to spgemmInfo content
    // do the spmm
    status = clsparse<S/D>Spmm(&A, &B, &C, &spgemmInfo, control);
}
clsparseReleaseSpGEMMInfo(&spgemmInfo);
```

I see that it might be difficult at the moment, but I would like every iteration of this algorithm to bring us closer to this kind of concept. Here are some more details inside clsparse-spm-spm.cpp:
I did not go through the internal spmm functions deeply for now, but I saw that they need to be templatized for the various types, and the kernels should be fed with additional definitions (group size, tuple_queue, etc.). In the final optimisation stages, several kernels like compute_nnz_Ct look like an SpMV traversal, so they might be optimised into vectorized versions.
Thanks for the feedback @jpola. I agree with all comments, and your suggestions can be incorporated, but before I do that: should we go ahead with this design? The basic principle of this design is to separate memory allocations from computation, so that the user has control over buffer allocation.
@kvaragan I would rather keep this API:

```
CLSPARSE_EXPORT clsparseStatus
clsparseScsrSpGemm(
    const clsparseCsrMatrix* sparseMatA,
    const clsparseCsrMatrix* sparseMatB,
    clsparseCsrMatrix* sparseMatC,
    const clsparseControl control );
```

than to change to these APIs:

```
clstatus = clsparseSpMSpM_CreateInit( &A, &B, control,
                                      &spgemmInfo,             // output
                                      &rowPtrCtSizeInBytes,    // output
                                      &buffer_d_SizeInBytes ); // output

clstatus = clsparseSpMSpM_ComputennzCtExt( csrRowPtrCt_d,
                                           rowPtrCtSizeInBytes,
                                           sizeof(cl_int),
                                           buffer_d,
                                           buffer_d_SizeInBytes,
                                           &nnzCt,
                                           spgemmInfo,
                                           control );

clstatus = clsparseSpMSpM_ScsrSpmSpmnnzC( C.rowOffsets,   // (m+1) * sizeof(int) - memory allocated
                                          csrRowPtrCt_d,  // (m+1) * sizeof(int) - memory allocated
                                          &csrColIndCt_d, // nnzCt * sizeof(int)
                                          &csrValCt_d,    // nnzCt * sizeof(float) - single precision
                                          &nnzC,          // nnz of output matrix
                                          spgemmInfo,
                                          control );

clstatus = clsparseSpMSpM_FillScsrOutput( C.rowOffsets,
                                          C.colIndices,
                                          C.values,
                                          csrRowPtrCt_d,
                                          csrColIndCt_d,
                                          csrValCt_d,
                                          nnzC,
                                          spgemmInfo,
                                          control );
```

I agree that the SpM-SpM routine currently allocates a lot of internal memory, but the user experience of the original API is more in line with what we want to present to our users. I have not combed through the implementation files, but I trust that Jakub has given sufficient feedback for this iteration. Here's your challenge: can you design a working implementation with an API like what Jakub mocked above?
@jpola

I wish, but I think it would not be possible. I'll take a closer look.
@jpola Using clsparse::vector is the same as using clCreateBuffer(), because internally that's what clsparse::vector does. If I use clsparse::vector, it will defeat the purpose of this design.
@kknox Because of the nature of the algorithm, 2 APIs are not enough. If we use clsparse::vector, maybe it can be done, but that means memory allocations inside the library, and that would be as good as our first implementation. So can I conclude that we will be going with the 1-API design which is already present in the develop branch?
@kvaragan Of course clsparse::vector is built over the OpenCL buffer. It was just my suggestion that I would use it because it has more benefits; among many, I think these are the most important:
@jpola If this is the case, then why don't we expose this header to the application, so that the application can start using clsparse::vector? I don't know whether this is currently possible or not. We would have the advantages you mentioned plus the advantage of the user having control over memory allocations. If we can't expose this header, then we don't need this new design; we can stick to the old one and just replace the explicit device memory allocations with clsparse::vector.
Sorry Kiran, I might have misunderstood you. The concept I mentioned before is this: when it comes to the interface functions from the clSPARSE.h header, we keep the C-style signatures and pass raw buffers. However, when you are implementing an interface function, you can create a clsparse::vector from the preinitialized buffer. Concluding, I wanted to suggest that when you are in the definition of the interface function, you wrap the incoming buffers in clsparse::vector and work with that internally.

Kuba
Understood, I missed the point about the C-style interface. So we will stick to the 1-API design plus your suggestions about memory allocations (clsparse::vector). Thanks for clarifying.
@kvaragan It looks like @jpola answered the majority of your questions, but to close this out I think we need to ship Beta 2 with the 1-API version. It's not optimal, but it's easy to document and use. I think a good approach is to try to simplify the guts of the SpM-SpM routine first (starting with the recommendations above); once the current codepaths are simpler (possibly leveraging Boost.Compute), it may be easier to refactor the memory allocations in a cleaner way. In the ideal world (I like Jakub's proposed names), the *Analyze function would 'simulate' what would happen end to end in the SpM-SpM call, and could record/store all required temp buffers needed for the algorithm. The user would then be free to allocate those buffers as they see fit and put the buffer handles into the spmminfo structure to be passed into the *spmm compute routine. Everything included in or included by clSPARSE.h should be C-compatible.
The single API is now split into multiple APIs in order to allow the application to allocate memory rather than the library.
I have written sample code on how to use this API.
Is this the better way? See the code at:
sample file: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/samples/sample-spmspm.cpp
refactored code: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/library/blas3/clsparse-spm-spm.cpp
APIs: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/include/clSPARSE.h