Support for 32-bit platforms broken? #221
Comments
CHOLMOD can use either int32 (as in cholmod_whatever) or int64 (as in cholmod_l_whatever). SPQR only has the int64_t version for its integers. I didn't realize that changing SuiteSparse_long from long to int64_t would break the 32-bit builds. Let me think over what I can do. |
I found this related issue for the 32-bit Octave: https://savannah.gnu.org/bugs/?59833 and I see you know about that :-). CHOLMOD, UMFPACK, AMD, CXSparse, COLAMD, KLU, LDL, etc. all provide both int32 and int64 versions. The problem with SuiteSparse_long is that it's a confusing type ... its size was uncertain (just like "long") and that often causes difficulties. It's much simpler to have integers of known sizes: int32_t and int64_t. I don't want to go back to using SuiteSparse_long. The simplest way to do this would be to allow either 64-bit indices, or 32-bit, but not both, like I had when using SuiteSparse_long in SPQR. I would need to change all calls to cholmod_l_(whatever) to cholmod_(whatever), and I would need to replace all "int64_t" with some internal typedef that would depend on something like -DSPQR32. This would be very easy to do. A better option would be to allow both 32-bit and 64-bit methods to exist side-by-side, via templates, but then the cholmod_sparse object would be difficult to handle, since it's not templatized and is a plain C object. That matrix type has an internal flag with the size of the integer, and I'd need to have the SPQR user call both cholmod_l_start (cc64) and cholmod_start (cc32), say, and then pass in the corresponding Common (cc64 or cc32) to SPQR. Matrices with different integer sizes can't be mixed inside CHOLMOD, but they can both exist at the same time in the user space. This option would take more work since it's a more significant change to make. It might be overkill since I don't think user applications would typically need both 32-bit and 64-bit matrices to exist at the same time in the same application. I'd also like to keep backward compatibility with the existing 64-bit interface to SPQR, since that is the most commonly used form. In the meantime, the 32-bit Octave would need to stick with SuiteSparse v5.13. What do you think? What would make most sense for Octave? |
I'm not certain which bug report you are referring to. Something seems to have gone wrong with the link. I'm looking at this with two slightly different hats on: The other hat is the one of a contributor to MSYS2 which distributes binary packages for Windows in different "build environments". Usually, the versions that are distributed for the different build environments are the same. So either, SuiteSparse would need to be downgraded to version 5.13 for all environments or they could stop shipping SuiteSparse for the 32-bit environments. (That is if it is really that broken on those platforms.) I don't have a complete overview yet. And it might be that I'm misunderstanding the issue. But from a first glance, I'd guess that this is one of the cases why there is I also don't know if the width of Fortran integers comes into play here. Afaict, that is usually 32-bit. But it can be 64-bit on 64-bit platforms. Afaict, most distributions don't ship binaries with 64-bit Fortran integers though (at least not by default). |
Here's the corrected link (I also edited it in my post above): https://savannah.gnu.org/bugs/?59833 intptr_t would be an option, but I'd like to stick with int32_t and int64_t for the packages that support both integer sizes (AMD, COLAMD, KLU, UMFPACK, CHOLMOD, etc). That way, even on a 32-bit platform, the 64-bit integer index would be available. I can keep the SPQR interface nearly unchanged. I would essentially revert to SuiteSparse_long as the single supported integer size. I would call that integer something else, since the integer would be for SPQR only. The integer would be int64_t by default, but would become int32_t if -DSPQR32 is used. SPQR relies on CHOLMOD, and it would call cholmod_l_* methods by default, such as cholmod_l_analyze (the int64_t version). If -DSPQR32 is enabled, it would call cholmod_analyze instead (the int32_t version). This change would be very easy for me to make, and it would not extend or break the current API. This change would require a minor change to the Octave interface. How does this sound? The Fortran integer is a separate issue. I detect that already, and I do a safe typecast into the SUITESPARSE_BLAS_INT in the macros defined in SuiteSparse_config.h. I detect the BLAS integer size in cmake, and configure SuiteSparse_config.h accordingly. That's totally transparent to the user of UMFPACK, CHOLMOD, and SPQR, and has no effect on the choice of integers for the sparse matrix data structures. |
Does the 32-bit Octave use 32-bit integers for the octave_idx_type, and 64 bit integers for the 64-bit build? |
I believe it is doing that already anyway:
By default, yes. However, it is possible to limit the size to 32 bit manually when configuring with |
Perfect. The changes to Octave would then hopefully be minor. This section:
can become
or maybe just the single line
Then if SuiteSparseQR is compiled by default, its integers would be int64_t. That would be the case when octave_idx_type is 64 bit. If compiled with (say) -DSPQR32, then the integer used by SPQR would be int32_t, which would also match the octave_idx_type if Octave is compiled on a 32-bit platform or if --disable-64 is used. |
I can also create a typedef that is visible to the user application in the SPQR include files, say SPQR_int, which would be the integer index type I would be using in all of SPQR. That definition would be configured by the cmake config process to be int64_t (by default) or int32_t (if -DSPQR32 is used in the cmake build). |
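The build-time switch proposed in the comments above could be sketched roughly as follows. This is only an illustration of the idea: SPQR32 and SPQR_int are the names floated in this thread, while SPQR_CHOLMOD, demo_cholmod_analyze, demo_cholmod_l_analyze, and spqr_index_bytes are hypothetical stand-ins invented for the sketch and are not part of any SuiteSparse release.

```c
#include <stdint.h>

/* Hypothetical build-time switch, following the names floated in the thread.
   Compile with -DSPQR32 to select 32-bit indices. */
#ifdef SPQR32
typedef int32_t SPQR_int ;
#define SPQR_CHOLMOD(name) demo_cholmod_ ## name      /* int32_t variant */
#else
typedef int64_t SPQR_int ;
#define SPQR_CHOLMOD(name) demo_cholmod_l_ ## name    /* int64_t variant */
#endif

/* Stand-ins for cholmod_analyze and cholmod_l_analyze (assumption: the real
   functions take arguments; these only report their index width in bytes). */
int demo_cholmod_analyze (void)   { return (int) sizeof (int32_t) ; }
int demo_cholmod_l_analyze (void) { return (int) sizeof (int64_t) ; }

/* Returns the index width, in bytes, of the variant this build dispatches to. */
int spqr_index_bytes (void)
{
    return SPQR_CHOLMOD (analyze) () ;
}
```

With this arrangement the index type and the CHOLMOD variant cannot drift apart, because both are chosen by the same preprocessor branch.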
We'd probably still need to check during configure which version of the SuiteSparse libraries we are dealing with. The newer |
Good point. I would make this SuiteSparse 6.1.0, most likely, and I would bump the major version of SPQR by one as well (from 3.0 to 4.0). Then SuiteSparse 5.13.0 and earlier (with SPQR 2.x and earlier), or 6.1.x or later (with SPQR 4.0 and later), would work with Octave, but not SuiteSparse 6.0.x (with SPQR 3.0.x, which breaks the 32-bit octave). |
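The version gating described in the comment above can be expressed as a small predicate. This is a sketch of the plan as stated at this point in the thread (SPQR 2.x and earlier or 4.0 and later usable by a 32-bit Octave, SPQR 3.0.x not); DEMO_VER_CODE and demo_spqr_ok_for_32bit are hypothetical names, and a real configure check would read the version macros from the installed headers instead.

```c
/* Hypothetical version encoding: assumes the minor version is below 1000. */
#define DEMO_VER_CODE(main, sub) ((main) * 1000 + (sub))

/* Returns 1 if the given SPQR version falls in a range that the thread
   describes as usable by a 32-bit Octave, 0 otherwise. */
int demo_spqr_ok_for_32bit (int main_version, int sub_version)
{
    int v = DEMO_VER_CODE (main_version, sub_version) ;
    return (v < DEMO_VER_CODE (3, 0)) || (v >= DEMO_VER_CODE (4, 0)) ;
}
```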
The alternative would be to add new user-visible functions with a new name space (say SuiteSparseQR32_*) that used int32_t exclusively, and keep the current ones using int64_t all the time. Then the compiled library libspqr.so, or libspqr.dll or whatever, would always have both versions available. I could use templating instead of the namespace, but that would change the user-visible API for the current methods, which only templatize on the Entry type (real or complex). This would take some more work but might be a more robust solution in the long run. It would make the SPQR library consistent with AMD, COLAMD, UMFPACK, CHOLMOD, KLU, etc. |
I might be missing something. But didn't we find out that the version with 64-bit integers doesn't work correctly on 32-bit platforms? Would that version start working for some reason with the changes you are proposing? |
The 64-bit versions are working fine for other packages. The problem here with SPQR is that the 32-bit Octave is expecting it to take int32_t inputs (the old SuiteSparse_long), but SPQR now assumes int64_t. I don't want to use intptr_t because it's hard to reason about (like the old SuiteSparse_long). I want functions that use a fixed integer size, like amd_order using int32_t and amd_l_order using int64_t. It's hard to work with integers that vary in size based on which platform happens to be compiling the code. I want to have control over the integer size, so I'd rather pick int32_t or int64_t explicitly. |
Sorry, I'm still not sure I understand. |
Yes, SuiteSparse_long is now int64_t. It used to be different. I haven't gotten any feedback on anyone building on a 32-bit platform for many years, and I don't have access to one myself. So I hadn't tried it, and that's why my recent change inadvertently broke the 32-bit Octave. Using intptr_t would make SPQR very different from all my other codes. It would be hard to know which version of CHOLMOD SPQR should call (cholmod_analyze or cholmod_l_analyze?). I would rather that the current SPQR functions keep their int64_t integers, and then add new ones via templates or a different name. Then the current SPQR functions would always connect with cholmod_l_whatever, and the new 32-bit SPQR functions would always connect to cholmod_whatever. I'm not sure why some demos are giving incorrect output. Some of the incorrect results are because of how the values are printed with printf. The printfs are using the wrong format. I can fix that. |
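On the printf point: the thread does not show the offending format strings, so the following is only a sketch of one portable way to print an int64_t index, assuming the problem is a specifier such as %ld being used where long is 32 bits wide. The helper name format_index is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Writes the decimal form of a 64-bit index into buf using the PRId64 macro,
   which expands to the correct conversion specifier on every platform.
   Returns the number of characters that the full output requires. */
int format_index (char *buf, size_t len, int64_t k)
{
    return snprintf (buf, len, "%" PRId64, k) ;
}
```

For example, format_index (buf, sizeof (buf), INT64_C (5000000000)) fills buf with the full value even on a platform where long holds only 32 bits.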
So I think my plan is to make SPQR just like CHOLMOD: with both int32_t and int64_t options. Then Octave would pick the methods for SPQR the same way it does for CHOLMOD. This will take more work than making SPQR use a single integer (say intptr_t) but it's more robust in the long term. |
I'm not sure if the issue that we found with
|
MSYS2 now packages the last SuiteSparse 5 version for 32-bit: Other distributions could probably do the same. It might still be nice if SuiteSparse could fix the |
Yes, that's my plan. I plan on adding a 32-bit variant of SPQR (option 1 in your comment above), where all the indices are 32-bit in size. |
I agree that this would be a good solution to make the APIs of the different libraries more similar. I'd still argue that indices must be smaller than |
I've added this issue to a Project for the future SuiteSparse v7.1.0: https://github.com/users/DrTimothyAldenDavis/projects/1 |
Is that project set to private? I'm just getting a 404 error page for that link. |
I just set it to public. |
@DrTimothyAldenDavis - @Wimmerer and I are running into some of these challenges getting SuiteSparse 7 into Julia. Would you recommend waiting until 7.1 is out and the dust on 32-bit issues is settled - especially if APIs will change? Or would you recommend moving forward with 7.0 for now, and that 7.1 will only introduce new interfaces for SPQR 32-bit? |
I also tried cherry-picking this change to SuiteSparse 7.1.0. With it, Octave no longer segfaults during the test suite. But I still see a failing test that used to pass with SuiteSparse 5:
The last message could indicate that some indices or sizes might still be returned with bogus values. But the main blocker -- i.e., the segfault -- is indeed fixed afaict. 🎉 |
Fwiw, the MINGW32 build artifacts that I used for that test are here: |
@svillemot: You wrote that Octave's test suite no longer segfaults for you after you cherry-picked that commit. Do you still see the regression though? Or is that Windows-only? |
Except for that particular Octave test failure, I think this issue is resolved. That would be nice to fix, of course, but I'm not sure it is still an issue with SPQR; hopefully it is now just a matter of a minor update to Octave to use the specific 32-bit or 64-bit version it needs. |
Again: Does that mean your answer to this issue is that the |
No, the spqr interface isn't broken; it now supports both 32-bit and 64-bit integer indices, on both 32-bit and 64-bit platforms. It's just different from what it used to be, with this change of API. So that could be causing the Octave failure, but hopefully a minor update to Octave would be easy to do, on the 32-bit platform. |
Ideally, if I could replicate the Octave issue myself, I might be able to submit a PR to update Octave's interface to SPQR on 32-bit platforms. |
I would appreciate that. |
Would I start with building octave from here: https://github.com/gnu-octave/octave ? I would need to link it against SuiteSparse v7.2.0.beta1, and presumably test it on a 32-bit platform. I have a 32-bit Debian 11 installed in VirtualBox. I would then need to find the test that is currently failing. Looking through the octave source code, I think the best solution would be for SuiteSparseQR to track the same integer as octave_idx_type. That type is either int32_t or int64_t. The SuiteSparse_long type would then be removed and replaced with octave_idx_type. The two uses of CHOLMOD_LONG in sparse-qr.cc would become a ternary: A.itype = sizeof (octave_idx_type) == sizeof (int64_t) ? CHOLMOD_LONG : CHOLMOD_INT ; and the copy of Ap and Ai on lines 250-260 and 294-304 would go away. All calls to SuiteSparseQR would use This would assume that SuiteSparseQR 4.2.0 or later is available, of course, so perhaps an #ifdef would be required to keep the current sparse-qr.cc the same if the version of SuiteSparseQR is < 4.2.0. The function from_suitesparse_long would no longer be needed. Before I try all that, let me try to replicate the test failure so I understand what is currently failing. |
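The ternary proposed above can be isolated into a small, testable function. In this sketch DEMO_ITYPE_INT and DEMO_ITYPE_LONG are hypothetical stand-ins for CHOLMOD_INT and CHOLMOD_LONG from cholmod.h (their values here are arbitrary), and the size argument stands in for sizeof (octave_idx_type).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the CHOLMOD itype constants; real code would include
   cholmod.h and use CHOLMOD_INT and CHOLMOD_LONG directly. */
enum { DEMO_ITYPE_INT = 0, DEMO_ITYPE_LONG = 1 } ;

/* Picks the index type flag that matches the width of the caller's
   index type, given as a size in bytes. */
int demo_select_itype (size_t idx_bytes)
{
    return (idx_bytes == sizeof (int64_t)) ? DEMO_ITYPE_LONG : DEMO_ITYPE_INT ;
}
```

Because the argument is a compile-time constant in practice, a compiler can fold the branch away, so the selection costs nothing at run time.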
I don't know if that is possible without a lot of preprocessor conditionals. We need to keep compatibility with the versions of SuiteSparse that are packaged by the various distributions. For the time being, it might be best to keep using the version that uses
For the reasons mentioned before, that would probably increase the maintenance burden because different code paths for older and newer versions of SuiteSparse would need to be maintained. |
It's a Heisenbug. I was able to reproduce when configuring with Something's not getting initialized correctly? So, the values depend on the memory layout? Or a |
Unsure. I would need to replicate the bug myself. I suppose I need to build octave myself and link in SuiteSparse v7.2.0.beta. What OS should I use to do that? Does this bug only occur for 32-bit OSes? Or does --disable-64 trigger the bug in a 64-bit OS? I'm trying the latest version on the github repo, with "configure --disable-64 ; make " and then I suppose "make install ; make check" ? I'm trying this on a 64-bit OS (Ubuntu 18.04 with gcc/g++ 12.2.0 from spack). That might not be the best place to try it but at least it will show me how to compile my own copy of octave. Then how do I tell it to use my own copy of SuiteSparse v7.2.0beta? |
The "Heisenbug behavior" was with a 64-bit build that was configured with To have the configure script pick up the libraries from a non-default location, you could configure with After a fresh checkout, you'd need to run the bootstrap script once. I.e., the following commands should get you started:
|
Thanks! I'll give it a try and let you know how it goes. |
OK ... making progress. I got octave to build with my beta SuiteSparse 7.2.0, and I see the test failure. I can also replicate it in ./run-octave:
qr(A) seems to handle rank deficient matrices OK, just not matrices that are entirely empty. I have some of those matrices in my standalone tests, outside of octave, and they work fine. Either octave is triggering a different path inside my code and hitting a bug in SPQR, or perhaps there's a glitch in the octave interface to the revised SPQR, or something. Whichever it is, it looks like I should be able to track this down. |
Oh ... I think I see a bug in sparse-qr.cc. Octave is using A->nzmax to determine the # of entries in the matrix. That is not the right value. That's the space allocated for entries in the matrix. For an empty matrix, I think A->nzmax is 1 but Ap [ncol] is zero. Checking this now but that looks suspicious. |
The correct method to ask a matrix for the # of entries is to use cholmod_nnz or cholmod_l_nnz. |
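The difference between allocated space and actual entry count can be shown with a simplified compressed-column matrix. This is a sketch only: demo_sparse and demo_nnz are hypothetical, the struct omits most fields of the real cholmod_sparse, and it assumes a packed matrix, where the last column pointer holds the number of entries. Real code should call cholmod_nnz or cholmod_l_nnz as stated above.

```c
#include <stdint.h>

/* Hypothetical, stripped-down compressed-column matrix. */
typedef struct
{
    int64_t nrow ;
    int64_t ncol ;
    int64_t nzmax ;   /* allocated capacity of i, always at least 1 */
    int64_t *p ;      /* column pointers, size ncol+1 */
    int64_t *i ;      /* row indices, size nzmax */
} demo_sparse ;

/* Number of entries actually stored in a packed matrix. */
int64_t demo_nnz (const demo_sparse *A)
{
    return A->p [A->ncol] ;
}
```

For an empty 3-by-3 matrix with p = {0, 0, 0, 0} and nzmax = 1, demo_nnz returns 0, while code that loops up to nzmax would read one row index that was never written.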
That's the bug, but I'm having a hard time fixing it because I can't find the CHOLMOD Common object to pass in.
The usage A similar bug appears when using CXSparse instead. |
Here's a corrected sparse-qr.cc file. It generates some warnings about typecasting but it fixes the bug. |
I haven't tested the changes I made to the CXSparse interface, since I'm running with SPQR installed. |
Summary: PASS 18292 |
Thank you very much! I pushed your changes with some modifications here: That also includes changes of the same pattern in other files. So, it's been an issue in Octave after all that only showed up with newer versions of SuiteSparse (for whatever reason). |
Based on changes by @DrTimothyAldenDavis on GitHub: DrTimothyAldenDavis/SuiteSparse#221 (comment)
* liboctave/numeric/sparse-qr.cc (rcs2ros, ccs2cos): Add cholmod_common as input argument. Use "cholmod_l_nnz" to get number of non-zero matrix elements.
(sparse_qr<T>::sparse_qr_rep::V): Pass cholmod_common to conversion function. Get correct number of non-zero matrix elements.
(sparse_qr<T>::sparse_qr_rep::R): Use "cholmod_l_nnz" to get number of non-zero matrix elements. Get correct number of non-zero matrix elements.
(sparse_qr<T>::sparse_qr_rep::tall_solve): Use "cholmod_l_nnz" to get number of non-zero matrix elements.
(sparse_qr<T>::min2norm_solve): Pass cholmod_common to conversion function.
* liboctave/numeric/sparse-qr.cc (parse_chol<chol_type>::L): Use "cholmod_[l_]nnz" to get number of non-zero matrix elements.
* liboctave/array/CSparse.cc, dSparse.cc (fsolve): Use "cholmod_[l_]nnz" to get number of non-zero matrix elements.
Glad to help! Yes, my modifications did need some polishing; my typecast was generating a warning, and I only fixed the one file. The bug has been there for a long time, I'm sure. Typically, my nzmax is equal to max(1,nz) for a matrix with nz entries, and sometimes larger. For example: SuiteSparse/CSparse/Source/cs_util.c Line 12 in b65115d
SuiteSparse/CHOLMOD/Core/cholmod_sparse.c Line 107 in b65115d
I do this because As a result, if nnz(A) is zero, A->i still exists and A->nzmax == 1. And that one integer was uninitialized space. Looks like I can close this issue now, right? |
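The allocation convention described above, nzmax equal to max(1,nz), can be sketched as follows. The function demo_alloc_indices is hypothetical and is not the code referenced in cs_util.c or cholmod_sparse.c; it only illustrates why an empty matrix still owns one uninitialized index slot.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates space for the row indices of a matrix with nz entries, using a
   capacity of max (1, nz). Stores the pointer in *Ai and returns the
   capacity, or 0 if the allocation failed. The caller frees *Ai. */
int64_t demo_alloc_indices (int64_t nz, int64_t **Ai)
{
    int64_t nzmax = (nz > 1) ? nz : 1 ;
    /* malloc does not initialize memory, so with nz == 0 the single
       slot holds an indeterminate value that must never be read. */
    *Ai = (int64_t *) malloc ((size_t) nzmax * sizeof (int64_t)) ;
    return (*Ai == NULL) ? 0 : nzmax ;
}
```

A caller that treats the returned capacity as the entry count would copy that indeterminate slot, which is consistent with the layout-dependent values reported earlier in the thread.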
In versions prior to 6.0, there was a data type SuiteSparse_long. That data type was long on all platforms but WIN64 where it was __int64. That means that the data type had a size of 32 bit on 32-bit platforms and a size of 64 bit on 64-bit platforms (Linux and Windows alike).
For version 6.0, that was replaced with int64_t unconditionally. That means the type has now a size of 64 bit on 32-bit and 64-bit platforms.
IIUC, that type is used when indexing into arrays. But those can't be larger than 32 bit on 32-bit platforms.
Would it make sense to replace that type with intptr_t instead of int64_t?
I'm seeing crashes when using SPQR/CHOLMOD with 32-bit Octave on Windows.
Could that be related to that change? Are 32-bit platforms no longer supported by SuiteSparse?
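The size relationships described in this report can be checked on any given platform with a few lines of C. This is a sketch for illustration; report_index_sizes and index_wider_than_pointer are hypothetical helper names, and the printed values depend on the compiler and target.

```c
#include <stdint.h>
#include <stdio.h>

/* Prints the widths of the candidate index types on this platform. */
void report_index_sizes (void)
{
    printf ("long     : %zu bytes\n", sizeof (long)) ;
    printf ("int64_t  : %zu bytes\n", sizeof (int64_t)) ;
    printf ("intptr_t : %zu bytes\n", sizeof (intptr_t)) ;
}

/* Returns 1 if an int64_t index is wider than a pointer-sized integer,
   as is typically the case on a 32-bit target, and 0 otherwise. */
int index_wider_than_pointer (void)
{
    return sizeof (int64_t) > sizeof (intptr_t) ;
}
```

On a typical 64-bit Linux build all three types are 8 bytes wide. On 64-bit Windows long is 4 bytes while the other two are 8, which is why the old SuiteSparse_long used __int64 there.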
The top of a backtrace from a segmentation fault caught with gdb when using 32-bit Octave with the libraries from SuiteSparse 6.0.2 (on WoW64):
The remainder of the backtrace is in functions in Octave.