
MpiInt is wrong and should be removed #83

Open

jeffhammond opened this issue Sep 22, 2023 · 3 comments

Comments

@jeffhammond

jeffhammond commented Sep 22, 2023

From #19 (comment):

Furthermore, the MPI C interface is newly assumed to expect the type MpiInt, which is again a macro that by default evaluates to int but can (in hypothetical and very unlikely circumstances) be overridden by the user as well. However, I know of no MPI C library that provides anything other than an LP64 interface, so this freedom will most likely never be used.

Sorry, but this is just wrong. The MPI standard uses int and no other type is conforming. There is no scenario involving a valid implementation of MPI where MpiInt should ever be used.

The MPI standard has MPI_Fint specifically to ensure compatibility with the MPI Fortran library, and only this type should be used for this purpose.

I don't have strong feelings about Int but MpiInt should be removed. Its existence is dangerous and unnecessary.

@haidarazzam

haidarazzam commented Feb 28, 2024

@jeffhammond @albandil @langou
I think there are a few main issues here.

  1. First, let's ignore MPI. The BLACS functions BI_XXX are ScaLAPACK functions and are called from within the Fortran or C routines, so if Fortran and C compile their Int as a 64-bit integer, the arguments to the BI_XXX functions are 64-bit and should therefore be of type Int, not MpiInt, to avoid errors and confusion. For example, at line 20 of dgamx2d_.c we have len[0] = len[1] = N; where len is of type MpiInt while N is of type Int. I believe we should fix this so that both are of type Int, and if MPI provides any 64-bit API, it should be handled inside the BI_XXX function.

  2. Related to the above, I am not sure how compiling ScaLAPACK for 64-bit integers works today, because currently Int is set by the user to a 64-bit integer while MpiInt is hardcoded to int. I guess the compiler does the conversion automatically, but this can be the source of many bugs, in particular if an int64 passed to MPI overflows. I suggest an immediate fix: cast all int64 values to int.

  3. Now let's talk about the MPI issue. To my understanding, all standard MPI installations expose a 32-bit API, so we need to cast, and check for overflow, in all calls to MPI. Casting in the C API is easy, but I am not sure about the Fortran calls to MPI; we may need to replace every Mpval with a new variable that gets cast. The other solution for 64-bit is for users to build their own MPI: Open MPI allows compiling for 64-bit integers, but I don't know about MPICH. There is also https://github.com/jeffhammond/BigMPI. In any case, I don't think users want to compile their own MPI, which can result in a bad configuration and is usually not preferred on supercomputers.

  4. @jeffhammond I am not sure how MPI_Fint could help here?

@jeffhammond

  1. Yes, BLACS can do whatever it wants as long as the C code safely casts to int before calling MPI C API.
  2. It is a good idea to have a macro to cast safely from 64 to 32 bits. NWChem was burned by this at one point when we tried to broadcast more than INT_MAX elements.
  3. BigMPI is just a prototype of MPI-4 large-count support, which isn't implemented in Open MPI yet. For this, one could define MpiInt to MPI_Count but then all the C API symbols would need the _c suffix.
  4. MPI_Fint doesn't matter in this specific situation, but I wanted to mention it anyway because it is used in the C-Fortran interoperability APIs.

@haidarazzam

haidarazzam commented Feb 28, 2024

Suggestion:
I would suggest replacing every MpiInt (originally int) with Int (similar to when int was replaced by Int), so that all integer computation and indices use the same typedef (32- or 64-bit). Then every call to MPI_XXX must be replaced by a wrapper WRAPPER_MPI_XXX that takes Int input arguments, casts them inside after verifying they are below INT_MAX, and then makes the corresponding MPI call.

This way, we gain two things:

  1. First, we can check the return status of MPI (which was never done in BLACS) and fail with BI_BlacsErr in case of error.
  2. Second, we can check whether any value exceeds INT_MAX. For now we can fail as unsupported, and later, if we find that ScaLAPACK needs communication above INT_MAX, we can work out a solution learning from @jeffhammond's work on BigMPI. @jeffhammond, I am happy to discuss the solution provided in BigMPI to learn how you worked around this, so maybe we can add similar functionality in the wrappers. A naive solution that might work would be to create our own datatype (which would work for point-to-point communication), but for collectives we might split the operation (Reduce, Bcast) over chunks that fit in INT_MAX, though that could be complex for asynchronous communication. Anyway, I don't think ScaLAPACK needs communication above INT_MAX, since most of the communication works on NB-sized blocks.

I compiled a list of the MPI functions called from BLACS; it seems about 41 functions need wrappers.
Any suggestions are welcome.

MPI_Barrier
MPI_Error_class
MPI_Finalize
MPI_Abort
MPI_Init
MPI_Initialized

MPI_Allreduce
MPI_Bcast
MPI_Irecv
MPI_Isend
MPI_Recv
MPI_Reduce
MPI_Rsend
MPI_Send
MPI_Sendrecv
MPI_Testall
MPI_Waitall

MPI_Comm_f2c
MPI_Comm_c2f
MPI_Comm_create
MPI_Comm_dup
MPI_Comm_free
MPI_Comm_get_attr
MPI_Comm_group
MPI_Comm_rank
MPI_Comm_size
MPI_Comm_split
MPI_Group_free
MPI_Group_incl

MPI_Unpack
MPI_Pack
MPI_Pack_size
MPI_Get_count
MPI_Op_create
MPI_Op_free
MPI_Type_commit
MPI_Type_create_struct
MPI_Type_free
MPI_Type_indexed
MPI_Type_vector
MPI_Type_match_size
