[Feature suggestion] OpenMP Parallelization #1
Hi, @sthartman |
I've just updated. I've tested bands_wt_layer and checked that it reproduces the serial results to within rounding error. I don't remember whether I ever got the bands_wt_all subroutine working correctly in parallel, since I didn't need it for my particular case. It looks like I tried to parallelize it, and it shouldn't be much different from the other one. Unfortunately I don't think I have my input files any more, or I'd test its accuracy myself. |
@sthartman Hi, Steven, I am sorry for the delayed reply. I had almost forgotten about this repository. Many thanks for your work on the program. Have you already updated the subroutine? In fact, my student recently worked out interfaces to the LCAO code ABACUS and to Phonopy. In addition, he replaced the FFT part with the one from the Intel library, which also speeds up the calculations significantly. We will update the program soon. |
The bands_wt_layer in mod_kproj.f of my fork definitely works. I think bands_wt_all also works, but it's been 4 months since I did it, so I don't remember the details. |
Hi, @sthartman . |
Hi, @sthartman ! |
That's interesting. It's possible that the segfault is a memory problem: the big array is shared between threads, but some of the other variables are duplicated. What value of OMP_NUM_THREADS are you using? |
@sthartman |
By the test you're asking about, do you mean putting the !$OMP PARALLEL in? I can try to do that later today, if I can find any of my input files. |
Yes, it is. |
Going back and doing some tests, it seems that I have to put export OMP_STACKSIZE=1024m in my batch script to set the environment variable, or else it segfaults. The default OMP_STACKSIZE is very small, only a few MB. Is there a chance that it's overflowing the default stack when you run the larger 6x6 graphene example? The system I'm testing is a big heterointerface supercell with INKPROJ: I tried adding an !$OMP PARALLEL on the line before my !$OMP PARALLEL DO, but the compiler (ifort) threw an error. I don't think it liked the syntax of having two directives that begin a parallel section but only one !$OMP END PARALLEL DO. I also tried adding another !$OMP END PARALLEL after the !$OMP END PARALLEL DO, but it errored with an abort trap signal. I think it considers that to be nested parallelism, but with the wrong syntax. |
It ran faster when I used "ulimit -s unlimited" and "export OMP_STACKSIZE=1024m". As you said, it's a memory problem. But users usually won't set these on their own initiative, so I will try to find a better solution. |
I tried moving "fft_allocate" and "allocate(q_tmp)" into the parallel region, and adding "deallocate(q_pw,q_tmp)" there, to avoid the memory problem. This solves the problem, but it slows the parallel run by 30%, because large arrays are allocated and deallocated repeatedly in every thread. |
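For reference, the two settings discussed above could be collected at the top of a job script along these lines. This is only a sketch: the thread count default and the commented-out executable name `./kproj` are assumptions for illustration, not taken from the repository.

```shell
#!/bin/bash
# Sketch of the runtime settings mentioned in this thread.

# Raise the main thread's stack limit; this can fail if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit; keeping $(ulimit -s)"

# Per-thread stack for OpenMP worker threads (value used in this thread).
export OMP_STACKSIZE=1024m

# Number of OpenMP threads; the fallback of 4 is an arbitrary placeholder.
export OMP_NUM_THREADS="${OMP_NUM_THREADS:-4}"

echo "OMP_STACKSIZE=${OMP_STACKSIZE} OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# ./kproj    # hypothetical executable name; adjust to your build
```

Note that `ulimit -s` only affects the initial thread, while `OMP_STACKSIZE` applies to the threads created by the OpenMP runtime, which is likely why both come up here.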
That makes sense. The default stack size is around 4 MB, I think, so it will be very hard to contain the thread within that size. What are the dimensions of q_pw? I couldn't tell from looking at the fft_allocate subroutine. If the concern is only that less experienced users will not know to set OMP_STACKSIZE correctly, it might be easier just to leave the standard compiler options in the makefile, with a commented-out alternate line that also has the -qopenmp option. If kproj is compiled without the -qopenmp option, the compiler should ignore the OpenMP directives in mod_kproj.f and compile serial code. That way, a user will only get the parallel version if they know what they are doing and choose it on purpose. |
The size of q_pw is usually about 100 * 100 * 100 * nspinor, which is related to the energy cutoff. I have added "-qopenmp" in the makefile.
We can estimate the size of q_pw and then set the corresponding OMP_STACKSIZE in the makefile. Do you mean this?
|
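As a rough check of the sizes quoted above, the arithmetic can be sketched as follows. It assumes each element of q_pw is a double-precision complex number (16 bytes), which is a guess and not confirmed anywhere in this thread.

```shell
#!/bin/bash
# Rough size estimate for an array of 100 * 100 * 100 * nspinor elements.
nx=100; ny=100; nz=100
nspinor=1            # use 2 for a spinor calculation
bytes_per_elem=16    # ASSUMPTION: double-precision complex elements

bytes=$(( nx * ny * nz * nspinor * bytes_per_elem ))
mib=$(( bytes / 1048576 ))   # integer division, rounds down

echo "q_pw estimate: ${bytes} bytes (about ${mib} MiB)"
```

Under that assumption the array is about 15 MiB for nspinor = 1, so any per-thread copy that lands on the stack would already exceed a default of a few MB.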
I don't know if it's possible to set environment variables in the makefile, but if it is, that would be the simplest way to do it. Normally I set it as a runtime environment variable in my SLURM batch script, and I just set it to some large value like 1024m. As far as I know, there's no drawback to setting a large value as long as OMP_STACKSIZE x OMP_NUM_THREADS doesn't exceed the computer's memory. |
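The OMP_STACKSIZE x OMP_NUM_THREADS rule of thumb can be checked with a few lines of shell. The values below (1024 MB and 36 threads) are the ones mentioned in this thread; reading /proc/meminfo is Linux-specific and is skipped if the file is not available.

```shell
#!/bin/bash
# Upper bound on memory reserved for OpenMP thread stacks.
stack_mb=1024    # OMP_STACKSIZE=1024m
threads=36       # OMP_NUM_THREADS on a 36-core node

total_mb=$(( stack_mb * threads ))
echo "Thread stacks, worst case: ${total_mb} MB"

# Compare against physical memory where /proc/meminfo exists (Linux only).
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_mb=$(( mem_kb / 1024 ))
  if [ "$total_mb" -gt "$mem_mb" ]; then
    echo "warning: ${total_mb} MB of stack exceeds ${mem_mb} MB of RAM"
  else
    echo "ok: ${total_mb} MB of stack fits in ${mem_mb} MB of RAM"
  fi
fi
```

This is a worst-case figure: on Linux, stack pages are normally committed only when they are touched, so actual usage is usually well below it.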
Hello,
I've been working on parallelizing the KPROJ code using OpenMP, and I have a minimal working version now. I've gotten around a 10x speedup by running on all 36 cores of a compute node. Would you be interested in merging that once I get all the details sorted out and pushed to my fork?