-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable multiple resolutions - first step #151
base: develop
Are you sure you want to change the base?
Conversation
Present table in my sample program (currently "ectrans-benchmark", to be removed again before merge) after trans_release and before trans_end: host:0x73d2840 device:0x331a00000 size:1180032 presentcount:0+1 line:46 name:alloc%ptr(:) and after trans_end |
892fe0d
to
05790e7
Compare
Nice work @lukasm91! This will probably take a while to review properly. In the mean time, has something gone wrong with a rebase? I see duplicated commits here from develop. |
I think it is because I was merging with master. Do you prefer rebases? If yes, I might restructure the commits into three new commits or so. And yes, there are conflicts after the merge today, but I can easily resolve those. |
Personally I have no preference of rebasing over merging, as long as commits aren't duplicated. It looks like that's the case, but perhaps I just don't understand how Github presents merging. |
0448fda
to
8debe62
Compare
Good question, I am not sure what it was showing; maybe it was because I was starting from Willems branch (which was merged in the meantime), and then I merged develop later. In any case, I quickly restructured the commits into three commits (from which the last commit we will remove before merge; this is just for show-casing the usage). It is still correct that Willems transi branch can just be merged into this and it will work (except vordivtouv/uvtovordiv, which will we have to fix separately). |
I do prefer rebases yes, and a cleaned up history :) |
eb466a9
to
7ed5169
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge first step! It looks really nicely done!
Question on the ASSOCIATE statements. Are they required for OpenACC to work or is it to introduce minimal changes?
The ectrans-benchmark looks a little bit too inflated to my liking, though I understand the reasoning. Could we not create a smaller test that just tests that the resolutions are properly deallocated or uniquely using their own memory?
Yes, I will revert the ectrans-benchmark. I just did that to test and show how it could be used. I think we will have to test it properly, through the transi tests, so that will happen. But I wanted to have you a look at how this can now be used. The ASSOCIATE at this point are really to keep the changes to a minimum. In a second step, we can remove then, and we should then also check if they have a performance impact. |
What would be the performance cost of having arrays like ZAA in JPRD only, and reusable for single and double precision? |
The GEMMs can become very expensive, in terms of runtime but also memory footprint.Storage in the order of tenths of GB. I doubt that we want to store them in double precision on CPU, but even less on GPU. I think those arrays have to be stored with as little precision as possible. |
So then opposite approach, could it be in JPRM and still give desired results for double precision? |
For the legendre coefficients, this could indeed be worth a test. We store one wave number in double precision for a reason, but I don't know if a "mixed precision" gemm would be sufficient in that case, too (Andreas or Sam might know the details). We do the single precision GEMM at a even lower precision, so who knows, it could be sufficient. For sure not something we should do in this PR, but I can only encourage experiments of doing certain things in lower precision. |
I am happy to hear you think it's an avenue worth exploring. That's what I wanted to get to. This is of course out of scope of this PR. |
I think the question being asked is: what would be the implications of doing DGEMM for m=0 and SGEMM for m>0 in the dp version of ecTrans, like is already done in the sp version? Is that right? |
7ed5169
to
3cde5bc
Compare
This reverts commit 3cde5bc.
https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf section 2.14.1 (p62) is about !$acc init If CrayFortran does not support this, then probably following guard should be used? |
Yeah that should do the trick. |
According to |
If it works, then yes :) |
? |
[ 97%] Building Fortran object ectrans/src/programs/CMakeFiles/ectrans-benchmark-gpu-dp.dir/ectrans-benchmark.F90.o Global is external, but doesn't have external or weak linkage! ptr @"str2int$ectrans_benchmark_" Global is external, but doesn't have external or weak linkage! ptr @"print_help$ectrans_benchmark_" LLVM ERROR: Broken module found, compilation aborted! ftn-2116 ftn: INTERNAL
This sounds weird that it only shows up now, but we ICE can be caused as surprizing side effects ... I moved the code now. Same in the other program benchmark? |
Following changes have been done now:
One general suggestion: If you have more comments, even if they apply to multiple places, can you just add a comment in one case, then I can reply one-by-one and we have threads to discuss. This might be easier. |
216597a
to
1c5f52e
Compare
Unfortunately this branch (with the tweaks I just suggested) doesn't run on MI250x (LUMI-G) so we'll have to rectify that before we can merge this. Firstly I see many instances of this warning:
and eventually the benchmark program crashes with |
I will rebase, yes. But regarding the problem with Cray Compilers - isn't this just a warning, could we put ASYNC into ifdefs for the problematic instances? I can't reallz help with the parse error, but I don't think this can be related to derived types? Have you checked if it is actually using the not inplace FFT, because I think this caused the same error? |
Yeah, I'll take some time this week to look closer and see what's going on. |
And LUMI just announced a 4-day service break :) |
… reverted once ecmwf-ifs#151 is merged
Sorry, I'm not making any progress understanding the issue on Cray systems. I'm using Frontier to debug this. If I build ecTrans and FIAT "manually" through CMake invocations etc., everything seems to work. If I build both using an ecBundle-based system, as I usually do, I get the warning mentioned above and either this error message:
OR it does "work" but gives spectral error norms of ~ 1.0, which looks like at some point the spectral fields are just being zeroed. I've seen both these behaviours while debugging but I don't understand fully the conditions under which each one occurs. Annoying. |
This PR is a first step towards enabling running multiple resolutions (required for transi).
trans_end
/trans_release
.ASSOCIATE
to make changes less intrusive, we can clean the associates up in a second step)ncur_resol
to fft and gemms to be able to cache multiple plansWe have to revert the ectrans-benchmark program before merge.