For nearest-neighbor remapping with equidistant source points, the current logic arbitrarily uses the point with the smallest ID. But according to @oehmke, that tie-breaking isn't done in the multi-processor case, because the IDs aren't currently sent between processors. As a result, nearest-neighbor mapping can give different results with different processor counts when source points are equidistant. @oehmke proposes adding a send of the IDs so that the multi-processor case can break ties by ID, as the single-processor case does.
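To make the tie-breaking idea concrete, here is a minimal Python sketch. It is not ESMF code: the function names, the flat (x, y) distance, and especially the "first partition wins" rule used when IDs are not shared are all assumptions invented for illustration. It only shows why comparing (distance, ID) pairs makes the answer independent of how the source points are split up.

```python
import math

def nearest_serial(dst, src):
    """src: list of (id, x, y). Ties in distance go to the smallest id."""
    return min(src, key=lambda p: (math.dist(dst, p[1:]), p[0]))[0]

def nearest_partitioned(dst, partitions, send_ids):
    """Simulate a multi-processor search: each partition finds a local best,
    then the local results are reduced to one answer."""
    local_best = []
    for part in partitions:
        if not part:
            continue
        pid, x, y = min(part, key=lambda p: (math.dist(dst, p[1:]), p[0]))
        local_best.append((math.dist(dst, (x, y)), pid))
    if send_ids:
        # IDs travel with the distances, so ties are broken by smallest id.
        return min(local_best)[1]
    # Hypothetical stand-in for "no IDs sent": only distances are compared,
    # so the first partition that reaches the minimum distance wins.
    return min(local_best, key=lambda t: t[0])[1]

# Four source points equidistant from the destination point (30.0, 0.0).
src = [(1, 29.75, -0.25), (2, 29.75, 0.25), (3, 30.25, -0.25), (4, 30.25, 0.25)]
dst = (30.0, 0.0)
split_a = [[src[0], src[1]], [src[2], src[3]]]
split_b = [[src[1], src[3]], [src[0], src[2]]]
```

With these inputs, `nearest_partitioned(dst, split_a, send_ids=False)` and `nearest_partitioned(dst, split_b, send_ids=False)` disagree, while both agree with `nearest_serial(dst, src)` once `send_ids=True`.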
In CTSM, we use ESMF to read some input files. One particular pair of input files, specifying crop sowing window start and end dates, is at half-degree resolution. We tell ESMF to do nearest-neighbor[1] spatial interpolation as necessary to match the simulation grid.
When I do a run at 10°x15° resolution, some of the simulation gridcell centers are located exactly at the "corners" of four half-degree input pixels, meaning that those four neighbors are equally near. It doesn't matter to me which of those ESMF chooses as the "nearest neighbor," as long as it's consistent.
Unfortunately, it's not: At least one gridcell has a different "nearest neighbor" chosen depending on how many processors the job is split across.
As an example, I've made a figure based on two cases that are identical in setup except that Case 1 used 128 processors and Case 2 used 64. Due to this issue, a certain crop in the gridcell centered at latitude 0, longitude 30°E[2] gets a sowing window of days 7-82 in Case 1 and days 336-46 (wrapping past the end of the year) in Case 2.
The white/gray/black in this figure represents the half-degree sowing window files. Gray pixels match the values in Case 1, black pixels match Case 2, and white pixels match neither. The red lines intersect at the center of the 10°x15° CTSM gridcell.
It looks like Case 1 reads from the pixel to the southwest, whereas Case 2 reads from the pixel to the northwest.
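The geometry described above can be checked with a small Python sketch. It assumes that input pixel edges fall on integer multiples of the resolution (so half-degree pixel centers sit at ±0.25°, ±0.75°, and so on) and compares positions in flat latitude/longitude space rather than on the sphere. The function name is hypothetical.

```python
def neighbor_pixel_centers(lat, lon, res=0.5):
    """Return the centers (lat, lon) of the input pixels tied for nearest.

    Assumes pixel edges lie on integer multiples of `res`.
    """
    def candidates(coord):
        if coord % res == 0:
            # Exactly on a pixel edge: the pixels on both sides are tied.
            return [coord - res / 2, coord + res / 2]
        # Otherwise the coordinate lies inside exactly one pixel.
        lower_edge = (coord // res) * res
        return [lower_edge + res / 2]

    return [(la, lo) for la in candidates(lat) for lo in candidates(lon)]

# The gridcell center from the example: latitude 0, longitude 30°E.
corners = neighbor_pixel_centers(0.0, 30.0)
```

Under these assumptions, `corners` holds four pixel centers, one in each of the southwest, southeast, northwest, and northeast directions, which is the four-way tie described above. A center that is off the pixel edges in both directions returns a single pixel.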
Some notes:
- I'm not 100% certain this is an ESMF issue as opposed to something weird that CTSM is doing, but I'm at the point where I've done all the troubleshooting I can within CTSM.
- This reproduces every time, over dozens of tests.
- Tagging @ekluzek, @billsacks, and @briandobbins, who have expressed interest in this. By the way, I think I mentioned to y'all that I was having an ERP test pass but the equivalent PEM test fail. This is why! The read of sowing windows only happens at the very beginning of the test, so changing processor count halfway through makes no difference.
[1] It needs to be nearest-neighbor because dates are cyclic (modulo the length of the year): interpolating between Jan. 2 (day 2) and Dec. 31 (day 365) should give Jan. 1 (day 1), not July 2-3 (day [2+365]/2 = 183.5). To my knowledge, that's not something ESMF can do.
[2] There are other crops in this gridcell that also get different sowing windows. There are no crops in any other gridcell that get different sowing windows, but that doesn't necessarily mean the same "nearest" neighbors are being chosen everywhere else. Different neighbors might be chosen there too, just among input pixels whose values don't differ.
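The point in footnote 1 about cyclic dates can be illustrated with a short Python sketch. The function names are hypothetical and a 365-day year is assumed. It contrasts a plain average of two day-of-year values with a midpoint taken along the shorter arc of the calendar circle.

```python
def linear_midpoint(day_a, day_b):
    """Plain average, which ignores the wrap at the end of the year."""
    return (day_a + day_b) / 2

def circular_midpoint(day_a, day_b, year_len=365):
    """Midpoint along the shorter arc between two days of year (1..year_len)."""
    diff = (day_b - day_a) % year_len
    if diff > year_len / 2:
        diff -= year_len          # go the short way around the year
    mid = (day_a + diff / 2) % year_len
    return mid if mid != 0 else float(year_len)  # report day 0 as the last day

naive = linear_midpoint(2, 365)      # 183.5, i.e. midsummer
wrapped = circular_midpoint(2, 365)  # 1.0, i.e. Jan. 1
```

Nearest-neighbor sidesteps the problem entirely, because it copies one input value instead of averaging.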
Following up: Is this something that's on the roadmap to be in the ESMF version used in the CESM3 release? No worries if not, but in that case I'll need to make some of my tooling more robust and official.
Yep, it's on the roadmap for ESMF 8.8.0, which is what we're targeting for CESM3. I'm hoping to get it done soon-ish, so we can make sure it has been working for a while before the release.
Discussed in https://github.com/orgs/esmf-org/discussions/261
Originally posted by samsrabin July 10, 2024
Requirements
Affiliation(s)
NSF-NCAR
ESMF Version
No response
Autotag
@oehmke