Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize test data and reduce unnecessary copying of test data #147

Merged
merged 8 commits into from
Sep 10, 2024

Conversation

guoqing-noaa
Copy link
Collaborator

@guoqing-noaa guoqing-noaa commented Sep 5, 2024

This PR is the first step to centralize test data and reduce unnecessary copying.
It addresses issue #93

As a byproduct, it also addresses issue #127

All these will reduce the unnecessary turnaround time to speed up the build process and save disk space.

@guoqing-noaa guoqing-noaa marked this pull request as draft September 5, 2024 06:28
@guoqing-noaa guoqing-noaa marked this pull request as ready for review September 5, 2024 06:38
bundle/CMakeLists.txt Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
@SamuelDegelia-NOAA
Copy link
Contributor

SamuelDegelia-NOAA commented Sep 5, 2024

Could you give a quick summary of how the test data gets installed now? It can be hard to understand just by looking through the code changes. Since you removed the RDAS_RRFS_DATA_ROOT variables from the module files, does that mean that all test data is being linked into RDASApp through ush/init.sh? If so, does that mean we need to run this script during build.sh?

Also, @delippi has made a good argument for using copies instead of links during this early development process of RDASApp. For example, I've found some cases where I needed to manually modify obs files to determine why something was breaking. This is a little harder when using links for everything. Just giving my two cents (and open to other opinions), but I perfer if we first copy the test data into RDASApp/bundle during the initial build, but then utilize links for the ctests (since those directories will likely not be accessed manually and since we need to make multiple copies for each ctest).

@guoqing-noaa guoqing-noaa changed the title Reduce copying of test data Centralize test data and reduce unnecessary copying of test data Sep 5, 2024
@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 5, 2024

@SamuelDegelia-NOAA I will test this PR on all platforms today and get back to you with more information.
To answer some of the questions:

  1. init.sh is already in build.sh (implicitly through link_mpasjedi_expr.sh). We can make it explicit if you want to do this. Yes, we need to run init.sh in the build step.
  2. test data are already in the repo through links:
 rrfs-data_mpasjedi_2024052700 -> ../../expr/mpas_2024052700/
 rrfs-data_fv3jedi_2022052619 -> ../../fix/expr_data/rrfs-data_fv3jedi_2022052619/

The link_mpasjedi_expr.sh will create the expr/ directory and init.sh will initialize the fix directory (both through build.sh)

  1. For the developers' need to modify some files, there are lots of options to use, such as:
cp -rL rrfs-data_fv3jedi_2022052619 tmp
rm rrfs-data_fv3jedi_2022052619
mv tmp rrfs-data_fv3jedi_2022052619

If you can post specific cases, we may come up with simpler direct solutions.
Thanks!

@SamuelDegelia-NOAA
Copy link
Contributor

Ah that is right, init.sh is already part of link_mpasjedi_expr.sh so that is how everything gets linked in. Regarding #​2, it is true that you can convert the link to a copy relatively easily. Maybe we could add those commands to the wiki somewhere since at least myself and maybe others would plan to use them frequently.

@ShunLiu-NOAA
Copy link

@guoqing-noaa and @TingLei-NOAA Based on the meeting on Thursday, what's the next step for this PR?

@guoqing-noaa
Copy link
Collaborator Author

@guoqing-noaa and @TingLei-NOAA Based on the meeting on Thursday, what's the next step for this PR?

I am working on testing this on Hera/Jet/Orion/Hercules and making sure all rrfs tests as well as all JCSDA tests pass.

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 8, 2024

For reference, here are the 57 MPASJEDI test coming from JCSDA:
(under build/mpas-jedi, run ctest -N)

  Test #1426: mpasjedi_coding_norms                                                                                                                                         
  Test #1427: test_mpasjedi_geometry
  Test #1428: test_mpasjedi_state
  Test #1429: test_mpasjedi_model
  Test #1430: test_mpasjedi_increment
  Test #1431: test_mpasjedi_errorcovariance
  Test #1432: test_mpasjedi_linvarcha
  Test #1433: test_mpasjedi_unsinterp_4pe
  Test #1434: test_mpasjedi_geometry_iterator_2d_2pe
  Test #1435: test_mpasjedi_geometry_iterator_3d_2pe
  Test #1436: test_mpasjedi_getvalues
  Test #1437: test_mpasjedi_obslocalization
  Test #1438: test_mpasjedi_obslocalization_vertical
  Test #1439: test_mpasjedi_obslocalizations
  Test #1440: test_mpasjedi_forecast
  Test #1441: test_mpasjedi_hofx3d
  Test #1442: test_mpasjedi_hofx3d_nbam
  Test #1443: test_mpasjedi_hofx4d
  Test #1444: test_mpasjedi_hofx4d_pseudo
  Test #1445: test_mpasjedi_convertstate_bumpinterp
  Test #1446: test_mpasjedi_convertstate_unsinterp
  Test #1447: test_mpasjedi_converttostructuredgrid_latlon
  Test #1448: test_mpasjedi_parameters_bumpcov
  Test #1449: test_mpasjedi_parameters_bumploc
  Test #1450: test_mpasjedi_gen_ens_pert_B
  Test #1451: test_mpasjedi_dirac_bumpcov
  Test #1452: test_mpasjedi_dirac_bumploc
  Test #1453: test_mpasjedi_dirac_noloc
  Test #1454: test_mpasjedi_dirac_spectral_1
  Test #1455: test_mpasjedi_3dvar
  Test #1456: test_mpasjedi_3dvar_bumpcov
  Test #1457: test_mpasjedi_3dvar_bumpcov_nbam
  Test #1458: test_mpasjedi_3denvar_bumploc
  Test #1459: test_mpasjedi_3denvar_dual_resolution
  Test #1460: test_mpasjedi_3denvar_2stream_bumploc
  Test #1461: test_mpasjedi_3denvar_amsua_allsky
  Test #1462: test_mpasjedi_3denvar_amsua_bc
  Test #1463: test_mpasjedi_3dhybrid_bumpcov_bumploc
  Test #1464: test_mpasjedi_3dfgat
  Test #1465: test_mpasjedi_3dfgat_pseudo
  Test #1466: test_mpasjedi_4denvar_ID
  Test #1467: test_mpasjedi_4denvar_VarBC
  Test #1468: test_mpasjedi_4denvar_VarBC_nonpar
  Test #1469: test_mpasjedi_4denvar_bumploc
  Test #1470: test_mpasjedi_4dhybrid_bumpcov_bumploc                                                                                                                        
  Test #1471: test_mpasjedi_4dfgat
  Test #1472: test_mpasjedi_eda_3dhybrid
  Test #1473: test_mpasjedi_rtpp
  Test #1474: test_mpasjedi_ens_mean_variance
  Test #1475: test_mpasjedi_letkf_3dloc_4pe
  Test #1476: test_mpasjedi_lgetkf_4pe
  Test #1477: test_mpasjedi_lgetkf_height_vloc
  Test #1478: test_mpasjedi_forecast_2pe
  Test #1479: test_mpasjedi_parameters_bumpcov_2pe
  Test #1480: test_mpasjedi_parameters_bumploc_2pe
  Test #1481: test_mpasjedi_3dvar_2pe
  Test #1482: test_mpasjedi_3dhybrid_bumpcov_bumploc_2pe

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 9, 2024

one update on this:
MPASJEDI tests will not write any files into the mpas-jedi-data/ directory, but they check whether some directories are writable. If not writable, the test will fail as follows:

ERROR: In file ./Data/480km/streams.atmosphere, definition of stream "restart" references directory ./Data/480km/bg without write permission.
CRITICAL ERROR: xml stream parser failed: ./Data/480km/streams.atmosphere
Logging complete.  Closing file at 2024/09/08 17:35:14

This was fixed by creating directories as needed and then linking static files to each corresponding directory.
Now 52 tests passed with 5 failed:

The following tests FAILED:
        1461 - test_mpasjedi_3denvar_amsua_allsky (Failed)
        1462 - test_mpasjedi_3denvar_amsua_bc (Failed)
        1467 - test_mpasjedi_4denvar_VarBC (Failed)
        1468 - test_mpasjedi_4denvar_VarBC_nonpar (Failed)
        1477 - test_mpasjedi_lgetkf_height_vloc (Failed)

Working further on the 5 tests failed.

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 9, 2024

okay, per my manual testing, the above 5 tests also failed in the current RDASApp. so these 5 failed tests are most likely NOT due to this PR.

I'm now making a fresh clone of the current RDASApp to double-check this. If confirmed, I will create an issue for this.

@guoqing-noaa
Copy link
Collaborator Author

As reported in issue #158, the mentioned 5 failed tests are not due to any changes in this PR at the moment.

    track the jcsda data through fix/ and cleanup env variables etc who are no longer needed
@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 9, 2024

@ShunLiu-NOAA @TingLei-NOAA @TingLei-NOAA @SamuelDegelia-NOAA

This PR is ready for another round of review. All rrfs tests passed and all "currently-must-pass" mpasjedi tests passed.
The failed 5 mpasjedi tests are not due to this PR but they failed with the current RDASApp already.

A few upcoming PRs depend on this, such as using mpasout files, updating to mpasjedi v3.0, etc. I think we can merge this PR while fixing those 5 ctest issues later (for example, @SamuelDegelia-NOAA reported in PR #158 that we will need to wait for a https://github.com/JCSDA-internal/saber/pull/928 PR to resolve some of the failures).

@TingLei-NOAA
Copy link
Contributor

@guoqing-noaa Please allow me wait for all the mpas-jedi ctests pass.

@ShunLiu-NOAA @TingLei-NOAA @TingLei-NOAA @SamuelDegelia-NOAA

This PR is ready for another round of review. All rrfs tests passed and all "currently-must-pass" mpasjedi tests passed. The failed 5 mpasjedi tests are not due to this PR but they failed with the current RDASApp already.

A few upcoming PRs depend on this, such as using mpasout files, updating to mpasjedi v3.0, etc. I think we can merge this PR while fixing those 5 ctest issues later (for example, @SamuelDegelia-NOAA reported in PR #158 that we will need to wait for a https://github.com/JCSDA-internal/saber/pull/928 PR to resolve some of the failures).

@quoging, could you use ctest -VV option to run those failed mpas-jedi tests and point me to the ctest output?

@guoqing-noaa
Copy link
Collaborator Author

@TingLei-NOAA
Here is my testing log file:

/scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_guoqing-noaa_reduce_copying2/build/mpas-jedi/Testing/Temporary/LastTest.log

But running those tests from your directory may facilitate debugging the problem.
You can find a directory on Hera, copy and run the following commands, wait for an hour or so and then it is ready for your further examination:

git clone https://github.com/rrfsx/rdas_build_test.git
cd rdas_build_test
./build_test.sh guoqing-noaa reduce_copying fv3-cam  # fv3-cam can be replaced by da-cpu

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 9, 2024

For reference, here is the command to run the 5 failed mpasjedi tests

ctest -VV -R "test_mpasjedi_3denvar_amsua_allsky|test_mpasjedi_3denvar_amsua_bc|test_mpasjedi_4denvar_VarBC|test_mpasjedi_4denvar_VarBC_nonpar|test_mpasjedi_lgetkf_height_vloc"

@TingLei-NOAA
Copy link
Contributor

TingLei-NOAA commented Sep 9, 2024

For reference, here is the command to run the 5 failed mpasjedi tests

ctest -R "test_mpasjedi_3denvar_amsua_allsky|test_mpasjedi_3denvar_amsua_bc|test_mpasjedi_4denvar_VarBC|test_mpasjedi_4denvar_VarBC_nonpar|test_mpasjedi_lgetkf_height_vloc"

@guoqing-noaa Thanks for sharing the command, however, would you please point to your output? If you haven't run using -VV option, could you do that and then point me to the output? I need to see the output from the ctest command.
The ctest log files don't have enough information I want to check.

@guoqing-noaa
Copy link
Collaborator Author

Sorry, I forgot the `-VV' option. I just rerun it. It should be available in a few minutes at the same location:

/scratch1/BMC/wrfruc/gge/tmp/rdas_build_test/RDASApp_guoqing-noaa_reduce_copying2/build/mpas-jedi/Testing/Temporary/LastTest.log

@guoqing-noaa
Copy link
Collaborator Author

@TingLei-NOAA 5 tests were completed.

@TingLei-NOAA
Copy link
Contributor

@guoqing-noaa Thanks. Yes. those output seem good for this PR. I checked the source files to which the symbolic links of those input data point to are in your own account, (not checked if all of them are). So, I will do a test on my side and let you know how they go.

@guoqing-noaa
Copy link
Collaborator Author

I will go ahead to merge this PR soon to facilitate subsequent PRs and we can continue working on the failed 5 tests which may take a few weeks to get finally resolved.

@TingLei-NOAA
Copy link
Contributor

@guoqing-noaa I am not concerned for those failed 5 ctests on your account. Are there other users using this PR and have no issues with mpas-jedi tests for writing permissions? I would suggest to give more time to the reviewing process.

@guoqing-noaa
Copy link
Collaborator Author

@TingLei-NOAA As far as I know, there is no writing permission issue here. If you experience that, could you send me your run directory and I can take a look.

This PR does not change any codes, but just centralize and reduce the unnecessary data copying. All rrfs tests passed and all mpasjedi tests passed except those already failed in current RDASApp.

@TingLei-NOAA
Copy link
Contributor

@guoqing-noaa As I mentioned, we had writing permission issues related to mpas-jedi ctests. Using your branch just cleanly cloned/building, all mpas-jedi ctests passed. But the problem is that: in this PR, on hera: all data are stored in the role account :/ /scratch1/NCEPDEV/fv3-cam/RDAS_DATA/fix/ . So, the success of this PR on hera run by myself or you doesn't show it would work for users who has no permission to the above location.
To run this PR on orion, using the same commands as you gave , all the ctests are missing as :

 tlei$ ctest -N
Test project /work/noaa/fv3-cam/tlei/dr-guoqing/rdas_build_test/RDASApp_guoqing-noaa_reduce_copying/build

Total Tests: 0

I am trying to find what the problem is.

@guoqing-noaa
Copy link
Collaborator Author

guoqing-noaa commented Sep 10, 2024

@TingLei-NOAA The fix file directories are globally readable. They are NOT writable by anybody except the role.rrfsfix role account. The fix file directory on Hera right now is just a link to /scratch2/BMC/rtrr/RDAS_DATA

We created all required mpas-jedi-data directories and subdirectories in the RDASApp repo. So the mpasjedi tests actually "write" to users' local directories, not the fix file directories

I cannot access your Orion directory as permission denied. Could you change it to be readable for me?

Also, I would NOT suggest you do tests on Orion as there are building complications reported by @SamuelDegelia-NOAA .
If you can access Hercules, could you do a test there? Or Jet? Thanks!

@guoqing-noaa
Copy link
Collaborator Author

I am merging this PR and hope issue #163 will help resolve the 5 failed mpas-jedi tests in current RDASApp.

@guoqing-noaa guoqing-noaa merged commit d8c20b7 into NOAA-EMC:develop Sep 10, 2024
1 check passed
@guoqing-noaa guoqing-noaa deleted the reduce_copying branch September 10, 2024 18:53
@guoqing-noaa guoqing-noaa restored the reduce_copying branch September 10, 2024 19:02
@TingLei-NOAA
Copy link
Contributor

A update: the PR works on orion except for a couple of mpasjedi 's failure for non-prepoducible results which don't matter to this PR.

@guoqing-noaa
Copy link
Collaborator Author

A update: the PR works on orion except for a couple of mpasjedi 's failure for non-prepoducible results which don't matter to this PR.

@TingLei-NOAA Thanks a lot for doing the test on Orion and reporting the results.

@guoqing-noaa guoqing-noaa deleted the reduce_copying branch October 15, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants