
Speed up bufr_dupupr.f code and generation of uprair bufr dump files #18

Open
ilianagenkova opened this issue Dec 4, 2023 · 6 comments
@ilianagenkova
Contributor

bufr_dupupr.f was developed by Dennis Keiser, updated by Chris Hill, and added in the obsproc v1.2.0 release.
It is the slowest step in the generation of the uprair bufr_d files and leads to long gdas and gfs dump run times, which are unacceptable to NCO in their current ecf configuration (kick-off times).

Two changes were tested and made to speed up the code:

  • latitude presence check: the number of pressure levels in a profile checked for missing latitudes was reduced from 25 to 5, saving ~150 s of processing time
  • sorting the array of all profiles by receipt time stamp is skipped (sorting by observation time stamp and station id remains)
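The first change can be illustrated with a hedged, self-contained sketch (this is not the actual dupupr.f source; the array names, the BMISS value, and the dummy profile are illustrative):

```fortran
program lat_check_sketch
  implicit none
  ! Illustration only: scan just the first NCHK pressure levels of a
  ! profile for a valid latitude instead of all 25.  BMISS mirrors
  ! BUFRLIB's conventional missing value (10.0e10).
  integer, parameter :: nchk = 5          ! was 25 before the change
  real(8), parameter :: bmiss = 10.0d10
  real(8) :: clat(25)
  integer :: l
  logical :: have_lat

  clat = bmiss            ! dummy profile: all latitudes missing...
  clat(3) = 39.0d0        ! ...except level 3

  have_lat = .false.
  do l = 1, min(nchk, size(clat))
     if (clat(l) < bmiss) then
        have_lat = .true.
        exit
     end if
  end do
  print *, 'latitude found in first', nchk, 'levels:', have_lat
end program lat_check_sketch
```

The time saving comes from the shorter loop being executed once per profile across the whole dump window.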

One more change was considered but not implemented: turning off the CORN mnemonic check (CORN flags whether a profile correction was done by the data provider). While this mnemonic is good to have in the bufr file, and it is present in the prepbufr layout, it is not actually written to the prepbufr file and is not read by GSI. Turning the check off would save ~150 s. This could be done once the next batch of TAC profiles is replaced by BUFR high-resolution profiles, as an easy speed-up.

What else to explore? (scope of this task)

  • Investigate the use of UFBTAB in bufr_dupupr and consider replacing it with UFBTAM or UFBMEM, which are faster (according to J. Woollen) and may speed up the reading/writing process, but may require opening the file for read with a different bufrlib command.
  • Evaluate the ratio of TAC-only / TAC&BUFR / BUFR-only profiles in the tanks and consider turning off the check for TAC&BUFR stations (if that saves us time).
  • Study the code and test replacing the UFBTAB calls; if help is needed, turn to Jack Woollen and Ron McLaren.
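For the UFBTAB item, the in-memory pattern would look roughly like the sketch below. This is a hedged illustration, not working dump code: the unit number, array sizes, file name, and mnemonic string are assumptions, and it must be linked against NCEPLIBS-bufr to run.

```fortran
program ufbmem_sketch
  implicit none
  integer, parameter :: lunit = 20, mxts = 6, mxtb = 500000
  real(8) :: tab(mxts, mxtb)
  integer :: icnt, iret, iunit

  ! File name and unit number are illustrative.
  open (lunit, file='uprair.bufr_d', form='unformatted')

  ! Current pattern: UFBTAB re-scans the whole file on each call, e.g.
  !   call ufbtab(lunit, tab, mxts, mxtb, ntab, 'CLAT CLON YEAR MNTH DAYS HOUR')

  ! In-memory alternative: UFBMEM loads every message into internal
  ! memory arrays once, then UFBTAM tabulates the requested mnemonics
  ! from those in-memory messages.
  call ufbmem(lunit, 0, icnt, iunit)
  call ufbtam(tab, mxts, mxtb, iret, 'CLAT CLON YEAR MNTH DAYS HOUR')
  print *, 'messages in memory:', icnt, '  subsets tabulated:', iret
end program ufbmem_sketch
```

The design trade-off is memory for I/O: UFBMEM pays the file-read cost once, so repeated tabulations over the same tank data avoid re-reading from disk.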

At the current time, we are working with SPAs to start the global dump steps earlier and are evaluating the impact on data loss; this is a temporary solution until we can start using bufr profiles as soon as possible.

@rmclaren

rmclaren commented Dec 6, 2023

I can't find bufr_dupupr.f.

@ilianagenkova
Contributor Author

I should have provided a path: https://github.com/NOAA-EMC/bufr-dump/blob/release/bufr_dump.v1.2.0/sorc/bufr_dupupr.fd/dupupr.f

@rmclaren

rmclaren commented Dec 7, 2023

@ilianagenkova What type of input file does this program take? Just FYI, this code does not seem to be compatible with the gfortran compiler, as the formatting specifier "Q" is non-standard and unsupported. Are there any special compiler flags or procedures to compile on this platform? I guess I should look at the build.sh script.

@rmclaren

rmclaren commented Dec 7, 2023

I think I found my answer in CMakeLists.txt:

# Compiler check.
if(NOT CMAKE_Fortran_COMPILER_ID MATCHES "^(Intel)$")
  message(WARNING "Compiler not officially supported: ${CMAKE_Fortran_COMPILER_ID}")
endif()

@ilianagenkova
Contributor Author

@rmclaren, we cd to /bufr-dump and run ./ush/build.sh

I am happy to test run your code changes, because it's not trivial to run just the bufr_dupupr executable.

@rmclaren

@ilianagenkova So from what I can tell, this code reads and combines a selection of subtypes out of the b002 tank (one of them being xx101) to create the uprair dump file. Basically it reads the data, combines it (ordered by message timestamp), cleans it, then writes the new data file. Is this correct?

Some questions:

  1. Could you list all the subtypes that are being read (xx101... any others?)?
  2. Can you have files from different days if the time window specifies this?
  3. Could you list the things it's doing during the cleaning phase (at a high level)? Is the following complete?
    a) Throws away message subsets with an out-of-bounds WMO block number (CRPID ranges 0 to 99)
    b) Finds files with valid LAT, LON coords (and warns)?
    c) Marks corrected report fields (CORN?)
    d) Sets a missing minutes field to 0
    e) Orders message subsets by timestamp
    f) Removes duplicate message subsets
    g) Trims data to the exact start/end times
  4. What do you get for the timestamps you print out?
  5. Who uses the output file, and what format do they ultimately use (netcdf, bufr, ...)?
