
Instructions for using Dan Grosvenor's post processing scripts

dangrosvenor edited this page Apr 28, 2023 · 6 revisions

Git stuff

You can copy the git repository (or clone it, so that you can get updates or make changes) to Monsoon, Jasmin, or both; it should work on either. Which one to use depends on where your .pp files are.

Below are some instructions (from Ruth Price) for how to deal with Git on Monsoon and Jasmin. I think you need to set up an ssh key to make commits, etc. (you may already have one for Monsoon and/or Jasmin, in which case you can use that). Then I think you would do :-

git clone git@github.com:dangrosvenor/Python_plotting_scripts.git

Or to do it without ssh keys you can do :-

git clone https://github.com/dangrosvenor/Python_plotting_scripts.git

Instructions for setting up Git on Jasmin and Monsoon from Ruth Price

I link them using the GitHub website. Go to the website and navigate to the page to create a new repository (assuming you have an account; if not, you'll need to get one first). Give the repository the same name as the one you already have on Jasmin, make it private if you don't want anyone else to be able to see it, and then press create. The next page has instructions for how to link your new GitHub repository to the one you have on Jasmin ("push an existing repository from the command line").

You might have already found this out if you're already using git on Jasmin, but to push/pull commits to the repository from Jasmin you'll need to use GitHub's SSH method instead of the HTTPS method. Instructions here:

https://help.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh

and I have this working too so if you have any problems then I can try and help.

To get the repository onto Monsoon (apologies if you know this already):

On Monsoon, cd to where you want the directory to be, then run git clone <address>, where <address> is found by looking at your repository on the GitHub website and using "Clone or download". Make sure you use the SSH option.

That should be all you need to do! I've tried to keep this brief but feel free to come back to me if anything needs explaining more or doesn't work.

Ruth

Instructions for running Dan's post-processing scripts

Clone the git plotting directory (see above) somewhere, ideally to ~/py/plotting. If you put it somewhere else, you need to set its location as the py_dir variable in :- DRIVER_plot_contours_multi_with_timser.py
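If you clone somewhere other than ~/py/plotting, the change in the driver might look like the sketch below. It assumes py_dir is a plain path string; the sys.path lines are an illustrative addition in case the driver's helper modules need to be importable, so check how the driver actually uses py_dir before copying this.

```python
import os
import sys

# Location of the cloned Python_plotting_scripts repository.
# ASSUMPTION: py_dir is a plain path string; change it to wherever you cloned the repo.
py_dir = os.path.expanduser('~/py/plotting')

# Illustrative only: make modules in the clone importable if the driver needs that.
if py_dir not in sys.path:
    sys.path.insert(0, py_dir)
```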

You tell it what variables to extract in a file called :-

  • NML_set_vars_XXX.py

And what runs to process using :-

  • NML_set_runs_XXX.py

You can choose what XXX is, so that you can have different scripts for different projects. I’ve set up some test ones where XXX = “_Hawaii”. These work on some of my data to extract the SW TOA fluxes as a NetCDF file.
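The suffix is what ties the two namelist files together. As a sketch, assuming the driver builds the file names by inserting the suffix given on the command line (nml_filenames is a hypothetical helper, not a function in the repository):

```python
def nml_filenames(suffix):
    """Return the (vars, runs) namelist file names for a given suffix.

    Hypothetical helper. ASSUMPTION: the names are built as
    'NML_set_vars' + suffix + '.py' and 'NML_set_runs' + suffix + '.py'.
    """
    return ('NML_set_vars' + suffix + '.py',
            'NML_set_runs' + suffix + '.py')


# Usage: the suffix '_XXX' selects NML_set_vars_XXX.py and NML_set_runs_XXX.py
print(nml_filenames('_XXX'))
```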

In NML_set_vars_XXX.py you need to specify the set of files to work on for each variable here :

  • output_filetype_multi.append('a.pd')

or similar. The code uses these files as the basis for processing. It tries to find the correct pa, pb, etc. file for the stash variables it needs, but this only works if the other file (pa, pb, etc.) has the same naming structure as the one set above in output_filetype_multi. This means the output frequency has to be the same too (e.g., if it needs pb files, they must have the same naming and output frequency as the files set in output_filetype_multi).
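To see why the naming and output frequency must match, here is a sketch of one way a script can locate the file for another output stream: swap the stream letter in the file name and check that the result exists. This is an illustration, not the repository's code; sibling_stream_file and the run id 'xabcd' are hypothetical, and it assumes the stream appears once in the name as 'a.p<letter>'.

```python
import os
import re
import tempfile


def sibling_stream_file(path, stream):
    """Return the path of the file for another output stream, or None.

    Hypothetical helper. ASSUMPTION: file names contain the stream once as
    'a.p<letter>' (e.g. 'a.pd') and other streams differ only in that letter.
    """
    directory, name = os.path.split(path)
    # Swap the first 'a.p<letter>' for 'a.p<stream>' in the file name only.
    new_name, n_subs = re.subn(r'a\.p[a-z]', 'a.p' + stream, name, count=1)
    if n_subs == 0:
        return None  # name does not follow the assumed pattern
    candidate = os.path.join(directory, new_name)
    # If the other stream uses different dates (another output frequency), this misses.
    return candidate if os.path.exists(candidate) else None


# Usage with dummy files in a temporary directory.
tmp = tempfile.mkdtemp()
for fname in ('xabcda.pd20000101.pp', 'xabcda.pb20000101.pp'):
    open(os.path.join(tmp, fname), 'w').close()
pd_file = os.path.join(tmp, 'xabcda.pd20000101.pp')
print(sibling_stream_file(pd_file, 'b'))  # path of xabcda.pb20000101.pp
print(sibling_stream_file(pd_file, 'c'))  # None: no matching pc file
```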

It may be useful to create a directory with soft links (using ln -s) to specific files if you only want to process a limited set of files.
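If you prefer to build that directory from Python rather than running ln -s by hand, a minimal sketch follows. link_subset is a hypothetical helper and the file names are dummies.

```python
import os
import tempfile


def link_subset(src_dir, dest_dir, names):
    """Make dest_dir hold symbolic links (like `ln -s`) to the named files in src_dir."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    for name in names:
        target = os.path.abspath(os.path.join(src_dir, name))
        link = os.path.join(dest_dir, name)
        if not os.path.lexists(link):  # skip links that already exist
            os.symlink(target, link)


# Usage: link two of three dummy files into a 'subset' directory.
src = tempfile.mkdtemp()
for fname in ('xabcda.pd19820101.pp', 'xabcda.pd19820201.pp', 'xabcda.pd19820301.pp'):
    open(os.path.join(src, fname), 'w').close()
subset_dir = os.path.join(src, 'subset')
link_subset(src, subset_dir, ['xabcda.pd19820101.pp', 'xabcda.pd19820201.pp'])
print(sorted(os.listdir(subset_dir)))
```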

Setting whether the variable is to be a vertical integral or not :-

  • In NML_set_vars*.py, set vert_integral_multi to 1 for a vertical integral (or 0 for no vertical integral) for the specific variable, by doing one of these :-
  • vert_integral_multi.append(1) or
  • vert_integral_multi.append(0)
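A sketch of how the per-variable settings might sit together in NML_set_vars_XXX.py, assuming each *_multi list gets one entry per variable in the same order. output_filetype_multi and vert_integral_multi come from this page; varname_multi and the variable names are hypothetical placeholders, so check the test _Hawaii files for the real names.

```python
# Hypothetical excerpt in the style of NML_set_vars_XXX.py.
# ASSUMPTION: one entry per variable in each *_multi list, in the same order.
varname_multi = []          # hypothetical list of variable names
output_filetype_multi = []  # file set used as the basis for each variable
vert_integral_multi = []    # 1 = vertical integral, 0 = not

# Variable 1: a hypothetical field, no vertical integral.
varname_multi.append('example_2d_field')
output_filetype_multi.append('a.pd')
vert_integral_multi.append(0)

# Variable 2: a hypothetical 3D field, vertically integrated.
varname_multi.append('example_3d_field')
output_filetype_multi.append('a.pd')
vert_integral_multi.append(1)

# Sanity check that every list has one entry per variable.
assert len(varname_multi) == len(output_filetype_multi) == len(vert_integral_multi)
```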

Before running on Monsoon you need to do :-

  • module load scitools/production_legacy-os43-2

Set it running like this :-

  • python2.7 DRIVER_plot_contours_multi_with_timser.py 0 _XXX

This tells it to use NML_set_vars_XXX.py and the corresponding NML_set_runs_XXX.py file. The “0” here tells it to run from the terminal rather than submit to the queue system. To have it submit each variable to the queues, change the 0 above to a 1.
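For reference, the two command-line arguments can be interpreted as below. This is a sketch of the interface as described on this page (first argument 0 = run in the terminal, 1 = submit each variable to the queue; second argument = the suffix), not the driver's actual code; parse_driver_args is hypothetical.

```python
def parse_driver_args(argv):
    """Interpret [script, mode, suffix] as described on this page.

    Hypothetical helper: mode '0' runs in the terminal, '1' submits each
    variable to the queue; suffix selects the NML_set_vars/NML_set_runs files.
    """
    if len(argv) != 3 or argv[1] not in ('0', '1'):
        raise SystemExit('usage: python2.7 %s {0|1} _XXX' % argv[0])
    submit_to_queue = argv[1] == '1'
    suffix = argv[2]
    return submit_to_queue, suffix


# Equivalent of: python2.7 DRIVER_plot_contours_multi_with_timser.py 0 _XXX
print(parse_driver_args(['DRIVER_plot_contours_multi_with_timser.py', '0', '_XXX']))
```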

The output appears in an output directory inside the same directory as the input files, with one folder for each variable and one .nc file for each input file (each .nc file will contain multiple times if its input file does). If you want to concatenate the .nc files into one, there's a quick CDO command for that :-

  • cdo mergetime *.nc merged.nc
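Because there is one output folder per variable, you may want one merged file per folder. The sketch below builds the cdo mergetime command for each folder and, by default, only prints it. The folder layout is assumed from the description above, the folder name is hypothetical, and the merged file is written outside the folder so that re-running the *.nc glob does not pick it up as an input. Set dry_run=False to actually call cdo (it must be on your PATH).

```python
import glob
import os
import subprocess
import tempfile


def merge_commands(output_dir, merged_dir):
    """Build one `cdo mergetime` command per variable folder in output_dir.

    ASSUMPTION: output_dir holds one sub-folder per variable containing .nc files.
    Merged files go to merged_dir, named after the folder.
    """
    commands = []
    for folder in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, folder)
        if not os.path.isdir(path):
            continue
        nc_files = sorted(glob.glob(os.path.join(path, '*.nc')))
        if nc_files:
            out_file = os.path.join(merged_dir, folder + '_merged.nc')
            commands.append(['cdo', 'mergetime'] + nc_files + [out_file])
    return commands


def run_merges(output_dir, merged_dir, dry_run=True):
    """Print each command; only run it if dry_run is False."""
    if not os.path.isdir(merged_dir):
        os.makedirs(merged_dir)
    for cmd in merge_commands(output_dir, merged_dir):
        print(' '.join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


# Usage with a dummy output directory.
out_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(out_dir, 'SW_TOA'))
for fname in ('file1.nc', 'file2.nc'):
    open(os.path.join(out_dir, 'SW_TOA', fname), 'w').close()
merged_dir = tempfile.mkdtemp()
run_merges(out_dir, merged_dir)  # dry run: prints the cdo command only
cmds = merge_commands(out_dir, merged_dir)
```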

Checking jobs on Monsoon

Monitor the status of the job using the qstat command. For example:

  • qstat -u $USER

This will report something like the following:

  Job id      Name    User   Time Use  S  Queue
  ----------  ------  -----  --------  -  --------
  300.xcd00   hello   itsa   0         R  parallel

The second field from the right indicates the status of the job. A value of R indicates that the job is currently running.
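To check job states from a script, a minimal sketch is to split each job line of the qstat output on whitespace and take the second field from the right, as described above. It assumes job lines start with a numeric job id (so header and dashed lines are skipped); the sample text is the example above, and job_states is a hypothetical helper.

```python
def job_states(qstat_text):
    """Map job id -> state letter from qstat output text.

    Hypothetical helper. ASSUMPTION: job lines start with a numeric job id and
    the state is the second whitespace-separated field from the right.
    """
    states = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0][0].isdigit():
            states[fields[0]] = fields[-2]
    return states


sample = """Job id      Name    User   Time Use  S  Queue
----------  ------  -----  --------  -  --------
300.xcd00   hello   itsa   0         R  parallel
"""
print(job_states(sample))
```

On Monsoon you could pass it live output instead, e.g. subprocess.check_output(['qstat', '-u', os.environ['USER']], universal_newlines=True).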

If you're interested in other people's jobs, note that this qstat functionality has been disabled to avoid putting excessive strain on the scheduler. To get round this, the information is written centrally to a file every minute, which can be queried with the qstat_snapshot command.

Pasted from http://collab.metoffice.gov.uk/twiki/bin/view/Support/CrayQuickStartGuide

Also, if you want to know the full name of each job, you can do :-

  • qstat -f

This gives lots of information, including the full job name. You could grep the output for Job_Name (e.g. qstat -f | grep Job_Name).
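Doing the same filtering in Python, here is a sketch that pairs each job id with its full name. It assumes the PBS qstat -f layout in which each job's block starts with "Job Id: <id>" and contains a "Job_Name = <name>" line; the sample text is made up for illustration, reusing the example job. Long values may wrap onto continuation lines, which this sketch does not handle.

```python
def full_job_names(qstat_f_text):
    """Return {job id: full job name} parsed from `qstat -f` output.

    Hypothetical helper. ASSUMPTION: each job block starts with 'Job Id: <id>'
    and contains an indented 'Job_Name = <name>' line.
    """
    names = {}
    job_id = None
    for line in qstat_f_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('Job Id:'):
            job_id = stripped.split(':', 1)[1].strip()
        elif stripped.startswith('Job_Name =') and job_id is not None:
            names[job_id] = stripped.split('=', 1)[1].strip()
    return names


sample_f = """Job Id: 300.xcd00
    Job_Name = hello
    job_state = R
"""
print(full_job_names(sample_f))
```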

Delete a job from the queue

  • qdel <job id>   (e.g. qdel 300.xcd00 for the example job above)