eerie_dreq

Workings for the EERIE data request

The Excel files in the xls directory were generated by dreqPy version 1.2.0 using the commands:

python3 -m venv ../../venvs/dreqpy
. ../../venvs/dreqpy/bin/activate
pip install dreqpy xlsxwriter
export LANG=en_GB.utf8
export LOCALE=en_GB.utf8

drq -m HighResMIP -e HighResMIP -p 3 --xls

The final PRIMAVERA data request is available at:

PRIMAVERA/PRIMAVERA_Data_Request_v01_00_13.xlsx

The first attempt at a data request is in the eerie_first_draft directory. PRIMAVERA_top_50.xlsx was generated by version 1.2.0 of dreqPy and I have then modified the variables manually. HighResMIP.xlsx is the original, unmodified output from dreqPy, kept for reference; it was originally named cmvmm_cm.hi_TOTAL_1_3.xlsx.

In the modified spreadsheet I have aimed to keep approximately the 50 most frequently accessed variables from PRIMAVERA, using Appendix C of 10.5281/zenodo.3961931. Approximately 50 variables were chosen because this number included the most frequently accessed monthly, daily and sub-daily variables. This was achieved by keeping all variables in the Amon, Oday, SIday and day tables, plus a few high frequency variables. The following high frequency variables were added at a frequency of 6 hours: pr, psl, tas, uas, vas and zg. These variables were chosen because they are surface only and were the most frequently accessed in Appendix C. Most of these high frequency variables were available in the 6hrPlevPt table, but pr had to be taken from the custom Prim6hr table. This is not ideal but should be easy to implement, and Prim6hr could be renamed to something more appropriate for EERIE.

I am not very experienced with Excel, so this could probably be implemented more efficiently. I had to do a lot of manual copying and pasting, and it is difficult to test that I have not made any errors; in a programming language I would write unit tests to check for them. Ideally this spreadsheet should be carefully reviewed by someone else. I will try some manual error checking myself over the next few days.

Many columns are hidden so that the dimensions and data volumes columns can be seen together. These columns can be unhidden (shown) without affecting any functionality.

Details of each model, for example the number of points in the atmosphere and ocean and the number of vertical levels, are stored on the Notes sheet. All other sheets use these values to estimate the storage required. The number of bytes used to store each grid point is also stored on the Notes sheet. It has initially been assumed to be 2.5 bytes per grid point, a value taken from 10.5281/zenodo.3961931. Use of quantisation could improve the deflation rate and decrease the number of bytes required per grid point.

The calculation of the storage required for mc in the Amon table for AWI's model is:

=Notes!$C$7 * Notes!$K$7 * Notes!$C$10 * Notes!$C$13

where cell C7 is the number of atmosphere points in the AWI model, K7 is the number of vertical levels in AWI's atmosphere model, C10 is the number of time points in a year for monthly mean data and C13 is the number of bytes required to store one data point. The dollar symbols in the cell references make them absolute, which prevents the references from being updated when the formula is pasted into another cell. When a variable is output on a fixed number of levels, for example plev19, the reference to the number of vertical levels is replaced by that integer, for example * Notes!$K$7 is replaced with * 19. When the variable is on a single level this term can be omitted.
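The same calculation can be sketched outside Excel, which may help when cross-checking individual cells. The Python below is only an illustration: the function name, the argument names, and the grid size and level count are hypothetical placeholders, not the real AWI values from the Notes sheet. Only the 2.5 bytes per grid point and the 12 monthly time points per year follow from the text above.

```python
def storage_bytes_per_year(horizontal_points, time_points_per_year,
                           bytes_per_point, levels=1):
    """Estimate the bytes needed to store one variable for one year.

    Mirrors the spreadsheet formula:
        points * levels * time points per year * bytes per point
    For a single-level variable the levels term is omitted (levels=1).
    """
    return horizontal_points * levels * time_points_per_year * bytes_per_point


# Placeholder model description (NOT the real values on the Notes sheet).
ATMOS_POINTS = 1_000_000   # stands in for Notes!$C$7
MODEL_LEVELS = 90          # stands in for Notes!$K$7
MONTHLY_STEPS = 12         # stands in for Notes!$C$10 (monthly means)
BYTES_PER_POINT = 2.5      # stands in for Notes!$C$13

# Variable on all model levels.
on_model_levels = storage_bytes_per_year(
    ATMOS_POINTS, MONTHLY_STEPS, BYTES_PER_POINT, levels=MODEL_LEVELS)

# Variable on a fixed set of levels, e.g. plev19: use the integer 19.
on_plev19 = storage_bytes_per_year(
    ATMOS_POINTS, MONTHLY_STEPS, BYTES_PER_POINT, levels=19)

# Single-level variable: the levels term drops out.
single_level = storage_bytes_per_year(
    ATMOS_POINTS, MONTHLY_STEPS, BYTES_PER_POINT)

print(on_model_levels, on_plev19, single_level)
```

With the real values from the Notes sheet substituted in, the results could be compared by hand against the corresponding cells as a basic error check.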