diff --git a/README.md b/README.md
index 9c62995..7d2f78a 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,190 @@
-CPRNC is a fortran program used to compare netcdf data files.
-It was developed and has evolved for use with CESM.
+cprnc README
+------------
+cprnc is a generic tool for analyzing a netcdf file or comparing
+two netcdf files.
+
+If you are trying to debug an installed cprnc tool, make sure that you
+are looking at the correct one by comparing its path with the cprnc
+path used by your case directory.
+
+
+Quick Start Guide:
+------------------
+
+cprnc uses cmake and requires netcdf-fortran. It is a serial program and must
+use a serial build of the netcdf-fortran library. To build:
+
+    cd cprnc
+    mkdir bld
+    cd bld
+    cmake ../
+    make
+
+This should be sufficient if netcdf-fortran and the compiler that
+library was built with are in the path.
+
+Finally, install the resulting executable at the location given by
+CCSM_CPRNC in config_machines.xml.
+
+
+    Usage: cprnc [-m] [-v] [-d dimname:start[:count]] file1 [file2]
+      -m: Compare time samples by index. Default is false, i.e., match
+          "time" coordinate values before comparing.
+      -v: Verbose output.
+      -d dimname:start[:count]
+          Print variable values for the specified dimname index subrange.
+
+
+Users Guide:
+------------
+
+cprnc is a Fortran-90 application. It relies on netcdf version 3 or
+later and uses the f90 netcdf interfaces. It requires a netcdf include
+file and a netcdf library.
+
+cprnc writes its ASCII report to standard output. It first summarizes
+some characteristics of the input file[s]. Compare-mode output is
+generally 132 characters wide, and analyze-mode output is less than 80
+characters wide.
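For scripted use, the options in the Usage summary can be assembled programmatically. The Python sketch below is a hypothetical wrapper (`cprnc_argv` and `run_cprnc` are illustrative names, not part of cprnc). It takes the flag spellings `-m`, `-v` and `-d dimname:start[:count]` from the Usage text, assumes the executable is reachable as `cprnc` or through the `exe` argument (for example, the path configured as CCSM_CPRNC), and captures the report that cprnc writes to standard output. The validation of the `-d` argument is an assumption of this sketch, not documented cprnc behaviour.

```python
import subprocess
from typing import List, Optional


def cprnc_argv(file1: str, file2: Optional[str] = None, *,
               match_by_index: bool = False, verbose: bool = False,
               dim_range: Optional[str] = None,
               exe: str = "cprnc") -> List[str]:
    """Build a cprnc command line (hypothetical helper, flags from Usage)."""
    argv = [exe]
    if match_by_index:
        argv.append("-m")  # compare time samples by index
    if verbose:
        argv.append("-v")  # verbose output
    if dim_range is not None:
        # Assumed format check: dimname:start[:count] with integer start/count.
        parts = dim_range.split(":")
        if (len(parts) not in (2, 3) or not parts[0]
                or not all(p.isdigit() for p in parts[1:])):
            raise ValueError(f"expected dimname:start[:count], got {dim_range!r}")
        argv += ["-d", dim_range]
    argv.append(file1)
    if file2 is not None:
        argv.append(file2)  # omit file2 to analyze a single file
    return argv


def run_cprnc(file1: str, file2: Optional[str] = None, **options) -> str:
    """Run cprnc and return the report it prints to standard output."""
    argv = cprnc_argv(file1, file2, **options)
    # check=False: this sketch makes no assumption about cprnc's exit status.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout
```

For example, `run_cprnc("file1.nc", "file2.nc", match_by_index=True)` would execute `cprnc -m file1.nc file2.nc` and return its text report; the file names are placeholders. The exit status is deliberately not interpreted, since the README does not document it.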
+
+In analyze mode, the output for a field looks like
+
+                 (     lon,     lat,    time,   -----)
+        259200   (     587,     134,       1) (     269,      59,       1)
+ FX1     96369   8.273160400390625E+02  0.000000000000000E+00
+          avg abs field values:    9.052845920820910E+01
+
+and a guide to this information is printed at the top of the file
+
+                 (    dim1,    dim2,    dim3,    dim4)
+       ARRSIZ1   (   indx1,   indx2,   indx3) file 1
+ FIELD  NVALID                    MAX                    MIN
+
+
+The first line lists the names of the dimensions of the field.
+The second line gives the array size (ARRSIZ1) and the indices of the
+ maximum and minimum values of the field for the first three dimensions.
+ If a fourth dimension exists, it is always assumed to be time, which
+ is handled separately.
+The third line gives the number of valid values in the array
+ and the maximum and minimum over those valid values. Invalid
+ values are those identified as the "fill" value. The first 10
+ characters of the field name appear in the first dozen columns of
+ this line.
+The last line gives overall statistics, including the average
+ absolute value of the valid values of the field.
+
+In comparison mode, the output (132 chars wide) for a field looks like
+
+     96369   (     lon,     lat,    time)
+    259200   (     422,     198,       1) (     203,     186,       1) (      47,     169,       1) (     224,     171,       1)
+ FIRA   96369   1.466549530029297E+02 -3.922052764892578E+01  1.4E+02 -3.037954139709473E+01  1.0E+00 -3.979958057403564E+00
+        96369   1.321966247558594E+02 -1.603044700622559E+01           1.084177169799805E+02           3.982142448425293E+00
+    259200   (     156,      31,       1) (     573,     178,       1) (
+          avg abs field values:    6.778244097051392E+01    rms diff: 1.4E+01   avg rel diff(npos):  4.6E-02
+                                   5.960437961084186E+01                        avg decimal digits(ndif):  1.2 worst:  0.0
+
+and a guide to this information is printed at the top of the file
+
+    NDIFFS   (    dim1,    dim2,    dim3,    dim4, ... )
+   ARRSIZ1   (   indx1,   indx2,   indx3, ... ) file 1
+ FIELD  NVALID1                 MAX1                   MIN1   DIFFMAX  VALUES                  RDIFMAX  VALUES
+        NVALID2                 MAX2                   MIN2
+   ARRSIZ2   (   indx1,   indx2,   indx3, ... ) file 2
+
+The information content is identical to that of analyze mode, with
+the following additions. Two lines are added to the main body: lines
+4 and 5 correspond to lines 3 and 2, respectively, but refer to file 2
+instead of file 1. In addition, the right-hand sides of lines 2, 3,
+and 4 contain the maximum difference with its location and values, and
+the maximum relative difference with its location and values. The last
+two lines summarize overall statistics: the average absolute values of
+the field on the two files, the rms difference, the average relative
+difference, the average number of digits that match, and the worst
+case for the number of digits that match.
+
+"avg rel diff" gives the average relative difference: the sum of the
+relative differences divided by the number of indices where both
+variables have valid values. The denominator for each relative
+difference is the MAX of the two values.
+
+"avg decimal digits" is computed as follows: for each diff, determine
+the number of digits that match (as -log10(rdiff(i))) and add it to a
+running sum; then divide by the number of diffs (ignoring places where
+the two variables are the same). For example, if there are 10 values,
+8 of which match, one with a relative difference of 1e-3 and one with
+a relative difference of 1e-5, then the avg decimal digits will be
+(3 + 5) / 2 = 4.
+
+"worst decimal digits" is simply log10(1/rdmax), where rdmax is the
+max relative difference (in the above example, this gives 3).
+
+At the end of the output, a summary is presented that looks like
+
+SUMMARY of cprnc:
+ A total number of    119 fields were compared
+          of which     83 had non-zero differences
+               and     17 had differences in fill patterns
+               and      2 had differences in dimension sizes
+ A total number of     10 fields could not be analyzed
+ A total number of      0 time-varying fields on file 1 were not found on file 2.
+ A total number of      0 time-constant fields on file 1 were not found on file 2.
+ A total number of      0 time-varying fields on file 2 were not found on file 1.
+ A total number of      0 time-constant fields on file 2 were not found on file 1.
+  diff_test: the two files seem to be DIFFERENT
+
+
+This summarizes:
+- the number of fields that were compared
+- the number of fields that differed (not counting fields that differed
+  only in the fill pattern)
+- the number of fields with differences in fill patterns
+- the number of fields with differences in dimension sizes
+- the number of fields that could not be analyzed
+- the number of fields on one file but not the other
+  - for files with an unlimited (time) dimension, these counts are
+    broken down into time-varying fields (i.e., fields with an unlimited
+    dimension) and time-constant fields (i.e., fields without an
+    unlimited dimension)
+- whether the files are IDENTICAL, DIFFERENT, or DIFFER only in their field lists
+  - Files are considered DIFFERENT if there are differences in the values,
+    fill patterns, or dimension sizes of any variable.
+  - Files are considered to "DIFFER only in their field lists" if matching
+    variables are all identical, but there are fields on file 1 that are
+    not on file 2, or fields on file 2 that are not on file 1.
+    - However, if the only difference in field lists is the presence
+      or absence of time-constant fields on a file that has an unlimited
+      (time) dimension, the files are considered IDENTICAL, and an extra
+      message noting this fact is appended. (While not ideal, this
+      exception is needed so that exact restart tests pass even though
+      some time-constant fields appear on the output files from one case
+      but not the other.)
+
+Developers Guide:
+-----------------
+
+The tool works as follows.
+
+Fields can be analyzed if they are int, float or double and
+have between 0 and n dimensions.
+
+In general, fields that appear on both files are compared. If their
+sizes differ, no difference statistics are computed and only a summary
+of the fields on the files is presented. If a field appears on only
+one file, it is analyzed.
+
+The unlimited dimension is treated specially. In general, for files
+that have a dimension named "time", the time axes are compared, and
+fields at time values found on both files are compared one timestep
+at a time. Time values that don't match are skipped. To override the
+matching behaviour, use cprnc -m. In this mode, time samples are
+matched by index rather than by value. In analyze mode, the fields
+are analyzed one time sample at a time. In general, if there is a
+"time" axis, it will be the outer-most loop in the output analysis.
+In compare mode, time samples of time-varying fields whose time values
+are not common to both files are ignored.
+
+It is also possible to compare files that don't have an unlimited
+dimension; in this case, the '-m' flag must be given.
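The time-matching rules in the Developers Guide can be sketched as follows. `match_time_samples` is a hypothetical helper, not cprnc code: by default it pairs time indices whose "time" values are equal on both files and skips the rest; with `by_index=True` (the `-m` behaviour) it pairs samples by position. Exact floating-point equality, and truncating to the shorter axis under `-m`, are assumptions of this sketch; cprnc may use a tolerance or handle unequal lengths differently.

```python
from typing import Dict, List, Sequence, Tuple


def match_time_samples(times1: Sequence[float], times2: Sequence[float],
                       by_index: bool = False) -> List[Tuple[int, int]]:
    """Return (i, j) pairs of time indices to compare on file 1 and file 2.

    Assumptions: exact equality of time values, and truncation to the
    shorter time axis when matching by index.
    """
    if by_index:
        # -m: pair samples by position, ignoring the "time" values.
        return [(i, i) for i in range(min(len(times1), len(times2)))]
    # Default: pair samples whose "time" values match; the rest are skipped.
    first_index: Dict[float, int] = {}
    for j, t in enumerate(times2):
        first_index.setdefault(t, j)  # keep the first occurrence of each value
    return [(i, first_index[t]) for i, t in enumerate(times1) if t in first_index]
```

Each (i, j) pair would then drive one pass of the per-field comparison, which is consistent with time being the outer-most loop of the report.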
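To make the "avg rel diff", "avg decimal digits" and "worst" statistics from the Users Guide concrete, here is a minimal Python sketch; it is not the Fortran implementation. It assumes fill values are passed as None and takes the MAX of the absolute values as the denominator (the README says only "the MAX of the two values"). The names `npos` and `ndif` echo the labels in the sample compare output, and the arrays at the bottom reproduce the README's worked case: 10 values, 8 identical, relative differences of 1e-3 and 1e-5.

```python
import math
from typing import Optional, Sequence


def diff_stats(a: Sequence[Optional[float]],
               b: Sequence[Optional[float]]) -> dict:
    """Difference statistics in the spirit of cprnc's compare output.

    Assumptions: fill values are None; each relative difference uses
    max(|a|, |b|) as its denominator.
    """
    if len(a) != len(b):
        raise ValueError("fields must have the same size")
    rdiff = []
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # skip indices where either value is fill
        denom = max(abs(x), abs(y))
        rdiff.append(abs(x - y) / denom if denom > 0.0 else 0.0)
    npos = len(rdiff)                      # indices valid on both files
    diffs = [r for r in rdiff if r > 0.0]  # ignore places that are identical
    ndif = len(diffs)
    return {
        "avg_rel_diff": sum(rdiff) / npos if npos else 0.0,
        # mean of -log10(rdiff(i)) over the ndif differing points
        "avg_decimal_digits": (sum(-math.log10(r) for r in diffs) / ndif
                               if ndif else None),
        # log10(1/rdmax): digits matched at the worst point
        "worst_decimal_digits": math.log10(1.0 / max(diffs)) if ndif else None,
        "npos": npos,
        "ndif": ndif,
    }


# Worked example from the README: relative differences of 1e-3 and 1e-5.
a = [1.0] * 8 + [1000.0, 100000.0]
b = [1.0] * 8 + [999.0, 99999.0]
stats = diff_stats(a, b)
```

Here `avg_decimal_digits` comes out as (3 + 5) / 2 = 4 and `worst_decimal_digits` as log10(1/1e-3) = 3, matching the figures in the text. Because the results are floating-point, compare them with a tolerance rather than exact equality.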
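The verdict rules listed under "This summarizes:" in the Users Guide can be sketched as a small decision function. `summary_verdict` is hypothetical: it returns only the verdict label plus a flag for the time-constant exception, and it does not reproduce cprnc's actual message text. The README does not say whether fields that could not be analyzed affect the verdict, so this sketch ignores them.

```python
from typing import Tuple


def summary_verdict(n_value_diffs: int, n_fill_diffs: int, n_dim_diffs: int,
                    n_missing_time_varying: int, n_missing_time_constant: int,
                    has_unlimited_dim: bool) -> Tuple[str, bool]:
    """Return (verdict, time_constant_note) following the SUMMARY rules.

    n_missing_* count fields found on only one of the two files. For files
    without an unlimited dimension, pass every such field as time-constant.
    """
    if n_value_diffs or n_fill_diffs or n_dim_diffs:
        return "DIFFERENT", False
    if n_missing_time_varying:
        return "DIFFER only in their field lists", False
    if n_missing_time_constant:
        if has_unlimited_dim:
            # Exception for exact restart tests: only time-constant fields
            # are missing, so report IDENTICAL plus an explanatory note.
            return "IDENTICAL", True
        return "DIFFER only in their field lists", False
    return "IDENTICAL", False
```

With the counts from the sample SUMMARY (83 fields with value differences, 17 with fill-pattern differences, 2 with dimension-size differences), the sketch returns DIFFERENT, as the sample output does.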