diff --git a/README.md b/README.md
index 6c571ea..4562430 100644
--- a/README.md
+++ b/README.md
@@ -30,11 +30,11 @@ Run [*validation.bat*](validation.bat) script, it will perform preprocessing, va
    - identify records with missing or unknown type;
    - identify duplicate records using MD5 hash values;
    - classify initial TG version of each record;
-   - create *endpoint.md.json* metadata summary.
+   - create *endpoint.md.json* metadata summary (after completed preprocessing of all records).
 2. Validation (detailed [below](#metadata-validation-and-tg-version-classification-procedure)):
    - validate each record using the validator instance(s);
    - save validation reports for each record in *endpoint* folder, the subfolder structure of the source folder is preserved;
-   - classify TG version of each record;
+   - classify TG version of each record based on the validation results;
    - add results for each record to CSV results *endpoint.csv*, detailed [below](#results-csv-columns).
 3. Results:
    - after completed validation of all source metadata the following result files are generated: *endpoint.json*, *endpoint.services.zip* and *endpoint.dataset.zip*, detailed [below](#result-files);
@@ -42,7 +42,7 @@ Run [*validation.bat*](validation.bat) script, it will perform preprocessing, va

 In case the validation does not complete for all source metadata (due to errors, user interruption, etc.), when the transformation is run for the same endpoint again, it will continue processing source metadata that were not processed before, hence are not included in CSV results. To re-validate an endpoint that was validated before, the CSV results file needs to be renamed or moved out of the results folder.

-Alternatively, the procedure can be run from the PDI user interface (Spoon) which provides more control and feedback. For this purpose run *Spoon.bat*, open and run [*validation.kjb*](pdi/validation.kjb) job.
+Alternatively, the procedure can be run from the PDI user interface (Spoon) which provides more control and feedback, and allows for modifications. For this purpose run *Spoon.bat*, open and run [*validation.kjb*](pdi/validation.kjb) job.

 #### Metadata validation and TG version classification procedure:
 1. TG version classification (1.3 vs. 2.0) is initially based on the presence of the `gmd:useLimitation` element, denoted in column *version_0* in CSV results; if the element is present, TG v. 1.3 is assumed; if the element is not present, TG v. 2.0 is assumed,
@@ -60,7 +60,8 @@ Alternatively, the procedure can be run from the PDI user interface (Spoon) whic
 5. *endpoint.services.zip* - validation reports for service metadata records that failed validation,
 6. *endpoint.dataset.zip* - validation reports for dataset, series, missing and unkown metadata records that failed validation.

-Files 4., 5. and 6. are produced only after completed validation of all source metadata.
+File 2. is produced only after completed preprocessing of all metadata records.
+Files 4., 5. and 6. are produced only after completed validation of all metadata records.

 #### Results CSV columns:
 - `file_id` - identifies source metadata file and validation reports,