Commit

Update readme
Lukasz Ziemba committed Jun 25, 2020
1 parent 338e815 commit d4e6d7b
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions README.md
@@ -30,19 +30,19 @@ Run [*validation.bat*](validation.bat) script, it will perform preprocessing, va
- identify records with missing or unknown type;
- identify duplicate records using MD5 hash values (see the sketch after this list);
- classify initial TG version of each record;
-- create *endpoint.md.json* metadata summary.
+- create *endpoint.md.json* metadata summary (after completed preprocessing of all records).
2. Validation (detailed [below](#metadata-validation-and-tg-version-classification-procedure)):
- validate each record using the validator instance(s);
- save validation reports for each record in the *endpoint* folder, preserving the subfolder structure of the source folder;
-- classify TG version of each record;
+- classify TG version of each record based on the validation results;
- add results for each record to the CSV results file *endpoint.csv*, detailed [below](#results-csv-columns).
3. Results:
- after completed validation of all source metadata, the following result files are generated: *endpoint.json*, *endpoint.services.zip* and *endpoint.dataset.zip*, detailed [below](#result-files);
- the results can be used to calculate the conformity indicators as detailed [below](#conformity-indicators).
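
As referenced in the preprocessing step above, here is a minimal sketch of MD5-based duplicate detection, assuming the metadata records are XML files under a source folder; the function name, folder name and file pattern are illustrative, not the PDI job's actual parameters:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(source_dir: str) -> dict:
    """Group metadata files by the MD5 hash of their raw content."""
    by_hash = defaultdict(list)
    for path in Path(source_dir).rglob("*.xml"):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
    # Any hash shared by more than one file marks duplicate records.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

for digest, paths in find_duplicates("source").items():
    print(digest, [str(p) for p in paths])
```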

If the validation does not complete for all source metadata (due to errors, user interruption, etc.), running the transformation again for the same endpoint will continue with the source metadata records that were not processed before, i.e. those not yet included in the CSV results. To re-validate an endpoint that was validated before, rename the CSV results file or move it out of the results folder. A sketch of this resume logic follows.
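
A minimal sketch of that resume behavior, assuming the records are XML files and that the `file_id` column of the CSV results (see [below](#results-csv-columns)) can be derived from the file name; the helper name and paths are illustrative, not part of the actual PDI job:

```python
import csv
from pathlib import Path

def pending_records(source_dir: str, results_csv: str):
    """Yield source records that are not yet listed in the CSV results."""
    done = set()
    if Path(results_csv).exists():
        with open(results_csv, newline="", encoding="utf-8") as f:
            done = {row["file_id"] for row in csv.DictReader(f)}
    for path in Path(source_dir).rglob("*.xml"):
        # Assumption: file_id is derived from the file name.
        if path.stem not in done:
            yield path
```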

-Alternatively, the procedure can be run from the PDI user interface (Spoon), which provides more control and feedback. For this purpose, run *Spoon.bat*, then open and run the [*validation.kjb*](pdi/validation.kjb) job.
+Alternatively, the procedure can be run from the PDI user interface (Spoon), which provides more control and feedback and allows for modifications. For this purpose, run *Spoon.bat*, then open and run the [*validation.kjb*](pdi/validation.kjb) job.

#### Metadata validation and TG version classification procedure:
1. TG version classification (1.3 vs. 2.0) is initially based on the presence of the `gmd:useLimitation` element, denoted in column *version_0* in CSV results; if the element is present, TG v. 1.3 is assumed; otherwise, TG v. 2.0 is assumed (a sketch of this check follows),
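
A minimal sketch of that initial classification rule, assuming ISO 19139 records and the standard `gmd` namespace; the function name is illustrative and the real check is performed inside the PDI job:

```python
import xml.etree.ElementTree as ET

GMD_NS = "http://www.isotc211.org/2005/gmd"

def initial_tg_version(xml_path: str) -> str:
    """Classify the initial TG version by the presence of gmd:useLimitation."""
    root = ET.parse(xml_path).getroot()
    # Any gmd:useLimitation element anywhere in the record implies TG v. 1.3.
    element = root.find(f".//{{{GMD_NS}}}useLimitation")
    return "1.3" if element is not None else "2.0"
```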
@@ -60,7 +60,8 @@ Alternatively, the procedure can be run from the PDI user interface (Spoon) whic
5. *endpoint.services.zip* - validation reports for service metadata records that failed validation,
6. *endpoint.dataset.zip* - validation reports for dataset, series, missing and unknown metadata records that failed validation.

-Files 4., 5. and 6. are produced only after completed validation of all source metadata.
+File 2. is produced only after completed preprocessing of all metadata records.
+Files 4., 5. and 6. are produced only after completed validation of all metadata records.

#### Results CSV columns:
- `file_id` - identifies the source metadata file and its validation reports,
