Explora Data Input Standards and Data Source Objectives

August 2014 - Preliminary Thoughts and Observations

The original (and baseline) Explora code accepts a simple CSV file as its input source, each row of which is a separate record for a plant genetic resources accession, and the columns correspond to three data dimensions:

Column 1 - A locally unique identifier, perhaps just a number for the accession in the input set
Set of columns 2 to n, corresponding to trait descriptors with continuous value ranges, where n - 1 is the number of continuous variables
Set of columns n+1 to m, corresponding to trait descriptors with nominal (categorical) value ranges, where m - n is the number of nominal variables

The continuous trait columns are presumed to precede the nominal value columns, and the user also explicitly tells Explora how many variables of each type are in the dataset, which may allow the software to ignore anything in the spreadsheet after column m.
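
As a rough illustration of the layout described above, the following Python sketch splits each row into an identifier, the continuous values, and the nominal values, using the two counts supplied by the user. It is not the Explora implementation: the function names, the presence of a header row, the conversion of continuous values to floats, and the sample data are all assumptions made for this example.

```python
import csv
import io


def split_row(row, n_continuous, n_nominal):
    """Split one CSV row into (identifier, continuous values, nominal values).

    Hypothetical helper: columns beyond the declared ones are ignored.
    """
    needed = 1 + n_continuous + n_nominal
    if len(row) < needed:
        raise ValueError(f"expected at least {needed} columns, got {len(row)}")
    identifier = row[0]
    # Assumption: continuous traits are plain decimal numbers with no gaps.
    continuous = [float(value) for value in row[1:1 + n_continuous]]
    nominal = row[1 + n_continuous:needed]
    return identifier, continuous, nominal


def read_accessions(text, n_continuous, n_nominal, has_header=True):
    """Parse CSV text into a list of (identifier, continuous, nominal) tuples."""
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the column-name row
    return [split_row(row, n_continuous, n_nominal) for row in reader if row]


# Invented sample data: two continuous traits, one nominal trait, one extra column.
sample = (
    "id,height,weight,colour,notes\n"
    "A1,10.5,3.2,red,ignored\n"
    "A2,8.0,2.9,white,also ignored\n"
)
rows = read_accessions(sample, n_continuous=2, n_nominal=1)
# rows[0] is ("A1", [10.5, 3.2], ["red"])
```

Passing the counts explicitly mirrors the behaviour described above: anything after the last declared nominal column is simply dropped.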

Several thoughts fall out of the above:

Are there formats other than CSV that could be handled (for example, XML or RDF formatted data)?
Could (should?) direct selection and importing of data from standard public sources (e.g. GENESYS?) be enabled?
Could the adoption of common PGR data formats by such data sources accelerate adoption of the tool by the community?
Is there any mechanism by which the input data could be made self-describing, so that the identifier, continuous and nominal data columns could be automatically identified in the file (e.g. perhaps a fixed prefix on the column names would enable this kind of automation)?
Should the accession identifiers have a more informative global meaning and syntax (e.g. URIs)?
Should the trait values be constrained to map onto values from a public trait ontology (e.g. the Crop Ontology)?
Could (or should) additional meta-data (files) be provided or maintained in the system, for example passport data mapped to accession identifiers, or crop ontology linkages mapped to trait columns?
Should Explora user interaction eventually provide in situ support for the above accession and trait descriptor details (e.g. external clickable links to relevant online (meta-)data)?
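
To make the column-prefix idea from the list above concrete, here is a minimal Python sketch that assigns a role to each column from its header name. The prefixes `id:`, `num:` and `cat:` and the function name are purely hypothetical; no such convention is defined for Explora, and any real scheme would need to be agreed with data providers.

```python
# Hypothetical prefix convention, invented for this example only.
PREFIX_ROLES = {
    "id:": "identifier",
    "num:": "continuous",
    "cat:": "nominal",
}


def classify_columns(header):
    """Map each role to the list of column indexes whose name carries its prefix."""
    roles = {"identifier": [], "continuous": [], "nominal": [], "ignored": []}
    for index, name in enumerate(header):
        for prefix, role in PREFIX_ROLES.items():
            if name.startswith(prefix):
                roles[role].append(index)
                break
        else:
            # No known prefix: treat the column as extra data to skip.
            roles["ignored"].append(index)
    return roles


header = ["id:accession", "num:height", "cat:colour", "collector", "num:weight"]
roles = classify_columns(header)
# roles["continuous"] is [1, 4]; the unprefixed "collector" column is ignored
```

With a scheme like this, the continuous and nominal columns would no longer need to appear in a fixed order, and the user would not have to state how many of each there are.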
