Data object extension #15

Status: Open, 7 commits to be merged into master

HLWeil commented Feb 28, 2023

Preface

Hey, here are our proposed adjustments to the data model and documentation that enable ISA to describe data objects thoroughly.
For reference, here is the discussion about this topic: ISA-tools/isa-api#484

General Goals

Our goal here is to improve the description of data using the ISA model.

Currently, the description given in the ISA model points only to the file, not to a location inside it. This is not sufficient if the file format is not well understood, or if the actual data object resulting from a measurement or computation is not a full file but a value or value set within a file.

So we wanted to enhance the data object with two things:

  • A Pointer pointing to a specific location in the file
  • A Dataset description, which gives context to the data objects stored in a data file

Changes made

Datamodel

We came up with the following data model:

| Property | Datatype | Description |
|---|---|---|
| File name | String | A file name or full path referencing a data file produced by the related process that MAY be packaged with, or is accessible via, the ISA reference implementation content. |
| Pointer | String | A pointer referencing a location inside the data file. This SHOULD always be specified when the data of interest is not the complete file, but a specific part of it. |
| Generated By | String | A file name, full path or identifier referencing the tool with which this data object was generated. |
| Explication | Ontology Annotation | An ontology annotation qualifying what the data describes. |
| Unit | Ontology Annotation | The unit qualifying the value stored in the data object. |
| Object Type | Ontology Annotation | Specifies the format in which the value in the data object will be stored. |
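
To illustrate how these properties fit together, here is a minimal Python sketch of the proposed data object. It is not part of the PR or of any ISA library: the class names, the snake_case field names, the simplified `OntologyAnnotation` stand-in, and the pointer syntax in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OntologyAnnotation:
    """Simplified stand-in for an ISA ontology annotation (assumed shape)."""
    annotation_value: str
    term_accession: Optional[str] = None
    term_source: Optional[str] = None


@dataclass
class DataObject:
    """One data object, mirroring the six properties in the table above."""
    file_name: str                                      # File name
    pointer: Optional[str] = None                       # Pointer
    generated_by: Optional[str] = None                  # Generated By
    explication: Optional[OntologyAnnotation] = None    # Explication
    unit: Optional[OntologyAnnotation] = None           # Unit
    object_type: Optional[OntologyAnnotation] = None    # Object Type

    def refers_to_whole_file(self) -> bool:
        # Without a pointer, the data object is the complete file.
        return self.pointer is None


# Hypothetical example: a single value inside a CSV file.
# The pointer syntax "cell=7,3" is illustrative; the PR does not define one.
concentration = DataObject(
    file_name="measurements.csv",
    pointer="cell=7,3",
    generated_by="quantify.py",
    explication=OntologyAnnotation("concentration"),
    unit=OntologyAnnotation("milligram per liter"),
    object_type=OntologyAnnotation("float"),
)

whole_file = DataObject(file_name="measurements.csv")
```

Keeping every property except the file name optional reflects the table's wording, where only the pointer carries a SHOULD and only for partial-file data.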

ISA Json

This results in the following JSON schema:

{
    "$schema": "http://json-schema.org/draft-04/schema",
    "title": "ISA data schema",
    "description": "JSON-schema representing a data object in the ISA model",
    "type": "object",
    "properties": {
        "@id": { "type": "string", "format": "uri" },
        "filename": {
            "type": "string"
        },
        "pointer": {
            "type": "string"
        },
        "type": {
            "type": "string",
            "enum": [
                "Raw Data File",
                "Derived Data File",
                "Image File"
            ]
        },
        "generatedBy": {
            "type": "string"
        },
        "explication": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "unit": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "objectType": {
            "$ref": "ontology_annotation_schema.json#"
        },
        "label": {
            "type": "string"
        },
        "comments" : {
            "type": "array",
            "items": {
                "$ref": "comment_schema.json#"
            }
        }
    },
    "additionalProperties": false
}
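
As a rough illustration of what an instance of this schema could look like, the sketch below builds a data object as a plain Python dict and runs a few hand-written checks that mirror the schema's top-level rules (allowed property names, string-typed fields, and the `type` enum). It is not a full JSON Schema validator and does not resolve the `$ref` entries; the helper `check_data_object`, the example values, and the pointer syntax are assumptions for illustration only.

```python
from typing import List

# Property names and enum values copied from the schema above.
ALLOWED_KEYS = {
    "@id", "filename", "pointer", "type", "generatedBy",
    "explication", "unit", "objectType", "label", "comments",
}
STRING_KEYS = {"@id", "filename", "pointer", "type", "generatedBy", "label"}
ALLOWED_TYPES = {"Raw Data File", "Derived Data File", "Image File"}


def check_data_object(obj: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # "additionalProperties": false forbids unknown keys.
    for key in sorted(set(obj) - ALLOWED_KEYS):
        problems.append(f"unexpected property: {key}")
    for key in sorted(STRING_KEYS & set(obj)):
        if not isinstance(obj[key], str):
            problems.append(f"{key} must be a string")
    data_type = obj.get("type")
    if isinstance(data_type, str) and data_type not in ALLOWED_TYPES:
        problems.append(f"type not in enum: {data_type}")
    return problems


# Hypothetical instance: one value inside a derived CSV file.
example = {
    "@id": "#data/measurements.csv-cell-7-3",
    "filename": "measurements.csv",
    "pointer": "cell=7,3",  # illustrative pointer syntax, not defined by the PR
    "type": "Derived Data File",
    "generatedBy": "quantify.py",
    "unit": {"annotationValue": "milligram per liter"},
}

print(check_data_object(example))
print(check_data_object({"name": "measurements.csv"}))
```

The second call shows the practical effect of renaming the property: with `additionalProperties` set to `false`, an object that still uses `name` is rejected.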

ISA Tab

To integrate these model extensions into the ISA Tab Format, we propose two adjustments:

To enable processes to point into data files, we propose adding a new column, Data Pointer, to the Assay file. This column should be used to qualify the Data File column when the data object resulting from the process is not the full data file but a value or value set within it.

Additionally, to give context to the values in the data file, we propose adding a new file to the ISA Tab family, the Dataset file, which carries all the other data fields we added to the data model.
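
To make the Assay file change more concrete, here is a small sketch of how a Data Pointer column might sit next to a data file column and be read back. The column layout, sample names, file names, and pointer syntax are all hypothetical; the PR does not fix where the column goes or how pointers are written, and the layout of the Dataset file is not sketched here.

```python
import csv
import io

# Hypothetical tab-delimited assay table. "Data Pointer" directly follows
# the data file column it qualifies; an empty cell means "the whole file".
ASSAY_TABLE = (
    "Sample Name\tProtocol REF\tRaw Data File\tData Pointer\n"
    "sample_1\tquantification\tmeasurements.csv\tcell=7,3\n"
    "sample_2\tquantification\tmeasurements.csv\tcell=8,3\n"
    "sample_3\timaging\timage_3.png\t\n"
)


def read_data_objects(text):
    """Return (file name, pointer or None) for each assay row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (row["Raw Data File"], row["Data Pointer"] or None)
        for row in reader
    ]


for file_name, pointer in read_data_objects(ASSAY_TABLE):
    target = pointer if pointer is not None else "(whole file)"
    print(f"{file_name}: {target}")
```

The first two rows share one file but resolve to different data objects, which is the case the current model cannot express.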

Aux

  • Small fix to the Sphinx config file, as one function was deprecated.

Open Questions

  • Should we add the build folder here in this PR?

stain commented Mar 9, 2023

See also https://www.w3.org/TR/annotation-model/#selectors on how fragment selectors are different for different media types. You need to indicate the type of pointer, either as a prefix or pointertype. The media type of filename will then also be essential (equivalent to encodingFormat in RO-Crate for IANA Media type) so the client can know how to resolve the pointer.

muehlhaus commented

I agree with Stain! We need a pointerType and an encodingFormat.
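
For illustration, the two options raised in this thread (a type prefix on the pointer string versus a separate pointerType field, together with an encodingFormat media type) could be sketched roughly as follows. Nothing here is part of the PR: the class `TypedPointer`, the helper `split_prefixed_pointer`, the `type:pointer` prefix convention, and the pointer type names are all assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TypedPointer:
    """A pointer plus what a client needs to resolve it."""
    filename: str
    encoding_format: str  # IANA media type of the file, e.g. "text/csv"
    pointer_type: str     # hypothetical name of the pointer syntax
    pointer: str          # the location inside the file


def split_prefixed_pointer(value: str) -> Tuple[str, str]:
    """Split a 'type:pointer' string at the first colon (assumed convention)."""
    pointer_type, sep, pointer = value.partition(":")
    if not sep or not pointer_type:
        raise ValueError(f"pointer has no type prefix: {value!r}")
    return pointer_type, pointer


# Option 1: the pointer type travels as a prefix on the pointer string.
pointer_type, pointer = split_prefixed_pointer("xpath://run/value[1]")

# Option 2: the pointer type is a separate field next to the media type.
typed = TypedPointer(
    filename="results.xml",
    encoding_format="application/xml",
    pointer_type=pointer_type,
    pointer=pointer,
)
print(typed)
```

Either way, the client needs both the pointer type and the media type of the file before it can decide how to resolve the pointer.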
