Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote "Metadata in Table Schema" recipe to the specs. #961

Open
wants to merge 34 commits into
base: next
Choose a base branch
from

Conversation

pierrecamilleri
Copy link

@pierrecamilleri pierrecamilleri commented Jul 5, 2024

Context and related issue :
fix #899

Migrated pull request from datapackage-v2-draft repo

Still to do on this PR:

  • Move table schema to /standards/table-schema.md (cf. main)
  • Move changelog to /overview/changelog.md (cf. main)
  • Update wording so it aligns with Data Package (my pending PR)
  • Update some links
  • Fix invalid json example (it was an unintended change)
  • Clarify if examples are borrowing title and path from Data Resource (I think they are, included in my PR) or if this is something completely separate.

@roll
Copy link
Member

roll commented Jul 8, 2024

Thanks a lot for rebasing the PR @pierrecamilleri !

@pierrecamilleri
Copy link
Author

@peterdesmet @roll

Almost there.

I added "partial" to qualify "Data Resource" in the suggestion of Peter concerning the documentation of top-level "examples" ; because an object with title and path is not a valid Data Resource (name is missing). Or do we want to impose a name property ? What do you think of this wording ?

Copy link
Member

@peterdesmet peterdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierrecamilleri thanks for the updates! Regarding:

Or do we want to impose a name property ? What do you think of this wording ?

I think we should simply stick to the Data Resource spec and require a name (over a title). I have made suggestions to the files to reflect this change.

profiles/source/dictionary/common.yaml Outdated Show resolved Hide resolved
profiles/source/dictionary/schema.yaml Outdated Show resolved Hide resolved
profiles/source/dictionary/schema.yaml Outdated Show resolved Hide resolved
profiles/source/dictionary/schema.yaml Outdated Show resolved Hide resolved
profiles/source/dictionary/schema.yaml Outdated Show resolved Hide resolved
profiles/source/dictionary/schema.yaml Outdated Show resolved Hide resolved
@@ -57,7 +57,7 @@
"homepage": {
"propertyOrder": 60,
"title": "Home Page",
"description": "The home on the web that is related to this data package.",
"description": "The home on the web that is related to this resource.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two notes:

  1. These files should be rebuild based on the changes above.
  2. Is it a good idea to rebuild these files? I think we should have a new directory target/2.1-draft

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the build process, in particular what is the difference between "build/profiles" and "profiles/target" ?

When I run the "build.js", with "VERSION=2.1-draft", it does not create (or for the matter, populate) a target/2.1-draft directory 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterdesmet
@pierrecamilleri
I need to set up the branching process for new versions so it's OK if this PR just includes actual changes to the standard and I'll handle the rest

@pierrecamilleri
Copy link
Author

Little ping @peterdesmet, if you can find the time for a review it would be great !

@peterdesmet
Copy link
Member

Thanks for the heads up! Will have a look this week.

@roll
Copy link
Member

roll commented Sep 17, 2024

Amazing @pierrecamilleri!

I'm going to be on it next week as well. First of all, I would suggest us to test a profile patch on #975, and then do this one

name, title, description, homepage, version, created, keywords, contributors, examples
Copy link
Member

@peterdesmet peterdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @pierrecamilleri I did the following:

  • List the properties in the same order as on Data Package: name, title, description, homepage, version, created, keywords, contributors, examples
  • Made some review comments regarding phrasing. I think we can remove the cf. Data Package everywhere, but indicate where it inherits from Data Package where applicable.
  • Minor update to simplify the CHANGELOG

I've started to use these metadata properties, so I'm looking forward to see this adopted!

@@ -266,6 +266,44 @@ If the value of the `foreignKey.reference.resource` property is an empty string
Data consumer MUST support the `foreignKey.fields` and `foreignKey.reference.fields` properties in a form of a single string e.g. `"fields": "a"` which was a part of the `v1.0` of the specification.
:::

#### `name`

A simple name or identifier for the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#name)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A simple name or identifier for the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#name)).
A simple name or identifier for the schema.


#### `description`

A description of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#description)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A description of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#description)).
A description of the schema.


#### `created`

The datetime on which the schema was created (cf. [Data Package](https://datapackage.org/standard/data-package/#created)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The datetime on which the schema was created (cf. [Data Package](https://datapackage.org/standard/data-package/#created)).
The datetime on which the schema was created.

Copy link
Member

@peterdesmet peterdesmet Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could inherit from Data Package. Would add:

If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree


#### `version`

A version string identifying the version of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#version)). If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A version string identifying the version of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#version)). If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.
A version string identifying the version of the schema. If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.


#### `keywords`

An array of string keywords to assist users searching for the schema in catalogs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pierrecamilleri would you argue this inherits from Data Package or not? If so, please change to:

Suggested change
An array of string keywords to assist users searching for the schema in catalogs.
An array of string keywords to assist users searching for the schema in catalogs. If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.


#### `contributors`

The people or organizations that contributed to the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#contributors)). If not specified the schema inherits from the Data Package if distributed in a Data Package descriptor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The people or organizations that contributed to the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#contributors)). If not specified the schema inherits from the Data Package if distributed in a Data Package descriptor.
The people or organizations that contributed to the schema. If not specified the schema inherits from the Data Package if distributed in a Data Package descriptor.

@@ -348,7 +348,7 @@
"homepage": {
"propertyOrder": 70,
"title": "Home Page",
"description": "The home on the web that is related to this data package.",
"description": "The home on the web that is related to this resource.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "The home on the web that is related to this resource.",
"description": "The home on the web that is related to this data package.",

@roll If these files are rebuild, it should say descriptor (general) or Data Package (specific) here.


#### `version`

A version string identifying the version of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#version)). If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A version string identifying the version of the schema (cf. [Data Package](https://datapackage.org/standard/data-package/#version)). If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.
A version string identifying the version of the schema. If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.


#### `keywords`

An array of string keywords to assist users searching for the schema in catalogs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this inherit from Data Package if not defined? If so, please add:

If not specified, the schema inherits from the Data Package if distributed in a Data Package descriptor.

"$ref": "#/definitions/version"
keywords:
"$ref": "#/definitions/keywords"
contributors:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Four properties from Data Package are not carried over to Schema. I'm fine with waiting for use cases, but wanted to know your opinion on these @pierrecamilleri:

  • id: potentially useful. In my use case, we opted for path instead (the URL where the schema is available), which could be useful to add.
  • licenses: likely a bit tricky to inherit, would not add for now
  • image: probably seldom useful
  • sources: not really applicable here

@pierrecamilleri
Copy link
Author

pierrecamilleri commented Sep 25, 2024

Thanks for the quality improvements !

I think we can remove the cf. Data Package everywhere,

Cf. Data Package was more to avoid repeating stuff in the documentation, for instance the datapackage.version gives precision about "It SHOULD conform to the Semantic Versioning requirements and SHOULD follow the Data Package Version recipe." that we probably don't want to repeat. So I think these links add some clarity.

WDYT ?

but indicate where it inherits from Data Package where applicable.

Concerning the inheritance, I wonder if we could not rather drop this mention of inheritance, even for version and contributors. As a user of e.g. frictionless-r or frictionless-py, this would create the expectation that if I access the version or contributors property of a schema, it automatically fills a missing field if the schema is part of a package.

I am not quite sure there is a need for this. The metadata is optional and every creator of datapackage can gauge whether it makes sense to repeat this information, and tooling can choose to look at datapackages properties when relevant.

If you think that dropping it would create confusion, OK for adding it to keywords as well.

@roll roll changed the base branch from main to next October 28, 2024 10:15
Copy link
Member

@roll roll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @pierrecamilleri !

@pierrecamilleri
Copy link
Author

I merged the next branch to resolve merge conflicts. In the changelog, I included the changes under the v2.1 header (which release PR is still in Draft status), I hope it is planned to release them under this version number. I corrected the linting errors as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Promote "Metadata in Table Schema" recipe to the specs
3 participants