From 06952a39e80752307c1150f4a4cc447c66f7ea7a Mon Sep 17 00:00:00 2001
From: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 29 Aug 2024 10:55:02 +0000
Subject: [PATCH] Automatic update of RiverBench/dataset-openaire-lod (dev)

---
 docs/datasets/openaire-lod/index.md | 80 +++++++++++++++++++----------
 1 file changed, 54 insertions(+), 26 deletions(-)

diff --git a/docs/datasets/openaire-lod/index.md b/docs/datasets/openaire-lod/index.md
index 21bbc94ee..fceb11ce6 100644
--- a/docs/datasets/openaire-lod/index.md
+++ b/docs/datasets/openaire-lod/index.md
@@ -1,4 +1,6 @@
-# openaire-lod (development version)
+[:material-link-variant: Permanent URL](https://w3id.org/riverbench/datasets/openaire-lod/dev "Link to the permanent URL of this resource.")
**[:material-database-edit: Edit this page](https://github.com/RiverBench/dataset-openaire-lod/edit/main/metadata.ttl "Edit this page's source in RDF/Turtle on GitHub.")**
[:material-help-circle:](../../documentation/editing-docs.md "Need help with editing?")
+ +# Dataset: openaire-lod (development version) [OpenAIRE LOD](https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation) was a service that exported data from the OpenAIRE information space in RDF format, using Linked Open Data principles. The data was exported to Zenodo, with the [last dump dated at March 3, 2021](https://zenodo.org/records/4587369). This dataset consists of the "result" subset of the OpenAIRE LOD graph, including scientific results such as publications. @@ -8,8 +10,11 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ !!! info - Download this metadata in RDF: **[Turtle](https://w3id.org/riverbench/datasets/openaire-lod/dev.ttl)**, **[N-Triples](https://w3id.org/riverbench/datasets/openaire-lod/dev.nt)**, **[RDF/XML](https://w3id.org/riverbench/datasets/openaire-lod/dev.rdf)**, **[Jelly](https://w3id.org/riverbench/datasets/openaire-lod/dev.jelly)** -
Source repository: **[openaire-lod](https://github.com/RiverBench/dataset-openaire-lod)** + :fontawesome-solid-diagram-project: Download this metadata in RDF: **[Turtle](https://w3id.org/riverbench/datasets/openaire-lod/dev.ttl)**, **[N-Triples](https://w3id.org/riverbench/datasets/openaire-lod/dev.nt)**, **[RDF/XML](https://w3id.org/riverbench/datasets/openaire-lod/dev.rdf)**, **[Jelly](https://w3id.org/riverbench/datasets/openaire-lod/dev.jelly)** +
:material-github: Source repository: **[dataset-openaire-lod](https://github.com/RiverBench/dataset-openaire-lod)** +
:material-link-variant: Permanent URL: [`https://w3id.org/riverbench/datasets/openaire-lod/dev`](https://w3id.org/riverbench/datasets/openaire-lod/dev) + + **[:octicons-arrow-down-24: Go to download links](#distributions)** ??? example "Stream preview (click to expand)" @@ -73,17 +78,17 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - **Name**: Christoph Lange - **Piotr Sowiński (5)** - **Name**: Piotr Sowiński + - **Comment**: Processing the dataset for RiverBench - **Nickname**: Ostrzyciel - **Homepage**: - ([https://orcid.org/0000-0002-2543-9461](https://orcid.org/0000-0002-2543-9461)) - Ostrzyciel ([https://github.com/Ostrzyciel](https://github.com/Ostrzyciel)) - - **Comment**: Processing the dataset for RiverBench - **License**: [https://spdx.org/licenses/CC0-1.0](https://spdx.org/licenses/CC0-1.0) - **Source**: - [https://doi.org/10.5281/zenodo.4587369](https://doi.org/10.5281/zenodo.4587369) - [https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation](https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation) - **Date Issued**: 2024-07-12 -- **Date Modified**: 2024-07-12 +- **Date Modified**: 2024-08-29 - **Landing page**: [openaire-lod (dev)](https://w3id.org/riverbench/datasets/openaire-lod/dev) - **Conforms To**: Metadata ([https://w3id.org/riverbench/schema/metadata](https://w3id.org/riverbench/schema/metadata)) @@ -103,11 +108,11 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - **Type**: - Stream elements split by time ([rb:TimeStreamElementSplit](https://w3id.org/riverbench/schema/metadata#TimeStreamElementSplit)) - Stream elements split by topic ([rb:TopicStreamElementSplit](https://w3id.org/riverbench/schema/metadata#TopicStreamElementSplit)) + - **Comment**: Each stream element corresponds to exactly one scientific result in OpenAIRE, each with an assigned timestamp (time of collection). 
_(en)_ - **Has temporal property**: [http://lod.openaire.eu/vocab/dateofcollection](http://lod.openaire.eu/vocab/dateofcollection) - **Has subject shape**: - **Comment**: Target instances of class ResultEntity. _(en)_ - **Target class**: [http://lod.openaire.eu/vocab/ResultEntity](http://lod.openaire.eu/vocab/ResultEntity) - - **Comment**: Each stream element corresponds to exactly one scientific result in OpenAIRE, each with an assigned timestamp (time of collection). _(en)_ - **Uses vocabulary**: [http://lod.openaire.eu/vocab](http://lod.openaire.eu/vocab) - **Conforms to W3C RDF 1.1 specification**: yes - **Conforms to W3C RDF-star draft specification as of December 17, 2021**: yes @@ -118,6 +123,25 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ ## Distributions + +### Download links + +The dataset is published in a few size variants, each containing a specific number of stream elements. +For each size, there are three distribution types available: flat (just an N-Triples/N-Quads file), +streaming (a .tar.gz archive with Turtle/TriG files, one file per stream element), +and [Jelly](https://w3id.org/jelly) (a native binary format for streaming RDF). +See the [documentation](../../documentation/dataset-release-format.md) for more details. 
+
+Distribution size | Statements | Flat | Streaming | Jelly
+--- | --: | --: | --: | --:
+10K | 193,178 | [:octicons-download-24: 3.4 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_10K.nt.gz) | [:octicons-download-24: 3.0 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_10K.tar.gz) | [:octicons-download-24: 3.1 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_10K.jelly.gz)
+100K | 2,267,185 | [:octicons-download-24: 48.8 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_100K.nt.gz) | [:octicons-download-24: 43.2 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_100K.tar.gz) | [:octicons-download-24: 45.0 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_100K.jelly.gz)
+1M | 42,913,544 | [:octicons-download-24: 1.2 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_1M.nt.gz) | [:octicons-download-24: 1.1 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_1M.tar.gz) | [:octicons-download-24: 1.2 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_1M.jelly.gz)
+Full | 71,810,467 | [:octicons-download-24: 1.7 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_full.nt.gz) | [:octicons-download-24: 1.6 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_full.tar.gz) | [:octicons-download-24: 1.7 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_full.jelly.gz)
+
+
+The full metadata of all distributions can be found below.
+ ### Full stream distribution - **Title**: Full stream distribution @@ -131,7 +155,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution)) - Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution)) - **Has stream element count**: 2,000,000 -- **Byte size**: 1.58 GB +- **Byte size**: 1.6 GB - **Media type**: text/turtle - **Packaging format**: application/tar - **Compression format**: application/gzip @@ -165,7 +189,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution)) - Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution)) - **Has stream element count**: 2,000,000 -- **Byte size**: 1.70 GB +- **Byte size**: 1.7 GB - **Media type**: application/x-jelly-rdf - **Compression format**: application/gzip - **Checksum**: @@ -193,7 +217,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution)) - Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution)) - **Has stream element count**: 2,000,000 -- **Byte size**: 1.74 GB +- **Byte size**: 1.7 GB - **Media type**: application/n-triples - **Compression format**: application/gzip - **Checksum**: @@ -221,7 +245,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution)) - **Has stream element count**: 1,000,000 -- **Byte size**: 
1.08 GB +- **Byte size**: 1.1 GB - **Media type**: text/turtle - **Packaging format**: application/tar - **Compression format**: application/gzip @@ -255,7 +279,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution)) - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - **Has stream element count**: 1,000,000 -- **Byte size**: 1.19 GB +- **Byte size**: 1.2 GB - **Media type**: application/x-jelly-rdf - **Compression format**: application/gzip - **Checksum**: @@ -283,7 +307,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution)) - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - **Has stream element count**: 1,000,000 -- **Byte size**: 1.16 GB +- **Byte size**: 1.2 GB - **Media type**: application/n-triples - **Compression format**: application/gzip - **Checksum**: @@ -311,7 +335,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution)) - **Has stream element count**: 100,000 -- **Byte size**: 43.17 MB +- **Byte size**: 43.2 MB - **Media type**: text/turtle - **Packaging format**: application/tar - **Compression format**: application/gzip @@ -345,7 +369,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution)) - Partial distribution 
([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - **Has stream element count**: 100,000 -- **Byte size**: 45.00 MB +- **Byte size**: 45.0 MB - **Media type**: application/x-jelly-rdf - **Compression format**: application/gzip - **Checksum**: @@ -373,7 +397,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution)) - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - **Has stream element count**: 100,000 -- **Byte size**: 48.82 MB +- **Byte size**: 48.8 MB - **Media type**: application/n-triples - **Compression format**: application/gzip - **Checksum**: @@ -401,7 +425,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution)) - **Has stream element count**: 10,000 -- **Byte size**: 3.00 MB +- **Byte size**: 3.0 MB - **Media type**: text/turtle - **Packaging format**: application/tar - **Compression format**: application/gzip @@ -435,7 +459,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution)) - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution)) - **Has stream element count**: 10,000 -- **Byte size**: 3.12 MB +- **Byte size**: 3.1 MB - **Media type**: application/x-jelly-rdf - **Compression format**: application/gzip - **Checksum**: @@ -463,7 +487,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/ - Flat distribution 
([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution))
     - Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
 - **Has stream element count**: 10,000
-- **Byte size**: 3.43 MB
+- **Byte size**: 3.4 MB
 - **Media type**: application/n-triples
 - **Compression format**: application/gzip
 - **Checksum**:
@@ -484,75 +508,79 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
 - **Title**: Statistics for full distributions

-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
 | --- | --: | --: | --: | --: | --: | --: |
 | **IRIs** | 44,830,559 | 5,938,140 | 22.42 | 48.04 | 10 | 8,988 |
 | **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 69,535,884 | 14,154,385 | 34.77 | 141.46 | 7 | 8,985 |
 | **Graphs** | 2,000,000 | 1 | 1.00 | 0.00 | 1 | 1 |
 | **Statements** | 71,810,467 | _N/A_ | 35.91 | 141.51 | 8 | 8,987 |
 | **Literals** | 55,180,959 | 9,633,580 | 27.59 | 132.67 | 5 | 5,121 |
 | **Simple literals** | 55,180,959 | 9,633,580 | 27.59 | 132.67 | 5 | 5,121 |
 | **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
 | **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 5,234 | _N/A_ | 0.00 | 0.94 | 0 | 503 |
 | **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Subjects** | 2,000,000 | 2,000,131 | 1.00 | 0.00 | 1 | 1 |
 | **Predicates** | 28,476,853 | 24 | 14.24 | 0.95 | 8 | 19 |
-| **Objects** | 69,535,884 | 14,154,385 | 34.77 | 141.46 | 7 | 8,985 |

 ### Statistics for 1M distributions

 - **Title**: Statistics for 1M distributions

-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
 | --- | --: | --: | --: | --: | --: | --: |
 | **IRIs** | 26,480,270 | 4,695,708 | 26.48 | 67.66 | 10 | 8,988 |
 | **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 41,489,263 | 9,106,902 | 41.49 | 199.70 | 7 | 8,985 |
 | **Graphs** | 1,000,000 | 1 | 1.00 | 0.00 | 1 | 1 |
 | **Statements** | 42,913,544 | _N/A_ | 42.91 | 199.75 | 8 | 8,987 |
 | **Literals** | 29,668,659 | 4,871,239 | 29.67 | 187.48 | 5 | 5,121 |
 | **Simple literals** | 29,668,659 | 4,871,239 | 29.67 | 187.48 | 5 | 5,121 |
 | **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
 | **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 2 | _N/A_ | 0.00 | 0.00 | 0 | 2 |
 | **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Subjects** | 1,000,000 | 1,000,059 | 1.00 | 0.00 | 1 | 1 |
 | **Predicates** | 13,660,885 | 24 | 13.66 | 0.93 | 8 | 19 |
-| **Objects** | 41,489,263 | 9,106,902 | 41.49 | 199.70 | 7 | 8,985 |

 ### Statistics for 100K distributions

 - **Title**: Statistics for 100K distributions

-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
 | --- | --: | --: | --: | --: | --: | --: |
 | **IRIs** | 1,700,241 | 210,791 | 17.00 | 3.85 | 12 | 227 |
 | **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 2,165,891 | 940,274 | 21.66 | 29.45 | 10 | 3,101 |
 | **Graphs** | 100,000 | 1 | 1.00 | 0.00 | 1 | 1 |
 | **Statements** | 2,267,185 | _N/A_ | 22.67 | 29.35 | 10 | 3,101 |
 | **Literals** | 1,849,721 | 828,050 | 18.50 | 28.73 | 7 | 3,037 |
 | **Simple literals** | 1,849,721 | 828,050 | 18.50 | 28.73 | 7 | 3,037 |
 | **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
 | **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Subjects** | 100,000 | 100,006 | 1.00 | 0.00 | 1 | 1 |
 | **Predicates** | 1,284,078 | 22 | 12.84 | 1.10 | 10 | 18 |
-| **Objects** | 2,165,891 | 940,274 | 21.66 | 29.45 | 10 | 3,101 |

 ### Statistics for 10K distributions

 - **Title**: Statistics for 10K distributions

-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
 | --- | --: | --: | --: | --: | --: | --: |
 | **IRIs** | 170,859 | 29,880 | 17.09 | 6.29 | 13 | 207 |
 | **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 177,888 | 76,837 | 17.79 | 8.41 | 10 | 239 |
 | **Graphs** | 10,000 | 1 | 1.00 | 0.00 | 1 | 1 |
 | **Statements** | 193,178 | _N/A_ | 19.32 | 8.53 | 10 | 240 |
 | **Literals** | 138,454 | 56,917 | 13.85 | 5.07 | 7 | 202 |
 | **Simple literals** | 138,454 | 56,917 | 13.85 | 5.07 | 7 | 202 |
 | **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
 | **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
 | **Subjects** | 10,000 | 9,999 | 1.00 | 0.00 | 1 | 1 |
 | **Predicates** | 121,432 | 22 | 12.14 | 0.71 | 10 | 16 |
-| **Objects** | 177,888 | 76,837 | 17.79 | 8.41 | 10 | 239 |