From 06952a39e80752307c1150f4a4cc447c66f7ea7a Mon Sep 17 00:00:00 2001
From: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Date: Thu, 29 Aug 2024 10:55:02 +0000
Subject: [PATCH] Automatic update of RiverBench/dataset-openaire-lod (dev)
---
docs/datasets/openaire-lod/index.md | 80 +++++++++++++++++++----------
1 file changed, 54 insertions(+), 26 deletions(-)
diff --git a/docs/datasets/openaire-lod/index.md b/docs/datasets/openaire-lod/index.md
index 21bbc94ee..fceb11ce6 100644
--- a/docs/datasets/openaire-lod/index.md
+++ b/docs/datasets/openaire-lod/index.md
@@ -1,4 +1,6 @@
-# openaire-lod (development version)
+
+
+# Dataset: openaire-lod (development version)
[OpenAIRE LOD](https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation) was a service that exported data from the OpenAIRE information space in RDF format, using Linked Open Data principles. The data was exported to Zenodo, with the [last dump dated at March 3, 2021](https://zenodo.org/records/4587369). This dataset consists of the "result" subset of the OpenAIRE LOD graph, including scientific results such as publications.
@@ -8,8 +10,11 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
!!! info
- Download this metadata in RDF: **[Turtle](https://w3id.org/riverbench/datasets/openaire-lod/dev.ttl)**, **[N-Triples](https://w3id.org/riverbench/datasets/openaire-lod/dev.nt)**, **[RDF/XML](https://w3id.org/riverbench/datasets/openaire-lod/dev.rdf)**, **[Jelly](https://w3id.org/riverbench/datasets/openaire-lod/dev.jelly)**
-
Source repository: **[openaire-lod](https://github.com/RiverBench/dataset-openaire-lod)**
+ :fontawesome-solid-diagram-project: Download this metadata in RDF: **[Turtle](https://w3id.org/riverbench/datasets/openaire-lod/dev.ttl)**, **[N-Triples](https://w3id.org/riverbench/datasets/openaire-lod/dev.nt)**, **[RDF/XML](https://w3id.org/riverbench/datasets/openaire-lod/dev.rdf)**, **[Jelly](https://w3id.org/riverbench/datasets/openaire-lod/dev.jelly)**
+
:material-github: Source repository: **[dataset-openaire-lod](https://github.com/RiverBench/dataset-openaire-lod)**
+
:material-link-variant: Permanent URL: [`https://w3id.org/riverbench/datasets/openaire-lod/dev`](https://w3id.org/riverbench/datasets/openaire-lod/dev)
+
+ **[:octicons-arrow-down-24: Go to download links](#distributions)**
??? example "Stream preview (click to expand)"
@@ -73,17 +78,17 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- **Name**: Christoph Lange
- **Piotr Sowiński (5)**
- **Name**: Piotr Sowiński
+ - **Comment**: Processing the dataset for RiverBench
- **Nickname**: Ostrzyciel
- **Homepage**:
- ([https://orcid.org/0000-0002-2543-9461](https://orcid.org/0000-0002-2543-9461))
- Ostrzyciel ([https://github.com/Ostrzyciel](https://github.com/Ostrzyciel))
- - **Comment**: Processing the dataset for RiverBench
- **License**: [https://spdx.org/licenses/CC0-1.0](https://spdx.org/licenses/CC0-1.0)
- **Source**:
- [https://doi.org/10.5281/zenodo.4587369](https://doi.org/10.5281/zenodo.4587369)
- [https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation](https://web.archive.org/web/20201230151925/http://lod.openaire.eu/documentation)
- **Date Issued**: 2024-07-12
-- **Date Modified**: 2024-07-12
+- **Date Modified**: 2024-08-29
- **Landing page**: [openaire-lod (dev)](https://w3id.org/riverbench/datasets/openaire-lod/dev)
- **Conforms To**: Metadata ([https://w3id.org/riverbench/schema/metadata](https://w3id.org/riverbench/schema/metadata))
@@ -103,11 +108,11 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- **Type**:
- Stream elements split by time ([rb:TimeStreamElementSplit](https://w3id.org/riverbench/schema/metadata#TimeStreamElementSplit))
- Stream elements split by topic ([rb:TopicStreamElementSplit](https://w3id.org/riverbench/schema/metadata#TopicStreamElementSplit))
+ - **Comment**: Each stream element corresponds to exactly one scientific result in OpenAIRE, each with an assigned timestamp (time of collection). _(en)_
- **Has temporal property**: [http://lod.openaire.eu/vocab/dateofcollection](http://lod.openaire.eu/vocab/dateofcollection)
- **Has subject shape**:
- **Comment**: Target instances of class ResultEntity. _(en)_
- **Target class**: [http://lod.openaire.eu/vocab/ResultEntity](http://lod.openaire.eu/vocab/ResultEntity)
- - **Comment**: Each stream element corresponds to exactly one scientific result in OpenAIRE, each with an assigned timestamp (time of collection). _(en)_
- **Uses vocabulary**: [http://lod.openaire.eu/vocab](http://lod.openaire.eu/vocab)
- **Conforms to W3C RDF 1.1 specification**: yes
- **Conforms to W3C RDF-star draft specification as of December 17, 2021**: yes
@@ -118,6 +123,25 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
## Distributions
+
+### Download links
+
+The dataset is published in a few size variants, each containing a specific number of stream elements.
+For each size, there are three distribution types available: flat (just an N-Triples/N-Quads file),
+streaming (a .tar.gz archive with Turtle/TriG files, one file per stream element),
+and [Jelly](https://w3id.org/jelly) (a native binary format for streaming RDF).
+See the [documentation](../../documentation/dataset-release-format.md) for more details.
+
+Distribution size | Statements | Flat | Streaming | Jelly
+--- | --: | --: | --: | --:
+10K | 193,178 | [:octicons-download-24: 3.4 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_10K.nt.gz) | [:octicons-download-24: 3.0 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_10K.tar.gz) | [:octicons-download-24: 3.1 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_10K.jelly.gz)
+100K | 2,267,185 | [:octicons-download-24: 48.8 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_100K.nt.gz) | [:octicons-download-24: 43.2 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_100K.tar.gz) | [:octicons-download-24: 45.0 MB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_100K.jelly.gz)
+1M | 42,913,544 | [:octicons-download-24: 1.2 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_1M.nt.gz) | [:octicons-download-24: 1.1 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_1M.tar.gz) | [:octicons-download-24: 1.2 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_1M.jelly.gz)
+Full | 71,810,467 | [:octicons-download-24: 1.7 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/flat_full.nt.gz) | [:octicons-download-24: 1.6 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/stream_full.tar.gz) | [:octicons-download-24: 1.7 GB](https://w3id.org/riverbench/datasets/openaire-lod/dev/files/jelly_full.jelly.gz)
+
+
+The full metadata of all distributions can be found below.
+
### Full stream distribution
- **Title**: Full stream distribution
@@ -131,7 +155,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution))
- Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution))
- **Has stream element count**: 2,000,000
-- **Byte size**: 1.58 GB
+- **Byte size**: 1.6 GB
- **Media type**: text/turtle
- **Packaging format**: application/tar
- **Compression format**: application/gzip
@@ -165,7 +189,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution))
- Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution))
- **Has stream element count**: 2,000,000
-- **Byte size**: 1.70 GB
+- **Byte size**: 1.7 GB
- **Media type**: application/x-jelly-rdf
- **Compression format**: application/gzip
- **Checksum**:
@@ -193,7 +217,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution))
- Full distribution ([rb:fullDistribution](https://w3id.org/riverbench/schema/metadata#fullDistribution))
- **Has stream element count**: 2,000,000
-- **Byte size**: 1.74 GB
+- **Byte size**: 1.7 GB
- **Media type**: application/n-triples
- **Compression format**: application/gzip
- **Checksum**:
@@ -221,7 +245,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution))
- **Has stream element count**: 1,000,000
-- **Byte size**: 1.08 GB
+- **Byte size**: 1.1 GB
- **Media type**: text/turtle
- **Packaging format**: application/tar
- **Compression format**: application/gzip
@@ -255,7 +279,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 1,000,000
-- **Byte size**: 1.19 GB
+- **Byte size**: 1.2 GB
- **Media type**: application/x-jelly-rdf
- **Compression format**: application/gzip
- **Checksum**:
@@ -283,7 +307,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 1,000,000
-- **Byte size**: 1.16 GB
+- **Byte size**: 1.2 GB
- **Media type**: application/n-triples
- **Compression format**: application/gzip
- **Checksum**:
@@ -311,7 +335,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution))
- **Has stream element count**: 100,000
-- **Byte size**: 43.17 MB
+- **Byte size**: 43.2 MB
- **Media type**: text/turtle
- **Packaging format**: application/tar
- **Compression format**: application/gzip
@@ -345,7 +369,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 100,000
-- **Byte size**: 45.00 MB
+- **Byte size**: 45.0 MB
- **Media type**: application/x-jelly-rdf
- **Compression format**: application/gzip
- **Checksum**:
@@ -373,7 +397,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 100,000
-- **Byte size**: 48.82 MB
+- **Byte size**: 48.8 MB
- **Media type**: application/n-triples
- **Compression format**: application/gzip
- **Checksum**:
@@ -401,7 +425,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- Stream distribution ([rb:streamDistribution](https://w3id.org/riverbench/schema/metadata#streamDistribution))
- **Has stream element count**: 10,000
-- **Byte size**: 3.00 MB
+- **Byte size**: 3.0 MB
- **Media type**: text/turtle
- **Packaging format**: application/tar
- **Compression format**: application/gzip
@@ -435,7 +459,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Jelly distribution ([rb:jellyDistribution](https://w3id.org/riverbench/schema/metadata#jellyDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 10,000
-- **Byte size**: 3.12 MB
+- **Byte size**: 3.1 MB
- **Media type**: application/x-jelly-rdf
- **Compression format**: application/gzip
- **Checksum**:
@@ -463,7 +487,7 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- Flat distribution ([rb:flatDistribution](https://w3id.org/riverbench/schema/metadata#flatDistribution))
- Partial distribution ([rb:partialDistribution](https://w3id.org/riverbench/schema/metadata#partialDistribution))
- **Has stream element count**: 10,000
-- **Byte size**: 3.43 MB
+- **Byte size**: 3.4 MB
- **Media type**: application/n-triples
- **Compression format**: application/gzip
- **Checksum**:
@@ -484,75 +508,79 @@ See also [the project documentation](https://web.archive.org/web/20201230151925/
- **Title**: Statistics for full distributions
-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
| --- | --: | --: | --: | --: | --: | --: |
| **IRIs** | 44,830,559 | 5,938,140 | 22.42 | 48.04 | 10 | 8,988 |
| **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 69,535,884 | 14,154,385 | 34.77 | 141.46 | 7 | 8,985 |
| **Graphs** | 2,000,000 | 1 | 1.00 | 0.00 | 1 | 1 |
| **Statements** | 71,810,467 | _N/A_ | 35.91 | 141.51 | 8 | 8,987 |
| **Literals** | 55,180,959 | 9,633,580 | 27.59 | 132.67 | 5 | 5,121 |
| **Simple literals** | 55,180,959 | 9,633,580 | 27.59 | 132.67 | 5 | 5,121 |
| **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
| **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 5,234 | _N/A_ | 0.00 | 0.94 | 0 | 503 |
| **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Subjects** | 2,000,000 | 2,000,131 | 1.00 | 0.00 | 1 | 1 |
| **Predicates** | 28,476,853 | 24 | 14.24 | 0.95 | 8 | 19 |
-| **Objects** | 69,535,884 | 14,154,385 | 34.77 | 141.46 | 7 | 8,985 |
### Statistics for 1M distributions
- **Title**: Statistics for 1M distributions
-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
| --- | --: | --: | --: | --: | --: | --: |
| **IRIs** | 26,480,270 | 4,695,708 | 26.48 | 67.66 | 10 | 8,988 |
| **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 41,489,263 | 9,106,902 | 41.49 | 199.70 | 7 | 8,985 |
| **Graphs** | 1,000,000 | 1 | 1.00 | 0.00 | 1 | 1 |
| **Statements** | 42,913,544 | _N/A_ | 42.91 | 199.75 | 8 | 8,987 |
| **Literals** | 29,668,659 | 4,871,239 | 29.67 | 187.48 | 5 | 5,121 |
| **Simple literals** | 29,668,659 | 4,871,239 | 29.67 | 187.48 | 5 | 5,121 |
| **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
| **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 2 | _N/A_ | 0.00 | 0.00 | 0 | 2 |
| **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Subjects** | 1,000,000 | 1,000,059 | 1.00 | 0.00 | 1 | 1 |
| **Predicates** | 13,660,885 | 24 | 13.66 | 0.93 | 8 | 19 |
-| **Objects** | 41,489,263 | 9,106,902 | 41.49 | 199.70 | 7 | 8,985 |
### Statistics for 100K distributions
- **Title**: Statistics for 100K distributions
-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
| --- | --: | --: | --: | --: | --: | --: |
| **IRIs** | 1,700,241 | 210,791 | 17.00 | 3.85 | 12 | 227 |
| **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 2,165,891 | 940,274 | 21.66 | 29.45 | 10 | 3,101 |
| **Graphs** | 100,000 | 1 | 1.00 | 0.00 | 1 | 1 |
| **Statements** | 2,267,185 | _N/A_ | 22.67 | 29.35 | 10 | 3,101 |
| **Literals** | 1,849,721 | 828,050 | 18.50 | 28.73 | 7 | 3,037 |
| **Simple literals** | 1,849,721 | 828,050 | 18.50 | 28.73 | 7 | 3,037 |
| **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
| **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Subjects** | 100,000 | 100,006 | 1.00 | 0.00 | 1 | 1 |
| **Predicates** | 1,284,078 | 22 | 12.84 | 1.10 | 10 | 18 |
-| **Objects** | 2,165,891 | 940,274 | 21.66 | 29.45 | 10 | 3,101 |
### Statistics for 10K distributions
- **Title**: Statistics for 10K distributions
-| | **Sum** | **Unique** | **Mean** | **St. dev.** | **Min.** | **Max.** |
+| | **Sum** | **Unique (approx.)** | **Mean** | **St. dev.** | **Min.** | **Max.** |
| --- | --: | --: | --: | --: | --: | --: |
| **IRIs** | 170,859 | 29,880 | 17.09 | 6.29 | 13 | 207 |
| **Blank nodes** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
+| **Objects** | 177,888 | 76,837 | 17.79 | 8.41 | 10 | 239 |
| **Graphs** | 10,000 | 1 | 1.00 | 0.00 | 1 | 1 |
| **Statements** | 193,178 | _N/A_ | 19.32 | 8.53 | 10 | 240 |
| **Literals** | 138,454 | 56,917 | 13.85 | 5.07 | 7 | 202 |
| **Simple literals** | 138,454 | 56,917 | 13.85 | 5.07 | 7 | 202 |
| **Datatype literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
| **Language literals** | 0 | 0 | 0.00 | 0.00 | 0 | 0 |
+| **ASCII control chars** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Quoted triples** | 0 | _N/A_ | 0.00 | 0.00 | 0 | 0 |
| **Subjects** | 10,000 | 9,999 | 1.00 | 0.00 | 1 | 1 |
| **Predicates** | 121,432 | 22 | 12.14 | 0.71 | 10 | 16 |
-| **Objects** | 177,888 | 76,837 | 17.79 | 8.41 | 10 | 239 |