Fix duplicated values (fixes #28) #30

ckindermann · 2024-09-27T01:27:08Z

Jena's internal model doesn't treat an RDF graph as a set of triples. Instead, repeated triples (meaning triples with the same subject, predicate, and object) are represented as different Java objects, even though they compare as equal under `equals`. It seems that Jena's internal model repeats triples for blank nodes annotated with an `rdf:nodeID` (e.g., `rdf:nodeID="genid3"`), leading to duplicates in our LDTab output.

The proposed change fixes this issue. However, it would prevent us from round-tripping files with duplicate triples. This is not exactly desirable. I'll see whether I can come up with a better solution. At least we know where the 'bug' is located.
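To make the trade-off concrete, here is a minimal, language-neutral sketch of equality-based deduplication. It is not the code in the commit and does not use Jena's API; triples are modeled as plain tuples, and the function name `dedupe_triples` and the sample data are made up for illustration.

```python
from typing import Iterable, List, Tuple

# A triple is modeled as (subject, predicate, object); this is an
# illustrative stand-in, not Jena's Statement or Triple class.
Triple = Tuple[str, str, str]


def dedupe_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Drop repeated triples, keeping the first occurrence in input order."""
    seen = set()
    unique = []
    for triple in triples:
        if triple not in seen:  # equality-based check, like Java's equals
            seen.add(triple)
            unique.append(triple)
    return unique


# Hypothetical parser output in which a blank-node triple was repeated.
parsed = [
    ("_:genid3", "rdfs:label", "example"),
    ("_:genid3", "rdfs:label", "example"),
    ("ex:A", "rdfs:label", "A"),
]

deduped = dedupe_triples(parsed)
```

After this step the repeated blank-node triple appears once, but so would any triple that was deliberately duplicated in the input, which is why a file with duplicate triples could no longer be round-tripped.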

ckindermann · 2024-09-27T23:37:11Z

This is a tricky issue - I don't think there is a 'correct' way to solve this.

The RDF specification says that "[Identifiers] are not persistent or portable [...] for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation."

This means we are not required to handle blank node identifiers (such as rdf:nodeID="genid3") in LDTab as far as the RDF spec is concerned. However, handling blank node identifiers is necessary if we want to offer a perfect round-trip service (without some normalization procedure). In other words, we'd need to assign blank node structures (with "datatype":"_JSON") an ID ... but this obviously goes against the design of LDTab to eliminate blank nodes where possible.

So, we have three options:

1. RDF as sets of triples: Accept commit f039757 as is (and drop support for persisting any duplicate triples).
2. RDF with duplicates: Change f039757 to only remove duplicated triples where the subject is a blank node (so there is still support for persisting duplicated triples - but we don't offer support for persisting blank node identifiers).
3. Full support for blank node identifiers: Introduce a meta key to persist blank node IDs when it matters.
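The difference between the first two options can be sketched as follows. This is an illustration under stated assumptions, not the behavior of f039757: triples are plain tuples, blank nodes are recognized by a `_:` prefix, and the helper names (`is_blank`, `dedupe_all`, `dedupe_blank_subjects`) are hypothetical. The third option is left out because the shape of the meta key has not been specified.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_blank(node: str) -> bool:
    """Assumption for this sketch: blank nodes are written with a '_:' prefix."""
    return node.startswith("_:")


def dedupe_all(triples: Iterable[Triple]) -> List[Triple]:
    """Set semantics: every repeated triple is dropped."""
    seen = set()
    result = []
    for triple in triples:
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


def dedupe_blank_subjects(triples: Iterable[Triple]) -> List[Triple]:
    """Drop repeats only when the subject is a blank node; keep other duplicates."""
    seen = set()
    result = []
    for triple in triples:
        subject = triple[0]
        if is_blank(subject):
            if triple in seen:
                continue  # repeated blank-node triple: skip it
            seen.add(triple)
        result.append(triple)
    return result


# Hypothetical input with one repeated blank-node triple and one
# repeated triple on a named subject.
triples = [
    ("_:genid3", "rdfs:label", "example"),
    ("_:genid3", "rdfs:label", "example"),
    ("ex:A", "rdfs:label", "A"),
    ("ex:A", "rdfs:label", "A"),
]

as_set = dedupe_all(triples)
blank_only = dedupe_blank_subjects(triples)
```

With this input, `dedupe_all` keeps two triples, while `dedupe_blank_subjects` keeps three, because the duplicate on `ex:A` survives.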

jamesaoverton · 2024-09-30T15:58:43Z

We discussed this on a call, and tentatively agreed on option 1.

Fix duplicated values (fixes #28)

f039757

ckindermann linked an issue Sep 27, 2024 that may be closed by this pull request

Bug: Assigning a blank node an rdf:nodeID can lead to duplicates in LDTab #28

Open

ckindermann marked this pull request as ready for review October 28, 2024 05:33
