6.6.2 Enhanced Statistics uses LinkSet wrongly #81
I cannot comment on propertyPartition, classPartition and distinctSubjects, but the use of Linkset and linkPredicate appears to be wrong. From http://www.w3.org/TR/void/#linkset:
and
|
I disagree. A void:Dataset is a set of RDF triples. A void:Linkset is a collection of RDF triples between two datasets. Therefore, we can create Linksets between any arbitrary datasets. |
Yes, but the section I'm quoting doesn't talk about 2 datasets. It appears to want to provide some stats of 1 dataset, and uses the wrong class and property. |
A dataset is any set of triples. In the formulation for the enhanced statistics, we describe a set of relations (i.e. a linkset) between arbitrary partitions of a dataset. Each partition is a dataset in its own right (see void:subset). I think this approach is justifiable, and falls within the scope of the VoID constructs provided. You seem not to agree. Could you provide an alternative formulation? |
Not true. E.g. the section "properties and the number of unique objects linked to the property" shows this query:
Where do you see 2 arbitrary (i.e. independent) partitions here? The right way to express this is (see http://www.w3.org/TR/void/#statistics):
This counts any objects (URIs, blank nodes, literals), as per the above query and the VoID spec. If you want to count only resources, see http://www.w3.org/TR/void/#class-property-partitions and use rdfs:Resource (not rdfs:Class):
The key to understanding the above is that both void:propertyPartition and void:classPartition create sub-datasets, which are sets of triples. So it's legitimate to speak of the void:distinctObjects of those triples. |
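For readers without the stripped examples at hand, here is a minimal Python sketch of the counting idea described above. It is not from the thread: the toy triples, the `("lit", value)` encoding of literals, and the helper names are all hypothetical, and "resource" is taken to mean "non-literal object", which is an assumption of the sketch. Each predicate group plays the role of one void:propertyPartition, and the counts mirror void:triples, void:distinctSubjects and void:distinctObjects.

```python
from collections import defaultdict

# Hypothetical toy dataset of (subject, predicate, object) tuples.
# Literals are encoded as ("lit", value); this encoding is an assumption
# of the sketch, not something VoID prescribes.
TRIPLES = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:knows", "ex:c"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", ("lit", "Alice")),
    ("ex:b", "ex:name", ("lit", "Bob")),
]


def is_literal(obj):
    """True for objects encoded as ("lit", value)."""
    return isinstance(obj, tuple) and obj[0] == "lit"


def property_partitions(triples):
    """Group triples by predicate; each group stands for one property partition."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[p].append((s, p, o))
    return dict(parts)


def partition_stats(part):
    """Counts over one sub-dataset (a plain list of triples)."""
    objects = {o for _, _, o in part}
    return {
        "triples": len(part),
        "distinctSubjects": len({s for s, _, _ in part}),
        "distinctObjects": len(objects),
        # Only non-literal objects, i.e. the "count only resources" variant.
        "distinctResourceObjects": len({o for o in objects if not is_literal(o)}),
    }


STATS = {p: partition_stats(part) for p, part in property_partitions(TRIPLES).items()}
```

Because each partition is just a set of triples, the same `partition_stats` function applies to it as to the whole dataset, which is the point made above about sub-datasets.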
We started using the linkset because "void:subjectsTarget" and "void:objectsTarget" let us specify both the subject and the target class partitions. Can you elaborate on how we can get this kind of functionality using a void:propertyPartition? |
Dear Michel, I cannot see any query in the quoted section that reports on property and two classes. The closest query that I see is: unique subject types that are linked through a property to unique object types:
It counts distinct subjects and objects per property. This can be reported as follows:
However, the same query seems to want to (incorrectly) report on property and two classes:
To make such a report, you need to use the http://ldf.fi/void-ext ontology (see here for a tool implementing such counts: http://jiemakel.github.io/aether/, and a paper explaining it), e.g. like this:
Above we use:
Note that if you have some subclass or subproperty inference in the repository, those partitions won't be exclusive... |
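To make the non-exclusivity caveat concrete, here is a small Python sketch (hypothetical data and function names, not taken from the thread or from the void-ext ontology). It counts triples per (property, subject class, object class) combination and shows that, once an object carries two types, as can happen with subclass inference, the per-partition counts add up to more than the number of triples.

```python
from collections import Counter

# Hypothetical rdf:type assertions. "ex:cox2" has two types, as it might
# after subclass inference in the repository.
TYPES = {
    "ex:aspirin": {"ex:Drug"},
    "ex:ibuprofen": {"ex:Drug"},
    "ex:cox1": {"ex:Protein"},
    "ex:cox2": {"ex:Protein", "ex:Enzyme"},
}

# Hypothetical links between typed resources.
LINKS = [
    ("ex:aspirin", "ex:targets", "ex:cox1"),
    ("ex:aspirin", "ex:targets", "ex:cox2"),
    ("ex:ibuprofen", "ex:targets", "ex:cox2"),
]


def typed_link_counts(links, types):
    """Count triples per (property, subject class, object class) combination."""
    counts = Counter()
    for s, p, o in links:
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                counts[(p, subject_class, object_class)] += 1
    return counts


COUNTS = typed_link_counts(LINKS, TYPES)
TOTAL_OVER_PARTITIONS = sum(COUNTS.values())
```

Here the Drug/Protein partition holds all three triples and the Drug/Enzyme partition holds two of them again, so summing over partitions overcounts the dataset.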
so the objectClassPartition is a property of the classPartition? and the void:triples are associated with the objectClassPartition? strange. |
void-ext:objectClassPartition is analogous to void:classPartition: they make a subset (both are subprops of void:subset). The difference is that objectClassPartition restricts the Objects of triples in the subset, whereas classPartition restricts the Subjects. This needs to be qualified: http://www.w3.org/TR/void/#class-property-partitions says "The (classPartition) contains all triples that describe entities that have this class as their rdf:type". Is it true that the word "describe" means "have as subject"? SPARQL deliberately leaves freedom about how a "DESCRIBE ?s" query is implemented. Most repos return Concise Bounded Description (CBD), which includes all "?s ?p ?o" triples, but also all triples "?s ?p1 ?blank. ?blank ?p2 ?o" where ?blank is a blank node (recursively); and "?statement rdf:subject ?s. ?statement ?p ?o" (i.e. all reified statements about ?s). Others even return Symmetric CBD, which includes statements where ?s is Object.
No: objectClassPartition can be applied against any void:Dataset, no matter whether it's the result of a partition or not. The subsets being void:Dataset, you can subdivide them further. You can swap the order/nesting of the propertyPartition, classPartition, objectClassPartition and still get almost the same results. At each level, you need to describe the parameter of partition: void:property and void:class (twice). By "almost" I refer to the ambiguity of "describe" above. You also need to be careful about literals: if your repo does not automagically declare all literals to be of class rdfs:Literal, then objectClassPartition will skip all data triples (having a literal as their object). And "declare literals as rdfs:Literal" means e.g. "123 a rdfs:Literal", which is weird, because in RDF literals cannot be the subject of a statement (this holds in RDF 1.1 as well as RDF 1.0). |
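A rough Python sketch of the partition semantics discussed in this comment, under two stated assumptions: "describe" is read narrowly as "has as subject", and literal objects carry no class in the toy model. The data and function names are hypothetical. Each partition is a plain filter over a list of triples, so the filters can be nested in any order.

```python
# Hypothetical type assertions; literal objects have no entry here.
TYPES = {
    "ex:a": {"ex:Person"},
    "ex:b": {"ex:Person"},
    "ex:acme": {"ex:Org"},
}

TRIPLES = [
    ("ex:a", "ex:worksFor", "ex:acme"),
    ("ex:b", "ex:worksFor", "ex:acme"),
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:name", "Alice"),  # literal object, untyped in this model
]


def class_partition(triples, cls):
    """Keep triples whose SUBJECT has the given class."""
    return [t for t in triples if cls in TYPES.get(t[0], ())]


def object_class_partition(triples, cls):
    """Keep triples whose OBJECT has the given class."""
    return [t for t in triples if cls in TYPES.get(t[2], ())]


def property_partition(triples, prop):
    """Keep triples with the given predicate."""
    return [t for t in triples if t[1] == prop]


# Two different nesting orders of the same three restrictions.
ORDER_1 = property_partition(
    object_class_partition(class_partition(TRIPLES, "ex:Person"), "ex:Org"),
    "ex:worksFor",
)
ORDER_2 = class_partition(
    property_partition(object_class_partition(TRIPLES, "ex:Org"), "ex:worksFor"),
    "ex:Person",
)

# Every triple reachable through some object class partition.
ALL_CLASSES = set().union(*TYPES.values())
COVERED = {t for c in ALL_CLASSES for t in object_class_partition(TRIPLES, c)}
```

In this model both nesting orders select the same two `ex:worksFor` triples, and the `ex:name` triple never appears in `COVERED`, which is the literal-skipping effect mentioned above.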
Hi, how does that look? |
@VladimirAlexiev can you have a look at the diff? |
Use that:
Cheers! |
@VladimirAlexiev ok, i have made the edits. can you verify the correctness for each statistic? |
Thanks for adding me to the contributors! Could you please change it to this:
|
done. |
Please ensure that the examples both within the document and hcls.ttl are updated. (Relates to issue #89) |
I'll look at the IO Informatics use case and will harmonize it in accordance with the guidelines |
I sent a note to Vladimir asking him to verify what Michel did (followup to #81 (comment)). |
@VladimirAlexiev can you have another look at the latest? |
refactored statistics have now been merged as per commit e85578a |
Sec 6.6.2 uses LinkSet to provide
This is totally wrong: void:Linkset and void:linkPredicate are used to describe links between datasets, not counts within one dataset.
You should use void:propertyPartition (and maybe void:classPartition within it) and void:distinctSubjects.