SemPub17_QueriesTask1
More details and explanations will be gradually added to this page. Participants are invited to use the mailing list (https://groups.google.com/forum/#!forum/sempub-challenge) to comment, to ask questions, and to get in touch with chairs and other participants.
Participants are required to translate the input queries into SPARQL queries that can be executed against the produced LOD. The dataset can use any vocabulary but the query result output must conform with the rules described on this page.
Some preliminary information and general rules:
- queries must produce a CSV output, according to the rules detailed below. The evaluation will be performed automatically by comparing this output (on the evaluation dataset) with the expected results.
- IRIs of workshop volumes and papers must follow the naming convention below:
| type of resource | URI example |
|---|---|
| workshop volume | http://ceur-ws.org/Vol-1010/ |
| paper | http://ceur-ws.org/Vol-1099/#paper3 |
Papers have fragment IDs like paper3 in the most recently published workshop proceedings. When processing older workshop proceedings, please derive such IDs from the filenames of the papers by removing the .pdf extension (e.g. paper3.pdf → paper3, or ldow2011-paper12.pdf → ldow2011-paper12).
- IRIs of other resources (e.g. affiliations, funding agencies) must also be within the http://ceur-ws.org/ namespace, but in a path separate from http://ceur-ws.org/Vol-NNN/ for any number NNN.
- the structure of the IRIs used in the examples is not normative and is given for illustration only. Participants are free to use their own IRI structure and their own organization of classes and instances.
- All data relevant for the queries and available in the input dataset must be extracted and produced as output. Though the evaluation mechanisms will be implemented so as to take minor differences into account and to normalize them, participants are asked to extract as much information as possible. Further details are given below for each query.
- Since most of the queries take a paper as input (usually denoted as X), participants are required to use an unambiguous way of identifying input papers. To avoid errors, papers are identified by the URL of the PDF file, as available at http://ceur-ws.org (see the sketch after this list).
- The order of output records does not matter.
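For illustration only, a minimal sketch of how an input paper could be looked up by the URL of its PDF file is shown below. The prefix ex: and the properties ex:pdfUrl and ex:title are assumptions made for this sketch; as stated above, participants are free to use any vocabulary and graph shape.

```sparql
# Not normative: look up the paper resource whose PDF URL is the input paper P.
# ex:pdfUrl and ex:title are hypothetical property names.
PREFIX ex: <http://example.org/vocab#>

SELECT ?paper ?title
WHERE {
  ?paper ex:pdfUrl <http://ceur-ws.org/Vol-782/GoerlitzAndStaab_COLD2011.pdf> ;
         ex:title  ?title .
}
```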
We do not provide further explanations for queries whose output looks clear. If they are not or there is any other issue, please feel free to ask on the mailing list.
Queries for Task 1 of the Semantic Publishing Challenge
Query Q1.1: _For each dataset D of FedBench in paper P, find the value of X_
E.g. For each dataset D of FedBench run in paper http://ceur-ws.org/Vol-782/GoerlitzAndStaab_COLD2011.pdf, find the number of triples T.
NOTE: during the evaluation, different values might be requested from the same table, but the dataset will remain the same.
Expected output format (CSV):
dataset, subjects
rdfs:Literal, xsd:decimal
rdfs:Literal, xsd:decimal
[...]
- Use the following values to run the query:
- X=subjects, P=http://ceur-ws.org/Vol-782/GoerlitzAndStaab_COLD2011.pdf
Some examples of output are shown below, others can be found in the training dataset files.
"DBpedia subset", "9.5"^^xsd:decimal "GeoNames", "7.48"^^xsd:decimal "LinkedMDB", "0.694"^^xsd:decimal "Jamendo", "0.336"^^xsd:decimal "New York Times", "21.7"^^xsd:decimal "SW Dog Food", "12"^^xsd:decimal "KEGG", "34.3"^^xsd:decimal "ChEBI", "50.5"^^xsd:decimal "Drugbank", "19.7"^^xsd:decimal "LS3", "9054"^^xsd:decimal "LS4", "3"^^xsd:decimal "LS5", "393"^^xsd:decimal "LS6", "28"^^xsd:decimal "LS7", "144"^^xsd:decimal
Query Q1.2: _Which are the test cases T for OAEI in year Y?_
Expected output format (CSV):
test case, year
rdfs:Literal, xsd:integer
[...]
- Run the query for Y=2016
"benchmarks", "2016"^^xsd:integer "anatomy", "2016"^^xsd:integer "conference", "2016"^^xsd:integer "largebio", "2016"^^xsd:integer "phenotype", "2016"^^xsd:integer "multifarm", "2016"^^xsd:integer "interactive", "2016"^^xsd:integer "process model", "2016"^^xsd:integer "instance", "2016"^^xsd:integer
Query Q1.3: _Which test cases T did all editions have in common from year Y1 to Y2?_
Expected output format (CSV):
test case
<IRI>, rdfs:Literal
[...]
- Run the query for Y1=2010, Y2=2013
"benchmark"
"anatomy"
"conference"
"iimdb"
Query Q1.4: _Which participants P participated in all editions from Y1 to Y2?_
Expected output format (CSV):
participant
rdfs:Literal
[...]
- Run the query for Y1=2010, Y2=2013
"Aroma" "CODI"
Query Q1.5: _Which participants P have addressed test case T for every edition from Y1 to Y2?_
Expected output format (CSV):
participant
rdfs:Literal
[...]
- Run the query for T=anatomy, Y1=2010, Y2=2013
"CODI"
Query Q1.6: _How many X did test dataset D have in year Y?_
E.g. How many entities did test dataset D have in 2016?
Expected output format (CSV):
dataset, entities
rdfs:Literal, xsd:integer
[...]
- Run the query for Y=2016
"biblio", "209"^^xsd:integer "film", "284"^^xsd:integer
Query Q1.7: _What was the X of system S in year Y for test dataset D?_
Find one of the measures precision/recall/F-measure. E.g. What was the precision of system S in year 2016 for test case T?
Expected output format (CSV):
table-iri, system, dataset, precision
<IRI>, rdfs:Literal, rdfs:Literal, xsd:decimal
[...]
- Run the query for X=precision, Y=2016
"edna", "biblio", "0.35"^^xsd:decimal "edna", "film", "0.43"^^xsd:decimal "AML", "biblio", "1"^^xsd:decimal "CroMatcher", "biblio", "0.96"^^xsd:decimal "Lily", "biblio", "0.97"^^xsd:decimal "Lily", "film", "0.97"^^xsd:decimal "LogMap", "biblio", "0.93"^^xsd:decimal "LogMap", "film", "0.83"^^xsd:decimal "LogMapLt", "biblio", "0.43"^^xsd:decimal "PhenoMF", "film", "0.03"^^xsd:decimal "PhenoMF", "biblio", "0.03"^^xsd:decimal "PhenoMM", "film", "0.03"^^xsd:decimal "PhenoMM", "biblio", "0.03"^^xsd:decimal "PhenoMP", "film", "0.02"^^xsd:decimal "PhenoMP", "biblio", "0.03"^^xsd:decimal "XMap", "film", "0.95"^^xsd:decimal "XMap", "biblio", "0.78"^^xsd:decimal "LogMapBio", "film", "0.48"^^xsd:decimal "LogMapBio", "biblio", "0.59"^^xsd:decimal
Query Q1.8: _Which system S in year Y had the best X ever?_
(X=runtime/size/precision/F-measure/recall) e.g. Which system S in year 2016 had the best runtime R ever?
- Run the query for X=runtime, Y=2016 on http://ceur-ws.org/Vol-1766/oaei16_paper0.pdf
Expected output format (CSV):
system, runtime
rdfs:Literal, xsd:integer
[...]
"LogMapLite", "20"^^xsd:integer
Query Q1.9: _List the names of all systems that have ever been used in an experiment in an edition of QALD._
Expected output format (CSV):
system, year
rdfs:Literal, xsd:integer
[...]
- Run the query for Y=2014 on http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-UngerEt2014.pdf
"Xser", "2014"^^xsd:integer "gAnswer", "2014"^^xsd:integer "CASIA", "2014"^^xsd:integer "Intui3", "2014"^^xsd:integer "ISOFT", "2014"^^xsd:integer "RO_FII", "2014"^^xsd:integer "GFMed", "2014"^^xsd:integer "POMELO", "2014"^^xsd:integer
Query Q1.10: _For Multilingual QA in the QALD challenge, what system in what edition had the best precision, and what system in what edition had the worst recall?_
Expected output format (CSV):
system, year of best precision, year of worst recall
rdfs:Literal, xsd:integer, <IRI>, xsd:integer
[...]
- Run the query for Y=2013, Y=2014, Y=2015, Y=2015
"Xser", "2015"^^xsd:integer, "2014"^^xsd:integer
"qAnswer", "2015"^^xsd:integer, "2015"^^xsd:integer