
SemPub17_QueriesTask1


More details and explanations will be gradually added to this page. Participants are invited to use the mailing list (https://groups.google.com/forum/#!forum/sempub-challenge) to comment, to ask questions, and to get in touch with chairs and other participants.

General information and rules

Participants are required to translate the input queries into SPARQL queries that can be executed against the produced LOD. The dataset can use any vocabulary, but the query result output must conform to the rules described on this page.

Some preliminary information and general rules:

  • queries must produce a CSV output, according to the rules detailed below. The evaluation will be performed automatically by comparing this output (on the evaluation dataset) with the expected results.
  • IRIs of workshop volumes and papers must follow this naming convention:

    type of resource     URI example
    workshop volume      http://ceur-ws.org/Vol-1010/
    paper                http://ceur-ws.org/Vol-1099/#paper3

Papers have fragment IDs like paper3 in the most recently published workshop proceedings. When processing older workshop proceedings, please derive such IDs from the filenames of the papers by removing the .pdf extension (e.g. paper3.pdf → paper3, or ldow2011-paper12.pdf → ldow2011-paper12).

  • IRIs of other resources (e.g. affiliations, funding agencies) must also be within the http://ceur-ws.org/ namespace, but in a path separate from http://ceur-ws.org/Vol-NNN/ for any number NNN.
  • the structure of the IRIs used in the examples is not normative and is given for illustration only. Participants are free to use their own IRI structure and their own organization of classes and instances.
  • All data relevant for the queries and available in the input dataset must be extracted and produced as output. Though the evaluation mechanisms will be implemented so as to take minor differences into account and to normalize them, participants are asked to extract as much information as possible. Further details are given below for each query.
  • Since most of the queries take a paper as input (usually denoted as X), participants are required to use an unambiguous way of identifying input papers. To avoid errors, papers are identified by the URL of the PDF file, as available on http://ceur-ws.org.
  • The order of output records does not matter.

We do not provide further explanations for queries whose output looks clear. If they are not or there is any other issue, please feel free to ask on the mailing list.
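
As a starting point, the sketch below shows how an input paper, identified by the URL of its PDF file, could be resolved to IRIs that follow the naming convention above. The ex: prefix and the property names (ex:pdfUrl, ex:publishedIn) are purely hypothetical; participants are free to model this differently.

# Sketch only: the ex: vocabulary and property names are hypothetical
PREFIX ex: <http://ceur-ws.org/vocab#>

SELECT ?paper ?volume
WHERE {
  # the input paper, identified by the URL of its PDF file
  ?paper ex:pdfUrl <http://ceur-ws.org/Vol-1099/paper3.pdf> ;
         # the containing workshop volume, e.g. <http://ceur-ws.org/Vol-1099/>
         ex:publishedIn ?volume .
}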

Queries

Query Q1.1: Subjects of a dataset in a paper

Query: For each dataset D of FedBench in paper P, find the number of subjects X

E.g. For each dataset D of FedBench used in paper http://ceur-ws.org/Vol-782/GoerlitzAndStaab_COLD2011.pdf, find the number of triples T.

NOTE: during the evaluation, different values might be requested from the same table, but the dataset will remain the same.

Expected output format (CSV):

dataset, subjects
rdfs:Literal,xsd:decimal
rdfs:Literal,xsd:decimal
[...]

Examples in TD

Some examples of output are shown below, others can be found in the training dataset files.

"DBpedia subset", "9.5"^^xsd:decimal "GeoNames", "7.48"^^xsd:decimal "LinkedMDB", "0.694"^^xsd:decimal "Jamendo", "0.336"^^xsd:decimal "New York Times", "21.7"^^xsd:decimal "SW Dog Food", "12"^^xsd:decimal "KEGG", "34.3"^^xsd:decimal "ChEBI", "50.5"^^xsd:decimal "Drugbank", "19.7"^^xsd:decimal "LS3", "9054"^^xsd:decimal "LS4", "3"^^xsd:decimal "LS5", "393"^^xsd:decimal "LS6", "28"^^xsd:decimal "LS7", "144"^^xsd:decimal

Query Q1.2: Test cases

Query Q1.2: Which are the test cases T for OAEI in year Y?

Expected output format (CSV):

test case, year
rdfs:Literal,xsd:integer
[...]
  • Run the query for Y=2016

Examples in TD

"benchmarks", "2016"^^xsd:integer "anatomy", "2016"^^xsd:integer "conference", "2016"^^xsd:integer "largebio", "2016"^^xsd:integer "phenotype", "2016"^^xsd:integer "multifarm", "2016"^^xsd:integer "interactive", "2016"^^xsd:integer "process model", "2016"^^xsd:integer "instance", "2016"^^xsd:integer

Query Q1.3: Common test cases of two editions

Query Q1.3: Which test cases T did all editions have in common from year Y1 to Y2?

Expected output format (CSV):

test case
rdfs:Literal
[...]
  • Run the query for Y1=2010, Y2=2013

Examples in TD

"benchmark"
"anatomy"
"conference"
"iimdb"

Query Q1.4: Persistent participants

Query Q1.4: Which participants P participated in all editions from Y1 to Y2?

Expected output format (CSV):

participant
rdfs:Literal
[...]
  • Run the query for Y1=2010, Y2=2013

Examples in TD

"Aroma" "CODI"

Query Q1.5: Repeated test cases

Query Q1.5: Which participants P have addressed test case T for every edition from Y1 to Y2?

Expected output format (CSV):

 participant
 rdfs:Literal
[...]
  • Run the query for T=anatomy, Y1=2010, Y2=2013

Examples in TD

"CODI"

Query Q1.6: Information of datasets

Query Q1.6: How many X did test dataset D have in year Y?

e.g. How many entities X did test dataset D have in 2016?

Expected output format (CSV):

dataset, entities
rdfs:Literal,xsd:integer
[...]
  • Run the query for Y=2016

Examples in TD

"biblio", "209"^^xsd:integer "film", "284"^^xsd:integer

Query Q1.7: Information of systems for datasets

Query Q1.7: What was the X of system S in year Y for test dataset D?

X is one of the measures precision, recall, or F-measure. E.g. What was the precision of system S in year 2016 for test dataset D?

Expected output format (CSV):

table-iri, system, dataset, precision
<IRI>,rdfs:Literal,rdfs:Literal,xsd:decimal
[...]
  • Run the query for X=precision, Y=2016

Examples in TD

"edna", "biblio", "0.35"^^xsd:decimal "edna", "film", "0.43"^^xsd:decimal "AML", "biblio", "1"^^xsd:decimal "CroMatcher", "biblio", "0.96"^^xsd:decimal "Lily", "biblio", "0.97"^^xsd:decimal "Lily", "film", "0.97"^^xsd:decimal "LogMap", "biblio", "0.93"^^xsd:decimal "LogMap", "film", "0.83"^^xsd:decimal "LogMapLt", "biblio", "0.43"^^xsd:decimal "PhenoMF", "film", "0.03"^^xsd:decimal "PhenoMF", "biblio", "0.03"^^xsd:decimal "PhenoMM", "film", "0.03"^^xsd:decimal "PhenoMM", "biblio", "0.03"^^xsd:decimal "PhenoMP", "film", "0.02"^^xsd:decimal "PhenoMP", "biblio", "0.03"^^xsd:decimal "XMap", "film", "0.95"^^xsd:decimal "XMap", "biblio", "0.78"^^xsd:decimal "LogMapBio", "film", "0.48"^^xsd:decimal "LogMapBio", "biblio", "0.59"^^xsd:decimal

Query Q1.8: Systems best performance

Query Q1.8: Which system S in year Y had the best X ever?

(X=runtime/size/precision/F-measure/recall) e.g. Which system S in year 2016 had the best runtime R ever?

Expected output format (CSV):

system, runtime
rdfs:Literal,xsd:integer
[...]

Examples in TD

"LogMapLite", "20"^^xsd:integer

Query Q1.9: All tools ever mentioned in one edition of QALD

Query Q1.9: List the names of all systems that have ever been used in an experiment in an edition of QALD.

Expected output format (CSV):

system, year
rdfs:Literal,xsd:integer
[...]

Examples in TD

"Xser", "2014"^^xsd:integer "gAnswer", "2014"^^xsd:integer "CASIA", "2014"^^xsd:integer "Intui3", "2014"^^xsd:integer "ISOFT", "2014"^^xsd:integer "RO_FII", "2014"^^xsd:integer "GFMed", "2014"^^xsd:integer "POMELO", "2014"^^xsd:integer

Query Q1.10: Best precision and worst recall for Multilingual QA in QALD

Query Q1.10: For Multilingual QA in the QALD challenge, what system in what edition had the best precision, and what system in what edition had the worst recall?

Expected output format (CSV):

system, year of best precision, year of worst recall
rdfs:Literal,xsd:integer,xsd:integer
[...]
  • Run the query for Y=2013, Y=2014, Y=2015, Y=2015

Examples in TD

"Xser", "2015"^^xsd:integer, "2014"^^xsd:integer
"qAnswer", "2015"^^xsd:integer, "2015"^^xsd:integer