Blocking seems not to be working #60
Comments
Hi Sotirios,
Where should this attribute be? We are using a mix of distance measures (Levenshtein, Jaro, Dice, Jaccard, etc.), each with a threshold of 0.2.
You find it in every Compare element, for example:
We included
After taking a look at the code, I think blocking is activated by default, even if you don't add the you can also try with
I have tried running Silk Single Machine on different operating systems.
This seems to be really odd. Normally
Thank you for your time.
<?xml version="1.0"?>
<Silk>
  <Prefixes>
    <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#"/>
    <Prefix id="xsd" namespace="http://www.w3.org/2001/XMLSchema#"/>
    <Prefix id="owl" namespace="http://www.w3.org/2002/07/owl#"/>
    <Prefix id="rdf" namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
    <Prefix id="sesame" namespace="http://www.openrdf.org/schema/sesame#"/>
    <Prefix id="fn" namespace="http://www.w3.org/2005/xpath-functions#"/>
    <Prefix id="skos" namespace="http://www.w3.org/2004/02/skos/core#"/>
  </Prefixes>
  <DataSources>
    <DataSource id="codelist1" type="file">
      <Param name="file" value="source.rdf"/>
      <Param name="format" value="RDF/XML"/>
    </DataSource>
    <DataSource id="codelist2" type="file">
      <Param name="file" value="target.rdf"/>
      <Param name="format" value="RDF/XML"/>
    </DataSource>
  </DataSources>
  <Blocking enabled="false" />
  <Interlinks>
    <Interlink id="labels">
      <SourceDataset dataSource="codelist1" var="a">
        <RestrictTo></RestrictTo>
      </SourceDataset>
      <TargetDataset dataSource="codelist2" var="b">
        <RestrictTo></RestrictTo>
      </TargetDataset>
      <LinkageRule linkType="skos:closeMatch">
        <Aggregate type="max">
          <Compare metric="levenshtein" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
          <Compare metric="jaro" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
          <Compare metric="jaroWinkler" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
          <Compare metric="jaccard" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
          <Compare metric="dice" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
          <Compare metric="softjaccard" threshold="0.20">
            <TransformInput function="lowerCase">
              <Input path="?a/skos:prefLabel"/>
            </TransformInput>
            <TransformInput function="lowerCase">
              <Input path="?b/skos:prefLabel"/>
            </TransformInput>
          </Compare>
        </Aggregate>
        <Filter limit="10"/>
      </LinkageRule>
    </Interlink>
  </Interlinks>
  <Outputs>
    <Output id="suggestions" type="file" minConfidence="0.5">
      <Param name="file" value="top10_project5.nt"/>
      <Param name="format" value="N-TRIPLE"/>
    </Output>
    <Output id="exactMatch" type="file" minConfidence="1">
      <Param name="file" value="exact_project5.nt"/>
      <Param name="format" value="N-TRIPLE"/>
    </Output>
    <Output id="score" type="alignment" minConfidence="0.5" maxConfidence="1">
      <Param name="file" value="score_project5.rdf"/>
      <Param name="format" value="RDF/XML"/>
    </Output>
  </Outputs>
</Silk>
If we get
Hi, we have been using Silk Single Machine to create links between two datasets. We would like to enable blocking to reduce running times, but nothing seems to happen.
The documentation here states that blocking should be enabled by adding
[<Blocking blocks="100" />]
but with that, Java throws an error about a malformed configuration.
We changed it to
<Blocking blocks="100" />
Silk now seems to run, but there is no reduction in running times. Is blocking actually being used but having no effect because of our data? Or is our configuration wrong?
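For reference, here is a minimal sketch of where the element sits. It mirrors the full configuration posted in this thread, where `<Blocking>` is a direct child of `<Silk>` between `</DataSources>` and `<Interlinks>`. The placeholder comments are illustrative only. It is an assumption that the square brackets in the documented form are notation for an optional element rather than part of the XML.

```xml
<?xml version="1.0"?>
<Silk>
  <!-- Prefixes omitted; same as in the full configuration -->
  <DataSources>
    <DataSource id="codelist1" type="file">
      <Param name="file" value="source.rdf"/>
      <Param name="format" value="RDF/XML"/>
    </DataSource>
    <DataSource id="codelist2" type="file">
      <Param name="file" value="target.rdf"/>
      <Param name="format" value="RDF/XML"/>
    </DataSource>
  </DataSources>
  <!-- Assumption: written without the surrounding square brackets -->
  <Blocking blocks="100" />
  <Interlinks>
    <!-- Interlink definitions omitted; same as in the full configuration -->
  </Interlinks>
  <!-- Outputs omitted; same as in the full configuration -->
</Silk>
```

This only shows the placement. Whether the setting reduces running time for a given pair of datasets is not something this fragment can confirm.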