Add minor fixes to MarkDown documentation #458

Open · wants to merge 2 commits into base: develop
83 changes: 58 additions & 25 deletions doc/Plugins.md
@@ -1,4 +1,6 @@
# Data Sources
# Plugins

## Data Sources

- Sparql Endpoints
- RDF files
@@ -7,10 +9,11 @@
- CSV files
- MySQL database

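For orientation, data sources are declared in a link specification file roughly as sketched below. This is an illustrative fragment only: the element structure follows the Link Specification Language (see LinkSpecificationLanguage.md), while the endpoint URL, file name and parameter names are placeholders that may differ per plugin.

```xml
<!-- Illustrative sketch only: declares a SPARQL endpoint and a CSV file as data sources. -->
<!-- The endpoint URL, file name and parameter names are placeholders, not verified values. -->
<DataSources>
  <DataSource id="source" type="sparqlEndpoint">
    <Param name="endpointURI" value="http://example.org/sparql" />
  </DataSource>
  <DataSource id="target" type="csv">
    <Param name="file" value="target.csv" />
  </DataSource>
</DataSources>
```
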
# Similarity Measures
## Similarity Measures

The following similarity measures are included:
## Characterbased

### Character based

Character-based distance measures compare strings on the character level. They are well suited for
handling typographical errors.
@@ -24,10 +27,10 @@ handling typographical errors.
| qGrams([q: Int = '2'], [minChar: Char = '0'], [maxChar: Char = 'z']) | qGrams | String similarity based on q-grams (by default q=2). |
| substring([granularity: String = '3']) | SubString | Returns a value from 0 (strong similarity) to 1 (weak similarity) |

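To sketch how a character-based measure is used, a `Compare` element in a linkage rule might look roughly like the following. The `qGrams` function and its `q` parameter are taken from the table above; the property paths and the threshold are placeholders, and the authoritative XML syntax is described in LinkSpecificationLanguage.md.

```xml
<!-- Illustrative sketch: compare labels using q-grams (q = 2). -->
<!-- Property paths and threshold are placeholders. -->
<Compare metric="qGrams" threshold="0.3">
  <Param name="q" value="2" />
  <Input path="?a/rdfs:label" />
  <Input path="?b/rdfs:label" />
</Compare>
```
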
## Tokenbased
### Token based

While character-based distance measures work well for typographical errors, there are a number of tasks where token-based distance measures are better suited:

While character-based distance measures work well for typographical
errors, there are a number of tasks where token-based distance measures are better suited:
- Strings where parts are reordered e.g. “John Doe” and “Doe, John”
- Texts consisting of multiple words

@@ -38,36 +41,44 @@ errors, there are a number of tasks where token-base distance measures are bette
| jaccard() | Jaccard | Jaccard similarity coefficient. |
| softjaccard([maxDistance: Int = '1']) | Soft Jaccard | Soft Jaccard similarity coefficient. Same as Jaccard distance but values within a Levenshtein distance of 'maxDistance' are considered equivalent. |
| tokenwiseDistance([ignoreCase: Boolean = 'true'], [metricName: String = 'levenshtein'], [splitRegex: String = '[\s\d\p{Punct}]+'], [stopwords: String = ''], [stopwordWeight: Double = '0.01'], [nonStopwordWeight: Double = '0.1'], [useIncrementalIdfWeights: Boolean = 'false'], [matchThreshold: Double = '0.0'], [orderingImpact: Double = '0.0'], [adjustByTokenLength: Boolean = 'false']) | Token-wise Distance | Token-wise string distance using the specified metric |
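As an illustrative sketch, a token-based measure can be combined with tokenization of the inputs. The `jaccard` measure and the `tokenize` transformation are listed in this document (under Similarity Measures and Transformations, respectively); the property paths and the threshold are placeholders.

```xml
<!-- Illustrative sketch: tokenize both labels, then compare the token sets with the Jaccard coefficient. -->
<!-- Property paths and threshold are placeholders. -->
<Compare metric="jaccard" threshold="0.4">
  <TransformInput function="tokenize">
    <Input path="?a/rdfs:label" />
  </TransformInput>
  <TransformInput function="tokenize">
    <Input path="?b/rdfs:label" />
  </TransformInput>
</Compare>
```
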
## Numeric

### Numeric

| Function and parameters | Name | Description |
| --- | --- | --- |
| date() | Date | The distance in days between two dates ('YYYY-MM-DD' format). |
| dateTime() | DateTime | Distance between two date time values (xsd:dateTime format) in seconds. |
| insideNumericInterval([separator: String]) | Inside numeric interval | Checks if a number is contained inside a numeric interval, such as '1900 - 2000' |
| num([minValue: Double = '-Infinity'], [maxValue: Double = 'Infinity']) | Numeric similarity | Computes the numeric distance between two numbers. |
| wgs84([unit: String = 'km']) | Geographical distance | Computes the geographical distance between two points. Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig) |
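For example, the geographical distance could be used roughly as follows. Only the `wgs84` function and its `unit` parameter are taken from the table above; the property paths and the threshold are placeholders.

```xml
<!-- Illustrative sketch: geographic distance in kilometres between two points. -->
<!-- Property paths and threshold are placeholders. -->
<Compare metric="wgs84" threshold="5">
  <Param name="unit" value="km" />
  <Input path="?a/geo:lat_long" />
  <Input path="?b/geo:lat_long" />
</Compare>
```
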
## Equality

### Equality

| Function and parameters | Name | Description |
| --- | --- | --- |
| constant([value: Double = '1.0']) | Constant | Always returns a constant similarity value. |
| equality() | Equality | Return 0 if strings are equal, 1 otherwise. |
| inequality() | Inequality | Return 1 if strings are equal, 0 otherwise. |
| lowerThan([orEqual: Boolean = 'false']) | LowerThan | Return 1 if the source value is lower than the target value, 0 otherwise. If both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used |
| relaxedEquality() | RelaxedEquality | Return 1 if strings are equal, 0 otherwise. Lower/upper case and differences like ö/o, n/ñ, c/ç etc. are treated as equal. |
## Asian

### Asian

| Function and parameters | Name | Description |
| --- | --- | --- |
| cjkReadingDistance([minChar: Char = '0'], [maxChar: Char = 'z']) | CJK Reading Distance | CJK Reading Distance. |
| koreanTranslitDistance([minChar: Char = '0'], [maxChar: Char = 'z']) | Korean translit distance | Transliterated Korean distance. |
| koreanPhonemeDistance([minChar: Char = '0'], [maxChar: Char = 'z']) | Korean phoneme distance | Korean phoneme distance. |

## Spatial Distances
### Spatial Distances

| Function and parameters | Name | Description |
| --- | --- | --- |
| CentroidDistanceMetric([blockingParameter: Double = 1.0]) | Centroid distance | Computes the distance between the centroids of two geometries in meters. |
| MinDistanceMetric([blockingParameter: Double = 1.0]) | Min distance | Computes the minimum distance between two geometries in meters. |

## Spatial Relations
### Spatial Relations

| Function and parameters | Name | Description |
| --- | --- | --- |
| SContainsMetric([blockingParameter: Double = 1.0]) | Spatial Contains | Computes the relation "contains" between two geometries. |
@@ -80,7 +91,8 @@ errors, there are a number of tasks where token-base distance measures are bette
| TouchesMetric([blockingParameter: Double = 1.0]) | Touches | Computes the relation "touches" between two geometries. |
| WithinMetric([blockingParameter: Double = 1.0]) | Within | Computes the relation "within" between two geometries. |

## Temporal Distances
### Temporal Distances

| Function and parameters | Name | Description |
| --- | --- | --- |
| DaysDistanceMetric([blockingParameter: Double = 1.0]) | Days distance | Computes the distance in days between two time periods or instants. |
@@ -91,7 +103,8 @@ errors, there are a number of tasks where token-base distance measures are bette
| SecsDistanceMetric([blockingParameter: Double = 1.0]) | Secs distance | Computes the distance in secs between two time periods or instants. |
| YearsDistanceMetric([blockingParameter: Double = 1.0]) | Years distance | Computes the distance in years between two time periods or instants. |

## Temporal Relations
### Temporal Relations

| Function and parameters | Name | Description |
| --- | --- | --- |
| AfterMetric([blockingParameter: Double = 1.0]) | After | Computes the relation "after" between two time periods or instants. |
@@ -108,21 +121,27 @@ errors, there are a number of tasks where token-base distance measures are bette
| TOverlapsMetric([blockingParameter: Double = 1.0]) | Temporal Overlaps | Computes the relation "overlaps" between two time periods or instants. |
| StartsMetric([blockingParameter: Double = 1.0]) | Starts | Computes the relation "starts" between two time periods or instants. |

# Transformations
## Transformations

The following transform and normalization functions are included:
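Transformations are typically applied to inputs before a comparison by wrapping them in `TransformInput` elements, roughly as sketched below. The `upperCase` function is taken from the Normalize table and `equality` from the Equality table; the property paths are placeholders, and the authoritative syntax is described in LinkSpecificationLanguage.md.

```xml
<!-- Illustrative sketch: upper-case both labels before an equality comparison. -->
<!-- Property paths are placeholders. -->
<Compare metric="equality">
  <TransformInput function="upperCase">
    <Input path="?a/rdfs:label" />
  </TransformInput>
  <TransformInput function="upperCase">
    <Input path="?b/rdfs:label" />
  </TransformInput>
</Compare>
```
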
## Replace

### Replace

| Function and parameters | Name | Description |
| --- | --- | --- |
| regexReplace(regex: String, replace: String) | Regex replace | Replace all occurrences of a regex "regex" with "replace" in a string. |
| replace(search: String, replace: String) | Replace | Replace all occurrences of a string "search" with "replace" in a string. |
## Combine

### Combine

| Function and parameters | Name | Description |
| --- | --- | --- |
| concat([glue: String = '']) | Concatenate | Concatenates strings from two inputs. |
| concatMultiValues([glue: String = ''], [removeDuplicates: Boolean = 'false']) | ConcatenateMultipleValues | Concatenates multiple values received for an input. If applied to multiple inputs, yields at most one value per input. Optionally removes duplicate values. |
| merge() | Merge | Merges the values of all inputs. |
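As an illustration of a combine transformation, `concat` can join two inputs (e.g. a given name and a family name) before they are compared. The `concat` function and its `glue` parameter come from the table above; the metric name `levenshteinDistance`, the property paths and the threshold are assumptions and placeholders.

```xml
<!-- Illustrative sketch: concatenate first and last name with a space, then compare to a full-name label. -->
<!-- Metric name, property paths and threshold are assumptions/placeholders. -->
<Compare metric="levenshteinDistance" threshold="2">
  <TransformInput function="concat">
    <Param name="glue" value=" " />
    <Input path="?a/foaf:givenName" />
    <Input path="?a/foaf:familyName" />
  </TransformInput>
  <Input path="?b/rdfs:label" />
</Compare>
```
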
## Normalize

### Normalize

| Function and parameters | Name | Description |
| --- | --- | --- |
| alphaReduce() | Alpha reduce | Strips all non-alphabetic characters from a string. |
@@ -134,7 +153,9 @@ The following transform and normalization functions are included:
| stripUriPrefix() | Strip URI prefix | Strips the URI prefix and decodes the remainder. Leaves values unchanged which don't start with 'http:' |
| trim() | Trim | Remove leading and trailing whitespaces. |
| upperCase() | Upper case | Converts a string to upper case. |
## Linguistic

### Linguistic

| Function and parameters | Name | Description |
| --- | --- | --- |
| NYSIIS([refined: Boolean = 'true']) | NYSIIS | NYSIIS phonetic encoding. Provided by the StringMetric library: http://rockymadden.com/stringmetric/ |
@@ -143,7 +164,9 @@ The following transform and normalization functions are included:
| soundex([refined: Boolean = 'true']) | Soundex | Soundex algorithm. Provided by the StringMetric library: http://rockymadden.com/stringmetric/ |
| spotlight() | Spotlight | Concatenates all values to a string and gets a weighted entity vector from the Spotlight service. |
| stem() | Stem | Stems a string using the Porter Stemmer. |
## Substring

### Substring

| Function and parameters | Name | Description |
| --- | --- | --- |
| stripPostfix(postfix: String) | Strip postfix | Strips a postfix of a string. |
@@ -152,29 +175,39 @@ The following transform and normalization functions are included:
| substring([beginIndex: Int = '0'], [endIndex: Int = '0']) | Substring | Returns a substring between 'beginIndex' (inclusive) and 'endIndex' (exclusive). If 'endIndex' is 0 (default), it is ignored and the entire remaining string starting at 'beginIndex' is returned. If 'endIndex' is negative, -endIndex characters are removed from the end. |
| trim() | Trim | Remove leading and trailing whitespaces. |
| untilCharacter(untilCharacter: Char) | Until Character | Returns the substring up to the given character. |
## Conversion

### Conversion

| Function and parameters | Name | Description |
| --- | --- | --- |
| convertCharset([sourceCharset: String = 'ISO-8859-1'], [targetCharset: String = 'UTF-8']) | Convert Charset | Convert the string from "sourceCharset" to "targetCharset". |
## Filter

### Filter

| Function and parameters | Name | Description |
| --- | --- | --- |
| filterByLength([min: Int = '0'], [max: Int = '2147483647']) | filter by length | Removes all strings that are shorter than 'min' characters or longer than 'max' characters. |
| filterByRegex(regex: String, [negate: Boolean = 'false']) | filter by regex | Removes all strings that do NOT match a regex. If 'negate' is true, only strings that match the regex are removed. |
| removeEmptyValues() | Remove empty values | Removes empty values. |
| removeValues(blacklist: String) | Remove values | Removes all values contained in the 'blacklist'. |
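As a sketch, a filter transformation can drop values that do not match a pattern before they reach a comparison. The `filterByRegex` function and its `regex` parameter come from the table above; the regular expression and the property path are placeholders.

```xml
<!-- Illustrative sketch: keep only four-digit year values. -->
<!-- Regex and property path are placeholders. -->
<TransformInput function="filterByRegex">
  <Param name="regex" value="^[0-9]{4}$" />
  <Input path="?a/dc:date" />
</TransformInput>
```
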
## Tokenization

### Tokenization

| Function and parameters | Name | Description |
| --- | --- | --- |
| camelcasetokenizer() | Camel Case Tokenizer | Tokenizes a camel case string. That is, it splits strings between a lower case character and an upper case character. |
| tokenize([regex: String = '\s']) | Tokenize | Tokenizes all input values. |
## Numeric

### Numeric

| Function and parameters | Name | Description |
| --- | --- | --- |
| aggregateNumbers(operator: String) | Aggregate Numbers | Aggregates all numbers in this set using a mathematical operation. |
| compareNumbers([comparator: String = '<']) | Compare Numbers | Compares the numbers of two sets. |
| numReduce() | Numeric reduce | Strip all non-numeric characters from a string. |
## Date

### Date

| Function and parameters | Name | Description |
| --- | --- | --- |
| compareDates([comparator: String = '<']) | Compare Dates | Compares two dates. Returns 1 if the comparison yields true and 0 otherwise. If there are multiple dates in both sets, the comparator must be true for all dates. e.g. {2014-08-02,2014-08-03} < {2014-08-03} yields 0 as not all dates in the first set are smaller than in the second. Accepts one parameter: comparator: One of '<', '<=', '=', '>=', '>' |
@@ -184,8 +217,8 @@ The following transform and normalization functions are included:
| durationInSeconds() | Duration in Seconds | Converts an xsd:duration to seconds. |
| timeToDate() | Timestamp to date | Converts a Unix timestamp to xsd:date. |

### Spatial

## Spatial
| Function and parameters | Name | Description |
| --- | --- | --- |
| AreaTransformer | Area Transformer | Returns the area of the input geometry. |
6 changes: 4 additions & 2 deletions doc/Readme.md
@@ -1,8 +1,10 @@
# Silk Documentation

- [Overview of the Silk Workbench](Workbench.md)
- [Overview of the Silk Workbench](Overview.md)
- [Linking with the Silk Workbench](Linking.md)
- [REST API](RestInterface.md)
- [Command Line](CommandLine.md)
- [Configuration](Configuration.md)
- [Internal API](Internal.md)
- [Linkage Rules](LinkageRules.md)
- [The Silk Link Specification Language](LinkSpecificationLanguage.md)
- [Available Plugins](Plugins.md)