forked from opensearch-project/documentation-website
Commit
adding classic token filter docs opensearch-project#7876
Signed-off-by: AntonEliatra <[email protected]>
1 parent 639cb38, commit a000c66
Showing 2 changed files with 95 additions and 1 deletion.
@@ -0,0 +1,94 @@
---
layout: default
title: classic
parent: Token filters
nav_order: 150
---

# Classic token filter

The `classic` token filter is designed to work alongside the `classic` tokenizer. It processes tokens by applying the following transformations, which aid text analysis and search:
- Removal of possessive endings such as "'s". For example, "John's" becomes "John".
- Removal of periods from acronyms. For example, "D.A.R.P.A." becomes "DARPA".

Splitting words on internal hyphens, as when "co-operate" becomes the tokens "co" and "operate", is performed by the `classic` tokenizer rather than by the filter.

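The two token-level normalizations can be approximated in a few lines of Python. This is only an illustrative sketch, not the actual implementation: the function name `classic_filter_sketch` and the regular expression used to recognize acronyms are assumptions made for this example, and the real filter is implemented in Lucene and may recognize these cases differently. The sketch covers only possessive removal and acronym period removal and assumes that any hyphen splitting has already happened at the tokenizer stage.

```python
import re

# Assumption for this sketch: an acronym is two or more single letters,
# each followed by a period, for example "D.A.R.P.A.".
ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}")


def classic_filter_sketch(token: str) -> str:
    """Approximate the possessive and acronym normalizations for one token."""
    # Strip a trailing possessive 's (case-insensitive).
    if token.lower().endswith("'s"):
        return token[:-2]
    # Remove the periods from an acronym.
    if ACRONYM.fullmatch(token):
        return token.replace(".", "")
    # Leave every other token unchanged.
    return token


tokens = ["John's", "D.A.R.P.A.", "excellent"]
print([classic_filter_sketch(t) for t in tokens])
# prints ['John', 'DARPA', 'excellent']
```
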
## Example

The following example request creates an index named `custom_classic_filter` and defines an analyzer that uses the `classic` filter:

```json
PUT /custom_classic_filter
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_classic": {
          "type": "custom",
          "tokenizer": "classic",
          "filter": ["classic"]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

## Generated tokens

Use the following request to examine the tokens generated by the analyzer:

```json
POST /custom_classic_filter/_analyze
{
  "analyzer": "custom_classic",
  "text": "John's co-operate was excellent."
}
```
{% include copy-curl.html %}

The response contains the generated tokens:

```json
{
  "tokens": [
    {
      "token": "John",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<APOSTROPHE>",
      "position": 0
    },
    {
      "token": "co",
      "start_offset": 7,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "operate",
      "start_offset": 10,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "was",
      "start_offset": 18,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "excellent",
      "start_offset": 22,
      "end_offset": 31,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
```
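To see how the emitted tokens relate to the input, it can help to map each token's offsets back to the original text. The following Python sketch does this for the response shown above. The helper name `summarize` is hypothetical, and the response is embedded as a compact string so that the example runs without a cluster.

```python
import json

# Compact copy of the response shown above.
response_text = """
{"tokens": [
  {"token": "John", "start_offset": 0, "end_offset": 6, "type": "<APOSTROPHE>", "position": 0},
  {"token": "co", "start_offset": 7, "end_offset": 9, "type": "<ALPHANUM>", "position": 1},
  {"token": "operate", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 2},
  {"token": "was", "start_offset": 18, "end_offset": 21, "type": "<ALPHANUM>", "position": 3},
  {"token": "excellent", "start_offset": 22, "end_offset": 31, "type": "<ALPHANUM>", "position": 4}
]}
"""

# The text that was sent to the _analyze request.
text = "John's co-operate was excellent."


def summarize(response: dict, source: str) -> list:
    """Pair each emitted token with the slice of source text its offsets cover."""
    rows = []
    for tok in response["tokens"]:
        original = source[tok["start_offset"]:tok["end_offset"]]
        rows.append((original, tok["token"], tok["type"]))
    return rows


rows = summarize(json.loads(response_text), text)
for original, emitted, token_type in rows:
    print(f"{original!r} -> {emitted!r} ({token_type})")
```

The first row pairs the source slice "John's" with the emitted token "John": the offsets still span the full original word, while the token text has had its possessive ending removed.
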