Add apostrophe token filter page #7871 #7884
Merged: kolchfa-aws merged 11 commits into opensearch-project:main from AntonEliatra:apostrophe-token-filter on Aug 6, 2024
Changes from 2 commits
Commits
f443445 adding apostrophe token filter page #7871 (AntonEliatra)
cf684e4 fixing vale error (AntonEliatra)
8eab70a Update apostrophe-token-filter.md (AntonEliatra)
7e81e2b updating the naming (AntonEliatra)
5005621 Merge branch 'apostrophe-token-filter' of github.com:AntonEliatra/doc… (AntonEliatra)
5d5570c updating as per the review comments (AntonEliatra)
f7b8403 updating the heading to Apostrophe token filter (AntonEliatra)
66e1157 updating as per PR comments (AntonEliatra)
e59cb9b Apply suggestions from code review (AntonEliatra)
f0e30c0 Apply suggestions from code review (AntonEliatra)
1a6d18f Merge branch 'main' into apostrophe-token-filter (kolchfa-aws)
@@ -0,0 +1,116 @@
---
layout: default
title: Apostrophe token filter
parent: Token filters
nav_order: 110
---

# Apostrophe token filter

The `apostrophe` token filter removes an apostrophe and all characters that follow it in a token. This is useful when analyzing text in languages that rely heavily on apostrophes, such as Turkish, in which an apostrophe separates a root word from its suffixes, including possessive suffixes, case markers, and other grammatical endings.
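
To make the truncation concrete, here is a minimal Python sketch of the behavior described above. It is a standalone approximation, not the filter's actual implementation: the helper name `strip_after_apostrophe` is hypothetical, and treating both the straight (`'`) and typographic (`’`) apostrophe as cut points is an assumption made for this example.

```python
# Hypothetical helper for illustration only; not part of OpenSearch.
# Assumption: both the straight and the typographic apostrophe count.
APOSTROPHES = ("'", "\u2019")


def strip_after_apostrophe(token: str) -> str:
    """Return the token cut at its first apostrophe, or unchanged if it has none."""
    for index, char in enumerate(token):
        if char in APOSTROPHES:
            return token[:index]
    return token


print(strip_after_apostrophe("Türkiye'nin"))  # Türkiye
print(strip_after_apostrophe("kitap"))        # kitap
```

In the Turkish word `Türkiye'nin`, the suffix after the apostrophe is dropped and only the root remains, which is the effect the filter is meant to have.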

## Example

The following example request creates a new index named `custom_text_index`. The index defines a custom analyzer in `settings` and applies it to the `content` field in `mappings`. The analyzer uses the `standard` tokenizer, which splits text into words, followed by the `lowercase` and `apostrophe` token filters:

```
PUT /custom_text_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "apostrophe"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "custom_analyzer"
      }
    }
  }
}
```

## Check generated tokens

Use the following request to examine the tokens that the analyzer generates:

```
POST /custom_text_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "John's car is faster than Peter's bike"
}
```

The response contains the generated tokens:

```
{
  "tokens": [
    {
      "token": "john",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "car",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 11,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "faster",
      "start_offset": 14,
      "end_offset": 20,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "than",
      "start_offset": 21,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "peter",
      "start_offset": 26,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "bike",
      "start_offset": 34,
      "end_offset": 38,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
```
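
Note that the offsets in this response still span the original text (`John's` covers characters 0 through 6) even though the emitted token `john` is shorter. The following Python sketch mimics that behavior under stated assumptions: the regular expression is only a crude stand-in for the `standard` tokenizer, `analyze` is a hypothetical helper rather than an OpenSearch API, and the `type` field is omitted.

```python
import re

# Assumption: a rough approximation of `standard` tokenization, where a word
# is a run of word characters optionally joined by single apostrophes.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*")


def analyze(text: str) -> list[dict]:
    """Hypothetical helper: tokenize, lowercase, then cut at the first apostrophe."""
    tokens = []
    for position, match in enumerate(WORD.finditer(text)):
        term = match.group().lower()
        term = re.split(r"['\u2019]", term, maxsplit=1)[0]
        tokens.append({
            "token": term,
            # Offsets come from the original match, not the shortened term.
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


for token in analyze("John's car is faster than Peter's bike"):
    print(token)
```

The first printed entry has the token `john` with offsets 0 and 6, matching the response above: the term is shortened, but its offsets still point at the full source word.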

The built-in `apostrophe` token filter is not suitable for languages such as French, in which apostrophes appear near the beginning of words. For example, `"C'est l'amour de l'école"` produces four tokens: "C", "l", "de", and "l".
{: .note}
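
The French case can be reproduced with a small Python model of the filter. This is a sketch under assumptions, not the filter's real code: the regular expression only approximates how the `standard` tokenizer keeps apostrophes inside words, and `apostrophe_filter` is a hypothetical name.

```python
import re

# Assumption: a rough approximation of `standard` tokenization.
WORD = re.compile(r"\w+(?:['\u2019]\w+)*")


def apostrophe_filter(tokens):
    """Hypothetical model of the filter: keep only the text before the first apostrophe."""
    return [re.split(r"['\u2019]", token, maxsplit=1)[0] for token in tokens]


words = WORD.findall("C'est l'amour de l'école")
print(words)                     # ["C'est", "l'amour", 'de', "l'école"]
print(apostrophe_filter(words))  # ['C', 'l', 'de', 'l']
```

Because the meaningful part of each French word follows the apostrophe, the filter discards it and keeps only the elided article or pronoun.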
@kolchfa-aws Thank you for the feedback, that's now updated.