Commit

Added persian_stem. (#592)
Signed-off-by: dblock <[email protected]>
dblock authored Sep 30, 2024
1 parent 39efae2 commit 9771b03
Showing 10 changed files with 160 additions and 62 deletions.
7 changes: 5 additions & 2 deletions .cspell
@@ -31,6 +31,7 @@ datarows
decompounder
Decompounder
dedup
deprovision
determinized
distilbert
DNFOF
@@ -75,6 +76,7 @@ kstem
kuromoji
Kuromoji
languageset
Léon
localstats
Lovins
lucene
@@ -137,6 +139,7 @@ Reindex
relo
reloadcerts
remotestore
reprovision
rerank
Rerank
Reranker
@@ -192,5 +195,5 @@ vectory
whoamiprotected
wordnet
Yrtsd
reprovision
deprovision
جامد
جامدات
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -98,6 +98,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Added `/_bulk/stream` ([#584](https://github.com/opensearch-project/opensearch-api-specification/pull/584))
- Added `/_plugins/_ml/agents/_register`, `/_plugins/_ml/connectors/_create`, `DELETE /_plugins/_ml/agents/{agent_id}`, `DELETE /_plugins/_ml/connectors/{connector_id}` ([#228](https://github.com/opensearch-project/opensearch-api-specification/issues/228))
- Added the `context` query param to the `put_script` APIs ([#586](https://github.com/opensearch-project/opensearch-api-specification/pull/586))
- Added `persian_stem` filter ([#592](https://github.com/opensearch-project/opensearch-api-specification/pull/592))

### Changed

12 changes: 12 additions & 0 deletions spec/schemas/_common.analysis.yaml
@@ -474,6 +474,7 @@ components:
        - $ref: '#/components/schemas/NoriPartOfSpeechTokenFilter'
        - $ref: '#/components/schemas/PatternCaptureTokenFilter'
        - $ref: '#/components/schemas/PatternReplaceTokenFilter'
        - $ref: '#/components/schemas/PersianStemTokenFilter'
        - $ref: '#/components/schemas/PorterStemTokenFilter'
        - $ref: '#/components/schemas/PredicateTokenFilter'
        - $ref: '#/components/schemas/RemoveDuplicatesTokenFilter'
@@ -894,6 +895,17 @@ components:
          required:
            - pattern
            - type
    PersianStemTokenFilter:
      allOf:
        - $ref: '#/components/schemas/TokenFilterBase'
        - type: object
          properties:
            type:
              type: string
              enum:
                - persian_stem
          required:
            - type
    PorterStemTokenFilter:
      allOf:
        - $ref: '#/components/schemas/TokenFilterBase'
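
The new `PersianStemTokenFilter` schema only pins down the `type` field: it is required and must equal `persian_stem`. As a rough illustration of that constraint, here is a minimal Python sketch. The helper name `is_persian_stem_filter` is hypothetical and not part of the specification repository, and the check deliberately ignores anything inherited from `TokenFilterBase`.

```python
from typing import Any, Mapping


def is_persian_stem_filter(definition: Mapping[str, Any]) -> bool:
    """Check the one constraint the schema adds on top of TokenFilterBase.

    Hypothetical helper for illustration only: `type` must be present
    and must be the single allowed enum value `persian_stem`.
    """
    return definition.get("type") == "persian_stem"


print(is_persian_stem_filter({"type": "persian_stem"}))  # True
print(is_persian_stem_filter({"type": "porter_stem"}))   # False
print(is_persian_stem_filter({}))                        # False
```

A real validator would resolve the `allOf` against `TokenFilterBase` with a JSON Schema library; the sketch only mirrors the `enum` and `required` entries added in this hunk.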
60 changes: 0 additions & 60 deletions tests/default/_core/analyze.yaml
@@ -30,63 +30,3 @@ chapters:
          - Moneyball, directed by Bennett Miller
    response:
      status: 200
  - synopsis: Apply a filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - uppercase
        text: Moneyball
    response:
      status: 200
      payload:
        tokens:
          - token: MONEYBALL
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
  - synopsis: Apply a character filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - lowercase
        char_filter:
          - html_strip
        text: <b>Moneyball</b>
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 3
            end_offset: 16
            position: 0
  - synopsis: Combine a lowercase translation with a stop filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: whitespace
        filter:
          - lowercase
          - type: stop
            stopwords:
              - in
              - to
        text: Moneyball directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/asciifolding.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply an asciifolding filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - asciifolding
        text: Léon
    response:
      status: 200
      payload:
        tokens:
          - token: Leon
            type: word
            start_offset: 0
            end_offset: 4
            position: 0
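
To make the expected `Léon` to `Leon` result concrete, the following Python sketch approximates ASCII folding with Unicode decomposition. This is an assumption for illustration, not how the `asciifolding` filter is implemented (Lucene's filter uses its own character mapping table), and the function name `fold_to_ascii` is hypothetical. It happens to agree with the filter for this input.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding by dropping combining marks.

    NFKD splits 'é' into 'e' plus a combining acute accent (U+0301);
    removing combining characters leaves the plain base letter.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("Léon"))  # Leon
```

Characters without a decomposition (for example `ø` or `ß`) pass through this sketch unchanged, so it is not a substitute for the real filter.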
24 changes: 24 additions & 0 deletions tests/default/_core/analyze/filter/lowercase.yaml
@@ -0,0 +1,24 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Apply a lowercase filter and an html_strip character filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - lowercase
        char_filter:
          - html_strip
        text: <b>Moneyball</b>
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 3
            end_offset: 16
            position: 0
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/persian_stem.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply a persian_stem filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - persian_stem
        text: جامدات
    response:
      status: 200
      payload:
        tokens:
          - token: جامد
            type: word
            start_offset: 0
            end_offset: 6
            position: 0
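
The story above can be read as a request body plus one expected token. The Python sketch below rebuilds both offline, without contacting a cluster. The helper names `build_analyze_body` and `expected_single_token` are hypothetical. It assumes that with the `keyword` tokenizer the whole input becomes one token whose offsets span the original text, so stemming shortens the token but leaves `end_offset` at the input length. Python's `len` counts code points, which matches the offsets here because every character is in the Basic Multilingual Plane.

```python
import json
from typing import Any, Dict, Iterable


def build_analyze_body(tokenizer: str, filters: Iterable[str], text: str) -> Dict[str, Any]:
    """Build the JSON payload sent to /_analyze in the story."""
    return {"tokenizer": tokenizer, "filter": list(filters), "text": text}


def expected_single_token(text: str, token: str) -> Dict[str, Any]:
    """Expected token when the keyword tokenizer emits the input as one token.

    Assumption: offsets refer to the original text, not the stemmed token.
    """
    return {
        "token": token,
        "type": "word",
        "start_offset": 0,
        "end_offset": len(text),
        "position": 0,
    }


body = build_analyze_body("keyword", ["persian_stem"], "جامدات")
expected = expected_single_token("جامدات", "جامد")

print(json.dumps(body, ensure_ascii=False))
print(expected["end_offset"])  # 6
```

The same reading accounts for `end_offset: 26` in the `porter_stem` story, since `Directed by Bennett Miller` is 26 characters long.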
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/porterstem.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
version: '>= 2.17'
chapters:
  - synopsis: Apply a porter_stem filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - porter_stem
        text: Directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: Directed by Bennett Mil
            type: word
            start_offset: 0
            end_offset: 26
            position: 0
26 changes: 26 additions & 0 deletions tests/default/_core/analyze/filter/stop.yaml
@@ -0,0 +1,26 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Combine a lowercase translation with a stop filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: whitespace
        filter:
          - lowercase
          - type: stop
            stopwords:
              - in
              - to
        text: Moneyball directed by Bennett Miller
    response:
      status: 200
      payload:
        tokens:
          - token: moneyball
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
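
The filter chain in this story (whitespace tokenizer, then `lowercase`, then a `stop` filter with custom stop words) can be sketched in a few lines of Python. This is a simplified model rather than the engine's implementation: the function name `analyze` is hypothetical, `\S+` stands in for the whitespace tokenizer, and a removed stop word is assumed to leave a gap in `position`.

```python
import re
from typing import Any, Dict, Iterable, List


def analyze(text: str, stopwords: Iterable[str]) -> List[Dict[str, Any]]:
    """Whitespace-tokenize, lowercase, then drop stop words."""
    stop = set(stopwords)
    tokens: List[Dict[str, Any]] = []
    position = 0
    for match in re.finditer(r"\S+", text):
        term = match.group().lower()
        if term not in stop:
            tokens.append({
                "token": term,
                "type": "word",
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            })
        # Assumption: a removed stop word still consumes a position.
        position += 1
    return tokens


tokens = analyze("Moneyball directed by Bennett Miller", ["in", "to"])
print(tokens[0])
print(len(tokens))  # 5
```

Neither `in` nor `to` occurs in the sample text, so the sketch keeps all five tokens; the first one matches the token listed in the story.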
23 changes: 23 additions & 0 deletions tests/default/_core/analyze/filter/uppercase.yaml
@@ -0,0 +1,23 @@
$schema: ../../../../../json_schemas/test_story.schema.yaml

description: Test /_analyze with a filter.
chapters:
  - synopsis: Apply an uppercase filter.
    path: /_analyze
    method: GET
    request:
      payload:
        tokenizer: keyword
        filter:
          - uppercase
        text: Moneyball
    response:
      status: 200
      payload:
        tokens:
          - token: MONEYBALL
            type: word
            start_offset: 0
            end_offset: 9
            position: 0
