add simple pattern split tokenizer docs #8491

AntonEliatra · 2024-10-10T10:49:06Z

Description

Adds documentation for the `simple_pattern_split` tokenizer.

Issues Resolved

Part of #1483 addressed in this PR.

Version

all

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anton Rubin <[email protected]>

github-actions · 2024-10-10T10:49:18Z

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

vagimeli · 2024-10-10T21:41:18Z

@udabhas @varun-lodaya Please provide tech review approval to move this PR forward in the documentation process. Please review this week or provide a peer who can review it. Thank you.

natebower

@kolchfa-aws @AntonEliatra Editorial review. Thanks!

natebower · 2024-12-09T11:09:34Z

_analyzers/tokenizers/simple-pattern-split.md

+
+Parameter | Required/Optional | Data type | Description
+:--- | :--- | :--- | :--- 
+`pattern` | Optional | String | The pattern used to split text into tokens specified using a [Lucene regular expression](https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/util/automaton/RegExp.html). Default is an empty string, which returns the input text as one token. 


If "specified using a Lucene regular expression" describes the tokens, this is fine as is. If, however, it describes the pattern, there should be a comma after "tokens".
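For context on the parameter under review, here is a minimal sketch of how `pattern` might be set when defining a custom tokenizer of type `simple_pattern_split`. The helper `build_index_settings`, the tokenizer name `my_pattern_split_tokenizer`, the analyzer name `my_pattern_split_analyzer`, and the pattern `_` are all hypothetical and chosen only for illustration; the resulting body is what one would send when creating an index.

```python
import json


def build_index_settings(pattern: str = "") -> dict:
    """Build an index-creation body with a simple_pattern_split tokenizer.

    The tokenizer and analyzer names below are hypothetical placeholders.
    The default pattern is an empty string, which, per the parameter table,
    returns the input text as one token.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "my_pattern_split_tokenizer": {
                        "type": "simple_pattern_split",
                        # A Lucene regular expression marking where to split.
                        "pattern": pattern,
                    }
                },
                "analyzer": {
                    "my_pattern_split_analyzer": {
                        "type": "custom",
                        "tokenizer": "my_pattern_split_tokenizer",
                    }
                },
            }
        }
    }


# Example: split on underscores (an arbitrary illustrative pattern).
body = build_index_settings("_")
print(json.dumps(body, indent=2))
```

Calling `build_index_settings()` with no argument leaves `pattern` as the empty-string default described in the table.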

* add simple pattern split tokenizer docs
  Signed-off-by: Anton Rubin <[email protected]>
* updating parameter table
  Signed-off-by: Anton Rubin <[email protected]>
* Doc review
  Signed-off-by: Fanit Kolchina <[email protected]>
* Update _analyzers/tokenizers/simple-pattern-split.md
  Signed-off-by: kolchfa-aws <[email protected]>
* Clarification
  Signed-off-by: Fanit Kolchina <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Anton Rubin <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit 595bc13)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

add simple pattern split tokenizer docs

99f51ff

Signed-off-by: Anton Rubin <[email protected]>

AntonEliatra requested review from kolchfa-aws, Naarcha-AWS, vagimeli, AMoo-Miki, natebower, dlvenable and epugh as code owners October 10, 2024 10:49

github-actions bot assigned kolchfa-aws Oct 10, 2024

kolchfa-aws assigned vagimeli and unassigned kolchfa-aws Oct 10, 2024

vagimeli added the labels 3 - Tech review (PR: Tech review in progress), Needs SME (Waiting on input from subject matter expert), Content gap, and analyzers Oct 10, 2024

AntonEliatra and others added 2 commits October 16, 2024 17:17

updating parameter table

bbf8845

Signed-off-by: Anton Rubin <[email protected]>

Doc review

b2cbb98

Signed-off-by: Fanit Kolchina <[email protected]>

kolchfa-aws assigned kolchfa-aws and unassigned vagimeli Dec 5, 2024

kolchfa-aws added the labels 5 - Editorial review (PR: Editorial review in progress) and backport 2.18 (PR: Backport label for 2.18) and removed the label 3 - Tech review (PR: Tech review in progress) Dec 5, 2024

kolchfa-aws reviewed Dec 5, 2024

_analyzers/tokenizers/simple-pattern-split.md (outdated, resolved)

kolchfa-aws and others added 3 commits December 5, 2024 14:15

Update _analyzers/tokenizers/simple-pattern-split.md

15b34cf

Signed-off-by: kolchfa-aws <[email protected]>

Clarification

a37a8b5

Signed-off-by: Fanit Kolchina <[email protected]>

Merge branch 'adding-simple-pattern-split-tokenizer-docs' of github.com:AntonEliatra/documentation-website into adding-simple-pattern-split-tokenizer-docs

625b6f4

natebower reviewed Dec 9, 2024

kolchfa-aws reviewed Dec 9, 2024

_analyzers/tokenizers/simple-pattern-split.md (outdated, resolved)

Apply suggestions from code review

337524d

Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: kolchfa-aws <[email protected]>

kolchfa-aws approved these changes Dec 9, 2024

kolchfa-aws merged commit 595bc13 into opensearch-project:main Dec 9, 2024
6 checks passed

opensearch-trigger-bot bot mentioned this pull request Dec 9, 2024

[Backport 2.18] add simple pattern split tokenizer docs #8905

Merged

AntonEliatra deleted the adding-simple-pattern-split-tokenizer-docs branch December 9, 2024 18:03

github-actions bot pushed a commit that referenced this pull request Dec 9, 2024

add simple pattern split tokenizer docs (#8491) (#8905)

d2a1ec4
