Releases: benbrandt/text-splitter

v0.22.0 (Latest)

17 Jan 10:15

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

Breaking Changes

Reverted the change to special-token handling introduced in v0.21.0. It had many unintended side effects, and it does not seem to be recommended for chunking.

Full Changelog: v0.21.0...v0.22.0

v0.21.0

16 Jan 07:55

benbrandt

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Breaking Changes

Both the Hugging Face and tiktoken tokenizers now also encode special tokens when measuring chunk size. This is closer to the default behavior on the Python side and ensures that any tokens a model adds at the beginning or end of a sequence are counted as well. This matters most for embedding models that prepend a special token to the sequence: previously, chunks generated for them could exceed the model's context window once that token was added.
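To illustrate why this matters, here is a minimal Rust sketch with made-up numbers. It assumes a hypothetical whitespace tokenizer (one token per word) and a hypothetical model that adds one special token at each end of the sequence; `content_tokens`, `tokens_with_special`, and `fits` are illustrative names, not part of text-splitter's API.

```rust
// Hypothetical stand-in for a real tokenizer: one token per whitespace-separated word.
fn content_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

// Assumption: the model prepends one special token (e.g. a [CLS]/BOS marker)
// and appends one (e.g. a [SEP]/EOS marker).
const PREFIX_SPECIAL_TOKENS: usize = 1;
const SUFFIX_SPECIAL_TOKENS: usize = 1;

/// Size of a chunk as the model actually sees it: content plus special tokens.
fn tokens_with_special(text: &str) -> usize {
    PREFIX_SPECIAL_TOKENS + content_tokens(text) + SUFFIX_SPECIAL_TOKENS
}

/// Whether a sequence of `size` tokens fits into the model's context window.
fn fits(size: usize, context_window: usize) -> bool {
    size <= context_window
}

fn main() {
    let context_window = 8;
    let chunk = "one two three four five six seven eight";

    // Counting only the content, the chunk appears to fit exactly...
    let content = content_tokens(chunk);
    println!("content only: {content} tokens, fits = {}", fits(content, context_window));

    // ...but once the model adds its special tokens, it overflows the window.
    let total = tokens_with_special(chunk);
    println!("with special tokens: {total} tokens, fits = {}", fits(total, context_window));
}
```

With an 8-token window, the chunk measures exactly 8 tokens by content alone but 10 once the special tokens are added, so it no longer fits.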

What's New

Rust

The minimum supported Rust version (MSRV) is now 1.80, which removes the dependency on once_cell.
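For context, Rust 1.80 stabilized `std::sync::LazyLock`, which covers the common `once_cell::sync::Lazy` use case. The sketch below shows that kind of migration, assuming the crate used `Lazy` for lazily initialized statics; the `SEPARATOR_RANKS` table and `separator_rank` function are hypothetical and only illustrate the pattern.

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Before (requires the once_cell crate):
//     static TABLE: once_cell::sync::Lazy<HashMap<..>> = once_cell::sync::Lazy::new(|| ...);
// After (standard library only, Rust 1.80+): LazyLock runs the closure on first access.
// Hypothetical table, for illustration only.
static SEPARATOR_RANKS: LazyLock<HashMap<&'static str, u8>> = LazyLock::new(|| {
    HashMap::from([("\n\n", 0), ("\n", 1), (" ", 2)])
});

/// Looks up the (hypothetical) rank of a separator; initializes the table on first call.
fn separator_rank(sep: &str) -> Option<u8> {
    SEPARATOR_RANKS.get(sep).copied()
}

fn main() {
    println!("{:?}", separator_rank("\n"));
    println!("{:?}", separator_rank("x"));
}
```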

Full Changelog: v0.20.1...v0.21.0

v0.20.1

01 Jan 20:22

benbrandt

Fixes

Python: correctly specify the package version for compatibility with uv installations.

Full Changelog: v0.20.0...v0.20.1

v0.20.0

14 Dec 20:50

benbrandt

Breaking Changes

Switched the backing Unicode segmentation implementation from unicode-segmentation to icu_segmenter. This brings some modest performance gains and lets the crate use the official Unicode crate. Chunking may differ slightly in some edge cases, so this is being treated as a breaking change.

Full Changelog: v0.19.1...v0.20.0

v0.19.1

14 Dec 07:07

benbrandt

What's New

Python splitters have new chunk_all and chunk_all_indices methods so that multiple texts can be processed in parallel. (In Rust, you should already be able to do this with rayon.)
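The note above suggests rayon for parallel chunking in Rust. To keep this example dependency-free, the sketch below swaps in `std::thread::scope` with one thread per text instead. `chunk_text` is a hypothetical greedy word packer standing in for a real splitter (it is not text-splitter's algorithm), and `chunk_many` is an illustrative name, not a text-splitter API.

```rust
use std::thread;

/// Hypothetical splitter: greedily packs whitespace-separated words into chunks
/// of at most `max_bytes` bytes. A single word longer than `max_bytes` becomes
/// its own (oversized) chunk. Placeholder only, not text-splitter's algorithm.
fn chunk_text(text: &str, max_bytes: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // Length the current chunk would have if this word (plus a space) were appended.
        let needed = if current.is_empty() { word.len() } else { current.len() + 1 + word.len() };
        if needed > max_bytes && !current.is_empty() {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Chunks every text on its own scoped thread; results keep the input order.
fn chunk_many(texts: &[&str], max_bytes: usize) -> Vec<Vec<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = texts
            .iter()
            .map(|text| s.spawn(move || chunk_text(text, max_bytes)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let texts = ["a b c", "hello world"];
    for (i, chunks) in chunk_many(&texts, 5).iter().enumerate() {
        println!("text {i}: {chunks:?}");
    }
}
```

With rayon, the same idea would be a `par_iter().map(...)` over the texts, which reuses a thread pool instead of spawning one thread per input.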

Full Changelog: v0.19.0...v0.19.1

v0.19.0

28 Nov 10:49

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

Breaking Changes

Update to tokenizers v0.21

Full Changelog: v0.18.1...v0.19.0

v0.18.1

25 Oct 19:31

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

What's New

Ensure tokenizer sizers with truncation parameters count their overflow encodings by @Jeadie in #433
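A minimal sketch of the idea, assuming no stride between overflow windows: when truncation is enabled, tokens beyond the maximum length move into overflow encodings, so a sizer that only counts the primary encoding underestimates the chunk. The `Encoding` struct here is a simplified, hypothetical stand-in; the Hugging Face `tokenizers` crate exposes comparable data through `Encoding::get_ids` and `Encoding::get_overflowing`.

```rust
/// Hypothetical, simplified tokenizer output (an assumption, not the real type).
struct Encoding {
    ids: Vec<u32>,              // token ids kept after truncation
    overflowing: Vec<Encoding>, // token windows cut off by truncation
}

/// Counts only the tokens that survived truncation (undercounts long inputs).
fn truncated_size(enc: &Encoding) -> usize {
    enc.ids.len()
}

/// Also counts tokens moved into overflow encodings.
/// Assumes no stride, so the overflow windows do not overlap.
fn full_size(enc: &Encoding) -> usize {
    enc.ids.len() + enc.overflowing.iter().map(|o| o.ids.len()).sum::<usize>()
}

fn main() {
    // A 10-token input truncated to a maximum length of 4:
    // 4 kept tokens, plus overflow windows of 4 and 2 tokens.
    let enc = Encoding {
        ids: vec![1, 2, 3, 4],
        overflowing: vec![
            Encoding { ids: vec![5, 6, 7, 8], overflowing: vec![] },
            Encoding { ids: vec![9, 10], overflowing: vec![] },
        ],
    };
    println!("truncated_size = {}", truncated_size(&enc));
    println!("full_size = {}", full_size(&enc));
}
```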

New Contributors

@Jeadie made their first contribution in #433

Full Changelog: v0.18.0...v0.18.1

v0.18.0

14 Oct 12:57

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

Breaking Changes

Change supported tiktoken-rs version to 0.6.x

Full Changelog: v0.17.1...v0.18.0

v0.17.1

11 Oct 05:07

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

What's New

Loosen regex crate version requirement

Full Changelog: v0.17.0...v0.17.1

v0.17.0

06 Oct 13:33

benbrandt

This commit was signed with the committer’s verified signature.

benbrandt Ben Brandt

GPG key ID: 7DDA95F5EAC98E73

Breaking Changes

Support [email protected] for CodeSplitters.
Due to a slight change in the backing Unicode segmentation implementation, CodeSplitters also behave slightly differently (in my tests, mostly in that semicolons are now grouped more logically with the preceding content).

Full Changelog: v0.16.1...v0.17.0
