Took a (longer than expected) break from NLP, so this release is mostly just maintenance and bug fixes — but in anticipation of more interesting updates to come.
- upgraded built-in language identification model (PR #375)
  - replaced v2 thinc/cld3 model with v3 floret/fasttext model, which has much faster predictions and comparable but more consistent performance
- modernized and improved Python packaging for faster, simpler installation and testing (PR #368 and #369)
  - all package metadata and configuration moved into a single `pyproject.toml` file
  - code formatting and linting updated to use `ruff` plus newer versions of `mypy` and `black`, and their use in GitHub Actions CI has been consolidated
  - bumped supported Python versions range from 3.8–3.10 to 3.9–3.11 (PR #369)
  - added full CI testing matrix for PY 3.9/3.10/3.11 x Linux/macOS/Windows, and removed extraneous AppVeyor integration
- updated and improved type hints throughout, reducing number of `mypy` complaints by ~80% (PR #372)
Fixed
- fixed ReDoS bugs in regex patterns (PR #371)
- fixed breaking API issues with newer networkx/scikit-learn versions (PR #367)
- improved dev workflow documentation and code to better incorporate language data (PR #363)
- updated caching code with a fix from upstream pysize library, which was preventing Russian-language spaCy model from loading properly (PR #358)
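
For readers unfamiliar with ReDoS, here is a minimal sketch of the general problem and the usual style of fix. Both patterns below are hypothetical and chosen only for illustration; they are not the patterns changed in PR #371. The assumption is that a regex with nested quantifiers can backtrack exponentially on input that almost matches, while an equivalent pattern with only one way to match each character does not.

```python
import re
import time

# Hypothetical patterns for illustration only (not taken from PR #371).
# Nested quantifiers: a run of word characters can be split between
# repetitions of the group in exponentially many ways.
VULNERABLE = re.compile(r"^(\w+\s?)+$")
# Accepts the same strings, but each character can be consumed in only one way.
SAFE = re.compile(r"^\w+(?:\s\w+)*\s?$")


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single anchored match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Input sizes are kept small on purpose: the vulnerable pattern's running
    # time grows roughly exponentially with the length of the word run.
    for n in (14, 16, 18):
        text = "a" * n + "!"  # almost matches, which forces backtracking
        _, slow = time_match(VULNERABLE, text)
        _, fast = time_match(SAFE, text)
        print(f"n={n}: vulnerable={slow:.4f}s safe={fast:.6f}s")
```

Timings depend on the machine, so none are shown here. The point is only that the first pattern slows down sharply as `n` grows while the second does not.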
Contributors
Big thanks to @jonwiggins, @Hironsan, and @kevinbackhouse for the fixes!