Releases: GlobalMaksimum/sadedegel
0.3
ADD: ML-based Sentence Boundary Detection (SBD), which achieves an IoU score of 0.8946 (the previously used regular-expression, rule-based detector achieves 0.7224).
ADD: `-m sadedegel.tokenize evaluate` to evaluate an SBD.
ADD: `-m sadedegel.tokenize diff` to analyze tokenization differences between the model output and the annotated (sents) dataset.
ADD: `-m sadedegel.tokenize train` to train a new ML-based SBD.
FIX: Faster loading of the summarizer pipeline, which was previously slowed down by AutoTokenizer.
ADD: raw corpus cleaner.
ADD: optional `base_path` parameter for the raw and sent corpus loaders.
FIX: Lots of dataset issues in the raw and sent corpora.
ADD: `-m sadedegel.dataset validate` for corpus sanity checks.
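
For readers unfamiliar with the IoU (intersection over union) figure quoted for the SBD models above, here is a minimal sketch of one plausible way to compute it. It assumes each segmentation is represented as a set of (start, end) character spans and that a predicted sentence counts only if both boundaries match exactly. The helper names `spans` and `span_iou` are hypothetical and are not part of sadedegel; the library's own evaluation may differ in detail.

```python
def spans(sentences):
    """Turn an ordered list of sentence strings into (start, end) character spans.

    Assumes the sentences, concatenated in order, reproduce the document text.
    """
    result = set()
    position = 0
    for sentence in sentences:
        result.add((position, position + len(sentence)))
        position += len(sentence)
    return result


def span_iou(true_sentences, predicted_sentences):
    """Intersection over union of two segmentations of the same text."""
    true_spans = spans(true_sentences)
    predicted_spans = spans(predicted_sentences)
    union = true_spans | predicted_spans
    if not union:
        # Two empty segmentations agree trivially.
        return 1.0
    return len(true_spans & predicted_spans) / len(union)
```

Under this definition, a detector that wrongly splits one sentence after an abbreviation such as "Dr." loses the whole gold sentence and adds two spurious ones, so a single boundary error is penalized fairly heavily.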
Maintenance Release
There is a lot to do before July 31st, but we need to pause and publish a maintenance release for sadedegel. Another major release is on the way.
- ADD: More references
- FIX: Lazy loading of `AutoTokenizer` to improve `sadedegel.load()` performance.
- ADD: A `validate` subcommand in the `sadedegel.dataset` command line to ensure that the `sents` dataset is consistent with the `raw` dataset (yes, there is such a command line, and we still need to document it).
- FIX: Huge effort by @dafajon and @askarbozcan to fix numerous errors in our `raw` and `sents` datasets.
- FIX: Other minor documentation and LICENSE fixes, thanks to @mccakir.
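
The lazy-loading fix listed above follows a common pattern: defer an expensive constructor until its result is first needed, then cache it. The sketch below illustrates the idea only. `_build_tokenizer`, `get_tokenizer`, `load`, and `stats` are hypothetical stand-ins, not sadedegel's actual code, and a plain `str.split` takes the place of the real tokenizer.

```python
from functools import lru_cache

# Counts how often the expensive constructor actually runs (for illustration).
stats = {"builds": 0}


def _build_tokenizer():
    """Stand-in for an expensive constructor (hypothetical)."""
    stats["builds"] += 1
    return str.split


@lru_cache(maxsize=None)
def get_tokenizer():
    """Build the tokenizer on first use and reuse it afterwards."""
    return _build_tokenizer()


def load():
    """Return a pipeline without building the tokenizer up front."""
    return {"tokenize": lambda text: get_tokenizer()(text)}
```

With this arrangement, calling `load()` stays cheap, and the construction cost is paid once, on the first call that actually tokenizes text.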
0.2
- ADD: MIT License
- ADD: Reference section in README.md
- ADD: `RandomK` summarizer. `RandomK` is another baseline summarizer that selects K random sentences (without replacement) from all sentences in the document.
- ADD: `[metadata]` section in `setup.cfg`. Extensively improved metadata content for the PyPI build.
- REMOVE: `sadedegel.annotator`
- FIX: `flake8` parameters
- FIX: Documentation
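
The random-K baseline described in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: the function name `random_k_summary` and its `seed` parameter are hypothetical, and returning the selected sentences in document order is a choice made here for readability.

```python
import random


def random_k_summary(sentences, k, seed=None):
    """Pick k sentences uniformly at random, without replacement.

    The selection is returned in original document order (an assumption
    of this sketch). If k exceeds the number of sentences, all are returned.
    """
    rng = random.Random(seed)
    k = min(k, len(sentences))
    # random.sample draws without replacement, so no index repeats.
    chosen = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in chosen]
```

Sampling indices rather than the sentences themselves keeps duplicate sentences distinct and makes it easy to restore document order afterwards.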
First Release
We are finally there as a working library that can be installed and used. Obviously we are still far from being production ready.
With this first release:
- We bundled a sadedeGel wheel so that others can download it and start using it
- pylint, flake8 and bandit integration is completed.
- We shaped the library structure (which will be changed in upcoming releases)