0.3
Pre-release
ADD: ML-based Sentence Boundary Detection (SBD) achieves an IoU score of 0.8946 (the previously used regular-expression, rule-based detector achieves 0.7224).
ADD: -m sadedegel.tokenize evaluate command to evaluate an SBD.
ADD: -m sadedegel.tokenize diff command to analyze tokenization differences between the model output and the annotated (sents) dataset.
ADD: -m sadedegel.tokenize train command to train a new ML-based SBD.
FIX: Faster loading of the summarizer pipeline, which was previously slowed down by AutoTokenizer.
ADD: raw corpus cleaner.
ADD: optional base_path parameter for the raw and sent corpus loaders.
FIX: Lots of dataset issues in the raw and sent corpora.
ADD: -m sadedegel.dataset validate command for a corpus sanity check.
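
The IoU scores quoted for the SBD entries are not defined in these notes. As a rough illustration only, the sketch below assumes IoU is the Jaccard index over sets of sentence character spans: the number of spans shared by the gold annotation and the prediction, divided by the number of distinct spans in either. The function names sentence_spans and span_iou are hypothetical and are not part of the sadedegel API; the actual evaluation may differ.

```python
def sentence_spans(text, sentences):
    """Map each sentence to its (start, end) character span in text.

    Hypothetical helper for illustration; assumes every sentence occurs
    in text, in order.
    """
    spans = set()
    cursor = 0
    for sent in sentences:
        start = text.index(sent, cursor)  # search from the end of the previous sentence
        end = start + len(sent)
        spans.add((start, end))
        cursor = end
    return spans


def span_iou(true_spans, pred_spans):
    """Intersection over union of two sets of sentence spans (assumed metric)."""
    union = true_spans | pred_spans
    if not union:
        return 1.0  # two empty segmentations agree trivially
    return len(true_spans & pred_spans) / len(union)


text = "Dr. Ali geldi. Eve gitti."
gold = sentence_spans(text, ["Dr. Ali geldi.", "Eve gitti."])
# A naive splitter that breaks after the abbreviation "Dr."
pred = sentence_spans(text, ["Dr.", "Ali geldi.", "Eve gitti."])

print(span_iou(gold, pred))  # 1 shared span out of 4 distinct spans
```

Under this assumed definition, a single wrong split after an abbreviation costs two predicted spans and one gold span, which is why a span-level score penalizes boundary errors more heavily than a per-character accuracy would.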