
0.3

Pre-release
@husnusensoy released this 21 Jul 16:05

ADD: ML-based Sentence Boundary Detection (SBD), which achieves an IoU score of 0.8946 (the previously used rule-based regular expression detector achieves 0.7224).
ADD: python -m sadedegel.tokenize evaluate to evaluate an SBD.
ADD: python -m sadedegel.tokenize diff to analyze tokenization errors between the model and the annotated (sents) dataset.
ADD: python -m sadedegel.tokenize train to train a new ML-based SBD.
FIX: Faster loading of the summarizer pipeline, which was slowed down by AutoTokenizer.
ADD: Raw corpus cleaner.
ADD: Optional base_path parameter for the raw and sent corpus loaders.
FIX: Many dataset issues in the raw and sent corpora.
ADD: python -m sadedegel.dataset validate for corpus sanity checks.
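To make the SBD IoU figures more concrete, here is a minimal sketch of one plausible way to score a segmentation. It assumes IoU is computed over sentence spans, where a predicted sentence counts as correct only when both its start and end character offsets match a gold sentence. sadedegel's own evaluate command may define the score differently (for example over boundary positions, or averaged per document). The helpers sentence_spans, sbd_iou and regex_split are hypothetical names for illustration; regex_split is a toy baseline, not sadedegel's previous rule-based detector.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a sentence


def sentence_spans(text: str, sentences: List[str]) -> Set[Span]:
    """Locate each sentence in `text`, scanning left to right."""
    spans: Set[Span] = set()
    cursor = 0
    for sent in sentences:
        start = text.index(sent, cursor)  # ValueError if the sentence is missing
        end = start + len(sent)
        spans.add((start, end))
        cursor = end  # keeps repeated sentences mapped to distinct spans
    return spans


def sbd_iou(text: str, gold: List[str], predicted: List[str]) -> float:
    """IoU of gold vs. predicted sentence spans (exact start/end match only)."""
    g = sentence_spans(text, gold)
    p = sentence_spans(text, predicted)
    union = g | p
    if not union:
        return 1.0  # both segmentations empty: treat as full agreement
    return len(g & p) / len(union)


def regex_split(text: str) -> List[str]:
    """Hypothetical toy baseline: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


text = "Dr. Ali geldi. Toplantı başladı."
gold = ["Dr. Ali geldi.", "Toplantı başladı."]

print(regex_split(text))                       # → ['Dr.', 'Ali geldi.', 'Toplantı başladı.']
print(sbd_iou(text, gold, regex_split(text)))  # → 0.25
```

The naive regex treats the abbreviation "Dr." as a sentence end. Under this exact-span definition, that single spurious boundary both removes a gold match and adds two unmatched predicted spans, so only 1 of 4 distinct spans agrees and the score drops to 0.25.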