1.3.8
Release Notes for Loghi-HTR Version 1.3.8
Date: 2024-01-19
Overview
Version 1.3.8 of Loghi-HTR adds test list support, handling of Out-of-Vocabulary (OOV) words, and file output for validation and test results. It also separates the validation and evaluation datasets so that normalization can be controlled for each.
New Features
- Enabling Test List Usage:
  - Added functionality to use a test_list for streamlined testing procedures.
- OOV Vocabulary Implementation:
  - Implemented handling for Out-of-Vocabulary (OOV) words.
  - Replaced [UNK] tokens with � (a less common character), so that each unknown token counts as a single character in Character Error Rate (CER) calculations rather than as the five characters of the literal string [UNK].
- Outputting Results to File:
  - Validation and test results can now be written to a .csv file in the output folder.
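The effect of the [UNK] replacement on CER can be illustrated with a minimal sketch. This is not Loghi-HTR's actual code: the function names `replace_unk`, `edit_distance`, and `cer` are hypothetical, the � character is assumed to be U+FFFD, and CER is assumed to be the character-level Levenshtein distance divided by the reference length.

```python
UNK_TOKEN = "[UNK]"
OOV_CHAR = "\ufffd"  # assumption: the "�" in the release notes is U+FFFD


def replace_unk(text: str) -> str:
    """Replace every literal [UNK] token with a single placeholder character."""
    return text.replace(UNK_TOKEN, OOV_CHAR)


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Edit distance normalized by reference length (assumed CER definition)."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(prediction, reference) / len(reference)


reference = "hello world"
prediction = "hello w[UNK]rld"  # the model could not decode one character

raw_errors = edit_distance(prediction, reference)
oov_errors = edit_distance(replace_unk(prediction), reference)
print(raw_errors, oov_errors)
```

In this sketch the unmodified prediction is penalized for every character of the literal [UNK] string, while the replaced prediction is penalized for one substitution only.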
Enhancements
- Data Normalization and Validation Process Updates:
  - Separated validation and evaluation datasets for more precise control:
    - validation_dataset: Used with the --do_validate option, not normalized.
    - evaluation_dataset: Used for evaluation during training, undergoes normalization.
- Default OOV Token Settings:
  - OOV tokens are enabled by default for testing and validation, but not for training and evaluation.
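The defaults described above can be summarized as a small lookup table. This is only an illustrative sketch: `SplitSettings` and `SPLIT_SETTINGS` are hypothetical names, not part of Loghi-HTR, and `None` marks a value that these release notes do not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SplitSettings:
    normalized: Optional[bool]  # None: not stated in the release notes
    oov_enabled: bool           # default OOV token behavior for this split


# Hypothetical summary of the defaults listed in these release notes.
SPLIT_SETTINGS = {
    "training":   SplitSettings(normalized=None,  oov_enabled=False),
    "evaluation": SplitSettings(normalized=True,  oov_enabled=False),
    "validation": SplitSettings(normalized=False, oov_enabled=True),
    "testing":    SplitSettings(normalized=None,  oov_enabled=True),
}

for split, settings in SPLIT_SETTINGS.items():
    print(f"{split}: normalized={settings.normalized}, "
          f"oov_enabled={settings.oov_enabled}")
```

The table makes the distinction explicit: the evaluation dataset used during training is normalized and runs without OOV tokens, while the validation dataset used with --do_validate is left unnormalized and has OOV tokens enabled.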
Bug Fixes
- General Stability and Performance Enhancements:
  - Addressed various minor issues to improve overall system stability and performance.
Contributors
- @TimKoornstra: Implemented OOV word handling, test list functionality, and the data normalization and validation enhancements.
Full Changelog: 1.3.7...1.3.8