- Upgraded to Okapi M36
- Upgraded dependencies, fixed a known vulnerability in Jetty
- Fixed logging issues and improved log messages
- Improved segmentation of percentages and URLs
- Upgraded to Okapi M35
- Removed deprecated dependencies
- Parametrized segmentation of bilingual files, so it can be overridden by custom filters
- Improved exception logging
- Improved compatibility of the XLIFF produced by Filters with SDL Trados Studio
- When converting to a target file, support XLIFFs with non-UTF-8 encodings
- Fixed: failures converting to target caused by spaces between XLIFF header tags
- Upgraded to Okapi M34
- Direct conversion of all macro-enabled MS Office files (e.g. DOCM, XLSM...)
- XLIFF filter now preserves CDATA areas in output files
- Better whitespace correction for conversion from/to oriental languages
- Graceful failures on password-protected MS Office files
- Fixed: ">" char was not escaped in XLIFF filter output
- Proper handling of SIGTERM; you can now quit the app with CTRL+C
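
To illustrate the ">" escaping fix listed above, here is a minimal Python sketch of the expected behavior using only the standard library. It is not the code used by Filters or Okapi; the helper name `escape_xliff_text` is hypothetical.

```python
from xml.sax.saxutils import escape


def escape_xliff_text(text: str) -> str:
    """Escape the characters &, < and > so the text is safe as XML content.

    Hypothetical helper for illustration only; not part of Filters.
    """
    return escape(text)


if __name__ == "__main__":
    # The ">" character comes out as "&gt;" instead of being left raw.
    print(escape_xliff_text("a > b"))
```
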
Changes we developed in Okapi and approved in this MateCat Filters release:
- Support for CDATA preservation in XLIFF and ITS filter
- Support for all macro-enabled MS Office files (e.g. DOCM, XLSM...)
- Password protection detection in MS Office files
- Fixed: paragraph spacing lost after merge in some MS Office files
- Fixed: some MS Excel files caused infinite loop on extraction
- Fixed: some text not extracted from MIF files
- Fixed: bug merging some OpenOffice documents with bookmark references
- Fixed: errors parsing lists in YAML files whose items had no space after comma
- HTML subfiltering in XML now uses the same rules as the regular HTML filter
- Fixed: conversion of CSV and TSV files was not working properly
Changes we developed in Okapi and approved in this MateCat Filters release:
- Fixed: revision detection in DOCX files sometimes produced false positives
- Fixed: some tags were not extracted from MIF files
- Fixed: default thresholds for IDML files were too conservative
- Fixed: some special inline tags in TTX were causing broken converted files
- Improved Filters architecture to allow more customization; develop and plug your own filters to support files with particular features
Changes we developed in Okapi and approved in this MateCat Filters release:
- Excel files fix: visible cells merged with hidden ones were not extracted
- Proper kuten (Asian period character with embedded trailing space) support
- Win Converter failover: Filters can now use the Consul service discovery to get a list of available Win Converters
- Improved Win Converter communication protocol for speed and robustness
- Workaround for filename charset bug: due to a bug in the MIMEPull library, filenames are always read as ISO-8859-1. To send a UTF-8 filename you can now use the "fileName" POST parameter.
- Updated ICU4J library for better segmentation. Noticeable improvements in the break behavior on periods not followed by a space.
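
The filename charset bug described above can be reproduced outside Filters. The following Python sketch only shows what happens when UTF-8 bytes are decoded as ISO-8859-1; the function names are hypothetical and nothing here reflects how Filters or MIMEPull is implemented.

```python
def misread_as_latin1(filename: str) -> str:
    """Simulate a receiver that decodes UTF-8 filename bytes as ISO-8859-1."""
    return filename.encode("utf-8").decode("iso-8859-1")


def repair_latin1_misread(garbled: str) -> str:
    """Undo the misreading by re-encoding as ISO-8859-1 and decoding as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    original = "résumé.docx"
    garbled = misread_as_latin1(original)
    print(garbled)                         # non-ASCII characters are corrupted
    print(repair_latin1_misread(garbled))  # round-trips back to the original
```

Sending the name separately as UTF-8 text, as the "fileName" parameter does, avoids the misreading altogether rather than repairing it afterwards.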
Changes we developed in Okapi and approved in this MateCat Filters release:
- Word, Excel and PowerPoint: improved hyphen support
- Word, Excel and PowerPoint: improved <mc:AlternateContent> support
- Excel: fixed some corruptions in the merge step
- PowerPoint: ignore some useless attributes that produced tags
- Adobe FrameMaker: fixed a bug processing the header of newer versions
- Added ability to provide custom segmentation rules.
- Fixed "Get Original" button not working in GUI.
- Fixed URL segmentation rules causing near-infinite loop.
- Added segmentation rule for footnote references.
Changes we developed in Okapi and approved in this MateCat Filters release:
- MS Office: 100% RTL support
- MS Office: improved handling of <w:smartTag>
- Word: fixed buggy handling of <w:fldChar> (long URLs)
- Word: gracefully fail on revisions (tracked changes)
- Word: improved handling of <w:specVanish>
- Word: fixed error on double <w:hyperlink>
- Excel: hidden table headers no longer extracted
- HTML: added RTL support
- Open/LibreOffice: doc properties no longer extracted
- Open/LibreOffice Calc: cached formula results no longer extracted
- Big improvements to documentation, code structure and robustness
Changes we developed in Okapi and approved in this MateCat Filters release:
- MS Office: 80% RTL support
- MS Office: fixed whitespace handling issues
- MS Office: aggressive cleanup now strips away <w:bCs> and <w:szCs>
- Word: fixed some corruptions when handling diagrams
- MS Office: fixed corruptions on tags with too many attributes
First public release.