Releases: alephdata/ingest-file
4.0.0
What's Changed
- Use RabbitMQ-based task queues for workers by @catileptic and @stchris
This release turns the ingest-file worker into a RabbitMQ-based worker (see also https://github.com/alephdata/servicelayer/releases/tag/v1.23.0). Highlights of the changes:
- New settings of the form RABBITMQ_* have been introduced (https://github.com/alephdata/servicelayer/blob/131171c137ce2a46d3ca36216b9cd7c2bd70125d/servicelayer/settings.py#L37).
- A Redis connection is still needed. Redis is used to coordinate the state of job execution across all workers.
- The prefetch count, i.e. the number of tasks the ingest-file worker grabs at a time, is now configurable (see https://github.com/alephdata/ingest-file/blob/bd321ec7524c15a9ec0396a153f5575856e476f8/ingestors/settings.py#L57).
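What a prefetch count controls: in AMQP, a consumer's prefetch count caps how many unacknowledged messages the broker will push to it at once, and 0 means no limit. In a real client such as pika the limit is requested with channel.basic_qos(prefetch_count=...). The broker-free toy model below only illustrates those semantics; the SimulatedChannel class is hypothetical and is not part of ingest-file or servicelayer.

```python
from collections import deque


class SimulatedChannel:
    """Hypothetical toy model of AMQP consumer prefetch (not a real client).

    The "broker" keeps pushing messages to the consumer until the number of
    unacknowledged deliveries reaches prefetch_count; 0 means no limit.
    """

    def __init__(self, messages, prefetch_count):
        self.queue = deque(messages)
        self.prefetch_count = prefetch_count
        self.unacked = []

    def deliver(self):
        """Deliver as many queued messages as the prefetch window allows."""
        limit = self.prefetch_count or float("inf")  # 0 is treated as unlimited
        delivered = []
        while self.queue and len(self.unacked) < limit:
            msg = self.queue.popleft()
            self.unacked.append(msg)
            delivered.append(msg)
        return delivered

    def ack(self, msg):
        """Acknowledge a message, freeing one slot in the prefetch window."""
        self.unacked.remove(msg)


if __name__ == "__main__":
    channel = SimulatedChannel(["t1", "t2", "t3", "t4"], prefetch_count=2)
    print(channel.deliver())  # ['t1', 't2']: the window is full after two tasks
    print(channel.deliver())  # []: nothing more until a task is acknowledged
    channel.ack("t1")
    print(channel.deliver())  # ['t3']
```

With a prefetch count of 1, a worker only receives a new task after acknowledging the previous one, which tends to spread long-running tasks evenly across workers; larger values save broker round trips but let one worker reserve tasks that an idle worker could otherwise have taken.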
Other changes
Full Changelog: 3.22.0...4.0.0
3.22.0
Note: We skipped version 3.21.0, so the release preceding this one is 3.20.3.
What's Changed
- Fix multi-line quoted-printable encoded values in vCards by @tillprochaska in #595
- Fix formatting by @stchris in #619
- Bump the dev-dependencies group with 4 updates by @dependabot in #615
- Bump servicelayer[amazon,google] from 1.22.1 to 1.22.2 by @dependabot in #614
- Bump rarfile from 4.1 to 4.2 by @dependabot in #611
- Bump sentry-sdk from 1.39.1 to 2.0.1 by @dependabot in #610
- Introduce a setting to disable sending ProcessingExceptions to Sentry by @stchris in #607
- Bump icalendar from 5.0.11 to 5.0.12 by @dependabot in #602
- Bump google-cloud-vision from 3.5.0 to 3.7.2 by @dependabot in #601
- Bump followthemoney from 3.5.8 to 3.5.9 by @dependabot in #581
- Bump click from 8.1.6 to 8.1.7 by @dependabot in #517
- Bump followthemoney-store[postgresql] from 3.0.6 to 3.1.0 by @dependabot in #598
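Background on the vCard fix (#595): in vCard 2.1, a property whose parameters include ENCODING=QUOTED-PRINTABLE may wrap its value across several physical lines, each ending in a soft line break (=). A parser that reads one physical line at a time would truncate such values, so the lines have to be joined before decoding. The sketch below illustrates that idea with Python's quopri module; the helpers unfold_qp_lines and decode_property are hypothetical and are not the code from #595.

```python
import quopri


def unfold_qp_lines(lines):
    """Join physical vCard 2.1 lines into logical property lines.

    Hypothetical helper: a QUOTED-PRINTABLE property line that ends with "="
    (a soft line break) continues on the next physical line.
    """
    logical = []
    buffer = None
    for line in lines:
        # Keep the "=\n" sequence so quopri still sees the soft line break.
        buffer = line if buffer is None else buffer + "\n" + line
        params = buffer.partition(":")[0]
        if "QUOTED-PRINTABLE" in params.upper() and buffer.endswith("="):
            continue  # the value continues on the next physical line
        logical.append(buffer)
        buffer = None
    if buffer is not None:
        logical.append(buffer)  # unterminated continuation at end of input
    return logical


def decode_property(line, charset="utf-8"):
    """Split 'NAME;PARAMS:VALUE' and decode a quoted-printable value."""
    params, _, value = line.partition(":")
    if "QUOTED-PRINTABLE" in params.upper():
        # quopri.decodestring drops "=\n" soft breaks and decodes =XX escapes.
        value = quopri.decodestring(value.encode("ascii")).decode(charset)
    return params, value
```

For example, the physical lines "NOTE;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:Caf=C3=A9 au =" and "lait" are joined into one logical line whose decoded value is "Café au lait".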
Full Changelog: 3.20.3...3.22.0
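One common way to make Sentry reporting of a specific exception type switchable, as the setting from #607 does, is a before_send hook: the Python SDK calls it for every event, and returning None drops the event. The sketch below assumes a hypothetical environment variable SENTRY_CAPTURE_PROCESSING_EXCEPTIONS and a stand-in ProcessingException class; neither name is taken from ingest-file, and this is not the code from #607.

```python
import os


class ProcessingException(Exception):
    """Hypothetical stand-in for ingest-file's processing error type."""


def capture_processing_exceptions():
    # Hypothetical setting name; reporting stays enabled unless switched off.
    value = os.environ.get("SENTRY_CAPTURE_PROCESSING_EXCEPTIONS", "true")
    return value.strip().lower() in ("1", "true", "yes", "on")


def before_send(event, hint):
    """Sentry before_send hook: return None to drop the event."""
    exc_info = hint.get("exc_info")
    if exc_info is not None and issubclass(exc_info[0], ProcessingException):
        if not capture_processing_exceptions():
            return None  # drop ProcessingExceptions when disabled
    return event  # all other events pass through unchanged


# Registration (requires sentry-sdk; the DSN is read from SENTRY_DSN):
# import sentry_sdk
# sentry_sdk.init(before_send=before_send)
```

Reading the setting inside the hook, rather than once at import time, lets the behaviour follow the environment at the moment each event is sent.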
3.20.3
Please refer to the release notes for Aleph 3.15.6 for detailed information.
3.20.2
What's Changed
- Fix TIFF processing by @catileptic in #587
- Some types of TIFF files were not properly previewed or OCRed
- Extended test coverage to prevent OCR regressions for GIF, JPG, JP2, TIFF and WebP files
Full Changelog: 3.20.1...3.20.2
3.20.1
What's Changed
- Force installing tesserocr from source instead of using wheels because of sirfz/tesserocr#337. This fixes a regression that may have caused certain image file types not to be OCRed.
- Add a clear-cache command to the ingestors CLI, which clears the ingest cache. It also accepts a prefix (for instance ocr: or pdf:).
Full Changelog: 3.20.0...3.20.1
3.20.0
What's Changed
- Emit a verbose error when processing a password-protected XLS / XLSX file by @catileptic in #551
- Add Prometheus instrumentation for ingest-file workers by @tillprochaska in #550
- Don't tag test- branches with latest by @catileptic in #561
- Bump ruff from 0.0.286 to 0.1.6 by @dependabot in #559
- Dependabot: remove old ignores and group dev deps by @stchris in #563
- Bump the dev-dependencies group with 3 updates by @dependabot in #564
- Bump lxml from 4.9.3 to 5.0.0 by @dependabot in #572
- Bump the dev-dependencies group with 2 updates by @dependabot in #571
- Bump sentry-sdk from 1.30.0 to 1.39.1 by @dependabot in #570
- Bump google-cloud-vision from 3.4.4 to 3.5.0 by @dependabot in #569
- Bump dbf from 0.99.3 to 0.99.9 by @dependabot in #568
- Bump olefile from 0.46 to 0.47 by @dependabot in #567
- Bump icalendar from 5.0.7 to 5.0.11 by @dependabot in #556
- Bump pyicu from 2.11 to 2.12 by @dependabot in #560
- Bump cryptography from 41.0.4 to 41.0.7 by @dependabot in #553
- Bump pymediainfo from 6.0.1 to 6.1.0 by @dependabot in #549
- Bump normality from 2.4.0 to 2.5.0 by @dependabot in #544
- Bump tesserocr from 2.6.1 to 2.6.2 by @dependabot in #547
- Bump pillow from 10.0.0 to 10.1.0 by @dependabot in #543
- Bump rarfile from 4.0 to 4.1 by @dependabot in #527
- Bump FTM version 3.5.2->3.5.8 by @catileptic in #574
Full Changelog: 3.19.3...3.20.0
3.20.0-rc1
What's Changed
- Bump pantomime from 0.6.0 to 0.6.1 by @dependabot in #501
- Bump ruff from 0.0.269 to 0.0.282 by @dependabot in #500
- Bump cryptography from 39.0.1 to 41.0.3 by @dependabot in #502
- Bump sentry-sdk from 1.26.0 to 1.29.2 by @dependabot in #499
- Bump black from 23.3.0 to 23.7.0 by @dependabot in #496
- Bump pytest from 7.2.2 to 7.4.0 by @dependabot in #485
- Bump lxml from 4.9.2 to 4.9.3 by @dependabot in #495
- Bump google-cloud-vision from 3.4.1 to 3.4.4 by @dependabot in #497
- Bump tesserocr from 2.6.0 to 2.6.1 by @dependabot in #493
- Bump spacy from 3.5.1 to 3.6.1 by @dependabot in #509
- Bump pillow from 9.5.0 to 10.0.0 by @dependabot in #483
- Bump pytest-cov from 4.0.0 to 4.1.0 by @dependabot in #476
- Bump requests[security] from 2.28.2 to 2.31.0 by @dependabot in #477
- Bump followthemoney-store[postgresql] from 3.0.5 to 3.0.6 by @dependabot in #513
- Bump followthemoney from 3.4.4 to 3.5.2 by @dependabot in #507
- Bump icalendar from 5.0.4 to 5.0.7 by @dependabot in #469
- Bump pyicu from 2.10.2 to 2.11 by @dependabot in #459
- Bump click from 8.1.3 to 8.1.7 by @dependabot in #508
- Lower click version to avoid mismatch by @stchris in #514
- Bump ruff from 0.0.282 to 0.0.286 by @dependabot in #516
- Add merge_group trigger by @stchris in #521
- GHA: Update checkout action to v3 by @stchris in #522
- Bump sentry-sdk from 1.29.2 to 1.30.0 by @dependabot in #519
- Bump fingerprints from 1.1.0 to 1.1.1 by @dependabot in #518
- Bump servicelayer[amazon,google] from 1.21.0 to 1.21.2 by @dependabot in #520
- Bump cryptography from 41.0.3 to 41.0.4 by @dependabot in #524
- Update pip, setuptools and wheel before installing packages by @stchris in #510
- Fix Aleph bug 2879 - fragmented 7z archive by @catileptic in #535
- Emit a verbose error when processing a password-protected XLS / XLSX file by @catileptic in #551
- Add Prometheus instrumentation for ingest-file workers by @tillprochaska in #550
Full Changelog: 3.19.2...3.20.0-rc1
3.19.3-rc1
What's Changed
- Bump pantomime from 0.6.0 to 0.6.1 by @dependabot in #501
- Bump ruff from 0.0.269 to 0.0.282 by @dependabot in #500
- Bump cryptography from 39.0.1 to 41.0.3 by @dependabot in #502
- Bump sentry-sdk from 1.26.0 to 1.29.2 by @dependabot in #499
- Bump black from 23.3.0 to 23.7.0 by @dependabot in #496
- Bump pytest from 7.2.2 to 7.4.0 by @dependabot in #485
- Bump lxml from 4.9.2 to 4.9.3 by @dependabot in #495
- Bump google-cloud-vision from 3.4.1 to 3.4.4 by @dependabot in #497
- Bump tesserocr from 2.6.0 to 2.6.1 by @dependabot in #493
- Bump spacy from 3.5.1 to 3.6.1 by @dependabot in #509
- Bump pillow from 9.5.0 to 10.0.0 by @dependabot in #483
- Bump pytest-cov from 4.0.0 to 4.1.0 by @dependabot in #476
- Bump requests[security] from 2.28.2 to 2.31.0 by @dependabot in #477
- Bump followthemoney-store[postgresql] from 3.0.5 to 3.0.6 by @dependabot in #513
- Bump followthemoney from 3.4.4 to 3.5.2 by @dependabot in #507
- Bump icalendar from 5.0.4 to 5.0.7 by @dependabot in #469
- Bump pyicu from 2.10.2 to 2.11 by @dependabot in #459
- Bump click from 8.1.3 to 8.1.7 by @dependabot in #508
- Lower click version to avoid mismatch by @stchris in #514
- Bump ruff from 0.0.282 to 0.0.286 by @dependabot in #516
- Add merge_group trigger by @stchris in #521
- GitHub Actions: Update checkout action to v3 by @stchris in #522
- Bump sentry-sdk from 1.29.2 to 1.30.0 by @dependabot in #519
- Bump fingerprints from 1.1.0 to 1.1.1 by @dependabot in #518
- Bump servicelayer[amazon,google] from 1.21.0 to 1.21.2 by @dependabot in #520
Full Changelog: 3.19.2...3.19.3-rc1
3.19.2
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
New Contributors
- @tillprochaska made their first contribution in #488
Full Changelog: 3.18.4...3.19.2
3.19.2-rc1
What's Changed
- Fix handling of multipart emails by @tillprochaska in #488
- Send ProcessingExceptions to Sentry by @stchris in #487
Full Changelog: 3.18.4...3.19.2-rc1