All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added `maven-jar-plugin` in `beekeeper-integration-test`.
- Changed scope of attributes to `protected` in `beekeeper-integration-test`.
- Added a DB migration file and implementation of the `beekeeper-history` table to track Beekeeper activity.
- Added `IcebergTableListenerEventFilter` filter for Iceberg tables in `beekeeper-scheduler-apiary` to prevent scheduling paths and metadata for deletion.
- Added `IcebergValidator` to ensure Iceberg tables are identified and excluded from cleanup operations.
- Added error handling for bad requests with incorrect sort parameters.
- Added automatic pagination handling in the `/unreferenced-paths` endpoint for improved Swagger documentation.
- Updated the Maven Central release workflow to run exclusively from the main branch.
- Added `aws-sts-jdk` dependency in `beekeeper-metadata-cleanup`, `beekeeper-path-cleanup`, and `beekeeper-scheduler-apiary` to fix an issue where IRSA was unable to assume the role.
- Fixed paged API response by updating `MetadataResponseConverter` and `PathResponseConverter` to pass complete information about the number of pages and elements to the response `Page`.
- Added location normalization so that locations like `s3:/a/b` and `s3:/a/b/` are considered the same and the path is not scheduled for deletion.
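A minimal sketch of trailing-slash normalization, assuming two locations should match whenever they differ only by trailing `/` characters. `LocationNormalizer`, `normalize`, and `sameLocation` are hypothetical names for illustration, not Beekeeper's actual classes.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not Beekeeper's actual implementation.
final class LocationNormalizer {
    private LocationNormalizer() {}

    /** Strips trailing '/' characters so "s3:/a/b/" and "s3:/a/b" compare equal. */
    static String normalize(String location) {
        Objects.requireNonNull(location, "location");
        int end = location.length();
        while (end > 0 && location.charAt(end - 1) == '/') {
            end--;
        }
        return location.substring(0, end);
    }

    /** True when both locations refer to the same path after normalization. */
    static boolean sameLocation(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

Comparing normalized strings keeps the check cheap. It only covers trailing slashes, though, so other differences such as scheme aliases or encoding would need their own handling.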
- Upgraded Spring Boot from `2.4.4` to `2.7.9`.
- Upgraded Spring Framework from `5.3.5` to `5.3.25`.
- Migrated from `springfox` to `springdoc` due to incompatibilities with Spring Boot 2.6+ actuators.
- Removed the `micrometer` version from `beekeeper-vacuum-tool` so that it matches the version managed by the dependencies in `beekeeper/pom.xml`.
- Upgraded `specification-arg-resolver` from `2.6.1` to `2.18.1` to be compatible with `springdoc`.
- Updated the `cleanUpOldDeletedRecords` status check so that `DISABLED` entries are not deleted immediately.
- Check the `expired` property, rather than the `unreferenced` property, before disabling tables from the TTL feature.
- Allow cleanup delays to be specified in months or years as well as smaller units by combining the Period and Duration specifications.
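A minimal sketch of combining the two specifications. It assumes the delay is written as a single ISO-8601 string such as `P1Y2M3DT4H30M`. `java.time.Period` only parses the date part (years, months, days) and `java.time.Duration` only the time part, so the value is split at `T` and each half is parsed separately. `CleanupDelay`, `parse`, and `applyTo` are hypothetical names, and Beekeeper's own parsing may differ.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.format.DateTimeParseException;

// Hypothetical sketch; not Beekeeper's actual parser.
final class CleanupDelay {
    final Period period;     // years, months, days
    final Duration duration; // hours, minutes, seconds

    private CleanupDelay(Period period, Duration duration) {
        this.period = period;
        this.duration = duration;
    }

    /** Parses e.g. "P1Y2M3DT4H30M"; invalid input throws DateTimeParseException. */
    static CleanupDelay parse(String value) {
        if (value == null || !value.startsWith("P")) {
            throw new DateTimeParseException("Cleanup delay must start with 'P'", String.valueOf(value), 0);
        }
        int t = value.indexOf('T');
        String datePart = t < 0 ? value : value.substring(0, t);
        String timePart = t < 0 ? "" : value.substring(t);
        // "PT3H" has an empty date part; a bare "P" is still rejected by Period.parse.
        Period period = (t > 0 && datePart.equals("P")) ? Period.ZERO : Period.parse(datePart);
        Duration duration = timePart.isEmpty() ? Duration.ZERO : Duration.parse("P" + timePart);
        return new CleanupDelay(period, duration);
    }

    /** Adds the calendar-based part first, then the exact time-based part. */
    LocalDateTime applyTo(LocalDateTime start) {
        return start.plus(period).plus(duration);
    }
}
```

Keeping the two parts separate matters because a month or a year has no fixed length in seconds. Only `Period` can add them correctly to a date.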
- Upgraded the MySQL container version from `8.0.15` to `8.0.26` in the integration tests.
- Throw an exception if `cleanupDelay` can't be parsed instead of returning the default value.
- Don't return records for cleanup after 10 attempts have been reached.
- Added additional checks for the number of levels in table and partition paths so that invalid paths are not scheduled for deletion.
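A minimal sketch of a path-level check. It assumes "levels" means the non-empty path segments after the bucket, and that a partition path must sit at least one level per partition key below its table path. `PathLevelValidator`, `minLevels`, and `partitionKeys` are hypothetical, and the thresholds Beekeeper actually applies are not specified here.

```java
import java.net.URI;

// Hypothetical sketch; the real validation rules in Beekeeper may differ.
final class PathLevelValidator {
    private PathLevelValidator() {}

    /** Counts non-empty segments after the bucket, e.g. "s3://bucket/db/table" has 2. */
    static int levels(String location) {
        String path = URI.create(location).getPath();
        int count = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** A table path shallower than minLevels (e.g. a bare bucket) is rejected. */
    static boolean isValidTablePath(String location, int minLevels) {
        return levels(location) >= minLevels;
    }

    /** A partition path must be at least one level per partition key deeper than its table. */
    static boolean isValidPartitionPath(String partitionLocation, String tableLocation, int partitionKeys) {
        return levels(partitionLocation) >= levels(tableLocation) + partitionKeys;
    }
}
```

A check like this guards against a misconfigured location such as a bare bucket being scheduled. Deleting such a path would remove far more data than a single table or partition.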
- Return Slice instead of Page in queries to avoid scanning the whole table for the total number of pages. Details here.
- Removed `order by` from the query for getting records to clean up in order to speed up processing.
- Fixed an error when deleting over 1000 files in a single request by splitting the request into smaller batches.
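Amazon S3's multi-object delete accepts at most 1000 keys per request. A minimal sketch of splitting a key list into compliant batches follows. `DeleteBatcher` and `partition` are hypothetical names. Sending each batch, for example as one `DeleteObjectsRequest` with the AWS SDK for Java v1, is left out so the example runs without the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not Beekeeper's actual code.
final class DeleteBatcher {
    /** S3 DeleteObjects limit on the number of keys per request. */
    static final int MAX_KEYS_PER_DELETE_REQUEST = 1000;

    private DeleteBatcher() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 2500 keys this produces three requests of 1000, 1000, and 500 keys.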
- Upgraded `com.amazonaws` dependency version to `1.12.311` (was `1.11.960`).
- Changed S3 tests to use `test-containers` instead of `localstack-utils`.
- Upgraded `test-containers` to `1.17.1` (was `1.15.2`).
- If a table is deleted before Beekeeper is scheduled to delete it, its entries in the `housekeeping_metadata` table are disabled.
- Upgraded `log4j2` to `2.16.0` (was `2.15.0`).
- Added missing `@Transactional` annotation to `MetadataDisableTablesService`.
- Added check for beekeeper property before performing the cleanup in the metadata service.
- Upgraded `log4j2` to `2.15.0` because of the log4j security issue.
- Cleanup job for old rows in the `housekeeping_path` and `housekeeping_metadata` tables.
- Fixed DB migration script version.
- DB migration to change indexes for the `housekeeping_path` and `housekeeping_metadata` tables.
- DB migration to add indexes for the `housekeeping_path` and `housekeeping_metadata` tables.
- Added the `swagger` endpoint to the `beekeeper-api` module.
- Added a `GET /unreferenced-paths` endpoint to the `beekeeper-api`.
- Added a `GET /metadata` endpoint to the `beekeeper-api`.
- Updated `eg-oss-parent` to version `2.4.0` (was `2.3.2`).
- Updated `snakeyaml` to version `1.27` (was `1.24`).
- Updated `mockito.version` to version `3.11.2` (was `3.9.0`).
- Added `beekeeper-api` module.
- Updated `aws.version` to version `1.11.960` (was `1.11.532`) in `beekeeper-cleanup`.
- Updated `eg-oss-parent` to version `2.3.2` (was `2.3.1`).
- Updated `localstack-utils` to version `0.2.12` (was `0.2.7`).
- Updated `eg-oss-parent` to version `2.3.0` (was `1.3.1`).
- Docker images are now built using the Jib plugin.
- Excluded `javax.servlet` dependency from `hadoop-mapreduce-client` to avoid a version conflict.
- Set lifecycle type to `UNREFERENCED` for paths picked up by the Vacuum tool.
- Integration tests for asserting metrics in metadata cleanup.
- Add Time To Live (TTL) for all tables.
- DB migration for creating the new `housekeeping_metadata` table.
- Renamed `beekeeper-path-scheduler-apiary` module to `beekeeper-scheduler-apiary`.
- Renamed `beekeeper-path-scheduler` module to `beekeeper-scheduler`.
- Renamed `beekeeper-assembly-path-scheduler-apiary` module to `beekeeper-assembly-scheduler-apiary`.
- Docker image name changed from `beekeeper-path-scheduler-apiary` to `beekeeper-scheduler-apiary`.
- DB migration to rename the `path_status` column to `housekeeping_status` in the `path` table.
- DB migration to rename the `path` table to `housekeeping_path`.
- `beekeeper-vacuum-tool` module.
- Added `LifecycleEventType` enum in `beekeeper-core` to describe supported data lifecycles.
- Prometheus support.
- Refactored internals of `beekeeper-path-scheduler-apiary` to support generic Lifecycle scheduling.
  - Inserted an additional workflow (handlers) between the read and filter actions to support filters per Lifecycle type.
  - `MessageReaderAdapter` now has additional logic to orchestrate the updated workflow.
  - Renamed `PathEvent` to `BeekeeperEvent` to better reflect event types.
- Refactored internals of `beekeeper-path-scheduler` to support generic data Lifecycle scheduling.
  - Renamed `PathSchedulerService` to `UnreferencedPathSchedulerService` to differentiate types.
  - Renamed
- Refactored internals of `beekeeper-cleanup` to support generic data Lifecycle deletions.
  - Refactored `PagingCleanupService` to be a generic orchestrator of Lifecycle handlers.
  - Refactored `S3Client.listObjects()` to list all objects at a key with batching.
- `eg-oss-parent` version updated to 1.3.1 (was 1.1.0).
- `TimedTaggable` annotation to time and report table-level metrics.
- `BytesDeletedReporter` to report bytes deleted at a table level.
- `WhitelistedListenerEventFilter` to filter events unless listed in `beekeeper.hive.events.whitelist`.
- Fix for S3 paths which require encoding.
- Handling cleanup for `s3a` and `s3n` paths.
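A minimal sketch of treating `s3`, `s3a`, and `s3n` locations alike by extracting the bucket and object key with `java.net.URI`. It assumes the three Hadoop-style schemes all address the same S3 objects and only differ in the client that writes them. `S3Location` and `parse` are hypothetical names, not Beekeeper's actual classes.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not Beekeeper's actual class.
final class S3Location {
    // Hadoop-style schemes that all refer to objects in S3.
    private static final Set<String> S3_SCHEMES = Set.of("s3", "s3a", "s3n");

    final String bucket;
    final String key;

    private S3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    /** Parses s3://, s3a:// or s3n:// locations into a bucket and an object key. */
    static S3Location parse(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        if (scheme == null || !S3_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("Unsupported scheme in location: " + location);
        }
        String bucket = uri.getAuthority();
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("Missing bucket in location: " + location);
        }
        String path = uri.getPath(); // decoded path, e.g. "/db/table"
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new S3Location(bucket, key);
    }
}
```

Using `getAuthority()` rather than `getHost()` keeps bucket names that are not valid host names. `URI.create` also rejects unencoded characters such as spaces, which ties in with the need to encode some S3 paths.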
- Using default deletion delay if table parameter is configured incorrectly.
- Parent pom version to 1.1.0 (was 1.0.0).
- Changed `MeterRegistry` to `GraphiteMeterRegistry` so that Spring metrics use Beekeeper's `GraphiteConfig` rather than the default config.
- Health check endpoints.
- First release: producing two docker images with single version tag.