Releases
apache-iceberg-1.4.0
API
Implement bound expression sanitization (#8149)
Remove overflow checks in DefaultCounter causing performance issues (#8297)
Support incremental scanning with branch (#5984)
Add a validation API to DeleteFiles which validates files exist (#8525)
Core
Use V2 format by default in new tables (#8381)
Use zstd compression for Parquet by default in new tables (#8593)
Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
Avoid generating huge manifests during commits (#6335)
Add a writer for unordered position deletes (#7692)
Optimize DeleteFileIndex (#8157)
Optimize lookup in DeleteFileIndex without useful bounds (#8278)
Optimize split offsets handling (#8336)
Optimize computing user-facing state in data tasks (#8346)
Don't persist useless file and position bounds for deletes (#8360)
Don't persist counts for paths and positions in position delete files (#8590)
Support setting system-level properties via environment variables (#5659)
Add JSON parser for ContentFile and FileScanTask (#6934)
Add REST spec and request for commits to multiple tables (#7741)
Add REST API for committing changes against multiple tables (#7569)
Default to exponential retry strategy in REST client (#8366)
Support registering tables with REST session catalog (#6512)
Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
Add total data size to partitions metadata table (#7920)
Extend ResolvingFileIO to support bulk operations (#7976)
Key metadata in Avro format (#6450)
Add AES GCM encryption stream (#3231)
Fix a connection leak in streaming delete filters (#8132)
Fix lazy snapshot loading history (#8470)
Fix unicode handling in HTTPClient (#8046)
Fix paths for unpartitioned specs in writers (#7685)
Fix OOM caused by Avro decoder caching (#7791)
Spark
Added support for Spark 3.5
Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
Column pruning in merge-on-read operations.
Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
Dropped support for Spark 3.1
Deprecated support for Spark 3.2
Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
Increase default advisory partition size for writes in Spark 3.5 (#8660)
Support distributed planning in Spark 3.4 and 3.5 (#8123)
Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
Output net changes across snapshots for carryover rows in CDC (#7326)
Display read metrics on Spark SQL UI (#7447) (#8445)
Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
Add fast_forward procedure (#8081)
Support filters when rewriting position deletes (#7582)
Support setting current snapshot with ref (#8163)
Make backup table name configurable during migration (#8227)
Add write and SQL options to override compression config (#8313)
Correct partition transform functions to match the spec (#8192)
Enable extra commit properties with metadata delete (#7649)
Flink
Add possibility of ordering the splits based on the file sequence number (#7661)
Fix serialization in TableSink with anonymous object (#7866)
Switch to FileScanTaskParser for JSON serialization of IcebergSourceSplit (#7978)
Custom partitioner for bucket partitions (#7161)
Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
Support alter table column (#7628)
Parquet
Add encryption config to read and write builders (#2639)
Skip writing bloom filters for deletes (#7617)
Cache codecs by name and level (#8182)
Fix decimal data reading from ParquetAvroValueReaders (#8246)
Handle filters with transforms by assuming data must be scanned (#8243)
ORC
Handle filters with transforms by assuming the filter matches (#8244)
Vendor Integrations
GCP: Fix single byte read in GCSInputStream (#8071)
GCP: Add properties for OAuth2 and update library (#8073)
GCP: Add prefix and bulk operations to GCSFileIO (#8168)
GCP: Add bundle jar for GCP-related dependencies (#8231)
GCP: Add range reads to GCSInputStream (#8301)
AWS: Add bundle jar for AWS-related dependencies (#8261)
AWS: Support configuring the storage class for S3FileIO (#8154)
AWS: Add FileIO tracker/closer to Glue catalog (#8315)
AWS: Update S3 signer spec to allow an optional string body in S3SignRequest (#8361)
Azure: Add FileIO that supports ADLSv2 storage (#8303)
Azure: Make ADLSFileIO implement DelegateFileIO (#8563)
Nessie: Provide better commit message on table registration (#8385)
Dependencies
Bump Nessie to 0.71.0
Bump ORC to 1.9.1
Bump Arrow to 12.0.1
Bump AWS Java SDK to 2.20.131