3.4.0-RC01

Pre-release
@jaogoy jaogoy released this 16 Jan 11:11
f492f0d

Release date: January 13, 2025

Data Lake Analytics

  • Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
  • Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
  • Data Cache related improvements:
    • Introduces a Segmented LRU (SLRU) cache eviction strategy, which significantly reduces cache pollution from occasional large queries, improves the cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can improve by 70% or more. For more information, see Data Cache - Cache replacement policies.
    • Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
    • Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
  • Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
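
To illustrate why a Segmented LRU policy resists cache pollution, here is a minimal sketch of the general SLRU technique in Python. It is not the Data Cache implementation; the class name, segment sizes, and promotion rule (promote on the first hit after insertion) are assumptions chosen for clarity.

```python
from collections import OrderedDict


class SLRUCache:
    """Illustrative Segmented LRU (hypothetical, not StarRocks code).

    New keys enter a probationary segment. A key that is read again is
    promoted to a protected segment, so a one-off scan can only evict
    probationary entries.
    """

    def __init__(self, probation_capacity, protected_capacity):
        self.probation_capacity = probation_capacity
        self.protected_capacity = protected_capacity
        self.probation = OrderedDict()  # least recently used entry first
        self.protected = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key in self.protected:
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            value = self.probation.pop(key)
            self._promote(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            self.probation[key] = value
            self.probation.move_to_end(key)
        else:
            self.probation[key] = value
            self._trim_probation()

    def _promote(self, key, value):
        self.protected[key] = value
        if len(self.protected) > self.protected_capacity:
            # Demote the coldest protected entry instead of dropping it.
            old_key, old_value = self.protected.popitem(last=False)
            self.probation[old_key] = old_value
            self._trim_probation()

    def _trim_probation(self):
        while len(self.probation) > self.probation_capacity:
            self.probation.popitem(last=False)


cache = SLRUCache(probation_capacity=2, protected_capacity=2)
cache.put("hot1", 1)
cache.put("hot2", 2)
cache.get("hot1")  # second access promotes the key
cache.get("hot2")
for i in range(10):  # a large one-off scan
    cache.put(f"scan{i}", i)
```

In this sketch the scan cycles through the probationary segment only, so the two frequently read keys stay cached. A plain LRU of the same total size would have evicted them.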

Performance Improvement and Query Optimization

  • [Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system collects the execution details of slow queries, automatically analyzes their query plans for optimization opportunities, and generates a tailored optimization guide for each query. If the CBO generates the same bad plan for subsequent identical queries, the system locally optimizes the query plan based on the guide. For more information, see Query Feedback.
  • [Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.

Shared-data Enhancements

  • Supports Query Cache, aligning with the shared-nothing architecture.
  • Supports synchronous materialized views, aligning with the shared-nothing architecture.

Storage Engine

  • Unified all partitioning methods into expression partitioning and added support for multi-level partitioning, where each level can be any expression. For more information, see Expression Partitioning.

Loading

  • INSERT OVERWRITE now supports a new semantic: Dynamic Overwrite. When this semantic is enabled, the ingested data either creates new partitions or overwrites the existing partitions that correspond to the new data records. Partitions not involved are not truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
  • Optimized data ingestion with INSERT from FILES so that it replaces Broker Load as the preferred loading method:
    • FILES now supports listing files in remote storage and provides basic statistics of the files. For more information, see FILES - list_files_only.
    • INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
    • INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify strict_mode, max_filter_ratio, and timeout for INSERT operations to control the behavior and quality of the data ingestion. For more information, see INSERT - PROPERTIES.
    • INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
    • FILES supports unioning files with different schemas. The schemas of Parquet and ORC files are unioned based on the column names, and those of CSV files are unioned based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property fill_mismatch_column_with. For more information, see Union files with different schema.
    • FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
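
The Dynamic Overwrite semantic described at the top of this list can be modeled in a few lines of Python. This is only a sketch of the behavior as stated in these notes, not how the engine implements it; the function name `dynamic_overwrite`, the dict-of-lists table model, and the `partition_of` callback are all hypothetical.

```python
def dynamic_overwrite(table, new_rows, partition_of):
    """Model of the Dynamic Overwrite semantic (hypothetical helper).

    table: dict mapping a partition key to the list of rows it holds.
    new_rows: rows being ingested.
    partition_of: function that returns the partition key of a row.
    """
    # Group the incoming rows by the partition they belong to.
    incoming = {}
    for row in new_rows:
        incoming.setdefault(partition_of(row), []).append(row)

    result = dict(table)     # partitions not involved are kept untouched
    result.update(incoming)  # involved partitions are replaced or created
    return result


table = {
    "2025-01-01": [("2025-01-01", "a")],
    "2025-01-02": [("2025-01-02", "b")],
}
new_rows = [("2025-01-02", "B"), ("2025-01-03", "c")]
result = dynamic_overwrite(table, new_rows, partition_of=lambda row: row[0])
```

Here the partition for 2025-01-02 is overwritten, a partition for 2025-01-03 is created, and the partition for 2025-01-01 keeps its original rows, even though no partition names were specified.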

Others

  • Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as SHUTDOWN.
  • Optimized log printing to avoid occupying excessive disk space.

Downgrade Notes

  • Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.