Releases · StarRocks/starrocks

16 Jan 11:11

jaogoy

3.4.0-rc01

f492f0d

3.4.0-RC01 Pre-release

Release date: January 13, 2025

Data Lake Analytics

Optimized Iceberg V2 query performance and lowered memory usage by reducing repeated reads of delete-files.
Supports column mapping for Delta Lake tables, allowing queries against data after Delta Schema Evolution. For more information, see Delta Lake catalog - Feature support.
Data Cache related improvements:
- Introduces a Segmented LRU (SLRU) cache eviction strategy, which protects against cache pollution from occasional large queries, improves the cache hit rate, and reduces fluctuations in query performance. In simulated test cases with large queries, SLRU-based query performance can improve by 70% or more. For more information, see Data Cache - Cache replacement policies.
- Unified the Data Cache instance used in both shared-data architecture and data lake query scenarios to simplify the configuration and improve resource utilization. For more information, see Data Cache.
- Provides an adaptive I/O strategy optimization for Data Cache, which flexibly routes some query requests to remote storage based on the cache disk's load and performance, thereby enhancing overall access throughput.
Supports automatic collection of external table statistics through automatic ANALYZE tasks triggered by queries. It can provide more accurate NDV information compared to metadata files, thereby optimizing the query plan and improving query performance. For more information, see Query-triggered collection.
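
To illustrate why a Segmented LRU policy resists cache pollution, the following is a minimal Python sketch of the general SLRU technique. It is not the StarRocks Data Cache implementation: the class name `SLRUCache`, the segment sizes, and the promote-on-second-access rule are assumptions chosen for illustration. New keys enter a probationary segment, and only keys that are accessed again move to a protected segment, so a one-off large scan can evict only probationary entries.

```python
from collections import OrderedDict


class SLRUCache:
    """Illustrative Segmented LRU cache (hypothetical, not StarRocks code)."""

    def __init__(self, probation_size, protected_size):
        self.probation_size = probation_size
        self.protected_size = protected_size
        self.probation = OrderedDict()  # keys seen once, oldest first
        self.protected = OrderedDict()  # keys seen at least twice, oldest first

    def get(self, key):
        if key in self.protected:
            # Hit in the protected segment: refresh recency.
            self.protected.move_to_end(key)
            return self.protected[key]
        if key in self.probation:
            # Second access: promote from probation to protected.
            value = self.probation.pop(key)
            self._promote(key, value)
            return value
        return None

    def put(self, key, value):
        if key in self.protected:
            self.protected[key] = value
            self.protected.move_to_end(key)
        elif key in self.probation:
            # Updating an existing probationary key counts as a second access.
            self.probation.pop(key)
            self._promote(key, value)
        else:
            self._insert_probation(key, value)

    def _insert_probation(self, key, value):
        self.probation[key] = value
        self.probation.move_to_end(key)
        # Evict the least recently used probationary entries when over capacity.
        while len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)

    def _promote(self, key, value):
        self.protected[key] = value
        # Demote the oldest protected entries back to probation when over capacity.
        while len(self.protected) > self.protected_size:
            old_key, old_value = self.protected.popitem(last=False)
            self._insert_probation(old_key, old_value)


cache = SLRUCache(probation_size=2, protected_size=2)
cache.put("hot", 1)
cache.get("hot")                  # second access promotes "hot"
for i in range(100):              # simulate a large one-off scan
    cache.put(f"scan-{i}", i)
hot_after_scan = cache.get("hot")  # still cached despite the scan
```

In this sketch the scan only churns the probationary segment, which is the behavior a plain LRU cache cannot provide.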

Performance Improvement and Query Optimization

[Experimental] Offers a preliminary Query Feedback feature for automatic optimization of slow queries. The system collects the execution details of slow queries, automatically analyzes their query plans for potential optimization opportunities, and generates a tailored optimization guide for each query. If the CBO generates the same bad plan for subsequent identical queries, the system locally optimizes the query plan based on the guide. For more information, see Query Feedback.
[Experimental] Supports Python UDFs, offering more convenient function customization compared to Java UDFs. For more information, see Python UDF.

Shared-data Enhancements

Supports Query Cache, aligning with the shared-nothing architecture.
Supports synchronous materialized views, aligning with the shared-nothing architecture.

Storage Engine

Unified all partitioning methods into expression partitioning and added support for multi-level partitioning, where each level can be any expression. For more information, see Expression Partitioning.
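
As a conceptual illustration of multi-level expression partitioning, the Python sketch below groups rows by a tuple of per-level expression results. It models the idea only and says nothing about how the storage engine implements it. The function names, the row layout, and the two example expressions (month of a date column, then a region column) are hypothetical.

```python
from datetime import date


def partition_key(row, level_exprs):
    """Evaluate one expression per partition level and return the combined key."""
    return tuple(expr(row) for expr in level_exprs)


def partition_rows(rows, level_exprs):
    """Group rows by their multi-level partition key."""
    partitions = {}
    for row in rows:
        partitions.setdefault(partition_key(row, level_exprs), []).append(row)
    return partitions


# Hypothetical two-level scheme: level 1 is the month of `dt`, level 2 is `region`.
levels = [
    lambda r: r["dt"].strftime("%Y-%m"),
    lambda r: r["region"],
]
rows = [
    {"dt": date(2025, 1, 3), "region": "eu", "v": 1},
    {"dt": date(2025, 1, 9), "region": "eu", "v": 2},
    {"dt": date(2025, 1, 9), "region": "us", "v": 3},
    {"dt": date(2025, 2, 1), "region": "eu", "v": 4},
]
parts = partition_rows(rows, levels)
```

Because each level is an arbitrary function of the row, the same grouping logic covers what would otherwise be separate range, list, or multi-column schemes.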

Loading

INSERT OVERWRITE now supports a new semantic: Dynamic Overwrite. When this semantic is enabled, the ingested data either creates new partitions or overwrites the existing partitions that correspond to the new data records. Partitions that are not involved are not truncated or deleted. This semantic is especially useful when users want to recover data in specific partitions without specifying the partition names. For more information, see Dynamic Overwrite.
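
The Dynamic Overwrite semantic described above can be modeled with a small Python sketch. This is a conceptual model, not StarRocks code: the table is represented as a dictionary from partition name to rows, and the function name, the partition names, and the `partition_of` mapping are all hypothetical.

```python
def dynamic_overwrite(table, new_rows, partition_of):
    """Return the table contents after a dynamic overwrite.

    Partitions that receive new rows are created or fully replaced;
    partitions that receive no rows are left untouched.
    """
    incoming = {}
    for row in new_rows:
        incoming.setdefault(partition_of(row), []).append(row)

    # Copy the existing partitions, then replace or add only the touched ones.
    result = {name: list(rows) for name, rows in table.items()}
    result.update(incoming)
    return result


table = {
    "p20250101": [{"dt": "2025-01-01", "v": 1}],
    "p20250102": [{"dt": "2025-01-02", "v": 2}],
}
new_rows = [
    {"dt": "2025-01-02", "v": 20},  # overwrites an existing partition
    {"dt": "2025-01-03", "v": 30},  # creates a new partition
]
after = dynamic_overwrite(
    table, new_rows, lambda r: "p" + r["dt"].replace("-", "")
)
```

Here `p20250101` keeps its original row, `p20250102` holds only the new row, and `p20250103` is created, without any partition name being specified by the caller.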
Optimized the data ingestion with INSERT from FILES to replace Broker Load as the preferred loading method:
- FILES now supports listing files in remote storage, and providing basic statistics of the files. For more information, see FILES - list_files_only.
- INSERT now supports matching columns by name, which is especially useful when users load data from numerous columns with identical names. (The default behavior matches columns by their position.) For more information, see Match column by name.
- INSERT supports specifying PROPERTIES, aligning with other loading methods. Users can specify strict_mode, max_filter_ratio, and timeout for INSERT operations to control the behavior and quality of the data ingestion. For more information, see INSERT - PROPERTIES.
- INSERT from FILES supports pushing down the target table schema check to the Scan stage of FILES to infer a more accurate source data schema. For more information, see Push down target table schema check.
- FILES supports unionizing files with different schemas. The schemas of Parquet and ORC files are unionized based on column names, and those of CSV files are unionized based on the position (order) of the columns. When there are mismatched columns, users can choose to fill the columns with NULL or return an error by specifying the property fill_mismatch_column_with. For more information, see Union files with different schema.
- FILES supports inferring the STRUCT type data from Parquet files. (In earlier versions, STRUCT data was inferred as STRING type.) For more information, see Infer STRUCT type from Parquet.
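
The name-based schema union described for Parquet and ORC files in the list above can be sketched in Python as follows. This is a conceptual model only: files are represented as lists of dictionaries, and the function name and the `on_mismatch` values (`"fill_null"` and `"error"`) are hypothetical stand-ins for the two behaviors selected by fill_mismatch_column_with, not the property's actual values. Position-based union for CSV files is not shown.

```python
def union_files_by_name(files, on_mismatch="fill_null"):
    """Union rows from several files by column name.

    files: list of files, each a list of {column_name: value} rows.
    on_mismatch: "fill_null" fills absent columns with None,
                 "error" raises ValueError on the first mismatch.
    """
    if on_mismatch not in ("fill_null", "error"):
        raise ValueError(f"unknown on_mismatch mode: {on_mismatch}")

    # Build the unioned column list in first-seen order.
    columns = []
    for rows in files:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    merged = []
    for rows in files:
        for row in rows:
            missing = [c for c in columns if c not in row]
            if missing and on_mismatch == "error":
                raise ValueError(f"mismatched columns: {missing}")
            # dict.get returns None for absent columns, modeling a NULL fill.
            merged.append({c: row.get(c) for c in columns})
    return columns, merged


file_a = [{"id": 1, "name": "a"}]
file_b = [{"id": 2, "city": "x"}]
columns, merged = union_files_by_name([file_a, file_b])
```

With the default mode, the unioned schema contains `id`, `name`, and `city`, and each row carries None for the column its file lacks. Passing `on_mismatch="error"` raises instead.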

Others

Optimized the graceful exit process of BE and CN by accurately displaying the status of BE or CN nodes during a graceful exit as SHUTDOWN.
Optimized log printing to avoid excessive disk space being occupied.

Downgrade Notes

Clusters can be downgraded from v3.4.0 only to v3.3.9 and later.

13 Jan 06:20

wangsimo0

3.3.9

dfae8f9

3.3.9 Latest

Release date: January 12, 2025

New Features

Supports the translation of Trino SQL into StarRocks SQL. #54185

Improvements

Corrected FE node names starting with bdbje_reset_election_group to enhance clarity. #54399
Implemented vectorization for the IF function on ARM architectures. #53093
ALTER SYSTEM CREATE IMAGE supports creating an image for StarManager. #54370
Supports deleting cloud-native indexes of Primary Key tables in shared-data clusters. #53971
Enforced the refresh of materialized views when the FORCE keyword is specified. #52081
Supports specifying hints in CACHE SELECT. #54697
Supports loading compressed CSV files using the FILES() function. Supported compression formats include gzip, bz2, lz4, deflate, and zstd. #54626
Supports assigning multiple values to the same column in an UPDATE statement. #54534

Bug Fixes

Fixed the following issues:

Unexpected errors when refreshing materialized views built on JDBC catalogs. #54487
Instability in results when a Delta Lake table joins itself. #54473
Upload retries fail when backing up data to HDFS. #53679
BFD initialization errors on the aarch64 architecture. #54372
Sensitive information recorded in BE logs. #54677
Errors in Compaction-related metrics in profiles. #54678
BE crashes caused by creating tables with nested TIME types. #54601
Query plan errors for LIMIT queries with subquery TOP-N. #54507

Downgrade notes

Clusters can be downgraded from v3.3.9 only to v3.2.11 and later.

09 Jan 02:34

yingtingdong

3.2.14

b86884b

3.2.14

Release date: January 8, 2025

Improvements

Supports collecting statistics of Paimon tables. #52858
Included node information and histogram metrics in JSON metrics. #53735

Bug Fixes

Fixed the following issues:

The score of the Primary Key table index was not updated in the Commit phase. #41737
Incorrect execution plans for max(count(distinct)) when low-cardinality optimization is enabled. #53403
When the List partition column has NULL values, queries against the Min/Max value of the partition column will lead to incorrect partition pruning. #53235
Upload retries fail when backing up data to HDFS. #53679

06 Jan 01:54

wangsimo0

3.3.8

e3816ec

3.3.8

Release date: January 3, 2025

Improvements

Added a cluster idle API to assist in determining cluster status. #53850
Included node information and histogram metrics in JSON metrics. #53735
Optimized the MemTable for Primary Key tables in shared-data clusters. #54178
Optimized memory usage and statistics for Primary Key tables in shared-data clusters. #54358
Introduced a limit on the number of partitions scanned per node for queries requiring full-table or large-scale partition scans, enhancing system stability by reducing scanning pressure on individual BE or CN nodes. #53747
Supports collecting statistics of Paimon tables. #52858
Supports configuration of S3 client request timeout for shared-data clusters. #54211

Bug Fixes

Fixed the following issues:

BE crashes caused by inconsistencies in the DelVec of Primary Key tables. #53460
Issues with lock release of Primary Key tables in shared-data clusters. #53878
Errors of UDFs nested in functions are not returned in query failures. #44297
Transactions are blocked at the Decommission phase because they depend on the original replicas. #49349
Queries against Delta Lake tables use relative paths instead of filenames for file retrieval. #53949
An error is returned when querying Delta Lake Shallow Clone tables. #54044
Case sensitivity issues when reading Paimon using JNI. #54041
An error is returned during INSERT OVERWRITE operations on Hive tables created in Hive. #53792
SHOW TABLE STATUS command does not validate view privileges. #53811
Missing FE metrics. #53058
Memory leaks in INSERT tasks. #53809
Concurrency issues caused by missing write locks in replication tasks. #54061
partition_ttl of tables in the statistics database does not take effect. #54398
Query Cache-related issues:
- Crashes when Query Cache is enabled with Group Execution. #54363
- Runtime Filter crashes. #54305
Issues with materialized view Union Rewrite. #54293
Missing padding in string updates for partial updates in Primary Key tables. #54182
Incorrect execution plans for max(count(distinct)) when low-cardinality optimization is enabled. #53403
Issues with changing the excluded_refresh_tables parameter of materialized views. #53394

Behavior Changes

Changed the default value of persistent_index_type for Primary Key tables in shared-data clusters to CLOUD_NATIVE, that is, enabled Persistent Index by default. #52209

03 Jan 11:19

jaogoy

3.1.17

67ae3b7

3.1.17

Release date: January 3, 2025

Bug Fixes

Fixed the following issues:

Cross-cluster Data Migration Tool caused the Follower FE to crash during data synchronization and commit, due to not accounting for the deletion of partitions in the target cluster. #54061
BE in the target cluster might crash when synchronizing tables with DELETE operations using Cross-cluster Data Migration Tool. #54081
A bug in the BDBJE handshake mechanism where the Leader FE would reject reconnection attempts from a Follower FE while the connection was being re-established, causing Follower FE nodes to exit. #50412
Duplicate memory statistics in FE lead to excessive memory usage. #53055
The statuses of the asynchronous materialized view refresh tasks are inconsistent across multiple FE nodes, which leads to inaccurate states of the materialized view during queries. #54236

13 Dec 06:58

yingtingdong

3.2.13

f0965dc

3.2.13

Release date: December 13, 2024

Improvements

Supports setting a time range within which Base Compaction is forbidden for a specific table. #50120

Bug Fixes

Fixed the following issues:

The loadRowsRate field returned 0 after executing SHOW ROUTINE LOAD. #52151
The Files() function read columns that were not queried. #52210
Prometheus failed to parse materialized view metrics with special characters in their names. (Now materialized view metrics support tags.) #52782
The array_map function caused BE to crash. #52909
Metadata Cache issues caused BE to crash. #52968
Routine Load tasks were canceled due to expired transactions. (Now tasks are canceled only if the database or table no longer exists.) #50334
Stream Load failures when submitted using HTTP 1.0. #53010 #53008
Issues related to Glue and S3 integration: #48433
- Some error messages did not display the root cause.
- Error messages for writing to a Hive partitioned table with the partition column of type STRING when Glue was used as the metadata service.
- Dropping Hive tables failed without proper error messages when the user lacked sufficient permissions.
The storage_cooldown_time property for materialized views did not take effect when set to maximum. #52079

16 Dec 12:07

jaogoy

3.1.16

76526c0

3.1.16

Release date: December 16, 2024

Improvements

Optimized table-related statistics. #50316

Bug Fixes

Fixed the following issues:

Insufficient granularity in error code handling for disk full scenarios caused the BE to mistakenly identify disk errors and delete data. #51411
Stream Load failures when submitted using HTTP 1.0. #53010 #53008
Routine Load tasks were canceled due to expired transactions (now tasks are canceled only if the database or table no longer exists and paused when transactions expired). #50334
Unloading data using EXPORT with Broker to file:// resulted in a file rename error, causing the export to fail. #52544
If the join condition in an equal-join is an expression based on a low-cardinality column, the system may incorrectly push down a Runtime Filter predicate, leading to a BE crash. #50690

12 Dec 16:13

wangsimo0

3.3.7

00177de

3.3.7

Release date: November 29, 2024

New Features

Added a new materialized view parameter, excluded_refresh_tables, to exclude the specified tables from refresh. #50926

Improvements

Rewrote unnest(bitmap_to_array) as unnest_bitmap to improve performance. #52870
Reduced the write and delete operations of Txn logs. #42542

Bug Fixes

Fixed the following issues:

Failure to connect Power BI to external tables. #52977
Misleading FE Thrift RPC failure messages in logs. #52706
Routine Load tasks were canceled due to expired transactions (now tasks are canceled only if the database or table no longer exists). #50334
Stream Load failures when submitted using HTTP 1.0. #53010 #53008
Integer overflow of partition IDs. #52965
Hive Text Reader failed to recognize the last empty element. #52990
Issues caused by array_map in Join conditions. #52911
Metadata cache issues under high concurrency scenarios. #52968
The whole materialized view was refreshed when a partition was dropped from the base table. #52740

20 Nov 06:38

wangsimo0

3.3.6

8f01cfa

3.3.6

Release date: November 18, 2024

Improvements

Optimized internal repair logic for Primary Key tables. #52707
Optimized the internal implementation of histograms of statistics. #52400
Supports adjusting log level via the FE configuration item sys_log_warn_modules to reduce Hudi Catalog logging. #52709
Supports constant folding in the yearweek function. #52714
Avoided push-down for Lambda functions. #52655
Divided the Query Error metric into three: Internal Error Rate, Analysis Error Rate, and Timeout Rate. #52646
Avoided constant expressions being extracted as common expressions within array_map. #52541
Optimized the Text-based Rewrite of materialized views. #52498

Bug Fixes

Fixed the following issues:

The unique_constraints and foreign_constraints parameters were incomplete in SHOW CREATE TABLE for cloud-native tables in shared-data clusters. #52804
Some materialized views were activated even when enable_mv_automatic_active_check was set to false. #52799
Memory usage did not decrease after stale memory was flushed. #52613
Resource leak caused by Hudi file-system views. #52738
Concurrent Publish and Update operations on Primary Key tables may cause issues. #52687
Failures to terminate queries on clients. #52185
Multi-column List partitions cannot be pushed down. #51036
Incorrect result due to the lack of hasnull property in ORC files. #52555
An issue caused by using uppercase column names in ORDER BY during table creation. #52513
An error was returned after running ALTER TABLE PARTITION (*) SET ("storage_cooldown_ttl" = "xxx"). #52482

Behavior Changes

In earlier versions, scale-in operations would fail if there were insufficient replicas for views in the _statistics_ database. Starting from v3.3.6, if nodes are scaled in to 3 or more, view replicas are set to 3; if there is only 1 node after the scale-in, view replicas are set to 1, allowing for successful scale-in. #51799

Affected views include:
- column_statistics
- histogram_statistics
- table_statistic_v1
- external_column_statistics
- external_histogram_statistics
- pipe_file_list
- loads_history
- task_run_history
New Primary Key tables no longer allow __op as a column name, even if allow_system_reserved_names is set to true. Existing tables are unaffected. #52621
Expression-partitioned tables cannot have partition names modified. #52557
Deprecated FE parameters heartbeat_mgr_blocking_queue_size and profile_process_threads_num. #52236
Enabled persistent index on object storage by default for Primary Key tables in shared-data clusters. #52209
Disallowed manual changes to bucketing methods for tables with the random bucketing method. #52120
Backup and Restore-related parameter changes: #52111
- make_snapshot_worker_count supports dynamic configuration.
- release_snapshot_worker_count supports dynamic configuration.
- upload_worker_count supports dynamic configuration. Its default value is changed from 1 to the number of CPU cores on the machine where the BE resides.
- download_worker_count supports dynamic configuration. Its default value is changed from 1 to the number of CPU cores on the machine where the BE resides.
The return type of SELECT @@autocommit has changed from BOOLEAN to BIGINT. #51946
Added a new FE configuration item, max_bucket_number_per_partition, to control the maximum number of buckets per partition. #47852
Enabled memory usage checks by default for Primary Key tables. #52393

14 Nov 07:15

yingtingdong

3.2.12

5f81e3e

3.2.12

Release date: October 23, 2024

Improvements

Optimized memory allocation and statistics in BE for certain complex query scenarios to avoid OOM. #51382
Optimized memory usage in FE in Schema Change scenarios. #50855
Optimized the job status display when querying the system-defined view information_schema.routine_load_jobs from Follower FE nodes. #51763
Supports Backup and Restore of List-partitioned tables. #51993

Bug Fixes

Fixed the following issues:

The error message was lost after writing to Hive failed. #33167
The array_map function causes a crash when excessive constant parameters are used. #51244
Special characters in the PARTITION BY columns of expression partitioned tables cause FE CheckPoint failures. #51677
Accessing the system-defined view information_schema.fe_locks causes a crash. #51742
Querying generated columns causes an error. #51755
Optimize Table fails when the table name contains special characters. #51755
Tablets could not be balanced in certain scenarios. #51828

Behavior Changes

Supports dynamic modification of Backup and Restore-related parameters. #52111
