Releases: StarRocks/starrocks
Release notes 2.1.0
New Features
- [Preview] StarRocks now supports Iceberg external tables.
- [Preview] The pipeline engine is now available. It is a new execution engine designed for multicore scheduling. The query parallelism can be adaptively adjusted without the need to set the parallel_fragment_exec_instance_num parameter. This also improves performance in high concurrency scenarios.
- The CTAS (Create Table As Select) function is supported, making ETL and table creation easier.
- SQL fingerprint is supported. Fingerprints are written to audit.log, which makes it easier to locate slow queries.
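The release note does not describe how StarRocks derives a fingerprint. As a rough illustration of the general idea only, the sketch below replaces literals with placeholders and hashes the normalized text, so statements that differ only in their constants map to the same value. The function names, the regular expressions, and the choice of MD5 are assumptions made for this example, not StarRocks internals.

```python
import hashlib
import re


def normalize_sql(sql: str) -> str:
    """Replace literals with '?' and collapse whitespace (illustrative rules only)."""
    text = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)      # single-quoted string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)      # standalone numeric literals
    text = re.sub(r"\s+", " ", text).strip().lower()    # whitespace and case
    return text


def sql_fingerprint(sql: str) -> str:
    """Hash the normalized statement; MD5 is an assumption, used here only as an ID."""
    return hashlib.md5(normalize_sql(sql).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "SELECT * FROM t1 WHERE id = 42 AND name = 'bob'"
    b = "select * from t1  where id = 7 and name = 'alice'"
    print(normalize_sql(a))
    print(sql_fingerprint(a) == sql_fingerprint(b))
```

Grouping audit log entries by such a value lets you aggregate latency per statement shape rather than per literal statement.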
Improvements
- Compaction is optimized. A flat table can contain up to 10,000 columns.
- The performance of first-time scan and page cache is optimized. Random I/O is reduced to improve first-time scan performance. The improvement is more noticeable if first-time scan occurs on SATA disks. StarRocks' page cache can store original data, which eliminates the need for bitshuffle encoding and unnecessary decoding. This improves the cache hit rate and query efficiency.
- Schema change is supported in the primary key model. You can add, delete, and modify bitmap indexes by using ALTER TABLE.
- [Preview] The size of a string can be up to 1 MB.
- JSON load performance is optimized. You can load more than 100 MB JSON data in a single file.
- Bitmap index performance is optimized.
- The performance of StarRocks Hive external tables is optimized. Data in the CSV format can be read.
- DEFAULT CURRENT_TIMESTAMP is supported in the create table statement. #1193
- StarRocks supports the loading of CSV files with multiple delimiters.
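To make the last item concrete, here is a minimal sketch that assumes "multiple delimiters" refers to column separators longer than one character. It only shows how such a file is split into fields; it is not StarRocks code, and the `||` separator and the function name are hypothetical.

```python
from typing import List


def split_rows(text: str, column_sep: str = "||", row_sep: str = "\n") -> List[List[str]]:
    """Split delimited text into rows of fields using a multi-character column separator."""
    rows = []
    for line in text.split(row_sep):
        if line:  # skip the empty string left after a trailing row separator
            rows.append(line.split(column_sep))
    return rows


if __name__ == "__main__":
    data = "1||alice,smith||10\n2||bob||20\n"
    for row in split_rows(data):
        print(row)
```

A multi-character separator is useful when single characters such as commas or tabs can appear inside field values, as in the first row above.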
BugFix
The following bugs are fixed:
- Auto __op mapping does not take effect if jsonpaths is specified in the command used for loading JSON data. #3405
- BE nodes fail because the source data changes during data loading using Broker Load. #3481
- Some SQL statements report errors after materialized views are created. #2975
- The routine load does not work due to quoted jsonpaths. #2488
- Query concurrency decreases sharply when the number of columns to query exceeds 200.
Behavior Changes
- The API for disabling a Colocation Group is changed from DELETE /api/colocate/group_stable to POST /api/colocate/group_unstable.
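The change can be sketched as the following request builder. The two paths and HTTP methods come from the note above; the FE address and the `db_id` and `group_id` query parameters are assumptions for illustration, so check them against your deployment before use. The request objects are only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def colocate_unstable_request(fe_http: str, db_id: int, group_id: int,
                              legacy: bool = False) -> Request:
    """Build the request that disables a Colocation Group.

    legacy=True builds the call used before this release,
    legacy=False builds the call used from this release on.
    """
    query = urlencode({"db_id": db_id, "group_id": group_id})  # parameter names are assumed
    if legacy:
        return Request(f"{fe_http}/api/colocate/group_stable?{query}", method="DELETE")
    return Request(f"{fe_http}/api/colocate/group_unstable?{query}", method="POST")


if __name__ == "__main__":
    # Hypothetical FE address and IDs.
    req = colocate_unstable_request("http://fe.example.com:8030", 10005, 10008)
    print(req.get_method(), req.full_url)
```

Scripts that still issue the DELETE call need to be updated when upgrading.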
Others
- flink-connector-starrocks is now available for Flink to read StarRocks data in batches. This improves data read efficiency compared to the JDBC connector.
Release notes 2.0.2
Improvement
- Memory usage is optimized. Users can specify the label_keep_max_num parameter to control the maximum number of loading jobs to retain within a period of time. This prevents full GC caused by high memory usage of FE during frequent data loading. #2410
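The effect of capping retained job labels can be illustrated with a small bounded history. This is a conceptual sketch, not the FE implementation: the class name and the eviction policy (drop the oldest entry first) are assumptions, and the time-window aspect mentioned above is left out.

```python
from collections import OrderedDict
from typing import List


class LabelHistory:
    """Keep at most keep_max_num finished load-job labels, evicting the oldest first."""

    def __init__(self, keep_max_num: int) -> None:
        if keep_max_num < 1:
            raise ValueError("keep_max_num must be positive")
        self.keep_max_num = keep_max_num
        self._labels: "OrderedDict[str, str]" = OrderedDict()

    def record(self, label: str, state: str) -> None:
        """Store or refresh a label, then trim the history to the configured size."""
        self._labels[label] = state
        self._labels.move_to_end(label)
        while len(self._labels) > self.keep_max_num:
            self._labels.popitem(last=False)  # remove the oldest entry

    def labels(self) -> List[str]:
        return list(self._labels)


if __name__ == "__main__":
    history = LabelHistory(keep_max_num=3)
    for i in range(1, 6):
        history.record(f"job_{i}", "FINISHED")
    print(history.labels())
```

Because the history never grows beyond the cap, memory held by finished jobs stays bounded no matter how frequently data is loaded.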
BugFix
The following bugs are fixed:
- BE nodes fail when the column decoder encounters an exception. #3510
- Auto __op mapping does not take effect when jsonpaths is specified in the command used for loading JSON data. #3405
- BE nodes fail because the source data changes during data loading using Broker Load. #3481
- Some SQL statements report errors after materialized views are created. #3053
- Queries may fail if an SQL clause contains both a predicate that supports the global dictionary for low-cardinality optimization and a predicate that does not. #3421
Release notes 2.0.1
Release notes 2.0.0-GA
Release date: Jan 4, 2022
New Feature
- External Table
  - [Experimental Function] Support for Hive external tables on S3
  - DecimalV3 support for external tables #425
- Complex expressions can be pushed down to the storage layer for computation, which improves performance
- Primary Key is officially released. It supports Stream Load, Broker Load, and Routine Load, and also provides a second-level synchronization tool for MySQL data based on Flink-cdc
Improvement
- Arithmetic operators optimization
  - Optimize the performance of dictionary with low cardinality #791
  - Optimize the scan performance of int for single table #273
  - Optimize the performance of count(distinct int) with high cardinality #139 #250 #544 #570
  - Optimize the implementation of Group by 2 int / limit / case when / not equal
- Memory management optimization
  - Refactor the memory statistics and control framework to count memory usage accurately and eliminate OOM
  - Optimize metadata memory usage
  - Fix the problem that releasing a large amount of memory blocks execution threads for a long time
- Add a graceful process exit mechanism and support memory leak checks #1093
Bugfix
- Fix the problem that Hive external tables time out when fetching a large amount of metadata.
- Fix the unclear error message for materialized view creation.
- Fix the implementation of LIKE in the vectorization engine #722
- Fix the error in parsing the IS predicate in ALTER TABLE #725
- Fix the problem that the curdate function cannot format the date.
Release notes 1.19.5
Release notes 1.19.4
Release notes 1.19.3
Release notes 1.19.2
Improvement
Major Bugfix
Release notes 1.19.1
Release notes 1.19.0
New Feature
- Implement Global Runtime Filter, which can enable runtime filter for shuffle join.
- CBO Planner is enabled by default, with improvements to colocated join, bucket shuffle, statistical information estimation, etc.
- [Experimental Function] Primary Key model release: To better support real-time/frequent update features, StarRocks has added a new table type: primary key model. The model supports Stream Load, Broker Load, Routine Load, JSON import, and also provides a second-level synchronization tool for MySQL data based on Flink-cdc.
- [Experimental Function] Support writing through external tables. Data can be written to a table in another StarRocks cluster through an external table, which meets read/write separation requirements and provides better resource isolation.
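The idea behind a runtime filter can be sketched in a few lines: the join keys seen on the build side are collected into a filter, and probe-side rows that cannot match are dropped before they are shuffled and joined. This is a conceptual model only. It uses an exact Python set where a real engine would typically use a compact structure such as a Bloom filter, and all function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Row = Dict[str, object]


def build_runtime_filter(build_rows: List[Row], key: str) -> Set[Hashable]:
    """Collect the join keys present on the build side."""
    return {row[key] for row in build_rows}


def apply_runtime_filter(probe_rows: List[Row], key: str, rf: Set[Hashable]) -> List[Row]:
    """Drop probe rows whose key cannot match any build row."""
    return [row for row in probe_rows if row[key] in rf]


def hash_join(build_rows: List[Row], probe_rows: List[Row], key: str) -> List[Row]:
    """Plain inner hash join, used here to show the filter does not change the result."""
    table: Dict[Hashable, List[Row]] = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    joined = []
    for probe in probe_rows:
        for build in table.get(probe[key], []):
            joined.append({**build, **probe})
    return joined


if __name__ == "__main__":
    dims = [{"id": 1, "region": "eu"}, {"id": 3, "region": "us"}]
    facts = [{"id": i % 5, "amount": i} for i in range(10)]
    rf = build_runtime_filter(dims, "id")
    kept = apply_runtime_filter(facts, "id", rf)
    print(len(facts), len(kept))
    print(hash_join(dims, kept, "id") == hash_join(dims, facts, "id"))
```

In a shuffle join the saving comes from the rows that never cross the network. The filter is "global" in the sense that it is built on one side of the join and delivered to scan nodes elsewhere in the cluster.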
Improvement
- Performance optimization.
  - count distinct int statement
  - group by int statement
  - or statement
- Optimize disk balance algorithm. Data can be automatically balanced after adding disks to a single machine.
- Support partial column export.
- Optimize show processlist to show specific SQL.
- Support multiple variable settings in SET_VAR.
- Improve the error reporting information, including table_sink, routine load, creation of materialized view, etc.
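As an illustration of the SET_VAR item, the helper below renders a hint that sets several variables for one statement. The helper functions are hypothetical, and the comma-separated form is an assumption about how multiple settings are written, so verify the syntax against the SQL reference for your version. `query_timeout` and `exec_mem_limit` are used as example variable names.

```python
from typing import Mapping, Union

Value = Union[int, str]


def set_var_hint(variables: Mapping[str, Value]) -> str:
    """Render a SET_VAR hint with one or more variable assignments."""
    if not variables:
        raise ValueError("at least one variable is required")
    assignments = ", ".join(f"{name} = {value}" for name, value in variables.items())
    return f"/*+ SET_VAR({assignments}) */"


def with_set_var(select_body: str, variables: Mapping[str, Value]) -> str:
    """Place the hint directly after the SELECT keyword."""
    return f"SELECT {set_var_hint(variables)} {select_body}"


if __name__ == "__main__":
    sql = with_set_var(
        "name FROM people ORDER BY name",
        {"query_timeout": 20, "exec_mem_limit": 8589934592},
    )
    print(sql)
```

Settings given this way are meant to apply to the single statement that carries the hint, unlike a session-level SET.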
Bugfix
- Fix the issue that the dynamic partition table cannot be created automatically after the data recovery operation is completed. #337
- Fix the error reported by the row_number function after CBO is enabled.
- Fix the problem that FE gets stuck due to statistical information collection.
- Fix the problem that set_var takes effect for the session but not for statements.
- Fix the problem that select count(*) returns abnormal results on Hive partitioned external tables.