doris

Author	SHA1	Message	Date
xzj7019	1715a824dd	[fix](nereids) fix partition dest overwrite bug when cte as bc right (#22177 ) In the current CTE multicast fragment parameter computation in the coordinator, if the shared hash table for bc (broadcast) is enabled, the number of destinations equals the number of BE hosts. However, the check that decides whether to enter the shared-hash-table bc code path is wrong: when a multicast's targets include both bc and partition, the first bc info overwrites the following partition's info, i.e. the destination info ends up at host level when it should be per instance. This causes the hash partition part to hang.	2023-07-25 19:26:29 +08:00
LiBinfeng	28bbfdd590	[Fix](Nereids) fix minidump unit test caused of columnstatus changed (#22201 ) Problem: The minidump unit test failed because column statistic deserialization needs a new column schema that had not been added to the minidump unit test file. Solution: Add the last update time to the unit test input file.	2023-07-25 19:23:12 +08:00
AKIRA	30965eed21	[fix](stats) Ignore complex type by default when collect column statistics (#21965 ) Before this PR, an error was thrown by default if an ANALYZE statement submitted by the user covered any complex-type column.	2023-07-25 18:26:49 +08:00
lihangyu	3b6702a1e3	[Bug](point query) cancel future when meet timeout in PointQueryExec (#21573 ) 1. Cancel the future on timeout and add a config to modify the RPC timeout. 2. Add a config to modify the number of BackendServiceProxy instances, since the gRPC channel can be blocked under highly concurrent workloads.	2023-07-25 18:18:09 +08:00
Kang	a7446fa59e	[fix](inverted index) make error message more friendly when query token is empty (#22118 )	2023-07-25 18:00:35 +08:00
morrySnow	f74f3e7944	[refactor](Nereids) add sink interface and abstract class (#22150 ) 1. add trait Sink 2. add abstract class LogicalSink and PhysicalSink 3. replace some sink visitor by visitLogicalSink and visitPhysicalSink	2023-07-25 17:51:49 +08:00
Gabriel	23e7423748	[pipeline](refactor) refactor pipeline task schedule logics (#22028 )	2023-07-25 17:18:26 +08:00
Xiangyu Wang	39ca91fc22	[opt](Nereids) always fallback when parse failed (#21865 ) Always fall back to the legacy planner when parsing fails, even if enable_fallback_to_original_planner is set to false.	2023-07-25 17:08:57 +08:00
wudi	527547b4ed	[catalog](faq) add jdbc catalog faq (#22129 )	2023-07-25 15:59:16 +08:00
wudi	1e8ae7ad16	[doc](flink-connector)improve flink connector doc (#22143 )	2023-07-25 15:58:35 +08:00
huanghaibin	226b75e074	[Fix](compaction) return internal error to avoid be core when finalize_columns_data (#21882 ) Return an error instead of using CHECK_EQ to avoid a BE core dump in finalize_columns_data.	2023-07-25 15:39:58 +08:00
LiBinfeng	f84af95ac4	[feature](Nereids) Add minidump replay and refactor user feature of minidump (#20716 ) ### Two main changes: - 1. Add minidump replay. - 2. Change the minidump serialization of statistic messages and some interfaces between the main logic of the Nereids optimizer and minidump. ### Use of nereids ut: - 1. Save minidump files: execute these commands in mysql-client: ``` set enable_nereids_planner=true; set enable_minidump=true; ``` then execute the SQL in mysql-client. - 2. Use the nereids-ut script to execute a directory: ``` cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe ./nereids_ut --d ${directory_of_minidump_files} ``` ### Refactor of minidump - Move the serialization of used statistics into the serialization of input, serialized together with catalogs. - Generate the minidump file only when the enable_minidump flag is set; the minidump module interacts with the main optimizer only through serializeInputsToDumpFile(catalog, statistics, query) and serializeOutputsToDumpFile(outputplan).	2023-07-25 15:26:19 +08:00
airborne12	fc2b9db0ad	[Feature](inverted index) add tokenize function for inverted index (#21813 ) This PR introduces the TOKENIZE function for inverted index. It is used as follows: ``` SELECT TOKENIZE('I love my country', 'english'); ``` It takes two arguments: the first is the text to be tokenized, and the second is the parser type, which can be english, chinese or unicode. It can also be used with an existing table, like this: ``` mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test; +---------------------------------------+ \| tokenize(`c`, 'chinese') \| +---------------------------------------+ \| ["来到", "北京", "清华大学"] \| \| ["我爱你", "中国"] \| \| ["人民", "得到", "更", "实惠"] \| +---------------------------------------+ ```	2023-07-25 15:05:35 +08:00
mch_ucchi	d96e31c4d7	[opt](Nereids) not push down global limit to avoid early gather (#21891 ) The global limit creates a gather action, so all the data is computed in one instance. If we push down the global limit, the nodes that run after the limit node run slowly. We fix it by pushing down only the local limit. A join plan tree before fixing: ``` LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(global) LogicalLimit(local) LogicalJoin LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(global) LogicalLimit(local) Plan() after fixing: LogicalLimit(global) LogicalLimit(local) Plan() LogicalLimit(local) LogicalJoin LogicalLimit(local) Plan() LogicalLimit(local) Plan() ```	2023-07-25 14:45:20 +08:00
bobhan1	2b4bfe5be7	[fix](autoinc) fix `_fill_auto_inc_cols` when the input column is `ColumnConst` (#22175 )	2023-07-25 14:41:36 +08:00
Mryange	28b714c371	[feature](executor) using fe version to set instance_num (#22047 )	2023-07-25 14:37:42 +08:00
YueW	c01230f99a	[fix](match) Optimize the logic for match_phrase function filter (#21622 )	2023-07-25 14:22:37 +08:00
abmdocrt	c251a574e8	[Fix](MoW) Fix dup key when do schema change add new key (#22154 )	2023-07-25 14:18:01 +08:00
Gabriel	103c473b96	[Bug](pipeline) fix pipeline shared scan + topn optimization (#21940 )	2023-07-25 12:48:27 +08:00
Mryange	0f439bb1ca	[vectorized](udf) java udf support map type (#22059 )	2023-07-25 11:56:20 +08:00
TengJianPing	7891c99e9f	[fix](pipeline) fix wrong state of runtime filter of pipeline (#22179 )	2023-07-25 11:29:09 +08:00
AKIRA	f6b47c34b3	[improvement](stats) show stats with updated time (#21377 ) Support to view the stats updated time. After ```sql mysql> show column stats t1; +-------------+-------+------+----------+-----------+---------------+------+------+---------------------+ \| column_name \| count \| ndv \| num_null \| data_size \| avg_size_byte \| min \| max \| updated_time \| +-------------+-------+------+----------+-----------+---------------+------+------+---------------------+ \| col2 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| 2 \| 5 \| 2023-06-30 15:50:24 \| \| col3 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| 3 \| 6 \| 2023-06-30 15:50:48 \| \| col1 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| '1' \| '4' \| 2023-06-30 15:50:48 \| +-------------+-------+------+----------+-----------+---------------+------+------+---------------------+ ``` Before ```sql mysql> show column stats t1; +-------------+-------+------+----------+-----------+---------------+------+------+ \| column_name \| count \| ndv \| num_null \| data_size \| avg_size_byte \| min \| max \| +-------------+-------+------+----------+-----------+---------------+------+------+ \| col2 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| 2 \| 5 \| \| col3 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| 3 \| 6 \| \| col1 \| 2.0 \| 2.0 \| 0.0 \| 0.0 \| 0.0 \| '1' \| '4' \| +-------------+-------+------+----------+-----------+---------------+------+------+ ```	2023-07-25 11:22:08 +08:00
Jerry Hu	b41fcbb783	[feature](agg) add the aggregation function 'mag_agg' (#22043 ) New aggregation function: map_agg. This function requires two arguments: a key and a value, which are used to build a map. select map_agg(column1, column2) from t group by column3;	2023-07-25 11:21:03 +08:00
morrySnow	6a03a612a0	[opt](Nereids) add check msg for creating decimal type (#22172 )	2023-07-25 11:19:41 +08:00
caiconghui	2e20ff8cab	[feature](metric) Support collect query counter and error query counter metric in user level (#22125 ) 1. Support collecting query counter and error query counter metrics at user level. 2. Restore sum and count for histogram metrics, which were mistakenly deleted in PR #22045.	2023-07-25 11:16:38 +08:00
Euporia	ba2eb4d788	[typo](docs) add jdbc catalog error handling methods (#22160 )	2023-07-25 10:45:29 +08:00
LiBinfeng	3c58e9bac9	[Fix](Nereids) Fix problem of infer predicates not completely (#22145 ) Problem: When inferring predicates in Nereids, newly inferred predicates cannot be used as the source for the next round. For example: create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123; We expect to get tt3.c1 = 123, but we only get tt2.c1 = 123, because from tt1.c1 = 123 and tt2.c1 = tt3.c1 alone we cannot derive any relationship between the two predicates. Solution: Cache the intermediate results of source predicates, such as tt2.c1 = 123 in the example.	2023-07-25 10:05:00 +08:00
Gabriel	a0463ea047	[round](decimalv2) round decimalv2 to precision value (#22138 )	2023-07-25 03:29:48 +08:00
Qi Chen	752cec9e19	[Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052 ) ### Issue Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. However, a dictionary-filtered single string column may also appear in other multi-column filter conditions, which can cause problems. For example: `select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01' order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;` `l_receiptdate` is a string filter column, and it also appears in the multi-column filter condition `l_commitdate < l_receiptdate`. ### Solution Resolve it by separating out the multi-column filter conditions and executing them after the dictionary filter column is converted to string.	2023-07-24 22:31:18 +08:00
zhangdong	fc67929e34	[improvement](catalog) optimize ldap and support more character in user and table name (#21968 ) - common name supports `-`; reason: MySQL's db name supports `-` - table name supports `-` - username supports `.`; reason: LDAP's username supports `.` - ldap doc - ldap supports rbac	2023-07-24 22:04:37 +08:00
zhangdong	7fcf702081	[improvement](multi catalog)paimon support filesystem metastore (#21910 ) 1. Support filesystem metastore. 2. Support predicate and project when split. 3. Fix partition table query error. TODO: for now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using the s3 filesystem. Doc PR: #21966	2023-07-24 22:02:57 +08:00
Dongyang Li	8180cde83b	[tools](tpcds) Update README.md, use default gcc (#21159 ) Compiling with gcc-11 fails, while compiling with gcc 9.40 or below works; the default gcc usually meets the requirements.	2023-07-24 21:47:51 +08:00
Dongyang Li	9fe470b273	[pipeline](check) update check-pr-if-need-run-build.sh (#22171 ) There is no need to run the pipeline if the only modified file is regression-test/pipeline/p0/conf/regression-conf.groovy or regression-test/pipeline/p1/conf/regression-conf.groovy.	2023-07-24 21:04:23 +08:00
morrySnow	82bdcb3da8	[fix](Nereids) translate partition topn order key on wrong tuple (#22168 ) The partition key should be on the child tuple, and the sort key should be on partition top's tuple.	2023-07-24 20:46:27 +08:00
AKIRA	2d52d8d926	[opt](stats) Update stats table config and comment (#22070 ) 1. Set the replica count for the stats table to "Math.max(Config.statistic_internal_table_replica_num,Config.min_replication_num_per_tablet)". 2. Update the comment for the stats table to remove the symbol `'`.	2023-07-24 20:43:55 +08:00
morrySnow	0677b261b5	[fix](Nereids) should not process prepare command by Nereids (#22167 )	2023-07-24 20:11:40 +08:00
Siyang Tang	0205f540ac	[enhancement](config) Enlarge broker scanner bytes conf to 500G, 5G is still not enough (#22126 )	2023-07-24 19:49:39 +08:00
morrySnow	cf30ea914a	[fix](Nereids) forbid gather sort with explict shuffle (#22153 ) Gather sort with explicit shuffle is usually bad, so forbid it.	2023-07-24 19:45:18 +08:00
Ashin Gau	30c21789c8	[opt](filecache) use weak_ptr to cache the file handle of file segment (#21975 ) Use weak_ptr to cache the file handle of file segment. The maximum number of cached file handles can be configured by `file_cache_max_file_reader_cache_size`, default `1000000`. Users can inspect the number of cached file handles by requesting the BE metrics at `http://be_host:be_webserver_port/metrics`: ``` # TYPE doris_be_file_cache_segment_reader_cache_size gauge doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500 ```	2023-07-24 19:09:27 +08:00
Calvin Kirs	3ba3690f93	[Fix](Http-API)Check and replace user sensitive characters (#22148 )	2023-07-24 18:21:42 +08:00
谢健	68bd4a1a96	[opt](Nereids) check multiple distinct functions that cannot be transformed into muti_distinct (#21626 ) This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions. When the number of distinct values processed by these functions is greater than 1, they are converted into multi_distinct functions for more efficient handling. Example: ``` SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3 -- Transformed to SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3 ``` The following functions can be transformed: - COUNT - SUM - AVG - GROUP_CONCAT If any unsupported functions are encountered, an error is now reported during the optimization phase. To ensure the absence of such cases, a final check has been implemented after the rewriting phase.	2023-07-24 16:34:17 +08:00
morrySnow	21deb57a4d	[fix](Nereids) remove double sigature of ceil, floor and round (#22134 ) We used to convert input parameters to double for the functions ceil, floor and round, because DecimalV2 could not perform these operations. Now that DecimalV3 has been introduced, we should convert all parameters to DecimalV3 to get the correct result. For example, with double parameters we get a wrong result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ \| round((341 / 20000), 4) \| (341 / 20000) \| round(0.01705, 4) \| +-------------------------+---------------+-------------------+ \| 0.017 \| 0.01705 \| 0.0171 \| +-------------------------+---------------+-------------------+ ``` DecimalV3 gets the correct result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ \| round((341 / 20000), 4) \| (341 / 20000) \| round(0.01705, 4) \| +-------------------------+---------------+-------------------+ \| 0.0171 \| 0.01705 \| 0.0171 \| +-------------------------+---------------+-------------------+ ```	2023-07-24 16:08:00 +08:00
YueW	d2531db1cf	[fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066 )	2023-07-24 15:39:08 +08:00
morrySnow	ac9480123c	[refactor](Nereids) push down all non-slot order key in sort and prune them upper sort (#22034 ) According to the implementation in the execution engine, all order keys in SortNode are output, so LogicalSort must be normalized accordingly. We push down all non-slot order keys in the sort so that they are materialized before the sort. As a result, every order key is a slot and SortNode itself does not need to do any projection. This simplifies the translation of SortNode by avoiding the generation of resolvedTupleExprs and sortTupleDesc.	2023-07-24 15:36:33 +08:00
HHoflittlefish777	e146969376	[Fix](config) delete unuse lazy open config #22136	2023-07-24 15:02:34 +08:00
DeadlineFen	667e4ea99b	[Fix](binlog) Fix bugs in tombstone (#22031 )	2023-07-24 14:33:16 +08:00
xzj7019	b5f27b5349	[enhance](nereids) enable wf partition topn by default (#21860 )	2023-07-24 14:21:45 +08:00
TengJianPing	99bf901607	[fix](in) throw exception for unsupported data type of in expr (#22050 )	2023-07-24 14:13:31 +08:00
Calvin Kirs	1a6709d3ac	[Fix](Sonar)Fix Java heap space error (#22135 )	2023-07-24 12:46:19 +08:00
jakevin	66fa1bef6d	[refactor](Nereids): avoid useless groupByColStats Map (#22000 )	2023-07-24 12:13:52 +08:00
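
The predicate-inference fix in 3c58e9bac9 (#22145) comes down to letting each newly inferred predicate serve as a source for the next round. The following is a minimal Python sketch of that idea, not the Nereids implementation: the function name `infer_constants` and its input shapes are hypothetical, and it treats every equality as symmetric, ignoring the direction restrictions that outer joins impose on real predicate inference.

```python
from collections import defaultdict


def infer_constants(equalities, constants):
    """Propagate `column = constant` facts across `a = b` equalities.

    Hypothetical helper for illustration only. Each newly inferred fact is
    pushed back onto the worklist, so it can act as a source in later rounds.
    """
    # Build an undirected graph of columns linked by equality predicates.
    neighbors = defaultdict(set)
    for left, right in equalities:
        neighbors[left].add(right)
        neighbors[right].add(left)

    inferred = dict(constants)
    worklist = list(constants)
    while worklist:
        column = worklist.pop()
        for other in neighbors[column]:
            if other not in inferred:
                inferred[other] = inferred[column]
                # Keep the intermediate result as a source for the next round.
                worklist.append(other)
    return inferred


# Predicates from the example query in the commit message.
result = infer_constants(
    equalities=[("tt1.c1", "tt2.c1"), ("tt2.c1", "tt3.c1")],
    constants={"tt1.c1": 123},
)
print(result)
```

Without the `worklist.append(other)` step, only `tt2.c1` would receive the constant, which mirrors the incomplete inference the commit describes.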


12068 Commits