Commit Graph

2904 Commits

Author SHA1 Message Date
b861b66bef [improve](Nereids): verify the join reorder search space; (#13498)
* [improve](Nereids): verify the join reorder search space;
2022-10-21 11:48:04 +08:00
9a3c1f0867 [Improvement](decimal) print decimal according to the real precision and scale (#13437) 2022-10-21 10:00:01 +08:00
27d84eafc5 [feature](alter) support rename column for table with unique column id (#13410) 2022-10-21 08:45:34 +08:00
95f437c506 [fix] Fix potential unhandled exception cause data inconsistency (#11029)
Co-authored-by: TsukiokaKogane <cby141994@gamil.com>
2022-10-20 23:23:36 +08:00
483a46d17c [feature](Nereids) generate ExprId from 0 for each statement (#13382)
Currently, ExprId in Nereids is generated by a global generator and shared by all statements. There are three problems:
1. ExprId could go out of bounds
2. hard to debug
3. a bitset cannot be used to represent an ExprId set

This PR solves the problem with a new Id generator for each statement. After this PR, ExprId always starts from 0 for each statement.

TODO:
1. refactor all places that create a new StatementContext in test code to ensure the logic is the same as in the main code.
2022-10-20 22:29:22 +08:00
4ae777bfc5 [fix](Nereids) NPE caused by GroupExpression has null owner group when choosing best plan (#13252) 2022-10-20 22:23:36 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix a bug where, when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv (vec-engine), handling null and "null"
5. Change the csv array loading behavior: if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because its logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
2022-10-20 15:52:31 +08:00
60d5e4dfce [improvement](spark-load) support parquet and orc file (#13438)
Add support for parquet/orc in SparkDpp.java
Fixed sparkDpp checkstyle issue
2022-10-20 08:59:22 +08:00
4fa3b14bf0 [Fix](multi-catalog)Fix NPE caused by GsonUtils created objects. #13489 2022-10-20 08:52:58 +08:00
697fa5f586 [Enhancement](profile) support configure the number of query profile (#13421)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-10-20 08:51:36 +08:00
e65a4a9f9f [Improvement](multi-catalog)Support refresh external catalog. (#13363)
Support manually refreshing external catalog metadata.
1. refresh catalog external_catalog_name
2. refresh database catalog.db OR refresh database db (current catalog)
3. refresh table catalog.db.table OR refresh table db.table (current catalog) OR refresh table table_name (current db)

And the refresh operations above keep the database and table ids unchanged.
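For illustration, a minimal sketch of the three refresh levels (the catalog, database, and table names here are hypothetical):
```SQL
-- refresh the whole external catalog
REFRESH CATALOG hive_catalog;
-- refresh one database (the current catalog is used if not qualified)
REFRESH DATABASE hive_catalog.sales_db;
-- refresh a single table
REFRESH TABLE hive_catalog.sales_db.orders;
```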
2022-10-19 16:02:14 +08:00
eeb2b0acdb [doc][fix](multi-catalog) Add multi-catalog es doc (#13429)
1. Add multicatalog es doc
2. Modify es unsigned_long mapping to largeint.
3. Add pre-check logic to getHost.
2022-10-19 16:00:13 +08:00
0b368fbbfa [Bugfix](vec) Fix all create mv using to_bitmap() on negative value columns when enable_vectorized_alter_table is true (#13448)
* [Bugfix] add negative value check when creating mv using vec
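For context, a hedged sketch of the kind of materialized view this check now guards (table and column names are hypothetical); `to_bitmap()` expects non-negative input, so negative values in `user_id` are what the new check validates:
```SQL
CREATE MATERIALIZED VIEW mv_uv AS
SELECT dt, bitmap_union(to_bitmap(user_id))
FROM visits
GROUP BY dt;
```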
2022-10-19 15:40:04 +08:00
5423de68dd [refactor](new-scan) remove old file scan node (#13433)
All these files are not used anymore and can be removed.
2022-10-19 14:25:32 +08:00
ac037e57f5 [fix](sort)the sort expr's nullability property may not be right (#13328) 2022-10-18 22:09:02 +08:00
d8e53da764 [feature-wip](statistics) collect statistics by sampling sql-tasks (#13399)
1. Collect statistics by sampling sql-tasks.
2. Consolidate statistics SQL statements and remove redundant statements.
2022-10-18 16:34:01 +08:00
18f2db6064 [feature](nereids) let minValue and maxValue in stats support for Date, CHAR and VARCHAR type (#13311)
1. enable varchar/char types to set min/max value:
    take the first 8 chars as a long, and convert to double.
2. fix a bug when setting min/max value for date and datev2
2022-10-18 12:12:33 +08:00
dbf71ed3be [feature-wip](new-scan) Support stream load with csv in new scan framework (#13354)
1. Refactor the file reader creation in FileFactory, for simplicity.
    Previously, FileFactory had too many `create_file_reader` interfaces.
    Now unified into two categories: the interface used by the previous BrokerScanNode,
    and the interface used by the new FileScanNode.
    And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.

2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode

3. Now, for the generic reader, the file reader will be created inside the reader, not passed in from the outside.

4. Add some test cases for csv stream load; the behavior is the same as the old broker scanner.
2022-10-17 23:33:41 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
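A minimal usage sketch (table and column names are hypothetical; `page_bitmap` is assumed to be a BITMAP column):
```SQL
-- XOR-combine the bitmaps within each group and count the surviving elements
SELECT dt, bitmap_count(group_bitmap_xor(page_bitmap)) AS xor_cnt
FROM page_visits
GROUP BY dt;
```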
2022-10-17 18:40:06 +08:00
3b5b7ae12b [improvement](config) let default value of alter and load timeout suitable for most cases (#13370)
It is frustrating that a long-running job fails due to a small timeout. Actually, users
do not expect a timeout for a long-running job.
2022-10-17 14:55:05 +08:00
045bccdbea [Feature](Retention) support retention function (#13056) 2022-10-17 11:00:47 +08:00
6ea9a65bb6 [Opt](vec) opt runtime filter for TPCH Q22 (#13339) 2022-10-17 10:30:07 +08:00
e84d9a6c87 [fix](array-type) Fix cast null to array make be core (#13324)
Doris does not support explicitly casting NULL_TYPE to any type.

```
mysql> select cast(NULL as int);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to INT
```

So we should also forbid users from casting NULL_TYPE to the ARRAY type.

This commit will produce the following effect:

```
mysql> select cast(NULL as array<int>);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to ARRAY<INT(11)>
```
2022-10-17 00:04:50 +08:00
162e60eb19 [fix](array-type) check value valid while insert data into array column (#13365)
We should prevent the insert when the value overflows.

1. create table:
`CREATE TABLE test_array_load_test_array_int_insert_db.test_array_load_test_array_int_insert_tb ( k1 int NULL, k2 array<int> NULL ) DUPLICATE KEY(k1) DISTRIBUTED BY HASH(k1) BUCKETS 5`

2. try insert data less than INT_MIN.
`insert into test_array_load_test_array_int_insert_tb values (1005, [-2147483649])`

Before this PR, the insert would succeed, but the value was not correct.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-17 00:01:03 +08:00
bf2e20c4c4 [fix](agg) reset the content of grouping exprs instead of replace it with original exprs (#13376)
* [fix](agg) reset the content of grouping exprs instead of replacing it with original exprs

* keep old behavior if the grouping type is not GROUP_BY
2022-10-15 11:07:35 +08:00
79a5125eff [Improvement](predicates) Use datev2 as the compatible type between string and datev2 (#13348)
If a string literal can be converted to datev2, we use datev2 as the compatible type instead of datetimev2.
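A hedged illustration of the affected comparison (table and column are hypothetical; `dt` is assumed to be a DATEV2 column); with this change the string literal is coerced to datev2 rather than datetimev2:
```SQL
SELECT * FROM orders WHERE dt >= '2022-10-14';
```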
2022-10-14 19:00:37 +08:00
993f38fe3c [feature](Nereids): use Multi join to rearrange join to eliminate cross join by using predicate. (#13353) 2022-10-14 17:26:34 +08:00
b82e54a525 [feature](statistics) support to drop table or partition statistics (#13303)
Manually drop statistics for tables or partitions. A table or partition can be specified; if neither is specified, all statistics under the current database will be deleted.

syntax:
```SQL
DROP STATS [tableName [PARTITIONS(partitionNames)]];

-- e.g.
DROP STATS;    -- drop all table statistics under the current database
DROP STATS t0;    -- drop t0 statistics
DROP STATS t1 PARTITIONS(p1);    -- drop partition p1 statistics of t1
```
2022-10-14 15:15:37 +08:00
50ae9e6b19 [enhancement](planner) support select table sample (#10170)
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.

Used for data inspection, quick verification of SQL correctness, and table statistics collection.

### Grammar
```
[TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek]
```

Limits the number of rows read from the table in the FROM clause: 
a number of Tablets is selected pseudo-randomly from the table according to the specified number of rows or percentage, 
and the seed specified in REPEATABLE allows the same samples to be returned again. 
In addition, Tablet IDs can also be specified manually. 
Note that this can only be used for OLAP tables.

### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Selects the specified tablet IDs of t1.

Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```

Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```

Pseudo-randomly sample 1000 rows in t1.
Note that several Tablets are actually selected according to the statistics of the table, 
and the total number of selected Tablet rows may be greater than 1000, 
so if you want to explicitly return 1000 rows, you need to add Limit.

### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of Tablets to be selected for each partition according to the average number of rows per Tablet.
If seek is not specified, the specified number of Tablets is pseudo-randomly selected from each partition.
If seek is specified, Tablets are selected sequentially starting from the seek-th Tablet of the partition.
Finally, the manually specified Tablet ids are added to the selected Tablets.
2022-10-14 15:05:23 +08:00
a2a2be22a5 [ResourceTag](tag) Unified tag format verification (#13312) 2022-10-14 14:21:55 +08:00
a2e513720e [feature](Nereids) auto fallback to legacy planner if analyze failed (#13351)
1. add NereidsException to wrap any exception thrown by Nereids
2. when we catch a NereidsException and the switch 'enableFallbackToOriginalPlanner' is on, we will use the legacy planner to plan again
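A small usage sketch, assuming the switch is exposed as the session variable `enable_fallback_to_original_planner` (the exact variable name is an assumption here):
```SQL
-- assumed session variable name; allows falling back to the legacy planner on analyze failure
SET enable_fallback_to_original_planner = true;
```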
2022-10-14 10:38:21 +08:00
5e0c34b35a [fix](join) should call getOutputTblRefIds to get child's tuple info (#13227)
* [fix](join) should call getOutputTblRefIds to get child's tuple info
2022-10-14 09:46:14 +08:00
87e5e2b48b [Fix](array-type) Disable schema change between array type columns (#13261)
Currently, we do not support schema change between array type columns.
We should forbid users from doing this operation.
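For illustration, a hedged sketch of the now-forbidden operation (table and column names are hypothetical):
```SQL
-- changing an array column to another array type is rejected after this change
ALTER TABLE t MODIFY COLUMN tags ARRAY<BIGINT>;
```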
2022-10-13 22:59:09 +08:00
cb300b0b39 [feature](agg) support any,any_value agg functions. (#13228) 2022-10-13 18:31:19 +08:00
fe1524a287 [Enhancement](load) remove load mem limit (#13111)
#12716 removed the mem limit for a single load task; in this PR I propose to remove the session variable load_mem_limit, to avoid confusion.

For compatibility, load_mem_limit in thrift is not removed; its value is set equal to exec_mem_limit in FE.
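After this change, load memory is governed by `exec_mem_limit` alone; a minimal sketch of the remaining knob (the value is illustrative):
```SQL
-- load tasks follow exec_mem_limit; load_mem_limit is no longer a session variable
SET exec_mem_limit = 8589934592; -- 8 GB
```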
2022-10-13 17:19:22 +08:00
4a6eb01ccb [refactor](Nereids): refactor UT by using Pattern and rename to remove consecutive (#13337)
* rename

* refactor UT
2022-10-13 16:41:51 +08:00
0ff04e81bc [fix](DynamicPartition) Not check max_dynamic_partition_num when disable DynamicPartition (#13267)
Disable the max_dynamic_partition_num check when disabling DynamicPartition via ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false"). When max_dynamic_partition_num is changed to a larger value and then changed to a lower value, the actual dynamic partition number may be larger than max_dynamic_partition_num, and DynamicPartition cannot be disabled.
2022-10-13 14:37:39 +08:00
db7f955a70 [improve](Nereids): split otherJoinCondition with List. (#13216)
* split otherJoinCondition with List.
2022-10-13 13:49:46 +08:00
4248c6f37c [improve](Nereids): avoid duplicated stats derive. (#13293) 2022-10-13 13:49:21 +08:00
e08ba8d573 [feature](restore) Add new property 'reserve_dynamic_partition_enable' to restore statement (#12498)
Add a new restore property 'reserve_dynamic_partition_enable', which means you can
get a table whose dynamic_partition_enable property has the same value
as before the backup. Before this commit, you always got a table with the property
'dynamic_partition_enable=false' when restoring.
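A hedged sketch of a restore using the new property (repository, snapshot label, table name, and timestamp are hypothetical):
```SQL
RESTORE SNAPSHOT example_db.`snapshot_label`
FROM `example_repo`
ON (`backup_tbl`)
PROPERTIES
(
    "backup_timestamp" = "2022-10-12-00-00-00",
    "reserve_dynamic_partition_enable" = "true"
);
```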
2022-10-13 11:16:15 +08:00
7147c77f22 [Enhancement](broker)Doris support obs broker load (#12781)
1. Upgrade fs_broker module hadoop2.7.3->hadoop2.8.3
2. Support obs broker load

Add a getOBSFileSystem method to org.apache.doris.broker.hdfs.FileSystemManager
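A hedged sketch of an OBS broker load (bucket, paths, broker name, and the exact OBS credential property keys are assumptions here):
```SQL
LOAD LABEL example_db.label_obs_1
(
    DATA INFILE("obs://bucket/path/file.csv")
    INTO TABLE target_tbl
    COLUMNS TERMINATED BY ","
)
WITH BROKER "obs_broker"
(
    "fs.obs.access.key" = "ak",
    "fs.obs.secret.key" = "sk",
    "fs.obs.endpoint" = "obs.example-region.myhuaweicloud.com"
);
```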
2022-10-13 09:44:13 +08:00
1bd14f1d82 [feature-wip](jsonb) jsonb parse function and load (#13129)
Add a function to parse a JSON string into jsonb format and use it to support stream load.
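A minimal sketch, assuming the parse function is exposed as `jsonb_parse` (the function name is an assumption):
```SQL
-- parse a JSON string literal into JSONB
SELECT jsonb_parse('{"k1": "v1", "k2": 200}');
```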
2022-10-12 13:56:37 +08:00
5c68f69362 [improvement](config) set enable_local_exchange default value to true (#13292) 2022-10-12 09:07:24 +08:00
3c5e7e2f24 [feature](nereids) refactor statistics framework and introduce StatsCalculatorV2 (#12987)
* squash

change data type of metrics to double

unit test

add stats for some function

add stats for arithmeticExpr

1. set max/min of ColumnStats to double
2. add stats for binaryExpr/compoundExpr

in predicate

* Add LiteralExpr in ColumnStat just for user display only.
2022-10-11 17:23:49 +08:00
5af1439934 [feature](auth) support user password policy and alter user stmt (#13051) 2022-10-11 16:37:35 +08:00
b5da751c2a [enhancement](Nereids) remove redundant log when fall back to legacy parser (#13243) 2022-10-11 10:53:07 +08:00
f007e0aed0 [fix](statstics) Incorrectly using the number of buckets to determine whether the table is partitioned (#13218) 2022-10-10 17:22:24 +08:00
63903136c4 [refactor](jcup) Format keywords in sql_parser.cup (#13133)
The keyword definition section of `sql_parser.cup` is unordered and messy:
1. It is almost unreadable
2. There are no rules to format it when we make a change to it
3. **It takes unnecessary effort to resolve conflict caused by the unordered keywords**

We can apply some simple rules to format it:
1. Sort in lexicographical order
2. Break into several "sections", keywords in each section have the same prefix `KW_${first_letter}`
3. Every 2 sections are connected with an empty line containing only 4 white spaces

e.g.

```
terminal String
    KW_A...

    KW_B...

    ...

    KW_Z...
```
2022-10-10 14:34:51 +08:00
375dfedd83 [feature](nereids) dump physical tree and memo (#13091)
Dump memo info and the physical plan to stdout and the log.
Set the `enable_nereids_trace` variable to true/false to enable/disable this dump.

following is a fragment of memo:
```
Group[GroupId#8]
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#6 GroupId#7 ] stats=(rows=25, isReduced=false, width=2)
GroupId#8(plan=PhysicalHashJoin ( type=INNER_JOIN, hashJoinCondition=[(r_regionkey#250 = n_regionkey#255)], otherJoinCondition=Optional.empty, stats=null )) children=[GroupId#7 GroupId#6 ] stats=(rows=25, isReduced=false, width=2)
```
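A usage sketch of the switch mentioned above:
```SQL
-- enable dumping of the memo and physical plan; set to false to turn it off
SET enable_nereids_trace = true;
```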
2022-10-10 13:05:28 +08:00
e829061614 [fix](sort)should not change resolvedTupleExprs in toThrift method (#13211)
The toThrift method will be called multiple times for sending data to different BEs, but the changes to resolvedTupleExprs should be done only once. This PR makes sure that resolvedTupleExprs can only be changed once.
2022-10-10 08:39:58 +08:00