Commit Graph

8289 Commits

Author SHA1 Message Date
1eab93b368 [chore](Nereids): remove useless code (#22960) 2023-08-15 13:14:20 +08:00
13d24297a7 [fix](Nereids) type check could not work when root node is table or file sink (#22902)
Type check could not work because there are no expressions in the plan:
sink and scan nodes have no expressions at all, so nothing could be checked.
This PR adds expressions on the logical sink so the type check works correctly.
2023-08-15 11:45:16 +08:00
bfc1efe1aa [fix](createTableStmt)fix bug in createTableStmt toSql (#22750)
Issue Number: https://github.com/apache/doris/issues/22749
2023-08-15 10:35:22 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09:
It's a pity, but experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is applied only to the enclose-related code; the plain CSV parser keeps the original logic.

Some performance regression is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write-column behavior, as shown by the flame graph.

Escape trimming will be enabled after fix #22411 is merged.

Cases to be discussed:

1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter is unreachable until EOF. Will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, case 1 is equivalent to this.

This PR only supports stream load, as a trial, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.
2023-08-15 09:23:53 +08:00
ad8a8203a2 [fix](mysql compatibility) add an internal database mysql to improve mysql compatibility (#22868) 2023-08-14 17:03:11 +08:00
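A quick way to see the new internal database (illustrative; this commit's text does not list which tables it contains):

```
SHOW DATABASES LIKE 'mysql';
-- switch into it and inspect the exposed compatibility tables
USE mysql;
SHOW TABLES;
```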
45481f5fe2 [optimize](Nereids): optimize Nereids performance (#22885) 2023-08-14 15:21:29 +08:00
8f471a3a1f [fix](Nereids) push agg to meta scan does not work well (#22811) 2023-08-14 14:35:21 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
49d503911e [MV](exec) disable create mv with select star (#22895) 2023-08-13 19:28:51 +08:00
bddab94121 [Enhancement](partial update) Support including delete sign column in partial update stream load (#22874) 2023-08-13 10:32:21 +08:00
bff3b90263 [fix](tablet clone) fix tablet sched failure when a tablet is missing its tag and its version is incomplete (#22861) 2023-08-13 10:18:01 +08:00
5b09254fac [improvement](external statistics)Fix external stats collection bugs (#22788)
1. Collect external table row counts when executing analyze database.
2. Support showing cached table stats (row count).
3. Support altering external table column stats.
4. Refresh/invalidate the table row-count stat memory cache when an analyze task finishes and when table stats are dropped.
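For illustration, a hedged sketch of the statements this change touches; `hive_ctl`, `db1`, `tbl1`, and `col1` are placeholder names, and exact syntax may vary by version:

```
-- collect row counts for external tables in a whole database
ANALYZE DATABASE hive_ctl.db1;

-- show the cached stats (row count) for one table
SHOW TABLE STATS hive_ctl.db1.tbl1;

-- manually alter column stats on an external table
ALTER TABLE hive_ctl.db1.tbl1 MODIFY COLUMN col1 SET STATS ('row_count'='1000');
```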
2023-08-11 21:58:24 +08:00
130c47e669 [Fix](Nereids)mark enable_nereids_dml as need-forward and format some cases (#22888) 2023-08-11 19:35:29 +08:00
045843991a [Fix](Nereids) fix insert into table of random distribution for nereids (#22831)
Currently, insert into a table with random distribution info is not supported; we fix it by setting the physical properties to Any.
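A minimal reproduction sketch (the table name and schema below are hypothetical):

```
CREATE TABLE t_random (k INT, v INT)
DUPLICATE KEY(k)
DISTRIBUTED BY RANDOM BUCKETS 10
PROPERTIES ('replication_num' = '1');

-- previously failed under Nereids because the distribution spec of a
-- random-distribution table could not be satisfied; now planned as Any
INSERT INTO t_random VALUES (1, 1), (2, 2);
```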
2023-08-11 19:26:39 +08:00
a2fd488438 [chore](Nereids): polish StatsCalculatorTest (#22884) 2023-08-11 18:08:18 +08:00
a089fe3e43 [Improve](jni-avro)Reduce the volume of the avro-scanner-jar package (#22276)
The avro-scanner-jar package is reduced from 204M to 160M.

Hadoop-related dependencies in the original avro pom were packaged directly into the jar, resulting in a 200M jar. Since the BE lib directory already provides the Hadoop jar environment, those dependencies can be referenced from there instead.
2023-08-11 17:26:14 +08:00
db69457576 [fix](avro)Fix S3 TVF avro format reading failure (#22199)
This pr fixes two issues:

1. When using the S3 TVF to query files in AVRO format, a change of `TFileType` turned the originally queried `FILE_S3` into `FILE_LOCAL`, causing the query to fail.
2. The parameters `s3.virtual.key` and `s3.virtual.bucket` are both removed; a new `S3Utils` in jni-avro parses the bucket and key of S3. The main purpose of this change is to unify the S3 parameters.
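An illustrative S3 TVF query of the kind affected by this fix; the URI, keys, and path are placeholders, and property names may differ slightly across versions:

```
SELECT * FROM S3(
    'uri' = 'https://endpoint.example.com/my-bucket/path/data.avro',
    'access_key' = 'ak',
    'secret_key' = 'sk',
    'format' = 'avro'
) LIMIT 10;
```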
2023-08-11 17:22:48 +08:00
548226acfc [fix](planner)shouldn't change the child type to assignmentCompatibleType if it's INVALID_TYPE (#22841)
If the child type is changed to INVALID_TYPE, the later getBuiltinFunction call will fail.
2023-08-11 17:14:49 +08:00
8c3b95c523 [Fix](multi-catalog) sync default catalog when forwarding query to master. (#22684)
Assume there is a hive catalog named hive_ctl with a db named db1 and a table named tbl1. If we connect to a slave FE and execute the following commands:

1. `switch hive_ctl`
2. `show partitions from db1.tbl1`

Then we will meet the error like this:
```
MySQL [(none)]> show partitions from db1.tbl1;
ERROR 1049 (42000): errCode = 2, detailMessage = Unknown database 'default_cluster:db1'
```

The reason is that the slave FE forwards the `ShowPartitionStmt` to the master FE, but the default catalog information is not synced, so the parser cannot find the db and throws this exception. This is just one case; other similar cases would fail too.
2023-08-11 14:59:04 +08:00
72837a3ab4 [enhancement](Nereids): Plan equals() hashCode() don't need LogicalProperties (#22774)
- deepEquals doesn't need to compare LogicalProperties
- Plan equals()/hashCode() don't need logicalProperty
2023-08-11 14:53:47 +08:00
209f36f1bf [fix](multi-catalog)fix jdbc loader (#22814) 2023-08-11 14:36:19 +08:00
94a7b44540 [Improvement](log) add config to control compression of fe log & fe audit log (#22865)
FE logs are large on a busy Doris cluster; if you want to preserve some historical logs, they cost too much disk space.
Enabling compression is a good way to save space, and a gzip-compressed text file can still be viewed without decompressing it first.
2023-08-11 14:08:08 +08:00
080d613238 [enhancement](Nereids): speed up rewrite() (#22846)
- use Set<Integer> instead of Set<String> to speed up `contains`
- remove `getValidRules` and use an `if` inside the `for` loop to avoid `toImmutableList`
2023-08-11 13:04:30 +08:00
caf496a67e [Chore](RoutineLoad)Change max_batch_interval minimum limit from 5 to 1 (#22858) 2023-08-11 12:02:20 +08:00
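For example, a routine load job can now be scheduled every second; the job name, table, and Kafka settings below are placeholders:

```
CREATE ROUTINE LOAD db1.job1 ON tbl1
PROPERTIES (
    'max_batch_interval' = '1'  -- previously the minimum allowed was 5
)
FROM KAFKA (
    'kafka_broker_list' = 'broker1:9092',
    'kafka_topic' = 'topic1'
);
```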
b9b9071c9b [improvement](create partition) create partition requires a quorum of replicas to succeed (#22554) 2023-08-11 11:59:05 +08:00
e17779f193 [Dependency](fe)Upgrade dependency version (#22496)
Upgrade guava to 32.1.2-jre
Set ck dependency scope to provided
Upgrade okio to 3.4.0
Upgrade snake yaml to 1.33
Upgrade aws-java-sdk to 1.12.519
Upgrade hadoop to 3.3.6
2023-08-11 10:54:37 +08:00
0aa00026bb [fix](autoinc) ignore column property isAutoInc() for create table as select ... statement (#22827) 2023-08-10 23:25:54 +08:00
9dc0f80386 [log](tablet clone) add decommission replica log (#22799) 2023-08-10 21:41:45 +08:00
a99211d818 [test](ctas) add some ut for testing varchar length in ctas (#22817)
1. If derived from an origin column, e.g. `create table tbl1 as select col1 from tbl2`, the length will be the same as the origin column's.
2. If derived from a function, e.g. `create table tbl1 as select func(col1) from tbl2`, the length will be 65533.
3. If derived from a constant value, e.g. `create table tbl1 as select "abc" from tbl2`, the length will be 65533.
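A hedged sketch combining the three cases; `tbl2`, `col1`, and the use of `upper()` as the function are assumptions for illustration:

```
CREATE TABLE tbl1 AS SELECT col1, upper(col1) AS c2, 'abc' AS c3 FROM tbl2;

-- expected: col1 keeps the origin column's varchar length,
-- while c2 and c3 come out as varchar(65533)
SHOW CREATE TABLE tbl1;
```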
2023-08-10 20:48:12 +08:00
71807ceb5f [Enhancement](tvf) Table value function support reading local file (#17404)
I tested the local TVF with TPC-H queries. First, generate the `lineitem` dataset with 6,001,215 rows, and load it into the `lineitem` table by:
```
insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 
from local(
        "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", 
        "backend_id" = "10003", 
        "format" = "csv", 
        "column_separator" = "|"
);
```
Then, run the `q1` and `q16` TPC-H queries; the query results are correct.

It can also analyze a BE's log directly, like:

```
mysql> select * from local(
        "file_path" = "log/be.out",
        "backend_id" = "10006",
        "format" = "csv")
       where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1                                                     |
+--------------------------------------------------------+
| start time: 2023年 08月 07日 星期一 23:20:32 CST       |
| start time: 2023年 08月 07日 星期一 23:32:10 CST       |
| start time: 2023年 08月 08日 星期二 00:20:50 CST       |
| start time: 2023年 08月 08日 星期二 00:29:15 CST       |
+--------------------------------------------------------+
```
2023-08-10 20:07:42 +08:00
879024a3a2 disable costmodelV2 (#22830) 2023-08-10 19:22:24 +08:00
8e5b4005dc [enhancement](data type) add use_mysql_bigint_for_largeint config to tell Doris to use bigint when returning the largeint type to MySQL JDBC (#22835) 2023-08-10 18:53:31 +08:00
a1223218f3 [pipeline](exec) Support shared scan in jdbc and odbc scan node (#22826)
Support shared scan in jdbc and odbc scan node to improve exec performance
2023-08-10 18:34:45 +08:00
94c9dce308 [fix](iceberg) fix iceberg's filter expr to filter file (#22740)
Fix iceberg's filter expr for filtering files, and add a count of the number of partitions read.
2023-08-10 18:20:57 +08:00
221e860cb7 [Feature](Routine Load)Support Partial Update (#22785) 2023-08-10 17:41:53 +08:00
df26fb2de4 [fix][alter table property] fix alter table property failure (#22791) 2023-08-10 17:12:42 +08:00
fd0c161081 [enhance](ColdHeatSeparation) forbid change storage policy to another one with different storage resource (#22519) 2023-08-10 16:32:09 +08:00
50fbe31f93 [fix](tablet report) fix replicas not being added when a backend rejoins the cluster after changing its ip or port (#22700) 2023-08-10 15:29:28 +08:00
ec0cedab51 [opt](stats) Use single connect context for each olap analyze task
1. Add some comments
2. Fix a potential NPE caused by deleting a running analyze job
3. Use a single connect context for each olap analyze task
2023-08-10 15:04:28 +08:00
f7d00d467a [fix](multicatalog) fix read hive/iceberg catalog on cosn & fix read data via broker (#22087)
* Update FileSystemFactory.java
2023-08-10 14:44:53 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
Truncate char or varchar columns if their size is smaller than the corresponding file columns, or if they are not found in the file column schema, controlled by the session var `truncate_char_or_varchar_columns`.
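Illustrative usage of the session variable this change introduces (catalog and table names are placeholders):

```
-- truncate char/varchar data to the column length defined in the table
-- when the file's columns are wider or missing from the file schema
SET truncate_char_or_varchar_columns = true;
SELECT * FROM hive_ctl.db1.tbl1;
```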
2023-08-10 14:37:20 +08:00
f1db6bd8c1 [feature](hive)append support for struct and map column type on textfile format of hive table (#22347)
1. Add support for the struct and map column types on the text file format of Hive tables.
2. Optimize the code for the array column type.

```mysql
+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+
```

```mysql
+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+
```
Summary of supported complex types (delimiters can be assigned):

1. array<primitive_type> and array<array<...>>
2. map<primitive_type, primitive_type>
3. struct<primitive_type, primitive_type, ...>
2023-08-10 13:47:58 +08:00
57fb9799b5 [feature](agg) add aggregation function 'bitmap_agg' (#22768)
This function can replace bitmap_union(to_bitmap(expr)): bitmap_union(to_bitmap(expr)) needs to create many small bitmaps first and then merge them into a single bitmap, whereas bitmap_agg converts the column values into a bitmap directly. Its performance is better; in our test there was about a 30% improvement.
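A hedged comparison of the two forms (table and column names are placeholders):

```
-- old form: creates one tiny bitmap per row, then merges them all
SELECT bitmap_count(bitmap_union(to_bitmap(user_id))) FROM events;

-- new form: accumulates column values into a single bitmap directly
SELECT bitmap_count(bitmap_agg(user_id)) FROM events;
```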
2023-08-10 12:18:25 +08:00
35dd787ed7 [improvement](transaction) abort txn when BE loses heartbeat for over 1 min (#22781) 2023-08-10 12:04:42 +08:00
432c8f1d6a [opt](stats) No more sync unknown stats since cannot serialize (#22775)
Gson can't serialize INFINITY under the current configuration.
2023-08-10 11:46:56 +08:00
f001b9d5c8 [enhance](multi-catalog) support multi name service when config hive catalog #21825
When creating a catalog with multiple name services, like below:

```
CREATE CATALOG hive_prod_t1 PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://10.198.xxx:9011,thrift://11.11.xxx:9001,thrift://10.198.xxx:9011',
    'hadoop.username' = 'user',
    'dfs.nameservices'='ns1007,ns1017',
    'dfs.ha.namenodes.ns1007'='nn1,nn2',
    'dfs.namenode.rpc-address.ns1007.nn1'='10.198.xxxx:8120',
    'dfs.namenode.rpc-address.ns1007.nn2'='10.198.xxx:8120',
    'dfs.client.failover.proxy.provider.ns1007'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
    'dfs.ha.namenodes.ns1017'='nn1,nn2',
    'dfs.namenode.rpc-address.ns1017.nn1'='10.198.xxxx:8120',
    'dfs.namenode.rpc-address.ns1017.nn2'='10.198.xxxx:8120',
    'dfs.client.failover.proxy.provider.ns1017'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

the result would be:

```
ERROR 1105 (HY000): errCode = 2, detailMessage = Missing dfs.ha.namenodes.ns1007,ns1017 property
```
2023-08-10 10:48:08 +08:00
eafdab0cfd [Enhancement](tvf) Add frontends_disks table-valued-function (#22568)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
Co-authored-by: yuxianbing <iloveqaz123>
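Presumably invoked like the other metadata table-valued functions:

```
-- disk usage information for each frontend node
SELECT * FROM frontends_disks();
```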
2023-08-10 10:40:24 +08:00
b90a7748a6 [Feature](Job Schedule)implement Transient Task Register (#22665)
Implement the TransientTaskRegister to support submitting transient tasks that do not require a timer trigger.
Renamed classes:
TimerTaskDisruptor -> TaskDisruptor
TimerTaskEvent -> TaskEvent
TimerTaskExpirationHandler -> TaskHandler
AsyncJobManager -> TimerJobManager
MemoryTask -> TransientTask
2023-08-10 10:34:13 +08:00
8591257d74 [fix](nereids) parallel instance number is set to 1 incorrectly (#22748)
Make PlanNode.getNumInstance() abstract to force every PlanNode to specify how its numInstance is defined.
By default, PlanNode.numInstance is 1. This default is meant only for PlanNodes that change the instance number, such as ExchangeNode; other PlanNodes should not use it directly.
2023-08-10 10:17:37 +08:00
8a5021c235 [Fix](Sql)NPE when the Delete statement does not specify a where condition (#22766)
Executing the SQL:

```
delete from test_table;
```

threw a NullPointerException:

```
2023-08-09 11:51:46,586 WARN (mysql-nio-pool-7|540) [StmtExecutor.analyze():987] Analyze failed. stmt[25, 519f916eeb94a8b-afe8e1094fb39fc1]
java.lang.NullPointerException: null
        at org.apache.doris.rewrite.ExprRewriter.applyRuleBottomUp(ExprRewriter.java:236) ~[classes/:?]
        at org.apache.doris.rewrite.ExprRewriter.applyRule(ExprRewriter.java:226) ~[classes/:?]
        at org.apache.doris.rewrite.ExprRewriter.applyRuleRepeatedly(ExprRewriter.java:216) ~[classes/:?]
        at org.apache.doris.rewrite.ExprRewriter.rewrite(ExprRewriter.java:166) ~[classes/:?]
        at org.apache.doris.rewrite.ExprRewriter.rewrite(ExprRewriter.java:151) ~[classes/:?]
        at org.apache.doris.analysis.DeleteStmt.analyze(DeleteStmt.java:127) ~[classes/:?]
        at org.apache.doris.qe.StmtExecutor.analyze(StmtExecutor.java:983) ~[classes/:?]
        at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:660) ~[classes/:?]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:448) ~[classes/:?]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:419) ~[classes/:?]
        at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:441) ~[classes/:?]
        at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:589) ~[classes/:?]
        at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:826) ~[classes/:?]
        at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[classes/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
```

Fix result:

```
[HY000][1105] errCode = 2, detailMessage = Where clause is not set
```

Affected versions: 2.0-Alpha and later.
2023-08-10 10:15:49 +08:00