doris

Author	SHA1	Message	Date
Pxl	4d44cea784	[Bug](materialized-view) check group expr at create mv (#21798 ) check group expr at create mv	2023-07-14 15:39:38 +08:00
minghong	62214cd1f4	[feature](nereids) adjust min/max of column stats for cast function (#21772 ) For cast(A as date), where A is a string column, the min/max of the result column stats should be calculated as follows: convert A.minExpr to a date dateA, then take the double value of dateA. Also add "explain memo plan select ..." to print the memo from the mysql client, and dump column stats for FileScanNode, used in datalake.	2023-07-14 12:54:04 +08:00
jakevin	2c897b82ad	[enhance](Nereids) Pushdown Project Through OuterJoin. (#21730 ) PushdownJoinOtherCondition pushes expressions in the join condition down into a project, which blocks JoinReorder, so we need to push the project down to help JoinReorder.	2023-07-14 11:46:29 +08:00
谢健	b2778d0724	[fix](Nereids) use groupExpr's children to make logicalPlan (#21794 ) After mergeGroup, the children of the plan differ from those of the GroupExpr. To avoid optimizing an outdated group, we should construct the new plan with the GroupExpr's children rather than the plan's children.	2023-07-14 11:41:38 +08:00
zhangstar333	c07e2ada43	[improve](udaf) refactor java-udaf executor by using for loop (#21713 ) refactor java-udaf executor by using for loop	2023-07-14 11:37:19 +08:00
minghong	ea73dd5851	[improve](nereids)inner join estimation: assume children output at least one tuple #21792 This assumption helps eliminate error propagation when the filter estimation is too low (less than one).	2023-07-14 11:30:25 +08:00
Mryange	ebe771d240	[refactor](executor) remove unused variable	2023-07-14 10:35:59 +08:00
daidai	ca6e33ec0c	[feature](table-value-functions)add catalogs table-value-function (#21790 ) mysql> select * from catalogs() order by CatalogId;	2023-07-14 10:25:16 +08:00
Jibing-Li	352a0c2e17	[Improvement](multi catalog)Cache file system to improve list remote files performance (#21700 ) Use the file system type and Conf as the key to cache the remote file system. This avoids getting a new file system for each external table partition's location. The time cost of fetching 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.	2023-07-14 09:59:46 +08:00
Ashin Gau	4158253799	[feature](hudi) support hudi time travel in external table (#21739 ) Support hudi time travel in external table: ``` select * from hudi_table for time as of '20230712221248'; ``` PR(https://github.com/apache/doris/pull/15418) supports using a timestamp or version as the snapshot ID in iceberg, but hudi only has the timestamp as the snapshot ID. Therefore, querying a hudi table with `for version as of` throws an error like: ``` ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID ``` The supported timestamp formats in hudi are 'yyyy-MM-dd HH:mm:ss[.SSS]', 'yyyy-MM-dd', or 'yyyyMMddHHmmss[SSS]', which is consistent with the [time-travel-query](https://hudi.apache.org/docs/quick-start-guide#time-travel-query). ## Partitioning Strategies Before this PR, hudi's partitions needed to be synchronized to hive through [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary unless you want to query hudi data through hive. In addition, partitions change during time travel, so we cannot guarantee the correctness of time travel through partition synchronization. So this PR obtains partitions directly by reading hudi meta information, caches and updates table partition information by hudi instant timestamp, and reuses Doris' partition pruning.	2023-07-13 22:30:07 +08:00
slothever	c5dbd53e6f	[fix](multi-catalog)support oss-hdfs service (#21504 ) 1. support oss-hdfs if it is enabled when using a dlf or hms catalog 2. add docs for aliyun dlf and mc.	2023-07-13 18:02:15 +08:00
mch_ucchi	d4bdd6768c	[Feature](Nereids) support select into outfile (#21197 )	2023-07-13 17:01:47 +08:00
Jack Drogon	14253b6a30	[fix](ccr) Add tableName in DropInfo && BatchDropInfo (#21736 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-07-13 11:47:49 +08:00
LiBinfeng	f863c653e2	[Fix](Planner) fix limit execute before sort in show export job (#21663 ) Problem: Before this change, show export jobs applied limit before sort, so the result could be wrong because limit always cut the results first. Example: we have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1: show export from db order by JobId desc limit 1; Because limit 1 runs first, we would probably get job2, since JobIds are assigned from small to large. Solution: If there is an order by clause, do not cut the results first; cut the result set after sorting.	2023-07-13 11:17:28 +08:00
Calvin Kirs	2d2beb637a	[enhancement](RoutineLoad)Multi-table support pipeline load (#21678 )	2023-07-13 10:26:46 +08:00
Siyang Tang	e18465eac7	[feature](TVF) support path partition keys for external file TVF (#21648 )	2023-07-13 10:15:55 +08:00
Xiangyu Wang	105a162f94	[Enhancement](multi-catalog) Merge hms events every round to speed up events processing. (#21589 ) Currently we find that MetastoreEventsProcessor cannot keep up with the event production rate in our cluster, so we need to merge some hms events every round.	2023-07-12 23:41:07 +08:00
minghong	0243c403f1	[refactor](nereids)set session var for bushy join (#21744 ) Add session var MAX_JOIN_NUMBER_BUSHY_TREE (default is 5). If the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, nereids tries a bushy tree; otherwise, a zigzag tree.	2023-07-12 16:40:48 +08:00
ElvinWei	3b76428de9	[fix](stats) when some stat is NULL, causing an exception during display stats (#21588 ) During manual statistics injection, some statistics may be NULL, causing an exception during display.	2023-07-12 14:57:06 +08:00
AKIRA	a18b345459	[opt](stats)update tbl stats of statistics collection after system statistics collection job succeeded (#21528 ) So that if FE crashed while a system analyze task was running, the system task for the column can be created and run when FE recovers	2023-07-12 11:11:50 +08:00
AKIRA	56c2deadb1	[opt](nereids) update CTEConsumer's stats when CTEProducer's stats updated (#21469 )	2023-07-12 10:55:40 +08:00
AKIRA	88c719233a	[opt](nereids) convert OR expression to IN expression (#21326 ) Add a new rule named "OrToIn", which converts a disjunction of multiple equalTo predicates that compare the same slot to literals into an InPredicate so that it can be pushed down to the storage engine. For example: ```sql col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4) col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4 (col1 = 1 or col1 = 2) and (col2 = 3 or col2 = 4) ``` would be converted to ```sql col1 in (1, 2) or col1 = 3 and (col2 = 4) col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4 (col1 in (1, 2) and (col2 in (3, 4))) ```	2023-07-12 10:53:06 +08:00
daidai	ff42cd9b49	[feature](hive)add read of the hive table textfile format array type (#21514 )	2023-07-11 22:37:48 +08:00
AKIRA	ed410034c6	[enhancement](nereids) Sync stats across FE cluster after analyze #21482 Before this PR, if a user connected to a follower and analyzed a table, the stats would not be cached in the follower FE, since the Analyze stmt is forwarded to the master and the follower still lazily loads stats into its cache. After this PR, once analyze finishes on the master, the master syncs the stats to all followers and updates each follower's stats cache. Load partition stats to col stats	2023-07-11 20:09:02 +08:00
zhengyu	8ffa21a157	[fix](config) set FE header size limit to 1MB from 10k (#21719 ) Enlarge jetty_server_max_http_header_size to avoid Request Header Fields Too Large error when streamloading to FE. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-07-11 19:52:14 +08:00
lihangyu	b2c7a4575c	[Bug](dynamic table) set all CreateTableStmt from cup parser dynamic table flag false (#21706 )	2023-07-11 15:23:27 +08:00
zy-kkk	7a758f7944	[enhancement](mysql) Add have_query_cache variable to be compatible with old mysql client (#21701 )	2023-07-11 14:05:40 +08:00
zy-kkk	8d98f2ac7e	[fix](errCode) Change the error code of a read-only variable (#21705 )	2023-07-11 14:05:18 +08:00
zy-kkk	5ed42705d4	[fix](jdbc scan) `1=1` does not translate to `TRUE` (#21688 ) Most database systems recognize where 1=1 but not where true, so we should send the original 1=1 to the database	2023-07-11 14:04:49 +08:00
zy-kkk	d3be10ee58	[improvement](column) Support for the default value of current_timestamp in microsecond (#21487 )	2023-07-11 14:04:13 +08:00
zy-kkk	5a15967b65	[fix](sparkdpp) Change spark dpp default version to 1.2-SNAPSHOT (#21698 )	2023-07-11 10:49:53 +08:00
bobhan1	7b403bff62	[feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623 ) 1. expand the semantics of variable strict_mode to control the behavior for stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows if the key is not present in the table 2. when inserting a new row in non-strict mode stream load, the unmentioned columns should have default value or be nullable	2023-07-11 09:38:56 +08:00
TengJianPing	d59c21e594	[test](spill) disable fuzzy spill variables for now (#21677 ) We will rewrite this logic, so it is useless for now. Do not test it anymore.	2023-07-10 22:28:41 +08:00
Mryange	8973610543	[feature](datetime) "timediff" supports calculating microseconds (#21371 )	2023-07-10 19:21:32 +08:00
acnot	202a5c636f	[fix](create table) modify varchar default length 1 to 65533 (#21302 ) Modify the varchar default length from 1 to varchar.max.length when creating a table. ```mysql create table t2 ( k1 CHAR, K2 CHAR(10) , K3 VARCHAR , K4 VARCHAR(1024) ) duplicate key (k1) distributed by hash(k1) buckets 1 properties('replication_num' = '1'); desc t2; ``` \| Field \| Type \| Null \| Key \| Default \| Extra \| \| -- \|--\|--\| -\| -\| -\| \| k1 \| CHAR(1) \| Yes \| true \| NULL \| \| \| K2 \| CHAR(10) \| Yes \| false \| NULL \| NONE \| \| K3 \| VARCHAR(65533) \| Yes \| false \| NULL \| NONE \| \| K4 \| VARCHAR(1024) \| Yes \| false \| NULL \| NONE \|	2023-07-10 17:57:21 +08:00
jakevin	2b04fa604c	fix: toCalendar should use Calendar.MONTH instead of MONDAY (#21665 )	2023-07-10 16:49:42 +08:00
zy-kkk	0be349e250	[feature](jdbc) Support jdbc catalog to read json types (#21341 )	2023-07-10 16:21:00 +08:00
ElvinWei	a1a8ee8320	[enhancement](stats) Inject partition statistics #21543 The cost estimation can be more accurate if partition statistics are available. But when working with big data on the order of 1T, we cannot actually import the data. So we extend statistics injection to support partition statistics. Syntax: ALTER TABLE table_name MODIFY COLUMN column_name SET STATS ('stat_name' = 'stat_value', ...) [ PARTITION (partition_name) ]; Explanation: - table_name: The table into which the statistics are injected. It can be in db_name.table_name form. - column_name: The target column. Must be a column that exists in table_name. Statistics can only be modified one column at a time. - stat_name and stat_value: The stat name and the corresponding value of the stat info. Multiple stats are comma separated. Statistics that can be modified include row_count, ndv, num_nulls, min_value, max_value, and data_size. - partition_name: The target partition. Must be a partition that exists in table_name. Multiple partitions are separated by commas.	2023-07-10 15:06:25 +08:00
Jibing-Li	f9c56d59fc	[improvement](statistics)Support external table show table stats, modify column stats and drop stats (#21624 ) Support external table show table stats, modify column stats and drop stats.	2023-07-10 11:33:06 +08:00
Pxl	77336bff44	[Bug](materialized-view) adjust limit for create materialized view on uniq/agg table (#21580 ) adjust limit for create materialized view on uniq/agg table	2023-07-10 10:04:17 +08:00
jakevin	41fb3d5fa4	[opt](Nereids): Join use List<Plan> as children (#21608 ) Join use List as children can avoid to construct extra ImmutableList	2023-07-09 17:11:55 +08:00
Calvin Kirs	d9974e6337	[Chore](Job)Fix the wrong log when the export job reads fields and add more clear log information (#21490 ) * [Chore](Job)Fix the wrong log when the export job reads fields and add more clear log information * add OriginStatement .toString method	2023-07-09 17:06:38 +08:00
lihangyu	6b945680a7	[Improve](point query) audit point query (#21587 )	2023-07-09 16:43:41 +08:00
yujun	015426b2b4	[fix](tablet report) fix fe can not update replica's status with be's report #21600	2023-07-09 16:23:18 +08:00
Jack Drogon	aacb9b9b66	[Enhancement](binlog) Add create/drop table, add/drop partition && alter job, modify columns binlog support (#21544 )	2023-07-09 09:11:56 +08:00
HappenLee	f2fb23e98f	[pipeline](exec) disable pipeline load in the current version (#21632 )	2023-07-09 01:00:06 +08:00
Mryange	f8a2c66174	[refactor](planner) refactor automatically set instance_num (#21640 ) refactor automatically set instance_num	2023-07-08 21:59:17 +08:00
morrySnow	aad8043d44	[opt](Nereids) enable parallel scan for local phase agg (#21642 ) After we forbade some cases of agg candidate plans, every local phase agg requires DistributionSpecAny for its child. So, we can enable parallel scan for it	2023-07-08 21:47:17 +08:00
Jack Drogon	51b0bbb667	[Feature] (binlog) Add getBinlogLag (#21637 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-07-08 07:41:45 +08:00
谢健	499592178e	[fix](Nereids) Add alias name for system variable (#21615 ) Add an alias name for system variables to fix the column name being the value of the system variable, like: ``` mysql> select @@character_set_client; +--------+ \| 'utf8' \| +--------+ \| utf8 \| +--------+ ================================== mysql> select @@character_set_client; +------------------------+ \| @@character_set_client \| +------------------------+ \| utf8 \| +------------------------+ ```	2023-07-07 23:26:01 +08:00
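
The ordering problem described in commit f863c653e2 (limit applied before sort in show export) can be illustrated with a minimal Python sketch. This is not the Doris implementation; the job records and the function names `limit_then_sort` and `sort_then_limit` are hypothetical and exist only to show why truncating first returns the wrong row.

```python
# Hypothetical export jobs, listed in the order they were created
# (JobId assigned from small to large, as the commit message describes).
jobs = [
    {"JobId": 1, "Label": "job_a"},
    {"JobId": 2, "Label": "job_b"},
    {"JobId": 3, "Label": "job_c"},
]


def limit_then_sort(rows, key, limit):
    """Problematic order: cut the rows first, then sort what is left."""
    return sorted(rows[:limit], key=lambda r: r[key], reverse=True)


def sort_then_limit(rows, key, limit):
    """Fixed order: sort all rows first, then cut the result set."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:limit]


# Equivalent of "order by JobId desc limit 1" under each ordering.
wrong = limit_then_sort(jobs, "JobId", 1)
right = sort_then_limit(jobs, "JobId", 1)
print(wrong[0]["JobId"], right[0]["JobId"])
```

With limit applied first, the query returns the oldest job instead of the one with the largest JobId; sorting first gives the expected row.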

8289 Commits
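
The OR-to-IN rewrite described in commit 88c719233a can be sketched in a few lines of Python. This is a simplified illustration, not the Nereids rule: it assumes the expression has already been split into top-level disjuncts, represents an equality as a hypothetical `("eq", column, literal)` tuple, and treats every other disjunct as an opaque string. The function name `or_to_in` is also an assumption.

```python
def or_to_in(disjuncts):
    """Merge equalities on the same column in a flat OR into one IN predicate.

    Each disjunct is either ("eq", column, literal) or an opaque string that
    is passed through unchanged. Opaque disjuncts are emitted after the
    merged ones, and literals are not deduplicated in this sketch.
    """
    literals_by_column = {}  # column -> literals, in first-seen order
    others = []
    for d in disjuncts:
        if isinstance(d, tuple) and d[0] == "eq":
            literals_by_column.setdefault(d[1], []).append(d[2])
        else:
            others.append(d)

    merged = []
    for column, literals in literals_by_column.items():
        if len(literals) >= 2:
            # Two or more equalities on one column become a single IN.
            merged.append(f"{column} in ({', '.join(map(str, literals))})")
        else:
            # A lone equality has nothing to merge with and is kept as-is.
            merged.append(f"{column} = {literals[0]}")
    return " or ".join(merged + others)


# First example from the commit message: AND binds tighter than OR, so
# "col1 = 3 and (col2 = 4)" is a single opaque disjunct.
print(or_to_in([("eq", "col1", 1), ("eq", "col1", 2), "col1 = 3 and (col2 = 4)"]))
```

The sketch only handles one flat disjunction; the third example in the commit message would require applying the same step to each parenthesized OR group separately.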