doris

Author	SHA1	Message	Date
ElvinWei	15fc3c2c89	[enhancement](statistics) optimize the default configuration related to statistics, etc. (#13136 ) This pr is mainly to optimize statistical tasks. Includes the following: 1. No longer generate statistics tasks for empty tables, and move the logic of skipping empty partitions to the process of task generation. 2. Adjusted the default configuration related to statistics to improve the efficiency of statistics collection, parameters include `cbo_concurrency_statistics_task_num`,`statistic_job_scheduler_execution_interval_ms` and `statistic_task_scheduler_execution_interval_ms`. 3. Optimize the display of statistical tasks. 4. In addition, some `org.apache.parquet.Strings` packages are changed to `com.google.common.base.Strings` to avoid the exception that Strings cannot be found in local debug. etc.	2022-10-09 16:34:20 +08:00
morrySnow	da933ecd21	[fix](Nereids) plan broadcast on right semi join by mistake (#13206 )	2022-10-09 16:32:12 +08:00
Mingyu Chen	ece4a6c194	[doc][fix](multi-catalog) add doc for multi catalog and fix refresh bug (#13097 ) 1. Add all document about multi catalog feature. 2. Fix a bug that REFRESH edit log is not handled	2022-10-09 09:14:44 +08:00
Gabriel	869fe2bc5d	[Improvement](outfile) Support ORC format in outfile (#13019 )	2022-10-08 20:56:32 +08:00
jakevin	63f5dc1953	[feature](Nereids): support Alias join reorder and fix bug. (#12890 ) * [improve](Nereids): simplify onCondition check. * feature: support project Alias for join reorder.	2022-10-08 10:45:04 +08:00
Lightman	7b75c2df54	[fix](BE) fix the stream load error when upgrade BE from 1.1.2 to master (#13058 )	2022-10-05 12:13:26 +08:00
DingGeGe	4a0b4f1836	[fix](fe-test) TestWithFeService do not clean up dorisHome (#13073 )	2022-10-04 21:32:27 +08:00
xueweizhang	b083fb6d5f	[fix](decimal) retain Decimal trailing zero when select on fe (#13065 )	2022-10-04 21:31:18 +08:00
Zhengguo Yang	74fc98ceeb	[improvement](ResourceTag) support upper case in tag name (#13063 )	2022-10-04 21:30:37 +08:00
zhangstar333	3f47f67b16	[fix](parquet) fix parquet write setting property is not effective (#12912 )	2022-10-04 21:25:57 +08:00
zhangstar333	e167aa120f	[fix](jdbc) fix insert into date type to oracle using wrong type (#12883 ) When inserting a date value into Oracle through JDBC, the to_date function should be used to convert the string to java.sql.date.	2022-10-04 21:24:33 +08:00
zhannngchen	b53533408b	not allow alter mow property (#13108 )	2022-10-03 21:31:09 +08:00
Mingyu Chen	d44af5decf	[fix](alter-load) fix bug that tablet version may be wrong when doing alter and load (#13070 ) The `isRunning()` method of `TransactionState` is missing the `PRE_COMMITTED` status, which causes a wrong judgment in `isPreviousTransactionsFinished`.	2022-09-30 23:39:30 +08:00
morrySnow	95561baddd	[fix](planner) throw NPE when all group by expr is constant and no agg expr in select list (#13087 )	2022-09-30 18:47:01 +08:00
morrySnow	90f11ed7c1	[enhancement](Nereids) remove unnecessary exchange between global and distinct local aggregate node (#13057 ) Add partition info into LogicalAggregate and set it as original group expression list of aggregate when we do aggregate disassemble with distinct aggregate function.	2022-09-29 23:12:37 +08:00
Kikyou1997	31a23baa37	[fix](planner) Add default execution interval time for stats framework (#13044 ) Set a default execution interval for stats collection related threads.	2022-09-29 22:40:27 +08:00
Gabriel	287ff50a6f	[Bug](datev2) Fix compatible error between datev2 and date (#13024 )	2022-09-29 18:01:55 +08:00
Shuo Wang	a7b42a7029	[Fix](Nereids) Fix exception message when can't bind slot. (#13048 )	2022-09-29 16:51:07 +08:00
jakevin	42729786bf	[enhancement](Nereids) push filter into join otherJoinCondition (#12842 )	2022-09-29 16:19:30 +08:00
mch_ucchi	1ae9454771	[enhancement](Nereids) planner performance speed up (#12858 ) optimize planner by: 1. reduce duplicated calculation on equals, getOutput, computeOutput, etc. 2. getOnClauseUsedSlots: the two sides of equalTo are certainly slots, so there is no need to use List.	2022-09-29 16:01:10 +08:00
DingGeGe	fae7296336	[Enhancement](fe-core) make UT-SelectRollupTest more stable (#13030 )	2022-09-29 14:25:01 +08:00
Gabriel	c2fae109c3	[Improvement](outfile) Support output null in parquet writer (#12970 )	2022-09-29 13:36:30 +08:00
morrySnow	d53205076e	[feature](Nereids) implicit cast StringLiteral to another side type of BinaryOperator if available (#13038 ) For the expression 5 > '1': before this PR, we normalized it to '5' > '1'. After this PR, we normalize it to 5 > 1 to be compatible with the legacy planner.	2022-09-28 21:34:25 +08:00
minghong	d739aa7c53	[enhancement](Nereids) optimization for star-schema join reorder (#12817 ) The basic idea of star-schema support is: 1. fact_table JOIN dimension_table: if the dimension table is filtered, the result can be regarded as applying a filter on the fact table. 2. fact_table JOIN dimension_table: if the dimension table is not filtered, the number of join result tuples equals the number of fact tuples. 3. dimension table JOIN fact table: the number of join result tuples is that of the fact table or 2 times that of the dimension table. If star-schema support is enabled: 1. Nereids regards a duplicate key (unique key/aggregation key) as a primary key. 2. Nereids tries to regard one join key as the primary key and the other join key as a foreign key. 3. If Nereids finds that no join key is a primary key, it falls back to normal estimation.	2022-09-28 21:09:55 +08:00
morrySnow	7019166469	[enhancement](Nereids) let BinaryArithmetic's dataType and nullable match with BE (#13015 ) Do type promotion for BinaryArithmetic: - Add - Subtract - Multiply Do always nullable for: - Mod	2022-09-28 20:02:27 +08:00
luozenglin	28ce1878ca	[fix](planner) fix push down no grouping agg (#12983 ) The value column of the agg does not support the zone_map index; this fixes a null pointer caused by pushing the value column down to the zone map.	2022-09-28 17:01:01 +08:00
carlvinhust2012	1b1f13ec84	[optimization](array-type) optimize error prompts when sql parser report error (#12999 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-09-28 14:35:41 +08:00
morrySnow	eef9367705	[feature](Nereids) use one stage aggregation if available (#12849 ) Currently, we always disassemble aggregation into two stages: local and global. However, in some cases, one-stage aggregation is enough. There are two advantages of one-stage aggregation: 1. avoid unnecessary exchange. 2. have a chance to do colocate join on top of the aggregation. This PR moves the AggregateDisassemble rule from the rewrite stage to the optimization stage and chooses one-stage or two-stage aggregation according to cost.	2022-09-28 10:38:03 +08:00
starocean999	339877930d	[fix](join)report 'natural join is not supported' instead of getting wrong result (#13008 ) * [fix](join)report 'natural join is not supported' instead of getting wrong result * add regression test	2022-09-28 09:08:56 +08:00
Mingyu Chen	d80b7b9689	[feature-wip](new-scan) support more load situation (#12953 )	2022-09-27 21:48:32 +08:00
jakevin	9a38a9677a	[feature](Nereids) Eliminate outer join (#12985 ) Eliminate the outer join if we have a non-null predicate on slots of the inner side of the outer join. TODO: 1. use constant variable to handle it (we can handle more cases like nullsafeEqual ......) 2. using constant folding to handle null values is more general and does not require writing long logical judgments 3. handle null safe equals(<=>)	2022-09-27 21:09:25 +08:00
Shuo Wang	57570f2090	[feature](Nereids) Set pre-aggregation status for OLAP table scan (#12785 ) This is the second step for #12303. The previous PR #12464 added the framework to select the rollup index for OLAP table, but pre-aggregation is turned on by default. This PR set pre-aggregation for scan OLAP table. The main steps are as below: 1. Select rollup index when aggregate is present, this is handled by `SelectRollupWithAggregate` rule. Expressions in aggregate functions, grouping expressions, and pushdown predicates would be used to check whether the pre-aggregation should be turned off. 2. When selecting from olap scan table without aggregate plan, it would be handled by `SelectRollupWithoutAggregate`.	2022-09-27 19:12:15 +08:00
Pxl	9607f60845	[Feature](serialize) move block_data_version to fe heart beat (#12667 ) Move block_data_version from be config to fe heart beat	2022-09-27 18:25:54 +08:00
ElvinWei	ba5705a589	[feature-wip](statistics) step6: statistics is available (#8864 ) This pull request includes some implementations of the statistics (https://github.com/apache/incubator-doris/issues/6370). Execute SQL statements such as `ANALYZE`, `SHOW ANALYZE`, and `SHOW TABLE/COLUMN STATS...` to collect statistics and query them. The following are the changes in this PR: 1. Added the necessary test cases for statistics. 2. Statistics optimization. To ensure that the statistics are valid and not distorted, they can only be updated after the statistics task is completed or manually by SQL; the collected statistics should not be changed in other ways. 3. Some code or comments have been adjusted to fix checkStyle problems. 4. Remove some code that was previously added because statistics were not available. 5. Add a configuration that indicates whether to enable statistics. The current statistics may not be stable, so they are not enabled by default (`enable_cbo_statistics=false`). Currently, they are mainly used for CBO testing.
See PR #12766 for the syntax. Some simple examples of statistics:
```SQL
-- enable statistics
SET enable_cbo_statistics=true;
-- collect statistics for all tables in the current database
ANALYZE;
-- collect all column statistics for table1
ANALYZE test.table1;
-- collect statistics for siteid of table1
ANALYZE test.table1(siteid);
ANALYZE test.table1(pv, citycode);
-- collect statistics for partition of table1
ANALYZE test.table1 PARTITION(p202208);
ANALYZE test.table1 PARTITIONS(p202208, p202209);
-- display table statistics
SHOW TABLE STATS test.table1;
-- display partition statistics of table1
SHOW TABLE STATS test.table1 PARTITION(p202208);
-- display column statistics of table1
SHOW COLUMN STATS test.table1;
-- display column statistics of partition
SHOW COLUMN STATS test.table1 PARTITION(p202208);
-- display the details of the statistics jobs
SHOW ANALYZE;
SHOW ANALYZE idxxxx;
```
	2022-09-27 17:24:14 +08:00
starocean999	a6db5e63df	[fix](projection)sort node's unmaterialized slots should be removed from resolvedTupleExprs (#12963 )	2022-09-27 11:46:44 +08:00
shee	7977bebfed	[feature](Nereids) constant expression folding (#12151 )	2022-09-26 17:16:23 +08:00
DingGeGe	3902b2bfad	[refactor](fe-core src test catalog): refactor and replace use NIO #12818 (#12818 )	2022-09-26 16:51:46 +08:00
minghong	c809a21993	[feature](nereids) extract single table expression for push down (#12894 ) In TPCH q7, we have an expression like (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'). This expression implies (n1.n_name='FRANCE' or n1.n_name='GERMANY'). The implied expression is logically redundant, but it can be used to reduce the number of output tuples of scan(n1) if Nereids pushes it down. This PR introduces a rule to extract such expressions. NOTE: 1. we only extract expressions on a single table. 2. if the extracted expression cannot be pushed down, e.g. it is on the right table of a left outer join, we need another rule to remove all the useless expressions.	2022-09-26 11:19:37 +08:00
jiafeng.zhang	9c03deb150	[fix](log)Audit log status is incorrect (#12824 ) Audit log status is incorrect	2022-09-26 09:57:52 +08:00
Shane	59699a4321	[feature](JSON datatype)Support JSON datatype (#10322 ) Add `JSON` datatype, following features are implemented by this PR: 1. `CREATE` tables with `JSON` type columns 2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 3. `SELECT` JSON columns Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type) * add JSONB data storage format type * fix JsonLiteral resolve bug * add DataTypeJson case in data_type_factory * add JSON syntax check in FE * add operators for jsonb_document, currently not support comparison between any JSON type value * add ColumnJson and DataTypeJson * add JsonField to store JsonValue * add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral * add push_json for MysqlResultWriter * JSON column need no zone_map_index * Revert "JSON column need no zone_map_index" This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79. * add JSON writer and reader, ignore zone-map for JSON column * add json_to_string for DataTypeJson * add olap_data_convertor for JSON type * add some enum * add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions * fix column_json offsets overflow bug, format code * remove useless TODOs, add CmpType cases for JSON type * add license header * format license * format be codes * resolve rebase master conflicts * fix bugs for CREATE and meta related code * refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes * modification be codes along code review advice * fix rebase conflicts with master * add unit test for json_value and column_json * fix rebase error * rename json to jsonb * fix some data convert bugs, set Mysql type to JSON	2022-09-25 14:06:49 +08:00
Jibing-Li	f1a64ea09f	[fix](new-scan)Fix new scanner load job bugs (#12903 ) Fix bugs: 1. Fe need to send file format (e.g. parquet, orc ...) to be while processing load jobs using new scanner. 2. Try to get parquet file column type from SchemaElement.type before getting from Logical type and Converted type.	2022-09-24 17:21:19 +08:00
HappenLee	d65756b504	[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930 )	2022-09-24 09:59:23 +08:00
Yongqiang YANG	9dc35ab534	[fix](streamload) set coord for streamLoad (#12744 ) When a stream load is canceled, status is reported to coord.	2022-09-23 20:23:19 +08:00
jakevin	7f5970d62f	[fix](Nereids): add stats in plan. (#12790 ) * [improve](Nereids): add stats for bestPlan and correct fix selectivity	2022-09-23 19:26:49 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 ) Support push down no grouping agg	2022-09-23 18:29:54 +08:00
morrySnow	bd12a49baf	[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891 ) In the past, with the legacy planner, we could only do bucket shuffle join on a join node belonging to a fragment with at least one scan node. But bucket shuffle join should be possible on every join node whose left child's data distribution satisfies the join's demand. In Nereids, we have data distribution info on each node, so we can enable bucket shuffle join on a fragment without a scan node.	2022-09-23 15:01:50 +08:00
morrySnow	c100d24116	[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (#12841 ) Currently, we always add an exchange under AssertNumRowsNode. Nevertheless, if its child node's partition is unpartitioned, there is no need to add an exchange at all.	2022-09-23 14:50:27 +08:00
ElvinWei	892e53a15b	[fix](test) fix a test failure problem after merging (#12902 )	2022-09-23 14:22:29 +08:00
ElvinWei	e28e30fe71	[Improvement](statistics) collect statistics in parallel and add test cases (#12839 ) This PR mainly improves some functions of the statistics module (#6370): 1. when collecting partition statistics, filter empty partitions in advance and do not generate statistical tasks for them. 2. the old statistical update method could have problems when updating statistics in parallel; this has been solved. 3. optimize internal-query. 4. add test cases related to statistics. 5. modify some comments as prompted by CheckStyle.	2022-09-23 11:59:53 +08:00
Adonis Ling	b7eea72d1d	[feature-wip](MTMV) Support showing and dropping materialized view for multiple tables (#12762 ) Use cases:
mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.05 sec)
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.02 sec)
mysql> SHOW TABLES;
+---------------+
| Tables_in_dev |
+---------------+
| mv            |
| t1            |
| t2            |
+---------------+
3 rows in set (0.00 sec)
mysql> SHOW CREATE TABLE mv;
+-------------------+--------------------------+
| Materialized View | Create Materialized View |
+-------------------+--------------------------+
| mv                | CREATE MATERIALIZED VIEW `mv` BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND KEY(`mv_pk`) DISTRIBUTED BY HASH(`mv_pk`) BUCKETS 10 PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
) AS SELECT `t1`.`pk` AS `mv_pk` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+--------------------------+
1 row in set (0.01 sec)
mysql> DROP MATERIALIZED VIEW mv;
Query OK, 0 rows affected (0.01 sec)	2022-09-23 10:36:40 +08:00
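
The single-table expression extraction described in commit c809a21993 (#12894) can be illustrated with a small standalone sketch. This is not the Nereids rule itself: the `Pred` class and the `extract_single_table` and `render` helpers are hypothetical names made up for this illustration, and the predicate is assumed to be already normalized into an OR of AND-lists. For each disjunct, the sketch keeps only the conjuncts that touch the target table; if any disjunct places no restriction on that table, nothing can be extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    """A hypothetical atomic predicate that refers to exactly one table alias."""
    table: str  # table alias the predicate refers to, e.g. "n1"
    text: str   # predicate text, e.g. "n1.n_name = 'FRANCE'"


def extract_single_table(disjuncts, table):
    """Derive the predicate on `table` implied by an OR of AND-lists.

    `disjuncts` is a list of conjunct lists. Returns a list of conjunct
    lists restricted to `table`, or None when some disjunct has no
    conjunct on `table` (then the OR implies nothing about that table).
    """
    implied = []
    for conjuncts in disjuncts:
        own = [p for p in conjuncts if p.table == table]
        if not own:
            return None
        implied.append(own)
    return implied


def render(implied):
    """Format the extracted predicate as SQL-like text."""
    return " OR ".join(
        "(" + " AND ".join(p.text for p in conjuncts) + ")"
        for conjuncts in implied
    )


# The TPCH q7 expression quoted in the commit message.
q7 = [
    [Pred("n1", "n1.n_name = 'FRANCE'"), Pred("n2", "n2.n_name = 'GERMANY'")],
    [Pred("n1", "n1.n_name = 'GERMANY'"), Pred("n2", "n2.n_name = 'FRANCE'")],
]

print(render(extract_single_table(q7, "n1")))
# prints: (n1.n_name = 'FRANCE') OR (n1.n_name = 'GERMANY')
```

The extracted predicate is redundant with the original expression, so it only pays off when it can actually be pushed down to the scan of the table, as the commit message notes.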

1771 Commits