doris

Author	SHA1	Message	Date
shee	7977bebfed	[feature](Nereids) constant expression folding (#12151 )	2022-09-26 17:16:23 +08:00
DingGeGe	3902b2bfad	[refactor](fe-core src test catalog): refactor and replace use NIO #12818 (#12818 )	2022-09-26 16:51:46 +08:00
minghong	c809a21993	[feature](nereids) extract single table expression for push down (#12894 ) In TPCH q7, we have an expression like (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE'). This expression implies (n1.n_name = 'FRANCE' or n1.n_name = 'GERMANY'). The implied expression is logically redundant, but it can reduce the number of tuples output by scan(n1) if nereids pushes it down. This PR introduces a rule to extract such expressions. NOTE: 1. We only extract expressions on a single table. 2. If the extracted expression cannot be pushed down, e.g. it is on the right table of a left outer join, we need another rule to remove all the useless expressions.	2022-09-26 11:19:37 +08:00
jiafeng.zhang	9c03deb150	[fix](log)Audit log status is incorrect (#12824 ) Audit log status is incorrect	2022-09-26 09:57:52 +08:00
Shane	59699a4321	[feature](JSON datatype)Support JSON datatype (#10322 ) Add `JSON` datatype, following features are implemented by this PR: 1. `CREATE` tables with `JSON` type columns 2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE 3. `SELECT` JSON columns Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type) * add JSONB data storage format type * fix JsonLiteral resolve bug * add DataTypeJson case in data_type_factory * add JSON syntax check in FE * add operators for jsonb_document, currently not support comparison between any JSON type value * add ColumnJson and DataTypeJson * add JsonField to store JsonValue * add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral * add push_json for MysqlResultWriter * JSON column need no zone_map_index * Revert "JSON column need no zone_map_index" This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79. * add JSON writer and reader, ignore zone-map for JSON column * add json_to_string for DataTypeJson * add olap_data_convertor for JSON type * add some enum * add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions * fix column_json offsets overflow bug, format code * remove useless TODOs, add CmpType cases for JSON type * add license header * format license * format be codes * resolve rebase master conflicts * fix bugs for CREATE and meta related code * refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes * modification be codes along code review advice * fix rebase conflicts with master * add unit test for json_value and column_json * fix rebase error * rename json to jsonb * fix some data convert bugs, set Mysql type to JSON	2022-09-25 14:06:49 +08:00
Jibing-Li	f1a64ea09f	[fix](new-scan)Fix new scanner load job bugs (#12903 ) Fix bugs: 1. FE needs to send the file format (e.g. parquet, orc ...) to BE while processing load jobs using the new scanner. 2. Try to get the parquet file column type from SchemaElement.type before getting it from the logical type and converted type.	2022-09-24 17:21:19 +08:00
HappenLee	d65756b504	[Bug](bucket shuffle) fix error bucket shuffle join plan in two same table (#12930 )	2022-09-24 09:59:23 +08:00
Yongqiang YANG	9dc35ab534	[fix](streamload) set coord for streamLoad (#12744 ) When a stream load is canceled, status is reported to coord.	2022-09-23 20:23:19 +08:00
jakevin	7f5970d62f	[fix](Nereids): add stats in plan. (#12790 ) * [improve](Nereids): add stats for bestPlan and correct fix selectivity	2022-09-23 19:26:49 +08:00
HappenLee	f7e3ca29b5	[Opt](Vectorized) Support push down no grouping agg (#12803 ) Support push down no grouping agg	2022-09-23 18:29:54 +08:00
morrySnow	bd12a49baf	[feature](Nereids) enable bucket shuffle join on fragment without scan node (#12891 ) In the past, with the legacy planner, we could only do bucket shuffle join on a join node belonging to a fragment with at least one scan node. But bucket shuffle join should be possible on every join node whose left child's data distribution satisfies the join's requirement. In nereids, we have data distribution info on each node, so we can enable bucket shuffle join on a fragment without a scan node.	2022-09-23 15:01:50 +08:00
morrySnow	c100d24116	[enhancement](Nereids) remove unnecessary ExchangeNode under AssertNumRowsNode (#12841 ) Currently, we always add an exchange under AssertNumRowsNode. However, if its child node's partition is unpartitioned, there is no need to add an exchange at all.	2022-09-23 14:50:27 +08:00
ElvinWei	892e53a15b	[fix](test) fix a test failure problem after merging (#12902 )	2022-09-23 14:22:29 +08:00
ElvinWei	e28e30fe71	[Improvement](statistics) collect statistics in parallel and add test cases (#12839 ) This PR mainly improves some functions of the statistics module(#6370)： 1. when collecting partition statistics, filter empty partitions in advance and do not generate statistical tasks. 2. the old statistical update method may have problems when updating statistics in parallel, which has been solved. 3. optimize internal-query. 4. add test cases related to statistics. 5. modify some comments as prompted by CheckStyle.	2022-09-23 11:59:53 +08:00
Adonis Ling	b7eea72d1d	[feature-wip](MTMV) Support showing and dropping materialized view for multiple tables (#12762 ) Use cases: mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); Query OK, 0 rows affected (0.05 sec) mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); Query OK, 0 rows affected (0.01 sec) mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk; Query OK, 0 rows affected (0.02 sec) mysql> SHOW TABLES; +---------------+ \| Tables_in_dev \| +---------------+ \| mv \| \| t1 \| \| t2 \| +---------------+ 3 rows in set (0.00 sec) mysql> SHOW CREATE TABLE mv; +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Materialized View \| Create Materialized View \| +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| mv \| CREATE MATERIALIZED VIEW `mv` BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND KEY(`mv_pk`) DISTRIBUTED BY HASH(`mv_pk`) BUCKETS 10 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", 
"storage_format" = "V2", "disable_auto_compaction" = "false" ) AS SELECT `t1`.`pk` AS `mv_pk` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; \| +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) mysql> DROP MATERIALIZED VIEW mv; Query OK, 0 rows affected (0.01 sec)	2022-09-23 10:36:40 +08:00
zhangstar333	617820b1f5	[Refactor](parquet) refactor parquet write to uniform and consistent logic (#12730 )	2022-09-23 09:12:34 +08:00
Stalary	84dd3edd0d	[Bug](view) Show create view support comment #12838	2022-09-23 09:09:44 +08:00
ElvinWei	340784e294	[feature-wip](statistics) add statistics module related syntax (#12766 ) This pull request includes some implementations of the statistics(#6370), it adds statistics module related syntax. The current syntax for collecting statistics will not collect statistics (It will collect statistics until test is stable). - `ANALYZE` syntax(collect statistics) ```SQL ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [PARTITIONS(...)] [ PROPERTIES(...) ] ``` > db_name.tb_name: collect table and column statistics from tb_name. > column_name: collect column statistics from column_name. > properties: properties of statistics jobs. example： ```SQL ANALYZE; -- collect statistics for all tables in the current database ANALYZE table1(pv, citycode); -- collect pv and citycode statistics for table1 ANALYZE test.table2 PARTITIONS(partition1); -- collect statistics for partition1 of table2 ``` - `SHOW ANALYZE` syntax(show statistics job info) ```SQL SHOW ANALYZE [TABLE \| ID] [ WHERE [STATE = ["PENDING"\|"SCHEDULING"\|"RUNNING"\|"FINISHED"\|"FAILED"\|"CANCELLED"]] ] [ORDER BY ...] [LIMIT limit][OFFSET offset]; ``` - `SHOW TABLE STATS`syntax(show table statistics) ```SQL SHOW TABLE STATS [ db_name.tb_name ] ``` - `SHOW COLUMN STATS` syntax(show column statistics) ```SQL SHOW COLUMN STATS [ db_name.tb_name ] ```	2022-09-22 11:15:00 +08:00
ElvinWei	3fa820ec50	[feature-wip](statistics) collect statistics by sql task (#12765 ) This pull request includes some implementations of the statistics(#6370); it implements sql-task to collect statistics based on internal-query(#9983). After the ANALYZE statement is parsed, statistical tasks will be generated. The statistical tasks include meta-task (get statistics from metadata) and sql-task (get statistics by SQL query). A sql-task gets statistics such as the row_count, the number of null values, and the maximum value by SQL query. Statistical tasks also include a sampling sql-task, which will be implemented in the next PR.	2022-09-22 11:13:35 +08:00
xueweizhang	70ab9cb43e	[feature](http) refactor version info and add new http api for get version info (#12513 ) Refactor version info and add new http api for get version info	2022-09-22 10:53:04 +08:00
Lei Zhang	6cd4c9ecb5	[bugfix](fe) Fix test_materialized_view_hll case npt. (#12829 ) when enable light schema change, run test_materialized_view_hll case throw NullPointerException. java.lang.NullPointerException: null at org.apache.doris.analysis.SlotDescriptor.setColumn(SlotDescriptor.java:153) at org.apache.doris.planner.OlapScanNode.updateSlotUniqueId(OlapScanNode.java:399)	2022-09-22 09:50:53 +08:00
jakevin	cbadbecd9a	[fix](Nereids) anti join could not be reorder (#12827 )	2022-09-22 09:19:12 +08:00
wxy	1ae7c4e307	[fix](LOAD statement): fix bug for `toSql` func of LoadStmt. (#12648 )	2022-09-22 09:07:46 +08:00
morrySnow	c58e4ca03b	[enhancement](Nereids) turn on all reorder rule that needed by zig-zag tree (#12767 )	2022-09-22 02:35:31 +08:00
jakevin	0dee640a3e	[feature](Nereids): eliminate filter true and add checker. (#12821 )	2022-09-22 02:31:11 +08:00
morrySnow	1c98c3a8f0	[fix](Nereids) GroupExpression never be optimize if it run with exploration job (#12815 ) An exploration job only explores and never calls optimize, so a GroupExpression produced by an exploration-only job never goes through implementation.	2022-09-21 21:03:37 +08:00
Jibing-Li	ec2b3bf220	[feature-wip](new-scan)Refactor VFileScanner, support broker load, remove unused functions in VScanner base class. (#12793 ) Refactor of scanners. Support broker load. This PR is part of the scanner refactoring tasks. It provides support for broker load using the new VFileScanner. Work is still in progress.	2022-09-21 12:49:56 +08:00
morrySnow	7b46e2400f	[enhancement](Nereids) add all necessary PhysicalDistribute on Join's child to ensure get correct cost (#12483 ) In an earlier PR #11976 , we added shuffle join and bucket shuffle support. But if the distribution spec of the join's right child satisfied the join's requirement, we did not add a distribute on the right child; instead, it was added in the plan translator. It is hard to calculate an accurate cost this way, since some distribute costs are not calculated. In this PR, we introduce a new shuffle type BUCKET and change how enforcers are added, to ensure that all necessary distributes are added in the cost and enforcer job.	2022-09-21 12:18:37 +08:00
jakevin	52a0da1f5c	[improve](Nereids): add check validator during post. (#12702 )	2022-09-21 11:25:04 +08:00
Gabriel	632867c1c1	[Bug](datetimev2) Fix lost precision for datetimev2 (#12723 )	2022-09-21 11:15:02 +08:00
camby	c5b6056b7a	[fix](lateral_view) fix lateral view explode_split with temp table (#12643 ) Problem description: the following SQL returns a wrong result: WITH example1 AS ( select 6 AS k1 ,'a,b,c' AS k2) select k1, e1 from example1 lateral view explode_split(k2, ',') tmp as e1; Wrong result: +------+------+ \| k1 \| e1 \| +------+------+ \| 0 \| a \| \| 0 \| b \| \| 0 \| c \| +------+------+ Correct result should be: +------+------+ \| k1 \| e1 \| +------+------+ \| 6 \| a \| \| 6 \| b \| \| 6 \| c \| +------+------+ Why? TableFunctionNode::outputSlotIds does not include column k1. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-21 09:19:18 +08:00
Gabriel	d5486726de	[Bug](date) Fix wrong result produced by date function (#12720 )	2022-09-20 21:09:26 +08:00
Gabriel	cc072d35b7	[Bug](date) Fix wrong type in TimestampArithmeticExpr (#12727 )	2022-09-20 21:08:48 +08:00
caiconghui	bb7206d461	[refactor](SimpleScheduler) refactor code for getting available backend in SimpleScheduler (#12710 )	2022-09-20 18:08:29 +08:00
mch_ucchi	47797ad7e8	[feature](Nereids) Push down not slot references expression of on clause (#11805 ) pushdown not slotreferences expr of on clause. select * from t1 join t2 on t1.a + 1 = t2.b + 2 and t1.a + 1 > 2 project() +---join(t1.a + 1 = t2.b + 2 && t1.a + 1 > 2) \|---scan(t1) +---scan(t2) transform to project() +---join(c = d && c > 2) \|---project(t1.a -> t1.a + 1) \| +---scan(t1) +---project(t2.b -> t2.b + 2) +---scan(t2)	2022-09-20 13:41:54 +08:00
minghong	d83eb13ac5	[enhancement](nereids) use Literal promotion to avoid unnecessary cast (#12663 ) Instead of adding a cast function on a literal, we directly change the literal's type. This change saves cast execution time and memory. For example, in the SQL "CASE WHEN l_orderkey > 0 THEN ...", 0 is a TinyIntLiteral. Before this PR: "CASE WHEN l_orderkey > CAST (TinyIntLiteral(0) AS INT)". With this PR: "CASE WHEN l_orderkey > IntegerLiteral(0)".	2022-09-20 11:15:47 +08:00
morrySnow	954c44db39	[enhancement](Nereids) compare LogicalProperties with output set instead of output list (#12743 ) Previously, we used the output list to compare two LogicalProperties. Join reorder changes the order of a join plan's children and therefore its output list, so the two join plans were no longer equal in the memo although they should be, and we had to add a project on the new join to keep the LogicalProperties the same. This PR changes the equals and hashCode functions of LogicalProperties to compare the set of outputs instead. We then no longer need to add the top project, which helps keep the memo simple and efficient.	2022-09-20 10:55:29 +08:00
starocean999	4f27692898	[fix](inlineview)the inlineview's slots' nullability property is not set correctly (#12681 ) The output slots of an inline view may come from the nullable side table of an outer join, so they should be nullable.	2022-09-20 09:29:15 +08:00
ElvinWei	e1d2f82d8e	[feature](statistics) template for building internal query SQL statements (#12714 ) Template for building internal query SQL statements; it is mainly used by the statistics module. Once a template is defined, the executable statement is built from the given parameters. For example, template and parameters: - template: `SELECT ${col} FROM ${table} WHERE id = ${id};`, - parameters: `{col=colName, table=tableName, id=1}` - result sql: `SELECT colName FROM tableName WHERE id = 1;` usage: ``` String template = "SELECT * FROM ${table} WHERE id = ${id};"; Map<String, String> params = new HashMap<>(); params.put("table", "table0"); params.put("id", "123"); // result: SELECT * FROM table0 WHERE id = 123; String result = InternalSqlTemplate.processTemplate(template, params); ```	2022-09-19 22:10:28 +08:00
mch_ucchi	94d73abf2a	[test](Nereids) runtime filter unit cases not rely on NereidPlanner to generate PhysicalPlan anymore (#12740 ) This PR: 1. add rewrite and implement method to PlanChecker 2. improve unit tests of runtime filter	2022-09-19 19:53:55 +08:00
ElvinWei	1339eef33c	[fix](statistics) remove statistical task multiple times in one loop cycle (#12741 ) There is a problem with StatisticsTaskScheduler. The peek() method obtains a reference to the same task object, but the for-loop executes multiple removes.	2022-09-19 19:28:51 +08:00
jakevin	4b5cc62348	[refactor](Nereids) rename transform to applyExploration UT helper class PlanChecker (#12725 )	2022-09-19 16:49:56 +08:00
ElvinWei	08a71236a9	[feature](statistics) Internal-query, execute SQL query statement internally in FE (#9983 ) Execute SQL query statements internally (in FE). Internal-query is mainly used by the statistics module: FE obtains statistics from BE by SQL, such as a column's maximum value, minimum value, etc. It is a tool module for statistics; it does not affect the original code or how users use the system. The simple usage process is as follows (the following code does no exception handling): ``` String dbName = "test"; String sql = "SELECT * FROM table0"; InternalQuery query = new InternalQuery(dbName, sql); InternalQueryResult result = query.query(); List<ResultRow> resultRows = result.getResultRows(); for (ResultRow resultRow : resultRows) { List<String> columns = resultRow.getColumns(); for (int i = 0; i < resultRow.getColumns().size(); i++) { resultRow.getColumnIndex(columns.get(i)); resultRow.getColumnName(i); resultRow.getColumnType(columns.get(i)); resultRow.getColumnType(i); resultRow.getColumnValue(columns.get(i)); resultRow.getColumnValue(i); } } ```	2022-09-19 16:26:54 +08:00
jakevin	399af4572a	[improve](Nereids) improve join cost model (#12657 )	2022-09-19 16:25:30 +08:00
Jibing-Li	5978fd9647	[refactor](file scanner)Refactor file scanner. (#12602 ) Refactor the scanners for hms external catalog, work in progress. Use VFileScanner, will remove NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner after fully tested. Query for parquet file has been tested, still need to add readers for orc file, text file and load logic as well.	2022-09-19 15:23:51 +08:00
jakevin	75d7de89a5	[improve](Nereids) Add all slots used by onClause to project when reorder and fix reorder mark (#12701 ) 1. Add all slots used by onClause in project ``` (A & B) & C like join(hash conjuncts: C.t2 = A.t2) \|---project(A.t2) \| +---join(hash conjuncts: A.t1 = B.t1) \| +---A \| +---B +---C transform to (A & C) & B join(hash conjuncts: A.t1 = B.t1) \|---project(A.t2) \| +---join(hash conjuncts: C.t2 = A.t2) \| +---A \| +---C +---B ``` But projection just include `A.t2`, can't find `A.t1`, we should add slots used by onClause when projection exist. 2. fix join reorder mark Add mark `LAsscom` when apply `LAsscom` 3. remove slotReference use `Slot` instead of `SlotReference` to avoid cast.	2022-09-19 11:01:25 +08:00
Mingyu Chen	a4ed023bad	[fix](colocation) fix decommission failure with 2 BEs and colocation table (#12644 ) This PR fix: 2 Backends. Create tables with colocation group, 1 replica. Decommission one of Backends. The tablet on decommissioned Backend is not reduced. This is a bug of ColocateTableCheckerAndBalancer.	2022-09-19 08:34:50 +08:00
abmdocrt	4f98146e83	[enhancement](tracing) Support forward to master tracing (#12290 )	2022-09-18 17:39:04 +08:00
Lightman	e01986b8b9	[feature](light-schema-change) fix light-schema-change and add more cases (#12160 ) Fix _delete_sign_idx and _seq_col_idx when append_column or build_schema when load. Tablet schema cache support recycle when schema sptr use count equals 1. Add a http interface for flink-connector to sync ddl. Improve tablet->tablet_schema() by max_version_schema.	2022-09-17 11:29:36 +08:00
924060929	0a95ebf602	[feature](Nereids) Add scalar function code generator and some function trait (#12671 ) This PR does the following: 1. Change the nullable mode of 'from_unixtime' and 'parse_url' from DEPEND_ON_ARGUMENT to ALWAYS_NULLABLE; this nullable configuration was missing previously. 2. Add some new interfaces for the original NullableMode. This change is inspired by the grammar of Scala's mix-in traits. It helps us quickly understand the traits of a function without reading lengthy procedural code, and saves the work of writing template code, like `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. These are the interfaces: - PropagateNullable: equal to NullableMode.DEPEND_ON_ARGUMENT - AlwaysNullable: equal to NullableMode.ALWAYS_NULLABLE - AlwaysNotNullable: equal to NullableMode.ALWAYS_NOT_NULLABLE - others ComputeNullable: equal to NullableMode.CUSTOM 3. Add `GenerateScalarFunction` to generate nereids-style function code from legacy functions, but do not actually generate any new function class yet, because the function traits are not ready for use. Traits still need to be added for the legacy functions' CompareMode and NonDeterministic, following the same idea as ComputeNullable.	2022-09-16 21:27:30 +08:00
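
Note on c809a21993 (extract single table expression): the implication described in that message can be sketched in a few lines. This is a minimal illustration and not the Nereids rule itself. The class `SingleTableExtraction`, the `Pred` type and the `extract` method are hypothetical names, and the sketch assumes the predicate is already an OR of AND-branches, with each atomic predicate tagged by the one table it reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: none of these names are Doris/Nereids classes.
class SingleTableExtraction {

    /** An atomic predicate tagged with the single table alias it references. */
    static final class Pred {
        final String table;
        final String text;

        Pred(String table, String text) {
            this.table = table;
            this.text = text;
        }
    }

    /**
     * The input is an OR of branches; each branch is an AND of atomic predicates.
     * Returns the predicate on `table` implied by the whole expression, or an
     * empty string when some branch places no constraint on `table`.
     */
    static String extract(List<List<Pred>> branches, String table) {
        Set<String> parts = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (List<Pred> branch : branches) {
            List<String> onTable = new ArrayList<>();
            for (Pred p : branch) {
                if (p.table.equals(table)) {
                    onTable.add(p.text);
                }
            }
            if (onTable.isEmpty()) {
                // This branch can hold for any row of `table`, so nothing is implied.
                return "";
            }
            String conj = String.join(" AND ", onTable);
            parts.add(onTable.size() > 1 ? "(" + conj + ")" : conj);
        }
        return String.join(" OR ", parts);
    }

    public static void main(String[] args) {
        // The TPCH q7 predicate quoted in the commit message.
        List<List<Pred>> q7 = Arrays.asList(
            Arrays.asList(new Pred("n1", "n1.n_name = 'FRANCE'"),
                          new Pred("n2", "n2.n_name = 'GERMANY'")),
            Arrays.asList(new Pred("n1", "n1.n_name = 'GERMANY'"),
                          new Pred("n2", "n2.n_name = 'FRANCE'")));
        System.out.println(extract(q7, "n1"));
    }
}
```

For the q7 predicate this yields the n1-only disjunction that the commit message says can be pushed down to scan(n1). The early return covers the case where one OR-branch does not mention the table at all, in which case no single-table filter follows from the expression.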

2818 Commits