doris

Author	SHA1	Message	Date
yangzhg	71bc815b20	[SQL] Support subquery in case when statement (#3135 ) #3153 Implement support for subqueries in a CASE WHEN statement such as ``` SELECT CASE WHEN ( SELECT COUNT() / 2 FROM t ) > k4 THEN ( SELECT AVG(k4) FROM t ) ELSE ( SELECT SUM(k4) FROM t ) END AS kk4 FROM t; ``` This statement will be rewritten to ``` SELECT CASE WHEN t1.a > k4 THEN t2.a ELSE t3.a END AS kk4 FROM t, ( SELECT COUNT() / 2 AS a FROM t ) t1, ( SELECT AVG(k4) AS a FROM t ) t2, ( SELECT SUM(k4) AS a FROM t ) t3; ```	2020-03-25 17:12:54 +08:00
EmmyMiao87	b2518fc285	[SQL] Support non-correlated subquery in having clause (#3150 ) This commit supports the non-correlated subquery in the having clause. For example: select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table); The non-scalar subquery is also supported in Doris. For example: select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2); Doris checks the number of rows returned by the subquery during execution. If the subquery returns more than one row, the query throws an exception. Implementation: The entire outer query is regarded as an inline view of a new query. The subquery in the having clause is changed to a where predicate in this new query. After this commit, tpc-ds 23,24,44 are supported. This commit also supports the subquery in ArithmeticExpr. For example: select k1 from table where k1=0.9*(select k1 from t);	2020-03-25 16:29:09 +08:00
WingC	3cff89df7f	[Dynamic Partition] Support for automatically drop partitions (#3081 )	2020-03-25 10:24:46 +08:00
Dayue Gao	e794bb69b7	[BUG] Make default result ordering of SHOW PARTITIONS statement consistent with 0.11 (#3184 )	2020-03-24 17:14:27 +08:00
lichaoyong	e20d905d70	Remove unused KUDU codes (#3175 ) KUDU table is no longer supported long time ago. Remove code related to it.	2020-03-24 13:54:05 +08:00
HangyuanLiu	d4c1938b5c	Open datetime min value limit (#3158 ) the min_value in olap/type.h of datetime is 0000-01-01 00:00:00, so we don't need restrict datetime min in tablet_sink	2020-03-24 10:52:57 +08:00
Mingyu Chen	473a67a5b8	[Syntax] Remove all EmptyStmt from the end of multi-statements list (#3140 ) to resolve the ISSUE: #3139 When a user executes a query through a client library such as python MysqlDb, and the query looks like: "select * from tbl1;" (with a semicolon at the end of the statement), the sql parser will produce 2 statements: `SelectStmt` and `EmptyStmt`. Here we discard the `EmptyStmt` to make it act like one single statement. This is for compatibility: in python MysqlDb, if the first `SelectStmt` results in some warnings, it will try to execute a `SHOW WARNINGS` statement right after the SelectStmt, but before the execution of `EmptyStmt`. So there will be an exception: `(2014, "Commands out of sync; you can't run this command now")` I think this is a flaw in python MysqlDb. However, in order to keep the user experience consistent, here we remove all EmptyStmt at the end to prevent errors (leaving at least one statement). But if the user executes statements like: `"select * from tbl1;;select 2"` and the first `select * from tbl1` has warnings, python MysqlDb will still throw an exception.	2020-03-23 09:39:22 +08:00
Mingyu Chen	0f14408f13	[Temp Partition] Support loading data into temp partitions (#3120 ) Related issue: #2663, #2828. This CL supports loading data into specified temporary partitions. ``` INSERT INTO tbl TEMPORARY PARTITIONS(tp1, tp2, ..) ....; curl .... -H "temporary_partition: tp1, tp, .. " .... LOAD LABEL db1.label1 ( DATA INFILE("xxxx") INTO TABLE `tbl2` TEMPORARY PARTITION(tp1, tp2, ...) ... ``` NOTICE: this CL changes the FE meta version to 77. There are 3 major changes in this CL ## Syntax reorganization Reorganized the syntax related to the `specify-partitions`. Removed some redundant syntax definitions, and unified the syntax related to the `specify-partitions` under one syntax entry. ## Meta refactor In order to be able to support specifying temporary partitions, I made some changes to the way the partition information in the table is stored. Partition information is now organized as follows: The following two maps are reserved in OlapTable for storing formal partitions: ``` idToPartition nameToPartition ``` Use the `TempPartitions` class for storing temporary partitions. All the partition attributes of the formal partition and the temporary partition, such as the range, the number of replicas, and the storage medium, are stored in the `partitionInfo` of the OlapTable. In `partitionInfo`, we use two maps to store the range of formal partition and temporary partition: ``` idToRange idToTempRange ``` Separate maps are used because the partition ranges of the formal partition and the temporary partition may overlap. Separate maps make it easier to check the partition range. All partition attributes except the partition range are stored using the same map, and the partition id is used as the map key. ## Method to get partition A table may contain both formal and temporary partitions. There are several methods to get the partition of a table. They are typically divided into two categories: 1. Get partition by id 2. Get partition by name According to different requirements, the caller may want to obtain a formal partition or a temporary partition. These methods are described below so that callers can obtain the partition by using the correct method. 1. Get by name This type of request usually comes from a user with partition names, such as `select * from tbl partition(p1);`. This type of request has clear information to indicate whether to obtain a formal or temporary partition. Therefore, we need to get the partition through this method: `getPartition(String partitionName, boolean isTemp)` To avoid modifying too much code, we keep `getPartition(String partitionName)`, which is the same as: `getPartition(partitionName, false)` 2. Get by id This type of request usually means that the previous step has obtained certain partition ids in some way, so we only need to get the corresponding partition through this method: `getPartition(long partitionId)`. This method will try to get both formal partitions and temporary partitions. 3. Get all partition instances Depending on the requirements, the caller may want to obtain all formal partitions, all temporary partitions, or all partitions. Therefore we provide 3 methods, and the caller chooses according to needs. `getPartitions()` `getTempPartitions()` `getAllPartitions()`	2020-03-19 15:07:01 +08:00
caiconghui	cb87a54c2b	[Syntax] Support schema keyword to be compatible with the mysql syntax (#3115 ) create schema db1; drop schema db1;	2020-03-16 17:17:49 +08:00
Mingyu Chen	14c088161c	[New Stmt] Support setting replica status manually (#1522 ) Sometimes a replica is broken on BE, but FE does not notice that. In this case, we have to manually delete that replica on BE. If hundreds of replicas need to be handled, this is a disaster. So I add a new stmt: ADMIN SET REPLICA STATUS which supports setting a tablet on a specified BE as BAD or OK.	2020-03-16 13:42:30 +08:00
wyb	01a4ab01c4	[Bug] Fix mapping columns not exist in the table schema (#3113 )	2020-03-14 22:45:39 +08:00
Mingyu Chen	4c98596283	[MysqlProtocol] Support MySQL multiple statements protocol (#3050 ) 2 Changes in this CL: ## Support multiple statements in one request like: ``` select 10; select 20; select 30; ``` ISSUE: #3049 To test this CL simply, you can use the mysql-client shell command tool: ``` mysql> delimiter // mysql> select 1; select 2; // +------+ \| 1 \| +------+ \| 1 \| +------+ 1 row in set (0.01 sec) +------+ \| 2 \| +------+ \| 2 \| +------+ 1 row in set (0.02 sec) Query OK, 0 rows affected (0.02 sec) ``` I add a new class called `OriginStatement.java`, to save the origin statement in string format with an index. This class is mainly for the following cases: 1. User sends a multi-statement to the non-master FE: `DDL1; DDL2; DDL3` 2. Currently we cannot separate the original string of a single statement from multiple statements. So we have to forward the entire statement to the Master FE. So I add an index in the forward request. `DDL1`'s index is 0, `DDL2`'s index is 1,... 3. When the Master FE handles the forwarded request, it will parse the entire statement, get 3 DDL statements, and use the `index` to get the specified statement. ## Optimized the display of syntax errors I have also optimized the display of syntax errors so that longer syntax errors can be fully displayed.	2020-03-13 22:21:40 +08:00
kangkaisen	aa540966c6	Output null for hll and bitmap column when select * (#2991 )	2020-03-13 11:59:30 +08:00
kangkaisen	d8c756260b	Rewrite count distinct to bitmap and hll (#3096 )	2020-03-13 11:44:40 +08:00
EmmyMiao87	c8705ccf12	[MaterializedView] Support dropping materialized view (#3068 ) `DROP MATERIALIZE VIEW [ IF EXISTS ] <mv_name> ON [db_name].<table_name>` Parameters: IF EXISTS: Do not throw an error if the materialized view does not exist. A notice is issued in this case. mv_name: The name of the materialized view to remove. db_name: The name of db to which materialized view belongs. table_name: The name of table to which materialized view belongs.	2020-03-11 18:16:24 +08:00
Mingyu Chen	172838175f	[Bug] Fix bug that index name in MaterializedViewMeta is not changed after schema change (#3048 ) The index name in MaterializedViewMeta is still with `__doris_shadow` prefix after schema change finished. In this CL, I just remove the index name field in MaterializedViewMeta, so that it would makes managing change of names less error-prone.	2020-03-09 10:11:16 +08:00
Mingyu Chen	7b30bbea42	[MaterializedView] Support different keys type between MVs and base table (#3036 ) Firstly, add materialized index meta in olap table The materialized index meta include index name, schema, schemahash, keystype etc. The information itself scattered in each map is encapsulated into MaterializedIndexMeta. Also the keys type of index meta maybe not same as keys type of base index after materialized view enabled. Secondly, support the deduplicate mv. If there is group by or aggregation function in create mv stmt, the keys type of mv is agg. At the same time, the keys type of base table is duplicate. For example Duplicate table (k1, k2, v1) MV (k1, k2) group by k1, k2 It should be aggregated during executing mv.	2020-03-05 18:19:18 +08:00
Mingyu Chen	c731c8b9bc	[Bug] Fix bug of NPE when get replication number from olap table (#3029 ) The default replication number of an olap table may not be set. Every time we call `getReplicationNum()`, we have to check if it returns null, which is inconvenient and may cause problems. So in this PR, I set a default value for the table's replication number. This bug was introduced by #2958	2020-03-05 12:18:38 +08:00
Mingyu Chen	c032d634f4	[FsBroker] Fix bug that broker cannot read file with %3A in name (#3028 ) The hdfs support file with name like: "2018-01-01 00%3A00%3A00", we should support it. Also change the default broker log level to INFO.	2020-03-04 11:03:01 +08:00
yangzhg	54aa0ed26b	[SetOperation] Change set operation from random shuffle to hash shuffle (#3015 ) use hash shuffle instead of random shuffle in set operation, prepare for intersect and except operation	2020-03-02 19:34:41 +08:00
EmmyMiao87	d151718e98	[MaterializedView] Fix bug that preAggregation is different between old and new selector (#3018 ) If there is no aggregated column in aggregate index, the index will be deduplicate table. For example: aggregate table (k1, k2, v1 sum) mv index (k1, k2) This kind of index is SPJG which same as `select k1, k2 from aggregate_table group by k1, k2`. It also need to check the grouping column using following steps. If there is no aggregated column in duplicate index, the index will be SPJ which passes the grouping verification directly. Also after the supplement of index, the new candidate index should be checked the output columns also.	2020-03-02 19:11:10 +08:00
Mingyu Chen	511c5eed50	[Doc] Modify format of some docs (#3021 ) Format of some docs are incorrect for building the doc website. * fix a bug that `gensrc` dir can not be built with -j. * fix ut bug of CreateFunctionTest	2020-03-02 19:07:52 +08:00
worker24h	ef4bb0c011	[RoutineLoad] Auto Resume RoutineLoadJob (#2958 ) When all backends restart, the routine load job can be resumed.	2020-03-02 13:27:35 +08:00
Mingyu Chen	df56588bb5	[Temp Partition] Support add/drop/replace temp partitions (#2828 ) This CL implements 3 new operations: ``` ALTER TABLE tbl ADD TEMPORARY PARTITION ...; ALTER TABLE tbl DROP TEMPORARY PARTITION ...; ALTER TABLE tbl REPLACE TEMPORARY PARTITION (p1, p2, ...); ``` User manual can be found in document: `docs/documentation/cn/administrator-guide/alter-table/alter-table-temp-partition.md` I did not update the grammar manual of `alter-table.md`. This manual is too confusing and too big, I will reorganize this manual after. This is the first part to implement the "overwrite load" feature mentioned in issue #2663. I will implement the "load to temp partition" feature in next PR. This CL also add GSON serialization method for the following classes (But not used): ``` Partition.java MaterializedIndex.java Tablet.java Replica.java ```	2020-03-01 21:30:34 +08:00
yangzhg	3b5a0b6060	[TPCDS] Implement the planner for set operation (#2957 ) Implement intersect and except planner. This CL does not implement intersect and except node in execution level.	2020-02-27 16:03:31 +08:00
EmmyMiao87	a3e588f39c	[MaterializedView] Implement new materialized view selector (#2821 ) This commit mainly implements the new materialized view selector which supports SPJ<->SPJG. Two parameters are currently used to regulate this function. 1. test_materialized_view: When this parameter is set to true, the user can create a materialized view for the duplicate table by using 'CREATE MATERIALIZED VIEW' command. At the same time, if the result of the new materialized views is different from the old version during the query, an error will be reported. This parameter is false by default, which means that the new version of the materialized view function cannot be enabled. 2. use_old_mv_selector: When this parameter is set to true, the result of the old version selector will be selected. If set to false, the result of the new version selector will be selected. This parameter is true by default, which means that the old selector is used. If the default values of the above two parameters do not change, there will be no behavior changes in the current version. The main steps for the new selector are as follows: 1. Predicates stage: This stage will mainly filter out all materialized views that do not meet the current query requirements. 2. Priorities stage: This stage will sort the results of the first stage and choose the best materialized view. The predicates phase is divided into 6 steps: 1. Calculate the predicate gap between the current query and view. 2. Whether the columns in the view can meet the needs of the compensating predicates. 3. Determine whether the group by columns of view match the group by columns of query. 4. Determine whether the aggregate columns of view match the aggregate columns of query. 5. Determine whether the output columns of view match the output columns of query. 6. Add partial materialized views The priorities phase is divided into two steps: 1. Find the materialized view that matches the best prefix index 2. Find the materialized view with the least amount of data The biggest difference between the current materialized view selector and the previous one is that it supports SPJ <-> SPJG.	2020-02-27 09:14:32 +08:00
Mingyu Chen	8f71b1025a	[Bug][Broker] Fix bug that Broker's alive status is inconsistent in different FEs In this CL, the isAlive field in FsBroker class will be persisted in metadata, to solve the problem describe in ISSUE: #2989 Notice: this CL update FeMetaVersion to 73	2020-02-25 22:27:27 +08:00
kangkaisen	fb5b58b75a	Add more constraints for bitmap column (#2966 )	2020-02-24 10:41:18 +08:00
Mingyu Chen	35b09ecd66	[JDK] Support OpenJDK (#2804 ) Support compile and running Frontend process and Broker process with OpenJDK. OpenJDK 13 is tested.	2020-02-20 23:47:02 +08:00
kangkaisen	ece8740c1b	Fix some function DATE type priority (#2952 ) 1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947. The following sql result is 0000, which is wrong. The result should be 1601 ``` select date_format('2020-02-19 16:01:12','%H%i'); ``` 2. Add constant Express plan test, ensure the FE constant Express compute result is right. 3. Remove the `castToInt ` function in `FEFunctions`, which is duplicated with `CastExpr::getResultValue` 4. Implement `getNodeExplainString` method for `UnionNode`	2020-02-20 20:45:45 +08:00
Mingyu Chen	a015cd0c8b	[Alter] Change table's state right after all rollup jobs being cancelled	2020-02-19 19:45:35 +08:00
kangkaisen	a76f2b8211	bitmap_union_count support window function (#2902 )	2020-02-19 14:33:05 +08:00
yangzhg	7be2871c36	[GroupingSet] Disable column both in select list and aggregate functions when using GROUPING SETS/CUBE/ROLLUP (#2921 )	2020-02-18 13:56:56 +08:00
kangkaisen	625411bd28	Doris support in memory olap table (#2847 )	2020-02-18 10:45:54 +08:00
Mingyu Chen	0fb52c514b	[UDF] Fix bug that UDF can't handle constant null value (#2914 ) This CL modify the `evalExpr()` of ExpressionFunctions, so that it won't change the `FunctionCallExpr` to `NullLiteral` when there is null parameter in UDF. Which will fix the problem described in ISSUE: #2913	2020-02-17 22:13:50 +08:00
Mingyu Chen	1e3b0d31ea	[Rollup] Change table's state right after all rollup jobs are done (#2904 ) In the current implementation, the state of the table will be set until the next round of job scheduling. So there may be tens of seconds between job completion and table state changes to NORMAL. And also, I made the synchronized range smaller by replacing the synchronized methods with synchronized blocks, which may solve the problem described in #2903	2020-02-14 21:28:51 +08:00
yangzhg	ed95352ecd	support intersect and except syntax (#2882 )	2020-02-13 16:48:46 +08:00
yangzhg	3e160aeb66	[GroupingSet] fix a bug when using grouping set without all column in a grouping set item (#2877 ) fix a bug when using grouping sets without all column in a grouping set item will produce wrong value. fix grouping function check will not work in group by clause	2020-02-12 21:50:12 +08:00
wangbo	1f001481ae	Support batch add and drop rollup indexes #2671 (#2781 )	2020-02-11 12:58:01 +08:00
Mingyu Chen	bb4a7381ae	[UnitTest] Support starting mocked FE and BE process in unit test (#2826 ) This CL implements a simulated FE process and a simulated BE service. You can view their specific usage methods at `fe/src/test/java/org/apache/doris/utframe/DemoTest.java` At the same time, I modified the configuration of the maven-surefire-plugin plugin, so that each unit test runs in a separate JVM, which can avoid conflicts caused by various singleton classes in FE. Starting a separate jvm for each unit test will bring about 30% extra time overhead. However, you can control the number of concurrency of unit tests by setting the `forkCount` configuration of the maven-surefire-plugin plugin in `fe/pom.xml`. The default configuration is still 1 for easy viewing of the output log. If set to 3, the entire FE unit test run time is about 4 minutes.	2020-02-03 21:17:57 +08:00
Mingyu Chen	bb00f7e656	[Load] Fix bug of wrong file group aggregation when handling broker load job (#2824 ) Describe the bug First, in the broker load, we allow users to add multiple data descriptions. Each data description represents a description of a file (or set of files), including file path, delimiter, table and partitions to be loaded, and other information. When the user specifies multiple data descriptions, Doris currently aggregates the data descriptions belonging to the same table and generates a unified load task. The problem here is that although different data descriptions point to the same table, they may specify different partitions. Therefore, the aggregation of data descriptions should consider not only the table level, but also the partition level. Examples are as follows: data description 1 is: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1") INTO TABLE `tbl1` PARTITION (p1, p2) ``` data description 2 is: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2") INTO TABLE `tbl1` PARTITION (p3, p4) ``` What the user expects is to load file1 into partitions p1 and p2 of tbl1, and load file2 into partitions p3 and p4 of the same table. But currently, they will be aggregated together, which results in loading file1 and file2 into all partitions p1, p2, p3 and p4. Second, the following 2 data descriptions are not allowed: ``` DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1") INTO TABLE `tbl1` PARTITION (p1, p2) DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2") INTO TABLE `tbl1` PARTITION (p2, p3) ``` They have an overlapping partition (p2), which is not supported yet, and we should throw an Exception to cancel this load job. Third, there is a problem with the code implementation. In the constructor of `OlapTableSink.java`, we pass in a string of partition names separated by commas. But at the `OlapTableSink` level, we should be able to pass in a list of partition ids directly, instead of names. ISSUE: #2823	2020-02-03 20:15:13 +08:00
xy720	2a30ac2ba5	[SQL] Return NullLiteral in castTo method instead of throwing a exception (#2799 )	2020-01-21 10:20:31 +08:00
caiconghui	9dc9051930	Remove unused code for ShowPartitionsStmtTest and add apache license header (#2808 )	2020-01-20 22:51:26 +08:00
caiconghui	58ff952837	[Stmt] Support new show functions syntax to make user search function more conveniently (#2800 ) SHOW [FULL] [BUILTIN] FUNCTIONS [IN\|FROM db] [LIKE 'function_pattern'];	2020-01-20 14:14:42 +08:00
WingC	92d8f6ae78	[Alter] Allow submitting alter jobs when table is unstable Alter job will wait table to be stable before running.	2020-01-18 22:56:37 +08:00
caiconghui	ae018043b0	[Alter] Support replication_num setting for table level (#2737 ) Support replication_num setting for table level, so There is no need for user to set replication_num for every alter table add partition statement. eg: `alter table tbl set ("default.replication_num" = "2");`	2020-01-18 21:17:22 +08:00
yangzhg	fc55423032	[SQL] Support Grouping Sets, Rollup and Cube to extend group by statement Support Grouping Sets, Rollup and Cube to extend group by statement support GROUPING SETS syntax ``` SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) ); ``` cube or rollup like ``` SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP\|CUBE(a,b,c) ``` [ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039) [FIX] fix analyzer error in window function(#2039)	2020-01-17 16:24:02 +08:00
xy720	463c0e87ec	Replace PowerMock/EasyMock by Jmockit (4/4) (#2784 ) This commit replaces the PowerMock/EasyMock in our unit tests. (All)	2020-01-17 14:09:00 +08:00
xy720	753a7dd73a	Replace PowerMock/EasyMock by Jmockit (3/4)	2020-01-16 13:24:43 +08:00
xy720	9bc306d17c	Replace PowerMock/EasyMock by Jmockit (2/4) (#2749 )	2020-01-15 20:31:30 +08:00
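
The trailing-`EmptyStmt` cleanup described in commit 473a67a5b8 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Doris FE code (which is Java): the function name `strip_trailing_empty` is hypothetical, and `None` stands in for an `EmptyStmt` in the parsed statement list.

```python
def strip_trailing_empty(stmts):
    """Drop empty statements from the end of a parsed statement list.

    `None` is a hypothetical stand-in for an EmptyStmt. At least one
    statement is always kept, as the commit message requires.
    """
    end = len(stmts)
    # Walk back from the end while the last kept statement is empty,
    # but never shrink the list below one element.
    while end > 1 and stmts[end - 1] is None:
        end -= 1
    return stmts[:end]


# "select * from tbl1;" parses to a select followed by an empty statement.
print(strip_trailing_empty(["select * from tbl1", None]))
# "select * from tbl1;;select 2": the empty statement in the middle stays.
print(strip_trailing_empty(["select * from tbl1", None, "select 2"]))
```

Only trailing empty statements are removed, so an empty statement in the middle of the list survives, which matches the caveat in the commit message about `"select * from tbl1;;select 2"`.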
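
The partition-aware aggregation described in commit bb00f7e656 can be illustrated with a small Python sketch. It is not the Doris implementation: the function name `group_data_descriptions` and the `(table, partitions, path)` tuple layout are assumptions made for the example, and descriptions without a partition list are not treated specially here.

```python
def group_data_descriptions(descs):
    """Group (table, partitions, path) tuples by table AND partition set.

    Hypothetical sketch: descriptions that target the same table with the
    same partitions share one group; partially overlapping partition lists
    on the same table are rejected.
    """
    groups = {}
    for table, partitions, path in descs:
        key = (table, frozenset(partitions))
        groups.setdefault(key, []).append(path)

    # Distinct partition sets on one table must not share any partition.
    seen = {}
    for table, parts in groups:
        for other in seen.get(table, []):
            if parts & other:
                raise ValueError(
                    f"overlapping partitions on {table}: {sorted(parts & other)}"
                )
        seen.setdefault(table, []).append(parts)
    return groups


groups = group_data_descriptions([
    ("tbl1", ["p1", "p2"], "file1"),
    ("tbl1", ["p3", "p4"], "file2"),
])
for (table, parts), paths in groups.items():
    print(table, sorted(parts), paths)
```

With the two non-overlapping descriptions from the commit message, `file1` and `file2` end up in separate groups; replacing the second partition list with `(p2, p3)` would raise `ValueError`.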


234 Commits