doris

Author	SHA1	Message	Date
EmmyMiao87	aa4ac2d078	[Bug] Serialize storage format in rollup job (#3686 ) The segment v2 rollup job should set the storage format v2 and serialize it. If it is not serialized, the rollup of segment v2 may use the error format 'segment v1'.	2020-05-26 15:35:12 +08:00
Mingyu Chen	1124808fbc	[Enhancement] Add detail msg to show the reason of publish failure. (#3647 ) Add 2 new columns `PublishTime` and `ErrMsg` to show publish version time and errors happen during the transaction process. Can be seen by executing: `SHOW PROC "/transactions/dbId/";` or `SHOW TRANSACTION WHERE ID=xx;` Currently is only record error happen in publish phase, which can help us to find out which txn is blocked. Fix #3646	2020-05-22 22:59:53 +08:00
yangzhg	00d563d014	[SQL] Support more syntax in case when clause (#3625 ) support support more syntax in case-when clause with subquey. suport query like ` case when k1 > subquery1 and k2 < subquey2 then ... else ... ` or `case when subquey in null then ...`	2020-05-22 10:22:59 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris support load json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
Mingyu Chen	4cbcae1574	[Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631 ) Mainly changes: 1. Shade and provide the thrift lib in spark-doris-connector 2. Add a `build.sh` for spark-doris-connector 3. Move the README.md of spark-doris-connector to `docs/` 4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`	2020-05-19 14:20:21 +08:00
Mingyu Chen	9d72d1bb87	[Refactor] Refactor some redundant code && Replace some UT by UtFrameUtils This CL have no logic changed, just do some code refactor and use new UtFrameWork to replace some old UT. NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:53:59 +08:00
Mingyu Chen	c2c81d58dc	[Fix]SlotRef.tosql() is the same as the SQL returned by different sql Fix: #3555 NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:47:48 +08:00
Mingyu Chen	7a83c5662d	[Bug] fix OrCompoundPredicate predicate fold bug #3596 Fix: #3596 NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:46:34 +08:00
Mingyu Chen	903592d82b	Revert "Refactor some redunant code && Replace some UT by UtFrameUtils" (#3613 ) This revert is used to correct the mess of the commit timeline caused by the wrong merge method.	2020-05-18 13:11:39 +08:00
Mingyu Chen	539efb3532	Revert "[Fix]SlotRef.tosql() is the same as the SQL returned by different sql" (#3610 ) This revert is used to correct the mess of the commit timeline caused by the wrong merge method.	2020-05-18 13:07:21 +08:00
Mingyu Chen	20f20239f2	Revert "[Bug] fix OrCompoundPredicate predicate fold bug #3596 " (#3609 ) This revert is used to correct the mess of the commit timeline caused by the wrong merge method.	2020-05-18 13:03:24 +08:00
Mingyu Chen	2f3b7b5b8e	[Refactor] Refactor some redundant code && Replace some UT by UtFrameUtils	2020-05-18 10:53:32 +08:00
Mingyu Chen	62f746fc87	[Fix] SlotRef.tosql() is the same as the SQL returned by different sql	2020-05-18 10:41:15 +08:00
Mingyu Chen	e6588981b4	[Bug] fix OrCompoundPredicate predicate fold bug #3596 (#3597 ) * [Bug] fix OrCompoundPredicate predicate fold bug * fix code style	2020-05-18 10:36:13 +08:00
wutiangan	5138197d57	[Bug] generate exceptions to avoid mulitDistinctAggregation produces wrong results (#3561 ) when a query (#3492) contain “2 DistinctAggregation with one column” and “1 DistinctAggregation with two columns”, it will produce wrong result. This pull request is not to solve this problem really, but to generate exceptions to avoid getting wrong results. This problem needs a real repair in future.	2020-05-16 21:36:43 +08:00
marising	4217db00d3	Tosql method returns slot index and column name	2020-05-15 17:31:25 +08:00
wutiangan	0919407092	[Bug] fix OrCompoundPredicate predicate fold bug	2020-05-15 10:20:13 +08:00
wutiangan	9f224cdd8a	[Bug] Fix bug of Partition prune of constant in predicate (#3476 ) 1. phenomenon： The following two statements are the same, but a query has results and the other query has no results mysql> select * from (select '积极' as kk1, sum(k2) from table_range where k1 = '2013-01-01' group by kk1)tt where kk1 = '积极'; +--------+-----------+ \| kk1 \| sum(`k2`) \| +--------+-----------+ \| 积极 \| 1 \| +--------+-----------+ 1 row in set (0.01 sec) mysql> select * from (select '积极' as kk1, sum(k2) from table_range where k1 = '2013-01-01' group by kk1)tt where kk1 in ('积极'); Empty set (0.01 sec) 2. reason： In partition prune, constant in predicate（‘积极’ in ‘积极’） is mistakenly considered to meet partition prune conditions, and mistakenly regarded as partition prune column. Then in partition prune , no partition is considered to meet the requirements, so it is planned to be 0 partition in query planning	2020-05-14 11:46:13 +08:00
Mingyu Chen	ca7c0717cd	Fix compile bug (#3557 )	2020-05-12 10:24:37 +08:00
caiconghui	b648734441	[TxxMgr] Support txn management in db level and use ArrayDeque to improve txn task performance (#3369 ) This PR is the first step to make Doris stream load more robust with higher concurrent performance(#3368)，the main work is to support txn management in db level isolation and use ArrayDeque to stored final status txns.	2020-05-11 23:32:43 +08:00
WingC	4294301c53	Throw DdlException when use `admin set frontend config` (#3539 ) The set more than one config in a single set config stmt, an exception will be thrown to forbid the operation.	2020-05-11 23:29:38 +08:00
wangcong18	1b14dd4426	Refactor some redunant code && Replace some UT by UtFrameUtils This CL have no logic change, just do some code refacotr and UT Change.	2020-05-08 12:07:33 +08:00
Mingyu Chen	084515317f	[Bug] Fix constant In Predicate result error (#3511 ) `select 1 not in (2, NULL, 1);` should return `0`	2020-05-08 11:30:11 +08:00
wangbo	d60bb81cb0	[SQL Function] Calculate 'case when expr' when possible (#3396 ) Calculate 'case when expr' when possible	2020-05-07 22:04:09 +08:00
Mingyu Chen	ca36dc697f	[Bug] Fix bug that push down logic error on semi join (#3481 ) For SQL like: ``` select * from join1 left semi join join2 on join1.id = join2.id and join2.id > 1; ``` the predicate `join2.id > 1` can not be pushed down to table join1.	2020-05-07 09:30:30 +08:00
Mingyu Chen	101628c813	[Bug] Fix bug of predicate pushdown logic (#3475 ) When there is subquery in where clause, the query will be rewritten to join operation. And some auxiliary binary predicates will be generated. These binary predicates will not go through the ExprRewriteRule, so they are not normalized as "column to the left and constant to the right" format. We need to take this case into account so that the `canPushDownPredicate()` judgement will not throw exception.	2020-05-06 15:15:37 +08:00
kangkaisen	caa7a07c70	[Query Plan]Support simple transitivity on join predicate pushdown (#3453 ) Current implement is very simply and conservative, because our query planner is error-prone. After we implement the new query planner, we could do this work by `Predicate Equivalence Class` and `PredicatePushDown` rule like presto.	2020-05-04 15:32:19 +08:00
Mingyu Chen	a5922051c9	[Fix] Fix bug that rowset meta is deleted after compaction (#3451 ) * [Fix] Fix bug that rowset meta is deleted after compaction After compaction, the tablet rowset meta will be modified by adding to new output rowsets and deleting the old input rowsets. The output version may equals to the input version. So we should delete the "input" version from _rs_version_map before adding the "output" version to _rs_version_map. Otherwise, the new "output" version will be lost in _rs_version_map.	2020-05-04 09:45:25 +08:00
wangbo	da4d2d2699	[UT] Fix UT bug (#3456 ) SSD cool downtime shouldn't be fix time in UT;	2020-05-03 16:24:08 +08:00
Mingyu Chen	7ef1e2ce5b	[Bug] Fix bug that load data to wrong temp partitions (#3422 ) When loading data without specifying partition, the data should only be loaded to formal partitions, not including temp partitions;	2020-04-30 15:11:28 +08:00
Mingyu Chen	c9ec4e8a73	[UT] Fix AlterTest UT failed (#3437 )	2020-04-30 14:40:33 +08:00
caiconghui	42d14028a0	Use ThreadPoolManager to create threadPool and add some prometheus metrics about pool (#3386 )	2020-04-25 15:57:15 +08:00
yangzhg	0e66385235	[SQL] Disable some unsupported syntax (#3357 ) Disable some syntax when subquery is not binary predicate in case when clause.	2020-04-24 22:01:35 +08:00
EmmyMiao87	07a9401f82	Forbidden correlated having clause (#3378 ) 1. The correlated slot ref should be bound by the agg tuple of outer query. However, the correlated having clause can not be analyzed correctly so the result is incorrect. For example: ``` SELECT k1 FROM test GROUP BY k1 HAVING EXISTS(SELECT k1 FROM baseall GROUP BY k1 HAVING SUM(test.k1) = k1); ``` The correlated predicate is not executed. 2. The limit offset should also be rewritten when there is subquery in having clause. For example: ``` select k1, count(*) cnt from test group by k1 having k1 in (select k1 from baseall order by k1 limit 2) order by k1 limit 5 offset 3; ``` The new stmt should has a limit element with offset.	2020-04-24 21:34:40 +08:00
xy720	09eb40e356	[New Stmt] Alter replication number for table (#3360 ) This CL add new command to set replication number of table in one time. ``` alter table test_tbl set ("replication_num" = "3"); ``` It changes replication num of a unpartitioned table. and ``` alter table test_tbl set ("default.replication_num" = "3"); ``` It changes default replication num of the specified table.	2020-04-23 21:58:09 +08:00
WingC	a88ae53326	[Bug]Use OlapTableSink::close to replace OlapTableSink::finalize method to avoid OOM (#3363 ) This CL mainly solve the problem that when recycle `OlapTableSink` object, GC thread will not do it immediately because the class override the `finalize` method, and it will cause OOM.	2020-04-22 19:51:04 +08:00
Mingyu Chen	c6ac60bab9	[SegmentV2] Optimize the upgrade logic of SegmentV2 (#3340 ) This CL mainly made the following modifications: 1. Reorganized SegmentV2 upgrade document. 2. When the variable `use_v2_rollup` is set to true, the base rollup in v2 format is forcibly queried for verifying the data. 3. Fix a problem that there is no persistent storage format information in the schema change operation that performs v2 conversion. 4. Allow users to directly create v2 format tables.	2020-04-21 10:45:29 +08:00
Mingyu Chen	a2c8d14fd9	[Bug] Partition key's type has been changed after executing queries (#3348 ) Expr's `uncheckedCastTo()` method should return a new instance of casted expr. The origin expr should remain unchanged.	2020-04-21 08:30:02 +08:00
Mingyu Chen	46272a5621	[Bug] Fix bug of TransactionState SerDe error (#3356 ) The TransactionState's coordinator should be created when deserialized from old meta.	2020-04-21 08:24:10 +08:00
xy720	c223d37c99	[Delete] Make some correct in delete operation (#3338 ) #3190 1. Correct the directory of DeleteJob.java 2. Fix some logic fault in DeleteHandlerTest.java 3. Add timeout value in log and exception	2020-04-19 11:49:02 +08:00
kangpinghuang	77a7037346	Fix cooldown timestamp bug (#3336 ) when add a parition with storage_cooldown_time property like this: alter table tablexxx ADD PARTITION p20200421 VALUES LESS THAN("1588262400") ("storage_medium" = "SSD", "storage_cooldown_time" = "2020-05-01 00:00:00"); and show partitions from tablexxx; the CooldownTime is wrong: 2610-02-17 10:16:40, and what is more, the storage migration is based on the wrong timestamp. The reason is that the result of DateLiteral.getLongValue is not timestamp.	2020-04-18 22:47:22 +08:00
WingC	9331574818	[Transaction] Cancel all txns whose coordinate BE is down. (#3293 ) This CL solve problem: - FE can't aware Coordinate BE down and cancel the txns because the txns can't finish. - Do some code style refactor NOTICE: FE meta version upgrade to 83	2020-04-17 11:24:03 +08:00
xy720	b29cb9dbb3	[Optimize][Delete] Simplify the delete process to make it fast (#3191 ) Our current DELETE strategy reuses the LoadChecker framework. LoadChecker runs jobs in different stages by polling them in every 5 seconds. There are four stages of a load job, Pending/ETL/Loading/Quorum_finish, each of them is allocated to a LoadChecker. Four example, if a load job is submitted, it will be initialized to the Pending state, then wait for running by the Pending LoadChecker. After the pending job is ran, its stage will change to ETL stage, and then wait for running by the next LoadChecker(ETL). Because interval time of the LoadChecker is 5s, in worst case, a pending job need to wait for 20s during its life cycle. In particular, the DELETE jobs do not need to wait for polling, they can run the pushTask() function directly to delete. In this commit, I add a delete handler to concurrently processing delete tasks. All delete tasks will push to BE immediately, not required to wait for LoadCheker, without waiting for 2 LoadChecker(delete job started in LOADING state), at most 10s will be save(5s per LoadCheker). The delete process now is synchronized and users get response only after the delete finished or be canceled. If a delete is running over a certain period of time, it will be cancelled with a timeout exception. NOTICE: this CL upgrade FE meta version to 82	2020-04-16 10:32:44 +08:00
Mingyu Chen	e61793763a	[Bug] Use equals() method to judge whether "type" are equal (#3310 ) I don't why, but I found that sometimes when I use "==" to judge the equality of type, it return false, even if the types are exactly same. ISSUE: #3309 This CL only changes == to equals() to solve the problem, but the reason is still unknown.	2020-04-15 15:04:13 +08:00
caiconghui	9257535f91	[New Feature] Support setting replica quota in db level (#3283 ) This PR is to limit the replica usage, admin need to know the replica usage for every db and table, be able to set replica quota for every db. ``` ALTER DATABASE db_name SET REPLICA QUOTA quota; ```	2020-04-14 22:25:32 +08:00
lichaoyong	3086790e06	Fix bug when use ZoneMap/BloomFiter on column with REPLACE/REPLACE_IF_NOT_NULL (#3288 ) Now, column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter when the rowset is base(version starts with zero). Always we think is an optimization. But when some case, it will occurs bug. create table test( k1 int, v1 int replace, v2 int sum ); If I have two records on different two versions 1 2 2 on version [0-10] 1 3 1 on version 11 If I perform a query select * from test where k1 = 1 and v1 = 3; The result will be 1 3 1, this is not right because of the first record is filtered. The right answer is 1 3 3, the v2 should be summed. Remove this optimization is necessity to make the result is right.	2020-04-10 10:22:21 +08:00
Mingyu Chen	ce1d5ab9ab	[Bug] Fix some bugs of install/uninstall plugins (#3267 ) 1. Avoid losing plugin if plugin failed to load when replaying When in replay process, the plugin should always be added to the plugin manager, even if that plugin failed to be loaded. 2. `show plugin` statement should show all plugins, not only the successfully installed plugins. 3. plugin's name should be unique globally and case insensitive. 4. Avoid creating new instances of plugins when doing metadata checkpoint. 5. Add a __builtin_ prefix for builtin plugins.	2020-04-09 23:04:28 +08:00
HangyuanLiu	037bc53b54	[BUG] Fix cast result expr bug (#3279 ) When the result type is a date type, the result expr type should not be cast. Because in the FE function, the specific type of the date type is determined by the actual type of the return value, not by the function return value type. For example, the function `str_to_date` may return DATE or DATETIME, depends on the format pattern. DATE: ``` mysql> select str_to_date('11/09/2011', '%m/%d/%Y'); +---------------------------------------+ \| str_to_date('11/09/2011', '%m/%d/%Y') \| +---------------------------------------+ \| 2011-11-09 \| +---------------------------------------+ ``` DATETIME: ``` mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s'); +---------------------------------------------------------+ \| str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') \| +---------------------------------------------------------+ \| 2014-12-21 12:34:56 \| +---------------------------------------------------------+	2020-04-09 22:02:05 +08:00
yangzhg	8699bb7bd4	[Query] Optimize where clause by extracting the common predicate in the OR compound predicate. (#3278 ) Queries like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now Doris will create only one cross join node to execute this sql, the time of eval the where clause is about 200-300 ns, the total count of eval will be 2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds； ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ``` but this sql can be rewrite to ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ss_addr_sk = ca_address_sk and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` there for we can do a hash join first and then use ``` (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` to filter the value, in TPCDS 10g dataset, the rewritten sql only cost about 1 seconds.	2020-04-09 21:57:45 +08:00
lichaoyong	c9c58342b2	[License] Add License to codes (#3272 )	2020-04-07 16:35:13 +08:00

1 2 3 4 5 ...

298 Commits