doris

Author	SHA1	Message	Date
Yingchun Lai	37fccd53c4	[Tablet] A small refactor on class Tablet (#3339 ) There are no functional changes in this patch. Key refactor points are: - Remove meaningless return value of functions in class Tablet, and also some related functions in other classes - Allow RowsetGraph::capture_consistent_versions to pass a nullptr to the output parameter - Use CHECK instead of LOG(FATAL) to simplify code	2020-04-24 22:22:26 +08:00
yangzhg	0e66385235	[SQL] Disable some unsupported syntax (#3357 ) Disable some syntax when subquery is not binary predicate in case when clause.	2020-04-24 22:01:35 +08:00
HappenLee	4eb27bc7e3	[Profile] Make running profile clearer and more intuitive to improve usability (#3365 ) (#3383 ) This CL mainly made the following modifications: 1. Delete Invalid method in Running Profile Class. 2. Move Memlimit Counter from blockmgr to fragment and add PeakMemUsage Counter 3. Fix the bug of buffer pool memlimit counter 4. Call compute_time_in_profile() before pretty_print() to show the _local_time_percent without child running profile 5. Add TransferThread ThreadToken count in AveThreadToken Counter	2020-04-24 21:38:55 +08:00
EmmyMiao87	07a9401f82	Forbidden correlated having clause (#3378 ) 1. The correlated slot ref should be bound by the agg tuple of the outer query. However, the correlated having clause cannot be analyzed correctly, so the result is incorrect. For example: ``` SELECT k1 FROM test GROUP BY k1 HAVING EXISTS(SELECT k1 FROM baseall GROUP BY k1 HAVING SUM(test.k1) = k1); ``` The correlated predicate is not executed. 2. The limit offset should also be rewritten when there is a subquery in the having clause. For example: ``` select k1, count(*) cnt from test group by k1 having k1 in (select k1 from baseall order by k1 limit 2) order by k1 limit 5 offset 3; ``` The new stmt should have a limit element with offset.	2020-04-24 21:34:40 +08:00
lichaoyong	7715deed4e	[Doc] Add download link for 0.12.0 release (#3388 )	2020-04-24 21:04:19 +08:00
xy720	09eb40e356	[New Stmt] Alter replication number for table (#3360 ) This CL adds a new command to set the replication number of a table in one step. ``` alter table test_tbl set ("replication_num" = "3"); ``` It changes the replication num of an unpartitioned table. ``` alter table test_tbl set ("default.replication_num" = "3"); ``` It changes the default replication num of the specified table.	2020-04-23 21:58:09 +08:00
yangzhg	a58bc1957e	Fix expect may produce incorrect values (#3381 )	2020-04-23 09:35:41 +08:00
HangyuanLiu	ad6698cd31	[Performance] Use Google/CCTZ to replace boost at timezone function (#3300 ) NOTICE: the thirdparty dependency need to upgrade to add libcctz.	2020-04-23 09:26:04 +08:00
Mingyu Chen	d854a79878	[Bug] `isQuery` field should be reset at the beginning of query execution (#3374 ) If not reset, all queries coming from the same session will have the same isQuery field value. This bug causes all entries in fe.audit.log to have the same IsQuery=true. This CL also fixes another bug: the resolved IPs of a user's domain should not appear in another user's white list. Fix #3380	2020-04-23 09:00:47 +08:00
Yingchun Lai	4a7a88ede1	[LSAN] Fix some memory leak detected by LSAN (#3326 )	2020-04-22 22:59:44 +08:00
WingC	a88ae53326	[Bug]Use OlapTableSink::close to replace OlapTableSink::finalize method to avoid OOM (#3363 ) This CL mainly solves the problem that when an `OlapTableSink` object is recycled, the GC thread does not reclaim it immediately because the class overrides the `finalize` method, which can cause OOM.	2020-04-22 19:51:04 +08:00
Yingchun Lai	22e90f7260	[SegmentV2] Fix bloom filter bits buffer not initialize as 0 (#3372 )	2020-04-22 19:50:05 +08:00
wyb	2de78e50e2	[Bug] Fix authorization missing when auditloader plugin redirect stream load (#3367 ) HttpURLConnection can automatically redirect stream load to BE, but there is no authorization information in the HTTP request headers after the redirect. HttpURLConnection may remove the authorization info when it follows the redirect. The solution is to set the followRedirect property to false on the connection object and perform the redirect request manually. #3364	2020-04-21 22:03:18 +08:00
jmk1011	5c53e0fee7	[UnitTest] Modify test to be compatible with coverage tool (#3366 ) C ++ R syntax is not compatible with coverage tools, so modify the syntax for test case.	2020-04-21 21:23:17 +08:00
Mingyu Chen	c6ac60bab9	[SegmentV2] Optimize the upgrade logic of SegmentV2 (#3340 ) This CL mainly made the following modifications: 1. Reorganized SegmentV2 upgrade document. 2. When the variable `use_v2_rollup` is set to true, the base rollup in v2 format is forcibly queried for verifying the data. 3. Fix a problem that there is no persistent storage format information in the schema change operation that performs v2 conversion. 4. Allow users to directly create v2 format tables.	2020-04-21 10:45:29 +08:00
Yunfeng,Wu	b60aabda11	[Doris On ES] Pushdown some castexpr predicate to ES (#3351 ) Process castexpr, such as: k (float) > 2.0, k(int) > 3.2. Doris On ES should skip the native Doris cast transformation on every row's column value and instead push this `cast semantic` down to Elasticsearch. I believe that in this `predicate` situation, this would decrease the amount of data transmitted. k1 is float: ``` k1 >= 5 ``` push-down filter: ``` {"range":{"k1":{"gte":"5.000000"}}} ``` k2 is int : ``` k2 > 3.2 ``` push-down filter: ``` {"range":{"k2":{"gte":"3.2"}}} ```	2020-04-21 08:34:20 +08:00
Mingyu Chen	a2c8d14fd9	[Bug] Partition key's type has been changed after executing queries (#3348 ) Expr's `uncheckedCastTo()` method should return a new instance of casted expr. The origin expr should remain unchanged.	2020-04-21 08:30:02 +08:00
Mingyu Chen	46272a5621	[Bug] Fix bug of TransactionState SerDe error (#3356 ) The TransactionState's coordinator should be created when deserialized from old meta.	2020-04-21 08:24:10 +08:00
WingC	94b7bb5ad6	[Bug][Dynamic Partition]Fix Bug that dynamic partition properties is not consistent (#3359 )	2020-04-20 23:52:47 +08:00
wutiangan	c69bf9ac44	[New Stmt] Add SHOW KEYS gramma (#3342 ) support `SHOW KEYS FROM table` for the data connector of mainstream BI tools like PowerBI/FineBI #3334	2020-04-20 15:58:20 +08:00
Yunfeng,Wu	753d6cc19f	Add LOG.isDebugEnabled for some debug logical of Coordinator (#3352 ) This may very slightly affect the performance or not.	2020-04-20 08:30:57 +08:00
wangbo	929e93699a	Fix Colocate Join Bug (#3354 ) 1 Fix sync error colocate group status between fe 2 Fix losing call of EditLog.logColocateRemoveTable	2020-04-20 08:29:34 +08:00
xy720	c223d37c99	[Delete] Make some correct in delete operation (#3338 ) #3190 1. Correct the directory of DeleteJob.java 2. Fix some logic fault in DeleteHandlerTest.java 3. Add timeout value in log and exception	2020-04-19 11:49:02 +08:00
Stalary	1d3370532b	[Doc] Fix some typo, mod routine load doc (#3350 ) Fix BOOLEAN typo, improve the routine load sample	2020-04-19 11:39:10 +08:00
xy720	31ebb2496d	[ISSUE #3190 ]Add documents for delete simplifly (#3335 )	2020-04-18 22:48:18 +08:00
kangpinghuang	77a7037346	Fix cooldown timestamp bug (#3336 ) When adding a partition with the storage_cooldown_time property like this: alter table tablexxx ADD PARTITION p20200421 VALUES LESS THAN("1588262400") ("storage_medium" = "SSD", "storage_cooldown_time" = "2020-05-01 00:00:00"); and then running show partitions from tablexxx; the CooldownTime is wrong: 2610-02-17 10:16:40. What is more, the storage migration is based on the wrong timestamp. The reason is that the result of DateLiteral.getLongValue is not a timestamp.	2020-04-18 22:47:22 +08:00
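The wrong CooldownTime reported in the commit above can be reproduced with a short sketch. It assumes (the commit does not say so) that `DateLiteral.getLongValue` returns the date packed as the decimal number `yyyyMMddHHmmss` and that this number was then read as epoch milliseconds. The helper names `packed_long` and `from_epoch_millis` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # the +08:00 offset used throughout this log
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def packed_long(dt):
    """Hypothetical stand-in: pack a datetime as the decimal number yyyyMMddHHmmss."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


def from_epoch_millis(ms):
    """Interpret a number as milliseconds since the Unix epoch, shown in +08:00."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(CST)


cooldown = datetime(2020, 5, 1, 0, 0, 0, tzinfo=CST)

# Assumed bug: the packed number is treated as if it were epoch milliseconds.
wrong = from_epoch_millis(packed_long(cooldown))
# Intended behaviour: convert the date to a real epoch timestamp first.
right = from_epoch_millis(int(cooldown.timestamp() * 1000))

print(wrong.strftime("%Y-%m-%d %H:%M:%S"))
print(right.strftime("%Y-%m-%d %H:%M:%S"))
```

Under that assumption the first line printed is the 2610-02-17 10:16:40 value quoted in the commit message, and the second is the intended 2020-05-01 00:00:00.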
caiconghui	67b0da5652	Fix rowset_meta race condition for commit_txn in TxnManager (#3330 )	2020-04-18 18:38:48 +08:00
Yunfeng,Wu	0624f6b9eb	[Doris On ES]Add simple explain for EsTable (#3341 ) related issue: #3306 Note: this PR just removes es_scan_node_test.cpp, which is useless. For the moment, it only adds a simple explain syntax for EsTable without translating the native predicates to ES queryDSL; that translation is better finished together with moving the predicate translation from Doris BE to Doris FE. The whole work is still WIP.	2020-04-18 10:04:03 +08:00
WingC	9331574818	[Transaction] Cancel all txns whose coordinate BE is down. (#3293 ) This CL solves the following problem: - FE is not aware that the Coordinate BE is down and does not cancel its txns, even though those txns can no longer finish. - Do some code style refactor NOTICE: FE meta version upgrade to 83	2020-04-17 11:24:03 +08:00
kangpinghuang	f3e5320fea	Fix document bug of storage_cooldown_time (#3333 )	2020-04-17 09:34:28 +08:00
caiconghui	224f5d8bad	[SegmentV1] Enable to read and write boolean type data (#3324 ) This PR is to enable to read and write boolean type data for segment v1	2020-04-16 23:39:08 +08:00
xy720	b29cb9dbb3	[Optimize][Delete] Simplify the delete process to make it fast (#3191 ) Our current DELETE strategy reuses the LoadChecker framework. LoadChecker runs jobs in different stages by polling them every 5 seconds. There are four stages of a load job, Pending/ETL/Loading/Quorum_finish, and each of them is assigned to a LoadChecker. For example, if a load job is submitted, it is initialized to the Pending state and then waits to be run by the Pending LoadChecker. After the pending job is run, its stage changes to ETL, and it then waits to be run by the next LoadChecker (ETL). Because the polling interval of the LoadChecker is 5s, in the worst case a pending job needs to wait 20s during its life cycle. DELETE jobs, in particular, do not need to wait for polling; they can run the pushTask() function directly to delete. In this commit, I add a delete handler to process delete tasks concurrently. All delete tasks are pushed to BE immediately instead of waiting for 2 LoadCheckers (a delete job starts in the LOADING state), so at most 10s is saved (5s per LoadChecker). The delete process is now synchronous, and users get a response only after the delete has finished or been canceled. If a delete runs longer than a certain period of time, it is cancelled with a timeout exception. NOTICE: this CL upgrades FE meta version to 82	2020-04-16 10:32:44 +08:00
Mingyu Chen	e61793763a	[Bug] Use equals() method to judge whether "type" are equal (#3310 ) I don't know why, but I found that sometimes when I use "==" to judge the equality of types, it returns false even if the types are exactly the same. ISSUE: #3309 This CL only changes == to equals() to solve the problem, but the reason is still unknown.	2020-04-15 15:04:13 +08:00
HuangWei	91438fcb40	[rowset id] Reduce memory of UniqueRowsetIdGenerator (#3316 )	2020-04-14 22:27:49 +08:00
caiconghui	9257535f91	[New Feature] Support setting replica quota in db level (#3283 ) This PR limits replica usage. Admins need to know the replica usage of every db and table, and need to be able to set a replica quota for every db. ``` ALTER DATABASE db_name SET REPLICA QUOTA quota; ```	2020-04-14 22:25:32 +08:00
令狐少侠	688927918c	[Doris on ES] Fix bug: when Doris and ES type not match (#3315 )	2020-04-14 20:15:13 +08:00
HuangWei	807499427c	unregister fragment mem tracker in close() (#3286 ) ref https://github.com/apache/incubator-doris/issues/3273 P.S. `614a76beea/be/src/runtime/plan_fragment_executor.cpp (L559-L562)` I think this piece of code is useless. This `_mem_tracker` in `PlanFragmentExecutor` is set as fragment_mem_tracker of `RuntimeState`. direct use We use it in these code, when rowbatch reset, mem tracker's consumption will be released. `7eab12a40e/be/src/exec/olap_rewrite_node.cpp (L57-L58)` `839ec45197/be/src/exec/olap_scan_node.cpp (L1217-L1218)` other usage e.g. `6c33f80544/be/src/exec/olap_scanner.cpp (L245)` won't consume the fragment mem tracker. We don't need to worry about the fragment mem tracker consumption is not zero when we want to destroy it. Or we can add a consumption check before we close the mem tracker?	2020-04-13 23:15:56 +08:00
Yunfeng,Wu	a467c6f81f	[ES Connector] Add field context for string field keyword type (#3305 ) This PR is just a transitional approach; it would be better to move the predicate transformation from Doris BE to Doris FE, so that Doris BE is only responsible for fetching data from ES. Add an `enable_keyword_sniff` configuration item for creating an External Elasticsearch Table. It defaults to true and sniffs the `keyword` type on a `text analyzed` field, returning the `json_path` that substitutes for the original column name. ``` CREATE EXTERNAL TABLE `test` ( `k1` varchar(20) COMMENT "", `create_time` datetime COMMENT "" ) ENGINE=ELASTICSEARCH PROPERTIES ( "hosts" = "http://10.74.167.16:8200", "user" = "root", "password" = "root", "index" = "test", "type" = "doc", "enable_keyword_sniff" = "true" ); ``` Note: `enable_keyword_sniff` defaults to "true". Run this SQL: ``` select * from test where k1 = "wu yun feng" ``` Output predicate DSL: ``` {"term":{"k1.keyword":"wu yun feng"}} ``` In this PR, I also remove the Elasticsearch version detection logic because it is useless for now; it may be needed in the future.	2020-04-13 23:07:33 +08:00
Lijia Liu	be090f5929	Use read lock when iterate tablet_map in TabletManager::start_trash_sweep (#3294 )	2020-04-13 11:18:33 +08:00
EmmyMiao87	7c07083cd5	Forbidden multi subquery in having clause (#3291 ) Multiple subqueries in the having statement need to be rewritten into multiple tables for join. The current rewriting rules need to be transformed. And this writing is not common, and there is no strong requirement from the business side. This function will be added later if it is required.	2020-04-11 21:56:08 +08:00
Mingyu Chen	5b69c70f9a	[Bug] Fix bug that user plugin dir is removed after installing the plugin (#3302 ) When user install a FE plugin from a directory, the directory should not be removed after installing.	2020-04-11 20:30:14 +08:00
lichaoyong	3086790e06	Fix bug when use ZoneMap/BloomFiter on column with REPLACE/REPLACE_IF_NOT_NULL (#3288 ) Currently, a column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter when the rowset is a base rowset (its version starts with zero). We always considered this an optimization, but in some cases it causes a bug. create table test( k1 int, v1 int replace, v2 int sum ); Suppose there are two records on two different versions: 1 2 2 on version [0-10] and 1 3 1 on version 11. If I perform the query select * from test where k1 = 1 and v1 = 3; the result will be 1 3 1. This is not right, because the first record is filtered out. The right answer is 1 3 3: v2 should be summed. Removing this optimization is necessary to make the result right.	2020-04-10 10:22:21 +08:00
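The ordering problem described in the commit above can be illustrated with a toy model. This is only a sketch of the aggregation semantics (REPLACE keeps the newest value, SUM adds values), not the storage engine's code; the `merge` and `matches` helpers are hypothetical.

```python
# Rows are (k1, v1, v2); v1 aggregates with REPLACE, v2 with SUM.
base_rowset = [(1, 2, 2)]    # version [0-10]
delta_rowset = [(1, 3, 1)]   # version 11


def merge(rowsets):
    """Aggregate rows by key in version order: newest v1 wins, v2 is summed."""
    merged = {}
    for rowset in rowsets:
        for k1, v1, v2 in rowset:
            if k1 in merged:
                _, old_v2 = merged[k1]
                merged[k1] = (v1, old_v2 + v2)
            else:
                merged[k1] = (v1, v2)
    return [(k1, v1, v2) for k1, (v1, v2) in merged.items()]


def matches(row):
    """Predicate of the query: k1 = 1 and v1 = 3."""
    return row[0] == 1 and row[1] == 3


rowsets = (base_rowset, delta_rowset)

# Buggy order: prune each rowset on v1 first, then merge what is left.
filtered_first = merge([[r for r in rs if matches(r)] for rs in rowsets])

# Correct order: merge all versions, then apply the predicate.
merged_first = [r for r in merge(rowsets) if matches(r)]

print(filtered_first)  # [(1, 3, 1)]
print(merged_first)    # [(1, 3, 3)]
```

Filtering on the REPLACE column before the merge drops the base row, so its contribution to the SUM column is lost.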
Yingchun Lai	f39c8b156d	[refactor] A small refactor on class DataDir (#3276 ) main refactor points are: - Use a single get_absolute_tablet_path function instead of 3 independent functions - Remove meaningless return value of register_tablet and deregister_tablet - Some typo and format	2020-04-10 00:32:22 +08:00
Mingyu Chen	ce1d5ab9ab	[Bug] Fix some bugs of install/uninstall plugins (#3267 ) 1. Avoid losing plugin if plugin failed to load when replaying When in replay process, the plugin should always be added to the plugin manager, even if that plugin failed to be loaded. 2. `show plugin` statement should show all plugins, not only the successfully installed plugins. 3. plugin's name should be unique globally and case insensitive. 4. Avoid creating new instances of plugins when doing metadata checkpoint. 5. Add a __builtin_ prefix for builtin plugins.	2020-04-09 23:04:28 +08:00
caiconghui	a5703ef114	[Performance] Support sharding txn_map_lock into more small map locks to make good performance for txn manage task (#3222 ) This PR improves the performance of the txn manage task. When there are many txns in BE, the single txn_map_lock and the additional _txn_locks may cause poor performance, so we now remove the additional _txn_locks and split the txn_map_lock into many small locks.	2020-04-09 22:35:15 +08:00
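The idea of splitting one map lock into many small locks can be sketched as follows. This is a generic lock-sharding illustration, not the BE implementation; the class name `ShardedTxnMap`, the shard count, and the modulo-based shard choice are all assumptions made for the example.

```python
import threading


class ShardedTxnMap:
    """Hypothetical sketch: a txn map split into shards, each with its own lock."""

    def __init__(self, shard_count=16):
        self._locks = [threading.Lock() for _ in range(shard_count)]
        self._maps = [{} for _ in range(shard_count)]

    def _shard(self, txn_id):
        # Txns with different ids usually land in different shards,
        # so they rarely contend on the same lock.
        return txn_id % len(self._maps)

    def put(self, txn_id, info):
        i = self._shard(txn_id)
        with self._locks[i]:
            self._maps[i][txn_id] = info

    def get(self, txn_id):
        i = self._shard(txn_id)
        with self._locks[i]:
            return self._maps[i].get(txn_id)

    def remove(self, txn_id):
        i = self._shard(txn_id)
        with self._locks[i]:
            return self._maps[i].pop(txn_id, None)


txn_map = ShardedTxnMap(shard_count=4)


def worker(start):
    # Each thread registers 100 txns; only same-shard operations serialize.
    for txn_id in range(start, start + 100):
        txn_map.put(txn_id, "prepared")


threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 100, 200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With a single lock every operation serializes; with shards, only operations that hash to the same shard wait for each other.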
HangyuanLiu	037bc53b54	[BUG] Fix cast result expr bug (#3279 ) When the result type is a date type, the result expr type should not be cast. Because in the FE function, the specific type of the date type is determined by the actual type of the return value, not by the function return value type. For example, the function `str_to_date` may return DATE or DATETIME, depends on the format pattern. DATE: ``` mysql> select str_to_date('11/09/2011', '%m/%d/%Y'); +---------------------------------------+ | str_to_date('11/09/2011', '%m/%d/%Y') | +---------------------------------------+ | 2011-11-09 | +---------------------------------------+ ``` DATETIME: ``` mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s'); +---------------------------------------------------------+ | str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') | +---------------------------------------------------------+ | 2014-12-21 12:34:56 | +---------------------------------------------------------+ ```	2020-04-09 22:02:05 +08:00
yangzhg	8699bb7bd4	[Query] Optimize where clause by extracting the common predicate in the OR compound predicate. (#3278 ) Queries like the one below cannot finish in an acceptable time. `store_sales` has 28 million (2800w) rows and `customer_address` has 50 thousand (5w) rows. For now Doris creates only one cross join node to execute this SQL. Evaluating the where clause takes about 200-300 ns, and the total number of evaluations is 2800w * 5w, which is extremely large: 2800w * 5w * 250 ns is about 350,000 seconds. ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ``` But this SQL can be rewritten to ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ss_addr_sk = ca_address_sk and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` Therefore we can do a hash join first and then use ``` (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` to filter the values. On the TPCDS 10g dataset, the rewritten SQL takes only about 1 second.	2020-04-09 21:57:45 +08:00
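The rewrite described in the commit above amounts to factoring out the conjuncts that appear in every OR branch. A minimal sketch of that step follows; it works on predicates as plain strings and is not the FE rewrite rule itself. The function name `extract_common` is hypothetical.

```python
def extract_common(branches):
    """Factor conjuncts shared by every OR branch out of the disjunction.

    Each branch is a list of conjunct strings. Returns (common, rest), where
    the original predicate is equivalent to: AND(common) AND OR(AND(b) for b in rest).
    If some branch in rest ends up empty, the OR part is always true and can be dropped.
    """
    first, others = branches[0], branches[1:]
    common = [c for c in first if all(c in b for b in others)]
    rest = [[c for c in b if c not in common] for b in branches]
    return common, rest


branches = [
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('CO', 'IL', 'MN')", "ss_net_profit between 100 and 200"],
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('OH', 'MT', 'NM')", "ss_net_profit between 150 and 300"],
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('TX', 'MO', 'MI')", "ss_net_profit between 50 and 250"],
]

common, rest = extract_common(branches)
print(" and ".join(common))
print(" or ".join("(" + " and ".join(b) + ")" for b in rest))
```

Once `ss_addr_sk = ca_address_sk` is a top-level conjunct, a planner can use it as an equi-join condition instead of evaluating the whole OR predicate on every pair of rows.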
HangyuanLiu	3dc7ef634b	[Dependency]Add cctz lib (#3280 ) Add Google/CCTZ lib in Doris	2020-04-09 19:14:09 +08:00
HuangWei	e32ed28bf4	[Storage] Use getmntent_r() for thread-safe (#3284 )	2020-04-09 14:19:09 +08:00
Yunfeng,Wu	614a76beea	[Doris on ES] Support compound_and predicate push down to Elasticsearch (#3277 ) Relate Issue: https://github.com/apache/incubator-doris/issues/3248 SQL: ``` select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing'); ``` Output filter: ``` ((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter: ``` (k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter (`abs` can not be pushed down to es, so doris on es would not process this scenario ): ``` match_all ```	2020-04-08 21:09:39 +08:00

1750 Commits