doris

Author	SHA1	Message	Date
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
carlvinhust2012	cff9ffa0e1	fix the inaccurate comments (#10617 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-06 17:54:43 +08:00
Gabriel	a2f74bf260	[Improvement] remove profile with poor readability (#10581 )	2022-07-05 11:09:23 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
yiguolei	4ec6e3ee81	[refactor] Remove debug action since it is never used. (#10484 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-29 20:37:51 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
jacktengg	f4dd3bf013	[bugfix] fix memleak in olapscannode(#9736 )	2022-05-26 15:06:54 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Pxl	668188b91f	[improvement][vectorized] support es node predicate peel (#8174 )	2022-02-26 17:02:54 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr to std::shared_ptr 2. replace all boost::scopted_ptr to std::unique_ptr 3. replace all boost::scoped_array to std::unique<T[]> 4. replace all boost:thread to std::thread	2021-11-17 10:18:35 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Yunfeng,Wu	7b2762b1b1	[Doris On ES][Bug-Fix] Can not pushdown limit when some predicate can not processed by ES (#4768 ) Can not pushdown limit when some predicate not processed by ES, fixed: #4761	2020-10-21 12:10:55 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec before MemTracker's destructor. So we don't change these code. 4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile. Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
令狐少侠	5a57ecca15	[Doris On ES]fix bug of query failed in doc_value_mode when fields have none value (#3513 ) #3479 Here I try to explain the cause of the problem and how to fix it. The Cause of The problem Take the case in issue(#3479 ) as an example: The general results are as follows: ``` GET table/_doc/_search {"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100} { "took": 6, "timed_out": false, "_shards": { …… }, "hits": { "total": 3, "max_score": null, "hits": [ { "_index": "table", "_score": null, "sort": [ 0 ] }, { "_index": "table", "_score": null, "fields": { "k1": [ "kkk1" ] }, "sort": [ 0 ] }, { "_index": "table", "_score": null, "sort": [ 0 ] } ] } } ``` But in Doris on ES，Be fetched data parallelly on all shards, and use `filter_path` to reduce the network cost. The process will be as follows: ``` GET table/_doc/_search?preference=_shards:1&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields {"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100} { "hits": { "total": 0 } } GET table/_doc/_search?preference=_shards:2&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields {"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100} { "hits": { "total": 1 } } GET table/_doc/_search?preference=_shards:3&filter_path=_scroll_id,hits.hits._source,hits.total,_id,hits.hits._source.fields,hits.hits.fields {"query":{"match_all":{}},"stored_fields":"_none_","docvalue_fields":["k1"],"sort":["_doc"],"size":100} { "hits": { "total": 1, "hits": [ { "fields": { "k1": [ "kkk1" ] } } ] } } ``` Scan-Worker On BE which processed result of shard2 will failed. The reasons are as follows: 1. "filter_path" causes the hits.hits object not exist. 2. In the current implementation, if there are some data rows（total > 0）, the hits.hits. object must be an array How To Fix it Two Method: 1. modify "filter_path" to contain the hits. Pros: Fixed Code is very simple Cons: More network cost 2. Deal with the case where fields are missing in a batch. Pros: No loss of performance Cons: Code is more complex Performance first, I use Method2. Design 1. Add a variable "_doc_value_mode" into Class "EsScrollParser" to =indicate whether the data processed by this parser is doc_value_mode or not. 2. "_doc_value_mode" is passed from ESScollReader <- ESScanner <- ScrollQueryBuilder::build() that determines whether DSL is enable doc_value_mode 3. When hits.hits of response from ES is empty and total > 0. We know there are data lines, but the corresponding fields do not exist. EsScrollParser will use "_doc_value_mode" and _total to construct _total lines which fields are assigned with 'NULL'	2020-05-11 15:34:12 +08:00
Yunfeng,Wu	a467c6f81f	[ES Connector] Add field context for string field keyword type (#3305 ) This PR is just a transitional way，but it is better to move the predicates transformation from Doris BE to Doris BE, in this way, Doris BE is responsible for fetching data from ES. Add a `enable_keyword_sniff ` configuration item in creating External Elasticsearch Table ，it default to true , would to sniff the `keyword` type on the `text analyzed` Field and return the `json_path` which substitute the origin col name. ``` CREATE EXTERNAL TABLE `test` ( `k1` varchar(20) COMMENT "", `create_time` datetime COMMENT "" ) ENGINE=ELASTICSEARCH PROPERTIES ( "hosts" = "http://10.74.167.16:8200", "user" = "root", "password" = "root", "index" = "test", "type" = "doc", "enable_keyword_sniff" = "true" ); ``` note: `enable_keyword_sniff` default to "true" run this SQL： ``` select * from test where k1 = "wu yun feng" ``` Output predicate DSL： ``` {"term":{"k1.keyword":"wu yun feng"}} ``` and in this PR, I remove the elasticsearch version detected logic for now this is useless, maybe future is needed.	2020-04-13 23:07:33 +08:00
Yunfeng,Wu	614a76beea	[Doris on ES] Support compound_and predicate push down to Elasticsearch (#3277 ) Relate Issue: https://github.com/apache/incubator-doris/issues/3248 SQL: ``` select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing'); ``` Output filter: ``` ((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter: ``` (k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1 ``` SQL: ``` select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun')); ``` Output filter (`abs` can not be pushed down to es, so doris on es would not process this scenario ): ``` match_all ```	2020-04-08 21:09:39 +08:00
令狐少侠	fd492e3b6f	[Doris on ES] Support escape character (#2865 )	2020-02-13 11:32:48 +08:00
令狐少侠	d05768ffd4	Fix core when es_scanner_node exit (#2634 )	2020-01-02 16:30:11 +08:00
Yunfeng,Wu	5a3f71dd6b	Push limit to Elasticsearch external table (#2400 )	2019-12-07 21:13:44 +08:00
Yunfeng,Wu	0f00febd21	Optimize Doris On Elasticsearch performance (#2237 ) Pure DocValue optimization for doris-on-es Future todo: Today, for every tuple scan we check if pure_docvalue is enabled, this is not reasonable, should check pure_docvalue enabled for one whole scan outside, I will add this todo in future	2019-12-04 12:57:45 +08:00
shgxwxl	c2de62d6a1	Collect scanner's status when es_http_scan_node close (#1861 )	2019-09-25 12:20:13 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
EmmyMiao87	79ab7f4413	Change label of broker load txn (#1134 ) * Change label of broker load txn 1. put broker load label into txn label 2. fix the bug of `label is already used` 3. fix partition error of new broker load * Fix count error in mini load and broker load There are three params (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) which are used to count dpp.norm.ALL and dpp.abnorm.ALL. num_rows_load_total is the number rows of source file. num_rows_load_unselected is the not satisfied (where conjuncts) rows of num_rows_load_total num_rows_load_filtered is the rows (quality not good enough) of (num_rows_load_total-num_rows_load_unselected)	2019-05-10 16:53:46 +08:00
lide	9c82d41981	Support Doris query ES by HTTP way (#925 )	2019-04-28 17:14:44 +08:00

30 Commits