doris

Author	SHA1	Message	Date
Mingyu Chen	3f2fdd236f	Add scan thread token (#6443 )	2021-08-27 10:56:17 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
Mingyu Chen	2030c44dba	[Log] Modify some log level on BE side (#6381 )	2021-08-14 10:25:45 +08:00
stdpain	34af66bf1d	[BUG][Memory] fix memory tracker DCHECK fail in debug mode and Fix Process Memory limit fail (#6438 )	2021-08-14 10:24:33 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. FE array type support and implementation of array function, with support for array syntax analysis and planning 2. Support importing array type data through insert into 3. Support selecting array type data 4. The array type is supported only on the value columns of the duplicate table. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) Remove all DECIMAL V1 type code; this is a part of #6073	2021-07-07 10:26:32 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. Support in/bloomfilter/minmax 2. Support broadcast/shuffle/bucket shuffle/colocate join 3. Optimize memory use and CPU cache misses while building the runtime filter 4. Optimize memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00
stdpain	1999a0c26b	[optimization] open gcc strict-aliasing optimization (#6034 ) * open gcc strict-aliasing optimization * use -Werror=strict-aliasing	2021-06-18 11:39:24 +08:00
Xinyi Zou	5748241dab	[Bug-fix] When query cancel, transfer_thread does not continue to schedule scanner_thread (#5768 ) The cause of the problem is that after a query is cancelled, OlapScanNode::transfer_thread still continues to schedule OlapScanNode::scanner_thread until all tasks are scheduled. Although each task scans no data and exits quickly, it still consumes a lot of resources. This may (a guess) be the cause of the bug in #5767, where I/O becomes saturated. So after a query is cancelled, transfer_thread now exits the scheduling loop immediately, waits for all scanner_threads to finish, and then exits as well.	2021-05-19 09:26:58 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
HappenLee	6ad1bf7d7e	[Bug] Fix deadlock in olap scan node and refactor some code in FE profile (#5713 ) * [Bug] Fix deadlock in olap scan node and refactor some code in FE profile * Add some comments	2021-04-30 10:12:18 +08:00
Zhengguo Yang	c4cc681d14	remove boost_foreach, using c++ foreach instead (#5611 )	2021-04-15 10:52:29 +08:00
xxiao2018	1100a0f3a0	[Profile] Add more timers for scan thread (#5511 ) 1. Add a timer to count the time the transfer thread waits for the scanner thread to return a rowbatch. 2. Add a timer to count the time that the scanner thread waits for available worker threads in the thread pool. Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-03-15 10:07:11 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
HappenLee	a5298d617d	[Performance Improve] Push Down _conjunct of 'not in' and '!=' to Storage Engine. (#5207 )	2021-01-23 21:07:01 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
HuangWei	17d939b789	[Bug] Fix scanner threads heap-use-after-free (#5111 ) Scanner threads may still be running and using the member vars of OlapScanNode after the OlapScanNode has already been destroyed. We can use `_running_thread` as the last accessed member variable, and `transfer_thread` needs to wait for `_running_thread==0`. After `transfer_thread` has joined, `OlapScanNode::close()` can continue.	2021-01-04 09:28:51 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092 ) This patch mainly does the following: - Support #5086 - Refactor ColumnRangeValue to support containing null	2021-01-03 15:45:07 +08:00
Mingyu Chen	81c7c0360e	[Bug] Fix a core dump of counter in BE (#5078 ) Introduced by PR #5051. As @liutang123 said, when PlanFragmentExecutor is destructed, it will call `close -> ExecNode::close -> OlapScanNode::close`. OlapScanNode will wait for `_transfer_thread`. `_transfer_thread` will wait for all OlapScanner processing to complete. OlapScanner is processed by the scanner thread. When the last scanner processing is completed, `_transfer_thread` will break out of the loop, and PlanFragmentExecutor will continue to destruct. And if it is completed, its RuntimeProfile::Counter will also be destructed. At this time, the ScopedTimer in the scan thread may still use this Counter when it is destructed. So we must make sure that the timer is destructed before the runtime profile is destructed.	2020-12-15 09:33:38 +08:00
HappenLee	0a0e46fd53	[Bug] Fix the bug of where condition a in ('A', 'B', 'V') and a in ('A') return error result (#5072 ) And refactor ColumnRangeValue and OlapScanNode. This patch mainly does the following: - Fix issue #5071 - Change type_min in ColumnRangeValue to static - Add a type_limit class to make the code clearer - Refactor the function normalize_in_and_eq_predicate	2020-12-15 09:29:10 +08:00
Lijia Liu	ff4bd1223f	[Profile] Add cpu time cost in query audit (#5051 )	2020-12-13 22:22:15 +08:00
HappenLee	6021d6fc7f	[Performance Optimization] Remove push down conjuncts in olap scan node (#4999 ) Push conjuncts down to the storage engine as much as possible, so that the olap scan node does not need to filter data with the pushed-down conjuncts again. Fix #4986	2020-12-06 08:50:08 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Xinyi Zou	66132d2836	[Feature] Running Profile OLAP_SCAN_NODE layering and enhance readability (#4825 ) Mainly includes: - `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`, `OlapScanner`, and `SegmentIterator`. - Delete meaningless statistical values, mainly in scan_node.cpp. - Add the `RowsConditionsFiltered` statistic, split from `RowsDelFiltered`; it is the number of rows filtered by various column indexes, only in segment V2. - Modify the document based on the above, and enhance readability.	2020-11-11 21:21:25 +08:00
HappenLee	b1c1ffda4a	[Refactor] Refactor olap scan node code (#4823 ) 1. Remove meaningless code in Doris 2. Replace string copy by string reference 3. Simplified the implementation of some functions	2020-11-01 09:12:23 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Mingyu Chen	8b0b120aca	[Profile] Add 2 Segment related metrics in query profile (#4348 ) Total number of segments and number of filtered segments	2020-08-27 12:07:21 +08:00
sduzh	3ce6fc631e	[BUG] Fix wrong result of querying with cast expr in where clause (#4219 )	2020-08-01 17:46:39 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumption on the web. As follows: 1. Nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker; it is easy to ensure that the RowBatch & MemPool destructors execute before the MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers account for little memory consumption but are too time-consuming, you could make them orphans; then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
Mingyu Chen	725ebafd99	[Bug] Cancel the query if OlapScanner prepare failed (#4002 )	2020-07-03 21:33:07 +08:00
Mingyu Chen	0cbacaf01d	[Refactor] Replace some boost to std in OlapScanNode (#3934 ) Replace some boost with std in OlapScanNode. This refactor seems to solve the problem described in #3929: I found that BE would crash when calling `boost::condition_variable.notify_all()`, but after this change BE no longer crashes.	2020-06-29 19:13:03 +08:00
Mingyu Chen	b8ee84a120	[Doc] Add docs to OLAP_SCAN_NODE query profile (#3808 )	2020-06-13 16:25:40 +08:00
Mingyu Chen	27046c5b61	[Enhancement] Improve the performance of query with IN predicate (#3694 ) This CL mainly changes: 1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine. 2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config values in BE.	2020-06-04 11:39:00 +08:00
HappenLee	d0fe7e4d94	[Profile] Make running profile clearer and more intuitive to improve usability (#3405 ) This CL mainly made the following modifications: 1. Delete invalid MemoryUsed Counter and add PeakMemUsage in each exec node and datastreamsender 2. Add indentation in the child execnode profile, making it easy to see the relationship between execnodes 3. Delete _is_result_order, which is no longer supported, in olap_scan_node.h and olap_scan_node.cpp 4. Add scan_disk method to olap_scanner to fix the counter _num_disks_accessed_counter 5. Now we do not use buffer pool to read and write disk, so annotation eadio counter and 6. Delete the MemUsed counter in exec node.	2020-04-30 14:57:21 +08:00
HappenLee	4eb27bc7e3	[Profile] Make running profile clearer and more intuitive to improve usability (#3365 ) (#3383 ) This CL mainly made the following modifications: 1. Delete Invalid method in Running Profile Class. 2. Move Memlimit Counter from blockmgr to fragment and add PeakMemUsage Counter 3. Fix the bug of buffer pool memlimit counter 4. Call compute_time_in_profile() before pretty_print() to show the _local_time_percent without child running profile 5. Add TransferThread ThreadToken count in AveThreadToken Counter	2020-04-24 21:38:55 +08:00
trueeyu	839ec45197	Remove LLVM-related code from be/src/exec (#2955 ) Remove unused LLVM-related code in directory be/src/exec (#2910). There is a lot of LLVM-related code in the code base, but it is not really used. Higher versions of GCC are not compatible with the LLVM 3.4.2 version currently used by Doris. This PR deletes all LLVM-related code in directory be/src/exec.	2020-02-20 20:43:26 +08:00
lichaoyong	1cf0fb9117	Use ThreadPool to refactor MemTableFlushExecutor (#2931 ) 1. MemTableFlushExecutor maintains a ThreadPool to receive FlushTasks. 2. FlushToken is used to separate tasks from different tablets. Every DeltaWriter of a tablet constructs a FlushToken; tasks in the same FlushToken are handled serially, and tasks in different FlushTokens are handled concurrently. 3. The thread limit on data_dir is removed, because I/O is not the main time consumer of the flush thread. Most of the time is consumed by decoding and compression on the CPU.	2020-02-18 18:39:04 +08:00
Lijia Liu	99ad56d1bf	Support bitmap index for more type (#2630 ) For #2589 1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame of reference code as dict. 2. decimal(decimal12_t) also uses frame of reference code as dict. 3. float/double use bitshuffle code as dict.	2020-01-31 21:09:29 +08:00
kangpinghuang	d31f774852	Add block split bloom filter (#2471 ) [STORAGE][SEGMENTV2] Use block split bloom filter; build bloom filter against data page; add distinct value to bloom filter; add ordinal index to bloom filter index	2019-12-18 12:57:44 +08:00
kangkaisen	f828670245	Add Bitmap index reader (#2319 ) [STORAGE] [INDEX] For #2061 and #2062: add bitmap index reader; SegmentIterator supports bitmap index; add some metrics	2019-12-03 23:01:40 +08:00
EmmyMiao87	42395d2455	Change Null-safe equal operator from cross join to hash join (#2156 ) * Change Null-safe equal operator from cross join to hash join ISSUE-2136 This commit changes the join method from cross join to hash join when the equal operator is the Null-safe '<=>'. It improves the speed of queries that use the Null-safe equal operator. The finds_nulls field records whether each equal operator is Null-safe: finds_nulls[i] being true means that the i-th equal operator is Null-safe. The equal function in the hash table returns true if both val and loc are NULL when finds_nulls[i] is true.	2019-11-08 12:43:48 +08:00
kangpinghuang	9c2d149c36	add profile for segment v2 (#2015 )	2019-10-22 09:43:16 +08:00
Mingyu Chen	3c12af4dcc	Limit the memory consumption of broker scan node (#1996 ) If memory exceed limit, no more row batch will be pushed to batch queue	2019-10-17 14:40:16 +08:00
kangkaisen	b246d93128	Avoid SerDe for aggregation query with object pool (#1854 )	2019-09-26 13:51:13 +08:00
ZHAO Chun	0dc0dadad1	Reduce unnecessary memory allocation and copying in OlapScanNode (#1742 )	2019-09-04 21:05:12 +08:00
ZHAO Chun	81ca3e3abf	Free olap scanner out of lock (#1733 ) Close the scanner outside OlapScanner's batch lock; closing it while holding the lock makes all scanners wait for one scanner to finish.	2019-09-02 16:49:28 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch modifies all of the Backend's data, which makes restarting BE take a very long time. So if you do not want to disrupt your production environment, you should upgrade backends one by one. 1. The BE refactoring clarifies the structure of the code. 2. Use a unique id to identify a rowset. Naming rowsets by tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00
chenhao	687d57be66	Fix bug that query statistics in audit log are wrong (#1354 )	2019-06-21 19:16:05 +08:00

66 Commits