doris

Author	SHA1	Message	Date
Xinyi Zou	b194a7cf83	[improvement](memory) Support GC segment cache, when memory insufficient (#16987 ) fix segment cache memory tracker statistics support GC	2023-02-22 18:31:20 +08:00
DuRipeng	e65a061256	[Enhancement](datetimev2-enhance) support 'microseconds_add' function for datetimev2 (#16970 ) support 'microseconds_add' function for datetimev2	2023-02-22 17:49:41 +08:00
Xin Liao	0b624d282d	[enhancement](ut) add merge-on-write ut code back (#16939 )	2023-02-22 16:29:15 +08:00
plat1ko	66ceab540a	[fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889 )	2023-02-22 16:21:11 +08:00
Adonis Ling	0b3e18d060	[chore](macOS) Support LLVM Clang 15 (#16991 ) Remove the deprecated classes std::codecvt_utf8_utf16<char16_t> and std::wstring_convert. Use libiconv to convert UTF-8 strings to UTF-16LE ones.	2023-02-22 15:04:48 +08:00
zhannngchen	3636d0a561	[feature](merge-on-write) add DCHECK in compaction to detect data inconsistency (#16564 ) MoW will mark all duplicate primary key as deleted, so we can add a DCHECK while compaction, if MoW's delete bitmap works incorrectly, we're able to detect this kind of issue ASAP. In Debug version, DCHECK will make BE crush, in release version, compaction will fail and finally load will fail due to -235	2023-02-22 14:59:18 +08:00
chenlinzhong	0e3be4eff5	[Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639 ) mainly include: - brpc service adds two types of thread pools. The number of "light" and "heavy" thread pools is different Classify the interfaces of be. Those related to data transmission are classified as heavy interfaces and others as light interfaces - Add some monitoring to the thread pool, including the queue size and the number of active threads. Use these - indicators to guide the configuration of the number of threads	2023-02-22 14:15:47 +08:00
zxealous	29c46d6926	[fix](struct-type) fix be core when load array orc file (#16978 ) * fix be core when load array orc file	2023-02-22 10:15:39 +08:00
Adonis Ling	4cb97b6fb7	[chore](macOS) Fix linkage errors for the release build (#17002 ) Issue Number: close #17003 ## Problem summary The linker couldn't find some symbols because the implementation of a template member function doris::vectorized::Decoder::init_decimal_converter is missing in the header file in which the corresponding declaration is placed.	2023-02-22 10:01:51 +08:00
plat1ko	52f9e03eea	[fix](cooldown) Use `pending_remote_rowsets` to avoid deleting rowset files being uploaded (#16803 )	2023-02-21 21:58:20 +08:00
zhengyu	09d41c3479	[fix](log) clarify error msg for tablet writer write failure (#14078 ) (#16954 ) (#16950 ) fmt::format dosen't support non-template object as args, even if it implements `to_string()` or `operator<<`. so orignal code may cause `false` to be printed instead of real cause of the failure. So to_string() need to be manually invoked. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-02-21 19:42:49 +08:00
HappenLee	f37da6e789	[Function](vec) use const column to opt function current_time() (#16953 )	2023-02-21 16:26:35 +08:00
YueW	879a729afb	[improve](inverted index) not apply inverted index on 'in' or 'not_in' predicate which is produced by runtime_filter (#16952 ) When there are multi-table join query, there will be many in or not_in predicate of runtime filter pushed down to the storage layer. According to our test, if apply those predicates by inverted index, the performance will be degraded because there are many conditions in in_predicate. Therefore, the inverted index not apply on in or not_in predicate which is produced by runtime_filter. Based on that situation, this pr will do: not apply inverted index on in or not_in predicate which is produced by runtime_filter.	2023-02-21 14:24:50 +08:00
TengJianPing	6f94e84da7	[improvement](memory) fix possible double free in vcollect iterator (#16875 ) This code in VCollectIterator::build_heap is possible to cause double free if cumu_iter->init() fails and returns early, becuase some LevelIterator* exists both in VCollectIterator::_children and cumu_iter::_children.	2023-02-21 14:18:04 +08:00
TengJianPing	5ec8c51366	[fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938 ) Fix bug of #16680, data order of VUnionIterator outout block is changed, which will impact compaction.	2023-02-21 14:17:21 +08:00
Mingyu Chen	491d269412	[fix](tvf) fix bug that failed to get schema of tvf when file is empty (#16928 ) In previous implementation, when querying tvf, FE will get schema from BE. And BE will try to open the first file to get its schema info, but for orc or parquet format, if the file is empty, it will return error. But even for an empty file, we can still get schema info from file's footer. So we should handle the empty file to get schema info correctly. Also modify the catalog doc to add some FAQ.	2023-02-21 14:14:32 +08:00
Mingyu Chen	c0bb2e33a8	[improvement](scan) separate scanner into local and remote scanner pool (#16891 ) There are 2 kinds for scanner thread pool, local and remote. Local is for local file read, specially for olap scanner. Remote is for other external data source, such as file scanner, jdbc scanner. This PR mainly changes: For olap scanner, use cold or hot rowset to decide whether to use local or remote pool. For other scanner, user remote pool by default. Add a new BE config doris_max_remote_scanner_thread_pool_thread_num, default is 512, indicate the max thread number of the remote scanner thread pool This will alleviate the problem of interaction between olap queries with load job and external queries.	2023-02-21 14:13:09 +08:00
lihangyu	113023fb86	(Enhancement)[load-json] support simdjson in new json reader (#16903 ) be config: enable_simdjson_reader=true related PR #11665	2023-02-21 11:31:00 +08:00
yixiutt	4522aeb74a	[improvement](MOW) use shared_lock when get load info in publish txn (#16874 )	2023-02-21 10:14:40 +08:00
Jerry Hu	08adf914f9	[improvement](vec) avoid creating a new column while filtering mutable columns (#16850 ) Currently, when filtering a column, a new column will be created to store the filtering result, which will cause some performance loss。 ssb-flat without pushdown expr from 19s to 15s.	2023-02-21 09:47:21 +08:00
yiguolei	e04c13b7a6	[enhancement](exception safe) make function state exception safe (#16771 )	2023-02-20 23:01:45 +08:00
Qi Chen	a46941c684	[Fix](multi-catalog) Fix switch-case fall-through issue in multi-catalog module. (#16931 ) Fix switch-case fall-through issue in multi-catalog module.	2023-02-20 21:35:41 +08:00
lihangyu	a1799e5506	[improve](point query) reuse rowset from lookup_row_key to eliminate tablet lock (#16770 ) Reuse rowset for 2 reasons: 1. eliminate tablet lock for performance issue, if other thread hold the lock too long could affect point query latency 2. rowset should be acquired during lookup procedure	2023-02-20 18:38:11 +08:00
airborne12	83ab29fd56	[Fix](inverted index) fix compound directory unlock problem (#16861 ) In DorisCompoundDirectory::FSIndexInput::close, use lock_guard to automatic unlock, or it may cause lock leak.	2023-02-20 18:29:39 +08:00
ElvinWei	f32cd2c123	[fix](statistics) fix a problem with histogram statistics collection parameters (#16918 ) 1. Fixed a problem with histogram statistics collection parameters. 2. Solved the problem that it takes a long time to collect histogram statistics. TODO: Optimize histogram statistics sampling method and make the sampling parameters effective. The problem is that the histogram function works as expected in the single-node test, but doesn't work in the multi-node test. In addition, the performance of the current support sampling to collect histogram is low, resulting in a large time consumption when collecting histogram information. Fixed the parameter issue and temporarily removed support for sampling to speed up the collection of histogram statistics. Will next support sampling to collect histogram information.	2023-02-20 16:33:18 +08:00
Xin Liao	c98a0bf803	[Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689 ) MoW updates the delete bitmap of the imported data during the compaction by rowid conversion. The correctness of rowid conversion is very important to the result of delete bitmap. So I add a rowid conversion result check.	2023-02-20 16:27:18 +08:00
Xin Liao	3a5e8f83e8	[fix](merge-on-write) fix that be may coredump when sequence column is null (#16832 ) To facilitate the use of the primary key index, encode the seq column to the minimum value of the corresponding length when the seq column is null.	2023-02-20 16:25:52 +08:00
airborne12	a3aceab72b	[Fix](inverted index) fix inverted index bkd reader memory leak problem (#16885 ) Original implementation of get_bkd_reader's raw pointer usage may cause memory leak problem, use shared_ptr to avoid that.	2023-02-20 15:39:04 +08:00
Qi Chen	ef2fdb79bb	[Improvement](parquet-reader) Optimize and refactor parquet reader to improve performance. (#16818 ) Optimize and refactor parquet reader to improve performance. - Improve 2x performance for small dict string by aligned copying. - Refactor code to decrease condition(if) checking. - Don't call skip(0). - Don't read page index if no condition. ssb-flat-100: (single-machine, single-thread) \| Query \| before opt \| after opt \| \| ------------- \|:-------------:\| ---------:\| \| SELECT count(lo_revenue) FROM lineorder_flat \| 9.23 \| 9.12 \| \| SELECT count(lo_linenumber) FROM lineorder_flat \| 4.50 \| 4.36 \| \| SELECT count(c_name) FROM lineorder_flat \| 18.22 \| 17.88\| \| SELECT count(lo_shipmode) FROM lineorder_flat \|10.09 \| 6.15\|	2023-02-20 11:42:29 +08:00
Pxl	2bc014d83a	[Enchancement](function) remove unused params on aggregate function (#16886 ) remove unused params on aggregate function	2023-02-20 11:08:45 +08:00
Xin Liao	46d5cca661	[fix](merge-on-write) The delete bitmap of the currently imported rowset is not persistent (#16859 )	2023-02-20 11:02:41 +08:00
zhannngchen	b7d2bec8ea	[fix](merge-on-write) add check for segment num (#14032 )	2023-02-20 11:01:34 +08:00
ZhaoChangle	e958b13747	[Exec] Add conjection for union_node. (#16777 )	2023-02-20 10:48:58 +08:00
zhangstar333	5291f14aff	[vectorized](udf) java udf support array type (#16841 )	2023-02-20 10:00:25 +08:00
Xinyi Zou	2074b83c67	[enhancement](third-party) Upgrade JEMalloc version from 5.2.1 to 5.3.0 (#14871 ) https://github.com/jemalloc/jemalloc/releases	2023-02-20 00:00:40 +08:00
Kang	58c51086ca	[bugfix](topn) fix topn read_orderby_key_columns nullptr (#16896 ) The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5` make be core dump since dereference a nullptr `read_orderby_key_columns in VCollectIterator::_topn_next`, triggered by skipping _colname_to_value_range init in #16818 . This PR makes two changes: 1. avoid read_orderby_key_columns nullptr in TabletReader::_init_orderby_keys_param 2. return error if read_orderby_key_columns is nullptr unexpected in VCollectIterator::_topn_next to avoid core dump	2023-02-19 23:28:33 +08:00
amory	8b70bfdc31	[Feature](map-type) Support stream load and fix some bugs for map type (#16776 ) 1、support stream load with json, csv format for map 2、fix olap convertor when compaction action in map column which has null 3、support select outToFile for map 4、add some regression-test	2023-02-19 15:11:54 +08:00
zhengshengjun	e2e6a0dd83	[Feature](load) Support mutable property for partition (#16036 ) The background is described in this issue: #15723, where users used Apache Druid to satisfy such lambada requirements before. We will not make Doris dropping data not belonged to current time window automatically like Druid, which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way: 1. Support mutable property for a partition. 2. The mutable property of a partition is passed from FE to BE in a load procedure 3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio', so that data write to immutable partition will be neglected and not cause load failure. Use Example: 1. Add immutable partition or modify an partition to be immutable: - alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true'); - alter table test_tbl modify partition xx set ('mutable' = 'false'); 2. Write 5 records into table, two of then belongs to immutable partition	2023-02-18 23:09:34 +08:00
ZhaoChangle	d6a841409f	[Enhancement](func)Introduce non_nullable extraction function. #16621 Introduced a new function non_nullable to BE, which can extract concrete data column from a nullable column. If the input argument is already not a nullable column, raise an error.	2023-02-18 20:44:07 +08:00
HappenLee	fda4afecf5	[RegressionTest](Pipeline) Fix pipeline failed in regression test (#16880 ) regression-test/suites/inverted_index_p0/test_add_drop_index_with_data.groovy	2023-02-17 20:49:17 +08:00
pengxiangyu	6a1e3d3435	[fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812 ) * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction * fix bug, add remote meta for compaction	2023-02-17 15:13:06 +08:00
Pxl	da147f1d1c	[Chore](build) remove memory_copy and remove some wno build check (#16831 ) * remove memory_copy and remove some wno cbuild check	2023-02-17 14:43:24 +08:00
TengJianPing	ef2130de57	[improvement](memory) fix possible memory leak of vcollect iterator (#16822 ) Logic in function VCollectIterator::build_heap is not robust, which may cause memory leak: Level1Iterator* cumu_iter = new Level1Iterator( cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same); RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init()); std::list<LevelIterator> children; children.push_back(base_reader_child); children.push_back(cumu_iter); _inner_iter.reset( new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same)); cumu_iter will be leaked if cumu_iter->init()); is not success.	2023-02-17 14:40:15 +08:00
YueW	30dafd6a44	[improve](inverted index) Add element count limit for inverted index searcher cache (#16758 ) The element in InvertedIndexSearcherCache is inverted index searcher, which is a file descriptor of inverted index file, so InvertedIndexSearcherCache is actually cache file descriptor of inverted index file. If open file descriptor limit of the Linux system is set too small and config inverted_index_searcher_cache_limit is too big, during high pressure load maybe cause "Too many open files". So, when insert inverted index searcher into InvertedIndexSearcherCache, need also check whether reach file_descriptor_number limit for inverted index file.	2023-02-17 11:53:07 +08:00
airborne12	1a9eefebd4	[Fix](inverted index) fix array inverted index error match result when doing schema change add index (#16839 ) There is a bug in inverted_index_writer when adding multiple lines array values' index. This problem can cause error result when doing schema change adding index.	2023-02-17 11:50:39 +08:00
lihangyu	5dfd6d2390	[improve](dynamic table) refine SegmentWriter columns writer generate (#16816 ) * [improve](dynamic table) refine SegmentWriter columns writer generate ``` Dynamic Block consists of two parts, dynamic part of columns and static part of columns static dynamic \| ----- \| ------- \| the static ones are original _tablet_schame columns the dynamic ones are auto generated and extended from file scan. ``` We should only consisder to use Block info to generte columns when it's a dynamic table load procudure. And seperate the static ones and dynamic ones * test	2023-02-17 10:24:33 +08:00
lihangyu	2426d8e6e8	[chore](be-config) set disable_storage_row_cache default true to default disable row cache (#16827 )	2023-02-17 10:21:28 +08:00
Gabriel	3d6077efe0	[pipeline](profile) Support real-time profile report in pipeline (#16772 )	2023-02-17 10:01:34 +08:00
HappenLee	24ef60b491	[Opt](exec) opt aggreate function performance in nullable column	2023-02-16 22:26:12 +08:00
Xin Liao	2a9e748073	[enhancement](merge-on-write) do compaction with merge on read (#16799 ) To avoid data irrecoverable due to delete bitmap calculation error，do compaction with merge on read. Through this way ，even if the delete bitmap calculation is wrong, the data can be recovered by full compaction.	2023-02-16 19:20:15 +08:00

1 2 3 4 5 ...

3849 Commits