doris

Author	SHA1	Message	Date
Pxl	16fc3a0e22	[Chore](compile) remove some unused static on inline function to reduce compile time (#17603 )	2023-03-13 11:11:59 +08:00
Pxl	e2ac06d6d6	[Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300 ) 1. Change PipelineTaskState to an enum class. 2. Remove some row-based code in FoldConstantExecutor::_get_result. 3. Reduce memcpy in the minmax runtime filter function (the input data is now guaranteed to be aligned). 4. Add the Wunused-template check, remove some unused functions, and change some static functions to inline functions.	2023-03-08 12:41:15 +08:00
yiguolei	4692d6764c	[refactor](remove string val) remove string val structure, it is same with string ref (#17461 ) remove stringval, decimalv2val, bigintval	2023-03-08 10:42:20 +08:00
yiguolei	9477c48ef8	[refactor](functioncontext) remove duplicate type definition in function context (#17421 ) Remove duplicate type definition in function context; remove unused method in function context; no need for stale state in vexpr context, because vexpr is stateless while function context saves state and they are cloned; remove useless slot_size in all tuple or slot descriptors; remove the doris_udf namespace, it is useless; remove some unused macro definitions; init v_conjuncts in vscanner, so the same code need not be written in every scanner; use a unique ptr to manage function context, since it can only belong to a single expr context. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-06 16:07:09 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
HappenLee	3e40467ce6	[Bug](vec) Fix chinese pinyin order by (#17152 ) Bug: some Chinese words are not sorted by pinyin under GBK encoding. Example: CREATE TABLE `test_convert` ( `a` varchar(100) NULL ) ENGINE=OLAP DUPLICATE KEY(`a`) DISTRIBUTED BY HASH(`a`) BUCKETS 3 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝"); Query OK, 6 rows affected (0.03 sec) {'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'} mysql [test]>select * from test_convert; returns rows a, c, 丝, b, 多, 睿 (6 rows in set, 0.01 sec). mysql [test]>select * from test_convert order by convert(a using gbk); returns rows a, b, c, 多, 丝, 睿 (6 rows in set, 0.01 sec).	2023-02-28 14:29:56 +08:00
TengJianPing	aab8dad191	[fix](sort) fix bug of sort (#17151 ) The logic of topn and full sort is wrong when there are both offsets and limits: the offset is not considered when doing the max heap optimization, which leads to wrong results.	2023-02-27 10:55:12 +08:00
HappenLee	a8a5cbb403	[Opt](Hash) Deduce virtual function call is null at in single nullable column (#16650 )	2023-02-14 08:44:12 +08:00
lihangyu	36955a6769	[regression-test](dynamic-table) add regression test for dynamic table (#16656 )	2023-02-14 00:03:19 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00
yiguolei	d390e63a03	[enhancement](stream receiver) make stream receiver exception safe (#16412 ) Make stream receiver exception safe; change get_block(block*) to get_block(block , bool* eos); unify stream semantics.	2023-02-07 12:44:20 +08:00
lihangyu	f94a78ab4a	[Fix](topn) fix wrong nullable cast for RowId column and use heapsorter for two phase read (#16399 ) convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids contains the RowID column, so the nullable flag would be undefined for the RowID column.	2023-02-03 20:49:45 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 )	2023-02-03 11:28:48 +08:00
TengJianPing	a7b030778a	[fix](sort) fix heap-use-after-free error if sort with limit and is spilled (#16267 )	2023-01-31 09:59:03 +08:00
Jerry Hu	a9671b6dfd	[feature](agg)support two level-hash map in aggregation node (#15967 )	2023-01-30 16:43:33 +08:00
yiguolei	e49766483e	[refactor](remove unused code) remove many xxxVal structure (#16143 ) Remove many xxxVal structures; remove BetaRowsetWriter::_add_row; remove anyval_util.cpp; remove non-vectorized geo functions; remove non-vectorized like predicate. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-28 14:17:43 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
lihangyu	3894de49d2	[Enhancement](topn) support two phase read for topn query (#15642 ) This PR optimizes topn queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`. TopN is composed of SortNode and ScanNode. When the user table is wide (e.g. 100+ columns), the ORDER BY clause covers just a few columns, but ScanNode needs to scan all data from the storage engine even if the limit is very small. This may lead to a lot of read amplification. So in this PR I divide the TopN query into two phases: 1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from ScanNode. 2. The second phase is placed in the ExchangeNode because it is the central node for topn nodes in the cluster. The ExchangeNode issues an RPC to other nodes using the RowIds (sorted and limited by SortNode) read in the first phase, and reads row by row from the storage engine. After the second-phase read, the Block contains all the data needed for the query.	2023-01-19 10:01:33 +08:00
HappenLee	d5a3e8df3a	[Exec](opt) Opt the vexplode_split function performance (#15945 )	2023-01-17 19:02:57 +08:00
AlexYue	049f8ad2f9	[Bug](sort)fix merge sorter might div zero when block bytes less than block rows (#15859 ) If a block's bytes are fewer than its rows, then avg_size_per_row would be zero, which would end up dividing by zero in the following logic.	2023-01-13 18:33:40 +08:00
TengJianPing	730571e386	[fix](sort spill) fix bug of failed to create spilled file (#15864 ) Also increase buffered block size when it has started to spill.	2023-01-13 09:23:26 +08:00
TengJianPing	8f31a36429	[feature] support spill to disk for sort node (#15624 )	2023-01-11 08:40:58 +08:00
Jerry Hu	4bbc93b7ce	[refactor](hashtable) simplify template args of partitioned hash table (#15736 )	2023-01-11 08:39:13 +08:00
zclllyybb	c3da5a687a	[fix]fixed dangerous usage of namespace std (#15741 ) Co-authored-by: zhaochangle <zhaochangle@selectdb.com>	2023-01-10 16:10:49 +08:00
Tiewei Fang	f17d69e450	[feature](file cache)Import `file cache` for remote file reader (#15622 ) The main purpose of this PR is to introduce `fileCache` for lakehouse reads of remote files. It uses the local disk as a cache for remote files, so the next time a file is read, the data can be obtained directly from the local disk. In addition, this PR includes a few other minor changes. Import File Cache: 1. The imported `fileCache` is called `block_file_cache`, which uses an LRU replacement policy. 2. Implement a new FileReader, `CachedRemoteFilereader`, so that the logic of `file cache` is hidden under `CachedRemoteFilereader`. Other changes: 1. Add a new interface `fs()` for `FileReader`. 2. `IOContext` adds some statistics to track `FileCache` usage. Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>	2023-01-10 12:23:56 +08:00
Kang	9d1f02c580	[Improvement](topn) runtime prune for topn query (#15558 )	2023-01-05 20:10:12 +08:00
Gabriel	af54299b26	[Pipeline](projection) Support projection on pipeline engine (#15220 )	2022-12-21 15:47:29 +08:00
Jerry Hu	ef21eea2e8	[fix](pipeline) _valid_element_in_hash_tbl was not set correctly (#15072 )	2022-12-16 18:06:49 +08:00
TengJianPing	8c0e13ab51	[improvement](profile) add detail memory counter for exec nodes (#14806 ) * [improvement](profile) improve accuracy of memory usage and add detail memory counter * fix	2022-12-05 11:51:52 +08:00
Xinyi Zou	e1f0fa069c	[enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580 )	2022-11-29 10:15:25 +08:00
starocean999	78adecac1b	[enhancement](be)optimize mem usage in join and set node (#14602 )	2022-11-27 13:38:49 +08:00
TengJianPing	ac46922433	[fix](ut) Fix failures for BE UT macOS (#14543 )	2022-11-24 17:39:37 +08:00
TengJianPing	6c7f758ef7	[improvement](hashjoin) support partitioned hash table in hash join (#14480 )	2022-11-24 14:16:47 +08:00
starocean999	1520e5c88a	[enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484 ) * [enhancement](agg)use new method to serialize keys in batch if the key is too large * fix compile error	2022-11-23 17:35:39 +08:00
Gabriel	1ec7f45fb6	[Bug](avg) Fix `avg` for bigint (#14433 )	2022-11-22 10:29:59 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
starocean999	1f326fc0d6	[enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321 ) * [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node * use only one chunk memory when serializing keys in agg node	2022-11-18 12:31:52 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
Xinyi Zou	cffdeff4ec	[fix](memory) Fix memory leak by calling boost::stacktrace (#14269 ) boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace. The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace. refer to: boostorg/stacktrace#118 boostorg/stacktrace#111	2022-11-15 08:58:57 +08:00
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
WenYao	e692636b4f	[performance-wip] (vectorization) Opt HashJoin Performance (#12390 )	2022-11-09 14:07:49 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) The mem tracker can be logically divided into 4 layers: 1) process, 2) type, 3) query/load/compaction task etc., 4) exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between layers, and the values of process and each type are periodically aggregated. Other fix: in [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in actual tests the accuracy of this part of the memory could not be guaranteed, so it is put back into the orphan mem tracker.	2022-11-08 09:52:33 +08:00
Pxl	57a9b0fa65	[Enhancement](chore) remove unused diagnostic (#12337 )	2022-10-31 19:19:13 +08:00
HappenLee	d2be5096d6	[Revert](mem) revert the mem config cause performance degradation (#13526 ) * Revert "[fix](mem) failure of allocating memory (#13414)" This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801. * Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)" This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.	2022-10-21 08:32:16 +08:00
xy720	f329d33666	[chore](fix) Fix some spell errors in be's comments. #13452	2022-10-20 08:56:01 +08:00
yiguolei	1e42598fe6	[memory](podarray) revert not allocate too much memory in podarray change (#13457 )	2022-10-19 14:08:44 +08:00
starocean999	ac037e57f5	[fix](sort)the sort expr's nullability property may not be right (#13328 )	2022-10-18 22:09:02 +08:00
Adonis Ling	125def5102	[enhancement](macOS M1) Support building from source on macOS (M1) (#13195 ) Proposed changes: this PR fixes lots of issues when building from source on macOS with the Apple M1 chip. ATTENTION: the job of supporting macOS with the Apple M1 chip is too big, and there are lots of unresolved issues at runtime: 1. Some errors with memory tracker occur when BE (RELEASE) starts. 2. Some UT cases fail. ... Temporarily, the following changes are made on macOS to start BE successfully: 1. Disable memory tracker. 2. Use tcmalloc instead of jemalloc. This PR kicks off the job. Anyone interested can continue to fix these runtime issues. Use case: `./build.sh -j 8 --be --clean`, `cd output/be/bin`, `ulimit -n 60000`, `./start_be.sh --daemon`. Something else: it takes around 10+ minutes to build BE (with prebuilt third-parties) on macOS with the M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.	2022-10-18 13:10:13 +08:00
Gabriel	1d5ba9cbcc	[Improvement](like) Change `like` function to batch call (#13314 )	2022-10-16 16:18:22 +08:00
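
The sort fix in #17151 describes a top-n heap that ignored the offset. As a rough illustration of why the offset matters, here is a minimal sketch, not the Doris implementation. The helper name `top_n_with_offset` and the use of plain `int` rows are assumptions made for this example. A max-heap bounded at `offset + limit` entries keeps every row that may still be needed after the offset is skipped; bounding it at `limit` alone would evict some of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper (not Doris code): returns rows [offset, offset + limit)
// of the ascending order of `values`, using a bounded max-heap.
std::vector<int> top_n_with_offset(const std::vector<int>& values,
                                   std::size_t offset, std::size_t limit) {
    if (limit == 0) return {};
    // The heap must keep offset + limit rows, not just limit rows.
    const std::size_t keep = offset + limit;
    std::priority_queue<int> heap;  // max-heap: top() is the largest kept value
    for (int v : values) {
        if (heap.size() < keep) {
            heap.push(v);
        } else if (v < heap.top()) {
            heap.pop();  // evict the largest kept value
            heap.push(v);
        }
        assert(heap.size() <= keep);
    }
    // Drain the heap (largest first), then reverse into ascending order.
    std::vector<int> sorted;
    sorted.reserve(heap.size());
    while (!heap.empty()) {
        sorted.push_back(heap.top());
        heap.pop();
    }
    std::reverse(sorted.begin(), sorted.end());
    // Skip the first `offset` rows; whatever remains is the result.
    if (offset >= sorted.size()) return {};
    return std::vector<int>(sorted.begin() + static_cast<std::ptrdiff_t>(offset),
                            sorted.end());
}
```

For example, calling `top_n_with_offset({5, 1, 4, 2, 3}, 2, 2)` keeps four rows in the heap and returns the third and fourth smallest values.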


108 Commits