doris

Author	SHA1	Message	Date
wxy	6472d5506f	[fix](cache) fix cache overflow problem #14515 (#14516 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2022-11-24 11:18:46 +08:00
abmdocrt	70ea07bc4b	[fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463 )	2022-11-24 09:35:08 +08:00
Gabriel	496a92b668	[JavaUDF](loader) Fix compatible problem for JAVA 11 (#14519 )	2022-11-23 23:36:39 +08:00
Gabriel	d14e1d25ff	[Bug](vectorized) Fix wrong column type (#14387 )	2022-11-23 18:07:33 +08:00
starocean999	1520e5c88a	[enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484 ) * [enhancement](agg)use new method to serialize keys in batch if the key is too large * fix compile error	2022-11-23 17:35:39 +08:00
yiguolei	fd3af489a4	[memory](chunkallocator) disable chunkallocator when reserved bytes == 0 (#14494 ) disable chunkallocator when reserved bytes == 0 disable chunkallocator by default	2022-11-23 17:12:53 +08:00
xy720	0074f55f96	[refactor](array-type) Remove encoding info for array type (#14498 ) Array column should not have encoding info because it use its sub columns' encoding info And this encoding info is never used and easy to make people confused. We should remove it.	2022-11-23 11:45:47 +08:00
Xin Liao	3b5f4ad198	[fix](unique-key-merge-on-write) fix that unique key with mow may loss some data in the query result with predicates (#14455 ) When unique key with MOW table has sequence column, the query result may be wrong with predicates. There are two problems: The sequence column needs to be removed from primary key index when comparing key. The sequence column needs to be removed from min/max key.	2022-11-23 09:08:07 +08:00
Adonis Ling	249b688663	[chore](github) Add a workflow to check BE UT on macOS (#14506 )	2022-11-23 08:38:28 +08:00
zhengyu	ab8346560d	[Enhancement](storage) add num_values consistency check when build/load IndexedColumn (#14447 ) (#14450 )	2022-11-22 21:37:08 +08:00
zhangstar333	b04ec41c1d	[Vectorized](udaf) fix java-udaf couldn't get jar core dump (#14393 ) fix java-udaf couldn't get jar core dump	2022-11-22 20:49:02 +08:00
luozenglin	30e1818724	[fix](tracing) fix tracing in the new scan node does not meet expectations (#14155 ) Issue Number: close #14149 - Remove unexpected tracing, like 'vscanner::scan' - Merge span vscannode::get_next	2022-11-22 16:44:02 +08:00
yiguolei	f72c63e4bb	[chore](error status) print error stack when rpc error (#14473 ) Currently, BE will print fail to get master client from cache. host=xxxxx, port=9228, code=THRIFT_RPC_ERROR but we did not know which step generate this error. So that I refactor error status in be and add error stack for RPC_ERROR. W1122 10:19:21.130796 30405 utils.cpp:89] fail to get master client from cache. host=xxxx, port=9228, code=RPC error(error -1): Couldn't open transport for xxxx:9228 (open() timed out)/n @ 0x559af8f774ea doris::Status::ConstructErrorStatus() @ 0x559af9aacbee _ZN5doris16ThriftClientImpl4openEv.cold @ 0x559af97f563a doris::ClientCacheHelper::_create_client() @ 0x559af97f78cd doris::ClientCacheHelper::get_client() @ 0x559af934f38b doris::MasterServerClient::report() @ 0x559af932e7a7 doris::TaskWorkerPool::_handle_report() @ 0x559af932f07c doris::TaskWorkerPool::_report_task_worker_thread_callback() @ 0x559af9b223c5 doris::ThreadPool::dispatch_thread() @ 0x559af9b187af doris::Thread::supervise_thread() @ 0x7f661bd8bea5 start_thread @ 0x7f661c09eb0d __clone Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-22 14:29:28 +08:00
Gabriel	1ec7f45fb6	[Bug](avg) Fix `avg` for bigint (#14433 )	2022-11-22 10:29:59 +08:00
yixiutt	63f4b35f95	[bugfix](short_key) fix short key coder for nullable key (#14298 )	2022-11-22 09:27:22 +08:00
Xin Liao	fea9966728	[fix](parquet-orc) fix that be core dump when some columns specified are not in the parquet or orc file (#14440 ) When some columns specified are not in the parquet or orc file in broker load, _batch->num_columns() will less than _num_of_columns_from_file. It will lead to be core dump. To prevent be core dump, just return an error in this case.	2022-11-22 09:10:38 +08:00
zhangstar333	16d8a1853a	[Bug](array-function) array set function not handle all null value (#14318 )	2022-11-22 09:07:43 +08:00
zhengyu	ca486cdfbc	[Enhancement](storage) optimize segment compaction log (#14448 ) (#14449 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-22 08:43:51 +08:00
Pxl	bcd641877f	[Enhancement](scan) disable build key range and filters when push down agg work (#14248 ) disable build key range and filters when push down agg work	2022-11-21 12:47:57 +08:00
Adonis Ling	ff197b0fa5	[chore](macOS) Fix linker errors (#14410 )	2022-11-21 10:38:36 +08:00
zhannngchen	41dae8b6bb	[improvement](load) add a log when close OlapTableSink with error (#14257 )	2022-11-21 10:33:37 +08:00
Pxl	c18a471303	[Optimize](predicate) update inplace on VcompoundPred (#14402 ) select count(*) from lineorder where lo_orderkey<100000000 OR lo_orderkey>100000000 AND lo_orderkey<200000000 OR lo_orderkey >200000000; 0.6s -> 0.5s	2022-11-21 09:12:30 +08:00
zhannngchen	3e1e8db173	[fix](exec) fix thread token shutdown (#14418 ) Fix Thread pool token was shut down error. This is because when there are more than 1 fragment of a query on one BE, the thread token maybe reset incorrectly, causing thread token shutdown earlier. cherry-pick from master Introduced from #13021	2022-11-20 00:04:48 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
Mingyu Chen	512b787559	[fix](parquet-reader) fix stack-use-after-return error (#14411 )	2022-11-19 10:52:50 +08:00
lihangyu	b4aef889f2	[feature-array](array-function) add array constructor function `array()` (#14250 ) * [feature-array](array-function) add array constructor function `array()` ``` mysql> select array(qid, creationDate) from nested_c_2 limit 10; +------------------------------+ \| array(`qid`, `creationDate`) \| +------------------------------+ \| [1000038, 20090616074056] \| \| [1000069, 20090616075005] \| \| [1000130, 20090616080918] \| \| [1000145, 20090616081545] \| +------------------------------+ 10 rows in set (0.01 sec) ```	2022-11-19 10:49:50 +08:00
Xin Liao	a82896f420	[fix](broker-load) fix that broker load don not set be exec version and limit node channel memory (#14399 )	2022-11-18 23:38:37 +08:00
Xinyi Zou	21416f9947	[enhancement](memory) Support Jemalloc metrics and default allocator changed to Jemalloc (#14384 )	2022-11-18 21:02:54 +08:00
carlvinhust2012	eab0af7afe	[optimization](array-type) optimize the export precision of floating point numbers (#14261 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-11-18 18:24:11 +08:00
Pxl	734525de86	[Bug](runtime filter) fix minmax filter not copy rightly on shared hash join (#14367 ) fix minmax filter not copy rightly on shared hash join	2022-11-18 17:52:45 +08:00
starocean999	1f326fc0d6	[enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321 ) * [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node * use only one chunk memory when serializing keys in agg node	2022-11-18 12:31:52 +08:00
Adonis Ling	2b6f85ab96	[chore](macOS) Fix BE UT (#14307 ) #13195 left some unresolved issues. One of them is that some BE unit tests fail. This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.	2022-11-18 10:13:38 +08:00
Xinyi Zou	bd5a593403	[enhancement](memtracker) Use proc/meminfo MemAvailable to control memory and optimize MemTracker log printing (#14335 )	2022-11-17 22:46:07 +08:00
spaces-x	1a035e2073	[fix](profile)(AggNode) fix the GetResultsTime is always zero (#14366 ) add scoped_timer in _serialize_with_serialized_key_result	2022-11-17 22:30:21 +08:00
Gabriel	50bfd99b59	[feature](join) support nested loop semi/anti join (#14227 )	2022-11-17 22:20:08 +08:00
HappenLee	d5af4f6558	[Neried](Profile) Add projection timer for neried (#14286 )	2022-11-17 22:17:55 +08:00
TengJianPing	a382bb95e7	[fix](runtimefilter) fix heap-user-after-free of runtime filter merge (#14362 )	2022-11-17 19:38:45 +08:00
yiguolei	dba19e591c	[cherry-pick](scanner) using avg rowset to calculate batch size instead of using total_bytes since it costs a lot of cpu (#14345 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-17 18:57:21 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support iceberg v2(step 1) (#13867 ) Support position delete(part of).	2022-11-17 17:56:48 +08:00
Mingyu Chen	7182f14645	[improvement][fix](multi-catalog) speed up list partition prune (#14268 ) In previous implementation, when doing list partition prune, we need to generation `rangeToId` every time we doing prune. But `rangeToId` is actually a static data that should be create-once-use-every-where. So for hive partition, I created the `rangeToId` and all other necessary data structures for partition prunning in partition cache, so that we can use it directly. In my test, the cost of partition prune for 10000 partitions reduce from 8s -> 0.2s. Aslo add "partition" info in explain string for hive table. ``` \| 0:VEXTERNAL_FILE_SCAN_NODE \| \| predicates: `nation` = '0024c95b' \| \| inputSplitNum=1, totalFileSize=4750, scanRanges=1 \| \| partition=1/10000 \| \| numNodes=1 \| \| limit: 10 \| ``` Bug fix: 1. Fix bug that es scan node can not filter data 2. Fix bug that query es with predicate like `where substring(test2,2) = "ext2";` will fail at planner phase. `Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef` TODO: 1. Some problem when quering es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.	2022-11-17 08:30:03 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 has supported lazy read for non-predicate columns in ParquetReader, but can't trigger lazy read when predicate columns are partition or missing columns. This PR support such case, and fill partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
camby	3ea9d3f2e1	[enhancement](array) support read list(Array) type from orc file (#14132 ) Before this pr, if we try to load ORC file with native list(or array) type data, the be will crash. Because complex types in ORC file include multi real columns, so we need to filter columns by column names. Otherwise we could not read all columns we need. Now arrow release-7.0.0 only support create stripe reader by column index, so we patch it to support create stripe reader by column names. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 17:48:17 +08:00
yixiutt	9d70c531a3	[improvement](publish) fix publish timeout in cocurrent load (#14231 ) In concurrent load, some publish timeout happens occasionally. This is cause by meta lock hold by other thread so publish add increase rowset hang for several seconds. StorageEngine::start_delete_unused_rowset will hold gc_mutex and it cost a lot of time, so that add_used_rowset wait lock, and compaction modify_rowset or other tablet method will hold meta_lock and call add_unused_rowset which will make meta_lock occupied for too long, finally makes publish timeout. In this pr, I copy unused_rowsets in lock and delete these rowset without lock, makes gc_mutex more lightweight so meta lock can be acquired immediately in publish thread. My test shows that no publish timeout in concurrent stream load.	2022-11-15 16:39:38 +08:00
zhangstar333	70cc725649	[Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209 ) * [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions * update add to stringRef	2022-11-15 16:38:38 +08:00
huangzhaowei	5badd70db2	[fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196 )	2022-11-15 16:06:59 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
Adonis Ling	333c6390ee	[fix](be-ut) AddressSanitizer detects container-overflow issues (#14255 ) * [chore] Fix the container-overflow errors detected by address sanitizer * Fix compilation errors	2022-11-15 15:49:55 +08:00
abmdocrt	f86886f8f5	[Feature](function) Support array_compact function (#14141 )	2022-11-15 14:24:37 +08:00
abmdocrt	6cc5ae077e	[Improvement](Sequence function) Capitalize const variables (#14270 )	2022-11-15 10:41:53 +08:00
Gabriel	215a4c6e02	[Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253 )	2022-11-15 09:40:00 +08:00

1 2 3 4 5 ...

3233 Commits