doris

Author	SHA1	Message	Date
spaces-x	1a035e2073	[fix](profile)(AggNode) fix the GetResultsTime is always zero (#14366 ) add scoped_timer in _serialize_with_serialized_key_result	2022-11-17 22:30:21 +08:00
Gabriel	50bfd99b59	[feature](join) support nested loop semi/anti join (#14227 )	2022-11-17 22:20:08 +08:00
HappenLee	d5af4f6558	[Nereids](Profile) Add projection timer for Nereids (#14286 )	2022-11-17 22:17:55 +08:00
TengJianPing	a382bb95e7	[fix](runtimefilter) fix heap-use-after-free of runtime filter merge (#14362 )	2022-11-17 19:38:45 +08:00
yiguolei	dba19e591c	[cherry-pick](scanner) using avg rowset to calculate batch size instead of using total_bytes since it costs a lot of cpu (#14345 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-17 18:57:21 +08:00
slothever	6da2948283	[feature-wip](multi-catalog) support iceberg v2(step 1) (#13867 ) Support position delete(part of).	2022-11-17 17:56:48 +08:00
Mingyu Chen	7182f14645	[improvement][fix](multi-catalog) speed up list partition prune (#14268 ) In the previous implementation, list partition pruning had to generate `rangeToId` on every prune. But `rangeToId` is static data that should be created once and used everywhere. So for hive partitions, I create `rangeToId` and all other data structures needed for partition pruning in the partition cache, so that they can be used directly. In my test, the cost of partition pruning for 10000 partitions dropped from 8s to 0.2s. Also add "partition" info to the explain string for hive tables. ``` \| 0:VEXTERNAL_FILE_SCAN_NODE \| \| predicates: `nation` = '0024c95b' \| \| inputSplitNum=1, totalFileSize=4750, scanRanges=1 \| \| partition=1/10000 \| \| numNodes=1 \| \| limit: 10 \| ``` Bug fix: 1. Fix bug that the es scan node cannot filter data 2. Fix bug that querying es with a predicate like `where substring(test2,2) = "ext2";` fails at the planner phase. `Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef` TODO: 1. Some problems when querying es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.	2022-11-17 08:30:03 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 added lazy read for non-predicate columns in ParquetReader, but lazy read cannot be triggered when the predicate columns are partition or missing columns. This PR supports that case, and fills partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
camby	3ea9d3f2e1	[enhancement](array) support read list(Array) type from orc file (#14132 ) Before this PR, trying to load an ORC file with native list (or array) type data crashed the BE. Because complex types in an ORC file consist of multiple real columns, we need to filter columns by column name; otherwise we cannot read all the columns we need. Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to support creating a stripe reader by column names. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 17:48:17 +08:00
yixiutt	9d70c531a3	[improvement](publish) fix publish timeout in concurrent load (#14231 ) In concurrent load, publish timeouts happen occasionally. They are caused by the meta lock being held by another thread, so adding the incremental rowset during publish hangs for several seconds. StorageEngine::start_delete_unused_rowset holds gc_mutex for a long time, so add_unused_rowset has to wait for the lock; compaction's modify_rowset or other tablet methods hold meta_lock while calling add_unused_rowset, which keeps meta_lock occupied for too long and finally makes publish time out. In this PR, I copy unused_rowsets under the lock and delete these rowsets without the lock, which makes gc_mutex more lightweight so that meta_lock can be acquired immediately in the publish thread. My test shows no publish timeout in concurrent stream load.	2022-11-15 16:39:38 +08:00
zhangstar333	70cc725649	[Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209 ) * [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions * update add to stringRef	2022-11-15 16:38:38 +08:00
huangzhaowei	5badd70db2	[fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196 )	2022-11-15 16:06:59 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
Adonis Ling	333c6390ee	[fix](be-ut) AddressSanitizer detects container-overflow issues (#14255 ) * [chore] Fix the container-overflow errors detected by address sanitizer * Fix compilation errors	2022-11-15 15:49:55 +08:00
abmdocrt	f86886f8f5	[Feature](function) Support array_compact function (#14141 )	2022-11-15 14:24:37 +08:00
abmdocrt	6cc5ae077e	[Improvement](Sequence function) Capitalize const variables (#14270 )	2022-11-15 10:41:53 +08:00
Gabriel	215a4c6e02	[Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253 )	2022-11-15 09:40:00 +08:00
Xinyi Zou	cffdeff4ec	[fix](memory) Fix memory leak by calling boost::stacktrace (#14269 ) boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace. The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace. refer to: boostorg/stacktrace#118 boostorg/stacktrace#111	2022-11-15 08:58:57 +08:00
zhangstar333	93e5d8e660	[Vectorized](function) support bitmap_from_array function (#14259 )	2022-11-15 01:55:51 +08:00
Ashin Gau	fc70179acb	[multi-catalog](fix) the eof of lazy read columns may not be equal to the eof of predicate columns (#14212 ) Fix three bugs: 1. The EOF of lazy read columns may not be equal to the EOF of predicate columns. (For example: the predicate column has 3 pages with 400 rows each, but the last page is filtered by the page index. When batch_size=992, the EOF of the predicate column is true. However, we should set batch_size=800 for the lazy read column, so the EOF of the lazy read column may be false.) 2. The array column does not count the number of nulls 3. Generate wrong NullMap for array column	2022-11-14 14:37:21 +08:00
Mingyu Chen	7eed5a292c	[feature-wip](multi-catalog) Support hive partition cache (#14134 )	2022-11-14 14:12:40 +08:00
AlexYue	15eb07b829	[BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194 ) * use another file_size overload for noexcept * don't gc clone dir * use better status	2022-11-14 11:35:08 +08:00
Adonis Ling	7bb3792d51	[chore](build) Split the compilation units to build them in parallel (#14232 )	2022-11-14 10:57:10 +08:00
pengxiangyu	d55faa7f6a	[feature](remote)Only query can use local cache when reading remote files. (#13865 ) When calling select on remote files, download cache files to local disk. When calling alter table on remote files, read files directly from remote storage. So if a tablet is too large, creating the local cache file will not take up too much local disk space.	2022-11-14 10:30:15 +08:00
zhengyu	24b51b9035	[fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174 ) (#14176 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-14 09:54:08 +08:00
starocean999	139c4a77f1	[enhancement](be)close ExecNode ASAP to release resource earlier (#14203 )	2022-11-14 09:41:35 +08:00
plat1ko	a179b22937	[fix](schema) Release memory of TabletSchemaPB in RowsetMetaPB #13993	2022-11-14 08:36:30 +08:00
Xinyi Zou	3bc26f773d	[hotfix](memtracker) Fix expired `DCHECK(_limit != -1);` and segment_meta_mem_tracker inelegant end (#14223 )	2022-11-13 17:15:29 +08:00
zhannngchen	72748c229a	update (#14215 )	2022-11-13 12:31:42 +08:00
Xin Liao	33b50860c7	[improvement](load) release load channel actively when error occurs (#14218 )	2022-11-13 12:31:15 +08:00
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
zhannngchen	7682c08af0	[improvement](load) reduce memory in batch for small load channels (#14214 )	2022-11-12 22:14:01 +08:00
luozenglin	376b4fda9f	[fix](scankey) fix extended scan key errors. (#14200 ) Issue Number: close #14199	2022-11-12 20:44:09 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
lihangyu	43490a33a5	[feature-array](array-type) Add array function array_with_constant (#14115 ) Return array of constants with length num. ``` mysql> select array_with_constant(4, 1223); +------------------------------+ \| array_with_constant(4, 1223) \| +------------------------------+ \| [1223, 1223, 1223, 1223] \| +------------------------------+ 1 row in set (0.01 sec) ``` co-authored-by @eldenmoon	2022-11-11 22:08:43 +08:00
Yixi Zhang	0ba13af8ff	[feature](running_difference) support running_difference function (#13737 )	2022-11-11 21:22:56 +08:00
Xin Liao	43f80e2633	[enhancement](load) Increase batch size of node channel to improve import performance (#13912 )	2022-11-11 18:05:36 +08:00
Gabriel	fe2944d56d	[Bug](nljoin) Keep compatibility for nljoin (#14182 )	2022-11-11 15:54:55 +08:00
HappenLee	74a1e28af3	[Opt](exec) prevent the scan key split whole range (#14088 ) prevent the scan key split whole range	2022-11-11 15:46:00 +08:00
Gabriel	02a86d2215	[Bug](runtimefilter) Fix concurrent bug in runtime filter #14177 For runtime filter, signal will be called by a thread which is different from the await thread. So there will be a potential race for variable is_ready	2022-11-11 14:16:18 +08:00
abmdocrt	b6ba654f5b	[Feature](Sequence) Support sequence_match and sequence_count functions (#13785 )	2022-11-11 13:38:45 +08:00
Adonis Ling	118a7dff07	[chore](build) Optimize the compilation time (#14170 ) Currently, it takes too much time to build BE from source in workflow environments (P0/P1) which affects the efficiency of daily development. We can measure the time by executing the following command. time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)" This PR optimizes the compilation time by exploiting the following methods. Reduce the codegen by removing some useless std::visit. Disable the optimization for some template functions which are instantiated by std::visit conditionally (except for the RELEASE build).	2022-11-11 12:09:54 +08:00
Xin Liao	883dfa38ab	[fix](decimal) change log fatal to log warning to avoid core dump on decimal type (#14150 )	2022-11-11 11:22:41 +08:00
Gabriel	d204c7dc1e	[Improvement](profile) Improve readability for runtime filters in profile string (#14165 ) * [Improvement](profile) Improve readability for runtime filters in profile string * update	2022-11-11 11:19:24 +08:00
Lightman	1f9fb4dc8b	[Bugfix] Fix upgrade from 1.1 coredump (#14163 ) When upgrading from 1.1 to master, then rolling back to 1.1, and upgrading to master again, BE will coredump because some rowsets have a schema and some do not. On the first upgrade from 1.1, BE flushes the schema in all rowsets; after the rollback to 1.1, BE does compaction and creates some new rowsets without a schema. On the second upgrade from 1.1, BE coredumps because some conditions depend on either all or none of the rowsets having a schema.	2022-11-11 10:29:34 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) using config to enable java udf instead of macro at compile time (#14062 ) * [UDF](java udf) using config to enable java udf instead of macro at compile time	2022-11-11 09:03:52 +08:00
Gabriel	1ef85ae1f2	[Improvement](join) Support nested loop outer join (#13965 )	2022-11-10 19:50:46 +08:00
Ashin Gau	6bd5378f66	[feature-wip](multi-catalog) lazy read for ParquetReader (#13917 ) Read predicate columns firstly, and use VExprContext(push-down predicates) to generate the select vector, which is then applied to read the non-predicate columns. The data in non-predicate columns may be skipped by select vector, so the value-decode-time can be reduced. If a whole page can be skipped, the decompress-time can also be reduced.	2022-11-10 16:56:14 +08:00
Zhengguo Yang	724cf1cdb8	[chore][build] add instructions to build version string (#14067 )	2022-11-10 16:23:34 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581 ) enlarge runtime filter in predicate threshold	2022-11-10 15:48:46 +08:00
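
The locking change described in 9d70c531a3 (#14231) copies the unused rowsets while holding the lock and deletes them after releasing it. A minimal Python sketch of that pattern follows. It is not the Doris implementation: the class `UnusedRowsetGC`, the `delete_fn` callback, and the dict of rowset ids to paths are hypothetical names used only for illustration, and only the method names `add_unused_rowset` and `start_delete_unused_rowset` and the `gc_mutex` idea are taken from the commit message.

```python
import threading


class UnusedRowsetGC:
    """Hypothetical stand-in for unused-rowset bookkeeping (not Doris code)."""

    def __init__(self, delete_fn):
        # Plain (non-reentrant) lock guarding only the pending map.
        self._gc_mutex = threading.Lock()
        self._unused = {}            # rowset id -> path, pending deletion
        self._delete_fn = delete_fn  # assumed slow: does the real I/O

    def add_unused_rowset(self, rowset_id, path):
        # Short critical section: callers never wait on deletion I/O.
        with self._gc_mutex:
            self._unused[rowset_id] = path

    def start_delete_unused_rowset(self):
        # Swap the pending map out under the lock ...
        with self._gc_mutex:
            to_delete, self._unused = self._unused, {}
        # ... and do the slow deletions with the lock released.
        for rowset_id, path in to_delete.items():
            self._delete_fn(rowset_id, path)
        return len(to_delete)
```

Because the lock is held only for the swap, a thread calling `add_unused_rowset` during a long deletion pass is blocked for at most the duration of a dict assignment, which is the effect the commit message describes for the publish thread.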

3084 Commits