doris

Author	SHA1	Message	Date
Ashin Gau	fc70179acb	[multi-catalog](fix) the eof of lazy read columns may not be equal to the eof of predicate columns (#14212 ) Fix three bugs: 1. The EOF of lazy read columns may not be equal to the EOF of predicate columns. For example, suppose the predicate column has 3 pages of 400 rows each, and the last page is filtered out by the page index. With batch_size=992, the EOF of the predicate column is true. However, batch_size should be set to 800 for the lazy read column, so its EOF may be false. 2. The array column does not count the number of nulls. 3. A wrong NullMap is generated for array columns.	2022-11-14 14:37:21 +08:00
Mingyu Chen	7eed5a292c	[feature-wip](multi-catalog) Support hive partition cache (#14134 )	2022-11-14 14:12:40 +08:00
AlexYue	15eb07b829	[BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194 ) * use another file_size overload for noexcept * don't gc clone dir * use better status	2022-11-14 11:35:08 +08:00
Adonis Ling	7bb3792d51	[chore](build) Split the compilation units to build them in parallel (#14232 )	2022-11-14 10:57:10 +08:00
pengxiangyu	d55faa7f6a	[feature](remote)Only query can use local cache when reading remote files. (#13865 ) When running select on remote files, cache files are downloaded to local disk. When running alter table on remote files, the files are read directly from remote storage. So even if a tablet is very large, creating local cache files will not take up too much local disk space.	2022-11-14 10:30:15 +08:00
zhengyu	24b51b9035	[fix](compaction) segcompaction coredump if the rowset starts with a big segment (#14174 ) (#14176 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-14 09:54:08 +08:00
starocean999	139c4a77f1	[enhancement](be)close ExecNode ASAP to release resource earlier (#14203 )	2022-11-14 09:41:35 +08:00
plat1ko	a179b22937	[fix](schema) Release memory of TabletSchemaPB in RowsetMetaPB #13993	2022-11-14 08:36:30 +08:00
Xinyi Zou	3bc26f773d	[hotfix](memtracker) Fix expired `DCHECK(_limit != -1);` and segment_meta_mem_tracker inelegant end (#14223 )	2022-11-13 17:15:29 +08:00
zhannngchen	72748c229a	update (#14215 )	2022-11-13 12:31:42 +08:00
Xin Liao	33b50860c7	[improvement](load) release load channel actively when error occurs (#14218 )	2022-11-13 12:31:15 +08:00
Xinyi Zou	dd11d5c0a5	[enhancement](memory) Support try catch bad alloc (#14135 )	2022-11-13 11:22:56 +08:00
zhannngchen	7682c08af0	[improvement](load) reduce memory in batch for small load channels (#14214 )	2022-11-12 22:14:01 +08:00
luozenglin	376b4fda9f	[fix](scankey) fix extended scan key errors. (#14200 ) Issue Number: close #14199	2022-11-12 20:44:09 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
lihangyu	43490a33a5	[feature-array](array-type) Add array function array_with_constant (#14115 ) Return array of constants with length num. ``` mysql> select array_with_constant(4, 1223); +------------------------------+ \| array_with_constant(4, 1223) \| +------------------------------+ \| [1223, 1223, 1223, 1223] \| +------------------------------+ 1 row in set (0.01 sec) ``` co-authored-by @eldenmoon	2022-11-11 22:08:43 +08:00
Yixi Zhang	0ba13af8ff	[feature](running_difference) support running_difference function (#13737 )	2022-11-11 21:22:56 +08:00
Adonis Ling	28ae281936	[chore](cmake) Fix wrong statements (#14187 )	2022-11-11 18:22:49 +08:00
Xin Liao	43f80e2633	[enhancement](load) Increase batch size of node channel to improve import performance (#13912 )	2022-11-11 18:05:36 +08:00
Gabriel	fe2944d56d	[Bug](nljoin) Keep compatibility for nljoin (#14182 )	2022-11-11 15:54:55 +08:00
HappenLee	74a1e28af3	[Opt](exec) prevent the scan key split whole range (#14088 ) prevent the scan key split whole range	2022-11-11 15:46:00 +08:00
Gabriel	02a86d2215	[Bug](runtimefilter) Fix concurrent bug in runtime filter #14177 For runtime filter, signal will be called by a thread which is different from the await thread. So there will be a potential race for variable is_ready	2022-11-11 14:16:18 +08:00
abmdocrt	b6ba654f5b	[Feature](Sequence) Support sequence_match and sequence_count functions (#13785 )	2022-11-11 13:38:45 +08:00
Adonis Ling	118a7dff07	[chore](build) Optimize the compilation time (#14170 ) Currently, it takes too much time to build BE from source in workflow environments (P0/P1), which affects the efficiency of daily development. We can measure the time by executing the following command: `time EXTRA_CXX_FLAGS='-O3' BUILD_TYPE=ASAN ./build.sh --be --fe --clean -j "$(nproc)"` This PR optimizes the compilation time with the following methods: 1. Reduce the codegen by removing some useless std::visit. 2. Disable the optimization for some template functions which are instantiated by std::visit conditionally (except for the RELEASE build).	2022-11-11 12:09:54 +08:00
Xin Liao	883dfa38ab	[fix](decimal) change log fatal to log warning to avoid core dump on decimal type (#14150 )	2022-11-11 11:22:41 +08:00
Gabriel	d204c7dc1e	[Improvement](profile) Improve readability for runtime filters in profile string (#14165 ) * [Improvement](profile) Improve readability for runtime filters in profile string * update	2022-11-11 11:19:24 +08:00
Lightman	1f9fb4dc8b	[Bugfix] Fix upgrade from 1.1 coredump (#14163 ) When upgrading from 1.1 to master, rolling back to 1.1, and then upgrading to master again, BE will coredump because some rowsets have a schema and some do not. During the first upgrade from 1.1, BE flushes the schema into all rowsets. After the rollback to 1.1, BE runs compaction and creates some new rowsets without a schema. During the second upgrade from 1.1, BE coredumps because some conditions depend on either all or none of the rowsets having a schema.	2022-11-11 10:29:34 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) using config to enable java udf instead of macro at compile time (#14062 )	2022-11-11 09:03:52 +08:00
Gabriel	1ef85ae1f2	[Improvement](join) Support nested loop outer join (#13965 )	2022-11-10 19:50:46 +08:00
Ashin Gau	6bd5378f66	[feature-wip](multi-catalog) lazy read for ParquetReader (#13917 ) Read predicate columns firstly, and use VExprContext(push-down predicates) to generate the select vector, which is then applied to read the non-predicate columns. The data in non-predicate columns may be skipped by select vector, so the value-decode-time can be reduced. If a whole page can be skipped, the decompress-time can also be reduced.	2022-11-10 16:56:14 +08:00
Zhengguo Yang	724cf1cdb8	[chore][build] add instructions to build version string (#14067 )	2022-11-10 16:23:34 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581 ) enlarge runtime filter in predicate threshold	2022-11-10 15:48:46 +08:00
Xinyi Zou	a73f4dfdc1	[fix](memtracker) Fix scanner thread ending after fragment thread causing mem tracker null pointer #14143	2022-11-10 15:42:53 +08:00
xueweizhang	90bfd87660	[feature](function) add new function uuid() (#14092 )	2022-11-10 14:55:41 +08:00
Gabriel	184cee2d2b	[Bug](outfile) Fix wrong decimal format for ORC (#14124 )	2022-11-10 11:01:30 +08:00
Tiewei Fang	43eb946543	[feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130 S3 table valued function supports parquet/orc/json file format. For example: parquet format	2022-11-10 10:33:12 +08:00
Jerry Hu	10df61b5bf	[improvement](join) Share hash table in fragments for broadcast join (#13921 )	2022-11-10 09:48:34 +08:00
zhangstar333	df622d8b7d	[Bug](udf) fix java-udaf process string type error and add some tests (#14106 )	2022-11-10 09:30:57 +08:00
Xin Liao	3690c4dbe7	[fix](load) fix that load channel failed to be released in time (#14119 )	2022-11-09 22:38:08 +08:00
Pxl	794a551b0f	[Enhancement][fix](profile)() modify some profiles (#14074 ) 1. add RemainedDownPredicates 2. fix core dump when _scan_ranges is empty 3. fix invalid memory access on vLiteral's debug_string() 4. enlarge mv test wait time	2022-11-09 21:59:28 +08:00
camby	322ac5cf89	[refactor](array) refactor DataTypeArray from_string (#13905 ) Refactor DataTypeArray from_string to make it clearer; support ',' and ']' inside string elements, for example: ['hello,,,', 'world][]']; support empty elements, such as [,] ==> [0,0] Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-09 16:58:08 +08:00
camby	f912d4e392	[fix](compile) fix compile error #14103 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-09 14:10:06 +08:00
WenYao	e692636b4f	[performance-wip] (vectorization) Opt HashJoin Performance (#12390 )	2022-11-09 14:07:49 +08:00
Gabriel	a3c5fa8c01	[Compile](join) Boost compiling and linking (#14081 )	2022-11-09 11:27:46 +08:00
ChPi	55ca810445	[fix](Vectorized)fix json_object and json_array function return wrong result on vectorized engine (#13775 ) Issue Number: close #13598	2022-11-09 11:26:55 +08:00
Kang	aec214b4b0	[bug](ColumnDecimal)call set_decimalv2_type when cloning ColumnDecimal (#14061 ) * call set_decimalv2_type when cloning ColumnDecimal * clang format	2022-11-09 11:23:43 +08:00
Adonis Ling	291fa499e9	[fix](JSON) Fail to parse JSONPath (libc++) (#13941 )	2022-11-09 08:58:01 +08:00
zhengyu	6a1c7fac9d	[enhancement](load) shrink reserved buffer for page builder (#14012 ) (#14014 ) For a table with hundreds of text type columns, flushing its memtable may consume a huge amount of memory. This memory is consumed when initializing the page builder, as it reserves 1MB for each column, so memory consumption grows in proportion to the number of columns. Shrinking the reservation may reduce memory usage substantially during the load process. * response to the review * Update binary_plain_page.h * Update binary_dict_page.cpp * Update binary_plain_page.h Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-11-09 08:40:07 +08:00
Mingyu Chen	cd8f0713ea	[refactor](new-scan) remove old vectorized scan node (#14029 )	2022-11-09 08:39:20 +08:00
Kang	151842a1fe	[feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430 ) Introduce a SQL syntax for creating inverted index and related metadata changes. ``` -- create table with INVERTED index CREATE TABLE httplogs ( ts datetime, clientip varchar(20), request string, status smallint, size int, INDEX idx_size (size) USING INVERTED, INDEX idx_status (status) USING INVERTED, INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none") ) DUPLICATE KEY(ts) DISTRIBUTED BY RANDOM BUCKETS 10 -- add an INVERTED index to a table CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english"); ```	2022-11-08 23:46:53 +08:00
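
The lazy read described in commit 6bd5378f66 (read the predicate columns first, build a select vector from the pushed-down predicate, then apply it to the non-predicate columns) can be illustrated with a minimal sketch. This is not Doris code: the function `lazy_read`, its parameters, and the in-memory column dictionary are hypothetical stand-ins for the ParquetReader and VExprContext mentioned in the commit message.

```python
from typing import Callable, Dict, List


def lazy_read(
    columns: Dict[str, List],          # hypothetical in-memory stand-in for column data
    predicate_col: str,                # name of the column the predicate applies to
    predicate: Callable[[object], bool],
    lazy_cols: List[str],              # non-predicate columns, read only where selected
) -> Dict[str, List]:
    # Step 1: read (decode) only the predicate column.
    pred_values = columns[predicate_col]

    # Step 2: evaluate the predicate to build the select vector.
    select = [predicate(v) for v in pred_values]

    # Step 3: apply the select vector to every column, so rows that
    # fail the predicate are skipped in the lazily read columns.
    result = {predicate_col: [v for v, keep in zip(pred_values, select) if keep]}
    for name in lazy_cols:
        result[name] = [v for v, keep in zip(columns[name], select) if keep]
    return result


batch = {"status": [200, 404, 200, 500], "request": ["/a", "/b", "/c", "/d"]}
out = lazy_read(batch, "status", lambda s: s >= 400, ["request"])
print(out)
```

In this sketch the lazily read column is still fully materialized before filtering; in a real reader, as the commit message notes, the benefit comes from skipping value decoding for unselected rows and skipping decompression when a whole page is unselected.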

3181 Commits