doris

Author	SHA1	Message	Date
Ashin Gau	3ea6478ba8	[feature](multi-catalog) parquet reader support nested array column (#16961 ) Support decoding nested array columns in the parquet reader: 1. FE should generate the right nested column type. FE doesn't check the nesting depth and legality, like map<array<int>, int>. 2. `ParquetColumnReader` has removed the filtering of page index to support nested array type. It's too difficult to skip values in nested complex types. Maybe we should support the filtering of page index and lazy read in a later PR. 3. `ExternalFileScanNode` has a bug in creating default value expression. 4. Maybe it's slow to read repetition levels in a while loop. I'll optimize this in the next PR. 5. Array column has temporary `SchemaElement` in its thrift definition, we have removed them and keep its parent in former implementation. The remaining parent should inherit the repetition and definition level of its child.	2023-02-23 14:54:58 +08:00
Qi Chen	61826e3a77	[Improvement](parquet-reader) Improve performance of parquet reader filter calculation. (#16934 ) Improve performance of parquet reader filter calculation. - Use `filter_data` instead of `(*filter_ptr)` to merge filter to improve performance. - Use mutable column filter func instead of original new column filter func which was introduced by #16850. - Avoid column ref-count increasing, which caused unnecessary copying, by passing column pointer ref.	2023-02-23 14:41:30 +08:00
ZhangYu0123	eb116cd25e	[chore](ui) execute selected code in sql editor. (#16906 ) * ui playground support selection sql to execute	2023-02-23 14:25:06 +08:00
Tiewei Fang	c2cc75d741	[BugFix](Jdbc Catalog) Fix null pointer exception in JdbcExecutor (#16958 ) This PR does two things: 1. Fix: JdbcExecutor uses `column[0]` to judge the class type, but `column[0]` may be null. 2. Enhancement: in the original logic, all fields in a jdbc catalog table are set to Nullable. However, nullable fields are inefficient. Actually, we can learn through jdbc whether the fields in the data source table are nullable, so we can set the corresponding fields in the Doris jdbc catalog to nullable or not.	2023-02-23 14:04:54 +08:00
slothever	51bbae27b8	[feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602 ) iceberg catalog supports DLF on Alibaba Cloud and AWS Glue Catalog	2023-02-23 14:02:41 +08:00
Jibing-Li	bc619ce5be	[Fix](load)Pass hidden column to load columns (#17004 ) The LoadScanProvider doesn't get Hidden Columns from stream load parameter. This may cause the stream load delete operation to fail. This PR passes the hidden columns to LoadScanProvider.	2023-02-23 13:54:36 +08:00
morrySnow	37960e83d3	[test](Nereids) add ssb sf0.1 p1 regression case (#17046 )	2023-02-23 12:25:10 +08:00
minghong	4a56140c3a	[fix](planner) sub_bitmap should be always nullable (#17010 ) sub_bitmap return type should be ALWAYS_NULLABLE, not depend on children. For example sub_bitmap(bitmap_empty(), 1, 2) return NULL, but all children are not null.	2023-02-23 12:18:28 +08:00
yongkang.zhong	2e1ed384fd	[typo](docs) add split_by_string function 1.2.2 label (#17057 )	2023-02-23 11:17:25 +08:00
Lijia Liu	8eeb435963	[improvement](meta) Enhance Doris's fault tolerance to disk error (#16472 ) Sense io error. Retry query when io error. Greylist: When it finds that one disk is completely broken, or the diff of tablet number in BE and FE meta is too large, reduce the query priority of the BE.	2023-02-23 08:40:45 +08:00
Xinyi Zou	a1c0054b4c	[fix](memory) fix memory GC details and join probe catch bad_alloc (#16989 ) Fix: on Redhat 4.x OS, /proc/meminfo has no MemAvailable, so disable using MemAvailable to control memory. vm_rss_str and mem_available_str are recorded when gc is triggered, to avoid memory changes during gc causing inaccurate logs. join probe catches bad_alloc, since it may alloc 64G memory at a time, to avoid OOM. Modify the names doris_be_all_segments_num and doris_be_all_rowsets_num in the document.	2023-02-23 08:33:30 +08:00
yongkang.zhong	d7d82f26af	[typo](docs) add date_trunc function 1.2 label (#17037 )	2023-02-22 22:42:18 +08:00
minghong	a9fb47a80a	[fix](planner) create view init bug (#16890 ) the body of create view stmt is parsed twice. in the second parse, we get sql string from CreateViewStmt.viewDefStmt.toSql() function, which missed selectlist.	2023-02-22 20:40:08 +08:00
mch_ucchi	df2f248712	[feature](planner) add dayofweek for FEFunctions to support fold constant (#16993 ) add dayofweek for FEFunctions to support fold constant. Uses Zeller's algorithm.	2023-02-22 20:27:49 +08:00
starocean999	7aa063c1f3	[fix](planner) bucket shuffle join is not recognized if the first table is a subquery (#16985 ) consider sql select * from (select * from test_1) a inner join (select * from test_2) b on a.id = b.id inner join (select * from test_3) c on a.id = c.id Because a.id is from a subquery, to find its source table, need use function getSrcSlotRef().	2023-02-22 20:23:00 +08:00
YueW	7b0fc17c04	[enhancement](inverted index) Support fulltext index evaluate equal query and list query (#16994 ) Fulltext index is the inverted index of the specified tokenizer. Before this PR, fulltext index could only evaluate match predicates; this PR adds support for evaluating equal predicates and list predicates.	2023-02-22 20:18:10 +08:00
catpineapple	4c92730c3a	[fix](planner)fix multi partition support datetime column #16759	2023-02-22 19:38:42 +08:00
zhangstar333	dc3dab5a23	[vectorized](jdbc) fix jdbc connect sql server error (#16929 )	2023-02-22 19:36:27 +08:00
Mingyu Chen	12b6786522	[fix](hive) fix unable to specify user to access hdfs (#16999 ) In version 1.2.1, users can set `"hadoop.username" = "xxx"` to specify a remote user to access hdfs when creating a hive catalog. But in version 1.2.2, we upgraded the hadoop version from 2.8 to 3.3, some behavior changed, and the user-specified remote user has no effect. This PR tries to fix this by using `UserGroupInformation` to delegate.	2023-02-22 19:35:40 +08:00
ZhangYu0123	56ebbf8bc9	[chore](tools) fix load-clickbench-data script cannot be interrupted #17000	2023-02-22 19:34:40 +08:00
yongkang.zhong	8dd1a12ea6	[typo](docs)Add upgrade precautions #17027	2023-02-22 19:27:20 +08:00
cjq9458	e48d9c9d62	[doc](typo)update datax.md #17009	2023-02-22 19:27:03 +08:00
Xinyi Zou	b194a7cf83	[improvement](memory) Support GC segment cache, when memory insufficient (#16987 ) fix segment cache memory tracker statistics support GC	2023-02-22 18:31:20 +08:00
DuRipeng	e65a061256	[Enhancement](datetimev2-enhance) support 'microseconds_add' function for datetimev2 (#16970 ) support 'microseconds_add' function for datetimev2	2023-02-22 17:49:41 +08:00
morrySnow	7956800df7	[refactor](Nereids) let type coercion same with legacy planner (#16844 ) - change for Nereids 1. add a variable length parameter to the ctor of Count for a good error reporting of Count(a, b) 2. refactor StringRegexPredicate, let it inherit from ScalarFunction 3. remove useless class TypeCollection 4. use catalog.Type.Collection to check expression arguments type 5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate. Let them same as legacy planner. - change for legacy planner 1. change the common type of floating and Decimal from Decimal to Double	2023-02-22 17:29:37 +08:00
Xin Liao	0b624d282d	[enhancement](ut) add merge-on-write ut code back (#16939 )	2023-02-22 16:29:15 +08:00
plat1ko	66ceab540a	[fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889 )	2023-02-22 16:21:11 +08:00
Kang	51eb147711	fix inverted index doc typo and reorganize index related docs (#16915 )	2023-02-22 15:15:10 +08:00
Adonis Ling	0b3e18d060	[chore](macOS) Support LLVM Clang 15 (#16991 ) Remove the deprecated classes std::codecvt_utf8_utf16<char16_t> and std::wstring_convert. Use libiconv to convert UTF-8 strings to UTF-16LE ones.	2023-02-22 15:04:48 +08:00
zhannngchen	3636d0a561	[feature](merge-on-write) add DCHECK in compaction to detect data inconsistency (#16564 ) MoW will mark all duplicate primary keys as deleted, so we can add a DCHECK during compaction; if MoW's delete bitmap works incorrectly, we're able to detect this kind of issue ASAP. In the Debug version, DCHECK will make BE crash; in the release version, compaction will fail and finally load will fail due to -235.	2023-02-22 14:59:18 +08:00
chenlinzhong	0e3be4eff5	[Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639 ) mainly include: - brpc service adds two types of thread pools. The number of "light" and "heavy" thread pools is different. - Classify the interfaces of BE. Those related to data transmission are classified as heavy interfaces and others as light interfaces. - Add some monitoring to the thread pool, including the queue size and the number of active threads. Use these indicators to guide the configuration of the number of threads.	2023-02-22 14:15:47 +08:00
airborne12	ad86b931d4	[Thirdparty](clucene) update clucene to v2.4.6 to fix bthread/pthread context bug (#16982 ) 1. change clucene version from 2.4.4->2.4.6 2. update build-thirdparty.sh clucene's build block, adding USE_BTHREAD CMAKE flag, this flag is inherited from doris's USE_BTHREAD_SCANNER.	2023-02-22 11:24:45 +08:00
zxealous	29c46d6926	[fix](struct-type) fix be core when load array orc file (#16978 ) * fix be core when load array orc file	2023-02-22 10:15:39 +08:00
Adonis Ling	4cb97b6fb7	[chore](macOS) Fix linkage errors for the release build (#17002 ) Issue Number: close #17003 ## Problem summary The linker couldn't find some symbols because the implementation of a template member function doris::vectorized::Decoder::init_decimal_converter is missing in the header file in which the corresponding declaration is placed.	2023-02-22 10:01:51 +08:00
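UnicornLee	16c4e42f42	[typo](doc) Field descriptions are inconsistent with those in the CREATE TABLE SQL (#16270 ) * Field descriptions are inconsistent with those in the CREATE TABLE SQL * 1. In the English docs, change `key_desc` to `keys_type`. * 1. In the English docs, change `partition_desc` to `partition_info`. --------- Co-authored-by: unicornlee@dingtalk.com <lxb@201092104>	2023-02-21 23:00:26 +08:00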
wudi	085f0826f6	update (#16975 ) Co-authored-by: wudi <>	2023-02-21 22:53:49 +08:00
YueW	76ef4af29d	[fix](alter inverted index) fix write edit log in replaymodifyTableAddOrDropInvertedIndices function (#16977 ) Actually, when modifyTableAddOrDropInvertedIndices, no need write logAlterJob edit log, because write logModifyTableAddOrDropInvertedIndices is enough	2023-02-21 22:36:56 +08:00
plat1ko	52f9e03eea	[fix](cooldown) Use `pending_remote_rowsets` to avoid deleting rowset files being uploaded (#16803 )	2023-02-21 21:58:20 +08:00
starocean999	0de8f90a83	[enhancement](nereids) add a session variable to control join reorder algorithm (#16783 ) 1. disable join reorder in nereids if session variable disable_join_reorder is true. 2. add a session variable max_table_count_use_cascades_join_reorder to control the join reorder algorithm in nereids. DPHyp is used only when enable_dphyp_optimizer is true and the joined table count is more than max_table_count_use_cascades_join_reorder, whose default value is 10.	2023-02-21 21:08:39 +08:00
zhengyu	09d41c3479	[fix](log) clarify error msg for tablet writer write failure (#14078 ) (#16954 ) (#16950 ) fmt::format doesn't support non-template objects as args, even if they implement `to_string()` or `operator<<`, so the original code may cause `false` to be printed instead of the real cause of the failure. So to_string() needs to be manually invoked. Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-02-21 19:42:49 +08:00
jakevin	54bf40b6e7	[feature](Nereids): Eliminate duplicate join condition. (#16910 )	2023-02-21 19:40:44 +08:00
AKIRA	a95f47ac0a	[enhancement](planner) Support filter the output of set operation node (#16666 )	2023-02-21 19:22:09 +08:00
TengJianPing	ed05f3b480	[regression-test](fuzzy) fuzzy session variable batch_size (#16384 )	2023-02-21 17:53:19 +08:00
HappenLee	f37da6e789	[Function](vec) use const column to opt function current_time() (#16953 )	2023-02-21 16:26:35 +08:00
YangShaw	cc839aead7	[fix](Nereids) fix signatures of some window functions (#16871 ) change signatures of lead(), lag(), first_value(), last_value() to be equal with legacy optimizer; these four functions only support Type.trivialTypes as returnType and input column type	2023-02-21 15:55:29 +08:00
ElvinWei	004872c99a	[fix](doc) fix invalid urls in tpch.md (#16949 )	2023-02-21 15:45:31 +08:00
Dazhuwei	246dd65435	[fix](doc) fix export-manual.md (#16969 )	2023-02-21 15:44:41 +08:00
zhangguoqiang	6cb452c22d	[improvement](test)Set compile required and add clickbench,arm to buildall (#16944 )	2023-02-21 14:47:17 +08:00
YueW	879a729afb	[improve](inverted index) not apply inverted index on 'in' or 'not_in' predicate which is produced by runtime_filter (#16952 ) In a multi-table join query, many in or not_in predicates of runtime filters are pushed down to the storage layer. According to our test, applying those predicates by inverted index degrades performance because there are many conditions in in_predicate. Therefore, this PR does not apply the inverted index on in or not_in predicates which are produced by runtime_filter.	2023-02-21 14:24:50 +08:00
lihangyu	13ae8cd6c6	[doc](point query) add row cache doc for hight-concurrent-point-query (#16972 ) This code in VCollectIterator::build_heap may cause a double free if cumu_iter->init() fails and returns early, because some LevelIterator* exists both in VCollectIterator::_children and cumu_iter::_children.	2023-02-21 14:18:37 +08:00

13721 Commits