With the default config of 90%, BE may hit OOM when the load pressure is high.
When set to 80%, BE works well under the same load pressure in my cluster.
The core dump is caused by a DCHECK failure:
F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read
Finally, we found that the DCHECK failure is caused by the page cache:
1. At first we have 20 segments, with ids 0-19.
2. For a MoW table, the memtable flush process calculates the delete bitmap. During this procedure, the index pages and data pages of the PrimaryKeyIndex are loaded into the page cache.
3. Segment compaction compacts all these segments into 2 segments and renames them to ids 0 and 1.
4. Finally, before the load commits, we calculate the delete bitmap between the segments in the current rowset. This procedure needs to iterate the primary key index of each segment, but when we access data of the newly compacted segments, we actually read data of the old segments from the page cache.
To fix this issue, there are two obvious policies:
1. Add a CRC32 or last-modified time to the CacheKey.
2. Invalidate the related cache keys after segment compaction.
For policy 1, we don't have a CRC32 in the segment footer, and getting the last-modified time requires one additional disk IO.
For policy 2, we would need to add extra page cache invalidation methods, which may make the page cache less stable.
So I think we can simply add the file size to the cache key to detect that the file has changed.
In an LSM-tree, every modification generates new files, so this kind of file-name reuse is not the normal case (as far as I know, only segment compaction does it); the file size is enough to identify that the file has changed.
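A minimal sketch of the idea, using illustrative names rather than the actual Doris page cache types:

```cpp
#include <cstdint>
#include <string>

// Sketch only. Before the fix, a key built from (file name, offset) collides
// when segment compaction reuses a segment file name, so lookups can return
// pages that belong to the old, replaced file.
struct PageCacheKey {
    std::string fname;  // segment file path
    int64_t offset;     // page offset within the file
};

// After the fix, the file size participates in the key. Since every
// modification in an LSM-tree writes a new file, a reused file name with a
// different size is enough to distinguish new pages from stale ones.
struct PageCacheKeyWithSize {
    std::string fname;
    int64_t fsize;      // size of the file the page was read from
    int64_t offset;
};
```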
Fix some wrong downcasts found by UBSan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
e5 55 00 00 10 74 58 42 e5 55 00 00 00 00 10 00 8e 7f 00 00 20 07 6f cc 8e 7f 00 00 80 fe 68 cc
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
```
1. TYPE_DATE/TYPE_DATETIME have the same data format, so I changed the bloom filter cast to a reinterpret_cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
74 65 00 00 20 91 70 f5 ca 55 00 00 02 00 00 00 00 00 00 00 f0 d4 4c 2f 56 7f 00 00 f0 d4 4c 2f
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal to store decimal elements, so the ORC reader should not downcast such a column to ColumnVector<int>.
1. Fix an inconsistent definition of `Retention`: this function returns TINYINT on `FE` but UINT8 on `BE`.
2. Make assert_cast support casting to a derived type (see the sketch after this list).
3. Change some static_cast calls to assert_cast.
4. Support sum(bool)/avg(bool).
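Item 2 amounts to something like the sketch below (not the actual Doris implementation): in debug builds the downcast is verified with `dynamic_cast`, so a wrong cast such as the ColumnVector/ColumnDecimal case above fails fast instead of reading through the wrong vptr, while release builds degrade to a plain `static_cast`.

```cpp
#include <cassert>

// Sketch only: To is the target pointer type (possibly a class derived from
// From), and From must be polymorphic for dynamic_cast to work.
template <typename To, typename From>
To assert_cast(From* from) {
#ifndef NDEBUG
    assert(from == nullptr || dynamic_cast<To>(from) != nullptr);
#endif
    return static_cast<To>(from);
}
```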
For performance reasons, rowsets belonging to a cold-heat separation table are made to use the file block cache regardless of what config the user has set.
I've tested the config using cold_heat_seperation_case_p2 and it works well.
1. Support exporting the `LARGEINT` data type to the Parquet/ORC file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the Parquet file format.
3. Fix incorrect data when DATE type data is exported to ORC.
If we use Clang-16 to build the third-party libraries and then build doris_be_test against them, doris_be_test cannot run successfully; some errors related to BRPC occur.
I tested this on Linux (x86_64) and macOS (x86_64/arm64), and these errors were always raised.
Support users manually injecting table-level statistics.
Table stats types:
- row_count
Modify table or partition statistics:
```SQL
ALTER TABLE table_name SET STATS ('k1' = 'v1', ...)
```
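For example, to inject a row count (the table name and value here are only illustrative): `ALTER TABLE example_tbl SET STATS ('row_count' = '100000')`.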
TODO:
- support other table stats types if necessary
- update statistics cache if necessary
1. Refactor aggregate normalization to avoid data amplification before the aggregate.
2. Remove useless aggregate processing in ExtractAndNormalizeWindowExpression.
3. Only push down the children of distinct aggregate functions.
TODO:
1. Push down redundant expressions in aggregate functions.
2. Refactor the normalize-repeat rule.
3. Move expression normalization and optimization after plan normalization to avoid unexpected expression optimization.
When the user sets default_storage_medium to true, the storage medium of all partitions should be SSD,
and the cooldown time should be 9999-12-31 23:59:59, so that it won't change to HDD.
But it looks like sometimes it still changes to HDD.
So I changed the debug log to info to observe it.
Unfortunately, BthreadCountDownEvent does not work as a sync primitive for this scenario, where all workers are pthreads. BthreadCountDownEvent::time_wait is meant for bthreads, so using it here results in confusing synchronization problems such as heap-buffer use-after-free.
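A pthread-friendly countdown latch built only on std::mutex/std::condition_variable is a safer fit for this case; the sketch below is illustrative (the class name is mine, not from the code base):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Illustrative pthread-side countdown latch: workers call count_down(), the
// waiter blocks on a std::condition_variable, so no bthread-specific wakeup
// machinery is involved.
class PThreadCountDownLatch {
public:
    explicit PThreadCountDownLatch(int count) : _count(count) {}

    void count_down() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_count > 0 && --_count == 0) {
            _cv.notify_all();
        }
    }

    // Returns true if the count reached zero before the timeout expired.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _count == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _count;
};
```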
1. Some encrypt and decrypt functions have the wrong blockEncryptionMode.
2. The topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id.
3. We must keep the limit if it's an uncorrelated IN-subquery with a limit on the sort, like `select a from t1 where a in (select b from t2 order by xx limit yy)`.