doris

Author	SHA1	Message	Date
Pxl	d784c99360	[Bug](planner) fix unassigned conjunct assigned on wrong node (#19672 ) * fix unassigned conjunct assigned on wrong node	2023-05-17 10:28:22 +08:00
zxealous	2d9cc8fe8f	[improvement](file cache)Support set min file segment size while use block file cache (#19536 )	2023-05-17 10:23:33 +08:00
Gabriel	8fd1eb0d1e	[minor](hash table) parameterize hash table (#19653 )	2023-05-17 09:58:26 +08:00
Gabriel	0cae9bb3a1	[UT](decimalv3) fix FE UT when enable decimal conversion (#19701 )	2023-05-17 09:55:05 +08:00
TengJianPing	2bdfaac609	[fix](ubsan) fix ubsan errors (#19658 ) fix ubsan errors: doris/be/src/util/string_parser.hpp:275:58: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int' doris/be/src/vec/functions/functions_comparison.h:214:51: runtime error: addition of unsigned offset to 0x7fea6c6b7010 overflowed to 0x7fea6c6b700c doris/be/src/vec/functions/multiply.cpp:67:50: runtime error: signed integer overflow: 1295699415680000000 * 0x0000000000015401d0a4cd4890a77700 cannot be represented in type '__int128 doris/be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h:445:73: runtime error: addition of unsigned offset to 0x7feca3343d10 overflowed to 0x7feca3343d08 doris/be/src/exec/schema_scanner/schema_tables_scanner.cpp:330:24: run	2023-05-17 09:32:03 +08:00
zhangdong	54507bb058	[fix](FQDN)fix Checkpoint error (#19678 ) Must use Env.getServingEnv() instead of getCurrentEnv(), because here we need to obtain selfNode through the official service catalog.	2023-05-17 08:47:11 +08:00
fuchanghai	ccae3753e7	[fix](doc)update readme docs link 404 (#19719 )	2023-05-17 08:22:27 +08:00
abmdocrt	9cc7af6062	[doc](doris future) Add mentor doc for doris future in community page (#19690 )	2023-05-17 08:20:35 +08:00
abmdocrt	3a7bc3a7a8	[doc](retention) optimize retention doc (#19692 )	2023-05-17 08:17:45 +08:00
xy720	0d11c4207a	[docs](struct-type) add docs for struct and named_struct function (#19700 )	2023-05-17 08:16:33 +08:00
Pxl	7f73749b88	[Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704 ) fix distributionColumnIds not updated correct when outputColumnUnique	2023-05-17 00:13:10 +08:00
yongkang.zhong	a1b1aff0ee	[improvement](jdbc catalog) Adapt to hana's special view & Optimize jdbc name format (#19696 )	2023-05-16 23:29:30 +08:00
yiguolei	8f8814e49c	[bugfix](be core) master info is deconstructed before fragment mgr and be will core (#19687 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-16 21:55:15 +08:00
Adonis Ling	fe553f7dfc	[chore](third-party) Support specifying packages to build (#19688 ) Usage: ./build-thirdparty.sh [options...] [packages...] Optional options: -j <num> build thirdparty parallel --clean clean the extracted data --continue <package> continue to build the remaining packages (starts from the specified package) Examples: 1. Specify packages to build. Build gflags, gtest and glog by executing ./build-thirdparty.sh gflags gtest glog. 2. Continue to build the remaining packages. Build the remaining packages (starts from sse2neon) by executing ./build-thirdparty.sh --continue sse2neon.	2023-05-16 19:24:19 +08:00
xy720	12c21287a5	[docs](struct-type) Add docs for struct type (#19694 )	2023-05-16 19:13:27 +08:00
Xinyi Zou	16f5d3d5b3	[Improvement](memory) new page use Allocator (#19472 )	2023-05-16 19:09:17 +08:00
zhannngchen	92a533724c	[enhancement](merge-on-write) avoid unecessary pk index iteration (#19620 )	2023-05-16 17:05:14 +08:00
Ziyu Wang	325a1d4b28	[vectorized](function) support array_count function (#18557 ) support array_count function. array_count: Returns the number of non-zero and non-null elements in the given array.	2023-05-16 17:00:01 +08:00
lihangyu	e22f5891d2	[WIP](row store) two phase opt read row store (#18654 )	2023-05-16 13:21:58 +08:00
Yongqiang YANG	610f1c8ef5	[improvement](load) skip compression when memtable is small (#19300 ) * [improvement](load) skip compression when memtable is small * format	2023-05-16 12:08:41 +08:00
slothever	3f2d1ae9a4	[feature-wip](multi-catalog)(step1)support connect to max compute (#19606 ) Issue Number: #19679 support connect to max compute metadata by odps sdk	2023-05-16 11:30:27 +08:00
Yongqiang YANG	9cd7005dec	[fix](delete) notify all when there is no high priority task (#19577 ) In some cases high-priority threads are woken but normal ones are not. We call notify_all as a workaround.	2023-05-16 11:29:10 +08:00
Pxl	b927f8cd37	[Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636 ) change asan_suppr from interceptor_via_lib to interceptor_via_fun	2023-05-16 10:51:43 +08:00
Adonis Ling	ddcf7ec1b4	[chore](third-party) Don't link keyutils to krb5 explicitly (#19632 ) We may link system-wide keyutils to krb5 when building krb5 which may introduce an extra dependency to the codebase.	2023-05-16 10:37:37 +08:00
Stalary	9cede6d763	[fix](row-policy) row policy supports external catalog (#19570 ) Row policy support external catalog	2023-05-16 08:54:06 +08:00
Weijie Guo	9535ed01aa	[feature](tvf) Support compress file for tvf hdfs() and s3() (#19530 ) We can support this by adding a new property for the tvf, like: `select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)` Users can: Specify the compression explicitly by setting `"compress_type" = "xxx"`. Or let Doris infer the compression type from the suffix of the file name (e.g. `file1.gz`). Currently, we only support reading compressed files in `csv` format, and the BE side already supports this. All we need to do is analyze `"compress_type"` on the FE side and pass it to BE.	2023-05-16 08:50:43 +08:00
Xiangyu Wang	8284c342cb	[Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19557 ) If an hms table's file format is csv, uncompressed data files may coexist with compressed data files, so we need to set compressType separately for each file.	2023-05-16 08:49:45 +08:00
zhangdong	e48524009d	[doc](fqdn)fqdn doc en (#19634 )	2023-05-16 08:48:34 +08:00
zhangdong	8ec18660fe	[improvement](FQDN)Remove unused code (#19638 )	2023-05-16 08:48:20 +08:00
HHoflittlefish777	e2b8c0004b	[Fix](lazy_open) Fix dead lock in lazy open (#19652 )	2023-05-15 23:18:33 +08:00
luozenglin	6c9c9e9765	[feature-wip](resource-group) Supports memory hard isolation of resource group (#19526 )	2023-05-15 22:45:46 +08:00
Yongqiang YANG	276e631e9c	[chore](ddlExecutor) log class of unknown stmt in DdlExecutor (#19631 ) * [chore](ddlExecutor) log class of unknown stmt in DdlExecutor	2023-05-15 21:59:44 +08:00
Mingyu Chen	643db55a78	[improvement](thread) stop threads when BE exit gracefully (#19506 )	2023-05-15 21:54:21 +08:00
yongkang.zhong	ac9e92e1aa	[typo](docs) Optimize mac compilation documentation (#19629 )	2023-05-15 20:34:47 +08:00
Dongyang Li	0a28959675	[config](mem) change default mem_limit from 90% to 80% (#19602 ) With the default config of 90%, BE may hit OOM when the load pressure is high. When set to 80%, BE works well under the same load pressure in my cluster.	2023-05-15 17:48:43 +08:00
zhannngchen	fad9237d30	[fix](storage) consider file size on page cache key (#19619 ) The core dump is caused by a DCHECK: F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read Finally, we found that the DCHECK failure is due to the page cache: 1. At first we have 20 segments, with ids 0-19. 2. For a MoW table, the memtable flush process calculates the delete bitmap. In this procedure, the index pages and data pages of PrimaryKeyIndex are loaded into the cache. 3. Segment compaction compacts all these 10 segments into 2 segments and renames them to ids 0 and 1. 4. Finally, before the load commit, we calculate the delete bitmap between segments in the current rowset. This procedure needs to iterate over the primary key index of each segment, but when we access data of the newly compacted segments, we read data of the old segments from the page cache. To fix this issue, the best policy is: 1. Add a crc32 or last-modified time to CacheKey. 2. Or invalidate the related cache keys after segment compaction. For policy 1, we don't have a crc32 in the segment footer, and getting the last-modified time needs 1 additional disk IO. For policy 2, we need to add additional page cache invalidation methods, which may make the page cache unstable. So I think we can simply add the file size to identify that the file has changed. In an LSM-Tree, every modification generates new files; such file-name reuse is not the normal case (as far as I know, it only happens in segment compaction), so file size is enough to identify the file change.	2023-05-15 17:16:31 +08:00
Liqf	c87e78dc35	[bug](jsonb) fix jsonb query bug When the json key value contains "." (#19185 ) Issue Number: close #19173 mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1'); +-------------------------------------------------------------------------------------------+ \| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') \| +-------------------------------------------------------------------------------------------+ \| "v31" \| +-------------------------------------------------------------------------------------------+ 1 row in set (0.06 sec)	2023-05-15 15:43:12 +08:00
LiBinfeng	052c7cff89	[Fix](Planner) fix cast from decimal to boolean (#19585 )	2023-05-15 15:13:16 +08:00
Pxl	2a02561863	[Bug](ubsan) fix some wrong downcast founded by ubsan (#19591 ) fix some wrong downcasts found by ubsan. ```cpp doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>') 0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>' e5 55 00 00 10 74 58 42 e5 55 00 00 00 00 10 00 8e 7f 00 00 20 07 6f cc 8e 7f 00 00 80 fe 68 cc ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>' ``` 1. TYPE_DATE/TYPE_DATETIME have the same data format, so I changed the bloom filter cast to a reinterpret cast. ```cpp doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>' 0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >' 74 65 00 00 20 91 70 f5 ca 55 00 00 02 00 00 00 00 00 00 00 f0 d4 4c 2f 56 7f 00 00 f0 d4 4c 2f ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >' ``` 2. Doris uses ColumnDecimal to store decimal elements.	2023-05-15 14:27:48 +08:00
jakevin	69243b3a57	[fix](Nereids): SemiJoinLogicalJoinTranspose shouldn't throw error when eliminate outer failed. (#19566 )	2023-05-15 12:31:54 +08:00
Pxl	4eb2604789	[Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455 ) 1. fix the inconsistent definition of `Retention`: this function returns tinyint on `FE` but uint8 on `BE` 2. make assert_cast support casting to a derived type 3. change some static casts to assert casts 4. support sum(bool)/avg(bool)	2023-05-15 11:50:02 +08:00
Zhang Wenxin	5df5c77d39	[fix](Nereids) should not colocate agg when scan data partition is random (#19598 )	2023-05-15 11:22:41 +08:00
Zhengguo Yang	6748ae4a57	[Feature] Collect the information statistics of the query hit (#18805 ) 1. Show the query hit statistics for `baseall` ```sql MySQL [test_query_db]> show query stats from baseall; +-------+------------+-------------+ \| Field \| QueryCount \| FilterCount \| +-------+------------+-------------+ \| k0 \| 0 \| 0 \| \| k1 \| 0 \| 0 \| \| k2 \| 0 \| 0 \| \| k3 \| 0 \| 0 \| \| k4 \| 0 \| 0 \| \| k5 \| 0 \| 0 \| \| k6 \| 0 \| 0 \| \| k10 \| 0 \| 0 \| \| k11 \| 0 \| 0 \| \| k7 \| 0 \| 0 \| \| k8 \| 0 \| 0 \| \| k9 \| 0 \| 0 \| \| k12 \| 0 \| 0 \| \| k13 \| 0 \| 0 \| +-------+------------+-------------+ 14 rows in set (0.002 sec) MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall where k9 > 1 group by k0,k1,k2; +------+------+--------+-------------+ \| k0 \| k1 \| k2 \| sum(`k3`) \| +------+------+--------+-------------+ \| 0 \| 6 \| 32767 \| 3021 \| \| 1 \| 12 \| 32767 \| -2147483647 \| \| 0 \| 3 \| 1989 \| 1002 \| \| 0 \| 7 \| -32767 \| 1002 \| \| 1 \| 8 \| 255 \| 2147483647 \| \| 1 \| 9 \| 1991 \| -2147483647 \| \| 1 \| 11 \| 1989 \| 25699 \| \| 1 \| 13 \| -32767 \| 2147483647 \| \| 1 \| 14 \| 255 \| 103 \| \| 0 \| 1 \| 1989 \| 1001 \| \| 0 \| 2 \| 1986 \| 1001 \| \| 1 \| 15 \| 1992 \| 3021 \| +------+------+--------+-------------+ 12 rows in set (0.050 sec) MySQL [test_query_db]> show query stats from baseall; +-------+------------+-------------+ \| Field \| QueryCount \| FilterCount \| +-------+------------+-------------+ \| k0 \| 1 \| 0 \| \| k1 \| 1 \| 0 \| \| k2 \| 1 \| 0 \| \| k3 \| 1 \| 0 \| \| k4 \| 0 \| 0 \| \| k5 \| 0 \| 0 \| \| k6 \| 0 \| 0 \| \| k10 \| 0 \| 0 \| \| k11 \| 0 \| 0 \| \| k7 \| 0 \| 0 \| \| k8 \| 0 \| 0 \| \| k9 \| 1 \| 1 \| \| k12 \| 0 \| 0 \| \| k13 \| 0 \| 0 \| +-------+------------+-------------+ 14 rows in set (0.001 sec) ``` 2. 
Show the query hit statistics summary for all the mv in a table ```sql MySQL [test_query_db]> show query stats from baseall all; +-----------+------------+ \| IndexName \| QueryCount \| +-----------+------------+ \| baseall \| 1 \| +-----------+------------+ 1 row in set (0.005 sec) ``` 3. Show the query hit statistics detail info for all the mv in a table ```sql MySQL [test_query_db]> show query stats from baseall all verbose; +-----------+-------+------------+-------------+ \| IndexName \| Field \| QueryCount \| FilterCount \| +-----------+-------+------------+-------------+ \| baseall \| k0 \| 1 \| 0 \| \| \| k1 \| 1 \| 0 \| \| \| k2 \| 1 \| 0 \| \| \| k3 \| 1 \| 0 \| \| \| k4 \| 0 \| 0 \| \| \| k5 \| 0 \| 0 \| \| \| k6 \| 0 \| 0 \| \| \| k10 \| 0 \| 0 \| \| \| k11 \| 0 \| 0 \| \| \| k7 \| 0 \| 0 \| \| \| k8 \| 0 \| 0 \| \| \| k9 \| 1 \| 1 \| \| \| k12 \| 0 \| 0 \| \| \| k13 \| 0 \| 0 \| +-----------+-------+------------+-------------+ 14 rows in set (0.017 sec) ``` 4. Show the query hit for a database ```sql MySQL [test_query_db]> show query stats for test_query_db; +----------------------------+------------+ \| TableName \| QueryCount \| +----------------------------+------------+ \| compaction_tbl \| 0 \| \| bigtable \| 0 \| \| empty \| 0 \| \| tempbaseall \| 0 \| \| test \| 0 \| \| test_data_type \| 0 \| \| test_string_function_field \| 0 \| \| baseall \| 1 \| \| nullable \| 0 \| +----------------------------+------------+ 9 rows in set (0.005 sec) ``` 5. Show query hit statistics for all the databases ```sql MySQL [(none)]> show query stats; +-----------------+------------+ \| Database \| QueryCount \| +-----------------+------------+ \| test_query_db \| 1 \| +-----------------+------------+ 1 rows in set (0.005 sec) ```	2023-05-15 10:56:34 +08:00
zclllyybb	92bf485abd	[Bug] Fix doris pipeline shared scan and top n opt (#19599 )	2023-05-15 10:00:44 +08:00
Mingyu Chen	554b89183b	[community](collaborator) remove inactive collaborator (#19627 )	2023-05-15 09:49:28 +08:00
zzzzzzzs	91d5e956a0	[typo](doc) Fixed typos in cluster-action.md (#19549 )	2023-05-14 23:52:41 +08:00
Hong Liu	80886af828	[doc](grant)add the version for grant for user; (#19556 )	2023-05-14 23:52:18 +08:00
zzzzzzzs	859b203b1d	[typo](doc) Fixed typos in query-profile-action.md (#19552 )	2023-05-14 23:51:58 +08:00
wudi	2b402483a9	add release shade and sdk doc (#19576 )	2023-05-14 23:51:17 +08:00
DongLiang-0	f4aea2a6db	[Doc](binlog-load) delete binlog-load doc side bar (#19593 )	2023-05-14 23:50:55 +08:00
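
The page cache problem described in commit fad9237d30 (#19619) can be illustrated with a small sketch. This is not the Doris BE code: `PageCache`, `read_page`, `key_without_size`, `key_with_size` and `demo` are hypothetical names, and the "files" are in-memory byte strings. It only shows why a key built from file name and offset returns stale pages once a file name is reused, and how adding the file size to the key avoids that hit.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical toy model, not the Doris BE implementation.
Key = Tuple
KeyFn = Callable[[str, int, int], Key]


class PageCache:
    """Toy page cache mapping an opaque key to page bytes."""

    def __init__(self) -> None:
        self._pages: Dict[Key, bytes] = {}

    def lookup(self, key: Key) -> Optional[bytes]:
        return self._pages.get(key)

    def insert(self, key: Key, page: bytes) -> None:
        self._pages[key] = page


def key_without_size(fname: str, fsize: int, offset: int) -> Key:
    # Old behaviour in this sketch: the file size is ignored.
    return (fname, offset)


def key_with_size(fname: str, fsize: int, offset: int) -> Key:
    # Behaviour after the fix in this sketch: the file size is part of the key.
    return (fname, fsize, offset)


def read_page(cache: PageCache, make_key: KeyFn, files: Dict[str, bytes],
              fname: str, offset: int, page_size: int = 4) -> bytes:
    """Return a page, serving it from the cache when the key matches."""
    data = files[fname]
    key = make_key(fname, len(data), offset)
    cached = cache.lookup(key)
    if cached is not None:
        return cached
    page = data[offset:offset + page_size]
    cache.insert(key, page)
    return page


def demo(make_key: KeyFn) -> bytes:
    """Read a page, reuse the file name for new content, then read again."""
    files = {"seg_0": b"OLD0OLD1"}
    cache = PageCache()
    read_page(cache, make_key, files, "seg_0", 0)   # warms the cache
    files["seg_0"] = b"NEW0NEW1NEW2"                # "compaction" reuses the name
    return read_page(cache, make_key, files, "seg_0", 0)


if __name__ == "__main__":
    print(demo(key_without_size))  # stale page from the old file
    print(demo(key_with_size))     # page from the new file
```

In this sketch the size-based key would still collide if the replacement file happened to have exactly the same size; the commit message argues that file-name reuse is rare enough for the file size to be a sufficient discriminator.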


10501 Commits
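
As a reading aid for commit 325a1d4b28 (#18557), the one-line description of `array_count` ("the number of non-zero and non-null elements in the given array") can be sketched as follows. This is a plain Python model of that sentence only; the function below is hypothetical and does not reproduce the actual SQL signature or the BE implementation.

```python
from typing import Optional, Sequence


def array_count(values: Sequence[Optional[float]]) -> int:
    """Count the elements that are neither null (None) nor zero.

    Hypothetical model of the commit's description, not Doris code.
    """
    return sum(1 for v in values if v is not None and v != 0)


if __name__ == "__main__":
    print(array_count([1, 0, None, 2]))
```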