Commit Graph

10511 Commits

Author SHA1 Message Date
cc9d340400 [Fix](Nereids) Fix minidump connect context loading and concurrency bug (#19578)
There are two problems with minidump:
1. Minidump does not load the connect context into ThreadInfo, so the context cannot be obtained easily.
2. Minidump writes maps without concurrency protection, so the map size may change while we are iterating over it.

Solution:
1. Load the connect context into the minidump thread.
2. Use an immutable map to copy the map before actually iterating over it.
2023-05-17 15:09:00 +08:00
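The second fix is the snapshot-then-iterate pattern: copy the shared map while holding a lock, then iterate the private copy so concurrent writers cannot change its size mid-iteration. The commit does this in the Java FE with an immutable map copy. The C++ sketch below only illustrates the same idea; `SessionRegistry` and `dump` are hypothetical names, not Doris code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry standing in for a map that minidump serializes.
// Writers may call put() concurrently with a dump.
class SessionRegistry {
public:
    void put(const std::string& key, const std::string& value) {
        std::lock_guard<std::mutex> guard(mu_);
        entries_[key] = value;
    }

    // Take a point-in-time copy while holding the lock. Later put() calls
    // mutate entries_, not the returned copy, so the copy's size is stable.
    std::map<std::string, std::string> snapshot() const {
        std::lock_guard<std::mutex> guard(mu_);
        return entries_;
    }

private:
    mutable std::mutex mu_;
    std::map<std::string, std::string> entries_;
};

// Serialize a snapshot; the iteration runs without holding the registry lock.
std::string dump(const SessionRegistry& registry) {
    const std::map<std::string, std::string> copy = registry.snapshot();
    std::string out;
    for (const auto& [key, value] : copy) {
        out += key + "=" + value + ";";
    }
    return out;
}
```

Copying costs one allocation per dump, which is usually acceptable for a diagnostic path such as a minidump.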
3e661a30c2 [fix](planner)just return non-empty side of ExprSubstitutionMap if one of ExprSubstitutionMap is empty (#19600) 2023-05-17 15:06:43 +08:00
d9950a6422 [fix](Nereids) not fallback correctly when do forward (#19675) 2023-05-17 14:22:40 +08:00
802e55114b http interfaces between FEs are not redirected (#19590) 2023-05-17 14:21:53 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
56809230d1 [Improvement](string function) optimize substring and in string set (#19257)
* [Improvement](string function) optimize substring and in string set

* update
2023-05-17 14:09:52 +08:00
4607a3408e [minor](Nereids): unify name about Transpose. (#19662) 2023-05-17 11:33:02 +08:00
1462e44162 [Bug](topn) fix rowid fetcher merge with empty block (#19712) 2023-05-17 10:56:32 +08:00
f95c1d7cb6 [feat](profile) Add a new rest api to query instance host and ip information for query profile action in branch master(#18668) (#19643) 2023-05-17 10:52:47 +08:00
c98147375d [fix](Nereids) decimal compare float should use double as common type (#19710) 2023-05-17 10:36:04 +08:00
Pxl
d784c99360 [Bug](planner) fix unassigned conjunct assigned on wrong node (#19672)
* fix unassigned conjunct assigned on wrong node
2023-05-17 10:28:22 +08:00
2d9cc8fe8f [improvement](file cache)Support set min file segment size while use block file cache (#19536) 2023-05-17 10:23:33 +08:00
8fd1eb0d1e [minor](hash table) parameterize hash table (#19653) 2023-05-17 09:58:26 +08:00
0cae9bb3a1 [UT](decimalv3) fix FE UT when enable decimal conversion (#19701) 2023-05-17 09:55:05 +08:00
2bdfaac609 [fix](ubsan) fix ubsan errors (#19658)
Fix ubsan errors:

doris/be/src/util/string_parser.hpp:275:58: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

doris/be/src/vec/functions/functions_comparison.h:214:51: runtime error: addition of unsigned offset to 0x7fea6c6b7010 overflowed to 0x7fea6c6b700c

doris/be/src/vec/functions/multiply.cpp:67:50: runtime error: signed integer overflow: 1295699415680000000 * 0x0000000000015401d0a4cd4890a77700 cannot be represented in type '__int128'

doris/be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h:445:73: runtime error: addition of unsigned offset to 0x7feca3343d10 overflowed to 0x7feca3343d08 

doris/be/src/exec/schema_scanner/schema_tables_scanner.cpp:330:24: run
2023-05-17 09:32:03 +08:00
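The first and third reports above are signed overflows (`2147483647 + 1` in `int`, and a product that does not fit in `__int128`), which are undefined behavior in C++. The commit message is truncated and does not show how each site was fixed. The sketch below shows one common way to make such arithmetic well-defined, using the GCC/Clang `__builtin_add_overflow` / `__builtin_mul_overflow` builtins; `checked_add` and `checked_mul` are illustrative helpers, not functions from the Doris codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Checked 32-bit addition: returns std::nullopt instead of triggering
// signed-overflow UB. The builtin computes the exact result and reports
// whether it fits in the destination type.
std::optional<int32_t> checked_add(int32_t a, int32_t b) {
    int32_t result;
    if (__builtin_add_overflow(a, b, &result)) {
        return std::nullopt;
    }
    return result;
}

// Same idea for 128-bit multiplication (compare the multiply.cpp report).
std::optional<__int128> checked_mul(__int128 a, __int128 b) {
    __int128 result;
    if (__builtin_mul_overflow(a, b, &result)) {
        return std::nullopt;
    }
    return result;
}
```

A caller can then map `std::nullopt` to a NULL result or an error instead of silently wrapping.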
54507bb058 [fix](FQDN)fix Checkpoint error (#19678)
Must use Env.getServingEnv() instead of getCurrentEnv(), because here we need to obtain selfNode through the official service catalog.
2023-05-17 08:47:11 +08:00
ccae3753e7 [fix](doc)update readme docs link 404 (#19719) 2023-05-17 08:22:27 +08:00
9cc7af6062 [doc](doris future) Add mentor doc for doris future in community page (#19690) 2023-05-17 08:20:35 +08:00
3a7bc3a7a8 [doc](retention) optimize retention doc (#19692) 2023-05-17 08:17:45 +08:00
0d11c4207a [docs](struct-type) add docs for struct and named_struct function (#19700) 2023-05-17 08:16:33 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
Fix distributionColumnIds not being updated correctly when outputColumnUnique…
2023-05-17 00:13:10 +08:00
a1b1aff0ee [improvement](jdbc catalog) Adapt to hana's special view & Optimize jdbc name format (#19696) 2023-05-16 23:29:30 +08:00
8f8814e49c [bugfix](be core) master info is deconstructed before fragment mgr and be will core (#19687)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-16 21:55:15 +08:00
fe553f7dfc [chore](third-party) Support specifying packages to build (#19688)
Usage: ./build-thirdparty.sh [options...] [packages...]
  Optional options:
     -j <num>               build third-party packages in parallel
     --clean                clean the extracted data
     --continue <package>   continue to build the remaining packages (starts from the specified package)

Examples:
1. Specify packages to build.
    Build gflags, gtest and glog by executing ./build-thirdparty.sh gflags gtest glog.
2. Continue to build the remaining packages.
    Build the remaining packages (starts from sse2neon) by executing ./build-thirdparty.sh --continue sse2neon.
2023-05-16 19:24:19 +08:00
12c21287a5 [docs](struct-type) Add docs for struct type (#19694) 2023-05-16 19:13:27 +08:00
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
92a533724c [enhancement](merge-on-write) avoid unnecessary pk index iteration (#19620) 2023-05-16 17:05:14 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
Support the array_count function.
array_count: returns the number of non-zero and non-null elements in the given array.
2023-05-16 17:00:01 +08:00
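The counting rule is small enough to state as code. This sketch models an array as `std::vector<std::optional<T>>`, with `std::nullopt` standing for SQL NULL; it illustrates the semantics only, since the real function operates on Doris's columnar array types, and the `array_count` template here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Count elements that are neither NULL (std::nullopt) nor zero (T{}).
template <typename T>
std::size_t array_count(const std::vector<std::optional<T>>& arr) {
    std::size_t count = 0;
    for (const auto& elem : arr) {
        if (elem.has_value() && *elem != T{}) {
            ++count;
        }
    }
    return count;
}
```

For example, `[0, 1, NULL, 2, 0]` yields 2: the zeros and the NULL are skipped.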
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
610f1c8ef5 [improvement](load) skip compression when memtable is small (#19300)
* [improvement](load) skip compression when memtable is small

* format
2023-05-16 12:08:41 +08:00
3f2d1ae9a4 [feature-wip](multi-catalog)(step1)support connect to max compute (#19606)
Issue Number: #19679

Support connecting to MaxCompute metadata via the ODPS SDK.
2023-05-16 11:30:27 +08:00
9cd7005dec [fix](delete) notify all when there is no high priority task (#19577)
In some cases, high-priority threads are woken but normal threads are not. We call notify_all as a workaround.
2023-05-16 11:29:10 +08:00
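The failure mode is a lost wake-up: when workers of two priority classes wait on one condition variable, `notify_one()` may wake a worker that finds nothing it is allowed to run, and the worker that could run the task keeps sleeping. The sketch below is a hypothetical `TwoLevelQueue`, not the Doris delete-task code, showing why `notify_all()` avoids this.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical two-level task queue. High-priority workers take only
// high-priority tasks; normal workers take only normal tasks. Both kinds of
// worker wait on the same condition variable.
class TwoLevelQueue {
public:
    void push(int task, bool high_priority) {
        {
            std::lock_guard<std::mutex> guard(mu_);
            (high_priority ? high_ : normal_).push_back(task);
        }
        // With notify_one(), the single wake-up could reach a high-priority
        // worker while only a normal task is queued; that worker re-checks its
        // predicate, goes back to sleep, and no normal worker is woken.
        // Waking every waiter avoids this at the cost of spurious wake-ups.
        cv_.notify_all();
    }

    // Blocks until a task of the requested class is available.
    int pop(bool high_priority) {
        std::unique_lock<std::mutex> lock(mu_);
        std::deque<int>& q = high_priority ? high_ : normal_;
        cv_.wait(lock, [&q] { return !q.empty(); });
        int task = q.front();
        q.pop_front();
        return task;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<int> high_;
    std::deque<int> normal_;
};
```

A more targeted alternative is one condition variable per priority class, which lets `notify_one()` reach the right kind of worker.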
Pxl
b927f8cd37 [Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636)
change asan_suppr from interceptor_via_lib to interceptor_via_fun
2023-05-16 10:51:43 +08:00
ddcf7ec1b4 [chore](third-party) Don't link keyutils to krb5 explicitly (#19632)
When building krb5, the system-wide keyutils may be linked into krb5, which may introduce an extra dependency into the codebase.
2023-05-16 10:37:37 +08:00
9cede6d763 [fix](row-policy) row policy supports external catalog (#19570)
Row policy supports external catalogs.
2023-05-16 08:54:06 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
We support this by adding a new property to the TVF, for example:

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

Users can:

Specify the compression type explicitly by setting `"compress_type" = "xxx"`.
Let Doris infer the compression type from the file-name suffix (e.g. `file1.gz`).
Currently, only compressed files in `csv` format can be read, and the BE side already supports this.
All that remains is to analyze `"compress_type"` on the FE side and pass it to the BE.
2023-05-16 08:50:43 +08:00
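The resolution order described above (an explicit `"compress_type"` wins, otherwise infer from the suffix) can be sketched as follows. The enum values, the accepted names, and the "no or unrecognized suffix means uncompressed" rule are assumptions for illustration; the commit does not list the exact names Doris accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical enum; the real FE/BE types are not shown in the commit.
enum class CompressType { UNKNOWN, PLAIN, GZ, LZ4, BZ2, DEFLATE };

// Map a compression name such as "gz" or "lz4" to the enum (assumed names).
CompressType parse_compress_name(const std::string& name) {
    if (name == "gz") return CompressType::GZ;
    if (name == "lz4") return CompressType::LZ4;
    if (name == "bz2") return CompressType::BZ2;
    if (name == "deflate") return CompressType::DEFLATE;
    if (name == "plain") return CompressType::PLAIN;
    return CompressType::UNKNOWN;
}

// An explicit property wins; otherwise fall back to the file-name suffix.
CompressType resolve_compress_type(const std::string& explicit_type,
                                   const std::string& file_name) {
    if (!explicit_type.empty()) {
        return parse_compress_name(explicit_type);
    }
    const std::size_t dot = file_name.rfind('.');
    if (dot == std::string::npos) {
        return CompressType::PLAIN;  // no suffix: treat as uncompressed
    }
    const CompressType inferred = parse_compress_name(file_name.substr(dot + 1));
    // An unrecognized suffix such as ".csv" is treated as uncompressed.
    return inferred == CompressType::UNKNOWN ? CompressType::PLAIN : inferred;
}
```

With this order, `file1.gz` resolves to GZ, while `"compress_type" = "lz4"` overrides whatever the suffix says.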
8284c342cb [Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19557)
If an HMS table's file format is csv, uncompressed data files may coexist with compressed data files, so we need to set compressType separately.
2023-05-16 08:49:45 +08:00
e48524009d [doc](fqdn)fqdn doc en (#19634) 2023-05-16 08:48:34 +08:00
8ec18660fe [improvement](FQDN)Remove unused code (#19638) 2023-05-16 08:48:20 +08:00
e2b8c0004b [Fix](lazy_open) Fix dead lock in lazy open (#19652) 2023-05-15 23:18:33 +08:00
6c9c9e9765 [feature-wip](resource-group) Supports memory hard isolation of resource group (#19526) 2023-05-15 22:45:46 +08:00
276e631e9c [chore](ddlExecutor) log class of unknown stmt in DdlExecutor (#19631)
* [chore](ddlExecutor) log class of unknown stmt in DdlExecutor
2023-05-15 21:59:44 +08:00
643db55a78 [improvement](thread) stop threads when BE exit gracefully (#19506) 2023-05-15 21:54:21 +08:00
ac9e92e1aa [typo](docs) Optimize mac compilation documentation (#19629) 2023-05-15 20:34:47 +08:00
0a28959675 [config](mem) change default mem_limit from 90% to 80% (#19602)
With the default of 90%, the BE may run out of memory (OOM) when the load pressure is high.
When set to 80%, the BE works well under the same load pressure in my cluster.
2023-05-15 17:48:43 +08:00
fad9237d30 [fix](storage) consider file size on page cache key (#19619)
The core dump is caused by a failed DCHECK:

F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read
We eventually found that the DCHECK failure is caused by the page cache:

1. At first we have 20 segments, with ids 0-19.
2. For a MoW table, the memtable flush process calculates the delete bitmap. In this procedure, the index pages and data pages of the PrimaryKeyIndex are loaded into the page cache.
3. Segment compaction compacts these segments into 2 segments and renames them to ids 0 and 1.
4. Finally, before the load commits, we calculate the delete bitmap between the segments in the current rowset. This procedure iterates over the primary key index of each segment, but when we access the data of the newly compacted segments, we read stale data of the old segments from the page cache.
To fix this issue, the ideal approach would be one of:

1. Add a crc32 or the last-modified time to the CacheKey.
2. Or invalidate the related cache keys after segment compaction.
For policy 1, there is no crc32 in the segment footer, and getting the last-modified time requires one additional disk IO.
For policy 2, we would need to add extra page cache invalidation methods, which may make the page cache less stable.

So I think we can simply add the file size to the cache key to detect that the file has changed.
In an LSM-tree, every modification generates new files, so file-name reuse is not the normal case (as far as I know, only segment compaction does it), and the file size is enough to identify the change.
2023-05-15 17:16:31 +08:00
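The chosen fix makes the file size part of the cache key's identity, so a compacted segment that reuses an old file name (but has a different size) no longer hits the old segment's pages. The struct below is a hypothetical sketch of such a key for a hash map; the field names and hash mixing are assumptions, not the actual Doris `CacheKey`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical page cache key: file name, file size, and page offset.
struct PageCacheKey {
    std::string fname;
    int64_t fsize;   // distinguishes a rewritten file that reuses fname
    int64_t offset;  // page offset within the file

    bool operator==(const PageCacheKey& other) const {
        return fname == other.fname && fsize == other.fsize && offset == other.offset;
    }
};

struct PageCacheKeyHash {
    std::size_t operator()(const PageCacheKey& k) const {
        // Boost-style hash combine over all three fields.
        std::size_t h = std::hash<std::string>{}(k.fname);
        h ^= std::hash<int64_t>{}(k.fsize) + 0x9e3779b9 + (h << 6) + (h >> 2);
        h ^= std::hash<int64_t>{}(k.offset) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using PageCache = std::unordered_map<PageCacheKey, std::string, PageCacheKeyHash>;
```

As the commit notes, this is a heuristic: two different files with the same name and the same size would still collide, which is acceptable only because name reuse is rare in an LSM-tree.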
c87e78dc35 [bug](jsonb) fix jsonb query bug When the json key value contains "." (#19185)
Issue Number: close #19173

mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31" |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
2023-05-15 15:43:12 +08:00
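The bug concerns path parsing: in `$."a.b.c".k1`, the dots inside the double quotes belong to the key, so the path has two steps, `a.b.c` and `k1`. The tokenizer below is a hypothetical sketch of that rule and is not Doris's jsonb path parser; it handles only `$`, `.name` and `."quoted name"`, and deliberately ignores array indexes, wildcards and escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Split a simple JSON path into member names; std::nullopt on malformed input.
std::optional<std::vector<std::string>> split_json_path(const std::string& path) {
    if (path.empty() || path[0] != '$') return std::nullopt;
    std::vector<std::string> keys;
    std::size_t i = 1;
    while (i < path.size()) {
        if (path[i] != '.') return std::nullopt;  // every step starts with '.'
        ++i;
        if (i < path.size() && path[i] == '"') {
            // Quoted key: dots inside the quotes are part of the key.
            const std::size_t close = path.find('"', i + 1);
            if (close == std::string::npos) return std::nullopt;
            keys.push_back(path.substr(i + 1, close - i - 1));
            i = close + 1;
        } else {
            // Unquoted key: runs until the next '.' or the end of the path.
            const std::size_t next = path.find('.', i);
            const std::size_t end = next == std::string::npos ? path.size() : next;
            if (end == i) return std::nullopt;  // empty key, e.g. "$.."
            keys.push_back(path.substr(i, end - i));
            i = end;
        }
    }
    return keys;
}
```

Splitting naively on every `.` would instead produce `"a`, `b`, `c"`, `k1`, which is the kind of mis-parse the commit title describes.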
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
Pxl
2a02561863 [Bug](ubsan) fix some wrong downcast founded by ubsan (#19591)
Fix some wrong downcasts found by ubsan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE and TYPE_DATETIME have the same data format, so I changed the bloom filter cast to a reinterpret_cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal, not ColumnVector, to store decimal elements.
2023-05-15 14:27:48 +08:00
69243b3a57 [fix](Nereids): SemiJoinLogicalJoinTranspose shouldn't throw error when eliminate outer failed. (#19566) 2023-05-15 12:31:54 +08:00