9720 Commits

Author SHA1 Message Date
4db2ba226b [chore](regression) prevent creating stmt failed in cold heat separation regression case #18391
Previously, the cold_heat_separation regression simply tried to create resources/policies. If earlier cases failed or the BE crashed while running them, the resources were not cleared, so the next run of these regression cases would fail.
2023-04-06 10:01:04 +08:00
f28c75bd80 [fix](file_reader) bad_typeid when reading csv&json files (#18400)
When PR(#18340) resolved its conflict with PR(#18301), it changed the file_reader to create, resulting in the error: [E-123] std::bad_typeid exception.
2023-04-06 10:00:29 +08:00
d0219180a9 [feature-wip](multi-catalog)add properties converter (#18005)
Refactor the properties of each cloud and use a property converter to convert the properties for accessing FE metadata and BE data.
user docs #18287
2023-04-06 09:55:30 +08:00
66a0c090b8 [fix](column) Add unimplemented replicate function in ColumnStruct (#18368) 2023-04-06 09:50:27 +08:00
d305c459a1 [doc](datetimefunction)Supplement the description and case of days_diff (#18244) 2023-04-06 09:04:08 +08:00
60bad33e7e [fix](nereids) explain shape refactor #18399
The previous PR #18296 has a bug when parsing SHAPE_PLAN.
2023-04-06 08:55:05 +08:00
d12c4c6361 Small typos in docker run commands. (#18288) 2023-04-05 22:27:53 +08:00
cbbad5d95c [typo](doc)Update SHOW-PROC.md and SHOW-CATALOGS.md (#18398) 2023-04-05 22:24:35 +08:00
47aa8a6d8a [fix](file_cache) turn on file cache by FE session variable (#18340)
Fix two bugs:
1. Enabling file caching requires both `FE session` and `BE` configurations(enable_file_cache=true) to be enabled.
2. `ParquetReader` has not used `IOContext` previously, but `CachedRemoteFileReader::read_at` needs `IOContext` after PR(#17586).
2023-04-05 15:51:47 +08:00
Pxl
0a4381197a [Bug](MTMV) fix waitingMTMVTaskFinished failed at test_mtmv_ssb_ddl (#18373)
fix waitingMTMVTaskFinished failed at test_mtmv_ssb_ddl
2023-04-05 11:04:41 +08:00
1ec400c786 [fix](SSL) fix ssl connection buffer overflow (#18359) 2023-04-05 08:42:41 +08:00
668031986b Update install-faq.md (#18385) 2023-04-05 08:38:06 +08:00
4edf2acc81 [typo](doc)Fixing broken links in docker cluster docs, improving formatting. (#18290) 2023-04-05 08:36:47 +08:00
ea60d65384 [Improvement](multi catalog)Move split size config to session variable (#18355)
Move the split size config to a session variable. Previously it was in the Config class, so users needed to restart the FE after changing it.
2023-04-05 01:02:47 +08:00
7f8d92656e [fix](streamload) fix stream load failed when enable profile (#18364)
#18015 enables the stream load profile log; however, the BE encounters an RPC failure when loading tpch data (see #18291). This is because when `is_report_success` is true, the BE calls reportExecStatus on the FE, but the FE cannot find the QueryInfo in `coordinatorMap`, so it returns an error to the BE.
2023-04-05 01:01:46 +08:00
d8b293de07 [fix](multi-catalog) add catalog info for show proc (#18276)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-04 22:49:22 +08:00
7c36bef6bc [Feature-Wip](MySQL Load)Show load warning for my sql load (#18224)
1. Support `show load warnings` for MySQL load to get the detailed error message.
2. Fix fillByteBufferAsync not marking the load as finished in same data load.
3. Fix drain data only in client mode.
2023-04-04 22:44:48 +08:00
e29fc3b46b [fix](chore) fix compile failed in JdbcExecutor and revert #18306 since be crash randomly (#18371)
Fix 2 problems:
1. PR #18187 uses the API resizeColumn in JNINativeMethod, which was removed by #17960.
2. Revert PR #18306 to fix the pipeline core dump during load.
2023-04-04 20:04:28 +08:00
66bfd18601 [opt](file_reader) add prefetch buffer to read csv&json file (#18301)
Co-authored-by: ByteYue <yj976240184@gmail.com>
This PR is an optimization for https://github.com/apache/doris/pull/17478:
1. Change the buffer size of `LineReader` to 4MB to align with the size of prefetch buffer.
2. Lazily prefetch data in the first read to prevent wasted reading.
3. S3 block size is 32MB only, which is too small for a file split. Set 128MB as default file split size.
4. Add `_end_offset` for prefetch buffer to prevent wasted reading.

The query performance of reading data on object storage is improved by more than 3x.
2023-04-04 19:05:22 +08:00
d7623028e9 [doc](developer-guide) add some debug tricks to dev-guide (#18225)
Add a method to debug core-dump files in VS Code, and some BE debug tricks.
2023-04-04 17:10:34 +08:00
3fc8c19735 [improve](nereids)compute statsRange.length() according to the column datatype (#18331)
We map date/datetime/V2 to double. This mapping preserves date order, but it does not preserve range length.
For example, from 1990-01-01 to 1991-01-01 there are 12 months, so for the filter `A < 1990-02-01` the selectivity
should be `1/12`.

If we compute this filter by the corresponding double values,
`sel = (19900201 - 19900101) / (19910101 - 19900101) = 100/10000 = 1/100`

the estimate is too small by a factor of about 8.
This PR aims to fix this error.

Solution:
Convert the double to its corresponding data type (date/datev2), then compute the range length with respect to that data type.
2023-04-04 14:20:34 +08:00
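The arithmetic in the message above can be reproduced with a small sketch. This is only an illustration of the idea, not the Nereids implementation: the function names and the `yyyymmdd` encoding helper are hypothetical, and the range length is measured in days here rather than months.

```python
from datetime import date


def as_number(d: date) -> int:
    # Hypothetical yyyymmdd encoding, matching the numbers used in the commit message.
    return d.year * 10000 + d.month * 100 + d.day


def selectivity_numeric(low: date, high: date, bound: date) -> float:
    # Range length taken directly on the encoded numbers (the inaccurate estimate).
    return (as_number(bound) - as_number(low)) / (as_number(high) - as_number(low))


def selectivity_by_days(low: date, high: date, bound: date) -> float:
    # Range length taken on real dates, so every day has the same weight.
    return (bound - low).days / (high - low).days


low, high, bound = date(1990, 1, 1), date(1991, 1, 1), date(1990, 2, 1)
print(selectivity_numeric(low, high, bound))  # 100 / 10000
print(selectivity_by_days(low, high, bound))  # 31 / 365, close to 1/12
```

Measuring in days gives 31/365, which is close to the expected 1/12, while the encoded numbers give 1/100.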
175e5d405c [improvement](merge-on-write) remove CHECK if lookup_row_key return unexpected status (#18326) 2023-04-04 12:42:07 +08:00
87e83081ff [test](compaction) add delete test (#18335) 2023-04-04 12:28:19 +08:00
0cada3f81d [Enhancement](compaction) return error instead of core when ctx not valid (#18363) 2023-04-04 12:27:13 +08:00
54dbb4af67 [vectorzied](jdbc) refactor jdbc table read array type (#18187)
When JDBC reads an array type, the result from Doris is a string, from PG a java.sql.Array, and from CK a java.lang.Object,
which makes the code difficult to maintain and read.
So convert every database's array result to a string, then add a cast function from string to the Doris array type.
2023-04-04 11:57:04 +08:00
418ea0a24e [fix](merge-on-write) fix that failed to capture_consistent_rowsets when full clone (#18346)
When full clone, if the max version of the local table is less than or equal to the max version of the clone table, there is no need to calculate the delete bitmap again.
2023-04-04 10:39:28 +08:00
2a301eb437 [deps](arrow) update arrow download link (#18360) 2023-04-04 10:39:04 +08:00
50e6c4216a [vectorized](function) suppoort date_trunc function truncate week mode (#18334)
date_trunc now supports truncating to the week, e.g.:
select date_trunc('2023-4-3 19:28:30', 'week');
2023-04-04 10:24:26 +08:00
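As a rough illustration of what truncating to the week means, here is a minimal sketch. It assumes that weeks start on Monday, which the commit message does not state, and `trunc_to_week` is a hypothetical helper rather than the Doris function.

```python
from datetime import datetime, timedelta


def trunc_to_week(ts: datetime) -> datetime:
    # Assumption: weeks start on Monday (datetime.weekday() returns 0 for Monday).
    monday = ts - timedelta(days=ts.weekday())
    # Drop the time-of-day part so the result is midnight of that Monday.
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


# 2023-04-03 is a Monday, so under this assumption the timestamp
# from the example truncates to midnight of the same day.
print(trunc_to_week(datetime(2023, 4, 3, 19, 28, 30)))  # 2023-04-03 00:00:00
```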
a724443eb9 [Improvement](predicate) optimize short-circuit predicates (#18278)
For scan node with no vectorized predicate, the input column for the first short-circuit predicate is dense and we don't need to access the selector column.

This PR improves performance by ~30% on TPCH Q3.
2023-04-04 10:21:41 +08:00
6231ca80f7 [improve](clickhouse catalog) Add " wrap select column for the sql query clickhouse jdbc (#18352) 2023-04-04 10:19:24 +08:00
af80e65094 [Improve](FileCahe) Support the file cache profile in olap scan node and Update the profile (#17710)
We want to use the file cache for caching cold data in S3.
When reading it, we want to know where the data comes from and how long the read takes,
so we support these metrics in the olap scan node.
To make the information clearer, I also updated the fields for these metrics.
2023-04-04 10:18:30 +08:00
3e7a9424e4 [feature](nereids) explain shape plan (#18296)
`explain shape plan select ...`
only print plan shape related information, including
- node name
- join type, join condition
- filter condition
- agg phase

It is painful to maintain regression cases using explain, since the output contains a lot of volatile information, such as slot ids.
With this PR, we can use explain shape plan in regression cases.

For example, this is TPC-H q2:
+-----------------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------------+
| PhysicalTopN |
| --PhysicalDistribute |
| ----PhysicalTopN |
| ------PhysicalProject |
| --------filter((cast(ps_supplycost as DECIMAL(27, 9)) = min(ps_supplycost) OVER(PARTITION BY p_partkey))) |
| ----------PhysicalWindow |
| ------------PhysicalQuickSort |
| --------------PhysicalProject |
| ----------------hashJoin[INNER_JOIN](supplier.s_suppkey = partsupp.ps_suppkey) |
| ------------------PhysicalProject |
| --------------------hashJoin[INNER_JOIN](part.p_partkey = partsupp.ps_partkey) |
| ----------------------PhysicalProject |
| ------------------------PhysicalOlapScan[partsupp] |
| ----------------------PhysicalProject |
| ------------------------filter((part.p_size = 15)(p_type like '%BRASS')) |
| --------------------------PhysicalOlapScan[part] |
| ------------------PhysicalDistribute |
| --------------------hashJoin[INNER_JOIN](supplier.s_nationkey = nation.n_nationkey) |
| ----------------------PhysicalOlapScan[supplier] |
| ----------------------PhysicalDistribute |
| ------------------------hashJoin[INNER_JOIN](nation.n_regionkey = region.r_regionkey) |
| --------------------------PhysicalProject |
| ----------------------------PhysicalOlapScan[nation] |
| --------------------------PhysicalDistribute |
| ----------------------------PhysicalProject |
| ------------------------------filter((region.r_name = 'EUROPE')) |
| --------------------------------PhysicalOlapScan[region] |
+-----------------------------------------------------------------------------------------------------------+
2023-04-04 09:44:15 +08:00
798d2e5160 [fix](catalog) all properties should be checked when create unpartitioned table (#18149)
All properties should be checked when creating an unpartitioned table, as is done for a partitioned table.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-04 08:53:45 +08:00
8b85c55117 [vectorized](function) Support array_shuffle and shuffle function. (#18116)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-04-04 08:53:13 +08:00
eb0fd0017e [Fix](orc-reader) Fix the scale of decimal column is incorrect when query orc tables. (#18324)
The scale of a decimal column is incorrect when querying ORC tables.
2023-04-04 08:50:47 +08:00
fc407f4afe [improvement](executor) Reduce ScannnerCtx Scheduling times (#18306)
* remove sche in scan operator
2023-04-03 22:54:34 +08:00
88c5e64c4a [fix](nereids) fix bug of SelectMaterializedIndexWithAggregate rule (#18265)
1. Create a project node to adjust the output column positions when an MV is selected in the olap scan node.
2. Pass SlotReference's column info when calling Alias's toSlot() method.
3. Compare the plans' logical properties when comparing two plans after rewrite.
2023-04-03 22:32:43 +08:00
1e51af0784 [fix](scan) Avoid using incorrect cache code in ComparisonPredicate (#18332)
* recover the regression test
2023-04-03 20:37:35 +08:00
dd78001cc1 [fix](memory) Fix memtable flush mem tracker #18330 2023-04-03 20:37:14 +08:00
fe9d2b00fc [test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007) 2023-04-03 20:18:36 +08:00
b627088e8c [Optimization](String) Optimize q20 q21 q22 q23 LIKE_SUBSTRING (like '%xxx%') (#18309)
Optimize q20, q21, q22, q23 LIKE_SUBSTRING (like '%xxxx%'). The idea comes from ClickHouse's StringSearcher:

- StringSearcher is about 10%~20% faster than the Volnitsky algorithm when the needle size is less than 10, by using the first two chars of the needle in the initial SIMD search.
- StringSearcher is faster than the Volnitsky algorithm when the needle size is less than 21.

The changes are as follows:

- Use the first two chars of the needle in the initial search. We can compare two chars of the needle against the [n:n+17) chars of the haystack with SIMD in one loop iteration, so filter efficiency is higher.
- When the environment supports SIMD, use StringSearcher.

Test results on ClickBench:

- q20 improves by about 15%.
  q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
- q21 and q22 improve by about 1%~5%.
  q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
  q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
- q23 improves by about 30%~40%, but the result is not stable.
  q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
2023-04-03 18:09:15 +08:00
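The two-char prefilter described in the message above can be sketched in scalar form as follows. This is a plain-Python illustration of the filtering idea only: it does not use SIMD, it is not the Doris or ClickHouse code, and `find_with_two_char_prefilter` is a hypothetical name.

```python
def find_with_two_char_prefilter(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    if len(needle) == 0:
        return 0
    if len(needle) == 1:
        return haystack.find(needle)

    first, second = needle[0], needle[1]
    last_start = len(haystack) - len(needle)
    for i in range(last_start + 1):
        # Cheap filter: only positions where the first two chars match
        # are worth a full comparison. A SIMD version would test many
        # positions of this filter at once.
        if haystack[i] == first and haystack[i + 1] == second:
            if haystack[i:i + len(needle)] == needle:
                return i
    return -1


print(find_with_two_char_prefilter(b"http://www.google.com", b"google"))  # 11
```

Checking two chars instead of one rejects more candidate positions before the full comparison runs, which is the higher filter efficiency the message refers to.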
eb6dbc03e0 [typo](docs) add regression test doc & fix api doc (#18329) 2023-04-03 17:40:41 +08:00
d4688620e9 [opt](array) optimize array_sortby using qsort instead of bubble sort #18311 2023-04-03 17:10:51 +08:00
96a64dc9e8 [Improvement](pipeline) Use bloom runtime filter by default for pipeline engine (#18177) 2023-04-03 15:31:48 +08:00
368a2f7ace [Bug](decimal) Fix string to decimal (#18282) 2023-04-03 15:30:48 +08:00
3078ee1854 [regression](decimalv3)Add decimal type as filter condition in regression test (#17160)
Add decimal type as filter condition in regression test
2023-04-03 14:20:09 +08:00
aff260c06f [Enhancement](HttpServer) Support https interface (#16834)
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-04-03 14:18:17 +08:00
ecd3fd07f6 [feature](colocate) support cross database colocate join (#18152) 2023-04-03 14:03:42 +08:00
e260dca7a1 [Improvement](multi catalog)Change hive metastore cache split value type to Doris defined Split. Fix split file length -1 bug (#18319)
The HiveMetastoreCache type for file splits was Hadoop InputSplit. This PR changes it to the Doris-defined Split,
which avoids converting it every time.
Also fix the explain verbose result returning -1 for the split file length.
2023-04-03 13:54:28 +08:00
6677841b7e [fix](merge-on-write) fix that failed to capture_consistent_rowsets when revise tablet meta (#18283)
_timestamped_version_tracker should be modified before calling capture_consistent_rowsets when updating the delete bitmap in revise_tablet_meta.
2023-04-03 13:02:34 +08:00