Commit Graph

8276 Commits

Author SHA1 Message Date
5ec4e5586f [refactor]remove seek block in segmentIterator (#15413)
* remove seek block

* add reg test

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-12-30 14:14:16 +08:00
520b6d7910 [Improvement](decimalv3) Add a config to check overflow for DECIMALV3 (#15463) 2022-12-30 14:02:24 +08:00
5db8b52441 [Fix](SparkLoad): fix the timeout aborted loadtasks are not cleaned up. (#15480)
Co-authored-by: spaces-x <weixiang06@meituan.com>
2022-12-30 14:02:00 +08:00
5c5b7a5c6f [Broker](bos) suppoert baidu bos object storage for broker (#15448) 2022-12-30 12:39:10 +08:00
2339dcda05 [fix](icebergv2)update icebergv2 regression case (#15442)
update icebergv2 regression case
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-12-30 12:24:26 +08:00
917b266799 [fix](planner) table valued function could not used in subquery (#15496) 2022-12-30 10:01:25 +08:00
10be583e52 [chore](pipeline) optimize profile information (#15433) 2022-12-30 09:56:33 +08:00
2c8de30cce [optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441)
**Optimize**
PR #14470 has used `Expr` to filter delete rows to match current data file,
but the rows in the delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
to optimize filtering rows while scanning, so this PR remove `Expr` and use binary search to filter delete rows.

In addition, delete files are likely to be encoded in dictionary, it's time-consuming to decode `file_path`
columns into `ColumnString`, so this PR use `ColumnDictionary` to read `file_path` column.

After testing, the performance of iceberg v2's MOR is improved by 30%+.

**Fix Bug**
Lazy-read-block may not have the filter column, if the whole group is filtered by `Expr`
and the batch_eof is generated from next batch.
2022-12-30 08:57:55 +08:00
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
edb9a3b58d [Bug](timediff) Fix wrong result for function timediff (#15312) 2022-12-30 00:28:51 +08:00
9a517d6a8f [DataType](Deciamlv3) change the avg function scale of decimalv3 (#15445) 2022-12-30 00:27:51 +08:00
73f7ccb58f [typo](docs) fix document display error in SHOW-ALTER.md and SHOW-PARTITION-ID.md and SHOW-PARTITIONS.md (#15453) 2022-12-30 00:27:22 +08:00
3ff01ca799 [feature-wip](multi-catalog) support Iceberg time travel in external table (#15418)
For example
SELECT* FROM tbl FOR VERSION AS OF 10963874102873;
SELECT* FROM tbl FOR TIME AS OF '1986-10-26 01:21:00';
2022-12-30 00:25:21 +08:00
6c847daba0 [Feature](Nereids) Support grouping set for materialized index. (#15383)
This PR adds support for materialized index selecting when the query has grouping sets.
2022-12-29 23:17:02 +08:00
dda505487c [fix](nereids) SimplifyArithmeticRuleTest ut failed (#15486)
this PR remove typeCoercion on expected expr in ExpressionRewriteTestHelper. Because we should not rewrite expected expr at all. It will change the expected expr unexpectedly.
2022-12-29 22:53:27 +08:00
bb305aa572 [chore](badges) Remove daily test badges for origin engine (#15482)
The code of origin engine will be remove later, we already stop the daily test for origin engine, so we should remove this badges from home page.
2022-12-29 21:25:15 +08:00
c54c2f8035 [fix](statistics) fix npe when __internal_schema not created (#15464) 2022-12-29 21:24:33 +08:00
9b371f6b0b [fix](web ui) fix fe web ui (#14887) 2022-12-29 21:19:44 +08:00
79113b0cd1 [Fix](storage) Fix bug that cooldown time is error (#15444)
Cooldown time is wrong for data in SSD, because cooldown time for all `table/partitionis`
is only calculated once when class `DataProperty` loaded and that cannot be updated later.
This patch is to ensure that cooldown time for each table/partition can be calculated in real time
when table/partition is created.
Co-authored-by: weizuo <weizuo@xiaomi.com>
2022-12-29 21:01:36 +08:00
e651a9bb11 [feature](nereids) add variance function for nereids (#15370)
support variance function. currently, it dose not support decimalV3 type
2022-12-29 18:33:52 +08:00
43c8e7b465 [chore](thirdparty) Support cleaning extracted data before building them (#15458)
Currently, we may fail to build the third-party libraries if we keep the outdated extracted data.

Considering the following scenario, Bob added patches to some libraries and Alice updates the codebase and builds 
the third-party libraries. If Alice kept the outdated extracted data, she should fail to build the third-party libraries 
because the patches are not applied due to the outdated `patched_marks`.

This PR introduces a way to clean the outdated data before building the third-party libraries.
2022-12-29 16:01:23 +08:00
c22ba8e160 [Bug](Decimalv3) coredump of decimalv3 multiply (#15452) 2022-12-29 15:35:17 +08:00
89e2fb4301 [docs](readme)update the readme.md (#15465) 2022-12-29 14:51:17 +08:00
25b257e37c [enhancement](session var) varariable to control whether to rewrite OR to IN or not (#15437) 2022-12-29 14:50:32 +08:00
e2603ca883 [fix](docs) fix some docs about stream load and select. (#15372)
* [fix](docs) fix some docs about stream load and select.

* update
2022-12-29 14:50:06 +08:00
d95be84629 [enhancement](profile) add session variable parallel_fragment_exec_instance_num to profile (#15457) 2022-12-29 14:46:07 +08:00
657f3e6318 [fix](pipeline) disable sharing hashtable for broadcast join for pipeline engine (#15432) 2022-12-29 14:19:57 +08:00
2ae28ea9dd [typo](docs)fix-doc #15438 2022-12-29 14:19:24 +08:00
4179ea31bd [typo](docs) fix typo in SHOW-ALTER.md and SHOW-LOAD-WARNINGS.md (#15431) 2022-12-29 14:19:05 +08:00
4157d8c0a6 [Fix](Regression_test)add order by to increase test stability #15439 2022-12-29 14:18:51 +08:00
298c0a2391 [typo](docs)fix be dynamic configuration doc #15443 2022-12-29 14:18:14 +08:00
f5b4faf682 fix compile doc (#15454) 2022-12-29 14:17:53 +08:00
c5277bb4d6 [doc](label)update the label (#15455) 2022-12-29 14:17:40 +08:00
7ab6ea684b [Improvement](meta) hide password of show catalog xxx stmt and for es catalog (#15410)
* [Improvement](meta) hide password of show catalog xxx

* hide es password in show create ctlg and show ctlg xx stmt
2022-12-29 14:16:32 +08:00
5b09d27d54 [feature-wip](nereids) Made decimal in nereids more complete (#15087)
1. Add IntegralDivide operator to support `DIV` semantics
2. Add more operator rewriter to keep expression type consistent between operators
3. Support the convertion between float type and decimal type.

After this PR, below cases could be executed normaly like the legacy optimizer:
  use test_query_db;
  select k1, k5,100000*k5 from test order by k1, k2, k3, k4;
  select avg(k9) as a from test group by k1 having a < 100.0 order by a;
2022-12-29 13:01:47 +08:00
29492f0d6c [refactor](file-cache) refactor the file cache interface (#15398)
Refactor the usage of file cache

### Motivation

There may be many kinds of file cache for different scenarios.
So the logic of the file cache should be hidden inside the file reader,
so that for the upper-layer caller, the change of the file cache does not need to
modify the upper-layer calling logic.

### Details

1. Add `FileReaderOptions` param for `fs->open_file()`, and in `FileReaderOptions`
    1. `CachePathPolicy`
        Determine the cache file path for a given file path.
        We can implement different `CachePathPolicy` for different file cache.

    2. `FileCacheType`
        Specified file cache type: SUB_FILE_CACHE, WHOLE_FILE_CACHE, FILE_BLOCK_SIZE, etc.

2. Hide the cache logic inside the file reader.

    The `RemoteFileSystem` will handle the `CacheOptions` and determine whether to
    return a `CachedFileReader` or a `RemoteFileReader`.

    And the file cache is managed by `CachedFileReader`
2022-12-29 12:15:46 +08:00
93a7981427 [fix](brokerload) fix broker load parquet format file finished but no data actually (#15424)
broker openReader interface cannot get the file size, resulting in the file size being incorrectly set to 0
2022-12-29 12:04:19 +08:00
987970e8e3 [fix](multi catalog)Set column defualt value for query. (#15415)
Current column default value is used only for load task. But in the case of Iceberg schema change,
query task is also possible to read the default value for columns not exist in old schema.
This pr is to support default value for query task.

Manually tested the broker load and external emr regression cases.
2022-12-29 12:03:17 +08:00
304781d837 [Improvement](meta) update show create function result (#15414)
currently, show create function is designed for native function, it has some non suitable points.

this pr bring several improvements, and make result of show create function can be used to create function.

1. add type property.
2. add ALWAYS_NULLABLE perperty for java_udf
3. use file property rather than object_file for java_udf, follow usage of create java_udf
4. remove md5 property, coz file may vary when create function again.
5. remove INIT_FN,UPDATE_FN,MERGE_FN,SERIALIZE_FN etc properties for java_udf, cos java_udf does not need these properties.
2022-12-29 11:40:14 +08:00
ffef81a6ab [feature](BE)pad missed version with empty rowset (#15030)
If all replicas of one tablet are broken, user can use this http api to pad the missed version with empty rowset.
2022-12-29 11:20:44 +08:00
3146fc8189 [bug](jdbc) fix jdbc external table with char type length error (#15386)
Now have test pg and oracle with char(100), if data='abc'
but read string data length is 100, so need trim extral spaces
2022-12-29 11:19:03 +08:00
1f98dd2c74 [fix](Nereids) Generate is missing on alias query (#15416)
support table generating function on query alias, syntax as:
```sql
SELECT * FROM (SELECT * FROM tbl) tmp LATERAL VIEW explode(c1) gtmp AS ce;
```
2022-12-29 11:11:25 +08:00
0e154feeb9 [feature](multi catalog nereids)Add file scan node to nereids. (#15201)
Add file scan node to nereids, so that the new planner could support external hms table.
2022-12-29 10:31:11 +08:00
ad3b5cbf94 [fix](memtracker) Fix VDataStreamRecvr runtime state null pointer (#15451) 2022-12-29 09:36:03 +08:00
0f3c0b78e3 [Improvement](JSONB) improve performance JSONB initial json parsing using simdjson (#15219)
test data: https://data.gharchive.org/2020-11-13-18.json.gz, 2GB, 197696 lines
before: String 13s vs. JSONB 28s
after: String 13s vs. JSONB 16s

**NOTICE: simdjson need to be patched since BOOL is conflicted with a macro BOOL defined in odbc sqltypes.h**
2022-12-29 09:29:09 +08:00
1b1083eb52 [fix](metric) fix prometheus metric format error for doris_fe_query_latency_ms (#15447)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-12-29 08:51:15 +08:00
6f200e0ef6 [fix](primary_index) not compare result in regression case: primary_index/test_pk_uk_case (#15436) 2022-12-28 22:14:20 +08:00
a22ee89431 [Enhancement](jemalloc):support heap dump by http request at runtime (#15429) 2022-12-28 20:10:50 +08:00
305dd15fea [improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035)
Current bitmap index only can apply pushed down predicates which in AND conditions. When predicates in OR conditions and other complex compound conditions, it will not be pushed down to the storage layer, this leads to read more data.

Based on that situation, this pr will do:

1. this pr in order to support bitmap index apply compound predicates, query sql like:
select * from tb where a > 'hello' or b < 100;
select * from tb where a > 'hello' or b < 100 or c > 'ok';
select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200);
select * from tb where (not a> 'hello') or b < 100;
...
above sql,column a and b and c has created bitmap_index.

2. this optimization can reduce reading data by index
3. set config enable_index_apply_compound_predicates to use this optimization
2022-12-28 20:08:57 +08:00
4336aaa01a [bug](datetimev2) fix wrong info when show create table (#15422)
* [bug](datetimev2) fix wrong info when show create table

* update
2022-12-28 19:55:43 +08:00