7952 Commits

Author SHA1 Message Date
487d159a3d [improvement](test) add one case for hll (#15543) 2023-01-01 11:02:34 +08:00
50f1931f96 [fix](multi-catalog) get dictionary-encode from parquet metadata (#15525) 2022-12-31 19:08:10 +08:00
e89adc6e1d [fix](create-table) wrong judgement about partition column type (#15542)
The following statement should succeed, but it returns the error: `complex type cannt be partition column:ARRAY<VARCHAR(64)>`

```
create table test_array( 
task_insert_time BIGINT NOT NULL DEFAULT "0" COMMENT "" , 
task_project ARRAY<VARCHAR(64)>  DEFAULT NULL COMMENT "" ,
route_key DATEV2 NOT NULL COMMENT "range partition key"
) 
DUPLICATE KEY(`task_insert_time`)  
 COMMENT ""
PARTITION BY RANGE(route_key) 
(PARTITION `p202209` VALUES LESS THAN ("2022-10-01"),
PARTITION `p202210` VALUES LESS THAN ("2022-11-01"),
PARTITION `p202211` VALUES LESS THAN ("2022-12-01")) 
DISTRIBUTED BY HASH(`task_insert_time` ) BUCKETS 32 
PROPERTIES
(
    "replication_num" = "1",    
    "light_schema_change" = "true"    
);
```

This PR fixes this.
2022-12-31 13:10:39 +08:00
c47bdf6606 [vectorized](jdbc) fix external table of oracle have keyworld column (#15487)
If a column name is an Oracle keyword, the query reports an error.
2022-12-31 12:48:26 +08:00
781fa17993 [fix](Nereids) round function return type should be double (#15502) 2022-12-30 23:36:15 +08:00
96518db263 [enhencement](Nereids) remove constant expr constraint on OneRowRelation (#15506) 2022-12-30 23:35:15 +08:00
100834df8b [fix](nereids) fix some arrgregate bugs in Nereids (#15326)
1. an agg function without the distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct
2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate whether an agg function is a "merge" function
3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases
4. AggregateExpression's nullable method should call inner function's nullable method.
5. add a bind slot rule to bind pattern "logicalSort(logicalHaving(logicalProject()))"
6. don't remove project node in PhysicalPlanTranslator
7. add a cast to bigint expr when count( distinct datelike type )
8. fallback to old optimizer if bitmap runtime filter is enabled.
9. fix exchange node mem leak
2022-12-30 23:07:37 +08:00
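
The rule name in item 3, AvgDistinctToSumDivCount, suggests rewriting `avg(distinct x)` as `sum(distinct x) / count(distinct x)`. The following is a minimal sketch of that equivalence only, not the Nereids rule itself. The helper names are hypothetical, and `None` stands in for SQL NULL.

```python
def _distinct_non_null(values):
    """Distinct non-NULL inputs, as a DISTINCT aggregate would see them."""
    return {v for v in values if v is not None}


def sum_distinct(values):
    """sum(DISTINCT x); returns None (NULL) when there is no non-NULL input."""
    distinct = _distinct_non_null(values)
    return sum(distinct) if distinct else None


def count_distinct(values):
    """count(DISTINCT x); NULLs are not counted."""
    return len(_distinct_non_null(values))


def avg_distinct(values):
    """Direct definition of avg(DISTINCT x)."""
    distinct = _distinct_non_null(values)
    return sum(distinct) / len(distinct) if distinct else None


def avg_distinct_rewritten(values):
    """The rewritten form: sum(DISTINCT x) / count(DISTINCT x)."""
    cnt = count_distinct(values)
    return None if cnt == 0 else sum_distinct(values) / cnt


rows = [1, 2, 2, 3, None, 6]
print(avg_distinct(rows), avg_distinct_rewritten(rows))
```

Both forms agree on the sample rows and on empty input, which is the property such a rewrite relies on.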
cc7a9d92ad [refactor](non-vec) remove non vec code for indexed column reader (#15409) 2022-12-30 23:01:54 +08:00
9bba2f4cde [typo](docs) array function doc fix (#15449) 2022-12-30 23:00:48 +08:00
9c3c9db49b [enhancement](fuzzy test) support fuzzy test of RewriteOrToInPredicateThreshold #15469
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-12-30 22:59:59 +08:00
ad68764977 [enhancement](tablet) Unify redundant create_rowset_writer methods (#15519)
* Remove redundant create_rowset_writer methods

* Set resource id when setting FS in rowset meta

* fix

* fix ut
2022-12-30 22:57:12 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node
2022-12-30 21:48:14 +08:00
b23d068281 [refactor](remove-non-vec) Remove non vec load from memtable and delta writer (#15517)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-30 21:22:58 +08:00
aacd11336a [typo](docs)update java udf demo (#15521) 2022-12-30 21:12:34 +08:00
aeaa319203 [fix](fe)change session variable group_concat_max_len from int to long (#15515) 2022-12-30 20:45:44 +08:00
ec52907b06 [fix](index) fix wrong dcheck in indexed column writer (#15520) 2022-12-30 20:12:41 +08:00
8e58d92e77 [typo](docs) fix document info missing in SHOW-TABLETS.md (#15488) 2022-12-30 18:39:21 +08:00
084eec87ee [docs](docs)update en docs (#15470)
* Update basic-summary.md
2022-12-30 18:38:26 +08:00
a7895ba169 [feature](Nereids): Support variance_samp function. (#15500) 2022-12-30 17:32:06 +08:00
34d7eeb571 [doc](session variable) add doc content for adding variables called rewrite_or_to_in_predicate_threshold (#15513)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-12-30 17:11:45 +08:00
93a25e1af5 [fix](nereids) the project node is lost when creating PhysicalStorageLayerAggregate node (#15467) 2022-12-30 16:33:24 +08:00
08d4dcefff [typo](doc)data partition doc including en and zh-CN #15379
Co-authored-by: Chen Jinquan 陈金泉 (690) <chenjinq@haier.com>
2022-12-30 15:38:25 +08:00
dec1eb360c [fix](brokerload) be core dump caused by broker load orc format file nullptr pointer (#15460) 2022-12-30 15:37:33 +08:00
2f572ccc43 [fix](index) fix that the last element of each batch will be read repeatedly for binary prefix page (#15481) 2022-12-30 15:36:55 +08:00
9246e03932 [Enhancement](hdfs) make libhdfs3 compatible with hdfs2 server (#15497)
When the Doris BE calls getFileStatus against an HDFS2 server, libhdfs3 throws an exception because the permission code returned by the HDFS2 server is greater than 1<<12.
Bit 12 of the permission code is aclBit, which has been deprecated in Hadoop 3, so we remove the check in libhdfs3, matching the Hadoop 3 Java project.
2022-12-30 15:36:39 +08:00
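
To illustrate the permission-code issue in the commit above, here is a small sketch. The function names are hypothetical and this is not libhdfs3 code. The strict variant imitates a range check that rejects any code with bit 12 set. The lenient variant accepts it. Masking the flag off is an extra assumption made for the example, since the commit itself only describes removing the check.

```python
ACL_BIT = 1 << 12  # bit 12: the aclBit named in the commit message


def parse_permission_strict(code: int) -> int:
    """Hypothetical stand-in for the old check: reject codes at or above 1<<12."""
    if code >= ACL_BIT:
        raise ValueError(f"invalid permission code: {code:#o}")
    return code


def parse_permission_lenient(code: int) -> int:
    """Hypothetical stand-in for the relaxed behaviour.

    Accepts codes with the ACL flag set and (assumption) keeps only the
    bits below bit 12.
    """
    return code & (ACL_BIT - 1)


# A code as an HDFS2 server might report it: mode 0755 with the ACL flag set.
hdfs2_code = ACL_BIT | 0o755
print(oct(parse_permission_lenient(hdfs2_code)))
```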
2704651fde [fix](nereids) hll and bitmap type can't be used as order by and group by exprs (#15471)
HLL, bitmap, array, and quantile state types can't be used in ORDER BY, GROUP BY, and some aggregate expressions.
2022-12-30 14:26:21 +08:00
5ec4e5586f [refactor]remove seek block in segmentIterator (#15413)
* remove seek block

* add reg test

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-12-30 14:14:16 +08:00
520b6d7910 [Improvement](decimalv3) Add a config to check overflow for DECIMALV3 (#15463) 2022-12-30 14:02:24 +08:00
5db8b52441 [Fix](SparkLoad): fix the timeout aborted loadtasks are not cleaned up. (#15480)
Co-authored-by: spaces-x <weixiang06@meituan.com>
2022-12-30 14:02:00 +08:00
5c5b7a5c6f [Broker](bos) suppoert baidu bos object storage for broker (#15448) 2022-12-30 12:39:10 +08:00
2339dcda05 [fix](icebergv2)update icebergv2 regression case (#15442)
update icebergv2 regression case
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-12-30 12:24:26 +08:00
917b266799 [fix](planner) table valued function could not used in subquery (#15496) 2022-12-30 10:01:25 +08:00
10be583e52 [chore](pipeline) optimize profile information (#15433) 2022-12-30 09:56:33 +08:00
2c8de30cce [optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441)
**Optimize**
PR #14470 used `Expr` to filter the delete rows that match the current data file,
but the rows in a delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
to optimize filtering rows while scanning, so this PR removes `Expr` and uses binary search to filter delete rows.

In addition, delete files are likely to be dictionary-encoded, and decoding the `file_path`
column into `ColumnString` is time-consuming, so this PR uses `ColumnDictionary` to read the `file_path` column.

After testing, the performance of iceberg v2's MOR is improved by 30%+.

**Fix Bug**
Lazy-read-block may not have the filter column, if the whole group is filtered by `Expr`
and the batch_eof is generated from next batch.
2022-12-30 08:57:55 +08:00
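
The binary-search idea in the commit above can be sketched as follows. This is an illustration under assumptions, not the Doris reader: delete rows are modelled as `(file_path, position)` tuples already sorted by file_path then position, and the function names are hypothetical.

```python
import math
from bisect import bisect_left


def delete_range(delete_rows, data_file):
    """Return [lo, hi) bounds of the delete rows that belong to data_file.

    A 1-tuple (data_file,) sorts before every (data_file, pos), and
    (data_file, math.inf) sorts after every integer position, so two
    binary searches bracket the file's rows.
    """
    lo = bisect_left(delete_rows, (data_file,))
    hi = bisect_left(delete_rows, (data_file, math.inf))
    return lo, hi


def is_deleted(delete_rows, data_file, position):
    """Binary-search a single row position inside the file's range."""
    lo, hi = delete_range(delete_rows, data_file)
    i = bisect_left(delete_rows, (data_file, position), lo, hi)
    return i < hi and delete_rows[i] == (data_file, position)


deletes = [
    ("a.parquet", 1), ("a.parquet", 4),
    ("b.parquet", 0), ("b.parquet", 2), ("b.parquet", 7),
    ("c.parquet", 3),
]
lo, hi = delete_range(deletes, "b.parquet")
print([pos for _, pos in deletes[lo:hi]])
```

Because the rows are sorted, each lookup costs logarithmic time in the size of the delete file instead of evaluating a filter expression over every delete row.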
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
edb9a3b58d [Bug](timediff) Fix wrong result for function timediff (#15312) 2022-12-30 00:28:51 +08:00
9a517d6a8f [DataType](Deciamlv3) change the avg function scale of decimalv3 (#15445) 2022-12-30 00:27:51 +08:00
73f7ccb58f [typo](docs) fix document display error in SHOW-ALTER.md and SHOW-PARTITION-ID.md and SHOW-PARTITIONS.md (#15453) 2022-12-30 00:27:22 +08:00
3ff01ca799 [feature-wip](multi-catalog) support Iceberg time travel in external table (#15418)
For example:
SELECT * FROM tbl FOR VERSION AS OF 10963874102873;
SELECT * FROM tbl FOR TIME AS OF '1986-10-26 01:21:00';
2022-12-30 00:25:21 +08:00
6c847daba0 [Feature](Nereids) Support grouping set for materialized index. (#15383)
This PR adds support for materialized index selection when the query has grouping sets.
2022-12-29 23:17:02 +08:00
dda505487c [fix](nereids) SimplifyArithmeticRuleTest ut failed (#15486)
This PR removes typeCoercion on the expected expr in ExpressionRewriteTestHelper, because the expected expr should not be rewritten at all; rewriting it would change it unexpectedly.
2022-12-29 22:53:27 +08:00
bb305aa572 [chore](badges) Remove daily test badges for origin engine (#15482)
The code of the origin engine will be removed later and we have already stopped its daily test, so we should remove these badges from the home page.
2022-12-29 21:25:15 +08:00
c54c2f8035 [fix](statistics) fix npe when __internal_schema not created (#15464) 2022-12-29 21:24:33 +08:00
9b371f6b0b [fix](web ui) fix fe web ui (#14887) 2022-12-29 21:19:44 +08:00
79113b0cd1 [Fix](storage) Fix bug that cooldown time is error (#15444)
Cooldown time is wrong for data on SSD, because the cooldown time for all tables/partitions
is calculated only once, when class `DataProperty` is loaded, and cannot be updated later.
This patch ensures that the cooldown time for each table/partition is calculated in real time
when the table/partition is created.
Co-authored-by: weizuo <weizuo@xiaomi.com>
2022-12-29 21:01:36 +08:00
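
The class-load pitfall described in the commit above has a close analogue in Python, sketched below. The class names and the 30-day interval are hypothetical. The buggy variant computes the deadline once, when the class body runs. The fixed variant computes it each time an object is created.

```python
import time

COOLDOWN_SECONDS = 30 * 24 * 3600  # hypothetical default interval


class DataPropertyBuggy:
    # Evaluated once, when the class is defined: every table/partition
    # created later shares this same, increasingly stale, deadline.
    DEFAULT_COOLDOWN_MS = int((time.time() + COOLDOWN_SECONDS) * 1000)

    def __init__(self):
        self.cooldown_ms = self.DEFAULT_COOLDOWN_MS


class DataPropertyFixed:
    def __init__(self, now=None):
        # Evaluated at creation time; `now` is injectable for testing.
        now = time.time() if now is None else now
        self.cooldown_ms = int((now + COOLDOWN_SECONDS) * 1000)


early = DataPropertyFixed(now=1000)
late = DataPropertyFixed(now=2000)
print(late.cooldown_ms - early.cooldown_ms)  # difference in milliseconds
```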
e651a9bb11 [feature](nereids) add variance function for nereids (#15370)
Support the variance function. Currently, it does not support the decimalV3 type.
2022-12-29 18:33:52 +08:00
43c8e7b465 [chore](thirdparty) Support cleaning extracted data before building them (#15458)
Currently, we may fail to build the third-party libraries if we keep the outdated extracted data.

Consider the following scenario: Bob adds patches to some libraries, then Alice updates the codebase and builds
the third-party libraries. If Alice keeps the outdated extracted data, her build of the third-party libraries fails
because the patches are not applied due to the outdated `patched_marks`.

This PR introduces a way to clean the outdated data before building the third-party libraries.
2022-12-29 16:01:23 +08:00
c22ba8e160 [Bug](Decimalv3) coredump of decimalv3 multiply (#15452) 2022-12-29 15:35:17 +08:00
89e2fb4301 [docs](readme)update the readme.md (#15465) 2022-12-29 14:51:17 +08:00
25b257e37c [enhancement](session var) varariable to control whether to rewrite OR to IN or not (#15437) 2022-12-29 14:50:32 +08:00