Commit Graph

9635 Commits

Author SHA1 Message Date
Pxl
e7bcd970f5 [Bug](materialized-view) fix isDisableTuplesMVRewriter rreturn true when expr is literal (#18246)
fix isDisableTuplesMVRewriter rreturn true when expr is literal
2023-03-31 11:30:47 +08:00
8e15388074 [fix](Nereids): use CBO rule instead of using rewrite rule. (#18256) 2023-03-31 11:23:26 +08:00
1938899aa3 [regression-test] add grouping sets test case (#18194) 2023-03-31 11:00:38 +08:00
35bae25568 [Improve](row store) add more profile info in log for point query and make row column page size more configurable (#18181)
save about 20% FE cpu cost for point query with prepared statement which table contains 100 columns
2023-03-31 10:58:59 +08:00
7d92bf095a [fix](expr) refractor create_tree_from_thrift to avoid stack overflow (#18214) 2023-03-31 10:38:20 +08:00
479272f86f [bugfix](topn) fix topn optimzation wrong result for NULL values (#18121)
1. add PassNullPredicate to fix topn wrong result for NULL values
2. refactor RuntimePredicate to avoid using TCondition
3. refactor using ordering_exprs in fe and vsort_node
2023-03-31 10:02:07 +08:00
4e1e0ce06d [bugfix](topn) fix topn optimzation wrong result for NULL values (#18121)
1. add PassNullPredicate to fix topn wrong result for NULL values
2. refactor RuntimePredicate to avoid using TCondition
3. refactor using ordering_exprs in fe and vsort_node
2023-03-31 10:01:34 +08:00
8be43857ef [feature](executor) Add memory limit for pip_scanner_context (#18238)
Co-authored-by: wangbo <506340561@qq.com>
2023-03-31 09:36:57 +08:00
e5793249cd [opt](hashtable) Modify default filled strategy to 75% (#18242) 2023-03-31 09:28:11 +08:00
e0f6083e73 [refactor](dynamic table) add get_type_as_tprimitive_type and get_type_as_primitive_type in IDataType to get PrimitiveType and TPrimitiveType (#18260) 2023-03-31 09:03:06 +08:00
1abb19d0fd filter estimation refactor (#18170) 2023-03-31 08:49:38 +08:00
a88e80f8ee [fix](ssl)refactor some SSL info logs to debug logs (#18234) 2023-03-31 08:41:02 +08:00
b5ea299697 [fix](planner) Fix agg on inlineview which with constant slot (#18201)
Since slot that reference to constant has been marked as constant expr either, just add condition check to make sure such slot wouldn't be eliminated as constant from group exprs
2023-03-30 23:54:37 +08:00
28793b6441 [fix](Nereids): fix copyIn() in Memo when useless project with groupplan (#18223) 2023-03-30 23:49:21 +08:00
d6b0fe9072 [feature](jni) jni table scanner framework (#17960)
A framework that read data from jni scanner, which can support the data source from java ecosystem(java API).

## Java Interface
Java scanner should extends `org.apache.doris.jni.JniScanner`, implements the following methods:
```
// Initialize JniScanner
public abstract void open() throws IOException;
// Close JniScanner and release resources
public abstract void close() throws IOException;
// Scan data and save as vector table
public abstract int getNext() throws IOException;
```
See demo usage in `org.apache.doris.jni.MockJniScanner`

## c++ interface
C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. 

## Pushed-down predicates
Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`.

## Remaining works:
1. Implement complex nested types.
2. Read hudi MOR table as the end-to-end demo usage.
2023-03-30 23:47:45 +08:00
0c2ff09fcf [Fix](multi-catalog) fix hms automatic update error. (#18252)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-03-30 23:09:07 +08:00
1d2dbe7898 [Bug][Pipeline] Run clickbench dead lock in pipeline exec engine (#18211)
In pipeline exec engine run clickbench may dead lock in some query
2023-03-30 21:41:57 +08:00
1050df7076 [fix](fs) fix local file system copy bug (#18243)
`copy_dirs` has a bug that will cause infinity iteration
2023-03-30 21:36:07 +08:00
322b51180f [Fix](regression-test)replace now to a fixed datetime. (#18253) 2023-03-30 21:04:09 +08:00
3d2c70f75d [fix](Nereids): fix merge_group(). (#18250) 2023-03-30 20:34:47 +08:00
fefc0d6814 [Fix](planner)fix create view ignore order by info bug. (#18197) 2023-03-30 20:17:46 +08:00
ce79ff947a [Enhancement](spark load)Support spark time out config (#17108) 2023-03-30 20:12:46 +08:00
99bd5ec022 [fix](Nereids) fix some bugs in Subquery to window rule (#18233)
we introduce this rule by PR #17968, but some corner case do
not be processed correctly. This PR fix these bugs:
1. fix window function generation method, replace inner slot with
   equivalent outer slot
2. forbid below scenes
    a. inner has a mapping project
    b. inner has an unexpected filter
    c. outer has a mapping project
    d. outer has an unexpected filter
    e. outer has additional table
    f. outer has same table
    g. outer and inner with different join condition
    h. outer and inner has same table with different join condition
2023-03-30 16:09:16 +08:00
ea41d94582 [Improve](complex-type) Support Count(complexType) (#17868)
Support count function for ARRAY/MAP/STRUCT type
2023-03-30 15:43:32 +08:00
e3bd812887 [fix](stream-load) find line delimiter in csv should start with no offset (#18161)
when loading big file with multi bytes line delimiter, some line record maybe incomplete because of _output_buf_limit, so this incomplete data will move to the beginning of the output buf and read more data into output buf. In this case, find line delimiter should start with no offset to avoid a bug that spilt two lines as one line.
2023-03-30 14:42:34 +08:00
f2d71a347a [chore](check) remove fe ut check from branch-1.1-lts (#18240) 2023-03-30 14:15:46 +08:00
b7af110f61 [Bug](bloomfilter) Fix bloom filter for date type (#18205) 2023-03-30 14:15:06 +08:00
Pxl
cec983b7ef [Chore](materialized-view) forbiden create mv with where clause contained aggregate column (#18168)
forbiden create mv with where clause contained aggregate column

create table a_table(
	k1 int null,
	k2 int not null,
    k3 bigint null,
	k4 bigint sum null,
    k5 bitmap bitmap_union null,
    k6 hll hll_union null
)
aggregate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
create materialized view where_1 as select k1,k4 from a_table where k4 =1; // invalid, mv on agg table need group by
create materialized view where_2 as select k1,sum(k4) from a_table where k4 =1 group by k1; // invalid, k4 is agg column
create materialized view where_2 as select k1,sum(k4) from a_table where k1+k4 =1 group by k1; // invalid, k4 is agg column
2023-03-30 13:03:03 +08:00
Pxl
c8ad62a3cd [Enchancement](materialized-view) enchance materialized view where clause match (#18179)
enchance materialized view where clause match
2023-03-30 13:02:21 +08:00
c8ea5bff1d [Fix](planner) fix nested udf bind arguments exception (#18188)
nested alias function will cause bind argument exception, sql like:
``` sql
CREATE ALIAS FUNCTION f1(DATETIMEV2(3), INT)
            with PARAMETER (datetime1, int1) as date_trunc(days_sub(datetime1, int1), 'day')

CREATE ALIAS FUNCTION f2(DATETIMEV2(3), int)
            with PARAMETER (datetime1, int1) as DATE_FORMAT(HOURS_ADD(
                date_trunc(datetime1, 'day'),
                add(multiply(floor(divide(HOUR(datetime1), divide(24,int1))), 1), 1)
            ), '%Y%m%d:%H')

select f2(f1(now(3), 2), 3)
```

bug in FunctionCallExpr#rewriteExpr(), the retExpr will be replaced to originExpr to change the alias function to builtin function, but the retExpr.fn is not null, so when return to outer scope, the fn will be covered. That's the example:
```
f1(f1()) -> date_trunc(days_sub(date_trunc(days_sub()))) is correct and
f1(f1()) -> date_trunc(days_sub(days_sub())) is bug.
``` 

we fix it.
2023-03-30 11:39:02 +08:00
b3657959c9 [fix](planner )need add LateralViewRef's id into TableRef's allTableRefIds (#18220)
1. add LateralViewRef's id into TableRef's allTableRefIds, so the caller won't miss LateralViewRef when trying to get all the tableref ids.
2. TableFunctionNode should use child node's output tuple id as the input tuple id
2023-03-30 11:32:18 +08:00
9c1aad06ea [Improve](query) improve column match performance by introducing a column name map in MaterializedIndexMeta (#18203)
improve column match performance by introducing a column name map in `MaterializedIndexMeta`
`getColumnByName` is slow due to the linear search process, using a map to speed up search.
2023-03-30 11:24:51 +08:00
525f15dddf [vectorized](function) support array_sortby function (#18071) 2023-03-30 11:07:49 +08:00
9877143210 [fix](like) fix wrong result of like pattern with backslash (#18039)
Result is empty for query select * from person where address like '%\\\\%';, but MySQL can get a line of result.

CREATE TABLE `person` (
  `id` int(11) NULL,
  `name` text NULL,
  `age` int(11) NULL,
  `class` int(11) NULL,
  `address` text NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
); 

insert into person values (10001,'test1',30,2,'test\\\\,xxx');
Adding logs:

select * from person where address like '%\\\\%';

I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30
I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3
I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2
I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2
I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0
I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1
I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\%
I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \%
It matchs against the LIKE_ENDS_WITH_RE which is wrong, the meaning of the sql should be: match strings that have one backslash in any place.
2023-03-30 11:05:09 +08:00
a1114d46e8 [refactor](unify type system) remove switch case in histogram helper (#18222) 2023-03-30 10:54:08 +08:00
2ee1468576 [improvement](executor) Support task group schedule in pipeline engine (#17615) 2023-03-30 10:49:50 +08:00
c0d55b54c0 [Improvement](statistics) Support for statistics removing and incremental collection (#18069)
* Support for removing statistics and incremental collection

* Fix syntax
2023-03-30 10:40:43 +08:00
f9c4542d04 [chore](build) Porting to Clang-16 (#18196)
This PR ports the codebase to Clang-16.

Upgrade some third-party libraries:
1. Apache BRPC: 1.2.0 -> 1.4.0 (Some bugs are fixed and all patches for 1.2.0 can be removed.)
2. Boost: 1.73.0 -> 1.81.0 (Porting to Clang-16)
3. libclucene: 2.4.6 -> 2.4.8 (Porting to Clang-16)
2023-03-30 10:36:29 +08:00
58bc18af54 [Improvement](log) avoid warn log in specializeTemplateFunction at initialization (#18070)
avoid warn log in specializeTemplateFunction at initialization like this:

```
2023-03-23 21:17:54,931 INFO (leaderCheckpointer|89) [FunctionSet.specializeTemplateFunction():1337] specializeTemplateFunction exception at initialize
org.apache.doris.catalog.TypeException: ARRAY<DECIMALV3(9, 0)> is not MapType
        at org.apache.doris.catalog.MapType.specializeTemplateType(MapType.java:137) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.specializeTemplateFunction(FunctionSet.java:1321) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.getFunction(FunctionSet.java:1251) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.getFunction(FunctionSet.java:1216) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.addBuiltinBothScalaAndVectorized(FunctionSet.java:1449) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.addScalarAndVectorizedBuiltin(FunctionSet.java:1432) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.builtins.ScalarBuiltins.initBuiltins(ScalarBuiltins.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.init(FunctionSet.java:87) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.<init>(Env.java:585) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.getCurrentEnv(Env.java:665) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:143) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:77) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
```
2023-03-30 10:17:21 +08:00
3094815f8f [enhancement](profile) add blocks produced profile to track if output block is very small (#18217) 2023-03-30 09:51:03 +08:00
01d012bab7 [fix](memory) Remove page cache regular clear, disabled jemalloc prof by default (#18218)
Remove page cache regular clear
Now the page cache is turned off by default. If the user manually opens the page cache, it can be considered that the user can accept the memory usage of the page cache, and then can consider adding a manual clear command to the cache.

fix memory gc cancel top memory query

jemalloc prof is not enabled by default
2023-03-30 09:39:37 +08:00
3b04d42779 [fix](bitmap) fix bug: orthogonal_bitmap_union_count coredump when arg is nullable (#18182)
Query cause be cordump:

select    ORTHOGONAL_BITMAP_UNION_COUNT(     cast(null as bitmap)) from   t;
2023-03-30 09:31:58 +08:00
21895abfe7 [bugfix](buffercontrolblock) many query becomes very slow in 1.2.3 (#18229)
predicate in wait for is wrong, should not check is cancelled.

            VDataBufferSender  (dst_fragment_instance_id=-39f306bf41e3bafb--5dc95f12d4afdcdb):
                  -  AppendBatchTime:  7s50ms
                      -  ResultRendTime:  7s5ms
                      -  TupleConvertTime:  41.829ms
                  -  NumSentRows:  38.114K  (38114)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-30 08:54:38 +08:00
7e575d4286 [typo](docs) fix the error of markdown syntax (#18219) 2023-03-30 07:22:14 +08:00
55bf38dbab [feature-wip](MTMV) Use SSB ddl to test (#18150)
Add regression tests for MTMV.
2023-03-30 00:11:38 +08:00
58de8ec2df [enhance](Nereids): add variable to enable Bushy Tree (#18202) 2023-03-29 21:53:24 +08:00
6964d9f99c [fix](function) resubmit-fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17907)
* Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)"

This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.

* [fix-resubmit](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)

ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector

For other algorithms, an error should be reported when there is no init vector

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt

Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
2023-03-29 21:13:01 +08:00
d9fe5f7b67 [enhancement](memory) Remove MemPool and replace it with Arena (#17820)
Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not.

Some comparisons between MemPool and Arena:

 1. Expansion
     Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression;
     MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk
     
     After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K.

 2. Alignment
     MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment;
     Arena has no default alignment;

 3. Memory reuse
     Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time.
     MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation

 4. Realloc
     Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is:
         1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start
         2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory
     MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools

 5. check mem limit
     MemPool checks the mem limit, and Arena checks at the Allocator layer.

 6. Support for ASAN
     Arena does something extra

 7. Error handling
     MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception.
Tests that Arena can consider

 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory;

 2. Support clear, memory multiplexing;

 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful.

 4. In some cases, it may be possible to allocate backwards to find chunks t
2023-03-29 20:56:49 +08:00
f24174ebf1 when estimated rowCount is 0, adjust to 1 (#18174) 2023-03-29 19:03:53 +08:00
b92087dee8 [Fix](Nereids) ReorderJoin rule cannot process MarkJoin correctly (#18159)
Fix two problems,
1. The logical join containing the MarkJoinSlotRefrance column will generate a plan->MarkJoinSlotreference structure when reorderJoin is executed, and the MarkJoinSlotreference column will be restored after the reorder is completed. But when filter+crossJoin exists, it will be transformed into innerJoin in the rules, causing the map to fail, and the corresponding plan cannot be found, thus losing the MarkJoinSlotreference column.
2. Originally, the MarkJoinSlotReference column was used as the NonUserVisibleOutput of logicalJoin. At the same time, when logicalApply was generated, the added logicalProject did not include the MarkJoinSlotReference column, and the invalid logicalProject was deleted based on other rules, so as to ensure that LogicalApply was under the logicalFilter and could recognize the MarkJoinSlotReference column. But there will be problems if logicalProject cannot be deleted.

Repair method
1. For logicalJoin containing MarkJoinSlotreference, the rules of reorderJoin are not executed.
2. Use MarkJoinSlotreference as the output of logicalJoin and also as the output of LogicalApply.
3. When generating LogicalApply, if MarkJoinSlotreference is included, you need to add an additional logicalProject to logicalFilter, and remove the MarkJoinSlotreference column.

eg
```
logicalFilter(subquery with disconjunct)

after SubqueryToApply

logicalProject(without markJoinSlotReference)
+-- logicalFilter(markJoinSlotReference)
    +-- logicalProject(with markJoinSlotReference)
        +-- logicalApply()
```

```
SELECT * FROM sub_query_correlated_subquery1 WHERE k1 IN (SELECT k1 FROM sub_query_correlated_subquery3) OR k1 < 10;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[60] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                        |
| +--LogicalProject[59] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                     |
|    +--LogicalFilter[58] ( predicates=($c$1#7#false OR (k1#0 < 10)) )                                                                                                                               |
|       +--LogicalProject[57] ( distinct=false, projects=[k1#0, k2#1, $c$1#7#false], excepts=[], canEliminate=true )                                                                                 |
|          +--LogicalApply ( correlationSlot=[], correlationFilter=Optional.empty, isMarkJoin=true, MarkJoinSlotReference=$c$1#7#false, scalarSubCorrespondingSlot=empty )                           |
|             |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, indexName=<index_not_selected>, selectedIndexId=63105, preAgg=ON )    |
|             +--LogicalProject[34] ( distinct=false, projects=[k1#2], excepts=[], canEliminate=true )                                                                                               |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, indexName=<index_not_selected>, selectedIndexId=63115, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2023-03-29 16:12:42 +08:00