Commit Graph

8106 Commits

Author SHA1 Message Date
7f2c433e08 [feature](Nereids) add relation id to unboundTVFRelation to avoid incorrect group expression comparison (#15740) 2023-01-11 12:49:14 +08:00
af3416ede0 [docs] Update be-vscode-dev.md (#15800)
Fix some syntax errors to make the document easier for developers to read.
2023-01-11 12:30:52 +08:00
94f6380137 [enhance](Nereids): github action missed some nereids files. (#15746) 2023-01-11 11:42:52 +08:00
2587095811 [Bug](mv) fix mv selector check group expr && forbid create dup mv with bitmap/hll && add some case (#15738) 2023-01-11 11:38:56 +08:00
3c8c31a5f8 [chore](Session) remove unused code for enable_lateral_view
The session variable `enable_lateral_view` was removed a long time ago.
This CL just removes the variable name `enable_lateral_view`.
2023-01-11 11:24:28 +08:00
870b5c44e6 [fix](compile) compile failed in Mac with clang14 (#15661)
How to reproduce:
Add export CMAKE_BUILD_TYPE=DEBUG in custom_env.sh, then build thirdparty on macOS.

There are two problems:

1. Building vectorscan with the DEBUG type hits an unused-but-set-variable error:
doris/thirdparty/src/vectorscan-vectorscan-5.4.7/src/nfa/mcclellancompile.cpp:1485:13: error: variable 'total_daddy' set but not used [-Werror,-Wunused-but-set-variable]
u16 total_daddy = 0;
2. gflags outputs libgflags_debug.a instead of libgflags.a when built with the DEBUG type, which later causes a "can not find library gflags" error.

To avoid these errors, we set CMAKE_BUILD_TYPE while building vectorscan and gflags.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: Adonis Ling <adonis0147@gmail.com>
2023-01-11 11:09:00 +08:00
5c2a38d2a1 [chore](thirdparty) Fix the md5sum of the package brpc-1.2.0.tar.gz (#15789)
Apache brpc has graduated from the incubator recently. The MD5 of the package we download from https://github.com/apache/incubator-brpc/archive/refs/tags/1.2.0.tar.gz changed, and the mismatched MD5 makes the build scripts fail.
2023-01-11 11:05:21 +08:00
fe5e5d2bf4 [refactor] separate agg and flush in memtable (#15713) 2023-01-11 10:07:34 +08:00
f5948eb4b0 [Build](cmake) Uniform capitalization keyword of cmake (#15728) 2023-01-11 09:58:07 +08:00
3fec5ff0f5 [refactor](scan-pool) move scan pool from env to scanner scheduler (#15604)
The original scan pools were in exec_env.
But after enabling new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan tasks are now submitted to the scan pool in scanner_scheduler.

Additionally, the scan pools are reorganized into 3 kinds:

- local scan pool: for olap scan nodes
- remote scan pool: for file scan nodes
- limited scan pool: for queries that set a cpu resource limit or have a small limit clause

TODO:
Use bthread to unify all IO tasks.

Some trivial issues:

- Fix a bug where the memtable flush size printed in the log was wrong
- Add a RuntimeProfile param to VScanner
2023-01-11 09:38:42 +08:00
d857b4af1b [refactor](remove row batch) remove impala rowbatch structure (#15767)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-11 09:37:35 +08:00
5b10116eca [chore](thirdparty) fix bug that GSSAPI of libgsasl is disabled (#15753)
In #15037, I modified the build script of libgsasl to enable GSSAPI,
but it was still wrong: the PATH did not include `thirdparty/installed/bin`,
so when building libgsasl, it reported the error
`WARNING: MIT Kerberos krb5-config not found, disabling GSSAPI`,
even though `krb5-config` is in `thirdparty/installed/bin`.

Without GSSAPI, libhdfs3 cannot access HDFS with Kerberos authentication.
2023-01-11 09:07:46 +08:00
89c21af87d [chore](fe) update fe snapshot to 1.2 and fix auditloader compile error (#15787)
PR #14925 changed some fields of AuditEvent, so we need to upgrade fe-core's SNAPSHOT version to 1.2,
because auditloader depends on fe-core.

The 1.2-SNAPSHOT has already been pushed to
https://repository.apache.org/content/repositories/snapshots/org/apache/doris/fe-core/1.2-SNAPSHOT/
2023-01-11 08:46:48 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
4bbc93b7ce [refactor](hashtable) simplify template args of partitioned hash table (#15736) 2023-01-11 08:39:13 +08:00
124c8662e8 [Bug](schema scanner) Fix wrong type in schema scanner (#15768) 2023-01-11 08:37:39 +08:00
bc34a44f06 [Fix](Nereids) fix type coercion for binary arithmetic (#15185)
Support SQL like `select true + 1 + '2.0'` and reject `select true + 1 + 'x'`.
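A sketch of the intended behavior described above (no table needed; the second statement is expected to be rejected after this fix):

```sql
-- true and 1 coerce to a numeric type, and '2.0' is a numeric string,
-- so implicit type coercion succeeds:
select true + 1 + '2.0';

-- 'x' is not a numeric string, so coercion fails and the query is rejected:
select true + 1 + 'x';
```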
2023-01-11 02:55:44 +08:00
c87a9a5949 [fix](Nereids) Add varchar literal compare (#15672)
Support comparisons like `"1" = "123"`.
2023-01-11 02:41:50 +08:00
280603b253 [fix](nereids) bind sort key priority problem (#15646)
`a.b.c` should only bind on `a.b.c`, not on `b.c` or `c`
2023-01-11 02:03:09 +08:00
f5b0f5e01a [chore](macOS) Don't build useless third-party stuff (#15763)
On macOS, we need some extra libraries to build the codebase,
so two packages were introduced to the project: `binutils` and `gettext`.

Building these packages completely takes a lot of time. This PR introduces a way to build only the needed
libraries and skip everything else, which saves time when building the third-party libraries on macOS.
2023-01-11 00:20:37 +08:00
5dc644769a [mtmv](regression-test) add mtmv write data regression test (#15546)
* [regression-test](mtmv) add mtmv write data regression test
2023-01-10 23:42:42 +08:00
4be54cfcac [deps](hdfs) update libhdfs3 to v2.3.5 to support KMS (#15770)
Support KMS in libhdfs3: apache/doris-thirdparty#22
2023-01-10 23:21:53 +08:00
ab2e0fd397 [fix](tvf) cancel strict restrictions on tvf parameters (#15764)
2023-01-10 22:40:19 +08:00
79b24cdb1f [fix](JdbcResource) fix that JdbcResource does not support the jdbcurl of Oracle and SQLServer (#15757)
Actually, `JdbcResource` should support Oracle and SQLServer JDBC URLs for JDBC external tables.
2023-01-10 22:38:30 +08:00
90a92f0643 [feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618)
Support the new table value function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`.
We can use the SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshot info of a table. Other iceberg metadata will be supported later when needed.

One usage example:

Previously, we used the following SQL to time travel:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
Now we can use the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
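Putting the pieces above together, a sketch of the workflow (using the same illustrative names `ctl.db.tbl` and `ice_table` as above):

```sql
-- 1. List the snapshots of the table:
select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots");

-- 2. Take a committed time or snapshot_id from the result and time travel with it:
select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11";
select * from ice_table FOR VERSION AS OF "snapshot_id";
```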
2023-01-10 22:37:35 +08:00
542542a4b2 [fix](nereids) fix bug in estimation of min/max of Year (#15712)
1. Fix a bug in the estimation of min/max of Year.
2. Remove Utils.getLocalDatetimeFromLong(Long). This method throws an exception if the input parameter is too big, and it is no longer used once the above bug is fixed.
2023-01-10 21:29:16 +08:00
fec89ad58c [fix](nereids) week should be recognized as a function name in function call context (#15735) 2023-01-10 19:54:59 +08:00
7767931aca [enhancement](nereids) let parser support utf8 identifier (#15721)
After this PR, the SQL below can also be parsed correctly:
- SELECT k1 AS 测试 FROM test;
- SELECT k1 AS テスト FROM test;
2023-01-10 19:43:04 +08:00
bb28144c76 [fix](schema change) bugfix for light schema change with rollup (#15681)
This problem comes from PR #11494.

After adding a column to a rollup index, it also changed the column UniqueId inside the base index.
2023-01-10 19:03:06 +08:00
a67cea2d27 [Enhancement](metric) add current edit log metric (#15657) 2023-01-10 18:46:57 +08:00
503b6ee4da [chore](vulnerability) fix fe high risk vulnerability scanned by bug scanner (#15649) 2023-01-10 17:44:18 +08:00
672d11522b [regression](flink)add flink doris connector case (#15676)
* add flink doris connector case
2023-01-10 17:25:06 +08:00
c3da5a687a [fix]fixed dangerous usage of namespace std (#15741)
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2023-01-10 16:10:49 +08:00
47097a3db8 [fix](having) revert 15143 and fix having clause with multi-conditions (#15745)
Firstly, the HAVING clause in MySQL is really very complex and it is hard to follow all of its rules, so we revert PR #15143 to keep the logic the same as before.

Secondly, the original implementation had a problem when the HAVING clause has multiple conditions.
For example:

Case 1: here v2 inside the HAVING clause uses the table column test_having_alias_tb.v2
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
Case 2: here v2 inside the HAVING clause uses the alias v2 = sum(test_having_alias_tb.v2); the additional condition makes v2 resolve differently.
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id   | v    | v2   |
+------+------+------+
|    2 |    1 |    3 |
+------+------+------+
So here we make the HAVING clause rules simple:
Rule 1: if an alias inside the HAVING clause has the same name as a column, we use the column name, not the alias;
Rule 2: if an alias inside the HAVING clause does not share a name with any column, we use the alias.
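Applied to the two cases above, the rules resolve the names like this (same test_having_alias_tb table; annotations only, behavior as described above):

```sql
-- Rule 1: the alias v2 collides with the column v2, so having(v2>1) binds to
-- the table column test_having_alias_tb.v2 (hence the error in case 1):
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);

-- Rule 2: the alias v collides with no column, so having(v>0) binds to the alias v = v1-2:
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
```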

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-10 15:57:29 +08:00
ec0a9647f1 [typo](docs)Update sequence-column-manual.md #15727
The sentence "Create a test_table table with the unique model, and specify that the sequence column maps to the modify_date column in the table" duplicated the word "指定" (specify).
2023-01-10 14:54:57 +08:00
f17d69e450 [feature](file cache)Import file cache for remote file reader (#15622)
The main purpose of this PR is to introduce a `fileCache` for lakehouse reads of remote files.
It uses the local disk as a cache for reading remote files, so the next time a file is read,
the data can be obtained directly from the local disk.
In addition, this PR includes a few other minor changes.

Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, which uses an LRU replacement policy.
2. Implement a new file reader, `CachedRemoteFilereader`, so that the `file cache` logic is hidden under `CachedRemoteFilereader`.

Other changes:
1. Add a new interface `fs()` for `FileReader`.
2. `IOContext` adds some statistics to track `FileCache` behavior.

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
2023-01-10 12:23:56 +08:00
dec79c000b [fix](MTMV) build mode is missing after restart FE (#15551) 2023-01-10 11:38:56 +08:00
1888aba301 [fix](MTMV) fix replayReplaceTable error when restart fe (#15564) 2023-01-10 11:36:17 +08:00
025623a124 [feature](Nereids) Support lots of aggregate functions (#15671)
1. Generate lots of aggregate functions.
2. Support the `group_concat(columns order by order_columns)` grammar.
3. Support and generate array aggregate/scalar functions, like `array_union`. We should support array grammar in the future, e.g. `select [1, 2, 3]`.
4. Add `checkLegalityBeforeTypeCoercion` and `checkLegalityAfterRewrite` functions to check the legality of an expression before type coercion and after rewrite; copy the semantic checks of `FunctionCallExpr` into checkLegality; remove the `ForbiddenMetricTypeArguments`; move the check of the aes/sm4 crypto functions from the translator to checkLegalityBeforeTypeCoercion.
5. Refactor the `NullableAggregateFunction`: distinct is the first parameter and alwaysNullable is the second. Fix some wrong initialization order: some functions invoked super(distinct, alwaysNullable) while others invoked super(alwaysNullable, distinct).
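For example, the `group_concat(columns order by order_columns)` grammar from item 2 could be used like this (hypothetical table `t` with columns `k`, `name`, and `id`):

```sql
-- Concatenate the name values within each group, ordered by id:
SELECT k, group_concat(name ORDER BY id)
FROM t
GROUP BY k;
```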
2023-01-10 11:20:27 +08:00
601d9af23b [fix](planner) disconjunct in sub-query failed when plan it on hash join (#15653)
All conjuncts should be added before HashJoinNode init; otherwise, some slots in the conjuncts are linked to tuples that are not in the HashJoinNode's intermediate tuple.
2023-01-10 11:10:12 +08:00
d0e8f84279 [feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612) 2023-01-10 10:38:35 +08:00
fd7d13d4c0 [typo](docs)Update dynamic-partition.md #15734
Spelling error.
2023-01-10 10:14:44 +08:00
c19e391d32 [fix](profile) show query profile for pipeline engine (#15687) 2023-01-10 10:12:34 +08:00
9c0f96883a [fix](hashjoin) Fix right join pull output block memory overflow (#15440)
For outer join / right outer join / right semi join, when HashJoinNode::pull->process_data_in_hashtable outputs a block, it outputs all rows of a key in the hash table into the block; only after a key's output is complete does it check whether the block size exceeds the batch size and, if so, terminate the output.

If a key has 20 million+ rows, memory overflow occurs when subsequent block operations are performed on those rows.
2023-01-10 10:10:43 +08:00
3990a44aba [typo](doc) add since dev label to field function doc (#15648) 2023-01-10 09:52:37 +08:00
67a6ad648e [typo](doc) command for manually triggering compaction was incorrect (#15709) 2023-01-10 09:50:47 +08:00
9e3a61989b [refactor](es) remove BE generated dsl for es query #15751
Remove the FE config enable_new_es_dsl and all related code.
Now the DSL for ES is always generated on the FE side.
2023-01-10 08:40:32 +08:00
ab186a60ce [enhancement](compaction) Optimize judging delete rowset and picking candidate rowsets for compaction #15631
Tablet::version_for_delete_predicate has to traverse all rowset metas in the tablet meta, which is O(N); however, we can directly judge whether a rowset is a delete rowset via RowsetMeta::has_delete_predicate, which is O(1).
Since we no longer call Tablet::version_for_delete_predicate when picking input rowsets for compaction, we can reduce the critical section of Tablet::_meta_lock.
2023-01-10 08:32:15 +08:00
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
2b0e5e42a5 [enhancement](nereids) Support list partition prune (#15724) 2023-01-09 19:00:53 +08:00