Commit Graph

8276 Commits

Author SHA1 Message Date
21319e6db4 [fix](nereids) generate invalid slot when translate predicates in filter on hash join (#12475)
Test SQL (TPC-H q21):

```
select count(*)
from  lineitem l3 right anti join lineitem l1
      on l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey;
```
If there are other join conjuncts, we have to put all slots from the left and right children into `slotReferenceMap` instead of the slots from `hashjoin.getOutput()`.

After splitting the intermediate tuple and the output tuple, several issues appeared in the regression tests, so this PR makes the following changes:
1. Because translating a project replaces the underlying hash-join node's output tuple, we add PhysicalHashJoin.shouldTranslateOutput.
2. Because PhysicalPlanTranslator merges the filter and the hash join, we add PhysicalHashJoin.filterConjuncts and translate the filter conjuncts in PhysicalHashJoin.
3. HashJoinNode.hashOutputSlotIds is now set properly when the Nereids planner is used.
4. For compatibility with BE, nullable() returns true for the substring function.
2022-09-16 16:51:04 +08:00
9d6c199553 [Bug](vec) Fix avg overflow in clickbench (#12621) 2022-09-16 14:43:40 +08:00
131f2a42d2 [Improvement](Nereids) Restrict the condition to apply MergeConsecutiveLimits rule (#12624)
This PR adds a condition check to the MergeConsecutiveLimits rule: the upper limit in the input must not have valid offset info.
2022-09-16 13:05:39 +08:00
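The restriction in #12624 can be illustrated with a minimal Python sketch. This is not the Nereids implementation: `Limit`, `merge_consecutive_limits`, and `apply_limit` are hypothetical names, and the sketch assumes that "no valid offset" means an offset of 0. Under that assumption, an upper limit without an offset over a lower limit can be folded into a single limit; otherwise the sketch declines to merge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limit:
    """Hypothetical stand-in for a limit plan node."""
    limit: int
    offset: int = 0  # assumption: 0 means "no valid offset info"


def merge_consecutive_limits(upper: Limit, lower: Limit) -> Optional[Limit]:
    """Fold `upper` (applied last) into `lower` (applied first) when it is safe."""
    if upper.offset != 0:
        return None  # the rule does not apply; keep both limits
    return Limit(min(upper.limit, lower.limit), lower.offset)


def apply_limit(rows, lim: Limit):
    """Reference semantics: skip `offset` rows, then keep `limit` rows."""
    return rows[lim.offset: lim.offset + lim.limit]
```

With `rows = list(range(20))`, applying `Limit(10, 5)` and then `Limit(3)` yields the same rows as applying the merged `Limit(3, 5)` once.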
0f6dbb5769 [fix](Nereids): split INNER and OUTER into different rules. (#12646) 2022-09-16 10:34:42 +08:00
8364165e30 [regression_test](testcase) add regression test case from session variable skip_storage_engine_merge, skip_delete_predicate and show_hidden_columns (#12617)
Also add this function to the new olap scan node.
2022-09-16 10:33:12 +08:00
97ff14482f [enhancement](doc) When we use flink doris connector with bounded source, we should using the BATCH mode. (#12576) 2022-09-16 10:31:17 +08:00
d4f8e0c754 [Bug](spark load) fix spark load clearSparkLauncherLog NPE #12619
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-16 10:30:57 +08:00
wxy
20de8ac29d [fix](auditloader plugin): fix bug for AuditLoaderPlugin that stmt appears truncated when stmt contains '\n'. (#12627)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-09-16 10:28:10 +08:00
380e3695f8 [test](window-function) add cte test in regression of window function #12635 2022-09-16 10:27:50 +08:00
f1811e41bc [fix](config)Update user_define_tables.sh #12542 2022-09-16 10:27:28 +08:00
Pxl
d44ec74988 [Enhancement](column) optimize for ColumnString::insert_many_dict_data (#12636)
2022-09-16 10:23:04 +08:00
c05d736331 [Improvement](sort) fallback to partial sort small block if topN is small (#12604)
2022-09-16 10:20:17 +08:00
2a063355ad [fix](vstream load) Fix the default value insertion problem when importing json (#12601)
* [fix](vstream load) Fix the default value insertion problem when importing json

* update
2022-09-16 09:54:45 +08:00
a97f63141e [fix](cast) Add validity check for date conversion for non-vectorization (#12608)
Actual result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| 2000-00-00                   |
+------------------------------+

Expected result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| NULL                         |
+------------------------------+
2022-09-16 09:08:53 +08:00
d906e97f1b [bugfix](compression) fix lock bug in concurrent acquire context (#12638)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-16 09:05:29 +08:00
98dad6158b [fix](Nereids) type coercion on case-when is not correct (#12650)
When we do type coercion on a CaseWhen expression, for example:
```
CASE WHEN n_nationkey > 1 THEN n_regionkey ELSE 0 END
```
the ELSE value 0 needs to be coerced to CAST(0 AS INT). This was missed in PR #11802.
2022-09-16 02:26:11 +08:00
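As a rough illustration of the coercion described in #12650, the following Python sketch picks a common type across all THEN results and the ELSE result, then wraps any branch of a different type in a cast. It is not the Nereids code: the `TYPE_RANK` table, the function names, and the assumption that the literal 0 starts out as TINYINT are all hypothetical.

```python
# Hypothetical integer type ordering; a real planner has a richer type lattice.
TYPE_RANK = {"TINYINT": 0, "SMALLINT": 1, "INT": 2, "BIGINT": 3}


def cast_if_needed(expr: str, expr_type: str, target: str) -> str:
    """Return expr unchanged if it already has the target type, else wrap it."""
    return expr if expr_type == target else f"CAST({expr} AS {target})"


def coerce_case_when(then_results, else_result):
    """Coerce every branch, including ELSE, to the widest branch type.

    Each branch is an (expression_text, type_name) pair.
    """
    branches = list(then_results) + [else_result]
    target = max((t for _, t in branches), key=TYPE_RANK.__getitem__)
    thens = [cast_if_needed(e, t, target) for e, t in then_results]
    return thens, cast_if_needed(*else_result, target)
```

Calling `coerce_case_when([("n_regionkey", "INT")], ("0", "TINYINT"))` leaves the THEN branch untouched and rewrites the ELSE branch as `CAST(0 AS INT)`.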
a63cdc8a7c [feature](Nereids) support basic runtime filter (#12182)
This PR adds runtime filters to the Nereids planner. Currently, runtime filters can only be pushed through join nodes and scan nodes.
TODO:
1. Only inner join, cross join, and right outer join are supported now; other join types will be supported in the future.
2. Translate left outer join to inner join if there are inner join ancestors.
3. Some complex situations cannot be handled yet; see the test case testPushDownThroughJoin for details.
4. Support the case where the source key is an aggregate group key.
2022-09-16 02:21:01 +08:00
0daa25d9a9 [fix](nereids) UT failed when test cases in package (#12622)
NamedExpressionUtil::clear should reset the nextId rather than create a new IdGenerator<ExprId>. The old generator may still be referenced by other objects, so replacing it can cause some cases to start in a dirty environment when test cases are run as a package.
2022-09-15 22:25:40 +08:00
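The difference between resetting and replacing the generator in #12622 can be sketched in Python as follows. The class and method names are hypothetical stand-ins, not the actual Java classes; the point is only that an object holding a reference to the old generator never sees a replacement, while an in-place reset is visible to every holder.

```python
class IdGenerator:
    """Hypothetical stand-in for IdGenerator<ExprId>."""

    def __init__(self):
        self._next_id = 0

    def get_next_id(self) -> int:
        value = self._next_id
        self._next_id += 1
        return value

    def reset(self) -> None:
        self._next_id = 0


class NamedExpressionUtilSketch:
    generator = IdGenerator()

    @classmethod
    def clear_by_replacing(cls) -> None:
        # Old behavior: holders of the previous generator keep its stale state.
        cls.generator = IdGenerator()

    @classmethod
    def clear_by_resetting(cls) -> None:
        # Fixed behavior: the shared generator is reset in place.
        cls.generator.reset()
```

If a test fixture caches `NamedExpressionUtilSketch.generator`, draws an ID, and then `clear_by_replacing()` runs, the cached generator continues from 1; after `clear_by_resetting()` it restarts from 0.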
3072e17b39 [Bugfix](primary-key) fix calc delete bitmap bug in concurrent memtable flush (#12605)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-15 21:50:24 +08:00
db8bc80c36 [feature](Nereids): semi join transpose (#12590)
* [feature](Nereids): semi join transpose and enable ZIG_ZAG join reorder.
2022-09-15 21:32:50 +08:00
c6c84a2784 [chore](build) add build param to version string (#12591) 2022-09-15 17:09:22 +08:00
858e8234d7 [feature](Nereids) add predicates push down on all join type (#12571)
2022-09-15 15:18:42 +08:00
5b6d48ed5b [feature](nereids) support distinct count (#12159)
Support distinct count with a GROUP BY clause.
For example:
SELECT count(distinct c_custkey + 1) FROM customer group by c_nation;

TODO: support distinct count without a GROUP BY clause.
2022-09-15 13:01:47 +08:00
b11791b9a8 [Feature](Nereids) Limit pushdown. (#12518)
This PR adds rewrite rules to push the limit down. The following two cases are handled:
```
limit -> join
limit -> project -> join
```
2022-09-15 12:12:10 +08:00
d2d5c19d51 [Improvement](Nereids) Avoid unsafe cast. (#12603)
This PR changed some interfaces to avoid unsafe cast.

- Modify `Plan.getExpressions()`'s return type from `List<Expression>` to `List<? extends Expression>`
Returning the projects (a list of named expressions) from `getExpressions` avoids an unsafe cast. See `LogicalProject.getExpression()` for an example.

- Modify `EmptyRelation.getProjects()`'s return type from `List<NamedExpression>` to `List<? extends NamedExpression>`
Creating an empty relation from a list of slots avoids an unsafe cast. See the `EliminateLimit` rule for an example.
2022-09-15 12:02:35 +08:00
5e0dc11f87 [feature](Nereids)add RelationId as a unique identifier of relations (#12461)
In Nereids, we could not distinguish two relations from the same table in one plan tree.
This led to some tricky code to handle them during planning, such as a separate branch for equality checks in GroupExpression.
This PR adds RelationId to LogicalRelation and PhysicalRelation. The equals function of every relation now compares RelationId, which lets us distinguish two relations from the same table.

TODO:
Add RelationId to UnboundRelation, UnboundOneRowRelation, LogicalOneRowRelation, PhysicalOneRowRelation.
2022-09-15 11:56:56 +08:00
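A small Python sketch of the idea behind #12461: once each relation carries a unique ID that takes part in equality, two scans of the same table no longer compare equal. `LogicalRelationSketch`, `scan`, and the counter are hypothetical names for illustration and do not mirror the real Java classes.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical process-wide source of relation IDs.
_relation_ids = count()


@dataclass(frozen=True)
class LogicalRelationSketch:
    table: str
    relation_id: int  # compared by the generated __eq__, together with table


def scan(table: str) -> LogicalRelationSketch:
    """Create a relation over `table` with a fresh relation ID."""
    return LogicalRelationSketch(table, next(_relation_ids))
```

In a self-join such as the TPC-H q21 pattern, `scan("lineitem")` called twice yields two relations with the same table name that are not equal, so no special-case branch is needed to tell them apart.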
fc4298e85e [feature](outfile) support parquet writer (#12492) 2022-09-15 11:09:12 +08:00
22a8d35999 [Feature](vectorized) support jdbc sink for insert into data to table (#12534) 2022-09-15 11:08:41 +08:00
33f5a86e69 [fix](array-type) forbid to create materialized view for array column (#12543)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-15 11:08:23 +08:00
e413a2b8e9 [Opt](vectorized) Use new way to do hash shuffle to speed up query (#12586) 2022-09-15 11:08:04 +08:00
353bb6fdfb [doc] update docs (#12615) 2022-09-15 11:07:34 +08:00
1080095f46 [typo](doc) fix some typos (#12611) 2022-09-15 11:07:19 +08:00
8e4374b7ec [enhancement](agg)remove unnecessary mem alloc and dealloc in agg node (#12535) 2022-09-15 11:07:06 +08:00
2ac790bf31 [enhancement](statistic) the calculation of routine load statistics are not accurate (#12594)
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-15 11:00:57 +08:00
b136d80e1a [enhancement](compress) reuse compression ctx and buffer (#12573)
Reuse compression ctx and buffer.
Use a global instance for every compression algorithm, and use a
thread-safe buffer pool to reuse compression buffers. The pool size equals
the maximum number of parallel compression threads, so it will not grow too large.

Tests show that this change improves data import and compaction performance by about 5%.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-15 10:59:46 +08:00
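The buffer-pool idea from #12573 can be sketched in Python as below. This is an illustration under assumptions, not the BE implementation (which is C++): `BufferPool`, `fake_compress`, `MAX_PARALLEL`, and the 64 KiB buffer size are hypothetical, and the "compression" step only copies bytes so that the sketch stays self-contained. It shows a fixed number of buffers, sized to the maximum parallelism, being borrowed and returned by worker threads.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class BufferPool:
    """Thread-safe pool holding a fixed number of reusable buffers."""

    def __init__(self, size: int, buffer_bytes: int):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(bytearray(buffer_bytes))

    @contextmanager
    def borrow(self):
        buf = self._free.get()  # blocks while all buffers are in use
        try:
            yield buf
        finally:
            self._free.put(buf)  # always hand the buffer back


MAX_PARALLEL = 4  # assumption: maximum number of compression threads
pool = BufferPool(MAX_PARALLEL, 64 * 1024)


def fake_compress(chunk: bytes) -> int:
    """Stand-in for compression: copy into a pooled buffer, report which one."""
    with pool.borrow() as buf:
        n = min(len(chunk), len(buf))
        buf[:n] = chunk[:n]
        return id(buf)


with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as executor:
    used_buffers = set(executor.map(fake_compress, [b"x" * 100] * 32))
```

However many chunks are processed, `used_buffers` never contains more than `MAX_PARALLEL` distinct buffers, which is the bound on pool size that the commit message describes.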
6543924790 [fix](Nereids): avoid commute cause dead-loop. (#12616)
* [fix](Nereids): avoid commute cause dead-loop.

* update best plan
2022-09-15 10:47:11 +08:00
8aa5899484 [fix](load) add scan tuple for stream load scan node only when vectorization is enable (#12578) 2022-09-15 08:44:39 +08:00
beeb0ef3eb [Bug](lead) fix wrong child expression of lead function (#12587) 2022-09-15 08:44:18 +08:00
2dad67ee3e [docs](readme) update 1.1.2 released (#12596) 2022-09-15 08:43:45 +08:00
d8b6f09cc1 [Bugfix](string_functions) fix heap-buffer-overflow on find_in_set (#12613) 2022-09-15 08:43:10 +08:00
47d43b34b3 [enhancement](thirdparty) Compile Jemalloc separately on thirdparty (#12577)
Compile Jemalloc separately and optimize the configuration
2022-09-14 23:31:48 +08:00
f50054f547 [Enhancement](array-type) record offsets info to speed up the seek performance (#12293)
For array-type data, store offsets rather than lengths in the file. The new file format improves seek performance. See #12246 for the performance report.

Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2022-09-14 22:41:54 +08:00
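Why stored offsets make seeking cheaper than stored lengths (#12293) can be shown with a short Python sketch. The layout here is an assumption for illustration (one extra trailing offset, in-memory lists), not the actual Doris file format: with lengths, locating row `i` requires summing all earlier lengths, while with offsets it is a direct lookup.

```python
from itertools import accumulate

# Number of elements in each array-typed row (hypothetical data).
lengths = [3, 0, 2, 4]

# Offsets are the running totals of the lengths, starting at 0.
offsets = [0] + list(accumulate(lengths))


def seek_with_lengths(lengths, row: int) -> int:
    """Start position of `row`: must add up every earlier length, O(row)."""
    return sum(lengths[:row])


def seek_with_offsets(offsets, row: int):
    """Start and end position of `row`: two direct lookups, O(1)."""
    return offsets[row], offsets[row + 1]
```

Both functions agree on where row 2 starts, but only the offset-based one avoids scanning the preceding rows.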
d4cb0bbdd5 [test](nereids) Add TPC-H regression test cases for nereids (#12600)
Disable some test cases that cannot run successfully yet. They will be re-enabled once the corresponding bugs are fixed.
2022-09-14 22:37:56 +08:00
c5ad989065 [refactor](reader) refactor the interface of file reader (#12574)
Currently, Doris has a variety of readers for different file formats,
such as parquet reader, orc reader, csv reader, json reader and so on.

The interfaces of these readers are not unified, which makes it impossible to call them through a unified method.

In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class
to use the `get_next_block()` method.

This PR currently only modifies `arrow_reader` and `parquet reader`.
Other readers will be modified one by one in subsequent PRs.
2022-09-14 22:31:11 +08:00
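The unified-reader idea in #12574 can be sketched in Python. Only the names `GenericReader` and `get_next_block()` come from the commit message; the return shape (a block plus an end-of-file flag), the two concrete readers, and `read_all` are assumptions made for this illustration and do not reflect the real C++ signatures.

```python
import json
from abc import ABC, abstractmethod


class GenericReader(ABC):
    @abstractmethod
    def get_next_block(self):
        """Return (rows, eof). The tuple shape is an assumption of this sketch."""


class CsvTextReader(GenericReader):
    """Hypothetical reader that yields comma-separated rows in small batches."""

    def __init__(self, text: str, batch_size: int = 2):
        self._rows = [line.split(",") for line in text.splitlines()]
        self._pos = 0
        self._batch_size = batch_size

    def get_next_block(self):
        block = self._rows[self._pos: self._pos + self._batch_size]
        self._pos += len(block)
        return block, self._pos >= len(self._rows)


class JsonLinesReader(GenericReader):
    """Hypothetical reader that returns all JSON lines as a single block."""

    def __init__(self, text: str):
        self._rows = [json.loads(line) for line in text.splitlines()]

    def get_next_block(self):
        block, self._rows = self._rows, []
        return block, True


def read_all(reader: GenericReader):
    """Caller code depends only on the shared interface, not the file format."""
    rows = []
    while True:
        block, eof = reader.get_next_block()
        rows.extend(block)
        if eof:
            return rows
```

`read_all` works unchanged for either reader, which is the benefit of routing every format through one interface.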
be0a0200cf [fix](grpc-java) use pooled stub to call rpc on be instead of one stub (#10439)
A channel is closed when a timeout or exception happens. If only
one stub is used, all queries would then fail.

If we don't close the channel, grpc-java sometimes gets stuck without sending
any RPC.
2022-09-14 22:30:45 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
2. add a data_version for pblock
3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
c03f7c3ba4 [sample](flink-connector) add doris data delete function (#12599) 2022-09-14 19:18:59 +08:00
3130a19fe9 [feature](regression) Enhancement regression frame, support http post… (#12565) 2022-09-14 15:31:59 +08:00
3543f85ae5 [feature](nereids) merge push down and remove redundant operator rules into one batch (#12569)
1. For some related rules, we need to execute them together to get the expected plan.
2. Add session variables to avoid fallback to stale planner when running regression tests of nereids for piggyback.
2022-09-14 14:37:36 +08:00
08ee84ef67 [typo](docs)fix tablet-local-debug doc err #12572 2022-09-14 14:26:56 +08:00