Commit Graph

6608 Commits

Author SHA1 Message Date
fb9e48a34a [fix](vstream load) Fix bug when load json with jsonpath (#12660) 2022-09-19 10:13:18 +08:00
1fa65708d7 [test](time_add or sub) add time_add and time_sub function cases #12641 2022-09-19 09:22:53 +08:00
4669fa54cc [enhancement](test) add tpch_sf100_unique p2 test (#12697) 2022-09-19 09:19:17 +08:00
b608de668f [fix](compile) compile error: open_telemetry_scop_wrapper.hpp cannot find 'UNLIKELY' (#12709) 2022-09-19 09:18:04 +08:00
6d3ae1e69c [regression](left join) Add left join case where the left table is empty but the query result is not empty (#12344)
2022-09-19 08:53:50 +08:00
fa8ed2bccc [fix](array-type) fix the invalid format load for stream load (#12424)
This PR fixes stream load of data with an invalid array format.
1. The original file to load:
1 [1, 2, 3]
2 [4, 5, 6]
3 \N
4 [7, \N, 8]
5 10, 11, 12
2. Before this change, the invalid array value in line 5 made the whole load fail:
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11035,
"Label": "11c9f111-188e-4616-9a50-aec8b7814513",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "Array does not start with '[' character, found '1'",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 7,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 0
}
3. After this change, the load succeeds and the response includes an ErrorURL that reports the invalid line:
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11046,
"Label": "249808ee-55f4-4c08-b671-b3d82689d614",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 4,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 39,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 19,
"CommitAndPublishTimeMs": 16,
"ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893"
}

The SQL select result:
MySQL [example_db]> select * from array_test06;
+------+--------------+
| k1 | k2 |
+------+--------------+
| 1 | [1, 2, 3] |
| 2 | [4, 5, 6] |
| 3 | NULL |
| 4 | [7, NULL, 8] |
+------+--------------+
4 rows in set (0.019 sec)

The ErrorURL page shows:
"Reason: Invalid format for array column(k2). src line [10, 11, 12]; "

Issue Number: #7570
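The new behavior filters the malformed row and reports it, where previously the whole load was rejected. The sketch below models that, using the five-line file above. It is a hypothetical illustration, not the Doris BE code. `is_valid_array_field` and `load_rows` are made-up names. It assumes whitespace separates k1 and k2, and that the only check is a `\N` marker or a `[` ... `]` wrapper.

```python
# Hypothetical sketch, not the Doris BE implementation: a row whose array column
# is malformed is filtered and reported instead of failing the whole load.

NULL_MARKER = r"\N"


def is_valid_array_field(field: str) -> bool:
    """Accept the NULL marker or a value wrapped in '[' ... ']'."""
    field = field.strip()
    if field == NULL_MARKER:
        return True
    return field.startswith("[") and field.endswith("]")


def load_rows(lines):
    """Split each 'k1 k2' line; return (loaded_rows, error_messages)."""
    loaded, errors = [], []
    for line in lines:
        k1, k2 = line.split(None, 1)  # first whitespace run separates k1 and k2
        if is_valid_array_field(k2):
            loaded.append((int(k1), k2))
        else:
            # Record the bad line for the error log and keep loading.
            errors.append(f"Invalid format for array column(k2). src line [{k2}]; ")
    return loaded, errors


rows = ["1 [1, 2, 3]", "2 [4, 5, 6]", r"3 \N", r"4 [7, \N, 8]", "5 10, 11, 12"]
loaded, errors = load_rows(rows)
print(len(rows), len(loaded), len(errors))  # prints: 5 4 1 (total, loaded, filtered)
```

These counts match NumberTotalRows, NumberLoadedRows and NumberFilteredRows in the successful response.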
2022-09-19 08:52:59 +08:00
65cff8d40c [enhancement](compaction) prevent quick_compaction&auto_compaction conflict (#12674)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-19 08:39:27 +08:00
bc38b2fdfb [improvement](new-scan) gracefully quit scanner scheduler (#12715) 2022-09-19 08:39:08 +08:00
625ac83f72 [enhancement](test) add opensky cases to p2 (#12693) 2022-09-19 08:38:17 +08:00
fc8f4c787d [enhancement](test) add yandex_metrica cases to p2 (#12692) 2022-09-19 08:37:48 +08:00
3b7a04ee8b [fix](inpredicate)always use PredicateColumn<TYPE_STRING> for CHAR, VARCHAR and STRING type (#12637)
The predicate column type for CHAR, VARCHAR and STRING is PredicateColumnType<TYPE_STRING>, so the _base_evaluate method should always convert the input column to PredicateColumnType<TYPE_STRING>.
2022-09-19 08:37:06 +08:00
a4ed023bad [fix](colocation) fix decommission failure with 2 BEs and colocation table (#12644)
This PR fixes the following scenario:

1. Start with 2 Backends.
2. Create tables in a colocation group with 1 replica.
3. Decommission one of the Backends.
4. The number of tablets on the decommissioned Backend does not decrease.

This is a bug in ColocateTableCheckerAndBalancer.
2022-09-19 08:34:50 +08:00
HB
00dda79735 [fix](broker-load) Correction of kerberos authentication time determination rule (#11793)
Every time a new broker load arrives, Doris updates the start time of the Kerberos authentication,
but this logic is wrong,
because the Kerberos authentication lifetime is measured from the moment the ticket is obtained.

This PR changes the logic:
1. For Kerberos, check filesystem expiration by creation time.
2. Otherwise, check filesystem expiration by access time.
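The sketch below shows the two expiration rules side by side. It is a minimal illustration under stated assumptions: the cache entry, `touch`, `is_expired` and `expire_seconds` are hypothetical and are not Doris FE names. It only shows why a Kerberos handle must be timed from its creation even when new loads keep reusing it.

```python
from dataclasses import dataclass


@dataclass
class CachedFileSystem:
    """Hypothetical cache entry for a broker filesystem handle."""
    create_time: float  # seconds; when the fs (and its Kerberos ticket) was created
    access_time: float  # seconds; last time a load used the fs
    is_kerberos: bool


def touch(fs: CachedFileSystem, now: float) -> None:
    """Called whenever a new broker load reuses the cached fs."""
    fs.access_time = now


def is_expired(fs: CachedFileSystem, now: float, expire_seconds: float) -> bool:
    # A Kerberos ticket's lifetime counts from when it was obtained, so reuse
    # must not extend it; other filesystems expire after a period of inactivity.
    start = fs.create_time if fs.is_kerberos else fs.access_time
    return now - start > expire_seconds


krb = CachedFileSystem(create_time=0, access_time=0, is_kerberos=True)
plain = CachedFileSystem(create_time=0, access_time=0, is_kerberos=False)
for fs in (krb, plain):
    touch(fs, now=90)  # a new load at t=90s
print(is_expired(krb, 120, 100), is_expired(plain, 120, 100))  # prints: True False
```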
2022-09-18 17:46:13 +08:00
cb06e67fba [fix](tracing) Fix opentelemetry log output to be.out (#11856) 2022-09-18 17:40:23 +08:00
4f98146e83 [enhancement](tracing) Support forward to master tracing (#12290) 2022-09-18 17:39:04 +08:00
e9f105aa1e [enhancement](regression-test) add some p0 cases (#12243) 2022-09-18 17:36:08 +08:00
c30453e9ab [enhancement](regression-test) add ssb_sf100 to p2 cases (#12286) 2022-09-18 17:35:16 +08:00
a73b28789d Fix memory leak by calling in mem hook (#12708)
When the consume mem tracker exceeds the mem limit in the mem hook, a boost stacktrace is printed. It is printed only once per query/load, and at most once per second for the process tracker.

After the process memory reaches the upper limit, the boost stacktrace is printed every second. The observed phenomena are as follows:

1. After a query/load is canceled, memory increases instantly.
2. The total physical memory in the tcmalloc profile is less than the process memory reported by perf.
3. The process mem tracker is smaller than the process memory reported by perf.
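The print policy in this commit is "once per query/load, at most once per second for the process". The sketch below models that policy. It is hypothetical: `StacktraceThrottle` and its methods are not Doris names. The clock is injectable so the one-second window can be shown deterministically.

```python
import time


class StacktraceThrottle:
    """Hypothetical throttle: once per query/load, at most once per second for the process."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._printed_queries = set()
        self._last_process_print = None

    def should_print_query(self, query_id: str) -> bool:
        # Each query/load prints its stacktrace only the first time it exceeds the limit.
        if query_id in self._printed_queries:
            return False
        self._printed_queries.add(query_id)
        return True

    def should_print_process(self) -> bool:
        # The process tracker prints at most once within any 1-second window.
        now = self._clock()
        if self._last_process_print is not None and now - self._last_process_print < 1.0:
            return False
        self._last_process_print = now
        return True


fake_now = [0.0]
throttle = StacktraceThrottle(clock=lambda: fake_now[0])
```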
2022-09-18 10:04:15 +08:00
2e41976b07 update tpch regression test (#12687)
Turn on all TPC-H sf1 test cases except Q2, which causes an infinite loop in join reorder. Q2 will be turned on after that is fixed.
2022-09-17 17:06:39 +08:00
bac58a4774 [feature-wip](unique-key-merge-on-write) fix calculate delete bitmap when flush memtable (#12668) 2022-09-17 17:04:03 +08:00
35b97a5af0 [Opt](hash) Speed up insert from dict data map and not datetime (#12670)
Speed up dict data reads for non-datetime types; same goal as #12636.
2022-09-17 17:02:43 +08:00
3030a3606a [fix](load) fix stream load fail when setting strict mode (#12684) 2022-09-17 17:02:11 +08:00
3bb042e45c [fix](memtracker) Process physical mem check does not include tc/jemalloc allocator cache (#12688)
The tcmalloc/jemalloc allocator cache is no longer counted as part of the process physical memory in the mem check.

This is because new/malloc triggers the mem hook even when the allocation is served from the tcmalloc/jemalloc allocator cache. Such an allocation may not claim any new physical memory, so failing in the mem hook in that case is not expected.

In addition:

1. The size of the tcmalloc/jemalloc allocator cache is tracked by its own mem tracker, whose parent is the process mem tracker. It is updated every 1s.
2. The default process mem_limit is changed to 90%, with the expectation that the mem tracker effectively limits the memory usage of the process.
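The sketch below shows the check as described: the allocator cache is subtracted from physical memory before comparing against mem_limit. The function names are hypothetical. It also assumes, without confirmation from the commit, that "90%" means 90% of the machine's total physical memory. Integer arithmetic keeps the byte counts exact.

```python
GIB = 1 << 30


def default_mem_limit(total_physical_bytes: int, percent: int = 90) -> int:
    # Assumption: the 90% default is taken relative to total physical memory.
    return total_physical_bytes * percent // 100


def exceeds_process_mem_limit(physical_mem_bytes: int,
                              allocator_cache_bytes: int,
                              mem_limit_bytes: int) -> bool:
    # Memory held in the tcmalloc/jemalloc cache is excluded, because an
    # allocation served from it does not necessarily claim new physical memory.
    return physical_mem_bytes - allocator_cache_bytes > mem_limit_bytes


limit = default_mem_limit(100 * GIB)
```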
2022-09-17 11:31:01 +08:00
e01986b8b9 [feature](light-schema-change) fix light-schema-change and add more cases (#12160)
Fix _delete_sign_idx and _seq_col_idx in append_column and build_schema during load.
The tablet schema cache supports recycling an entry when the schema sptr use count equals 1.
Add an HTTP interface for flink-connector to sync DDL.
Improve tablet->tablet_schema() by using max_version_schema.
2022-09-17 11:29:36 +08:00
942b31038f [fix](memory) Fix BE OOM when load -238 fail (#12666)
When the load channel exceeds the mem limit and triggers a flush, and the flush fails, an error message is returned and the load is terminated.

A flush usually fails with error code -238. After the load channel exceeds the mem limit, memtables are flushed so frequently that the number of segments exceeds the maximum.
2022-09-17 00:17:53 +08:00
42b6532131 remove gc and fix print (#12682) 2022-09-17 00:16:15 +08:00
0a95ebf602 [feature](Nereids) Add scalar function code generator and some function trait (#12671)
This PR does the following:
1. Change the nullable mode of 'from_unixtime' and 'parse_url' from DEPEND_ON_ARGUMENT to ALWAYS_NULLABLE; their nullable configuration was previously missing.
2. Add new interfaces corresponding to the original NullableMode values. Inspired by Scala's mix-in traits, they let us see a function's traits at a glance without reading lengthy procedural code, and they save writing boilerplate code, e.g. `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. The interfaces are:
   - PropagateNullable: equivalent to NullableMode.DEPEND_ON_ARGUMENT
   - AlwaysNullable: equivalent to NullableMode.ALWAYS_NULLABLE
   - AlwaysNotNullable: equivalent to NullableMode.ALWAYS_NOT_NULLABLE
   - ComputeNullable (all other cases): equivalent to NullableMode.CUSTOM
3. Add `GenerateScalarFunction` to generate Nereids-style function code from legacy functions. It does not actually generate any new function classes yet, because the function traits are not ready for use. Traits still need to be added for the legacy functions' CompareMode and NonDeterministic, following the same idea as ComputeNullable.
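The commit uses Java interfaces. The sketch below models the same mix-in idea with Python classes, so that a function declares its nullability by inheriting a trait instead of implementing it procedurally. The class names mirror the commit. The constructor taking per-child nullability flags is a hypothetical simplification, and `ImplicitCastInputTypes` is omitted. In Python the trait must be listed before `ScalarFunction` so that its `nullable()` takes precedence in the MRO.

```python
from enum import Enum


class NullableMode(Enum):
    DEPEND_ON_ARGUMENT = "DEPEND_ON_ARGUMENT"
    ALWAYS_NULLABLE = "ALWAYS_NULLABLE"
    ALWAYS_NOT_NULLABLE = "ALWAYS_NOT_NULLABLE"
    CUSTOM = "CUSTOM"


class ScalarFunction:
    def __init__(self, *children_nullable: bool):
        # Hypothetical simplification: children are represented only by nullability.
        self.children_nullable = children_nullable

    def nullable(self) -> bool:
        raise NotImplementedError


class PropagateNullable:
    nullable_mode = NullableMode.DEPEND_ON_ARGUMENT

    def nullable(self) -> bool:
        # Nullable if any argument is nullable.
        return any(self.children_nullable)


class AlwaysNullable:
    nullable_mode = NullableMode.ALWAYS_NULLABLE

    def nullable(self) -> bool:
        return True


class AlwaysNotNullable:
    nullable_mode = NullableMode.ALWAYS_NOT_NULLABLE

    def nullable(self) -> bool:
        return False


# The trait comes first so its nullable() overrides ScalarFunction's.
class Substring(PropagateNullable, ScalarFunction): ...
class FromUnixtime(AlwaysNullable, ScalarFunction): ...
```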
2022-09-16 21:27:30 +08:00
b733a23cf7 [Bugfix](stack_over_flow) fix BE core dump caused by stack-buffer-overflow when TBrokerOpenReaderResponse is too large (#12658) 2022-09-16 20:57:22 +08:00
a3fee5afbb [doc](variables) fix forward_to_master doc bug #12659
Co-authored-by: wudi <>
2022-09-16 20:56:55 +08:00
6fc74def02 [fix](Broker load): fix bug where the broker load label has already been used (#12630) 2022-09-16 20:46:01 +08:00
378acfa28f [enhancement](Nereids) eliminate all unessential cross join in TPC-H benchmark (#12651)
To eliminate all unessential cross joins in the TPC-H benchmark, this PR:

1. Pushes down all predicates that can be pushed through joins before applying the ReorderJoin rule. ReorderJoin needs a LogicalFilter as its root pattern, so this lets it eliminate every cross join that can be eliminated. (Q2, Q15, Q16, Q17, Q18)
2. Enables the expression optimization rule that extracts common expressions. (Q19)
3. Fixes a failed cast translation. (Q19)
2022-09-16 19:09:58 +08:00
a4a5dae7dc [enhancement](test) add tpcds_sf100 to p2 cases (#12296) 2022-09-16 17:38:23 +08:00
21319e6db4 [fix](nereids) generate invalid slot when translate predicates in filter on hash join (#12475)
Test SQL (from TPC-H Q21):

```
select count(*)
from  lineitem l3 right anti join lineitem l1
      on l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey;
```
If there are other join conjuncts, we have to put all slots from both the left and right children into `slotReferenceMap` instead of `hashjoin.getOutput()`.

After splitting the intermediate tuple and the output tuple, we hit several issues in the regression tests, so we make the following changes:
1. Since translating a project replaces the underlying hash join node's output tuple, we add PhysicalHashJoin.shouldTranslateOutput.
2. Because PhysicalPlanTranslator merges the filter and the hash join, we add PhysicalHashJoin.filterConjuncts and translate the filter conjuncts in physicalHashJoin.
3. We set HashJoinNode.hashOutputSlotIds properly when using the Nereids planner.
4. To be compatible with BE, nullable() returns true for the substring function.
2022-09-16 16:51:04 +08:00
9d6c199553 [Bug](vec) Fix avg overflow in clickbench (#12621) 2022-09-16 14:43:40 +08:00
131f2a42d2 [Improvement](Nereids) Restrict the condition to apply MergeConsecutiveLimits rule (#12624)
This PR adds a condition check to the MergeConsecutiveLimits rule: the upper limit must not have a valid offset.
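The sketch below shows why the condition matters. If the upper limit has no offset, LIMIT u over LIMIT l OFFSET o keeps rows [o, o + min(u, l)), so the pair folds into one limit. With an upper offset the simple fold is wrong, and the sketch declines, as the rule now does. `Limit`, `merge_consecutive_limits` and `apply_limit` are hypothetical names, not the Nereids classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Limit:
    limit: int
    offset: int = 0


def merge_consecutive_limits(upper: Limit, lower: Limit) -> Optional[Limit]:
    """Fold LIMIT(upper) over LIMIT(lower) into one Limit, or None if not allowed."""
    if upper.offset != 0:
        # The guard added by this PR: do not merge when the upper limit has an offset.
        return None
    # lower yields rows [lower.offset, lower.offset + lower.limit);
    # upper keeps only the first upper.limit of them.
    return Limit(min(upper.limit, lower.limit), lower.offset)


def apply_limit(rows: List[int], lim: Limit) -> List[int]:
    return rows[lim.offset: lim.offset + lim.limit]


rows = list(range(20))
upper, lower = Limit(3), Limit(5, offset=4)
merged = merge_consecutive_limits(upper, lower)
```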
2022-09-16 13:05:39 +08:00
0f6dbb5769 [fix](Nereids): split INNER and OUTER into different rules. (#12646) 2022-09-16 10:34:42 +08:00
8364165e30 [regression_test](testcase) add regression test cases for session variables skip_storage_engine_merge, skip_delete_predicate and show_hidden_columns (#12617)
Also add this functionality to the new OLAP scan node.
2022-09-16 10:33:12 +08:00
97ff14482f [enhancement](doc) When we use the flink doris connector with a bounded source, we should use BATCH mode. (#12576) 2022-09-16 10:31:17 +08:00
d4f8e0c754 [Bug](spark load) fix spark load clearSparkLauncherLog NPE #12619
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-16 10:30:57 +08:00
wxy
20de8ac29d [fix](auditloader plugin): fix bug in AuditLoaderPlugin where stmt appears truncated when it contains '\n'. (#12627)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-09-16 10:28:10 +08:00
380e3695f8 [test](window-function) add cte test in regression of window function #12635 2022-09-16 10:27:50 +08:00
f1811e41bc [fix](config)Update user_define_tables.sh #12542 2022-09-16 10:27:28 +08:00
Pxl
d44ec74988 [Enhancement](column) optimize for ColumnString::insert_many_dict_data (#12636)
2022-09-16 10:23:04 +08:00
c05d736331 [Improvement](sort) fallback to partial sort small block if topN is small (#12604)
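The commit title only names the idea, so the sketch below is one plausible reading, not the Doris sort code. When N is small, each block is partially sorted to keep only its N best rows, rather than fully sorting everything. `sort_limit` and `PARTIAL_SORT_THRESHOLD` are hypothetical, and the real cut-off is not stated in the commit. `heapq.nsmallest` returns its results in ascending order.

```python
import heapq
from typing import Iterable, List

PARTIAL_SORT_THRESHOLD = 1024  # hypothetical cut-off for "topN is small"


def sort_limit(blocks: Iterable[List[int]], n: int) -> List[int]:
    """Return the n smallest values across all blocks, in ascending order."""
    if n > PARTIAL_SORT_THRESHOLD:
        # Large N: fully sort all rows, then truncate.
        return sorted(v for block in blocks for v in block)[:n]
    best: List[int] = []
    for block in blocks:
        # Small N: partially sort each block, keeping only its n smallest rows,
        # then merge them with the best rows seen so far.
        best = heapq.nsmallest(n, best + heapq.nsmallest(n, block))
    return best
```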
2022-09-16 10:20:17 +08:00
2a063355ad [fix](vstream load) Fix the default value insertion problem when importing json (#12601)
2022-09-16 09:54:45 +08:00
a97f63141e [fix](cast) Add validity check for date conversion for non-vectorization (#12608)
Actual result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| 2000-00-00 |
+------------------------------+

Expected result:
select cast("0.0000031417" as date);
+------------------------------+
| CAST('0.0000031417' AS DATE) |
+------------------------------+
| NULL |
+------------------------------+
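The expected result follows from a validity check on the parsed date parts. "2000-00-00" has month 0 and day 0, so it is not a real date, and the cast should yield NULL. The sketch below shows such a check on already-parsed parts; how Doris parses "0.0000031417" is not shown here. The function names are hypothetical, and the accepted year range 0..9999 is an assumption.

```python
from typing import Optional


def days_in_month(year: int, month: int) -> int:
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def cast_date_parts(year: int, month: int, day: int) -> Optional[str]:
    """Return 'YYYY-MM-DD', or None (SQL NULL) when the parts are not a valid date."""
    if not (0 <= year <= 9999 and 1 <= month <= 12):  # assumed year range
        return None
    if not 1 <= day <= days_in_month(year, month):
        return None
    return f"{year:04d}-{month:02d}-{day:02d}"
```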
2022-09-16 09:08:53 +08:00
d906e97f1b [bugfix](compression) fix lock bug in concurrent acquire context (#12638)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-16 09:05:29 +08:00
98dad6158b [fix](Nereids) type coercion on case-when is not correct (#12650)
When we do type coercion on a CaseWhen expression, such as in this SQL:
```
CASE WHEN n_nationkey > 1 THEN n_regionkey ELSE 0 END
```
The ELSE value 0 needs to be coerced to CAST(0 AS INT), but this was missed in PR #11802.
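The sketch below shows the coercion step. It finds a common type for every THEN/ELSE branch and wraps each branch that differs in a CAST. It is a simplified illustration, not the Nereids TypeCoercion rule. The integer widening order and the assumption that the literal 0 is typed TINYINT are both hypothetical.

```python
from typing import List, Tuple

# Hypothetical integer widening order, narrowest to widest.
INT_TYPES = ["TINYINT", "SMALLINT", "INT", "BIGINT", "LARGEINT"]


def common_type(types: List[str]) -> str:
    return max(types, key=INT_TYPES.index)


def coerce_case_when(branches: List[Tuple[str, str]]) -> List[str]:
    """branches: (expression SQL, type) for every THEN value and the ELSE value."""
    target = common_type([t for _, t in branches])
    # Every branch must produce the common type; cast those that do not.
    return [e if t == target else f"CAST({e} AS {target})" for e, t in branches]


# THEN n_regionkey (INT), ELSE 0 (assumed to be typed TINYINT as a literal)
coerced = coerce_case_when([("n_regionkey", "INT"), ("0", "TINYINT")])
```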
2022-09-16 02:26:11 +08:00
a63cdc8a7c [feature](Nereids) support basic runtime filter (#12182)
This PR adds runtime filters to the Nereids planner. Currently, runtime filters can only be pushed through join nodes and scan nodes.
TODO:
1. Currently inner join, cross join and right outer join are supported; other join types will be supported in the future.
2. Translate left outer join to inner join if there are inner join ancestors.
3. Some complex situations cannot be handled yet; see the test case testPushDownThroughJoin for details.
4. Support the case where the src key is an aggregate group key.
2022-09-16 02:21:01 +08:00
0daa25d9a9 [fix](nereids) UT fails when test cases run as a package (#12622)
NamedExpressionUtil::clear should reset nextId rather than create a new IdGenerator<ExprId>. The old generator may still be referenced by other objects, and replacing it can make some cases start in a dirty environment when test cases run as a package.
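The sketch below shows the pitfall. Replacing a shared generator leaves earlier holders pointing at the old, stale one, while resetting it in place is visible to every holder. The classes are simplified hypothetical stand-ins for the Java ones, and `Planner` is an invented object that cached a reference to the generator.

```python
class IdGenerator:
    def __init__(self) -> None:
        self.next_id = 0

    def get_next_id(self) -> int:
        nid = self.next_id
        self.next_id += 1
        return nid

    def reset(self) -> None:
        self.next_id = 0


class NamedExpressionUtil:
    generator = IdGenerator()

    @classmethod
    def clear_by_replacing(cls) -> None:
        # Old behavior: objects that cached the old generator keep its state.
        cls.generator = IdGenerator()

    @classmethod
    def clear_by_resetting(cls) -> None:
        # Fixed behavior: every holder of the shared generator sees the reset.
        cls.generator.reset()


class Planner:
    """Hypothetical object that cached a reference to the shared generator."""

    def __init__(self) -> None:
        self.ids = NamedExpressionUtil.generator
```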
2022-09-15 22:25:40 +08:00