6391 Commits

Author SHA1 Message Date
bd4bfa8f00 [fix](memtracker) Fix thread mem tracker try consume accuracy #12782 2022-09-21 09:20:41 +08:00
c72a19f410 [BugFix](VExprContext) capture error status to prevent incorrect func call which causes coredump #12779 2022-09-21 09:20:16 +08:00
f1539761e8 [Bugfix](string_functions) rearrange code to avoid global buffer overflow in FindInSetOp::execute (#12677) 2022-09-21 09:19:38 +08:00
c5b6056b7a [fix](lateral_view) fix lateral view explode_split with temp table (#12643)
Problem description:

The following SQL returns a wrong result:
WITH example1 AS ( select 6 AS k1 ,'a,b,c' AS k2) select k1, e1 from example1 lateral view explode_split(k2, ',') tmp as e1;

Wrong result:

+------+------+
| k1   | e1   |
+------+------+
|    0 | a    |
|    0 | b    |
|    0 | c    |
+------+------+
Correct result should be:
+------+------+
| k1   | e1   |
+------+------+
|    6 | a    |
|    6 | b    |
|    6 | c    |
+------+------+
Why?
TableFunctionNode::outputSlotIds does not include column k1.
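
The expected semantics can be modeled with a small sketch. The class and method names below are hypothetical and only mimic what `lateral view explode_split` should return; they are not Doris code. The point is that every generated row must carry the original `k1` value along with the split element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of `lateral view explode_split(k2, ',') tmp as e1`; not Doris code.
class ExplodeSplitSketch {
    /** Expands one input row (k1, k2) into one "k1|e1" row per element of k2. */
    static List<String> explodeSplit(int k1, String k2, String delimiter) {
        List<String> rows = new ArrayList<>();
        // Quote the delimiter so it is treated literally, not as a regex.
        for (String e1 : k2.split(Pattern.quote(delimiter))) {
            rows.add(k1 + "|" + e1); // k1 must be part of every output row
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(explodeSplit(6, "a,b,c", ",")); // prints [6|a, 6|b, 6|c]
    }
}
```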

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-21 09:19:18 +08:00
b0b876f640 [typo](docs) vectorization needs to be turned off to use native udf #12739 2022-09-21 09:13:48 +08:00
11e0151445 [chore](build) add an option to disable strip thirdparty libs (#12772) 2022-09-21 09:11:25 +08:00
7dfbb7c639 [chore](regression-test) add order by column in tpch_sf1_p1/tpch_sf1/nereids/q11.groovy (#12770) 2022-09-20 22:26:24 +08:00
d5486726de [Bug](date) Fix wrong result produced by date function (#12720) 2022-09-20 21:09:26 +08:00
cc072d35b7 [Bug](date) Fix wrong type in TimestampArithmeticExpr (#12727) 2022-09-20 21:08:48 +08:00
b550985df6 fix thirdparty builder (#12768) 2022-09-20 19:41:00 +08:00
e70c298e0c [Bugfix](mem) Fix memory limit check may overflow (#12776)
This bug occurs because subtracting a signed number and an unsigned number produces an unsigned result, which wraps around to a very large value when the true difference is negative.
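
The effect can be illustrated with a minimal sketch. The commit does not show the affected code, so the method names and the shape of the check below are assumptions. Java has no unsigned integer types, so the unsigned interpretation is emulated with `Long.compareUnsigned`.

```java
// Illustrative only: emulates an unsigned interpretation of a signed subtraction.
// Method names and the check itself are hypothetical, not taken from Doris.
class MemLimitCheckSketch {
    /** Buggy check: the remaining quota is interpreted as an unsigned number. */
    static boolean exceedsLimitBuggy(long limit, long used, long request) {
        long remaining = limit - used; // negative when used > limit
        // Interpreted as unsigned, a negative value becomes a huge positive one,
        // so the request wrongly appears to fit.
        return Long.compareUnsigned(request, remaining) > 0;
    }

    /** Fixed check: keep the arithmetic signed so a negative remainder stays negative. */
    static boolean exceedsLimitFixed(long limit, long used, long request) {
        return request > limit - used;
    }

    public static void main(String[] args) {
        // 100 - 150 = -50, which reads as 2^64 - 50 when treated as unsigned.
        System.out.println(Long.toUnsignedString(100L - 150L)); // prints 18446744073709551566
        System.out.println(exceedsLimitBuggy(100, 150, 10));    // prints false (limit check is bypassed)
        System.out.println(exceedsLimitFixed(100, 150, 10));    // prints true
    }
}
```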
2022-09-20 18:18:23 +08:00
bb7206d461 [refactor](SimpleScheduler) refactor code for getting available backend in SimpleScheduler (#12710) 2022-09-20 18:08:29 +08:00
b837b2eb95 [feature-wip](parquet-reader) filter rows by page index (#12664)
# Proposed changes

[Parquet v1.11+ supports page skipping](https://github.com/apache/parquet-format/blob/master/PageIndex.md), 
which helps the scanner reduce the amount of data that is scanned, decompressed, decoded, and inserted.
According to the performance FlameGraph, decompression takes up 20% of CPU time.
If a page can be filtered out as a whole, it does not need to be decompressed.

However, row numbers are not aligned between the pages of different columns. Columns containing predicates can be filtered at page granularity,
but other columns need to skip rows within pages, so non-predicate columns only save decoding and insertion time.

An array column needs the repetition level to align with other columns, so array columns also only save decoding and insertion time.

## Explore
`OffsetIndex` in the column metadata can locate the page position.
Theoretically, a page can be completely skipped, including the time of reading from HDFS.
However, the average size of a page is around 500KB. Skipping a page requires calling `skip`.
The performance of `skip` is low when it is called frequently,
and may be no better than continuously reading large blocks of data (such as 4MB).

If multiple consecutive pages are filtered, `skip` reading can be performed according to `OffsetIndex`.
However, for ease of programming and readability, the data of all pages is loaded and filtered in turn.
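
The idea of page-level filtering can be sketched as follows. This is a simplified model, not the reader's actual code: the `PageStats` type, its fields, and the single `column > threshold` predicate are all assumptions made for illustration. A page whose maximum value cannot satisfy the predicate is dropped, and the surviving pages are merged into row ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of filtering pages by min/max statistics; not the Doris reader.
class PageIndexFilterSketch {
    /** Per-page statistics, as a page index might expose them (assumed shape). */
    static final class PageStats {
        final long firstRow;
        final long rowCount;
        final long min;
        final long max;

        PageStats(long firstRow, long rowCount, long min, long max) {
            this.firstRow = firstRow;
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
        }
    }

    /** Returns [start, end) row ranges of pages that may contain a value > threshold. */
    static List<long[]> candidateRanges(List<PageStats> pages, long threshold) {
        List<long[]> ranges = new ArrayList<>();
        for (PageStats page : pages) {
            if (page.max <= threshold) {
                continue; // no row in this page can match, so skip the whole page
            }
            long start = page.firstRow;
            long end = page.firstRow + page.rowCount;
            if (!ranges.isEmpty() && ranges.get(ranges.size() - 1)[1] == start) {
                ranges.get(ranges.size() - 1)[1] = end; // merge with the adjacent range
            } else {
                ranges.add(new long[] {start, end});
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<PageStats> pages = Arrays.asList(
                new PageStats(0, 100, 1, 10),
                new PageStats(100, 100, 11, 20),
                new PageStats(200, 100, 5, 8),
                new PageStats(300, 100, 30, 40));
        // prints [[100, 200], [300, 400]]
        System.out.println(Arrays.deepToString(candidateRanges(pages, 10).toArray()));
    }
}
```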
2022-09-20 15:55:19 +08:00
47797ad7e8 [feature](Nereids) Push down non-slot-reference expressions of on clause (#11805)
Push down expressions in the on clause that are not slot references. For example:
select * from t1 join t2 on t1.a + 1 = t2.b + 2 and t1.a + 1 > 2

project()
+---join(t1.a + 1 = t2.b + 2 && t1.a + 1 > 2)
    |---scan(t1)
    +---scan(t2)

transform to

project()
+---join(c = d && c > 2)
    |---project(t1.a -> t1.a + 1)
    |   +---scan(t1)
    +---project(t2.b -> t2.b + 2)
        +---scan(t2)
2022-09-20 13:41:54 +08:00
d83eb13ac5 [enhancement](nereids) use Literal promotion to avoid unnecessary cast (#12663)
Instead of adding a cast function on a literal, we change the literal type directly. This saves cast execution time and memory.
For example, in the SQL
"CASE WHEN l_orderkey > 0 THEN ...", 0 is a TinyIntLiteral.
Before this PR:
"CASE WHEN l_orderkey > CAST (TinyIntLiteral(0) AS INT)"
With this PR:
"CASE WHEN l_orderkey > IntegerLiteral(0)"
2022-09-20 11:15:47 +08:00
954c44db39 [enhancement](Nereids) compare LogicalProperties with output set instead of output list (#12743)
Previously, we compared two LogicalProperties by their output lists. Join reorder changes the order of a join plan's children, which changes the output list, so two join plans that should be equal were no longer equal in the memo. We therefore had to add a project on top of the new join to keep the LogicalProperties the same.
This PR changes the equals and hashCode functions of LogicalProperties to compare the output as a set. We no longer need to add the top project, which helps keep the memo simple and efficient.
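
A minimal sketch of order-insensitive comparison, assuming the output can be modeled as a list of slot names. The class below is hypothetical and far simpler than the real `LogicalProperties`; it only shows why a set-based `equals`/`hashCode` makes reordered join outputs compare equal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical stand-in for LogicalProperties; slot names are modeled as strings.
class LogicalPropertiesSketch {
    private final List<String> output;

    LogicalPropertiesSketch(List<String> output) {
        this.output = output;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof LogicalPropertiesSketch)) {
            return false;
        }
        LogicalPropertiesSketch that = (LogicalPropertiesSketch) o;
        // Compare as sets so the order of the output slots does not matter.
        return new HashSet<>(output).equals(new HashSet<>(that.output));
    }

    @Override
    public int hashCode() {
        // Must agree with equals: a set's hash code is order-independent.
        return new HashSet<>(output).hashCode();
    }

    public static void main(String[] args) {
        LogicalPropertiesSketch ab = new LogicalPropertiesSketch(Arrays.asList("t1.a", "t2.b"));
        LogicalPropertiesSketch ba = new LogicalPropertiesSketch(Arrays.asList("t2.b", "t1.a"));
        System.out.println(ab.equals(ba)); // prints true
    }
}
```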
2022-09-20 10:55:29 +08:00
d435f0de41 [feature-wip](parquet-reader) add page index row range (#12652)
Add some utils and provide the candidate row ranges to read for the page index filter
(generated from the skipped row ranges of each column).
This version supports binary operator filters.

todo: 
- use context instead of structures in close() 
- process complex type filter
- use this instead of row group minmax filter
- refactor _eval_binary() for row group filter and page index filter
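
One way to derive candidate row ranges from per-column skipped ranges is sketched below. It assumes the column predicates are combined with AND, so a row skipped by any column can be skipped entirely; the class and method names are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: candidate ranges are the complement of all skipped ranges.
class RowRangeSketch {
    /**
     * @param totalRows number of rows in the row group
     * @param skipped   [start, end) ranges skipped by any column, in any order
     * @return [start, end) ranges that still have to be read
     */
    static List<long[]> candidateRanges(long totalRows, List<long[]> skipped) {
        List<long[]> sorted = new ArrayList<>(skipped);
        sorted.sort((a, b) -> Long.compare(a[0], b[0]));

        List<long[]> result = new ArrayList<>();
        long cursor = 0; // first row not yet known to be skipped
        for (long[] range : sorted) {
            if (range[0] > cursor) {
                result.add(new long[] {cursor, range[0]}); // gap between skipped ranges
            }
            cursor = Math.max(cursor, range[1]);
        }
        if (cursor < totalRows) {
            result.add(new long[] {cursor, totalRows});
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> skipped = Arrays.asList(
                new long[] {0, 100}, new long[] {200, 300}, // skipped by one column
                new long[] {50, 150});                      // skipped by another column
        // prints [[150, 200], [300, 400]]
        System.out.println(Arrays.deepToString(candidateRanges(400, skipped).toArray()));
    }
}
```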
2022-09-20 10:36:19 +08:00
ca3e52a0bb [fix](agg)the output of window function's nullability should be consistent with output slot (#12607)
FE may force a window function to output a nullable value in some cases; BE should follow this and change the nullability accordingly.
2022-09-20 09:29:44 +08:00
4f27692898 [fix](inlineview)the inlineview's slots' nullability property is not set correctly (#12681)
The output slots of an inline view may come from the nullable side of an outer join, so they should be nullable.
2022-09-20 09:29:15 +08:00
41cf94498d [feature-wip](unique-key-merge-on-write) fix that incremental clone may lead to loss of delete bitmap (#12721) 2022-09-20 09:08:06 +08:00
f5f6a852fe [chore](regression-test) add order by in test_rollup_agg_date.groovy (#12737) 2022-09-20 09:06:13 +08:00
e1d2f82d8e [feature](statistics) template for building internal query SQL statements (#12714)
A template for building internal query SQL statements, mainly used by the statistics module. Once a template is defined, an executable statement is built from the given parameters.

For example, template and parameters:
- template: `SELECT ${col} FROM ${table} WHERE id = ${id};`,
- parameters:  `{col=colName, table=tableName, id=1}`
- result sql:  `SELECT colName FROM tableName WHERE id = 1;`

usage:
```
String template = "SELECT * FROM ${table} WHERE id = ${id};";
Map<String, String> params = new HashMap<>();
params.put("table", "table0");
params.put("id", "123");

// result: SELECT * FROM table0 WHERE id = 123;
String result = InternalSqlTemplate.processTemplate(template, params);
```
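
For readers curious how such `${name}` substitution can work, here is a minimal sketch built on `java.util.regex`. It is an assumption about one possible approach, not the implementation of `InternalSqlTemplate`; the class name is hypothetical, and the sketch does plain text substitution without any SQL escaping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical re-implementation for illustration; not Doris's InternalSqlTemplate.
class SqlTemplateSketch {
    // Matches placeholders such as ${table} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String processTemplate(String template, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer sql = new StringBuffer();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            matcher.appendReplacement(sql, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(sql);
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("table", "table0");
        params.put("id", "123");
        // prints SELECT * FROM table0 WHERE id = 123;
        System.out.println(processTemplate("SELECT * FROM ${table} WHERE id = ${id};", params));
    }
}
```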
2022-09-19 22:10:28 +08:00
43d6be8c4d [docs](function) add a series of date function documents (#12713)
* [docs](function) add a series of date function documents
add docs for `hours_add`, `hours_sub`, `minutes_add`, `minutes_sub`,
`seconds_add`, `seconds_sub`, `years_sub`, `years_add`, `months_add`,
`months_sub`, `days_add`, `days_sub`, `weeks_add`, `weeks_sub` functions.
2022-09-19 21:42:35 +08:00
a5d11dce3b [typo](docs) Add docs of math function (#12532)
* docs of math function
2022-09-19 21:41:59 +08:00
94d73abf2a [test](Nereids) runtime filter unit cases not rely on NereidPlanner to generate PhysicalPlan anymore (#12740)
This PR:
1. add rewrite and implement method to PlanChecker
2. improve unit tests of runtime filter
2022-09-19 19:53:55 +08:00
1339eef33c [fix](statistics) remove statistical task multiple times in one loop cycle (#12741)
There is a problem with StatisticsTaskScheduler. The peek() method obtains a reference to the same task object, but the for-loop executes multiple removes.
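
The pattern described above can be sketched as follows. The commit does not show the scheduler code, so the queue element type, the method names, and the loop shape are guesses made purely for illustration: one `peek()` followed by a removal inside a loop drops tasks that were never run, while removing once after the loop does not.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the bug; a task is modeled as a list of job names.
class TaskQueueSketch {
    /** Buggy pattern: one peek(), but one removal per job of the peeked task. */
    static int remainingAfterBuggyCycle(Queue<List<String>> queue) {
        List<String> task = queue.peek();
        if (task == null) {
            return 0;
        }
        for (String job : task) {
            // ... run job ...
            queue.poll(); // BUG: runs once per job, so later tasks are dropped unrun
        }
        return queue.size();
    }

    /** Fixed pattern: the peeked task is removed exactly once. */
    static int remainingAfterFixedCycle(Queue<List<String>> queue) {
        List<String> task = queue.peek();
        if (task == null) {
            return 0;
        }
        for (String job : task) {
            // ... run job ...
        }
        queue.poll();
        return queue.size();
    }

    static Queue<List<String>> sampleQueue() {
        Queue<List<String>> queue = new ArrayDeque<>();
        queue.add(Arrays.asList("job1", "job2"));
        queue.add(Arrays.asList("job3"));
        queue.add(Arrays.asList("job4"));
        return queue;
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterBuggyCycle(sampleQueue())); // prints 1
        System.out.println(remainingAfterFixedCycle(sampleQueue())); // prints 2
    }
}
```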
2022-09-19 19:28:51 +08:00
4b5cc62348 [refactor](Nereids) rename transform to applyExploration UT helper class PlanChecker (#12725) 2022-09-19 16:49:56 +08:00
08a71236a9 [feature](statistics) Internal-query, execute SQL query statement internally in FE (#9983)
Execute SQL query statements internally (in FE). Internal-query is mainly used by the statistics module: FE obtains statistics, such as a column's maximum and minimum values, from BE through SQL.

This is a utility module for statistics. It does not affect the existing code or how users use the system.

A simple usage example follows (the code omits exception handling):
```
String dbName = "test";
String sql = "SELECT * FROM table0";

InternalQuery query = new InternalQuery(dbName, sql);
InternalQueryResult result = query.query();
List<ResultRow> resultRows = result.getResultRows();

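// Each accessor can be called with either a column name or a column index.
for (ResultRow resultRow : resultRows) {
    List<String> columns = resultRow.getColumns();
    for (int i = 0; i < columns.size(); i++) {
        resultRow.getColumnIndex(columns.get(i));  // index, looked up by name
        resultRow.getColumnName(i);                // name, looked up by index
        resultRow.getColumnType(columns.get(i));   // type, by name
        resultRow.getColumnType(i);                // type, by index
        resultRow.getColumnValue(columns.get(i));  // value, by name
        resultRow.getColumnValue(i);               // value, by index
    }
}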
```
2022-09-19 16:26:54 +08:00
399af4572a [improve](Nereids) improve join cost model (#12657) 2022-09-19 16:25:30 +08:00
5978fd9647 [refactor](file scanner)Refactor file scanner. (#12602)
Refactor the scanners for the hms external catalog (work in progress).
Use VFileScanner; NewFileParquetScanner, NewFileOrcScanner and NewFileTextScanner will be removed after it is fully tested.
Queries on parquet files have been tested; readers for orc files and text files, as well as the load logic, still need to be added.
2022-09-19 15:23:51 +08:00
d68b8cce1a [fix](intersect) fix intersect query failed in row storage code (#12712) 2022-09-19 11:47:50 +08:00
75d7de89a5 [improve](Nereids) Add all slots used by onClause to project when reorder and fix reorder mark (#12701)
1. Add all slots used by onClause in project

```
(A & B) & C like
join(hash conjuncts: C.t2 = A.t2)
|---project(A.t2)
|   +---join(hash conjuncts: A.t1 = B.t1)
|       +---A
|       +---B
+---C

transform to (A & C) & B
join(hash conjuncts: A.t1 = B.t1)
|---project(A.t2)
|   +---join(hash conjuncts: C.t2 = A.t2)
|       +---A
|       +---C
+---B
```

But the projection only includes `A.t2`, so `A.t1` cannot be found. We should add the slots used by onClause when a projection exists.

2. fix join reorder mark

Add the `LAsscom` mark when applying `LAsscom`.

3. remove slotReference

Use `Slot` instead of `SlotReference` to avoid casts.
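
Item 1 can be illustrated with a small sketch. It models slots as plain strings and uses hypothetical names; the real rule works on Nereids plan nodes. The projection below the reordered join is extended with every slot that the upper join's on clause needs and that the projection's child can provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of "add all slots used by onClause to project"; not Nereids code.
class ProjectSlotsSketch {
    /**
     * @param projected     slots already kept by the projection
     * @param onClauseSlots slots referenced by the on clause of the join above
     * @param childOutput   slots produced by the projection's child
     */
    static Set<String> addOnClauseSlots(Set<String> projected,
                                        Set<String> onClauseSlots,
                                        Set<String> childOutput) {
        Set<String> result = new LinkedHashSet<>(projected);
        for (String slot : onClauseSlots) {
            if (childOutput.contains(slot)) {
                result.add(slot); // the upper join needs it and the child can supply it
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> projected = new LinkedHashSet<>(Arrays.asList("A.t2"));
        Set<String> onClause = new LinkedHashSet<>(Arrays.asList("A.t1", "B.t1"));
        Set<String> childOutput = new LinkedHashSet<>(Arrays.asList("A.t1", "A.t2", "C.t2"));
        // prints [A.t2, A.t1]
        System.out.println(new ArrayList<>(addOnClauseSlots(projected, onClause, childOutput)));
    }
}
```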
2022-09-19 11:01:25 +08:00
415721ef20 [enhancement](pred column) improve predicate column insert performance (#12690)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-09-19 10:53:48 +08:00
fb9e48a34a [fix](vstream load) Fix bug when load json with jsonpath (#12660) 2022-09-19 10:13:18 +08:00
1fa65708d7 [test](time_add or sub)add time_add and time_sub function case #12641 2022-09-19 09:22:53 +08:00
4669fa54cc [enhancement](test) add tpch_sf100_unique p2 test (#12697) 2022-09-19 09:19:17 +08:00
b608de668f [fix](compile)compile error: open_telemetry_scop_wrapper.hpp cannot find 'UNLIKELY' (#12709) 2022-09-19 09:18:04 +08:00
6d3ae1e69c [regression](left join)Add left join, the left table is empty, the query result is not empty case (#12344)
Add a regression case for a left join where the left table is empty but the query result is not empty.
2022-09-19 08:53:50 +08:00
fa8ed2bccc [fix](array-type) fix the invalid format load for stream load (#12424)
This PR fixes how stream load handles rows with an invalid array format.
Before this change, loading a file that contained an invalid array row failed the whole load.
The original file to load:
1 [1, 2, 3]
2 [4, 5, 6]
3 \N
4 [7, \N, 8]
5 10, 11, 12
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11035,
"Label": "11c9f111-188e-4616-9a50-aec8b7814513",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "Array does not start with '[' character, found '1'",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 7,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 0
}
After this change, the load succeeds and returns an error URL that reports the invalid line.
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11046,
"Label": "249808ee-55f4-4c08-b671-b3d82689d614",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 4,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 39,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 19,
"CommitAndPublishTimeMs": 16,
"ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893"
}

The SQL select result:
MySQL [example_db]> select * from array_test06;
+------+--------------+
| k1   | k2           |
+------+--------------+
|    1 | [1, 2, 3]    |
|    2 | [4, 5, 6]    |
|    3 | NULL         |
|    4 | [7, NULL, 8] |
+------+--------------+
4 rows in set (0.019 sec)

The error URL page shows:
"Reason: Invalid format for array column(k2). src line [10, 11, 12]; "

Issue Number: #7570
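
The row-level behavior after the fix can be approximated with the sketch below. It is a simplified assumption, not the BE parser: the validity rule (either `\N` or text wrapped in `[` and `]`), the whitespace column separator, and all names are hypothetical. It only shows that an invalid row is counted as filtered instead of failing the whole load.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of filtering rows with an invalid array format; not Doris code.
class ArrayLoadSketch {
    /** Simplified rule: an array field is either \N (NULL) or wrapped in brackets. */
    static boolean isValidArrayField(String field) {
        return field.equals("\\N") || (field.startsWith("[") && field.endsWith("]"));
    }

    /** Returns {loadedRows, filteredRows} for lines of the form "k1 k2". */
    static int[] load(List<String> lines) {
        int loaded = 0;
        int filtered = 0;
        for (String line : lines) {
            String[] parts = line.split("\\s+", 2); // k1, then the raw array text
            if (parts.length == 2 && isValidArrayField(parts[1].trim())) {
                loaded++;
            } else {
                filtered++; // reported through the error log instead of failing the load
            }
        }
        return new int[] {loaded, filtered};
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1 [1, 2, 3]", "2 [4, 5, 6]", "3 \\N", "4 [7, \\N, 8]", "5 10, 11, 12");
        System.out.println(Arrays.toString(load(lines))); // prints [4, 1]
    }
}
```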
2022-09-19 08:52:59 +08:00
65cff8d40c [enhancement](compaction) prevent quick_compaction&auto_compaction conflict (#12674)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-09-19 08:39:27 +08:00
bc38b2fdfb [improvement](new-scan) graceful quit scanner scheduler (#12715) 2022-09-19 08:39:08 +08:00
625ac83f72 [enhancement](test) add opensky cases to p2 (#12693) 2022-09-19 08:38:17 +08:00
fc8f4c787d [enhancement](test) add yandex_metrica cases to p2 (#12692) 2022-09-19 08:37:48 +08:00
3b7a04ee8b [fix](inpredicate)always use PredicateColumn<TYPE_STRING> for CHAR, VARCHAR and STRING type (#12637)
The predicate column type for char, varchar and string is PredicateColumnType<TYPE_STRING>, so the _base_evaluate method should always convert the input column to PredicateColumnType<TYPE_STRING>.
2022-09-19 08:37:06 +08:00
a4ed023bad [fix](colocation) fix decommission failure with 2 BEs and colocation table (#12644)
This PR fixes the following scenario:

1. There are 2 Backends.
2. Create tables with a colocation group and 1 replica.
3. Decommission one of the Backends.
4. The tablets on the decommissioned Backend are not reduced.

This is a bug in ColocateTableCheckerAndBalancer.
2022-09-19 08:34:50 +08:00
00dda79735 [fix](broker-load) Correction of kerberos authentication time determination rule (#11793)
Every time a new broker load came in, Doris updated the start time of Kerberos authentication.
This logic is wrong, because the authentication duration of Kerberos is calculated from the moment the ticket is obtained.

This PR changes the logic:
1. If it is kerberos, check fs expiration by create time.
2. Otherwise, check fs expiration by access time.
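
The two rules can be sketched as a single expiration check. The method name, parameters, and the millisecond time-to-live below are hypothetical; the sketch only captures the choice of reference time described in the list above.

```java
// Hypothetical sketch of the expiration rule; names and units are assumptions.
class FsExpirationSketch {
    /**
     * @param kerberos         whether the file system uses Kerberos authentication
     * @param createTimeMs     when the file system handle (and its ticket) was created
     * @param lastAccessTimeMs when the file system handle was last used
     * @param nowMs            current time
     * @param ttlMs            allowed lifetime
     */
    static boolean isExpired(boolean kerberos, long createTimeMs, long lastAccessTimeMs,
                             long nowMs, long ttlMs) {
        // A Kerberos ticket ages from the moment it is obtained, so later accesses
        // must not reset the clock; other file systems expire by idle time.
        long referenceMs = kerberos ? createTimeMs : lastAccessTimeMs;
        return nowMs - referenceMs > ttlMs;
    }

    public static void main(String[] args) {
        // Created at t=0, last accessed at t=900, checked at t=1200 with ttl=1000.
        System.out.println(isExpired(true, 0, 900, 1200, 1000));  // prints true
        System.out.println(isExpired(false, 0, 900, 1200, 1000)); // prints false
    }
}
```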
2022-09-18 17:46:13 +08:00
cb06e67fba [fix](tracing) Fix opentelemetry log output to be.out (#11856) 2022-09-18 17:40:23 +08:00
4f98146e83 [enhancement](tracing) Support forward to master tracing (#12290) 2022-09-18 17:39:04 +08:00
e9f105aa1e [enhancement](regression-test) add some p0 cases (#12243) 2022-09-18 17:36:08 +08:00
c30453e9ab [enhancement](regression-test) add ssb_sf100 to p2 cases (#12286) 2022-09-18 17:35:16 +08:00