Commit Graph

8276 Commits

Author SHA1 Message Date
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries such as `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (e.g. 100+ columns) but the ORDER BY clause references only a few columns, the ScanNode still has to read all columns from the storage engine, even when the limit is very small. This can cause heavy read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase, we read only `columnA`'s data from the storage engine, along with an extra row ID column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. I put the second phase in the ExchangeNode because it is the central node for the TopN nodes in the cluster. The ExchangeNode issues RPCs to the other nodes with the row IDs obtained in the first phase (already sorted and limited by the SortNode), and those nodes read the rows one by one from the storage engine.

After the second-phase read, the Block contains all the data needed by the query.
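The two-phase idea can be sketched in a single process. This is a minimal illustration under stated assumptions, not the Doris implementation: the table is a hypothetical in-memory list of dicts, a row's list index stands in for `__DORIS_ROWID_COL__`, and the ExchangeNode's RPC to other nodes is replaced by a local lookup.

```python
import heapq
from typing import Any

# Hypothetical wide table: each row is a dict, and the row's position in the
# list plays the role of the __DORIS_ROWID_COL__ row ID.
Row = dict[str, Any]


def phase1_topn_rowids(table: list[Row], order_col: str, limit: int,
                       desc: bool = False) -> list[int]:
    """Phase 1: read only the ORDER BY column plus a row ID, then sort and limit."""
    # Only (sort value, row ID) pairs are materialized; other columns are pruned.
    pairs = [(row[order_col], rowid) for rowid, row in enumerate(table)]
    pick = heapq.nlargest if desc else heapq.nsmallest
    return [rowid for _, rowid in pick(limit, pairs)]


def phase2_fetch_rows(table: list[Row], rowids: list[int]) -> list[Row]:
    """Phase 2: fetch full rows by row ID, keeping the phase-1 order.

    In Doris this is an RPC from the ExchangeNode; here it is a local lookup.
    """
    return [table[rowid] for rowid in rowids]


def two_phase_topn(table: list[Row], order_col: str, limit: int,
                   desc: bool = False) -> list[Row]:
    return phase2_fetch_rows(table, phase1_topn_rowids(table, order_col, limit, desc))


table = [{"id": i, "a": v, "payload": "x" * 100} for i, v in enumerate([5, 3, 9, 1, 7])]
print([row["id"] for row in two_phase_topn(table, "a", 2)])  # [3, 1]
```

The wide `payload` column is touched only for the `limit` rows that survive phase 1, which is where the read-amplification saving comes from.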
2023-01-19 10:01:33 +08:00
c7a72436e6 [Feature](multi-catalog)Add support for JuiceFS (#15969)
The broker implements the JuiceFS interface, so data can be loaded from JuiceFS into Doris through the broker.
It also implements multi-catalog support for reading Hive data stored in JuiceFS.
2023-01-19 08:54:16 +08:00
wxy
7288f1f1d4 [Fix](profile) do not send export profile when enable_profile=false. (#15996) 2023-01-19 08:06:39 +08:00
c43edbdfea [bug](cooldown)fix bug for single cooldown (#16040)
* fix bug for single cooldown
2023-01-19 08:03:32 +08:00
45b39c5aaf [enhancement](regression-test) Support BenchmarkAction (#16071)
Support BenchmarkAction for the regression test. This action runs benchmark queries and prints the results.

example:

benchmark {
    executeTimes 3
    warmUp true
    skipFailure true
    printResult true

    sqls(["select 1", "select 2"])
}
2023-01-19 08:02:05 +08:00
76622bcab4 [enhance](FE): remove constructor just used for UT and useless ERROR code (#16080)
* [enhance](FE): remove constructor just used for UT.

* [enhance](FE): remove useless ERROR Code

* fix checkstyle
2023-01-19 08:00:48 +08:00
d8f598eeab [enhancement](Nereids) add timestampadd, timestampdiff functions (#16072) 2023-01-19 01:05:25 +08:00
2acf634f84 [CleanUp](FE): cleanup useless code in FE. (#16058) 2023-01-18 22:25:41 +08:00
baf62b4418 [test](Nereids) add regression-test for running_difference and regexp_extract_all (#16049) 2023-01-18 22:24:52 +08:00
78ba446487 [Enhancement](Nereids) add more clear message when parse failed (#16056) 2023-01-18 22:19:46 +08:00
cbcd5228b7 [enhance](nereids): polish code for mergeGroup(). (#16057) 2023-01-18 21:03:46 +08:00
feeb69438b [opt](Nereids) optimize DistributeSpec generator of OlapScan (#15965)
Use the number of selected partitions, instead of the OLAP table's total partition count, to decide whether to generate a hashDistributeSpec.
2023-01-18 20:18:11 +08:00
34075368ec (improvement)[bucket] Add auto bucket implement (#15250) 2023-01-18 19:50:18 +08:00
0916cbcb10 [ehancement](nereids) Made the parse for named expression more complete (#16010)
After this PR, the following syntax is supported:

SELECT SUBSTRING("dddd编", 0, 3) AS "测试";
SELECT SUBSTRING("dddd编", 0, 3) "测试";
2023-01-18 19:44:51 +08:00
ee76b9796c [Bug](regresstest) BE Crash in DEBUG mode run regress test (#16042) 2023-01-18 17:58:16 +08:00
95c91fab2e [refactor](vec) delete non-vec runtime filter (#16016)
* [refactor](vec) delete non-vec runtime filter

* update
2023-01-18 17:49:20 +08:00
4035bd83c3 [fix](jdbc) fix jdbc driver bug and external datasource p2 test case issue (#16033)
Fix a bug where creating a JDBC resource with only the JDBC driver file name failed the checksum.
This was because we forgot to pass the full driver URL to JdbcClient.

Add ResultSet.FETCH_FORWARD and set AutoCommit to false on the JDBC connection to avoid OOM when fetching large amounts of data.

Set useCursorFetch in the JDBC URL for both MySQL and PostgreSQL.

Fix some p2 external datasource bugs.
2023-01-18 17:48:06 +08:00
bac2adfc74 [refractor](schema) refractor schema::get_predicate_column_ptr (#16043)
* refactor Schema::get_predicate_column_ptr

* update code format

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-18 17:47:37 +08:00
5265f5142f [fix](Nereids) add string and character type (#16044) 2023-01-18 17:27:45 +08:00
1fa2b662cf [opt](Nereids) add date_add/sub function (#16048)
1. Add week_add and week_diff functions.
2. Register all date_add/date_diff functions.
2023-01-18 17:11:44 +08:00
94628f09e9 [regression-test](spark-connector) Add the regression case of the spark doris connector (#14877)
* [regression-test](spark-connector) Add the regression case of the spark doris connector
2023-01-18 16:41:41 +08:00
bd0d650c3d [fix](Nereids) prohibit cross join with on clause (#16035) 2023-01-18 16:21:01 +08:00
d257059e6b [refactor](remove hadoop dpp) remove hadoop dpp code since it is not used (#16009) 2023-01-18 15:01:04 +08:00
de0e402e52 [fix](nereids) bucket shuffle join use wrong shuffled column info (#16011) 2023-01-18 14:44:36 +08:00
65d9293fa9 [testcase](bitmap index)bitmap index testcase (#15975)
* add bitmap index testcases for all scalar types
2023-01-18 14:17:24 +08:00
a7d572ec7f typo(docs):correct the default value of the be parameter brpc_num_threads in the document (#16041)
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
2023-01-18 13:50:58 +08:00
42b5d17fa1 [refactor](remove non vec) remove column block and column view (#16022)
* [refactor](remove non vec) remove column block and column view and column vectorized batch

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-18 12:40:53 +08:00
46ce97a190 [enhance](planner)convert 'or' into 'in-predicate' (#15737)
In the previous [PR 12872](https://github.com/apache/doris/pull/12872), we converted multiple equalities on the same slot into an `in predicate`. For example, `a = 1 or a = 2` => `a in (1, 2)`.

This PR makes 4 changes to the OR-to-IN conversion:
1. Fix a bug where `NOT IN` was merged with equalities: `a = 1 or a not in (2, 3)` was wrongly rewritten to `a in (1, 2, 3)`.
2. Extend this rule to more cases:
  - merge for more than one slot: `a = 1 or a = 2 or b = 3 or b = 4` => `a in (1, 2) or b in (3, 4)`
  - skip not-equal and not-in when merging: `a = 1 or a = 2 or b != 3 or c not in (1, 2)` => `a in (1, 2) or b != 3 or c not in (1, 2)`
3. Rewrite recursively.
4. OrToIn is implemented in ExtractCommonFactorsRule. That rule generates new exprs, and OrToIn should also be applied to those generated exprs. For example, `(a=1 and b=2) or (a=3 and b=4)` => `(a=1 or a=3) and (b=2 or b=4) and [(a=1 and b=2) or (a=3 and b=4)]` => `a in (1, 3) and b in (2, 4) and [(a=1 and b=2) or (a=3 and b=4)]`

In addition, this PR adds toString() for some Exprs.
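A minimal sketch of the flat OR-to-IN merge described in changes 1 and 2, assuming a hypothetical tuple encoding of predicates `(op, column, value)` with ops `eq`, `in`, `ne` and `not_in`. It does not model the recursive rewrite or ExtractCommonFactorsRule; the real rule works on Doris Expr trees, and keeping a single value as an equality is a choice made here for illustration.

```python
from typing import Any

# Hypothetical predicate encoding: (op, column, value); for "in"/"not_in"
# the value is a tuple of literals. The list represents one OR chain.
Pred = tuple[str, str, Any]


def or_to_in(disjuncts: list[Pred]) -> list[Pred]:
    """Merge `col = v` / `col in (...)` disjuncts on the same column into one IN."""
    merged: dict[str, list[Any]] = {}   # column -> distinct values, in order seen
    slots: list[tuple[str, Any]] = []   # output order: ("merged", col) or ("keep", pred)
    for op, col, val in disjuncts:
        if op in ("eq", "in"):
            if col not in merged:
                merged[col] = []
                slots.append(("merged", col))
            for v in ((val,) if op == "eq" else val):
                if v not in merged[col]:
                    merged[col].append(v)
        else:
            # "ne" and "not_in" are left untouched: folding them into an IN
            # list would change the predicate's meaning (the bug in change 1).
            slots.append(("keep", (op, col, val)))

    result: list[Pred] = []
    for kind, item in slots:
        if kind == "keep":
            result.append(item)
        else:
            values = merged[item]
            # A column with a single value stays a plain equality.
            result.append(("eq", item, values[0]) if len(values) == 1
                          else ("in", item, tuple(values)))
    return result


print(or_to_in([("eq", "a", 1), ("eq", "a", 2), ("ne", "b", 3)]))
# [('in', 'a', (1, 2)), ('ne', 'b', 3)]
```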
2023-01-18 12:33:20 +08:00
c0ea9b0b81 [fix](Nereids) running_difference return type is not right (#16028) 2023-01-18 11:35:02 +08:00
121f4d6ac0 [fix](Nereids) cannot put two same table value function into one memo (#16026) 2023-01-18 11:32:09 +08:00
18f71180ce [fix](Nereids) avoid same group expression add to one group when do merge (#15999) 2023-01-18 11:22:18 +08:00
b2fe385742 [refractor](schema) refractor function Schema::get_column_by_field to make it simple #16027
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-18 11:11:16 +08:00
40fa5b4019 [fix](MTMV) Show MTMV statement on table raises exceptions (#15882) 2023-01-18 10:25:33 +08:00
e579530c99 [Feature-WIP](inverted index) support use inverted index searcher cache (#16003)
Use the inverted index searcher cache to improve query performance.

dependency pr: #14211 #15807 #15823
2023-01-18 09:30:55 +08:00
3bff5ebf9a [fix](DOE) only return first batch data in ES 8.x (#16025)
Do not use terminate_after and size together in scroll requests to ES 8.x.
2023-01-18 09:28:34 +08:00
31cc99964c [Feature-WIP](inverted index)(bkd) bdk index'reader implementation which in inverted index using for numeric types (#15994)
Step 3 of DSIP-023: Add inverted index for full text search.
Implements the BKD index reader that the inverted index uses for numeric types.
dependency pr: #14211 #15807 #15823
2023-01-18 09:24:19 +08:00
96b9115286 [fix](nereids) fix bug of invalid column in olap scan node when a materialized view is selected (#15976)
If a materialized view is selected, the olap scan node's NonUserVisibleOutput property may contain columns from other materialized views. This PR removes those invalid columns.
2023-01-18 01:02:12 +08:00
3810727688 [chore](thirdparty) Update bitshuffle from 0.3.5 to 0.5.1 (#15993)
This enables the use of AVX512 instructions. See #15972.
2023-01-17 23:46:11 +08:00
388d623506 [fix](MTMV) Refine the process of refreshing data (#16006)
1. Remove some redundant code.
2. Fix the issue with the state of MTMV task.
3. Fix the case - test_create_mtmv.

## Problem summary

1. We used a retry policy to re-run failed MTMV tasks, but we set the state to `FAILURE` while the tasks were still being re-run.
We should do this only after all the retry runs fail.
2. Some redundant code can be removed.
3. In the case test_create_mtmv, we created many background tasks to refresh the data. Some tasks may fail due to concurrency and cause the test to fail. Actually, we only need a single task to verify the functionality.
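The corrected state handling in item 1 can be sketched as a retry loop that reports `FAILURE` only after the last attempt. This is a hypothetical helper, not the MTMV scheduler's code: it assumes a task is a zero-argument callable that raises on failure, and `max_retries` and the `history` list are made up for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class TaskState(Enum):
    RUNNING = "RUNNING"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


def run_with_retry(task: Callable[[], None], max_retries: int,
                   history: Optional[list[TaskState]] = None) -> tuple[TaskState, int]:
    """Run `task` up to 1 + max_retries times.

    Returns the final state and the number of attempts made. FAILURE is
    recorded only once every attempt has failed, never between retries.
    """
    history = [] if history is None else history
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        history.append(TaskState.RUNNING)  # a re-run keeps the task RUNNING
        try:
            task()
        except Exception:
            continue  # more attempts may remain; do not report FAILURE yet
        history.append(TaskState.SUCCESS)
        return TaskState.SUCCESS, attempt
    history.append(TaskState.FAILURE)  # every attempt failed
    return TaskState.FAILURE, attempts
```

Passing a `history` list lets a caller observe that intermediate failures never surface as a `FAILURE` state.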
2023-01-17 23:08:12 +08:00
9c723aec59 [fix](thirdparty) update bzip download info (#16012)
Checked the historical changes: there is no need for two download sources for the same bzip package (the second source was already removed for other packages such as boost).
2023-01-17 21:08:05 +08:00
0c8255d9b8 [fix](nereids)nest loop join should support filter conjuncts like hash join (#15979) 2023-01-17 20:38:38 +08:00
3d05ffb10e [fix](Nereids) add 'integer' as alias of int type (#15983) 2023-01-17 20:33:26 +08:00
4d863a18c3 [fix](regression-test) Fix the build for Java UDF Case (#15851)
After opening the project in IntelliJ IDEA, we can see the cause: Apache Maven 3.8.1 and newer block HTTP repositories by default. Therefore, we fix this issue by adding an HTTPS repository that contains this package to pom.xml.
2023-01-17 20:25:53 +08:00
e2d145cf5d [fix](fe)fix anti join bug (#15955)
* [fix](fe)fix anti join bug

* fix fe ut
2023-01-17 20:25:00 +08:00
d2a8b3fc1e [doc](multi-catalog) fix some format and typo (#15988) 2023-01-17 20:22:17 +08:00
wxy
061b28b32e [Fix](profile) fix /rest/v1/query_profile action. (#15981)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-17 20:21:48 +08:00
02a7995171 [fix](planner)wrong result when has order by under join (#15974) 2023-01-17 20:20:56 +08:00
e6a5d3375e [Feature-WIP](inverted index) add chinese analyzer for inverted index reader (#15998)
Add a Chinese analyzer for the inverted index reader.
dependency pr: #14211 #15807 #15823
2023-01-17 20:20:40 +08:00
08f87a56fc update the default value of fragment_pool_thread_num_max in be-config.md (#16013)
Co-authored-by: smallhibiscus <844981280>
2023-01-17 20:20:15 +08:00
6be0cc252a [fix](BrokerFileReader) fix Compile error #16018 2023-01-17 19:53:06 +08:00