6921 Commits

Author SHA1 Message Date
ceb7b60a64 [fix](Nereids) update immutable LogicalAggregate attribute by mistake (#13740) 2022-10-31 14:11:55 +08:00
2fb218173e [improvement](scan) change the max thread num and num of free blocks in new scan (#13793)
1. In the previous implementation, the max thread num of the olap scanner was set relatively small (for example, 3),
which slowed down some queries.
This PR changes the max thread num to a quarter of the scanner thread pool (default is 12),
which is less than the old scan node's max thread num but larger than in the previous implementation.
The upper limit of the old scan node's max thread num was unreasonably high.

2. Reduce the number of pre-allocated free blocks.
2022-10-31 14:00:06 +08:00
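The sizing rule from commit 2fb218173e can be sketched as below. This is an illustrative Python sketch, not Doris code: the function name, the floor of one thread, and the pool size of 48 (picked so that a quarter gives 12, one reading of the message) are all assumptions.

```python
def max_scanner_threads(pool_size: int) -> int:
    """Return a quarter of the scanner thread pool size.

    The floor of one thread is an assumption made for this sketch so that
    very small pools can still make progress.
    """
    return max(1, pool_size // 4)


# Hypothetical pool size: 48 is assumed here so that a quarter yields 12.
print(max_scanner_threads(48))
```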
4f2ea0776c [enhancement](compaction) opt compaction task producer and quick compaction (#13495)
1. Remove quick_compaction's rowset pick policy; call cu compaction when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-31 12:24:05 +08:00
53e5f3939e [fix](plan)result exprs should be substituted in the same way as agg exprs (#13744)
* [fix](cast)ignore implicit cast when comparing two exprs

* fix fe ut
2022-10-31 10:19:32 +08:00
61b7c2c96c [fix](join) fix incorrect result when using anti join with other join predicates (#13743) 2022-10-31 09:51:34 +08:00
2b9e1878a2 [fix](hashjoin) return error if in progress of upgrade (#13753) 2022-10-31 09:41:20 +08:00
f5761c658f [Fix]Fix the extension mysql_to_doris bug (#13723)
* Fix the extension mysql_to_doris bug

e_mysql_to_doris.sh: a command error caused the script to fail with: ERROR 1103 (42000) at line 1: Incorrect table name ''.
The ` ` symbols were in the wrong position.

* Update extension/mysql_to_doris/bin/e_mysql_to_doris.sh


Co-authored-by: Adonis Ling <adonis0147@gmail.com>
2022-10-31 08:45:34 +08:00
Pxl
711dad28fb [Chore](unused) remove QSorter #13769 2022-10-31 08:44:39 +08:00
6159e1cc3a [enhancement](tpch-tools) git ignore tpch tool gen file #13789 2022-10-31 08:39:54 +08:00
b15e0a9fb5 [Bug](function) fix bug of if function of nullable column process (#13779) 2022-10-31 08:38:53 +08:00
753c2ccfd1 [fix](test) drop table before create it (#13791) 2022-10-30 22:45:08 +08:00
9f7c76a0d6 [fix](memtracker) Fix the usage of bthread mem tracker (#13708)
bthread context init has a performance cost, so it is removed temporarily; it will be completely refactored in #13585.
2022-10-30 19:51:00 +08:00
98cc32aa0e [BugFix](regression-test) add order by in left/right join test case (#13774)
The result of a right join is unordered, so we need to add order by to guarantee consistent results.
2022-10-30 18:00:08 +08:00
efe813ba60 [fix](test)(explain) add full qualified name for scan node explain string (#13777)
1. In the "explain" result of SQL, the table name in `ScanNode` should be fully qualified with the dbname.
For the olap scan node, the selected index name should not be "null".

2. Remove `tpch_sf1_p1/tpch_sf1/nereids/` from the regression test; it will be fixed later.
2022-10-30 13:24:48 +08:00
2a5d3dbb6e feat(nereids): draw hyper graph by graphviz (#13749) 2022-10-28 17:23:35 +08:00
e0667b297f [feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688)
PR (https://github.com/apache/doris/pull/13404) made ParquetReader
break up batch insertion when encountering null values, which leads to worse performance
than OrcReader.
This PR pushes the null map into the decode function, reducing the time spent on virtual function calls
when encountering null values.

Furthermore, reuse hdfsFS among file readers to reduce the time spent building connections to hdfs.
2022-10-28 15:52:52 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
f325119362 [fix](regression-test) update table name string in tpch_sf1 explain case (#13724) 2022-10-28 11:13:29 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
2022-10-28 10:43:56 +08:00
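The behaviour of the three functions added in commit 5805011629 can be approximated as below. This is a sketch under stated assumptions, not the Doris implementation: the replacement characters (`X`, `x`, `n`) and the default `n = 4` are taken from my recollection of Hive's defaults, only ASCII letters and digits are handled, and the Python function signatures are hypothetical.

```python
def mask(value: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    """Replace ASCII upper-case letters, lower-case letters and digits.

    The default replacement characters are assumed to match Hive's defaults.
    """
    out = []
    for ch in value:
        if "A" <= ch <= "Z":
            out.append(upper)
        elif "a" <= ch <= "z":
            out.append(lower)
        elif "0" <= ch <= "9":
            out.append(digit)
        else:
            out.append(ch)  # everything else is left untouched
    return "".join(out)


def mask_first_n(value: str, n: int = 4) -> str:
    """Mask only the first n characters."""
    n = max(n, 0)
    return mask(value[:n]) + value[n:]


def mask_last_n(value: str, n: int = 4) -> str:
    """Mask only the last n characters."""
    if n <= 0:
        return value
    return value[:-n] + mask(value[-n:])


print(mask("abc123XYZ-"))
print(mask_first_n("abc123", 2))
print(mask_last_n("abc123", 2))
```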
d6b72d9b89 [Bug](update) support to check optional value of agg_sort_infos (#13732) 2022-10-28 10:37:13 +08:00
a8a91a827a [fix] Fix the variable of boost_ROOT ,BOOST_ROOT will not work (#13450)
When running the shell command `bash build.sh --be` to build the backend, cmake reports that it cannot find the boost library, because the BOOST_ROOT variable is misspelled.

OS: Ubuntu 22.04 x86_64
CMake: 3.22.1
compiler: gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
2022-10-28 08:46:35 +08:00
2ef8f3f6f4 [enhancement](java-udf) Support loading libjvm at runtime (#13660) 2022-10-28 08:45:12 +08:00
20363edc73 [BugFix](function) fix reverse function dynamic buffer overflow due to illegal character (#13671)
The previous logic of the reverse function was not robust enough to handle illegal characters. For example, a one-byte character could be mistaken for a utf-8 character that occupies more than one byte, which would then overrun the buffer during later processing.
2022-10-28 08:44:08 +08:00
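The failure mode described in commit 20363edc73 can be illustrated with a bounds-checked reverse. This is a minimal Python sketch, not the actual C++ fix: the function names are hypothetical, and clamping the character length to the remaining bytes is one plausible guard that I am assuming for illustration.

```python
def utf8_char_len(lead: int) -> int:
    """Length that a UTF-8 lead byte claims; invalid lead bytes count as 1."""
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1


def reverse_utf8(data: bytes) -> bytes:
    """Reverse character by character without writing outside the buffer."""
    out = bytearray(len(data))
    pos = 0
    write = len(data)
    while pos < len(data):
        # Clamp the claimed length so a truncated or illegal sequence at the
        # end of the input cannot push the copy past the buffer.
        n = min(utf8_char_len(data[pos]), len(data) - pos)
        write -= n
        out[write:write + n] = data[pos:pos + n]
        pos += n
    return bytes(out)


print(reverse_utf8("héllo".encode("utf-8")).decode("utf-8"))
```

In the second test case below, the trailing byte `0xE4` claims a three-byte character although only one byte remains; the clamp keeps the output the same length as the input.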
859ffa6304 [bugfix](concat) be crash caused by function concat(ifnull) (#13693) 2022-10-28 08:42:51 +08:00
c108554f14 [function](date function) add new date function 'to_monday' #13707 2022-10-28 08:41:16 +08:00
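As a rough illustration of what a `to_monday` function computes, the sketch below rounds a date down to the Monday of its week. The commit message gives no semantics, so this behaviour is an assumption, and any special cases in the real function are not modelled.

```python
from datetime import date, timedelta


def to_monday(day: date) -> date:
    """Round a date down to the Monday of the same week (assumed semantics)."""
    # date.weekday() returns 0 for Monday and 6 for Sunday.
    return day - timedelta(days=day.weekday())


print(to_monday(date(2022, 10, 28)))
```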
f51464af59 [chore](macOS) Support Java UDF (#13714) 2022-10-28 08:40:56 +08:00
5dd052d386 [Function](array) support array_range function (#13547)
* array_range with 3 impl

* [Function](array) support array_range function

* update

* update code
2022-10-28 08:40:24 +08:00
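The "3 impl" in commit 5dd052d386 is read here as three call forms: `(end)`, `(start, end)` and `(start, end, step)`. That reading is an assumption, as are the half-open interval and the rejection of non-positive steps; the sketch below only illustrates that interpretation.

```python
from typing import List


def array_range(*args: int) -> List[int]:
    """Build an integer array from one, two or three arguments.

    Assumed forms: (end), (start, end), (start, end, step); end is exclusive.
    """
    if len(args) == 1:
        start, end, step = 0, args[0], 1
    elif len(args) == 2:
        start, end, step = args[0], args[1], 1
    elif len(args) == 3:
        start, end, step = args
    else:
        raise TypeError("array_range takes one to three arguments")
    if step <= 0:
        raise ValueError("step must be positive in this sketch")
    return list(range(start, end, step))


print(array_range(5))
print(array_range(0, 10, 3))
```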
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
36053d2419 [fix](array-type) fix the be core dump when select the invalid array format (#13514)
1. This PR fixes the be core dump when selecting an invalid array.
2. Before the change, running "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" caused a be core dump:
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
ERROR 1105 (HY000): RpcException, msg: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
3. After the change, running "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" returns an error message:
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
errCode = 2, detailMessage = No matching function with signature: array_intersect(array<tinyint(4)>, varchar(-1))
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-27 23:11:12 +08:00
45b31506c7 [improvement](delete) support delete from partitioned table without partition specified (#13533)
Support delete from partitioned table without partition specified in [DELETE] stmt.

## Usage
If it is a partitioned table, you can specify a partition.
If not specified, Doris will infer the partition from the given conditions.
In two cases, Doris cannot infer the partition from the conditions:
1) the conditions do not contain partition columns;
2) the operator of the partition column is `not in`.
When a partitioned table does not specify the partition,
or the partition cannot be inferred from the conditions,
the session variable `delete_without_partition` needs to be `true`
to make the delete statement apply to all partitions.

## Test case
A test case is added in `regression-test/suites/delete_p0/test_delete_from_partition.groovy`;
users can now delete from a partitioned table without specifying a partition.
2022-10-27 21:32:45 +08:00
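The usage rules in commit 45b31506c7 can be read as the decision procedure below. This is a hedged Python sketch of one interpretation of the message, not the FE code: the function name, the `(column, operator)` representation of conditions, and the returned labels are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def resolve_delete_scope(
    explicit_partition: Optional[str],
    conditions: List[Tuple[str, str]],  # (column name, operator), hypothetical shape
    partition_columns: Set[str],
    delete_without_partition: bool,
) -> str:
    """Decide which partitions a DELETE applies to (assumed reading of the rules)."""
    if explicit_partition is not None:
        return "specified"

    on_partition_cols = [(c, op) for c, op in conditions if c in partition_columns]
    # Case 1: no condition touches a partition column.
    # Case 2: a partition column is filtered with `not in`.
    inferable = bool(on_partition_cols) and all(
        op.lower() != "not in" for _, op in on_partition_cols
    )
    if inferable:
        return "inferred"
    if delete_without_partition:
        return "all"
    raise ValueError("partition cannot be inferred; set delete_without_partition=true")


print(resolve_delete_scope(None, [("dt", "=")], {"dt"}, False))
```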
578d956a6b [typo](doc):Correct spelling mistakes UDAF. (#13711) 2022-10-27 21:21:29 +08:00
4bfa95f669 [enhancement](tools) opt tpch q21: change join order (#13699) 2022-10-27 16:55:23 +08:00
bad950136d [chore](build) Pass the compile flag -Wno-unused-but-set-variable on demand (#13716)
There are some issues with the compile flag `-Wno-unused-but-set-variable` for clang.
1. `-Wno-unused-but-set-variable` should be set when building the source with clang-15 on Linux. (#13000 #13016)
2. On macOS Monterey, Apple Clang 13 may treat it as an unknown warning option, and the compilation process may be interrupted.

This PR introduces a better way to make this compile flag more portable.
1. Test whether the compiler recognizes this flag.
2. Add this flag if the compiler recognizes it.
2022-10-27 15:18:28 +08:00
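The two-step approach in commit bad950136d (probe the compiler, then add the flag only if it is recognized) can be sketched as below. The actual change lives in the build scripts; this Python version is only an illustration, the helper names are hypothetical, and the probe command assumes a gcc- or clang-style driver on a system with `/dev/null`.

```python
import subprocess
from typing import Callable, Iterable, List


def compiler_accepts(compiler: str, flag: str) -> bool:
    """Compile a trivial program with the flag and report whether it succeeded.

    -Werror asks the compiler to fail on warnings, which matters for
    compilers that merely warn about an option they do not recognize.
    """
    try:
        result = subprocess.run(
            [compiler, "-Werror", flag, "-x", "c", "-c", "-", "-o", "/dev/null"],
            input=b"int main(void) { return 0; }\n",
            capture_output=True,
        )
    except OSError:
        return False  # compiler not found or not executable
    return result.returncode == 0


def portable_flags(candidates: Iterable[str], accepts: Callable[[str], bool]) -> List[str]:
    """Keep only the flags that the probe reports as recognized."""
    return [flag for flag in candidates if accepts(flag)]


# Usage with a stand-in probe; replace the lambda with
# `lambda f: compiler_accepts("clang", f)` to query a real compiler.
known = {"-Wall"}
print(portable_flags(["-Wall", "-Wno-unused-but-set-variable"], lambda f: f in known))
```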
738da0b139 [bugfix](join) inner join return wrong result (#13608)
* bug fix for vhash join

* add regression test

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-27 11:48:41 +08:00
ec86e9c9b2 [feature-wip][MTMV] The schedule framework for the MTMV (#13147)
Design document: https://github.com/apache/doris/issues/13146
2022-10-27 11:37:24 +08:00
0e70d681d9 [feature](Nereids): Construct join graph (#13679)
* feat: add hypergraph and its api

* feat: add visualization api

Signed-off-by: xiejiann <jianxie0@gmail.com>

* remove unused code

Signed-off-by: xiejiann <jianxie0@gmail.com>

* fix format

Signed-off-by: xiejiann <jianxie0@gmail.com>

* remove unused test

Signed-off-by: xiejiann <jianxie0@gmail.com>

* remove unused tests

Signed-off-by: xiejiann <jianxie0@gmail.com>

* format

Signed-off-by: xiejiann <jianxie0@gmail.com>

Signed-off-by: xiejiann <jianxie0@gmail.com>
2022-10-27 11:32:31 +08:00
d388de6c11 [Enhancement](threadpool) print thread pool name on error (#13706) 2022-10-27 10:49:18 +08:00
c874931ac8 [fix](join)output all value from no-null side of outer join (#13655)
* [fix](join)output all value from no-null side of outer join

* add regression test
2022-10-27 10:48:36 +08:00
2697f72d77 [Improvement][SET-PROPERTY] Support for set query_timeout property (#13444) 2022-10-27 10:03:39 +08:00
7557980d64 [improvement](regression-test) avoid query empty result after loading finished (#13682)
When running regression tests, we often found that a query returned an empty result after loading finished,
even if we called "sync" before the query.
This is because for `stream load`, the load task result is returned immediately after the txn's status changes to VISIBLE,
but before the edit log is written.
So if we run the query right after we get the load task result, we may not see the latest loaded data.

The same issue applies to the `insert` operation.
2022-10-27 09:47:18 +08:00
ffcb2f8525 [opt](exec) Replace get_utf8_byte_length function by array (#13664) 2022-10-27 09:46:41 +08:00
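Replacing a per-byte length function with an array, as the title of commit ffcb2f8525 describes, usually means precomputing the answer for all 256 possible lead bytes. The Python sketch below shows that idea only; the table contents (invalid lead bytes mapped to 1) and the helper names are assumptions, not the values used in Doris.

```python
def _length_from_lead_byte(lead: int) -> int:
    """Branching version: the length implied by a UTF-8 lead byte."""
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    return 1  # ASCII, continuation bytes and invalid bytes (assumed choice)


# Built once; every later lookup is a single index operation with no branches.
UTF8_BYTE_LENGTH = tuple(_length_from_lead_byte(b) for b in range(256))


def get_utf8_byte_length(lead: int) -> int:
    """Table-driven replacement for the branching function."""
    return UTF8_BYTE_LENGTH[lead]


print(get_utf8_byte_length("é".encode("utf-8")[0]))
```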
5bd66243ee [minor](log) remove some unused logs (#13689)
1. When running regression tests with specific suites or groups, do not print other suite names or file names.
2. Remove the unused alter table job log.
2022-10-27 09:37:32 +08:00
3e8cd0c669 [typo](doc) Add the description of json HDFS broker load (#13683)
Add instructions for HDFS broker load with JSON format files.
2022-10-27 09:36:57 +08:00
d2262bc8fb [docs]fix 404 (#13695)
[docs]fix 404
2022-10-27 08:49:36 +08:00
3c95106d45 [Bug](jdbc) Fix memory leak for JDBC datasource (#13657) 2022-10-27 00:02:25 +08:00
0134e9d2f4 [Improvement](runtime filter) Reduce merging time for bloom filter (#13668) 2022-10-27 00:02:05 +08:00
06e433e14a [fix](cmake)fix cmake error (#13637)
Fix the cmake error when the variable (${LIB_JVM}) is "".
2022-10-26 21:38:50 +08:00
ddb27b9c3f nereids use decimal(27,9) (#13678) 2022-10-26 21:37:24 +08:00
f4c8d4ce85 [feature](nereids) estimate plan cost by column ndv and table row count (#13375)
In this version, we use column ndv information to estimate plan cost.

This is the first version; it covers the TPCH queries.
2022-10-26 20:35:10 +08:00
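How column ndv (number of distinct values) and table row count feed a cardinality estimate can be shown with the textbook formulas below. These are the classic uniform-distribution estimates, given as an assumption for illustration; the commit message for f4c8d4ce85 does not say which formulas Nereids uses, and the function names are hypothetical.

```python
def estimate_equality_filter_rows(row_count: float, ndv: float) -> float:
    """Rows surviving `col = constant`, assuming values are uniformly distributed."""
    return row_count / max(ndv, 1.0)


def estimate_equi_join_rows(
    left_rows: float, left_ndv: float, right_rows: float, right_ndv: float
) -> float:
    """Rows produced by an equi-join, using the larger ndv as the divisor."""
    return left_rows * right_rows / max(left_ndv, right_ndv, 1.0)


print(estimate_equality_filter_rows(1000, 50))
print(estimate_equi_join_rows(1000, 100, 200, 50))
```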