3096 Commits

Author SHA1 Message Date
5a700223fe [fix](function) fix coredump cause by return type mismatch of vectorized repeat function (#13868)
Will not support repeat function during upgrade in vectorized engine.
2022-11-03 09:53:02 +08:00
32a029d9dc [enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795) 2022-11-03 09:47:12 +08:00
b3c6af0059 [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization (#13719)
* [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization
2022-11-03 09:21:38 +08:00
37e4a1769d [fix](sequence) fix that update table core dump with sequence column (#13847)
* [fix](sequence) fix that update table core dump with sequence column

* update
2022-11-03 09:02:21 +08:00
1ee6518e00 [fix](unique-key-merge-on-write) Types don't match when calling IndexedColumnIterator::seek_at_or_after (#13885) 2022-11-03 08:50:29 +08:00
28a4a8dc17 [fix](storage) evaluate_and of ComparisonPredicateBase has logical error (#13895) 2022-11-03 08:48:28 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operation with new scan framework. eg:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitations to transactional insert:

- Non-literal values are not supported in the insert stmt.

Fix some issues with the array type:

- Forbid casting a non-array type to a NESTED array type, since it may crash BE.
- Add a getStringValueForArray() method to Expr to get a valid string-formatted array value.

Add useLocalSessionState=true to the regression-test jdbc url:

Without this config, the jdbc driver sends some init commands each time it connects to the server, such as
select @@session.tx_read_only.
But in a transactional insert, after the begin command, Doris does not support any stmt
other than insert, commit or rollback.
So this config is added to stop the jdbc driver from sending those commands when connecting.
2022-11-03 08:36:07 +08:00
228e5afad8 [Load](Sink) remove validate the column data when data is NULL (#13919) 2022-11-03 08:33:45 +08:00
b83744d2f6 [feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766) 2022-11-02 23:15:57 +08:00
fbc8b7311f [Opt](function) opt the function of ndv (#13887) 2022-11-02 22:21:20 +08:00
62f765b7f5 [improvement](scan) speed up inserting strings into ColumnString (#13397) 2022-11-02 22:19:02 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
ba918b40e2 [chore](macOS) Fix compilation errors caused by the deprecated function (#13890) 2022-11-02 13:34:51 +08:00
Pxl
be124523f4 [enhancement](profile) add profile to show column predicates (#13862) 2022-11-02 09:07:26 +08:00
277025b046 [fix](join)ColumnNullable need handle const column with nullable const value (#13866) 2022-11-02 08:52:49 +08:00
de1dc62843 [enhancement](olap scanner) Scanner row bytes buffer is too small bug (#13874)
* [enhancement](olap scanner) Scanner row bytes buffer is too small, please try to increase be config

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-02 08:41:50 +08:00
3924ecead5 [minor](load) Improve error message for string type in loading process (#13718) 2022-11-01 22:02:33 +08:00
8b3afd431e [improvement](memory) simplify memory config related to tcmalloc (#13781)
There are several configs related to tcmalloc, but users do not know how to configure them. In practice users want just two modes, performance or compact: in performance mode, Doris runs queries and loads quickly, while in compact mode, Doris runs with less memory usage.

If we want to config tcmalloc individually, we can use env variables which are supported by tcmalloc.
2022-11-01 21:45:19 +08:00
287a739510 [javaudf](string) Fix string format in java udf (#13854) 2022-11-01 21:25:12 +08:00
f30b974d54 [Bugfix](upgrade) Fix 1.1 upgrade 1.2 coredump when schema change (#13822)
When upgrading from 1.1 to 1.2, the FE version will not match the BE version for a period of time. After BE is upgraded, a schema change makes BE use the field desc_tbl, which was added in the 1.2 FE. BE then coredumps because desc_tbl is nullptr, so BE needs to refuse the request.
2022-11-01 17:35:24 +08:00
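The guard described in f30b974d54 above can be sketched roughly as follows. This is a minimal illustration, not the actual Doris code: `DescriptorTable`, `AlterRequest`, `Status` and `handle_schema_change` are hypothetical names; only the field name desc_tbl comes from the commit message.

```cpp
#include <string>

// Hypothetical stand-ins for the real request types.
struct DescriptorTable {
    int slot_count = 0;
};

struct AlterRequest {
    // Stays null when the request comes from an FE that predates the field.
    const DescriptorTable* desc_tbl = nullptr;
};

struct Status {
    bool ok;
    std::string msg;
};

// Refuse the request up front instead of dereferencing a null desc_tbl.
inline Status handle_schema_change(const AlterRequest& req) {
    if (req.desc_tbl == nullptr) {
        return {false, "desc_tbl is not set; the FE may not be upgraded yet"};
    }
    // From here on it is safe to read through desc_tbl.
    (void)req.desc_tbl->slot_count;
    return {true, ""};
}
```

Returning an error lets the caller retry once FE and BE versions match, rather than losing the BE process.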
c14277e587 [fix](analytic) fix coredump cause by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
942611c185 Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495)" (#13833)
This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.
2022-11-01 14:22:12 +08:00
7db916fc85 [enhancement](metric)Add metric for exec_state prepare function (#13646)
* add bvar metric for exec_state prepare function
2022-11-01 14:09:47 +08:00
42b2725f03 [Bug](delete) Fix wrong delete operation (#13840) 2022-11-01 13:38:43 +08:00
Pxl
164ca1e1a8 [Bug](function) change log fatal to log warning to avoid code dump on nullable double column cast to decimal column (#13819) 2022-11-01 09:54:35 +08:00
cc0fa5fef6 [fix](array-type) fix the be core dump when import array<largeint> (#13821)
- This PR fixes the BE core dump when importing an array.
- Before the change, importing an array from a rapidjson string core dumps in the non-vectorized scenario.
- After the change, importing an array from a rapidjson string succeeds.
2022-10-31 22:08:55 +08:00
Pxl
57a9b0fa65 [Enhancement](chore) remove unused diagnostic (#12337)
remove unused diagnostic
2022-10-31 19:19:13 +08:00
7ae60a0ad2 [feature](function)add url functions: domain and protocol (#13662) 2022-10-31 19:13:08 +08:00
2fb218173e [improvement](scan) change the max thread num and num of free blocks in new scan (#13793)
1. In the previous implementation, the max thread num of the olap scanner was set relatively small, such as 3,
which would slow down some queries.
In this PR, I changed the max thread num to a quarter of the scanner thread pool (default is 12),
which is less than the old scan node's max thread num, but larger than the previous implementation.
The upper limit of the max thread num of the old scan node is too high, which is not reasonable.

2. Lower the number of pre-allocated free blocks.
2022-10-31 14:00:06 +08:00
4f2ea0776c [enhancement](compaction) opt compaction task producer and quick compaction (#13495)
1. Remove quick_compaction's rowset pick policy; call cu compaction when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-31 12:24:05 +08:00
2b9e1878a2 [fix](hashjoin) return error if in progress of upgrade (#13753) 2022-10-31 09:41:20 +08:00
Pxl
711dad28fb [Chore](unused) remove QSorter #13769 2022-10-31 08:44:39 +08:00
b15e0a9fb5 [Bug](function) fix bug of if function of nullable column process (#13779) 2022-10-31 08:38:53 +08:00
9f7c76a0d6 [fix](memtracker) Fix the usage of bthread mem tracker (#13708)
bthread context init has a performance cost, so it is temporarily removed; it will be completely refactored in #13585.
2022-10-30 19:51:00 +08:00
e0667b297f [feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688)
PR(https://github.com/apache/doris/pull/13404) made ParquetReader
break up batch insertion when encountering null values, which leads to worse performance
than OrcReader.
So this PR pushes the null map into the decode function, reducing the time spent on virtual function calls
when encountering null values.

Furthermore, hdfsFS is reused among file readers to reduce the time spent building connections to hdfs.
2022-10-28 15:52:52 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
2022-10-28 10:43:56 +08:00
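As a rough illustration of what 5805011629 above adds, here is a byte-wise sketch of Hive-style masking. It assumes Hive's default replacement characters (uppercase to X, lowercase to x, digit to n) and handles ASCII only; the helper names `mask_char` and `mask_range` are hypothetical, and the commit message does not say whether Doris accepts custom replacement characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Assumed Hive defaults: 'X' for uppercase, 'x' for lowercase, 'n' for digits.
inline char mask_char(char c) {
    if (c >= 'A' && c <= 'Z') return 'X';
    if (c >= 'a' && c <= 'z') return 'x';
    if (c >= '0' && c <= '9') return 'n';
    return c;  // punctuation and non-ASCII bytes are left untouched
}

// Mask the bytes in [begin, end), clamping end to the string length.
inline std::string mask_range(std::string s, std::size_t begin, std::size_t end) {
    end = std::min(end, s.size());
    for (std::size_t i = begin; i < end; ++i) s[i] = mask_char(s[i]);
    return s;
}

inline std::string mask(const std::string& s) {
    return mask_range(s, 0, s.size());
}

inline std::string mask_first_n(const std::string& s, std::size_t n) {
    return mask_range(s, 0, n);
}

inline std::string mask_last_n(const std::string& s, std::size_t n) {
    std::size_t start = s.size() - std::min(n, s.size());
    return mask_range(s, start, s.size());
}
```

Under these assumptions, `mask_first_n("1234-5678", 4)` yields `nnnn-5678` and `mask_last_n("1234-5678", 4)` yields `1234-nnnn`.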
d6b72d9b89 [Bug](update) support to check optional value of agg_sort_infos (#13732) 2022-10-28 10:37:13 +08:00
a8a91a827a [fix] Fix the variable of boost_ROOT ,BOOST_ROOT will not work (#13450)
When running the shell command bash build.sh --be to build the backend, cmake reports that it can't find the boost library, because the variable BOOST_ROOT is misspelled.

OS: Ubuntu 22.04 x86_64
CMake: 3.22.1
compiler: gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
2022-10-28 08:46:35 +08:00
2ef8f3f6f4 [enhancement](java-udf) Support loading libjvm at runtime (#13660) 2022-10-28 08:45:12 +08:00
20363edc73 [BugFix](function) fix reverse function dynamic buffer overflow due to illegal character (#13671)
The previous logic of the reverse function was not robust enough to handle illegal characters. For example, a one-byte character could be mistaken for a utf-8 character that occupies more than one byte, causing later processing to run past the end of the buffer.
2022-10-28 08:44:08 +08:00
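The overflow described in 20363edc73 above can be avoided by never trusting the length implied by a lead byte more than the bytes that actually remain. The following is a minimal sketch of that idea, not the patch itself; `utf8_char_len` and `reverse_utf8` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Byte length implied by a UTF-8 lead byte. Anything that is not a valid
// lead byte (for example a stray continuation byte) counts as one byte.
inline std::size_t utf8_char_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Reverse a string character by character, keeping the bytes of each
// multi-byte character in their original order.
inline std::string reverse_utf8(const std::string& src) {
    std::string dst(src.size(), '\0');
    std::size_t read = 0;
    std::size_t write = src.size();
    while (read < src.size()) {
        std::size_t len = utf8_char_len(static_cast<unsigned char>(src[read]));
        // Clamp a truncated sequence so we never read or write past the end.
        len = std::min(len, src.size() - read);
        write -= len;
        std::copy_n(src.data() + read, len, &dst[write]);
        read += len;
    }
    return dst;
}
```

Because the clamped lengths always sum to the input size, the write position cannot drop below zero, so a lead byte at the end of the input with no continuation bytes is copied as a single byte instead of overrunning the buffer.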
859ffa6304 [bugfix](concat) be crash caused by function concat(ifnull) (#13693) 2022-10-28 08:42:51 +08:00
c108554f14 [function](date function) add new date function 'to_monday' #13707 2022-10-28 08:41:16 +08:00
f51464af59 [chore](macOS) Support Java UDF (#13714) 2022-10-28 08:40:56 +08:00
5dd052d386 [Function](array) support array_range function (#13547)
* array_range with 3 impl

* [Function](array) support array_range function

* update

* update code
2022-10-28 08:40:24 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
36053d2419 [fix](array-type) fix the be core dump when select the invalid array format (#13514)
1. This PR fixes the BE core dump when selecting an invalid array.
2. Before the change, running "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" causes a BE core dump.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
ERROR 1105 (HY000): RpcException, msg: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
3. After the change, running "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" returns an error message.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
errCode = 2, detailMessage = No matching function with signature: array_intersect(array<tinyint(4)>, varchar(-1))"
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-27 23:11:12 +08:00
bad950136d [chore](build) Pass the compile flag -Wno-unused-but-set-variable on demand (#13716)
There are some issues with the compile flag `-Wno-unused-but-set-variable` for clang.
1. `-Wno-unused-but-set-variable` should be set when building the source with clang-15 on Linux. (#13000 #13016)
2. On macOS Monterey, Apple Clang 13 may treat it as an unknown warning option and the compilation may be interrupted.

This PR introduces a better way to make this compile flag more portable.
1. Test whether the compiler recognizes this flag.
2. Add this flag if the compiler recognizes it.
2022-10-27 15:18:28 +08:00
738da0b139 [bugfix](join) inner join return wrong result (#13608)
* bug fix for vhash join

* add regression test

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-27 11:48:41 +08:00