3935 Commits

Author SHA1 Message Date
27352afdf6 [fix](fe)support multi distinct group_concat (#17237)
* [fix](fe)support multi distinct group_concat

* update based on comments
2023-03-02 17:53:13 +08:00
823d968452 [fix](expr) avoid crashing caused by big depth of expression tree (#17314) 2023-03-02 16:55:53 +08:00
39f59f554a [improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983)
This CL mainly changes:

Support specifying csv schema manually in s3/hdfs table valued function

s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
'use_path_style'='true'
)
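The `csv_schema` value in the example above is a semicolon-separated list of `name:type` pairs. A minimal Python sketch of how such a string could be split follows. `parse_csv_schema` is a hypothetical helper for illustration, not Doris code, and the real parser may treat whitespace and errors differently.

```python
from typing import List, Tuple


def parse_csv_schema(schema: str) -> List[Tuple[str, str]]:
    """Split 'k1:int;k2:int' into [('k1', 'int'), ('k2', 'int')]."""
    columns = []
    for item in schema.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' (an assumption of this sketch)
        # Split on the first ':' only, so the type text stays intact.
        name, sep, col_type = item.partition(":")
        if not sep or not name.strip() or not col_type.strip():
            raise ValueError(f"malformed column definition: {item!r}")
        columns.append((name.strip(), col_type.strip()))
    return columns


columns = parse_csv_schema("k1:int;k2:int;k3:int;k4:decimal(38,10)")
```

The comma inside `decimal(38,10)` is harmless here because only `;` and the first `:` act as separators.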
Add new session variable dry_run_query

If set to true, the real query result is not returned; only the number of returned rows is returned.

mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000     |
+--------------+
This avoids the time spent transmitting a large result set, so measurements focus on the real execution time of the query engine.
It is intended for debugging and analysis.
2023-03-02 16:51:27 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
9f088f6e90 [feature](json) add json_valid function (#17247)
add json_valid function

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-02 14:08:52 +08:00
9155d8b9d1 [fix](delete) fix 'is null' or 'is not null' delete predicate will get wrong result (#17190)
fix 'is null' or 'is not null' delete predicate will get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-02 14:05:44 +08:00
707f814fc2 [fix](inverted index) fix still execute match query after drop inverted index (#17293)
background:
At the moment, a match query requires an inverted index.

problem description:
After dropping the inverted index that is the only index in the table, a match query can still be run on the formerly indexed column.

fix it:
The index should be updated on BE regardless of whether the indexes_desc from FE is empty.
2023-03-02 11:12:54 +08:00
30df268c1f [fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244)
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml.
If the file does not exist, it throws an error. Doris does not handle this error, which causes the BE to crash.
This CL mainly changes:

Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return error correctly.
Add BE IP info in status, so that we can get ip from error msg like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file  000.snappy.orc, err: 
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic for preferring compute nodes is wrong, so an external table query can be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:

prefer_compute_node_for_external_table

If set to true, queries on external tables prefer to be assigned to compute nodes.
The maximum number of compute nodes is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables can be assigned to any node.

min_backend_num_for_external_table

Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will try to add some mix nodes
so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables are assigned to compute nodes only.
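The two settings above can be read as a small selection rule. The sketch below is a hypothetical Python model of that rule, not the FE implementation. `pick_backends` and its parameters are illustrative names, and the behaviour when the number of compute nodes exactly equals the threshold is an assumption, since the description does not cover it.

```python
from typing import List


def pick_backends(compute_nodes: List[str], mix_nodes: List[str],
                  prefer_compute_node: bool, min_backend_num: int) -> List[str]:
    """Pick candidate backends for an external table query."""
    if not prefer_compute_node:
        # Any node may be assigned.
        return compute_nodes + mix_nodes
    if len(compute_nodes) >= min_backend_num:
        # Enough compute nodes: use compute nodes only.
        # (Treating the "equal" case this way is an assumption.)
        return list(compute_nodes)
    # Too few compute nodes: top up with mix nodes until the total
    # reaches min_backend_num, or until the mix nodes run out.
    missing = min_backend_num - len(compute_nodes)
    return compute_nodes + mix_nodes[:missing]


chosen = pick_backends(["c1"], ["m1", "m2", "m3"], True, 3)
```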
2023-03-02 11:09:55 +08:00
de5112bd90 [bugfix](merger) traverse rs_meta in lock (#17271)
tablet_schema(version) traverses rowset_meta, so it should be called while holding meta_lock.
2023-03-02 09:47:44 +08:00
b7677beab7 [enhancement](memtracker) Add special counter for memtracker and fix thread create and destroy track #17301
Add a special counter for memtracker; it is faster, but uses relaxed ordering and is not accurate in real time.
Track memory for thread creation and destruction again; this tracking was previously removed due to its performance cost.
2023-03-02 08:55:00 +08:00
d7ee542dd4 [refactor](function) refine function geo #17289
remove unused constant args
2023-03-02 08:42:16 +08:00
Pxl
527eb5b059 [Enchancement](function) nullable inline refactor of min_max_by/bitmap && add register_functio… (#17228)
1. nullable inline refactor of min_max_by/bitmap/group_concat/histogram/topn
2. add register_function_both method
3. add datetimev2 type creator of min_max_by
4. remove uint16/32/64 in FOR_INTEGER_TYPES
2023-03-02 00:00:01 +08:00
1244eed1cd [Opt](exec) opt the dispose nullable column logic (#17192) 2023-03-01 23:25:40 +08:00
633f2d52a4 [minor](log) add some logs (#17287) 2023-03-01 22:41:50 +08:00
6de02f1f46 [minor](jvm) add more error logs for JNI (#17270) 2023-03-01 22:09:57 +08:00
34c5e84e9f [fix](insert) fix txn error reason clearly (#16997)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-03-01 20:28:41 +08:00
f1db0d9501 [Enhencement](File Reader) delete old file_reader (#17261)
* delete old file_reader

* fix 1
2023-03-01 20:24:03 +08:00
b839353c2d [fix](inverted index) fix BE coredump because of not ignore case ensitivity for column name when create index (#17276) 2023-03-01 19:32:39 +08:00
3871e989ac [fix](memory) Avoid repeating meaningless memory gc #17258 2023-03-01 19:23:33 +08:00
a1e3b908d7 [fix](memory) split mem usage thread and gc thread to different threads (#17213)
Ensure that the memory status is refreshed in time
Avoid frequent GC
2023-03-01 19:19:05 +08:00
48ef61780d [refactor](struct-type) refactor and clean unused code for struct type (#17257)
remove unused code for struct type
2023-03-01 15:49:31 +08:00
0732eb54bc [feature](struct-type) support csv format stream load for struct type (#17143)
Refactor from_string method in data_type_struct.cpp to support csv format stream load for struct type.
2023-03-01 15:48:48 +08:00
b8ebcdff78 [Bug](bloomfilter) Fix wrong result using bloomfilter with date type (#17225) 2023-03-01 12:29:20 +08:00
979cf42d7a [Bug](decimalv3) Use correct decimal scale for function round (#17232)
Co-authored-by: maochongxin <maochongxin@gmail.com>
2023-03-01 12:28:41 +08:00
62ec74f4e7 segcompaction featuring verticalcompaction (#16731)
This patchset applies the following changes:

using the vertical compaction mechanism to do segcompaction
basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter
add segcompaction specific ut and regression tests
2023-03-01 10:55:40 +08:00
e687f3badd Revert "[feature-wip](BE http)Support BE http service using brpc (#16123)" (#17219)
This reverts commit 049ecccc578802496e5421db19e21e7eb256699d.
Merge back after streamload is handled.
2023-03-01 09:18:25 +08:00
2f471de675 [fix](FileCache) load file cache before start up daemon threads (#17199)
Daemon threads in doris_main.cpp upload tablet metrics periodically, which uses StorageEngine::instance(). However, the file cache is loaded in the main thread, so when loading takes a long time, StorageEngine::instance() is still a null pointer when the daemon threads call it.
2023-03-01 08:35:57 +08:00
e22a9ecc3b [enhancement](execute model) using thread pool to execute report or join task instead of staring too many thread (#17212)
* [enhancement](execute model) using thread pool to execute report or join task instead of staring too many thread

Doris starts a report thread and a join thread during fragment execution. Creating and destroying threads very frequently causes many problems. Jemalloc may not behave well under this pattern and may crash.

jemalloc/jemalloc#1405

It is better to use a thread pool for these tasks.
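As a language-neutral illustration of the idea (the actual change is in the BE's C++ code and is not shown here), the Python sketch below runs many short tasks on a fixed set of reused worker threads instead of starting one thread per task. `report_status` and the pool size are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def report_status(fragment_id: int) -> str:
    """Stand-in for a per-fragment report task (hypothetical)."""
    return f"fragment-{fragment_id}:{threading.current_thread().name}"


# The pool is created once; its worker threads are reused for every task,
# so 100 tasks never create more than 4 threads.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="report") as pool:
    results = list(pool.map(report_status, range(100)))

worker_names = {r.split(":", 1)[1] for r in results}
```

`worker_names` holds the distinct thread names that served the tasks, which makes the reuse visible.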
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-01 08:35:27 +08:00
68e9a66aa0 [Enchancement](schema scanner) add SchemaScanner profile (#17230)
Add some profile information to the schema scanner to facilitate performance optimization.

Example:

SchemaScanner:
      -  FillBlockTime:  9s131ms
      -  GetDbTime:  12.816ms
      -  GetDescribeTime:  1s645ms
      -  GetTableTime:  25.433ms
2023-03-01 08:34:27 +08:00
7f6209ede4 [fix](routine load) fix be core dump while use routine load (#17222) 2023-02-28 21:01:38 +08:00
9bcc3ae283 [Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100) 2023-02-28 20:09:35 +08:00
459874be50 Revert "[Bug](log) add some log to find out bug (#16518)" (#17178)
This reverts commit d1c6b8114053e8c754c979d8d3fbf5c880d361d2.
2023-02-28 19:23:12 +08:00
34813bae13 [improvement](meta) make database,table,column names to support unicode (replace PR #13467 with this) (#14531)
Make database, table, column and other names support unicode by changing LABEL_REGEX COMMON_NAME_REGIEX COMMON_TABLE_NAME_REGEX COLUMN_NAME_REGEX regular expressions in class FeNameFormat.

P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.
2023-02-28 18:50:36 +08:00
1dd2a41e38 [vectorized](bug) fix window function can't handle first row of beyond (#17084)
Issue Number: close #16845
2023-02-28 17:30:23 +08:00
79e49dad93 [fix](brpc) solve bthread hang problem (#17206) 2023-02-28 17:10:05 +08:00
f8e20ceca2 [Improvement](jsonb) add suport for JSONB type for arrow (#16869)
Add support for the JSONB type for Arrow, which is used by the Doris Spark/Flink connectors.
2023-02-28 17:04:13 +08:00
a1db5c6f52 [fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215) 2023-02-28 15:27:06 +08:00
3e40467ce6 [Bug](vec) Fix chinese pinyin order by (#17152)
Bug: some Chinese words are not sorted by pinyin under GBK encoding.

CREATE TABLE `test_convert` (
                 `a` varchar(100) NULL
             ) ENGINE=OLAP
               DUPLICATE KEY(`a`)
               DISTRIBUTED BY HASH(`a`) BUCKETS 3
               PROPERTIES (
               "replication_allocation" = "tag.location.default: 1"
               );
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a    |
+------+
| a    |
| c    |
| 丝   |
| b    |
| 多   |
| 睿   |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);          
+------+
| a    |
+------+
| a    |
| b    |
| c    |
| 多   |
| 丝   |
| 睿   |
+------+
6 rows in set (0.01 sec)
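The second result above matches what a byte-wise comparison of the GBK-encoded strings gives. The Python sketch below reproduces that ordering with the standard `gbk` codec. It only illustrates the ordering; that the BE compares the converted bytes in the same way is an assumption.

```python
words = ["b", "a", "c", "睿", "多", "丝"]

# Default string comparison uses Unicode code points.
by_code_point = sorted(words)

# Comparing the GBK-encoded bytes instead; ASCII letters encode to single
# bytes below 0x80, so they sort before all Chinese characters.
by_gbk = sorted(words, key=lambda w: w.encode("gbk"))
```

Here `by_gbk` places 睿 last even though "rui" precedes "si" in pinyin, which is consistent with the query output above.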
2023-02-28 14:29:56 +08:00
bf5037d6d5 [fix](OrcReader) typo in anaylize null values (#17156)
typographical error in analyzing null values for OrcReader.
2023-02-28 14:29:13 +08:00
598038e674 [improvement](parquet-reader)support parquet data page v2 (#17054)
Support parquet data page v2
Parquet data on AWS Glue now uses data page v2, which was not supported before.
2023-02-28 14:23:45 +08:00
4d8b310de0 [fix](struct-type) fix struct subtype support (#17081)
1. Make sure all sub types which STRUCT supported work correctly;
2. remove unused variable `_need_validate_data`;
3. lazy init min or max decimal to support nested DecimalV2 column validate;

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-02-28 11:37:07 +08:00
1771d1e5e7 [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error. (#16943)
* [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error.
2023-02-28 11:15:32 +08:00
26a46d8c3f [fix](cooldown) Handle full clone with cooldowned rowsets (#17069) 2023-02-28 11:04:01 +08:00
00723e36cf [enhancement](merge-on-write) add delete bitmap correctness check for single load (#17147)
For a Unique Key MoW table, if there are duplicate keys in a single load job and there are multiple segments, we need to calculate the delete bitmap to mark these duplicate keys as deleted.
Add a check here to detect any bugs that might cause duplicate keys.
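A toy Python model of this kind of check is sketched below. It is not the BE code. Segments are modelled as lists of keys, the rule that a later row supersedes an earlier row with the same key is an assumption of the sketch, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

RowLocation = Tuple[int, int]  # (segment index, row index)


def calc_delete_bitmap(segments: List[List[str]]) -> Set[RowLocation]:
    """Mark every occurrence of a key except the last one as deleted."""
    latest: Dict[str, RowLocation] = {}
    deleted: Set[RowLocation] = set()
    for seg_id, keys in enumerate(segments):
        for row_id, key in enumerate(keys):
            if key in latest:
                deleted.add(latest[key])  # the older row is superseded
            latest[key] = (seg_id, row_id)
    return deleted


def check_delete_bitmap(segments: List[List[str]],
                        deleted: Set[RowLocation]) -> bool:
    """Return True if the rows left visible contain no duplicate keys."""
    visible = [key
               for seg_id, keys in enumerate(segments)
               for row_id, key in enumerate(keys)
               if (seg_id, row_id) not in deleted]
    return len(visible) == len(set(visible))


segments = [["a", "b"], ["b", "c"], ["a", "d"]]
bitmap = calc_delete_bitmap(segments)
ok = check_delete_bitmap(segments, bitmap)
```

Running the check with an empty bitmap on the same segments returns False, which is the kind of bug the check is meant to surface.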
2023-02-28 10:06:36 +08:00
049ecccc57 [feature-wip](BE http)Support BE http service using brpc (#16123)
Now, streamload is not supported.
2023-02-28 09:59:29 +08:00
e0cd8599d2 [fix](delete) fix delete from bug which can get wrong result (#17146)
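In theory, for two independent deletes, such as delete from table where a=1; delete from table where a=2;, this optimization should be usable. However, the current code puts the delete predicates of all versions and all columns together, which loses the version information and the fact that predicates may be combined with AND. They are all weakened into independent delete predicates, so a page is dropped as soon as any one delete predicate matches.
This PR keeps the current code and changes it as follows: the correctness of the later page pruning can only be guaranteed when there is exactly one delete predicate, so a == 1 check is added and the delete predicates are passed on only in that case.
Treating the delete predicates of different versions and different columns as a complete and rigorous condition for judging pages would require many design changes. The current approach prioritizes fixing the bug. Later, the logic that uses delete predicates to speed up zone-based page pruning can be improved so that delete predicates apply in more scenarios.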
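The guard can be modelled in a few lines. The Python sketch below is a simplified illustration, not the BE code: a page is reduced to the (min, max) of one column, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

# A page summarized by the (min, max) of one column, as a zone map would be.
Zone = Tuple[int, int]
Predicate = Callable[[Zone], bool]  # True if every row in the zone matches


def predicates_for_page_pruning(delete_predicates: List[Predicate]) -> List[Predicate]:
    """Pass delete predicates on only when there is exactly one."""
    if len(delete_predicates) == 1:
        return delete_predicates
    return []  # no page is dropped; rows are filtered individually instead


def can_drop_page(zone: Zone, delete_predicates: List[Predicate]) -> bool:
    usable = predicates_for_page_pruning(delete_predicates)
    return any(pred(zone) for pred in usable)


def equals(value: int) -> Predicate:
    # The whole zone matches `col = value` only if min == max == value.
    return lambda zone: zone[0] == value and zone[1] == value


single = can_drop_page((1, 1), [equals(1)])
several = can_drop_page((1, 1), [equals(1), equals(2)])
```

With several predicates the page is kept even when one of them covers it, trading some pruning for correctness.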
2023-02-28 09:20:10 +08:00
b51ce415e7 [Feature](load) Add submitter and comments to load job (#16878)
* [Feature](load) Add submitter and comments to load job
2023-02-28 09:06:19 +08:00
84413f33b8 [enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127) 2023-02-27 23:31:28 +08:00
d5b1d3403f [fix](merge-on-write) fix that the version of delete bitmap is incorrect when calculate delete bitmap between segments (#17095)
Different version numbers are used to calculate the delete bitmap between segments and rowsets, resulting in the failure of the last update of the delete bitmap.
2023-02-27 17:17:25 +08:00
Pxl
b06f3da96c [Bug] fix not close when pipeline context prepare failed (#17061) 2023-02-27 14:24:39 +08:00