Commit Graph

13721 Commits

Author SHA1 Message Date
4b85c2738e [bug](function)fix potential npe in getFunction() when fe restart (#18989)
fix potential npe in getFunction() when fe restart
2023-05-04 23:45:22 +08:00
ddd67dba8c [chore](release) build-for-release.sh support arm (#19270)
Use `uname -m` to get arch
2023-05-04 19:48:41 +08:00
4e4fb33995 [refactor](conjuncts) simplify conjuncts in exec node (#19254)
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, the exec node stores exprcontext**, but the objects live in the object pool, which makes the code unclear. We can simply use exprcontext*.
2023-05-04 18:04:32 +08:00
fa7d86efbd [improvement](log) log timeout seconds when creating partitions timeout (#19223) 2023-05-04 17:18:42 +08:00
e9a4cbcdf9 [Refact](type system) refact column with arrow serde (#19091)
* refact arrow serde

* add date serde

* update arrow and fix nullable and date type
2023-05-04 15:28:46 +08:00
feeda7230a [Enhancement](storage engine) avoid deleting tablets on unused disk (#19010) 2023-05-04 15:15:43 +08:00
e17a171a3c [fix](vertical_compaction) Fix continuous_agg_count PODArray wrong boundary judgment #19187 2023-05-04 14:50:30 +08:00
a573e1093a [fix](planner) insubquery should always be converted to semi or anti join (#19240) 2023-05-04 11:16:18 +08:00
aaf0ef741e [fix](regression) fix inverted_index_p1 q72.sql timeout error (#19241)
Fix inverted_index_p1 q72.sql timeout error
1. The runtime filter exceeded its wait time, which led to a join of 100w * 1000w rows (1 million by 10 million)
2023-05-04 11:05:15 +08:00
2c1a5bb352 Revert "[chore](third-party) Fix the checksums of mysql (#19047)" (#19189)
This reverts commit c93d6ba3be2f2448b824d36da61835e2cd1235cd.
2023-05-04 10:09:37 +08:00
5459cd9c30 [Improve](fe)Upgrade dependencies and optimize jar package management (#18882)
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade range-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
2023-05-04 10:07:37 +08:00
ffd50b6aeb [improvement](broker) TOperationStatus determines that a null pointer is redundant. (#18712)
The null-pointer check on TOperationStatus is redundant. If tOperationStatus were a null pointer, tOperationStatus.getMessage() would throw a null pointer exception.
2023-05-04 10:03:09 +08:00
52d25f41a4 [feature](multi-catalog) Rename multi-catalog config 'specified_database_list' to 'include_database_list', and introduce new multi-catalog config 'exclude_database_list' (#18834)
In our scenario, we need to specify databases that are excluded from synchronization to Doris,
such as databases that store temporary tables.
Since #17803 introduced `specified_database_list` to specify the databases to include,
this PR introduces a new config, `exclude_database_list`, to specify the databases to exclude,
and renames `specified_database_list` to `include_database_list` for naming symmetry.

When `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` takes precedence over `include_database_list`.
2023-05-04 09:30:02 +08:00
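The precedence rule described in #18834 above can be illustrated with a minimal Python sketch. The function name `visible_databases` and its parameters are hypothetical and are not Doris code. The sketch also assumes that an empty include list means "include everything", which the commit message does not state.

```python
def visible_databases(all_dbs, include_database_list=(), exclude_database_list=()):
    """Return the databases to synchronize, preserving input order.

    Hypothetical helper: an excluded database is always dropped, even if it
    also appears in the include list (exclude takes precedence).
    """
    include = set(include_database_list)
    exclude = set(exclude_database_list)
    result = []
    for db in all_dbs:
        if db in exclude:
            continue  # exclude wins over include
        if include and db not in include:
            continue  # assumption: an empty include list means "all databases"
        result.append(db)
    return result
```

With `include_database_list = ["a", "tmp"]` and `exclude_database_list = ["tmp"]`, only `a` would be synchronized under these assumptions.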
7652d8649b [regression](nereids) check tpc-h 1G/500G/1T plan if backend_num == 1 #18848
The cases in nereids_tpch_shape_sf1_p0, nereids_tpch_shape_sf500_p0 and nereids_tpch_shape_sf1000_p0 are only for environments with a single BE
2023-05-04 08:55:06 +08:00
c98829c94b [improvement](load) log time consumed by waiting flush (#19226) 2023-05-03 17:48:13 +08:00
72d937ad52 [fix](auth)fix es catalog show table (#19202) 2023-05-02 20:22:07 +08:00
9d18be9dd3 [doc](thrift) update doc for thrift 0.16 (#19217)
* 1

update doc for thrift 0.16
2023-05-02 16:00:10 +08:00
145b94531f [Fix](load) fix request_slave_tablet_pull_rowset get wrong url in case of ipv6 address (#19026) 2023-05-02 09:55:09 +08:00
224bca3794 [docker](hudi) add hudi docker compose (#19048) 2023-05-02 09:54:52 +08:00
b0c215e694 [enhance](be)add more profile in prefetched buffered reader (#19119) 2023-05-02 09:53:39 +08:00
05beb8538e [Fix](multi-catalog) fix FE abnormal exit when replay OP_REFRESH_EXTERNAL_TABLE (#19120)
When slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log, they invoke `org.apache.doris.datasource.hive.HiveMetaStoreCache#invalidateTableCache`,
and if the table is non-partitioned, this in turn invokes `catalog.getClient().getTable`.
If a network problem occurs or the table does not exist, an exception is thrown and the FE exits immediately.
The solution is to use a dummy key, containing only the db name and table name, as the file cache key.
Then, when slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log,
the replay does not rely on the HMS client and no exception occurs.
2023-05-02 09:53:20 +08:00
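The dummy-key idea in #19120 above can be sketched in Python as follows. `FileCacheKey` and `FileCache` are hypothetical names for illustration, not the classes in `HiveMetaStoreCache`. The point is only that a key built from the db name and table name can be constructed locally, so invalidation needs no metastore round trip.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCacheKey:
    """Hypothetical dummy key: only the db name and table name."""
    db_name: str
    table_name: str


class FileCache:
    """Toy file cache keyed by FileCacheKey (illustration only)."""

    def __init__(self):
        self._files = {}

    def put(self, key, files):
        self._files[key] = list(files)

    def get(self, key):
        return self._files.get(key)

    def invalidate_table(self, db_name, table_name):
        # The key is built from local information alone, so no remote
        # metastore call can fail here; a missing entry is silently ignored.
        self._files.pop(FileCacheKey(db_name, table_name), None)
```

In this sketch, invalidating a table that is absent from the cache is a no-op rather than an error, which is the property a log replay would want.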
43803940f5 [community](collaborator) add more collaborators (#19229)
Add @TangSiyang2001 as a collaborator; he helped a lot with good first issues.
2023-05-01 23:34:06 +08:00
eac61dc410 [vectorized](function) add some check about result type in array map (#19228) 2023-05-01 16:28:11 +08:00
a978be32a6 [fix](schema_change) remove shadow prefix of schema for tablesink (#18822)
LSC updates the tablet's schema on write. BE optimizes adding columns via linked schema change, and
it detects added columns by comparing column names: if a new column's name is not found in the old schema,
then it is a newly added column.

When a table is undergoing a schema change, the __doris_shadow_ prefix is added to the names of columns in the shadow index.
Writes during the schema change therefore carry a schema with __doris_shadow_ to BE.
If the schema change request arrives at BE after those writes, BE treats it as an add-column schema change because
__doris_shadow_ is not in the base tablet.
2023-04-30 22:46:36 +08:00
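The misclassification described in #18822 above can be reproduced with a small Python sketch. The helper names are hypothetical and this is not the BE implementation. It only shows why name-based comparison treats prefixed columns as new, and how stripping the `__doris_shadow_` prefix avoids that.

```python
SHADOW_PREFIX = "__doris_shadow_"


def strip_shadow_prefix(column_names):
    """Remove the shadow prefix from any column name that carries it."""
    return [
        name[len(SHADOW_PREFIX):] if name.startswith(SHADOW_PREFIX) else name
        for name in column_names
    ]


def newly_added_columns(old_schema, incoming_schema):
    """Name-based detection: a column absent from the old schema counts as new."""
    old = set(old_schema)
    return [name for name in incoming_schema if name not in old]
```

Given a base schema `["k1", "v1"]` and an incoming schema `["__doris_shadow_k1", "__doris_shadow_v1"]`, the comparison reports both columns as new unless the prefix is stripped first.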
da4de37dec [feature-wip](mv lifecycle) separate life cycle of base table and its materialized views (#19210)
Support the related syntax and add a regression test case

---------

Co-authored-by: yzy <yzy@nanfeng_yzy@163.com>
2023-04-30 17:42:02 +08:00
8eab20d3df [bugfix](low cardinality) cached code is wrong will result wrong query result when many null pages (#19221)
Sometimes the dict is not initialized when the comparison predicate runs here. For example, if a full page is null, the reader skips reading it, so the dictionary is not initialized. The cached code is wrong in this case, because a following page may not be null and the dict will then contain items.
This causes queries on dict string columns to return wrong results if the column contains many null values.
This change also adds regression tests for equal, greater-than, and less-than queries on dict columns.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-29 21:28:41 +08:00
d383f1f3d7 [optimization](simd) optimize count_zero_num for ColumnNullable #19124 2023-04-29 14:50:39 +08:00
f2b15c03ca [fix]disable enable_resource_group for regression test (#19206)
When the regression test runs with enable_resource_group = true, the setting is shared by other test cases and may cause regression tests to fail.
So we should not set it to true until it has been fully tested.
2023-04-29 14:47:50 +08:00
8c6ccc092a [fix](test) fix 2 unstable test (#19220) 2023-04-29 14:42:47 +08:00
fc3728c6ab [fix](dynamic-partition) create HOUR unit partition with DATEV2 throw exception (#19213)
Creating HOUR unit partitions must be forbidden when the partition column type is DATEV2
```
Unexpected exception: String index out of range: 10
```
2023-04-29 08:23:06 +08:00
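The kind of up-front validation that #19213 above calls for can be sketched in Python. The function name `check_dynamic_partition_unit` and the error text are hypothetical; the actual check lives in the FE and may differ.

```python
def check_dynamic_partition_unit(time_unit, partition_column_type):
    """Reject HOUR-unit dynamic partitions on a DATEV2 partition column.

    Hypothetical validation: failing early with a clear message avoids a
    later, less readable failure when partition values are formatted.
    """
    if time_unit.upper() == "HOUR" and partition_column_type.upper() == "DATEV2":
        raise ValueError(
            "HOUR unit dynamic partition is not supported "
            "for partition column type DATEV2"
        )
```

A call such as `check_dynamic_partition_unit("HOUR", "DATEV2")` raises `ValueError`, while `("DAY", "DATEV2")` passes.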
c74c2a4f8e [fix](Metadata tvf) Metadata TVF supports read the specified columns from Fe (#19110) 2023-04-29 00:06:08 +08:00
d006143330 [fix](multi-catalog) when endpoint has no region, need a suggestion (#19203)
solve the problem

```
mysql> CREATE CATALOG iceberg PROPERTIES (
    'type'='iceberg',
    'iceberg.catalog.type'='rest',
    'uri' = 'http://0.0.0.0:8888',
    "AWS_ACCESS_KEY" = "admin",
    "AWS_SECRET_KEY" = "password",
    "AWS_REGION" = "us-east-1",
    "AWS_ENDPOINT" = "http://minio:9000"
);
show databases;

ERROR 1105 (HY000): IllegalArgumentException, msg: java.lang.IllegalArgumentException: The value of property fs.s3a.endpoint.region must not be null   
```
2023-04-29 00:05:41 +08:00
4a10d146bf [pipeline](exec) fix regression prepare failed cause query core dump (#19208)
fix regression prepare failed cause query core dump
2023-04-28 20:46:39 +08:00
bee3aa3007 be conf action supports specify item (#19159) 2023-04-28 19:12:51 +08:00
a324ee794c [fix](memory) Fix Aggregation null key memory leak due to incorrect aggfunc destroy #19201 2023-04-28 18:41:41 +08:00
b87d21d836 [doc](spark-load)add spark load ha EN docs (#19194)
* 15000-doc-spark-ha  english doc

* Update spark-load-manual.md format

---------

Co-authored-by: liujh <liujh@t3go.cn>
Co-authored-by: Luzhijing <82810928+luzhijing@users.noreply.github.com>
2023-04-28 18:18:42 +08:00
fd3c132d91 [enhancement](test) split large data of p2 cases (#19186) 2023-04-28 18:18:25 +08:00
1379d7f3e0 [fix](memory) mmap threshold can be modified in conf, Increase to 128M 2023-04-28 18:17:22 +08:00
43e70ab252 [chore](recover) add a config to recover remaining data in emergency (#18986) 2023-04-28 17:42:00 +08:00
365ac54102 [doc](fqdn)fqdn doc cn (#19179)
* fqdn doc cn

* Update fqdn.md format

---------

Co-authored-by: Luzhijing <82810928+luzhijing@users.noreply.github.com>
2023-04-28 17:26:49 +08:00
6626f26506 [optimize](string) optimize char_length function by SIMD (#18925)
Optimize char_length function by SIMD
(1) optimize utf8_len compute
(2) performance up by 840%
2023-04-28 17:22:35 +08:00
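The commit message for #18925 above does not say how utf8_len is computed. A common scalar formulation, which SIMD versions typically vectorize, counts every byte that is not a UTF-8 continuation byte (a byte of the form `10xxxxxx`). The Python sketch below shows that formulation only; the function name is hypothetical and this is not the Doris implementation.

```python
def utf8_char_length(data: bytes) -> int:
    """Count code points in valid UTF-8 data.

    Each code point has exactly one leading byte; continuation bytes match
    the bit pattern 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)
```

Because the per-byte test is independent of neighboring bytes, it maps naturally onto wide vector compares followed by a population count, which is where a SIMD speedup would come from.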
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
718297d3c1 [test](statistics) add p0 test of sampling statistics (#19176)
1. Add a p0 test for collecting statistics by sampling
2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions
3. Fix the instability of the incremental statistics p0 test
2023-04-28 15:50:05 +08:00
f0852f2ac9 [fix](fe)fix bug if left table is empty and there are multiple right tables need do bucket shuffle to left side (#19169)
* [fix](fe)fix bug if left table is empty and there are multiple right tables need do bucket shuffle to left side

* fix bug

* fix test cases
2023-04-28 15:06:38 +08:00
48c4679019 [doc] fix broken link in docs (#19175) 2023-04-28 14:29:14 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
ffe27baeaf [FAQ](docs) add a FAQ about hive catalog occurring UnknownHostException (#19182)
2023-04-28 13:50:24 +08:00
52b1bd2c81 [clone](download) fix be clone action download tablet content length overflow (#18851) 2023-04-28 11:35:17 +08:00
5e9c0c3500 [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type (#19077)
* prohibits date and decimal type

* add config in test
2023-04-28 11:31:51 +08:00
65a82a0b57 [opt](FileReader) turn off prefetch data in parquet page reader when using MergeRangeFileReader (#19102)
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.
2023-04-28 09:27:56 +08:00