Commit Graph

5948 Commits

Author SHA1 Message Date
15d54bae0e [fix](error-hub) use lock to protect the creation of error hub (#7605)
Add a lock when creating error_hub so that multiple threads cannot create it concurrently
(which could lead to a core dump) #7604
2022-01-09 16:57:31 +08:00
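The locking idea in 15d54bae0e can be illustrated with a minimal Python sketch. This is not the Doris implementation; `ErrorHub`, `get_error_hub`, and `demo` are hypothetical names used only to show creation guarded by a lock.

```python
import threading


class ErrorHub:
    """Hypothetical stand-in for the real error hub object."""
    instances_created = 0

    def __init__(self):
        ErrorHub.instances_created += 1


_hub = None
_hub_lock = threading.Lock()


def get_error_hub():
    """Create the hub at most once, even when called from many threads."""
    global _hub
    with _hub_lock:
        # The check and the creation happen under the same lock,
        # so two threads can never both see `None` and both create a hub.
        if _hub is None:
            _hub = ErrorHub()
        return _hub


def demo(num_threads=8):
    """Call get_error_hub() from several threads and collect the results."""
    results = []

    def worker():
        results.append(get_error_hub())

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every thread receives the same object, and the constructor runs only once.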
9aaa3f63f7 [improvement](spark-connector) Stream load http exception handling (#7514)
Stream load http exception handling
2022-01-09 16:54:55 +08:00
3a8a85b739 [Optimize][Extension] optimize extension datax doriswriter,Remove import doris via csv in Dataxwriter, only support via json (#7568)
* 1. Remove import doris via csv in Dataxwriter, only support via json;
2. Format Dataxwriter code;
3. Optimize exception handling and reduce multiple output of exception logs;
4. Update the dataxwriter's documentation;

* Delete DorisCsvCodec.java

delete unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java

* 1. Remove `format` config key;
2. Optimize serialization code in DorisJsonCodec class
2022-01-09 13:27:52 +08:00
ad35067a2a [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616) 2022-01-06 23:23:33 +08:00
482bf05da7 [refactor](log) remove RewriteClasses unused LOG reference (#7609) 2022-01-06 23:22:09 +08:00
1e0e472784 [fix](audit-plugin) Fix audit load plugin may stopped when throw unexpected exceptions (#7607)
Fix the audit load plugin possibly stopping when unexpected exceptions are thrown
2022-01-06 23:21:13 +08:00
90aa6c8a72 [fix](syntax) Add STRUCT to keywords (#7606) 2022-01-06 23:20:20 +08:00
1f88c5f849 [improvement](git) add vscode devcontainer config into git ignore (#7602)
When I use the dev container feature in vscode, it creates some config files that shouldn't be committed to git.
So it's better to add these config files to gitignore for convenience.
2022-01-06 23:19:50 +08:00
831f4cd71e [improvement](website)(proc) Make web page base proc dir and variables orderly (#7535) 2022-01-06 23:16:50 +08:00
563545475e [Optimize](Runtime Filter) Support merge in runtime filter(#7546) (#7547)
Support merging the IN predicate when a remote target exists (e.g. shuffle hash join).
Remove the code that implicitly converts the IN predicate to a Bloom filter when a remote target exists.

Close related #7546
2022-01-06 19:08:35 +08:00
e1374d8536 [fix](tablet-scheduler) Fix decommission backend bug (#7563)
Fix a bug where the decommission backend operation is blocked with the error:
`no proper tag is chose for tablet.`
2022-01-06 00:08:06 +08:00
2a2f12ca51 [refactor & fix](exce & olap) refactor reader: rename Reader to TabletReader (#7544)
1. Considering the responsibility of Reader, rename it to TabletReader; the new name represents its function more exactly and is more meaningful
2. add the virtual keyword to the destructor of OlapScanner, because VOlapScanner is derived from it
3. refactor struct ReaderParams and KeysParam into inner structs of TabletReader, guarded by the TabletReader name scope, which is also more reasonable
4. reduce the number of OlapScanner's data members; using _parent->member_data directly is simpler
5. bugfix: TupleReader has a data member _collect_iter with the same name as one in its parent class Reader. This is dangerous because it invites mistakes when writing code, so TupleReader::_collect_iter is deleted to fix it.
6. call set_tablet_reader() in OlapScanner::prepare() to set up _tablet_reader. VOlapScanner should override set_tablet_reader to create a BlockReader instead; this avoids creating a Reader twice by resetting the unique_ptr _tablet_reader
7. if a data member is an inseparable part of a class, I suggest using a normal variable rather than a pointer, because a pointer adds a layer of indirection and requires careful handling of copying and destruction, which is unnecessary
8. some other small changes for readability or design
2022-01-06 00:00:32 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
9ddcf0625c [improvement](load) Transaction for load job with no data for all partitions should be considered as normal and should not be aborted (#7240)
If the load result set is empty, or all of the load data is filtered out by the `where` condition,
the job no longer fails with the message `all partitions have no load data`; it returns success directly.
2022-01-05 10:38:33 +08:00
5c104ec2d1 [Improvement] use "storage_cooldown_seconds" property when storage medium is SSD (#7532)
Refer to this issue #7528

When the properties `default_storage_medium=ssd` and `storage_cooldown_second=xxx` are set in `fe.conf`,
`cooldownTime=System.currentTimeMillis()+ storage_cooldown_second` is used, rather than always `MAX_COOLDOWN_TIME_MS`
2022-01-04 10:32:57 +08:00
bf4a867e85 [improvement](tablet-repair) add a config repair_slow_replica (#7423)
Add a new FE config `repair_slow_replica`.
When this config is true, Doris will try to delete the replica
with the largest number of versions, and then rebalance the replica.
Usually, when the number of versions of a certain replica is much higher
than that of other replicas, there is a problem with compaction on the current BE.
Migrating to other machines can typically solve this problem.
2022-01-04 10:28:14 +08:00
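As a rough illustration of the selection step described in bf4a867e85, the sketch below picks the replica with the largest version count. It is not the Doris scheduler code; `Replica`, `pick_replica_to_drop`, and the rule that a lone replica is never dropped are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    backend_id: int      # hypothetical identifier of the hosting backend
    version_count: int   # number of versions currently held by the replica


def pick_replica_to_drop(replicas, repair_slow_replica):
    """Return the replica with the most versions, or None if nothing should be dropped."""
    # Assumption: do nothing when the config is off or there is no other replica.
    if not repair_slow_replica or len(replicas) < 2:
        return None
    return max(replicas, key=lambda r: r.version_count)
```

For example, with version counts 12, 480, and 15, the replica holding 480 versions would be chosen for deletion and later rebuilt elsewhere.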
6657524c51 [feature](sql-block-rule) add partition_num, tablet_num, cardinality in SqlBlockRule to block big/slow sql (#7403)
Add partitionNum, tabletNum, cardinality in SqlBlockRule to block large/slow sql.

1. set partitionNum, tabletNum, and cardinality as limits for blocking SQL statements
2. stay compatible with lower versions
3. add unit tests
4. add docs
2022-01-04 09:59:41 +08:00
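The limit check described in 6657524c51 might look roughly like the following sketch. The field names follow the commit title, but the class, the `is_blocked` helper, the "0 means no limit" convention, and the strict `>` comparison are assumptions for illustration, not the actual SqlBlockRule semantics.

```python
from dataclasses import dataclass


@dataclass
class SqlBlockRule:
    # Assumption: a value of 0 means the corresponding limit is disabled.
    partition_num: int = 0
    tablet_num: int = 0
    cardinality: int = 0


def is_blocked(rule, scanned_partitions, scanned_tablets, scanned_rows):
    """Return True if the query exceeds any enabled limit of the rule."""
    pairs = [
        (rule.partition_num, scanned_partitions),
        (rule.tablet_num, scanned_tablets),
        (rule.cardinality, scanned_rows),
    ]
    return any(limit > 0 and actual > limit for limit, actual in pairs)
```

A rule with `partition_num=30` would block a query that scans 31 partitions but let through one that scans 30, under the assumptions above.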
7b13ac5b31 [deps][chore] make openssl works with old glibc version (#7541)
1. build OpenSSL with --with-rand-seed=devrandom
2. Modified: brpc 1.0.0-rc02 -> 1.0.0
2021-12-31 23:19:04 +08:00
a60d86c1e1 [improvement](broker) add disable cache config for broker (#7506) 2021-12-31 16:48:55 +08:00
d457ab3122 [improvement] remove unused method from AggregateFunction (#7496) 2021-12-31 16:35:23 +08:00
d6cc3fdf03 [fix](materialized-view) forbidden create materialized view with distinct (#7494) 2021-12-31 16:08:37 +08:00
46ca012e2b [fix](bloom-filter) Fix error when handle empty string in bloom filter (#7448) 2021-12-31 16:05:33 +08:00
b2c5f25ef4 [docs] add more faq and FE debugging method (#7422)
1. Add more faq and FE debugging method.
2. Add security document.
2021-12-31 09:55:04 +08:00
7903e6491a [Bug](partition pruning v2) Fix NPE when calling Analyzer.getContext() in partition pruning related logic. (#7542)
Partition pruning v2 uses the connection context in `OlapScanNode`.
Before this PR, an NPE would occur when running SQL without a ConnectContext, such as export or load.
For example:
```
EXPORT TABLE t TO "file:///home/data/export.txt"
```
2021-12-30 17:03:14 +08:00
723ee84a66 [feature] (planner) InferPredicate (#7096)
This PR is for #7096; it adds a rewrite rule for predicate inference.

For example:
original stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1
rewritten stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1 and t1.id=1 and t3.id=1

+ Add a switch enable_infer_predicate to control whether to perform predicate expansion.
+ Register a new rule InferFiltersRule and add it to GlobalState.
+ Traverse Conjunct to construct on/where equivalence connection, numerical connection and isNullPredicate.
+ Infer all equivalence connections
+ Construct additional numerical connections and isNullPredicate
2021-12-30 13:24:30 +08:00
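The inference described in 723ee84a66 can be sketched as grouping columns into equivalence classes and copying a known constant to every column in the class. The sketch below uses a small union-find and assumes the join conditions are `t1.id = t2.id` and `t2.id = t3.id`; the function name and the data layout are hypothetical, and the real rule also handles numerical connections and isNullPredicate, which are omitted here.

```python
def infer_constant_predicates(equalities, constants):
    """Infer `column = constant` for columns equal to a column with a known constant.

    equalities: list of (column, column) pairs taken from on/where conjuncts
    constants:  dict mapping a column to its known constant value
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for left, right in equalities:
        union(left, right)

    # Record the constant known for each equivalence class.
    class_value = {find(col): value for col, value in constants.items()}

    inferred = {}
    for col in list(parent):
        root = find(col)
        if root in class_value and col not in constants:
            inferred[col] = class_value[root]
    return inferred


new_predicates = infer_constant_predicates(
    equalities=[("t1.id", "t2.id"), ("t2.id", "t3.id")],
    constants={"t2.id": 1},
)
```

Here `new_predicates` contains `t1.id = 1` and `t3.id = 1`, matching the extra conjuncts in the rewritten statement.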
8da2e8b91b [fix](cache) Int overflow causes the wrong latest table to be obtained (#7533) 2021-12-30 11:16:50 +08:00
d88711aabb [fix] fix TableRef.java checkstyle failed (#7538) 2021-12-30 10:47:26 +08:00
7357089e4e [fix] change percentile_approx return from nan to null (#7512)
Change the return value of function percentile_approx from nan to null (like Hive) to ensure that the return value of percentile_approx can be parsed by JDBC successfully.
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-12-30 10:24:35 +08:00
4d01219849 [fix](lower_case_table_names) Fix the bug of case-sensitive aliases in the query when lower_case_table_names=1 is set (#7495)
* [fix](lower_case_table_names) Fix the bug of case-sensitive aliases in the query when lower_case_table_names=1 is set
2021-12-30 10:23:45 +08:00
dc9cd34047 [docs] Add user manual for hdfs load and transaction. (#7497) 2021-12-30 10:22:48 +08:00
0894848045 fix having clause constant folding (#7507)
Change-Id: I49d7f2b17e498e8b393a8c67d85aa1196f961393

Co-authored-by: qijianliang01 <qijianliang01@baidu.com>
2021-12-30 10:22:07 +08:00
85c30fc720 [deps] Upgrade Log4j to 2.17.1 to solve the CVE-2021-44832 security vulnerability (#7536)
Upgrade Log4j to 2.17.1 to solve the CVE-2021-44832 security vulnerability

Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
2021-12-30 10:21:37 +08:00
bc4ceeca44 [improvement] optimize java cmd find (#7428)
* optimize how the java command is found: if java_home is not set, use the java in PATH
2021-12-30 10:16:56 +08:00
2872dbfeb8 [refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477) 2021-12-30 10:16:37 +08:00
e93360791f Revert "[improvement](planner) make BinaryPredicate do not cast date to datetime/varchar (#7045)" (#7517) 2021-12-28 23:05:27 +08:00
3a5de976a3 [Feature](Partition pruning) Implement V2 version of partition prune. (#7434)
Implement a V2 version of the partition prune algorithm. We use the session variable partition_prune_algorithm_version as the control flag, with a default value of 2.

1. Support disjunctive predicates when pruning partitions, for both list and range partitions.
2. Optimize partition pruning for multiple-column list partitions.

Closed #7433
2021-12-28 22:32:34 +08:00
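To illustrate what supporting disjunctive predicates means for range partitions in 3a5de976a3, the sketch below keeps every partition that overlaps at least one branch of an OR. It is a simplified single-column model, not the Doris V2 algorithm; the function name and the half-open `(low, high)` interval representation are assumptions.

```python
def prune_range_partitions(partitions, disjuncts):
    """Keep partitions whose range overlaps any disjunct of the predicate.

    partitions: dict of name -> (low, high), half-open range [low, high)
    disjuncts:  list of (low, high) half-open ranges, one per OR branch
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return sorted(
        name
        for name, rng in partitions.items()
        if any(overlaps(rng, d) for d in disjuncts)
    )


partitions = {"p1": (0, 10), "p2": (10, 20), "p3": (20, 30)}
# Predicate: k < 5 OR k >= 25
selected = prune_range_partitions(
    partitions,
    [(float("-inf"), 5), (25, float("inf"))],
)
```

With these inputs `selected` is `["p1", "p3"]`: `p2` is pruned because neither branch of the OR can match it.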
a2d6e6e06f [improvement](config) Modify default value of some brpc config (#7493)
1. Change `brpc_socket_max_unwritten_bytes` to 1GB

    This can make the system more fault-tolerant.
    Especially in the case of high system load, try to reduce EOVERCROWDED errors.

2. Change `brpc_max_body_size` to 3GB

    To handle large objects such as bitmaps or strings.
2021-12-28 16:47:53 +08:00
Pxl
9fb89004aa [revert] part of "[improvement](planner) make BinaryPredicate do not cast date to datetime/varchar (#7045)" (#7501) 2021-12-28 15:07:10 +08:00
3454735eba [fix](balance) fix partition rebalance bug (#7213)
The number of replicas on the specified medium returned by `getReplicaNumByBeIdAndStorageMedium()` is
defined by table properties, but the backend may not actually have an SSD/HDD disk.
If no SSD/HDD disk is found on the backend, the replica number is set to 0.
However, partitionInfoBySkew does not consider this case: a medium with no SSD/HDD disk is still included in the skew,
which causes a rebalance exception.
2021-12-28 15:03:29 +08:00
07e2acb2f3 [feature] Support national secret (national commercial password) algorithm SM3/SM4 (#7464)
SM3 is a cryptographic hash algorithm.
SM4 is a block cipher used to replace DES / AES and other international algorithms.
2021-12-28 10:39:54 +08:00
6e052f4ede [Doc][Website] blogs are sorted by date (#7491)
* blogs are sorted by date

Co-authored-by: 943155336 <wangyongfeng>
Co-authored-by: jiafeng.zhang <zhangjf1@gmail.com>
2021-12-27 14:30:08 +08:00
80587e7ac2 [improvement](spark-connector)(flink-connector) Modify the max num of batch written by Spark/Flink connector each time. (#7485)
Increase the default batch size and flush interval
2021-12-26 11:13:47 +08:00
755e0693b9 [feature](broker) support ks3 for kmr in ksyun (#7484) 2021-12-26 11:10:47 +08:00
ab60c5eb59 [fix](spark-load) fix Roaring64Map big-endian read/write in de/serialization (#7480)
See #7479
This bug is triggered when the bitmap exceeds 32 bits.
2021-12-26 11:09:50 +08:00
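The exact defect fixed in ab60c5eb59 is not described here, so the following only illustrates in general why byte order must match between serialization and deserialization of 64-bit values. It uses Python's `struct` module and an arbitrary value above the 32-bit range; none of this is Roaring64Map code.

```python
import struct

value = 1 << 32  # a value that no longer fits in 32 bits

# Write the value as an unsigned 64-bit integer in big-endian order.
encoded = struct.pack(">Q", value)

# Reading with the same byte order recovers the value.
read_big_endian = struct.unpack(">Q", encoded)[0]

# Reading with the opposite byte order silently yields a different number.
read_little_endian = struct.unpack("<Q", encoded)[0]
```

Here `read_big_endian` equals `value`, while `read_little_endian` does not, which is the kind of mismatch a writer and reader must avoid.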
43e93180c5 [chore](docker) Add clang11 in docker dev image (#7470) 2021-12-26 11:09:17 +08:00
ca97535491 [docs](executor) correct some be error code (#7460)
Correct some BE error codes in the docs.
2021-12-26 11:06:54 +08:00
98551f8e5e [fix](grouping-set) Grouping set clause act wrong for function expr in view (#7410) (#7411)
Fix #7410
2021-12-26 11:05:48 +08:00
0c154733e0 [feature](function) support bitmap_union/intersect have more columns parameters (#7379)
Support multiple bitmap parameters for all bitmap aggregation functions
2021-12-26 11:03:20 +08:00
fe1d0c1428 [fix](materialized-view)(planner) fix mv rewrite bug (#7362)
Close related [#7361]

As the sql described in [#7361](https://github.com/apache/incubator-doris/issues/7361)

```
select k1, count(k2) / count(1) from UserTable group by k1
``` 

Before this PR, `count(k2) / count(1)` would be rewritten as `sum(UserTable.mv_count_k2) / count(1)`
and kept in the second-round analysis, which could cause the mv selection to fail.

After this PR, `count(k2) / count(1)` is still rewritten as `sum(UserTable.mv_count_k2) / count(1)`,
but is not kept in the second-round analysis, so the query can run successfully.
2021-12-26 11:00:39 +08:00
4ed1846369 [fix](ut) Fix BE broker scanner unit test bug (#7486)
Introduced by #7454
2021-12-26 10:30:37 +08:00