* 1. Remove importing into Doris via CSV in the DataX writer; only JSON is supported now;
2. Format the DataX writer code;
3. Optimize exception handling and reduce duplicate exception log output;
4. Update the DataX writer's documentation;
* Delete DorisCsvCodec.java
Delete the unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java
* 1. Remove the `format` config key;
2. Optimize the serialization code in the DorisJsonCodec class
When using the dev container feature in VS Code, some config files are generated that shouldn't be committed to git.
So it's better to add these config files to .gitignore for convenience.
Support merging IN predicates when a remote target exists (e.g. shuffle hash join); see the example below.
Remove the code that implicitly converts an IN predicate to a Bloom filter when a remote target exists.
Close related #7546
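A hypothetical query that hits this path; the tables, columns, and the `[shuffle]` join hint are illustrative, not taken from the PR:
```
-- With a remote (shuffle) target, the IN predicate on the join
-- column is now merged rather than implicitly converted to a
-- Bloom filter.
SELECT t1.k1, t2.v1
FROM t1 JOIN [shuffle] t2 ON t1.k1 = t2.k1
WHERE t1.k1 IN (1, 2, 3);
```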
1. Considering the responsibility of Reader, rename Reader to TabletReader. The new name TabletReader represents its function exactly and is more suitable and meaningful.
2. Add the virtual keyword to the destructor of OlapScanner, because VOlapScanner is derived from it.
3. Refactor the structs ReaderParams and KeysParam as inner structs of TabletReader, guarded by the TabletReader name scope, which is also more reasonable.
4. Reduce the number of OlapScanner's member variables; just using _parent->member_data is simpler.
5. Bugfix: TupleReader has a member _collect_iter with the same name as one in its parent class Reader. This shadowing is dangerous and easy to get wrong, so TupleReader::_collect_iter is deleted to fix it.
6. Call set_tablet_reader() in OlapScanner::prepare() to set up _tablet_reader. VOlapScanner should override set_tablet_reader() to create a BlockReader instead; this avoids creating a Reader twice by resetting the unique_ptr _tablet_reader.
7. If a member is an inseparable part of a class, I suggest using a normal variable rather than a pointer, because a pointer adds a layer of indirection and requires careful handling of copying and destruction, which is unnecessary here.
8. Some other small changes for readability and design.
If the load result set is empty, or all the load data is filtered out by the `where` condition,
the load no longer fails with the message `all partitions have no load data`, but returns success directly; a minimal example follows.
Refer to issue #7528
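A minimal illustration of the new behavior, assuming it also applies to `INSERT INTO ... SELECT` loads (the table names are hypothetical):
```
-- Every row is filtered out by the where condition; this now
-- returns success instead of failing with
-- "all partitions have no load data".
INSERT INTO target_tbl SELECT * FROM source_tbl WHERE 1 = 0;
```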
When the properties `default_storage_medium=ssd` and `storage_cooldown_second=xxx` are set in `fe.conf`,
the cooldown time is `cooldownTime = System.currentTimeMillis() + storage_cooldown_second`, not always `MAX_COOLDOWN_TIME_MS`; the example below illustrates this.
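A sketch of a table that relies on those `fe.conf` defaults; the table definition is hypothetical, while the config names come from the PR:
```
-- No storage properties are set here, so the table inherits
-- default_storage_medium=ssd from fe.conf, and with this fix its
-- cooldown time is current time + storage_cooldown_second rather
-- than MAX_COOLDOWN_TIME_MS.
CREATE TABLE example_tbl (k1 INT, v1 INT)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```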
Add a new FE config `repair_slow_replica`; see the sketch below.
When this config is true, Doris will try to delete the replica
with the largest number of versions, and then rebalance the replicas.
Usually, when the number of versions of a certain replica is much higher
than that of other replicas, there is some problem with compaction on the current BE.
Migrating the replica to other machines can typically solve this problem.
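A minimal sketch of turning the config on at runtime; the config name comes from the PR, but whether it is mutable via `ADMIN SET FRONTEND CONFIG` is an assumption:
```
-- Assumption: repair_slow_replica can be changed at runtime.
ADMIN SET FRONTEND CONFIG ("repair_slow_replica" = "true");
```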
Add partitionNum, tabletNum, and cardinality to SqlBlockRule to block large/slow SQL (see the example after this list).
1. Set partitionNum, tabletNum, and cardinality as limits to block SQL queries
2. Keep compatibility with lower versions
3. Add unit tests
4. Add docs
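A minimal sketch of such a rule, assuming the limits are exposed as SQL_BLOCK_RULE properties named `partition_num`, `tablet_num`, and `cardinality`; the rule name and threshold values are illustrative:
```
-- Block queries that would scan more than 30 partitions,
-- 200 tablets, or 10 billion rows (assumed property names).
CREATE SQL_BLOCK_RULE block_large_scan
PROPERTIES (
    "partition_num" = "30",
    "tablet_num" = "200",
    "cardinality" = "10000000000",
    "global" = "true",
    "enable" = "true"
);
```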
Partition pruning v2 uses the connection context in `OlapScanNode`.
Before this PR, an NPE would occur when running SQL without a ConnectContext, such as export or load.
For example:
```
EXPORT TABLE t TO "file:///home/data/export.txt"
```
This PR is for #7096, which adds a rewrite rule to infer predicates.
For example:
original stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1
rewritten stmt: select * from t1, t2, t3 where t1.id=t2.id and t2.id=t3.id and t2.id = 1 and t1.id=1 and t3.id=1
+ Add a switch enable_infer_predicate to control whether to perform predicate expansion (usage sketch below).
+ Register a new rule InferFiltersRule and add it to GlobalState.
+ Traverse the conjuncts to construct on/where equivalence connections, numerical connections, and isNullPredicates.
+ Infer all equivalence connections.
+ Construct additional numerical connections and isNullPredicates.
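A minimal usage sketch, assuming enable_infer_predicate is exposed as a session variable (the setting syntax is an assumption):
```
-- Assumption: enable_infer_predicate is a session variable.
SET enable_infer_predicate = true;
-- t1.id = 1 and t3.id = 1 can now be inferred from the
-- conjuncts below, as in the rewritten stmt above.
SELECT * FROM t1, t2, t3
WHERE t1.id = t2.id AND t2.id = t3.id AND t2.id = 1;
```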
Change the return value of the function percentile_approx from nan to null (like Hive) to ensure that the return value of percentile_approx can be parsed by JDBC successfully; a small example follows.
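A minimal illustration, assuming the nan case arises when no rows match (the table and column names are hypothetical):
```
-- No rows match, so percentile_approx now returns NULL instead
-- of nan, which JDBC clients can parse.
SELECT percentile_approx(latency_ms, 0.99) FROM request_log WHERE 1 = 0;
```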
Co-authored-by: weizuo <weizuo@xiaomi.com>
Implement a V2 version of the partition prune algorithm. We use the session variable partition_prune_algorithm_version as the control flag, with a default value of 2.
1. Support disjunctive predicates when pruning partitions, for both list and range partitions (see the sketch below).
2. Optimize partition pruning for multiple-column list partitions.
Closed #7433
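A minimal sketch; the session variable name and default come from the PR, while the table and predicate are hypothetical:
```
-- partition_prune_algorithm_version defaults to 2 per this PR.
SET partition_prune_algorithm_version = 2;
-- Disjunctive predicates like this one can now prune both range
-- and list partitions.
SELECT * FROM sales WHERE dt < '2021-01-01' OR dt >= '2021-12-01';
```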
1. Change `brpc_socket_max_unwritten_bytes` to 1GB.
This makes the system more fault-tolerant.
Especially under high system load, it helps reduce EOVERCROWDED errors.
2. Change `brpc_max_body_size` to 3GB.
This handles large objects such as bitmaps or strings.
The number of replicas on a specified medium that we get from `getReplicaNumByBeIdAndStorageMedium()` is
defined by table properties. But in fact there may be no SSD/HDD disk on this backend.
So if we find that there is no SSD/HDD disk on this backend, we set the replica number to 0.
However, partitionInfoBySkew doesn't consider this case: a medium with no SSD/HDD disk is also counted as skew,
causing a rebalance exception.
Close related [#7361]
For the SQL described in [#7361](https://github.com/apache/incubator-doris/issues/7361):
```
select k1, count(k2) / count(1) from UserTable group by k1
```
Before this PR, `count(k2) / count(1)` would be rewritten as `sum(UserTable.mv_count_k2) / count(1)`
and kept in the second-round analysis, which could cause materialized view selection to fail.
After this PR, `count(k2) / count(1)` is still rewritten as `sum(UserTable.mv_count_k2) / count(1)`,
but is not kept in the second-round analysis, so the query runs successfully.