* Support bitmap_intersect
Support the aggregate function bitmap_intersect, which is mainly used to take the intersection of grouped data.
The function `bitmap_intersect(expr)` calculates the intersection of bitmap columns and returns a bitmap object.
The definition is as follows:
FunctionName: bitmap_intersect,
InputType: bitmap,
OutputType: bitmap
The scenario is as follows:
Query which users satisfy all three tags a, b, and c at the same time.
```
select bitmap_to_string(bitmap_intersect(user_id)) from
(
select bitmap_union(user_id) user_id from bitmap_intersect_test
where tag in ('a', 'b', 'c')
group by tag
) a
```
Closes #3552.
* Add docs of bitmap_union and bitmap_intersect
* Support null in bitmap_intersect
Main changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
Fix some formatting.
NOTICE (#3622):
This is a "revert of revert" pull request.
This PR consolidates the PRs whose commits were scattered due to an incorrect merge
method into a single complete commit.
Add a new config `drop_backend_after_decommission` in FE. If this config
is false, the BE will not be dropped after the decommission operation finishes.
This new config tries to solve the problem described in ISSUE: #3460.
TODO:
This method will generate a lot of data migration, so it is only a temporary solution.
After that, we should try to solve the problem of data balancing within the BE.
This CL also adds documentation for the FE and BE configurations.
These documents are incomplete and can be extended later.
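A hedged sketch of how an admin might set this config at runtime via ADMIN SET FRONTEND CONFIG (assuming the config is mutable at runtime; otherwise it goes in fe.conf):
```
-- Keep decommissioned BEs in the cluster instead of dropping them.
ADMIN SET FRONTEND CONFIG ("drop_backend_after_decommission" = "false");
```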
Fix #3390
This CL adds more info to the `JobDetails` column of the `SHOW LOAD` result for Broker Load jobs.
For example:
```
{
    "Unfinished backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002]
    },
    "All backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002, 10004, 10006]
    },
    "ScannedRows": 2390016,
    "TaskNumber": 1,
    "FileNumber": 1,
    "FileSize": 1073741824
}
```
Two newly added keys:
`Unfinished backends` indicates the BEs whose tasks are not yet finished.
`All backends` indicates all the BEs on which this job has tasks.
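A minimal usage sketch for inspecting this column; the database and label names are placeholders:
```
-- View JobDetails of a broker load job (db and label are placeholders).
SHOW LOAD FROM example_db WHERE LABEL = "example_label";
```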
One more thing: I pass the Backend Id along with the heartbeat msg from FE to BE, so that each BE
can know its own Id.
This CL adds a new command to set the replication number of a table in one statement.
```
alter table test_tbl set ("replication_num" = "3");
```
It changes the replication num of an unpartitioned table.
and
```
alter table test_tbl set ("default.replication_num" = "3");
```
It changes the default replication num of the specified table.
If not reset, all queries coming from the same session will have the same isQuery field value.
This bug causes all entries in fe.audit.log to have IsQuery=true.
This CL also fixes another bug:
The resolved IPs of a user's domain should not appear in another user's whitelist. Fix #3380
HttpURLConnection can automatically redirect stream load to BE, but there is no authorization
information in the HTTP request headers after the redirect.
HttpURLConnection may remove the authorization info when following redirects.
The solution is to set the followRedirects property to false on the connection object and perform
the redirect request manually.
#3364
This CL mainly makes the following modifications:
1. Reorganized the SegmentV2 upgrade document.
2. When the variable `use_v2_rollup` is set to true, the v2-format base rollup is forcibly queried to verify the data (see the example after this list).
3. Fix a problem where the storage format information was not persisted in the schema change operation that performs v2 conversion.
4. Allow users to directly create v2-format tables.
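A hedged sketch of the verification flow in item 2; the table name is a placeholder:
```
-- Force queries to read the v2-format base rollup, then verify the data.
SET use_v2_rollup = true;
SELECT COUNT(*) FROM example_tbl;
```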
This PR is to limit replica usage: admins need to know the replica usage for every db and
table, and be able to set a replica quota for every db.
```
ALTER DATABASE db_name SET REPLICA QUOTA quota;
```
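A hedged usage example; the database name and quota value are illustrative:
```
-- Limit example_db to at most 102400 replicas.
ALTER DATABASE example_db SET REPLICA QUOTA 102400;
```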
Currently we have implemented the plugin framework in FE.
This CL makes the original audit log logic pluggable.
The following classes are mainly implemented:
1. AuditPlugin
The interface of audit plugin
2. AuditEvent
An AuditEvent contains all information about an audit event, such as a query, or a connection.
3. AuditEventProcessor
The audit event processor receives all audit events and delivers them to all installed audit plugins.
This CL implements two audit module plugins:
1. The builtin plugin `AuditLogBuilder`, which acts the same as the previous logic, saving the
audit log to `fe.audit.log`
2. An optional plugin `AuditLoader`, which periodically inserts the audit log into a Doris table
specified by the user. In this way, users can conveniently use SQL to query and analyze this
audit log table.
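For illustration, a hedged example of analyzing such an audit table with SQL; the database, table, and column names are assumptions, not defined by this CL:
```
-- Hypothetical audit table: find the ten slowest queries.
SELECT `user`, query_id, query_time
FROM audit_db.audit_tbl
ORDER BY query_time DESC
LIMIT 10;
```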
Some documents are added:
1. HELP docs of install/uninstall/show plugin (see the example after this list).
2. Rename the `README.md` in the `fe_plugins/` dir to `plugin-development-manual.md` and move
it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of the `AuditLoader` plugin.
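Hedged examples of the plugin management statements covered by item 1; the URL and plugin name are placeholders:
```
-- Install, list, and uninstall a plugin (URL and name are placeholders).
INSTALL PLUGIN FROM "http://example.com/auditloader.zip";
SHOW PLUGINS;
UNINSTALL PLUGIN AuditLoader;
```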
ISSUE: #3226
1. Change the word palo to doris in conf files.
2. Set the default meta_dir to ${DORIS_HOME}/doris-meta
3. Comment out FE meta_dir, letting it default to ${DORIS_HOME}/doris-meta, as already defined in FE Config.java.
4. Comment out BE storage_root_path, letting it default to ${DORIS_HOME}/storage, as already defined in BE config.h.
NOTICE: default config is changed.
The bug is described in issue: #3200.
This CL solves the problem by:
1. Refactoring the alter operation conflict checking logic by introducing the new classes `AlterOperations` and `AlterOpType`.
2. Allowing add/drop of temporary partitions when the dynamic partition feature is enabled (see the example after this list).
3. Allowing a table's properties to be modified when there are temporary partitions in the table.
4. Making the property `dynamic_partition.enable` optional, defaulting to true.
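A hedged sketch of item 2; the table name, partition name, and range are illustrative:
```
-- With dynamic partitioning enabled, temporary partitions can now be added and dropped.
ALTER TABLE example_tbl ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-07-01");
ALTER TABLE example_tbl DROP TEMPORARY PARTITION tp1;
```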
Doris supports choosing a storage medium when creating a table, and the cluster balance strategy
differs between storage mediums. Most users will not specify the storage medium when creating a
table; even if they know they should choose one, they have no idea about the cluster's storage
mediums. So I think we should make storage_medium and storage_cooldown_time configurable, and
this should be the admin's responsibility.
For example, if the cluster's storage medium is HDD but we need to change part of the machines to
SSD: after the change, the tablets created earlier are stored on HDD and cannot find a destination
path to migrate to, while users keep creating tables as usual, so all tablets end up on the old
machines and the new machines store only a few tablets. Without this config, the only way is for
the admin to traverse all partitions in the cluster and change the storage_medium property, which
increases operational and maintenance costs.
So I add an FE config `default_storage_medium`, so that the admin can set the default storage medium.
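A hedged example of setting the new config at runtime (assuming it is mutable at runtime; it can otherwise be set in fe.conf); the value is illustrative:
```
-- Make newly created tables default to the SSD storage medium.
ADMIN SET FRONTEND CONFIG ("default_storage_medium" = "SSD");
```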