1. Support Bucket Shuffle Join when the left table is a colocate table, or when the left side is itself a Colocate/Bucket Shuffle Join (see the sketch below)
2. Enable local runtime filters when Bucket Shuffle Join or Colocate Join is used
3. Add documentation for Bucket Shuffle Join
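As a rough illustration (hypothetical table names; `enable_bucket_shuffle_join` is the session switch for this feature, and this is a sketch rather than the PR's own test case):
```
-- Assumed setup: t1 is distributed by HASH(k1), so the equi-join
-- condition covers t1's distribution column and the planner can
-- choose a Bucket Shuffle Join instead of a full shuffle/broadcast.
SET enable_bucket_shuffle_join = true;

SELECT t1.k1, t2.v1
FROM t1 JOIN t2 ON t1.k1 = t2.k1;
```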
* [doris-1008] Support backup and restore directly to cloud storage via the AWS S3 protocol
* [Internal][S3DirectAccess] Support backup, restore, load, and export connecting directly to S3 (a sketch follows the list):
1. Support loading and exporting data from/to S3 directly.
2. Add a config to automatically convert broker access to S3 access when available.
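A minimal sketch of backing up directly to S3, assuming the `CREATE REPOSITORY ... WITH S3` syntax with a placeholder repository name, bucket, and credentials:
```
-- Hypothetical repository name, bucket, and credentials.
CREATE REPOSITORY `s3_repo`
WITH S3
ON LOCATION "s3://my-bucket/doris_backup"
PROPERTIES
(
    "AWS_ENDPOINT" = "http://s3.us-east-1.amazonaws.com",
    "AWS_ACCESS_KEY" = "your_access_key",
    "AWS_SECRET_KEY" = "your_secret_key",
    "AWS_REGION" = "us-east-1"
);

-- Back up a table through the repository.
BACKUP SNAPSHOT db1.snapshot1
TO `s3_repo`
ON (tbl1);
```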
* [Internal][S3DirectAccess] Make file path globbing compatible with broker
* [Internal][doris-1008] Fix log4j class not found
* Add poms
Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
Support conditional filtering of original data in broker load and routine load, e.g.:
```
LOAD LABEL `label1`
(
    DATA INFILE ('bos://cmy-repo/1.csv')
    INTO TABLE tbl2
    COLUMNS TERMINATED BY '\t'
    (event_day, product_id, ocpc_stage, user_id)
    SET (
        ocpc_stage = ocpc_stage + 100
    )
    PRECEDING FILTER user_id = 1381035
    WHERE ocpc_stage > 30
)
...
```
Support delete statements like the following (a concrete example follows below):
1. delete from table partitions(p1, p2) where xxx; -- applies only to p1 and p2
2. delete from table where xxx; -- applies to all partitions
Also remove the code for the deprecated sync/async delete jobs.
This CL changes the FE meta version to 94.
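For instance, with a hypothetical `sales` table and a made-up predicate:
```
-- Applies only to partitions p1 and p2
DELETE FROM sales PARTITIONS (p1, p2) WHERE price > 100;

-- Applies to all partitions
DELETE FROM sales WHERE price > 100;
```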
If a table has very large fields, each page may contain only one row, and each such page still gets a zone map index entry. This causes the stored data to expand to three times the original size, and it also takes up more memory when reading those segments. Therefore, we disable the creation of zone map indexes for segments with too few rows.
Currently, FE's SystemMetrics only supports TCP metrics. This patch adds system memory metrics for FE, which can be used to troubleshoot memory problems.
Based on PR #4475, this patch adds a new feature: migrating a single tablet between different disks via HTTP.
Co-authored-by: weizuo <weizuo@xiaomi.com>
For the task of rebalancing tablets among different disks on the same BE, an effective strategy is to ensure that all tablets under the same partition are evenly distributed across the different disks. To do so, it is necessary to obtain the distribution of a partition's tablets across the disks of a BE. This patch adds a new HTTP interface for BE to acquire the distribution of tablets under a partition across different disks on the same BE.
Currently, FE thread metrics are very simple: only thread count and peak count. We may need more comprehensive Prometheus JVM thread metrics on FE, which will be useful for analyzing FE's running status.
With the ignore_eovercrowded flag added, `PTabletWriterAddBatchRequest` will not fail on `EOVERCROWDED`, avoiding load job failures caused by that error. It only affects the NodeChannel (i.e., the load job); other RPC requests still check for the overcrowded condition.
1. Fix thirdparty builds failing on some OSes because the default lib path is `lib` or `lib64`, and fix the `arrow` build failing because of `brotli` and `zstd`.
2. Fix failure to extract `.tar.bz2` files.
#4995
**Implementation of Separated Page Cache**
- Add config `index_page_cache_ratio` to set the ratio of the index page cache's capacity
- Change the members of StoragePageCache to maintain two types of cache
- Change the interface of StoragePageCache for selecting the cache type
- Change the usage of page cache in read_and_decompress_page in page_io.cpp
  - add the page type as an argument
  - check whether the current page type is available in StoragePageCache (covers the cases of ratio == 0 or 1)
- Add the type as an argument in callers of read_and_decompress_page
- Update unit tests
The rebalancer type can be configured via Config.rebalancer_type (BeLoad or Partition).
PartitionRebalancer is based on TwoDimensionalGreedyAlgo; for Doris the two dimensions are cluster and partition. We only consider replica count, not replica size.
See #4845 for further details.
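If the config is runtime-mutable, switching strategies might look like the sketch below; otherwise it would need to be set in fe.conf and the FE restarted (an assumption, not verified against the PR):
```
-- Assumes rebalancer_type can be changed at runtime.
ADMIN SET FRONTEND CONFIG ("rebalancer_type" = "partition");
```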
Add a fuzzy_parse flag: if all objects in the JSON file have the same keys in the same order, we only need to parse the keys of the first row, and can then use the index instead of the key to parse the values of subsequent rows.
Add a viewable profile for broker load. Similar to the query profile, the user can submit the load job with the session variable is_report_success set to true, and then view the running profile of the job on the FE web page for easy analysis and debugging.
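A minimal sketch of turning the profile on before submitting the load:
```
-- Enable profile reporting for jobs submitted in this session,
-- then submit the broker load as usual and check the FE web page.
SET is_report_success = true;
```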
- There is an FE configuration called dynamic_partition_enable which controls whether the dynamic partition feature is on or off. When it is false, no table supports dynamic partitioning.
- But when a user tried to create a dynamic partition table, Doris did not check this parameter. As a result, the user could create the table normally, yet Doris could not actually create partitions for it.
- This PR checks the config when building the table. A dynamic partition table can be created only when dynamic_partition_enable is true; if it is false, the CREATE TABLE command directly reports an error (a sketch of enabling the config follows).
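Since `dynamic_partition_enable` is a mutable FE config, it can be turned on before creating the table, e.g.:
```
-- Turn the dynamic partition feature on globally so that
-- CREATE TABLE with dynamic_partition properties no longer errors out.
ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true");
```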
For #4674
This is a UDAF for approximate top-N using the Space-Saving algorithm. At present, it can only calculate the frequent items and their frequencies in a given column; based on it, we can later implement top-N functions similar to those supported by Kylin (a usage sketch follows the results below).
I have also added a test to measure the accuracy of this algorithm. The following is a rough result. The total amount of data is 1 million rows following a Zipfian distribution, where Element Cardinality is the data cardinality, and 20X, 50X, 100X represent space_expand_rate values of 20, 50, 100, which set the number of counters in the Space-Saving algorithm.
```
zf exponent = 0.5
Element cardinality    20X     50X     100X
1000                   100%    100%    100%
10000                  100%    100%    100%
100000                 100%    100%    100%
500000                 94%     98%     99%

zf exponent = 0.6, 1
Element cardinality    20X     50X     100X
1000                   100%    100%    100%
10000                  100%    100%    100%
100000                 100%    100%    100%
500000                 100%    100%    100%
```
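A usage sketch with a hypothetical table and column (the third argument is the space_expand_rate measured in the table above):
```
-- Approximate 10 most frequent keywords, with the Space-Saving
-- counter array sized by space_expand_rate = 50.
SELECT topn(keyword, 10, 50) FROM search_log;
```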
Issue: #5031
1. Support an ODBC sink for inserting data into ODBC external tables (usage sketch below).
2. Support transactions for the ODBC sink to ensure inserted data is atomic.
3. Update the documentation about the ODBC sink.
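A usage sketch with a hypothetical ODBC external table:
```
-- Rows are written through the ODBC sink inside a transaction,
-- so the insert either fully succeeds or rolls back as a whole.
INSERT INTO ext_odbc_mysql_tbl SELECT * FROM internal_tbl;
```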