Bucket shuffle join is an algorithm of joining two tables. Left table is distributed by a column.
Right table sends the data to the left table for joining operation.
It reduces the network cost. But when two table is without any data. Bucket shuffle join will fail.
Related Issue: #5144
When two colocate tables make join operation, to make join operation locally,
the tablet belongs to the same bucket sequence will be distributed to the same host.
When choosing which host for a bucket sequence, it takes random strategy.
Random strategy can not make query task load balance logically for one query.
Therefore, this patch takes round-robin strategy, make buckets distributed evenly.
For example, if there are 6 bucket sequences and 3 hosts,
it is better to distributed 2 buckets sequence for every host.
RebalancerType could be configured via Config.rebalancer_type(BeLoad, Partition).
PartitionRebalancer is based on TwoDimensionalGreedyAlgo.
Two dims of Doris should be cluster & partition. And we only consider about the replica count,
do not consider replica size.
#4845 for further details.
Doris supports two kinds of cache mode: sql_cache and partition_cache.
sql_cache takes sql string as key and cache the whole data.
partition_cache splits the data into many partition data and caches them differently.
Therefore a query may hit part of the partition_cache data.
If a query hits the left part of the data, we call the hit range is left.
If a query hits the right part of the data, we call the hit range is right.
And if a query hits the whole part of the data, we call the hit range is full.
A query does not hit any partition cache, but the algorithm still returns hit range right.
It should return hit range none.
Related issue: #5136
add a flag of fuzzy_parse, if the json file all object keys are the same and has same order, we only need to parse the first row, and then use index instead key to parse value
In the previous implementation, whether a subtask is in commit or abort state,
we will try to update the job progress, such as the consumed offset of kafka.
Under normal circumstances, the aborted transaction does not consume any data,
and all progress is 0, so even we update the progress, the progress will remain
unchanged.
However, in the case of high cluster load, the subtask may fail half of the execution on the BE side.
At this time, although the task is aborted, part of the progress is updated.
Cause the next subtask to skip these data for consumption, resulting in data loss.
When user wants to create materialized view with a mv column which is transformed
from original column in agg family table, Doris will throw a new error message
"The mv column of agg or uniq table cannot be transformed from original column"
instead of "column not exists".
Add viewable profile for broker load. Similar to the query profile,
the user can submit the import job by setting the session variable is_report_success to true,
and then view the running profile of the job on the FE web page for easy analysis and debugging.
- There is a fe configuration called dynamic_partition_enable
which controls the opening and closing of the dynamic partition function.
When this configuration is false, it means that all tables do not support dynamic partitioning.
- But when the user tried to create the dynamic partition table, Doris did not detect this parameter.
This will cause the user can normally create a dynamic partition table,
but in fact Doris cannot create a partition for this table.
- This pr detect this config when building the table.
The dynamic partition table can be created only when the dynamic_partition_enable configuration is true.
If the configuration is false, the command to create a dynamic partition table will directly report an error.
For #4674
This is a udaf for approximate topn using Space-Saving algorithm. At present, we can only calculate
the frequent items and their frequencies in a certain column, based on which we can implement similar
topN functions supported by Kylin in the future.
I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result.
The total amount of data is 1 million lines and follows the Zipfian distribution, where Element Cardinality
represents the data cardinality, 20X, 50X.. The value representing space_expand_rate is 20,50, which is
used to set the counter number in the space-saving algorithm
```
zf exponent = 0.5
Element cardinality 20X 50X 100X
1000 100% 100% 100%
10000 100% 100% 100%
100000 100% 100% 100%
500000 94% 98% 99%
zf exponent = 0.6,1
Element cardinality 20X 50X 100X
1000 100% 100% 100%
10000 100% 100% 100%
100000 100% 100% 100%
500000 100% 100% 100%
```
For fix#4977, we return queryId in master FE when finish query for non master to audit it in #4978.
But when the query fail(timeout), the client may not receive the right queryId for audit.
In this PR:
None master FE send queryId to master for querying;
Add more log.
issue:#5031
1. Support ODBC Sink for insert into data to ODBC external table.
2. Support Transaction for ODBC sink to make sure insert into data is atomicital.
3. The document about ODBC sink has been modified
1. Add metadata table 'statistics' to store index information;
2. In the header information returned by mysql, the data type length is returned according to the actual type.
The return type of str_to_date depends on whether the time part is included in the format.
If included, it is DATETIME, otherwise it is DATE.
If the format parameter is not constant, the return type will be DATETIME.
The above judgment has been completed in the FE query planning stage,
so here we directly set the value type to the return type set in the query plan.
For example:
A table with one column k1 varchar, and has 2 lines:
"%Y-%m-%d"
"%Y-%m-%d %H:%i:%s"
Query:
SELECT str_to_date("2020-09-01", k1) from tbl;
Result will be:
2020-09-01 00:00:00
2020-09-01 00:00:00
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d");
Return type is DATE
Query:
SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s");
Return type is DATETIME
Since the plan is retained in the task, if the task is not cleaned up, the memory usage will be too large caused Memory leak or OOM.
When load job finished, there is no need to hold the tasks which are the biggest memory consumers.
Fixed#4992
This CL fix 2 bugs:
1.
When the compaction fails, we must explicitly delete the output rowset,
otherwise the GC logic cannot process these rows.
2.
Base compaction failed if compaction process include some delete version in SegmentV2,
Because the number of filtered rows is wrong.
1. Random().nextInt() maybe return negative numeric value which would result in `java.lang.ArrayIndexOutOfBoundsException`,
pass a positive numeric value would avoid this problem.
```
int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size()
```
2. EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray() maybe lead `java.lang.ClassCastException in some JDK version : [Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo` , pass the original `Class Type` can resolve this.
```
EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]);
```
1. Support modify column type CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE
and TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE convert to a wider range of numeric types (#4937)
2. Use template to refactor code of types.h and schema_change.cpp to delete redundant code.
When a tablet selects which replica's host to execute scan operation,
it takes `round-robin` strategy to load balance. `minAssignedBytes` is the current load of one host.
If a backend is not alive momently, it will randomly take one of other replicas as the choice,
but the unalive backend's `minAssignedBytes` not be descreased and the new choice's `minAssignedBytes`
also not be increased. That will make the real load of the backends not correct.
All Column create in inlineView will set `allowNull = false`, which will cause `NULL` data in CTE be process will be ignore.
So we should set column in inlineView allowNull to make sure correct of query.