The column types of the materialized view and the base table are different.
When the mv is selected in the query plan, the slot type should be changed to the mv column type.
For example:
base table: k1 int, k2 int
mv table: k1 int, k2 bigint sum
The type of the k2 slot ref should be changed from int to bigint.
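A minimal sketch of the scenario with assumed names (`t`, `t_mv`):
```
-- Assumed duplicate-key base table; k2 is INT.
CREATE TABLE t (k1 INT, k2 INT)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- In the mv, the sum(k2) column is stored as BIGINT.
CREATE MATERIALIZED VIEW t_mv AS
SELECT k1, SUM(k2) FROM t GROUP BY k1;

-- When this query is routed to t_mv, the slot for k2 should change from INT to BIGINT.
SELECT k1, SUM(k2) FROM t GROUP BY k1;
```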
Closed. #4271
Redesign metrics to 3 layers:
MetricRegistry - MetricEntity - Metrics
MetricRegistry: the register center.
MetricEntity: an entity registered on the MetricRegistry. Generally, several
MetricEntities can be registered on one MetricRegistry; each MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric: a metric of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity,
or thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. A MetricPrototype is a global variable that can be shared by the same metric across
different MetricEntities.
This PR adds InPredicate support to the DELETE statement,
and adds a max_allowed_in_element_num_of_delete variable to
limit the number of elements of an InPredicate in a DELETE statement.
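A sketch with hypothetical table, partition and column names:
```
-- DELETE now accepts an IN predicate.
DELETE FROM sales PARTITION p202007 WHERE region IN ("bj", "sh", "gz");

-- The number of elements of the IN predicate is limited by the new variable
-- (assumed here to be settable as a session variable).
SET max_allowed_in_element_num_of_delete = 1024;
```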
The new function approx_count_distinct is an alias of the function ndv.
So Doris also needs to rewrite approx_count_distinct to the hll function when it is possible to match an hll materialized view.
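A sketch with assumed names (`visits`, `uv_mv`, `dt`, `user_id`):
```
-- The mv stores hll_union(hll_hash(user_id)) per dt.
CREATE MATERIALIZED VIEW uv_mv AS
SELECT dt, HLL_UNION(HLL_HASH(user_id))
FROM visits
GROUP BY dt;

-- With this change, approx_count_distinct(user_id) can be rewritten to the hll
-- function and matched against uv_mv, just like ndv(user_id).
SELECT dt, APPROX_COUNT_DISTINCT(user_id) FROM visits GROUP BY dt;
```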
Fix the calculation of the cumulative point. The problem is that the cumulative point
is calculated incorrectly when BE restarts and there is a delete rowset. Also see #4258
Support ALTER ROUTINE LOAD JOB stmt, for example:
```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```
Details can be found in `alter-routine-load.md`
Revert “Change type of sum, min, max function column in mv”
This PR reverts PR #4199.
The daily test cores when the type of the mv column is changed.
So I revert the PR.
The daily core will be fixed in the future. After that, PR #4199 will be enabled.
Use the attachment strategy of brpc to send packets with a big size.
brpc has to serialize a packet first and then send it.
If we send one batch with a big size, it will encounter a connection failure.
So we can use the attachment strategy to bypass the problem and eliminate
the serialization cost.
Stream load should read all the data completely before parsing the JSON.
Also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading JSON data.
Fix the bug of loading an empty JSON array [].
Add a doc to explain certain cases of loading JSON format data.
Fix: #4124
Queries like DELETE FROM tbl WHERE decimal_key <= "123.456";
may fail randomly when the type of decimal_key is DECIMALV2,
because the precision and scale are not initialized.
Now, if the length of a URL is longer than 4096 bytes, netty will refuse it.
The case can be reproduced by constructing a very long URL (longer than 4096 bytes).
Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
If the agg function is sum, the type of the mv column will be bigint.
The only exception is that if the base column is largeint, the type of the mv column will be largeint.
If the agg function is min or max, the type of the mv column will be the same as the type of the base column.
For example, the type of the mv column is smallint when the agg function is min and the base column is smallint.
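A sketch with an assumed base table `sales(k1 INT, cost LARGEINT, price SMALLINT)`:
```
-- sum(cost) keeps LARGEINT (the exception above), min(price) keeps SMALLINT,
-- while sum on an INT column would become BIGINT.
CREATE MATERIALIZED VIEW sales_mv AS
SELECT k1, SUM(cost), MIN(price)
FROM sales
GROUP BY k1;
```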
We make all MemTrackers shared, in order to show MemTrackers' real-time consumption on the web.
As follows:
1. Nearly all MemTracker raw pointers -> shared_ptr
2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent)
3. RowBatch & MemPool still use raw pointers to MemTracker; it's easy to ensure that the RowBatch & MemPool destructors execute
before the MemTracker's destructor, so we don't change this code.
4. MemTracker can use RuntimeProfile's counters to calculate consumption, so RuntimeProfile's counters need to be shared
too. We add a shared counter pool to store the shared counters, without changing the other counters of RuntimeProfile.
Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers consume little memory but are too time-consuming, you can make them orphans, and then it's fine to use the raw pointer.
If table1 and table2 are colocated using columns k1 and k2,
a query should contain all of k1 and k2 in the join condition to apply the colocation algorithm.
A query like select * from table1 inner join table2 where t1.k1 = t2.k1 cannot use colocation.
We add the rule to avoid the problem.
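A sketch, assuming both tables are in the same colocate group and distributed by (k1, k2):
```
-- Covers both distribution columns, so the colocation algorithm can be applied.
SELECT *
FROM table1 t1
INNER JOIN table2 t2
  ON t1.k1 = t2.k1 AND t1.k2 = t2.k2;

-- Joins on k1 only, so it cannot be planned as a colocate join.
SELECT * FROM table1 t1 INNER JOIN table2 t2 ON t1.k1 = t2.k1;
```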
1. Add exec messages for BufferedBlockMgr for debugging and tuning
2. Change the API of consuming memory; we will use it in HashTable in the future
3. Fix a mistake in counting _unfullfilled_reserved_buffers in BufferedBlockMgr
This PR mainly does three things:
1. Fix the FE meta version bug introduced by #4029 when fixing a conflict with #4086
2. Make the drop check code easier to read
3. Add doc content for the drop meta check
Try to select a BE with an existing replica as the destination BE for the
REPLICA_RELOCATING clone task.
Fix #4147
Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`.
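If these configs are mutable at runtime, they can be adjusted like this (values are only illustrative):
```
ADMIN SET FRONTEND CONFIG ("max_clone_task_timeout_sec" = "7200");
ADMIN SET FRONTEND CONFIG ("min_clone_task_timeout_sec" = "180");
```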
This CL mainly changes:
1. Add 2 new FE modules
   1. fe-common
      Saves common classes for other modules, currently only `jmockit`.
   2. spark-dpp
      The Spark DPP application for Spark Load. I moved all dpp-related classes to this module, including unit tests.
2. Change the `build.sh`
   Add a new param `--spark-dpp` to compile the `spark-dpp` module alone, and `--fe` will compile all FE modules.
   The output of the `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, and it will be installed to `output/fe/spark-dpp/`.
3. Fix some bugs of spark load
This PR is to support grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
The user can set md5sum="xxxxxxx", so we don't need to provide an md5 URI.
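For example, with a placeholder plugin URL and md5 value:
```
-- The md5sum property replaces the need for a separate md5 URI.
INSTALL PLUGIN FROM "http://example.com/my_plugin.zip"
PROPERTIES ("md5sum" = "735f1b66e1a4a4a59d9d5a17d3f6f8a1");
```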
When the FE uploads the spark archive, the broker may fail to write the file,
resulting in a bad file being uploaded to the repository.
Therefore, in order to prevent Spark from reading bad files, we need to
divide the upload into two steps.
The first step is to upload the file, and the second step is to rename the file with its MD5 value.
This PR is to ensure that a dropped db, table or partition is in a normal state after being recovered by the user. Committed txns cannot be aborted, because the partitions' committed versions have been changed, and some tablets may already have new visible versions. If the user just doesn't want the meta (db, table or partition) anymore, just use DROP ... FORCE instead of DROP to skip the committed txn check.
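For example, with an assumed table name:
```
-- A normal drop goes through the committed txn check and keeps the table recoverable.
DROP TABLE example_tbl;
-- After recovery the table should be in a normal state.
RECOVER TABLE example_tbl;

-- If the table is not needed anymore, FORCE skips the committed txn check.
DROP TABLE example_tbl FORCE;
```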
from_base64/to_base64 may return an incorrect value when the value is NULL #4130
Remove the duplicated base64 code.
Fix the wrong length of the base64 encoded string, which causes a memory error.
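A quick check of the expected behavior after the fix (assumed results, not output from the PR):
```
-- Should return NULL for NULL input instead of an incorrect value.
SELECT to_base64(NULL), from_base64(NULL);
-- A round trip on a normal string should return 'doris'.
SELECT from_base64(to_base64('doris'));
```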