Commit Graph

207 Commits

Author SHA1 Message Date
126606cb4d [Fix](cache) fix query cache returns wrong result after deleting partitions. (#23555)
The reason is that sql cache just use partitionKey , latestVersion and latestTime to check if the cache should be returned, if we delete some partition(s) which is not the latest updated partition, all above values are not changed, so the cache will hit.
Use a field to save the partition num of these tables and sum the partition nums and send it to BE, there are two situations which contains delete-partition ops:

- just delete some partition(s), so the sum of partition num will be lower than before.
- delete some partition(s) coexists with add some partition(s), so the latest time or latest version will be higher than before.
2023-08-31 14:22:52 +08:00
1ac0ff0ea9 [feature](delete-predicate) support delete sub predicate v2 (#22442)
New structure for delete sub predicate.
Delete sub predicate uses a string type condition_str to stored temporarily now and fields will be extracted from it using std::regex, which may introduces stack overflow when matching a extremely large string(bug of libc).

Now we attempt to use a new PB structure to hold the delete sub predicate, to avoid that problem.

message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
Currently, 2 versions of sub predicate will both be filled. For query, we use the v2, and during compaction we still use v1. The old rowset meta with delete predicates which had sub predicate v1 will be attempted to convert to v2 when read from PB. Moreover, efforts will be made to rewrite these meta with the new delete sub predicate.

Make preparation to use column unique id to specify a column globally.
Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will attach column unique id.
2023-08-29 19:37:23 +08:00
c775f8e7bd [feature](move-memtable)[2/7] add protos for memtable on sink node (#23348)
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
Co-authored-by: laihui <1353307710@qq.com>
2023-08-24 11:11:46 +08:00
527293aa41 [refactor](dynamic table) remove dynamic table (#23298) 2023-08-23 14:15:14 +08:00
71807ceb5f [Enhancement](tvf) Table value function support reading local file (#17404)
I tested the local tvf with tpch queries. First, generate `lineitem` datasets with 6001215 rows, and load it into `lineitem` table by:
```
insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 
from local(
        "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", 
        "backend_id" = "10003", 
        "format" = "csv", 
        "column_separator" = "|"
);
```
Then, run `q1` and `q16` tpch queries, the query result is correct.

It can also analyze the BE's log directly like:

```
mysql> select * from local(
        "file_path" = "log/be.out",
        "backend_id" = "10006",
        "format" = "csv")
       where c1 like "%start_time%" limit 10;
+--------------------------------------------------------+
| c1                                                     |
+--------------------------------------------------------+
| start time: 2023年 08月 07日 星期一 23:20:32 CST       |
| start time: 2023年 08月 07日 星期一 23:32:10 CST       |
| start time: 2023年 08月 08日 星期二 00:20:50 CST       |
| start time: 2023年 08月 08日 星期二 00:29:15 CST       |
+--------------------------------------------------------+
```
2023-08-10 20:07:42 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940 , but it has not been updated for a long time due to the original author @Cai-Yao . At present, we will merge some of the code into the master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
22cbf43b14 [Improvement](binlog) Add full/incr engine clone with binlog (#22678)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-08 10:03:11 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00
b64f62647b [runtime filter](profile) add merge time on non-pipeline engine (#22363) 2023-07-31 12:52:42 +08:00
9e16c69925 [improvement](compression) support LZ4_HC algorithm and parse LZ4_RAW (#22165) 2023-07-26 18:23:39 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
when we select map in order by and limit; be node will coredump
2023-07-22 10:41:55 +08:00
2b2ac10e93 [feature](partial update) add failure tolerance for strict mode partial update stream load 2023-07-21 16:46:44 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
c6063ed92f [Revert](lazy open) revert lazy open and add case (#21821) 2023-07-18 19:41:33 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. expand the semantics of variable strict_mode to control the behavior for stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows if the key is not present in the table
2. when inserting a new row in non-strict mode stream load, the unmentioned columns should have default value or be nullable
2023-07-11 09:38:56 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
691a988c97 [enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025)
version discontinuity may occur when clone. To deal with this case, add async publish task when version is discontinuous.
2023-06-22 21:50:14 +08:00
9d41edd9eb [Feature](binlog) Add binlog gc && Auth master_token (#20854)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-06-16 11:25:11 +08:00
fa785f3b24 [chore](proto) make some required fields optional for compability (#20609) 2023-06-09 08:51:01 +08:00
65100d8083 [improvement](profile)add max/min rpc time (#20339) 2023-06-06 12:03:01 +08:00
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
storage function name and result is nullable in agg state type
2023-06-04 22:44:48 +08:00
ffadaa4935 [improvement](inverted index) skip write index on load and generate index on compaction (#20325) 2023-06-03 16:03:21 +08:00
accaff1026 [Feature](compaction) wip: single replica compaction (#19237)
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.

The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
2023-05-30 21:12:48 +08:00
e0d9f7f955 [enhancement](load) add some profile items for load (#20141) 2023-05-29 09:54:03 +08:00
93933308e6 [Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881) 2023-05-26 23:40:49 +08:00
317338913c [Bug](topn) Fix topn fetch set real default value (#20074)
1. Before this PR if rowset does not contain column which should be read for related SlotDescriptor will call `insert_default` to column, but it's not this real defautl value.Real default value relevant information should be provided by the frontend side.

2. Support fetch when light schema change is not enabled, but disable for AGG or UNIQUE MOR model
2023-05-26 16:06:55 +08:00
303bee6fa3 [Fix](single replica load) add inverted index copy for single replica load (#19663)
* [Fix](single replica load) add inverted index copy for single replica load
2023-05-18 14:13:41 +08:00
943e5fb7e5 [improvement](MOW) use seperated cache for mow pk cache (#19686)
In mow, primary key cache have a big impact on load performance, so we add a new cache type to seperate
it from page cache to make it more flexible in some cases
2023-05-18 13:27:09 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
Pxl
dff669899a [Feature](generic-aggregation) add some type define for generic aggregate functions support (#19252)
add some type define for generic aggregate functions support
2023-05-06 11:30:13 +08:00
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
a32fa219ec Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702)" (#19086)
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.

we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-04-26 18:27:46 +08:00
296b0c92f7 [Enhancement](compaction) stop tablet compaction when table dropped (#18702)
* [Enhancement](compaction) stop tablet compaction when table dropped

* fix be ut
2023-04-24 11:04:27 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
2023-04-24 09:41:57 +08:00
de0e89d1b4 [feature](function) Modified cast as time to behave more like MySQL (#18565)
Because the underlying type of time was float64, select cast("19:22:18" as time) would result in a null value in the past.
Results in the following:
2023-04-22 06:11:59 +08:00
ecb22ad35e [chore](proto) modify the order of store_row_column and is_dynamic_schema to be compatible with branch-1.2-lts (#18232) 2023-04-12 11:59:56 +08:00
e562017801 [feature](table-metadata) support altering the property "light_schema_change" for the tables which created before 1.2 (#17704) 2023-04-11 11:09:43 +08:00
6cbf393665 [enhance](meta action) remove useless pb field and refactor writer cooldown meta code (#17652) 2023-03-22 11:13:13 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
fcd25b53bf [Optimize](Random distribution) Improve the performance of tablet sin… (#17389)
The current distribution model for Doris is as follows:

OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block.

This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable.

This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet
2023-03-10 10:52:40 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
400b4bf7a7 [enhance](report) add local and remote size in tablet meta header action (#17406) 2023-03-06 10:43:57 +08:00
b9b028099d [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id (#17362)
* [enhancement](stream load pipe) using queryid or load id to identify stream load pipe instead of fragment instance id

NewLoadStreamMgr already has pipe and other info. Do not need save the pipe into fragment state. and FragmentState should be more clear.

But this pr will change the behaviour of BE.
I will pick the pr to doris 1.2.3 and add the load id to FE support. The user could upgrade from 1.2.3 to 2.x
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-04 16:19:36 +08:00
e687f3badd Revert "[feature-wip](BE http)Support BE http service using brpc (#16123)" (#17219)
This reverts commit 049ecccc578802496e5421db19e21e7eb256699d.
Merge back after streamload is handled.
2023-03-01 09:18:25 +08:00
049ecccc57 [feature-wip](BE http)Support BE http service using brpc (#16123)
Now, streamload is not supported.
2023-02-28 09:59:29 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
0e3be4eff5 [Improvement](brpc) Using a thread pool for RPC service avoiding std::mutex block brpc::bthread (#16639)
mainly include:
- brpc service adds two types of thread pools. The number of "light" and "heavy" thread pools is different
Classify the interfaces of be. Those related to data transmission are classified as heavy interfaces and others as light interfaces
- Add some monitoring to the thread pool, including the queue size and the number of active threads. Use these 
- indicators to guide the configuration of the number of threads
2023-02-22 14:15:47 +08:00
0bb6005143 [Improvement](thrift) optimize thrift messages (#16383)
Now we use a thrift message per fragment instance. However, there are many same messages between instances in a fragment. So this PR aims to extract the same messages and we only need to send thrift message once for a fragment
2023-02-16 11:07:46 +08:00