Commit Graph

1177 Commits

Author SHA1 Message Date
a542f107db [feature](move-memtable) buffer messages in load stream stub (#23721) 2023-09-02 13:42:34 +08:00
228f0ac5bb [Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021) 2023-09-02 12:46:33 +08:00
18d470ecf7 [improvement](config) add a specific be config for segment_cache_capacity (#23701)
* add segment_cache_capacity config instead of fd limit * 2/5
* default -1 for backward compatibility
2023-09-02 01:14:14 +08:00
e1090d6a63 [Fix](column predicate) separate CHAR primitive type for column predicate (#23581) 2023-09-01 09:41:53 +08:00
hzq
16d6357266 [fix] (mac compile) Fix mac compile error & fe start time related (#23727)
Fix of PR #23582

Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back.
Fix the Mac build failure caused by the wrong thrift declaration order.
2023-09-01 08:02:30 +08:00
65f41f71c1 [pipelineX](refactor) refine codes (#23726) 2023-09-01 07:57:35 +08:00
c74ca15753 [pipeline](sink) Support Async Writer Sink of result file sink and memory scratch sink (#23589) 2023-08-31 22:44:25 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, the original code incorrectly initializes Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their destructors, to ensure that daemon services release resources in reverse order of initialization via RAII.
2023-08-31 19:46:38 +08:00
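The reverse-order teardown described in 25b6e4deb2 can be illustrated with a small Python analogue. This is only a sketch: the actual change lives in C++ destructors, and the `service` helper, the `INIT_ORDER` list and the `log` list are hypothetical names introduced here. It relies on `contextlib.ExitStack`, which unwinds registered contexts in the reverse order of entry, much as RAII destroys objects in reverse order of construction.

```python
from contextlib import ExitStack, contextmanager

log = []  # records start/stop events so the order can be inspected


@contextmanager
def service(name):
    """Hypothetical stand-in for a component with a constructor/destructor pair."""
    log.append(f"start {name}")
    try:
        yield name
    finally:
        # Runs when the enclosing ExitStack unwinds, like a destructor.
        log.append(f"stop {name}")


# Dependencies first: read the arrows in the diagram from right to left.
INIT_ORDER = ["Disk/Mem/CpuInfo", "ExecEnv", "StorageEngine", "Daemon"]


def run():
    log.clear()
    with ExitStack() as stack:
        for name in INIT_ORDER:
            stack.enter_context(service(name))
    # Leaving the with-block stops the services in reverse order of startup.
    return list(log)
```

Calling `run()` returns the recorded events, with Daemon started last and stopped first.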
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Dynamically modifying the config clears the cache; each time the cache is disabled, it is cleared only once.
TODO: support page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
hzq
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If any FE restarts, queries that were issued from this FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
126606cb4d [Fix](cache) fix query cache returns wrong result after deleting partitions. (#23555)
The reason is that the SQL cache uses only partitionKey, latestVersion and latestTime to check whether the cache should be returned. If we delete some partition(s) that are not the latest updated partition, none of these values change, so the cache still hits.
Use a field to save the partition count of these tables, sum the partition counts, and send the sum to BE. There are two situations that involve delete-partition operations:

- only some partition(s) are deleted, so the sum of partition counts will be lower than before.
- deleting some partition(s) coexists with adding some partition(s), so the latest time or latest version will be higher than before.
2023-08-31 14:22:52 +08:00
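The check described in 126606cb4d can be sketched as follows. This is an illustrative model, not the code from the PR: `CacheStamp`, `stamp_for` and `cache_is_valid` are hypothetical names, and partitions are modelled as simple `(version, update_time)` pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheStamp:
    partition_key: int
    latest_version: int
    latest_time: int
    partition_num_sum: int  # the extra field described in the commit


def stamp_for(tables, partition_key):
    """Build a stamp from tables, each a list of (version, update_time) partitions."""
    partitions = [p for table in tables for p in table]
    return CacheStamp(
        partition_key=partition_key,
        latest_version=max(version for version, _ in partitions),
        latest_time=max(time for _, time in partitions),
        partition_num_sum=len(partitions),
    )


def cache_is_valid(cached, current):
    # Assumption: the cache is reused only if every field matches.
    return cached == current
```

Deleting a partition that is not the latest one leaves `latest_version` and `latest_time` untouched, so only `partition_num_sum` distinguishes the two stamps.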
d22290e548 [pipelineX](join) support hash join (#23689) 2023-08-31 13:01:26 +08:00
Pxl
f35ab37e1e [Bug](materialized-view) fix load db use analyzer to analyze different metaindex (#23673)
fix load db use analyzer to analyze different metaindex
2023-08-31 12:35:38 +08:00
3e4ee3c1e6 [fix](jdbc catalog) fix jdbc driver cache load error (#23656)
log error:
`W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar`

When the JDBC driver file name contained `.`, we failed to split it properly.
2023-08-31 10:17:15 +08:00
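One way to split a name of the form `function_id.checksum[.file_name].file_type` when the file name itself contains dots is to peel fields off both ends. The sketch below is not the fix from 3e4ee3c1e6; `split_cached_name` is a hypothetical helper, and it assumes that `function_id`, `checksum` and `file_type` never contain a dot.

```python
def split_cached_name(name):
    """Split 'function_id.checksum[.file_name].file_type' into four parts.

    The first two fields are taken from the left and the file type from the
    right; whatever remains in the middle is the file name, dots included.
    """
    head = name.split(".", 2)
    if len(head) < 3:
        raise ValueError(f"expected at least 3 dot-separated parts: {name!r}")
    function_id, checksum, rest = head
    # rpartition returns ('', '', rest) when no dot remains, which is the
    # case where the optional file_name is absent.
    file_name, _, file_type = rest.rpartition(".")
    return function_id, checksum, file_name, file_type
```

Applied to the name from the log line, this yields `postgresql-42.5.0` as the file name and `jar` as the file type.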
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Use stream load in SQL mode.

For example, given example.csv:

10000,北京
10001,天津
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00
14310ad30b [improvement](move-memtable) wait StreamClose from remote (#23605)
* [fix](move-memtable) wait StreamClose from remote
2023-08-30 18:03:36 +08:00
1ac0ff0ea9 [feature](delete-predicate) support delete sub predicate v2 (#22442)
New structure for delete sub predicate.
Delete sub predicates are currently stored temporarily in a string-typed condition_str, and fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (a bug in libc).

Now we attempt to use a new PB structure to hold the delete sub predicate, to avoid that problem.

message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. Old rowset meta whose delete predicates carry sub predicate v1 is converted to v2, where possible, when read from PB. Moreover, efforts will be made to rewrite this meta with the new delete sub predicate.

Make preparation to use column unique id to specify a column globally.
Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will attach column unique id.
2023-08-29 19:37:23 +08:00
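To illustrate the idea in 1ac0ff0ea9 of holding a delete sub predicate as structured fields rather than re-parsing a string with a regex, here is a minimal sketch. It mirrors the four fields of `DeleteSubPredicatePB`, but everything else is an assumption: the `column op value` shape of the condition string, the operator set in `OPS`, and the `parse_condition` helper are hypothetical and do not describe the actual conversion code.

```python
from dataclasses import dataclass


@dataclass
class DeleteSubPredicate:
    column_unique_id: int
    column_name: str
    op: str
    cond_value: str


# Assumed operator set; two-character operators come first so that
# "<=" is not read as "<" when both start at the same position.
OPS = ("<=", ">=", "!=", "=", "<", ">")


def parse_condition(condition_str, unique_ids):
    """Split 'column op value' with plain string scans, no regex."""
    best = None
    for op in OPS:
        pos = condition_str.find(op)
        # Keep the earliest operator; on a tie the earlier entry in OPS wins.
        if pos > 0 and (best is None or pos < best[0]):
            best = (pos, op)
    if best is None:
        raise ValueError(f"no operator found in {condition_str[:50]!r}")
    pos, op = best
    name = condition_str[:pos].strip()
    value = condition_str[pos + len(op):].strip()
    return DeleteSubPredicate(unique_ids[name], name, op, value)
```

Because the scan is linear and non-recursive, a very long condition value does not deepen the call stack.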
29b94c4ed7 [pipeline](refactor) refine pipeline fragment context (#23478) 2023-08-28 15:55:02 +08:00
7e7cfd17bf [fix](tablet sink) check data valid of tablet sink data (#23530)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2023-08-28 15:54:12 +08:00
ba351af452 [enhancement](thirdparty) upgrade thirdparty libs - again (#23414)
Resubmission of #23290 (brpc is not upgraded, because bthread local has an error)

protobuf 3.15.0 -> 21.11
glog 0.4.0 -> 0.6.0
lz4 1.9.3 -> 1.9.4
curl 7.79.0 -> 8.2.1
zstd 1.5.2 -> 1.5.5
arrow 7.0.0 -> 13.0.0
abseil 20220623.1 -> 20230125.3
orc 1.7.2 -> 1.9.0
jemalloc for arrow 5.2.1 -> 5.3.0
xsimd 7.0.0 -> 13.0.0
opentelemetry-proto 0.19.0 -> 1.0.0
opentelemetry 1.8.3 -> 1.10.0

new:
c-ares -> 1.19.1
grpc -> 1.54.3
2023-08-26 22:59:10 +08:00
bc020112fc [enhancement](routineload) add debug conf and set broker.name.ttl = 0 (#23302)
* set broker.name.ttl = 0

* add debug config for librdkafka
2023-08-26 10:56:35 +08:00
f66f161017 [fix](multi-catalog)fix hive table with cosn location issue (#23409)
Sometimes the partitions of a Hive table may be on different storage systems, e.g., some on HDFS and others on object storage (COS, etc.).
This PR mainly changes:

1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` in TFileRangeDesc
    This is because, when accessing a file, the BE gets an HDFS client from the HDFS client cache, and different files in one query request may have different fs names (for example, some are `hdfs://` and some are `cosn://`). We therefore need to specify the fs name for each file; otherwise, it may return an error:

`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`
2023-08-26 00:16:00 +08:00
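The reasoning in f66f161017 (one client per file system, chosen per file) can be sketched with a cache keyed by fs name. This is an illustration under assumptions rather than the BE implementation: `fs_name_of` and `ClientCache` are hypothetical, the fs name is taken to be `scheme://authority`, and the factory stands in for whatever creates a real client.

```python
from urllib.parse import urlsplit


def fs_name_of(path):
    """Return 'scheme://authority' for a file URI, e.g. 'hdfs://namenode:4007'."""
    parts = urlsplit(path)
    if not parts.scheme:
        raise ValueError(f"path has no scheme: {path!r}")
    return f"{parts.scheme}://{parts.netloc}"


class ClientCache:
    """Hands out one client per fs name, creating it on first use."""

    def __init__(self, factory):
        self._factory = factory  # callable taking an fs name
        self._clients = {}

    def get(self, fs_name):
        if fs_name not in self._clients:
            self._clients[fs_name] = self._factory(fs_name)
        return self._clients[fs_name]
```

With a lookup keyed this way, files under `hdfs://` and `cosn://` in the same request resolve to different clients instead of sharing whichever one was created first.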
1312c12236 Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)" (#23462)
* Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963)"

This reverts commit 55a6649da962fb170ddb40fea8ef26bdc552a51a.

Manual revert of "fix in strict mode, return error for insert if datatype convert fails (#20378)"

This manually reverts commit 1b94b6368f5e871c9a0fe53dd7c64409079a4c9d

* fix case failure
2023-08-25 16:47:14 +08:00
8ef6b4d996 [fix](json) fix json int128 overflow (#22917)
* support int128 in jsonb

* fix jsonb int128 write

* fix jsonb to json int128

* fix json functions for int128

* add nereids function jsonb_extract_largeint

* add testcase for json int128

* change docs for json int128

* add nereids function jsonb_extract_largeint

* clang format

* fix check style

* using int128_t = __int128_t for all int128

* use fmt::format_to instead of snprintf digit by digit for int128

* clang format

* delete useless check

* add warn log

* clang format
2023-08-25 11:40:30 +08:00
a305f2ffc2 [fix](pipeline) update status when prepare failed #23419 2023-08-25 10:34:37 +08:00
9cacf9535a [Opt](functions) Use preloaded cache to accelerate timezone parsing (#22694)
* opt

* bugfix

* fix ut

* fix stylecheck
2023-08-25 10:00:48 +08:00
98d0a2f6c1 [feature](move-memtable)[3/7] add load stream manager and rpc service (#23415)
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
Co-authored-by: Yongqiang YANG <dataroaring@gmail.com>
Co-authored-by: laihui <1353307710@qq.com>
2023-08-25 00:08:04 +08:00
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
1609b6cbf2 [pipelineX](sort) Support sort operator (#23322) 2023-08-22 19:36:50 +08:00
bcdb481374 [refactor](fragment) refactor non pipeline fragment executor (#23281)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-22 16:00:34 +08:00
0d7a61ae8c [fix](load) fix duplicate register of memtable writer in memory limiter (#23205) 2023-08-22 10:05:17 +08:00
dcd6c3c022 [pipelineX](refactor) propose a new pipeline execution model (#22562) 2023-08-21 15:38:45 +08:00
d4694167a8 [Enhancement](chore) Some Status relevant enhancement (#23072) 2023-08-21 14:14:38 +08:00
Pxl
477961dc21 [Chore](agg) refactor of hash map (#22958)
refactor of hash map
2023-08-18 17:59:30 +08:00
Pxl
cf1865a1c8 [Bug](scan) fix core dump due to store_path_map (#23084)
fix core dump due to store_path_map
2023-08-17 15:24:43 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
6cf1efc997 [refactor](load) use smart pointers to manage writers in memtable memory limiter (#23019) 2023-08-16 16:34:57 +08:00
9b2323b7fd [Pipeline](exec) support async writer in pipeline query engine (#22901) 2023-08-15 17:32:53 +08:00
1d825f57bc [fix](load) expose error root cause msg for load (#22968)
Currently, we only return an ambiguous "INTERNAL ERROR" to the user during a load.
This commit no longer hides the root cause.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-15 13:22:45 +08:00
13cc7a31ab [fix](bug) Fix page handle safe exit #22849 2023-08-11 09:55:19 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
58e7952eea [refactor](load) use memtable writer in memtable memory limiter (#22780) 2023-08-10 17:08:47 +08:00
124c1b16cf [performance](load) remove unnecessary lock in TabletsChannel::add_batch (#22703)
This lock was introduced by lazy open in #18874.
It's unnecessary and costly to hold a lock while writing data to DeltaWriter in the first place.

However, since lazy open is reverted in #21821, we can completely omit this lock.
_tablet_writers is not supposed to be changed once we've reached TabletsChannel::add_batch.
2023-08-08 22:08:21 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940, but the original author @Cai-Yao has not updated it for a long time. For now, we will merge some of the code into master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
0ca0c162b1 [fix][load] fix memtable reset cause nullptr (#22577) 2023-08-07 10:45:09 +08:00
ab3fc1df5e [chore](profile) Fix 'BlocksProduced' in plan_fragment_executor (#22637) 2023-08-06 12:42:39 +08:00
96f42ca20a [fix](memory) Independent count exec node memory profile (#22598)
Count the exec node memory profile independently, following #22582
2023-08-06 10:56:31 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failed on big query came concurrently (#22600)
fix PriorityThreadPool get_info returning the wrong number
change brpc pool from priority to fifo
do not use brpc pool when sending eos
2023-08-05 21:24:32 +08:00
24c1953e91 [fix](debug) add bvar counter for memtable & loadchannel (#22578)
* [fix](debug) add bvar counter for memtable & loadchannel

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* format code

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-04 13:58:28 +08:00