5063 Commits

Author SHA1 Message Date
9581d2b4eb [refactor](load) split memtable writer out of delta writer (#21892) 2023-08-08 22:02:42 +08:00
30ceb7aea7 [fix](chore) need to remove reference in assert_cast (#22706) 2023-08-08 20:36:05 +08:00
897db3cca5 [Chore](inverted index) refine log in DorisCompoundDirectory::FSIndexOutput (#22716) 2023-08-08 20:32:31 +08:00
edd36fe86b [Chore](tablet) Remove unused BaseTablet::is_memory (#22688)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-08 16:42:59 +08:00
f2731185c9 [fix](memory) fix cache clean thread (#22472)
fix page cache update last visit time.
fix cache clean thread
2023-08-08 15:38:29 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940, but that PR has not been updated by its original author @Cai-Yao for a long time. For now, we will merge part of the code into master first.

Thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
22cbf43b14 [Improvement](binlog) Add full/incr engine clone with binlog (#22678)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-08 10:03:11 +08:00
c9dc715c5d [fix](broker-load) fix error when using multi data description for same table in load stmt (#22666)
For a load request, there are two tuples on the scan node: the input tuple and the output tuple.
The input tuple is used to read the file, and it is converted to the output tuple based on the user-specified column mappings.

Broker load supports different column mappings in different data descriptions for the same table (or partition).
So for each scanner, the output tuples are the same but the input tuples can differ.

The previous implementation saved the input tuple at the scan node level, causing different scanners to use the same input tuple,
which is incorrect.
This PR removes the input tuple from the scan node and saves it in each scanner.
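
The per-scanner ownership described above can be sketched roughly as follows. This is a minimal illustration in Python, not the BE code: the `Scanner` class, its fields, and the column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Scanner:
    """Hypothetical scanner that owns its own input tuple layout."""
    input_columns: List[str]   # per-scanner input tuple (file column order)
    mapping: Dict[str, str]    # output column -> input column

    def convert(self, row: List[str]) -> Dict[str, str]:
        # Read the row using this scanner's input layout, then map it
        # onto the shared output layout.
        src = dict(zip(self.input_columns, row))
        return {out: src[inp] for out, inp in self.mapping.items()}


# Two data descriptions for the same table, with different file layouts.
s1 = Scanner(["k", "v"], {"key": "k", "value": "v"})
s2 = Scanner(["v", "k"], {"key": "k", "value": "v"})
```

Because each scanner carries its own `input_columns`, two files with different column orders still produce identical output rows.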
2023-08-07 20:03:03 +08:00
77e772e103 [enhancement](config) add some pre-process and pre-check for BE storage config attentions in docs (#22486) 2023-08-07 18:16:57 +08:00
bc697ca9d6 [fix](time) fix error in time_to_sec 2023-08-07 17:33:24 +08:00
f036cdfde6 [feature](compaction) support delete in cumulative compaction (#19609) 2023-08-07 15:22:21 +08:00
Pxl
591aee528d [Bug](exchange) change BlockSerializer from unique_ptr to object (#22653)
change BlockSerializer from unique_ptr to object
2023-08-07 14:47:21 +08:00
0ca0c162b1 [fix][load] fix memtable reset cause nullptr (#22577) 2023-08-07 10:45:09 +08:00
af8774c2e6 [Test](function) not unpack when else column is const null in IF function (#22419) 2023-08-07 09:34:48 +08:00
1847e440b2 [fix](memory) enable Jemalloc arena dirty pages (#22639)
If a core dump occurs here, it may hide the real stack. If the stack trace indicates heap corruption
(which led to invalid jemalloc metadata), such as a double free or use-after-free in the application,
try sanitizers such as ASAN, or build jemalloc with --enable-debug to investigate further.
2023-08-06 19:18:44 +08:00
1a8a1e5b16 [Feature](count_by_enum) support count_by_enum function (#22071)
count_by_enum(expr1, expr2, ... , exprN);

Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.
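
As a rough illustration of the semantics described above for a single column, here is a minimal Python sketch. The result keys (`cbe`, `notnull`, `null`, `all`) are assumptions chosen for the example and are not confirmed by this commit message.

```python
from collections import Counter


def count_by_enum(values):
    """Count each distinct non-null value, plus null/non-null totals.

    `values` is one column as a Python list; None stands for SQL NULL.
    The key names in the result are illustrative only.
    """
    non_null = [v for v in values if v is not None]
    return {
        "cbe": dict(Counter(non_null)),
        "notnull": len(non_null),
        "null": len(values) - len(non_null),
        "all": len(values),
    }
```

For several columns, the same function would simply be applied once per column.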
2023-08-06 16:05:14 +08:00
c2c01825c1 [opt](stacktrace) Optimize stacktrace output #22467 2023-08-06 15:53:53 +08:00
d628baba0a [improvement](hdfs) support hedged read (#22634)
In some cases, high load on HDFS may make reads from HDFS take a long time,
slowing down the overall query. The HDFS client provides Hedged Read:
when a read request exceeds a certain threshold without returning, it starts another read thread
to read the same data, and the result of whichever read returns first is used.

Example:

create catalog regression properties (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = "500"
);
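
The hedged read idea itself can be sketched in a few lines of Python. This is only an illustration of the technique, not the HDFS client's implementation; `hedged_read`, `make_reader`, and their parameters are hypothetical names.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def hedged_read(primary_fn, backup_fn, threshold_s, pool):
    """Run primary_fn; if it is still pending after threshold_s seconds,
    also run backup_fn and return whichever result arrives first."""
    primary = pool.submit(primary_fn)
    done, _ = wait([primary], timeout=threshold_s)
    if done:
        return primary.result()
    # Threshold exceeded: start a second read of the same data.
    backup = pool.submit(backup_fn)
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()


def make_reader(delay_s, tag):
    """Build a fake read that sleeps for delay_s and returns tag."""
    def read():
        time.sleep(delay_s)
        return tag
    return read
```

The pool needs at least two worker threads, otherwise the backup read cannot start while the primary is still running. The sketch does not cancel the slower read and lets an exception from the first finished read propagate.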
2023-08-06 14:51:48 +08:00
ab3fc1df5e [chore](profile) Fix 'BlocksProduced' in plan_fragment_executor (#22637) 2023-08-06 12:42:39 +08:00
96f42ca20a [fix](memory) Independent count exec node memory profile (#22598)
Count exec node memory profiles independently, following #22582
2023-08-06 10:56:31 +08:00
Pxl
7839a0e708 [Bug](brpc) fix brpc failed on big query came concurrently (#22600)
fix PriorityThreadPool get_info returning the wrong number
change brpc pool from priority to fifo
do not use brpc pool when sending eos
2023-08-05 21:24:32 +08:00
55100277a1 [refactor](mysql writer) remove some unused code (#22632)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-05 17:59:14 +08:00
d3b50e3b2a [BUG](date_trunc) fix date_trunc function only handle lower string (#22602)
fix date_trunc function only handle lower string
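
A minimal Python sketch of case-insensitive unit handling, assuming the fix amounts to normalizing the unit string before matching. The function below is illustrative only and covers just a subset of units.

```python
from datetime import datetime

# Units from coarsest to finest; only a subset is covered in this sketch.
_UNITS = ["year", "month", "day", "hour", "minute", "second"]


def date_trunc(ts, unit):
    """Truncate ts to the given unit, accepting any letter case."""
    unit = unit.lower()  # normalize so "MONTH" and "Month" also match
    if unit not in _UNITS:
        raise ValueError("unsupported unit: " + unit)
    keep = _UNITS.index(unit) + 1
    parts = [ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second]
    floor = [1, 1, 1, 0, 0, 0]
    return datetime(*(parts[:keep] + floor[keep:]))
```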
2023-08-05 12:53:13 +08:00
fe6bae2924 [fix](invert index) supports utf8 and non-utf8 strings (#22570)
supports utf8 and non-utf8 strings: [fix] compatible with utf8 and invalid utf8 doris-thirdparty#110
2023-08-05 12:52:53 +08:00
6fe0aa492c [Chore](cmake) Remove unused be rowset CMakeLists.txt (#22627)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-05 12:51:58 +08:00
3024b82918 [fix](load)Fix wrong default value for char and varchar of reading json data (#22626)
If a column is defined as `col VARCHAR/CHAR NULL` with no default value, and we load JSON data that is missing column col, the query result is incorrect:
+------+
| col  |
+------+
| 1    |
+------+
Expected:
+------+
| col  |
+------+
| NULL |
+------+
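
The expected behavior can be sketched as follows, assuming a simplified schema of `name -> (nullable, default)`. This is a Python illustration with hypothetical names, not the BE JSON reader.

```python
_NO_DEFAULT = object()  # sentinel: the column has no default value


def materialize_row(json_obj, schema):
    """Build a row from a JSON object, filling columns the JSON omits.

    schema maps column name -> (nullable, default).
    """
    row = {}
    for name, (nullable, default) in schema.items():
        if name in json_obj:
            row[name] = json_obj[name]
        elif default is not _NO_DEFAULT:
            row[name] = default
        elif nullable:
            row[name] = None  # missing + nullable + no default -> NULL
        else:
            raise ValueError("missing value for non-nullable column " + name)
    return row


schema = {"id": (False, _NO_DEFAULT), "col": (True, _NO_DEFAULT)}
```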

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
38f9ac99df [fix](bug) fix be custom conf persistence path and read path are inconsistent (#22520)
The be_custom.conf persistence path is ${doris_home}/conf/be_custom.conf, but if ${custom_config_dir} is set to a different path, BE cannot read be_custom.conf from ${custom_config_dir}.

Set the be_custom.conf persistence path to ${custom_config_dir}.
2023-08-05 10:22:08 +08:00
12262a2025 [fix](compaction) filter block row locations with delete sign should ignore merge on read scenario (#22628) 2023-08-05 09:15:38 +08:00
26e78ab418 [fix](compaction)none vertical compaction should also use _unique_key_next_block function to read block (#22614) 2023-08-05 00:24:57 +08:00
Pxl
c1c38c956d [exec] fix coredump when limit<0 and limit!=-1 with 1.2 fe (#22622) 2023-08-04 22:18:45 +08:00
8bbccc59ef [refactor](load) split segment flush out of beta rowset writer (#21725) 2023-08-04 19:48:56 +08:00
b122f9b80c [fix](concat) ColumnString::chars is resized with wrong size (#22610)
FunctionStringConcat::execute_impl resized ColumnString::chars with a size that includes the string null terminator, so ColumnString::chars.size() does not match ColumnString::offsets.back(). This causes problems for some string functions, e.g. like and regexp.
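
To make the invariant concrete, here is a simplified Python model of a string column stored as one byte buffer plus cumulative end offsets. It is a sketch under that assumption, not the actual ColumnString code, and `concat_columns` is a hypothetical helper.

```python
def concat_columns(a, b):
    """Row-wise concatenation of two string columns.

    a and b are lists of bytes, one entry per row. Returns (chars, offsets),
    where offsets[i] is the end position of row i inside chars.
    """
    chars = bytearray()
    offsets = []
    for x, y in zip(a, b):
        chars += x + y          # no terminator byte is appended
        offsets.append(len(chars))
    return bytes(chars), offsets


chars, offsets = concat_columns([b"ab", b""], [b"c", b"xyz"])
```

In this model the buffer length always equals the last offset. Reserving one extra byte per row for a terminator would break that equality, which is the kind of mismatch the commit describes.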
2023-08-04 19:13:35 +08:00
93593a013d [feature](load) add segment bytes limit in segcompaction (#22526) 2023-08-04 18:00:52 +08:00
7fe08c74fe [fix](inverted index) return empty result instead of error for empty match query (#22592)
return empty result instead of error for empty match query as follows:

`SELECT * FROM t WHERE msg MATCH ''`

`SELECT * FROM t WHERE msg MATCH 'stop_word'`
2023-08-04 17:36:32 +08:00
3d758de7a2 [improvement](binlog) gc be binlog metas when tablet is dropped. (#22447) 2023-08-04 14:38:13 +08:00
24c1953e91 [fix](debug) add bvar counter for memtable & loadchannel (#22578)
* [fix](debug) add bvar counter for memtable & loadchannel

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* format code

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-04 13:58:28 +08:00
ed6bb1fc9d [fix](memory) remove memory tracker profile refresh thread #22582
Memtrackers are usually bound to operators in a query/load. If a large number of queries/loads are stuck, the number of memtrackers grows very large, and the memory tracker profile refresh thread gets stuck on the lock.

This PR is for branch-2.0; I will rewrite the memory profile in the next PR.
2023-08-04 11:51:19 +08:00
868e65d618 [fix](compaction) rowid_conversion should ignore deleted row (#22579) 2023-08-04 11:41:17 +08:00
bad8237850 [BugFix](Es Catalog) fix bug that es catalog will return error when query partial columns (#22423)
Bug:
When the value of some ES column is empty, querying these values in doc_values mode returns an error.

Reason:
In doc_values mode these values are empty, so we need to check whether the array is empty.
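
A minimal Python sketch of the empty-array check, assuming each doc value field arrives as a JSON array under the hit's `fields` object. `doc_value` is a hypothetical helper, not the ES catalog scanner.

```python
def doc_value(hit_fields, column):
    """Return the first doc value for column, or None when it is empty.

    hit_fields is the "fields" object of one search hit, where each
    column maps to a list of values.
    """
    values = hit_fields.get(column)
    if not values:      # column missing or an empty array
        return None
    return values[0]
```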
2023-08-04 11:28:30 +08:00
9c0528daf6 [Opt](orc-reader) opt the performance of date conversion. (#22381)
Opt the performance of date conversion in orc reader.

```
mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.28 sec)

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.19 sec)
```
2023-08-04 10:52:09 +08:00
0c68f7e347 [performance](load) cancel unstarted segcompaction tasks when build rowset (#22392) 2023-08-04 10:10:38 +08:00
e8d105d6ff [fix](debug) add bvar counter for memtracker #22581 2023-08-04 09:56:30 +08:00
1ed1b69485 [refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371)
These readers should be in vec/exec/format
2023-08-04 09:47:20 +08:00
Pxl
c4cee5122b [Chore](brpc) make error messages more verbose when brpc pool offer failed (#22558) 2023-08-03 22:02:37 +08:00
86e6f5d039 [FIX](decimal)fix decimal precision (#22364)
Currently we parse decimals from strings incorrectly:
if the precision of the given string is larger than the defined decimal precision, we return an overflow error. Instead, we should return an overflow error only when the digit part is longer than the digit length of the type, checked while traversing the given string into a decimal value.
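
One reading of this rule, sketched in Python: extra fractional digits are rounded away, and only an integer part that is too long counts as overflow. The rounding mode and the function name are assumptions for illustration, not taken from the PR.

```python
from decimal import ROUND_HALF_UP, Decimal


def parse_decimal(text, precision, scale):
    """Parse text as DECIMAL(precision, scale).

    Raises OverflowError only when the integer part needs more than
    precision - scale digits. Rounding half up is an assumption.
    """
    # Round to the declared scale first; surplus fractional digits
    # alone are not treated as an overflow.
    quantized = Decimal(text).quantize(
        Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP
    )
    int_part = abs(int(quantized))
    int_digits = len(str(int_part)) if int_part else 0
    if int_digits > precision - scale:
        raise OverflowError(text + " does not fit the declared decimal type")
    return quantized
```

Here `"123.456"` has six digits but still fits DECIMAL(5, 2), because its integer part needs only three digits.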
2023-08-03 21:13:58 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
Pxl
098bab7b30 [Bug](exchange) disable implicit conversion of block to bool (#22534)
disable implicit conversion of block to bool
2023-08-03 20:37:14 +08:00
ec187662be use correct bool value (#22507) 2023-08-03 20:09:57 +08:00
96a46302e8 [fix](stacktrace) Fix Jemalloc enable profile fail to run BE after rewrites dl_iterate_phdr (#22549)
Jemalloc heap profiling follows libgcc's way of backtracing by default.
Rewriting dl_iterate_phdr causes Jemalloc to fail to run once profiling is enabled.

TODO, two solutions:

- Have Jemalloc use GNU libunwind for profile backtracing. My test of this failed:
--enable-prof-libunwind does not work (see jemalloc/jemalloc#2504).

- ClickHouse/libunwind solves Jemalloc profile backtracing, but that branch of ClickHouse/libunwind
has diverged from GNU libunwind and LLVM libunwind, which would leave its fate to others.
2023-08-03 19:32:36 +08:00
e90f95dfda [config](merge-on-write) use separate config to control primary key index cache (#22538) 2023-08-03 17:11:19 +08:00