Commit Graph

5177 Commits

Author SHA1 Message Date
38f9ac99df [fix](bug) fix be custom conf persistence path and read path are inconsistent (#22520)
be_custom.conf persistence path is ${doris_home}/conf/be_custom.conf, but if we set ${custom_config_dir} is a different path, will cause be can't read be_custom.conf from ${custom_config_dir}.

set be_custom.conf persist path to ${custom_config_dir}.
2023-08-05 10:22:08 +08:00
12262a2025 [fix](compaction) filter block row locations with delete sign should ignore merge on read scenario (#22628) 2023-08-05 09:15:38 +08:00
26e78ab418 [fix](compaction)none vertical compaction should also use _unique_key_next_block function to read block (#22614) 2023-08-05 00:24:57 +08:00
Pxl
c1c38c956d [exec] fix coredump when limit<0 and limit!=-1 with 1.2 fe (#22622) 2023-08-04 22:18:45 +08:00
8bbccc59ef [refactor](load) split segment flush out of beta rowset writer (#21725) 2023-08-04 19:48:56 +08:00
b122f9b80c [fix](concat) ColumnString::chars is resized with wrong size (#22610)
FunctionStringConcat::execute_impl resized with size that include string null terminator, which causes ColumnString::chars.size() does not match with ColumnString::offsets.back, this will cause problems for some string functions, e.g. like and regexp.
2023-08-04 19:13:35 +08:00
93593a013d [feature](load) add segment bytes limit in segcompaction (#22526) 2023-08-04 18:00:52 +08:00
7fe08c74fe [fix](inverted index) return empty result instead of error for empty match query (#22592)
return empty result instead of error for empty match query as follows:

`SELECT * FROM t WHERE msg MATCH ''`

`SELECT * FROM t WHERE msg MATCH 'stop_word'`
2023-08-04 17:36:32 +08:00
3d758de7a2 [improvement](binlog) gc be binlog metas when tablet is dropped. (#22447) 2023-08-04 14:38:13 +08:00
24c1953e91 [fix](debug) add bvar counter for memtable & loadchannel (#22578)
* [fix](debug) add bvar counter for memtable & loadchannel

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* format code

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-04 13:58:28 +08:00
3d0d5bfd6d [chore](cmake) Split thirdparty into cmake/thirdparty.cmake (#22572)
* [chore](cmake) Split thirdparty into cmake/thirdparty.cmake

* Add Apache License into thirdparty.cmake

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

---------

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-04 13:21:22 +08:00
ed6bb1fc9d [fix](memory) remove memory tracker profile refresh thread #22582
Memtrackers are usually bound to operators in query/load. If a large number of query/loads are stuck, memtrackers will be very large. memory tracker profile refresh thread will get stuck on the lock.

This pr is for branch-2.0, I will rewrite the memory profile in the next pr
2023-08-04 11:51:19 +08:00
868e65d618 [fix](compaction) rowid_conversion should ignore deleted row (#22579) 2023-08-04 11:41:17 +08:00
bad8237850 [BugFix](Es Catalog) fix bug that es catalog will return error when query partial columns (#22423)
Bug:
When the value of some ES column is empty, querying these value in doc_values mode will receive an error.

Reson:
In doc values mode, these values are empty, We need to determine if the array is empty
2023-08-04 11:28:30 +08:00
9c0528daf6 [Opt](orc-reader) opt the performance of date convertion. (#22381)
Opt the performance of date conversion in orc reader.

```
mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.28 sec)

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.19 sec)
```
2023-08-04 10:52:09 +08:00
0c68f7e347 [peformance](load) cancel unstarted segcompaction tasks when build rowset (#22392) 2023-08-04 10:10:38 +08:00
e8d105d6ff [fix](debug) add bvar counter for memtracker #22581 2023-08-04 09:56:30 +08:00
1ed1b69485 [refactor](reader) move reader from vec/exec/scan to vec/exec/format (#22371)
This readers should be in vec/exec/format
2023-08-04 09:47:20 +08:00
Pxl
c4cee5122b [Chore](brpc) make error messages more verbose when brpc pool offer failed (#22558) 2023-08-03 22:02:37 +08:00
86e6f5d039 [FIX](decimal)fix decimal precision (#22364)
Now we make wrong for decimal parse from string
if given string precision is bigger than defined decimal precision, we will return a overflow error, but only digit part is bigger than typed digit length , we should return overflow error when we traverse given string to decimal value
2023-08-03 21:13:58 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
Pxl
098bab7b30 [Bug](exchange) disable implicit conversion of block to bool (#22534)
disable implicit conversion of block to bool
2023-08-03 20:37:14 +08:00
ec187662be use correct bool value (#22507) 2023-08-03 20:09:57 +08:00
96a46302e8 [fix](stacktrace) Fix Jemalloc enable profile fail to run BE after rewrites dl_iterate_phdr (#22549)
Jemalloc heap profile follows libgcc's way of backtracing by default.
rewrites dl_iterate_phdr will cause Jemalloc to fail to run after enable profile.

TODO, two solutions:

- Jemalloc specifies GNU libunwind as the prof backtracing way, but my test failed,
--enable-prof-libunwind not work: --enable-prof-libunwind not work jemalloc/jemalloc#2504

- ClickHouse/libunwind solves Jemalloc profile backtracing, but the branch of ClickHouse/libunwind
has been out of touch with GNU libunwind and LLVM libunwind, which will leave the fate to others.
2023-08-03 19:32:36 +08:00
d02b45e847 [chore](cmake) Refactor be CMakeLists option (#22499)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-08-03 19:25:04 +08:00
e90f95dfda [config](merge-on-write) use separate config to control primary key index cache (#22538) 2023-08-03 17:11:19 +08:00
f7755aa538 [exec](set_operation) Support one child node in set operation (#22463)
Support one child node in set operation
2023-08-03 10:35:59 +08:00
9f0a9e6fd6 [bug](distinct-agg) fix limit value not effective in some case (#22517)
fix limit value not effective in some case
2023-08-03 10:35:36 +08:00
c2db01037a [refactor](config) rename segcompaction_max_threads (#22468) 2023-08-02 22:35:14 +08:00
938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510)
Fix error when reading empty map values in parquet. The `offsets.back()` doesn't not equal the number of elements in map's key column.

### How does this happen
Map in parquet is stored as repeated group, and `repeated_parent_def_level` is set incorrectly when parsing map node in parquet schema.
```
the map definition in parquet:
 optional group <name> (MAP) {
   repeated group map (MAP_KEY_VALUE) {
     required <type> key;
     optional <type> value;
   }
}
```

### How to fix
Set the `repeated_parent_def_level` of key/value node as the definition level of map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`.  Empty array/map values are not stored in doris column, so have to use `repeated_parent_def_level` to skip the empty or null values in ancestor node.

For instance, considering an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i-th` row in array column: range from `offsets[i - 1]` until `offsets[i]` represents the elements in this row, so we can't store empty array/map values in doris data column. As a comparison, spark does not require `repeated_parent_def_level`, because the spark column stores empty array/map values , and use anther length column to indicate empty values. Please reference: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we can also avoid store null array/map values in doris data column. The same three rows as above, We can only store three elements in data column: `a, b, c`
and the offsets column is: `0, 0, 3`
and the null map is: `1, 0, 0`
2023-08-02 22:33:10 +08:00
Pxl
3d0d7a427b [Chore](brpc) display pool name when try offer failed (#22514) 2023-08-02 22:31:33 +08:00
9d3f1dcf44 [improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888)
The original logic is to first deserialize the ColumnString into a HashSet (insert the deserialized elements into the hashset), and then traverse all the HashSet elements into the target HashSet during the merge phase.
After optimization, when deserializing, elements are directly inserted into the target HashSet, thereby reducing unnecessary hashset insert overhead.

In one of our internal query tests, 30 hashsets were merged in second phase aggregation(the average cardinality is 1,400,000), and the cardinality after merging is 42,000,000. After optimization, the MergeTime dropped from 5s965ms to 3s375ms.
2023-08-02 21:19:56 +08:00
781c1d5238 [log](load) add debug logs for potential duplicate tablet ids (#22485) 2023-08-02 20:38:41 +08:00
0cd5183556 [Refactor](inverted index) refact tokenize function for inverted index (#22313) 2023-08-02 19:12:22 +08:00
4bc65aa921 [fix](load) PrefetchBufferedReader Crashing caused updating counter with an invalid runtime profile (#22464) 2023-08-02 18:19:48 +08:00
Pxl
751a7680c5 [Bug](exchange) fix core dump on send_local_block (#22494)
fix core dump on send_local_block
2023-08-02 18:12:34 +08:00
ddd90855a9 [vectorized](udaf) java udaf support with map type (#22397)
[vectorized](udaf) java udaf support with map type (#22397)
* test
* remove some unused
* update
* add case
2023-08-02 15:03:44 +08:00
18692b2a7c fixed (#22481)
[FIX](array) fix array-dcheck-contains_null
2023-08-02 14:22:16 +08:00
e991f607d5 [fix](string-column) fix unescape length error (#22411) 2023-08-02 12:18:05 +08:00
Pxl
f5e3cd2737 [Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327)
optimization for aggregation hash_table_lazy_emplace
2023-08-02 11:50:21 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00
bf50f9fa7f [fix](decimal) fix cast rounding half up with negative number (#22450) 2023-08-01 21:47:42 +08:00
b8399148ef [fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046) 2023-08-01 21:45:16 +08:00
ff0fda460c [be](parameter) change default fragment_pool_thread_num_max from 512 to 2048 (#22448)
change some parameter's default value:
brpc_num_threads from -1 to 256
compaction_task_num_per_disk from 2 to 4
compaction_task_num_per_fast_disk from 4 to 8
fragment_pool_thread_num_max from 512 to 2048
fragment_pool_queue_size from 2048 to 4096

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-01 20:33:41 +08:00
4d3e56e2e7 [fix][regression-test] change lazy open regression test name (#22404) 2023-08-01 20:26:10 +08:00
f16a39aea1 [feature](time) using timev2 type to replace the old time type. (#22269) 2023-08-01 15:59:07 +08:00
43d783ae21 [fix](vertical compaction) compaction block reader should return error when reading next block failed (#22431) 2023-08-01 14:09:18 +08:00
f842067354 [fix](merge-on-write) fix duplicate keys occur when be restart (#22437)
For mow table, delete bitmap of stale rowsets has not been persisted. When be restart, duplicate keys will occur if read stale rowsets.
Therefore, for the mow table, we do not allow reading the stale rowsets. Although this may result in VERSION_ALREADY_MERGED error when query after be restart, its probability of occurrence is relatively low.
2023-08-01 14:07:04 +08:00
3a11de889f [Opt](exec) opt the performance of date parquet convert by date dict (#22384)
before:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.86 sec)
after:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
| 600037902 |
+---------------------+
1 row in set (0.36 sec)
2023-08-01 12:24:00 +08:00