5146 Commits

Author SHA1 Message Date
9d3f1dcf44 [improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888)
The original logic first deserialized the ColumnString into a temporary HashSet (inserting each deserialized element into it), and then, during the merge phase, traversed that HashSet and inserted every element into the target HashSet.
After the optimization, elements are inserted directly into the target HashSet as they are deserialized, which removes the redundant round of hashset inserts.

In one of our internal query tests, 30 hashsets were merged in the second-phase aggregation (average cardinality 1,400,000), and the cardinality after merging was 42,000,000. After the optimization, MergeTime dropped from 5s965ms to 3s375ms.
2023-08-02 21:19:56 +08:00
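The two merge strategies described in 9d3f1dcf44 can be sketched in a few lines. This is only an illustration under stated assumptions, not the Doris implementation: Python's built-in `set` stands in for the BE HashSet, a list of strings stands in for the serialized ColumnString, and the function names `merge_via_temp_set` and `merge_direct` are hypothetical.

```python
def merge_via_temp_set(target, serialized):
    """Original approach: fill a temporary set, then copy it into the target."""
    temp = set()
    for element in serialized:   # first round of hash inserts
        temp.add(element)
    for element in temp:         # second round of hash inserts
        target.add(element)
    return target


def merge_direct(target, serialized):
    """Optimized approach: insert each deserialized element into the target."""
    for element in serialized:   # single round of hash inserts
        target.add(element)
    return target


if __name__ == "__main__":
    # Three partial results, as if produced by first-phase aggregation.
    partial_results = [["a", "b", "c"], ["b", "c", "d"], ["d", "e"]]
    old_way, new_way = set(), set()
    for chunk in partial_results:
        merge_via_temp_set(old_way, chunk)
        merge_direct(new_way, chunk)
    print(old_way == new_way, len(new_way))
```

Both functions leave the same elements in the target. The direct version skips the temporary set, so each incoming element is hashed and inserted once instead of twice.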
781c1d5238 [log](load) add debug logs for potential duplicate tablet ids (#22485) 2023-08-02 20:38:41 +08:00
0cd5183556 [Refactor](inverted index) refact tokenize function for inverted index (#22313) 2023-08-02 19:12:22 +08:00
4bc65aa921 [fix](load) PrefetchBufferedReader Crashing caused updating counter with an invalid runtime profile (#22464) 2023-08-02 18:19:48 +08:00
Pxl
751a7680c5 [Bug](exchange) fix core dump on send_local_block (#22494)
fix core dump on send_local_block
2023-08-02 18:12:34 +08:00
ddd90855a9 [vectorized](udaf) java udaf support with map type (#22397)
* test
* remove some unused
* update
* add case
2023-08-02 15:03:44 +08:00
18692b2a7c fixed (#22481)
[FIX](array) fix array-dcheck-contains_null
2023-08-02 14:22:16 +08:00
e991f607d5 [fix](string-column) fix unescape length error (#22411) 2023-08-02 12:18:05 +08:00
Pxl
f5e3cd2737 [Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327)
optimization for aggregation hash_table_lazy_emplace
2023-08-02 11:50:21 +08:00
bc87002028 [opt](conf) remote scanner thread num is changed to core num * 10 (#22427) 2023-08-01 23:09:49 +08:00
19d1f49fbe [improvement](compaction) compaction policy and options in the properties of a table (#22461) 2023-08-01 22:02:23 +08:00
bf50f9fa7f [fix](decimal) fix cast rounding half up with negative number (#22450) 2023-08-01 21:47:42 +08:00
b8399148ef [fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046) 2023-08-01 21:45:16 +08:00
ff0fda460c [be](parameter) change default fragment_pool_thread_num_max from 512 to 2048 (#22448)
Change the default values of several parameters:
brpc_num_threads from -1 to 256
compaction_task_num_per_disk from 2 to 4
compaction_task_num_per_fast_disk from 4 to 8
fragment_pool_thread_num_max from 512 to 2048
fragment_pool_queue_size from 2048 to 4096

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-08-01 20:33:41 +08:00
4d3e56e2e7 [fix][regression-test] change lazy open regression test name (#22404) 2023-08-01 20:26:10 +08:00
f16a39aea1 [feature](time) using timev2 type to replace the old time type. (#22269) 2023-08-01 15:59:07 +08:00
43d783ae21 [fix](vertical compaction) compaction block reader should return error when reading next block failed (#22431) 2023-08-01 14:09:18 +08:00
f842067354 [fix](merge-on-write) fix duplicate keys occur when be restart (#22437)
For a merge-on-write (MoW) table, the delete bitmap of stale rowsets is not persisted, so duplicate keys can appear if stale rowsets are read after a BE restart.
Therefore, reading stale rowsets is not allowed for MoW tables. This may cause a VERSION_ALREADY_MERGED error for queries issued after a BE restart, but the probability is relatively low.
2023-08-01 14:07:04 +08:00
3a11de889f [Opt](exec) opt the performance of date parquet convert by date dict (#22384)
before:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.86 sec)
after:

mysql> select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.36 sec)
2023-08-01 12:24:00 +08:00
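The commit message for 3a11de889f does not say how the "date dict" works. One plausible reading, shown here purely as an assumption, is that for a dictionary-encoded column each distinct dictionary entry is converted once and rows then only look up the converted value. The sketch relies on the fact that a Parquet DATE is stored as the number of days since 1970-01-01; the function names are hypothetical and this is not the Doris code.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # Parquet DATE counts days from the Unix epoch


def convert_each_value(dictionary, indices):
    """Baseline: run the day-count-to-date conversion once per row."""
    return [EPOCH + timedelta(days=dictionary[i]) for i in indices]


def convert_via_dictionary(dictionary, indices):
    """Assumed optimization: convert each distinct entry once, then look up."""
    converted = [EPOCH + timedelta(days=days) for days in dictionary]
    return [converted[i] for i in indices]


if __name__ == "__main__":
    dictionary = [0, 365, 366]        # distinct day counts in a column chunk
    indices = [2, 0, 0, 1, 2, 2, 1]   # one dictionary index per row
    assert convert_each_value(dictionary, indices) == convert_via_dictionary(
        dictionary, indices
    )
    print(convert_via_dictionary(dictionary, indices)[:2])
```

When a column has far fewer distinct dates than rows, the second version does the conversion work once per distinct value rather than once per row.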
a371e1d4c5 [fix](window_funnel_function) fix upgrade compatibility due to the added field in WindowFunnelState (#22416) 2023-08-01 12:08:55 +08:00
d585a8acc1 [Improvement](shuffle) Accumulate rows in a batch for shuffling (#22218) 2023-08-01 09:55:06 +08:00
5f25b924b3 [opt](conf) Modify brpc eovercrowded conf (#22407)
Make brpc ignore EOVERCROWDED for the data stream sender and the exchange sink buffer
Modify the default value of brpc_socket_max_unwritten_bytes
2023-08-01 08:47:55 +08:00
66e540bebe [Fix](executor)Fix incorrect mem_limit return value type (#22415) 2023-07-31 22:28:41 +08:00
c1f36639fd [fix](sort) VSortedRunMerger does not return any rows with a large offset value (#22191) 2023-07-31 22:28:13 +08:00
89433f6a13 [fix](complex_type) throw error when reading complex types in broker/stream load (#22331)
Check whether the parquet/orc reader encounters complex types in broker/stream load. Broker/stream load casts every type to string, so complex types would be cast incorrectly. This is a temporary measure and will be replaced by tvf.
2023-07-31 22:23:08 +08:00
c25b9071ad [opt](conf) Modify brpc work pool conf default value #22406
The defaults depend on the number of cores. With 32 cores or fewer, the parameters below default to 128, 128, 10240 and 10240 respectively.
With more than 32 cores, they default to core num * 4, core num * 4, core num * 320 and core num * 320 respectively.

brpc_heavy_work_pool_threads
brpc_light_work_pool_threads
brpc_heavy_work_pool_max_queue_size
brpc_light_work_pool_max_queue_size
2023-07-31 20:38:34 +08:00
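The default rule stated in c25b9071ad can be written out as a small function. This is a sketch of the rule as the commit message words it, not the BE source; the function name `brpc_work_pool_defaults` and the `core_num` argument are hypothetical.

```python
def brpc_work_pool_defaults(core_num):
    """Return the default work pool settings for a machine with core_num cores."""
    if core_num <= 32:
        threads, queue_size = 128, 10240
    else:
        threads, queue_size = core_num * 4, core_num * 320
    return {
        "brpc_heavy_work_pool_threads": threads,
        "brpc_light_work_pool_threads": threads,
        "brpc_heavy_work_pool_max_queue_size": queue_size,
        "brpc_light_work_pool_max_queue_size": queue_size,
    }


if __name__ == "__main__":
    for cores in (16, 32, 64):
        print(cores, brpc_work_pool_defaults(cores))
```

Since 32 * 4 = 128 and 32 * 320 = 10240, the two branches agree at 32 cores, so the defaults never drop as the core count grows.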
3b1be39033 [fix](load) load core dump print load id (#22388)
Save the load id to the thread context.
Eventually all task ids (compaction, schema change, etc.) are expected to be saved in the thread context.
2023-07-31 18:29:38 +08:00
7261845b3d [FIX](complex-type)fix complex type nested col_const (#22375)
For array/map/struct, unpack_if_const in mysql_writer only unpacks the column itself, not its nested columns, so col_const should not be applied to nested columns.
2023-07-31 14:53:18 +08:00
147a148364 [refactor](segcompaction) simplify submit_seg_compaction_task interface (#22387) 2023-07-31 13:53:38 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
b64f62647b [runtime filter](profile) add merge time on non-pipeline engine (#22363) 2023-07-31 12:52:42 +08:00
ee754307bb [refactor](load) refactor memtable flush actively (#21634) 2023-07-30 21:31:54 +08:00
79289e32dc [fix](cast) fix wrong result of casting empty string to array date (#22281) 2023-07-30 21:15:03 +08:00
63a9a886f5 [enhance](S3) add s3 bvar metrics for all s3 operation (#22105) 2023-07-30 21:09:17 +08:00
06e4061b94 [enhance](ColdHeatSeparation) carry use path style info along with cold heat separation to support using minio (#22249) 2023-07-30 21:03:33 +08:00
4077338284 [Opt](parquet) opt the performance of date convertion (#22360)
before:
```
mysql>  select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (1.61 sec)
```

after:
```
mysql>  select count(l_commitdate) from lineitem;
+---------------------+
| count(l_commitdate) |
+---------------------+
|           600037902 |
+---------------------+
1 row in set (0.86 sec)
```
2023-07-30 15:54:13 +08:00
e47d1fccf5 [bugfix](be core) fragment executor's destruct method should be called before query context (#22362)
The fragment executor's destructor calls close, which depends on the query context's object pool, because many objects, such as runtime filters, are stored in that pool.
The fragment executor should therefore be destroyed before the query context; otherwise a heap-use-after-free error occurs.
This was fixed in #17675, but it is unclear why that fix is not in master. 1.2-lts does not have this problem.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-07-29 22:41:46 +08:00
765f1b6efe [Refactor](load) Extract load public code (#22304) 2023-07-29 12:56:31 +08:00
47c2cc5c74 [vectorized](udf) java udf support with return map type (#22300) 2023-07-29 12:52:27 +08:00
Pxl
210f6661b4 [Bug](profile) add lock on add_filter_info #22355
Multiple scanners may update the profile at the same time.
2023-07-29 12:45:50 +08:00
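The locking pattern behind 210f6661b4 can be illustrated with a short sketch. Only the name `add_filter_info` comes from the commit title. The `FilterProfile` class, its fields and the `run_scanners` helper are hypothetical, and Python threads stand in for BE scanner threads.

```python
import threading


class FilterProfile:
    """Hypothetical stand-in for a profile shared by several scanners."""

    def __init__(self):
        self._lock = threading.Lock()
        self._filter_info = {}

    def add_filter_info(self, filter_id, info):
        # Hold the lock so concurrent scanners cannot interleave their updates.
        with self._lock:
            self._filter_info.setdefault(filter_id, []).append(info)

    def snapshot(self):
        # Copy under the lock so readers never see a half-applied update.
        with self._lock:
            return {key: list(values) for key, values in self._filter_info.items()}


def run_scanners(profile, scanner_count, updates_per_scanner):
    """Start scanner_count threads that all update the same profile."""

    def scan(scanner_id):
        for i in range(updates_per_scanner):
            profile.add_filter_info(i % 4, f"scanner-{scanner_id}")

    threads = [threading.Thread(target=scan, args=(n,)) for n in range(scanner_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    profile = FilterProfile()
    run_scanners(profile, scanner_count=8, updates_per_scanner=100)
    print(sum(len(values) for values in profile.snapshot().values()))
```

In CPython the interpreter lock already hides many such races, so the sketch shows the shape of the fix rather than reproducing the original bug.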
bc88d34b16 [bug](distinct-agg) fix distinct-agg outblock columns size not equal key size (#22357)
* [imporve](flex) support scientific notation(aEb) parser

* update

* [bug](distinct-agg) fix distinct-agg outblock columns size not equal key size
2023-07-29 12:44:44 +08:00
302de27985 [Refactor] Refactor some code with three-way comparison (#22170)
Refactor some code with three-way comparison
2023-07-29 11:30:15 +08:00
ae8a26335c [opt](hive)opt select count(*) stmt push down agg on parquet in hive . (#22115)
Optimize the "select count(*) from table" statement by pushing the "count" type down to BE.
Supported file types: parquet and orc in hive.

1. 4k files, 60kw (about 600 million) rows
    before:  1 min 37.70 sec
    after:   50.18 sec

2. 50 files, 60kw (about 600 million) rows
    before: 1.12 sec
    after: 0.82 sec
2023-07-29 00:31:01 +08:00
53d255f482 [fix](partial update) remove CHECK on illegal number of partial columns (#22319) 2023-07-28 23:11:58 +08:00
5b14d9fcdc [fix](compaction) fix time series compaction policy corner case (#22238) 2023-07-28 23:07:36 +08:00
0cc3232d6f [Improve](topn opt) modify fetch rpc timeout from 20s to 30s, since fetch is quite heavy sometimes (#22163) 2023-07-28 17:56:18 +08:00
Pxl
f7e0479605 [Chore](refactor) remove some unused code (#22152)
remove some unused code
2023-07-28 17:30:46 +08:00
ec1a4d172b (vertical compaction) fix vertical compaction core (#22275)
* (vertical compaction) fix vertical compaction core
co-author:@zhannngchen
2023-07-28 16:41:00 +08:00
0c734a861e [Enhancement](delete) eliminate reading the old values of non-key columns for delete stmt (#22270) 2023-07-28 14:37:33 +08:00
c2155678ca [fix](functions) fix now(null) crash (#22321)
before: BE crash
now:

mysql [test]>select now(null);
+-----------+
| now(NULL) |
+-----------+
| NULL      |
+-----------+
1 row in set (0.06 sec)
2023-07-28 14:07:56 +08:00