1754 Commits

Author SHA1 Message Date
a6bc9cbe53 [Function] Refactor the function code of log (#8199)
1. Support return null when input is invalid
2. Delete the useless code in vec function

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-24 11:06:58 +08:00
Pxl
90a8ca808a [Bug][Vectorized] fix bitmap_min(empty) not return null (#8190) 2022-02-24 11:06:27 +08:00
9a7931cfed [fix](mem-pool) fix bug that mem pool failed to allocate in ASAN mode (#8216)
Also fix BE ut:

1. fix scheme_change_test memory leak
2. fix mem_pool_test
    Do not use DEFAULT_PADDING_SIZE = 0x10 in mem_pool when running ut.
3. remove plugin_test
2022-02-24 10:52:58 +08:00
0726a43a2a [fix](be-ut) Fix unused-but-set-variable errors. (#8211) 2022-02-23 21:43:15 +08:00
83543c67fe [improvement](storage)Using Be config to switch storage layer vectorization #8166
2022-02-23 20:11:28 +08:00
01fb25a498 [UT] Fix the UT of column_nullable_test (#8180)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-23 15:37:40 +08:00
e3f1efcbbf [Vec][Storage] Support delete condition;ut (#8091)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-02-23 12:48:18 +08:00
d17ed5e27a [vectorization](storage)support seq column in storage layer (#8186)
2022-02-23 12:23:31 +08:00
31ab569c1d [Vectorized][Feature] support some bitmap functions (#8138) 2022-02-23 11:42:16 +08:00
b1e7343532 [Vectorized] [HashJoin] Opt HashJoin Performance (#8119)
Co-authored-by: lihaopeng <happenlee@hotmail.com>
2022-02-23 10:28:16 +08:00
802fcbbb05 (#8162)refactor binary dict
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-22 11:23:54 +08:00
Pxl
87e555c27d [Feature][Vectorized] support function json_array/json_object/json_quote (#8158) 2022-02-22 09:29:56 +08:00
d6aebc0c2c [improvement] make asan work as much as possible (#8148)
* make ASAN poisoning work as much as possible

Before this patch a use after poison is reported like below
==19305==ERROR: AddressSanitizer: unknown-crash on address
0x625000137013 at pc 0x561c44bcf6b8 bp 0x7ffb75a00910 sp 0x7ffb75a000b8

After this patch the use after poison is reported like below
==17782==ERROR: AddressSanitizer: use-after-poison on address
0x625000137033 at pc 0x55633c8f56b8 bp 0x7ff3dc437930 sp 0x7ff3dc43

Before this patch, a false memory usage is reported like below
==33080==AddressSanitizer CHECK failed: ../../../../src/libsanitizer/
asan/asan_allocator.cpp:189 "((old)) == ((kAllocBegMagic))"
2022-02-22 09:29:22 +08:00
6e8d52f3fc [fix](stream-load) fix bug that stream load may be blocked with unqualified data (#8176)
Co-authored-by: morningman <chenmingyu@baidu.com>
2022-02-22 09:26:23 +08:00
47067e40a6 [refactor](common) optimize Status implementation: no dynamic new (#8117) 2022-02-22 09:23:29 +08:00
f13fd13e1b [fix] (schema change) Fix BE crash after schema change int column to varchar column(#8073) (#8142)
Co-authored-by: jianping.teng <tengjp@outlook.com>
2022-02-22 09:22:00 +08:00
d0ee101c2f [refactor] (runtime)tidy up the plan_fragment_executor codes (#8110)
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-22 09:20:27 +08:00
c47368f80c [fix] (udf) fix check_fn and fn_call function name not same (#8132) 2022-02-22 09:18:07 +08:00
16020cbdf9 [fix](lateral-view) Fix bug that explode_json_array_string return unstable result (#8152)
Co-authored-by: morningman <chenmingyu@baidu.com>
2022-02-21 09:38:36 +08:00
409aefdfbf [refactor] add some log when close parquet file (#8144) 2022-02-21 09:36:53 +08:00
826738d97f [docs]Some doc improvements and typo fix (#8153) 2022-02-21 09:36:01 +08:00
56adc7f56b [Bug][vec] Fix bug of nullable const value convert to argument cause coredump (#8139)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-20 20:05:23 +08:00
4926c0bee7 [typo] translate the comments of byte_buffer.h (#8127)
2022-02-19 12:06:35 +08:00
5f50d9ae3b predicate test bugfix (#8134)
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-19 12:05:26 +08:00
0f7a25367d [fix](rowset-meta) Fix bug that rowset meta is not deleted (#8118)
As described in #8120, a large number of rowset meta remain in rocksdb, which may be generated by:

1. drop tablet

    The drop tablet task itself just sets the state of the tablet meta to `SHUTDOWN`
    and moves the tablet to the `_shutdown_tablets` vector. The background thread then
    periodically cleans up the tablets in `_shutdown_tablets` (that's why, even if we execute
    `drop table xx force`, the tablet may be delayed by 10 min to 1 hour before it goes into the trash directory).

    The regular cleanup thread in the background saves the complete tablet meta as a `.hdr` file
    when deleting the tablet, and then moves it to the trash directory along with the data files.

    But this process does not process the rowset meta (before doing the checkpoint of the tablet meta,
    the rowset meta is stored independently in rocksdb as a key-value). So this results in a residual rowset meta.

2. clone task

    The clone task may migrate back and forth between BEs, which may result in a situation
    where the tablet id is the same on the BE, but the tablet uuid is different.
    As a result, some rowset metas cannot find their corresponding tablet, and because no thread
    processes these rowsets, they are eventually left behind as residue.

In this PR, I handled it in the regular cleanup thread with the method `_clean_unused_rowset_metas()`.
I did not delete rowset meta along with "drop tablet" task, because "drop tablet" itself is not a synchronous operation.
It also relies on a background thread to clean up the tablet periodically.
So I put this operation in the background cleanup thread.
2022-02-19 12:00:48 +08:00
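The cleanup rule described in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the BE's C++ code: `RowsetMeta`, `find_unused_rowset_metas`, and the dict-based tablet map are hypothetical names. The only assumption taken from the commit message is that a rowset meta is residual when no live tablet has both the same tablet id and the same tablet uuid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowsetMeta:
    """Hypothetical stand-in for a rowset meta key-value entry."""
    rowset_id: str
    tablet_id: int
    tablet_uid: str


def find_unused_rowset_metas(rowset_metas, live_tablets):
    """Return rowset metas that no live tablet owns.

    live_tablets maps tablet_id -> tablet_uid for tablets still present on
    this BE (an assumed shape). A meta is unused when its tablet is gone
    (drop tablet case) or when the id matches but the uid differs (clone case).
    """
    unused = []
    for meta in rowset_metas:
        if live_tablets.get(meta.tablet_id) != meta.tablet_uid:
            unused.append(meta)
    return unused


metas = [
    RowsetMeta("rs1", 100, "uid-a"),    # owned by a live tablet
    RowsetMeta("rs2", 100, "uid-old"),  # same tablet id, stale uid
    RowsetMeta("rs3", 200, "uid-b"),    # tablet was dropped
]
live = {100: "uid-a"}
stale = find_unused_rowset_metas(metas, live)
```

In this sketch `stale` holds the entries a periodic cleanup pass would delete; the live tablet's own rowset meta is left alone.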
9cb9781d86 [chore](storage) add STORAGE_LAYER_VECTORIZED_SWITCH (#8005)
If you want to test the vectorized storage layer, you need to modify some code to get it working,
which is tedious.

Now you can change a single line (redefine the macro STORAGE_LAYER_VECTORIZED_SWITCH as 1 or 0),
which is more convenient.
2022-02-19 11:47:36 +08:00
50864aca7d [refactor] fix warnings when compile with clang (#8069) 2022-02-19 11:29:02 +08:00
8892780091 [Vectorized][Feature] support agg function percentile&&percentile_approx (#8066) 2022-02-18 13:42:24 +08:00
bcde1f265a [Function][Vectorized] Support least/greatest function (#8107)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:57:07 +08:00
68b24d608f [fix] (vectorization)Fix nullable column compute the hash value error (#8105)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:20:47 +08:00
d383821fd5 [refactor] Remove unused code in data dir (#8092) 2022-02-18 11:14:02 +08:00
31399d5876 [Bug][Vec] Fix the bug of coredump when vec exec engine with delete condition (#8109)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-18 11:09:05 +08:00
b9f0b5565c [refactor](storage) refactor some interfaces of storage layer column (#8064)
1 format binary plain
2 remove batch_set_null_bitmap
3 fix segiter return value
4 set insert_many_binary_data args
2022-02-18 10:54:51 +08:00
936da4f10a [feature](thread-pool) Support thread pool per disk for scanners (#7994)
Support a thread pool per disk for scanners to prevent poor performance caused by disks with high I/O utilization

key point:
1. each disk has a thread pool for scanners
2. whenever a thread pool of one disk runs out of local work, tasks can be retrieved from other threads(disks). This is done round-robin.

performance testing: 
vec version: 25% faster than single thread pool in a high io util disk test case
normal version: 8% faster than single thread pool in a high io util disk test case
2022-02-18 09:40:58 +08:00
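The two key points of the commit above (one pool per disk, round-robin retrieval from other disks when local work runs out) can be modelled with a small single-threaded sketch. Everything here is hypothetical and only illustrates the scheduling order: `PerDiskQueues`, `submit`, and `next_task` are invented names, and real worker threads and locking are left out.

```python
from collections import deque


class PerDiskQueues:
    """Hypothetical model: one task queue per disk, with round-robin stealing."""

    def __init__(self, num_disks):
        self.queues = [deque() for _ in range(num_disks)]

    def submit(self, disk, task):
        """Queue a scanner task on the disk that holds its data."""
        self.queues[disk].append(task)

    def next_task(self, disk):
        """Return the next task for a worker bound to `disk`.

        Local work is preferred. When the local queue is empty, the other
        disks are probed in round-robin order starting after `disk`.
        Returns None when every queue is empty.
        """
        n = len(self.queues)
        for offset in range(n):
            queue = self.queues[(disk + offset) % n]
            if queue:
                return queue.popleft()
        return None


pools = PerDiskQueues(3)
pools.submit(0, "scan-a")
pools.submit(2, "scan-b")
# A worker bound to disk 1 has no local work, so it takes from disk 2, then disk 0.
order = [pools.next_task(1), pools.next_task(1), pools.next_task(1)]
```

The point of the design is that an idle disk's workers help drain a busy disk's queue instead of waiting.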
a162f56284 (test) resolve unit test failed problem for VGenericIteratorsTest
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-02-17 20:03:07 +08:00
bdd78f20c8 [Vectorized][HashJoin] Eliminate hashjoin branch prediction (#8051)
Co-authored-by: jewisliu <jewisliu@tencent.com>
2022-02-17 19:00:26 +08:00
Pxl
e0dbf48682 [Vectorized] [AggFunction] Support group_concat (#8086) 2022-02-17 14:19:07 +08:00
f6e2a4fe16 [Vectorized][Function] Support year/month/week/hour/minute/day/second floor/ceil function (#8068)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-17 14:18:02 +08:00
f8411f3c6a [refactor](mysql_table_writer)split into two parts of vectorized and row mode (#8081) 2022-02-17 11:29:25 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specifies the max number of concurrent compaction tasks on a fast disk (typically SSD),
    so that on high-speed disks we can execute more compaction tasks at the same time
    and compact the data as soon as possible.

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
2022-02-17 10:52:08 +08:00
Pxl
f06c13a828 [feature](vec)(function) support function convert_tz() (#8060) 2022-02-17 10:51:32 +08:00
bef1b55c1f [feature][fix](vec)(function) Fix multi args function call the DATETIME type not effective in DATE type and add the alias function (#8050)
1. Support some function alias of mod/fmod, adddate/add_data
2. Support some function of multi args: week, yearweek
3. Fix bug of multi args function call the DATETIME type not effective in DATE type
2022-02-17 10:49:25 +08:00
53f22bbc14 [fix] fix incorrect serialized_size of TDigest object (#8046) 2022-02-17 10:47:22 +08:00
d1cb2913c1 [improvement] check simd instructions before start (#8042)
Sometimes BE is built on a machine with SIMD instructions such as AVX2,
but the BE binary is then copied to a machine without AVX2. It will crash without any error message.

This PR will check the required SIMD instructions and print error messages during startup.
2022-02-17 10:46:03 +08:00
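The idea of the startup check in the commit above can be illustrated with a small sketch. This is not the BE's implementation: it assumes a Linux x86 `/proc/cpuinfo`-style `flags` line, the function names are hypothetical, and the required instruction list (`sse4_2`, `avx2`) is only an example, since the commit message does not say which instructions are checked.

```python
def parse_cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text (Linux x86 layout)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_simd(required, cpuinfo_text):
    """Return the required instruction sets the CPU does not advertise, sorted."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


# Sample text standing in for the contents of /proc/cpuinfo on the target machine.
sample = "processor\t: 0\nflags\t\t: fpu sse2 sse4_2 avx\n"
missing = missing_simd(["sse4_2", "avx2"], sample)
if missing:
    # Fail early with a readable message instead of crashing on an illegal instruction.
    print("CPU lacks required SIMD instructions: " + ", ".join(missing))
```

On a real machine the text would come from reading `/proc/cpuinfo`, and the process would exit after printing the message.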
0003822da7 [feature](vec) add ColumnHLL to support hll type (#7828) 2022-02-17 10:44:42 +08:00
Pxl
143c4085ee [Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044) 2022-02-16 14:30:13 +08:00
a6bf8c13eb [Feature](Transaction) Support two phase commit (2PC) for stream load (#7473)
The two-phase batch commit means:
During stream load, after the data is written, a message is returned to the client.
The data is invisible at this point and the transaction status is PRECOMMITTED.
The data becomes visible only after the client triggers COMMIT.
    
1. User can invoke the following interface to trigger commit operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc

    
2. User can invoke the following interface to trigger abort operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc
2022-02-16 11:55:04 +08:00
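For clients that do not shell out to curl, the same commit or abort call from the commit above might be built as in the sketch below. It only constructs the request and does not send it. The URL path and the `txn_id` / `txn_operation` headers come from the commit message; the helper name `build_2pc_request` and the host, port, database, and credential values are placeholders. Redirect handling (curl's `--location-trusted`) is left out.

```python
import base64
import urllib.request


def build_2pc_request(host, port, db, txn_id, operation, user, passwd):
    """Build (but do not send) a PUT request for the _stream_load_2pc endpoint."""
    if operation not in ("commit", "abort"):
        raise ValueError("operation must be 'commit' or 'abort'")
    url = f"http://{host}:{port}/api/{db}/_stream_load_2pc"
    req = urllib.request.Request(url, method="PUT")
    req.add_header("txn_id", str(txn_id))
    req.add_header("txn_operation", operation)
    # Basic auth, equivalent to curl's -u user:passwd.
    token = base64.b64encode(f"{user}:{passwd}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req


# Placeholder values for illustration only.
req = build_2pc_request("fe_host", 8030, "db1", 123, "commit", "user", "passwd")
```

Passing the result to `urllib.request.urlopen(req)` would send it; a client would call this with `"commit"` once the load is known to be good, or with `"abort"` to discard the PRECOMMITTED data.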
25d64775d1 [Vectorized][Feature] Support mysql external table insert into stm (#7979) 2022-02-15 14:58:58 +08:00
884fddbf33 [fix](compatibility) Fix compatibility issue of PRowBatch and some tablet sink bugs (#8000)
1. set both `tuple_offsets` and `new_tuple_offsets` in PRowBatch for compatibility
2. set FE config `repair_slow_replica` default to false
   Avoid impacting the load process after upgrading.
   E.g., if there are only 2 replicas and one has a high version count, then after the upgrade
   that replica will be set to bad, so the load process will stop
   because only 1 replica is alive.
3. Fix a bug that NodeChannel may be blocked at `close_wait()`
   The `add_batch_finish` flag was not set after the last rpc finished.
4. Fix a NPE of RoutineLoadScheduler
2022-02-15 11:23:19 +08:00
a390b766d4 [Improvement] BE could print log foreground when not use daemon mode (#8031) 2022-02-14 09:30:12 +08:00