Support delta encoding and rle(bool) to read Glue data
add delta bit pack decoder,
add delta length byte array decoder,
add delta byte array decoder.
add rle bool decoder.
We find some data type is read with delta encoding on AWS Glue, so it should be supported.
The definition of delta encoding can refer to the delta encoding in parquet.
1. add support `CAST AS Struct` from Struct type;
2. fix crash while `CAST('{}' AS Struct)`;
3. `CAST('' AS complext_type)` should return NULL instead of empty object;
---------
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
the input replicate_offsets should be the same size as ColumnArray's offset.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
// [0, begin) [begin, begin + count_sz) [begin + count_sz, size())
// do not need to copy copy counts[n] times do not need to copy
```
we should
The current distribution model for Doris is as follows:
OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block.
This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable.
This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet
Example:
SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT, NULL AS PROCEDURE_SCHEM,ROUTINE_NAME AS PROCEDURE_NAME,NULL AS NUM_INPUT_PARAMS,NULL AS NUM_OUTPUT_PARAMS,NULL AS NUM_RESULT_SETS,ROUTINE_COMMENT AS REMARKS,IF(ROUTINE_TYPE = 'FUNCTION', 2,IF(ROUTINE_TYPE= 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA = DATABASE();
ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter
This wrong and some BI tools could not work correctly.
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:
Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:
Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
* (merge-on-write) when if publish and be down, need recalc delete bitmap for MoW
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
* fix code
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
---------
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector
For other algorithms, an error should be reported when there is no init vector
Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.
Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt
Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
The _src_block_mem_reuse variable actually not work, since the _src_block is cleared each time when we call get_block.
But current code may cause core dump, see issue #17587. Because we insert some result column generated by expr into dest block, and such a column holds a pointer to some column in original schema. When clearing the data of _src_block, some column's data in dest block is also cleared.
e.g. coalesce will return a result column which holds a pointer to some original column, see issue #17588
Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.
This PR fix logic of bitmap_or and bitmap_or_count.
Other count related funcitons should also be checked and fix, they will be fixed in another PR.
Fix MacOS mem_limit parse result is 0.
Fix GC after env Init, otherwise, when the memory is insufficient, BE will start failure.
*** Query id: 0-0 ***
*** Aborted at 1677833773 (unix time) try "date -d @1677833773" if you are using GNU date ***
*** Current BE git commitID: 8ee5f45 ***
*** SIGSEGV address not mapped to object (@0x70) received by PID 24145 (TID 0x7fa53c9fd700) from PID 112; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at be/src/common/signal_handler.h:420
1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
4# 0x00007FA56295A400 in /lib64/libc.so.6
5# doris::MemTrackerLimiter::log_process_usage_str(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:208
6# doris::MemTrackerLimiter::print_log_process_usage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:226
7# doris::Daemon::memory_maintenance_thread() at be/src/common/daemon.cpp:245
8# doris::Thread::supervise_thread(void*) at be/src/util/thread.cpp:455
9# start_thread in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
1. change PipelineTaskState to enum class
2. remove some row-based code on FoldConstantExecutor::_get_result
3. reduce memcpy on minmax runtime filter function(Now we can guarantee that the input data is aligned)
4. add Wunused-template check, and remove some unused function, change some static function to inline function.
the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>