Commit Graph

180 Commits

Author SHA1 Message Date
ba882dea21 [pipelineX](dependency) Build DAG between pipelines (#23355) 2023-08-23 13:21:32 +08:00
Pxl
8ed4045df9 [Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842) 2023-08-23 08:37:40 +08:00
dcd6c3c022 [pipelineX](refactor) propose a new pipeline execution model (#22562) 2023-08-21 15:38:45 +08:00
Pxl
477961dc21 [Chore](agg) refactor of hash map (#22958)
refactor of hash map
2023-08-18 17:59:30 +08:00
Pxl
d371101bfd [Improvement](aggregation) make fixed hashmap's bitmap_size flexable (#22573)
make fixed hashmap's bitmap_size flexable
2023-08-14 10:47:06 +08:00
4e880288c6 [refactor]use clear concept to replace std::enable_if_t (#22801)
---------

Signed-off-by: flynn <fenglv15@mails.ucas.ac.cn>
2023-08-12 15:10:30 +08:00
30ceb7aea7 [fix](chore] need to remove reference in assert_cast (#22706) 2023-08-08 20:36:05 +08:00
Pxl
f5e3cd2737 [Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327)
optimization for aggregation hash_table_lazy_emplace
2023-08-02 11:50:21 +08:00
f2919567df [feature](datetime) Support timezone when insert datetime value (#21898) 2023-07-31 13:08:28 +08:00
302de27985 [Refactor] Refactor some code with three-way comparison (#22170)
Refactor some code with three-way comparison
2023-07-29 11:30:15 +08:00
Pxl
9451382428 [Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns (#22216)
optimization for AggregationMethodKeysFixed::insert_keys_into_columns
2023-07-26 16:19:15 +08:00
1f3de0eae3 [fix](memory) fix invalid large memory check && fix memory info thread safety (#22027)
fix invalid large memory check
fix memory info thread safety
2023-07-26 12:18:31 +08:00
6875ef4b8b [refactor](mem_reuse) refactor mem_reuse in MutableBlock (#21564) 2023-07-20 22:53:19 +08:00
a9ea138caf [fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899)
When enable two level hash table , if there is zero value in the existing one level hash table, it will cause dead loop when converting to two level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly when doing the converting.
2023-07-18 19:50:30 +08:00
4b30485d62 [improvement](memory) Refactor doris cache GC (#21522)
Abstract CachePolicy, which controls the gc of all caches.
Add stale sweep to all lru caches, including page caches, etc.
I0710 18:32:35.729460 2945318 mem_info.cpp:172] End Full GC Free, Memory 3866389992 Bytes. cost(us): 112165339, details: FullGC:
  FreeTopMemoryQuery:
     - CancelCostTime: 1m51s
     - CancelTasksNum: 1
     - FindCostTime: 0.000ns
     - FreedMemory: 2.93 GB
  WorkloadGroup:
  Cache name=DataPageCache:
     - CostTime: 15.283ms
     - FreedEntrys: 9.56K
     - FreedMemory: 691.97 MB
     - PruneAllNumber: 1
     - PruneStaleNumber: 1
2023-07-11 20:21:31 +08:00
4d17400244 [profile](join) add collisions into profile (#21510) 2023-07-06 14:30:10 +08:00
38c8657e5e [improve](memory) more grace logging for memory exceed limit (#21311)
more grace logging for Allocator and MemTracker when memory exceed limit
fix bthread grace exit.
2023-07-05 14:59:06 +08:00
790b771a49 [improvement](execute) Eliminate virtual function calls when serializing and deserializing aggregate functions (#21427)
Eliminate virtual function calls when serializing and deserializing aggregate functions.

For example, in AggregateFunctionUniq::deserialize_and_merge method, calling read_pod_binary(ref, buf) in the for loop generates a large number of virtual function calls.

void deserialize_and_merge(AggregateDataPtr __restrict place, BufferReadable& buf,
                           Arena* arena) const override {
    auto& set = this->data(place).set;
    UInt64 size;
    read_var_uint(size, buf);
    set.rehash(size + set.size());
    for (size_t i = 0; i < size; ++i) {
        KeyType ref;
        read_pod_binary(ref, buf);
        set.insert(ref);
    }
}

template <typename Type>
void read_pod_binary(Type& x, BufferReadable& buf) {
    buf.read(reinterpret_cast<char*>(&x), sizeof(x));
}
BufferReadable has only one subclass, VectorBufferReader, so it is better to implement the BufferReadable class directly.

The following sql was tested on SSB-flat dataset:

SELECT COUNT (DISTINCT lo_partkey), COUNT (DISTINCT lo_suppkey) FROM lineorder_flat;
before: MergeTime: 415.398ms
after opt: MergeTime: 174.660ms
2023-07-04 09:26:37 +08:00
25b5bab22d [fix](memory) Fix hash table buf initialize null pointer (#21315)
When compiling FunctionArrayEnumerateUniq::_execute_by_hash, AllocatorWithStackMemory::free(buf)
will be called when delete HashMapContainer. the gcc compiler will think that size > N and buf is not heap memory,
and report an error ' void free(void*)' called on unallocated object 'hash_map'

This only fails on doris docker + gcc 11.1, no problem on doris docker + clang 16.0.1,
no problem on ldb_toolchanin gcc 11.1 and clang 16.0.1.
2023-06-30 14:50:53 +08:00
0396f78590 [fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259) 2023-06-28 16:10:24 +08:00
609410d82b [opt](hashmap) memset the hashmap memory to improve performance (#21225) 2023-06-27 19:30:57 +08:00
c470bf56a5 [chore](build) Fix compilation errors reported by GCC-13 (#21215)
Add missing headers to fix the compilation errors reported by GCC-13.
2023-06-27 17:04:44 +08:00
e0b20f0437 [feature](function) add ip function ipv4numtostring (alias inet_ntoa) (#20936) 2023-06-27 10:17:40 +08:00
50c1d55769 [Improve](dynamic schema) support filtering invalid data (#21160)
* [Improve](dynamic schema) support filtering invalid data

1. Support dynamic schema to filter illegal data.
2. Expand the regular expression for ColumnName to support more column names.
3. Be compatible with PropertyAnalyzer and support legacy tables.
4. Default disable parse multi dimenssion array, since some bug unresolved
2023-06-26 19:32:43 +08:00
2c9bdd64fa [fix](memory) arena support memory reuse after clear() (#21033) 2023-06-21 23:27:21 +08:00
84b97860a1 [fix](memory) Fix memory exceed limit and query has been canceled, Allocator will block 100ms (#20959) 2023-06-21 17:35:19 +08:00
ca6f51fcd5 [Performance] disable mmap alloc for doris performance (#21034)
disable mmap alloc for some benchmark
2023-06-20 23:27:49 +08:00
6d579d924d [fix](profile) delete useless profile add_child #20989 2023-06-20 23:21:52 +08:00
08fff8923f [improvement](serde) Optimizing the performance of mysql result writter (#20928)
When converting query results into MySQL format, it involves transforming columnar data storage into row-based storage. This process raises the question of choosing between sequential reading and sequential writing. In reality, sequential writing is the better choice for performance optimization.

Test with 9M rows contains more than 20 columns, this patch can reduce the conversion time from 20s to 11s.
2023-06-19 12:29:01 +08:00
31a4f96f01 [refactor](exprcontext) move close to expr context's dector method (#20747)
The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.
2023-06-14 18:01:07 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
Pxl
7415135ad4 [Enchancement](execute) make assert_cast can output derived class name (#20212)
before:
F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const*

after:
F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*
2023-05-30 20:23:04 +08:00
Pxl
d1d0d9e5e8 [Chore](build) adjust some compile diagnostic (#20162) 2023-05-29 19:19:01 +08:00
Pxl
8376e5eefb [Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118)
add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array
2023-05-29 14:42:47 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
a928b21434 [improvement](exception-safe) sort node is completely exception safe #20041 2023-05-26 18:29:02 +08:00
53ae24912f [vectorized](feature) support partition sort node (#19708) 2023-05-25 11:22:02 +08:00
cf7a74f6ec [fix](memory) query check cancel while waiting for memory in Allocator, and optimize log (#19967)
After the query check process memory exceed limit in Allocator, it will wait up to 5s.
Before, Allocator will not check whether the query is canceled while waiting for memory, this causes the query to not end quickly.
2023-05-24 11:08:48 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator.

Currently page body compress will catch memory alloc failure exception
2023-05-18 15:00:49 +08:00
8fd1eb0d1e [minor](hash table) parameterize hash table (#19653) 2023-05-17 09:58:26 +08:00
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
58cb404661 [fix](memory) Allocator throws Exception instead of std::bad_alloc (#19285)
W0505 01:31:25.840227 1727715 scanner_scheduler.cpp:340] Scan thread read VScanner failed: [MEM_LIMIT_EXCEEDED]PreCatch error code:11, [E11] Allocator sys memory check failed: Cannot alloc:16384, consuming tracker:<Orphan>, exec node:<>, process memory used 5.87 GB exceed limit 5.64 GB or sys mem available 252.17 GB less than low water mark 1.60 GB, failed alloc size 16.00 KB.
    @     0x555c19e0cca8  doris::Exception::Exception()
    @     0x555c1c3e0c3f  Allocator<>::sys_memory_check()
    @     0x555c1c3e1052  Allocator<>::memory_check()
    @     0x555c19e0a645  Allocator<>::alloc()
    @     0x555c1c34508b  COWHelper<>::create<>()
    @     0x555c1e23f574  doris::vectorized::ConvertThroughParsing<>::execute<>()
    @     0x555c1e23f209  doris::vectorized::FunctionConvertFromString<>::execute_impl()
    @     0x555c1e23f4aa  doris::vectorized::FunctionConvertFromString<>::execute_impl()
    @     0x555c1e15ac29  doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns()
    @     0x555c1e15ac56  doris::vectorized::PreparedFunctionImpl::execute()
    @     0x555c1e245276  _ZNSt17_Function_handlerIFN5doris6StatusEPNS0_15FunctionContextERNS0_10vectorized5BlockERKSt6vectorImSaImEEmmEZNKS4_12FunctionCast14create_wrapperINS4_14DataTypeNumberIiEEEESt8functionISC_ERKSt10shared_ptrIKNS4_9IDataTypeEEPKT_bEUlS3_S6_SB_mmE_E9_M_invokeERKSt9_Any_dataOS3_S6_SB_OmSY_
    @     0x555c1e2a9341  _ZZNK5doris10vectorized12FunctionCast23prepare_remove_nullableEPNS_15FunctionContextERKSt10shared_ptrIKNS0_9IDataTypeEES9_bENKUlS3_RNS0_5BlockERKSt6vectorImSaImEEmmE_clES3_SB_SG_mm
    @     0x555c1e2a8d42  _ZNSt17_Function_handlerIFN5doris6StatusEPNS0_15FunctionContextERNS0_10vectorized5BlockERKSt6vectorImSaImEEmmEZNKS4_12FunctionCast23prepare_remove_nullableES3_RKSt10shared_ptrIKNS4_9IDataTypeEESJ_bEUlS3_S6_SB_mmE_E9_M_invokeERKSt9_Any_dataOS3_S6_SB_OmSQ_
    @     0x555c1e20e42b  doris::vectorized::PreparedFunctionCast::execute_impl()
    @     0x555c1e15ac29  doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns()
    @     0x555c1e15ac56  doris::vectorized::PreparedFunctionImpl::execute()
    @     0x555c1d63e960  doris::vectorized::IFunctionBase::execute()
    @     0x555c1d628700  doris::vectorized::VCastExpr::execute()
    @     0x555c1d6163e5  doris::vectorized::VExprContext::execute()
    @     0x555c20a83fe1  doris::vectorized::VFileScanner::_convert_to_output_block()
    @     0x555c20a809af  doris::vectorized::VFileScanner::_get_block_impl()
    @     0x555c209b9bc4  doris::vectorized::VScanner::get_block()
    @     0x555c209b1a50  doris::vectorized::ScannerScheduler::_scanner_scan()
    @     0x555c209b2ac1  _ZNSt17_Function_handlerIFvvEZZN5doris10vectorized16ScannerScheduler18_schedule_scannersEPNS2_14ScannerContextEENK3$_0clEvEUlvE1_E9_M_invokeERKSt9_Any_data
    @     0x555c1a8378cf  doris::ThreadPool::dispatch_thread()
    @     0x555c1a830fac  doris::Thread::supervise_thread()
    @     0x7f461faa117a  start_thread
    @     0x7f462033bdf3  __GI___clone
    @              (nil)  (unknown)
2023-05-05 18:01:48 +08:00
e17a171a3c [fix](vertical_compaction) Fix continuous_agg_count PODArray wrong boundary judgment #19187 2023-05-04 14:50:30 +08:00
1379d7f3e0 [fix](memory) mmap threshold can be modified in conf, Increase to 128M 2023-04-28 18:17:22 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
make vexprecontext,vexpr,function,query context,runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
0c95d760fe [fix](fixed_hashtable) The incorrect implementation of copy constructor (#18921) 2023-04-24 08:36:52 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
c4e469c82c [feature](agg) Support spill to disk in aggregation (#18051) 2023-04-20 18:59:08 +08:00