Commit Graph

776 Commits

Author SHA1 Message Date
249b688663 [chore](github) Add a workflow to check BE UT on macOS (#14506) 2022-11-23 08:38:28 +08:00
b04ec41c1d [Vectorized](udaf) fix java-udaf couldn't get jar core dump (#14393)
fix java-udaf couldn't get jar core dump
2022-11-22 20:49:02 +08:00
3e1e8db173 [fix](exec) fix thread token shutdown (#14418)
Fix Thread pool token was shut down error.
This is because when there are more than 1 fragment of a query on one BE, the thread token maybe
reset incorrectly, causing thread token shutdown earlier.
cherry-pick from master
Introduced from #13021
2022-11-20 00:04:48 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
1f326fc0d6 [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321)
* [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node

* use only one chunk memory when serializing keys in agg node
2022-11-18 12:31:52 +08:00
bd5a593403 [enhancement](memtracker) Use proc/meminfo MemAvailable to control memory and optimize MemTracker log printing (#14335) 2022-11-17 22:46:07 +08:00
a382bb95e7 [fix](runtimefilter) fix heap-user-after-free of runtime filter merge (#14362) 2022-11-17 19:38:45 +08:00
333c6390ee [fix](be-ut) AddressSanitizer detects container-overflow issues (#14255)
* [chore] Fix the container-overflow errors detected by address sanitizer

* Fix compilation errors
2022-11-15 15:49:55 +08:00
cffdeff4ec [fix](memory) Fix memory leak by calling boost::stacktrace (#14269)
boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace.
The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace.
refer to:
boostorg/stacktrace#118
boostorg/stacktrace#111
2022-11-15 08:58:57 +08:00
3bc26f773d [hotfix](memtracker) Fix expired DCHECK(_limit != -1); and segment_meta_mem_tracker inelegant end (#14223) 2022-11-13 17:15:29 +08:00
72748c229a update (#14215) 2022-11-13 12:31:42 +08:00
33b50860c7 [improvement](load) release load channel actively when error occurs (#14218) 2022-11-13 12:31:15 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
7682c08af0 [improvement](load) reduce memory in batch for small load channels (#14214) 2022-11-12 22:14:01 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
10df61b5bf [improvement](join) Share hash table in fragments for broadcast join (#13921) 2022-11-10 09:48:34 +08:00
0b945fe361 [enhancement](memtracker) Refactor mem tracker hierarchy (#13585)
mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc.

type includes

enum Type {
        GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan
        QUERY = 1,         // Count the memory consumption of all Query tasks.
        LOAD = 2,          // Count the memory consumption of all Load tasks.
        COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
        SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
        CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
        BATCHLOAD = 6,  // Count the memory consumption of all EngineBatchLoadTask.
        CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask.
    }
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.

other fix:

In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.
2022-11-08 09:52:33 +08:00
d1cbaa1de8 [fix](load) fix a bug that reduce memory work on hard limit might be triggered twice (#13967)
When the load mem hard limit reached, all load channel should wait on the lock of LoadChannelMgr, util current reduce mem work finished. In current implementation, there's a bug might cause some threads be woke up before reduce mem work finished:

thread A found that soft limit reached, picked a load channel and waiting for reduce memory work finish.
The memory keep increasing
thread B found that hard limit reached (either the load mem hard limit, or process soft limit), it picked a load channel to reduce memory and set the variable _should_wait_flush to true
thread C found that _should_wait_flush is true, waiting on _wait_flush_cond
thread A finished it's reduce memory work, found that _should_wait_flush is true, set it to false, and notify all threads.
thread C is woke up and pick a load channel to do the reduce memory work, and now thread B's work is not finished.
We can see 2 threads doing reduce memory work when hard limit reached, it's quite confusing.
2022-11-08 00:07:52 +08:00
32fea672b0 [chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-07 13:39:52 +08:00
c7b2b90504 [fix](memtracker) Fix DCHECK !std::count(_consumer_tracker_stack.begin(), _consumer_tracker_stack.end(), tracker) 2022-11-06 16:41:03 +08:00
27549564a7 [feature](table-valued-function) Support S3 tvf (#13959)
This pr does three things:

1. Modified the framework of table-valued-function(tvf).
2. be support `fetch_table_schema` rpc.
3. Implemented `S3(path, AK, SK, format)` table-valued-function.
2022-11-06 11:04:26 +08:00
2ee7ba79a8 [Improvement](javaudf) improve java loader usage (#13962) 2022-11-05 13:20:04 +08:00
f87be09d69 [fix](load) Fix load channel mgr lock (#13960)
hot fix load channel mgr lock
2022-11-05 00:48:30 +08:00
698541e58d [improvement](exec) add more debug info on fragment exec error (#13899) 2022-11-04 08:55:31 +08:00
32a029d9dc [enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795) 2022-11-03 09:47:12 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operation with new scan framework. eg:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert

Do not support non-literal value in insert stmt
Fix some issue about array type:

Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
2022-11-03 08:36:07 +08:00
7db916fc85 [enhancement](metric)Add metric for exec_state prepare function (#13646)
* add bvar metric for exec_state prepare function
2022-11-01 14:09:47 +08:00
Pxl
711dad28fb [Chore](unused) remove QSorter #13769 2022-10-31 08:44:39 +08:00
9f7c76a0d6 [fix](memtracker) Fix the usage of bthread mem tracker (#13708)
bthead context init has performance loss, temporarily delete it first, it will be completely refactored in #13585.
2022-10-30 19:51:00 +08:00
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
add runtime filter breaking change adapt
2022-10-28 10:59:28 +08:00
3c95106d45 [Bug](jdbc) Fix memory leak for JDBC datasource (#13657) 2022-10-27 00:02:25 +08:00
0134e9d2f4 [Improvement](runtime filter) Reduce merging time for bloom filter (#13668) 2022-10-27 00:02:05 +08:00
295d887cf5 [improvement](thread) set name for priority thread pool (#13552) 2022-10-26 09:32:15 +08:00
c24e5585c3 [fix](load) clear and notify when an error happens in flushing (#13589) 2022-10-25 13:39:17 +08:00
2cf89c55c2 [chore](macOS) Fix issues found on macOS x86_64 (#13583)
1. Use `brew --prefix` instead of `brew --repo` in scripts.
2. `sprintf` is marked as a deprecated function in MacOSX sdk (13.0).
2022-10-24 20:59:20 +08:00
e5b0ca4e8c [fix](streamload) report exactly error message when be does not receive heartbeat from fe (#13518)
* [fix](streamload) report exactly error message when be does not receive heartbeat from fe

Http service is started before hearbeat from fe, so if a streamload comes before heartbeat
from fe, then be report an error like below because master address is not set.
"Couldn't open transport for :0 (Could not resolve host for client socket.)".
2022-10-24 11:50:24 +08:00
3006b258b0 [Improvement](bloomfilter) allocate memory for BF in open phase (#13494) 2022-10-21 17:37:26 +08:00
ccc04210d6 [feature](jsonb type) functions for cast from and to jsonb datatype (#13379) 2022-10-21 15:18:16 +08:00
9dc5dd382a [enhancement](memtracker) Fix Brpc mem count and refactored thread context macro (#13469) 2022-10-21 12:01:38 +08:00
9a3c1f0867 [Improvement](decimal) print decimal according to the real precision and scale (#13437) 2022-10-21 10:00:01 +08:00
1b0dafcaa1 [Enhancement](load) consider memtable in flush while reducing load me… (#13480)
We should consider memory which are being flushed from memtable to disk when trying to reduce memory by flushing memtable. Otherwise, we might not release memory space as expected. (e.g. lots of large memtable is in flush, the reduce_mem_usage method picks some small memtables to flush, it can't release enough memory and also can generate lots of small segments, which can cause -238 error)
2022-10-21 08:35:35 +08:00
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
change extract_url use stringvalue to repalce std::string to speed up
2022-10-21 08:33:39 +08:00
d2be5096d6 [Revert](mem) revert the mem config cause perfermace degradation (#13526)
* Revert "[fix](mem) failure of allocating memory (#13414)"

This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801.

* Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)"

This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.
2022-10-21 08:32:16 +08:00
736d113700 [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528 2022-10-21 08:30:30 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null. 
6. Remove `check_array_format()`, because it's logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
2022-10-20 15:52:31 +08:00
Pxl
1892e8f66e [Enhancement](scanner) support split avg key range (#13166) 2022-10-20 14:53:16 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
3a2d5db914 [fix](String) fix string type length set to -1 when load stirng data (#13475)
string type length may set to -1 when create TypeDescriptor from thrift or protobuf, this will cause check limit overflow
2022-10-20 08:45:25 +08:00