Commit Graph

486 Commits

Author SHA1 Message Date
7873bc95a6 [Enhancement](bitmapfilter) Support bitmap filter to apply zone_map index to filter pages (#14635) 2022-12-01 10:41:09 +08:00
Pxl
82da071b45 [Chore](format) update clang-format version to 15 (#13036)
update clang-format version to 15
2022-11-29 14:46:10 +08:00
e1f0fa069c [enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580) 2022-11-29 10:15:25 +08:00
7513c82431 [NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608) 2022-11-29 08:55:54 +08:00
ed92a8f81e [feature](jsonb function)change jsonb_extract_string behavior and doc (#14619)
1. change jsonb_extract_string behavior: convert to string instead of NULL if the type of json path is not string
2. move jsonb tutorial doc to JSONB data type
2022-11-28 11:36:54 +08:00
ff197b0fa5 [chore](macOS) Fix linker errors (#14410) 2022-11-21 10:38:36 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
21416f9947 [enhancement](memory) Support Jemalloc metrics and default allocator changed to Jemalloc (#14384) 2022-11-18 21:02:54 +08:00
2b6f85ab96 [chore](macOS) Fix BE UT (#14307)
#13195 left some unresolved issues. One of them is that some BE unit tests fail.
This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.
2022-11-18 10:13:38 +08:00
bd5a593403 [enhancement](memtracker) Use proc/meminfo MemAvailable to control memory and optimize MemTracker log printing (#14335) 2022-11-17 22:46:07 +08:00
333c6390ee [fix](be-ut) AddressSanitizer detects container-overflow issues (#14255)
* [chore] Fix the container-overflow errors detected by address sanitizer

* Fix compilation errors
2022-11-15 15:49:55 +08:00
cffdeff4ec [fix](memory) Fix memory leak by calling boost::stacktrace (#14269)
boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace.
The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace.
refer to:
boostorg/stacktrace#118
boostorg/stacktrace#111
2022-11-15 08:58:57 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
12652ebb0e [UDF](java udf) using config to enable java udf instead of macro at compile time (#14062)
* [UDF](java udf) useing config to enable java udf instead of macro at compile time
2022-11-11 09:03:52 +08:00
724cf1cdb8 [chore][build] add instructions to build version string (#14067) 2022-11-10 16:23:34 +08:00
291fa499e9 [fix](JSON) Fail to parse JSONPath (libc++) (#13941) 2022-11-09 08:58:01 +08:00
f7ecb6d79f [Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978)
fix sub_bitmap calculate wrong result to return null
2022-11-08 14:10:12 +08:00
0b945fe361 [enhancement](memtracker) Refactor mem tracker hierarchy (#13585)
mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc.

type includes

enum Type {
        GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan
        QUERY = 1,         // Count the memory consumption of all Query tasks.
        LOAD = 2,          // Count the memory consumption of all Load tasks.
        COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
        SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
        CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
        BATCHLOAD = 6,  // Count the memory consumption of all EngineBatchLoadTask.
        CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask.
    }
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.

other fix:

In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.
2022-11-08 09:52:33 +08:00
95591ce49a [refactor](cv)wait on condition variable more gently (#12620) 2022-11-08 08:40:31 +08:00
32fea672b0 [chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-07 13:39:52 +08:00
554f566217 [enhancement](compaction) introduce segment compaction (#12609) (#12866)
## Design

### Trigger

Every time when a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job for a single rowset at a time to ensure no recursing/queuing nightmare.

### Target Selection

We collect segments during every trigger. We skip big segments whose row num > M (e.g. 10000) coz we get little benefits from compacting them comparing our effort. Hence, we only pick the 'Longest Consecutive Small" segment group to do actual compaction.

### Compaction Process

A new thread pool is introduced to help do the job. We submit the above-mentioned 'Longest Consecutive Small" segment group to the pool. Then the worker thread does the followings:

- build a MergeIterator from the target segments
- create a new segment writer
- for each block readed from MergeIterator, the Writer append it

### SegID handling

SegID must remain consecutive after segment compaction. 

If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:

- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1

It is worth noticing that we should wait inflight segment compaction tasks to finish before building rowset meta and committing this txn.
2022-11-04 14:12:51 +08:00
948e080b31 [minor](error msg) Fix wrong error message (#13950) 2022-11-04 13:49:46 +08:00
5fe3342aa3 [Vectorized](function) support bitmap_to_array function (#13926) 2022-11-03 14:29:28 +08:00
cc0fa5fef6 [fix](array-type) fix the be core dump when import array<largeint> (#13821)
- this pr is used to fix the be core dump when import array.
- before the change, we import array by rapidjson string will core dump under the non-vectorized scenario.
- after the change, we can import array by rapidjson string successfully.
2022-10-31 22:08:55 +08:00
Pxl
57a9b0fa65 [Enhancement](chore) remove unused diagnostic (#12337)
remove unused diagnostic
2022-10-31 19:19:13 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
2ef8f3f6f4 [enhancement](java-udf) Support loading libjvm at runtime (#13660) 2022-10-28 08:45:12 +08:00
20363edc73 [BugFix](function) fix reverse function dynamic buffer overflow due to illegal character (#13671)
Previous logic of reverse function might not be strong enough to handle illegal character. For example, one one byte size character would be mistaken as one utf-8 character which occupies more than one byte space. And unfortunately exceeding the buffer space during future process.
2022-10-28 08:44:08 +08:00
f51464af59 [chore](macOS) Support Java UDF (#13714) 2022-10-28 08:40:56 +08:00
bad950136d [chore](build) Pass the compile flag -Wno-unused-but-set-variable on demand (#13716)
There are some issues with the compile flag `-Wno-unused-but-set-variable` for clang.
1. `-Wno-unused-but-set-variable` should be set when building source by clang-15 on Linux. (#13000 #13016)
2. On macOS Monterey, Apple Clang 13 may treat it as a unknown warning option and the compilation process may interrupt.

This PR introduces a better way to make this compile flag more portable.
1. Test whether the compiler recognizes this flag.
2. Add this flag if the compiler recognizes it.
2022-10-27 15:18:28 +08:00
d388de6c11 [Enhancement](threadpool) print thread pool name on error (#13706) 2022-10-27 10:49:18 +08:00
ffcb2f8525 [opt](exec) Replace get_utf8_byte_length function by array (#13664) 2022-10-27 09:46:41 +08:00
3c95106d45 [Bug](jdbc) Fix memory leak for JDBC datasource (#13657) 2022-10-27 00:02:25 +08:00
295d887cf5 [improvement](thread) set name for priority thread pool (#13552) 2022-10-26 09:32:15 +08:00
ccc04210d6 [feature](jsonb type) functions for cast from and to jsonb datatype (#13379) 2022-10-21 15:18:16 +08:00
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
change extract_url use stringvalue to repalce std::string to speed up
2022-10-21 08:33:39 +08:00
d2be5096d6 [Revert](mem) revert the mem config cause perfermace degradation (#13526)
* Revert "[fix](mem) failure of allocating memory (#13414)"

This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801.

* Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)"

This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.
2022-10-21 08:32:16 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
755a946516 [feature](jsonb) jsonb functions (#13366)
Issue Number: Step3 of DSIP-016: Support JSON type
2022-10-19 08:44:08 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
a5f3880649 [improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)
disable page cache by default
disable chunk allocator by default
not use chunk allocator for vectorized allocator by default
add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
2022-10-15 17:27:17 +08:00
de4315c1c5 [feature](function) support initcap string function (#13193)
support `initcap` string function
2022-10-13 21:31:44 +08:00
9b590ac4cb [improvement](olap) cache value of has_null in ColumnNullable (#13289) 2022-10-13 09:12:02 +08:00
bb4414e303 [feature-wip](multi-catalog) optimize parquet profile & add null map timer (#13257)
Use indentation to make `ParquetReader`'s profile more readable
Add `ParquetReader.DecodeNullMapTime` to show the time of parsing `NullMap` for `NullableColumn`

```
VFILE_SCAN_NODE  (id=0):(Active:  279.62ms,  %  non-child:  85.83%)
    -  FileReadBytes:  2.36  MB
    -  FileReadCalls:  20
    -  FileReadTime:  5.686ms
    -  MaxScannerThreadNum:  1
    -  NewlyCreateFreeBlocksNum:  125
    -  NumScanners:  1
    -  ParquetReader:  0ns
        -  ColumnReadTime:  259.946ms
        -  DecodeDictTime:  0ns
        -  DecodeHeaderTime:  437.707us
        -  DecodeLevelTime:  30.101us
        -  DecodeNullMapTime:  53.295ms
        -  DecodeValueTime:  62.607ms
        -  DecompressCount:  511
        -  DecompressTime:  1.159ms
        -  FilteredBytes:  0.00  
        -  FilteredGroups:  0
        -  FilteredRowsByGroup:  0
        -  FilteredRowsByPage:  0
        -  ParseMetaTime:  22.517ms
        -  ReadBytes:  2.36  MB
        -  ReadGroups:  20
```
2022-10-12 16:51:06 +08:00
1bd14f1d82 [feature-wip](jsonb) jsonb parse function and load (#13129)
add function to parse json string to jsonb format and use it to support stream load.
2022-10-12 13:56:37 +08:00
Pxl
bdcb600f3d [Bug](load) fix core dump on big block load (#13014) 2022-10-10 12:38:32 +08:00
Pxl
245490d6b7 [Enhancement](runtime filter) optimize for runtime filter (#12856)
optimize for runtime filter
2022-10-09 14:11:03 +08:00
b14b178928 [enhancement](memory) Trigger load channel flush based on process physical memory to avoid OOM #12960
When the physical memory of the process reaches 90% of the mem limit, trigger the load channel mgr to brush down
The default value of be.conf mem_limit is changed from 90% to 80%, and stability is the priority.
Fix deadlock in arena_locks in BufferPool::BufferAllocator::ScavengeBuffers and _lock in DebugString
2022-09-27 09:07:38 +08:00
Pxl
8731eea26e [Chore](clang) fix some build fail on clang15 (#12882)
remove unused variables
2022-09-26 23:13:28 +08:00