Commit Graph

38 Commits

Author SHA1 Message Date
Pxl
a15a0b9193 [Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461)
use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp
2023-06-08 19:36:21 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
943e5fb7e5 [improvement](MOW) use seperated cache for mow pk cache (#19686)
In mow, primary key cache have a big impact on load performance, so we add a new cache type to seperate
it from page cache to make it more flexible in some cases
2023-05-18 13:27:09 +08:00
a05dbd3f81 [chore](compile) Improves PCH cache hit ratio (#19469)
Supplement the documentation of be-clion-dev, avoid the problem of undefined DORIS_JAVA_HOME and inability to find jni.h when using clion development without directly compiling through build.sh
Complete the classification of header files in pch.h and introduce some header files that are not frequently modified in doris.
Separate the declaration and definition in common/config.h. If you need to modify the default configuration now, please modify it in common/config.cpp.
gen_cpp/version.h is regenerated every time it is recompiled, which may cause PCH to fail, so now you need to get the version information indirectly rather than directly.
2023-05-10 12:49:01 +08:00
16a394da0e [chore](build) Use include-what-you-use to optimize includes (PART III) (#18958)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-24 14:51:51 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
make vexprecontext,vexpr,function,query context,runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
Pxl
c9b4eaea76 [Chore](storage) change FieldType to enum class #18500 2023-04-10 08:53:44 +08:00
d9fe5f7b67 [enhancement](memory) Remove MemPool and replace it with Arena (#17820)
Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not.

Some comparisons between MemPool and Arena:

 1. Expansion
     Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression;
     MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk
     
     After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K.

 2. Alignment
     MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment;
     Arena has no default alignment;

 3. Memory reuse
     Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time.
     MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation

 4. Realloc
     Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is:
         1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start
         2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory
     MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools

 5. check mem limit
     MemPool checks the mem limit, and Arena checks at the Allocator layer.

 6. Support for ASAN
     Arena does something extra

 7. Error handling
     MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception.
Tests that Arena can consider

 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory;

 2. Support clear, memory multiplexing;

 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful.

 4. In some cases, it may be possible to allocate backwards to find chunks t
2023-03-29 20:56:49 +08:00
5191b4f473 [fix](ut)support run be-ut on release mode (#18119)
Fixed improper usage. So now be ut could be run on release mode.
btw, split be build type environment variable to be/be-ut.
2023-03-27 23:00:03 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
9477c48ef8 [refactor](functioncontext) remove duplicate type definition in function context (#17421)
remove duplicate type definition in function context
remove unused method in function context
not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned.
remove useless slot_size in all tuple or slot descriptor.
remove doris_udf namespace, it is useless.
remove some unused macro definitions.
init v_conjuncts in vscanner, not need write the same code in every scanner.
using unique ptr to manage function context since it could only belong to a single expr context.
Issue Number: close #xxx
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-06 16:07:09 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00
adb758dcac [refactor](remove non vec code) remove json functions string functions match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate process
remove some code in tuple, Tuple structure should be removed in the future.
remove many code in collection value structure, they are useless
2023-01-26 16:21:12 +08:00
0b5e71d3b4 [refactor](refactor field) remove unused method (#16068) 2023-01-19 10:16:09 +08:00
e579530c99 [Feature-WIP](inverted index) support use inverted index searcher cache (#16003)
use inverted index searcher cache to improve query performance

dependency pr: #14211 #15807 #15823
2023-01-18 09:30:55 +08:00
97fcad76f8 [enhancement](memtracker) Improve readability (#15716) 2023-01-16 16:30:35 +08:00
a98636a970 [bugfix](from_unixtime) fix timezone not work for from_unixtime (#15298)
* [bugfix](from_unixtime) fix timezone not work for from_unixtime
2022-12-23 19:05:09 +08:00
94a6ffb906 [feature](compaction) support vertical_compaction & ordered_data_compaction (#14524) 2022-12-01 22:15:41 +08:00
2b6f85ab96 [chore](macOS) Fix BE UT (#14307)
#13195 left some unresolved issues. One of them is that some BE unit tests fail.
This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.
2022-11-18 10:13:38 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
0b945fe361 [enhancement](memtracker) Refactor mem tracker hierarchy (#13585)
mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc.

type includes

enum Type {
        GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan
        QUERY = 1,         // Count the memory consumption of all Query tasks.
        LOAD = 2,          // Count the memory consumption of all Load tasks.
        COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
        SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
        CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
        BATCHLOAD = 6,  // Count the memory consumption of all EngineBatchLoadTask.
        CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask.
    }
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.

other fix:

In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.
2022-11-08 09:52:33 +08:00
9dc5dd382a [enhancement](memtracker) Fix Brpc mem count and refactored thread context macro (#13469) 2022-10-21 12:01:38 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
87a6b1a13b [enhancement](memtracker) Fix bthread local consume mem tracker (#13368)
Previously, bthread_getspecific was called every time bthread local was used. In the test at #10823, it was found that frequent calls to bthread_getspecific had performance problems.

So a cache is implemented on pthread local based on the btls key, but the btls key cannot correctly sense bthread switching.

So, based on bthread_self to get the bthread id to implement the cache.
2022-10-17 18:31:07 +08:00
b41eaa5ac0 [fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794)
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.

In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.

Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
2022-09-21 15:47:10 +08:00
db07e51cd3 [refactor](status) Refactor status handling in agent task (#11940)
Refactor TaggableLogger
Refactor status handling in agent task:
Unify log format in TaskWorkerPool
Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status
Premature return with the opposite condition to reduce indention
2022-08-29 12:06:01 +08:00
5c1cd058f2 [Feature] Add interface to check tablet segment lost (#10711)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2022-08-02 09:40:04 +08:00
b35daf0a04 [improvement](light-schema-change) Support tablet schema cache (#11131) 2022-08-01 12:18:00 +08:00
73d8f5901d fix mem tracker limiter (#11376) 2022-08-01 09:44:04 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
3b9cb524bc [feature-wip](unique-key-merge-on-write) Add rowset tree, based on interval-tree, DSIP-018[3/3] (#10714)
* port from rowset-tree from kudu

* use shared_ptr

* some update

* add mock rowset

* some compatibility update

* fix ut fail

* reformat code
2022-07-11 23:10:38 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
2a11a4ab99 [feature-wip][array-type] Support more sub types. (#9466)
Please refer to #9465
2022-05-26 08:41:34 +08:00
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. solved the previous delayed unit test file size is too large (1.7G+) and the unit test link time is too long problem problems
2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins
3. temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because it will re-implement part of the code and affect other tests
2022-04-12 15:30:40 +08:00