Commit Graph

1082 Commits

Author SHA1 Message Date
ab8125d56f [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037)
* [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema

1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse.

2. Make some variables wrapped with std::unique<unique_ptr>

Performance:
| 状态              | QPS | 平均响应时间 (avg) | P99 响应时间 |
|------------------|-----|------------------|-------------|
| 开启 SchemaCache | 501 | 20ms             | 34ms        |
| 关闭 SchemaCache | 321 | 31ms             | 61ms        |

* handle schema change with schema version

* remove useless header

* rebase
2023-05-29 17:34:53 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
e0d9f7f955 [enhancement](load) add some profile items for load (#20141) 2023-05-29 09:54:03 +08:00
9d44918036 [Improve](data-type) Clean datatype uselesscode (#20145)
* fix struct_export out data

* delete useless code with data type
2023-05-28 20:48:29 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
93933308e6 [Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881) 2023-05-26 23:40:49 +08:00
ee34b6de2d [Refact] (serde) refact mysql serde with data type (#19543)
refact mysql output (de)serialize with data type serde , avoid accoriding switch case Primitive type writed in mysqlWriter
2023-05-26 14:11:17 +08:00
Pxl
15a7420661 [Chore](ub) fix some undefined behaviors (#19986)
/home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType'

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment

/home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null
2023-05-26 14:08:40 +08:00
e5eed53b89 [improvement](bitmap) Use shared_ptr in BitmapValue to avoid deep copying (#19101)
Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.
2023-05-24 16:13:01 +08:00
a6bd014b8a [FIX](serde)pb ut is not stable #19907 2023-05-22 09:03:02 +08:00
1c950d6930 [fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898 2023-05-22 00:32:20 +08:00
fd4fa5c64e [Optimize](row store) optimize serialization and deserialization (#19691)
1. Get DataTypeSerde in advance to avoid get temporary DataTypeSerde iterate each column
2. Iterate the original row once is enoungh for deserializing by introducing a map for record the index of each column's unique id
2023-05-18 16:22:38 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.
2023-05-18 16:16:52 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator.

Currently page body compress will catch memory alloc failure exception
2023-05-18 15:00:49 +08:00
943e5fb7e5 [improvement](MOW) use seperated cache for mow pk cache (#19686)
In mow, primary key cache have a big impact on load performance, so we add a new cache type to seperate
it from page cache to make it more flexible in some cases
2023-05-18 13:27:09 +08:00
79d30cfe46 [feature](compact) Duplicate with no keys tables compaction coredump (#19490)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
2023-05-17 22:22:14 +08:00
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
fad9237d30 [fix](storage) consider file size on page cache key (#19619)
The core is due to a DCHECK:

F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read
Finally, we found that the DCHECK failure is due to page cache:

1. At first we have 20 segments, which id is 0-19.
2. For MoW table, memtable flush process will calculate the delete bitmap. In this procedure, the index pages and data pages of PrimaryKeyIndex is loaded to cache
3. Segment compaction compact all these 10 segments to 2 segment, and rename it to id 0,1
4. Finally, before the load commit, we'll calculate delete bitmap between segments in current rowset. This procedure need to iterator primary key index of each segments, but when we access data of new compacted segments, we read data of old segments in page cache
To fix this issue, the best policy is:

1. Add a crc32 or last modified time to CacheKey.
2. Or invalid related cache keys after segment compaction.
For policy 1, we don't have crc32 in segment footer, and getting the last-modified-time needs to perform 1 additional disk IO.
For policy 2, we need to add additional page cache invalidation methods, which may cause the page cache not stable

So I think we can simply add a file size to identify that the file is changed.
In LSM-Tree, all modification will generate new files, such file-name reuse is not normal case(as far as I know, only segment compaction), file size is enough to identify the file change.
2023-05-15 17:16:31 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
Pxl
9b7a419aed [Chore](build) update some doc about build enviroment (#19325)
update some doc about build enviroment
2023-05-10 16:18:44 +08:00
a05dbd3f81 [chore](compile) Improves PCH cache hit ratio (#19469)
Supplement the documentation of be-clion-dev, avoid the problem of undefined DORIS_JAVA_HOME and inability to find jni.h when using clion development without directly compiling through build.sh
Complete the classification of header files in pch.h and introduce some header files that are not frequently modified in doris.
Separate the declaration and definition in common/config.h. If you need to modify the default configuration now, please modify it in common/config.cpp.
gen_cpp/version.h is regenerated every time it is recompiled, which may cause PCH to fail, so now you need to get the version information indirectly rather than directly.
2023-05-10 12:49:01 +08:00
b2371c1246 [Refact](Literal)refact literal get field and value (#19351) 2023-05-10 09:01:17 +08:00
Pxl
dfad7b6b38 [Feature](generic-aggregation) some prowork of generic aggregation (#19343)
some prowork of generic aggregation
2023-05-09 21:42:21 +08:00
4c6ca88088 Revert "[refactor](function) ignore DST for function from_unixtime (#19151)" (#19333)
This reverts commit 9dd6c8f87b73db238bfd38fb1d76f3796910f398.
2023-05-06 16:33:58 +08:00
9dd6c8f87b [refactor](function) ignore DST for function from_unixtime (#19151) 2023-05-05 11:51:49 +08:00
9813406757 [Enhancement](HttpServer) Add http interface authentication for BE (#17753) 2023-05-04 23:46:49 +08:00
e9a4cbcdf9 [Refact](type system) refact column with arrow serde (#19091)
* refact arrow serde

* add date serde

* update arrow and fix nullable and date type
2023-05-04 15:28:46 +08:00
b0c215e694 [enhance](be)add more profile in prefetched buffered reader (#19119) 2023-05-02 09:53:39 +08:00
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
65a82a0b57 [opt](FileReader) turn off prefetch data in parquet page reader when using MergeRangeFileReader (#19102)
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.
2023-04-28 09:27:56 +08:00
6eb12640a1 [fix](segment_iter) do not init segment_iterator twice (#18337)
* [fix](segment_iter) do not init segment_iterator twice

SegmentIterator::init is called by Segment::new_iterator and
BetaRowsetReader::get_segment_iterators twice.
2023-04-27 09:51:57 +08:00
a32fa219ec Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702)" (#19086)
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.

we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-04-26 18:27:46 +08:00
1be5dac036 [improve] Refactor file cache and Improve the file cache strategy (#18652)
1. Refactor file cache. Before refactor, the file cache config format is "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]" and now change to "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]". It will be simpler than before.
2. Support more strategy. Support file cache priority. The file cache will have three queue,  name as 'index'/'normal'/'disposable'. We can avoid that the higher priority data is eliminate by the lower priority data.
2023-04-25 23:14:28 +08:00
339d804ec4 [Refactor](exceptionsafe) add factory creator to some class (#19000) 2023-04-25 14:33:47 +08:00
16a394da0e [chore](build) Use include-what-you-use to optimize includes (PART III) (#18958)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-24 14:51:51 +08:00
296b0c92f7 [Enhancement](compaction) stop tablet compaction when table dropped (#18702)
* [Enhancement](compaction) stop tablet compaction when table dropped

* fix be ut
2023-04-24 11:04:27 +08:00
8d7a9fd21b [refactor](exceptionsafe) add factory creator to some class (#18978)
make vexprecontext,vexpr,function,query context,runtimestate thread safe.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 10:32:11 +08:00
c3baa65de3 [feature](io) enable s3 file writer with multi part uploading concurrently (#17585)
Formerly S3FileWriter has to write each buffer with 5MB or more then upload one part, after all these works are done it could then process the incoming data, it's blocking and inefficient. This pr brings one bufferpool where the data could write into memory buffer immediately if has free buffer and then it would be uploaded into the S3.
This pr doesn't provide the ability to elegantly support cases where there is no free buffer, i'll leave it as one future work.
2023-04-23 23:19:44 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
29f502380c [opt](FileReader) merge small IO to optimize read performace (#18796)
Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance.

`MergeRangeFileReader` is a FileReader that efficiently supports random access in format like parquet and orc.
In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the 
reader. The random access ranges is a list of ranges that order by offset.
The range in random access ranges should be reading sequentially, can be skipped, but can't be read repeatedly.
When calling read_at, if the start offset located in random access ranges, the slice size should not span two ranges.

For example, in parquet, the random access ranges is the column offsets in a row group.

When reading at offset, if [offset, offset + 8MB) contains many random access ranges,
the reader will read data in [offset, offset + 8MB) as a whole, and copy the data in random access ranges into small 
buffers(name as box, default 1MB, 64MB in total). A box can be occupied by many ranges,
and use a reference counter to record how many ranges are cached in the box. If reference counter equals zero,
the box can be release or reused by other ranges. When there is no empty box for a new read operation,
the read operation will do directly.

## Effects
The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s.
The profile of Query 24:
```
 VFILE_SCAN_NODE  (id=0):(Active:  8s344ms,  %  non-child:  83.06%)
    -  FileReadBytes:  534.46  MB
    -  FileReadCalls:  1.031K  (1031)
    -  FileReadTime:  28s801ms
    -  GetNextTime:  8s304ms
    -  MaxScannerThreadNum:  12
    -  MergedSmallIO:  0ns
        -  CopyTime:  157.774ms
        -  MergedBytes:  549.91  MB
        -  MergedIO:  94
        -  ReadTime:  28s642ms
        -  RequestBytes:  507.96  MB
        -  RequestIO:  1.001K  (1001)
    -  NumScanners:  18
```
1001 request IOs has been merged into 94 IOs.

## Remaining problems
1. Add p2 regression test in nest PR
2. Profiles are scattered in various codes and will be refactored in the next PR
3. Support ORC reader
2023-04-23 10:51:38 +08:00
1ffd34f6f1 [Refact](type system)refact interconversion for jsonb with column (#18819)
* refact jsonb to column

* update

* fix format

* fixed

* fix file head for compile
2023-04-22 14:01:05 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
fb377a9da9 [Improvement](functions)Optimized some datetime function's return value (#18369) 2023-04-19 15:51:11 +08:00
564446e52f [Refact](type system) refact serde for type system and pb serde impl (#18627) 2023-04-18 14:13:56 +08:00
042cf2a1bf [enhancement](ut) add ut for buffered reader (#18667) 2023-04-16 18:08:22 +08:00
ecb22ad35e [chore](proto) modify the order of store_row_column and is_dynamic_schema to be compatible with branch-1.2-lts (#18232) 2023-04-12 11:59:56 +08:00
Pxl
297764b37d [Chore](build) fix some compile fail on gnu20 && remove some unused compatibility codes (#18467) 2023-04-10 18:05:52 +08:00
012a261f69 [FIX](complex-type) fixed complex type with create_column_const_with_default_value #18463 2023-04-10 14:11:15 +08:00
ea47a6ae59 [fix](hdfs) not setting hadoop username when kerberos enabled (#18485)
1. If we set hadoop user property along with kerberos info, the authentication will fail.
2. fix some minor issue of local fs, follow up #18397
3. Add KW_HOSTNAME to keywords region, follow up #17329
4. Fix tvf not working with pipeline engine, follow up #18376
2023-04-10 09:32:27 +08:00