4510 Commits

Author SHA1 Message Date
f32deb18e9 [Update](build) change clucene from thirdparty to git module (#19352) 2023-05-19 08:25:51 +08:00
3d6a13605d [improvement](stacktrace) do not capture stack trace for txn error codes (#19817) 2023-05-18 23:58:56 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
ef0657c072 [Bug](pipeline) RegressionTest failed release resource cause DCHECK failed (#19783)
RegressionTest failed release resource cause DCHECK failed
2023-05-18 18:57:25 +08:00
e242d7dfcc [refactor-WIP](TaskWorkerPool) add DropTableTaskPool for DROP_TABLE task (#19793) 2023-05-18 18:25:13 +08:00
07bbf741fb [enhancement](memory) gc inverted index cache when there is not enough memory (#19622)
Support GC of the inverted index cache when there is not enough memory.
Previous problem: the inverted index cache (InvertedIndexSearcherCache and InvertedIndexQueryCache) could use 20% of memory that could not be released.
2023-05-18 16:41:51 +08:00
fd4fa5c64e [Optimize](row store) optimize serialization and deserialization (#19691)
1. Get the DataTypeSerde in advance to avoid creating a temporary DataTypeSerde while iterating over each column
2. Iterating the original row once is enough for deserialization, thanks to a map that records the index of each column's unique id
2023-05-18 16:22:38 +08:00
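The second point of the row store commit above (a map from each column's unique id to its index, so the serialized row is walked only once) can be sketched as follows. This is a minimal illustration in Python, not the Doris implementation: `build_uid_index`, `deserialize_row`, and the `(unique_id, value)` row layout are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_uid_index(schema_uids: List[int]) -> Dict[int, int]:
    """Map each column's unique id to its position in the output row (hypothetical helper)."""
    return {uid: idx for idx, uid in enumerate(schema_uids)}


def deserialize_row(
    serialized: List[Tuple[int, Any]],
    uid_to_index: Dict[int, int],
    num_columns: int,
) -> List[Optional[Any]]:
    """Fill the output columns in a single pass over the serialized row."""
    out: List[Optional[Any]] = [None] * num_columns
    for uid, value in serialized:
        idx = uid_to_index.get(uid)
        if idx is not None:  # skip unique ids the schema does not know
            out[idx] = value
    return out
```

Building the map once per schema replaces a per-column search through the row with a single pass plus constant-time lookups.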
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

One function, jsonb_extract, keeps its old name because json_extract is already used by the JSON string function and changing it needs more work. It will be changed later.
2023-05-18 16:16:52 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer code catches exceptions, faststring resize, reserve, and build may throw a memory allocation failure exception from the Allocator.

Currently, page body compression catches the memory allocation failure exception.
2023-05-18 15:00:49 +08:00
7c8b7878cd [fix](memory) Print all query/load memory before memory GC when memory_debug=true (#19720) 2023-05-18 14:55:47 +08:00
303bee6fa3 [Fix](single replica load) add inverted index copy for single replica load (#19663)
* [Fix](single replica load) add inverted index copy for single replica load
2023-05-18 14:13:41 +08:00
851886cc18 [minor](datev2) remove datev2 because datev2 is used by default (#19777) 2023-05-18 13:36:11 +08:00
943e5fb7e5 [improvement](MOW) use separated cache for mow pk cache (#19686)
In MOW, the primary key cache has a big impact on load performance, so we add a new cache type to separate
it from the page cache, making it more flexible in some cases
2023-05-18 13:27:09 +08:00
62458ed0f4 [enhancement](compaction) not core when init failed (#19754) 2023-05-18 12:06:22 +08:00
6a5b590873 [refactor-WIP](TaskWorkerPool) add CreateTableTaskPool class for CREATE_TABLE task (#19734) 2023-05-18 11:43:09 +08:00
f412aec187 [improvement](load)disable shrink memory by default (#19714)
Disable shrink memory by default; it becomes very slow when importing large amounts of data.
You can turn it on if you think it's necessary.
2023-05-18 11:25:39 +08:00
fe42e52851 [pipeline](CTE) Support multi stream data sink in pipeline (#19519) 2023-05-18 10:34:37 +08:00
88ca4f3e6b [feature](like) make like regexp used as a sql function (#19755) 2023-05-18 10:03:12 +08:00
d5d47703fe [fix](memory) remove auto option in memory config and optimize memtracker logs #19706
fix mem_limit default value
memory_gc_sleep_time_s to memory_gc_sleep_time_ms
LoadChannelMgr::_handle_mem_exceed_limit process_mem_limit to process soft mem limit
fix query mem tracker print
2023-05-18 08:54:03 +08:00
cfab124ddd [Chore](inverted index) change Status::EndOfFile to just logging info, remove useless print (#19721) 2023-05-18 08:44:18 +08:00
6a6be52bc9 [enhancement](merge-on-write) Avoiding unnecessary primary key index traversal (#19746) 2023-05-18 08:41:49 +08:00
f04f181249 [Bug](pipeline) RegressionTest failed release resource cause DCHECK failed #19772 2023-05-18 08:41:32 +08:00
5fa956b0d6 [Bug](pipeline) RegressionTest failed release resource cause DCHECK failed #19773 2023-05-18 08:35:57 +08:00
4566281cc3 [fix](sink) disable lazy-open partition by default (#19769)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-05-18 07:28:04 +08:00
082b7cce41 [improvement](storage) let the storage_page_cache_shard_size conf be rounded up to a power of two (#19639) 2023-05-17 22:54:58 +08:00
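Rounding a configured value up to a power of two, as the storage_page_cache_shard_size commit above describes, can be sketched as follows. This is an illustrative Python version under the assumption that the value is a positive integer; `round_up_to_power_of_two` is a hypothetical name and not the function Doris uses.

```python
def round_up_to_power_of_two(n: int) -> int:
    """Return the smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed to represent n - 1,
    # so shifting 1 by that amount gives the next power of two at or above n.
    return 1 << (n - 1).bit_length()
```

A power-of-two shard count is a common choice because a shard can then be selected with a bit mask instead of a modulo; whether Doris does this for that reason is not stated in the commit message.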
79d30cfe46 [feature](compact) Duplicate with no keys tables compaction coredump (#19490)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
2023-05-17 22:22:14 +08:00
49c6bbce84 [improvement](load) do not create pthread in tablet_sink (#19465)
add bvar stat for streamload.
2023-05-17 22:05:54 +08:00
dc18da2ce4 [Log](expr) add DCHECK info for expr close DCHECK (#19683) 2023-05-17 21:37:38 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for fold constant for Nereids.
This is the list of executable date-time functions that Nereids supports so far:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. Problems solved:
- enable datev2/datetimev2 by default.
- refactor Nereids foldConstantOnFE and support folding nested expressions.
- separate the executable functions into multiple files for easier reading and for adding new functions
2023-05-17 21:26:31 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix three bugs in timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog uses datetimev1 to parse timestamp, and converts to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from the file format's metadata.
2023-05-17 20:50:15 +08:00
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Stop mapping each exec node method to a span; map each exec node to a span instead;
2. Fix some problems with tracing in pipeline.
2023-05-17 19:04:52 +08:00
16d005c7d1 [fix](merge-on-write) fix that delete bitmap calculation error when clone tablet (#19713) 2023-05-17 17:16:45 +08:00
d76e2e2254 [chore](config) ignore_eovercrowded to be true by default (#19282) 2023-05-17 16:21:32 +08:00
b29e87a382 [fix](load) do not hang publish due to compaction signal (#19440) 2023-05-17 16:20:55 +08:00
Pxl
800de168db [Chore](function) clean some unused function symbols (#19649)
clean some unused function symbols
2023-05-17 15:31:51 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
56809230d1 [Improvement](string function) optimize substring and in string set (#19257)
* [Improvement](string function) optimize substring and in string set

* update
2023-05-17 14:09:52 +08:00
1462e44162 [Bug](topn) fix rowid fetcher merge with empty block (#19712) 2023-05-17 10:56:32 +08:00
2d9cc8fe8f [improvement](file cache)Support set min file segment size while use block file cache (#19536) 2023-05-17 10:23:33 +08:00
8fd1eb0d1e [minor](hash table) parameterize hash table (#19653) 2023-05-17 09:58:26 +08:00
2bdfaac609 [fix](ubsan) fix ubsan errors (#19658)
fix ubsan errors:

doris/be/src/util/string_parser.hpp:275:58: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

doris/be/src/vec/functions/functions_comparison.h:214:51: runtime error: addition of unsigned offset to 0x7fea6c6b7010 overflowed to 0x7fea6c6b700c

doris/be/src/vec/functions/multiply.cpp:67:50: runtime error: signed integer overflow: 1295699415680000000 * 0x0000000000015401d0a4cd4890a77700 cannot be represented in type '__int128

doris/be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h:445:73: runtime error: addition of unsigned offset to 0x7feca3343d10 overflowed to 0x7feca3343d08 

doris/be/src/exec/schema_scanner/schema_tables_scanner.cpp:330:24: run
2023-05-17 09:32:03 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
8f8814e49c [bugfix](be core) master info is deconstructed before fragment mgr and be will core (#19687)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-16 21:55:15 +08:00
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
92a533724c [enhancement](merge-on-write) avoid unnecessary pk index iteration (#19620) 2023-05-16 17:05:14 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support array_count function.
array_count: returns the number of non-zero and non-null elements in the given array.
2023-05-16 17:00:01 +08:00
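The counting rule stated in the array_count commit above (count the elements that are neither zero nor null) can be sketched as follows. This Python function is based only on that one-line description; the signature of the real SQL function may differ, and the Python `array_count` here is a hypothetical stand-in with `None` playing the role of SQL NULL.

```python
from typing import Iterable, Optional


def array_count(values: Iterable[Optional[float]]) -> int:
    """Count the elements that are neither None (standing in for NULL) nor zero."""
    return sum(1 for v in values if v is not None and v != 0)
```

For example, `array_count([1, 0, None, 3])` counts only `1` and `3`.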
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
610f1c8ef5 [improvement](load) skip compression when memtable is small (#19300)
* [improvement](load) skip compression when memtable is small

* format
2023-05-16 12:08:41 +08:00
9cd7005dec [fix](delete) notify all when there is no high priority task (#19577)
In some cases high-priority threads are woken but normal ones are not. We
use notify_all as a workaround.
2023-05-16 11:29:10 +08:00
Pxl
b927f8cd37 [Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636)
change asan_suppr from interceptor_via_lib to interceptor_via_fun
2023-05-16 10:51:43 +08:00