Commit Graph

3010 Commits

Author SHA1 Message Date
9dc5dd382a [enhancement](memtracker) Fix Brpc mem count and refactored thread context macro (#13469) 2022-10-21 12:01:38 +08:00
3ca8bfaf30 [Function](array) support array_difference function (#13440) 2022-10-21 10:57:37 +08:00
9a3c1f0867 [Improvement](decimal) print decimal according to the real precision and scale (#13437) 2022-10-21 10:00:01 +08:00
d3f65aa746 [Improvement](join) remove unnecessary state for join (#13472) 2022-10-21 09:59:34 +08:00
1f7829e099 [Fix](array-type) bugfix for array column with delete condition (#13361)
Fix for SQL with array column:
delete from tbl where c_array is null;

more info please refer to #13360

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-21 09:29:02 +08:00
1b0dafcaa1 [Enhancement](load) consider memtable in flush while reducing load me… (#13480)
When reducing memory by flushing memtables, we should take into account the memory that is already being flushed from memtables to disk. Otherwise, we might not release as much memory as expected. For example, when lots of large memtables are already in flush, the reduce_mem_usage method picks some small memtables to flush; this cannot release enough memory and can also generate lots of small segments, which can cause a -238 error.
2022-10-21 08:35:35 +08:00
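The idea behind 1b0dafcaa1 can be illustrated with a minimal sketch. It is not the Doris implementation: the `MemTable` class, the `pick_memtables_to_flush` function and the largest-first ordering are hypothetical choices made only for this example. Memory that is already in flush is subtracted before any further memtables are picked.

```python
from dataclasses import dataclass


@dataclass
class MemTable:
    """Hypothetical stand-in for a memtable (not the Doris class)."""
    name: str
    size: int               # bytes held by this memtable
    flushing: bool = False  # True if a flush to disk is already in progress


def pick_memtables_to_flush(memtables, mem_usage, mem_limit):
    """Return the extra memtables to flush so that usage can drop to mem_limit."""
    # Memory already on its way to disk will be released without further action.
    in_flush = sum(m.size for m in memtables if m.flushing)
    need = mem_usage - in_flush - mem_limit
    picked = []
    # Assumption for this sketch: prefer large memtables to avoid many small segments.
    idle = sorted((m for m in memtables if not m.flushing),
                  key=lambda m: m.size, reverse=True)
    for m in idle:
        if need <= 0:
            break
        picked.append(m)
        need -= m.size
    return picked
```

With one 800-byte memtable already in flush, usage at 880 and a limit of 200, nothing more is picked, which avoids flushing the small memtables that would only produce small segments.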
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
Change extract_url to use StringValue instead of std::string to speed it up.
2022-10-21 08:33:39 +08:00
3dd00df24b [fix](jsonreader) release memory of both value and parse allocator (#13513) 2022-10-21 08:33:05 +08:00
d2be5096d6 [Revert](mem) revert the mem config changes that cause performance degradation (#13526)
* Revert "[fix](mem) failure of allocating memory (#13414)"

This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801.

* Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)"

This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.
2022-10-21 08:32:16 +08:00
736d113700 [fix](memtracker) Fix transmit_tracker null pointer because phmap is not thread safe #13528 2022-10-21 08:30:30 +08:00
d624ff0580 [chore](macOS) Avoid using binutils from Homebrew to build third parties (#13512)
Overwrite the environment variable PATH to avoid using binutils from Homebrew to build third parties which may cause compilation errors.

Error: building for macOS-x86_64 but attempting to link with file built for unknown-unsupported file format
2022-10-21 01:28:30 +08:00
7109cbfe6f [feature-wip](unique-key-merge-on-write) fix that delete the bitmap of stale rowset (#13393) 2022-10-20 21:53:13 +08:00
1e774036f1 [fix](function)fix be coredump when using json_object function (#13443) 2022-10-20 17:32:37 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior: if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because its logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
2022-10-20 15:52:31 +08:00
1892e8f66e [Enhancement](scanner) support split avg key range (#13166) 2022-10-20 14:53:16 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
b5cd167713 [fix](hashjoin) fix coredump of hash join in ubsan build (#13479)
* [fix](hashjoin) fix coredump of hash join in ubsan build
2022-10-20 10:16:19 +08:00
f7c69ade18 [feature-wip](multi-catalog) implement predicate pushdown in native OrcReader (#13453)
# Proposed changes
Implement predicate pushdown in `OrcReader` by converting doris `ColumnValueRange` to orc `SearchArgument`.

## Remaining problems
1. ORC supports `not in`, which may affect the bloom filter. However, the Doris `ScanNode` does not yet push `not in` down to the file scanner.
2. ORC supports `is null`, and the row range has a `hasNull` identifier. However, `_contain_null` in `ColumnValueRange` is ambiguous: `_contain_null = true` only means that the value can be null, not that it is equal to null.
3. `DateTimeV2` loses microsecond precision in `ColumnValueRange`, which may cause filtering errors when a min-max value equals the predicate value.
4. `DateTimeV1` is not accurate enough; it is only stored to second precision.
5. ORC supports predicate pushdown for `float&double` types, but Doris does not push down `float&double` predicates for precision reasons.
2022-10-20 10:07:36 +08:00
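To make the pushdown in f7c69ade18 more concrete, here is a minimal sketch of min-max filtering, the kind of check a pushed-down range predicate enables. It uses neither the Doris nor the ORC API: `ValueRange`, `StripeStats`, `may_match` and `stripes_to_read` are hypothetical names, and only integer ranges with closed bounds are assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueRange:
    """Hypothetical closed range low <= col <= high; None means unbounded."""
    low: Optional[int] = None
    high: Optional[int] = None


@dataclass
class StripeStats:
    """Hypothetical per-stripe min/max statistics for one column."""
    min_value: int
    max_value: int


def may_match(rng, stats):
    """Return False only when no row of the stripe can fall inside the range."""
    if rng.low is not None and stats.max_value < rng.low:
        return False
    if rng.high is not None and stats.min_value > rng.high:
        return False
    return True


def stripes_to_read(rng, stripes):
    """Indexes of the stripes that still have to be read and filtered row by row."""
    return [i for i, s in enumerate(stripes) if may_match(rng, s)]
```

The comparison against the stripe boundaries is also where lost precision in the predicate value matters: a truncated bound that lands exactly on a min or max value can flip the result of these checks.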
4996eafe74 [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str (#13446)
* [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str

* add sql based regression test

Co-authored-by: xiaojunjie <xiaojunjie@baidu.com>
2022-10-20 09:02:33 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
3821f8420d [opt](tpch) after change the config to speed up q21 (#13460) 2022-10-20 08:54:35 +08:00
50e2d0fd3e [opt](storage) opt the read by column decimal (#13488)
Results of the optimization:
TPCH Q18 36s->33s
Q20 18s->17s
2022-10-20 08:53:23 +08:00
3a2d5db914 [fix](String) fix string type length set to -1 when loading string data (#13475)
The string type length may be set to -1 when creating a TypeDescriptor from thrift or protobuf, which causes an overflow in the limit check.
2022-10-20 08:45:25 +08:00
410e36ef5b [enhancement](macOS) Refine the build scripts for macOS (#13473)
Set the environment up before running the build scripts on macOS.
2022-10-19 22:52:22 +08:00
9ac4cfc9bb [bugfix](array-type) ColumnDate lost is_date_type after cloned (#13420)
Problem:
The IColumn::is_date property is lost after ColumnDate::clone is called.

Fix:
After a ColumnDate is created, also set IColumn::is_date.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-19 21:29:36 +08:00
c4b5ba2a4f [Regression](java-udf) Move source code used by Java UDF test case (#13476) 2022-10-19 21:05:06 +08:00
0b368fbbfa [Bugfix](vec) Fix all create mv using to_bitmap() on negative value columns when enable_vectorized_alter_table is true (#13448)
* [Bugfix] add negative value check when creating mv using vec
2022-10-19 15:40:04 +08:00
5423de68dd [refactor](new-scan) remove old file scan node (#13433)
None of these files are used anymore, so they can be removed.
2022-10-19 14:25:32 +08:00
1e42598fe6 [memory](podarray) revert not allocate too much memory in podarray change (#13457)
revert not allocate too much memory in podarray change
2022-10-19 14:08:44 +08:00
2745a88814 [enhancement](memtracker) Fix brpc causing query mem tracker to be inaccurate #13401 2022-10-19 12:28:20 +08:00
c449028a5f [fix](year) fix year() results are not as expected (#13426)
fix `year()` results are not as expected
2022-10-19 11:28:00 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
755a946516 [feature](jsonb) jsonb functions (#13366)
Issue Number: Step3 of DSIP-016: Support JSON type
2022-10-19 08:44:08 +08:00
ac037e57f5 [fix](sort)the sort expr's nullability property may not be right (#13328) 2022-10-18 22:09:02 +08:00
971eb9172f [fix](mem) failure of allocating memory (#13414)
When the target size to allocate is 8164, MemPool will return nullptr.
2022-10-18 21:11:30 +08:00
174054e32d [fix](conf) aggressive_memory_decommit and chunk_reserve_limits can not be changed when running (#13427) 2022-10-18 18:21:38 +08:00
6d322f85ac [improvement](compaction) delete num based compaction policy (#13409)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-18 16:13:28 +08:00
21f233d7e7 [feature-wip](multi-catalog) use apache orc reader to read orc file (#13404)
Use apache orc to read orc file, and convert ColumnVectorBatch to doris block.
2022-10-18 13:47:56 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
3f964ad5a8 [Regression](javaudf) add regression test for javaudf (#13266) 2022-10-18 12:48:57 +08:00
cd3450bd9d [Improvement](join) optimize join probing phase (#13357) 2022-10-18 12:37:17 +08:00
f0dbbe5b46 [Bug](function) fix repeat coredump when step is too long (#13408) 2022-10-18 09:55:06 +08:00
49b060418a [optimization](array-type) array_min/array_max function support the date/datetime type (#13407)
This PR expands the data types supported by the array_min/array_max functions.
Before the change, array_min/array_max did not support the date/datetime types.
After the change, array_min/array_max support the date/datetime types.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-17 23:38:20 +08:00
dbf71ed3be [feature-wip](new-scan) Support stream load with csv in new scan framework (#13354)
1. Refactor the file reader creation in FileFactory, for simplicity.
    Previously, FileFactory had too many `create_file_reader` interfaces.
    Now unified into two categories: the interface used by the previous BrokerScanNode,
    and the interface used by the new FileScanNode.
    And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.

2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode

3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.

4. Add some test cases for csv stream load; the behavior is the same as that of the old broker scanner.
2022-10-17 23:33:41 +08:00
c114d87d13 [Enhancement](array-type) Tuple is null predicate support array type (#13307)
Issue Number: #12689
2022-10-17 18:50:56 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
2022-10-17 18:40:06 +08:00
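What a `group_bitmap_xor` aggregate computes can be sketched as follows, assuming it returns the XOR of all bitmaps in each group. Plain Python sets stand in for bitmaps here, and the `group_bitmap_xor` function below is a hypothetical helper for illustration, not Doris code.

```python
from collections import defaultdict


def group_bitmap_xor(rows):
    """rows: iterable of (group_key, set_of_ints); returns {group_key: XOR of the sets}."""
    acc = defaultdict(set)
    for key, bitmap in rows:
        # XOR of two bitmaps is the symmetric difference of the sets they encode.
        acc[key] ^= set(bitmap)
    return dict(acc)
```

A value that appears in an even number of bitmaps within a group cancels out, while a value that appears an odd number of times stays in the result.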
87a6b1a13b [enhancement](memtracker) Fix bthread local consume mem tracker (#13368)
Previously, bthread_getspecific was called every time bthread local was used. In the test at #10823, it was found that frequent calls to bthread_getspecific had performance problems.

So a cache was implemented in pthread-local storage based on the btls key, but the btls key cannot correctly sense bthread switching.

So the cache is now implemented based on the bthread id obtained from bthread_self.
2022-10-17 18:31:07 +08:00
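The caching approach described in 87a6b1a13b can be sketched roughly as below. This is an analogy, not the brpc or Doris code: `TaskLocalCache` is a hypothetical class, `current_task_id` plays the role of the cheap `bthread_self` call, and `slow_lookup` plays the role of the expensive `bthread_getspecific` call.

```python
import threading


class TaskLocalCache:
    """Per-thread cache that is only valid for the task id it was filled under."""

    def __init__(self, current_task_id, slow_lookup):
        self._current_task_id = current_task_id  # cheap call, like bthread_self
        self._slow_lookup = slow_lookup          # expensive call, like bthread_getspecific
        self._tls = threading.local()            # one cache slot per OS thread

    def get(self):
        task_id = self._current_task_id()
        entry = getattr(self._tls, "entry", None)
        # Reuse the cached value only if the same task is still running on this thread.
        if entry is not None and entry[0] == task_id:
            return entry[1]
        value = self._slow_lookup(task_id)
        self._tls.entry = (task_id, value)
        return value
```

Because the cached entry carries the id of the task that filled it, a task switch on the same thread is detected by the id mismatch and triggers a fresh lookup.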
045bccdbea [Feature](Retention) support retention function (#13056) 2022-10-17 11:00:47 +08:00
6ea9a65bb6 [Opt](vec) opt runtime filter for TPCH Q22 (#13339) 2022-10-17 10:30:07 +08:00
9454bcca12 [fix](memory) Fix USE_JEMALLOC=true UBSAN compilation error #13398 2022-10-17 08:52:14 +08:00