Commit Graph

6753 Commits

Author SHA1 Message Date
0b368fbbfa [Bugfix](vec) Fix all create mv using to_bitmap() on negative value columns when enable_vectorized_alter_table is true (#13448)
* [Bugfix] add negative value check when creating mv using vec
2022-10-19 15:40:04 +08:00
5423de68dd [refactor](new-scan) remove old file scan node (#13433)
None of these files are used anymore, so they can be removed.
2022-10-19 14:25:32 +08:00
1e42598fe6 [memory](podarray) revert not allocate too much memory in podarray change (#13457)
revert not allocate too much memory in podarray change
2022-10-19 14:08:44 +08:00
2745a88814 [enhancement](memtracker) Fix brpc causing query mem tracker to be inaccurate #13401 2022-10-19 12:28:20 +08:00
c449028a5f [fix](year) fix year() results are not as expected (#13426)
fix `year()` results are not as expected
2022-10-19 11:28:00 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
248ca14df7 [fix](test) let each case uses its own table name (#13419) 2022-10-19 10:58:56 +08:00
755a946516 [feature](jsonb) jsonb functions (#13366)
Issue Number: Step3 of DSIP-016: Support JSON type
2022-10-19 08:44:08 +08:00
ac037e57f5 [fix](sort)the sort expr's nullability property may not be right (#13328) 2022-10-18 22:09:02 +08:00
971eb9172f [fix](mem) failure of allocating memory (#13414)
When the target size to allocate is 8164, MemPool will return nullptr.
2022-10-18 21:11:30 +08:00
a8fd76fe32 [Fix](docs) fix error description of LDAP_ADMIN_PASSWORD in the document (#13405)
co-author:@luozenglin
2022-10-18 18:53:10 +08:00
174054e32d [fix](conf) aggressive_memory_decommit and chunk_reserve_limits can not be changed when running (#13427) 2022-10-18 18:21:38 +08:00
d8e53da764 [feature-wip](statistics) collect statistics by sampling sql-tasks (#13399)
1. Collect statistics by sampling sql-tasks.
2. Consolidate statistics SQL statements and remove redundant statements.
2022-10-18 16:34:01 +08:00
6d322f85ac [improvement](compaction) delete num based compaction policy (#13409)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-18 16:13:28 +08:00
21f233d7e7 [feature-wip](multi-catalog) use apache orc reader to read orc file (#13404)
Use apache orc to read orc file, and convert ColumnVectorBatch to doris block.
2022-10-18 13:47:56 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
3f964ad5a8 [Regression](javaudf) add regression test for javaudf (#13266) 2022-10-18 12:48:57 +08:00
cd3450bd9d [Improvement](join) optimize join probing phase (#13357) 2022-10-18 12:37:17 +08:00
18f2db6064 [feature](nereids) let minValue and maxValue in stats support for Date, CHAR and VARCHAR type (#13311)
1. enable setting min/max values for varchar/char types:
    take the first 8 chars as a long, then convert it to double.
2. fix a bug when setting min/max values for date and datev2
2022-10-18 12:12:33 +08:00
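The "first 8 chars as long, then double" mapping described in commit 18f2db6064 can be illustrated with a minimal sketch. The function name `string_to_stat_double` is hypothetical, and the UTF-8 encoding, zero padding, and big-endian byte order are assumptions made for illustration; the commit message does not state them.

```python
def string_to_stat_double(value: str) -> float:
    """Map a string to a double using only its first 8 bytes.

    Hypothetical helper: encoding, padding, and byte order are assumptions,
    not details taken from the Doris source.
    """
    # Keep at most 8 bytes and right-pad short strings with zero bytes.
    prefix = value.encode("utf-8")[:8].ljust(8, b"\x00")
    # Big-endian keeps the numeric order aligned with byte-wise string order.
    as_long = int.from_bytes(prefix, byteorder="big")
    return float(as_long)
```

With these assumptions, strings that share their first 8 bytes map to the same value, so the resulting min/max statistics are only approximate for longer strings.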
f0dbbe5b46 [Bug](function) fix repeat coredump when step is too long (#13408) 2022-10-18 09:55:06 +08:00
49b060418a [optimization](array-type) array_min/array_max function support the date/datetime type (#13407)
This PR expands the data types supported by the array_min/array_max functions.
Before the change, array_min/array_max did not support the date/datetime types.
After the change, array_min/array_max support the date/datetime types.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-17 23:38:20 +08:00
dbf71ed3be [feature-wip](new-scan) Support stream load with csv in new scan framework (#13354)
1. Refactor the file reader creation in FileFactory, for simplicity.
    Previously, FileFactory had too many `create_file_reader` interfaces.
    Now unified into two categories: the interface used by the previous BrokerScanNode,
    and the interface used by the new FileScanNode.
    And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.

2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode

3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.

4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.
2022-10-17 23:33:41 +08:00
c114d87d13 [Enhancement](array-type) Tuple is null predicate support array type (#13307)
Issue Number: #12689
2022-10-17 18:50:56 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
2022-10-17 18:40:06 +08:00
87a6b1a13b [enhancement](memtracker) Fix bthread local consume mem tracker (#13368)
Previously, bthread_getspecific was called every time bthread local was used. In the test at #10823, it was found that frequent calls to bthread_getspecific had performance problems.

So a cache was implemented on pthread local based on the btls key, but the btls key cannot correctly sense bthread switching.

The cache is therefore now keyed on the bthread id obtained from bthread_self.
2022-10-17 18:31:07 +08:00
3b5b7ae12b [improvement](config) let default value of alter and load timeout suitable for most cases (#13370)
It is frustrating when a long-running job fails because of a small timeout. In practice, users
do not expect a timeout for a long-running job.
2022-10-17 14:55:05 +08:00
53286794c6 [typo](docs) Fixed thrift_client_timeout_ms's incorrect description of en docs. (#13391)
Co-authored-by: smallhibiscus <8449081280@qq.com>
2022-10-17 14:54:38 +08:00
4caa1e8041 [optimization](array-type) update the docs for import data to array column (#13345)
1. This PR updates the JSON load docs for importing data into array columns.
When JSON is used to import data into an array column, RapidJSON can cause precision problems,
so the json-load docs now specify how to avoid them.

Issue Number: #7570
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-17 12:43:22 +08:00
045bccdbea [Feature](Retention) support retention function (#13056) 2022-10-17 11:00:47 +08:00
6ea9a65bb6 [Opt](vec) opt runtime filter for TPCH Q22 (#13339) 2022-10-17 10:30:07 +08:00
c1588b2900 [thirdparty](zstd)update dist info and thirdparty change log (#13392) 2022-10-17 09:09:16 +08:00
2da7fe940c [fix](regression-test) fix that multiple cases conflict with the same table name (#13395) 2022-10-17 09:08:30 +08:00
9454bcca12 [fix](memory) Fix USE_JEMALLOC=true UBSAN compilation error #13398 2022-10-17 08:52:14 +08:00
e84d9a6c87 [fix](array-type) Fix cast null to array make be core (#13324)
Doris does not support explicitly casting NULL_TYPE to any type.

```
mysql> select cast(NULL as int);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to INT
```

So we should also forbid users from casting NULL_TYPE to ARRAY type.

This commit will produce the following effect:

```
mysql> select cast(NULL as array<int>);
ERROR 1105 (HY000): errCode = 2, detailMessage = Invalid type cast of NULL from NULL_TYPE to ARRAY<INT(11)>
```
2022-10-17 00:04:50 +08:00
162e60eb19 [fix](array-type) check value valid while insert data into array column (#13365)
We should reject the insert when a value overflows.

1. create table:
`CREATE TABLE test_array_load_test_array_int_insert_db.test_array_load_test_array_int_insert_tb ( k1 int NULL, k2 array<int> NULL ) DUPLICATE KEY(k1) DISTRIBUTED BY HASH(k1) BUCKETS 5`

2. try to insert a value less than INT_MIN.
`insert into test_array_load_test_array_int_insert_tb values (1005, [-2147483649])`

Before this PR, the insert succeeded, but the stored value was not correct.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-17 00:01:03 +08:00
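The overflow check described in commit 162e60eb19 amounts to validating every array element against the 32-bit signed range before the insert is accepted. A minimal sketch follows; `validate_int_array` is a hypothetical name used only for illustration and is not the Doris implementation.

```python
INT_MIN = -2**31      # -2147483648
INT_MAX = 2**31 - 1   #  2147483647


def validate_int_array(values):
    """Return the values unchanged, or raise if any element overflows INT.

    Hypothetical helper: NULL elements (None) are assumed to be allowed.
    """
    for v in values:
        if v is not None and not (INT_MIN <= v <= INT_MAX):
            raise ValueError(f"value {v} is out of range for INT")
    return values
```

Applied to the example from the commit message, `validate_int_array([-2147483649])` raises instead of silently storing a wrong value.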
a83eaddfcf [test](cache)Add remote cache ut (#13377) 2022-10-16 23:59:50 +08:00
1d5ba9cbcc [Improvement](like) Change like function to batch call (#13314) 2022-10-16 16:18:22 +08:00
Pxl
632670a49c [Enhancement](function) refactor of date function (#13362)
refactor of date function
2022-10-16 14:31:26 +08:00
144486e220 [Opt](fun) simd the substring function and use stack buf to speed up (#13338) 2022-10-16 11:48:34 +08:00
a5f3880649 [improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)
1. disable page cache by default
2. disable chunk allocator by default
3. do not use chunk allocator for vectorized allocator by default
4. add a new config memory_linear_growth_threshold = 128Mb: memory is not allocated by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
2022-10-15 17:27:17 +08:00
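The size policy behind `memory_linear_growth_threshold` in commit a5f3880649 can be sketched as follows. The commit message only says that RoundUpToPowerOf2 is skipped above the threshold; returning the requested size unchanged in that case, and reading "128Mb" as 128 MiB, are assumptions made for this illustration. Both function names are hypothetical.

```python
# Assumption: "128Mb" in the commit message is read as 128 MiB.
MEMORY_LINEAR_GROWTH_THRESHOLD = 128 * 1024 * 1024


def round_up_to_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def allocation_size(requested: int,
                    threshold: int = MEMORY_LINEAR_GROWTH_THRESHOLD) -> int:
    """Pick the size to allocate for a request of `requested` bytes."""
    if requested > threshold:
        # Assumption: above the threshold the request is not rounded up.
        return requested
    return round_up_to_power_of_2(requested)
```

The point of such a threshold is that rounding a large request up to the next power of two can nearly double its footprint, while small requests still benefit from power-of-two sizing.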
bf2e20c4c4 [fix](agg) reset the content of grouping exprs instead of replace it with original exprs (#13376)
* [fix](agg) reset the content of grouping exprs instead of replacing it with original exprs

* keep old behavior if the grouping type is not GROUP_BY
2022-10-15 11:07:35 +08:00
52397df9f0 [thirdparty](update) zstd 1.5.0 to 1.5.2 #13378 2022-10-15 10:50:20 +08:00
f2fa9606c9 [fix](agg)count function should return 0 for null value (#13247)
count(null) should return 0 instead of 1. The streaming_agg_serialize_to_column function didn't handle null input values; this PR fixes it.
2022-10-15 10:40:52 +08:00
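The expected semantics from commit f2fa9606c9 match standard SQL: `count(expr)` counts only non-null inputs. A minimal sketch, using Python's `None` to stand in for SQL NULL (the function name `sql_count` is hypothetical):

```python
def sql_count(values):
    """Count the non-null entries, mirroring SQL count(expr)."""
    return sum(1 for v in values if v is not None)
```

Under this definition a column holding a single NULL yields 0, which is the behavior the fix restores.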
4bc33a54a1 [Fix](agg) fix bitmap agg core dump when phmap pointer assert alignment (#13381) 2022-10-15 10:39:23 +08:00
8218cfed40 [Bug](function) Fix constant predicate evaluation (#13346) 2022-10-15 01:05:29 +08:00
79a5125eff [Improvement](predicates) Use datev2 as the compatible type between string and datev2 (#13348)
If string literal can be converted to dateV2, we use datev2 as the compatible type instead of datetimev2.
2022-10-14 19:00:37 +08:00
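The rule in commit 79a5125eff can be sketched as a small type-selection function. The name `compatible_type`, the returned type labels, and the use of the `YYYY-MM-DD` format as the test for "can be converted to dateV2" are assumptions for illustration; Doris may accept other literal formats.

```python
from datetime import datetime


def compatible_type(literal: str) -> str:
    """Choose the type used to compare a string literal with a datev2 column.

    Hypothetical sketch: a literal that parses as a plain date keeps the
    comparison in DATEV2; anything else falls back to DATETIMEV2.
    """
    try:
        # strptime raises ValueError if trailing data (e.g. a time) remains.
        datetime.strptime(literal, "%Y-%m-%d")
        return "DATEV2"
    except ValueError:
        return "DATETIMEV2"
```

For example, `compatible_type("2022-10-14")` selects `DATEV2`, while a literal carrying a time component such as `"2022-10-14 19:00:37"` selects `DATETIMEV2`.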
993f38fe3c [feature](Nereids): use Multi join to rearrange join to eliminate cross join by using predicate. (#13353) 2022-10-14 17:26:34 +08:00
5bc8858571 [fix](jsonreader) teach jsonreader to release memory (#13336)
The rapidjson allocator does not release memory. This fix uses an allocator with a local buffer and calls Clear to release memory allocated beyond the local buffer.
2022-10-14 15:52:05 +08:00
6746434770 [improvement](schema change) avoid using column ptr swap (#13273) 2022-10-14 15:19:08 +08:00
b82e54a525 [feature](statistics) support to drop table or partition statistics (#13303)
Manually drop statistics for tables or partitions. A table or partition can be specified; if neither is specified, all statistics under the current database are deleted.

syntax:
```SQL
DROP STATS [tableName [PARTITIONS(partitionNames)]];

-- e.g.
DROP STATS;    -- drop all table statistics under the current database
DROP STATS t0;    -- drop t0 statistics
DROP STATS t1 PARTITIONS(p1);    -- drop partition p1 statistics of t1
```
2022-10-14 15:15:37 +08:00