Commit Graph

8276 Commits

Author SHA1 Message Date
1e774036f1 [fix](function)fix be coredump when using json_object function (#13443) 2022-10-20 17:32:37 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior: if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because its logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
2022-10-20 15:52:31 +08:00
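Item 5 of commit 32b1456b28 says an array string with an invalid format in csv is converted to null rather than rejected. A minimal Python sketch of that parse-or-null behavior follows. It is not the Doris implementation: the function name `parse_csv_array` is hypothetical, and JSON array syntax is used as a stand-in for the array string format, which the commit message does not specify.

```python
import json
from typing import Optional


def parse_csv_array(field: str) -> Optional[list]:
    """Parse one csv field as an array; return None (null) if it is malformed.

    Assumption: the array text uses JSON syntax such as "[1,2,3]".
    The real format accepted by the loader may differ.
    """
    text = field.strip()
    # Anything that is not bracketed is treated as an invalid array string.
    if not (text.startswith("[") and text.endswith("]")):
        return None
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Invalid format: convert to null instead of raising an error.
        return None
    return parsed if isinstance(parsed, list) else None


if __name__ == "__main__":
    for raw in ["[1,2,3]", "[1,2", "not an array"]:
        print(repr(raw), "->", parse_csv_array(raw))
```

With this approach a single malformed field yields a null value for that column instead of raising a parse error.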
Pxl
1892e8f66e [Enhancement](scanner) support split avg key range (#13166) 2022-10-20 14:53:16 +08:00
3c837a9bdd [regression](load) modify variable definition (#13506) 2022-10-20 14:07:53 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
b5cd167713 [fix](hashjoin) fix coredump of hash join in ubsan build (#13479)
* [fix](hashjoin) fix coredump of hash join in ubsan build
2022-10-20 10:16:19 +08:00
f7c69ade18 [feature-wip](multi-catalog) implement predicate pushdown in native OrcReader (#13453)
# Proposed changes
Implement predicate pushdown in `OrcReader` by converting doris `ColumnValueRange` to orc `SearchArgument`.

## Remaining problems
1. Orc supports `not in`, which may affect the bloom filter. However, doris `ScanNode` does not push `not in` down to the file scanner.
2. Orc supports `is null`, and a row range has a `hasNull` identifier. However, `_contain_null` in `ColumnValueRange` is ambiguous: `_contain_null = true` only means that the value can be nullable, not that it equals null.
3. `DateTimeV2` loses microsecond precision in `ColumnValueRange`, which may cause filtering errors when a min-max value equals the predicate value.
4. `DateTimeV1` is not accurate enough; it is only stored to second precision.
5. Orc supports predicate pushdown for the `float&double` types, but doris does not push down `float&double` predicates for precision reasons.
2022-10-20 10:07:36 +08:00
8637ac1ca3 [regression](framework)set random parallel_fragment_exec_instance_num… (#13383)
Some problems have been found when parallel_fragment_exec_instance_num is set to a value greater than 1.
Set a random parallel_fragment_exec_instance_num value for each query to cover more situations.
2022-10-20 10:02:27 +08:00
4996eafe74 [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str (#13446)
* [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str

* add sql based regression test

Co-authored-by: xiaojunjie <xiaojunjie@baidu.com>
2022-10-20 09:02:33 +08:00
60d5e4dfce [improvement](spark-load) support parquet and orc file (#13438)
Add support for parquet/orc in SparkDpp.java
Fixed sparkDpp checkstyle issue
2022-10-20 08:59:22 +08:00
bc08854a35 [doc](storage policy) add cold and hot separation docs (#13096) 2022-10-20 08:56:53 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
3821f8420d [opt](tpch) after change the config to speed up q21 (#13460) 2022-10-20 08:54:35 +08:00
50e2d0fd3e [opt](storage) opt the read by column decimal (#13488)
do the opt:
TPCH Q18 36s->33s
Q20 18s->17s
2022-10-20 08:53:23 +08:00
4fa3b14bf0 [Fix](multi-catalog)Fix NPE caused by GsonUtils created objects. #13489 2022-10-20 08:52:58 +08:00
697fa5f586 [Enhancement](profile) support configure the number of query profile (#13421)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-10-20 08:51:36 +08:00
3a2d5db914 [fix](String) fix string type length set to -1 when load string data (#13475)
The string type length may be set to -1 when a TypeDescriptor is created from thrift or protobuf, which causes a check limit overflow.
2022-10-20 08:45:25 +08:00
410e36ef5b [enhancement](macOS) Refine the build scripts for macOS (#13473)
Set the environment up before running the build scripts on macOS.
2022-10-19 22:52:22 +08:00
9ac4cfc9bb [bugfix](array-type) ColumnDate lost is_date_type after cloned (#13420)
Problem:
The IColumn::is_date property is lost after ColumnDate::clone is called.

Fix:
After a ColumnDate is created, also set IColumn::is_date.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-19 21:29:36 +08:00
c4b5ba2a4f [Regression](java-udf) Move source code used by Java UDF test case (#13476) 2022-10-19 21:05:06 +08:00
e65a4a9f9f [Improvement](multi-catalog)Support refresh external catalog. (#13363)
Support manually refreshing external catalog metadata.
1. refresh catalog external_catalog_name
2. refresh database catalog.db OR refresh database db (current catalog)
3. refresh table catalog.db.table OR refresh table db.table (current catalog) OR refresh table table_name (current db)

And the refresh operations above keep the database and table ids unchanged.
2022-10-19 16:02:14 +08:00
eeb2b0acdb [doc][fix](multi-catalog) Add multi-catalog es doc (#13429)
1. Add multicatalog es doc
2. Modify es unsigned_long mapping to largeint.
3. Add pre-check logic to getHost.
2022-10-19 16:00:13 +08:00
29b4d8dcad [typo](docs) fix some problem #13462 2022-10-19 15:42:17 +08:00
0b368fbbfa [Bugfix](vec) Fix all create mv using to_bitmap() on negative value columns when enable_vectorized_alter_table is true (#13448)
* [Bugfix] add negative value check when create mv using vec
2022-10-19 15:40:04 +08:00
5423de68dd [refactor](new-scan) remove old file scan node (#13433)
These files are no longer used and can be removed.
2022-10-19 14:25:32 +08:00
1e42598fe6 [memory](podarray) revert not allocate too much memory in podarray change (#13457)
revert not allocate too much memory in podarray change
2022-10-19 14:08:44 +08:00
2745a88814 [enhancement](memtracker) Fix brpc causing query mem tracker to be inaccurate #13401 2022-10-19 12:28:20 +08:00
c449028a5f [fix](year) fix year() results are not as expected (#13426)
fix `year()` results are not as expected
2022-10-19 11:28:00 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
248ca14df7 [fix](test) let each case uses its own table name (#13419) 2022-10-19 10:58:56 +08:00
755a946516 [feature](jsonb) jsonb functions (#13366)
Issue Number: Step3 of DSIP-016: Support JSON type
2022-10-19 08:44:08 +08:00
ac037e57f5 [fix](sort)the sort expr's nullability property may not be right (#13328) 2022-10-18 22:09:02 +08:00
971eb9172f [fix](mem) failure of allocating memory (#13414)
When the target size to allocate is 8164, MemPool will return nullptr.
2022-10-18 21:11:30 +08:00
a8fd76fe32 [Fix](docs) fix error description of LDAP_ADMIN_PASSWORD in the document (#13405)
co-author:@luozenglin
2022-10-18 18:53:10 +08:00
174054e32d [fix](conf) aggressive_memory_decommit and chunk_reserve_limits can not be changed when running (#13427) 2022-10-18 18:21:38 +08:00
d8e53da764 [feature-wip](statistics) collect statistics by sampling sql-tasks (#13399)
1. Collect statistics by sampling sql-tasks.
2. Consolidate statistics SQL statements and remove redundant statements.
2022-10-18 16:34:01 +08:00
6d322f85ac [improvement](compaction) delete num based compaction policy (#13409)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-18 16:13:28 +08:00
21f233d7e7 [feature-wip](multi-catalog) use apache orc reader to read orc file (#13404)
Use apache orc to read orc file, and convert ColumnVectorBatch to doris block.
2022-10-18 13:47:56 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
3f964ad5a8 [Regression](javaudf) add regression test for javaudf (#13266) 2022-10-18 12:48:57 +08:00
cd3450bd9d [Improvement](join) optimize join probing phase (#13357) 2022-10-18 12:37:17 +08:00
18f2db6064 [feature](nereids) let minValue and maxValue in stats support for Date, CHAR and VARCHAR type (#13311)
1. Enable setting min/max values for the varchar/char types.
    Take the first 8 chars as a long, and convert it to double.
2. Fix a bug when setting min/max values for date and datev2.
2022-10-18 12:12:33 +08:00
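Commit 18f2db6064 describes mapping a CHAR/VARCHAR value to a number by taking its first 8 chars as a long and converting that to double. A minimal Python sketch of the idea follows. It assumes UTF-8 bytes, zero padding for short strings, and a big-endian unsigned interpretation; the commit message specifies none of these, and the function name `string_prefix_to_double` is hypothetical.

```python
def string_prefix_to_double(value: str) -> float:
    """Map a string to a double using only its first 8 bytes.

    Assumptions (not stated in the commit): UTF-8 encoding, zero padding on
    the right for strings shorter than 8 bytes, big-endian unsigned integer.
    """
    prefix = value.encode("utf-8")[:8].ljust(8, b"\x00")
    # Big-endian keeps the integer order aligned with byte-wise string order.
    as_long = int.from_bytes(prefix, byteorder="big", signed=False)
    # Converting to float can round large integers, so distinct prefixes may
    # collapse to the same double; the mapping is monotone, not injective.
    return float(as_long)


if __name__ == "__main__":
    for s in ["a", "b", "abc", "abd", "abcdefgh1", "abcdefgh2"]:
        print(repr(s), "->", string_prefix_to_double(s))
```

Strings that share their first 8 bytes map to the same value, so the result works only as a coarse ordering key for min/max statistics, not as an exact encoding.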
f0dbbe5b46 [Bug](function) fix repeat coredump when step is too long (#13408) 2022-10-18 09:55:06 +08:00
49b060418a [optimization](array-type) array_min/array_max function support the date/datetime type (#13407)
This PR expands the data types supported by the array_min/array_max functions.
Before the change, array_min/array_max did not support the date/datetime type.
After the change, array_min/array_max support the date/datetime type.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-17 23:38:20 +08:00
dbf71ed3be [feature-wip](new-scan) Support stream load with csv in new scan framework (#13354)
1. Refactor the file reader creation in FileFactory, for simplicity.
    Previously, FileFactory had too many `create_file_reader` interfaces.
    Now unified into two categories: the interface used by the previous BrokerScanNode,
    and the interface used by the new FileScanNode.
    And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.

2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode

3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.

4. Add some test cases for csv stream load; the behavior is the same as the old broker scanner.
2022-10-17 23:33:41 +08:00
c114d87d13 [Enhancement](array-type) Tuple is null predicate support array type (#13307)
Issue Number: #12689
2022-10-17 18:50:56 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
2022-10-17 18:40:06 +08:00
87a6b1a13b [enhancement](memtracker) Fix bthread local consume mem tracker (#13368)
Previously, bthread_getspecific was called every time bthread local was used. In the test at #10823, it was found that frequent calls to bthread_getspecific had performance problems.

So a cache is implemented on pthread local based on the btls key, but the btls key cannot correctly sense bthread switching.

So the cache is now implemented based on the bthread id obtained from bthread_self.
2022-10-17 18:31:07 +08:00
3b5b7ae12b [improvement](config) let default value of alter and load timeout suitable for most cases (#13370)
It is frustrating when a long-running job fails because of a small timeout. In practice, users
do not expect a timeout for a long-running job.
2022-10-17 14:55:05 +08:00
53286794c6 [typo](docs) Fixed thrift_client_timeout_ms's incorrect description of en docs. (#13391)
Co-authored-by: smallhibiscus <8449081280@qq.com>
2022-10-17 14:54:38 +08:00