Commit Graph

11195 Commits

Author SHA1 Message Date
35c19daec7 [opt](routine load) log BE id when get partitions failed. (#20749)
Log the backend ID when getting partitions fails, to make debugging easier.
2023-06-13 19:15:05 +08:00
f1fd486f84 [fix](docker)Fix docker be init script restart failed bug (#20505)
Fix a bug where the docker BE init script failed on restart.
2023-06-13 19:05:31 +08:00
5d2758cb8f [improvement](build) move add BE extension jars to java_extensions dir (#20740)
Follow #20185
Move all BE java extension jars to the `be/lib/java_extensions/` dir.
Also remove the `udf` dir, which was used for BE native UDFs and has been deprecated since v1.2.

The final output is:

```
output
├── be
│   ├── bin
│   ├── conf
│   ├── dict
│   ├── lib
│   │   └── java_extensions
│   │       ├── hudi-scanner-jar-with-dependencies.jar
│   │       ├── java-udf-jar-with-dependencies.jar
│   │       ├── jdbc-scanner-jar-with-dependencies.jar
│   │       ├── max-compute-scanner-jar-with-dependencies.jar
│   │       └── paimon-scanner-jar-with-dependencies.jar
│   ├── LICENSE-dist.txt
│   ├── licenses
│   ├── log
│   ├── NOTICE.txt
│   ├── storage
│   └── www
└── fe
    ├── bin
    ├── conf
    ├── doris-meta
    ├── lib
    ├── LICENSE-dist.txt
    ├── licenses
    ├── log
    ├── mysql_ssl_default_certificate
    ├── NOTICE.txt
    ├── spark-dpp
    └── webroot
```
2023-06-13 18:55:12 +08:00
Pxl
9244cb6553 [Chore](runtime-filter) do not make query fail when rf publish failed (#20742)
Do not fail the query when publishing a runtime filter (rf) fails.
2023-06-13 18:23:46 +08:00
37db0145b4 [fix](load) fix mysql load parse response npe (#20699) 2023-06-13 18:14:03 +08:00
ad2f1b5647 [Update](clucene) synchronize clucene version to address PFOR adaptation issue (#20736) 2023-06-13 18:04:48 +08:00
7636dd1fdc [fix](nereids) always use colocate scan when agg's fragment has olap scan (#20695) 2023-06-13 17:59:17 +08:00
7942bd0bf9 [fix](planner) cast string literal to date like type should not be an implict cast (#20709)
1. Casting a string literal to a date-like type should not be an implicit cast.
2. The string representation of a float-like type should not use scientific notation.
3. The data type of the like function's regex expr should be string, even if it is a null literal.
4. Add -Xss4m to fe.conf to prevent stack overflow in some cases.
2023-06-13 17:57:14 +08:00
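Item 2 of the commit above can be illustrated outside Doris. The following is a minimal Python sketch, not the planner's code: it renders a finite float in plain fixed-point form instead of scientific notation. The helper name `float_to_plain_string` is hypothetical.

```python
from decimal import Decimal


def float_to_plain_string(value: float) -> str:
    """Render a finite float without an exponent (illustrative helper only)."""
    # repr() yields the shortest string that round-trips to the same float.
    # Decimal parses that string exactly, and the "f" format spec prints it
    # in fixed-point form, so no "e-07" style exponent appears.
    return format(Decimal(repr(value)), "f")


small = float_to_plain_string(1e-07)  # str(1e-07) would give "1e-07"
large = float_to_plain_string(1e22)   # str(1e22) would give "1e+22"
plain = float_to_plain_string(0.5)
```

The fixed-point string still parses back to the same float, so only the presentation changes.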
0e82c0d7a2 [Fix](Nereids) constant folding for function timestamp() (#20607) 2023-06-13 17:41:58 +08:00
feb21fc9e9 [fix](group_concat) use default seperator ',' instead of ', ' for group_concat, to be consistant with mysql (#20741) 2023-06-13 17:20:29 +08:00
2dddab03a1 [compatibility](schema cache) ensure schema version when using schema cache (#20729)
When the FE is an old version and the BE is a new version, issuing a schema change (add column)
and then querying can read a stale schema from the schema cache, because the old FE sends
queries without a schema version.
2023-06-13 15:19:26 +08:00
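A minimal sketch of the idea behind 2dddab03a1, not Doris code: if a cache is keyed only by an identifier, a lookup can return a schema that predates an add-column change. Making the schema version part of the key prevents that. `SchemaCache`, `index_id`, `schema_version`, and the fake `SCHEMAS` storage are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple


class SchemaCache:
    """Toy schema cache keyed by (index_id, schema_version)."""

    def __init__(self, loader: Callable[[int, int], List[str]]) -> None:
        self._loader = loader
        self._entries: Dict[Tuple[int, int], List[str]] = {}

    def get(self, index_id: int, schema_version: int) -> List[str]:
        # The version is part of the key, so a new version never
        # resolves to an entry cached for an older one.
        key = (index_id, schema_version)
        if key not in self._entries:
            self._entries[key] = self._loader(index_id, schema_version)
        return self._entries[key]


# Fake storage: version 0 has two columns, version 1 adds a third.
SCHEMAS = {(1, 0): ["k1", "v1"], (1, 1): ["k1", "v1", "v2"]}
cache = SchemaCache(lambda index_id, version: SCHEMAS[(index_id, version)])

old_schema = cache.get(1, 0)
new_schema = cache.get(1, 1)  # different key, so the old entry is not reused
```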
4b15185e25 [improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544)
1. Add hdfs file handle cache for hdfs file reader

    Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks to the Impala team.)
    This is an LRU cache that can store multiple entries with the same key.
    The key is built from {file name + modification time}.
    The value is the hdfsFile pointer that points to a certain hdfs file.

    This cache avoids reopening the same hdfs file multiple times, which can save
    query time.

    Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number
    of cached file handles; the default is 20000.

2. Add file meta cache

    The file meta cache is an LRU cache. The key is {file name + modification time},
    and the value is the parsed file meta info of that file, which saves
    the time of re-parsing the file meta every time.
    Currently, it is only used for caching the parquet file footer.

Tests show that if the cache is hit, `FileOpenTime` and `ParseFooterTime` are reduced to almost 0
in the query profile, which can save time when there are lots of files to read.
2023-06-13 15:13:57 +08:00
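The handle cache described in 4b15185e25 can be sketched as follows. This is a simplified, single-threaded Python illustration of an LRU cache that holds several idle values under one key. It is not the Impala or Doris implementation, and `LruMultiCache`, `acquire`, `release`, and the example path are hypothetical names.

```python
import itertools
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable, List, Tuple


class LruMultiCache:
    """Toy LRU cache that may hold several idle values for the same key."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._ids = itertools.count()
        # entry id -> (key, value), ordered from least to most recently released
        self._idle: "OrderedDict[int, Tuple[Hashable, Any]]" = OrderedDict()
        # key -> ids of the idle entries stored under that key
        self._by_key: Dict[Hashable, List[int]] = {}

    def acquire(self, key: Hashable, open_fn: Callable[[], Any]) -> Any:
        """Hand out an idle value for key if one exists, else open a new one."""
        ids = self._by_key.get(key)
        if ids:
            entry_id = ids.pop()  # most recently released entry for this key
            if not ids:
                del self._by_key[key]
            return self._idle.pop(entry_id)[1]
        return open_fn()

    def release(self, key: Hashable, value: Any, close_fn: Callable[[Any], None]) -> None:
        """Return a value to the cache and evict the oldest idle entries if full."""
        entry_id = next(self._ids)
        self._idle[entry_id] = (key, value)
        self._by_key.setdefault(key, []).append(entry_id)
        while len(self._idle) > self._capacity:
            old_id, (old_key, old_value) = self._idle.popitem(last=False)
            self._by_key[old_key].remove(old_id)
            if not self._by_key[old_key]:
                del self._by_key[old_key]
            close_fn(old_value)


opened: List[str] = []
closed: List[str] = []


def open_file() -> str:
    handle = f"handle-{len(opened)}"  # stands in for a real file handle
    opened.append(handle)
    return handle


cache = LruMultiCache(capacity=2)
key = ("/data/part-0.parquet", 1686640437)  # (file name, modification time)

h0 = cache.acquire(key, open_file)  # miss: opens handle-0
h1 = cache.acquire(key, open_file)  # miss: handle-0 is in use, opens handle-1
cache.release(key, h0, closed.append)
cache.release(key, h1, closed.append)
h2 = cache.acquire(key, open_file)  # hit: reuses an idle handle, nothing is opened
```

Two readers of the same file each get their own handle, which is why one key must be able to map to several entries. Including the modification time in the key means a rewritten file never matches a handle opened on the old contents.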
2adf5169e6 [improvement](test) improve p2 case of githubevents (#20727)
Check the rows of the github_events table after the restore finishes.
2023-06-13 14:31:24 +08:00
54a7dbeb4d [Refactor](External) Move Common ODBC Methods to JDBC Class and Add Default config to Disable ODBC Creation (#20566)
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.

In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.
2023-06-13 14:29:04 +08:00
033f64de93 [tools](tpch)add analyze in run-tpch-queries.sh (#20733) 2023-06-13 14:11:45 +08:00
eaa13e66f9 [fix](planner) inplement constant folding for function to_monday() (#20708) 2023-06-13 11:40:44 +08:00
Pxl
e010fa8d4f [Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally (#20593) 2023-06-13 11:20:49 +08:00
ee0e2b40da [Improvement](meta) support return brief info of restore job (#20653) 2023-06-13 10:47:31 +08:00
ce3050d75c [fix](regression) fix vertical compaction test (#20601) 2023-06-13 10:31:22 +08:00
e28187feb7 [fix](hive) fix NPE of hive meta store client (#20664)
When connecting to the hive meta store fails, an exception is thrown.
But there is a bug where the exception object may not be set, causing an NPE.
2023-06-13 09:41:49 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
51bbf17786 [Refactor](Profile) Add and refactor the join profile (#20693) 2023-06-13 09:06:51 +08:00
ef4410821f [typo](doc)document optimization (#20645)
* document optimization
2023-06-13 09:01:03 +08:00
4ac38ca67a [typo](docs) add a python example for stream load. (#20697) 2023-06-13 08:57:01 +08:00
550584e4e9 [docs](docs)Add the list of BI tools supported by Doris. (#20690) 2023-06-13 08:56:01 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After insert-only transactional hive tables were supported in #19518 and #19419, this PR adds support for transactional hive full acid tables.

Hive3 transactional full acid tables are supported.
Hive2 transactional full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
939575f5f3 [fix](mtmv)create mtmv failed when not specifying refresh strategy #20696
* fix no refresh error

* add ut
2023-06-13 08:53:24 +08:00
Pxl
5e3a96d605 [Bug](pipeline) fix memory leak because pipeline shared ptr not release #20710 2023-06-13 08:50:34 +08:00
412ca9059e [fix](routine-load) fix stackoverflow bug in routine load (#20704)
When executing a routine load job, a StackOverflowException may occur.
This is because the exprs in the column setting list are analyzed for each routine load sub task,
and there is a self-reference bug that may cause an endless loop when analyzing an expr.

The following columns expr list may trigger this bug:

```
columns(col1, col2,
col2=null_or_empty(col2),
col1=null_or_empty(col2))
```

This fix has been verified by a user, but I can't add a regression test for this case, because a routine load job
can't be submitted in our regression tests, and this bug can only be triggered in routine load.
2023-06-13 00:07:56 +08:00
283c55720d [bug](cooldown) Fix the issue of unused remote files not being deleted (#19785) 2023-06-12 21:05:09 +08:00
1433544c56 [fix](case expr) fix coredump of case for null value 3 #20711 2023-06-12 20:58:01 +08:00
6652287b52 [Fix](regression-test) fix unstable test case nereids_p0/update (#20692) 2023-06-12 20:55:22 +08:00
b4e552c3c3 [typo](docs) add parameter version (#20672) 2023-06-12 18:47:32 +08:00
c25c19bddc [test](regression) Add cases to test join condition push and not like (#20453)
Add test cases for issue #19613
2023-06-12 18:26:23 +08:00
Pxl
5fd9f58bd3 [Chore](pipeline-engine) adjus queryt canceled log on pipeline engine (#20702)
Adjust the query canceled log on the pipeline engine.
2023-06-12 18:23:19 +08:00
565095eb52 [bug](function) fix is_null/is_not_null check is_const has error (#20562)
Fix an error in the is_const check of is_null/is_not_null.
2023-06-12 18:21:12 +08:00
daf18a4b0e [fix](MTMV) Support refreshing data manually (#20108) 2023-06-12 17:57:06 +08:00
153f91f77e [typo](doc) Update doc for newly released 1.2.0 version of spark connector (#20639) 2023-06-12 17:42:10 +08:00
9d47c6a871 [fix](columnstring) fix bug of columnstring prefetch (#20698) 2023-06-12 17:03:44 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
141813b476 [tpcds](nereids) estimate distribution cost by byte size instead of row count (#20642)
This PR changes the Agg strategy of tpch q16, but causes no performance issue.
This PR improves tpcds sf100.

before:
cold 141 sec
hot 133 sec

after:
cold 137 sec
hot 128 sec
2023-06-12 16:23:49 +08:00
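The difference between the two estimates in 141813b476 can be shown with a toy model. This is a hedged sketch only: the `Stats` fields and both cost functions are hypothetical and are not the Nereids cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    row_count: int
    avg_row_bytes: int  # assumed average width of one row, in bytes


def cost_by_rows(stats: Stats) -> float:
    """Distribution cost proportional to the number of rows sent."""
    return float(stats.row_count)


def cost_by_bytes(stats: Stats) -> float:
    """Distribution cost proportional to the number of bytes sent."""
    return float(stats.row_count * stats.avg_row_bytes)


narrow = Stats(row_count=1_000_000, avg_row_bytes=8)  # 8 MB in total
wide = Stats(row_count=200_000, avg_row_bytes=400)    # 80 MB in total

cheaper_by_rows = min((narrow, wide), key=cost_by_rows)    # picks `wide`
cheaper_by_bytes = min((narrow, wide), key=cost_by_bytes)  # picks `narrow`
```

With a row-count estimate the wide input looks cheaper to distribute even though it moves ten times more data, which is the kind of choice a byte-size estimate reverses.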
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
10134ea8c6 [fix](planner) fix RewriteInPredicateRule may be useless (#20668)
Issue Number: close #20669

RewriteInPredicateRule may cast the two children of an InPredicate expr to the same type. For example, in `where cast(age as char) in ('11')`, where the type of age is int, RewriteInPredicateRule casts both children to int. In this example, child 0 then has the following structure:
```
child 0: type: int
    |---  child: type : char
            |-- child: type : int
```

Because RewriteInPredicateRule casts the expr to int, the stmt is reanalyzed. However, the stmt is reset before it is reanalyzed, and the reset changes child 0 to the following structure:
```
child: type : char
    |-- child: type : int
```
As a result, both children are cast to varchar in func castAllToCompatibleType, and the work done by RewriteInPredicateRule is useless.

In 1.1-lts and 1.2-lts, a case such as "where cast(age as char) in ('11')" does not work, because func castAllToCompatibleType casts int to char, but int can't be cast to char. (master works because func castAllToCompatibleType casts int to varchar in this case.)
```
MySQL [test]> select user_id from test_cast where cast(age as char) in ('45');
ERROR 1105 (HY000): errCode = 2, detailMessage = type not match, originType=INT, targeType=CHAR(*)
```
2023-06-12 14:39:01 +08:00
f90d5dbacf [fix](test) fix unstable dynamic partition regression test (#20674)
Define the variable with the def keyword.
2023-06-12 14:28:30 +08:00
28fbdf3273 [BUG](es_catalog)Solve the problem of querying es catalog Unexpected exception: Index:… (#18743) 2023-06-12 13:48:12 +08:00
Pxl
7f8c5c81e7 [Feature](agg_state) support agg_state combinator on nereids (#20164)
support agg_state combinator on nereids
2023-06-12 12:49:26 +08:00
a02a2f4163 [doc](create-function) Update CREATE-FUNCTION.md to remove the usage of c++ (#20654) 2023-06-12 11:48:14 +08:00
14f59bef1d [improvement](profile)add sum/avg rpc time (#20511) 2023-06-12 11:34:49 +08:00
bcc37c9405 [fix](planner)the common type of floating and decimal should be floating type (#20634)
* [fix](planner)the common type of floating and decimal should be floating type

* fix test cases
2023-06-12 11:32:23 +08:00
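The rule named in bcc37c9405 can be written down as a small type-promotion function. This is a sketch under stated assumptions, not the planner's code: types are plain strings, `common_type` is a hypothetical helper, and returning the floating operand's own type for a mixed pair is an assumption made for illustration.

```python
FLOATING = ("FLOAT", "DOUBLE")


def is_decimal(type_name: str) -> bool:
    return type_name.startswith("DECIMAL")


def common_type(left: str, right: str) -> str:
    """Pick a common type for two floating or decimal operands."""
    if left in FLOATING and right in FLOATING:
        return "DOUBLE" if "DOUBLE" in (left, right) else "FLOAT"
    if left in FLOATING and is_decimal(right):
        return left  # floating wins over decimal (assumed: keep the operand's type)
    if is_decimal(left) and right in FLOATING:
        return right
    raise ValueError(f"no rule in this sketch for {left} and {right}")


mixed = common_type("DECIMAL(10, 2)", "DOUBLE")
flipped = common_type("FLOAT", "DECIMAL(10, 2)")
```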