Commit Graph

8339 Commits

Author SHA1 Message Date
a3cd0ddbdc [refactor](remove broker scan node) it is not useful any more (#16128)
remove broker scan node
remove broker table
remove broker scanner
remove json scanner
remove orc scanner
remove hive external table
remove hudi external table
remove broker external table; users can use the broker table value function instead
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-23 19:37:38 +08:00
61fccc88d7 [vectorized](analytic) fix analytic node of window function get wrong… (#16074)
[Bug] The basic window function rank() returns wrong ordering results #15951
2023-01-23 16:09:46 +08:00
ab04a458aa [Enhancement](export) cancel all running coordinators when execute cancel-export statement. (#15801) 2023-01-22 23:11:32 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
b9872ceb98 [deps](libhdfs3) update to 2.3.6 to fix kms aes 256 bug (#16127)
update libhdfs3 to 2.3.6 to fix kms aes 256 bug.
And update the licences and changelog
2023-01-22 07:18:35 +08:00
8920295534 [refactor](remove non-vec code) remove non-vectorized conjuncts from scanner (#16121)
1. remove arrow group filter
2. remove non-vectorized conjuncts from scanner

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-21 19:23:17 +08:00
253445ca46 [vectorized](jdbc) fix jdbc executor for get result by batch and memo… (#15843)
1. The result set should be fetched by batch size.
2. Fix memory leak.
2023-01-21 08:22:22 +08:00
87c7f2fcc1 [Feature](profile) set sql and defaultDb fields in show-load-profile. (#15875)
When executing show load profile '/', the SQL and DefaultDb columns are both 'N/A', but we can fill these fields. The result of this PR is as follows:

Execute show load profile '/'\G:

MySQL [test_d]> show load profile '/'\G
*************************** 1. row ***************************
   QueryId: 652326
      User: N/A
 DefaultDb: default_cluster:test_d
       SQL: LOAD LABEL `default_cluster:test_d`.`xxx`  (APPEND DATA INFILE ('hdfs://xxx/user/hive/warehouse/xxx.db/xxx/*')  INTO TABLE xxx FORMAT AS 'ORC' (c1, c2, c3) SET (`c1` = `c1`, `c2` = `c2`, `c3` = `c3`))  WITH BROKER broker_xxx (xxx)  PROPERTIES ("max_filter_ratio" = "0", "timeout" = "30000")
 QueryType: Load
 StartTime: 2023-01-12 18:33:34
   EndTime: 2023-01-12 18:33:46
 TotalTime: 11s613ms
QueryState: N/A
1 row in set (0.01 sec)
2023-01-21 08:10:15 +08:00
8b40791718 [Feature](ES): catalog support mapping es _id #15943 2023-01-21 08:08:32 +08:00
01c001e2ac [refactor](javaudf) simplify UdfExecutor and UdafExecutor (#16050)
* [refactor](javaudf) simplify UdfExecutor and UdafExecutor

* update

* update
2023-01-21 08:07:28 +08:00
25046fabec [regression-test](sub query) add regression test for subquery with limit (#16051)
* [regression-test](sub query) add regression test for subquery with limit

* add license header
2023-01-21 08:06:49 +08:00
de12957057 [debug](ParquetReader) print file path if failed to read parquet file (#16118) 2023-01-21 08:05:17 +08:00
2daa5f3fef [fix](statistics) Fix statistics related threads continuously spawn as doing checkpoint #16088 2023-01-21 07:58:33 +08:00
8d02961216 [test](pipeline)Remove P1 regression required check in .asf.yaml (#16119) 2023-01-21 07:57:52 +08:00
7814d2b651 [Fix](Oracle External Table) fix that oracle external table can not insert batch values (#16117)
Issue Number: close #xxx

This PR fixes two bugs:

1. _jdbc_scanner may be nullptr in vjdbc_connector.cpp, so we use another method to count JDBC statistics. Closes [Enhancement](jdbc scanner) add profile for jdbc scanner #15914.
2. In the batch insertion scenario, the Oracle database does not support the syntax insert into table values (...),(...); what it supports is:
insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
2023-01-21 07:57:12 +08:00
d318d644ff [docs](en) update en docs (#16124) 2023-01-20 23:05:39 +08:00
9ffd109b35 [fix](datetimev2) Fix BE datetimev2 type returning wrong result (#15885) 2023-01-20 22:25:20 +08:00
6b110aeba6 [test](Nereids) add regression cases for all functions (#15907) 2023-01-20 22:17:27 +08:00
5514b1c1b7 [enhancement](tablet_report) accelerate deleteFromBackend function to avoid tablet report task blocked (#16115) 2023-01-20 20:11:58 +08:00
0305aad097 [fix](privilege)fix grant resource bug (#16045)
GRANT USAGE_PRIV ON RESOURCE * TO user;
After this, the user would see all databases.

Fix: set a PrivPredicate for SHOW RESOURCES and remove USAGE from the SHOW PrivPredicate.
2023-01-20 19:00:44 +08:00
3b08a22e61 [test](Nereids) add p0 regression test for Nereids (#15888) 2023-01-20 18:50:23 +08:00
956070e17f fix english number of tpch (#16116) 2023-01-20 17:27:10 +08:00
171404228f [improvement](vertical compaction) cache segment in vertical compaction (#16101)
1. In vertical compaction, segments are loaded for every column group, so we cache the segment pointers to avoid repeated I/O.
2. Fix a vertical compaction data size bug.
2023-01-20 16:38:23 +08:00
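Point 1 above amounts to a read-through cache over opened segments. A minimal sketch of the idea in Python (hypothetical names, not the actual Doris C++ code):

```python
# Hypothetical sketch: in vertical compaction each column group re-reads the
# same segments, so hold opened segments in a cache keyed by
# (rowset_id, segment_id) and reuse them across column groups.

class SegmentCache:
    def __init__(self, open_segment):
        self._open = open_segment   # callable that actually opens a segment
        self._cache = {}
        self.opens = 0              # count of real opens, for illustration

    def get(self, rowset_id, segment_id):
        key = (rowset_id, segment_id)
        if key not in self._cache:
            # Only the first column group pays the open/IO cost.
            self._cache[key] = self._open(rowset_id, segment_id)
            self.opens += 1
        return self._cache[key]
```

With three column groups over two segments, only two real opens happen instead of six.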
a4265fae70 [enhancement](query) Make query scan nodes more evenly distributed (#16037)
Take replicaNumPerHost into consideration when scheduling scan nodes to hosts, so that the final query scan nodes are more evenly distributed across the cluster.
2023-01-20 16:24:49 +08:00
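The scheduling idea can be sketched as a greedy assignment (a hypothetical Python illustration, not the actual Coordinator code; the function and parameter names are invented):

```python
# Hypothetical sketch: assign each scan range to the candidate host with the
# fewest ranges assigned so far, breaking ties by how many tablet replicas the
# host already holds, so scan work spreads evenly across the cluster.

def assign_scan_ranges(scan_ranges, replica_num_per_host):
    """scan_ranges: list of candidate-host lists, one per scan range.
    replica_num_per_host: dict host -> number of tablet replicas on that host."""
    assigned = {host: 0 for host in replica_num_per_host}
    plan = []
    for candidates in scan_ranges:
        host = min(candidates,
                   key=lambda h: (assigned[h], replica_num_per_host[h]))
        assigned[host] += 1
        plan.append(host)
    return plan
```

For four ranges with two equally loaded candidate hosts, the plan alternates between them.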
13d93cb2b4 [typo](doc)nvl add 1.2 label (#15856) 2023-01-20 15:11:58 +08:00
419f433d21 [fix](Nereids) topn arg check is not compatible with legacy planner (#16105) 2023-01-20 15:08:10 +08:00
72df283344 [fix](planner) extract common factor rule should consider not only where predicate (#16110)
PR #14381 limited the `ExtractCommonFactorsRule` to handle only the `WHERE` predicate,
but predicates in the `ON` clause should also be considered. For example:

```
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) NULL
)
DUPLICATE KEY(`n_nationkey`)
DISTRIBUTED BY HASH(`n_nationkey`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);


select * from
nation n1 join nation n2
on (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
```

There should be the following predicates on each scan node:
```
PREDICATES: `n1`.`n_name` IN ('FRANCE', 'GERMANY')
PREDICATES: `n2`.`n_name` IN ('FRANCE', 'GERMANY')
```

This PR fixes the issue by removing that limitation from `ExtractCommonFactorsRule`.
2023-01-20 14:53:48 +08:00
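The transformation above can be sketched as deriving implied IN-filters from an OR of AND branches (a hypothetical Python illustration; the real rule lives in the FE planner, and this handles only the equality case):

```python
# Hypothetical sketch, not the real ExtractCommonFactorsRule: treat the
# predicate as DNF -- an OR of branches, each branch a list of (column, value)
# equality conjuncts. Every column bound in *all* branches yields an implied
# filter `col IN (values seen for col)` that can be pushed to the scan node.

def extract_common_in_factors(branches):
    common = {c for c, _ in branches[0]}
    for branch in branches[1:]:
        common &= {c for c, _ in branch}
    factors = {}
    for col in common:
        # Collect every value the column takes in any branch.
        values = {v for branch in branches for c, v in branch if c == col}
        factors[col] = sorted(values)
    return factors
```

Applied to the FRANCE/GERMANY example, both n1.n_name and n2.n_name get an IN ('FRANCE', 'GERMANY') filter.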
1638936e3f [fix](oracle catalog) oracle catalog supports the TIMESTAMP data type of Oracle (#16113)
Oracle's `TIMESTAMP` data type maps to Doris's `DateTime` data type
2023-01-20 14:47:58 +08:00
726427b795 [refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046)
1. Spark dpp
 
	Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
	so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
	will not be moved into `fe/lib`, which reduces the size of the FE output.
	
2. Modify start_fe.sh

	Modify the CLASSPATH to make sure that doris-fe.jar is at the front, so that
	when loading classes with the same qualified name, they are loaded from doris-fe.jar first.
	
3. Upgrade hadoop and hive version

	hadoop: 2.10.2 -> 3.3.3
	hive: 2.3.7 -> 3.1.3
	
4. Override the IHiveMetastoreClient implementations from dependency

	`ProxyMetaStoreClient.java` for Aliyun DLF.
	`HiveMetaStoreClient.java` for origin Apache Hive metastore.

	Because some of their methods need to be modified to keep them compatible with
	different versions of Hive.
	
5. Exclude some unused dependencies to reduce the size of FE output

	Now it is only 370MB (previously 600MB)
	
6. Upgrade aws-java-sdk version to 1.12.31

7. Support AWS Glue Data Catalog

8. Remove HudiScanNode (no longer supported)
2023-01-20 14:42:16 +08:00
3652cb3fe9 [test](Nereids)add test about dateType and dateTimeType (#16098) 2023-01-20 14:15:54 +08:00
101bc568d7 [fix](Nereids) fix bugs about date function (#16112)
1. when casting constant, check the value is whether in the range of targetType
2. change the scale of dateTimeV2 to 6
2023-01-20 14:11:17 +08:00
cbb203efd2 [fix](nereids) fix test_join regression test for nereids (#16094)
1. add TypeCoercion for (string, decimal) and (date, decimal)
2. the equality of the LogicalProject node should consider children in some cases
3. don't push down join conditions like "t1 join t2 on true/false"
4. add PUSH_DOWN_FILTERS after FindHashConditionForJoin
5. nested-loop join should support all kinds of join
6. the intermediate tuple should contain slots from both children of the nested-loop join
2023-01-20 14:02:29 +08:00
116e17428b [Enhancement](point query optimize) improve performance of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
3ebc98228d [feature wip](multi catalog)Support iceberg schema evolution. (#15836)
Support Iceberg schema evolution for the Parquet file format.
Iceberg uses a unique id for each column to support schema evolution.
To support this feature in Doris, the FE side needs to get the current column id for each column and send the ids to the BE side.
The BE reads the column ids from the Parquet key_value_metadata, renames the changed columns in the Block to match the names in the Parquet file before reading data, and sets the names back after reading.
2023-01-20 12:57:36 +08:00
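The id-based renaming described above can be sketched as follows (a hypothetical Python illustration of the mapping step; names like `to_file_names` are invented and not the actual BE code):

```python
# Hypothetical sketch: Iceberg tracks columns by a stable unique field id, so
# before reading a Parquet file we map each requested column name to whatever
# name that id has *in the file's schema*, read under that name, then rename
# back to the current schema name.

def to_file_names(requested_cols, current_schema, file_schema):
    """current_schema / file_schema: dict of column name -> field id."""
    id_to_file_name = {fid: name for name, fid in file_schema.items()}
    mapping = {}
    for col in requested_cols:
        fid = current_schema[col]              # the id survives renames
        # Fall back to the current name if the file predates the column.
        mapping[col] = id_to_file_name.get(fid, col)
    return mapping
```

For a column renamed after the file was written, the mapping resolves to the old on-file name while unchanged columns map to themselves.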
ab4127d0b2 [Fix][regression-test] Fix test_hdfs_tvf.groovy by update HDFS conf URI to uri and better error msg handling. (#16029)
test_hdfs_tvf.groovy didn't pass.
2023-01-20 12:40:25 +08:00
ba71516eba [feature](jdbc catalog) support SQLServer jdbc catalog (#16093) 2023-01-20 12:37:38 +08:00
60231454cc [fix](nereids) fix bug in multiply return data type (#15949) 2023-01-20 11:44:24 +08:00
2ed4eac6f8 [feature](Nereids) add scalar function width_bucket (#16106) 2023-01-20 11:31:17 +08:00
73621bdb18 [enhance](Nereids) process DELETE_SIGN_COLUMN of OlapTable (#16030)
1. add DELETE_SIGN_COLUMN to the non-visible columns in LogicalOlapScan
2. when the table has a delete sign, add a filter `delete_sign_column = 0`
3. use output slots and non-visible slots to bind slots
2023-01-20 11:27:35 +08:00
6485221ffb [Feature-WIP](inverted index)(bkd) Support try query before query bkd to improve query efficiency (#16075) 2023-01-20 11:19:36 +08:00
6c5470b163 [typo](docs) fix error path in ssb/tpch docs (#16108)
* fix docs error path

* commit suggestions
2023-01-20 11:07:12 +08:00
5b2191a496 [fix](multi-catalog)Make ES catalog and resource compatible (#16096)
close #16099 

1. Make ES resource compatible with `username` property. Keep the same behavior with ES catalog.
2. Change ES catalog `username` to `user` to avoid confusion.
3. Add logs in ESRestClient to make debugging easier.
2023-01-20 09:31:57 +08:00
b230b704d2 [Update](thirdparty) omit BUILD_TYPE for clucene, ignore other build type except RelWithDebInfo (#16076)
2023-01-20 09:10:58 +08:00
85c2060862 [Minor](Nereids): minor fix (#16095) 2023-01-20 00:53:11 +08:00
43dade0f68 [fix](doc): fix some spelling mistakes (#16097) 2023-01-19 23:31:45 +08:00
2018b49ef0 [opt](test) scalar_types_p0 use 100k lines dataset and scalar_types_p2 use 1000k (#16104) 2023-01-19 22:59:29 +08:00
69a3ecfd51 [Fix](inverted index) fix add nulls bug for inverted fulltext index (#16078)
We found a problem with the inverted index when parser=english:
if there were nulls in a column when flushing its inverted index, CLucene could throw an exception.
2023-01-19 21:21:44 +08:00
379ba73675 [enhance](nereids) tightestCommonType of datetime and datev2 is datev2 (#16086)
In the original planner, the tightestCommonType of datetime and datev2 is datev2.
Make Nereids compatible with the original planner.
2023-01-19 19:55:19 +08:00
6cff651f71 [enhancement](statistics) add some methods to use histogram statistics (#15755)
1. Fix the histogram document
2. Add some methods for histogram statistics

TODO:
1. use histogram statistics for the optimizer
2023-01-19 19:20:18 +08:00
c1dd1fc331 [fix](nereids): fix all bugs in mergeGroup(). (#16079)
* [fix](Nereids): fix mergeGroup()

* polish code

* fix replace children of PhysicalEnforcer

* delete `deleteBestPlan`

* delete `getInputProperties`

* after merging a GroupExpression, clear its owner Group
2023-01-19 19:15:05 +08:00