Commit Graph

3183 Commits

Author SHA1 Message Date
36737fe9f4 [feature](Nereids): Add cache to avoid repeated calculation in DPhyp (#14585) 2022-11-30 21:35:45 +08:00
9bbbcf031c [enhancement](k8s) Support fqdn mode for be in k8s environment (#9172)
In a k8s environment, a pod's IP can change, but its hostname is stable. When the host machine of a pod fails, k8s can reschedule the pod to a new host machine and rebuild it there. The rebuilt pod keeps its hostname, but its IP address changes. When enable_fqdn_mode is true, FQDNManager can detect such changes in a BE node's IP address.

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
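The detection described above can be pictured as a periodic check that re-resolves each BE hostname and compares the result with the last IP it saw. This is a minimal sketch under assumptions. The `FqdnWatcher` class, its methods, and the injected resolver are hypothetical and are not FQDNManager's actual API. In practice the resolver could be something like `socket.gethostbyname`.

```python
from typing import Callable, Dict, List, Tuple


class FqdnWatcher:
    """Hypothetical sketch: remember each hostname's last IP and report changes."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        # The resolver is injected (e.g. socket.gethostbyname) so it can be faked.
        self._resolve = resolve
        self._last_ip: Dict[str, str] = {}

    def register(self, hostname: str) -> None:
        self._last_ip[hostname] = self._resolve(hostname)

    def check(self) -> List[Tuple[str, str, str]]:
        """Re-resolve every hostname; return (hostname, old_ip, new_ip) per change."""
        changes = []
        for host, old_ip in list(self._last_ip.items()):
            new_ip = self._resolve(host)
            if new_ip != old_ip:
                changes.append((host, old_ip, new_ip))
                self._last_ip[host] = new_ip
        return changes


# A fake DNS table stands in for cluster DNS.
dns = {"be-0.be-svc": "10.0.0.5"}
watcher = FqdnWatcher(lambda host: dns[host])
watcher.register("be-0.be-svc")
dns["be-0.be-svc"] = "10.0.0.9"  # the pod was rebuilt on another host
print(watcher.check())  # [('be-0.be-svc', '10.0.0.5', '10.0.0.9')]
```

Keying on the hostname rather than the IP is what lets the rebuilt pod be recognized as the same BE.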
2022-11-30 20:42:15 +08:00
593a916ae6 [feature](nereids) split AggregateDisassemble into two rules (#14611)
# Proposed changes

Issue Number: close #14280

## Problem summary

The AggregateDisassemble rule is refactored and split into two rules, which are not dependent on each other.
1. AggregateDisassemble splits the agg into two phases: Local, Global.
1.1. For the count function, the implementation is: distinct_multi_count(update) + distinct_multi_count(merge)

2. DistinctAggregateDisassemble splits the agg into 4 stages: Local, Global, Distinct Local, Distinct Global.
2.1. For the count function, the implementation is: distinct_multi_count(update) + distinct_multi_count(merge) + sum(update) + sum(merge)
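The Local/Global split in point 1 can be illustrated with a generic two-phase count per group. Each instance first aggregates the rows it holds (update), and the partial results are then combined (merge). This is a plain-Python sketch of the general technique, not the Nereids rule. The function names are hypothetical, and the sketch does not model the four-stage distinct plan in point 2.

```python
from collections import Counter
from typing import Dict, Iterable, List


def local_count(rows: Iterable[str]) -> Counter:
    """Local (update) phase: each instance counts rows per group key it holds."""
    return Counter(rows)


def global_merge(partials: List[Counter]) -> Dict[str, int]:
    """Global (merge) phase: partial counts for the same key are summed."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)


# Two instances each hold a slice of the data.
partials = [local_count(["a", "b", "a"]), local_count(["b", "c"])]
print(global_merge(partials))  # {'a': 2, 'b': 2, 'c': 1}
```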
2022-11-30 14:02:42 +08:00
3ca3af2234 [improvement](planner)sort show catalogs result by name (#14684)
The results of SHOW DATABASES, SHOW TABLES and SHOW DATA are all sorted by name, so make SHOW CATALOGS behave the same way.
2022-11-30 11:55:14 +08:00
3ff409551c [enhancement](netty) bind netty's default logger when launching fe (#14675)
Doris FE uses log4j as its logger, while netty may choose a logger through slf4j.
Confusing behavior has been reported in this situation.
The binding does not take effect if the bind logic is moved to another file or elsewhere within PaloFe.java,
so it has to stay before the main function.
2022-11-30 11:54:39 +08:00
9272680d00 [feature](multi-catalog) support Jdbc catalog (#14527)
Issue Number: close #xxx

This adds a JDBC catalog to the Doris multi-catalog feature.
Currently, the JDBC catalog only supports MySQL.

TODO:

Support for PostgreSQL.
Support for other databases.
Problem summary
A JDBC catalog can be created like this:

CREATE CATALOG jdbc4 PROPERTIES (
    "type"="jdbc",
    "jdbc.user"="root",
    "jdbc.password"="123456",
    "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false",
    "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar",
    "jdbc.driver_class" = "com.mysql.jdbc.Driver"
);
Note:
yearIsDateType is a MySQL Connector/J connection property:
If the yearIsDateType configuration property is set to false, the returned object type is java.sql.Short. If it is set to true (the default), the returned object is of type java.sql.Date with the date set to January 1st, at midnight.
To stay compatible with MySQL, FE forces yearIsDateType=false: if a user sets yearIsDateType=true, Doris FE changes it to yearIsDateType=false.
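The rule in the note forces yearIsDateType=false regardless of what the user wrote. It can be sketched as a rewrite of the URL's query string. The helper name is hypothetical, and this is not the actual FE code. The sketch assumes parameters are separated by `&` after the first `?`, and it does not handle URL-encoded parameter names.

```python
def force_year_is_date_type_false(jdbc_url: str) -> str:
    """Hypothetical sketch: make sure a MySQL JDBC URL carries yearIsDateType=false."""
    base, has_query, query = jdbc_url.partition("?")
    params = [p for p in query.split("&") if p] if has_query else []
    # Drop any user-supplied yearIsDateType, whatever its value ...
    params = [p for p in params if p.split("=", 1)[0] != "yearIsDateType"]
    # ... and append the value FE requires.
    params.append("yearIsDateType=false")
    return base + "?" + "&".join(params)


print(force_year_is_date_type_false(
    "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=true&useSSL=false"))
# jdbc:mysql://127.0.0.1:13396/demo?useSSL=false&yearIsDateType=false
```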
2022-11-30 11:28:08 +08:00
82f3980774 [feature](Nereids) estimation without column statistics (#14526)
Estimate plan cost without column statistics.
Change list:
1. Remove the original StatsCalculator. It is replaced by StatsCalculatorV2, which is renamed to StatsCalculator.
2. Remove FilterSelectivityCalculator. It is replaced by FilterEstimation.
3. Remove the session variable ENABLE_NEREIDS_STATS_DERIVE_V2.
4. Add ColumnStatistics.isUnKnown, which means the column has not been analyzed and its stats are not accurate.
5. Add an estimatedRowCount() function for OLAP tables.
6. Add unit tests for FilterEstimation and StatsCalculator.
2022-11-30 11:27:51 +08:00
3a362fab76 [fix](fe) table function node uses wrong info for projection (#14667) 2022-11-30 10:41:32 +08:00
ca90253b09 [config](storage-policy) add a FE config to disable storage policy by default (#14655)
The cold-hot separation feature is still
under development, and some issues seem to remain unsolved.
So this adds an FE config, enable_storage_policy, which defaults to false, to disable the creation and usage of storage policies by default.

This way, users know they are using an experimental feature at their own risk. It will not be formally released in v1.2.0.

With storage policy disabled by default, users cannot create or use storage policies unless enable_storage_policy is set to true.

Remove the property remote_storage_policy, since it duplicates storage_policy.

Change the persisted fields in DataProperty.java,
and remove remoteCooldownTime from DataProperty, because it can be obtained from StoragePolicy.
2022-11-30 10:04:33 +08:00
dd7ec8f4ca [improvement](test) add tpch1 orc for hive catalog and refactor some test dir (#14669)
Add tpch 1g orc test case in hive docker

Refactor the directories of some catalog test suites.

Add "-internal" to the DLF endpoint to support accessing OSS within an Aliyun VPC.
2022-11-30 10:03:58 +08:00
4faca56819 [bug](jsonb) fix INSERT/CAST NULL to JSONB (#14682)
Add NULL -> JSONB in implicitCastMap to support INSERT/CAST NULL to JSONB.
2022-11-30 09:53:16 +08:00
d5ee721621 [improvement](planner)Adjust the field naming rules when creating tables (#14671)
Adjust the field naming rules when creating tables.

The original rule for field names allowed a letter, an underscore, or an @ character as the first character,
followed by at most 63 characters, for a total of at most 64 characters.
However, in many industries, such as finance, derived field names often exceed 64
characters, so the regular expression is adjusted to allow up to 128 characters instead of 64.
Many users load data from Hive into Doris through external tables or Broker Load.
In Hive tables, a field name may start with an Arabic numeral, so the regular expression is also adjusted
to allow an Arabic numeral as the first character.
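The before/after rules can be sketched as two regular expressions. Only the first-character rule and the total length limits (64 before, 128 after) come from the commit. The set of allowed trailing characters (letters, digits, underscore) is an assumption, and the pattern names are hypothetical, not the ones in the Doris source.

```python
import re

# Hypothetical patterns. From the commit: first-character rule and total length
# (64 before, 128 after). Assumed: trailing characters are letters, digits or "_".
OLD_FIELD_NAME = re.compile(r"[a-zA-Z_@][a-zA-Z0-9_]{0,63}")
NEW_FIELD_NAME = re.compile(r"[a-zA-Z0-9_@][a-zA-Z0-9_]{0,127}")


def is_valid(pattern: "re.Pattern[str]", name: str) -> bool:
    # fullmatch anchors the pattern to the whole name.
    return pattern.fullmatch(name) is not None


print(is_valid(OLD_FIELD_NAME, "1st_amount"), is_valid(NEW_FIELD_NAME, "1st_amount"))
# False True
```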
2022-11-30 09:45:27 +08:00
33cda9f22a [improvement](planner)support like in show catalogs stmt #14678
Co-authored-by: yuleiyang <yuleiyang@tencent.com>
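A LIKE filter on catalog names can be sketched by translating `%` to "any sequence" and `_` to "any single character", then matching the whole name. This sketch rests on assumptions and is not Doris's implementation. It is case-sensitive, ignores LIKE escape characters, and sorts the result by name to match the SHOW CATALOGS ordering described in #14684. The helper names are hypothetical.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern: % -> any sequence, _ -> any single character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.DOTALL)


def filter_catalogs(names: List[str], pattern: str) -> List[str]:
    """Keep names that match the pattern in full, sorted by name."""
    rx = like_to_regex(pattern)
    return sorted(n for n in names if rx.fullmatch(n))


print(filter_catalogs(["internal", "hive_prod", "hive_dev", "es1"], "hive%"))
# ['hive_dev', 'hive_prod']
```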
2022-11-30 08:38:42 +08:00
33ad616839 [fix](statistics) Fix potential NPE in ShowStatisticsStmt #14679
When the required cache entry has not been loaded yet, the cache returns ColumnStatistics.DEFAULT, which does not define the max/min literal expressions. Add a check for that case.
2022-11-30 08:38:20 +08:00
85ce3c37b5 [fix](DOE) fix ES query dsl is wrong after FE restarted. (#14652)
Some default properties of the ES catalog, such as `elasticsearch.doc_value_scan` and `elasticsearch.keyword_sniff`, are not persisted in the EditLog,
so they are lost when FE restarts.
2022-11-29 17:06:48 +08:00
7a08a799e9 [Vectorized](function) support order by convert_to function (#14555) 2022-11-29 15:22:27 +08:00
facb7cf4e2 [fix](spark load)Temp partition with spark load (#14648)
* [fix](spark load)losing temporary partition item entry

* [fix](spark load)Temp partition with spark load
2022-11-29 15:21:44 +08:00
c5f9fd5619 [fix](spark load)partition column is not duplicate key, spark load IndexOutOfBounds error (#14661)
* [fix](spark load)partition column is not duplicate key, spark load IndexOutOfBoundsException error

Co-authored-by: 张放(vivianv.zhang) <vivianv.zhang@huolala.cn>
2022-11-29 15:21:21 +08:00
3e8b3658c7 [feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513) 2022-11-29 15:12:41 +08:00
97f0d3a756 [Improvement](datatype) disable new types if vectorized engine is disabled (#14561)
* [Improvement](datatype) disable new types if vectorized engine is disabled

disable datev2/datetimev2/decimalv3 if vectorized engine is disabled
2022-11-29 10:33:46 +08:00
f7a827c06b [fix](new-scan) fix some bugs about new scan node and readers (#14504)
Fix a JSON reader DCHECK failure caused by a missing TYPE_STRING.

Fix a bug where the TVF throws an NPE if no file is found.

Predicate conjuncts cannot be pushed down to the Parquet reader in a load task,
because the predicates apply to columns of the destination table, not to columns of the source file.

Add a temporary broker load property "use_new_load_scan_node" for the regression tests,
so that the new load scan node can be used for a specific job without setting a global FE config.
2022-11-29 10:21:41 +08:00
wxy
2295ab24b0 [fix](metric) fix jvm_young_size_bytes. (#14562)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-29 09:10:48 +08:00
7513c82431 [NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608) 2022-11-29 08:55:54 +08:00
c5eb8ab084 [fix](persist) make ArithmeticExpr writable (#14615)
Fix a bug where ArithmeticExpr's write method was not implemented, causing FE to crash when creating a function like:
CREATE ALIAS FUNCTION IF NOT EXISTS mesh_udf_test1(INT,INT) WITH PARAMETER(n,d) AS ROUND(1+floor(n/d));

Add IF EXISTS and IF NOT EXISTS to DROP FUNCTION and CREATE FUNCTION.

Fix a minor bug where the hdfs() table-valued function throws an NPE if the file does not exist.
2022-11-29 08:55:18 +08:00
b51f6ae050 [feature](Nereids)add rule: PruneOlapScanTablet (#14378) 2022-11-29 01:06:14 +08:00
a803e75438 [feature](Nereids) add rule: EliminateGroupByConstants (#14541)
Remove constants from GROUP BY, for example:
before applying the rule:
select 1, k1, min(k2), max(k3) from t1 group by 1, 2;
after applying the rule:
select 1, k1, min(k2), max(k3) from t1 group by k1;
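The rewrite above can be sketched over a string-based expression model. Ordinals in GROUP BY are resolved against the select list, and items that are literals are dropped. The literal check, the names, and the handling of an all-constant GROUP BY are assumptions for illustration, not the EliminateGroupByConstants implementation.

```python
import re
from typing import List, Union

_NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")


def is_literal(expr: str) -> bool:
    """Hypothetical check: numeric or single-quoted string literals are constants."""
    if len(expr) >= 2 and expr[0] == expr[-1] == "'":
        return True
    return _NUMBER.fullmatch(expr) is not None


def eliminate_group_by_constants(select_list: List[str],
                                 group_by: List[Union[int, str]]) -> List[str]:
    """Resolve GROUP BY ordinals against the select list and drop constant items."""
    resolved = [select_list[g - 1] if isinstance(g, int) else g for g in group_by]
    kept = [e for e in resolved if not is_literal(e)]
    # Assumption: if every item is constant, keep one so the query stays a grouped
    # aggregation (with no GROUP BY at all, an empty input would still yield one row).
    return kept if kept or not resolved else resolved[:1]


print(eliminate_group_by_constants(["1", "k1", "min(k2)", "max(k3)"], [1, 2]))
# ['k1']
```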
2022-11-28 22:52:24 +08:00
16bc20a357 [opt](nereids)Estimate cost by row, not by data size (#14471)
Since column data size is not always available, estimate plan cost by row count instead of data size.
2022-11-28 19:58:06 +08:00
529bdfb153 [Fix](function) Fix retention function return wrong value type (#14552)
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
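To make the query above concrete, assume `retention` follows its documented semantics. The first element is 1 if the first condition held for any of the user's rows. The i-th element is 1 if both the first and the i-th conditions held. Summing each element across users, as the query does, therefore needs numeric elements. The plain-Python sketch below mimics that with hypothetical data. The helper is not Doris code.

```python
from typing import List


def retention(rows: List[List[bool]]) -> List[int]:
    """rows[k][i] says whether condition i holds for the user's k-th row."""
    n = len(rows[0])
    seen = [any(row[i] for row in rows) for i in range(n)]
    first = int(seen[0])
    # Element i > 0 is 1 only when both the first and the i-th conditions were seen.
    return [first] + [first & int(seen[i]) for i in range(1, n)]


DAYS = ["2022-11-01", "2022-11-02", "2022-11-04", "2022-11-07"]
# Hypothetical login_event data: user_id -> days the user logged in.
logins = {
    "u1": ["2022-11-01", "2022-11-02"],
    "u2": ["2022-11-02", "2022-11-07"],
    "u3": ["2022-11-01", "2022-11-07"],
}
per_user = [retention([[day == d for d in DAYS] for day in days])
            for days in logins.values()]
# SUM(a.r[1]) ... SUM(a.r[4]) in the query (1-based in SQL, 0-based here).
totals = [sum(r[i] for r in per_user) for i in range(len(DAYS))]
print(totals)  # [2, 1, 0, 1]
```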
2022-11-28 15:56:18 +08:00
c0e25a1c37 [fix](Nereids) disable unstable test in graph simplifier (#14630) 2022-11-28 14:07:14 +08:00
b9270dace3 [fix](nereids) after injection, min/max value in columnStats for date/dateV2 type is wrong (#14605) 2022-11-28 14:05:33 +08:00
b6605b99aa [enhancement](nereids) eliminate project in the post process phase (#14490)
Remove projects that are used only for column pruning and do not compute any expression, so that redundant data copies in do_projection on the BE side are avoided.
2022-11-28 00:39:36 +08:00
280f8be4bd [test](regression) adjust nereids related regression cases under datev2 (#14578)
1. Revert #14439 and restore the dup & unique test cases.
2. Adjust Nereids-related cases.
2022-11-27 23:57:51 +08:00
230ede9085 [opt](nereids) avoid broadcast join if hash table is big (#14240)
1. When choosing broadcast join, the planner only considered transferring less data. This may lead to OOM if the hash table is big enough.
2. Fix a bug in `Stats.computeSize()`. ColumnStats.dataSize is the total size of the column, but the size in bytes of one cell is needed.
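The two points can be sketched together. The per-row width is the column's total data size divided by its row count (the fix in point 2). Broadcast is rejected outright when the build-side hash table would be too large, before transfer costs are compared (point 1). The cost formulas, the `max_hash_table_bytes` threshold, and all names here are simplifying assumptions, not the Nereids cost model.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    data_size: float  # total bytes of the column, like ColumnStats.dataSize
    row_count: float

    def avg_cell_bytes(self) -> float:
        # Per-row width: divide the total size by the row count.
        return self.data_size / self.row_count if self.row_count else 0.0


def choose_join(build_rows: float, build_row_bytes: float,
                probe_rows: float, probe_row_bytes: float,
                num_instances: int, max_hash_table_bytes: float) -> str:
    hash_table_bytes = build_rows * build_row_bytes
    # Every instance would hold a full copy of the hash table: refuse if too big.
    if hash_table_bytes > max_hash_table_bytes:
        return "shuffle"
    broadcast_cost = hash_table_bytes * num_instances  # build side sent everywhere
    shuffle_cost = hash_table_bytes + probe_rows * probe_row_bytes  # both sides moved
    return "broadcast" if broadcast_cost <= shuffle_cost else "shuffle"


width = ColumnStats(data_size=8_000, row_count=1_000).avg_cell_bytes()  # 8.0
print(choose_join(1_000, width, 10_000_000, 16, 3, 1 << 30))  # broadcast
# Cost alone would favor broadcast here, but the hash table exceeds the limit.
print(choose_join(50_000_000, 64, 10_000_000_000, 16, 3, 1 << 30))  # shuffle
```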
2022-11-27 23:22:43 +08:00
948ee41632 [opt](planner) make cardinality in explain result more readable (#14330)
1. Add thousands separators to big integers in EXPLAIN output. For example, "1500000" is printed as "1,500,000".
2. Fix the missing cardinality of scan nodes.
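Thousands grouping like the example in point 1 is a one-liner in Python using the `,` option of the format specification. The function name is hypothetical. This only illustrates the output format and is not the planner's code.

```python
def format_cardinality(n: int) -> str:
    """Group digits in thousands for display, e.g. in EXPLAIN output."""
    return f"{n:,}"


print(format_cardinality(1500000))  # 1,500,000
```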
2022-11-27 23:12:41 +08:00
b3859e1e1a [enhancement](fe) Remove unnecessary kill in AutoCloseConnectContext (#14606)
The invocation of ConnectContext.kill in AutoCloseConnectContext is redundant and caused too many useless logs.
2022-11-26 23:54:33 +08:00
36419fae48 [fix](JdbcExecutor) fix that JdbcExecutor did not load the class jar (#14598)
JdbcExecutor did not load the JDBC driver jar, so add a class loader to load it.
2022-11-26 23:53:05 +08:00
064b8d2aa6 [fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604)
BE crashes when querying a partitioned Hive table in text format
with the partition column placed first among the select items.

1. FE should use file slots to set the column mapping index of csv file.
2. BE should use `get_by_name` of block to get right column in a block in csv reader.
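Point 2 can be illustrated with plain Python lists standing in for a block. When the partition column comes first in the output, copying file columns by position puts values under the wrong names. Looking each destination up by name avoids this. The helpers are hypothetical stand-ins and do not model the BE `Block` API.

```python
from typing import Any, Dict, List


def fill_by_position(block_columns: List[str], file_row: List[Any]) -> Dict[str, Any]:
    # Buggy idea: assumes file columns occupy the first slots of the block.
    return dict(zip(block_columns, file_row))


def fill_by_name(block_columns: List[str], file_columns: List[str],
                 file_row: List[Any], partition_values: Dict[str, Any]) -> List[Any]:
    values = dict(zip(file_columns, file_row))  # look up each column by name
    values.update(partition_values)             # partition values come from the path
    return [values[name] for name in block_columns]


block_columns = ["dt", "id", "name"]  # partition column selected first
file_columns = ["id", "name"]         # the text file holds only data columns
row = [1, "alice"]
print(fill_by_position(block_columns, row))  # {'dt': 1, 'id': 'alice'}
print(fill_by_name(block_columns, file_columns, row, {"dt": "2022-11-26"}))
# ['2022-11-26', 1, 'alice']
```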
2022-11-26 11:42:40 +08:00
52c6ba051e [feature](jsonb type)refactor JSONB type using column and add testcase (#13778)
1. Refactor JSONB type using ColumnString instead of making a copy.
2. Add regression testcase for JSONB load and functions.
2022-11-26 10:06:15 +08:00
2ae7dae925 [feature](nereids) Support row policy (#13879)
This PR does two things:
1. [new logical plan] Add **LogicalCheckPolicy** before UnboundRelation in LogicalPlanBuilder.
2. [new rule] Turn **LogicalCheckPolicy** into LogicalFilter if a row policy exists; otherwise remove it.
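The rule in point 2 can be sketched on a toy plan tree. A check-policy marker over a scan becomes a filter when the table has a row policy and disappears otherwise. `Scan`, `CheckPolicy`, `Filter`, and the policy dictionary are hypothetical stand-ins for the Nereids plan classes, not their real API.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Scan:
    table: str


@dataclass
class CheckPolicy:
    """Stand-in for LogicalCheckPolicy: a marker placed above a relation."""
    child: "Plan"


@dataclass
class Filter:
    """Stand-in for LogicalFilter."""
    predicate: str
    child: "Plan"


Plan = Union[Scan, CheckPolicy, Filter]


def apply_row_policy(plan: Plan, policies: Dict[str, str]) -> Plan:
    """Turn each CheckPolicy into a Filter if its table has a row policy, else drop it."""
    if isinstance(plan, CheckPolicy):
        child = apply_row_policy(plan.child, policies)
        predicate = policies.get(child.table) if isinstance(child, Scan) else None
        return Filter(predicate, child) if predicate else child
    if isinstance(plan, Filter):
        return Filter(plan.predicate, apply_row_policy(plan.child, policies))
    return plan


policies = {"orders": "region = 'east'"}  # hypothetical row policy per table
print(apply_row_policy(CheckPolicy(Scan("orders")), policies))
# Filter(predicate="region = 'east'", child=Scan(table='orders'))
print(apply_row_policy(CheckPolicy(Scan("users")), policies))
# Scan(table='users')
```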
2022-11-25 22:57:56 +08:00
494f35c26b [fuzzy](test) disable some fuzzy variables since they have bugs (#14583)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-25 21:15:10 +08:00
45fa2fc56b [fix](multi catalog)Use -1 as external es table column id instead of uniq id (#14557)
External table columns are now stored in a cache, and unique IDs for external columns are no longer persisted,
so -1 is used as the column ID for ES external tables.
This avoids the problem of a non-master FE trying to get a unique ID, which causes the non-master FE to fail to write to bdbje.
2022-11-25 16:13:16 +08:00
9630257704 [fix](Nereids): fix bugs in random construct join plan (#14575) 2022-11-25 16:05:29 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1. Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2. Support bitmap runtime filter. Generate a bitmap filter using the right table's bitmap and push it down to the left table's storage layer for filtering.
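The runtime filter in point 2 can be sketched with Python integers used as bitsets. The right table's bitmap values are OR-ed into one bitmap, and left-side rows whose `k1` is not in it are dropped before the join. The integer bitset and the helper names are assumptions made only to keep the sketch self-contained. They support only non-negative integer keys and do not reflect the bitmap structure Doris uses.

```python
from typing import Iterable, List


def to_bitmap(values: Iterable[int]) -> int:
    """Build a bitset from non-negative integers (bit v set for each value v)."""
    bm = 0
    for v in values:
        bm |= 1 << v
    return bm


def union_bitmaps(bitmaps: Iterable[int]) -> int:
    # The build side may have many rows, each holding a bitmap: OR them together.
    result = 0
    for bm in bitmaps:
        result |= bm
    return result


def bitmap_filter(keys: List[int], bm: int) -> List[int]:
    """Keep probe-side keys contained in the bitmap, as k1 IN (bitmap) would."""
    return [k for k in keys if k >= 0 and (bm >> k) & 1]


right_side = [to_bitmap([1, 5]), to_bitmap([3])]  # bitmap_column values
filt = union_bitmaps(right_side)
print(bitmap_filter([0, 1, 2, 3, 4, 5, 6], filt))  # [1, 3, 5]
```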
2022-11-25 15:22:44 +08:00
5efdcb9ed0 [improvement](storage) For debugging problem: add session variable (#14576) 2022-11-25 14:16:00 +08:00
d5d356b17f [vectorized](function) support order by field function (#14528)
* [vectorized](function) support order by field function

* update

* update test
2022-11-25 14:00:46 +08:00
deef491e01 [fix](Nereids) refactor CTE and EliminateAliasNode and fix the bug that CTE reuse relationId (#14534)
This PR contributes:
- support explain CTE;
- refine CTE and fix the bug that the same analyzed plan is reused, so its LogicalOlapScan nodes share the same relationId;
- change EliminateAliasNode to LogicalSubQueryAliasToLogicalProject and move it to the top of the rewrite stage, so the analyzed plan can be observed easily through the LogicalSubQueryAlias nodes and their aliases;
- jobs traverse the left child first, so ExprIds grow from the left child to the right child.
2022-11-25 10:54:53 +08:00
5ccc875824 [fix](recycle) refactor the logic of erase meta with same name (#14551)
In #14482, we implemented keeping a specific number of metas with the same name in the catalog recycle bin.
But this causes a meta replay bug.
Every time a db/table/partition is dropped, FE tries to erase a certain number of metas with the same name,
and replaying the "drop" edit log does the same thing. However, the number of metas to erase is based on the current config value,
which is not persisted in the edit log, so "drop" and "replay drop" can become inconsistent.

This PR moves the "erase meta with same name" logic to the daemon thread of the catalog recycle bin.
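The following sketches the cleanup the daemon thread might run, assuming each recycled meta carries a name, an id and a drop time. The cleanup runs in one place on a schedule rather than inside the drop handler, so "drop" and "replay drop" no longer depend on the config value read when each of them runs. The names (`RecycledMeta`, `erase_same_name`, `keep`) are hypothetical and are not the CatalogRecycleBin API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RecycledMeta(NamedTuple):
    name: str
    meta_id: int
    drop_time: int


def erase_same_name(entries: List[RecycledMeta], keep: int) -> List[RecycledMeta]:
    """Keep only the `keep` most recently dropped metas for each name."""
    by_name: Dict[str, List[RecycledMeta]] = defaultdict(list)
    for e in entries:
        by_name[e.name].append(e)
    kept: List[RecycledMeta] = []
    for group in by_name.values():
        group.sort(key=lambda e: e.drop_time, reverse=True)  # newest first
        kept.extend(group[:max(keep, 0)])
    return sorted(kept, key=lambda e: e.meta_id)


recycle_bin = [RecycledMeta("t", 1, 100), RecycledMeta("t", 2, 200),
               RecycledMeta("t", 3, 300), RecycledMeta("u", 4, 150)]
print([e.meta_id for e in erase_same_name(recycle_bin, keep=2)])  # [2, 3, 4]
```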
2022-11-25 09:47:24 +08:00
d12112b930 [fix](fe) Fix mem leaks (#14570)
1. Fix memory leaks in StmtExecutor::executeInternalQuery
2. Limit the number of concurrently running load tasks for the statistics cache
2022-11-25 09:16:54 +08:00
9103ded1dd [improvement](join)optimize sharing hash table for broadcast join (#14371)
This PR makes sharing the hash table for broadcast join more robust:

- Add a session variable to enable/disable this function.
- Do not block the hash join node's close function.
- Use a shared pointer to share the hash table and runtime filter among broadcast join nodes.
- A hash join node that does not need to build the hash table closes its right child without reading any data (the child closes the corresponding sender).
2022-11-24 21:06:44 +08:00
59b31a03c4 [Improvement](agg function) support group_bit_and/group_bit_or/group_bit_xor functions (#14386) 2022-11-24 16:46:42 +08:00
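The three aggregates fold a group's integers with bitwise AND, OR, and XOR. A minimal sketch with `functools.reduce` follows. The NULL handling (skip NULLs; an all-NULL or empty group yields NULL) is an assumption borrowed from how SQL aggregates usually behave and should be checked against the Doris documentation.

```python
import operator
from functools import reduce
from typing import Callable, Iterable, Optional


def _group_bit(op: Callable[[int, int], int],
               values: Iterable[Optional[int]]) -> Optional[int]:
    # Assumption: NULLs are skipped and an all-NULL (or empty) group yields NULL.
    non_null = [v for v in values if v is not None]
    return reduce(op, non_null) if non_null else None


def group_bit_and(values: Iterable[Optional[int]]) -> Optional[int]:
    return _group_bit(operator.and_, values)


def group_bit_or(values: Iterable[Optional[int]]) -> Optional[int]:
    return _group_bit(operator.or_, values)


def group_bit_xor(values: Iterable[Optional[int]]) -> Optional[int]:
    return _group_bit(operator.xor, values)


print(group_bit_and([7, 3, 5]), group_bit_or([1, 2, 4]), group_bit_xor([1, 3]))
# 1 7 2
```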