doris

Author	SHA1	Message	Date
Mingyu Chen	ed96442b85	[fix](multi-catalog) fix persist issue about jdbc catalog and class loader issue #14794 Fix a bug: JDBC catalog/database/table should be added to GsonUtil. Fix a class loader issue that sometimes causes ClassNotFoundException. Fix regression tests to use different catalog names. Comment out 2 regression tests: regression-test/suites/query_p0/system/test_query_sys.groovy and regression-test/suites/statistics/alter_col_stats.groovy. They need to be fixed later.	2022-12-05 09:05:13 +08:00
qiye	5be8f9432e	[fix](DOE) Support ES index which contains dynamic_templates (#14762 ) Support ES index with dynamic_templates. And do not support index mapping without explicit mapping.	2022-12-05 08:33:51 +08:00
Yulei-Yang	852b03729f	[Improvement](meta)add IsCurrent column in show catalogs result #14700 When a user has multiple catalogs and switches between them several times, they may forget which catalog is in use. So this adds an IsCurrent column to the show catalogs result to help. mysql> show catalogs; +-----------+-------------+----------+-----------+ | CatalogId | CatalogName | Type | IsCurrent | +-----------+-------------+----------+-----------+ | 136591 | es | es | | | 130100 | hive | hms | yes | | 0 | internal | internal | | +-----------+-------------+----------+-----------+	2022-12-05 08:32:16 +08:00
Mingyu Chen	ce95da8dfb	[improvement](multi-catalog) support specify hadoop username (#14734 ) Support setting "hadoop.username" property when creating hms catalog.	2022-12-04 21:09:39 +08:00
minghong	97dcd2b13a	[feature](nereids) merge proj-proj in post process (#14730 ) * merge proj-proj * v2 This PR guarantees that the physical plan does not contain consecutive physical projects. It is like the rewrite rule "merge projects", but works on the physical plan, not the logical plan. * move merge-proj code into Project.java	2022-12-03 23:41:02 +08:00
Kikyou1997	283b23f6da	[fix](planner) wrong results when select from view which has with clause (#14747 )	2022-12-02 18:10:52 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the session variable: `set enable_pipeline_engine = true;`	2022-12-02 17:11:34 +08:00
Gabriel	e9799fab09	[refactor](datev2) refine function expr for datev2 (#14697 ) * [refactor](datev2) refine function expr for datev2 * update	2022-12-02 10:13:11 +08:00
Yulei-Yang	228e9ed01c	[fix](improvement)(meta) fix alter catalog properties issues and reformat code (#14745 ) 1. fix NPE exception #14740 2. fix issue: mysql> alter catalog xyz set properties ('hive.metastore.uris'='thrift://172.21.0.1:7004'); ERROR 1105 (HY000): errCode = 2, detailMessage = Can't modify the type of catalog property with name: xyz 3. change behavior: the original logic used the props in the set properties clause to replace all existing props; now only the props listed in the set properties clause are replaced, and new props are added. This makes it behave like the alter table property stmt.	2022-12-02 09:34:13 +08:00
Kikyou1997	e5000c708e	[feature](statistics) Support for collecting statistics on materialized view (#14676 ) 1. Map multiple tasks to one Job 2. Remove the code for analyzing the whole default db, since this feature is not available, would create too many tasks, and the related code is confusing 3. support analyze materialized view 4. abstract the common logic to BaseTask	2022-12-01 22:34:13 +08:00
minghong	2be8235d95	[feature](nereids) support timestampdiff function (#14662 ) complete timeStampDiff supported timeunit: - YEAR - MONTH - WEEK - DAY - HOUR - MINUTE - SECOND	2022-12-01 22:11:55 +08:00
mch_ucchi	14e208354d	[Feature](Nereids) support nereids event for logging the cascades states and transformation. (#13659 ) Add an event producer, channel, consumer system to support the feature as title and you can turn it on using set enable_nereids_event = true; For more information, please see fe/fe-core/src/main/java/org/apache/doris/nereids/metrics/README.md	2022-12-01 21:42:40 +08:00
谢健	302da03b18	[enhancement](Nereids): Use long bitmap in DPHyp (#14725 )	2022-12-01 20:47:45 +08:00
Gabriel	9dd1d989e8	[test](decimalv3) add regression test cases for decimalv3 (#14672 )	2022-12-01 15:18:40 +08:00
Mingyu Chen	f496d1972a	[improvement](multi-catalog) return root cause of exception (#14708 )	2022-12-01 14:58:05 +08:00
morrySnow	3c6b96b9be	[enhancement](Nereids) avoid add project that output same with child to memo (#14180 )	2022-12-01 10:49:44 +08:00
谢健	36737fe9f4	[feature](Nereids): Add cache to avoid repeatly calculation in DPhyp (#14585 )	2022-11-30 21:35:45 +08:00
caiconghui	9bbbcf031c	[enhancement](k8s) Support fqdn mode for be in k8s enviroment (#9172 ) In the k8s environment, the ip of the pod can be changed, but the hostname of pod is stable. When the host machine of the pod fails, the k8s can schedule the failed pod to the new host machine for reconstruction. After that, the newly created pod's hostname remains unchanged, and the ip address has been changed. The change of the be node's ip address can be detected by FQDNManager when enable_fqdn_mode is true Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-11-30 20:42:15 +08:00
yinzhijian	593a916ae6	[feature](nereids) split AggregateDisassemble into two rules (#14611 ) # Proposed changes Issue Number: close #14280 ## Problem summary The AggregateDisassemble rule is refactored and split into two rules, which are not dependent on each other. 1. AggregateDisassemble splits the agg into two phases: Local, Global. 1.1. For count function, the implementation is as follows: distinct_multi_count(update) + distinct_multi_count(merge) 2. DistinctAggregateDisassemble splits the agg into 4 stages: Local, Global, Distinct Local, Distinct Global. 2.1. For count function, the implementation is as follows: distinct_multi_count(update) + distinct_multi_count(merge) + sum(update) + sum(merge)	2022-11-30 14:02:42 +08:00
Yulei-Yang	3ca3af2234	[improvement](planner)sort show catalogs result by name (#14684 ) Result of show databases, show tables, show data are all sorted by name, so make show catalogs behavior same.	2022-11-30 11:55:14 +08:00
AlexYue	3ff409551c	[enhencement](netty) bind netty's default logger when launching fe (#14675 ) The logger Doris FE uses is log4j, while netty might use slf4j to choose a logger, and this is reported to cause confusing behavior. The binding doesn't take effect if the bind logic is moved to another file or another place within PaloFe.java, so I have to leave it before the main function.	2022-11-30 11:54:39 +08:00
Tiewei Fang	9272680d00	[feature](multi-catalog) support Jdbc catalog (#14527 ) Issue Number: close #xxx Add jdbc catalog for the Doris multi-catalog feature. Currently, the jdbc catalog only supports MySQL. TODO: support PostgreSQL; support other databases. Problem summary: for jdbc catalog, we can create a catalog like: CREATE CATALOG jdbc4 PROPERTIES ( "type"="jdbc", "jdbc.user"="root", "jdbc.password"="123456", "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false", "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar", "jdbc.driver_class" = "com.mysql.jdbc.Driver" ); Note: yearIsDateType is a param of jdbc: If yearIsDateType configuration property is set to false, then the returned object type is java.sql.Short. If set to true (the default), then the returned object is of type java.sql.Date with the date set to January 1st, at midnight. To be compatible with MySQL, we force the use of yearIsDateType=false in FE. If the user sets yearIsDateType=true, Doris FE will change it to yearIsDateType=false.	2022-11-30 11:28:08 +08:00
minghong	82f3980774	[feature](Nereids) estimation without column statistics (#14526 ) estimate plan cost without column statistics. change list: 1. remove original StatsCalculator, it is replaced by StatsCalculatorV2. rename StatsCalculatorV2 to StatsCalculator 2. remove FilterSelectivityCalculator, it is replaced by FilterEstimation 3. remove session var:ENABLE_NEREIDS_STATS_DERIVE_V2 4. add ColumnStatistics.isUnKnown, which means the column is not analyzed, and its stats is not accurate. 5. add estimatedRowCount() function for OLAP tables 6. add unit tests for FilterEstimation and StatsCalculator	2022-11-30 11:27:51 +08:00
starocean999	3a362fab76	[fix](fe)table function node use wrong info for projection (#14667 )	2022-11-30 10:41:32 +08:00
Mingyu Chen	ca90253b09	[config](storage-policy) add a FE config to disable storage policy by default (#14655 ) The cold-hot separation feature is still under development, and some unresolved problems remain. So I add a FE config enable_storage_policy, false by default, to disable the creation and usage of storage policy by default. This makes users aware that they are using an experimental feature at their own risk, and it will not be released formally in v1.2.0. Disable storage policy by default; users can not use or create storage policy. Configured by enable_storage_policy. Remove property remote_storage_policy, which duplicates storage_policy. Change the persist field in DataProperty.java. And remove remoteCooldownTime from DataProperty, because it can be got from StoragePolicy.	2022-11-30 10:04:33 +08:00
Mingyu Chen	dd7ec8f4ca	[improvement](test) add tpch1 orc for hive catalog and refactor some test dir (#14669 ) Add tpch 1g orc test case in hive docker. Refactor some suites dir of catalog test cases. Add "-internal" for dlf endpoint, to support access to oss with aliyun vpc.	2022-11-30 10:03:58 +08:00
Kang	4faca56819	[bug](jsonb) fix INSERT/CAST NULL to JSONB (#14682 ) Add NULL -> JSONB in implicitCastMap to support INSERT/CAST NULL to JSONB.	2022-11-30 09:53:16 +08:00
FreeOnePlus	d5ee721621	[improvement](planner)Adjust the field naming rules when creating tables (#14671 ) Adjust the field naming rules when creating tables. Under the original rules, a table field name starts with a letter, an underscore, or an @ character, followed by a maximum of 63 characters, so the total cannot exceed 64 characters. However, in many industries, such as the financial industry, the length of derived fields often exceeds 64 characters, so the regex limit is raised from 64 characters to 128 characters. Many users load data from Hive into Doris through external tables or Broker Load. Digits can be used as the first character in Hive tables, so the regex is adjusted to allow a digit as the first character.	2022-11-30 09:45:27 +08:00
Yulei-Yang	33cda9f22a	[improvement](planner)support like in show catalogs stmt #14678 Co-authored-by: yuleiyang <yuleiyang@tencent.com>	2022-11-30 08:38:42 +08:00
Kikyou1997	33ad616839	[fix](statistics) Fix potential NPE in ShowStatisticsStmt #14679 When the required cache hasn't been loaded yet, the cache always returns ColumnStatistics.DEFAULT, which does not define the max/min literal expr; add a check for that.	2022-11-30 08:38:20 +08:00
qiye	85ce3c37b5	[fix](DOE) fix ES query dsl is wrong after FE restarted. (#14652 ) Some of the default properties of the ES catalog are not persisted in EditLog. So when FE is restarted, the default properties are lost, such as `elasticsearch.doc_value_scan`, `elasticsearch.keyword_sniff` and so on.	2022-11-29 17:06:48 +08:00
zhangstar333	7a08a799e9	[Vectorized](function) support order by convert_to function (#14555 )	2022-11-29 15:22:27 +08:00
xiaoDjun	facb7cf4e2	[fix](spark load)Temp partition with spark load (#14648 ) * [fix](spark load)losing temporary partition item entry * [fix](spark load)Temp partition with spark load	2022-11-29 15:21:44 +08:00
Gabriel	3e8b3658c7	[feature-wip](decimalv3) Support basic agg and arithmetic operations for decimal v3 (#14513 )	2022-11-29 15:12:41 +08:00
Gabriel	97f0d3a756	[Improvement](datatype) disable new types if vectorized engine is disabled (#14561 ) * [Imptovement](datatype) disable new types if vectorized engine is disabled disable datev2/datetimev2/decimalv3 if vectorized engine is disabled	2022-11-29 10:33:46 +08:00
lsy3993	f7a827c06b	[fix](new-scan) fix some bugs about new scan node and readers (#14504 ) Fix json reader DCHECK failure caused by missing TYPE_STRING. Fix a bug that the tvf throws NPE if no file is found. The predicate conjuncts can not be pushed down to the parquet reader if this is a load task, because the predicate should be applied on columns of the dest table, not on columns of the source file. Add a temp property "use_new_load_scan_node" of broker load to make regression test happy, so that we can use the new load scan node for a certain job and avoid setting a global FE config.	2022-11-29 10:21:41 +08:00
wxy	2295ab24b0	[fix](metric) fix jvm_young_size_bytes. (#14562 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2022-11-29 09:10:48 +08:00
Gabriel	7513c82431	[NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608 )	2022-11-29 08:55:54 +08:00
Mingyu Chen	c5eb8ab084	[fix](persiste) make ArithmeticExpr wriable (#14615 ) Fix a bug that the ArithmeticExpr's write method is not implemented, causing FE to crash when creating a function like: CREATE ALIAS FUNCTION IF NOT EXISTS mesh_udf_test1(INT,INT) WITH PARAMETER(n,d) AS ROUND(1+floor(n/d)); Add if exists and if not exists for drop and create function. Fix a minor bug that the hdfs() table valued function throws NPE if the file does not exist.	2022-11-29 08:55:18 +08:00
mch_ucchi	b51f6ae050	[feature](Nereids)add rule: PruneOlapScanTablet (#14378 )	2022-11-29 01:06:14 +08:00
mch_ucchi	a803e75438	[feature](Nereids) add rule: EliminateGroupByConstants (#14541 ) remove group by constants, like: before apply rule: select 1, k1, min(k2), max(k3) from t1 group by 1, 2; after apply rule: select 1, k1, min(k2), max(k3) from t1 group by k1;	2022-11-28 22:52:24 +08:00
minghong	16bc20a357	[opt](nereids)Estimate cost by row, not by data size (#14471 ) Since column data size is not always available, estimate plan cost by row count instead of data size.	2022-11-28 19:58:06 +08:00
abmdocrt	529bdfb153	[Fix](function) Fix retention function return wrong value type (#14552 ) MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a; ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))	2022-11-28 15:56:18 +08:00
谢健	c0e25a1c37	[fix](Nereids) diable unstable test in graph simplifier (#14630 )	2022-11-28 14:07:14 +08:00
minghong	b9270dace3	[fix](nereids) after injection, min/max value in columnStats for date/dateV2 type is wrong (#14605 )	2022-11-28 14:05:33 +08:00
Kikyou1997	b6605b99aa	[ehancement](nereids) eliminate project in the post process phase (#14490 ) Remove those projects that are used for column pruning only and don't do any expression calculation, so that we can avoid some redundant data copy in do_projection on the BE side.	2022-11-28 00:39:36 +08:00
minghong	280f8be4bd	[test](regression) adjust nereids related regression cases under datev2 (#14578 ) 1. revert 14439, recovery dup&unique test cases 2. adjust nereids related case	2022-11-27 23:57:51 +08:00
minghong	230ede9085	[opt](nereids) avoid broadcast join if hash table is big (#14240 ) 1. When we choose broadcast join, we only consider transferring less data. This may lead to OOM if the hash table is big enough. 2. fix a bug in `Stats.computeSize()`. ColumnStats.dataSize is the total size of this column, but we need the byte size of one cell.	2022-11-27 23:22:43 +08:00
minghong	948ee41632	[opt](planner) let cardinality in explain result more readable (#14330 ) 1. add commas for big int in explain. For example, "1500000" will be printed as "1,500,000" 2. Scan node cardinality is missing	2022-11-27 23:12:41 +08:00
Kikyou1997	b3859e1e1a	[ehancement](fe) Remove unnecessary kill in AutoCloseConnectContext (#14606 ) The invocation in ConnectContext.kill in AutoCloseConnectContext is redundant and caused too many useless logs	2022-11-26 23:54:33 +08:00
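
The behavior change described in commit 228e9ed01c (alter catalog ... set properties now touches only the listed properties) can be illustrated with a minimal Python sketch. This is not Doris code: the function names `replace_all` and `merge_listed` and the sample property values are hypothetical and only model the before/after semantics stated in that commit message.

```python
def replace_all(existing, listed):
    """Old behavior (as described): the listed props replace the whole map."""
    return dict(listed)


def merge_listed(existing, listed):
    """New behavior (as described): listed props overwrite or add entries;
    props that are not listed are kept unchanged."""
    merged = dict(existing)
    merged.update(listed)
    return merged


# Hypothetical catalog properties, for illustration only.
existing = {"type": "hms", "hive.metastore.uris": "thrift://old-host:9083"}
listed = {"hive.metastore.uris": "thrift://172.21.0.1:7004"}

print(replace_all(existing, listed))   # "type" is dropped
print(merge_listed(existing, listed))  # "type" is kept, the uri is updated
```

Under the old semantics the unlisted `type` property disappears from the result, while under the new semantics only the listed key changes.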

2108 Commits