Though this syntax is not supported in many other systems, since the ORDER BY clause here is almost redundant and useless, we have to keep it for consistency with the legacy Doris syntax.
Here is an example:
SELECT * FROM (SELECT k1, k3 FROM tbl1 ORDER BY k3 UNION ALL SELECT k1, k5 FROM tbl2) t;
Bug fix
Fix image loading failure when creating a catalog with a resource
When creating a JDBC catalog with a resource, the metadata image fails to be loaded,
because when loading the JDBC catalog image, Doris tries to get the resource from ResourceMgr,
but ResourceMgr has not been loaded yet, so an NPE is thrown.
This PR fixes this bug and refactors some logic about catalogs and resources.
When loading the JDBC catalog image, Doris no longer gets the resource from ResourceMgr.
Users can now create a catalog with both a resource and properties, like:
create catalog jdbc_catalog with resource jdbc_resource
properties("user" = "user1");
The properties in the "properties" clause will overwrite the properties in "jdbc_resource".
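For context, a minimal sketch of the resource side (property values here are hypothetical); the "user" = "user0" below would be overwritten by the catalog's "user1":
```sql
CREATE EXTERNAL RESOURCE jdbc_resource PROPERTIES (
    "type" = "jdbc",
    "user" = "user0",
    "password" = "****",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/db"
);
```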
Force adding tinyInt1isBit=false to the JDBC URL
The default value of tinyInt1isBit is true, which causes tinyint in MySQL to be treated as the bit type.
We force tinyInt1isBit=false in the JDBC URL so that tinyint in MySQL stays tinyint in Doris.
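For illustration, assuming a user-specified MySQL URL (hypothetical host and database), the effective URL becomes:
```
jdbc:mysql://127.0.0.1:3306/db                       -- user-specified jdbc_url
jdbc:mysql://127.0.0.1:3306/db?tinyInt1isBit=false   -- effective URL after forcing the parameter
```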
Avoid calculating the checksum of the JDBC driver jar multiple times
Refactor
Refactor the notification logic when updating properties in a resource.
Previously, updating properties in a resource notified the corresponding catalog to update its own properties.
This PR changes this logic. After updating properties in a resource, it only uninitializes the catalog's internal
objects such as the "jdbc client" or "hms client". These objects will be re-initialized lazily.
All properties are fetched from the Resource at runtime, so it always gets the latest properties.
Regression test cases
Because we add tinyInt1isBit=false to the JDBC URL, some of the test cases need to be changed.
Histogram statistics are more expensive to collect and we collect and persist them separately.
This PR does the following work:
1. Add histogram syntax and add keyword `TABLE`
2. Add the task of collecting histogram statistics
3. Persist histogram statistics
4. Replace fastjson with gson
5. Add unit tests...
Relevant syntax examples:
> Referring to some databases such as MySQL, we add the keyword `TABLE`.
```SQL
-- collect column statistics
ANALYZE TABLE statistics_test;
-- collect histogram statistics
ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;
```
Based on #15317
In the case below, the expression `date > 20200101` should implicitly cast both sides to datetime instead of bigint:
```sql
CREATE TABLE `part_by_date`
(
`date` date NOT NULL COMMENT '',
`id` int(11) NOT NULL COMMENT ''
) ENGINE=OLAP
UNIQUE KEY(`date`, `id`)
PARTITION BY RANGE(`date`)
(PARTITION p201912 VALUES [('0000-01-01'), ('2020-01-01')),
PARTITION p202001 VALUES [('2020-01-01'), ('2020-02-01')))
DISTRIBUTED BY HASH(`id`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
INSERT INTO part_by_date VALUES('0001-02-01', 1),('2020-01-15', 2);
SELECT
id
FROM
part_by_date
WHERE date > 20200101;
```
1. When checking a HashProperty whose type is natural, we only need to check whether the required properties contain all shuffle columns.
2. In ChildrenPropertiesRegulator.java, when a colocate/bucket join is not allowed, we enforce the required property.
When light schema change is enabled by default, a column in an OLAP scan is retrieved by its column unique id instead of its name. Columns with the same name may use different unique IDs among materialized indexes.
This PR ensures that the column in the OLAP scan node uses the correct column unique id.
This PR defines order_key, having_key, and group_by_key binding priorities.
1. order key priority
```
select
col1 * -1 as col1 # inner_col1 * -1 as alias_col1
from
t
order by col1; # order by order_col1
```
To bind `order_col1`, `alias_col1` has a higher priority than `inner_col1`.
2. having key priority
```
select (a-1) as a # inner_a - 1 as alias_a
from bind_priority_tbl
group by a
having a=1;
```
To bind the having key, `inner_a` has a higher priority than `alias_a`.
3. group by key binding priority
```
SELECT date_format(b.k10,
'%Y%m%d') AS k10
FROM test a
LEFT JOIN
(SELECT k10
FROM baseall) b
ON a.k10 = b.k10
GROUP BY k10;
```
group_by_key (k10) binding priority:
- agg.child.output
- agg.output
If binding with agg.child.output fails (the slot is not found, or more than one candidate slot is found in agg.child.output), Nereids tries to bind the group_by_key with agg.output.
In the above example, Nereids finds 2 candidate slots (a.k10, b.k10) in agg.child.output for group_by_key (k10), so binding with agg.child.output fails. Nereids then tries to bind group_by_key with agg.output, that is `date_format(b.k10, '%Y%m%d') AS k10`. Finally, group_by_key is bound to `alias k10`.
The following statement should succeed, but returns the error `complex type cannt be partition column:ARRAY<VARCHAR(64)>`:
```
create table test_array(
task_insert_time BIGINT NOT NULL DEFAULT "0" COMMENT "" ,
task_project ARRAY<VARCHAR(64)> DEFAULT NULL COMMENT "" ,
route_key DATEV2 NOT NULL COMMENT "range partition key"
)
DUPLICATE KEY(`task_insert_time`)
COMMENT ""
PARTITION BY RANGE(route_key)
(PARTITION `p202209` VALUES LESS THAN ("2022-10-01"),
PARTITION `p202210` VALUES LESS THAN ("2022-11-01"),
PARTITION `p202211` VALUES LESS THAN ("2022-12-01"))
DISTRIBUTED BY HASH(`task_insert_time` ) BUCKETS 32
PROPERTIES
(
"replication_num" = "1",
"light_schema_change" = "true"
);
```
This PR fixes this.
1. The agg function without the distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct.
2. Use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate whether an agg function is a "merge" function.
3. Add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases (see the sketch after this list).
4. AggregateExpression's nullable method should call the inner function's nullable method.
5. Add a bind slot rule to bind the pattern "logicalSort(logicalHaving(logicalProject()))".
6. Don't remove the project node in PhysicalPlanTranslator.
7. Add a cast to bigint for count(distinct datelike type) expressions.
8. Fall back to the old optimizer if the bitmap runtime filter is enabled.
9. Fix an exchange node memory leak.
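The rule name indicates the classic rewrite; a minimal sketch of the intended transformation (table and column names are hypothetical):
```sql
-- avg over distinct values can be computed from two other distinct aggregates
SELECT avg(DISTINCT k1) FROM t1;
-- is conceptually rewritten by AvgDistinctToSumDivCount into:
SELECT sum(DISTINCT k1) / count(DISTINCT k1) FROM t1;
```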
This PR removes typeCoercion on the expected expr in ExpressionRewriteTestHelper, because we should not rewrite the expected expr at all; doing so changes it unexpectedly.
Cooldown time is wrong for data on SSD, because the cooldown time for all tables/partitions
is only calculated once, when the class `DataProperty` is loaded, and cannot be updated later.
This patch ensures that the cooldown time for each table/partition is calculated in real time
when the table/partition is created.
Co-authored-by: weizuo <weizuo@xiaomi.com>
1. Add an IntegralDivide operator to support `DIV` semantics (see the example after this list).
2. Add more operator rewriters to keep expression types consistent between operators.
3. Support conversion between float types and decimal types.
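A minimal illustration of `DIV` (integral division, as in MySQL) versus ordinary division:
```sql
SELECT 10 DIV 3;  -- integral division: returns 3
SELECT 10 / 3;    -- ordinary division: returns 3.3333
```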
After this PR, the cases below can be executed normally, like in the legacy optimizer:
use test_query_db;
select k1, k5,100000*k5 from test order by k1, k2, k3, k4;
select avg(k9) as a from test group by k1 having a < 100.0 order by a;
Original: group by is bound to the outputExpressions of the current node.
Problem: when the name of a new reference in outputExpressions is the same as a child's output column, the child's output column should be used for group by, but the new reference from the node's outputExpressions is used instead, resulting in an error.
Now: the child's output takes priority for group by binding; if the child does not have a corresponding column, the outputExpressions of this node are used for binding (see the sketch below).
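A minimal sketch of the shadowing case (table and column names are hypothetical):
```sql
-- the alias `v` has the same name as the child's output column `v`;
-- GROUP BY now binds to the child's column instead of the alias
SELECT v + 1 AS v
FROM t2
GROUP BY v;
```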
Add a new config "jdbc_drivers_dir" for both FE and BE.
Users can put JDBC driver jar files in this dir and only specify the file name in the "driver_url" property
when creating a JDBC resource.
Doris will then find the jar files in this dir.
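For example, a minimal sketch assuming a MySQL connector jar has already been placed in "jdbc_drivers_dir" (the jar name and other property values are hypothetical):
```sql
CREATE EXTERNAL RESOURCE my_jdbc PROPERTIES (
    "type" = "jdbc",
    "user" = "user1",
    "password" = "****",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/db",
    -- only the file name; Doris resolves it against jdbc_drivers_dir
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```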
Also modify the logic so that when a JDBC resource is modified, the corresponding JDBC table
gets the latest properties.
In InferPredicates, we need to pull predicates up from the project's children and then replace id1 with sid.
In our code, we build a map keyed by the expression with the alias as the value. Obviously, sid has two aliases (id1, id2), so a Duplicate key exception is thrown (see the reproduction below).
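A minimal sketch of the shape that triggers it (the table name is hypothetical; sid, id1, id2 are from the issue):
```sql
-- sid is projected under two aliases; building the expression-keyed map
-- finds two candidates for sid and throws the Duplicate key exception
SELECT *
FROM (SELECT sid AS id1, sid AS id2 FROM t3) v
WHERE id1 = 1;
```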
**Histogram statistics**
Currently Doris collects statistics but no histogram data, and by default the optimizer assumes that the distinct values of a column are evenly distributed. This estimation can be problematic when the data distribution is skewed, so this PR implements the collection of histogram statistics.
For skewed columns (columns with unevenly distributed data), histogram statistics enable the optimizer to generate more accurate cardinality estimates for filter or join predicates involving these columns, resulting in a more precise execution plan.
Histograms optimize the execution plan mainly in two aspects: the selection of where conditions and the selection of join order. The selection principle for where conditions is relatively simple: the histogram is used to calculate the selection rate of each predicate, and the filter with the higher selection rate is preferred.
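An illustrative query for the where-condition case (table and column names are hypothetical):
```sql
-- with histograms on status and region, the optimizer can estimate each
-- predicate's selection rate and apply the better filter first
SELECT *
FROM orders
WHERE status = 'CLOSED' AND region = 'EU';
```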
The selection of join order is based on estimating the number of rows in the join result. When the data in the join condition columns is unevenly distributed, histograms can greatly improve the accuracy of this row count estimate. In addition, if the row count of a bucket in one of the columns is 0, the bucket can be marked and skipped directly in the subsequent join to improve efficiency.
---
Histogram statistics are mainly collected by the histogram aggregation function, which is used as follows:
**Syntax**
```SQL
histogram(expr)
```
> The histogram function is used to describe the distribution of the data. It uses an "equal height" bucketing strategy, dividing the data into buckets according to their values. Each bucket is described with some simple statistics, such as the number of values that fall into it. It is mainly used by the optimizer to estimate range queries.
**example**
```
MySQL [test]> select histogram(login_time) from dev_table;
+------------------------------------------------------------------------------------------------------------------------------+
| histogram(`login_time`) |
+------------------------------------------------------------------------------------------------------------------------------+
| {"bucket_size":5,"buckets":[{"lower":"2022-09-21 17:30:29","upper":"2022-09-21 22:30:29","count":9,"pre_sum":0,"ndv":1},...]}|
+------------------------------------------------------------------------------------------------------------------------------+
```
**description**
```JSON
{
    "bucket_size": 5,
    "buckets": [
        {
            "lower": "2022-09-21 17:30:29",
            "upper": "2022-09-21 22:30:29",
            "count": 9,
            "pre_sum": 0,
            "ndv": 1
        },
        {
            "lower": "2022-09-22 17:30:29",
            "upper": "2022-09-22 22:30:29",
            "count": 10,
            "pre_sum": 9,
            "ndv": 1
        },
        {
            "lower": "2022-09-23 17:30:29",
            "upper": "2022-09-23 22:30:29",
            "count": 9,
            "pre_sum": 19,
            "ndv": 1
        },
        {
            "lower": "2022-09-24 17:30:29",
            "upper": "2022-09-24 22:30:29",
            "count": 9,
            "pre_sum": 28,
            "ndv": 1
        },
        {
            "lower": "2022-09-25 17:30:29",
            "upper": "2022-09-25 22:30:29",
            "count": 9,
            "pre_sum": 37,
            "ndv": 1
        }
    ]
}
```
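Reading the fields from the example above (our interpretation, not an authoritative spec): `lower` and `upper` are the bucket bounds, `count` is the number of values in the bucket, `pre_sum` is the cumulative count of all preceding buckets (e.g. the third bucket's `pre_sum` is 9 + 10 = 19), and `ndv` is the number of distinct values in the bucket.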
TODO:
- histogram function supports parameters and sample statistics (handled in another PR)
- use histogram statistics
- add p0 regression