doris

Author	SHA1	Message	Date
Mingyu Chen	211cc66d02	[fix](multi-catalog) fix image loading failure when create catalog with resource (#15692 ) Bug fix: fix image loading failure when creating a catalog with a resource. When creating a jdbc catalog with a resource, the metadata image fails to load. When loading the jdbc catalog image, it tries to get the resource from ResourceMgr, but ResourceMgr has not been loaded yet, so an NPE is thrown. This PR fixes this bug and refactors some logic about catalog and resource. When loading a jdbc catalog image, it no longer gets the resource from ResourceMgr. Users can now create a catalog with both a resource and properties, like: create catalog jdbc_catalog with resource jdbc_resource properties("user" = "user1"); The properties in the "properties" clause overwrite the properties in "jdbc_resource". Force adding tinyInt1isBit=false to the jdbc url: the default value of tinyInt1isBit is true, which causes tinyint in mysql to be read as bit type. Forcing tinyInt1isBit=false in the jdbc url keeps tinyint in mysql as tinyint in Doris. Avoid calculating the checksum of the jdbc driver jar multiple times. Refactor: refactor the notification logic when updating properties in a resource. Previously, updating properties in a resource notified the corresponding catalog to update its own properties. This PR changes this logic: after updating properties in a resource, it only uninitializes the catalog's internal objects such as "jdbc client" or "hms client", and these objects are re-initialized lazily. All properties are read from the Resource at runtime, so the latest properties are always used. Regression test cases: because tinyInt1isBit=false is added to the jdbc url, some cases need to be changed.	2023-01-09 09:56:26 +08:00
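The override rule in the commit above (properties in the "properties" clause overwrite those in the resource) can be illustrated with a minimal Python sketch. The function name `effective_properties` and the sample property values are hypothetical and are not taken from the Doris code base; the sketch only shows the precedence, not how Doris implements it.

```python
def effective_properties(resource_props, catalog_props):
    """Merge properties so that catalog-level values win over resource values.

    Hypothetical helper for illustration only; not a Doris API.
    """
    merged = dict(resource_props)   # start from the resource's properties
    merged.update(catalog_props)    # catalog "properties" clause overrides
    return merged


# Assumed sample values, chosen only for the example.
jdbc_resource = {"user": "root", "password": "secret", "type": "jdbc"}
catalog_clause = {"user": "user1"}

merged = effective_properties(jdbc_resource, catalog_clause)
print(merged["user"])
```

Keys that appear only in the resource are kept unchanged, so the catalog clause needs to list only the values it wants to override.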
Pxl	1514b5ab5c	[Feature](Materialized-View) support advanced Materialized-View (#15212 )	2023-01-09 09:53:11 +08:00
caiconghui	97cea9b5c9	[improvement](bdbje) add more log to make bdbje DatabaseNotFoundException problem easily solved (#15715 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-01-09 08:55:21 +08:00
wxy	6829d361cb	[Feature](audit) add errorCode and errorMessage in audit log (#14925 ) * [feat] add errorCode and errorMessage in audit log. * [Feature](audit) add errorCode and errorMessage in audit log Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2023-01-09 08:47:57 +08:00
Mingyu Chen	f256bb8d39	[fix](meta) fix priv table load bug when upgrading to 1.2.x (#15706 ) In old versions, NODE_PRIV could be incorrectly assigned to normal users, so upgrading to 1.2.x failed to handle this unexpected case. This PR fixes this by removing NODE_PRIV from normal users.	2023-01-09 08:38:26 +08:00
ElvinWei	36590da24b	[fix](regression p0) add the alias function hist to histogram and fix p0 (#15708 ) add the alias function hist to histogram and fix p0	2023-01-08 11:31:23 +08:00
Mingyu Chen	500c7fb702	[improvement](multi-catalog) support unsupported column type (#15660 ) When creating an external catalog, Doris automatically syncs the table schemas from the external catalog. But some column types are not yet supported by Doris, such as struct, map, etc. Previously, when it met such an unsupported column, Doris threw an exception and the corresponding table could not be synced. But the user may just want to query the other, supported columns. This PR adds a new column type: UNSUPPORTED. For now it is used only for external table schema sync. An unsupported column is synced as a column with UNSUPPORTED type. When querying such a table, there are several situations: select * from table: throw error Unsupported type 'UNSUPPORTED_TYPE' xxx select k1 from table: k1 is with supported type. query OK. select * except(k2): k2 is with unsupported type. query OK	2023-01-08 10:07:10 +08:00
ElvinWei	5dfdacd278	[enhancement](histogram) add histogram syntax and persist histogram statistics (#15490 ) Histogram statistics are more expensive to collect, so we collect and persist them separately. This PR does the following work: 1. Add histogram syntax and add keyword `TABLE` 2. Add the task of collecting histogram statistics 3. Persist histogram statistics 4. Replace fastjson with gson 5. Add unit tests... Relevant syntax examples: > Refer to some databases such as mysql and add the keyword `TABLE`. ```SQL -- collect column statistics ANALYZE TABLE statistics_test; -- collect histogram statistics ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2; ``` based on #15317	2023-01-07 00:55:42 +08:00
ElvinWei	76ad599fd7	[enhancement](histogram) optimise aggregate function histogram (#15317 ) This PR mainly optimizes the histogram (👉🏻 https://github.com/apache/doris/pull/14910) aggregation function, including the following: 1. Support input parameters `sample_rate` and `max_bucket_num` 2. Add UT and regression test 3. Add documentation 4. Optimize function implementation logic Parameter description: - `sample_rate`: Optional. The proportion of sample data used to generate the histogram. The default is 0.2. - `max_bucket_num`: Optional. Limit the number of histogram buckets. The default value is 128. --- Example: ``` MySQL [test]> SELECT histogram(c_float) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_float`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_string`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ ``` Query result description: ``` { "sample_rate": 0.2, "max_bucket_num": 128, "bucket_num": 3, "buckets": [ { "lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2 }, { "lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2 }, { "lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1 } ] } ``` Field description: - sample_rate: Rate of sampling - max_bucket_num: Limit the maximum number of buckets - bucket_num: The actual number of buckets - buckets: All buckets - lower: Lower bound of the bucket - upper: Upper bound of the bucket - count: The number of elements contained in the bucket - pre_sum: The total number of elements in the preceding buckets - ndv: The number of different values in the bucket > Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in the preceding buckets (pre_sum).	2023-01-07 00:50:32 +08:00
谢健	9c8fcd805c	[feature](Nereids) support variable type expression (#15659 )	2023-01-07 00:32:57 +08:00
mch_ucchi	08d439cde7	[feature](Nereids) add keyword rlike (#15647 )	2023-01-07 00:28:21 +08:00
luozenglin	53559e2bdc	[fix](decimalv2) fix loss of precision when cast to decimalv2 literal (#15629 )	2023-01-06 16:02:46 +08:00
Jerry Hu	9c36278c4a	[improvement](pipeline) Support sharing hash table for broadcast join (#15628 )	2023-01-06 15:11:28 +08:00
AKIRA	7f84db310a	[fix](nereids) Convert to datetime when binary expr's left is date and right is int type (#15615 ) In the case below, the expression ` date > 20200101` should implicitly cast both sides to datetime instead of bigint ```sql CREATE TABLE `part_by_date` ( `date` date NOT NULL COMMENT '', `id` int(11) NOT NULL COMMENT '' ) ENGINE=OLAP UNIQUE KEY(`date`, `id`) PARTITION BY RANGE(`date`) (PARTITION p201912 VALUES [('0000-01-01'), ('2020-01-01')), PARTITION p202001 VALUES [('2020-01-01'), ('2020-02-01'))) DISTRIBUTED BY HASH(`id`) BUCKETS 3 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); INSERT INTO part_by_date VALUES('0001-02-01', 1),('2020-01-15', 2); SELECT id FROM part_by_date WHERE date > 20200101; ```	2023-01-06 14:08:05 +08:00
谢健	ae77b582f0	[fix](Nereids) add information function and fix bugs in schemaScan (#15608 ) 1. Add information function - Database() - User() - Current_User() - Connection_id() 2. Fix bugs in schemaScan	2023-01-06 13:37:27 +08:00
谢健	ef72b8d859	[Feature](Nereids): add logical operator \|\| && (#15643 )	2023-01-06 12:18:21 +08:00
Tiewei Fang	df2da89b89	[feature](multi-catalog) support postgresql jdbc catalog (#15570 ) support postgresql jdbc catalog	2023-01-06 11:00:59 +08:00
Gabriel	b57500d0c3	[Bug](decimalv3) fix wrong result for MOD operation (#15644 )	2023-01-06 10:38:53 +08:00
jakevin	6d691edcc7	[fix](Nereids): restrict join reorder project. (#15645 )	2023-01-06 00:18:05 +08:00
Zhengguo Yang	77ffafb766	[vulnerability](CVE-2022-1292) fix CVE-2022-1292 (#15639 )	2023-01-05 21:57:16 +08:00
Kang	9d1f02c580	[Improvement](topn) runtime prune for topn query (#15558 )	2023-01-05 20:10:12 +08:00
jakevin	d36b93708c	[feature](Nereids): add ExtractFilterFromJoin rule to support more (#14896 )	2023-01-05 19:09:43 +08:00
zhengshiJ	5460c873e8	[Feature] (Nereids) support un equals conjuncts in un scalar sub query (#15591 ) support un equals conjuncts in un scalar sub query. [fix] in correlated subquery wrong result	2023-01-05 16:56:14 +08:00
Gabriel	5ee479f45c	[Pipeline](load) Support transaction on pipeline engine (#15597 )	2023-01-05 15:59:18 +08:00
谢健	0dfa143140	[enhancement](Nereids) generate colocate join when property is different with require property (#15479 ) 1. When checking a HashProperty whose type is natural, we only need to check whether the required properties contain all shuffle columns 2. In ChildrenPropertiesRegulator.java, when colocate/bucket join is not allowed, we will enforce the required property.	2023-01-05 11:41:18 +08:00
camby	59f34be41f	[fix](having-clause) having clause does not work correctly with same alias name (#15143 )	2023-01-05 10:15:15 +08:00
Gabriel	5ff5b8fc98	[feature](mark join) Support mark join for hash join node (#15569 ) * [feature](mark join) Support mark join for hash join node	2023-01-05 09:32:26 +08:00
deardeng	61d538c713	[improvement](storage-policy) Add check validity when create storage policy. (#14405 )	2023-01-04 22:24:49 +08:00
deardeng	7ef3940809	[fix](storage-policy) fix some bugs (#15585 ) 1. fix datetime ms transfer to s bug 2. fix alter storage policy notify be missing field(datetime, ttl) 3. support alter storage policy use "h, hour, d, day" as ttl field	2023-01-04 16:49:51 +08:00
luozenglin	c42c61dcad	[fix](bitmapfilter) fix bitmap filter not pushing down (#15532 )	2023-01-04 14:33:53 +08:00
luozenglin	a4af1fbf90	[fix](inbitmap) forbid having clause to include in bitmap. (#15494 )	2023-01-04 14:33:18 +08:00
wxy	e0c56bcd20	[Feature](export) Support cancel export statement (#15128 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2023-01-04 14:08:25 +08:00
morrySnow	7728794b4a	[fix](Nereids) SimplifyArithmeticRule generate wrong expression after process (#15580 ) In the case of 'a / b', if a is constant, after applying SimplifyArithmeticRule, the expression will be converted to 'b * a' by mistake.	2023-01-04 11:10:15 +08:00
AKIRA	f2f06c1acc	[feature](nereids) Support select temp partition (#15579 ) Support such grammar: select * from t_p temporary partition(tp1); select * from t_p temporary partitions(tp1); select * from t_p temporary partition tp1;	2023-01-04 11:04:36 +08:00
Gabriel	eef1f432dd	[Bug](datetimev2/decimalv3) Fix wrong predicate infer rule (#15574 )	2023-01-04 10:03:43 +08:00
starocean999	a97f582b93	[fix](nereids) use DAYS as default unit for DATE_ADD and DATE_SUB function (#15559 )	2023-01-04 01:55:15 +08:00
Shuo Wang	18bc354c06	[fix](Nereids) use correct column unique id when read data from non-base index (#15534 ) When light schema change is enabled by default, a column in OLAP scan is retrieved by column unique id instead of the column name. Columns with the same name would use different unique IDs among materialized indexes. This PR ensures that the column in the OLAP scan node could use the correct column unique id.	2023-01-04 01:41:25 +08:00
minghong	8d0c06c897	[fix](nereids) binding priority in agg-sort, having, group_by_key (#15240 ) This PR defines order_key and having_key binding priority. 1. order key priority ``` select col1 * -1 as col1 # inner_col1 * -1 as alias_col1 from t order by col1; # order by order_col1 ``` to bind `order_col1`, `alias_col1` has higher priority than `inner_col1` 2. having key priority ``` select (a-1) as a # inner_a - 1 as alias_a from bind_priority_tbl group by a having a=1; ``` to bind having key, `inner_a` has higher priority than `alias_a` 3. group by key binding priority ``` SELECT date_format(b.k10, '%Y%m%d') AS k10 FROM test a LEFT JOIN (SELECT k10 FROM baseall) b ON a.k10 = b.k10 GROUP BY k10; ``` group_by_key (k10) binding priority: - agg.child.output - agg.output if binding with agg.child.output failed(the slot not found, or more than one candidate slot found in agg.child.output), nereids try to bind group_by_key with agg.output. In above example, nereids found 2 candidate slots (a.k10, b.k10) in agg.child.output for group_by_key (k10), binding with agg.child.output failed. Then nereids try to bind group_by_key with agg.output, that is `date_format(b.k10, '%Y%m%d') AS k10`. and finally, group_by_key is bound with `alias k10`	2023-01-03 22:09:28 +08:00
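The group-by key binding order described in the commit above (try agg.child.output first, fall back to agg.output when no slot or more than one candidate slot is found) can be sketched as follows. This is a simplified Python illustration, not the Nereids implementation; `bind_group_by_key` and the string-based slot representation are assumptions made for the example.

```python
def bind_group_by_key(name, child_output, agg_output):
    """Bind a group-by key by name, following the priority in the commit text.

    child_output: list of qualified slot names such as "a.k10" (assumed form).
    agg_output:   dict mapping alias name -> aliased expression (assumed form).
    Returns (source, bound_target).
    """
    # Step 1: look for candidates in agg.child.output by unqualified name.
    candidates = [slot for slot in child_output if slot.split(".")[-1] == name]
    if len(candidates) == 1:
        return ("agg.child.output", candidates[0])

    # Step 2: zero or several candidates, so fall back to agg.output.
    if name in agg_output:
        return ("agg.output", agg_output[name])

    raise LookupError("cannot bind group by key: " + name)


# Slots from the example query: both a.k10 and b.k10 are visible below the agg.
child_output = ["a.k10", "b.k10"]
agg_output = {"k10": "date_format(b.k10, '%Y%m%d')"}

binding = bind_group_by_key("k10", child_output, agg_output)
print(binding)
```

With two candidates in the child output, the key falls through to the alias, which matches the outcome described for the `GROUP BY k10` example.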
starocean999	55dc541c90	[Fix](Nereids) aggregate function except COUNT should nullable without group by expr (#15547 ) Co-authored-by: mch_ucchi	2023-01-03 21:28:07 +08:00
morrySnow	a365486a25	[fix](Nereids) get datatype for binary arithmetic (#15548 ) This is just a temporary fix for binary arithmetic. Next we will refactor the TypeCoercion rule to make the behavior exactly the same as the legacy planner.	2023-01-03 19:09:48 +08:00
zhengshiJ	1dabcb0111	[Fix](Nereids) fix except and intersect error for statsCalculator (#15557 ) When calculating the statsCalculator of except and intersect, the slotId of the corresponding column was not replaced with the slotId of output, resulting in NPE.	2023-01-03 17:06:57 +08:00
starocean999	8748f65a1b	[fix](nereids)support nulls first/last in order by clause (#15530 )	2023-01-03 14:56:00 +08:00
zhangdong	893f5f9345	[feature-wip](multi-catalog) support automatic sync hive metastore events (#15401 ) Poll the metastore at a given frequency for create/alter/drop events on databases, tables, and partitions. By observing such events, we can take the appropriate action (refresh/invalidate/add/remove) so that the catalog represents the latest information available in the metastore. We keep track of the last synced event id in each polling iteration so the next batch can be requested appropriately.	2023-01-03 13:59:14 +08:00
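The polling scheme in the commit above (remember the last synced event id, then request only newer events in the next iteration) can be sketched in a few lines of Python. Everything here is hypothetical: the `EventPoller` class, the `fetch_events` callback, and the in-memory event list stand in for the real metastore client, whose API is not shown in the commit message.

```python
class EventPoller:
    """Tracks the last synced event id across polling iterations (sketch)."""

    def __init__(self, fetch_events):
        # fetch_events(after_id) is assumed to return events with id > after_id,
        # ordered by id, each as a dict like {"id": 3, "type": "DROP_TABLE"}.
        self.fetch_events = fetch_events
        self.last_synced_event_id = 0
        self.applied = []

    def poll_once(self):
        """Fetch and apply one batch; return the number of events processed."""
        events = self.fetch_events(self.last_synced_event_id)
        for event in events:
            self.applied.append(event["type"])  # placeholder for refresh/invalidate
            self.last_synced_event_id = event["id"]
        return len(events)


# In-memory stand-in for the metastore event log (assumed data).
EVENT_LOG = [
    {"id": 1, "type": "CREATE_TABLE"},
    {"id": 2, "type": "ALTER_TABLE"},
    {"id": 3, "type": "DROP_PARTITION"},
]


def fetch_from_log(after_id):
    return [e for e in EVENT_LOG if e["id"] > after_id]


poller = EventPoller(fetch_from_log)
first_batch = poller.poll_once()
second_batch = poller.poll_once()
print(first_batch, second_batch, poller.last_synced_event_id)
```

Because the id is advanced only after an event is applied, a later poll with no new events is a no-op, and a poll after new events arrive picks up exactly the unseen ones.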
jakevin	ada72b055f	[feature](Nereids): Support any_value/any function. (#15450 )	2023-01-03 12:21:13 +08:00
Mingyu Chen	02d035466b	[refactor] remove partition pruner v1 (#15552 ) partition pruner v1 is no longer used. Also remove session variable partition_prune_algorithm_version	2023-01-03 11:35:30 +08:00
minghong	31548cfe2a	[fix](nereids) check failed that exchange node under agg must from PhysicalDistribute (#15473 ) When nereids translates a PhysicalHashAggregate node to the original plan, if the input fragment root is an exchange node, nereids assumes that this exchange node was generated from a PhysicalDistribute node. But this assumption is not always true. For example, a sort node could be translated to exchange(merge phase)+sort(local phase).	2023-01-03 11:19:25 +08:00
zhannngchen	238ae54620	[fix](merge-on-write) unique key mow tables should require distribution columns be key column (#15535 ) * [fix](merge-on-write) unique key mow tables should require distribution columns be key column * fix code style	2023-01-01 15:53:21 +08:00
Mingyu Chen	e89adc6e1d	[fix](create-table) wrong judgement about partition column type (#15542 ) The following stmt should succeed, but returns an error: `complex type cannt be partition column：ARRAY<VARCHAR(64)>` ``` create table test_array( task_insert_time BIGINT NOT NULL DEFAULT "0" COMMENT "" , task_project ARRAY<VARCHAR(64)> DEFAULT NULL COMMENT "" , route_key DATEV2 NOT NULL COMMENT "range分区键" ) DUPLICATE KEY(`task_insert_time`) COMMENT "" PARTITION BY RANGE(route_key) (PARTITION `p202209` VALUES LESS THAN ("2022-10-01"), PARTITION `p202210` VALUES LESS THAN ("2022-11-01"), PARTITION `p202211` VALUES LESS THAN ("2022-12-01")) DISTRIBUTED BY HASH(`task_insert_time` ) BUCKETS 32 PROPERTIES ( "replication_num" = "1", "light_schema_change" = "true" ); ``` This PR fixes this.	2022-12-31 13:10:39 +08:00
zhangstar333	c47bdf6606	[vectorized](jdbc) fix external table of oracle having keyword column (#15487 ) If a column name is an Oracle keyword, the query will report an error	2022-12-31 12:48:26 +08:00
morrySnow	781fa17993	[fix](Nereids) round function return type should be double (#15502 )	2022-12-30 23:36:15 +08:00


3466 Commits