doris

Author	SHA1	Message	Date
minghong	796d51ae2e	[enhance](fuzzy) set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode (#16456) * set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode * fmt	2023-02-07 12:45:27 +08:00
yiguolei	6fdd35a6f2	[enhancement](mpp process) remove unused method and make the report process clearer (#16441) Both update_status and open_vectorized_internal call send_report and stop the report thread. Move the update_status code to the open method and remove the unnecessary send_report and stop_report_thread calls. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-07 12:28:55 +08:00
Shuo Wang	bed1ab7c19	[Feature](Nereids) Add hint to enable pre-aggregation when scanning OLAP table. (#15614) This PR adds support for the pre-aggregation hint. Users can use /*+PREAGGOPEN*/ to enable pre-aggregation for an OLAP table. For example, given an aggregate-keys table t (k1 int, k2 int, v1 int sum, v2 int sum), pre-aggregation can be enabled by a query with the hint: select k1, v1 from t /*+PREAGGOPEN*/.	2023-02-07 11:59:10 +08:00
Henry2SS	0b8c6315fb	[fix](broker load) Fix hll_hash(null) in broker load reporting an incorrect exception (#16293) Co-authored-by: wuhangze <wuhangze@jd.com>	2023-02-07 11:32:20 +08:00
Jibing-Li	a13beca0de	[Fix](load) Use lower case for load column names. #16422 Column names in stream load and broker load are case-sensitive; make them case-insensitive. This is consistent with queries, because column names in query SQL are case-insensitive.	2023-02-07 09:18:37 +08:00
Dongyang Li	dcbcec0775	[regression](fuzzy) fuzzy enable_fold_constant_by_be (#16448) * [fuzzy](test) fuzzy some session variables stably according to pull_request_id * fuzzy enable_fold_constant_by_be --------- Co-authored-by: stephen <hello_stephen@@qq.com>	2023-02-07 09:17:50 +08:00
xueweizhang	3334e3f393	[fix](restore) do not set default replication_allocation when restoring with property reserve_replica = true (#15562) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-02-06 22:38:03 +08:00
Kang	737c73dcf0	[Improvement](topn) order by key topn query optimization (#15663)	2023-02-06 15:36:05 +08:00
jakevin	719b8ca340	[enhance](Nereids): polish code (#16368)	2023-02-06 12:06:55 +08:00
huangzhaowei	a17fbe2b4c	[fix](MTMV) Use current db to identify the MTMV tasks and jobs (#16419) SHOW MTMV JOB/TASK used to list all jobs and tasks across databases regardless of the current database. Now the current db is used to identify MTMV tasks and jobs. Only a user who has not selected a database can list all jobs and tasks across databases.	2023-02-06 12:03:29 +08:00
starocean999	dccd04a3ba	[fix](fe) predicate is wrongly pushed through CUBE function (#15831)	2023-02-06 11:29:15 +08:00
Mingyu Chen	a390252893	[fix](keyword) add TIME to keywords (#16277)	2023-02-06 11:07:11 +08:00
Mingyu Chen	f940cf4cf6	[fix](multi-catalog) fix recursive get schema cache bug (#16415)	2023-02-06 09:23:07 +08:00
slothever	b1b2697cc7	[fix](iceberg) fix iceberg catalog (#16372) 1. Fix iceberg catalog access to S3 2. Fix iceberg catalog partitioned table queries 3. Fix persistence	2023-02-05 13:15:28 +08:00
starocean999	df3a6e2412	[fix](fe) only set column info for slots in sortTupleDesc (#16407)	2023-02-04 23:14:25 +08:00
Xiangyu Wang	e5d624ce9c	[Enhancement](profile) lazy load profileContent string (#16354) Sometimes the profileContent of ProfileElement is very large (more than 30MB), and such huge string objects may cause GC performance problems. But we use them only when we invoke profile-related RESTful APIs (such as /profile/{format}/{query_id}, /api/profile and so on), so we need to lazy load them.	2023-02-04 22:53:44 +08:00
zhangstar333	458adf6c91	[improvement](jdbc) refactor jdbc to copy the result set in batches (#16337) Tested reading from a JDBC external table: 10%+ performance improvement after the optimization.	2023-02-04 22:51:55 +08:00
Xujian Duan	1069d4f91e	[Enhancement](Stmt) ShowPartitionsStmt supports forwarding to master #16359 Co-authored-by: duanxujian <duanxujian@jd.com>	2023-02-04 22:51:19 +08:00
huangzhaowei	1146bde695	[feature-wip](MTMV) Support refresh mtmv (#16218) Support refreshing an MTMV manually with this SQL, which generates an MTMV task immediately. ``` REFRESH MATERIALIZED VIEW test_mv_view [complete]; ``` You can use `show mtmv task` to show the latest task. This PR also clears the MTMV tasks when the MTMV is dropped, to make sure the test suite runs correctly.	2023-02-04 20:17:45 +08:00
ElvinWei	ad78f313be	[Improvement](statistics) show analysis job info (#16305) Supports querying analysis job info. syntax: ```SQL SHOW ANALYZE [TABLE | ID] [ WHERE [STATE = ["PENDING"|"RUNNING"|"FINISHED"|"FAILED"]] ] [ORDER BY ...] [LIMIT limit]; ``` example: ```SQL SHOW ANALYZE test_table1 WHERE state = 'FINISHED' ORDER BY col_name LIMIT 1; ``` result: | job_id | catalog_name | db_name | tbl_name | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state | schedule_type | | ------ | ------------ | -------------------- | ----------- | -------- | -------- | ------------- | ------- | -------------------- | -------- | ------------- | | 10086 | internal | default_cluster:test | test_table1 | pv | MANUAL | FULL | | 2023-02-01 09:36:41 | FINISHED | ONCE |	2023-02-03 23:21:47 +08:00
ElvinWei	f443ebfd9a	[Improvement](statistics) optimise histogram keyword (#16369)	2023-02-03 23:02:41 +08:00
minghong	4f778c38a1	[feature](nereids) support explore 4 phase aggregation (#16298) Support 4-phase aggregation. Example: `select count(distinct k1), sum(k2) from t`; suppose t.k0 is the distribution key. We have the plan ``` Agg(DISTINCT_GLOBAL) | Exchange(Gather) | Agg(DISTINCT_LOCAL) | Agg(GLOBAL) | Exchange(hash distribute by k1) | Agg(LOCAL) | scan ``` Limitations: 1. Only SQL with one distinct is supported; not supported: `select count(distinct k1), count(distinct k2) from t` 2. Only distinct on one column is supported; not supported: `select count(distinct k1, k2) from t`	2023-02-03 21:51:10 +08:00
lihangyu	54c85e36ad	[Fix](point query) OlapScanNode `result` could leak memory since it's cached (#16406) Each call to `addScanRangeLocations` on a cached OlapScanNode adds TScanRangeLocations to `result`, so `result` could grow too large and make the loop in `getReplicaNumPerHost` a CPU hot spot.	2023-02-03 21:42:53 +08:00
AKIRA	5e232a30d8	[fix](planner) Doris returns empty sets when selecting from an inline view (#16370) Doris always delays the execution of expressions as long as it can, and the same holds for the expansion of constant expressions. Given the SQL below: ```sql select i from (select 'abc' as i, sum(birth) as j from subquerytest2) as tmp ``` The aggregation would be eliminated, since its output is not required by the outer block, but the expansion of the constant expression would be done in the final result expr; since the aggregate output has been eliminated, the expansion actually does nothing and finally causes an empty result. To fix this, we materialize the result exprs in the inner block for such SQL. It may affect performance, but it is better than letting the system produce a wrong result.	2023-02-03 21:23:52 +08:00
zhengshiJ	929b31bd3c	[Feature](Nereids) Support CaseWhen with subquery (#16385) Co-authored-by: jianghaochen <jianghaochen@meituan.com>	2023-02-03 18:20:47 +08:00
谢健	3891083474	[fix](Nereids): fix some bugs in DpHyper (#16282)	2023-02-03 18:19:48 +08:00
Gabriel	3f4ca3da32	[Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change (#16364) * [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change * update * update	2023-02-03 17:06:24 +08:00
xy720	b1fd124f02	[feature](struct-type/map-type) Add switch for struct and map type for creating table (#16379) Add switches to forbid users from creating tables with struct or map columns.	2023-02-03 13:46:52 +08:00
starocean999	dfb610d7ec	[fix](nereids) the order exprs in sort node should be slotRef in its tupleDesc (#16363)	2023-02-03 13:28:08 +08:00
morrySnow	a9177569c6	[refactor](Nereids) remove trick datatype code in Expression (#16365) Since we already do typeCoercion bottom-up in the binding step, the trick dataType code in Expression is useless. This PR removes it.	2023-02-03 13:02:34 +08:00
zhangdong	4fc0715156	[fix](auth) fix external catalog being unable to use db (#16269)	2023-02-03 10:10:33 +08:00
lihangyu	13f74088fa	[Improve](row-store) check light schema change enabled (#16358)	2023-02-02 20:57:18 +08:00
lihangyu	1d8265c5a3	[refactor](row-store) make row store column a hidden column in meta (#16251) This simplifies the storage engine logic, makes the code more readable, and lets us analyze the hidden `__DORIS_ROW_STORE_COL__` column's length, etc.	2023-02-02 20:56:13 +08:00
Pxl	0d5b115993	[Feature](Materialized-View) support duplicate base column for different aggregate function (#15837) Support duplicate base column for different aggregate function.	2023-02-02 18:57:39 +08:00
zhengshiJ	e31913faca	[Feature](Nereids) Support order and limit in subquery (#15971) 1. For compatibility with the old optimizer, sort and limit in a subquery do not take effect and are simply deleted. ``` select * from sub_query_correlated_subquery1 where sub_query_correlated_subquery1.k1 > (select sum(sub_query_correlated_subquery3.k3) a from sub_query_correlated_subquery3 where sub_query_correlated_subquery3.v2 = sub_query_correlated_subquery1.k2 order by a limit 1); ``` 2. Adjust where the subquery is unnested, to ensure that the conjuncts in the filter have been optimized before unnesting. Supported: ``` SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k1 = i1.k1) AND (k2 = 1)) ) > 0); ``` The above can be supported because the common conjunct is extracted, converting it into the following: ``` SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2 or k2 = 1)) ) > 0); ``` Not supported: ``` SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k2 = i1.k1) AND (k2 = 1)) ) > 0); ```	2023-02-02 18:17:30 +08:00
Mingyu Chen	cb6875b5a4	[improvement](multi-catalog) use date/datetimev2 as default col type for catalog table (#16304) 1. When mapping columns from an external data source, use date/datetimev2 as the default type 2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled	2023-02-02 17:35:48 +08:00
Tiewei Fang	557159d3ce	[feature](JdbcExternalCatalog) support inserting data into JdbcExternalCatalog (#16271)	2023-02-02 17:31:33 +08:00
谢健	398da44e46	[fix](Nereids) fix bugs in test join5 (#16312) Make a bucket shuffle join in PhysicalPlanTranslator when the property of the left child is not enforced	2023-02-02 16:51:45 +08:00
YueW	bb179b77f7	[Feature-WIP](inverted index) support array type for inverted index reader (#16355)	2023-02-02 16:14:14 +08:00
morrySnow	a6c1eaf1d8	[refactor] bind slot and function in one rule (#16288) 1. Use one rule to bind slots and functions and do type coercion, to fix type and nullable errors, e.g. SUM(a1 + AVG(a2)) where a1 and a2 are TINYINT: the return type was SMALLINT before; after this PR it is the correct type, DOUBLE. 2. Fix runtime filter generator bugs: runtime filters were bound to the wrong join conjuncts.	2023-02-02 15:02:32 +08:00
Gabriel	3b8182ee7e	[nereids](nvl) Fix function signature (#16345)	2023-02-02 14:05:51 +08:00
Ashin Gau	9618427020	[improvement](multi-catalog) increase default batch_size to 4064 (#16326) The performance of ClickBench Q30 is affected by batch_size: | batch_size | 1024 | 4096 | 20480 | | -- | -- | -- | -- | | Q30 query time | 2.27 | 1.08 | 0.62 | The aggregation operator creates a new result block for each batch block, and Q30 has 90 columns, so this is time-consuming. A larger batch_size reduces the number of aggregation blocks and therefore improves performance. The Doris internal reader reads at least 4064 rows even if batch_size < 4064, so this PR makes reading external tables behave the same as reading internal tables.	2023-02-02 11:51:09 +08:00
Mingyu Chen	06db0c6a91	[fix](iceberg) fix meta persist bug of iceberg catalog (#16344) PR #16082 forgot to update the GsonUtil for Iceberg Catalog/Database/Table	2023-02-02 09:30:25 +08:00
slothever	40d9e19e1d	[feature-wip](multi-catalog) support iceberg union catalog, and add h… (#16082) Support the iceberg unified catalog framework, and add hms and rest catalogs for the framework	2023-02-01 22:59:42 +08:00
huangzhaowei	b878a7e61e	[feature](Load) Support skipping a specific number of lines for csv stream load (#16055) Support setting the number of lines to skip when stream loading a csv file. Usage `-H skip_lines:number`: ``` curl --location-trusted -u root: -T test.csv -H skip_lines:5 -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load ``` Skipping lines can also be used in mysql load as below: ```sql LOAD DATA LOCAL INFILE '${mysql_load_skip_lines}' INTO TABLE ${tableName} COLUMNS TERMINATED BY ',' IGNORE 2 LINES PROPERTIES ("auth" = "root:"); ```	2023-02-01 20:42:43 +08:00
huangzhaowei	0842aa2947	[Fix](MTMV) Support master and follower changes in multi-FE for mtmv (#16149) Support master and follower changes in multi-FE for mtmv. This PR fixes the following issues: 1. Start MTMV only on the master node; if the master changes to a follower, the scheduler is stopped. 2. Fix a double meta write 3. Rename some edit log functions and variables 4. If an MV has both a PeriodicalJob and an immediate job, and the PeriodicalJob is about to be triggered, the scheduler ignores the immediate job. 5. Fix expiration-time bugs, and make sure cleanup happens on all FEs. 6. Change the cleanerScheduler interval from 1 day to 1 minute.	2023-02-01 20:02:46 +08:00
jakevin	f14c62b274	[enhance](Nereids): polish code. (#16309)	2023-02-01 19:41:10 +08:00
Jibing-Li	d224624bbe	[improvement](session variable) Add enable_file_cache session variable (#16268) Add the enable_file_cache session variable, so that the file cache can be disabled without restarting BE.	2023-02-01 18:15:03 +08:00
huangzhaowei	4e92f63d7b	[Fix](Load) Forbid developers from importing fast json in FE (#16235)	2023-02-01 16:32:11 +08:00
HaveAnOrangeCat	e3c8fffd99	[function](round) fix decimal scale when scale is not specified (#15541)	2023-02-01 14:58:48 +08:00
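
Commit 4f778c38a1 (#16298) describes the 4-phase plan shape for `select count(distinct k1), sum(k2) from t`. The Python sketch below simulates that plan shape over in-memory rows; it is an illustration under stated assumptions, not Doris's implementation. Assumptions: each node's scan yields `(k1, k2)` pairs (standing in for the distribution on t.k0), the hash exchange uses Python's `hash(k1) % num_nodes`, NULLs are ignored, and the name `four_phase_agg` is hypothetical.

```python
from collections import defaultdict


def four_phase_agg(partitions, num_nodes):
    """Simulate count(DISTINCT k1), sum(k2) with a 4-phase plan (hypothetical sketch).

    `partitions` holds the (k1, k2) rows each node produces from its scan.
    Returns (count_distinct_k1, sum_k2). NULL handling is ignored.
    """
    # Phase 1, Agg(LOCAL): each node groups its own rows by k1 and pre-sums k2.
    local_results = []
    for rows in partitions:
        partial = defaultdict(int)
        for k1, k2 in rows:
            partial[k1] += k2
        local_results.append(partial)

    # Exchange(hash distribute by k1): send each (k1, partial sum) to one node.
    inboxes = [[] for _ in range(num_nodes)]
    for partial in local_results:
        for k1, s in partial.items():
            inboxes[hash(k1) % num_nodes].append((k1, s))

    per_node = []
    for inbox in inboxes:
        # Phase 2, Agg(GLOBAL): merge partial sums; each k1 now lives on one node.
        merged = defaultdict(int)
        for k1, s in inbox:
            merged[k1] += s
        # Phase 3, Agg(DISTINCT_LOCAL): count this node's distinct k1, sum its k2.
        per_node.append((len(merged), sum(merged.values())))

    # Exchange(Gather) + phase 4, Agg(DISTINCT_GLOBAL): combine per-node results.
    return sum(c for c, _ in per_node), sum(s for _, s in per_node)


rows_by_node = [[(1, 10), (2, 5)], [(1, 1), (3, 2)], [(2, 4)]]
print(four_phase_agg(rows_by_node, num_nodes=3))  # (3, 22)
```

Because the hash exchange sends every partial result with the same k1 to a single node, the per-node distinct counts never overlap, which is why the final phase can simply add them.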
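
Commit e31913faca (#15971) supports the first correlated example because its OR can be rewritten as `(k1 = i1.k1) AND (k2 = 2 or k2 = 1)`. A minimal Python sketch of that factoring step, assuming a predicate is modelled as a list of conjunct sets that are ORed together, with each comparison written as a plain string; `extract_common_conjuncts` is a hypothetical name, not a Nereids API.

```python
def extract_common_conjuncts(disjuncts):
    """Factor (A AND B) OR (A AND C) into A AND (B OR C) (hypothetical sketch).

    `disjuncts` is a non-empty list of conjunct collections: atoms inside one
    collection are ANDed, and the collections are ORed together.
    Returns (common, residual): the atoms shared by every branch, and the
    remaining per-branch conjuncts. An empty `residual` means the OR reduced
    to TRUE, so the whole predicate is just `common`.
    """
    if not disjuncts:
        raise ValueError("need at least one disjunct")
    branches = [frozenset(d) for d in disjuncts]
    common = frozenset.intersection(*branches)
    residual = [b - common for b in branches]
    if any(not r for r in residual):
        # A branch equal to `common` makes (common AND TRUE) the whole result.
        return common, []
    return common, residual


# The supported example from #15971.
pred = [{"k1 = i1.k1", "k2 = 2"}, {"k1 = i1.k1", "k2 = 1"}]
common, residual = extract_common_conjuncts(pred)
print(sorted(common))                 # ['k1 = i1.k1']
print([sorted(r) for r in residual])  # [['k2 = 2'], ['k2 = 1']]
```

For the unsupported example, the sketch finds no common conjunct, so the correlated predicates remain inside the OR.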


5755 Commits