doris

Author	SHA1	Message	Date
zy-kkk	5130a6c006	[improvement](jdbc catalog)Adjustment to JDBC External Table Configuration Based on Internal Table Settings (#25059 ) This pull request adjusts the `lower_case_table_names` parameter of jdbc catalogs based on the value of the corresponding parameter for internal tables. Changes: - For internal tables, if `lower_case_table_names` is set to 1 or 2, the jdbc catalog's parameter is forcefully set to `true`. - For internal tables, if `lower_case_table_names` is set to 0, the jdbc catalog's parameter can be either `true` or `false` with a default value of `false`. These adjustments ensure consistency and predictability when working with both internal and external table configurations in Doris.	2023-10-07 06:25:52 -05:00
wangqt	976335e236	[Fix](stream load) stream load record use valid txn info when two txn with same label #24320 Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>	2023-10-07 16:42:45 +08:00
minghong	1405f1efd2	[refactor](nereids) unify withSel/updateRowCountOnly/withRowCount (#24997 ) 1. refactor statistics functions withSel/updateRowCountOnly/withRowCount 2. do not use Double.MAX in stats estimation 3. dateLikeType.rangeLength() does not throw DateTimeException.	2023-10-07 16:22:30 +08:00
jakevin	3c9ff7af39	[feature](Nereids): push down topN through join (#24720 ) Push TopN through Join. The join type can only be left/right outer join or cross join, because the data of one of their children can't be filtered. The new TopN uses (original limit + original offset, 0) as limit and offset.	2023-10-07 14:58:53 +08:00
Petrichor	47694c5b36	[fix](jdbc catalog )fix jdbc catalog current_timestamp default (#25016 ) This problem occurs when you read table data from MariaDB where the default value of a datetime column is set to current_timestamp().	2023-10-07 01:43:03 -05:00
Long Zhao	0f6ea41220	[bug][auth]Show grant causes role errors in the memory. #24783 (#24841 )	2023-10-07 14:06:09 +08:00
Mingyu Chen	727fa2c0cd	[opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706 ) `ExternalFileTableValuedFunction` now has 3 derived classes: - LocalTableValuedFunction - HdfsTableValuedFunction - S3TableValuedFunction All these tvfs read data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem. So I refined the fields and methods of these classes. There are now 3 kinds of properties for these tvfs: 1. File format properties, such as `format` and `column_separator`. These are common to all of these tvfs, so they should be analyzed in the parent class `ExternalFileTableValuedFunction`. 2. URI or file path. The URI or file path property indicates the file location. The format of the URI differs between storage systems, so it should be analyzed in each derived class. 3. Other properties. All other properties that are specific to a certain tvf, so they should be analyzed in each derived class. There are 2 new classes: - `FileFormatConstants`: Defines some common property names or variables related to file format. - `FileFormatUtils`: Defines some util methods related to file format. After this PR, if we want to add common properties for all these tvfs, we only need to handle them in `ExternalFileTableValuedFunction`, which avoids forgetting to handle them in any one of the derived classes. ### Behavior change 1. Remove the `fs.defaultFS` property in `hdfs()`; it can be derived from `uri` 2. Use `\t` as the default column separator of csv format, same as stream load	2023-10-07 12:44:04 +08:00
Calvin Kirs	0e615a531e	[Feature](Job)Job tasks support the choice of persistence or storage in memory (#24919 )	2023-10-06 23:20:36 -05:00
wangbo	7b2ff38401	query cpu hard limit based on doris scheduler (#24844 )	2023-10-07 12:03:07 +08:00
morrySnow	70f5b0006f	[fix](Nereids) ctas throw npe when default value is null (#25009 )	2023-10-06 22:39:32 -05:00
AKIRA	ffad945dd1	[opt](optimizer) Recycle expired table stats #24777 Remove table stats when olap table is dropped	2023-10-07 11:31:45 +08:00
starocean999	f1e948e5f4	[fix](planner)the common type of date and decimal should be double (#24956 )	2023-10-07 11:27:19 +08:00
Mryange	0631ed61b0	[feature](profilev2) Preliminary support for profilev2. (#24881 ) You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1. set profile_level = 1; such as sql select count() from customer join item on c_customer_sk = i_item_sk profile Simple profile PLAN FRAGMENT 0 OUTPUT EXPRS: count() PARTITION: UNPARTITIONED VRESULT SINK MYSQL_PROTOCAL 7:VAGGREGATE (merge finalize) \| output: count(partial_count())[#44] \| group by: \| cardinality=1 \| TotalTime: avg 725.608us, max 725.608us, min 725.608us \| RowsReturned: 1 \| 6:VEXCHANGE offset: 0 TotalTime: avg 52.411us, max 52.411us, min 52.411us RowsReturned: 8 PLAN FRAGMENT 1 PARTITION: HASH_PARTITIONED: c_customer_sk STREAM DATA SINK EXCHANGE ID: 06 UNPARTITIONED TotalTime: avg 106.263us, max 118.38us, min 81.403us BlocksSent: 8 5:VAGGREGATE (update serialize) \| output: partial_count()[#43] \| group by: \| cardinality=1 \| TotalTime: avg 679.296us, max 739.395us, min 554.904us \| BuildTime: avg 33.198us, max 48.387us, min 28.880us \| ExecTime: avg 27.633us, max 40.278us, min 24.537us \| RowsReturned: 8 \| 4:VHASH JOIN \| join op: INNER JOIN(PARTITIONED)[] \| equal join conjunct: c_customer_sk = i_item_sk \| runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576) \| cardinality=17,740 \| vec output tuple id: 3 \| vIntermediate tuple ids: 2 \| hash output slot ids: 22 \| RowsReturned: 18.0K (18000) \| ProbeRows: 18.0K (18000) \| ProbeTime: avg 862.308us, max 1.576ms, min 666.28us \| BuildRows: 18.0K (18000) \| BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms \| \|----1:VEXCHANGE \| offset: 0 \| TotalTime: avg 48.822us, max 67.459us, min 30.380us \| RowsReturned: 18.0K (18000) \| 3:VEXCHANGE offset: 0 TotalTime: avg 33.162us, max 39.480us, min 28.854us RowsReturned: 18.0K (18000) PLAN FRAGMENT 2 PARTITION: HASH_PARTITIONED: c_customer_id STREAM DATA SINK EXCHANGE ID: 03 HASH_PARTITIONED: c_customer_sk 
TotalTime: avg 753.954us, max 1.210ms, min 499.470us BlocksSent: 64 2:VOlapScanNode TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON runtime filters: RF000[bloom] -> c_customer_sk partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ... cardinality=100000, avgRowSize=0.0, numNodes=1 pushAggOp=NONE TotalTime: avg 18.417us, max 41.319us, min 10.189us RowsReturned: 18.0K (18000) --------- Co-authored-by: yiguolei <676222867@qq.com>	2023-10-07 11:16:53 +08:00
walter	813c8f1e5a	[Improve](metric) Improve FE DorisMetricRegistry (#24773 ) The current implementation needs to iterate all metrics in a lock, which might cause latency spikes. This PR changes the underlying data structure to ConcurrentHashMap so that removing metrics doesn't need to block the entire registry.	2023-10-07 10:25:55 +08:00
morrySnow	42c52037fc	Revert "[Fix](Nereids) fix infer predicate lost cast of source expression (#23692 )" (#25008 ) This reverts commit be3618316f8411ad36d0a77f5b4405f2dbd128fa.	2023-10-06 21:18:29 -05:00
yujun	534d942933	[improvement](tablet clone) impr further repair tablet sched priority (#25046 )	2023-10-05 22:19:53 +08:00
yujun	c298b1ca1a	[fix](timezone) fix parse timezone when include GMT or time zone short ids (#25032 )	2023-10-03 20:53:16 +08:00
minghong	4c94820ff9	[opt](nereids) adjust column stats in filter estimation (#24973 ) TPCDS before query4 9335 8113 8070 8070 query13 3104 1386 1385 1385 query18 1704 1216 1151 1151 query48 840 840 839 839 query61 435 379 383 379 query71 715 570 579 570 query85 2822 2627 2612 2612 query88 1897 1816 1793 1793 Total cold run time: 20852 ms Total hot run time: 16799 ms after: query4 9610 8287 8249 8249 query13 1721 1013 1042 1013 query18 1585 1186 1155 1155 query48 789 777 778 777 query61 384 387 381 381 query71 713 610 584 584 query85 2020 1867 1843 1843 query88 1859 1812 1805 1805 Total cold run time: 18681 ms Total hot run time: 15807 ms	2023-09-28 21:34:17 +08:00
morrySnow	8eaf0d3a4b	[fix](Nereids) ctas varchar length should set to max except column from slot (#25003 )	2023-09-28 21:32:33 +08:00
starocean999	5dd70b8a25	[fix](planner) createColumnAndViewDefs method use wrong analyzer (#25005 )	2023-09-28 06:59:04 -05:00
JingDas	230b7bd15e	[test](nereids) Add some tests for PushFilterInsideJoin and FindHashConditionForJoin rule (#24550 )	2023-09-28 03:45:05 -05:00
morrySnow	bf4fb32487	[minor](catalog) remove useless compatibilityMatrix in catalog PrimitiveType (#24999 )	2023-09-28 02:57:59 -05:00
谢健	a574f29d76	[enhancement](Nereids): use enforcer to choose the n-th plan (#22929 )	2023-09-28 15:16:24 +08:00
morrySnow	b50c1448df	[fix](Nereids) should not replace slot by Alias when do NormalizeSlot (#24928 ) When we do NormalizeToSlot, we push down complex expressions and keep only their slots. To do this, we collect aliases and their children, compute the children in the bottom project, and keep the result slot in the current node. For example, Window(max(...), c1 as a1) after normalization becomes Window(max(...), a1) +-- Project(..., c1 as a1) But in some cases we remove a SlotReference by mistake. For example, Window(max(...), c1, c1 as a1) after normalization becomes Window(max(...), a1) +-- Project(..., c1 as a1) and we lose the SlotReference c1. This PR fixes this problem. After this PR, we get Window(max(...), c1, a1) +-- Project(..., c1, c1 as a1)	2023-09-28 14:51:08 +08:00
Calvin Kirs	377554ee1c	[Fix](Job)Job Task does not display error message (#24897 )	2023-09-28 14:47:12 +08:00
xzj7019	e863cfe5c7	[fix](nereids) fix multi window projection issue temporarily (#24912 ) Current multi-window plan generation has a problem with the project sequence, for example: +--LogicalWindow ( windowExpressions=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116, rank() WindowSpec(...) AS `rn`#117], ...) and the corresponding physical plan is: +--PhysicalWindow[6572]@16 ( windowFrameGroup=(Funcs=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116], ... ) +--PhysicalWindow[6568]@29 ( windowFrameGroup=(Funcs=[rank() WindowSpec(...) AS `rn`#117], ...] ) If the final plan is generated as following: MultiCastDataSinks STREAM DATA SINK EXCHANGE ID: 20 HASH_PARTITIONED: rn[#208], i_brand[#202], cc_name[#203], i_category[#201] Before we eventually resolve the multi-window issue, we add a projection as follows and force a mapping, but this will not cover all potential problems. MultiCastDataSinks STREAM DATA SINK EXCHANGE ID: 20 HASH_PARTITIONED: rn[#219], i_brand[#213], cc_name[#214], i_category[#212] PROJECTIONS: i_category[#184], i_brand[#185], cc_name[#186], d_year[#187], d_moy[#188], sum_sales[#189], avg_monthly_sales[#191], rn[#190] PROJECTION TUPLE: 20	2023-09-28 14:33:00 +08:00
Calvin Kirs	f5c38b29a5	[Improve](Load)Change the response label prefix of Update and Delete to the corresponding operations (#24996 ) In Doris, whether the operation is insert, delete, or update, the label prefix is insert, which may confuse users. Change: add update and delete label prefixes. Test: mysql> insert into t2 (id,id_str) values (2,'test2'); Query OK, 1 row affected (0.09 sec) {'label':'insert_b16405a387f14bfa_947dc9b2217ee3df', 'status':'VISIBLE', 'txnId':'17023'} mysql> insert into t1 (id,id_str) values (2,'test2'); Query OK, 1 row affected (0.09 sec) {'label':'insert_c3acdf63bf94e87_ad65a2dca88f5576', 'status':'VISIBLE', 'txnId':'17025'} mysql> update t2 set id_str='update2'; Query OK, 2 rows affected (5.27 sec) {'label':'update_903a88c8defe41d5_a7fca85159c84e50', 'status':'VISIBLE', 'txnId':'17026'} mysql> delete from t2 where id =2; Query OK, 0 rows affected (5.56 sec) {'label':'delete_1ca419aa-b7a2-41f6-9cbd-e14f4c7517f4', 'status':'VISIBLE', 'txnId':'17028'} mysql> delete from t1 where t1.id in (select id from t2); Query OK, 1 row affected (4.41 sec) {'label':'delete_7e2ae75fee9a42b7_9322d4ae8b80a28b', 'status':'VISIBLE', 'txnId':'17034'}	2023-09-28 14:32:35 +08:00
jakevin	bf808e9aa6	[fix](Nereids): tolerate DateLike overflow in SQL CAST/CONVERT (#24943 ) - for an explicit type cast, we need to tolerate overflow and convert the result to NULL - for an implicit type cast, throw an exception	2023-09-28 12:11:50 +08:00
Siyang Tang	188d9ab94e	[enhancement](statistics) collect table level loaded rows on BE to make RPC light weight (#24609 )	2023-09-28 10:51:50 +08:00
minghong	42207df89f	[refactor](nereids)update NormalizeRepeat comments (#24893 ) update NormalizeRepeat comments	2023-09-28 10:42:16 +08:00
starocean999	584646c054	[improvement](nereids)dphyper GraphSimplifier should consider missed edges when estimating join cost (#21747 )	2023-09-28 09:30:57 +08:00
airborne12	732f821c15	[Fix](inverted index) make parser mode coarse grained by default (#24949 )	2023-09-27 21:04:41 +08:00
Liqf	d4e823950a	[bug](json)Fix some problems of json function on Nereids (#24898 ) Fix some problems of json_length and json_contains function on Nereids fix wrong result of json_contains function Regression test jsonb_p0 to enable Nereids	2023-09-27 21:01:45 +08:00
meiyi	391a4e29eb	[fix](schema) Table column order is changed if add a column and do truncate (#24981 )	2023-09-27 20:59:11 +08:00
morrySnow	63b283a848	[fix](Nereids) init Date/DateV2Literal should check non-zero time fields (#24971 )	2023-09-27 20:48:36 +08:00
Gabriel	1fb9022d07	[pipelineX](bug) Fix meta scan operator (#24963 )	2023-09-27 20:34:47 +08:00
xzj7019	bb7f8d18a8	[fix](nereids) push down filter through partition topn (#24944 ) Support pushing down a filter through partition topn if the filter can pass through the window. Fix a CreatePartitionTopNFromWindow bug which may generate two partition topn nodes unexpectedly. case: select * from (select c2, row_number() over (partition by c2) as rn from t1) T where rn<=1 and c2 = 1; before this pr: \| PhysicalResultSink \| \| --PhysicalDistribute \| \| ----filter((rn <= 1)) \| \| ------PhysicalWindow \| \| --------PhysicalQuickSort \| \| ----------PhysicalDistribute \| \| ------------PhysicalPartitionTopN \| \| --------------filter((T.c2 = 1)) \| \| ----------------PhysicalPartitionTopN \| \| ------------------PhysicalProject \| \| --------------------PhysicalOlapScan[t1] \| +------------------------------------------+ after: \| PhysicalResultSink \| \| --PhysicalDistribute \| \| ----filter((rn <= 1)) \| \| ------PhysicalWindow \| \| --------PhysicalQuickSort \| \| ----------PhysicalDistribute \| \| ------------PhysicalPartitionTopN \| \| --------------PhysicalProject \| \| ----------------filter((T.c2 = 1)) \| \| ------------------PhysicalOlapScan[t1] \| +----------------------------------------+	2023-09-27 19:38:04 +08:00
morrySnow	00786a3295	[fix](Nereids) could not prune datev1 partition column (#24959 ) Because the storage engine cannot process date comparison predicates, we convert them to datetime comparison predicates. However, the partition pruner cannot process cast(slot) cp literal, so we convert back in the partition pruner to let it work well. TODO: move the date-to-datetime conversion to the translate stage and only convert predicates for the storage engine.	2023-09-27 18:41:56 +08:00
Mryange	5d138b6928	[remove](function) make execute_impl const and remove running_difference function (#24935 )	2023-09-27 18:17:28 +08:00
zy-kkk	100d76510c	[Fix](HttpServer) Refactor API Endpoints to Only Allow GET Requests for Enhanced Security (#24855 )	2023-09-27 17:10:11 +08:00
LiBinfeng	00e8d1c3b4	[Fix](Planner) disable bitmap type in compare expression (#24792 ) Problem: BE cores because of bitmap calculation. Reason: when a check fails on BE, it cores directly. Example: SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20; Solved: forbid this kind of expression in FE during analysis, and also forbid bitmap type comparison in other unsupported expressions.	2023-09-27 16:57:06 +08:00
bigben0204	0227292c85	[bug](profile) query profile api of fe can't get result if non-root user query on the other fe #24858 (#24914 ) Issue Number: #24858 If isAllNode is true, the api should only distribute the query to all fe and should not run checkAuthByUserAndQueryId. If isAllNode is false, the api queries the profile on the fe; in that case the api should run checkAuthByUserAndQueryId.	2023-09-27 16:50:41 +08:00
谢健	9562e280af	[enhancement](Nereids): remove stats derivation in CostAndEnforce job (#24945 ) 1. remove stats derivation in CostAndEnforce job 2. enforce valid for each stats after estimating	2023-09-27 16:31:03 +08:00
Xinyi Zou	87a30dc41d	[feature-wip](arrow-flight)(step3) Support authentication and user session (#24772 )	2023-09-27 14:53:58 +08:00
Ashin Gau	26818de9c8	[feature](jni) support complex types in jni framework (#24810 ) Support complex types in jni framework, and successfully run end-to-end on hudi. ### How to Use Other scanners only need to implement three interfaces in `ColumnValue`: ``` // Get array elements and append into values void unpackArray(List<ColumnValue> values); // Get map key array&value array, and append into keys&values void unpackMap(List<ColumnValue> keys, List<ColumnValue> values); // Get the struct fields specified by `structFieldIndex`, and append into values void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values); ``` Developers can take `HudiColumnValue` as an example.	2023-09-27 14:47:41 +08:00
xzj7019	a1ab8f96a1	[fix](nereids) mark two phase partition topn global to notice be passthrough logic (#24886 ) Mark the partition topn phase to notify be so that it handles the passthrough logic correctly; this pr is the fe part of the code. be side logic: if the phase equals PTopNPhase.TWO_PAHSE_GLOBAL, it should skip the bypass logic and do the second phase ptopn operation anyway.	2023-09-27 14:08:59 +08:00
Gabriel	1b0e3246ea	[pipelineX](fix) Fix exception reporting and Nereids plan (#24936 )	2023-09-27 13:15:40 +08:00
zzzzzzzs	452318a9fc	[Enhancement](streamload) stream tvf support user specified label (#24219 ) stream tvf support user specified label example: curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream return: { "TxnId": 2064, "Label": "label1", "Comment": "", "TwoPhaseCommit": "false", "Status": "Success", "Message": "OK", "NumberTotalRows": 2, "NumberLoadedRows": 2, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 27, "LoadTimeMs": 152, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 83, "ReadDataTimeMs": 92, "WriteDataTimeMs": 41, "CommitAndPublishTimeMs": 24 }	2023-09-27 12:09:35 +08:00
Pxl	18b5f70a7c	[Bug](materialized-view) enable rewrite on select materialized index with aggregate mode (#24691 ) enable rewrite on select materialized index with aggregate mode	2023-09-27 11:30:36 +08:00
minghong	a8f312794e	[feature](nereids)support stats estimation for is-null predicate (#24764 ) 1. condition order: filter/hashCondition/otherCondition, 2. update regression out 3. remove tpch_sf500 shape case(covered by tpch sf1000) 4. implement is-null stats estimation 5. update ssb shape	2023-09-27 10:04:35 +08:00
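
The limit/offset rewrite described in commit 3c9ff7af39 (push down topN through join) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nereids implementation: the class `TopNPushDownSketch`, the `JoinType` enum, and the `TopN` holder are hypothetical names introduced only for this example.

```java
public class TopNPushDownSketch {
    // Hypothetical join-type enum for illustration; not Doris's actual type.
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, CROSS }

    // Hypothetical holder for the limit and offset of a TopN node.
    static final class TopN {
        final long limit;
        final long offset;

        TopN(long limit, long offset) {
            this.limit = limit;
            this.offset = offset;
        }
    }

    // Per the commit message, only left/right outer joins and cross joins qualify.
    static boolean canPushThrough(JoinType type) {
        return type == JoinType.LEFT_OUTER
                || type == JoinType.RIGHT_OUTER
                || type == JoinType.CROSS;
    }

    // The pushed-down TopN keeps limit + offset rows and applies no offset,
    // so the original TopN above the join can still skip `offset` rows.
    static TopN pushedDown(TopN original) {
        return new TopN(original.limit + original.offset, 0);
    }

    public static void main(String[] args) {
        TopN pushed = pushedDown(new TopN(10, 5));
        System.out.println(pushed.limit + " " + pushed.offset); // prints "15 0"
    }
}
```

The original TopN stays above the join; the pushed-down copy only reduces the rows entering the join.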


8289 Commits
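
The rule stated in commit 5130a6c006 for the jdbc catalog's `lower_case_table_names` parameter can be sketched as a small decision function. This is a hedged sketch of the rule as the commit message words it, not the actual Doris code: the class name, method name, and the use of a nullable `Boolean` for "not specified by the user" are assumptions made for this example.

```java
public class LowerCaseTableNamesSketch {
    /**
     * Hypothetical helper: returns the effective jdbc catalog setting.
     *
     * @param internalSetting value of lower_case_table_names for internal tables (0, 1, or 2)
     * @param requested       value requested for the catalog, or null if unspecified
     */
    static boolean effectiveCatalogSetting(int internalSetting, Boolean requested) {
        if (internalSetting == 1 || internalSetting == 2) {
            // Forced to true regardless of what was requested.
            return true;
        }
        if (internalSetting == 0) {
            // Either value is allowed; the default is false.
            return requested != null && requested;
        }
        throw new IllegalArgumentException("unexpected lower_case_table_names: " + internalSetting);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCatalogSetting(1, false)); // prints "true"
        System.out.println(effectiveCatalogSetting(0, null));  // prints "false"
        System.out.println(effectiveCatalogSetting(0, true));  // prints "true"
    }
}
```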