Commit Graph

8289 Commits

5130a6c006 [improvement](jdbc catalog)Adjustment to JDBC External Table Configuration Based on Internal Table Settings (#25059)
This pull request adjusts the behavior of the `lower_case_table_names` parameter of JDBC catalogs based on the internal table's corresponding setting.

Changes:
- For internal tables, if `lower_case_table_names` is set to 1 or 2, the JDBC catalog's parameter is forcibly set to `true`.
- For internal tables, if `lower_case_table_names` is set to 0, the JDBC catalog's parameter can be either `true` or `false`, with a default value of `false`.

These adjustments ensure consistency and predictability when working with both internal and external table configurations in Doris.
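The two rules above amount to a small resolution function. The following is a minimal Java sketch, not the code from the PR: the class `JdbcLowerCaseResolver`, its `resolve` method, and the use of `null` for an unset catalog property are all hypothetical.

```java
// Hypothetical sketch: derive the JDBC catalog's effective lower_case_table_names
// flag from the internal lower_case_table_names setting (0, 1 or 2).
final class JdbcLowerCaseResolver {
    private JdbcLowerCaseResolver() {}

    /**
     * @param internalSetting internal lower_case_table_names value (0, 1 or 2)
     * @param catalogProperty value set on the JDBC catalog, or null if unset
     * @return the effective value for the JDBC catalog
     */
    static boolean resolve(int internalSetting, Boolean catalogProperty) {
        switch (internalSetting) {
            case 1:
            case 2:
                // Internal names are case-insensitive: the catalog flag is forced on.
                return true;
            case 0:
                // Case-sensitive internal names: honor the user's choice, default false.
                return catalogProperty != null && catalogProperty;
            default:
                throw new IllegalArgumentException(
                        "lower_case_table_names must be 0, 1 or 2, got " + internalSetting);
        }
    }
}
```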
2023-10-07 06:25:52 -05:00
976335e236 [Fix](stream load) stream load record uses valid txn info when two txns have the same label #24320
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>
2023-10-07 16:42:45 +08:00
1405f1efd2 [refactor](nereids) unify withSel/updateRowCountOnly/withRowCount (#24997)
1. Refactor the statistics functions withSel/updateRowCountOnly/withRowCount.
2. Do not use Double.MAX in stats estimation.
3. dateLikeType.rangeLength() does not throw DateTimeException.
2023-10-07 16:22:30 +08:00
3c9ff7af39 [feature](Nereids): push down topN through join (#24720)
Push TopN through Join.

The join type can only be a left/right outer join or a cross join, because these joins never filter out the rows of one of their children.

The new TopN uses (original limit + original offset) as its limit and 0 as its offset.
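A minimal Java sketch of the join-type check and the limit/offset rewrite described above. `JoinType`, `TopNSpec`, and `PushTopNThroughJoinSketch` are hypothetical simplifications of the Nereids plan classes; the sketch assumes the original TopN stays above the join and only computes the limit and offset of the new TopN.

```java
import java.util.Optional;

// Hypothetical, simplified join types; only some of them qualify for the push-down.
enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, CROSS, LEFT_SEMI }

// Hypothetical limit/offset pair describing a TopN node.
final class TopNSpec {
    final long limit;
    final long offset;

    TopNSpec(long limit, long offset) {
        this.limit = limit;
        this.offset = offset;
    }
}

final class PushTopNThroughJoinSketch {
    private PushTopNThroughJoinSketch() {}

    // Outer joins keep every row of their preserved child and cross joins keep
    // every row of both children, so a TopN below them cannot drop needed rows.
    static boolean canPushThrough(JoinType type) {
        return type == JoinType.LEFT_OUTER
                || type == JoinType.RIGHT_OUTER
                || type == JoinType.CROSS;
    }

    // The new TopN below the join keeps (limit + offset) rows with offset 0.
    static Optional<TopNSpec> pushedTopN(JoinType type, TopNSpec original) {
        if (!canPushThrough(type)) {
            return Optional.empty();
        }
        return Optional.of(new TopNSpec(original.limit + original.offset, 0));
    }
}
```

For example, `LIMIT 10 OFFSET 5` above a left outer join yields a new TopN with limit 15 and offset 0 below it.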
2023-10-07 14:58:53 +08:00
47694c5b36 [fix](jdbc catalog )fix jdbc catalog current_timestamp default (#25016)
This problem occurs when reading table data from MariaDB where the default value of a datetime column is set to current_timestamp().
2023-10-07 01:43:03 -05:00
0f6ea41220 [bug][auth]Show grant causes role errors in the memory. #24783 (#24841) 2023-10-07 14:06:09 +08:00
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All these TVFs read data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem.

So I refined the fields and methods of these classes.
There are now 3 kinds of properties for these TVFs:

1. File format properties

	File format properties, such as `format` and `column_separator`, are common to all these TVFs.
	So these properties should be analyzed in the parent class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicates the file location. The URI format differs between storage types.
	So it should be analyzed in each derived class.
	
3. Other properties

	All other properties that are specific to a particular TVF.
	So they should be analyzed in each derived class.
	
There are 2 new classes:

- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.

After this PR, if we want to add common properties for all these TVFs, we only need to handle them in
`ExternalFileTableValuedFunction`, which avoids forgetting to handle them in one of the derived classes.

### Behavior change

1. Remove the `fs.defaultFS` property from `hdfs()`; it can be derived from `uri`.
2. Use `\t` as the default column separator of csv format, same as stream load
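The split between common and storage-specific analysis is essentially a template-method design. The sketch below is a heavily simplified, hypothetical model (the classes `FileTvfSketch` and `HdfsTvfSketch` and the method `analyzeLocationAndOthers` are not the real Doris classes): the parent class consumes the file-format properties and applies the `\t` default, and the HDFS subclass handles `uri` and derives the file system address from it instead of reading `fs.defaultFS`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parent class: analyzes the properties shared by all file TVFs.
abstract class FileTvfSketch {
    // csv default separator, same as stream load.
    static final String DEFAULT_COLUMN_SEPARATOR = "\t";

    protected String format;
    protected String columnSeparator;

    // Template method: common file-format properties are handled once here,
    // then the subclass handles the location and its own special properties.
    final void analyze(Map<String, String> props) {
        Map<String, String> rest = new HashMap<>(props);
        format = rest.remove("format");
        String sep = rest.remove("column_separator");
        columnSeparator = (sep == null) ? DEFAULT_COLUMN_SEPARATOR : sep;
        analyzeLocationAndOthers(rest);
    }

    protected abstract void analyzeLocationAndOthers(Map<String, String> rest);
}

// Hypothetical HDFS subclass: only the location logic lives here.
final class HdfsTvfSketch extends FileTvfSketch {
    String uri;

    @Override
    protected void analyzeLocationAndOthers(Map<String, String> rest) {
        uri = rest.remove("uri");
        if (uri == null) {
            throw new IllegalArgumentException("hdfs(): 'uri' is required");
        }
        // Any remaining HDFS-specific properties would be analyzed here.
    }

    // The file system address comes from the uri, e.g. hdfs://nn:8020.
    String defaultFs() {
        URI u = URI.create(uri);
        return u.getScheme() + "://" + u.getAuthority();
    }
}
```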
2023-10-07 12:44:04 +08:00
0e615a531e [Feature](Job)Job tasks support the choice of persistence or storage in memory (#24919) 2023-10-06 23:20:36 -05:00
7b2ff38401 query cpu hard limit based on doris scheduler (#24844) 2023-10-07 12:03:07 +08:00
70f5b0006f [fix](Nereids) ctas throw npe when default value is null (#25009) 2023-10-06 22:39:32 -05:00
ffad945dd1 [opt](optimizer) Recycle expired table stats #24777
Remove table stats when olap table is dropped
2023-10-07 11:31:45 +08:00
f1e948e5f4 [fix](planner)the common type of date and decimal should be double (#24956) 2023-10-07 11:27:19 +08:00
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using `ADD_COUNTER_WITH_LEVEL`/`ADD_TIMER_WITH_LEVEL`. The profile can then merge counters with level 1, enabled with:

    set profile_level = 1;

For example, the query

    select count(*) from customer join item on c_customer_sk = i_item_sk

produces the following profile:

Simple  profile  
  
  PLAN  FRAGMENT  0
    OUTPUT  EXPRS:
        count(*)
    PARTITION:  UNPARTITIONED

    VRESULT  SINK
          MYSQL_PROTOCAL


    7:VAGGREGATE  (merge  finalize)
    |    output:  count(partial_count(*))[#44]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  725.608us,  max  725.608us,  min  725.608us
    |    RowsReturned:  1
    |    
    6:VEXCHANGE
          offset:  0
          TotalTime:  avg  52.411us,  max  52.411us,  min  52.411us
          RowsReturned:  8

PLAN  FRAGMENT  1

    PARTITION:  HASH_PARTITIONED:  c_customer_sk

    STREAM  DATA  SINK
        EXCHANGE  ID:  06
        UNPARTITIONED

        TotalTime:  avg  106.263us,  max  118.38us,  min  81.403us
        BlocksSent:  8

    5:VAGGREGATE  (update  serialize)
    |    output:  partial_count(*)[#43]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  679.296us,  max  739.395us,  min  554.904us
    |    BuildTime:  avg  33.198us,  max  48.387us,  min  28.880us
    |    ExecTime:  avg  27.633us,  max  40.278us,  min  24.537us
    |    RowsReturned:  8
    |    
    4:VHASH  JOIN
    |    join  op:  INNER  JOIN(PARTITIONED)[]
    |    equal  join  conjunct:  c_customer_sk  =  i_item_sk
    |    runtime  filters:  RF000[bloom]  <-  i_item_sk(18000/16384/1048576)
    |    cardinality=17,740
    |    vec  output  tuple  id:  3
    |    vIntermediate  tuple  ids:  2  
    |    hash  output  slot  ids:  22  
    |    RowsReturned:  18.0K  (18000)
    |    ProbeRows:  18.0K  (18000)
    |    ProbeTime:  avg  862.308us,  max  1.576ms,  min  666.28us
    |    BuildRows:  18.0K  (18000)
    |    BuildTime:  avg  3.8ms,  max  3.860ms,  min  2.317ms
    |    
    |----1:VEXCHANGE
    |              offset:  0
    |              TotalTime:  avg  48.822us,  max  67.459us,  min  30.380us
    |              RowsReturned:  18.0K  (18000)
    |        
    3:VEXCHANGE
          offset:  0
          TotalTime:  avg  33.162us,  max  39.480us,  min  28.854us
          RowsReturned:  18.0K  (18000)

PLAN  FRAGMENT  2

    PARTITION:  HASH_PARTITIONED:  c_customer_id

    STREAM  DATA  SINK
        EXCHANGE  ID:  03
        HASH_PARTITIONED:  c_customer_sk

        TotalTime:  avg  753.954us,  max  1.210ms,  min  499.470us
        BlocksSent:  64

    2:VOlapScanNode
          TABLE:  default_cluster:tpcds.customer(customer),  PREAGGREGATION:  ON
          runtime  filters:  RF000[bloom]  ->  c_customer_sk
          partitions=1/1,  tablets=12/12,  tabletList=1550745,1550747,1550749  ...
          cardinality=100000,  avgRowSize=0.0,  numNodes=1
          pushAggOp=NONE
          TotalTime:  avg  18.417us,  max  41.319us,  min  10.189us
          RowsReturned:  18.0K  (18000)
---------
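The avg/max/min figures in the profile above are the result of merging one counter across several instances. A minimal sketch of that merge, assuming the per-instance values are already collected; `MergedCounter` is a hypothetical name, and the sketch does not model counter levels or unit formatting.

```java
import java.util.List;

// Hypothetical sketch: merge one counter reported by several instances into
// the "avg X, max Y, min Z" form used by the merged profile.
final class MergedCounter {
    final double avg;
    final double max;
    final double min;

    private MergedCounter(double avg, double max, double min) {
        this.avg = avg;
        this.max = max;
        this.min = min;
    }

    static MergedCounter merge(List<Double> perInstanceValues) {
        if (perInstanceValues.isEmpty()) {
            throw new IllegalArgumentException("no counter values to merge");
        }
        double sum = 0;
        double max = Double.NEGATIVE_INFINITY;
        double min = Double.POSITIVE_INFINITY;
        for (double v : perInstanceValues) {
            sum += v;
            max = Math.max(max, v);
            min = Math.min(min, v);
        }
        return new MergedCounter(sum / perInstanceValues.size(), max, min);
    }
}
```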

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
813c8f1e5a [Improve](metric) Improve FE DorisMetricRegistry (#24773)
The current implementation needs to iterate all metrics in a lock,
which might cause latency spikes. This PR changes the underlying
data structure to ConcurrentHashMap so that removing metrics doesn't
need to block the entire registry.
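A minimal sketch of a registry backed by `ConcurrentHashMap`. `MetricRegistrySketch` and the metric names in the usage are hypothetical, not the actual `DorisMetricRegistry`; the sketch only illustrates that registering and removing metrics work per key, without a registry-wide lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical registry keyed by metric name.
final class MetricRegistrySketch<M> {
    private final ConcurrentHashMap<String, M> metrics = new ConcurrentHashMap<>();

    // putIfAbsent is atomic per key; no registry-wide lock is taken.
    M register(String name, M metric) {
        M existing = metrics.putIfAbsent(name, metric);
        return existing == null ? metric : existing;
    }

    // Removing one metric locks at most that key's bin, not the whole map.
    boolean remove(String name) {
        return metrics.remove(name) != null;
    }

    // Bulk removal walks the weakly consistent key-set iterator, so concurrent
    // readers and writers are not blocked for the duration of the scan.
    void removeIf(Predicate<String> namePredicate) {
        metrics.keySet().removeIf(namePredicate);
    }

    int size() {
        return metrics.size();
    }
}
```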
2023-10-07 10:25:55 +08:00
42c52037fc Revert "[Fix](Nereids) fix infer predicate lost cast of source expression (#23692)" (#25008)
This reverts commit be3618316f8411ad36d0a77f5b4405f2dbd128fa.
2023-10-06 21:18:29 -05:00
534d942933 [improvement](tablet clone) impr further repair tablet sched priority (#25046) 2023-10-05 22:19:53 +08:00
c298b1ca1a [fix](timezone) fix parse timezone when include GMT or time zone short ids (#25032) 2023-10-03 20:53:16 +08:00
4c94820ff9 [opt](nereids) adjust column stats in filter estimation (#24973)
TPC-DS before (ms per query: cold run, hot run 1, hot run 2, best hot run):
query4  9335    8113    8070    8070
query13 3104    1386    1385    1385
query18 1704    1216    1151    1151
query48 840     840     839     839
query61 435     379     383     379
query71 715     570     579     570
query85 2822    2627    2612    2612
query88 1897    1816    1793    1793
Total cold run time: 20852 ms
Total hot run time: 16799 ms

After:
query4  9610    8287    8249    8249
query13 1721    1013    1042    1013
query18 1585    1186    1155    1155
query48 789     777     778     777
query61 384     387     381     381
query71 713     610     584     584
query85 2020    1867    1843    1843
query88 1859    1812    1805    1805
Total cold run time: 18681 ms
Total hot run time: 15807 ms
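The totals above can be recomputed from the per-query rows. This sketch assumes, as the numbers suggest, that the cold total sums the first column and the hot total sums the faster of the two hot runs; `TpcdsTotals` is a hypothetical helper.

```java
// Hypothetical helper: recompute "Total cold/hot run time" from the rows above.
// Each row holds: cold run, hot run 1, hot run 2 (ms).
final class TpcdsTotals {
    static final long[][] BEFORE = {
        {9335, 8113, 8070}, {3104, 1386, 1385}, {1704, 1216, 1151}, {840, 840, 839},
        {435, 379, 383}, {715, 570, 579}, {2822, 2627, 2612}, {1897, 1816, 1793},
    };
    static final long[][] AFTER = {
        {9610, 8287, 8249}, {1721, 1013, 1042}, {1585, 1186, 1155}, {789, 777, 778},
        {384, 387, 381}, {713, 610, 584}, {2020, 1867, 1843}, {1859, 1812, 1805},
    };

    // Sum of the cold-run column.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) sum += row[0];
        return sum;
    }

    // Sum of the best (minimum) hot run per query.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] row : rows) sum += Math.min(row[1], row[2]);
        return sum;
    }
}
```

With these assumptions the sketch reproduces all four totals: 20852/16799 ms before and 18681/15807 ms after.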
2023-09-28 21:34:17 +08:00
8eaf0d3a4b [fix](Nereids) ctas varchar length should set to max except column from slot (#25003) 2023-09-28 21:32:33 +08:00
5dd70b8a25 [fix](planner) createColumnAndViewDefs method use wrong analyzer (#25005) 2023-09-28 06:59:04 -05:00
230b7bd15e [test](nereids) Add some tests for PushFilterInsideJoin and FindHashConditionForJoin rule (#24550) 2023-09-28 03:45:05 -05:00
bf4fb32487 [minor](catalog) remove useless compatibilityMatrix in catalog PrimitiveType (#24999) 2023-09-28 02:57:59 -05:00
a574f29d76 [enhancement](Nereids): use enforcer to choose the n-th plan (#22929) 2023-09-28 15:16:24 +08:00
b50c1448df [fix](Nereids) should not replace slot by Alias when do NormalizeSlot (#24928)
When we do NormalizeToSlot, we push complex expressions down and keep only
their slots. To do this, we collect aliases and their children, compute the
children in a bottom project, and keep the result slots in the current
node. For example:

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

However, in some cases we remove a SlotReference by mistake, for example:

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lose the SlotReference c1. This PR fixes this problem. After this PR,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
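The fixed behavior can be modeled with a toy, string-based version of the rewrite. `NormalizeToSlotSketch` and its `Item` type are hypothetical and ignore the window function itself; the point is that a bare slot such as `c1` must survive next to an alias of the same slot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, string-based model of NormalizeToSlot for the Window example.
final class NormalizeToSlotSketch {
    // An output item is either a bare slot (alias == null) or "slot as alias".
    static final class Item {
        final String slot;
        final String alias;

        Item(String slot, String alias) {
            this.slot = slot;
            this.alias = alias;
        }

        String render() {
            return alias == null ? slot : slot + " as " + alias;
        }
    }

    // Bottom project: every bare slot and every alias, each computed once.
    static List<String> bottomProject(List<Item> items) {
        Set<String> out = new LinkedHashSet<>();
        for (Item item : items) out.add(item.render());
        return new ArrayList<>(out);
    }

    // Upper node: a bare slot stays itself, an alias is replaced by its output slot.
    // The buggy output dropped the bare c1 and kept only a1.
    static List<String> upperOutputs(List<Item> items) {
        List<String> out = new ArrayList<>();
        for (Item item : items) out.add(item.alias == null ? item.slot : item.alias);
        return out;
    }
}
```

For the outputs `c1, c1 as a1`, the sketch produces the bottom project `c1, c1 as a1` and the upper outputs `c1, a1`, matching the plan after this PR.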
2023-09-28 14:51:08 +08:00
377554ee1c [Fix](Job)Job Task does not display error message (#24897) 2023-09-28 14:47:12 +08:00
e863cfe5c7 [fix](nereids) fix multi window projection issue temporarily (#24912)
The current multi-window plan generation has a problem with the projection order, for example:

+--LogicalWindow ( windowExpressions=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116, rank() WindowSpec(...) AS `rn`#117], ...)
and the corresponding physical plan is:

+--PhysicalWindow[6572]@16 ( windowFrameGroup=(Funcs=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116], ... )
    +--PhysicalWindow[6568]@29 ( windowFrameGroup=(Funcs=[rank() WindowSpec(...) AS `rn`#117], ...] )
Suppose the final plan is generated as follows:

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#208], i_brand[#202], cc_name[#203], i_category[#201]
Until the multi-window issue is eventually resolved, we add a projection as follows to force the mapping, but this will not cover all potential problems.

MultiCastDataSinks
STREAM DATA SINK
  EXCHANGE ID: 20
  HASH_PARTITIONED: rn[#219], i_brand[#213], cc_name[#214], i_category[#212]
  PROJECTIONS: i_category[#184], i_brand[#185], cc_name[#186], d_year[#187], d_moy[#188], sum_sales[#189], avg_monthly_sales[#191], rn[#190]
  PROJECTION TUPLE: 20
2023-09-28 14:33:00 +08:00
f5c38b29a5 [Improve](Load)Change the response label prefix of Update and Delete to the corresponding operations (#24996)
Previously, Doris used the label prefix `insert` for inserts, deletes, and updates alike, which could confuse users.

Change
Add `update` and `delete` label prefixes for UPDATE and DELETE statements.

Test

mysql>  insert into t2 (id,id_str) values (2,'test2');
Query OK, 1 row affected (0.09 sec)
{'label':'insert_b16405a387f14bfa_947dc9b2217ee3df', 'status':'VISIBLE', 'txnId':'17023'}

mysql>  insert into t1 (id,id_str) values (2,'test2');
Query OK, 1 row affected (0.09 sec)
{'label':'insert_c3acdf63bf94e87_ad65a2dca88f5576', 'status':'VISIBLE', 'txnId':'17025'}

mysql> update t2 set id_str='update2';
Query OK, 2 rows affected (5.27 sec)
{'label':'update_903a88c8defe41d5_a7fca85159c84e50', 'status':'VISIBLE', 'txnId':'17026'}


mysql> delete from  t2  where id =2;
Query OK, 0 rows affected (5.56 sec)
{'label':'delete_1ca419aa-b7a2-41f6-9cbd-e14f4c7517f4', 'status':'VISIBLE', 'txnId':'17028'}

mysql> delete from t1 where t1.id in (select id from t2);
Query OK, 1 row affected (4.41 sec)
{'label':'delete_7e2ae75fee9a42b7_9322d4ae8b80a28b', 'status':'VISIBLE', 'txnId':'17034'}
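The prefix selection can be sketched as a mapping from statement type to label prefix. `DmlKind` and `LabelSketch` are hypothetical; the hex suffix mimics the shape of most labels in the test output above, although one DELETE label uses a UUID-style suffix, so the real ID format evidently depends on the code path.

```java
// Hypothetical mapping from DML statement type to label prefix.
enum DmlKind {
    INSERT("insert"), UPDATE("update"), DELETE("delete");

    final String labelPrefix;

    DmlKind(String labelPrefix) {
        this.labelPrefix = labelPrefix;
    }
}

final class LabelSketch {
    private LabelSketch() {}

    // Builds "<prefix>_<hi>_<lo>", with both halves of a 128-bit id as unsigned hex.
    static String label(DmlKind kind, long hi, long lo) {
        return kind.labelPrefix + "_" + Long.toHexString(hi) + "_" + Long.toHexString(lo);
    }
}
```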
2023-09-28 14:32:35 +08:00
bf808e9aa6 [fix](Nereids): tolerate DateLike overflow in SQL CAST/CONVERT (#24943)
- Explicit type cast: tolerate the overflow and convert the result to NULL.
- Implicit type cast: throw an exception.
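A minimal sketch of the two cast modes using `java.time`. `DateCastSketch` is hypothetical; it assumes a DATE range of years 0000 to 9999 and treats an invalid calendar date (such as February 30) the same way as an out-of-range year, so Doris' actual overflow rules may differ.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

// Hypothetical sketch: explicit CAST/CONVERT turns an overflowing date into NULL,
// while an implicit cast reports the error.
final class DateCastSketch {
    // Assumed DATE range: years 0000 to 9999.
    static final int MIN_YEAR = 0;
    static final int MAX_YEAR = 9999;

    private DateCastSketch() {}

    /** Returns the date, or null (SQL NULL) for an overflowing explicit cast. */
    static LocalDate castToDate(int year, int month, int day, boolean explicit) {
        LocalDate value;
        try {
            value = LocalDate.of(year, month, day);
        } catch (DateTimeException e) {
            value = null; // invalid calendar date, e.g. February 30
        }
        if (value != null && (value.getYear() < MIN_YEAR || value.getYear() > MAX_YEAR)) {
            value = null; // outside the assumed DATE range
        }
        if (value == null && !explicit) {
            throw new DateTimeException(
                    String.format("cannot cast %04d-%02d-%02d to DATE", year, month, day));
        }
        return value;
    }
}
```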
2023-09-28 12:11:50 +08:00
188d9ab94e [enhancement](statistics) collect table level loaded rows on BE to make RPC light weight (#24609) 2023-09-28 10:51:50 +08:00
42207df89f [refactor](nereids)update NormalizeRepeat comments (#24893)
update NormalizeRepeat comments
2023-09-28 10:42:16 +08:00
584646c054 [improvement](nereids)dphyper GraphSimplifier should consider missed edges when estimating join cost (#21747) 2023-09-28 09:30:57 +08:00
732f821c15 [Fix](inverted index) make parser mode coarse grained by default (#24949) 2023-09-27 21:04:41 +08:00
d4e823950a [bug](json)Fix some problems of json function on Nereids (#24898)
Fix some problems with the json_length and json_contains functions on Nereids
Fix the wrong result of the json_contains function
Enable Nereids in the jsonb_p0 regression test
2023-09-27 21:01:45 +08:00
391a4e29eb [fix](schema) Table column order is changed if add a column and do truncate (#24981) 2023-09-27 20:59:11 +08:00
63b283a848 [fix](Nereids) init Date/DateV2Literal should check non-zero time fields (#24971) 2023-09-27 20:48:36 +08:00
1fb9022d07 [pipelineX](bug) Fix meta scan operator (#24963) 2023-09-27 20:34:47 +08:00
bb7f8d18a8 [fix](nereids) push down filter through partition topn (#24944)
Support pushing a filter down through a partition TopN if the filter can pass through the window.
Fix a CreatePartitionTopNFromWindow bug that could unexpectedly generate two partition TopN nodes.
Case:
select * from (select c2, row_number() over (partition by c2) as rn from t1) T where rn<=1 and c2 = 1;
Before this PR:
| PhysicalResultSink                       |
| --PhysicalDistribute                     |
| ----filter((rn <= 1))                    |
| ------PhysicalWindow                     |
| --------PhysicalQuickSort                |
| ----------PhysicalDistribute             |
| ------------PhysicalPartitionTopN        |
| --------------filter((T.c2 = 1))         |
| ----------------PhysicalPartitionTopN    |
| ------------------PhysicalProject        |
| --------------------PhysicalOlapScan[t1] |
+------------------------------------------+
After:

| PhysicalResultSink                     |
| --PhysicalDistribute                   |
| ----filter((rn <= 1))                  |
| ------PhysicalWindow                   |
| --------PhysicalQuickSort              |
| ----------PhysicalDistribute           |
| ------------PhysicalPartitionTopN      |
| --------------PhysicalProject          |
| ----------------filter((T.c2 = 1))     |
| ------------------PhysicalOlapScan[t1] |
+----------------------------------------+
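The condition "the filter can pass through the window" can be sketched as a column check: a filter that only references PARTITION BY keys removes whole partitions, so it does not change the window results of the partitions that remain. `FilterPushdownSketch` is a hypothetical helper; in the example, `T.c2 = 1` qualifies, while `rn <= 1` references the window output and must stay above.

```java
import java.util.Set;

// Hypothetical check: may a filter move below a window / partition TopN?
final class FilterPushdownSketch {
    private FilterPushdownSketch() {}

    // True if every column the filter references is a PARTITION BY key.
    static boolean canPassThroughWindow(Set<String> filterColumns, Set<String> partitionKeys) {
        return partitionKeys.containsAll(filterColumns);
    }
}
```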
2023-09-27 19:38:04 +08:00
00786a3295 [fix](Nereids) could not prune datev1 partition column (#24959)
Because the storage engine could not process date comparison predicates,
we convert them to datetime comparison predicates.
However, the partition pruner could not process comparisons of the form `cast(slot) <op> literal`,
so we convert them back in the partition pruner to make pruning work.

TODO:
Move the date-to-datetime conversion into the translate stage
and only convert the predicates sent to the storage engine.
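Converting back is only exact for some literals. For a datetime literal at exactly midnight, `cast(d AS DATETIME) op literal` is equivalent to `d op date(literal)` for `=`, `<`, `<=`, `>` and `>=`. The hypothetical helper below performs that check and leaves any other literal unconverted; it is a sketch, not the pruner's actual logic.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Optional;

// Hypothetical sketch: recover a DATE literal from the DATETIME literal of
// cast(dateSlot AS DATETIME) <op> literal, so a date-only pruner can use it.
final class DateComparisonSketch {
    private DateComparisonSketch() {}

    // Only a literal at exactly midnight converts without changing the result.
    static Optional<LocalDate> toDateLiteral(LocalDateTime literal) {
        if (!literal.toLocalTime().equals(LocalTime.MIDNIGHT)) {
            return Optional.empty(); // leave the predicate as-is
        }
        return Optional.of(literal.toLocalDate());
    }
}
```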
2023-09-27 18:41:56 +08:00
5d138b6928 [remove](function) make execute_impl const and remove running_difference function (#24935) 2023-09-27 18:17:28 +08:00
100d76510c [Fix](HttpServer) Refactor API Endpoints to Only Allow GET Requests for Enhanced Security (#24855) 2023-09-27 17:10:11 +08:00
00e8d1c3b4 [Fix](Planner) disable bitmap type in compare expression (#24792)
Problem:
The BE crashed (core dump) during a bitmap calculation.

Reason:
When a check on the BE failed, the BE crashed directly.

Example:
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;

Solved:
Forbid this kind of expression in the FE during analysis, and also forbid comparing bitmap values in other unsupported expressions.
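A minimal sketch of an analysis-time check that rejects BITMAP operands in comparisons, so the error surfaces in the FE instead of crashing the BE. `ColType`, `CompareTypeCheckSketch`, and the error message are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified column types.
enum ColType { INT, VARCHAR, BITMAP, NULL_TYPE }

final class CompareTypeCheckSketch {
    private static final Set<ColType> NOT_COMPARABLE = EnumSet.of(ColType.BITMAP);

    private CompareTypeCheckSketch() {}

    // Throws during analysis if any operand of a comparison / IN has a non-comparable type.
    static void checkComparable(String exprSql, ColType... operandTypes) {
        for (ColType type : operandTypes) {
            if (NOT_COMPARABLE.contains(type)) {
                throw new IllegalArgumentException(
                        type + " type cannot be used in comparison expression: " + exprSql);
            }
        }
    }
}
```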
2023-09-27 16:57:06 +08:00
0227292c85 [bug](profile) query profile api of fe can't get result if non-root user query on the other fe #24858 (#24914)
Issue Number: #24858

If `isAllNode` is true, the API should only forward the query to all FEs and must not run `checkAuthByUserAndQueryId`.
If `isAllNode` is false, the API queries the profile on this FE, and in this case it should run `checkAuthByUserAndQueryId`.
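The branching can be sketched as follows. `ProfileBackend` and `ProfileApiSketch` are hypothetical placeholders, not the real FE REST handler classes; only `isAllNode` and `checkAuthByUserAndQueryId` come from the description above.

```java
// Hypothetical operations the profile API depends on.
interface ProfileBackend {
    String forwardToAllFrontends(String queryId);

    void checkAuthByUserAndQueryId(String user, String queryId);

    String localProfile(String queryId);
}

final class ProfileApiSketch {
    private ProfileApiSketch() {}

    static String getProfile(ProfileBackend backend, String user, String queryId, boolean isAllNode) {
        if (isAllNode) {
            // Only fan the request out to all FEs; the auth check is skipped on this path.
            return backend.forwardToAllFrontends(queryId);
        }
        // Local lookup: check that this user may see this query's profile.
        backend.checkAuthByUserAndQueryId(user, queryId);
        return backend.localProfile(queryId);
    }
}
```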
2023-09-27 16:50:41 +08:00
9562e280af [enhancement](Nereids): remove stats derivation in CostAndEnforce job (#24945)
1. remove stats derivation in CostAndEnforce job
2. ensure each statistic is valid after estimation
2023-09-27 16:31:03 +08:00
87a30dc41d [feature-wip](arrow-flight)(step3) Support authentication and user session (#24772) 2023-09-27 14:53:58 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in the JNI framework; this has been run successfully end-to-end on Hudi.
### How to Use
Other scanners only need to implement three methods of the `ColumnValue` interface:
```java
import java.util.List;

// Only the three complex-type methods are shown; the other ColumnValue methods are omitted.
public interface ColumnValue {
    // Get the array elements and append them to values
    void unpackArray(List<ColumnValue> values);

    // Get the map's key array and value array, and append them to keys and values
    void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

    // Get the struct fields specified by `structFieldIndex`, and append them to values
    void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
}
```
Developers can take `HudiColumnValue` as an example.
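To show what implementing these methods involves, the sketch below wraps plain Java objects (a `List` for ARRAY, a `Map` for MAP, a `List` of field values for STRUCT). It uses a hypothetical, cut-down interface `MiniColumnValue` and class `ObjectColumnValue` rather than the real `ColumnValue`, which has many more methods.

```java
import java.util.List;
import java.util.Map;

// Hypothetical cut-down interface with only the three complex-type methods.
interface MiniColumnValue {
    Object value();

    void unpackArray(List<MiniColumnValue> values);

    void unpackMap(List<MiniColumnValue> keys, List<MiniColumnValue> values);

    void unpackStruct(List<Integer> structFieldIndex, List<MiniColumnValue> values);
}

// Wraps plain Java objects: List for ARRAY, Map for MAP, List of fields for STRUCT.
final class ObjectColumnValue implements MiniColumnValue {
    private final Object value;

    ObjectColumnValue(Object value) {
        this.value = value;
    }

    @Override
    public Object value() {
        return value;
    }

    @Override
    public void unpackArray(List<MiniColumnValue> values) {
        for (Object element : (List<?>) value) {
            values.add(new ObjectColumnValue(element));
        }
    }

    @Override
    public void unpackMap(List<MiniColumnValue> keys, List<MiniColumnValue> values) {
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
            keys.add(new ObjectColumnValue(entry.getKey()));
            values.add(new ObjectColumnValue(entry.getValue()));
        }
    }

    @Override
    public void unpackStruct(List<Integer> structFieldIndex, List<MiniColumnValue> values) {
        List<?> fields = (List<?>) value;
        for (int idx : structFieldIndex) {
            values.add(new ObjectColumnValue(fields.get(idx)));
        }
    }
}
```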
2023-09-27 14:47:41 +08:00
a1ab8f96a1 [fix](nereids) mark two phase partition topn global to notice be passthrough logic (#24886)
Mark the partition TopN phase so that the BE can handle the passthrough logic correctly. This PR contains the FE part.
BE-side logic: if the phase equals `PTopNPhase.TWO_PHASE_GLOBAL`, the BE should skip the bypass logic and always perform the second-phase partition TopN operation.
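The BE-side rule reduces to one decision. The sketch is in Java for consistency even though the BE is C++; the enum `PTopNPhaseSketch`, its values, the class `PartitionTopNSketch`, and the `bypassWanted` flag (standing in for whatever heuristic normally enables passthrough) are hypothetical.

```java
// Hypothetical phases of a partition TopN.
enum PTopNPhaseSketch { ONE_PHASE_GLOBAL, TWO_PHASE_LOCAL, TWO_PHASE_GLOBAL }

final class PartitionTopNSketch {
    private PartitionTopNSketch() {}

    // Returns true if rows may pass through without running the partition TopN.
    static boolean shouldPassThrough(PTopNPhaseSketch phase, boolean bypassWanted) {
        if (phase == PTopNPhaseSketch.TWO_PHASE_GLOBAL) {
            return false; // the global second phase must always run the partition TopN
        }
        return bypassWanted;
    }
}
```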
2023-09-27 14:08:59 +08:00
1b0e3246ea [pipelineX](fix) Fix exception reporting and Nereids plan (#24936) 2023-09-27 13:15:40 +08:00
452318a9fc [Enhancement](streamload) stream tvf support user specified label (#24219)
The stream TVF now supports a user-specified label.
Example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream
Response:

{
    "TxnId": 2064,
    "Label": "label1",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 27,
    "LoadTimeMs": 152,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 83,
    "ReadDataTimeMs": 92,
    "WriteDataTimeMs": 41,
    "CommitAndPublishTimeMs": 24
}
2023-09-27 12:09:35 +08:00
Pxl
18b5f70a7c [Bug](materialized-view) enable rewrite on select materialized index with aggregate mode (#24691)
enable rewrite on select materialized index with aggregate mode
2023-09-27 11:30:36 +08:00
a8f312794e [feature](nereids)support stats estimation for is-null predicate (#24764)
1. condition order: filter/hashCondition/otherCondition,
2. update regression out
3. remove tpch_sf500 shape case(covered by tpch sf1000)
4. implement is-null stats estimation
5. update ssb shape
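A common way to estimate IS NULL and IS NOT NULL is to read the null count from the column statistics. This minimal sketch assumes that approach (it is not necessarily the formula used in this PR), and `IsNullEstimateSketch` is hypothetical.

```java
// Hypothetical sketch of IS NULL / IS NOT NULL row-count estimation.
final class IsNullEstimateSketch {
    private IsNullEstimateSketch() {}

    static double estimateRows(double rowCount, double numNulls, boolean isNotNull) {
        // Clamp inconsistent statistics into [0, rowCount].
        double nulls = Math.max(0, Math.min(numNulls, rowCount));
        return isNotNull ? rowCount - nulls : nulls;
    }
}
```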
2023-09-27 10:04:35 +08:00