Commit Graph

2204 Commits

Author SHA1 Message Date
5734e2bd30 [opt](meta-cache) refine the meta cache (#33449) (#33754)
backport of #33449
2024-04-17 23:42:13 +08:00
2890f6c3cf [opt](Nereids) date literal support basic format with timezone (#33662) 2024-04-17 23:42:13 +08:00
ca728a2405 [feature](proc)Add table's indexes info in show proc interface (#33438)
1. Add show proc `/dbs/db_id/table_id/indexes` impl
2. Remove index_id in `show index from table`
3. Add test cases

---------

Co-authored-by: Luennng <luennng@gmail.com>
2024-04-17 23:42:13 +08:00
8e38549a92 [fix](nereids) Use correct PREAGGREGATION in agg(filter(scan)) (#33454)
1. set `PreAggStatus` to `ON` when aggregating a key column by max or min;
2. #28747 may change the scan's `PreAggStatus`; inherit it from the previous one.
2024-04-17 23:42:13 +08:00
d18f5e2544 [refactor](refresh-catalog) refactor the refresh catalog code (#33653)
To unify the code.
Previously, catalog refresh was done in `CatalogMgr`, while database
and table refresh was done in `RefreshMgr`, which was confusing.

This PR moves all `refresh`-related code from CatalogMgr to RefreshMgr.

No logic is changed in this PR.
2024-04-17 23:42:12 +08:00
466b9f35d5 [fix](nereids)EliminateGroupBy should keep the output's datatype same as old ones (#33585) 2024-04-17 23:42:12 +08:00
7659b1aa67 [opt](Nereids) prefer slot type to support delete task better (#33559) 2024-04-17 23:42:12 +08:00
e53a76d75b [fix](planner) fix bug of InlineViewRef's tableNameToSql method (#33575) 2024-04-17 23:42:12 +08:00
b2face0d20 [feature](Nereids): date literal support time zone (#33534)
Support literals such as:
```
'2022-05-01 01:02:55+02:30'
'2022-05-01 01:02:55Asia/Shanghai'
```
2024-04-17 23:42:12 +08:00
2cd4012541 [opt](scan) read scan ranges in the order of partitions (#33515) (#33657)
backport: #33515
2024-04-17 23:42:12 +08:00
1be753ed75 [enhancement](mysql compatible) add user and procs_priv tables to mysql db in all catalogs (#33058)
Issue Number: close #xxx

This PR aims to enhance the compatibility of BI tools (such as DBeaver and DataGrip) that use the MySQL connector to connect to Doris, because some BI tools query tables in the mysql database. In our tests, the user and procs_priv tables were the ones mainly queried. This PR adds these two tables and fills the user table with actual data. However, note that most of the fields in the user table are in Doris' own format rather than MySQL's format, so this only ensures that no error is reported when a BI tool queries these tables; it does not guarantee that the data is displayed completely, and the tables under Doris' mysql database do not support data modification.
Thanks to @liujiwen-up for assisting in testing
2024-04-17 23:42:12 +08:00
b2b385a4ff [improve](fold) support complex type for constant folding (#32867) 2024-04-17 23:41:59 +08:00
92d28e497b [refactor](Nereids): compute unique and uniform property respectively (#32908) 2024-04-17 23:41:59 +08:00
e26a53d8a6 [fix](nereids) SemiJoinSemiJoinTransposeProject rule didn't handle mark join correctly (#33401) 2024-04-12 15:09:25 +08:00
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused code for descriptors
2024-04-12 15:09:25 +08:00
ef64d7a011 [feature](profile) add transaction statistics for profile (#33488)
1. commit total time
2. fs operator total time
   - rename file count
   - rename dir count
   - delete dir count
3. add partition total time
   - add partition count
4. update partition total time
   - update partition count
Example output:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```
2024-04-12 15:06:16 +08:00
d31bca199f [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)
support partition by:

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```
2024-04-12 10:45:16 +08:00
18fb8407ae [feature](insert)use optional location and add hive regression test (#33153) 2024-04-12 10:38:54 +08:00
31a7060dbd [testcase](hive)add exception test for hive txn (#33278)
Issue #31442
#32726

1. add LocalDfsFileSystem to manipulate local files.
2. add HMSCachedClientTest to simulate HMS services.
3. add test for rollback commit.
2024-04-12 10:38:48 +08:00
e11db3f050 [feature](hive)support ExternalTransaction for writing external tables (#32726)
Issue #31442

Add `TransactionManager` and `Transaction`. 

```
public interface Transaction {
    void commit() throws UserException;
    void rollback();
}
public interface TransactionManager {
    long begin();
    void commit(long id) throws UserException;
    void rollback(long id);
    Transaction getTransaction(long id);
}
```
`TransactionManager` is used to manage all external transactions:
The application layer should drive the entire transaction through this `TransactionManager`, for example:
```
long id = transactionManager.begin();
transactionManager.commit(id);   // or transactionManager.rollback(id) on failure
```

`Transaction` is an interface; it can be implemented for a specific engine, such as the `HMSTransaction` implemented here, or an Iceberg transaction that may be implemented in the future.
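To make the flow concrete, below is a minimal, self-contained sketch (not Doris code; `SimpleTransactionManager`, `TxnDemo`, and the in-memory bookkeeping are illustrative) of how an application layer might drive such a manager with begin/commit/rollback:
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-ins; in Doris these would be the real Transaction/TransactionManager
// interfaces above and the FE's UserException.
class UserException extends Exception {
    UserException(String msg) { super(msg); }
}

interface Transaction {
    void commit() throws UserException;
    void rollback();
}

// A toy in-memory manager: begin() hands out an id, commit()/rollback() look the
// transaction up by id and delegate to it, mirroring the interface in the commit message.
class SimpleTransactionManager {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, Transaction> active = new HashMap<>();

    long begin() {
        long id = nextId.getAndIncrement();
        active.put(id, new Transaction() {
            @Override public void commit() throws UserException {
                System.out.println("commit txn " + id);   // e.g. move staged files, update HMS
            }
            @Override public void rollback() {
                System.out.println("rollback txn " + id); // e.g. delete staged files
            }
        });
        return id;
    }

    void commit(long id) throws UserException {
        Transaction txn = active.remove(id);
        if (txn == null) {
            throw new UserException("unknown transaction id " + id);
        }
        txn.commit();
    }

    void rollback(long id) {
        Transaction txn = active.remove(id);
        if (txn != null) {
            txn.rollback();
        }
    }
}

public class TxnDemo {
    public static void main(String[] args) {
        SimpleTransactionManager manager = new SimpleTransactionManager();
        long id = manager.begin();
        try {
            // ... write data files for the external table here, then:
            manager.commit(id);
        } catch (UserException e) {
            manager.rollback(id);   // undo any partially written data
        }
    }
}
```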
2024-04-12 10:38:12 +08:00
f0ac21e231 [feature](external) process tbl/db exist when create/drop db/tbl (#33119)
Issue Number: #31442
2024-04-12 10:36:43 +08:00
7a05396cd1 [feature](multi-catalog)support catalog name when create/drop db (#33116)
Issue Number: #31442
2024-04-12 10:36:18 +08:00
01b21da82d [feature](insert)add hive insert plan ut and remove redundant fields (#33051)
add hive insert sink plan UT case
remove some deprecated code
2024-04-12 10:30:08 +08:00
07f296734a [regression](insert)add hive DDL and CTAS regression case (#32924)
Issue Number: #31442

dependent on #32824

- add DDL (create and drop) test
- add CTAS test
- add complex type test

TODO:
- bucketed table test
- truncate test
- add/drop partition test
2024-04-12 10:24:23 +08:00
716c146750 [fix](insert)fix hive external return msgs and exception and pass all columns to BE (#32824)
2024-04-12 10:23:52 +08:00
3343322965 [fix](insert)fix conversion of doris type to hive type (#32735)
#31442

When creating a table, fix the Doris-to-Hive type conversion: use primitiveType to check the Doris type.
2024-04-12 10:01:30 +08:00
c68b353017 [feature][insert]add FE UT and support CTAS for external table (#32525)
1. add FE ut for create hive table
2. support external CTAS:

> source table:
```
mysql> show create table hive.jz3.test;

CREATE TABLE `test`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710837792',
  'file_format'='orc')
```


> create unpartitioned target table
```
mysql> create table hive.jz3.ctas engine=hive as select * from hive.jz3.test;
mysql> show create table ctas;

CREATE TABLE `ctas`(
  `id` int COMMENT '',
  `name` string COMMENT '',
  `dt` string COMMENT '',
  `dtm` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710860377')

```


> create partitioned target table
```
mysql> create table hive.jz3.ctas1 engine=hive partition by list (dt,dtm) () as select * from hive.jz3.test;
mysql> show create table hive.jz3.ctas1;

CREATE TABLE `ctas1`(
  `id` int COMMENT '',
  `name` string COMMENT '')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas1'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710919070')
```
2024-04-12 09:58:49 +08:00
36a1bf1d73 [feature][insert]Adapt the create table statement to the Nereids SQL (#32458)
issue: #31442

1. adapt the create table statement from Doris to Hive
2. fix insert overwrite for table sink

> The doris create hive table statement:

```
mysql> CREATE TABLE buck2(
    ->     id int COMMENT 'col1',
    ->     name string COMMENT 'col2',
    ->     dt string COMMENT 'part1',
    ->     dtm string COMMENT 'part2'
    -> ) ENGINE=hive
    -> COMMENT "create tbl"
    -> PARTITION BY LIST (dt, dtm) ()
    -> DISTRIBUTED BY HASH (id) BUCKETS 16
    -> PROPERTIES(
    ->     "file_format" = "orc"
    -> );
```

> generated Hive create table statement:

```
CREATE TABLE `buck2`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
CLUSTERED BY (
  id)
INTO 16 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710840747',
  'doris.file_format'='orc')

```
2024-04-12 09:57:37 +08:00
dc8da9ee89 [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog (#33528)
* [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-04-11 21:43:01 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
045dd05f2a [fix](Nereids): don't transpose agg and join if join is mark join (#33312) 2024-04-10 16:23:20 +08:00
16f8afc408 [refactor](coordinator) split profile logic and instance report logic (#32010)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-10 15:51:32 +08:00
5e59c09a60 [Fix](nereids) modify the binding aggregate function in order by (#32758)
Modify the binding logic so that ORDER BY has the same behavior as MySQL when the sort child is an aggregate.
When an ORDER BY expression contains an aggregate function, all slots in that expression should first be bound to the LogicalAggregate's non-AggFunction outputs, and only then to the LogicalAggregate's child.
e.g.
select 2*abs(sum(c1)) as c1, c1, sum(c1)+c1 from t_order_by_bind_priority group by c1 order by sum(c1)+c1 asc;
In this SQL, both occurrences of c1 in the ORDER BY are bound to the c1 of t_order_by_bind_priority.
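As a rough illustration of that binding priority (purely hypothetical names, not the Nereids binder), a slot referenced in such an ORDER BY expression could be resolved like this:
```java
import java.util.List;
import java.util.Optional;

final class OrderByBindingSketch {
    // Minimal stand-in for a slot produced by a plan node.
    static final class Slot {
        final String name;
        Slot(String name) { this.name = name; }
    }

    // Resolve a slot name referenced inside an ORDER BY expression that contains an
    // aggregate function: try the LogicalAggregate's non-aggregate-function outputs
    // (e.g. group-by keys) first, and only fall back to the aggregate's child outputs.
    static Optional<Slot> bindOrderBySlot(String name,
                                          List<Slot> aggNonAggFuncOutputs,
                                          List<Slot> aggChildOutputs) {
        Optional<Slot> bound = aggNonAggFuncOutputs.stream()
                .filter(s -> s.name.equals(name))
                .findFirst();
        if (bound.isPresent()) {
            return bound;
        }
        return aggChildOutputs.stream()
                .filter(s -> s.name.equals(name))
                .findFirst();
    }
}
```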
2024-04-10 15:26:09 +08:00
38d580dfb7 [fix](Nereids) fix link children failed (#33134)
#32617 introduced a bug: the rewrite may not work when a plan's arity >= 3.
This PR fixes it.

(cherry picked from commit 8b070d1a9d43aa7d25225a79da81573c384ee825)
2024-04-10 14:59:45 +08:00
ff990eb869 [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)
This PR improves the performance of the Nereids planner in the plan stage.

1. refactor the expression rewriter to pattern match, so that the many expression rewrite rules can be applied criss-crossed in one big bottom-up iteration, rewriting until the expression becomes stable (a sketch of this fixed-point loop follows the list below). We can now handle more cases, because originally there was no loop and sometimes only the top expression was processed, as in `SimplifyArithmeticRule`.
2. replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid useless method calls
3. unroll loops in some code, like `Expression.<init>`, `PlanTreeRewriteBottomUpJob.pushChildrenJobs`
4. use type/arity-specific code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, `PartitionRangeExpander.enumerableCount()`
5. refactor `ExtractCommonFactorRule`: we can now extract more cases, and fix the dead loop when `ExtractCommonFactorRule` and `SimplifyRange` are used in one iteration, because `SimplifyRange` generates a right-deep tree but `ExtractCommonFactorRule` generates a left-deep tree
6. refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes; in ExpressionNormalization, pattern match can be applied criss-crossed with other rules; in PartitionPruner, the visitor can evaluate expressions faster
7. lazily compute and cache some operations
8. use an int field to compare dates
9. use a BitSet to find disableNereidsRules
10. a two-level loop is usually faster than building a Multimap when binding slots in a Scope, so revert that code
11. `PlanTreeRewriteBottomUpJob` doesn't need to clearStatePhase any more
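A minimal, self-contained sketch of that fixed-point loop (illustrative only; `Expr`, `RewriteRule`, and `Rewriter` below are simplified stand-ins, not the Nereids classes):
```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Simplified expression tree and rule types, standing in for the Nereids ones.
// Structural equals() on Expr implementations is assumed.
interface Expr {
    List<Expr> children();
    Expr withChildren(List<Expr> newChildren);
}

interface RewriteRule extends UnaryOperator<Expr> {}  // returns the input unchanged if it doesn't match

final class Rewriter {
    // Apply all rules once, bottom-up: children first, then the current node.
    static Expr rewriteBottomUp(Expr expr, List<RewriteRule> rules) {
        List<Expr> newChildren = expr.children().stream()
                .map(c -> rewriteBottomUp(c, rules))
                .collect(Collectors.toList());
        Expr current = newChildren.equals(expr.children()) ? expr : expr.withChildren(newChildren);
        for (RewriteRule rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }

    // Repeat the bottom-up pass until the expression stops changing (fixed point),
    // so rules can enable each other across iterations.
    static Expr rewriteUntilStable(Expr expr, List<RewriteRule> rules, int maxIterations) {
        for (int i = 0; i < maxIterations; i++) {
            Expr next = rewriteBottomUp(expr, rules);
            if (next.equals(expr)) {
                return next;
            }
            expr = next;
        }
        return expr;  // safety valve against rules that ping-pong forever
    }
}
```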

### test case
100 threads continuously send the following SQL in parallel against an empty table; tested on my Mac (M2 chip, 8 cores) with the SQL cache enabled:
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

before this pr: 3100 peak QPS, about 2700 avg QPS
after this pr: 4800 peak QPS, about 4400 avg QPS

(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
2024-04-10 14:59:45 +08:00
a7c8abe58c [feature](nereids) support common sub expression by multi-layer projections (fe part) (#33087)
* cse fe part
2024-04-10 14:53:56 +08:00
dcddd88e01 Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470) 2024-04-10 11:34:29 +08:00
0499d4013e Support identical column name in different index. (#32792) 2024-04-10 11:34:29 +08:00
407f8642da [Enhancement](data skew) extends show data skew (#32732) 2024-04-10 11:34:29 +08:00
e980cd3e7f [feature](Nereids): add ColumnPruningPostProcessor. (#32800) 2024-04-10 11:34:29 +08:00
26e86d53a4 [enhance](mtmv)support olap table partition column is null (#32698) 2024-04-10 11:34:29 +08:00
22a7fc3c55 [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)
Support getting the tables inside a materialized view when collecting tables in a plan

Table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect tables from the plan below, we get [table1, table3, table2]; mv1 is expanded to its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
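A rough sketch of that expansion idea (purely illustrative; `Table`, `MaterializedView`, and `collectTables` below are hypothetical stand-ins, not the Doris collector):
```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a plan references tables; a materialized view additionally
// carries the tables of its defining query.
interface Table {
    String name();
}

interface MaterializedView extends Table {
    List<Table> baseTables();   // tables referenced by the MV's defining query
}

final class TableCollectorSketch {
    // Collect every table referenced by the given tables, expanding materialized
    // views into their base tables (matching the [table1, table3, table2] result
    // in the example above, where mv1 itself is replaced by its base tables).
    static Set<Table> collectTables(List<Table> referenced) {
        Set<Table> result = new LinkedHashSet<>();
        for (Table t : referenced) {
            if (t instanceof MaterializedView) {
                result.addAll(collectTables(((MaterializedView) t).baseTables()));
            } else {
                result.add(t);
            }
        }
        return result;
    }
}
```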
2024-04-10 11:34:29 +08:00
dcfdbf0629 [chore](show) support statement to show views from table (#32358)
MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)
2024-04-10 11:34:28 +08:00
217514e5dd [minor](test) Add Iceberg hadoop catalog FE unit test (#32449)
For easy testing the behavior of Iceberg's HadoopCatalog.listNamespaces()
2024-04-10 11:34:28 +08:00
e574b35833 [Enhancement](partition) Refine some auto partition behaviours (#32737) (#33412)
1. fix legacy planner grammar
2. fix nereids planner parsing
3. fix cases
4. forbid auto range partition with null column
5. fix CreateTableStmt with auto partition and some partition items
1 and 2 are about #31585
doc pr: apache/doris-website#488
2024-04-09 15:51:02 +08:00
fae55e0e46 [Feature](information_schema) add processlist table for information_schema db (#32511) 2024-04-07 23:24:22 +08:00
b882704eaf [fix](Export) Set the default value of the data_consistence property of export to partition (#32830) 2024-04-07 23:24:22 +08:00
d9d950d98e [fix](iceberg) fix iceberg predicate conversion bug (#33283)
Follow-up of #32923

Some cases were not covered in #32923
2024-04-07 22:12:38 +08:00
190763e301 [bugfix](iceberg)Convert the datetime type in the predicate according to the target column (#32923)
Convert the datetime type in the predicate according to the target column.
And add a testcase for #32194
related #30478 #30162
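As a hedged illustration of the idea (not the Doris/Iceberg conversion code; `TargetType` and `toTargetValue` are made up), the predicate's datetime literal is converted into whatever representation the target column's type expects, e.g. epoch days for a date column and epoch microseconds for a timestamp column:
```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

final class DatetimePredicateValueSketch {
    // Hypothetical target column types for the sketch.
    enum TargetType { DATE, TIMESTAMP }

    // Convert a datetime literal from the predicate into the value a predicate on the
    // target column needs: epoch days for DATE, epoch microseconds for TIMESTAMP.
    // Time-zone handling is simplified here (UTC assumed).
    static long toTargetValue(LocalDateTime literal, TargetType target) {
        switch (target) {
            case DATE:
                return literal.toLocalDate().toEpochDay();
            case TIMESTAMP:
                return ChronoUnit.MICROS.between(Instant.EPOCH, literal.toInstant(ZoneOffset.UTC));
            default:
                throw new IllegalArgumentException("unsupported target type: " + target);
        }
    }

    public static void main(String[] args) {
        LocalDateTime literal = LocalDateTime.of(2024, 3, 4, 0, 0, 0);
        System.out.println(toTargetValue(literal, TargetType.DATE));       // days since 1970-01-01
        System.out.println(toTargetValue(literal, TargetType.TIMESTAMP));  // microseconds since epoch
    }
}
```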
2024-04-07 22:12:33 +08:00
32d6a4fdd5 [opt](rowcount) refresh external table's rowcount async (#32997)
In the previous implementation, the row count cache expired after 10 minutes (by default),
and after expiration, the next row count request would miss the cache, causing an unstable query plan.

In this PR, the cache is refreshed after Config.external_cache_expire_time_minutes_after_access,
so that the cache entry remains fresh.
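A minimal sketch of the difference, assuming a Caffeine-style loading cache (the `loadRowCountFromRemote` placeholder below is made up): with plain expiration, a request after expiry blocks on a cold miss, whereas refresh-after-write reloads the entry in the background on a stale access and keeps serving the previous value meanwhile.
```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

final class RowCountCacheSketch {
    // Made-up placeholder for the real row-count fetch from the external catalog.
    private static long loadRowCountFromRemote(String tableKey) {
        return 42L;
    }

    // Before: entries simply expire, so the next request after expiry misses the cache
    // and the planner sees an unknown row count until the reload finishes.
    static LoadingCache<String, Long> expiringCache(long minutes) {
        return Caffeine.newBuilder()
                .expireAfterAccess(minutes, TimeUnit.MINUTES)
                .build(RowCountCacheSketch::loadRowCountFromRemote);
    }

    // After: a read past the refresh interval triggers an asynchronous reload and
    // still returns the cached (slightly stale) value, so the entry stays warm.
    static LoadingCache<String, Long> refreshingCache(long minutes) {
        return Caffeine.newBuilder()
                .refreshAfterWrite(minutes, TimeUnit.MINUTES)
                .build(RowCountCacheSketch::loadRowCountFromRemote);
    }
}
```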
2024-04-07 22:11:14 +08:00