Commit Graph

18429 Commits

Author SHA1 Message Date
a4924dabb7 [enhancement](exception) enable exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enable exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
Pxl
ab4f8fafcd [Bug](materialized-view) forbid create mv with value column before key column (#33436)
forbid create mv with value column before key column
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
d4a67d93f3 [improve](routine-load) timely pause job if Kafka cluster exception when consume (#33372) 2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
6f96e2b64a [fix](plsql) Fix handling of select statements that FE can execute without BE (#33363)
CREATE OR REPLACE PROCEDURE procedure_test1()
BEGIN
select 1;
END; 

call procedure_test1()

fix `ERROR 2027 (HY000): Malformed packet`
2024-04-12 15:09:25 +08:00
215f402df7 [fix](nereids)when clause cannot be regarded as common sub expression (#33358)
* when clause cannot be regarded as common sub expression
2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
ef64d7a011 [feature](profile) add transaction statistics for profile (#33488)
1. commit total time
2. fs operator total time
     rename file count
     rename dir count
     delete dir count
3. add partition total time
    add partition count
4. update partition total time
    update partition count
like:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```
2024-04-12 15:06:16 +08:00
ee36b2f70d [branch-2.1](opt)(profile) parallel serialize fragment and add detail schedule profile #33376 #33379 2024-04-12 13:15:56 +08:00
e841d82ffb [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)
Issue Number:  #31442

Move the table sink exchange rebalancer params to node level and tune them so that writers are better balanced, which improves write performance.

rebalancer params:
```
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```
2024-04-12 13:09:56 +08:00
d31bca199f [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)
Support `partition by`:

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```
2024-04-12 10:45:16 +08:00
f0463a9034 [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names (#33245)
Issue Number: #31442

- Add hive-writer runtime profiles.
- Change output file names to `${query_id}${uuid}-${index}.${compression}.${format}`, e.g. `"d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet"`. For the same partition writer, when the file size exceeds `hive_sink_max_file_size`, the file currently being written is closed and a new file is created; `${index}` is incremented in the new file name while the rest of the name stays the same.
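
The rollover behaviour described above can be sketched as follows. This is only an illustration, not the hive-writer code: `RollingFileNamer` and all of its members are hypothetical names, and the `_` between the query id and the uuid is an assumption taken from the example file name (the pattern itself does not show it).

```java
// Hypothetical sketch: none of these names come from the Doris code base.
class RollingFileNamer {
    private final String queryId;
    private final String uuid;
    private final String compression;
    private final String format;
    private final long maxFileSize; // stands in for hive_sink_max_file_size
    private int index = 0;
    private long bytesInCurrentFile = 0;

    RollingFileNamer(String queryId, String uuid, String compression,
                     String format, long maxFileSize) {
        this.queryId = queryId;
        this.uuid = uuid;
        this.compression = compression;
        this.format = format;
        this.maxFileSize = maxFileSize;
    }

    // Name of the file that is currently being written.
    String currentFileName() {
        // Assumption: "_" separates query id and uuid, as in the example name.
        return queryId + "_" + uuid + "-" + index + "." + compression + "." + format;
    }

    // Records written bytes and returns the name of the file they went to.
    // Once the current file exceeds the limit, later writes go to a new file
    // whose name differs only in the incremented index.
    String write(long bytes) {
        String name = currentFileName();
        bytesInCurrentFile += bytes;
        if (bytesInCurrentFile > maxFileSize) {
            index++;
            bytesInCurrentFile = 0;
        }
        return name;
    }
}
```

With a limit of 100 bytes, two writes of 60 bytes both land in the `-0` file, and the next write goes to the `-1` file.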
2024-04-12 10:43:16 +08:00
18fb8407ae [feature](insert)use optional location and add hive regression test (#33153) 2024-04-12 10:38:54 +08:00
31a7060dbd [testcase](hive)add exception test for hive txn (#33278)
Issue #31442
#32726

1. add LocalDfsFileSystem to manipulate local files.
2. add HMSCachedClientTest to simulate HMS services.
3. add test for rollback commit.
2024-04-12 10:38:48 +08:00
e11db3f050 [feature](hive)support ExternalTransaction for writing external table (#32726)
Issue #31442

Add `TransactionManager` and `Transaction`. 

```
public interface Transaction {
    void commit() throws UserException;
    void rollback();
}
public interface TransactionManager {
    long begin();
    void commit(long id) throws UserException;
    void rollback(long id);
    Transaction getTransaction(long id);
}
```
`TransactionManager` manages all external transactions. The application layer should drive the entire transaction through this `TransactionManager`, for example:
```
long id = transactionManager.begin();
transactionManager.commit(id);
// or, if the write failed:
transactionManager.rollback(id);
```

`Transaction` is an interface to be implemented per external data source: `HMSTransaction` is currently implemented, and others, such as one for Iceberg, may be added in the future.
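
As a rough illustration of how the two interfaces fit together, here is a minimal in-memory sketch. It is not the implementation from this PR: `InMemoryTransactionManager` and `RecordingTransaction` are hypothetical, and `UserException` is redeclared as a stand-in for the Doris class so that the example compiles on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for Doris's UserException so the sketch is self-contained.
class UserException extends Exception {
    UserException(String msg) { super(msg); }
}

interface Transaction {
    void commit() throws UserException;
    void rollback();
}

interface TransactionManager {
    long begin();
    void commit(long id) throws UserException;
    void rollback(long id);
    Transaction getTransaction(long id);
}

// Hypothetical transaction that only records what happened to it.
class RecordingTransaction implements Transaction {
    String state = "OPEN";
    @Override public void commit() { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}

// Hypothetical manager: hands out ids and tracks open transactions in a map.
class InMemoryTransactionManager implements TransactionManager {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, Transaction> open = new ConcurrentHashMap<>();

    @Override public long begin() {
        long id = nextId.getAndIncrement();
        open.put(id, new RecordingTransaction());
        return id;
    }

    @Override public void commit(long id) throws UserException {
        Transaction txn = open.remove(id);
        if (txn == null) {
            throw new UserException("unknown transaction " + id);
        }
        txn.commit();
    }

    @Override public void rollback(long id) {
        // Rolling back an unknown or already finished transaction is a no-op here.
        Transaction txn = open.remove(id);
        if (txn != null) {
            txn.rollback();
        }
    }

    @Override public Transaction getTransaction(long id) {
        return open.get(id);
    }
}
```

In this sketch a finished transaction is removed from the map, so `getTransaction` only returns transactions that are still open; whether the real manager behaves this way is not stated in the commit message.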
2024-04-12 10:38:12 +08:00
f0ac21e231 [feature](external) process tbl/db exist when create/drop db/tbl (#33119)
Issue Number: #31442
2024-04-12 10:36:43 +08:00
7a05396cd1 [feature](multi-catalog)support catalog name when create/drop db (#33116)
Issue Number: #31442
2024-04-12 10:36:18 +08:00
01b21da82d [feature](insert)add hive insert plan ut and remove redundant fields (#33051)
add hive insert sink plan UT case
remove some deprecated code
2024-04-12 10:30:08 +08:00
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
07f296734a [regression](insert)add hive DDL and CTAS regression case (#32924)
Issue Number: #31442

depends on #32824

add ddl(create and drop) test
add ctas test
add complex type test
TODO:
bucketed table test
truncate test
add/drop partition test
2024-04-12 10:24:23 +08:00
716c146750 [fix](insert)fix hive external return msgs and exception and pass all columns to BE (#32824)
2024-04-12 10:23:52 +08:00
f3a6132214 [chore] Format regression-conf.groovy (#32713) 2024-04-12 10:21:47 +08:00
9ada38327b [feature](txn insert) txn insert support insert into select (#31666) 2024-04-12 10:11:22 +08:00
bd364897d4 [feature](hive/iceberg)add doris's version in table properties (#32774)
issue #31442
When creating an external table, Doris's version can be added to the table's properties.
2024-04-12 10:02:31 +08:00
b98d225183 [fix](insert)fix hive table sink type coercion and unify coercion (#32762)
Issue Number: #31442
2024-04-12 10:02:09 +08:00
3343322965 [fix](insert)fix conversion of doris type to hive type (#32735)
#31442

Fix the Doris-to-Hive type conversion for create table; use primitiveType to check the Doris type.
2024-04-12 10:01:30 +08:00
70489fe749 [fix](insert)fix hive table sink write path (#32587)
issue: #31442

fix hive table sink write path to hdfs://${hdfs_root}/tmp/.doris_staging/${user}
2024-04-12 10:00:48 +08:00
c68b353017 [feature][insert]add FE UT and support CTAS for external table (#32525)
1. add FE ut for create hive table
2. support external CTAS:

> source table:
```
mysql> show create table hive.jz3.test;

CREATE TABLE `test`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710837792',
  'file_format'='orc')
```


> create unpartitioned target table
```
mysql> create table hive.jz3.ctas engine=hive as select * from hive.jz3.test;
mysql> show create table ctas;

CREATE TABLE `ctas`(
  `id` int COMMENT '',
  `name` string COMMENT '',
  `dt` string COMMENT '',
  `dtm` string COMMENT '')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710860377')

```


> create partitioned target table
```
mysql> create table hive.jz3.ctas1 engine=hive partition by list (dt,dtm) () as select * from hive.jz3.test;
mysql> show create table hive.jz3.ctas1;

CREATE TABLE `ctas1`(
  `id` int COMMENT '',
  `name` string COMMENT '')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/ctas1'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710919070')
```
2024-04-12 09:58:49 +08:00
36a1bf1d73 [feature][insert]Adapt the create table statement to the nereids sql (#32458)
issue: #31442

1. adapt create table statement from doris to hive
2. fix insert overwrite for table sink

> The doris create hive table statement:

```
mysql> CREATE TABLE buck2(
    ->     id int COMMENT 'col1',
    ->     name string COMMENT 'col2',
    ->     dt string COMMENT 'part1',
    ->     dtm string COMMENT 'part2'
    -> ) ENGINE=hive
    -> COMMENT "create tbl"
    -> PARTITION BY LIST (dt, dtm) ()
    -> DISTRIBUTED BY HASH (id) BUCKETS 16
    -> PROPERTIES(
    ->     "file_format" = "orc"
    -> );
```

> generated  hive create table statement:

```
CREATE TABLE `buck2`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
CLUSTERED BY (
  id)
INTO 16 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710840747',
  'doris.file_format'='orc')

```
2024-04-12 09:57:37 +08:00
babec88aa9 fix cloud mode from PR #32748 2024-04-11 23:01:06 +08:00
dc8da9ee89 [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog (#33528)
* [Fix](nereids) fix qualifier problem that affects delete stmt in another catalog

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-04-11 21:43:01 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
af95302088 fix compile 2024-04-11 13:10:24 +08:00
69fc8cf06d [branch-2.1](memory) Fix rowid storage reader memory tracker (#33521)
fix:
F20240411 10:26:06.693233 1368925 thread_context.h:293] __builtin_unreachable, If you crash here, it means that SCOPED_ATTACH_TASK and SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER are not used correctly. starting position of each thread is expected to use SCOPED_ATTACH_TASK to bind a MemTrackerLimiter belonging to Query/Load/Compaction/Other Tasks, otherwise memory alloc using Doris Allocator in the thread will crash. If you want to switch MemTrackerLimiter during thread execution, please use SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER, do not repeat Attach. Of course, you can modify enable_memory_orphan_check=false in be.conf to avoid this crash.
*** Check failure stack trace: ***
    @     0x562d9b5aa6a6  google::LogMessage::SendToLog()
    @     0x562d9b5a70f0  google::LogMessage::Flush()
    @     0x562d9b5aaee9  google::LogMessageFatal::~LogMessageFatal()
    @     0x562d7ebd1b7e  doris::thread_context()
    @     0x562d7ec203b8  Allocator<>::sys_memory_check()
    @     0x562d7ec255a3  Allocator<>::memory_check()
    @     0x562d7ec274a1  Allocator<>::alloc_impl()
    @     0x562d7ec27227  Allocator<>::alloc()
    @     0x562d67a12207  doris::vectorized::PODArrayBase<>::alloc<>()
    @     0x562d67a11fde  doris::vectorized::PODArrayBase<>::realloc<>()
    @     0x562d67a11e26  doris::vectorized::PODArrayBase<>::reserve<>()
    @     0x562d77331ee3  doris::vectorized::ColumnVector<>::reserve()
    @     0x562d7e64328e  doris::vectorized::ColumnNullable::reserve()
    @     0x562d7ec79a84  doris::vectorized::Block::Block()
    @     0x562d6b86b81b  doris::PInternalServiceImpl::_multi_get()
    @     0x562d6b8a4a07  doris::PInternalServiceImpl::multiget_data()::$_0::operator()()
2024-04-11 13:10:24 +08:00
Pxl
5688c28364 [Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465) (#33522) 2024-04-11 13:10:24 +08:00
9a3b19d21e [fix](cases) Add check status timeout for backup/restore cases (#32975) 2024-04-11 13:10:24 +08:00
ff38e7c497 [log](chore) print isBad in Replica::toString() (#33427) 2024-04-11 09:31:50 +08:00
b5a84f7d23 Fix alter column stats without min max value deserialize failure. (#33406) 2024-04-11 09:31:50 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
bc929686e3 [feature](debug point) add macro DBUG_RUN_CALLBACK (#33407) 2024-04-11 09:31:50 +08:00
e4eb76212a [fix](Nereids): add order for constraint test (#33323) 2024-04-11 09:31:50 +08:00
ef26479282 [improve](serde) support complex type in write/read pb serde (#33124)
support complex type and ip/jsonb in DataTypeSerDe::write_column_to_pb/read_column_from_pb function
2024-04-11 09:31:50 +08:00
Pxl
3070eda58c [Bug](load) fix stream load file on hll type mv column (#33373)
fix stream load file on hll type mv column
2024-04-11 09:31:50 +08:00
f35dd3fc35 [chore](test) let some case suitable for legacy planner and nereids (#33352) 2024-04-11 09:31:50 +08:00
a38b97fbdd [bugfix](profile) should use backend ip:heartbeat port as key during merge profile (#33368) 2024-04-11 09:31:50 +08:00
ea1e542e31 [fix](partial-update) remove unnecessary DCHECK on IndexChannel::num_rows_filtered (#33160) 2024-04-11 09:31:50 +08:00
2708641bee [Fix]fix insert overwrite non-partition table null pointer exception (#33205)
fix legacy planner bug when insert overwrite non-partition table.
2024-04-11 09:31:50 +08:00
38b2e58d59 [Improvement](executor)cancel query when a query is queued (#33339) 2024-04-11 09:31:50 +08:00
326eee5d04 [Fix](schema change) Fix schema change fault when add complex type column (#31824)
Problem: An error is encountered when executing a schema change on a unique table to add a column with a complex type, such as bitmap, as documented in https://github.com/apache/doris/issues/31365

Reason: The error arises because the schema change logic erroneously applies an aggregation check for new columns across all table types, demanding an explicit aggregation type declaration. However, unique and duplicate tables inherently assume default aggregation types for newly added columns, leading to this misstep.

Solution: The schema change process for introducing new columns needs to distinguish between table types accurately. For unique and duplicate tables, it should automatically assign the appropriate aggregation type, which, for the purpose of smooth integration with subsequent processes, should be set to NONE.
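
The rule described in the solution can be sketched like this. All names (`KeysType`, `AggregateType`, `NewColumnAggResolver`) are hypothetical and only mirror the description; the actual check in the schema change code may be structured differently, and the sketch only covers newly added value columns.

```java
// All names below are hypothetical; they mirror the description, not Doris's classes.
enum KeysType { AGG_KEYS, UNIQUE_KEYS, DUP_KEYS }

enum AggregateType { NONE, SUM, REPLACE, BITMAP_UNION }

class NewColumnAggResolver {
    // Decides the aggregation type of a newly added value column.
    // `declared` is null when the ALTER statement gives no aggregation type.
    static AggregateType resolve(KeysType keysType, AggregateType declared) {
        if (keysType == KeysType.AGG_KEYS) {
            // Only aggregate tables demand an explicit aggregation type.
            if (declared == null) {
                throw new IllegalArgumentException(
                        "aggregation type must be declared for aggregate tables");
            }
            return declared;
        }
        // Unique and duplicate tables: no declaration required, use NONE.
        return AggregateType.NONE;
    }
}
```

Under this sketch, adding a bitmap column to a unique table without an aggregation type resolves to `NONE` instead of failing the check.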
2024-04-11 09:31:50 +08:00