18102 Commits

Author SHA1 Message Date
Pxl
883f0a96c4 [Improvementation](stream-load) improve streamLoadPut log warning detail (#33535)
improve streamLoadPut log warning detail
2024-04-17 23:41:59 +08:00
911f61c68d [opt](Nereids) support set operation minus (#33582) 2024-04-17 23:41:59 +08:00
5d5b059e3a [fix](memory) Fix compaction destructor memory tracking #33549 (#33572) 2024-04-17 23:41:59 +08:00
38c5030f97 [opt](log) refactor the log dir config (#32933)
Refactor the config for log dir of FE and BE

TLDR:
- Use env variable `LOG_DIR` to set root log dir
- Remove `sys_log_dir` for FE and BE

Details:

1. FE

    1. The root log dir is set by env variable `LOG_DIR` in `fe.conf`
    2. The default value of `audit_log_dir` is the same as `${LOG_DIR}/`
    3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log`
    4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log`
    5. The original `sys_log_dir` is deprecated, and its default value is `""`.
        For compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir.

2. BE

    1. The root log dir is set by env variable `LOG_DIR` in `be.conf`
    2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly.
    3. The original `sys_log_dir` is deprecated, and its default value is `""`.
        For compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir.
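
The compatibility rule for the root log dir can be sketched roughly as follows. This is only an illustration of the precedence described in this message, not Doris's actual code: the class `LogDirSketch` and the method `resolveRootLogDir` are hypothetical names, and both settings are passed in as plain strings instead of being read from `fe.conf` or `be.conf`.

```java
// Hypothetical sketch; names are not taken from the Doris code base.
class LogDirSketch {
    // A non-empty legacy sys_log_dir still wins for compatibility;
    // otherwise the value of the LOG_DIR env variable is the root log dir.
    static String resolveRootLogDir(String sysLogDir, String logDirEnv) {
        if (sysLogDir != null && !sysLogDir.isEmpty()) {
            return sysLogDir;
        }
        return logDirEnv;
    }

    public static void main(String[] args) {
        // sys_log_dir left at its new default "": LOG_DIR is used.
        System.out.println(resolveRootLogDir("", "/opt/doris/log"));
        // sys_log_dir set by an older config: it is still honored.
        System.out.println(resolveRootLogDir("/data/old_log", "/opt/doris/log"));
    }
}
```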
2024-04-17 23:41:59 +08:00
09fb30c989 (Chore)[regression-test] fix unstable output variant case (#33520) 2024-04-17 23:41:59 +08:00
82d2bde3c7 [fix](nereids) do not transpose semi join agg when mark join (#32475) 2024-04-17 23:41:59 +08:00
d436dd6264 [fix](mtmv)add logs for mv_infos() (#33485) 2024-04-17 23:41:59 +08:00
92d28e497b [refactor](Nereids): compute unique and uniform property respectively (#32908) 2024-04-17 23:41:59 +08:00
272269f9c1 [Fix](inverted index) fix fast execute problem when need read data opt enabled (#33526) 2024-04-17 23:41:59 +08:00
ca59b25d59 [feat](nereids) add session var to turn on/off common sub expressoin extraction (branch-2.1) #33616 2024-04-17 23:41:43 +08:00
e26a53d8a6 [fix](nereids) SemiJoinSemiJoinTransposeProject rule didn't handle mark join correctly (#33401) 2024-04-12 15:09:25 +08:00
78b81d4150 [fix](test) remove distribute node of shape in some regression test (#33463) 2024-04-12 15:09:25 +08:00
87806a0137 [fix](debug point) fix gcc compile (#33451) 2024-04-12 15:09:25 +08:00
d2f84229ec [chore](test) remove some outdated datetime test case (#33476) 2024-04-12 15:09:25 +08:00
b035c7ceb4 [fix](catalog) fix resource is not reopen when rename catalog (#33432)
During the renaming of `JdbcCatalog`, I noticed that the `jdbcClient` was being closed, 
resulting in exceptions during subsequent queries. This happens because the `removeCatalog` 
method is invoked when changing the name, which in turn calls the `onClose` method of the catalog. 
Ideally, the client should not be closed when renaming the catalog. 
However, to avoid extra checks in the `removeCatalog` method, we can simply execute `onRefresh` 
in the `addCatalog` method to address this issue.
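
A minimal sketch of the rename flow described above, assuming a simplified catalog that tracks only whether its client is open. `Catalog`, `CatalogManager`, and `renameCatalog` here are hypothetical stand-ins rather than the real Doris classes; only the method names `removeCatalog`, `addCatalog`, `onClose`, and `onRefresh` come from this message.

```java
import java.util.HashMap;
import java.util.Map;

class CatalogRenameSketch {
    // Hypothetical stand-in for a catalog that owns a client connection.
    static class Catalog {
        String name;
        boolean clientOpen = true;
        Catalog(String name) { this.name = name; }
        void onClose() { clientOpen = false; }  // releases the client
        void onRefresh() { clientOpen = true; } // re-initializes the client
    }

    static class CatalogManager {
        final Map<String, Catalog> catalogs = new HashMap<>();

        void addCatalog(Catalog c) {
            catalogs.put(c.name, c);
            c.onRefresh(); // reopen resources whenever a catalog is added
        }

        Catalog removeCatalog(String name) {
            Catalog c = catalogs.remove(name);
            if (c != null) {
                c.onClose(); // closes the client as a side effect
            }
            return c;
        }

        void renameCatalog(String oldName, String newName) {
            Catalog c = removeCatalog(oldName);
            if (c == null) {
                return;
            }
            c.name = newName;
            addCatalog(c); // the client is usable again after the rename
        }
    }

    public static void main(String[] args) {
        CatalogManager mgr = new CatalogManager();
        mgr.addCatalog(new Catalog("jdbc_old"));
        mgr.renameCatalog("jdbc_old", "jdbc_new");
        System.out.println(mgr.catalogs.get("jdbc_new").clientOpen); // true
    }
}
```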
2024-04-12 15:09:25 +08:00
22c42209f7 [fix](nereids) fix a visitor bug in CommonSubExpressionOpt (#33154)
* fix a bug in visitproject
* fix-variant
2024-04-12 15:09:25 +08:00
8c66915bb5 [fix](doris compose) Fix not show ms recycler .out log in cloud mode (#33489) 2024-04-12 15:09:25 +08:00
fefbde8927 [log](move-memtable) improve logs in vtablet_writer_v2 and load_stream (#33103) 2024-04-12 15:09:25 +08:00
1da1fac4ee [improve](load) try lock 30ms to get base_migration_lock in rowset builder (#32243) 2024-04-12 15:09:25 +08:00
53336d6170 [fix](routine-load) routine load date where expression rewrite should change over time (#33345) 2024-04-12 15:09:25 +08:00
031f1e6a65 [case](regression) Add backup restore with NGRAM bloom filter (#33479) 2024-04-12 15:09:25 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, this unifies the schema change interface for the parquet and orc readers; the same interface can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data into a source column according to the column type stored in the file;
- Second, convert the source column to the destination column with the type planned by FE.
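
The two steps can be illustrated with a small sketch. It is not the BE reader code (which is C++); `SchemaChangeSketch`, `readSourceColumn`, and `convertColumn` are hypothetical names, and the file type (`Integer`) and planned type (`Long`) are chosen only for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class SchemaChangeSketch {
    // Step 1: read values into a source column using the file's own type.
    static List<Integer> readSourceColumn(int[] fileValues) {
        List<Integer> source = new ArrayList<>();
        for (int v : fileValues) {
            source.add(v);
        }
        return source;
    }

    // Step 2: convert the source column to the destination (planned) type.
    static <S, D> List<D> convertColumn(List<S> source, Function<S, D> converter) {
        List<D> dest = new ArrayList<>(source.size());
        for (S v : source) {
            dest.add(converter.apply(v));
        }
        return dest;
    }

    public static void main(String[] args) {
        List<Integer> source = readSourceColumn(new int[] {1, 2, 3});
        List<Long> dest = convertColumn(source, Integer::longValue);
        System.out.println(dest); // [1, 2, 3]
    }
}
```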
2024-04-12 15:09:25 +08:00
fe772c76e7 [fix](restore) Fix the conflict IDs between two cluster (#33423)
The meta of the restore may come from different clusters, so the
original ID in the meta may conflict with the ID of the new cluster. For
example, if a newly allocated ID happens to be the same as an original ID,
the original one may be overwritten when executing `put`.
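
The overwrite can be reproduced with a plain map keyed by ID. This sketch only illustrates the problem, not the fix in this commit; `RestoreIdConflictSketch`, `register`, and the ID value are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class RestoreIdConflictSketch {
    // Registers a name under an ID and returns the entry it replaced, or null.
    static String register(Map<Long, String> idToName, long id, String name) {
        return idToName.put(id, name);
    }

    public static void main(String[] args) {
        Map<Long, String> idToName = new HashMap<>();
        // ID carried over from the meta of another cluster.
        register(idToName, 10001L, "table_from_backup");
        // The new cluster happens to allocate the same ID.
        String overwritten = register(idToName, 10001L, "table_created_locally");
        System.out.println(overwritten);     // table_from_backup
        System.out.println(idToName.size()); // 1
    }
}
```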
2024-04-12 15:09:25 +08:00
a4924dabb7 [enhancement](exception) enble exception logic in pipeline execute thread (#33437)
* [enhancement](exception) enble exception logic in pipeline execute thread

* f

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-12 15:09:25 +08:00
Pxl
ab4f8fafcd [Bug](materialized-view) forbid create mv with value column before key column (#33436)
forbid create mv with value column before key column
2024-04-12 15:09:25 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
d4a67d93f3 [improve](routine-load) timely pause job if Kafka cluster exception when consume (#33372) 2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
6f96e2b64a [fix](plsql) Fix handle select that fe can do without be (#33363)
CREATE OR REPLACE PROCEDURE procedure_test1()
BEGIN
select 1;
END; 

call procedure_test1()

fix `ERROR 2027 (HY000): Malformed packet`
2024-04-12 15:09:25 +08:00
215f402df7 [fix](nereids)when clause cannot be regarded as common sub expression (#33358)
* when clause cannot be regarded as common sub expression
2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
ef64d7a011 [feature](profile) add transaction statistics for profile (#33488)
1. commit total time
2. fs operator total time
    - rename file count
    - rename dir count
    - delete dir count
3. add partition total time
    - add partition count
4. update partition total time
    - update partition count

like:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```
2024-04-12 15:06:16 +08:00
ee36b2f70d [branch-2.1](opt)(profile) parallel serialize fragment and add detail schedule profile #33376 #33379 2024-04-12 13:15:56 +08:00
e841d82ffb [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)
Issue Number:  #31442

Change the table sink exchange rebalancer params to node level and tune them to improve write performance through better balancing.

rebalancer params:
```
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```
2024-04-12 13:09:56 +08:00
d31bca199f [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)
Support `partition by`:

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```
2024-04-12 10:45:16 +08:00
f0463a9034 [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names (#33245)
Issue Number: #31442

- Add hive-writer runtime profiles.
- Change output file names to `${query_id}${uuid}-${index}.${compression}.${format}`, e.g. `"d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet"`. For the same partition writer, when the file size exceeds `hive_sink_max_file_size`, the file currently being written is closed and a new file is created; `${index}` is incremented in the new file name, while the rest stays the same.
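
A small sketch of how such a name could be assembled. `HiveFileNameSketch` and `fileName` are hypothetical; the `_` between the query id and the uuid is an assumption taken from the example name, since the template itself shows no separator there.

```java
class HiveFileNameSketch {
    // Builds <query_id>_<uuid>-<index>.<compression>.<format>.
    // The "_" separator is assumed from the example file name.
    static String fileName(String queryId, String uuid, int index,
                           String compression, String format) {
        return queryId + "_" + uuid + "-" + index + "." + compression + "." + format;
    }

    public static void main(String[] args) {
        String queryId = "d8735c6fa444a6d-acd392981e510c2b";
        String uuid = "34fbdcbb-b2e1-4f2c-b68c-a384238954a9";
        // First file of a partition writer.
        System.out.println(fileName(queryId, uuid, 0, "snappy", "parquet"));
        // After the size limit is exceeded, only the index changes.
        System.out.println(fileName(queryId, uuid, 1, "snappy", "parquet"));
    }
}
```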
2024-04-12 10:43:16 +08:00
18fb8407ae [feature](insert)use optional location and add hive regression test (#33153) 2024-04-12 10:38:54 +08:00
31a7060dbd [testcase](hive)add exception test for hive txn (#33278)
Issue #31442
#32726

1. add LocalDfsFileSystem to manipulate local files.
2. add HMSCachedClientTest to simulate HMS services.
3. add test for rollback commit.
2024-04-12 10:38:48 +08:00
e11db3f050 [feature](hive)support ExternalTransaction for writing exteral table (#32726)
Issue #31442

Add `TransactionManager` and `Transaction`. 

```
public interface Transaction {
    void commit() throws UserException;
    void rollback();
}
public interface TransactionManager {
    long begin();
    void commit(long id) throws UserException;
    void rollback(long id);
    Transaction getTransaction(long id);
}
```
`TransactionManager` is used to manage all external transactions.
The application layer should manage the entire transaction through this `TransactionManager`, like:
```
long id = transactionManager.begin();
transactionManager.commit(id);
transactionManager.rollback(id);
```

`Transaction` is an interface that can be implemented per external system: `HMSTransaction` is implemented now, and others, such as one for iceberg, may be implemented in the future.
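
As an illustration of how the two interfaces fit together, here is a minimal in-memory sketch. It is not the implementation from this commit: `RecordingTransaction` and `InMemoryTransactionManager` are hypothetical, and `UserException` is redefined locally only so the example compiles on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class TxnSketch {
    // Local stand-in for Doris's UserException.
    static class UserException extends Exception {
        UserException(String msg) { super(msg); }
    }

    interface Transaction {
        void commit() throws UserException;
        void rollback();
    }

    interface TransactionManager {
        long begin();
        void commit(long id) throws UserException;
        void rollback(long id);
        Transaction getTransaction(long id);
    }

    // Hypothetical transaction that only records its final state.
    static class RecordingTransaction implements Transaction {
        String state = "OPEN";
        public void commit() { state = "COMMITTED"; }
        public void rollback() { state = "ROLLED_BACK"; }
    }

    static class InMemoryTransactionManager implements TransactionManager {
        private final AtomicLong nextId = new AtomicLong(1);
        private final Map<Long, Transaction> txns = new ConcurrentHashMap<>();

        public long begin() {
            long id = nextId.getAndIncrement();
            txns.put(id, new RecordingTransaction());
            return id;
        }

        public void commit(long id) throws UserException {
            Transaction t = txns.remove(id);
            if (t == null) {
                throw new UserException("unknown transaction " + id);
            }
            t.commit();
        }

        public void rollback(long id) {
            Transaction t = txns.remove(id);
            if (t != null) {
                t.rollback();
            }
        }

        public Transaction getTransaction(long id) {
            return txns.get(id);
        }
    }

    public static void main(String[] args) throws UserException {
        InMemoryTransactionManager mgr = new InMemoryTransactionManager();
        long id = mgr.begin();
        RecordingTransaction txn = (RecordingTransaction) mgr.getTransaction(id);
        mgr.commit(id);
        System.out.println(txn.state); // COMMITTED
    }
}
```

Keeping the id-based calls on the manager means callers never hold a `Transaction` directly unless they ask for it through `getTransaction`.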
2024-04-12 10:38:12 +08:00
f0ac21e231 [feature](external) process tbl/db exist when create/drop db/tbl (#33119)
Issue Number: #31442
2024-04-12 10:36:43 +08:00
7a05396cd1 [feature](multi-catalog)support catalog name when create/drop db (#33116)
Issue Number: #31442
2024-04-12 10:36:18 +08:00
01b21da82d [feature](insert)add hive insert plan ut and remove redundant fields (#33051)
add hive insert sink plan UT case
remove some deprecated code
2024-04-12 10:30:08 +08:00
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
07f296734a [regression](insert)add hive DDL and CTAS regression case (#32924)
Issue Number: #31442

dependent on #32824

add ddl(create and drop) test
add ctas test
add complex type test
TODO:
bucketed table test
truncate test
add/drop partition test
2024-04-12 10:24:23 +08:00
716c146750 [fix](insert)fix hive external return msgs and exception and pass all columns to BE (#32824)
2024-04-12 10:23:52 +08:00
f3a6132214 [chore] Format regression-conf.groovy (#32713) 2024-04-12 10:21:47 +08:00
9ada38327b [feature](txn insert) txn insert support insert into select (#31666) 2024-04-12 10:11:22 +08:00
bd364897d4 [feature](hive/iceberg)add doris's version in table properties (#32774)
issue #31442
When creating an external table, Doris's version can be added to the table's properties.
2024-04-12 10:02:31 +08:00
b98d225183 [fix](insert)fix hive table sink type coercion and unify coercion (#32762)
Issue Number: #31442
2024-04-12 10:02:09 +08:00
3343322965 [fix](insert)fix conversion of doris type to hive type (#32735)
#31442

For create table: fix the conversion of Doris types to Hive types; use primitiveType to check the Doris type.
2024-04-12 10:01:30 +08:00