Commit Graph

11668 Commits

Author SHA1 Message Date
48bfb8e9cf [Enhancement](regression-test)Add regression test for MoW backup and restore (#21223) 2023-07-05 15:16:04 +08:00
38c8657e5e [improve](memory) more grace logging for memory exceed limit (#21311)
More graceful logging for Allocator and MemTracker when memory exceeds the limit.
Fix bthread graceful exit.
2023-07-05 14:59:06 +08:00
f9bc433917 [fix](nereids) fix runtime filter expr order (#21480)
Currently, when a runtime filter is pushed down into a CTE, we construct the runtime filter expr_order with an incrementing number, which is not correct. For a runtime filter pushed down into a CTE, the join node is always different, so expr_order should be fixed at 0 without incrementing. Otherwise, the check of expr_order against probe_expr_size becomes illegal, or the query returns a wrong result.

This PR will temporarily revert 2827bc1; it will break the CTE runtime filter push-down plan pattern.
2023-07-05 14:27:35 +08:00
Pxl
f02bec8ad1 [Chore](runtime filter) change runtime filter dcheck to error status or exception (#21475)
change runtime filter dcheck to error status or exception
2023-07-05 14:03:55 +08:00
d3eeb233c8 [fix](dbt) dbt getconfig array or string (#21345)
{{ config(unique_key='id') }}
{{ config(unique_key=['id','name']) }}
Following the dbt convention, use a string for a single column name and an array for multiple columns.
2023-07-05 11:42:38 +08:00
e510e6b0a6 [fix](dbt) dbt-doris match dbt-core==1.5 (#21392)
dbt-doris==0.2 matches dbt-core==1.3 or older versions.

Subsequent dbt-doris versions match dbt-core==1.4 and 1.5.
2023-07-05 11:42:19 +08:00
c9c183e498 [fix](dbt) dbt seed config read (#21492) 2023-07-05 11:41:59 +08:00
0084b9fd9a [fix](hudi) scala can't call Properties.putAll in jdk11 (#21494) 2023-07-05 10:53:09 +08:00
de5cfe34bf [fix](feut)should not create a DeriveStatsJob in fe ut (#21498) 2023-07-05 10:38:09 +08:00
15ec191a77 [Fix](CCR) Use tableId as the credential for CCR syncer instead of tableName (#21466) 2023-07-05 10:16:09 +08:00
93795442a4 [Fix](CCR) Binlog config is missed when create replica task (#21397) 2023-07-05 10:15:13 +08:00
0469c02202 [Test](regression) Temporarily disable quickTest for SHOW CREATE TABLE to adapt to enable_feature_binlog=true (#21247) 2023-07-05 10:12:02 +08:00
122f5f6c2d [enchanment](udf) add more info when download jar package failed (#21440)
When downloading a jar package, the checksum sometimes does not match,
but the root cause is unknown. Add more error messages for the case where the download fails.
2023-07-04 20:35:35 +08:00
3b73604f74 [fix](memory) fix jemalloc purge arena dirty pages core dump (#21486)
jemalloc/jemalloc#2470
Occasional core dump during stress test.
2023-07-04 20:35:13 +08:00
81ee4d7402 [performance](group_concat) avoid extra copy in group_concat (#21432)
avoid extra copy in group_concat
2023-07-04 20:21:44 +08:00
8c2963961f [docs](releasenote) 2.0 beta release note (#21457) 2023-07-04 19:02:18 +08:00
f498beed07 [improvement](jdbc)Support for automatically obtaining the precision of the trino/presto timestamp type (#21386) 2023-07-04 18:59:42 +08:00
aec5bac498 [improvement](jdbc)Support for automatically obtaining the precision of the hana timestamp type (#21380) 2023-07-04 18:59:21 +08:00
b27fa70558 [fix](jdbc) fix presto jdbc catalog pushDown and nameFormat (#21447) 2023-07-04 18:58:33 +08:00
be406a1696 [typo](docs) fix presto jdbc catalog docs (#21445) 2023-07-04 18:24:58 +08:00
899f7fbfeb [fix](regression case) fix variable scope bug in some inverted index regression cases (#21194)
fix variable scope bug in some inverted index regression cases
2023-07-04 18:05:46 +08:00
9d997b9349 [revert](nereids) Revert data size agg (#21216)
To make stats derivation more precise
2023-07-04 18:02:15 +08:00
1b86e658fd [fix](Nereids): decrease the memo GroupExpression of limits (#21354) 2023-07-04 17:15:41 +08:00
13fb69550a [improvement](kerberos) disable hdfs fs handle cache to renew kerberos ticket at fix interval (#21265)
Add a new BE config `kerberos_ticket_lifetime_seconds`; the default is 86400.
It is best to set it to the same value as `ticket_lifetime` in `krb5.conf`.
If an HDFS fs handle in the cache has lived longer than HALF of this time, it is marked invalid and recreated,
and the Kerberos ticket is renewed.
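The expiry rule described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `handle_expired` and its parameters are hypothetical names, and only the default of 86400 seconds and the "longer than half" rule come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Doris's actual code): decides whether a cached
// HDFS fs handle has lived too long and should be recreated.
// ticket_lifetime_s mirrors the BE config kerberos_ticket_lifetime_seconds.
inline bool handle_expired(std::int64_t created_at_s, std::int64_t now_s,
                           std::int64_t ticket_lifetime_s = 86400) {
    assert(ticket_lifetime_s > 0);
    assert(now_s >= created_at_s);
    // "Longer than HALF of this time" is read here as a strict comparison.
    return (now_s - created_at_s) > ticket_lifetime_s / 2;
}
```

With the default lifetime, a handle created at time 0 is still valid at 43200 seconds and is treated as expired from 43201 seconds on.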
2023-07-04 17:13:34 +08:00
c2b483529c [fix](heartbeat) need to set backend status base on edit log (#21410)
A non-master FE must set a Backend's status based on the content of the edit log.
There is a bug: if the FE config `max_backend_heartbeat_failure_tolerance_count` is set larger than one,
the non-master FE does not mark the Backend as dead until it has received enough heartbeat edit logs,
which is wrong.
As a result, the Backend is dead on the master FE but still alive on the non-master FE.
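The intended split of responsibilities can be sketched like this. It is an assumption-laden illustration rather than the FE code (which is Java): `BackendSketch`, `master_handle_heartbeat`, and `replay_heartbeat` are hypothetical names, and the exact threshold comparison on the master is assumed.

```cpp
#include <cassert>

// Hypothetical model of a Backend as seen by one FE.
struct BackendSketch {
    bool alive = true;
    int failures = 0;
};

// Master FE: counts consecutive failures and only marks the Backend dead
// once the tolerance is reached (assumed comparison: >=).
// The returned status is what would be written to the edit log.
inline bool master_handle_heartbeat(BackendSketch& be, bool heartbeat_ok, int tolerance) {
    assert(tolerance >= 1);
    if (heartbeat_ok) {
        be.failures = 0;
        be.alive = true;
    } else if (++be.failures >= tolerance) {
        be.alive = false;
    }
    return be.alive;
}

// Non-master FE: applies the logged status directly, without counting again.
inline void replay_heartbeat(BackendSketch& be, bool alive_in_log) {
    be.alive = alive_in_log;
}
```

Because the non-master FE does no counting of its own, its view of the Backend matches the master's after every replayed entry.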
2023-07-04 17:12:53 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.

**Advantages** of using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle uses `UnsafeRow`, which can reduce data copying and JVM GC time
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these functions can be quickly ported to Doris

**Disadvantages** of using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome (from 138M to 300M)
2. spark-bundle only provides an `RDD` interface and cannot be used directly
2023-07-04 17:04:49 +08:00
90dd8716ed [refactor](multicast) change the way multicast do filter, project and shuffle (#21412)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>

1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
2023-07-04 16:51:07 +08:00
09f414e0f4 fix lru cache handle field order (#21435)
For LRUHandle, all other fields must be placed ahead of key_data.
The LRUHandle is allocated using malloc, and the memory starting at field key_data holds the key data.
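A minimal sketch of this layout, assuming a simplified handle: `HandleSketch`, `make_handle`, and `key_of` are hypothetical names, and the real LRUHandle has more fields. Because the key bytes are written starting at `key_data` and run past the end of the struct, any field declared after `key_data` would be overwritten by the key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical, simplified handle: fixed fields first, key bytes last.
struct HandleSketch {
    std::size_t charge;
    std::size_t key_length;
    char key_data[1];  // start of the key bytes; must stay the last field
};

// Allocates one block holding the fixed fields followed by the whole key.
inline HandleSketch* make_handle(const std::string& key, std::size_t charge) {
    void* mem = std::malloc(sizeof(HandleSketch) + key.size());
    assert(mem != nullptr);
    auto* h = static_cast<HandleSketch*>(mem);
    h->charge = charge;
    h->key_length = key.size();
    // Copy the key into the memory that begins at key_data.
    char* dst = static_cast<char*>(mem) + offsetof(HandleSketch, key_data);
    std::memcpy(dst, key.data(), key.size());
    return h;
}

// Reads the key back from the trailing bytes of the handle.
inline std::string key_of(const HandleSketch* h) {
    const char* src = reinterpret_cast<const char*>(h) + offsetof(HandleSketch, key_data);
    return std::string(src, h->key_length);
}
```

The caller owns the returned block and releases it with `std::free`.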
2023-07-04 16:10:05 +08:00
9e8501f191 [Performance](Nereids): speedup analyze by removing sort()/addAll() in OptimizeGroupExpressionJob to (#21452)
Calling sort() and addAll() on all rules costs a lot of time and serves no purpose; remove them to speed up analysis.

explain tpcds q72: 1.72s -> 1.46s
2023-07-04 16:01:54 +08:00
890e55b604 [typo](docs)Delete unsupported sql statements in GROUP_CONCAT() (#21455)
Delete unsupported sql statements in GROUP_CONCAT()
2023-07-04 14:46:49 +08:00
Pxl
65cb91e60e [Chore](agg-state) add sessionvariable enable_agg_state (#21373)
add sessionvariable enable_agg_state
2023-07-04 14:25:21 +08:00
9477436524 [fix](test) add def keyword to define local variable success (#21206)
add def keyword to define local variable success
2023-07-04 14:24:37 +08:00
b5da3f74f5 [improvement](join) avoid unnecessary copying in _build_output_block (#21360)
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
2023-07-04 12:13:49 +08:00
cac465472a [chore](tools) add submodules in .idea/vcs.xml (#21383) 2023-07-04 11:44:09 +08:00
e4c0a0ac24 [improve](dependency)Upgrade dependency version (#21431)
exclude old netty version
upgrade spring-boot version to 2.7.13
replace ojdbc6 with ojdbc8
upgrade jackson version to 2.15.2
upgrade fabric8 version to 6.7.2
2023-07-04 11:29:21 +08:00
b86dd11a7d [fix](pipeline) refactor olap table sink close (#20771)
For pipeline, closing the olap table sink is divided into three stages: try_close() --> pending_finish() --> close().
pending_finish() returns false only after all node channels are done or canceled; only then does close() start.
This avoids blocking the pipeline on close().

In close(), check the index channel's intolerable failure status after each node channel failure.
If the intolerable failure status is true, close() terminates early and all node channels are canceled to avoid meaningless blocking.
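The three-stage protocol can be sketched as a small state holder. This is only an illustration under assumptions: `SinkCloseSketch`, `ChannelState`, and `set_state` are hypothetical names, and the real sink tracks far more than a per-channel state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ChannelState { Running, Done, Canceled };

// Hypothetical model of the staged close; not the Doris implementation.
class SinkCloseSketch {
public:
    explicit SinkCloseSketch(std::size_t channels)
            : _channels(channels, ChannelState::Running) {}

    // Stage 1: request the close without waiting for any channel.
    void try_close() { _close_requested = true; }

    // Stage 2: true while at least one node channel is still running.
    bool pending_finish() const {
        for (ChannelState s : _channels) {
            if (s == ChannelState::Running) return true;
        }
        return false;
    }

    // Stage 3: only reached once no channel is running, so it never blocks.
    void close() {
        assert(_close_requested && !pending_finish());
        _closed = true;
    }

    void set_state(std::size_t i, ChannelState s) {
        assert(i < _channels.size());
        _channels[i] = s;
    }

    bool closed() const { return _closed; }

private:
    std::vector<ChannelState> _channels;
    bool _close_requested = false;
    bool _closed = false;
};
```

A scheduler would poll `pending_finish()` and run other work in the meantime, calling `close()` only once it returns false.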
2023-07-04 11:27:51 +08:00
8cbc1d58e1 [fix](MTMV) Disable partition specification temporarily (#20793)
The syntax for supporting partition updates in the future has not been investigated yet and there are issues with partition syntax. Therefore, the partition syntax has been temporarily removed in the current version and will be added after future research.
2023-07-04 11:09:04 +08:00
d5f39a6e54 [Performance](Nereids) refactor code speedup analyze (#21458)
Refactor the code paths that cost a lot of time.
2023-07-04 10:59:07 +08:00
599ba4529c [fix](nereids) need run ConvertInnerOrCrossJoin rule again after EliminateNotNull (#21346)
After the EliminateNotNull rule runs, the join conjuncts may be removed from an inner join node.
So the ConvertInnerOrCrossJoin rule needs to run again to convert an inner join with no join conjuncts into a cross join node.
2023-07-04 10:52:36 +08:00
b1c16b96d6 [refactor](load) move validator out of VOlapTableSink (#21460) 2023-07-04 10:16:56 +08:00
938c0765cd [improvement](memory) improve inserting sparse rows into string column (#21420)
For the following test, which simulates a hash join outputting 435699854 rows from 5131 building rows:

    {
        auto col = doris::vectorized::ColumnString::create();
        constexpr int build_rows = 5131;
        constexpr int output_rows = 435699854;
        std::string str("01234567");
        for (int i = 0; i < build_rows; ++i) {
            col->insert_data(str.data(), str.size());
        }
        int indices[output_rows];
        for (int i = 0; i < output_rows; ++i) {
            indices[i] = i % build_rows;
        }
        auto col2 = doris::vectorized::ColumnString::create();
        doris::MonotonicStopWatch watch;
        watch.start();
        col2->insert_indices_from(*col, indices, indices + output_rows);
        watch.stop();
        LOG(WARNING) << "string column insert_indices_from, rows: " << output_rows << ", time: " << doris::PrettyPrinter::print(watch.elapsed_time(), doris::TUnit::TIME_NS);
    }
The insertion time of ColumnString::insert_indices_from improves from 6s665ms to 3s158ms:

W0702 23:08:39.672044 1277989 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s153ms
W0702 23:09:36.368853 1282061 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s158ms


W0703 00:30:26.093307 1468640 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s761ms
W0703 00:31:21.043638 1472937 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s665ms
2023-07-04 09:34:10 +08:00
70f473f32c [improvement](nereids) Refine tpcds tools (#21421)
Refine the tpcds test tools, including splitting the 99 cases into separate files and refining the 100g schema with range partition format.



Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
2023-07-04 09:28:02 +08:00
790b771a49 [improvement](execute) Eliminate virtual function calls when serializing and deserializing aggregate functions (#21427)
Eliminate virtual function calls when serializing and deserializing aggregate functions.

For example, in AggregateFunctionUniq::deserialize_and_merge method, calling read_pod_binary(ref, buf) in the for loop generates a large number of virtual function calls.

void deserialize_and_merge(AggregateDataPtr __restrict place, BufferReadable& buf,
                           Arena* arena) const override {
    auto& set = this->data(place).set;
    UInt64 size;
    read_var_uint(size, buf);
    set.rehash(size + set.size());
    for (size_t i = 0; i < size; ++i) {
        KeyType ref;
        read_pod_binary(ref, buf);
        set.insert(ref);
    }
}

template <typename Type>
void read_pod_binary(Type& x, BufferReadable& buf) {
    buf.read(reinterpret_cast<char*>(&x), sizeof(x));
}
BufferReadable has only one subclass, VectorBufferReader, so it is better to implement the BufferReadable class directly.

The following sql was tested on SSB-flat dataset:

SELECT COUNT (DISTINCT lo_partkey), COUNT (DISTINCT lo_suppkey) FROM lineorder_flat;
before: MergeTime: 415.398ms
after opt: MergeTime: 174.660ms
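The idea can be sketched with a reader that has no virtual methods, so the `read` call inside the loop can be inlined. This is an illustration under assumptions, not the PR's code: `PlainBufferReader` and `sum_values` are hypothetical names, and only `read_pod_binary` follows the shape quoted in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical concrete reader: no base class, no virtual dispatch.
class PlainBufferReader final {
public:
    PlainBufferReader(const char* data, std::size_t size) : _data(data), _size(size) {}

    // Non-virtual, so the compiler is free to inline it into hot loops.
    void read(char* dst, std::size_t len) {
        assert(_pos + len <= _size);
        std::memcpy(dst, _data + _pos, len);
        _pos += len;
    }

private:
    const char* _data;
    std::size_t _size;
    std::size_t _pos = 0;
};

template <typename Type>
void read_pod_binary(Type& x, PlainBufferReader& buf) {
    buf.read(reinterpret_cast<char*>(&x), sizeof(x));
}

// Reads `count` 64-bit values from a raw byte buffer and returns their sum.
inline std::uint64_t sum_values(const std::vector<char>& bytes, std::size_t count) {
    PlainBufferReader reader(bytes.data(), bytes.size());
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t v = 0;
        read_pod_binary(v, reader);
        total += v;
    }
    return total;
}
```

Whether the call is actually inlined depends on the compiler and optimization level; the sketch only removes the virtual dispatch that would otherwise prevent it.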
2023-07-04 09:26:37 +08:00
11e18f4c98 [Fix](multi-catalog) fix NPE for FileCacheValue. (#21441)
FileCacheValue.files may be null if no files exist for some partitions.
2023-07-03 23:38:58 +08:00
5e6242e235 [typo](docs) Refactor upgrade documentation (#21449)
Co-authored-by: Yijia Su <suyijia@selectdb.com>
2023-07-03 20:14:19 +08:00
bb33ad0bde [opt](docs) update nereids doc to reflect the latest changes (#21444) 2023-07-03 18:50:01 +08:00
Pxl
f7c724f8a3 [Bug](excution) avoid core dump on filter_block_internal and add debug information (#21433)
avoid core dump on filter_block_internal and add debug information
2023-07-03 18:10:30 +08:00
63b170251e [fix](nereids)cast filter and join conjunct's return type to boolean (#21434) 2023-07-03 17:22:46 +08:00
d4a1549003 [minor](broker) fix name in broker's pom.xml (#20840)
Change palo -> doris.
Do not check the compiler's version in env.sh, because building the broker does not need the gcc compiler. The version is also checked in CMakefile.
2023-07-03 16:46:47 +08:00
f80df20b6f [Fix](multi-catalog) Fix read error in mixed partition locations. (#21399)
Issue Number: close #20948

Fix the read error for mixed partition locations (for example, some partition locations are on s3 and others are on hdfs) by calling `getLocationType` at the file split level instead of the table level.
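A per-split lookup can be sketched as below. This is a hypothetical C++ analogue for illustration only: the actual `getLocationType` is not shown in this log, and `LocationType`, `location_type_of`, `types_per_split`, and the recognized URI prefixes are all assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class LocationType { S3, HDFS, LOCAL, UNKNOWN };

// True when `s` begins with `prefix`.
inline bool has_prefix(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical resolver: derives the storage type from one file path.
inline LocationType location_type_of(const std::string& path) {
    assert(!path.empty());
    if (has_prefix(path, "s3://")) return LocationType::S3;
    if (has_prefix(path, "hdfs://")) return LocationType::HDFS;
    if (has_prefix(path, "file://")) return LocationType::LOCAL;
    return LocationType::UNKNOWN;
}

// Resolves the type for every split separately, so partitions stored on
// different systems within one table are each read with the right type.
inline std::vector<LocationType> types_per_split(const std::vector<std::string>& split_paths) {
    std::vector<LocationType> types;
    types.reserve(split_paths.size());
    for (const std::string& path : split_paths) {
        types.push_back(location_type_of(path));
    }
    return types;
}
```

Resolving the type once per table would return a single value for both paths in a mixed table, which is the situation the fix avoids.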
2023-07-03 15:14:17 +08:00