Commit Graph

444 Commits

Author SHA1 Message Date
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is AGG_STATE type, we can't get the clear defined data type if we use `desc tbl` statement.

create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");

before optimize:

mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)


after optimize:

mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00
5012ddd87a [fix](Nereids) fix sql cache return old value when truncate partition (#34698)
1. fix sql cache return old value when truncate partition
2. use expire_sql_cache_in_fe_second to control the expire time of the sql cache which in the NereidsSqlCacheManager
2024-05-18 18:05:31 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
f9c42f34dd [fix](auth)Compatible with previously enabled ldap configuration (#34891) 2024-05-15 12:36:47 +08:00
cadbbdd2c0 [fix](config) for compatibility issue of log dir config (#34734)
* [fix](config) for compatibility issue of log dir config

* 1
2024-05-12 09:44:50 +08:00
ec34bc0386 [bug](config) Fix modifying label_num_threshold does not take effect (#34575) 2024-05-10 22:12:17 +08:00
9a94681b29 [refactor](type) AggStateType should not extends ScalarType (#34463)
1. let AggStateType extends Type
2. remove useless interface isFixedLengthType and supportsTablePartitioning
3. let MapType implement interface isSupported
4. let VariantType extends ScalarType
2024-05-10 22:10:42 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
6c11dd2231 [Fix](planner) fix ScalarType.getAssignmentCompatibleType() when deal boolean and decimal (#34435)
The legacy planner encounters issues when handling filters such as: c1(boolean type)=0.0(decimalv3).
The literal 0.0 is interpreted as decimalv3(1,1), and the boolean type c1 is coerced to decimalv3(1,1).
decimalv3(1,1) can only retain values in the range [0,1), while the boolean true is represented as 1, exceeding the upper bound, thus causing an overflow problem.
This pull request addresses this issue by considering the boolean type as decimalv3(1,0), making both c1 and 0.0 being cast to decimal(2,1).


Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-10 22:07:16 +08:00
07207b7b51 [feature](shuffle) enable strict consistency dml by default (#32958) (#34641) 2024-05-10 14:31:50 +08:00
3ae3f9d6e1 [opt](catalog) support using loading cache for db/table list in external catalog (#33610) (#34596)
bp #33610
2024-05-09 17:50:39 +08:00
8fa1b78d7b Revert "[feature](shuffle) enable strict consistency dml by default (#32958)"
This reverts commit 400105a92182755bdd95a58a7d378d67c6b27f51.
2024-05-08 23:00:46 +08:00
400105a921 [feature](shuffle) enable strict consistency dml by default (#32958) 2024-05-08 11:00:14 +08:00
182177def0 [Improve](config)The stream_load label length is changed to a configurable (#34459)
pick from #33745
2024-05-07 20:43:16 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
2d4da7d177 [fix](kerberos)enable hadoop auto renew tgt (#34439) 2024-05-07 00:36:20 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
91887a285e Implement HLL with 128 buckets to support statistics cache. (#34124) 2024-04-26 15:05:36 +08:00
2a1fbfd72c [feat](fe) Add ignore_bdbje_log_checksum_read for BDBEnvironment (#31247)
* https://forums.oracle.com/ords/apexds/post/je-log-checksumexception-2812

* When meeting disk damage or other reason described in the oracle forums
  and fe cannot start due to `com.sleepycat.je.log.ChecksumException`, we
  add a param `ignore_bdbje_log_checksum_read` to ignore the exception, but
  there is no guarantee of correctness for bdbje kv data

Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-04-22 22:33:24 +08:00
88b3d61eca [refactor](Mysql) Refactoring the process of using external components to authenticate in MySQL connections (#32875) (#33958)
bp #32875

Co-authored-by: LompleZ Liu <47652868+LompleZ@users.noreply.github.com>
2024-04-22 16:41:49 +08:00
15f8014e4e [enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867)
* [enhancement](Nereids) Enable parse sql from sql cache (#33262)

Before this pr, the query must pass through parser, analyzer, rewriter, optimizer and translator, then we can check whether this query can use sql cache, if the query is too long, or the number of join tables too big, the plan time usually >= 500ms.

This pr reduce this time by skip the fashion plan path, because we can reuse the previous physical plan and query result if no any changed. In some cases we should not parse sql from sql cache, e.g. table structure changed, data changed, user policies changed, privileges changed, contains non-deterministic functions, and user variables changed.

In my test case: query a view which has lots of join and union, and the tables has empty partition, the query latency is about 3ms. if not parse sql from sql cache, the plan time is about 550ms

## Features
1. use Config.sql_cache_manage_num to control how many sql cache be reused in on fe
2. if explain plan appear some plans contains `LogicalSqlCache` or `PhysicalSqlCache`, it means the query can use sql cache, like this:
```sql
mysql> set enable_sql_cache=true;
Query OK, 0 rows affected (0.00 sec)

mysql> explain physical plan select * from test.t;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                  |
+----------------------------------------------------------------------------------+
| cost = 3.135                                                                     |
| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] )                              |
| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
|    +--PhysicalOlapScan[t]@0 ( stats=3 )                                          |
+----------------------------------------------------------------------------------+
4 rows in set (0.02 sec)

mysql> select * from test.t;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    2 |
|   -2 |   -2 |
| NULL |   30 |
+------+------+
3 rows in set (0.05 sec)

mysql> explain physical plan select * from test.t;
+-------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                           |
+-------------------------------------------------------------------------------------------+
| cost = 0.0                                                                                |
| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) |
| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] )                                    |
|    +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather )       |
|       +--PhysicalOlapScan[t]@0 ( stats=3 )                                                |
+-------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
```

(cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20)

* fix

* [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722)

fix some sql cache consistence bug between multiple frontends which introduced by [enhancement](Nereids) Enable parse sql from sql cache #33262, fix by use row policy as the part of sql cache key.
support dynamic update the num of fe manage sql cache key

(cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30)

* [fix](Nereids) fix bug of dry run query with sql cache (#33799)

1. dry run query should not use sql cache
2. fix test sql cache in cloud mode
3. enable cache OneRowRelation and EmptyRelation in frontend to skip parse sql

(cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe)

* remove cloud mode

* remove @NotNull
2024-04-19 15:22:14 +08:00
ad75b9b142 [opt](auto bucket) add fe config autobucket_max_buckets (#33842) 2024-04-19 15:03:06 +08:00
6776a3ad1b [Fix](planner) fix create view star except and modify cast to sql (#33726) 2024-04-19 15:02:49 +08:00
56b7839447 [feature](backup) ignore table that not support type when backup, and… (#33158)
* [feature](backup) ignore table that not support type when backup, and not report exception

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* fix

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-04-17 23:42:11 +08:00
b2b385a4ff [improve](fold) support complex type for constant folding (#32867) 2024-04-17 23:41:59 +08:00
38c5030f97 [opt](log) refactor the log dir config (#32933)
Refactor the config for log dir of FE and BE

TLDR:
- Use env variable `LOG_DIR` to set root log dir
- Remove `sys_log_dir` for FE and BE

Details:

1. FE

    1. The root log dir is set by env variable `LOG_DIR` in `fe.conf`
    2. The default value of `audit_log_dir` is same as `${LOG_DIR}/`
    3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log`
    4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log`
    5. The origin `sys_log_dir` is deprecated, and default value is `""`.
        But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.

2. BE

     1. The root log dir is set by env variable `LOG_DIR` in `be.conf`
     2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly.
     3. The origin `sys_log_dir` is deprecated, and default value is `""`.
         But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.
2024-04-17 23:41:59 +08:00
07f296734a [regression](insert)add hive DDL and CTAS regression case (#32924)
Issue Number: #31442

dependent on #32824

add ddl(create and drop) test
add ctas test
add complex type test
TODO:
bucketed table test
truncate test
add/drop partition test
2024-04-12 10:24:23 +08:00
36a1bf1d73 [feature][insert]Adapt the create table  statement to the nereids sql (#32458)
issue: #31442

1. adapt  create table statement from doris  to hive
2. fix insert overwrite for table sink

> The doris create hive table statement:

```
mysql> CREATE TABLE buck2(
    ->     id int COMMENT 'col1',
    ->     name string COMMENT 'col2',
    ->     dt string COMMENT 'part1',
    ->     dtm string COMMENT 'part2'
    -> ) ENGINE=hive
    -> COMMENT "create tbl"
    -> PARTITION BY LIST (dt, dtm) ()
    -> DISTRIBUTED BY HASH (id) BUCKETS 16
    -> PROPERTIES(
    ->     "file_format" = "orc"
    -> );
```

> generated  hive create table statement:

```
CREATE TABLE `buck2`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
 `dt` string,
 `dtm` string)
CLUSTERED BY (
  id)
INTO 16 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710840747',
  'doris.file_format'='orc')

```
2024-04-12 09:57:37 +08:00
14c5247fb7 [feature](replica) support force set replicate allocation for olap tables (#32916)
Add a config to force set replication allocation for all OLAP tables and partitions.
2024-04-10 16:00:15 +08:00
16f8afc408 [refactor](coordinator) split profile logic and instance report logic (#32010)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-10 15:51:32 +08:00
80cdc74908 [fix](arrow-flight) Fix reach limit of connections error (#32911)
Fix Reach limit of connections error
in fe.conf , arrow_flight_token_cache_size is mandatory less than qe_max_connection/2. arrow flight sql is a stateless protocol, connection is usually not actively disconnected, bearer token is evict from the cache will unregister ConnectContext.

Fix ConnectContext.command not be reset to COM_SLEEP in time, this will result in frequent kill connection after query timeout.

Fix bearer token evict log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
2024-04-10 11:34:29 +08:00
1d4e5a1c58 [enhance](auth)enable col auth (#32659) 2024-03-24 08:07:01 +08:00
ab467f53db [fix](partition) Fix be tablet partition id eq 0 By report tablet (#32179) (#32667) 2024-03-22 15:38:58 +08:00
279ea2f366 [feature](proxy-protocol) Support proxy protocol v1 (#32338)
Enable proxy protocol to support IP transparency.
See: `IP Transparency` in f57387b502/docs/en/docs/admin-manual/cluster-management/load-balancing.md
for details
2024-03-21 14:07:22 +08:00
ecadb60bcd [Pick 2.1](inverted index) support inverted index format v2 (#30145) (#32418) 2024-03-19 08:11:33 +08:00
711c0cd55c [feature](insert)implement hive table sink plan (#31765) (#32386)
from #31765
2024-03-18 22:49:30 +08:00
4732aae628 [Refactor](insert) refactor insert command to support other type of table (#31610) (#32345)
bp #31610
2024-03-17 20:46:07 +08:00
844a1b53b7 [fix](retry) Set query encounter rpc exception default retry times to 3 (#28555) 2024-03-16 20:53:46 +08:00
ea2fbfaffa [feature](Nereids) support agg state type in create table (#32171)
this PR introduce a behavior change, syntax of create table with agg_state type is changed.
2024-03-15 18:04:49 +08:00
847ec368be [Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32220) 2024-03-14 11:23:05 +08:00
b9a87c63f7 [chore](catalog recycle bin) Add option to ignore min erase latency for testing (#31417) 2024-02-29 16:44:40 +08:00
9243b3eeee [fix](multi-catalog) add config to disable external DDL (#31528)
from #31453
2024-02-29 08:42:35 +08:00
Pxl
6737fdea64 [Chore](agg-state) adjust AggStateType constructor check input (#31401)
adjust AggStateType constructor check input
2024-02-28 17:52:11 +08:00
883d022f84 [fix](paimon) fix hadoop.username does not take effect in paimon catalog (#31478) 2024-02-28 13:08:41 +08:00
a371a10603 [fix](Nereids) let time type coercion same with legacy planner (#31472) 2024-02-28 13:07:47 +08:00
eb0416032b [feature](multi-catalog)support hms catalog create and drop table/db (#30198) (#31499)
1. rename old create/drop table to add/removeMemoryTable
2. add new create/drop table/db method
3. support hms catalog create/drop table/db

(cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)
2024-02-28 09:33:54 +08:00
1127b0065a [Improment](executor)Add scanbytes/scanrows condition (#31364)
* Add scanbytes/scanrows condition

* fix reg
2024-02-27 10:12:33 +08:00
e48f4f38d0 [Fix](fe-common) Fix the Pair.java code about the hidden danger of NullPointException (#31371)
* 修复Pair类 first 或 second 为null时,调用equals和toString 抛NullPointException问题

* add license
2024-02-26 19:07:10 +08:00
7a1caf4718 [refactor](wg) enable wg by default and init normal wg in constructor (#31373)
should always enable workload group because other operations depend on it for example MTMV, and spill to disk.
the normal workload group should be created in constructor.
2024-02-25 18:08:19 +08:00
aee49adf1e [opt](compute-node) refactor compute node doc and opt some default config (#31325)
* [opt](compute-node) refactor compute node doc and opt some default config

* 1

* 1
2024-02-24 11:44:53 +08:00