Commit Graph

8289 Commits

Author SHA1 Message Date
69243b3a57 [fix](Nereids): SemiJoinLogicalJoinTranspose shouldn't throw error when eliminate outer failed. (#19566) 2023-05-15 12:31:54 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
5df5c77d39 [fix](Nereids) should not colocate agg when scan data partition is random (#19598) 2023-05-15 11:22:41 +08:00
6748ae4a57 [Feature] Collect the information statistics of the query hit (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 rows in set (0.005 sec)
   ```
2023-05-15 10:56:34 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logic type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
316223ef34 [fix](planner) forbidden query in insert value list (#19493) 2023-05-12 19:46:19 +08:00
4142cc0e8c [fix](merge conflict) fix FE compile error (#19586) 2023-05-12 18:18:22 +08:00
c37d781942 [enchancement](statistics) manually inject table level statistics (#19495)
supports users to manually inject table level statistics.

table stats type:
- row_count

Modify table or partition statistics:
```SQL
ALTER TABLE table_name SET STATS ('k1' = 'v1', ...) 
```

TODO:
- support other table stats type if necessary
- update statistics cache if necessary
2023-05-12 17:03:12 +08:00
26a7f86b66 [improvement](auth)only GRANT_PRIV and USAGE_PRIV can GRANT for RESOURCE (#19547)
only GRANT_PRIV and USAGE_PRIV can GRANT for RESOURCE
2023-05-12 15:47:04 +08:00
26e930eed1 [Fix](multi-catalog) Make BE selection policy works fine when enable prefer_compute_node_for_external_table (#19346) 2023-05-12 15:32:50 +08:00
860ce97622 [feature](torc) support insert only transactional hive table on FE side (#19419)
* [feature](torc) support insert only transactional hive table on FE side

* 3

* commit

* 1
2023-05-12 15:32:26 +08:00
a1da57c63e [opt](Nereids)(WIP) optimize agg and window normalization step 2 #19305
1. refactor aggregate normalization to avoid data amplification before aggregate
2. remove useless aggreagte processing in ExtractAndNormalizeWindowExpression
3. only push distinct aggregate function children

TODO:
1. push down redundant expression in aggregate functions
2. refactor normalize repeat rule
3. move expression normalization and optimization after plan normalization to avoid unexpected expression optimization.
2023-05-12 14:00:13 +08:00
0477a9f5de [fix](dateformat) Fix hour date format (#19569)
Introduced from #19265.
The hour format should support both "5" and "05".
2023-05-12 13:38:41 +08:00
56bc8a762d [decimalv3](literal) use decimalv3 literal if enable_decimal_conversion is true (#19559) 2023-05-12 12:01:54 +08:00
9bf6ecca48 [minor](log) change debug log to info to observe the storage medium change #19529
When user set default_storage_medium to true, the storage medium of all partitions should be SSD,
and cooldown time should be 9999-12-31 23:59:59.
So that it won't change to HDD.

But looks like sometimes it still change to HDD.
So I change the debug log to info to observer it.
2023-05-12 11:02:55 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
2023-05-12 09:06:16 +08:00
a041f8eabe [fix](fe) Fx SimpleDateFormatter thread unsafe issue by replacing to DateTimeFormatter. (#19265)
DateTimeFormatter replace SimpleDateFormat in fe module because SimpleDateFormat is not thread-safe.
2023-05-11 22:50:24 +08:00
d58498841a [fix](Nereids) Should copy JoinReorderContext for PushdownProject (#19508)
1. should copy JoinReorderContext
2. verify bushy tree join reorder
2023-05-11 21:05:12 +08:00
35c4de9fea [fix](Nereids) convert decimalv2 type to decimalv3 type by mistake (#19491) 2023-05-11 19:11:51 +08:00
c5a53e0caa [tpch](nereids) estimate cost with unknown column stats #19046
make nereids generate more reasonable plans with table row count, but without column stats.
TODO: q5 and q7 is not good, because of column correlation
ps_suppkey and ps_partkey
2023-05-11 19:03:11 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
99cef84acf [Feature](Nereids) Add nereids minidump (#18747) 2023-05-11 18:36:30 +08:00
45c89c1d3c [Fix](stats) Stats persistence failed when a column is all null values (#19412) 2023-05-11 17:44:44 +08:00
589dd8a9b3 [Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19387)
If submit a query contains hms tbls which data files are compressed (bz2,lzo,lz4 ...), a error will occurs like this: 

```[INTERNAL_ERROR]Only support csv data in utf8 codec``` . 

This is because `org.apache.doris.planner.external.HiveScanNode`  set `fileFormatType` as `TFileFormatType.FORMAT_CSV_PLAIN` whether the real compress algo of data files are.  This pr try to fix this problem.
2023-05-11 14:53:58 +08:00
6d2070c59d [enhancement](stats) Make stats cache item size configurable (#19205) 2023-05-11 13:59:37 +08:00
dc497e11bb [fix](Nereids) avoid to push top Project of JoinCluster in PushdownProjectThroughJoin (#19441)
We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin

like 

```
 *      Project  (id + 1) if this project is top project of Join Cluster
 *        |     
 *       Join   
 *      /      \         
 *    Join  Join
 *    /  ....
 * Join
```
2023-05-11 13:58:54 +08:00
834bf2eab7 [feature](array) Add array_last lambda function (#18388)
Add array_last lambda function
2023-05-11 13:15:54 +08:00
5167dc1251 [feature](merge-on-write) enable merge on write by default (#19017) 2023-05-11 11:10:48 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from file status, and use the combination of path and modification time to generate cache identifier.
When a file is changed, the modification time will be changed, so the former cache path will be invalid.
2023-05-11 07:50:39 +08:00
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix be will crash when using hive partitions field of `date`, `timestamp`, `decimal` type.
2. Fix hdfs uri decode error when using `timestamp` partition filed which will cause some url-encoding for special chars, such as `%3A` will encode `:`.
2023-05-11 07:49:46 +08:00
95833426e8 [BugFix](table-value-function) Fix backends() tvf (#19452)
Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field type of the `backends()`tvf to bool
2023-05-11 07:49:27 +08:00
2d1f597413 [Fix](statistics)Fix hive table statistic bug (#19365)
Fix hive table statistic bug. Collect table/partition level statistics.
2023-05-11 07:48:58 +08:00
41d4ed8367 [Improvement](multicatalog) support show_partitions for hms catalog (#19242)
* [Improvement](multicatalog) support show_partitions for hms catalog

* update according review advice
2023-05-11 01:17:23 +08:00
8845c2cf44 [fix](bdbje) remove System.exit(-1) in BDBEnvironment.close() (#19335)
* https://github.com/apache/doris/issues/18766
2023-05-11 01:01:38 +08:00
0f6c69de53 [Fix](multi-catalog) Fix sync hms event failed when start FE soon. (#19344)
* [Fix](multi-catalog) Fix sync hms event failed when start FE soon after.

* [Fix](multi-catalog) Fix sync hms event failed when start FE soon after.

---------

Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-05-11 01:00:55 +08:00
b129c9901b [improvement](FQDN)Change the implementation of fqdn (#19123)
Main changes:
1. If fqdn is enabled in the configuration file, when fe starts, localAddr will obtain fqdn instead of IP, priority_ Networks will fail
2. The IP and host names of Backend and Front are combined into one field, host. When fqdn is enabled, it represents the host name, and when not enabled, it represents the IP address
3. The communication between clusters directly uses fqdn, and various Connection pool add authentication mechanisms to prevent the IP address of the domain name from changing and the connection between nodes from making errors
4. No longer requires polling to verify if the IP has changed, delete fqdnManager
5. Change the method of verifying the legitimacy of nodes between FEs from obtaining client IP to displaying the identity of the transmitting node itself in the HTTP request header or the message body of the throttle
6. When processing the heartbeat, if BE finds that the host stored by itself is inconsistent with the host stored by the master, after verifying the legitimacy of the host, it will change its own host instead of directly reporting an error
7. Simplify the generation logic of fe name

Scope of influence:
1. Establishing communication connections between clusters
2. Determine whether it is the same node through attributes such as IP
3. Print Log
4. Information display
5. Address Splicing
6. k8s deployment
7. Upgrade compatibility

Test plan:
1. Change the IP address of the node, while keeping the fqdn unchanged, change the IP addresses of fe and be, and verify whether the cluster can read and write data normally
2. Use the master code to generate metadata, and use the previous metadata on the current pr to verify whether it is compatible with the old version (upgrading is no longer supported if fqdn has been enabled before)
3. Deploy fe and be clusters using k8s to verify whether the cluster can read and write data normally
4. According to https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn Upgrading old clusters
5. Use streamload to specify the fqdn of fe and be to import data separately
6. Use different users to start transactions and write data using insert statements
2023-05-11 00:44:48 +08:00
3a22af836e [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion (#19463)
* [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion

* add test case
2023-05-10 21:53:30 +08:00
d0a8cd0fc5 [fix](nereids) dphyper join reorder may lost some join conjuncts (#19318) 2023-05-10 19:02:35 +08:00
337732ae01 [fix](nereids) lost exchange before global limit merge node sometimes (#19396)
should add exchange node between global and local limit
2023-05-10 17:57:21 +08:00
894801f5ce [feature](load-refactor) Step1: InsertStmt as facade layer and run S3/Broker Load (#19142) 2023-05-10 17:48:50 +08:00
d20b5f90d8 [feature](executor) Automatically set the instance_num using the info from be. (#19345)
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
2023-05-10 17:07:41 +08:00
4483e3a6e1 [Improvement](scan) add a config for scan queue memory limit (#19439) 2023-05-10 13:14:23 +08:00
ab8cfbbfb6 [bugfix](regression-test) add some window function test (#19460)
Only 2000 union will cause BE use a lot of memory, so that I enable other test in this PR only disable 2000 union case.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-10 12:06:02 +08:00
553068f7be [feat](Nereids): trace enumeration of DPHyp (#19394) 2023-05-10 11:57:35 +08:00
fae2e5fd22 [enchancement](statistics) implement automatically analyzing statistics and support table level statistics #19420
Add table level statistics, support SHOW TABLE STATS statement to show table level statistics.
Implement automatically analyze statistics, support ANALYZE... WITH AUTO ... statement to automatically analyze statistics.
TODO:

collate relevant p0 tests
Supplement the design description to README.md
Issue Number: close #xxx
2023-05-10 11:47:34 +08:00
601565341b [fix](gson) avoid gson serde with EsRepository (#19385)
To avoid error like:

class org.apache.doris.external.elasticsearch.EsRepository declares multiple JSON fields named runnable
2023-05-10 11:37:18 +08:00
78435823b6 [Fix](multi catalog)Return all partition values while reading hive table. (#19434)
Return all partition values while reading hive table.
Add a config item for the max value of hive table to partition list cache.
Default value is 100.
2023-05-10 10:55:33 +08:00
68eb420cab [fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416) 2023-05-10 00:58:09 +08:00
096aa25ca6 [improvement](orc-reader) Implements ORC lazy materialization (#18615)
- Implements ORC lazy materialization, integrate with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` in `parquet_group_reader `to `VExprContext`, used by parquet reader and orc reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether enable lazy materialization.
- Modify `build.sh` to update apache-orc submodule or download package every time.
2023-05-09 23:33:33 +08:00