
5755 Commits

Author SHA1 Message Date
c98147375d [fix](Nereids) decimal compare float should use double as common type (#19710) 2023-05-17 10:36:04 +08:00
Pxl
d784c99360 [Bug](planner) fix unassigned conjunct assigned on wrong node (#19672)
* fix unassigned conjunct assigned on wrong node
2023-05-17 10:28:22 +08:00
0cae9bb3a1 [UT](decimalv3) fix FE UT when enable decimal conversion (#19701) 2023-05-17 09:55:05 +08:00
54507bb058 [fix](FQDN)fix Checkpoint error (#19678)
Must use Env.getServingEnv() instead of getCurrentEnv(), because here we need to obtain selfNode through the official service catalog.
2023-05-17 08:47:11 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
a1b1aff0ee [improvement](jdbc catalog) Adapt to hana's special view & Optimize jdbc name format (#19696) 2023-05-16 23:29:30 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support array_count function.
array_count: Returns the number of non-zero and non-null elements in the given array.
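
The counting rule can be sketched in Python. This is only an illustration of "non-zero and non-null", not the BE implementation; the function name mirrors the SQL one, and `None` stands in for SQL NULL.

```python
from typing import Optional, Sequence


def array_count(values: Sequence[Optional[float]]) -> int:
    """Count the elements that are neither None (NULL) nor zero."""
    return sum(1 for v in values if v is not None and v != 0)


# Only 1 and 3 are both non-null and non-zero.
print(array_count([1, 0, None, 3, 0]))
```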
2023-05-16 17:00:01 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
3f2d1ae9a4 [feature-wip](multi-catalog)(step1)support connect to max compute (#19606)
Issue Number: #19679

Support connecting to MaxCompute metadata through the ODPS SDK.
2023-05-16 11:30:27 +08:00
9cede6d763 [fix](row-policy) row policy supports external catalog (#19570)
Row policy support external catalog
2023-05-16 08:54:06 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
We can support this by adding a new property to the tvf, like:

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

Users can:

- Specify the compression explicitly by setting `"compress_type" = "xxx"`.
- Let Doris infer the compression type from the suffix of the file name (e.g. `file1.gz`).

Currently, only reading compressed files in `csv` format is supported, and the BE side already supports it.
All that is needed is to analyze `"compress_type"` on the FE side and pass it to BE.
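
A minimal Python sketch of the resolution order described here: an explicit setting wins, otherwise the type is inferred from the file suffix. The suffix table, the `"plain"` fallback label, and the function name are assumptions made for illustration; the set of suffixes Doris actually recognizes may differ.

```python
from typing import Optional

# Hypothetical suffix table, for illustration only.
SUFFIX_TO_COMPRESS_TYPE = {
    ".gz": "gz",
    ".bz2": "bz2",
    ".lz4": "lz4",
}


def resolve_compress_type(file_name: str, explicit: Optional[str] = None) -> str:
    """Return the explicit compress type if given, else infer it from the suffix."""
    if explicit:
        return explicit.lower()
    lowered = file_name.lower()
    for suffix, compress_type in SUFFIX_TO_COMPRESS_TYPE.items():
        if lowered.endswith(suffix):
            return compress_type
    # No known suffix: treat the file as uncompressed (assumed label).
    return "plain"


print(resolve_compress_type("file1.gz"))
print(resolve_compress_type("file1.csv", explicit="lz4"))
```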
2023-05-16 08:50:43 +08:00
8284c342cb [Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19557)
If an hms table's file format is csv, uncompressed data files may coexist with compressed data files, so we need to set compressType separately.
2023-05-16 08:49:45 +08:00
8ec18660fe [improvement](FQDN)Remove unused code (#19638) 2023-05-16 08:48:20 +08:00
6c9c9e9765 [feature-wip](resource-group) Supports memory hard isolation of resource group (#19526) 2023-05-15 22:45:46 +08:00
276e631e9c [chore](ddlExecutor) log class of unknown stmt in DdlExecutor (#19631)
* [chore](ddlExecutor) log class of unknown stmt in DdlExecutor
2023-05-15 21:59:44 +08:00
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
69243b3a57 [fix](Nereids): SemiJoinLogicalJoinTranspose shouldn't throw error when eliminate outer failed. (#19566) 2023-05-15 12:31:54 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix the inconsistent function definition of `Retention`: this function returns tinyint on `FE` but uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
5df5c77d39 [fix](Nereids) should not colocate agg when scan data partition is random (#19598) 2023-05-15 11:22:41 +08:00
6748ae4a57 [Feature] Collect the information statistics of the query hit (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 rows in set (0.005 sec)
   ```
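
The counter behaviour visible in the outputs above can be modelled roughly as follows. This is a sketch inferred only from the example (selected and aggregated columns gain one `QueryCount`; the column used in `where` gains one `QueryCount` and one `FilterCount`); the class and method names are hypothetical and do not reflect the FE code.

```python
from collections import Counter
from typing import Iterable


class ColumnQueryStats:
    """Per-column hit counters, modelled on the `show query stats` output."""

    def __init__(self) -> None:
        self.query_count: Counter = Counter()
        self.filter_count: Counter = Counter()

    def record(self, referenced: Iterable[str], filtered: Iterable[str]) -> None:
        """Record one query: every touched column counts once as queried,
        and columns used in a filter also count once as filtered."""
        filtered_set = set(filtered)
        for column in set(referenced) | filtered_set:
            self.query_count[column] += 1
        for column in filtered_set:
            self.filter_count[column] += 1


stats = ColumnQueryStats()
# select k0, k1, k2, sum(k3) from baseall where k9 > 1 group by k0, k1, k2
stats.record(referenced=["k0", "k1", "k2", "k3"], filtered=["k9"])
print(stats.query_count["k9"], stats.filter_count["k9"])
```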
2023-05-15 10:56:34 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logical type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
316223ef34 [fix](planner) forbidden query in insert value list (#19493) 2023-05-12 19:46:19 +08:00
4142cc0e8c [fix](merge conflict) fix FE compile error (#19586) 2023-05-12 18:18:22 +08:00
c37d781942 [enchancement](statistics) manually inject table level statistics (#19495)
Support manually injecting table-level statistics.

table stats type:
- row_count

Modify table or partition statistics:
```SQL
ALTER TABLE table_name SET STATS ('k1' = 'v1', ...) 
```

TODO:
- support other table stats type if necessary
- update statistics cache if necessary
2023-05-12 17:03:12 +08:00
26a7f86b66 [improvement](auth)only GRANT_PRIV and USAGE_PRIV can GRANT for RESOURCE (#19547)
only GRANT_PRIV and USAGE_PRIV can GRANT for RESOURCE
2023-05-12 15:47:04 +08:00
26e930eed1 [Fix](multi-catalog) Make BE selection policy works fine when enable prefer_compute_node_for_external_table (#19346) 2023-05-12 15:32:50 +08:00
860ce97622 [feature](torc) support insert only transactional hive table on FE side (#19419)
* [feature](torc) support insert only transactional hive table on FE side

* 3

* commit

* 1
2023-05-12 15:32:26 +08:00
a1da57c63e [opt](Nereids)(WIP) optimize agg and window normalization step 2 #19305
1. refactor aggregate normalization to avoid data amplification before aggregate
2. remove useless aggregate processing in ExtractAndNormalizeWindowExpression
3. only push distinct aggregate function children

TODO:
1. push down redundant expression in aggregate functions
2. refactor normalize repeat rule
3. move expression normalization and optimization after plan normalization to avoid unexpected expression optimization.
2023-05-12 14:00:13 +08:00
0477a9f5de [fix](dateformat) Fix hour date format (#19569)
Introduced from #19265.
The hour format should support both "5" and "05".
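
The requirement can be illustrated with a small Python check. The FE fix itself concerns Java's `DateTimeFormatter`; Python's `strptime` is used here only as a stand-in, because its `%H` directive happens to accept both one-digit and zero-padded hours. The format string and function name are assumptions for the example.

```python
from datetime import datetime

# Assumed format for illustration; %H accepts "5" as well as "05".
HOUR_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_datetime(text: str) -> datetime:
    """Parse a datetime string whose hour may or may not be zero-padded."""
    return datetime.strptime(text, HOUR_FORMAT)


single_digit = parse_datetime("2023-05-12 5:38:41")
zero_padded = parse_datetime("2023-05-12 05:38:41")
print(single_digit == zero_padded)
```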
2023-05-12 13:38:41 +08:00
56bc8a762d [decimalv3](literal) use decimalv3 literal if enable_decimal_conversion is true (#19559) 2023-05-12 12:01:54 +08:00
9bf6ecca48 [minor](log) change debug log to info to observe the storage medium change #19529
When the user sets default_storage_medium to true, the storage medium of all partitions should be SSD,
and the cooldown time should be 9999-12-31 23:59:59,
so that it won't change to HDD.

But it looks like it sometimes still changes to HDD,
so I changed the debug log to info to observe it.
2023-05-12 11:02:55 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1. some encrypt and decrypt functions have the wrong blockEncryptionMode
2. topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3. must keep the limit if it's an uncorrelated in-subquery with limit on sort, like `select a from t1 where a in (select b from t2 order by xx limit yy)`
2023-05-12 09:06:16 +08:00
a041f8eabe [fix](fe) Fx SimpleDateFormatter thread unsafe issue by replacing to DateTimeFormatter. (#19265)
Replace SimpleDateFormat with DateTimeFormatter in the fe module because SimpleDateFormat is not thread-safe.
2023-05-11 22:50:24 +08:00
d58498841a [fix](Nereids) Should copy JoinReorderContext for PushdownProject (#19508)
1. should copy JoinReorderContext
2. verify bushy tree join reorder
2023-05-11 21:05:12 +08:00
35c4de9fea [fix](Nereids) convert decimalv2 type to decimalv3 type by mistake (#19491) 2023-05-11 19:11:51 +08:00
c5a53e0caa [tpch](nereids) estimate cost with unknown column stats #19046
make nereids generate more reasonable plans with table row count, but without column stats.
TODO: q5 and q7 are not good because of the column correlation between ps_suppkey and ps_partkey
2023-05-11 19:03:11 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
99cef84acf [Feature](Nereids) Add nereids minidump (#18747) 2023-05-11 18:36:30 +08:00
45c89c1d3c [Fix](stats) Stats persistence failed when a column is all null values (#19412) 2023-05-11 17:44:44 +08:00
589dd8a9b3 [Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19387)
If a submitted query contains hms tbls whose data files are compressed (bz2, lzo, lz4 ...), an error like this will occur:

`[INTERNAL_ERROR]Only support csv data in utf8 codec`

This is because `org.apache.doris.planner.external.HiveScanNode` sets `fileFormatType` to `TFileFormatType.FORMAT_CSV_PLAIN` regardless of the actual compression algorithm of the data files. This pr tries to fix this problem.
2023-05-11 14:53:58 +08:00
6d2070c59d [enhancement](stats) Make stats cache item size configurable (#19205) 2023-05-11 13:59:37 +08:00
dc497e11bb [fix](Nereids) avoid to push top Project of JoinCluster in PushdownProjectThroughJoin (#19441)
We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin

like 

```
 *      Project  (id + 1) if this project is top project of Join Cluster
 *        |     
 *       Join   
 *      /      \         
 *    Join  Join
 *    /  ....
 * Join
```
2023-05-11 13:58:54 +08:00
834bf2eab7 [feature](array) Add array_last lambda function (#18388)
Add array_last lambda function
2023-05-11 13:15:54 +08:00
5167dc1251 [feature](merge-on-write) enable merge on write by default (#19017) 2023-05-11 11:10:48 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from file status, and use the combination of path and modification time to generate cache identifier.
When a file is changed, the modification time will be changed, so the former cache path will be invalid.
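
A rough Python sketch of the idea: derive the cache identifier from both the path and the modification time, so a changed file maps to a different identifier. The choice of SHA-256, the key layout, and the function name are assumptions for illustration and are not taken from the BE code.

```python
import hashlib


def cache_identifier(path: str, modification_time: int) -> str:
    """Combine path and modification time into one cache identifier."""
    key = f"{path}:{modification_time}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()


old_id = cache_identifier("hdfs://ns/warehouse/t/part-0.orc", 1683762639)
new_id = cache_identifier("hdfs://ns/warehouse/t/part-0.orc", 1683766239)
# Same path, newer modification time: the identifier changes,
# so entries cached under the old identifier are no longer looked up.
print(old_id != new_id)
```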
2023-05-11 07:50:39 +08:00
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix BE crash when using hive partition fields of `date`, `timestamp`, or `decimal` type.
2. Fix hdfs uri decode error when using a `timestamp` partition field, which causes url-encoding of special chars, e.g. `:` is encoded as `%3A`.
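
The encoding in question can be reproduced with Python's standard library. This only illustrates generic percent-encoding of a timestamp value; which characters Hive escapes in partition paths may differ, and the sample value is made up.

```python
from urllib.parse import quote, unquote

partition_value = "2023-05-11 07:49:46"

# With no characters marked safe, ':' becomes '%3A' and ' ' becomes '%20'.
encoded = quote(partition_value, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded == partition_value)
```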
2023-05-11 07:49:46 +08:00
95833426e8 [BugFix](table-value-function) Fix backends() tvf (#19452)
Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field type of the `backends()` tvf to bool
2023-05-11 07:49:27 +08:00
2d1f597413 [Fix](statistics)Fix hive table statistic bug (#19365)
Fix hive table statistic bug. Collect table/partition level statistics.
2023-05-11 07:48:58 +08:00
41d4ed8367 [Improvement](multicatalog) support show_partitions for hms catalog (#19242)
* [Improvement](multicatalog) support show_partitions for hms catalog

* update according to review advice
2023-05-11 01:17:23 +08:00