Commit Graph

1162 Commits

Author SHA1 Message Date
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.
2023-05-18 16:16:52 +08:00
851886cc18 [minor](datev2) remove datev2 because datev2 is used by default (#19777) 2023-05-18 13:36:11 +08:00
f43e8cc98f [regressiontest](unionall) Regression_test_similar_query_boolean (#19553)
* regression_test_similar_query

* add the ORDER BY

* update ORDER BY to comfirm correctness

---------

Co-authored-by: ZI-MA <chime316@qq.com>
2023-05-18 12:21:32 +08:00
18c1081659 [fix](nereids) fix some nereids bugs (#19711)
1. add json_unquote and json_extract functions
2. remove mv releated code in visitPhysicalOlapScan
3. forbid bitmap and hll type for topn node's sort exprs
4. HashDistributionInfo of olap scan node should use the slots from output not the full schema
5. SelectMaterializedIndexWithoutAggregate should use the filter node's output together with the predicate to get the correct mv
6. forbid SimplifyArithmeticRule for decimal type
7. make DecimalLiteral's type and value consistent with each other if the value is decimalv2
8. json_array need support empty argument
2023-05-18 11:33:56 +08:00
88ca4f3e6b [feature](like) make like regexp used as a sql function (#19755) 2023-05-18 10:03:12 +08:00
67668905d6 [Improve](complex-type)add complex type support unique table with regress test #19751
add complex type support unique table with regress test
struct / map / array now support unique table but no regress test
2023-05-17 21:32:46 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for fold constant for Nereids.
This is the list of executable date-time function nereids supports up to now:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. solved problem:
- enable datev2/datetimev2 default.
- refactor Nereids foldConstantOnFE and support fold nested expression.
- separate the executable into multi-files for easily-reading and adding new functions
2023-05-17 21:26:31 +08:00
1eb929e1ca [Bugfix](Jdbc Catalog) fix data type mapping of SQLServer Catalog (#19525)
We map `money/smallmoney` types of SQLSERVER into decimal type of doris.
2023-05-17 21:02:42 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix threes bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog use datetimev1 to parse timestamp, and convert to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from meta data of file format.
2023-05-17 20:50:15 +08:00
3e661a30c2 [fix](planner)just return non-empty side of ExprSubstitutionMap if one of ExprSubstitutionMap is empty (#19600) 2023-05-17 15:06:43 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
1462e44162 [Bug](topn) fix rowid fetcher merge with empty block (#19712) 2023-05-17 10:56:32 +08:00
Pxl
d784c99360 [Bug](planner) fix unassigned conjunct assigned on wrong node (#19672)
* fix unassigned conjunct assigned on wrong node
2023-05-17 10:28:22 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support array_count function.
array_count:Returns the number of non-zero and non-null elements in the given array.
2023-05-16 17:00:01 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
We can support this by add a new properties for tvf, like :

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

User can:

Specify compression explicitly by setting `"compression" = "xxx"`.
Doris can infer the compression type by the suffix of file name(e.g. `file1.gz`)
Currently, we only support reading compress file in `csv` format, and on BE side, we already support.
All need to do is to analyze the `"compress_type"` on FE side and pass it to BE.
2023-05-16 08:50:43 +08:00
c87e78dc35 [bug](jsonb) fix jsonb query bug When the json key value contains "." (#19185)
Issue Number: close #19173

mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31" |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
2023-05-15 15:43:12 +08:00
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
6748ae4a57 [Feature] Collect the information statistics of the query hit (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 rows in set (0.005 sec)
   ```
2023-05-15 10:56:34 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logic type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
2023-05-12 09:06:16 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
ed8a4b4120 [feature-wip](duplicate_no_keys) skip sort function if the table is duplicate without keys (#19483) 2023-05-11 14:44:16 +08:00
dc497e11bb [fix](Nereids) avoid to push top Project of JoinCluster in PushdownProjectThroughJoin (#19441)
We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin

like 

```
 *      Project  (id + 1) if this project is top project of Join Cluster
 *        |     
 *       Join   
 *      /      \         
 *    Join  Join
 *    /  ....
 * Join
```
2023-05-11 13:58:54 +08:00
834bf2eab7 [feature](array) Add array_last lambda function (#18388)
Add array_last lambda function
2023-05-11 13:15:54 +08:00
5167dc1251 [feature](merge-on-write) enable merge on write by default (#19017) 2023-05-11 11:10:48 +08:00
71f7e9e185 [test](cast func) add test for cast float text to int when nereids is on #19517 2023-05-11 08:24:54 +08:00
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix be will crash when using hive partitions field of `date`, `timestamp`, `decimal` type.
2. Fix hdfs uri decode error when using `timestamp` partition filed which will cause some url-encoding for special chars, such as `%3A` will encode `:`.
2023-05-11 07:49:46 +08:00
68505a1192 [Test](multi catalog)Add test case for Iceberg External Table. #19488 2023-05-11 01:13:40 +08:00
47edc5a06e [fix](functions) Support nullable column for multi_string functions (#19498) 2023-05-11 01:13:13 +08:00
3a22af836e [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion (#19463)
* [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion

* add test case
2023-05-10 21:53:30 +08:00
d20b5f90d8 [feature](executor) Automatically set the instance_num using the info from be. (#19345)
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
2023-05-10 17:07:41 +08:00
14a56da397 [chore](testcase) change tpcds q67 testcase name to q67_ignore_temporarily (#19227)
change tpcds q67 testcase name to q67_ignore_temporarily since the error should be ignored temporarily
due to precision problem that will be fixed further.
2023-05-10 15:06:23 +08:00
fae2e5fd22 [enchancement](statistics) implement automatically analyzing statistics and support table level statistics #19420
Add table level statistics, support SHOW TABLE STATS statement to show table level statistics.
Implement automatically analyze statistics, support ANALYZE... WITH AUTO ... statement to automatically analyze statistics.
TODO:

collate relevant p0 tests
Supplement the design description to README.md
Issue Number: close #xxx
2023-05-10 11:47:34 +08:00
Pxl
5473795a51 [Bug](scan) forbiden push down in predicate when in_state->use_set is false (#19471)
forbiden push down in predicate when in_state->use_set is false
2023-05-10 11:12:20 +08:00
4302ceaee8 [Improvement](data types) enhance show data types stmt (#18831) 2023-05-09 09:42:44 +08:00
af04c3acab [fix](sequence-column) Fix sequence_col column used default expr insert failed (#18933) 2023-05-08 17:18:25 +08:00
e78149cb65 [Enhencement](Export) add property for outfile/export and add test (#18997)
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
2023-05-08 14:02:20 +08:00
4c6ca88088 Revert "[refactor](function) ignore DST for function from_unixtime (#19151)" (#19333)
This reverts commit 9dd6c8f87b73db238bfd38fb1d76f3796910f398.
2023-05-06 16:33:58 +08:00
3f6e5118e6 [enchancement](statistics) support periodic collection of statistics (#19247)
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:

support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.
2023-05-06 14:53:06 +08:00
a72eee24f1 [fix](nereids) fix merge project with window function bug (#19280)
1. don't merge projects if any window function exists
2. bypass SimplifyArithmeticRule for decimalV3 type
2023-05-06 10:38:14 +08:00
3e3262361c [fix](fe)havingClause should be substituted the same way as resultExprs (#19261)
substituted havingClause in the same way as resultExprs to prevent " HAVING clause not produced by aggregation output" error
2023-05-05 18:03:43 +08:00
817f3ce510 [fix](nereids) plan shape on tpch_sf1T q21 case #19291 2023-05-05 14:24:28 +08:00
9dd6c8f87b [refactor](function) ignore DST for function from_unixtime (#19151) 2023-05-05 11:51:49 +08:00
aaf0ef741e [fix](regression) fix inverted_index_p1 q72.sql timeout error (#19241)
Fix inverted_index_p1 q72.sql timeout error
1、the runtime filter exeed wait time and lead to 100w * 1000w data join
2023-05-04 11:05:15 +08:00
52d25f41a4 [feature](multi-catalog) Rename multi-catalog config 'specified_database_list' to 'include_database_list', and introduce new multi-catalog config 'exclude_database_list' (#18834)
In my scene, We need to specify databases that are excluded to synchronize to doris,
like some databases store temporary table.
Since #17803 introduce `specified_database_list` to specify 'include databases',
this pr introduce new config `exclude_database_list` to specify 'exclude databases',
and rename `specified_database_list` to `include_database_list` for naming symmetry.

BTW, when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` would take effect with higher privilege over `include_database_list`.
2023-05-04 09:30:02 +08:00