1185 Commits

Author SHA1 Message Date
0dce725120 [fix](nereids)fix decimalv3 type error of mod operator (#20039) 2023-05-25 17:25:11 +08:00
002c76e06f [vectorized](udaf) support udaf function work with window function (#19962) 2023-05-25 14:38:47 +08:00
8149b757c4 [Feature](Nereids)support insert into select command (#18869)
Support inserting the result of a query into a table, with optional `partition`, `with label`, and `cols` clauses:

```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
  select * from src
)
select k1, k2, k3 from cte
```

We add new classes, InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the OlapTableSink for Nereids.
We create the UnboundOlapTableSink in the parsing phase and bind it, then implement and translate the node to an OlapTableSink.
Finally, we run the command within a transaction.
2023-05-25 10:44:41 +08:00
1dd3a4ed3a [fix](Nereids) fix unstable regression test cases and some bugs (#19999)
Fix bugs:
1. After constant folding, if one child of Or is NULL, return the other child
2. Lead should have three parameters; remove the constructors that supply default values

Cases under nereids_p0 that did not enable Nereids
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql

Cases that should disable Nereids explicitly because the results differ
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy

Unstable regression test case
1. nereids_syntax_p0/join.groovy
2023-05-24 20:34:01 +08:00
a713c225a5 [regressiontest](statistics) Collate and supplement statistics regression test (#19901)
This PR mainly supplements the statistics regression tests. It includes the following:

analyze stats p0 tests:

1. Universal analysis

analyze stats p1 tests:

1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis

manage stats p0 tests:

1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats

TODO:

1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases

Any PR related to statistics should ensure that all of these cases pass!
2023-05-24 20:17:28 +08:00
4aad88abc4 [test](Nereids) fix tpcds shape out file #20002 2023-05-24 17:40:13 +08:00
f14e6189a9 [feature](load-refactor) Unified mysql load use InsertStmt (#19571) 2023-05-24 12:09:16 +08:00
384a0c7aa7 [fix](testcases) Fix some unstable testcases. (#19956)
The test_string_concat_extremely_long_string case exceeds our test limit. Move it to p2 so that it is tested only in the SelectDB test environment.
To stay consistent with MySQL and avoid overflow, q67 must keep its current behavior. Once Nereids and decimalV3 are fully applied, it will be fixed automatically.
In parallel tests, cases running in parallel still affect query stats even though all query stats are cleaned first, so query_stats_test needs to use a unique table.
test_query_sys_tables did not handle some unstable situations. Fixed.
Temporarily disable the unstable analyze_test case for p0.
2023-05-24 09:52:02 +08:00
a6674bb7b1 [regression](nereids) tpcds sf100 plan shape regression cases (#19913) 2023-05-23 18:48:00 +08:00
35f8fc22f2 [testcase](test) Fix query stats test that may fail (#19958) 2023-05-23 18:33:07 +08:00
a434a49f71 [Bug](decimal) fix mod function (#19925)
Bug:
select id, kdcml * ktint, kdcml / ktint, kdcml % ktint from expr_test order by id;
+------+-------------------+-------------------+-----------------------+
| id | kdcml * ktint | kdcml / ktint | kdcml % ktint |
+------+-------------------+-------------------+-----------------------+
| NULL | NULL | NULL | NULL |
| 1 | 24.395 | 24.395 | -4702111234474983.74 |
| 2 | 68.968 | 17.242 | -4702111234474983.74 |
| 3 | 146.268 | 16.252 | -4702111234474983.74 |
| 4 | 275.772 | 17.235 | -4702111234474983.74 |
| 5 | 487.470 | 19.498 | -4702111234474983.74 |
| 6 | 827.244 | 22.979 | -4702111234474983.74 |
| 7 | 1364.860 | 27.854 | -4702111234474983.74 |
| 8 | 2205.928 | 34.467 | -4702111234474983.74 |
| 9 | 3509.595 | 43.328 | -4702111234474983.74 |
| 10 | 5514.790 | 55.147 | -4702111234474983.74 |
| 11 | 8578.988 | 70.900 | -4702111234474983.74 |
| 12 | 13235.484 | 91.913 | -4702111234474983.74 |
| 13 | 24.395 | 24.395 | -4702111234474983.74 |
| 14 | 68.968 | 17.242 | -4702111234474983.74 |
| 15 | 146.268 | 16.252 | -4702111234474983.74 |
| 16 | 275.772 | 17.235 | -4702111234474983.74 |
| 17 | 487.470 | 19.498 | -4702111234474983.74 |
| 18 | 827.244 | 22.979 | -4702111234474983.74 |
| 19 | 1364.860 | 27.854 | -4702111234474983.74 |
| 20 | 2205.928 | 34.467 | -4702111234474983.74 |
| 21 | 3509.595 | 43.328 | -4702111234474983.74 |
| 22 | 5514.790 | 55.147 | -4702111234474983.74 |
| 23 | 8578.988 | 70.900 | -4702111234474983.74 |
| 24 | 13235.484 | 91.913 | -4702111234474983.74 |
2023-05-23 18:24:31 +08:00
c88ba85e10 [Bug](schema-change) fix varchar can not change to datev2 #19952 2023-05-23 18:18:55 +08:00
4398b91576 [Fix](multi catalog)Change all partition names to lower case (#19816)
An Iceberg table partition name may contain upper-case characters, for example: City=xxx, Nation=xxx.
But in Doris, all column names are lower case. Here we convert the partition name to lower case to keep it consistent with the column name.
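The normalization described above can be illustrated with a minimal Python sketch. It is not the Doris implementation; the function name `normalize_partition_names` and the dictionary representation of a partition spec are assumptions made only for illustration.

```python
def normalize_partition_names(partition_values):
    """Return a copy of the mapping with every partition name lower-cased.

    Hypothetical helper: partition values are left untouched, only the
    names are changed so they can be matched against lower-case columns.
    """
    return {name.lower(): value for name, value in partition_values.items()}


# Hypothetical Iceberg-style partition spec with mixed-case names.
spec = {"City": "xxx", "Nation": "xxx"}
normalized = normalize_partition_names(spec)
```

Only the names are lower-cased; the partition values stay as they are.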
2023-05-23 09:31:31 +08:00
bd74890cf7 [fix](multi-catalog) JDBC Catalog Unknown UNSIGNED type of mysql, type: [DOUBLE] (#19912) 2023-05-23 09:29:57 +08:00
6762af3c9b [Improve](struct)improve struct support into outfile (#19894)
support select into outfile for struct type
2023-05-22 18:45:56 +08:00
Pxl
9945067e3c [Bug](function) make VcompoundPred optimization work well (#19870)
make VcompoundPred optimization work well
PR #19818 tried to enable the VcompoundPred optimization but got wrong results on tpcds q28.
The reason is that some nullable logic in MySQL needs special handling.

mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
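The four results above follow SQL three-valued logic. A minimal Python sketch, using `None` for NULL, reproduces them; the helper names `and3` and `or3` are hypothetical and this is not the VcompoundPred code itself.

```python
def and3(a, b):
    """SQL-style AND over True, False and None (NULL)."""
    if a is False or b is False:
        return False  # FALSE dominates, even when the other side is NULL
    if a is None or b is None:
        return None   # otherwise NULL propagates
    return True


def or3(a, b):
    """SQL-style OR over True, False and None (NULL)."""
    if a is True or b is True:
        return True   # TRUE dominates, even when the other side is NULL
    if a is None or b is None:
        return None   # otherwise NULL propagates
    return False
```

This is why a short-circuit optimization cannot simply stop at the first NULL operand: `NULL AND FALSE` and `NULL OR TRUE` still have definite results.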
2023-05-22 18:32:17 +08:00
Pxl
e9223f6a19 [Feature](aggregation) add agg_state define and ddl support (#19824)
add agg_state define and ddl support
2023-05-22 11:45:53 +08:00
Pxl
d64be9565d [Bug](function) fix function in get wrong result when input const column (#19791)
fix function in get wrong result when input const column
2023-05-22 10:58:29 +08:00
8b9813663d [test](executor)add crud regression test for resource group (#19659)
add crud regression test for resource group (#19659)
2023-05-20 13:49:02 +08:00
ca737c37ee add testcases for inverted index on different datatypes (#19843) 2023-05-20 00:21:34 +08:00
67dc68630b [Improve](complex-type)improve array/map/struct creating and function with decimalv3 (#19830) 2023-05-19 17:43:36 +08:00
609b20bd02 [Feature](planner) use partial update in update from & delete from (#19262) 2023-05-19 09:46:29 +08:00
1d01136b1b [Fix](parquet-reader) Fix partition field conjuncts not work. (#19837)
Fix partition field conjuncts not work.
Add predicate_partition_columns in _slot_id_to_filter_conjuncts (single slot conjuncts) to _filter_conjuncts; the others should already have been added from not_single_slot_filter_conjuncts.
2023-05-19 08:44:02 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

The jsonb_extract function remains because json_extract is already used by the JSON string functions and changing it requires more work. It will be changed later.
2023-05-18 16:16:52 +08:00
851886cc18 [minor](datev2) remove datev2 because datev2 is used by default (#19777) 2023-05-18 13:36:11 +08:00
f43e8cc98f [regressiontest](unionall) Regression_test_similar_query_boolean (#19553)
* regression_test_similar_query

* add the ORDER BY

* update ORDER BY to confirm correctness

---------

Co-authored-by: ZI-MA <chime316@qq.com>
2023-05-18 12:21:32 +08:00
18c1081659 [fix](nereids) fix some nereids bugs (#19711)
1. add json_unquote and json_extract functions
2. remove mv related code in visitPhysicalOlapScan
3. forbid bitmap and hll type for topn node's sort exprs
4. HashDistributionInfo of olap scan node should use the slots from output not the full schema
5. SelectMaterializedIndexWithoutAggregate should use the filter node's output together with the predicate to get the correct mv
6. forbid SimplifyArithmeticRule for decimal type
7. make DecimalLiteral's type and value consistent with each other if the value is decimalv2
8. json_array needs to support empty arguments
2023-05-18 11:33:56 +08:00
88ca4f3e6b [feature](like) make like regexp used as a sql function (#19755) 2023-05-18 10:03:12 +08:00
67668905d6 [Improve](complex-type)add complex type support unique table with regress test #19751
add complex type support unique table with regress test
struct / map / array already support unique tables, but there was no regression test
2023-05-17 21:32:46 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for fold constant for Nereids.
This is the list of executable date-time functions Nereids supports so far:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. Problems solved:
- enable datev2/datetimev2 by default.
- refactor Nereids foldConstantOnFE and support folding nested expressions.
- separate the executable functions into multiple files for easier reading and adding new functions
2023-05-17 21:26:31 +08:00
1eb929e1ca [Bugfix](Jdbc Catalog) fix data type mapping of SQLServer Catalog (#19525)
We map the `money/smallmoney` types of SQL Server to the decimal type of Doris.
2023-05-17 21:02:42 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix three bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2 and can't get the precision from the hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog uses datetimev1 to parse timestamp and converts it to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from the metadata of the file format.
2023-05-17 20:50:15 +08:00
3e661a30c2 [fix](planner)just return non-empty side of ExprSubstitutionMap if one of ExprSubstitutionMap is empty (#19600) 2023-05-17 15:06:43 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
1462e44162 [Bug](topn) fix rowid fetcher merge with empty block (#19712) 2023-05-17 10:56:32 +08:00
Pxl
d784c99360 [Bug](planner) fix unassigned conjunct assigned on wrong node (#19672)
* fix unassigned conjunct assigned on wrong node
2023-05-17 10:28:22 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support array_count function.
array_count: returns the number of non-zero and non-null elements in the given array.
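As a minimal sketch of the counting rule stated above, the Python function below counts the elements that are neither NULL (`None`) nor zero. It mirrors only the one-line description in this commit message and makes no claim about the actual Doris function signature.

```python
def array_count(values):
    """Count elements that are neither None (NULL) nor zero."""
    return sum(1 for v in values if v is not None and v != 0)
```

For example, `array_count([1, 0, None, 2])` counts only `1` and `2`.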
2023-05-16 17:00:01 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
We can support this by adding a new property for the tvf, like:

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

Users can:

- Specify the compression explicitly by setting `"compress_type" = "xxx"`.
- Let Doris infer the compression type from the suffix of the file name (e.g. `file1.gz`).

Currently, we only support reading compressed files in `csv` format, which the BE side already supports.
All that is needed is to analyze `"compress_type"` on the FE side and pass it to BE.
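The two ways of choosing a compression type can be sketched in Python as follows. This is an illustration under assumptions, not the FE code: the function name `resolve_compress_type`, the suffix table, and the `"plain"` fallback are all hypothetical.

```python
import os

# Hypothetical suffix table; the real set of supported suffixes is not
# given in this commit message.
SUFFIX_TO_COMPRESS_TYPE = {".gz": "gz", ".lz4": "lz4", ".bz2": "bz2"}


def resolve_compress_type(file_name, properties):
    """Prefer an explicit "compress_type" property, else infer from the suffix."""
    explicit = properties.get("compress_type")
    if explicit:
        return explicit.lower()
    _, suffix = os.path.splitext(file_name)
    # "plain" is an assumed name for "no compression detected".
    return SUFFIX_TO_COMPRESS_TYPE.get(suffix.lower(), "plain")
```

An explicit property wins over the suffix, so a file with a misleading name can still be read correctly.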
2023-05-16 08:50:43 +08:00
c87e78dc35 [bug](jsonb) fix jsonb query bug When the json key value contains "." (#19185)
Issue Number: close #19173

mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31" |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
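The path handling behind this fix can be illustrated with a small Python sketch: a dot inside double quotes belongs to the key, while an unquoted dot separates keys. The helpers `split_json_path` and `extract` are hypothetical, handle object keys only (no array indexes), and are not the Doris parser.

```python
import json


def split_json_path(path):
    """Split a path such as '$."a.b.c".k1' into keys, honoring double quotes."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    keys, current, in_quotes, has_token = [], [], False, False
    for ch in path[1:]:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoted mode, drop the quote itself
            has_token = True
        elif ch == "." and not in_quotes:
            if has_token:
                keys.append("".join(current))
            current, has_token = [], False
        else:
            current.append(ch)  # includes dots that appear inside quotes
            has_token = True
    if has_token:
        keys.append("".join(current))
    return keys


def extract(doc, path):
    """Walk a JSON document along the keys of the path."""
    node = json.loads(doc)
    for key in split_json_path(path):
        node = node[key]
    return node


doc = '{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}'
value = extract(doc, '$."a.b.c".k1')
```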
2023-05-15 15:43:12 +08:00
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
6748ae4a57 [Feature] Collect the information statistics of the query hit (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 rows in set (0.005 sec)
   ```
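The counting behavior visible in the outputs above can be sketched in Python: every column a query references gets its QueryCount incremented, and columns used in a filter also get their FilterCount incremented. The class `QueryStats` and its `record` method are hypothetical names; this models only what the sample output shows, not the actual implementation.

```python
from collections import defaultdict


class QueryStats:
    """Per-column hit counters for a single table (illustrative only)."""

    def __init__(self):
        self.query_count = defaultdict(int)
        self.filter_count = defaultdict(int)

    def record(self, referenced_columns, filter_columns):
        # A column counts once per query, whether it is selected,
        # grouped on, or filtered on.
        for col in set(referenced_columns) | set(filter_columns):
            self.query_count[col] += 1
        for col in set(filter_columns):
            self.filter_count[col] += 1


# Mirrors: select k0, k1, k2, sum(k3) from baseall where k9 > 1 group by k0, k1, k2
stats = QueryStats()
stats.record(["k0", "k1", "k2", "k3"], ["k9"])
```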
2023-05-15 10:56:34 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. Support exporting the `LARGEINT` data type to the parquet/orc file formats.
2. Export the Doris `DATE/DATETIME` types as the `Date/Timestamp` logical types of the parquet file format.
3. Fix incorrect data when DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1. some encrypt and decrypt functions have wrong blockEncryptionMode
2. topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3. must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
2023-05-12 09:06:16 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
ed8a4b4120 [feature-wip](duplicate_no_keys) skip sort function if the table is duplicate without keys (#19483) 2023-05-11 14:44:16 +08:00