Commit Graph

387 Commits

Author SHA1 Message Date
4608dcb2d9 [fix](agg) fix coredump caused by push down count aggregation (#22699)
fix coredump caused by push down count aggregation
2023-08-09 10:21:20 +08:00
1a8a1e5b16 [Feature](count_by_enum) support count_by_enum function (#22071)
count_by_enum(expr1, expr2, ... , exprN);

Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.
2023-08-06 16:05:14 +08:00
d3b50e3b2a [BUG](date_trunc) fix date_trunc function only handle lower string (#22602)
fix date_trunc function only handle lower string
2023-08-05 12:53:13 +08:00
469886eb4e [FIX](array)fix if function for array() #22553
[FIX](array)fix if function for array() #22553
2023-08-03 19:40:45 +08:00
7a2ff56863 [regression](fix) fix test_round case (#22441) 2023-08-01 11:35:44 +08:00
c1f36639fd [fix](sort) VSortedRunMerger does not return any rows with a large offset value (#22191) 2023-07-31 22:28:13 +08:00
3a1d678ca9 [Fix](Planner) fix parse error of view with group_concat order by (#22196)
Problem:
    When create view with projection group_concat(xxx, xxx order by orderkey). It will failed during second parse of inline view

For example:
    it works when doing 
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but when create view it does not work
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    when creating view, we will doing parse again of view.toSql() to check whether it has some syntax error. And when doing toSql() to group_concat with order by, it add seperate ', ' between second parameter and order by. So when parsing again, it
would failed because it is different semantic with original statement.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solved:
    Change toSql of group_concat and add order by statement analyze() of group_concat in Planner cause it would work if we get order by from view statement and do not analyze and binding slot reference to it
2023-07-31 17:20:23 +08:00
7261845b3d [FIX](complex-type)fix complex type nested col_const (#22375)
for array/map/struct in mysql_writer unpack_if_const only unpack self column not nested , so col_const should not used in nested column.
2023-07-31 14:53:18 +08:00
79289e32dc [fix](cast) fix wrong result of casting empty string to array date (#22281) 2023-07-30 21:15:03 +08:00
b5fa29e138 [fix](bitmap) incorrect result of function 'bitmap_from_array' (#22305) 2023-07-27 22:44:06 +08:00
ae5e39ad26 [opt](Nereids) add double signature back for round like function (#22284)
add double signature back for round like function
2023-07-27 19:10:43 +08:00
341c45974c [round](decimalv2) round precise decimalv2 value (#22258) 2023-07-27 10:00:36 +08:00
163a38a527 [opt](Nereids) support sql cache (#22144)
1. let Nereids support sql cache
2. let legacy planner's sql cache supports union all
2023-07-27 09:57:31 +08:00
8ff487cc4b [fix](cast) fix invalid value error when casting null date value to string then casting to date value (#22223) 2023-07-26 17:59:01 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'mag_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
2023-07-25 11:21:03 +08:00
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
* [round](decimalv2) round decimalv2 to precision value

* update

* update`
2023-07-25 03:29:48 +08:00
21deb57a4d [fix](Nereids) remove double sigature of ceil, floor and round (#22134)
we convert input parameters to double for function ceil, floor and round,
because DecimalV2 could not do these operation. Since we intro DecimalV3,
we should convert all parameters to DecimalV3 to get correct result.
For example, when we use double as parameters, we get wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.017                   | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
DecimalV3 could get correct result
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.0171                  | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
2023-07-24 16:08:00 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
when we select map in order by and limit; be node will coredump
2023-07-22 10:41:55 +08:00
732e0d14ff [Enhancement](window-funnel)add different modes for window_funnel() function (#20563) 2023-07-21 13:57:27 +08:00
5b043a980e [fix](planner)only forbid literal value in AnalyticExpr's order by list (#21819)
* [fix](planner)only forbid literal value in AnalyticExpr's order by list
2023-07-19 09:40:55 +08:00
a9ea138caf [fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899)
When enable two level hash table , if there is zero value in the existing one level hash table, it will cause dead loop when converting to two level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly when doing the converting.
2023-07-18 19:50:30 +08:00
03b575842d [Feature](table function) support explode_json_array_json (#21795) 2023-07-17 11:40:02 +08:00
736d6f3b4c [improvement](timezone) support mixed uppper-lower case of timezone names (#21572) 2023-07-11 09:37:14 +08:00
fba3ae96b9 Revert "[Fix](planner) Set inline view output as non constant after analyze (#21212)" (#21581)
This reverts commit 0c3acfdb7c744decb7b60e372007707a55d14e00.
2023-07-06 20:30:27 +08:00
0c3acfdb7c [Fix](planner) Set inline view output as non constant after analyze (#21212)
Problem:
Select list should be non const when from list have tables or multiple tuples. Or upper query will regard wrong of isConstant
And make wrong constant folding
For example: when using nullif funtion with subquery which result in two alternative constant, planner would treat it as constant expr. So analyzer would report an error of order by clause can not be constant

Solusion:
Change inline view output to non constant, because (select 1 a from table) as view , a in output is no constant when we see
view.a outside
2023-07-06 15:37:43 +08:00
0469c02202 [Test](regression) Temporarily disable quickTest for SHOW CREATE TABLE to adapt to enable_feature_binlog=true (#21247) 2023-07-05 10:12:02 +08:00
8e8a8da2e7 [Improve](regresstest) update collect distinct regress test for array hash (#21417)
this regress sql can make sense of array hashing function is working fine
2023-07-03 12:16:11 +08:00
33fa5dd1e9 [fix](cast) fix coredump of cast string of invalid datetime (#21350)
For sql like select cast("627492340" as datetime); the string is an invalid datetime, function DateV2Value<T>::from_date_str cast it as datetime 2062-74-92 23:40:00, with an out-of-range month and day value, which cause memory violation in function DateV2Value<T>::format_datetime when trying to access s_days_in_month.

==256444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a7c1a5cff8 at pc 0x55a7e5aa3d2a bp 0x7f3b805f0370 sp 0x7f3b805f0368
READ of size 4 at 0x55a7c1a5cff8 thread T390 (FragmentMgrThre)
    #0 0x55a7e5aa3d29 in doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::format_datetime(unsigned int*, bool*) const /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdatetime_value.cpp:1821:31
    #1 0x55a7e5aa3052 in doris::vectorized::DateV2Value<doris::vectorized::DateTimeV2ValueType>::from_date_str(char const*, int, int) /home/zcp/repo_center/doris_master/doris/be/src/vec/runtime/vdatetime_value.cpp:1968:5
    #2 0x55a7d48f0c49 in bool doris::vectorized::read_datetime_v2_text_impl<unsigned long>(unsigned long&, doris::vectorized::ReadBuffer&, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/io/io_helper.h:309:19
    #3 0x55a7ddb21642 in bool doris::vectorized::try_read_datetime_v2_text<unsigned long>(unsigned long&, doris::vectorized::ReadBuffer&, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/io/io_helper.h:409:12
    #4 0x55a7ddb215ec in bool doris::vectorized::try_parse_impl<doris::vectorized::DataTypeDateTimeV2, unsigned int, void*>(doris::vectorized::DataTypeDateTimeV2::FieldType&, doris::vectorized::ReadBuffer&, DateLUTImpl const*, unsigned int) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:839:16
    #5 0x55a7ddb21c84 in auto doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)::operator()<std::integral_constant<bool, false>, std::integral_constant<bool, true>>(void*, auto) const /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1340:38
    #6 0x55a7ddb216f7 in void* std::__invoke_impl<doris::Status, doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::integral_constant<bool, false>, std::integral_constant<bool, true>>(std::__invoke_other, auto&&, std::integral_constant<bool, false>&&, std::integral_constant<bool, true>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #7 0x55a7ddb2167f in std::__invoke_result<void*, std::integral_constant<bool, false>, std::integral_constant<bool, true>>::type std::__invoke<doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::integral_constant<bool, false>, std::integral_constant<bool, true>>(void*&&, std::integral_constant<bool, false>&&, std::integral_constant<bool, true>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96:14
    #8 0x55a7ddb20d14 in std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<std::__detail::__variant::__deduce_visit_result<doris::Status> (*)(doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&)>, std::integer_sequence<unsigned long, 0ul, 1ul>>::__visit_invoke(doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto)&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013:11
    #9 0x55a7ddb20c15 in decltype(auto) std::__do_visit<std::__detail::__variant::__deduce_visit_result<doris::Status>, doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>>(auto&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1714:14
    #10 0x55a7ddb20b6a in decltype(auto) std::visit<doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*)::'lambda'(void*, auto), std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>>(void*&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&, std::variant<std::integral_constant<bool, false>, std::integral_constant<bool, true>>&&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1769:9
    #11 0x55a7ddb205ff in doris::Status doris::vectorized::ConvertThroughParsing<doris::vectorized::DataTypeString, doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute<void*>(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long, bool, void*) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1321:23
    #12 0x55a7ddb1f2c7 in doris::vectorized::FunctionConvertFromString<doris::vectorized::DataTypeDateTimeV2, doris::vectorized::NameCast>::execute_impl(doris::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long>> const&, unsigned long, unsigned long) /home/zcp/repo_center/doris_master/doris/be/src/vec/functions/function_cast.h:1417:20
2023-06-30 10:12:31 +08:00
5506faa7b4 [datetimev2](minor) Add scale parameter for datetimev2 (#21176) 2023-06-27 19:55:35 +08:00
e0b20f0437 [feature](function) add ip function ipv4numtostring (alias inet_ntoa) (#20936) 2023-06-27 10:17:40 +08:00
638aa41988 [fix](planner) fix push filter through agg #21080
In the previous implementation, the check for groupby exprs was ignored. Add this necessary check to make sure it would work

You could reproduce it by runnning belowing sql:

CREATE TABLE t_push_filter_through_agg (col1 varchar(11451) not null, col2 int not null, col3 int not null)
UNIQUE KEY(col1)
DISTRIBUTED BY HASH(col1)
BUCKETS 3
PROPERTIES(
    "replication_num"="1"
);

CREATE VIEW `view_i` AS 
SELECT 
    `b`.`col1` AS `col1`, 
    `b`.`col2` AS `col2`
FROM 
(
    SELECT 
        `col1` AS `col1`, 
        sum(`cost`) AS `col2`
    FROM 
    (
        SELECT 
            `col1` AS `col1`, 
            sum(CAST(`col3` AS INT)) AS `cost` 
        FROM 
            `t_push_filter_through_agg` 
        GROUP BY 
            `col1`
    ) a 
    GROUP BY 
        `col1`
) b;

SELECT SUM(`total_cost`) FROM view_a WHERE `dt` BETWEEN '2023-06-12' AND '2023-06-18' LIMIT 1;
2023-06-25 19:14:20 +08:00
4d84cd8ca1 Revert "Revert "[Test](regression) CCR syncer thrift interface regression test (#20935)" (#20990)" (#21022)
This reverts commit 2a294801f1324a999570158eea3224239eefbb29.
2023-06-21 15:20:21 +08:00
18beb822a3 [FIX](array-type) fix array string output with fe const expr (#21042)
fe foldconstRule make array() function expr with const literal , and would not pass this array literal to be . but we should make fe array string output format is same with be array string output
2023-06-21 11:52:02 +08:00
0cf9de8cef [fix](decimalv3) fix result error when cast a round decimalv3 to double (#20678) 2023-06-21 00:02:48 +08:00
f10258577b [Fix](Planner) Fix group concat with multi distinct and segs (#20912)
Problem:
when use select group_concat(distinct a, 'seg1'), group_concat(distinct b, 'seg2') ... Error would rised
Reason:
Group_concat function regard 'seg' as arguments also, so multi distinct column error would rised
Solved:
let Multi Distinct group_concat function only get first argument as real argument
2023-06-20 21:00:18 +08:00
824bc02603 [Function] Support date function: microsecond() (#20044) 2023-06-20 10:32:54 +08:00
2a294801f1 Revert "[Test](regression) CCR syncer thrift interface regression test (#20935)" (#20990)
This reverts commit dd482b74c849b022862e7cfb1f1d0b933a84e3d2.
2023-06-19 21:38:03 +08:00
fb9fcf460a [fix](leftjoin) fix bug of left and full join with other conjuncts (#20946)
Fix bug of left and full outer join with other conjuncts. When equal matched row count of a probe row exceed batch_size, some times the _join_node->_is_any_probe_match_row_output flag is not set correcty, which result in outputing extra rows for the probe row.
2023-06-19 12:27:06 +08:00
1efd345963 [Enhancement](table) adding information_schema.parameters table (#20259)
this is a virtual table for compatibility information_schema parameters table
2023-06-19 09:05:46 +08:00
dd482b74c8 [Test](regression) CCR syncer thrift interface regression test (#20935) 2023-06-18 00:13:09 +08:00
97135a1cbb [Feature] (json)add json_contains function (#20824) 2023-06-16 15:10:12 +08:00
Pxl
a0d4f11667 [Bug](function) catch error state in function cast to avoid core dump (#20751)
catch error state in function cast to avoid core dump
2023-06-14 17:34:34 +08:00
affe36d32e [test](find_in_set) add find_in_set function test case (#20718) 2023-06-14 09:43:48 +08:00
feb21fc9e9 [fix](group_concat) use default seperator ',' instead of ', ' for group_concat, to be consistant with mysql (#20741) 2023-06-13 17:20:29 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
10134ea8c6 [fix](planner) fix RewriteInPredicateRule may be useless (#20668)
Issue Number: close #20669

RewriteInPredicateRule may cast InPredicate expr's two child to the same type, for example: where cast(age as char) in ('11'), the type of age is int, RewriteInPredicateRule will cast expr's two child type to int. As in the example above, child 0 will be such struct: 
```
child 0: type: int
    |---  child: type : char
            |-- child: type : int
```

Due to the RewriteInPredicateRule cast the type of the expr to int, it will reanalyze stmt, but it will reset stmt first before reanalyze the stmt, and reset opt will change child 0 to such struct:
```
child: type : char
    |-- child: type : int
```
It cause two child's type will be cast to varchar in func castAllToCompatibleType, the logic of RewriteInPredicateRule will be useless.

In 1.1-lts and 1.2-lts, such case  " where cast(age as char) in ('11')"  can't work well,  because func castAllToCompatibleType will cast int to char but int can't cast to char(master can work well because func castAllToCompatibleType will cast int to varchar in such case).
```
MySQL [test]> select user_id from test_cast where cast(age as char) in ('45');
ERROR 1105 (HY000): errCode = 2, detailMessage = type not match, originType=INT, targeType=CHAR(*)
```
2023-06-12 14:39:01 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
dd71e101d3 [fix](case expr) fix coredump of case for null value (#20564)
be coredump when when expr is null:
2023-06-08 20:05:23 +08:00
49f8f20fb1 [fix](regex) String with Chinese characters matching failed (#20493) 2023-06-07 07:27:47 +08:00
ae428c29e2 [feature](planner)(nereids) support user defined variable (#20334)
Support user-defined variables.
After this PR, we can use `set @a = xx` to define a user variable and use it in the query like `select @a`.

the changes of this PR:
1. Support the grammar for `set user variable` in the parser.
2. Add the `userVars` in `VariableMgr` to store the user-defined variables.
3. For the `set @a = xx`, we will store the variable name and its value in the `userVars` in `VariableMgr`.
4. For the `select @a`, we will get the value for the variable name in `userVars`.
2023-06-06 14:35:16 +08:00