347cceb530
[Feature](inverted index) push count on index down to scan node ( #22687 )
...
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-09-02 22:24:43 +08:00
228f0ac5bb
[Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog ( #23021 )
2023-09-02 12:46:33 +08:00
68aa4867b0
[fix](map_agg) lost scale information for decimal type ( #23776 )
2023-09-02 08:03:33 +08:00
657e927d50
[fix](json)Fix the bug that read json file Out of bounds access ( #23411 )
2023-09-02 01:11:37 +08:00
6630f92878
[Enhancement](Load) stream tvf support json ( #23752 )
...
The stream TVF now supports JSON format. Example json_file.json:
[{"id":1, "name":"ftw", "age":18}]
[{"id":2, "name":"xxx", "age":17}]
[{"id":3, "name":"yyy", "age":19}]
Example request:
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select id, name from http_stream(\"format\" = \"json\", \"strip_outer_array\" = \"true\", \"read_json_by_line\" = \"true\")" -T /root/json_file.json http://127.0.0.1:8030/api/_http_stream
2023-09-02 01:09:06 +08:00
75e2bc8a25
[function](bitmap) support bitmap_to_base64 and bitmap_from_base64 ( #23759 )
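...
A hedged round-trip sketch of the two new functions (bitmap_from_string and bitmap_to_string are existing Doris functions used here only for illustration):
```sql
-- serialize a bitmap to a base64 string, then restore and print it
select bitmap_to_string(bitmap_from_base64(bitmap_to_base64(bitmap_from_string('1,2,3'))));
```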
2023-09-02 00:58:48 +08:00
32853a529c
[Bug](cte) fix multi cast data stream source not open expr ( #23740 )
...
fix multi cast data stream source not open expr
2023-09-01 14:57:12 +08:00
eaf2a6a80e
[fix](date) return correct date value even if out of the range of the date dictionary ( #23664 )
...
PR https://github.com/apache/doris/pull/22360 and PR https://github.com/apache/doris/pull/22384 optimized the performance of the date type. However, Hive supports dates outside 1970~2038, leading to wrong date values in the TPC-DS benchmark.
How to fix:
1. Increase the dictionary range to 1900 ~ 2038.
2. Dates outside 1900 ~ 2038 are regenerated.
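The effect can be illustrated with a sketch (the table and column names are hypothetical):
```sql
-- assuming an external Hive table `t` with a DATE column `d`
-- holding values outside the old 1970~2038 dictionary range
select d from t where d < '1970-01-01';
-- such rows previously produced wrong date values; they are now returned correctly
```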
2023-09-01 14:40:20 +08:00
52e645abd2
[Feature](Nereids): support cte for update and delete statements of Nereids ( #23384 )
2023-08-31 23:36:27 +08:00
e680d42fe7
[feature](information_schema)add metadata_name_ids for quickly getting catalogs, dbs and tables, and add profiling table for compatibility with MySQL ( #22702 )
...
Add information_schema.metadata_name_ids for quickly getting catalogs, databases and tables.
1. Table structure:
```mysql
mysql> desc internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID | BIGINT | Yes | false | NULL | |
| CATALOG_NAME | VARCHAR(512) | Yes | false | NULL | |
| DATABASE_ID | BIGINT | Yes | false | NULL | |
| DATABASE_NAME | VARCHAR(64) | Yes | false | NULL | |
| TABLE_ID | BIGINT | Yes | false | NULL | |
| TABLE_NAME | VARCHAR(64) | Yes | false | NULL | |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec)
mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
CATALOG_ID: 113008
CATALOG_NAME: hive1
DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
TABLE_ID: 114009
TABLE_NAME: dates
1 row in set (0.07 sec)
```
2. When you create or drop a catalog, there is no need to refresh the catalog.
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)
mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec)
mysql> create catalog hive3 ...
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```
3. When you create or drop a table, there is no need to refresh the catalog.
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)
mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec)
```
4. You can set a query timeout to prevent queries from taking too long.
```
fe.conf: query_metadata_name_ids_timeout
the time allowed for obtaining all tables in one database
```
5. Add information_schema.profiling for compatibility with MySQL.
```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)
mysql> set profiling=1;
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
3a34ec95af
[FE](function) add date_floor/ceil in FE function ( #23539 )
2023-08-31 19:26:47 +08:00
7379cdc995
[feature](nereids) support subquery in select list ( #23271 )
...
1. add scalar subquery's output to LogicalApply's output
2. for IN and EXISTS subqueries, add the mark join slot to LogicalApply's output
3. forbid pushing down alias through join if the project list has any mark join slots
4. move the normalize aggregate rule to the analysis phase
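A sketch of a query shape this enables (table and column names are hypothetical):
```sql
-- scalar subquery in the select list
select o.id,
       (select max(p.price) from p where p.id = o.id) as max_price
from o;
```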
2023-08-31 15:51:32 +08:00
96c4471b4a
[feature](udf) udf array/map support decimal and update doc ( #23560 )
...
* update
* decimal
* update table name
* remove log
* add log
2023-08-31 07:44:18 +08:00
7f4f39551a
[Bug](materialized-view) fix change base schema when create mv ( #23607 )
...
* fix change base schema when create mv
* fix
* fix
2023-08-30 21:00:12 +08:00
05771e8a14
[Enhancement](Load) stream Load using SQL ( #23362 )
...
Use stream load in SQL mode.
For example, example.csv:
10000,Beijing
10001,Tianjin
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00
a136836770
[feature](Nereids) add two functions: char and convert ( #23104 )
...
Add the [char](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/string-functions/char/?_highlight=char) function:
```
mysql> select char(68, 111, 114, 105, 115);
+--------------------------------------+
| char('utf8', 68, 111, 114, 105, 115) |
+--------------------------------------+
| Doris |
+--------------------------------------+
```
The convert function:
```
MySQL root@127.0.0.1:(none)> select convert(1 using gbk);
+-------------------+
| convert(1, 'gbk') |
+-------------------+
| 1 |
+-------------------+
```
2023-08-30 17:09:06 +08:00
103fa4eb55
[feature](Export) support export with nereids ( #23319 )
2023-08-29 19:36:19 +08:00
94a8fa6bc9
[bug](function) fix explode_number function return wrong rows ( #23603 )
...
Before this fix, the explode_number function returned random results with a const value: because _cur_size was reset, values could not be inserted into the column.
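A hedged usage sketch (the table name is hypothetical; the Doris docs name the generating function explode_numbers(n), producing rows 0..n-1):
```sql
select k from src lateral view explode_numbers(3) tmp as k;
-- per the docs, yields the rows 0, 1 and 2
```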
2023-08-29 19:02:49 +08:00
cc1509ba11
[fix](view) The parameter positions of the timestamp diff function are reversed when converted to SQL ( #23601 )
2023-08-29 18:30:16 +08:00
84006dd8c7
[Fix](Full compaction) Fix full compaction regression test ( #23487 )
2023-08-29 18:27:19 +08:00
f7a3d2778a
[FIX](array)update array OlapConvertor and support arrays nesting other complex types ( #23489 )
...
* update array OlapConvertor and support arrays nesting other complex types
* update for inverted index
2023-08-29 16:18:11 +08:00
4c00b1760b
[feature](partial update) Support partial update for broker load ( #22970 )
2023-08-29 14:41:01 +08:00
7dcde4d529
[bug](decimal) Use max value as result if overflow ( #23602 )
...
* [bug](decimal) Use max value as result if overflow
* update
2023-08-29 13:26:25 +08:00
0128dd42d9
[fix](regexp_extract_all) fix be OOM when querying with regexp_extrac… ( #23284 )
2023-08-29 10:34:12 +08:00
6f3e2a30e6
[Feat](Nereids) Add leading and ordered hint ( #22057 )
...
Add leading hint and ordered hint. Usage:
select /*+ ordered */ * from a join b on xxx; which will limit join order to the original order
select /*+ leading({b a}) */ * from a join b on xxx; which will change join order to b join a.
2023-08-28 21:04:40 +08:00
8e4c0d1e81
[Bug](materialized-view) fix divide double can not match mv ( #23504 )
...
* fix divide double can not match mv
* fix
* fix
2023-08-28 18:01:08 +08:00
3049533e63
[Bug](materialized-view) fix core dump on create materialized view when different mv columns have the same reference base column ( #23425 )
...
* Remove redundant predicates on scan node
update
fix core dump on create materialized view when different mv columns have the same reference base column
Revert "update"
This reverts commit d9ef8dca123b281dc8f1c936ae5130267dff2964.
Revert "Remove redundant predicates on scan node"
This reverts commit f24931758163f59bfc47ee10509634ca97358676.
* update
* fix
* update
* update
2023-08-28 14:40:51 +08:00
c05319b8eb
[fix](agg) incorrect result of bitmap_agg and bitmap_union ( #23558 )
2023-08-28 14:22:19 +08:00
f7d2c1faf6
[feature](Nereids) support select key encryptKey ( #23257 )
...
Add select key
```
- CREATE ENCRYPTKEY key_name AS "key_string"
- select key my_key
+-----------------------------+
| encryptKeyRef('', 'my_key') |
+-----------------------------+
| ABCD123456789 |
+-----------------------------+
```
2023-08-28 14:07:26 +08:00
e84989fb6d
[feature](Nereids) support map type ( #23493 )
2023-08-28 11:31:44 +08:00
d19dcd6bc1
[improve](jdbc catalog) support sqlserver uniqueidentifier data type ( #23297 )
2023-08-28 10:30:10 +08:00
a5761a25c5
[feature](move-memtable)[7/7] add regression tests ( #23515 )
...
Co-authored-by: laihui <1353307710@qq.com>
2023-08-26 17:52:10 +08:00
40be6a0b05
[fix](hive) do not split compress data file and support lz4/snappy block codec ( #23245 )
...
1. Do not split compressed data files
Some data files in Hive are compressed with gzip, deflate, etc.
These kinds of files cannot be split.
2. Support lz4 block codec
For the hive scan node, use lz4 block codec instead of lz4 frame codec.
3. Support snappy block codec
For hadoop snappy.
4. Optimize the `count(*)` query of csv files
For a query like `select count(*) from tbl`, only the lines need to be split, not the columns.
Need to pick to branch-2.0 after this PR: #22304
2023-08-26 12:59:05 +08:00
f32efe5758
[Fix](Outfile) Fix that no error is reported when exporting a table to S3 with an incorrect ak/sk/bucket ( #23441 )
...
Problem:
A result is returned even when a wrong ak/sk/bucket name is used, such as:
```sql
mysql> select * from demo.student
-> into outfile "s3://xxxx/exp_"
-> format as csv
-> properties(
    -> "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
-> "s3.region" = "ap-beijing",
-> "s3.access_key"= "xxx",
-> "s3.secret_key" = "yyyy"
-> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| 1 | 3 | 26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```
The reason is that the error returned in the `close()` phase was not caught.
2023-08-26 00:19:30 +08:00
8af1e7f27f
[Fix](orc-reader) Fix incorrect result if null partition fields in orc file. ( #23369 )
...
Fix incorrect result if null partition fields in orc file.
### Root Cause
Theoretically, the underlying files of a hive partition table should not contain partition fields. But we found that in some user scenarios the partition fields do exist in the underlying orc/parquet files, with null values. As a result, the pushed-down filters on these partition fields read the null values from the file and filter incorrectly.
### Solution
We handle this case by reading only non-partition fields from the file. The parquet reader already works this way; this PR applies the same handling to the orc reader.
2023-08-26 00:13:11 +08:00
00826185c1
[fix](tvf view)Support Table valued function view for nereids ( #23317 )
...
Nereids doesn't support view-based table valued functions, because a tvf view doesn't contain the proper qualifier (catalog, db and table name). This PR adds support for this.
Also fixes a bug where the explain output exprs of Nereids table valued functions were incorrect.
2023-08-25 21:23:16 +08:00
29273771f7
[Fix](multi-catalog) Fix hive incorrect result by disabling string dict filter if exprs contain null expr. ( #23361 )
...
Issue Number: close #21960
Fix hive incorrect result by disabling string dict filter if exprs contain null expr.
2023-08-25 21:16:43 +08:00
e1367d509f
[Fix](Full compaction) Fix full compaction by table id regression test #23496
2023-08-25 18:07:06 +08:00
1312c12236
Revert "[fix](testcase) fix test case failure of insert null value into not null column ( #20963 )" ( #23462 )
...
* Revert "[fix](testcase) fix test case failure of insert null value into not null column (#20963 )"
This reverts commit 55a6649da962fb170ddb40fea8ef26bdc552a51a.
Manual revert of "fix in strict mode, return error for insert if datatype convert fails (#20378 )"
This manually reverts commit 1b94b6368f5e871c9a0fe53dd7c64409079a4c9d
* fix case failure
2023-08-25 16:47:14 +08:00
6d4f06689f
[fix](Nereids) avoid Stats NaN ( #23445 )
...
tpcds 61 plan changed:
improved from 1.75 sec to 1.67 sec
2023-08-25 16:27:34 +08:00
0ccb7262a7
[feature](Nereids) add password func ( #23244 )
...
add password function
```
select password("123");
+-------------------------------------------+
| password('123') |
+-------------------------------------------+
| *23AE809DDACAF96AF0FD78ED04B6A265E05AA257 |
+-------------------------------------------+
```
2023-08-25 14:04:49 +08:00
8ef6b4d996
[fix](json) fix json int128 overflow ( #22917 )
...
* support int128 in jsonb
* fix jsonb int128 write
* fix jsonb to json int128
* fix json functions for int128
* add nereids function jsonb_extract_largeint
* add testcase for json int128
* change docs for json int128
* add nereids function jsonb_extract_largeint
* clang format
* fix check style
* using int128_t = __int128_t for all int128
* use fmt::format_to instead of snprintf digit by digit for int128
* clang format
* delete useless check
* add warn log
* clang format
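A hedged sketch of the jsonb_extract_largeint function named above (the JSON value is illustrative, chosen to exceed the int64 range):
```sql
-- extract an int128 value that would overflow int64
select jsonb_extract_largeint('{"id": 170141183460469231731687303715884105727}', '$.id');
```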
2023-08-25 11:40:30 +08:00
372f83df5c
[opt](Nereids) remove between expression to simplify planner ( #23421 )
2023-08-25 11:28:12 +08:00
37b90021b7
[fix](planner)literal expr should do nothing in substituteImpl() method ( #23438 )
...
Substituting a literal expr is pointless and wrong. This PR keeps literal exprs unchanged during the substitution process.
2023-08-25 11:21:35 +08:00
18094511e7
[fix](Outfile/Nereids) fix that csv_with_names and csv_with_names_and_types file format could not be exported on nereids ( #23387 )
...
This problem was caused by #21197
Fixed an issue that `csv_with_names` and `csv_with_names_and_types` file format could not be exported on nereids optimizer when using `select...into outfile`.
2023-08-25 11:12:04 +08:00
3786ffec51
[opt](Nereids) add some array functions ( #23324 )
...
1. rename TVFProperties to Properties
2. add generating functions explode and explode_outer
3. fix concat_ws not applying to arrays
4. check the tokenize second argument format on FE
5. add test cases for concat_ws, tokenize, explode, explode_outer and split_by_string
2023-08-25 11:01:50 +08:00
7cfb3cc0aa
[fix](functions) fix function substitute for datetimeV1/V2 ( #23344 )
...
* fix
* function fe
2023-08-25 09:59:38 +08:00
bc3d397759
[fix](case) update .out file, relate to #23272 ( #23455 )
...
Co-authored-by: stephen <hello-stephen@qq.com>
2023-08-25 09:15:27 +08:00
ceb931c513
[regression-test](hdfs_tvf)add regression test that hdfs_tvf reads compressed files ( #23454 )
2023-08-25 09:00:21 +08:00
441a9fff6d
[fix](planner) fix now function param type error ( #23446 )
2023-08-25 00:12:21 +08:00