Commit Graph

464 Commits

Author SHA1 Message Date
e3d764aac5 [test](jdbc) add new jdbc case in other source (#14443) 2022-11-21 21:33:06 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
disable build key range and filters when push down agg work
2022-11-21 12:47:57 +08:00
3f29e3bff6 [bug](test) fix regression test of jdbc postgresql table core (#14417) 2022-11-20 23:03:14 +08:00
5dfe5ef965 [test](hive catalog)add hive catalog test case (#14217) 2022-11-19 17:26:18 +08:00
512b787559 [fix](parquet-reader) fix stack-use-after-return error (#14411) 2022-11-19 10:52:50 +08:00
b4aef889f2 [feature-array](array-function) add array constructor function array() (#14250)
* [feature-array](array-function) add array constructor function `array()`

```
mysql>  select array(qid, creationDate) from nested_c_2  limit 10;
+------------------------------+
| array(`qid`, `creationDate`) |
+------------------------------+
| [1000038, 20090616074056]    |
| [1000069, 20090616075005]    |
| [1000130, 20090616080918]    |
| [1000145, 20090616081545]    |
+------------------------------+
10 rows in set (0.01 sec)
```
2022-11-19 10:49:50 +08:00
02372ca2ea [test](jdbc external table) add new jdbc mysql external table (#14323) 2022-11-19 09:46:48 +08:00
68da6bccb7 [fix](type) fix DECIMAL scale when cast function on fe (#12877)
before:
MySQL [test]> select cast('135.759999999' as DECIMAL(10,3));
+----------------------------------------+
| CAST('135.759999999' AS DECIMAL(10,3)) |
+----------------------------------------+
| 135.759999999 |
+----------------------------------------+
1 row in set (0.00 sec)

now:
MySQL [stage]> select cast('135.759999999' as DECIMAL(10,3));
+----------------------------------------+
| CAST('135.759999999' AS DECIMAL(10,3)) |
+----------------------------------------+
| 135.759 |
+----------------------------------------+
1 row in set (0.01 sec)
2022-11-18 19:36:14 +08:00
eab0af7afe [optimization](array-type) optimize the export precision of floating point numbers (#14261)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-18 18:24:11 +08:00
2c4236fd24 [improvement](ctas) use string type for varchar/char/string (#14382)
When executing create table as select stmt,
the varchar/char/string type of column in created table will be unified to string type.

Because when select from external table (mysql/pg, etc), the length of varchar in external database
is calculated by "char" length, not "byte" length.
So if there is a column with varchar(10) in external table, then there will be a same varchar(10)
in created table. But the byte length of data in external table may be larger than 10, causing failure of CTAS.

Change to string will not impact performance of the capacity of disk storage.
And notice that if a string type column is the first column, it will be changed to varchar(65535),
because we do not allow string type column as sort key column.
2022-11-18 14:20:13 +08:00
a1d02f36ac [feature](table-valued-function) support hdfs() tvf (#14213)
This pr does two things:
1. support `hdfs()` table valued function.
2. add regression test
2022-11-18 14:17:02 +08:00
da0b09caea [fix](Nereids) DateTimeType migrate to DateType is wrong when hour, minute and second all zero (#14327)
1. fix DateTimeType migrate to DateType is wrong when hour, minute and second all zero
2. add TPC-H regression test with DATEV2 type
2022-11-18 01:38:03 +08:00
fb140d0180 [Enhancement](sequence-column) optimize the use of sequence column (#13872)
When you create the Uniq table, you can specify the mapping of sequence column to other columns.
You no longer need to specify mapping column when importing.
2022-11-17 22:39:09 +08:00
50bfd99b59 [feature](join) support nested loop semi/anti join (#14227) 2022-11-17 22:20:08 +08:00
44ee4386f7 [test](multi-catalog)Regression test for external hive orc table (#13762)
Add regression test for external hive orc table. This PR has generated all basic types support by hive orc, and create a hive external table to touch them in docker environment.
Functions to be tested:
1. Ensure that all types are parsed correctly
2. Ensure that the null map of all types are parsed correctly
3. Ensure that the `SearchArgument` of `OrcReader` works well
4. Only select partition columns
2022-11-17 20:36:02 +08:00
7182f14645 [improvement][fix](multi-catalog) speed up list partition prune (#14268)
In previous implementation, when doing list partition prune, we need to generation `rangeToId`
every time we doing prune.
But `rangeToId` is actually a static data that should be create-once-use-every-where.

So for hive partition, I created the `rangeToId` and all other necessary data structures for partition prunning
in partition cache, so that we can use it directly.

In my test, the cost of partition prune for 10000 partitions reduce from 8s -> 0.2s.

Aslo add "partition" info in explain string for hive table.
```
|   0:VEXTERNAL_FILE_SCAN_NODE                           |
|      predicates: `nation` = '0024c95b'                 |
|      inputSplitNum=1, totalFileSize=4750, scanRanges=1 |
|      partition=1/10000                                 |
|      numNodes=1                                        |
|      limit: 10                                         |
```

Bug fix:
1. Fix bug that es scan node can not filter data
2. Fix bug that query es with predicate like `where substring(test2,2) = "ext2";` will fail at planner phase.
`Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef`

TODO:
1. Some problem when quering es version 8: ` Unexpected exception: Index: 0, Size: 0`, will be fixed later.
2022-11-17 08:30:03 +08:00
442b844b22 [regressiontest](delete)delete-where-in-test (#14036)
* delete-where-in-test

* Update test_delete_where_in.groovy

* Update test_delete_where_in.groovy
2022-11-15 18:35:31 +08:00
3ea9d3f2e1 [enhancement](array) support read list(Array) type from orc file (#14132)
Before this pr, if we try to load ORC file with native list(or array) type data, the be will crash.
Because complex types in ORC file include multi real columns, so we need to filter columns by column names.
Otherwise we could not read all columns we need.
Now arrow release-7.0.0 only support create stripe reader by column index, so we patch it to support create stripe reader by column names.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-15 17:48:17 +08:00
70cc725649 [Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209)
* [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions

* update add to stringRef
2022-11-15 16:38:38 +08:00
5badd70db2 [fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196) 2022-11-15 16:06:59 +08:00
f86886f8f5 [Feature](function) Support array_compact function (#14141) 2022-11-15 14:24:37 +08:00
a3062c662c [feature-wip](statistics) support statistics injection and show statistics (#14201)
1. Reduce the configuration options for statistics framework, and add comment for those rest.
2. Move the logic of creation of analysis job to the `StatisticsRepository` which defined all the functions used to interact with internal statistics table
3. Move AnalysisJobScheduler to the statistics package
4. Support display and injections manually for statistics
2022-11-15 11:29:51 +08:00
215a4c6e02 [Bug](BHJ) Fix wrong result when use broadcast hash join for naaj (#14253) 2022-11-15 09:40:00 +08:00
93e5d8e660 [Vectorized](function) support bitmap_from_array function (#14259) 2022-11-15 01:55:51 +08:00
30f36070b5 [test](multi-catalog)Regression test for external hive parquet table (#13611) 2022-11-14 14:10:10 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix error result because not used fields_context
2022-11-14 14:00:55 +08:00
8263c34da6 [fix](ctas) use json_object in CTAS get wrong result (#14173)
* [fix](ctas) use json_object in CTAS get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-14 09:13:05 +08:00
376b4fda9f [fix](scankey) fix extended scan key errors. (#14200)
Issue Number: close #14199
2022-11-12 20:44:09 +08:00
082028b2a2 [test](jdbc postgresql case)add jdbc test case for postgresql (#14162) 2022-11-12 20:43:13 +08:00
78fa167b0a [test](jdbc external table) add jdbc regression test case (#14086) 2022-11-12 20:42:57 +08:00
43490a33a5 [feature-array](array-type) Add array function array_with_constant (#14115)
Return array of constants with length num.

```
mysql> select array_with_constant(4, 1223);
+------------------------------+
| array_with_constant(4, 1223) |
+------------------------------+
| [1223, 1223, 1223, 1223]     |
+------------------------------+
1 row in set (0.01 sec)
```
co-authored-by @eldenmoon
2022-11-11 22:08:43 +08:00
0ba13af8ff [feature](running_difference) support running_difference function (#13737) 2022-11-11 21:22:56 +08:00
2e29b15c6a [test](array function)add array_range function test (#14123)
* add array_range function test

* add array_range function test
2022-11-11 18:04:33 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
8e17fcef3f [fix](cast)fix cast to char(N) error (#14168) 2022-11-11 11:27:51 +08:00
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
84b969a25c [fix](grouping)the grouping expr should check col name from base table first, then alias (#14077)
* [fix](grouping)the grouping expr should check col name from base table first, then alias

* fix fe ut, the behavior would be same as mysql
2022-11-10 11:10:42 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
322ac5cf89 [refractor](array) refractor DataTypeArray from_string (#13905)
refractor DataTypeArray from_string, make it more clear;
support ',' and ']' inside string element, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 16:58:08 +08:00
55ca810445 [fix](Vectorized)fix json_object and json_array function return wrong result on vectorized engine (#13775)
Issue Number: close #13598
2022-11-09 11:26:55 +08:00
572f491756 [fix](ctas) text column type len = 1 when create table as select (#13906)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 09:09:34 +08:00
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
115c6bd411 [fix](keyranges) fix the split error of keyranges (#14049)
fix the split error of keyranges
2022-11-08 22:09:16 +08:00
f7ecb6d79f [Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978)
fix sub_bitmap calculate wrong result to return null
2022-11-08 14:10:12 +08:00
34f43ac781 [bug](like function)fix like '' (empty string) get wrong result with all rows #14035 2022-11-08 08:51:39 +08:00
1c2532b9dc [Bug](udf) Make UDF's type always nullable (#14002) 2022-11-07 20:51:31 +08:00
bb9182d602 [fix](repeat)remove unmaterialized expr from repeat node (#13953) 2022-11-07 14:13:05 +08:00
e8d2fb6778 [feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763)
Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>
2022-11-07 11:50:55 +08:00
7ffe88b579 [feature-array](array-type) Add array function array_popback (#13641)
Remove the last element from array.

```
mysql> select array_popback(['test', NULL, 'value']);
+-----------------------------------------------------+
| array_popback(ARRAY('test', NULL, 'value')) |
+-----------------------------------------------------+
| [test, NULL]                                        |
+-----------------------------------------------------+
```
2022-11-07 10:48:16 +08:00
1724faf9a5 [test](java-udf)add java udf RegressionTest about the currently supported data types #13972 2022-11-05 19:25:58 +08:00