Commit Graph

1895 Commits

Author SHA1 Message Date
d0cd535cb9 [improvement](insert) refactor group commit stream load (#25560) 2023-10-20 13:27:30 +08:00
dc47087560 [fix](function) fix str_to_date default return type scale for nereids (#24932)
fix str_to_date default return type scale for nereids
2023-10-20 12:55:49 +08:00
7385602b19 [bug](rf) fix only min/max rf return error when has remote target (#25588) 2023-10-19 19:26:29 +08:00
e77b98be88 [fix](months_diff) fix wrong result of months_diff (#25577) 2023-10-19 14:29:47 +08:00
b45f501e51 [improvement](nereids) Support aggregate functions without from clause (#25500)
Support aggregate functions in SELECT without a FROM clause. Here are some examples:

SELECT 1,
    'a',
    COUNT(),
    SUM(1) + 1,
    AVG(2) / COUNT(),
    MAX(3),
    MIN(4),
    RANK() OVER() AS w_rank,
    DENSE_RANK() OVER() AS w_dense_rank,
    ROW_NUMBER() OVER() AS w_row_number,
    SUM(5) OVER() AS w_sum,
    AVG(6) OVER() AS w_avg,
    COUNT() OVER() AS w_count,
    MAX(7) OVER() AS w_max,
    MIN(8) OVER() AS w_min;
2023-10-18 23:07:37 -05:00
2a442972a8 [Fix](merge-on-write) Fix some bugs about sequence column (#24915)
1. Add the sequence column checks and handling from #21896 to insert statements in both the original planner and the Nereids planner.
2. Disallow dropping the sequence mapping column during schema change.
2023-10-18 20:40:12 +08:00
2ddd2e5079 [feature](Nereids) add map_agg function (#25246) 2023-10-18 06:44:36 -05:00
e4a83a22d1 [opt](error msg) Make data codec error clearly when load csv data can't display (#25540)
Co-authored-by: Tanya-W <tanya1218w@163,com>
2023-10-18 16:12:22 +08:00
c77590414e [fix](pipeline)fix case (#25567)
Users in some cases are duplicated.
The order of backups may not be consistent.
2023-10-18 03:03:40 -05:00
62d06584f1 [feature](fe) add function 'BitmapAgg' in nereids (#25508) 2023-10-18 14:24:27 +08:00
64aeeb971b [Fix](partial-update) Correct the alignment process when the table has sequence column and add cases (#25346)
This PR fixes the alignment process in the publish phase when a conflict occurs between concurrent partial updates. If we encounter a row with the same key and a larger sequence column value, another load has introduced a row with the same key and a larger sequence column value, and that load was published successfully after the commit phase of the current load. We should act as follows:

- If the updated columns include the sequence column, we should delete the current row, because the partial update on the current row has been overwritten by the previous row with the larger sequence column value.
- Otherwise, we should combine the values of the missing columns from the previous row with the values of the included columns from the current row into a new row.
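A minimal Python sketch of the two rules above. It assumes a hypothetical row model (a dict from column name to value) and that the caller has already found a conflicting row with the same key and a larger sequence value. `align_on_conflict` and its parameters are illustrative names, not Doris APIs.

```python
from typing import Dict, Optional, Set

# Hypothetical row model: column name -> value.
Row = Dict[str, object]


def align_on_conflict(current: Row, previous: Row, updated_cols: Set[str],
                      key_cols: Set[str], seq_col: str) -> Optional[Row]:
    """Resolve a partial-update row against a conflicting row published
    after the current load's commit phase.

    Returns the row to write, or None if the current row must be deleted.
    """
    same_key = all(current[k] == previous[k] for k in key_cols)
    if not same_key or previous[seq_col] <= current[seq_col]:
        # This sketch only models the same-key, larger-sequence case.
        raise ValueError("expected same key and a larger sequence value")
    if seq_col in updated_cols:
        # The partial update was overwritten by the row with the larger
        # sequence value, so the current row is deleted.
        return None
    # Missing columns come from the previous row; updated columns from ours.
    merged = dict(previous)
    for col in updated_cols:
        merged[col] = current[col]
    return merged
```

Returning `None` for the delete case keeps the two outcomes distinct from the merge case, which always yields a full row.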
2023-10-18 11:32:51 +08:00
b0e0a0569a [Fix](row store) Real default value should be used instead of default… (#25230)
Before this PR, the default value was incorrect; we should use the default value from the Frontend schema.
2023-10-18 10:13:44 +08:00
3225495233 [regression-test](export) Add some tests that use hive external table to read orc/parquet file exported by doris (#25431)
Add some regression tests:

1. Export Doris data to ORC/Parquet files on HDFS.
2. Create Hive external tables to read the exported ORC/Parquet files.
2023-10-18 09:59:15 +08:00
47689fd452 [refactor](jni) unified jni framework for java udf (#25302)
Use the unified JNI framework to refactor Java UDF.
The unified JNI framework uses VectorTable as the container for transferring data between C++ and Java, and hides the details of data format conversion.
In addition, the unified framework supports complex and nested types.
The performance of basic types remains the same, string types improve by 30%, and complex types improve by an order of magnitude.
2023-10-18 09:27:54 +08:00
18c2a13e09 [fix](multi-catalog)fix maxcompute partition filter and session creation (#24911)
Add MaxCompute partition support.
Fix the MaxCompute partition filter.
Change the MaxCompute session creation method.
2023-10-17 22:36:10 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
Compatible with Paimon 0.5.
Add p0 cases for Paimon; running them requires setting enablePaimonTest=true.
2023-10-17 22:07:13 +08:00
b74836050a [chore](config) turn off fuzzy for enable_simdjson_reader (#25521) 2023-10-17 18:42:11 +08:00
9d6b2dceb2 [fix](Nereids) non-slot filter should not be pushed through aggregate (#25525) 2023-10-17 05:02:26 -05:00
af8832389f [feature](Nereids) add 4 array functions (#25488)
- array_concat
- array_pushback
- array_pushfront
- array_zip
2023-10-17 04:45:15 -05:00
652d6c57c0 [fix](jdbc catalog) fix handle oracle date format (#25487) 2023-10-17 02:10:28 -05:00
0ee06f30b0 [feature](nereids)Ignore some node in 'explain shape plan' command (#25485)
If ignore_shape_nodes is set to 'PhysicalDistribute, PhysicalProject', then explain shape plan will not print the project and distribute nodes.
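A Python sketch of the node filtering, assuming a simplified plan tree and two-space indentation rather than the real explain shape plan output format. Ignored nodes are dropped and their children are printed in their place. `PlanNode`, `parse_ignore_list`, and `shape_lines` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PlanNode:
    # Hypothetical, simplified physical plan node.
    name: str
    children: List["PlanNode"] = field(default_factory=list)


def parse_ignore_list(setting: str) -> Set[str]:
    # 'PhysicalDistribute, PhysicalProject' -> {'PhysicalDistribute', 'PhysicalProject'}
    return {part.strip() for part in setting.split(",") if part.strip()}


def shape_lines(node: PlanNode, ignored: Set[str], depth: int = 0) -> List[str]:
    if node.name in ignored:
        # Skip this node; its children take its place at the same depth.
        lines: List[str] = []
        for child in node.children:
            lines.extend(shape_lines(child, ignored, depth))
        return lines
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(shape_lines(child, ignored, depth + 1))
    return lines
```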
2023-10-17 11:57:36 +08:00
410441b516 [enhancement](Nereids): remove LAsscom in Bushy Tree RuleSet (#25465)
- Bushy Tree RuleSet doesn't need LAsscom
- fix bug: rule patterns shouldn't use the same name
2023-10-17 11:22:52 +08:00
384fddb2ff [test](case)add some debug log in mv case (#25458)
* [test](case)change the insert stmt in mv case
2023-10-17 11:04:45 +08:00
a383a2bc83 [cases](regresstest)add json format regress test for nested types (#25397) 2023-10-17 10:16:52 +08:00
a364a24ac2 [Enhance](regression) add hive out file check (#25475)
add hive out file check
fix hive sql state with " ; "
2023-10-17 10:11:57 +08:00
ef7d8aa99a [fix](be) fix bug of converting outer join probe block to nullable (#25492)
_do_evaluate adds temporary result columns to the original block, so convert_block_to_null must be called before _do_evaluate to ensure that only the correct columns are converted to nullable.
2023-10-17 10:10:56 +08:00
85b8497624 [fix](Tvf) return empty set when tvf queries an empty file or an error uri (#25280)
### Before:
Errors were returned when a TVF queried an empty file or an invalid URI:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.

### Now:
We now return an empty set when a TVF queries an empty file or an invalid URI.
```sql
mysql> select * from s3( 
"uri" = "https://error_uri/exp_1.csv", 
"s3.access_key"= "xx", 
"s3.secret_key" = "yy", 
"format" = "csv") limit 10;

Empty set (1.29 sec)
```
2023-10-17 09:52:53 +08:00
Pxl
72920fbd1d [Improvement](materialized-view) set job failed when toAgentTaskRequest meet error (#25358)
Set the job to failed when toAgentTaskRequest encounters an error.
2023-10-16 20:10:52 +08:00
b2e3ecb81d [opt](load)change load_to_single_tablet tablet search algorithm from random to round-robin (#25256)
At present, the `load_to_single_tablet` load implementation picks a tablet by taking a random number modulo the tablet count, which does not distribute loads evenly. This leads to uneven disk IO and uneven use of cluster resources. To solve this problem, we use round-robin over each partition's tablets on every load, so that loads are spread evenly across the tablets.

When generating the load query plan, the index of the tablet currently being loaded is passed to BE.
Add a daemon task in FE to regularly clean up `loadTabletRecordMap`. When `getCurrentLoadTabletIndex` is called, the map gets the bucket number of the partition and updates `load_tablet_index`.
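A minimal Python sketch of per-partition round-robin tablet selection. It mirrors the names `loadTabletRecordMap` and `getCurrentLoadTabletIndex` from this PR in snake_case; the lock, the idle-time cleanup policy, and the `now` parameter are assumptions made for the sketch, not the FE implementation.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class LoadTabletRecordMap:
    """Per-partition round-robin tablet index (hypothetical sketch)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # partition id -> (next tablet index, last access time)
        self._records: Dict[int, Tuple[int, float]] = {}

    def get_current_load_tablet_index(self, partition_id: int, bucket_num: int,
                                      now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        with self._lock:
            index, _ = self._records.get(partition_id, (0, now))
            index %= bucket_num  # stay in range if the bucket count shrank
            self._records[partition_id] = ((index + 1) % bucket_num, now)
            return index

    def clean_up(self, max_idle_seconds: float, now: Optional[float] = None) -> None:
        # Intended to be called periodically by a background (daemon) task.
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [pid for pid, (_, last) in self._records.items()
                     if now - last > max_idle_seconds]
            for pid in stale:
                del self._records[pid]
```

Unlike a random number modulo the bucket count, consecutive loads into one partition visit every tablet once per round, so each tablet's share of loads differs by at most one.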
2023-10-16 16:43:25 +08:00
e8431e1a97 [fix](planner)should not add TupleIsNullPredicate for inlineview plan (#25338) 2023-10-16 15:24:13 +08:00
Pxl
292ccaeda8 insert default when json array parse failed (#25447)
Insert default values when parsing a JSON array fails.
2023-10-16 14:51:26 +08:00
0aa50fb256 [fix](nereids)fix regression case: eliminate_outer_join (#25208) 2023-10-16 14:08:36 +08:00
e94fbe169e [Enhance](regression) add hms catalog broker scan case (#25453) 2023-10-16 12:35:46 +08:00
29d4e8ee90 [Fix](Nereids) fix test leading change disable join reorder parameter (#23657)
Problem:
test_leading failed randomly when running in the pipeline.
Reason:
A physical distribute node was generated and chosen as the best plan because no statistics are available for empty tables. The results were therefore unexpected, since the order of plans in the memo cannot be predicted.
Solved:
Add statistics for the columns used in test_leading, and run the test repeatedly in the pipeline.
2023-10-15 22:59:45 -05:00
c482c22a74 [case](regresscases) add regress cases for nested types with csv format (#25355)
This PR:
1. Fixes a heap-use-after-free: calling push_back() on a PODArray with back() as the argument can free the referenced element when the PODArray reaches capacity and reallocates.
2. Adds CSV-format cases for nested types. The CSV files come in two forms: without quotes, or formatted like JSON text.
2023-10-16 11:13:44 +08:00
4c57c31c5c [fix](Nereids) count should not accept complex and json type (#25354) 2023-10-15 22:08:35 -05:00
dfc7d04626 [fix](functions) add quantile_state_empty function signature (#25306) 2023-10-16 11:05:48 +08:00
9649e09aaa [feature](function) support bitmap type in min/max_by agg function (#25430)
support bitmap type in min/max_by agg function
2023-10-16 11:05:32 +08:00
e5ef0aa6d4 [refactor](mysql result format) use new serde framework to tuple convert (#25006) 2023-10-14 19:46:42 +08:00
b946521a56 [enhancement](regression-test) add single stream multi table case (#25360) 2023-10-14 10:59:50 +08:00
283bd59eba [improvement](scanner) Remove the predicate that is always true for the segment (#25366)
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
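A Python sketch of the zonemap check, assuming integer columns and only the comparison operators shown. A predicate is dropped only when the segment's min/max proves that every row satisfies it, and the sketch conservatively keeps it when the segment contains NULLs. `ZoneMap`, `Predicate`, `always_true`, and `prune_predicates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ZoneMap:
    # Hypothetical per-segment statistics for one column.
    min_value: int
    max_value: int
    has_null: bool


@dataclass
class Predicate:
    column: str
    op: str  # one of "<", "<=", ">", ">=", "="
    value: int


def always_true(pred: Predicate, zm: ZoneMap) -> bool:
    if zm.has_null:
        # NULL never satisfies a comparison, so the predicate is not always true.
        return False
    if pred.op == "<":
        return zm.max_value < pred.value
    if pred.op == "<=":
        return zm.max_value <= pred.value
    if pred.op == ">":
        return zm.min_value > pred.value
    if pred.op == ">=":
        return zm.min_value >= pred.value
    if pred.op == "=":
        return zm.min_value == zm.max_value == pred.value
    return False  # unknown operator: keep the predicate


def prune_predicates(preds: List[Predicate], zonemaps: Dict[str, ZoneMap]) -> List[Predicate]:
    # Keep only predicates that still need to be evaluated for this segment.
    return [p for p in preds if not always_true(p, zonemaps[p.column])]
```

With a segment maximum of 100, `col < 101` is removed while `col < 100` is kept, matching the example above.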
2023-10-13 15:25:38 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
509a79988e [FIX](regresstest) fix cases for test_nested_types_insert_into_with_s3 (#25228) 2023-10-13 11:39:29 +08:00
c6824ce1ae [test](fix) unstable case test_jdbc_query_mysql (#25279) 2023-10-12 03:56:38 -05:00
42f8b253aa [function](nereids) support array_apply/array_repeat/group_uniq_array/ipv4numtostring (#25249)
nereids support functions: array_apply/array_repeat/group_uniq_array/ipv4numtostring
2023-10-12 11:08:42 +08:00
Pxl
a0d2b1ec56 [Bug](materialized-view) fix not match mv when some alias on agg (#25321)
Fix MV not being matched when aggregate functions have aliases.
2023-10-12 11:02:55 +08:00
7ca63665b4 [fix](agg) garbled characters in result of map_agg (#25318) 2023-10-12 10:10:55 +08:00
73c3e3ab55 [Feature](x-load) support config min replica num for loading data (#21118) 2023-10-11 21:07:35 +08:00
ba87f7d3a3 [fix](pipelineX) add table sink and some fix in pipelineX (#25314) 2023-10-11 20:18:08 +08:00
f680a2141d [enhancement](regression-test) add routine load json case (#25253) 2023-10-11 19:43:08 +08:00