Commit Graph

2294 Commits

Author SHA1 Message Date
97fa840324 [feature](multi-catalog)support iceberg hadoop catalog external table query (#22949)
support iceberg hadoop catalog external table query
2023-08-20 19:29:25 +08:00
0838ff4bf4 [fix](Outfile) fix bug that the fileSize is not correct when outfile is completed (#22951) 2023-08-18 22:31:44 +08:00
10abbd2b62 [Feauture](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
7c4870c371 [fix](catalog) fix hive partition prune bug on nereids (#23026) 2023-08-18 18:31:01 +08:00
419e922a69 [fix](json)Fix the bug that does not stop when reading json files (#23062)
* [fix](json)Fix the bug that does not stop when reading json files
2023-08-18 18:23:19 +08:00
4d3113c6e5 Update test_segcompaction_dup_keys_index.groovy (#23046) 2023-08-18 16:52:26 +08:00
18f47f3e6e Update regression-conf.groovy (#23057) 2023-08-18 15:51:17 +08:00
1c3cc77a54 [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236)
* [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty

* add ut

* fix nereids

* fix regression-test
2023-08-18 14:37:49 +08:00
a7771ea507 [fix](planner) fix current_timestamp param type mismatch when doing stream load (#23092)
FileLoadScanNode did not analyze the default value expr, result in target param type int32 become int8 as the original IntLiteral type.
2023-08-18 14:28:45 +08:00
a8d63ef93b [fix](case) Update test_dup_tab_auto_inc_10000.groovy, add sync after streamload #23082 2023-08-18 14:20:31 +08:00
795006ea3d [fix](multi-catalog) conversion of compatible numerical types (#23113)
Hive support schema change, but doesn't rewrite the parquet file, so the physical type of parquet file may not equal the logical type of table schema.
2023-08-18 14:05:33 +08:00
2d96d19030 [FIX](array-func) fix array() with decimal type (#23117)
if we write sql with : select array(1.0,2.0,null, null,2.0)
here will pass arg type with uint8 to be which does not match array() func sign with deicmal, and make be core. so here should cast from be and make null tag to cast decimal type
2023-08-18 12:12:50 +08:00
Pxl
59c6139aa5 [Chore](parser) fix create view failed when view contained cast as varchar (#23043)
fix create view failed when view contained cast as varchar
2023-08-18 11:50:18 +08:00
e6fe8c05d1 [fix](inverted index change) fix update delete bitmap incompletely when build inverted index on mow table (#23047) 2023-08-18 11:15:39 +08:00
d018ac8fb7 fix show grants throw NullPointerException (#22943) 2023-08-18 10:48:56 +08:00
314f5a5143 [Fix](orc-reader) Fix filling partition or missing column used incorrect row count. (#23096)
[Fix](orc-reader) Fix filling partition or missing column used incorrect row count.

`_row_reader->nextBatch` returns number of read rows. When orc lazy materialization is turned on, the number of read rows includes filtered rows, so caller must look at `numElements` in the row batch to determine how
many rows were not filtered which will to fill to the block.

In this case, filling partition or missing column used incorrect row count which will cause be crash by `filter.size() != offsets.size()` in filter column step.

When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.
2023-08-17 23:26:11 +08:00
b91bb9f503 [fix](alter table property) fix alter property if rpc failed (#22845)
* fix alter property

* add regression case

* do not repeat
2023-08-17 18:02:34 +08:00
11d76d0ebe [fix](Nereids) non-inner join should not merge dist info (#22979)
1. left join should use left dist info.
2. right join should use right dist info.
3. full outer join should return ANY dist info.
2023-08-17 17:48:50 +08:00
330f369764 [enhancement](file-cache) limit the file cache handle num and init the file cache concurrently (#22919)
1. the real value of BE config `file_cache_max_file_reader_cache_size` will be the 1/3 of process's max open file number.
2. use thread pool to create or init the file cache concurrently.
    To solve the issue that when there are lots of files in file cache dir, the starting time of BE will be very slow because
    it will traverse all file cache dirs sequentially.
2023-08-17 16:52:08 +08:00
d7a6b64a65 [Fix](Planner) fix case function with null cast to array null (#22947) 2023-08-17 16:37:07 +08:00
e289e03a1a [fix](executor)fix no return with old type in time_round 2023-08-17 15:34:26 +08:00
d59c2f763f [fix](test) add sync for test_pk_uk_case (#23067) 2023-08-17 15:18:07 +08:00
92c8f842f7 [fix](nereids) dphyper join reorder use wrong method to get hash and other conjuncts (#22966)
should use getHashJoinConjuncts() and getOtherJoinConjuncts() to get hash and other conjuncts of hash join node instead of categorizing them by checking if it's 'EqualTo' expression
2023-08-17 11:03:45 +08:00
a288377118 [fix](regresstion) Fix sql server external case (#23031) 2023-08-17 10:54:54 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
390c52f73a [Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927) 2023-08-16 20:47:36 +08:00
f1880d32d9 [fix](nereids)bind slot failed because of "default_cluster" #23008
slot bind failed for following querys:
select tpch.lineitem.* from lineitem
select tpch.lineitem.l_partkey from lineitem

the unbound slot is tpch.lineitem.l_partkey, but the bounded slot is default_cluster:tpch.lineitem.l_partkey. They are not matched.
we need to ignore default_cluster: when compare dbName
2023-08-16 17:22:44 +08:00
92f443b3b8 [enhancement](Nereids): count(1) to count(*) #22999
add a rule to transform count(1) to count(*)
2023-08-16 17:19:23 +08:00
4510e16845 [improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933)
Previously, delete statement with conditions on value columns are only supported on duplicate tables. After we introduce delete sign mechanism to do batch delete, a delete statement with conditions on value columns on unique tables will be transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique table with merge-on-write enabled, the overhead of inserting these data can be eliminated. So this PR add the ability to allow delete predicate on value columns for merge-on-write unique tables.
2023-08-16 12:18:05 +08:00
3efa06e63e [Fix](View)varchar type conversion error (#22987) 2023-08-16 11:49:04 +08:00
c41179b8e9 [fix](regression) Improve the robustness when close target connection (#23012) 2023-08-16 11:42:58 +08:00
221e7bdd17 [test](jdbc external) fix mysql and pg external regression test (#22998) 2023-08-16 10:44:47 +08:00
cb6678adb9 [fix](case) Update repositoryAffinityList1.sql (#22941) 2023-08-16 09:23:46 +08:00
c8c46e042d [Improve](regress-test)add regress test for map_agg with nested type and insert to doris inner table #23006 2023-08-16 09:21:02 +08:00
423002b20a [fix](nereids) partitionTopN & Window estimation (#22953)
* partitionTopN & winExpr estimation

* tpcds 44/47/57
2023-08-15 20:19:03 +08:00
80566f7fed [stats](nereids)support partition stats (#22606) 2023-08-15 17:52:25 +08:00
7de362f646 [fix](Nereids): expand other join which has or condition (#22809) 2023-08-15 16:49:19 +08:00
f1864d9fcf [fix](function) fix str_to_date with specific format #22981 2023-08-15 15:30:48 +08:00
9b42093742 [feature](agg) Make 'map_agg' support array type as value (#22945) 2023-08-15 14:44:50 +08:00
707a527775 [FIX](map)insert into doris table with array/map type by local tvf (#22955) 2023-08-15 13:11:23 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
It's a pity that experiment shows that the original way for parsing plain CSV is faster. Therefor, the refactor is only applied on enclose related code. The plain CSV parser use the original logic.

Fallback of performance is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write column behavior, proved by the flame graph.
 
Trimming escape will be enable after fix: #22411 is merged

Cases should be discussed: 

1. When an incomplete enclose appears in the beginning of a large scale data, the line delimiter will be unreachable till the EOF, will the buffer become extremely large?
2. What if an infinite line occurs in the case? Essentially,  `1.` is equivalent to this.  

Only support stream load as trial in this PR, avoid too many unrelated changes. Docs will be added when `enclose` and `escape` is available for all kinds of load.
2023-08-15 09:23:53 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
Pxl
49d503911e [MV](exec) disable create mv with select star (#22895) 2023-08-13 19:28:51 +08:00
bddab94121 [Enhancement](partial update) Support including delete sign column in partial update stream load (#22874) 2023-08-13 10:32:21 +08:00
41ff48f838 [regresstion][external]fix case test_show_where and es_query 0811 (#22898) 2023-08-12 19:41:55 +08:00
23094a01d4 [fix](test) load data inpath will remove the data in hdfs (#22908)
Load data from hdfs in hive will move the source directory into table's location directory, leading the error like Can not get first file, please check uri in tvf test.
2023-08-12 15:12:00 +08:00
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
5b09254fac [improvement](external statistics)Fix external stats collection bugs (#22788)
1. Collect external table row count when execute analyze database.
2. Support show cached table stats (row count)
3. Support alter external table column stats.
4. Refresh/Invalidate table row count stat memory cache when analyze task finished and drop table stats.
2023-08-11 21:58:24 +08:00
44475b64ef [fix](pg test) fix postgresql jdbc catalog test case (#22875) 2023-08-11 20:50:47 +08:00
28561f77e9 [fix](regression)fix test_hdfs_tvf regression_test out file : decimalv3 -> decimal (#22852) 2023-08-11 20:44:18 +08:00