Commit Graph

13721 Commits

Author SHA1 Message Date
fc12362a6d [feature-wip](arrow-flight)(step2) FE support Arrow Flight server (#24314)
This is a POC; the design documentation will be updated soon.
2023-09-20 14:42:54 +08:00
a3361df7b9 [Feat](Nereids) support json and jsonb datatype (#24156)
Feature:
support jsonb and json type in nereids

Document:
This feature supports these two data types in the Nereids optimizer, as the original planner does. The SQL reference is the same as before:
[JSON - Apache Doris](https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-reference/Data-Types/JSON)
2023-09-20 14:32:22 +08:00
e9435c14f8 [Improve](array-func)improve array union support multi params (#24327) 2023-09-20 14:29:48 +08:00
ca56921481 [docs](partition) Auto partition docs (#24574) 2023-09-20 14:28:23 +08:00
8aea31e383 [fix](timezone) fix timezone parse when there is no tzfile (#24578) 2023-09-20 14:28:12 +08:00
aa9f2260ea [fix](multi-catalog)Es catalog needs to verify whether it is a valid configuration. (#24309) 2023-09-20 14:20:57 +08:00
df66922bc0 [Chore](sonar)sonar (C++) configuration file name error (#24662)
FYI https://community.sonarsource.com/t/project-root-configuration-file-none/99389
2023-09-20 13:58:30 +08:00
26ca0b2780 Add some block counter (#24465)
2023-09-20 13:23:01 +08:00
deafa2dd88 [fix](Nereids) fix row count inconsistent when join ordering (#24589)
In the context of reorder join, when a new plan is generated, it may include a project operation. In this case, the newly generated join root and the original join root will no longer be in the same group. To avoid inconsistencies in the statistics between these two groups, we keep the child group's row count unchanged when the parent group expression is a project operation.
2023-09-20 13:11:35 +08:00
901ee7a8d3 [regression](pipelineX) disable pipelineX test cases (#24654) 2023-09-20 13:01:08 +08:00
c0df8fca20 [pipelineX](fix) Fix potential concurrent problem (#24651) 2023-09-20 13:00:58 +08:00
c704497d02 [fix](csv_reader)Fixed bug when parsing multi-character delimiters. (#24572)
2023-09-20 12:41:35 +08:00
075552ead4 [feature](partitions)support batch delete partition (#23986)
ALTER TABLE example_db.my_table
DROP PARTITION p1,
DROP PARTITION p2,
DROP PARTITION p3;
2023-09-20 11:45:52 +08:00
0fb79e4011 [fix](broker-load) fix file offset for compressed file #24564
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2023-09-20 11:41:52 +08:00
a2e29d171a [enhancement](be-meta) sync rocksdb by default to protect data (#24571)
If a user's disks are slow, the user can set the config to false.
Because that is an explicit choice, the user knows what may happen after a kernel panic.
2023-09-20 11:41:26 +08:00
b7ca4fcc8d [fix](io): use try with resource make io stream close automatically to avoid resource leak (#24605) 2023-09-20 11:39:03 +08:00
848290d8a8 [Fix](nereids) Support partial update for insert into table (#24594) 2023-09-20 11:35:09 +08:00
b02398ba85 [fix](planner) statement run successful but log error msg in audit log (#24628)
The legacy planner sets an error message when it throws an AnalysisException.
However, in some places we catch these exceptions and suppress them.
So we should reset the error message and error code.
2023-09-20 11:32:47 +08:00
5a0ccd702c [typo](docs) fix error in routine load doc (#24623) 2023-09-20 11:13:14 +08:00
8316aad417 [chore](macOS) Fix linkage errors (#24642)
Issue Number: close #24643
2023-09-20 10:50:10 +08:00
9a4a4c0760 [opt](Nereids)skip unknown col stats check on __internal_schema and information_schema (#24625)
Columns in __internal_schema and information_schema do not have column stats.
2023-09-20 10:48:05 +08:00
c41cadb64d [fix](broker) fix broker read issue (#24635)
The "length" passed to the broker's pread() method is the buffer length, not the length required from the file,
so it may be larger than the file length.
We should therefore return all the data that was read, instead of returning EOF when the `read()` method returns -1.

I will add a regression test case later, when the framework supports the broker process.
2023-09-20 10:43:16 +08:00
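The read behavior described in c41cadb64d (#24635) can be sketched as follows. This is a minimal illustration, not the broker's code: the function name `pread` and its parameters are hypothetical, and Python signals end of file with an empty `bytes` object where the Java `read()` in the commit message returns -1.

```python
import io


def pread(stream: io.BufferedIOBase, offset: int, buffer_len: int) -> bytes:
    """Read up to buffer_len bytes starting at offset.

    buffer_len is the caller's buffer size, not the number of bytes left
    in the file, so it may exceed what the file can supply.
    """
    stream.seek(offset)
    chunks = []
    remaining = buffer_len
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # End of file: stop reading, but keep what was already read
            # instead of reporting EOF for the whole call.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


data = io.BytesIO(b"hello world")
print(pread(data, 6, 1024))  # b'world'
```

The caller asks for 1024 bytes but only 5 remain after the offset; the sketch returns those 5 bytes rather than an EOF result.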
c3b3f0f00a [enhancement](serialize) add dcheck to ensure pb type is set (#24645)
We should check that the pb's type is set; otherwise deserialization will core dump.
We should not return an unknown type, because deserialization will then core dump.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-09-20 10:42:28 +08:00
49f6eda843 [fix](nested_join) incorrect result of semi/anti mark join (#24616) 2023-09-20 10:41:06 +08:00
14bd290aec [feature](jsonb)support json_length and json_contains function (#24332) 2023-09-20 10:40:44 +08:00
e59aa49f28 [feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff (#24114) 2023-09-20 10:38:56 +08:00
a71d7f2beb [pipelineX](operator) support partition sort operator and distinct streaming agg operator (#24544) 2023-09-20 09:50:51 +08:00
7e17e0d3f7 [fix](Nereids) select outfile column order is wrong (#24595) 2023-09-20 09:27:40 +08:00
527b284e90 [improvement](jdbc catalog) Extend conjunctExprToString to Support both 'AND' and 'OR' with Optimized DateLiteral Handling (#24537) 2023-09-19 23:11:44 +08:00
4f215a7dc3 [Improve](Fe)Ensure that only one FE process uses the metadata file (#24442) 2023-09-19 23:11:20 +08:00
1a553f7e14 [Improve](start-shell)Optimize fe&be startup (#24556)
- sh start_fe/start_be --console is used to instruct the program to run in console mode.
- sh start_fe/start_be --daemon is used to instruct the program to run in daemon mode.
- sh start_fe/start_be without either option starts the program in the background and records output and error logs to the specified file.
2023-09-19 23:00:59 +08:00
420914abfc [Fix](RoutineLoad)multi-table query table error (#24538)
Multi-table load takes all tables and converts each of them to OlapTable, which causes type conversion errors for tables of View type.
2023-09-19 22:57:13 +08:00
19ccb9517f [fix](iceberg) should call UserGroupInformation when enable security authentication (#24614)
Fix two bugs:
1. Call `UserGroupInformation.doAs` when security authentication is enabled
2. `catalogId` is 0 when `IcebergExternalCatalog` is loaded from fe image
2023-09-19 22:39:58 +08:00
32c6f5f905 [opt](test) set longer timeout for hive query cache test case (#24569)
Sometimes the first run of a query takes longer than the previously given threshold, which causes the test to fail.
Also add a new session variable, test_query_cache_hit, so that we can use it in regression tests to check whether the cache is hit.
2023-09-19 22:25:18 +08:00
71dcb58db9 [improvement](scanner_schedule) reduce memory consumption of scanner (#24199)
1. Limit scanners by memory consumption rather than by blocks.
2. The scheduler runs correctly instead of at least 1.
2023-09-19 21:36:23 +08:00
8afdfd58e2 [fix](case) ensure jar downloaded (#24475)
2023-09-19 21:26:12 +08:00
8c502f65f2 [Fix](metrics) fix wrong timer metrics for _seek_columns (#24622) 2023-09-19 20:59:09 +08:00
c3bd2a22d4 [feature](Nereids) add many array functions (#24301)
Add function array_filter, array_sortby, array_last_index, array_first_index, array_orderby, array_count
2023-09-19 18:58:49 +08:00
c9f5142420 [Improve](UNIX_TIMESTAMP) UNIX_TIMESTAMP func support 'yyyy-MM-dd HH:mm:ss' format (#24561)
The format parameter of the UNIX_TIMESTAMP function now supports 'yyyy-MM-dd HH:mm:ss'.
The implementation is the same as that of the date_format function.
before:
```sql
mysql> select UNIX_TIMESTAMP('2023-09-18 00:00:00','yyyy-MM-dd HH:mm:ss');
+--------------------------------------------------------------+
| unix_timestamp('2023-09-18 00:00:00', 'yyyy-MM-dd HH:mm:ss') |
+--------------------------------------------------------------+
|                                                         NULL |
+--------------------------------------------------------------+
1 row in set (0.04 sec)
```
now:
```sql
mysql> select UNIX_TIMESTAMP('2023-09-18 00:00:00','yyyy-MM-dd HH:mm:ss');
+------------+
| 1694966400 |
+------------+
| 1694966400 |
+------------+
1 row in set (0.01 sec)
```
2023-09-19 18:41:59 +08:00
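The value returned after c9f5142420 (#24561) can be reproduced with a small sketch. It assumes a +08:00 session time zone, which is consistent with the 1694966400 result shown in the commit message but is not stated there. The token table and the function below are hypothetical and cover only the pattern letters used in this example; they are not the date_format implementation.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical mapping from the pattern letters in the commit message to
# Python strptime directives; only the tokens used in the example are covered.
PATTERN_TOKENS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
                  ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]


def unix_timestamp(text: str, pattern: str, utc_offset_hours: int = 8) -> int:
    """Parse text with a 'yyyy-MM-dd HH:mm:ss' style pattern and return epoch seconds."""
    fmt = pattern
    for token, directive in PATTERN_TOKENS:
        fmt = fmt.replace(token, directive)
    # Interpret the parsed wall-clock time in the assumed session time zone.
    local = datetime.strptime(text, fmt).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return int(local.timestamp())


print(unix_timestamp("2023-09-18 00:00:00", "yyyy-MM-dd HH:mm:ss"))  # 1694966400
```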
037ff2d5a6 [fix](nereids) bug: runtimefilter should not be pushed through window and topN (#24439)
runtime filter should not push down through topN
runtime filter should not push down through window if target slot is not partition key of all windowExpressions
2023-09-19 18:18:06 +08:00
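The two rules stated in 037ff2d5a6 (#24439) can be written as a pair of checks. This is a sketch only: `WindowExpr` and both function names are hypothetical stand-ins, not Nereids classes. The reasoning is that removing rows below a topN or a window can change which rows the operator outputs or what values it computes, unless the filter removes whole window partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowExpr:
    # Hypothetical stand-in for a window expression; only its partition keys matter here.
    partition_keys: frozenset


def can_push_through_top_n() -> bool:
    # Filtering rows below a topN can change which rows make the top N.
    return False


def can_push_through_window(target_slot: str, window_exprs: list) -> bool:
    # Allowed only when the target slot is a partition key of every window
    # expression, so the filter can only remove entire partitions.
    return all(target_slot in w.partition_keys for w in window_exprs)


windows = [WindowExpr(frozenset({"a", "b"})), WindowExpr(frozenset({"a"}))]
print(can_push_through_window("a", windows))  # True
print(can_push_through_window("b", windows))  # False
```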
e54c4ef258 [pipelineX](dependency) refactor write dependency (#24555) 2023-09-19 18:01:42 +08:00
3cac6806b4 [fix](txn) persist txn record of single replica load and ccr ingestion (#24543)
Otherwise the txn would be dropped when a BE reboots.
2023-09-19 15:10:38 +08:00
Pxl
5e4ab7cd25 [Bug](materialized-view) add limit for drop column on mv (#24493)
2023-09-19 14:32:14 +08:00
ee56783629 [fix](Java UDF) Do not use enum as the data type for JavaUdfDataType. (#24460) 2023-09-19 14:06:02 +08:00
eea84ac36c [fix](Nereids): use == instead of id to identity PhysicalHashJoin (#24535) 2023-09-19 12:06:30 +08:00
b092bdaabf [feature](load) collect loaded rows on table level after txn published (#24346)
As title.

Stream load 20 lines

```
2023-09-14 11:40:04,186 DEBUG (PUBLISH_VERSION|23) [DatabaseTransactionMgr.updateCatalogAfterVisible():1769] table id to loaded rows:{51016=20}
```

```
mysql> select count(*) from dup_tbl_basic;
+----------+
| count(*) |
+----------+
|       20 |
+----------+
1 row in set (0.05 sec)
```
2023-09-19 12:00:08 +08:00
80bcb43143 [Feature]Support external table sample stats collection (#24376)
Support sample stats collection for Hive tables. The syntax is like

`analyze table with sample percent 10`
2023-09-19 11:20:27 +08:00
6a33e4639a [schedule](pipeline) Remove wait schedule time in pipeline query engine and change current queue to std::mutex (#24525)
This reverts commit 591aeaa98d1178e2e277278c7afeafef9bdb88d6.
2023-09-18 23:57:56 +08:00
1ac7c8f14d [improvement](scan_queue_mem_limit) scan queue mem limit is so small for (#24553)
a wide table

Users rarely set scan_queue_mem_limit, so it almost always takes the value 2G/20. However,
in some cases we need to set it to a larger value, especially for insert into
select from a wide table.
2023-09-18 20:22:03 +08:00
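To give a sense of scale for the 2G/20 value mentioned in 1ac7c8f14d (#24553), here is the arithmetic as a short sketch. It assumes "2G" means 2 GiB; the 100 KiB row width is a hypothetical figure for a wide table and does not come from the commit.

```python
# Assumption: "2G" in the commit message means 2 GiB.
MEM_LIMIT_BYTES = 2 * 1024**3


def default_scan_queue_limit(mem_limit_bytes: int = MEM_LIMIT_BYTES) -> int:
    """Value used when scan_queue_mem_limit is not set: 1/20 of the memory limit."""
    return mem_limit_bytes // 20


def rows_that_fit(limit_bytes: int, row_bytes: int) -> int:
    """How many rows of a given width fit under the limit."""
    return limit_bytes // row_bytes


limit = default_scan_queue_limit()
print(limit)                             # 107374182 bytes, about 102.4 MiB
print(rows_that_fit(limit, 100 * 1024))  # 1048 rows at a hypothetical 100 KiB per row
```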
c54fc82031 [improve](nereids) expand runtime filter target by hashJoin's equal condition (#23274)
Generate more runtime filters.
Example:

lineitem join partsupp on l_partkey = ps_partkey join filter(part) on ps_partkey = p_partkey
We need two RFs:
RF1: p_partkey->ps_partkey
RF2: p_partkey->l_partkey

This PR generates RF2; the current version does not.

Merge runtime filters.
In the current version, if one source could affect two targets, we generate two runtime filters.
After this PR, the two RFs are merged.
Refer to regression tests: ds_rf2/ds_rf5/ds_rf54
2023-09-18 18:27:01 +08:00
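The target expansion and merging described in c54fc82031 (#23274) can be sketched as a walk over the join equal conditions. This is an illustration under simplifying assumptions: the function name `expand_targets` is hypothetical, slots are plain strings, and the conditions Nereids actually checks before expanding (join type, for example) are ignored.

```python
from collections import defaultdict


def expand_targets(source: str, direct_target: str, equal_conditions: list) -> set:
    """Collect every slot equal to direct_target through join equal conditions."""
    neighbors = defaultdict(set)
    for left, right in equal_conditions:
        neighbors[left].add(right)
        neighbors[right].add(left)
    targets, stack = set(), [direct_target]
    while stack:
        slot = stack.pop()
        if slot == source or slot in targets:
            continue
        targets.add(slot)
        stack.extend(neighbors[slot])
    return targets


# lineitem join partsupp on l_partkey = ps_partkey
equal_conditions = [("l_partkey", "ps_partkey")]
# Runtime filter built from p_partkey, whose direct target is ps_partkey.
targets = expand_targets("p_partkey", "ps_partkey", equal_conditions)
print(sorted(targets))  # ['l_partkey', 'ps_partkey']
```

Returning one set of targets per source corresponds to the merged form: a single runtime filter from p_partkey with two targets, rather than separate RF1 and RF2.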