Commit Graph

18429 Commits

Author SHA1 Message Date
c203d36300 [pipelineX](bug) Add logs (#27665) 2023-11-28 15:53:40 +08:00
91f56cefc0 [feature](Nereids): Pushdown TopN-Distinct through Union (#27628)
```
  TopN-Distinct
  -> Union All
    -> child plan1
    -> child plan2
    -> child plan3
 
  rewritten to
 
  TopN-Distinct
  -> Union All
    -> TopN-Distinct
      -> child plan1
    -> TopN-Distinct
      -> child plan2
    -> TopN-Distinct
      -> child plan3
```
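
The rewrite above is safe because any row that survives the outer TopN-Distinct must also be among the top N distinct rows of the child that produced it. A minimal Python sketch of that equivalence, using plain lists in place of child plans; the helper `topn_distinct` and the sample data are illustrative assumptions, not Doris code:

```python
def topn_distinct(rows, n):
    """Return the n smallest distinct values of rows, in ascending order."""
    return sorted(set(rows))[:n]


# Three hypothetical child plans feeding a Union All.
children = [[5, 1, 1, 9], [2, 2, 7], [1, 3, 8, 0]]
n = 3

# Original plan: TopN-Distinct over the full union.
union_all = [row for child in children for row in child]
original = topn_distinct(union_all, n)

# Rewritten plan: TopN-Distinct in each child, then again above the union.
pushed = [row for child in children for row in topn_distinct(child, n)]
rewritten = topn_distinct(pushed, n)

assert original == rewritten
```

The outer TopN-Distinct is still required, since the children can contribute up to N rows each and may overlap.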
2023-11-28 15:23:46 +08:00
2ea1e9db44 [fix](nereids) temp partition is always pruned (#27636) 2023-11-28 14:18:14 +08:00
Pxl
31fe48111b [Improvement](materialized-view) forbidden mv rewriter when select stmt's from clause not have mv (#27638)
Forbid the MV rewriter when the select statement's FROM clause does not have an MV.
2023-11-28 14:11:46 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
fc2129a09f [fix](stats) skip collect agg_state type (#27640) 2023-11-28 11:43:48 +08:00
f329b90696 [fix](show_variables) fix default value for special variables (#27651) 2023-11-28 11:35:46 +08:00
4cfb9b73b8 [regression](partial update) Fix unstable p0 case test_primary_key_partial_update_parallel due to conflicting table name (#27633) 2023-11-28 11:14:34 +08:00
fe7ff6f113 [Opt](functions) Opt tvf number for performance regression framework (#27582)
Opt tvf number for performance regression framework
2023-11-28 10:43:51 +08:00
9903c30591 [opt](nereids)adjust distribution cost for better choice of broadcast join and shuffle join (#27113)
add boundary to distribution cost factor
2023-11-28 10:41:16 +08:00
d1e163126c [regression] remove useless case (#27590) 2023-11-28 10:39:55 +08:00
98c6885ae2 [opt](plan) only lock olap table when query plan (#27639)
For OLAP tables, a read lock must be acquired during planning,
because the partition's version has to remain unchanged while the plan is built.

Other kinds of tables do not need to be locked.
2023-11-28 10:36:01 +08:00
c83e3318a8 (session) fix NereidsTracer shouldLog always true after set enable_nereids_trace from true to false (#27420)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-11-28 10:22:46 +08:00
65126459bd [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib. (#27542)
Test result:

- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 9.09 s -> 6.04 s.
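
As a quick check of what the reported timings mean, the arithmetic can be sketched as follows. Only the two timings come from the test result above; the variable names are illustrative:

```python
before_s = 9.09  # seconds, reported before the change
after_s = 6.04   # seconds, reported after the change

speedup = before_s / after_s
reduction = (before_s - after_s) / before_s

# Roughly a 1.5x speedup, or about a third less wall-clock time.
print(f"speedup: {speedup:.2f}x, time reduction: {reduction:.1%}")
```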
2023-11-28 10:14:48 +08:00
b48c40ed31 Make blockschduler first stop then delete (#27645) 2023-11-28 10:09:15 +08:00
ea7eca9345 [pipelineX](bug) Add some logs (#27596) 2023-11-28 10:02:13 +08:00
5bdfaf6447 [improve](metrics)Display garbage collector type (#27408) 2023-11-27 23:28:25 +08:00
2076d2b390 [Fix](statistics)Fix bug and improve auto analyze. (#27626)
1. Implement needReAnalyzeTable for ExternalTable. For now, an external table will not be reanalyzed within 10 days.
2. In HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid an index-out-of-bounds exception.
3. Wrap the show analyze loop in try/catch, so that when one table fails (for example, the catalog was dropped and the table can no longer be found), the other tables are still shown.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze; throw an exception for other table types.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid an NPE.
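
Items 2 and 3 follow two common defensive patterns: guard against an empty iterator, and isolate per-item failures inside a loop. A minimal Python sketch of both, assuming hypothetical helpers `first_partition`, `show_analyze` and `describe` that do not correspond to the actual Doris methods:

```python
def first_partition(partitions):
    """Return the first item of an iterable, or None when it is empty."""
    return next(iter(partitions), None)


def show_analyze(tables, describe):
    """Collect one row per table, skipping tables whose lookup fails."""
    rows = []
    for table in tables:
        try:
            rows.append(describe(table))
        except LookupError as err:
            # For example, the catalog was dropped and the table is gone.
            print(f"skipping {table}: {err}")
    return rows


def describe(table):
    # Hypothetical lookup: only two of the tables still exist.
    known = {"t1": 10, "t3": 30}
    return (table, known[table])  # raises KeyError for unknown tables


rows = show_analyze(["t1", "t2", "t3"], describe)
```

Here the failure on `t2` is reported and skipped, so `t1` and `t3` are still returned.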
2023-11-27 22:13:48 +08:00
7ac97c1650 [fix](bdbje) add free disk config (#27578) 2023-11-27 21:29:02 +08:00
6a1c98af82 [regression case](broker laod) add case for without seq (#27586) 2023-11-27 21:27:27 +08:00
4ea69ed390 [regression test](broker load) add case for num_as_string (#27588) 2023-11-27 21:25:59 +08:00
bb68900bed [fix](bdbje) Fix bdbje logging level not work (#27597)
* `EnvironmentConfig.FILE_LOGGING_LEVEL` only sets the FileHandler level; the
   logger level must be set first, otherwise the setting does not take effect.
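
The same two-stage filtering (logger level first, handler level second) exists in Python's standard `logging` module, so the effect can be sketched there. This is an analogue only; it does not use BDB JE or `java.util.logging`, and the logger name is made up:

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)           # the handler would accept DEBUG records

logger = logging.getLogger("bdbje_demo")  # hypothetical logger name
logger.propagate = False
logger.addHandler(handler)

logger.setLevel(logging.WARNING)          # the logger itself filters first
logger.debug("dropped")
dropped_output = stream.getvalue()        # nothing reached the handler

logger.setLevel(logging.DEBUG)            # lower the logger level as well
logger.debug("written")
written_output = stream.getvalue()        # now the record is emitted
```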
2023-11-27 21:24:34 +08:00
646f1ea087 [performance](Nereids): avoid use getStringValue() in getTimeFormatter() (#27625)
The original `getTimeFormatter()` converts a `long` to a `string`, and then parses the `string` into an `int`.
2023-11-27 21:08:32 +08:00
HB
c7b9a32e3e [improvement](show snapshot) show iceberg snapshot print summary info (#27448)
Iceberg's snapshot has summary information, but Doris did not display it. This patch fixes this issue.
2023-11-27 20:56:50 +08:00
3d7d166355 [feature](cmd) add UNSET_VARIABLE statement to set back variables (#27552) 2023-11-27 20:30:04 +08:00
HB
36a528b6bc [fix](judge-partition) Fix incorrect logic in determining whether it is a partitioned table (#27515)
The old logic decided whether a table was partitioned based on its number of buckets. As a result, a partitioned table with only one partition whose bucket count was 1 was mistakenly recognized as a non-partitioned table.

```
Table[test_load_doris_to_hive_2] is not partitioned
```
2023-11-27 18:56:52 +08:00
50c442fc6c [DOC](sparkload)add spark load faq (#27455)
add spark load FAQ
2023-11-27 17:49:52 +08:00
d5a56dc7f4 [information_schema](tables)modify information_schema.tables rows column use cache rows. (#27028)
Use the table's cached and estimated information for the rows column of
information_schema.tables. This avoids the RPC timeouts that querying information_schema.tables can cause
when the catalog contains a large number of tables.
2023-11-27 17:48:06 +08:00
66eeafcd48 [refactor](Nereids): unify one DateLiteral init() (#27618)
`fromDateStr` parses a `date string` into a `dateLiteral`, but `init()` already handles this, so `init()` can replace it.
2023-11-27 17:09:45 +08:00
fde4bab048 [fix](Nereids) non-deterministic expression should not be constant (#27606) 2023-11-27 16:40:30 +08:00
3d0dc94b18 [fix](ci) fix bug that "run build\n" not trigger pipeline (#27617)
Co-authored-by: stephen <hello-stephen@qq.com>
2023-11-27 16:23:42 +08:00
cbdb886b6e [fix](Nereids): fill up miss slot of order having project (#27480)
Fill up the missing slot of the order-having-project pattern, such as:
```
select a + 1 as c from t having c > 2 order by a
```
2023-11-27 16:00:29 +08:00
612347f650 [fix](planner)sort node should materialized required slots for itself (#27605)
This is a follow-up PR for #27526. The old PR did not fix the problem correctly; this PR does.
2023-11-27 15:37:11 +08:00
dc1a31715b [doc](flink) Update doc index title (#27410) 2023-11-27 15:32:10 +08:00
13b26ee920 [Fix](core) Fix wal space back pressure core and add regression test (#27311) 2023-11-27 15:10:26 +08:00
234aff3e78 [feature](Nereids): Pushdown TopN through Union (#27535)
```
topn
-> Union All
  -> child plan1
  -> child plan2
  -> child plan3

rewritten to

topn
-> Union All
  -> topn
    -> child plan1
  -> topn
    -> child plan2
  -> topn
    -> child plan3
```
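
The shape of this rewrite can be sketched on a toy plan tree. The `Plan` class and `push_topn_through_union` below are hypothetical stand-ins, not the Nereids rule; a real rule would also have to carry the sort keys and the limit into each pushed-down topn:

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    name: str
    children: list = field(default_factory=list)


def push_topn_through_union(plan):
    """Wrap every child of a Union All that sits under a topn in its own topn."""
    if plan.name == "topn" and len(plan.children) == 1:
        union = plan.children[0]
        if union.name == "Union All":
            wrapped = [Plan("topn", [child]) for child in union.children]
            return Plan("topn", [Plan("Union All", wrapped)])
    return plan  # pattern does not match: leave the plan unchanged


def render(plan, depth=0):
    """Return the tree as indented lines, in the style used above."""
    prefix = "  " * depth + ("-> " if depth else "")
    lines = [prefix + plan.name]
    for child in plan.children:
        lines.extend(render(child, depth + 1))
    return lines


before = Plan("topn", [Plan("Union All", [Plan(f"child plan{i}") for i in (1, 2, 3)])])
after = push_topn_through_union(before)
print("\n".join(render(after)))
```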
2023-11-27 14:13:18 +08:00
1b4cd24b36 [opt](Nereids) support where, group by, having, order by clause without from clause in query statement (#27006)
Support WHERE, GROUP BY, HAVING, and ORDER BY clauses in a query statement that has no FROM clause.
For example:

SELECT 1 AS a, COUNT(*), SUM(2), AVG(1), RANK() OVER() AS w_rank
WHERE 1 = 1
GROUP BY a, w_rank
HAVING COUNT(*) IN (1, 2) AND w_rank = 1
ORDER BY a;

This returns:

| a  |count(*)|sum(2)|avg(1)|w_rank|
+----+--------+------+------+------+
| 1  |       1|     2|   1.0|     1|


Another example:

select 1 c1, 2 union (select "hell0", "") order by c1

The data type of the second column will be varchar(65533); 65533 is the default varchar length.

This returns:

|c1    | 2 |
+------+---+
|1     | 2 |
|hell0 |   |
2023-11-27 12:05:14 +08:00
331effdb20 [feature](Nereids): support merge graph in group (#27353) 2023-11-27 11:48:38 +08:00
0e1e4c8508 [opt](nereids) disable infer column name when query (#27450)
Disable column name inference in queries, because it causes some errors when BI tools are used.
This feature was first developed in #26055.
2023-11-27 11:26:17 +08:00
5cb5241a9e [feature](mtmv) materialized view rewrite framework (#27059)
Materialized view rewrite framework; supports query rewrite by struct info.
The idea is from "Optimizing Queries Using Materialized Views: A Practical, Scalable Solution".
2023-11-27 11:15:54 +08:00
3838b6fbae [refine](pipelineX) refine some code in pipelineX (#27472) 2023-11-27 11:04:16 +08:00
82d15669bc [minor](fe) convert Chinese annotations into English (#27560) 2023-11-27 11:03:44 +08:00
9aafcf2e22 [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) 2023-11-27 11:03:27 +08:00
d0fea8db27 [chore][log] Opt log, revert some log introduced by #25739 (#26365) 2023-11-27 10:48:02 +08:00
550f3e801d [improve](routine_load) move log from write lock (#27576) 2023-11-27 10:47:31 +08:00
d10a708fa2 [improve](jdbc catalog) add profile for jdbc scan (#27447) 2023-11-27 10:33:39 +08:00
cd6c61347d [Feature](tvf)(avro-jni) avro-jni add projection push down (#26885) 2023-11-27 10:33:27 +08:00
baadc14e60 [Enhancement](function) support unix_timestamp with float (#26827)
---------

Co-authored-by: YangWithU <plzw8@outlook.com>
2023-11-27 09:58:53 +08:00
3791de3cfa [feature](mtmv)(6)implement cancel method (#27541)
1. Implement the cancel task method.
2. Fix `show create table` not displaying `comment`.
2023-11-27 09:49:46 +08:00
5700332c3c [enhance](S3) Print the error detail for every s3 operation (#27572) 2023-11-26 18:54:43 +08:00