Commit Graph

15142 Commits

Author SHA1 Message Date
d771f16b79 [fix](parquet)fix bug that can not read parquet data page v2 (#27655) 2023-11-28 22:43:46 +08:00
aa6573db4f [fix](statistics)Fix sample min max npe bug (#27702)
The min and max values may be NULL; this case needs to be handled in sample analyze.
2023-11-28 21:24:20 +08:00
8910772cb8 [pipelineX](log) refine debug string (#27712) 2023-11-28 21:15:52 +08:00
c6f43e4241 [Fix](show-load)Show load npe(userinfo is null) (#27698) 2023-11-28 21:07:32 +08:00
f54db85ea3 [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669)
Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in #27542.
2023-11-28 20:05:24 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve performance on the TPC-H data set by restructuring the join-related code and its use of hash tables.

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
1b509ab13c [Fix](statistics)Need to recalculate health value when table row count become 0 (#27673)
The health value needs to be recalculated when the table row count becomes 0. Otherwise, when a user truncates a table, the old statistics will not be updated.
2023-11-28 18:47:12 +08:00
38d30f21f1 [pipelineX](bug) Fix scan dependency timeout (#27696) 2023-11-28 18:21:11 +08:00
Pxl
91b0edfaa2 [Bug](join) try fix wrong _has_null_in_build_side setted (#27684)
Try to fix `_has_null_in_build_side` being set incorrectly.
2023-11-28 17:42:14 +08:00
b93dd1d5f7 [enhancement](load) improve error msg for load when cancelled by mem gc (#26809)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-11-28 17:36:11 +08:00
7087250b4a [fix](insert) txn insert and group commit should write \N string corr… (#27637) 2023-11-28 17:32:50 +08:00
f0dbce4cf5 [fix](Nereids) compound predicate need cast children to boolean (#27649) 2023-11-28 16:55:44 +08:00
3d46643dab [feature](Nereids): rewrite count(null) to 0 (#27471)
select count(null) --> select 0
2023-11-28 16:25:34 +08:00
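The rewrite in 3d46643dab relies on `count` ignoring NULL inputs, so `count(null)` is always 0. A minimal sketch of such a rule follows, using a hypothetical mini expression tree; the classes `NullLiteral`, `IntLiteral`, `Column`, and `Count` and the function `rewrite_count_null` are illustrative assumptions and are not Nereids classes.

```python
from dataclasses import dataclass


# Hypothetical expression nodes, for illustration only (not Doris code).
@dataclass(frozen=True)
class NullLiteral:
    pass


@dataclass(frozen=True)
class IntLiteral:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Count:
    args: tuple


def rewrite_count_null(expr):
    """Replace count(NULL) with the literal 0; return other expressions unchanged."""
    if (
        isinstance(expr, Count)
        and len(expr.args) == 1
        and isinstance(expr.args[0], NullLiteral)
    ):
        return IntLiteral(0)
    return expr
```

Under these assumptions, `rewrite_count_null(Count((NullLiteral(),)))` yields `IntLiteral(0)`, while `count(a)` on a real column is left as-is.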
c203d36300 [pipelineX](bug) Add logs (#27665) 2023-11-28 15:53:40 +08:00
91f56cefc0 [feature](Nereids): Pushdown TopN-Distinct through Union (#27628)
```
  TopN-Distinct
  -> Union All
    -> child plan1
    -> child plan2
    -> child plan3

  rewritten to

  TopN-Distinct
  -> Union All
    -> TopN-Distinct
      -> child plan1
    -> TopN-Distinct
      -> child plan2
    -> TopN-Distinct
      -> child plan3
```
2023-11-28 15:23:46 +08:00
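The plan rewrite described in 91f56cefc0 can be sketched structurally as follows. This is a simplified illustration, not the Nereids rule itself: the `Plan` class and both functions are hypothetical, and the sketch ignores the limit, offset, and order keys that a real TopN node carries.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical plan node, for illustration only (not a Doris class).
@dataclass
class Plan:
    op: str
    children: List["Plan"] = field(default_factory=list)


def push_topn_distinct_through_union(plan: Plan) -> Plan:
    """Copy the TopN-Distinct above a Union All onto each union child."""
    if (
        plan.op == "TopN-Distinct"
        and len(plan.children) == 1
        and plan.children[0].op == "Union All"
    ):
        union = plan.children[0]
        pushed = [Plan("TopN-Distinct", [child]) for child in union.children]
        # The outer TopN-Distinct stays: the children only pre-filter rows.
        return Plan("TopN-Distinct", [Plan("Union All", pushed)])
    return plan


def render(plan: Plan, depth: int = 0) -> List[str]:
    """Render the plan as indented lines, one operator per line."""
    prefix = "  " * depth + ("-> " if depth else "")
    lines = [prefix + plan.op]
    for child in plan.children:
        lines.extend(render(child, depth + 1))
    return lines
```

The outer node is kept because each child can only narrow its own rows; the final distinct top-N over the combined result still has to be computed above the union.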
2ea1e9db44 [fix](nereids) temp partition is always pruned (#27636) 2023-11-28 14:18:14 +08:00
Pxl
31fe48111b [Improvement](materialized-view) forbidden mv rewriter when select stmt's from clause not have mv (#27638)
Forbid the mv rewriter when the select stmt's from clause does not have an mv.
2023-11-28 14:11:46 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
fc2129a09f [fix](stats) skip collect agg_state type (#27640) 2023-11-28 11:43:48 +08:00
f329b90696 [fix](show_variables) fix default value for special variables (#27651) 2023-11-28 11:35:46 +08:00
4cfb9b73b8 [regression](partial update) Fix unstable p0 case test_primary_key_partial_update_parallel due to conflicting table name (#27633) 2023-11-28 11:14:34 +08:00
fe7ff6f113 [Opt](functions) Opt tvf number for performance regression framework (#27582)
Opt tvf number for performance regression framework
2023-11-28 10:43:51 +08:00
9903c30591 [opt](nereids)adjust distribution cost for better choice of broadcast join and shuffle join (#27113)
add boundary to distribution cost factor
2023-11-28 10:41:16 +08:00
d1e163126c [regression] remove useless case (#27590) 2023-11-28 10:39:55 +08:00
98c6885ae2 [opt](plan) only lock olap table when query plan (#27639)
For OLAP tables, we need to acquire a read lock while planning,
because we need to make sure the partition's version remains unchanged during planning.

Other kinds of tables do not need to be locked.
2023-11-28 10:36:01 +08:00
c83e3318a8 (session) fix NereidsTracer shouldLog always true after set enable_nereids_trace from true to false (#27420)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-11-28 10:22:46 +08:00
65126459bd [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib. (#27542)
Test result:

- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 9.09 s -> 6.04 s.
2023-11-28 10:14:48 +08:00
b48c40ed31 Make blockschduler first stop then delete (#27645) 2023-11-28 10:09:15 +08:00
ea7eca9345 [pipelineX](bug) Add some logs (#27596) 2023-11-28 10:02:13 +08:00
5bdfaf6447 [improve](metrics)Display garbage collector type (#27408) 2023-11-27 23:28:25 +08:00
2076d2b390 [Fix](statistics)Fix bug and improve auto analyze. (#27626)
1. Implement needReAnalyzeTable for ExternalTable. For now, an external table will not be reanalyzed within 10 days.
2. In HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid an index-out-of-bounds exception.
3. Wrap the show analyze handling loop in try/catch, so that when one table fails (for example, the catalog was dropped and the table can no longer be found), the other tables are still shown.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze; an exception is thrown for other table types.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid an NPE.
2023-11-27 22:13:48 +08:00
7ac97c1650 [fix](bdbje) add free disk config (#27578) 2023-11-27 21:29:02 +08:00
6a1c98af82 [regression case](broker laod) add case for without seq (#27586) 2023-11-27 21:27:27 +08:00
4ea69ed390 [regression test](broker load) add case for num_as_string (#27588) 2023-11-27 21:25:59 +08:00
bb68900bed [fix](bdbje) Fix bdbje logging level not work (#27597)
* `EnvironmentConfig.FILE_LOGGING_LEVEL` only sets the FileHandler level. The logger level
   must be set first; otherwise the handler level does not take effect.
2023-11-27 21:24:34 +08:00
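The handler-versus-logger level interaction behind bb68900bed can be illustrated with an analogy in Python's standard `logging` module, where a record is also filtered by the logger's level before any handler sees it. This is only an analogy, not BDB JE or `java.util.logging` code; the function name `captured_output` and the logger name are made up for the example.

```python
import io
import logging


def captured_output(logger_level: int) -> str:
    """Log one DEBUG record and return what the handler actually wrote."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setLevel(logging.DEBUG)  # the handler alone would accept DEBUG
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    # Standalone logger so the example does not touch global logging state.
    logger = logging.Logger("bdbje-analogy")
    logger.setLevel(logger_level)  # the logger filters before the handler runs
    logger.addHandler(handler)

    logger.debug("checkpoint started")
    return stream.getvalue()
```

With the logger left at `logging.INFO`, the DEBUG record is dropped even though the handler is set to DEBUG; once the logger itself is set to `logging.DEBUG`, the record reaches the handler.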
646f1ea087 [performance](Nereids): avoid use getStringValue() in getTimeFormatter() (#27625)
The original `getTimeFormatter()` converts a `long` to a `string`, and then parses the `string` to an `int`.
2023-11-27 21:08:32 +08:00
HB
c7b9a32e3e [improvement](show snapshot) show iceberg snapshot print summary info (#27448)
Iceberg's snapshot has summary information, but Doris did not display it. This patch fixes this issue.
2023-11-27 20:56:50 +08:00
3d7d166355 [feature](cmd) add UNSET_VARIABLE statement to set back variables (#27552) 2023-11-27 20:30:04 +08:00
HB
36a528b6bc [fix](judge-partition) Fix incorrect logic in determining whether it is a partitioned table (#27515)
The old logic determined whether a table was partitioned based on the number of buckets. As a result, a partitioned table with only one partition and a single bucket in that partition was mistakenly recognized as a non-partitioned table.

```
Table[test_load_doris_to_hive_2] is not partitioned
```
2023-11-27 18:56:52 +08:00
50c442fc6c [DOC](sparkload)add spark load faq (#27455)
add spark load FAQ
2023-11-27 17:49:52 +08:00
d5a56dc7f4 [information_schema](tables)modify information_schema.tables rows column use cache rows. (#27028)
Use the cached and estimated table information for the rows column in information_schema.tables.
This avoids the RPC timeouts that querying information_schema.tables can cause when the catalog
contains a large number of tables.
2023-11-27 17:48:06 +08:00
66eeafcd48 [refactor](Nereids): unify one DateLiteral init() (#27618)
`fromDateStr` parses a `date string` into a `dateLiteral`, but `init()` already handles this, so `init()` can replace it.
2023-11-27 17:09:45 +08:00
fde4bab048 [fix](Nereids) non-deterministic expression should not be constant (#27606) 2023-11-27 16:40:30 +08:00
3d0dc94b18 [fix](ci) fix bug that "run build\n" not trigger pipeline (#27617)
Co-authored-by: stephen <hello-stephen@qq.com>
2023-11-27 16:23:42 +08:00
cbdb886b6e [fix](Nereids): fill up miss slot of order having project (#27480)
Fill in the slots missing from the project under ORDER BY and HAVING, such as
```
select a + 1 as c from t having c > 2 order by a
```
2023-11-27 16:00:29 +08:00
612347f650 [fix](planner)sort node should materialized required slots for itself (#27605)
This is a follow-up PR for #27526. The old PR didn't fix the problem correctly; this PR does.
2023-11-27 15:37:11 +08:00
dc1a31715b [doc](flink) Update doc index title (#27410) 2023-11-27 15:32:10 +08:00
13b26ee920 [Fix](core) Fix wal space back pressure core and add regression test (#27311) 2023-11-27 15:10:26 +08:00
234aff3e78 [feature](Nereids): Pushdown TopN through Union (#27535)
```
topn
-> Union All
  -> child plan1
  -> child plan2
  -> child plan3

rewritten to

topn
-> Union All
  -> topn
    -> child plan1
  -> topn
    -> child plan2
  -> topn
    -> child plan3
```
2023-11-27 14:13:18 +08:00
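Why the pushdown in 234aff3e78 preserves results can be checked on plain data: any row in the overall top N must also be in the top N of the child it came from, so pre-trimming each child loses nothing. The sketch below assumes ascending order on a single key and no offset; the function names are hypothetical and nothing here is Doris code.

```python
import heapq
from itertools import chain


def topn(rows, n):
    """Return the n smallest rows in ascending order."""
    return heapq.nsmallest(n, rows)


def topn_over_union(children, n):
    """Original plan shape: TopN applied to the full Union All output."""
    return topn(chain.from_iterable(children), n)


def topn_pushed_down(children, n):
    """Rewritten plan shape: TopN on every child, then TopN over their union."""
    return topn(chain.from_iterable(topn(child, n) for child in children), n)
```

Both shapes return the same rows, but in the rewritten one the union only has to handle at most `n` rows per child.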
1b4cd24b36 [opt](Nereids) support where, group by, having, order by clause without from clause in query statement (#27006)
Support WHERE, GROUP BY, HAVING, and ORDER BY clauses without a FROM clause in a query statement.
For example:

SELECT 1 AS a, COUNT(*), SUM(2), AVG(1), RANK() OVER() AS w_rank
WHERE 1 = 1
GROUP BY a, w_rank
HAVING COUNT(*) IN (1, 2) AND w_rank = 1
ORDER BY a;

This returns:

| a  |count(*)|sum(2)|avg(1)|w_rank|
+----+--------+------+------+------+
| 1  |       1|     2|   1.0|     1|


Another example:

select 1 c1, 2 union (select "hell0", "") order by c1

The second column's data type will be varchar(65533); 65533 is the default varchar length.

This returns:

|c1    | 2 |
+------+---+
|1     | 2 |
|hell0 |   |
2023-11-27 12:05:14 +08:00