3069 Commits

Author SHA1 Message Date
bdf7d2779a [fix](Nereids) aggregate always report has 1 row count (#14236)
The data structure of the new stats was changed, but Agg-estimation was not changed to match.
2022-11-14 16:27:55 +08:00
47326f951d [fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235) 2022-11-14 16:11:08 +08:00
cf5e2a2eb6 [fix](nereids) new statistics use wrong default selectivity (#14233)
By default, column selectivity must be 1.0, not 0.
2022-11-14 16:09:17 +08:00
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
594e3b8224 [feature](Nereids) add circle detector and avoid overlap (#14164) 2022-11-14 14:02:14 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix incorrect results caused by not using fields_context.
2022-11-14 14:00:55 +08:00
49fecd2a6d [improvement](log) print info of error replicas (#14220) 2022-11-14 11:37:18 +08:00
13b1f92c63 [enhancement](Nereids) add output set and output exprid set cache (#14151) 2022-11-14 11:24:57 +08:00
8263c34da6 [fix](ctas) use json_object in CTAS get wrong result (#14173)
* [fix](ctas) use json_object in CTAS get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-14 09:13:05 +08:00
beaf2fcaf6 [feature](partition) support new create partition syntax (#13772)
Create partitions using:
```
PARTITION BY RANGE(event_day)(
        FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
        FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
        FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
        FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
        PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)

PARTITION BY RANGE(event_time)(
        FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
This creates year/month/week/day/hour date partitions in a batch,
and it is compatible with the single-partition syntax.
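A minimal sketch of how one `FROM ... TO ... INTERVAL` clause could expand into consecutive half-open ranges. This is an illustration only, not the Doris implementation: `add_interval` and `expand_range` are hypothetical helper names, and clipping the last range at the `TO` bound is an assumption the commit message does not confirm.

```python
from datetime import date, timedelta


def add_interval(d, n, unit):
    """Return d advanced by n units (YEAR, MONTH, WEEK, or DAY)."""
    if unit == "DAY":
        return d + timedelta(days=n)
    if unit == "WEEK":
        return d + timedelta(weeks=n)
    if unit in ("MONTH", "YEAR"):
        months = n * (12 if unit == "YEAR" else 1)
        total = d.year * 12 + (d.month - 1) + months
        # Assumption: the day-of-month exists in the target month.
        return d.replace(year=total // 12, month=total % 12 + 1)
    raise ValueError(f"unsupported unit: {unit}")


def expand_range(start, end, n, unit):
    """Split [start, end) into consecutive [lower, upper) ranges."""
    parts = []
    lower = start
    while lower < end:
        # Assumption: a trailing partial interval is clipped at `end`.
        upper = min(add_interval(lower, n, unit), end)
        parts.append((lower, upper))
        lower = upper
    return parts


monthly = expand_range(date(2021, 11, 14), date(2022, 11, 14), 1, "MONTH")
for lower, upper in monthly[:2]:
    print(lower, upper)
```

Under these assumptions, the monthly clause from the example yields twelve ranges, each starting on the 14th.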
2022-11-12 20:52:37 +08:00
d9913b1317 [Enhancement](Nerieds) Support numbers TableValuedFunction and some bitmap/hll aggregate function (#14169)
## Problem summary
This PR:
1. supports the `numbers` TableValuedFunction for Nereids tests, like `select * from numbers(number = 10, backend_num = 1)`
2. supports bitmap/hll aggregate functions
3. supports finding variable-length functions, like `coalesce`, in the function registry
4. fixes a bug where printing the Nereids trace throws an exception because a RewriteRule, e.g. `AggregateDisassemble`, is used in ApplyRuleJob; introduced by #13957
2022-11-11 16:29:15 +08:00
7c48168a53 [refactor](Nereids) remove DecimalType, use DecimalV2Type instead (#14166) 2022-11-11 13:58:16 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
5fad4f4c7b [feature](Nereids) replace order by keys by child output if possible (#14108)
Supports queries such as:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1

After the rewrite, the plan is equivalent to:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a
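A rough sketch of the rewrite idea, assuming expressions can be compared as normalized strings (the real rule presumably works on expression trees). `replace_order_keys` is a hypothetical name, not a Nereids API.

```python
def replace_order_keys(outputs, order_keys):
    """Replace ORDER BY keys that match a child output expression with its alias.

    outputs: list of (expression, alias or None) from the child's output.
    order_keys: list of ORDER BY expressions.
    """
    alias_of = {expr: alias for expr, alias in outputs if alias is not None}
    # Keys with no matching aliased output are left unchanged.
    return [alias_of.get(key, key) for key in order_keys]


outputs = [("c1 + 1", "a"), ("sum(c2)", None)]
print(replace_order_keys(outputs, ["c1 + 1"]))  # ['a']
```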
2022-11-11 13:34:29 +08:00
9b50888aaf [feature](Nereids) prune runtime filters which cannot reduce the tuple number of probe table (#13990)
1. add a post processor: runtime filter pruner 
Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage. Some RFs have no effect because their selectivity is 100%. This PR removes them.
An RF is effective if
a. the build column value range covers part of that of the probe column, OR
b. the build column ndv is less than that of the probe column, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF that satisfies the above criteria.

2. explain graph
a. add RF info in Join and Scan node
b. add predicate count in Scan node

3. Rename session variable
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune` 

4. fix min/max column stats derive bug
`select max(A) as X from T group by B`  
X.min is A.min, not A.max
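The effectiveness criteria in item 1 can be sketched as a single predicate. This is an illustration, not the Doris code: the `ColumnStats` fields here are simplified stand-ins, `is_effective` is a hypothetical name, and reading criterion (a) as "the build range does not fully cover the probe range" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Simplified stand-in for the real statistics class.
    min_value: float
    max_value: float
    ndv: float
    selectivity: float = 1.0


def is_effective(build, probe, build_reduced_by_effective_rf=False):
    """Return True if a runtime filter from `build` could reduce `probe`."""
    # (a) Assumed reading: the build range leaves part of the probe range uncovered.
    covers_only_part = (
        build.min_value > probe.min_value or build.max_value < probe.max_value
    )
    return (
        covers_only_part                     # criterion a
        or build.ndv < probe.ndv             # criterion b
        or build.selectivity < 1.0           # criterion c
        or build_reduced_by_effective_rf     # criterion d
    )


probe = ColumnStats(min_value=0, max_value=100, ndv=100)
same = ColumnStats(min_value=0, max_value=100, ndv=100)
print(is_effective(same, probe))  # False
```

A filter for which none of the four criteria holds is a candidate for pruning.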
2022-11-11 13:13:29 +08:00
8e17fcef3f [fix](cast)fix cast to char(N) error (#14168) 2022-11-11 11:27:51 +08:00
8812a680fc [fix](metric) fix the bug of not updating the query latency metric #14172 2022-11-11 11:21:17 +08:00
e1e63f8354 [feature-wip](statistic) persistence table statistics into olap table (#13883)
1. Support persisting collected statistics to a pre-built OLAP table named `column_statistics`.
2. Use a much simpler mechanism to collect statistics: all the gauges are collected in a single SQL statement for each partition and then for the whole column, as defined in class `AnalysisJob`
3. Implement a cache to manage the statistics records in FE

TODO:

1. Use opentelemetry to monitor the execution time of each job
2. Format the internal analysis SQL
3. Split the SQL generated for deleting expired records so that the child count of the IN expression does not exceed the FE limit
4. Implement show statements
2022-11-10 22:08:08 +08:00
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
6c13126e5c [enhancement](Nereids) analyze check input slots must in child's output (#14107) 2022-11-10 19:28:01 +08:00
ae4f2aead7 [fix](nereids) column stats min/max missing (#14091)
The min/max values were not displayed in the result of SHOW COLUMN STATS tbl.
2022-11-10 17:08:44 +08:00
9b5b411112 [fix](schemeChange) fe oom because replicas too many when schema change (#12850) 2022-11-10 16:17:25 +08:00
151a72d158 [feature](Nereids) support circle graph (#14082) 2022-11-10 15:54:21 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
4cde9c4765 [enhance](Nereids): add missing hypergraph rule. (#14087) 2022-11-10 15:23:31 +08:00
0dfdbe4508 [feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051) 2022-11-10 12:21:06 +08:00
8c5c6d9d7f [fix](ctas) fix wrong string column length after executing ctas from external table (#14090) 2022-11-10 11:36:56 +08:00
17867e446f [feature](nereids) let user define right deep tree penalty by session variable (#14040)
It is hard to find a single factor that suits all queries.
The default is 0.7.
2022-11-10 11:25:02 +08:00
84b969a25c [fix](grouping)the grouping expr should check col name from base table first, then alias (#14077)
* [fix](grouping)the grouping expr should check col name from base table first, then alias

* fix fe ut, the behavior would be same as mysql
2022-11-10 11:10:42 +08:00
994d563f52 [fix](nereids) cannot collect decimal column stats (#13961)
When executing ANALYZE TABLE, Doris fails on decimal columns.
The root cause is that the scale in decimalV2 is 9, but 2 in the schema.
There is no need to check the scale for decimalV2, since it is not a floating-point type.
2022-11-10 11:06:38 +08:00
184cee2d2b [Bug](outfile) Fix wrong decimal format for ORC (#14124) 2022-11-10 11:01:30 +08:00
43eb946543 [feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130
S3 table valued function supports parquet/orc/json file format.
For example: parquet format
2022-11-10 10:33:12 +08:00
10df61b5bf [improvement](join) Share hash table in fragments for broadcast join (#13921) 2022-11-10 09:48:34 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
3117ac9289 [enhancement](Nereids) use post-order to generate runtime filter in RuntimeFilterGenerator (#13949)
Change the runtime filter generator from pre-order to post-order traversal. This may change the number of generated runtime filters,
and the UTs will be corrected accordingly.
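To illustrate only the difference in visiting order (not how Doris generates the filters), here is a small sketch on a toy plan tree. `PlanNode` and the node names are hypothetical.

```python
class PlanNode:
    """Toy plan node; not the Nereids plan class."""

    def __init__(self, name, *children):
        self.name = name
        self.children = children


def pre_order(node):
    # Visit the node first, then its children from left to right.
    yield node.name
    for child in node.children:
        yield from pre_order(child)


def post_order(node):
    # Visit the children first, then the node itself.
    for child in node.children:
        yield from post_order(child)
    yield node.name


plan = PlanNode(
    "join1",
    PlanNode("join2", PlanNode("scan_a"), PlanNode("scan_b")),
    PlanNode("scan_c"),
)
print(list(pre_order(plan)))
print(list(post_order(plan)))
```

With post-order, the inner `join2` is handled before the outer `join1`, so a visitor sees a join only after its inputs have been processed.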
2022-11-09 14:28:49 +08:00
b74d0a4747 [feature](table-valued-function) Support desc from s3() and modify the syntax of tvf (#14047)
This PR does two things:

1. Support desc function s3()
2. Modify the syntax of tvf
2022-11-09 14:12:43 +08:00
84bb82acc0 [fix](Nereids) aggregate disassemble generate error output list on GLOBAL phase aggregate (#14079)
we must use localAggregateFunction as key of globalOutputSMap, because we use local output exprs to generate global output in disassembleDistinct
2022-11-09 13:43:12 +08:00
b144d2b4f4 [improve](Nereids): remove redundant code, add annotation in Memo. (#14083) 2022-11-09 13:39:20 +08:00
aff62655c4 [feature](Nereids) binding slot in order by that not show in project (#14042)
1. Bind slots in ORDER BY that do not appear in the project list, such as:
SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3

2. Do not check for unbound slots when binding slot references. Instead, do it in the analysis check.
2022-11-09 13:25:41 +08:00
572f491756 [fix](ctas) text column type len = 1 when create table as select (#13906)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-09 09:09:34 +08:00
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
3f3f2eb098 [Nereids][Improve] infer predicate after push down predicate (#12996)
This PR implements predicate inference.

For example:

``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:

              left join
             /         \
  filter(sid > 1)     filter(id > 1) <---- inferred predicate
        |                   |
      scan                scan

See `InferPredicatesTest`  for more cases

The logic is as follows:
  1. pull up the bottom predicate, then infer additional predicates
    for example:
    select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id
    1. pull up the bottom predicate
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1
    2. infer
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1
    finally transformed sql:
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1
  2. put these predicates into `otherJoinConjuncts`; they are processed in the next
    round of predicate push-down


Currently only inference of `ComparisonPredicate` is supported.

TODO: Determine whether `expression` satisfies the conditions for replacement,
             e.g. check whether `expression` is non-deterministic
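The inference step can be sketched as follows, assuming predicates are simple `(column, operator, constant)` triples and join equalities are column pairs. `infer_predicates` is a hypothetical name; join types and non-deterministic expressions are deliberately not modeled here.

```python
def infer_predicates(equalities, predicates):
    """Derive new comparison predicates through column equalities.

    equalities: iterable of (column, column) pairs known to be equal.
    predicates: set of (column, operator, constant) triples.
    """
    parent = {}

    def find(col):
        # Union-find lookup with path halving.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)

    # Group columns into equivalence classes.
    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)

    inferred = set()
    for col, op, value in predicates:
        for other in classes.get(find(col), {col}):
            candidate = (other, op, value)
            if candidate not in predicates:
                inferred.add(candidate)
    return inferred


print(infer_predicates({("t.id", "t2.id")}, {("t.id", "=", 1)}))
```

For the example in the message, the equality `t.id = t2.id` together with `t.id = 1` gives `t2.id = 1`.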
2022-11-08 21:36:17 +08:00
b6f91b6eff [improvement](profile) support ordinary user to get query profile via http api (#14016) 2022-11-08 20:39:01 +08:00
ecfdf0320d [fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068)
The show stats logic unexpectedly modified the internally collected ColumnStat, which caused inaccurate cost estimates and inefficient plans.
2022-11-08 20:26:37 +08:00
cdc635610b [enhancement](Nereids) tpch q21 anti and semi join reorder (#14037)
Estimation for anti and semi joins needs rework. For now, this change only makes TPC-H q21 pass.
2022-11-08 17:21:50 +08:00
54c07f8782 [regression](Nereids) add back tpch regression test cases (#13826)
1. add back TPC-H regression test cases
2. fix decimal problem on aggregate function sum and agg introduced by #13764 
3. fix memo merge group NPE introduced by #13900
2022-11-08 16:40:46 +08:00
1c07a01038 [feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994)
Support Aliyun DLF
Support data on s3-compatible object storage, such as aliyun oss.
Refactor some catalog interfaces to make them tidier.
Fix a bug: the default text-format field delimiter for Hive should be \x01
Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
2022-11-08 14:02:41 +08:00
61d4974ba1 [fix](Nereids) Use simple cost to calculate benefit and avoid unuseless calculation (#14056)
In GraphSimplifier, we can use a simple cost to calculate the benefit.
We need to update recursively only when the best neighbor of the applied step is the edge being processed.
2022-11-08 13:11:38 +08:00
e6b12ce8e8 [feature](Nereids) support query that group by use alias generated in aggregate output (#14030)
Support queries that use a select-list alias in the GROUP BY list, such as:
SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;
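A minimal sketch of alias resolution for GROUP BY keys, treating expressions as plain strings. `resolve_group_by` is a hypothetical name. Preferring a base-table column name over an alias follows the later fix #14077 listed above; how Nereids itself orders the lookup is not stated in this commit.

```python
def resolve_group_by(select_items, group_by, base_columns):
    """Map GROUP BY keys that are select-list aliases to their expressions.

    select_items: list of (expression, alias or None).
    group_by: list of GROUP BY keys as written in the query.
    base_columns: set of column names of the base table.
    """
    alias_to_expr = {
        alias: expr for expr, alias in select_items if alias is not None
    }
    resolved = []
    for key in group_by:
        if key in base_columns:
            # A base-table column name wins over an alias of the same name.
            resolved.append(key)
        else:
            resolved.append(alias_to_expr.get(key, key))
    return resolved


items = [("c1", "a"), ("SUM(c2)", None)]
print(resolve_group_by(items, ["a"], {"c1", "c2"}))  # ['c1']
```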
2022-11-08 11:02:42 +08:00