doris

Author	SHA1	Message	Date
minghong	bdf7d2779a	[fix](Nereids) aggregate always report has 1 row count (#14236 ) The data structure of the new stats was changed, but Agg estimation was not updated to match.	2022-11-14 16:27:55 +08:00
minghong	47326f951d	[fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235 )	2022-11-14 16:11:08 +08:00
minghong	cf5e2a2eb6	[fix](nereids) new statistics use wrong default selectivity (#14233 ) by default, column selectivity MUST be 1.0, not ZERO	2022-11-14 16:09:17 +08:00
Mingyu Chen	7eed5a292c	[feature-wip](multi-catalog) Support hive partition cache (#14134 )	2022-11-14 14:12:40 +08:00
谢健	594e3b8224	[feature](Nereids) add circle detector and avoid overlap (#14164 )	2022-11-14 14:02:14 +08:00
Stalary	23a8c7eeb6	(fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229 ) Fix wrong query results caused by fields_context not being used.	2022-11-14 14:00:55 +08:00
Yongqiang YANG	49fecd2a6d	[improvement](log) print info of error replicas (#14220 )	2022-11-14 11:37:18 +08:00
morrySnow	13b1f92c63	[enhancement](Nereids) add output set and output exprid set cache (#14151 )	2022-11-14 11:24:57 +08:00
xueweizhang	8263c34da6	[fix](ctas) use json_object in CTAS get wrong result (#14173 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2022-11-14 09:13:05 +08:00
catpineapple	beaf2fcaf6	[feature](partition) support new create partition syntax (#13772 ) Create partitions using: ``` PARTITION BY RANGE(event_day)( FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR, FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH, FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK, FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY, PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15')) ) PARTITION BY RANGE(event_time)( FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR ) ``` This creates year/month/week/day/hour date partitions in a batch and remains compatible with the single-partition syntax.	2022-11-12 20:52:37 +08:00
924060929	d9913b1317	[Enhancement](Nerieds) Support numbers TableValuedFunction and some bitmap/hll aggregate function (#14169 ) ## Problem summary This PR: 1. supports the `numbers` TableValuedFunction for Nereids tests, e.g. `select * from numbers(number = 10, backend_num = 1)` 2. supports bitmap/hll aggregate functions 3. supports finding variable-length functions such as `coalesce` in the function registry 4. fixes a bug, introduced by #13957, where printing the Nereids trace throws an exception because a RewriteRule (e.g. `AggregateDisassemble`) is used in ApplyRuleJob	2022-11-11 16:29:15 +08:00
morrySnow	7c48168a53	[refactor](Nereids) remove DecimalType, use DecimalV2Type instead (#14166 )	2022-11-11 13:58:16 +08:00
abmdocrt	b6ba654f5b	[Feature](Sequence) Support sequence_match and sequence_count functions (#13785 )	2022-11-11 13:38:45 +08:00
morrySnow	5fad4f4c7b	[feature](Nereids) replace order by keys by child output if possible (#14108 ) To support queries like: SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1 After the rewrite, the plan is equivalent to: SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a	2022-11-11 13:34:29 +08:00
minghong	9b50888aaf	[feature](Nereids) prune runtime filters which cannot reduce the tuple number of probe table (#13990 ) 1. Add a post processor: the runtime filter pruner. Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage, but some RFs have no effect because their selectivity is 100%. This PR removes them. An RF is effective if a. the build column value range covers only part of the probe column's range, OR b. the build column NDV is less than the probe column's NDV, OR c. the build column's ColumnStats.selectivity < 1, OR d. the build column is reduced by another RF that satisfies the above criteria. 2. Explain graph: a. add RF info in Join and Scan nodes b. add predicate count in Scan nodes 3. Rename session variable `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune` 4. Fix a min/max column stats derivation bug: for `select max(A) as X from T group by B`, X.min is A.min, not A.max	2022-11-11 13:13:29 +08:00
starocean999	8e17fcef3f	[fix](cast)fix cast to char(N) error (#14168 )	2022-11-11 11:27:51 +08:00
Luwei	8812a680fc	[fix](metric) fix the bug of not updating the query latency metric #14172	2022-11-11 11:21:17 +08:00
Kikyou1997	e1e63f8354	[feature-wip](statistic) persistence table statistics into olap table (#13883 ) 1. Support persisting collected statistics to a pre-built OLAP table named `column_statistics`. 2. Use a much simpler mechanism to collect statistics: all gauges are collected with a single SQL statement for each partition and then for the whole column, as defined in class `AnalysisJob` 3. Implement a cache to manage the statistics records in FE TODO: 1. Use opentelemetry to monitor the execution time of each job 2. Format the internal analysis SQL 3. Split the generated SQL for deleting expired records so that the IN expression's child count does not exceed the FE limit 4. Implement show statements	2022-11-10 22:08:08 +08:00
Gabriel	1ef85ae1f2	[Improvement](join) Support nested loop outer join (#13965 )	2022-11-10 19:50:46 +08:00
morrySnow	6c13126e5c	[enhancement](Nereids) analyze check input slots must in child's output (#14107 )	2022-11-10 19:28:01 +08:00
minghong	ae4f2aead7	[fix](nereids) column stats min/max missing (#14091 ) In the result of SHOW COLUMN STATS tbl, min/max values were not displayed.	2022-11-10 17:08:44 +08:00
shee	9b5b411112	[fix](schemeChange) fe oom because replicas too many when schema change (#12850 )	2022-11-10 16:17:25 +08:00
谢健	151a72d158	[feature](Nereids) support circle graph (#14082 )	2022-11-10 15:54:21 +08:00
Pxl	0e26f28bf2	[Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581 )	2022-11-10 15:48:46 +08:00
jakevin	4cde9c4765	[enhance](Nereids): add missing hypergraph rule. (#14087 )	2022-11-10 15:23:31 +08:00
jakevin	0dfdbe4508	[feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051 )	2022-11-10 12:21:06 +08:00
Mingyu Chen	8c5c6d9d7f	[fix](ctas) fix wrong string column length after executing ctas from external table (#14090 )	2022-11-10 11:36:56 +08:00
minghong	17867e446f	[feature](nereids) let user define right deep tree penalty by session variable (#14040 ) It is hard to find a penalty factor that suits all queries; the default is 0.7.	2022-11-10 11:25:02 +08:00
starocean999	84b969a25c	[fix](grouping)the grouping expr should check col name from base table first, then alias (#14077 ) * fix FE UT; the behavior is now the same as MySQL	2022-11-10 11:10:42 +08:00
minghong	994d563f52	[fix](nereids) cannot collect decimal column stats (#13961 ) When executing ANALYZE TABLE, Doris fails on decimal columns. The root cause is that the scale of decimalV2 is 9, but 2 in the schema. There is no need to check the scale for decimalV2, since it is not a floating-point type.	2022-11-10 11:06:38 +08:00
Gabriel	184cee2d2b	[Bug](outfile) Fix wrong decimal format for ORC (#14124 )	2022-11-10 11:01:30 +08:00
Tiewei Fang	43eb946543	[feature](table-valued-function)S3 table valued function supports parquet/orc/json file format #14130 S3 table valued function supports parquet/orc/json file format. For example: parquet format	2022-11-10 10:33:12 +08:00
Jerry Hu	10df61b5bf	[improvement](join) Share hash table in fragments for broadcast join (#13921 )	2022-11-10 09:48:34 +08:00
zhangstar333	df622d8b7d	[Bug](udf) fix java-udaf process string type error and add some tests (#14106 )	2022-11-10 09:30:57 +08:00
mch_ucchi	3117ac9289	[enhancement](Nereids) use post-order to generate runtime filter in RuntimeFilterGenerator (#13949 ) Change the runtime filter generator from pre-order to post-order traversal. This may change the number of generated runtime filters, so the UT will be corrected accordingly.	2022-11-09 14:28:49 +08:00
Tiewei Fang	b74d0a4747	[feature](table-valued-function) Support `desc from s3()` and modify the syntax of tvf (#14047 ) This PR does two things: 1. supports `desc function s3()` 2. modifies the syntax of TVF	2022-11-09 14:12:43 +08:00
morrySnow	84bb82acc0	[fix](Nereids) aggregate disassemble generate error output list on GLOBAL phase aggregate (#14079 ) We must use localAggregateFunction as the key of globalOutputSMap, because disassembleDistinct uses the local output exprs to generate the global output.	2022-11-09 13:43:12 +08:00
jakevin	b144d2b4f4	[improve](Nereids): remove redundant code, add annotation in Memo. (#14083 )	2022-11-09 13:39:20 +08:00
morrySnow	aff62655c4	[feature](Nereids) binding slot in order by that not show in project (#14042 ) 1. Bind slots in ORDER BY that do not appear in the project list, such as: SELECT c1 FROM t WHERE c2 > 0 ORDER BY c3 2. Do not check for unbound slots when binding slot references; do it in the analysis check instead.	2022-11-09 13:25:41 +08:00
xueweizhang	572f491756	[fix](ctas) text column type len = 1 when create table as select (#13906 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2022-11-09 09:09:34 +08:00
Kang	151842a1fe	[feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430 ) Introduce a SQL syntax for creating inverted index and related metadata changes. ``` -- create table with INVERTED index CREATE TABLE httplogs ( ts datetime, clientip varchar(20), request string, status smallint, size int, INDEX idx_size (size) USING INVERTED, INDEX idx_status (status) USING INVERTED, INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none") ) DUPLICATE KEY(ts) DISTRIBUTED BY RANDOM BUCKETS 10 -- add an INVERTED index to a table CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english"); ```	2022-11-08 23:46:53 +08:00
Tiewei Fang	826cfdaf93	[feature](information_schema) add `backends` information_schema table (#13086 )	2022-11-08 22:15:10 +08:00
shee	3f3f2eb098	[Nereids][Improve] infer predicate after push down predicate (#12996 ) This PR implements predicate inference. For example: ``` sql select * from student left join score on student.id = score.sid where score.sid > 1 ``` transformed logical plan tree: left join / \ filter(sid >1) filter(id > 1) <---- inferred predicate \| \| scan scan See `InferPredicatesTest` for more cases. The logic is as follows: 1. pull up bottom predicates, then infer additional predicates. For example: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id 1. pull up the bottom predicate: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 2. infer: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1 Finally transformed SQL: select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1 2. put these predicates into `otherJoinConjuncts`; they are processed in the next round of predicate push-down. Currently only inference of `ComparisonPredicate` is supported. TODO: We should determine whether `expression` satisfies the conditions for replacement, e.g. whether `expression` is non-deterministic	2022-11-08 21:36:17 +08:00
Mingyu Chen	b6f91b6eff	[improvement](profile) support ordinary user to get query profile via http api (#14016 )	2022-11-08 20:39:01 +08:00
Kikyou1997	ecfdf0320d	[fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068 ) The show stats logic modified the internally collected ColumnStat unexpectedly, which caused inaccurate costs and inefficient plans.	2022-11-08 20:26:37 +08:00
minghong	cdc635610b	[enhancement](Nereids) tpch q21 anti and semi join reorder (#14037 ) Estimation of anti and semi joins needs rework; for now this just lets TPC-H q21 pass.	2022-11-08 17:21:50 +08:00
morrySnow	54c07f8782	[regression](Nereids) add back tpch regression test cases (#13826 ) 1. add back TPC-H regression test cases 2. fix decimal problem on aggregate function sum and agg introduced by #13764 3. fix memo merge group NPE introduced by #13900	2022-11-08 16:40:46 +08:00
Mingyu Chen	1c07a01038	[feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994 ) Support Aliyun DLF. Support data on s3-compatible object storage, such as Aliyun OSS. Refactor some catalog interfaces to make them tidier. Fix a bug: the default field delimiter of Hive text format should be \x01. Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.	2022-11-08 14:02:41 +08:00
谢健	61d4974ba1	[fix](Nereids) Use simple cost to calculate benefit and avoid unuseless calculation (#14056 ) In GraphSimplifier, we can use a simple cost to calculate the benefit. A recursive update is needed only when the best neighbor of the apply step is the edge being processed.	2022-11-08 13:11:38 +08:00
morrySnow	e6b12ce8e8	[feature](Nereids) support query that group by use alias generated in aggregate output (#14030 ) Support queries that use an alias in the GROUP BY list, such as: SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;	2022-11-08 11:02:42 +08:00

3069 Commits