Commit Graph

8289 Commits

8fe5211df4 [improvement](multi-catalog)(cache) invalidate catalog cache when refresh (#14342)
Invalidate the catalog/db/table cache when refreshing the catalog/db/table.

Tested on a table with 10000 partitions. The refresh operation costs about 10-20 ms.
2022-11-17 20:47:46 +08:00
ccf4db394c [feature-wip](multi-catalog) Collect external table statistics (#14160)
Collect HMS external table statistics through the external metadata,
and insert the results into __internal_schema.column_statistics using INSERT INTO SQL.
2022-11-17 20:41:09 +08:00
44ee4386f7 [test](multi-catalog)Regression test for external hive orc table (#13762)
Add a regression test for external Hive ORC tables. This PR generates all basic types supported by Hive ORC and creates a Hive external table to cover them in the Docker environment.
Functions to be tested:
1. Ensure that all types are parsed correctly
2. Ensure that the null maps of all types are parsed correctly
3. Ensure that the `SearchArgument` of `OrcReader` works well
4. Only select partition columns
2022-11-17 20:36:02 +08:00
98956dfa19 [fix](statistics) statistics inaccurate after analyze same table more than once (#14279)
If a table has already been analyzed and we analyze it again, the new statistics would be larger than expected: the incremental statistics would include values from the table-level statistics, because the SQL lacked a predicate on the nullability of part_id.
2022-11-17 20:18:14 +08:00
6da2948283 [feature-wip](multi-catalog) support iceberg v2(step 1) (#13867)
Support position deletes (partial support).
2022-11-17 17:56:48 +08:00
af462b07c7 [enhancement](explain) compress descriptor table explain string (#14152)
1. Compress the slot descriptor explain string to one row
2. Remove unmaterialized tuple descriptors and slot descriptors

Before this PR, the descriptor table explain string looked like this:
```
TupleDescriptor{id=0, tbl=lineitem, byteSize=176, materialized=true}
  SlotDescriptor{id=0, col=l_shipdate, type=DATEV2}
    parent=0
    materialized=true
    byteSize=4
    byteOffset=0
    nullIndicatorByte=0
    nullIndicatorBit=-1
    nullable=false
    slotIdx=0

  SlotDescriptor{id=1, col=l_orderkey, type=BIGINT}
    parent=0
    materialized=true
    byteSize=8
    byteOffset=24
    nullIndicatorByte=0
    nullIndicatorBit=-1
    nullable=false
    slotIdx=6
```

After this PR, the descriptor table explain string looks like this:
```
TupleDescriptor{id=2, tbl=lineitem}
  SlotDescriptor{id=1, col=l_extendedprice, type=DECIMAL(15,2), nullable=false}
  SlotDescriptor{id=2, col=l_discount, type=DECIMAL(15,2), nullable=false}
```
2022-11-17 15:19:17 +08:00
afc9065b51 [test](nereids) add filter estimation ut cases (#14293)
Also fix a bug in filter estimation for patterns like A > 10 AND A < 20.
2022-11-17 11:01:30 +08:00
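The A > 10 AND A < 20 pattern above is the classic case where multiplying independent per-predicate selectivities misestimates a range intersection. A minimal sketch of the range-intersection idea (illustrative Python, not the actual Nereids FilterEstimation code):

```python
# Illustrative sketch (not the actual Nereids FilterEstimation code): for a
# conjunction such as A > 10 AND A < 20, intersect the predicate range with
# the column's [min, max] range instead of multiplying independent selectivities.

def range_selectivity(col_min, col_max, low, high):
    """Fraction of a uniformly distributed column falling in (low, high)."""
    lo = max(col_min, low)
    hi = min(col_max, high)
    if hi <= lo:
        return 0.0  # empty intersection: predicate selects nothing
    return (hi - lo) / (col_max - col_min)

# Column A uniform on [0, 100]; predicate A > 10 AND A < 20 keeps ~10% of rows.
sel = range_selectivity(0, 100, 10, 20)
```

Under the uniform-distribution assumption this yields 0.1, whereas a naive product of the two one-sided selectivities (0.9 × 0.2) would give 0.18.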
7182f14645 [improvement][fix](multi-catalog) speed up list partition prune (#14268)
In the previous implementation, when doing list partition prune, we needed to generate `rangeToId`
every time we pruned.
But `rangeToId` is actually static data that should be created once and used everywhere.

So for Hive partitions, I created `rangeToId` and all the other data structures needed for partition pruning
in the partition cache, so that we can use them directly.

In my test, the cost of partition pruning for 10000 partitions dropped from 8 s to 0.2 s.

Also add "partition" info to the explain string for Hive tables.
```
|   0:VEXTERNAL_FILE_SCAN_NODE                           |
|      predicates: `nation` = '0024c95b'                 |
|      inputSplitNum=1, totalFileSize=4750, scanRanges=1 |
|      partition=1/10000                                 |
|      numNodes=1                                        |
|      limit: 10                                         |
```

Bug fix:
1. Fix a bug where the ES scan node could not filter data.
2. Fix a bug where querying ES with a predicate like `where substring(test2,2) = "ext2";` fails at the planner phase with
`Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef`

TODO:
1. Some problems when querying ES version 8 (`Unexpected exception: Index: 0, Size: 0`) will be fixed later.
2022-11-17 08:30:03 +08:00
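The create-once-use-everywhere idea above can be sketched as a per-table cache (illustrative Python; `range_to_id`, `prune`, and the partition data are stand-ins, not Doris's actual classes):

```python
# Illustrative sketch of "create-once-use-everywhere": build the partition
# value -> partition id mapping once per table and cache it, instead of
# regenerating it on every prune. All names here are hypothetical.
from functools import lru_cache

PARTITIONS = {  # stand-in for Hive partition metadata
    "t1": {"2022-01-01": 1, "2022-01-02": 2},
}

@lru_cache(maxsize=None)
def range_to_id(table):
    # The expensive construction runs only on the first call per table.
    return dict(PARTITIONS[table])

def prune(table, value):
    mapping = range_to_id(table)  # served from the cache after the first use
    return mapping.get(value)
```

Every prune after the first reuses the cached mapping, which is where the reported 8 s → 0.2 s improvement comes from in spirit.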
wxy
943e014414 [enhancement](decommission) speed up decommission process (#14028) (#14006) 2022-11-16 20:43:07 +08:00
47a6373e0a [feature](Nereids) support datev2 and datetimev2 type (#14263)
1. split DateLiteral and DateTimeLiteral into V1 and V2
2. add a type coercion about DateLikeType: DateTimeV2Type > DateTimeType > DateV2Type > DateType
3. add a rule to remove unnecessary CAST on DateLikeType in ComparisonPredicate
2022-11-16 15:51:48 +08:00
6881989dd9 [Bug](jvm memory) Support multiple java version to get max heap size (#14295)
`sun.misc.VM.maxDirectMemory` is available in JDK 1.8 only. This PR adds the equivalent interface for JDK 11.
2022-11-16 10:58:58 +08:00
70cc725649 [Vectorized](function) support avg_weighted/percentile_array/topn_wei… (#14209)
* [Vectorized](function) support avg_weighted/percentile_array/topn_weighted functions

* update add to stringRef
2022-11-15 16:38:38 +08:00
87544a017f [fuzztest](fe session variable) add fuzzy test config for fe session variables. (#14272)
Many features behind FE session variables are disabled by default, so these features are not actually exercised by the GitHub workflow tests. This PR adds a fuzzy-test config to fe.conf. If it is set to true, fuzzy session variables are used for every connection, so every feature developer can have fuzzy values set for their config.



Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-15 15:43:21 +08:00
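A rough sketch of the fuzzy-session-variable mechanism described above (the variable names and value pools here are invented for illustration, not real fe.conf keys or Doris session variables):

```python
import random

# Hypothetical fuzzable session variables and their candidate value pools.
FUZZABLE = {
    "enable_feature_x": [True, False],
    "parallel_degree": [1, 2, 4, 8],
}

DEFAULTS = {"enable_feature_x": False, "parallel_degree": 1}

def session_variables(use_fuzzy, seed=None):
    """Return per-connection session variables.

    With the fuzzy switch off, every connection gets the defaults; with it on,
    each connection gets randomized values so CI exercises default-off features.
    """
    if not use_fuzzy:
        return dict(DEFAULTS)
    rng = random.Random(seed)
    return {k: rng.choice(v) for k, v in FUZZABLE.items()}
```

Seeding the RNG per run (rather than per connection) would make a failing fuzzy configuration reproducible.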
5ae046b208 [bugfix](log) fix wrong print introduced by 49fecd2a6dae #14266 2022-11-15 11:39:05 +08:00
a3062c662c [feature-wip](statistics) support statistics injection and show statistics (#14201)
1. Reduce the configuration options for the statistics framework, and add comments for the rest.
2. Move the logic for creating analysis jobs into `StatisticsRepository`, which defines all the functions used to interact with the internal statistics table.
3. Move AnalysisJobScheduler into the statistics package.
4. Support displaying and manually injecting statistics.
2022-11-15 11:29:51 +08:00
89db3fee00 [feature-wip](MTMV)Add show statement for MTMV (#13786)
Use Case

mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1');
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;

mysql> SHOW MTMV JOB;
mysql> SHOW MTMV TASK;
2022-11-15 10:32:47 +08:00
37fdd011b4 [fix](fe-metric) Prometheus read format error #13831 (#13832)
Co-authored-by: 迟成 <chicheng@meituan.com>
2022-11-14 22:07:00 +08:00
b0ff852d74 [opt](Nereids) right deep tree penalty adjust: use right rowCount, not abs(left - right) (#14239)
In the original algorithm the penalty is abs(leftRowCount - rightRowCount). This lets some right-deep trees escape the penalty, because the subtraction is almost zero. Penalizing by rightRowCount avoids this escape.
2022-11-14 16:40:26 +08:00
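The escape described above is easy to see with numbers (an illustrative sketch, not the actual Nereids cost model):

```python
# Illustrative sketch, not the actual Nereids cost model: the old penalty
# abs(left - right) vanishes whenever both join inputs happen to be the same
# size, while penalizing by the right-side row count never does.

def penalty_old(left_rows, right_rows):
    return abs(left_rows - right_rows)

def penalty_new(left_rows, right_rows):
    return right_rows

# A balanced right-deep join escapes the old penalty entirely:
balanced_old = penalty_old(1_000_000, 1_000_000)  # 0 -- no penalty at all
balanced_new = penalty_new(1_000_000, 1_000_000)  # 1_000_000 -- large right side still penalized
```

With the new formula, a plan only avoids the penalty by actually having a small right (build) side, which is the shape the optimizer wants to prefer.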
bea66e6a12 [fix](nereids) cannot generate RF on colocate join and prune useful RF in RF prune (#14234)
1. When translating a colocate join, we lost the RF information attached to the right child, so BE would not generate those RFs.
2. When an RF is useless, we pruned all RFs on the scan node by mistake.
2022-11-14 16:36:55 +08:00
8dd2f8b349 [enhancement](nereids) set Ndv=rowCount if ndv is almost equal to rowCount on ColumnStatistics load (#14238) 2022-11-14 16:30:35 +08:00
bdf7d2779a [fix](Nereids) aggregate always report has 1 row count (#14236)
The data structure of the new stats changed, but the Agg estimation was not updated accordingly.
2022-11-14 16:27:55 +08:00
47326f951d [fix](nereids) count(*) reports npe when do filter selectivity estimation (#14235) 2022-11-14 16:11:08 +08:00
cf5e2a2eb6 [fix](nereids) new statistics use wrong default selectivity (#14233)
By default, column selectivity MUST be 1.0, not zero.
2022-11-14 16:09:17 +08:00
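A tiny illustration of why a zero default is wrong (hypothetical code, not the actual Nereids statistics implementation): selectivities multiply, so one unknown column defaulting to 0 collapses every row estimate to zero, while 1.0 is the neutral "no information" value.

```python
# Hypothetical sketch, not the actual statistics code: row estimates multiply
# per-column selectivities; a 0 default for an unknown column zeroes out the
# whole estimate, whereas 1.0 leaves the other columns' estimates intact.

def estimate_rows(row_count, selectivities):
    est = float(row_count)
    for s in selectivities:
        est *= s
    return est

ok = estimate_rows(1000, [0.5, 1.0])   # unknown column neutral at 1.0
bad = estimate_rows(1000, [0.5, 0.0])  # a 0 default wipes out the estimate
```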
7eed5a292c [feature-wip](multi-catalog) Support hive partition cache (#14134) 2022-11-14 14:12:40 +08:00
594e3b8224 [feature](Nereids) add circle detector and avoid overlap (#14164) 2022-11-14 14:02:14 +08:00
23a8c7eeb6 (fix)(multi-catalog)(es) Fix error result because not used fields_context (#14229)
Fix wrong results caused by fields_context not being used.
2022-11-14 14:00:55 +08:00
49fecd2a6d [improvement](log) print info of error replicas (#14220) 2022-11-14 11:37:18 +08:00
13b1f92c63 [enhancement](Nereids) add output set and output exprid set cache (#14151) 2022-11-14 11:24:57 +08:00
8263c34da6 [fix](ctas) use json_object in CTAS get wrong result (#14173)
* [fix](ctas) use json_object in CTAS get wrong result

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2022-11-14 09:13:05 +08:00
beaf2fcaf6 [feature](partition) support new create partition syntax (#13772)
Create partitions use :
```
PARTITION BY RANGE(event_day)(
        FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
        FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
        FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
        FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
        PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
)

PARTITION BY RANGE(event_time)(
        FROM ("2023-01-03 12") TO ("2023-01-14 22") INTERVAL 1 HOUR
)
```
This can create yearly/monthly/weekly/daily/hourly date partitions in a batch,
and it is also compatible with the single-partition method.
2022-11-12 20:52:37 +08:00
d9913b1317 [Enhancement](Nereids) Support numbers TableValuedFunction and some bitmap/hll aggregate function (#14169)
## Problem summary
This PR supports:
1. the `numbers` TableValuedFunction for Nereids tests, like `select * from numbers(number = 10, backend_num = 1)`
2. bitmap/hll aggregate functions
3. finding variable-length functions in the function registry, like `coalesce`
4. fixing a bug where printing the Nereids trace throws an exception because a RewriteRule is used in ApplyRuleJob, e.g. `AggregateDisassemble`, introduced by #13957
2022-11-11 16:29:15 +08:00
7c48168a53 [refactor](Nereids) remove DecimalType, use DecimalV2Type instead (#14166) 2022-11-11 13:58:16 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
5fad4f4c7b [feature](Nereids) replace order by keys by child output if possible (#14108)
To support queries like:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY c1 + 1

After the rewrite, the plan will be equivalent to:
SELECT c1 + 1 as a, sum(c2) FROM t GROUP BY c1 + 1 ORDER BY a
2022-11-11 13:34:29 +08:00
9b50888aaf [feature](Nereids) prune runtime filters which cannot reduce the tuple number of probe table (#13990)
1. Add a post processor: runtime filter pruner.
Doris generates RFs (runtime filters) on Join nodes to reduce the probe table at the scan stage. But some RFs have no effect, because their selectivity is 100%. This PR removes them.
An RF is effective if
a. the build column's value range covers part of the probe column's range, OR
b. the build column's NDV is less than the probe column's, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF which satisfies the above criteria.

2. explain graph
a. add RF info in Join and Scan node
b. add predicate count in Scan node

3. Rename session variable
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune` 

4. Fix a min/max column stats derivation bug:
for `select max(A) as X from T group by B`,
X.min is A.min, not A.max.
2022-11-11 13:13:29 +08:00
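The effectiveness criteria a-d above can be sketched as a single predicate (field names are illustrative, not those of Doris's actual RuntimeFilterPruner):

```python
# Illustrative field names, not Doris's actual RuntimeFilterPruner.

def rf_effective(build, probe):
    """Keep an RF only if the build side can actually reduce the probe side.

    build/probe: dicts with min, max, ndv, selectivity, reduced_by_rf.
    """
    covers_part = build["min"] > probe["min"] or build["max"] < probe["max"]  # (a) narrower range
    smaller_ndv = build["ndv"] < probe["ndv"]                                 # (b) fewer distinct values
    selective = build["selectivity"] < 1.0                                    # (c) build already filtered
    reduced = build["reduced_by_rf"]                                          # (d) transitively reduced
    return covers_part or smaller_ndv or selective or reduced

build = {"min": 5, "max": 50, "ndv": 10, "selectivity": 1.0, "reduced_by_rf": False}
probe = {"min": 0, "max": 100, "ndv": 1000, "selectivity": 1.0, "reduced_by_rf": False}
keep = rf_effective(build, probe)  # narrower range and smaller NDV: keep the RF
```

An RF whose build-side stats are identical to the probe side fails all four criteria and gets pruned, matching the "selectivity is 100%" case described above.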
8e17fcef3f [fix](cast)fix cast to char(N) error (#14168) 2022-11-11 11:27:51 +08:00
8812a680fc [fix](metric) fix the bug of not updating the query latency metric #14172 2022-11-11 11:21:17 +08:00
e1e63f8354 [feature-wip](statistic) persistence table statistics into olap table (#13883)
1. Support persisting collected statistics to a pre-built OLAP table named `column_statistics`.
2. Use a much simpler mechanism to collect statistics: all the gauges are collected in a single SQL statement for each partition and then for the whole column, as defined in class `AnalysisJob`.
3. Implement a cache to manage the statistics records in FE.

TODO:

1. Use OpenTelemetry to monitor the execution time of each job.
2. Format the internal analysis SQL.
3. Split the SQL so that the IN expression's child count does not exceed the FE limit in the generated SQL for deleting expired records.
4. Implement SHOW statements.
2022-11-10 22:08:08 +08:00
1ef85ae1f2 [Improvement](join) Support nested loop outer join (#13965) 2022-11-10 19:50:46 +08:00
6c13126e5c [enhancement](Nereids) analyze check input slots must in child's output (#14107) 2022-11-10 19:28:01 +08:00
ae4f2aead7 [fix](nereids) column stats min/max missing (#14091)
In the result of SHOW COLUMN STATS tbl, the min/max values were not displayed.
2022-11-10 17:08:44 +08:00
9b5b411112 [fix](schemeChange) fe oom because replicas too many when schema change (#12850) 2022-11-10 16:17:25 +08:00
151a72d158 [feature](Nereids) support circle graph (#14082) 2022-11-10 15:54:21 +08:00
Pxl
0e26f28bf2 [Enhancement](runtime-filter) enlarge runtime filter in predicate threshold (#13581)
enlarge runtime filter in predicate threshold
2022-11-10 15:48:46 +08:00
4cde9c4765 [enhance](Nereids): add missing hypergraph rule. (#14087) 2022-11-10 15:23:31 +08:00
0dfdbe4508 [feature](Nereids): InnerJoinLeftAssociate, InnerJoinRightAssociate and JoinExchange. (#14051) 2022-11-10 12:21:06 +08:00
8c5c6d9d7f [fix](ctas) fix wrong string column length after executing ctas from external table (#14090) 2022-11-10 11:36:56 +08:00
17867e446f [feature](nereids) let user define right deep tree penalty by session variable (#14040)
It is hard to find a single factor proper for all queries.
The default is 0.7.
2022-11-10 11:25:02 +08:00
84b969a25c [fix](grouping)the grouping expr should check col name from base table first, then alias (#14077)
* [fix](grouping)the grouping expr should check col name from base table first, then alias

* Fix FE UT; the behavior is now the same as MySQL's.
2022-11-10 11:10:42 +08:00
994d563f52 [fix](nereids) cannot collect decimal column stats (#13961)
When executing ANALYZE TABLE, Doris fails on decimal columns.
The root cause is that the scale in decimalV2 is 9, but 2 in the schema.
There is no need to check the scale for decimalV2, since it is not a floating-point type.
2022-11-10 11:06:38 +08:00