doris

Author	SHA1	Message	Date
Stalary	6698f63dec	[fix](function) If function adds type inference (#9728 )	2022-05-26 22:43:18 +08:00
morrySnow	e701c057dc	[style](fe) wrap and whitespace rules (#9764 ) change below rules' severity to error and fix original code error: - EmptyBlock - EmptyCatchBlock - LeftCurly - RightCurly - IllegalTokenText - MultipleVariableDeclarations - OneStatementPerLine - StringLiteralEquality - UnusedLocalVariable - Indentation - OuterTypeFilename - MethodParamPad - GenericWhitespace - NoWhitespaceBefore - OperatorWrap - ParenPad - WhitespaceAfter - WhitespaceAround	2022-05-26 16:56:20 +08:00
Mingyu Chen	32a210f426	[fix](help) fix bug of help command (#9761 ) This bug was introduced by #9306: users had to execute "help stream-load" to show the help doc, when the command should be "help stream load".	2022-05-26 08:44:00 +08:00
Mingyu Chen	0c70359404	[fix](resource-tag) Consider resource tags when assigning tasks for broker & routine load (#9492 ) This CL mainly changes: 1. Broker Load: when assigning backends, use the user-level resource tag to find available backends. If the user-level resource tag is not set, a broker load task can be assigned to any BE node; otherwise, the task can only be assigned to BE nodes that match the user-level tags. 2. Routine Load: the current routine load job does not carry user info, so it cannot get the user-level tag when assigning tasks. There are two cases: 1. For an old routine load job, use the tags of the replica allocation info to select BE nodes. 2. For a new routine load job, the user info is added and persisted in the routine load job.	2022-05-26 08:42:09 +08:00
Adonis Ling	2a11a4ab99	[feature-wip][array-type] Support more sub types. (#9466 ) Please refer to #9465	2022-05-26 08:41:34 +08:00
Zhengguo Yang	be026addde	[security] update canal version to fix fastjson security issue (#9763 )	2022-05-25 18:22:37 +08:00
924060929	cc9321a09b	[Enhancement](Nereids)refactor plan node into plan + operator (#9755 ) Close #9623 Summary: This PR refactors the plan node into plan + operator. In the previous version of Nereids, a plan node consisted of children and relational algebra, e.g.
```java
class LogicalJoin extends LogicalBinary {
    private Plan left, right;
}
```
This structure is easy to understand, but it makes `Memo.copyIn` difficult to optimize: a rule generates a complete sub-plan, and Memo must compare the complete sub-plan to distinguish GroupExpressions, which hurts performance. First, we need to change the rule to generate a partial sub-plan and replace some child plans with a placeholder, e.g. LeafOp in the Columbia optimizer, then mark some children in the sub-plan as unchanged and bind the related group, so that a sub-plan does not have to be compared and copied if the related group exists. Second, we need to separate the original `Plan` into `Plan` and `Operator`: a Plan contains children and an Operator, and an Operator just denotes relational algebra (no children/input field). This design keeps the operator and the children from affecting each other. The plan-group binder can therefore generate a placeholder plan (containing the related group) for the sub-query, and does not have to generate the current plan node case by case, because the plan is immutable (replacing children generates a new plan). A rule implementer can reuse the placeholder to generate a partial sub-plan. Operator and Plan have similar inheritance structures, as shown below. XxxPlan contains XxxOperator, e.g. LogicalBinary contains a LogicalBinaryOperator.
```
         TreeNode
             │
     ┌───────┴────────┐                                       Operator
     │                │                                           │
     ▼                ▼                                           ▼
Expression          Plan                                    PlanOperator
                      │                                           │
          ┌───────────┴─────────┐                     ┌───────────┴──────────────────┐
          │                     │                     │                              │
          ▼                     ▼                     ▼                              ▼
     LogicalPlan          PhysicalPlan       LogicalPlanOperator           PhysicalPlanOperator
          │                     │                     │                              │
          ├───►LogicalLeaf      ├──►PhysicalLeaf      ├──► LogicalLeafOperator       ├───►PhysicalLeafOperator
          │                     │                     │                              │
          ├───►LogicalUnary     ├──►PhysicalUnary     ├──► LogicalUnaryOperator      ├───►PhysicalUnaryOperator
          │                     │                     │                              │
          └───►LogicalBinary    └──►PhysicalBinary    └──► LogicalBinaryOperator     └───►PhysicalBinaryOperator
```
The concrete operator extends the XxxNaryOperator, e.g.
```java
class LogicalJoin extends LogicalBinaryOperator {}
class PhysicalProject extends PhysicalUnaryOperator {}
class LogicalRelation extends LogicalLeafOperator {}
```
So the first example changes to this:
```java
class LogicalBinary extends AbstractLogicalPlan implements BinaryPlan {
    private Plan left, right;
    private LogicalBinaryOperator operator;
}

class LogicalJoin extends LogicalBinaryOperator {}
```
With these changes, a Rule must build the plan and the operator as needed, not only the plan as before. For example, the JoinCommutative rule:
```java
public Rule<Plan> build() {
    // the plan override function can automatically build the plan according to the Operator's type,
    // so this returns a LogicalBinary(LogicalJoin, Plan, Plan)
    return innerLogicalJoin().then(join -> plan(
        // operator
        new LogicalJoin(join.op.getJoinType().swap(), join.op.getOnClause()),
        // children
        join.right(),
        join.left()
    )).toRule(RuleType.LOGICAL_JOIN_COMMUTATIVE);
}
```
	2022-05-24 20:53:24 +08:00
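The split between plan and operator described in #9755 can be illustrated with a minimal sketch. Every class name here (`OperatorSketch`, `PlanSketch`, `PlanOperatorDemo`, and so on) is hypothetical and much simpler than the real Nereids types; the sketch only assumes what the commit message states, namely that an operator carries no children and that replacing children produces a new immutable plan.

```java
import java.util.List;

final class PlanOperatorDemo {
    // Swaps the two children of a binary plan, in the spirit of join commutativity.
    static PlanSketch commute(PlanSketch binary) {
        return binary.withChildren(binary.children.get(1), binary.children.get(0));
    }

    public static void main(String[] args) {
        PlanSketch join = new PlanSketch(new JoinOperatorSketch("INNER"),
                new PlanSketch(new ScanOperatorSketch("t1")),
                new PlanSketch(new ScanOperatorSketch("t2")));
        // prints Join(INNER)[Scan(t2), Scan(t1)]
        System.out.println(commute(join).describe());
    }
}

// Hypothetical stand-in for an operator: relational algebra only, no children.
interface OperatorSketch {
    String name();
}

final class JoinOperatorSketch implements OperatorSketch {
    private final String joinType;

    JoinOperatorSketch(String joinType) {
        this.joinType = joinType;
    }

    @Override
    public String name() {
        return "Join(" + joinType + ")";
    }
}

final class ScanOperatorSketch implements OperatorSketch {
    private final String table;

    ScanOperatorSketch(String table) {
        this.table = table;
    }

    @Override
    public String name() {
        return "Scan(" + table + ")";
    }
}

// Hypothetical stand-in for a plan: an immutable pair of operator and children.
final class PlanSketch {
    final OperatorSketch operator;
    final List<PlanSketch> children;

    PlanSketch(OperatorSketch operator, PlanSketch... children) {
        this.operator = operator;
        this.children = List.of(children);
    }

    // Replacing children builds a new plan and reuses the same operator instance.
    PlanSketch withChildren(PlanSketch... newChildren) {
        return new PlanSketch(operator, newChildren);
    }

    String describe() {
        if (children.isEmpty()) {
            return operator.name();
        }
        StringBuilder sb = new StringBuilder(operator.name()).append('[');
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(children.get(i).describe());
        }
        return sb.append(']').toString();
    }
}
```

Because the operator holds no references to children, `commute` can reuse it untouched, which is the property the commit message relies on for placeholder plans.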
Shuangchi He	77297bb7ee	Fix some typos in fe/. (#9682 )	2022-05-23 12:11:01 +08:00
zhengshiJ	d8f1b77cc1	[improvement](planner) Backfill the original predicate pushdown code (#9703 ) Due to the current architecture, predicate derivation during rewrite cannot cover all cases: rewrite is performed on the ON clause first and then on the WHERE clause, and when there are subqueries, not all cases can be derived. So the predicate pushdown method is kept here. E.g. for select * from t1 left join t2 on t1 = t2 where t1 = 1; InferFiltersRule can't infer t2 = 1, because this is outside its specification. The expression (t2 = 1) can actually be deduced and pushed down to the scan node.	2022-05-22 21:35:32 +08:00
Mingyu Chen	d270f4f2d4	[config](checksum) Disable consistency checker by default (#9699 ) Disable by default because current checksum logic has some bugs. And it will also bring some overhead.	2022-05-22 21:31:43 +08:00
zxealous	ad4da4aa8f	[doc] Fix typos in documentation (#9692 )	2022-05-22 21:30:22 +08:00
xy720	3391de482b	[Refactor] simplify some code in routine load (#9532 )	2022-05-22 21:25:39 +08:00
xiepengcheng01	31e40191a8	[Refactor] add vpre_filter_expr for vectorized to improve performance (#9508 )	2022-05-22 11:45:57 +08:00
HappenLee	8fa677b59c	[Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666 ) * [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner 1. fix bug that the vjson scanner does not support `range_from_file_path` 2. fix bug that the vjson/vbroker scanner core dumps when src/dest slot nullability differs 3. fix bug in vparquet filter_block when the reference of a column is not 1 4. refactor to simplify the code. Only the vectorized load is changed, not the original row-based load. Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-20 11:43:03 +08:00
zhangstar333	6f61af7682	[Vectorized][java-udf] add datetime&&largeint&&decimal type to java-udf (#9440 )	2022-05-20 10:26:09 +08:00
Jibing-Li	5fa6e892be	[fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190 ) Hive and trino/presto automatically trim trailing spaces, but Doris doesn't. This causes query results that differ from Hive. Add a new session variable "trim_tailing_spaces_for_external_table_query". If set to true, the trailing spaces of each column are trimmed when reading csv from the broker scan node.	2022-05-20 09:55:13 +08:00
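The trimming behavior described in #9190 can be sketched as follows. This is not the Doris broker scanner code; `TrailingSpaceTrimDemo`, `trimTrailingSpaces` and `splitLine` are hypothetical names, and the sketch assumes that only trailing ASCII space characters are removed from each CSV column.

```java
import java.util.regex.Pattern;

final class TrailingSpaceTrimDemo {
    // Removes trailing ASCII spaces only; leading spaces and other characters are kept.
    static String trimTrailingSpaces(String field) {
        int end = field.length();
        while (end > 0 && field.charAt(end - 1) == ' ') {
            end--;
        }
        return field.substring(0, end);
    }

    // Splits one CSV line and trims each column when the (hypothetical) flag is set.
    static String[] splitLine(String line, String separator, boolean trimTrailing) {
        // limit -1 keeps empty trailing columns instead of dropping them
        String[] columns = line.split(Pattern.quote(separator), -1);
        if (trimTrailing) {
            for (int i = 0; i < columns.length; i++) {
                columns[i] = trimTrailingSpaces(columns[i]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        for (String column : splitLine("a  ,b , c", ",", true)) {
            System.out.println("[" + column + "]");
        }
    }
}
```

With the flag off, the columns are returned exactly as they appear in the file, which corresponds to the behavior before the session variable was introduced.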
jakevin	c2d41c84bf	[feature](nereids): add join rules base code (#9598 )	2022-05-20 08:18:08 +08:00
spaces-x	c048b1f0f9	[fix](sparkload): fix min_value will be negative number when `maxGlobalDictValue` exceeds integer range (#9436 )	2022-05-19 23:56:24 +08:00
leo65535	1355bc162b	[Enhance] Add host info to heartbeat error msg (#9499 )	2022-05-19 23:45:53 +08:00
Stalary	cbc7b167b1	[Feature] cancel load support state (#9537 )	2022-05-19 16:37:56 +08:00
morrySnow	235d586f11	[style](fe) code correct rules and name rules (#9670 ) * [style](fe) code correct rules and name rules * revert some change according to comments	2022-05-19 16:36:03 +08:00
EmmyMiao87	7a9bf5b23e	[FeConfig](Project) Project optimization is enabled by default (#9667 )	2022-05-19 14:03:14 +08:00
morrySnow	a3183ec45c	[fix](planner) unnecessary cast will be added on children in CaseExpr sometimes (#9600 ) An unnecessary cast is added on children in CaseExpr because the equality operator is used to compare `Expr` types. This leads to wrong expression comparisons and then to failed expression substitution when `ExprSubstitutionMap` is used.	2022-05-18 22:44:51 +08:00
morrySnow	94c89e8a37	[improvement](planner) push down predicate past two phase aggregate (#9498 ) The existing push-down of predicates past an aggregate cannot push a predicate past a two-phase aggregate. The original plan looks like this:
```
second phase agg (conjuncts on olap scan node tuples)
|
first phase agg
|
olap scan node
```
It should be optimized to:
```
second phase agg
|
first phase agg
|
olap scan node (conjuncts on olap scan node tuples)
```
	2022-05-18 10:09:39 +08:00
Hui Tian	682cc14182	[bug] (init) Java version check fail (#9607 )	2022-05-18 07:47:03 +08:00
Mingyu Chen	7d9c25e718	[config] Remove some old config and session variable (#9495 ) 1. Remove session variable "enable_lateral_view" 2. Remove Fe config: enable_materialized_view 3. Remove Fe config: enable_create_sync_job 4. Fe config dynamic_partition_enable is now only used to disable the dynamic partition scheduler.	2022-05-17 22:37:11 +08:00
Mingyu Chen	2ba81899d0	[fix] fix bug that replica can not be repaired due to DECOMMISSION state (#9424 ) Reset the state of replicas that are in DECOMMISSION state after scheduling has finished.	2022-05-17 22:36:30 +08:00
pengxiangyu	4ba75d3195	[feature] Add StoragePolicyResource for Remote Storage (#9554 ) Add StoragePolicyResource for Remote Storage	2022-05-17 20:17:33 +08:00
Stalary	d95fe08458	[feature] group_concat support distinct (#9576 )	2022-05-17 19:29:47 +08:00
dujl	72e0042efb	[feature-wip](hudi) Step1: Support create hudi external table (#9559 ) Support creating hudi tables and `show create table` for hudi tables. ### Design 1. create hudi table without schema (recommended)
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = HUDI
[COMMENT "comment"]
PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```
2. create hudi table with schema
```sql
CREATE [EXTERNAL] TABLE table_name
[(column_definition1[, column_definition2, ...])]
ENGINE = HUDI
[COMMENT "comment"]
PROPERTIES (
    "hudi.database" = "hudi_db_in_hive_metastore",
    "hudi.table" = "hudi_table_in_hive_metastore",
    "hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```
When creating a hudi table with schema, the columns must exist in the corresponding table in the hive metastore.	2022-05-17 11:30:23 +08:00
yinzhijian	bee5c2f8aa	[feature-wip](parquet-vec) Support parquet scanner in vectorized engine (#9433 )	2022-05-17 09:37:17 +08:00
morrySnow	c731e84341	[fix](planner)VecNotImplException thrown when query need rewrite and some slot cannot changed to nullable (#9589 )	2022-05-16 22:34:02 +08:00
EmmyMiao87	9f9b666bc1	[Feature](Nereids) Data structure of comparison predicate (#9506 ) 1. The data structure of the comparison expression 2. Refactored the inheritance and implementation relationship of tree node [Diagram: inheritance hierarchy rooted at TreeNode, covering Abstract Tree Node, Leaf Node, Unary Node, Binary Node, Leaf Expression, Unary Expression, Binary Expression, Expression, Plan Node, Comparison Predicate, Named Expr, Alias Expr and Slot]	2022-05-16 15:01:13 +08:00
zhangstar333	953429e370	[fix](function) fix last_value get wrong result when have order by clause (#9247 )	2022-05-15 23:56:01 +08:00
EmmyMiao87	a9653f00bb	[fix](lateral-view) Error view includes lateral view (#9530 ) Fixed #9529 When a lateral view is based on an inline view that belongs to a view, Doris could not resolve the column of the lateral view in the query. When a query uses a view, it mainly refers to the string representation of the view. That is, if the view's string representation is wrong, the view is wrong. The string representation of the inline view did not handle the lateral view, which led to query errors when such views were used. This PR mainly fixes the string representation of inline views.	2022-05-14 09:57:08 +08:00
morrySnow	8a0097cfb9	[style](java) format fe code with some check rules (#9460 ) Issue Number: close #9403 set below rules' severity to error and format code according check info. a. Merge conflicts unresolved b. Avoid using corresponding octal or Unicode escape c. Avoid Escaped Unicode Characters d. No Line Wrap e. Package Name f. Type Name g. Annotation Location h. Interface Type Parameter i. CatchParameterName j. Pattern Variable Name k. Record Component Name l. Record Type Parameter Name m. Method Type Parameter Name n. Redundant Import o. Custom Import Order p. Unused Imports q. Avoid Star Import r. tab character in file s. Newline At End Of File t. Trailing whitespace found	2022-05-12 20:14:38 +08:00
jiafeng.zhang	d7705ace65	[fix](binlog-load) binlog load fails because txn exceeds the default value (#9471 ) When a binlog load has failed because txn exceeds the default value, resuming it also fails. A friendly error message is now given to the user right away, instead of reporting success and then failing after a while, which confused users. Issue Number: close #9468	2022-05-12 13:31:22 +08:00
deardeng	cfbf13710b	[fix](broker-load) can't load parquet file with column name case sensitive with Doris column (#9358 )	2022-05-12 13:27:03 +08:00
morrySnow	122cc3b772	[chore](fe code style)add suppressions to fe check style (#9429 ) The current FE checkstyle checks all files, but some rules should only be applied to production files. Add suppressions to suppress some rules on test files.	2022-05-12 12:16:55 +08:00
Stalary	f11d320213	[feature] support row policy filter (#9206 )	2022-05-11 22:11:10 +08:00
jakevin	74352c807e	[refactor](Nereids): cascades refactor (#9470 ) Describe the overview of changes. - rename GroupExpression - use `HashSet<GroupExpression> groupExpressions` in `memo` - add label of `Nereids` for CI - remove `GroupExpr` from Plan	2022-05-11 11:07:58 +08:00
jiafeng.zhang	ad88eb739b	[fix](http) Hardening Recommendations Disable TRACE/TRAC methods (#9479 )	2022-05-11 09:41:59 +08:00
Mingyu Chen	8fa0122ed0	[refactor](backend) Refactor the logic of selecting Backend in FE. (#9478 ) There are many places in FE where a group of BE nodes needs to be selected according to certain requirements. for example: 1. When creating replicas for a tablet. 2. When selecting a BE to execute Insert. 3. When Stream Load forwards http requests to BE nodes. These operations all have the same logic. So this CL mainly changes: 1. Create a new `BeSelectionPolicy` class to describe the set of conditions for selecting BE. 2. The logic of selecting BE nodes in `SystemInfoService` has been refactored, and the following two methods are used uniformly: 1. `selectBackendIdsByPolicy`: Select the required number of BE nodes according to the `BeSelectionPolicy`. 2. `selectBackendIdsForReplicaCreation`: Select the BE node for the replica creation operation. Note that there are some changes here: For the replica creation operation, the round-robin method was used to select BE nodes before, but now it is changed to `random` selection for the following reasons: 1. Although the previous logic is round-robin, it is actually random. 2. The final diff of the random algorithm will not be greater than 5%, so it can be considered that the random algorithm can distribute the data evenly.	2022-05-11 09:40:57 +08:00
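The policy-based selection described in #9478 can be sketched roughly as below. `BackendSketch`, `PolicySketch` and `selectByPolicy` are hypothetical simplifications, not the actual `BeSelectionPolicy` or `SystemInfoService` API; the conditions (liveness and resource tag) and the choice to return an empty list when too few nodes match are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class BackendSelectionDemo {
    // Filters backends by the policy, then picks `number` of them at random.
    // Returns an empty list when fewer than `number` backends match (an assumption).
    static List<Long> selectByPolicy(List<BackendSketch> backends, PolicySketch policy,
            int number, Random random) {
        List<Long> candidates = new ArrayList<>();
        for (BackendSketch be : backends) {
            if (policy.matches(be)) {
                candidates.add(be.id);
            }
        }
        if (candidates.size() < number) {
            return Collections.emptyList();
        }
        Collections.shuffle(candidates, random);
        return new ArrayList<>(candidates.subList(0, number));
    }

    public static void main(String[] args) {
        List<BackendSketch> backends = List.of(
                new BackendSketch(1, true, "group_a"),
                new BackendSketch(2, false, "group_a"),
                new BackendSketch(3, true, "group_b"),
                new BackendSketch(4, true, "group_a"));
        PolicySketch policy = new PolicySketch(true, Set.of("group_a"));
        System.out.println(selectByPolicy(backends, policy, 2, new Random()));
    }
}

// Hypothetical view of a BE node: id, liveness and one resource tag.
final class BackendSketch {
    final long id;
    final boolean alive;
    final String tag;

    BackendSketch(long id, boolean alive, String tag) {
        this.id = id;
        this.alive = alive;
        this.tag = tag;
    }
}

// Hypothetical set of conditions a backend must satisfy to be selected.
final class PolicySketch {
    final boolean needAlive;
    final Set<String> allowedTags; // empty set means any tag is accepted

    PolicySketch(boolean needAlive, Set<String> allowedTags) {
        this.needAlive = needAlive;
        this.allowedTags = allowedTags;
    }

    boolean matches(BackendSketch be) {
        if (needAlive && !be.alive) {
            return false;
        }
        return allowedTags.isEmpty() || allowedTags.contains(be.tag);
    }
}
```

Keeping the conditions in one policy object means every caller (replica creation, insert, stream load forwarding) can share the same filtering code and differ only in the policy it passes in.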
xueweizhang	375c1bf5c0	[feature](mysql-table) support utf8mb4 for mysql external table (#9402 ) This patch supports utf8mb4 for mysql external tables. Previously, a mysql external table could only use the utf8 charset, even when utf8mb4 was needed. When creating a mysql external table, an optional property "charset" can now be added to set the character set of the mysql connection; the default value is "utf8". Set it to "utf8mb4" instead of "utf8" when needed.	2022-05-11 09:39:23 +08:00
Stalary	092a12e983	[feature] show create materialized view (#9391 )	2022-05-11 09:29:55 +08:00
924060929	99b8e08a5f	[Enhancement](Optimizer) Nereids pattern matching base framework (#9474 ) This PR provides a new pattern matching framework for the Nereids optimizer. The new pattern matching framework contains these concepts:
1. `Pattern`/`PatternDescriptor`: the multi-level shape of the tree node, e.g. the `logicalJoin(logicalJoin(), any())` pattern describes a plan whose root is a `LogicalJoin` and whose left child is also a `LogicalJoin`.
2. `MatchedAction`: a callback function invoked when the pattern matches; usually it creates a new plan to replace the originally matched plan.
3. `MatchingContext`: the parameter passed to MatchedAction; it contains the matched plan root and the PlannerContext.
4. `PatternMatcher`: contains a PatternDescriptor and a MatchedAction.
5. `Rule`: a rewrite rule that contains RuleType, PatternPromise, Pattern and a transform function (equivalent to MatchedAction).
6. `RuleFactory`: a factory that helps build Rules easily. RuleFactory extends the Patterns interface and has some predefined pattern descriptors.
For example, join commutative:
```java
public class JoinCommutative extends OneExplorationRuleFactory {
    @Override
    public Rule<Plan> build() {
        return innerLogicalJoin().thenApply(ctx -> {
            return new LogicalJoin(
                JoinType.INNER_JOIN,
                ctx.root.getOnClause(),
                ctx.root.right(),
                ctx.root.left()
            );
        }).toRule(RuleType.LOGICAL_JOIN_COMMUTATIVE);
    }
}
```
The code above shows the three steps to create a Rule:
1. 'innerLogicalJoin()' declares that the pattern is an inner logical join. 'innerLogicalJoin' is a predefined pattern.
2. Invoke the 'thenApply()' function to attach a MatchedAction, which returns a new LogicalJoin with the children exchanged.
3. Invoke the 'toRule()' function to convert it to a Rule.
You can think of a Rule as containing three parts:
1. Pattern
2. transform function / MatchedAction
3. RuleType and RulePromise
So:
1. `innerLogicalJoin()` creates a `PatternDescriptor`, which contains a `Pattern`.
2. `PatternDescriptor.then()` converts the `PatternDescriptor` to a `PatternMatcher`, which contains the Pattern and the MatchedAction.
3. `PatternMatcher.toRule()` converts the `PatternMatcher` to a Rule.
These three steps are inspired by currying in functional programming. Note that #9446 provides a generic type for TreeNode's children, so the multi-level type can be inferred in this pattern matching framework and you can get the real tree node type without unsafe casts, like this:
```java
logicalJoin(logicalJoin(), any()).then(j -> {
    // j can be inferred as LogicalJoin<LogicalJoin<Plan, Plan>, Plan>,
    // so j.left() can be inferred as LogicalJoin<Plan, Plan>,
    // so you don't need to cast j.left() from 'Plan' to 'LogicalJoin'
    var node = j.left().left();
})
```
	2022-05-10 10:06:04 +08:00
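The three-step chain from #9474 (descriptor, then matcher, then rule) can be mimicked in a few lines of plain Java. This is a sketch under loud assumptions: `DescriptorSketch`, `MatcherSketch` and `RuleSketch` are hypothetical stand-ins, the "plan" is just a string such as `t1 JOIN t2`, and the pattern is a simple predicate rather than a tree shape.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

final class PatternCurryingDemo {
    // Builds a rule in three steps: declare a pattern, attach an action, convert to a rule.
    static RuleSketch<String> swapJoinRule() {
        return DescriptorSketch.<String>of(node -> node.contains(" JOIN "))
                .then(node -> {
                    String[] sides = node.split(" JOIN ", 2);
                    return sides[1] + " JOIN " + sides[0];
                })
                .toRule("JOIN_COMMUTATIVE");
    }

    public static void main(String[] args) {
        RuleSketch<String> rule = swapJoinRule();
        // prints t2 JOIN t1
        System.out.println(rule.apply("t1 JOIN t2").orElse("no match"));
    }
}

// Step 1: holds only the pattern (hypothetical stand-in for PatternDescriptor).
final class DescriptorSketch<T> {
    private final Predicate<T> pattern;

    private DescriptorSketch(Predicate<T> pattern) {
        this.pattern = pattern;
    }

    static <T> DescriptorSketch<T> of(Predicate<T> pattern) {
        return new DescriptorSketch<>(pattern);
    }

    MatcherSketch<T> then(Function<T, T> action) {
        return new MatcherSketch<>(pattern, action);
    }
}

// Step 2: pattern plus matched action (hypothetical stand-in for PatternMatcher).
final class MatcherSketch<T> {
    private final Predicate<T> pattern;
    private final Function<T, T> action;

    MatcherSketch(Predicate<T> pattern, Function<T, T> action) {
        this.pattern = pattern;
        this.action = action;
    }

    RuleSketch<T> toRule(String ruleType) {
        return new RuleSketch<>(ruleType, pattern, action);
    }
}

// Step 3: pattern, action and rule type together (hypothetical stand-in for Rule).
final class RuleSketch<T> {
    final String ruleType;
    private final Predicate<T> pattern;
    private final Function<T, T> action;

    RuleSketch(String ruleType, Predicate<T> pattern, Function<T, T> action) {
        this.ruleType = ruleType;
        this.pattern = pattern;
        this.action = action;
    }

    // Applies the action only when the pattern matches the node.
    Optional<T> apply(T node) {
        return pattern.test(node) ? Optional.of(action.apply(node)) : Optional.empty();
    }
}
```

Each step returns a new object that carries one more piece of information, which is the currying-like shape the commit message refers to.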
leo65535	d1b85d51a0	[code style](fe) Include test sources (#9366 ) Include test sources, we also need to check them.	2022-05-09 09:40:44 +08:00
caiconghui	580ce38a3f	[fix](schema_hash) Fix bug that introduced by removing schema_hash (#9449 )	2022-05-08 21:03:10 +08:00
Henry2SS	c633402ce3	[feature] (sql-digest) support sql digest (#8919 )	2022-05-08 17:25:41 +08:00
924060929	52a2db18c0	[Enhancement](Optimizer) Optimize nereids tree node structure (#9446 ) This PR optimizes the Nereids tree node structure with generic parameters and N-ary abstract tree nodes. It facilitates the use of the pattern matching framework.	2022-05-08 16:56:00 +08:00


2140 Commits