Binlog load: when the txn count exceeds the default limit, RESUME fails. A friendly error message is now returned to the user immediately, instead of reporting success and then failing a while later, which left users puzzled.
Issue Number: close #9468
The FE checkstyle currently checks all files, but some rules should only apply to production code.
Add suppressions to disable those rules for test files.
Overview of changes:
- rename GroupExpression
- use `HashSet<GroupExpression> groupExpressions` in `memo`
- add label of `Nereids` for CI
- remove `GroupExpr` from Plan
There are many places in the FE where a group of BE nodes needs to be selected according to certain requirements, for example:
1. When creating replicas for a tablet.
2. When selecting a BE to execute Insert.
3. When Stream Load forwards http requests to BE nodes.
These operations all have the same logic. So this CL mainly changes:
1. Create a new `BeSelectionPolicy` class to describe the set of conditions for selecting BE.
2. The logic of selecting BE nodes in `SystemInfoService` has been refactored, and the following two methods are now used uniformly:
   1. `selectBackendIdsByPolicy`: Select the required number of BE nodes according to the `BeSelectionPolicy` (see the sketch below).
   2. `selectBackendIdsForReplicaCreation`: Select the BE nodes for the replica creation operation.
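For illustration, selecting a single BE for a load could look like the following sketch; the builder methods shown are assumptions about the policy API, not its exact signatures:
```java
import java.util.List;

// Sketch only: describe the BEs we need with a policy, then ask
// SystemInfoService for matching backend ids. The builder methods shown
// are illustrative assumptions.
static Long pickOneBeForLoad(SystemInfoService infoService) throws UserException {
    BeSelectionPolicy policy = new BeSelectionPolicy.Builder()
            .needLoadAvailable() // BE must be alive and able to accept loads
            .build();
    List<Long> backendIds = infoService.selectBackendIdsByPolicy(policy, 1);
    if (backendIds.isEmpty()) {
        throw new UserException("no backend matches policy: " + policy);
    }
    return backendIds.get(0);
}
```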
Note one behavioral change here: the replica creation operation previously used round-robin to select BE nodes,
but it now uses random selection, for the following reasons:
1. Although the previous logic was nominally round-robin, it was effectively random in practice.
2. The final distribution difference of the random algorithm is no more than 5%, so the random algorithm
can be considered to distribute the data evenly.
This patch supports utf8mb4 for MySQL external tables.
Previously, only the utf8 charset was supported, even if someone needed a MySQL external table with the utf8mb4 charset.
When creating a MySQL external table, you can now add an optional property "charset" that sets the character set of the MySQL connection;
the default value is "utf8". You can set "utf8mb4" instead of "utf8" when you need it.
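As a sketch of what the property amounts to on the connection (the actual wiring lives in the external table scan code; the helper below is hypothetical), the charset maps naturally onto MySQL's `SET NAMES`:
```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;

// Sketch only: read the optional "charset" table property (default "utf8")
// and apply it to an open MySQL connection via SET NAMES.
static void applyCharset(Connection conn, Map<String, String> properties)
        throws SQLException {
    String charset = properties.getOrDefault("charset", "utf8");
    try (Statement stmt = conn.createStatement()) {
        stmt.execute("SET NAMES " + charset); // e.g. "SET NAMES utf8mb4"
    }
}
```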
This PR provides a new pattern matching framework for the Nereids optimizer.
The new pattern matching framework contains these concepts:
1. `Pattern`/`PatternDescriptor`: the multi-level shape of a tree node, e.g. the pattern `logicalJoin(logicalJoin(), any())` describes a plan whose root is a `LogicalJoin` and whose left child is also a `LogicalJoin`.
2. `MatchedAction`: a callback function invoked when the pattern matches; usually you create a new plan to replace the originally matched plan.
3. `MatchingContext`: the parameter passed through to the MatchedAction; it contains the matched plan root and the PlannerContext.
4. `PatternMatcher`: contains a PatternDescriptor and a MatchedAction.
5. `Rule`: a rewrite rule containing a RuleType, a RulePromise, a Pattern, and a transform function (equivalent to a MatchedAction).
6. `RuleFactory`: a factory that helps us build Rules easily. RuleFactory extends the Patterns interface and provides some predefined pattern descriptors.
For example, join commutativity:
```java
public class JoinCommutative extends OneExplorationRuleFactory {
    @Override
    public Rule<Plan> build() {
        return innerLogicalJoin().thenApply(ctx -> {
            // swap the left and right children of the matched inner join
            return new LogicalJoin(
                    JoinType.INNER_JOIN,
                    ctx.root.getOnClause(),
                    ctx.root.right(),
                    ctx.root.left()
            );
        }).toRule(RuleType.LOGICAL_JOIN_COMMUTATIVE);
    }
}
```
The code above shows the three steps to create a Rule:
1. `innerLogicalJoin()` declares that the pattern is an inner logical join. `innerLogicalJoin` is a predefined pattern.
2. Invoke `thenApply()` to attach a MatchedAction, which returns a new LogicalJoin with the children exchanged.
3. Invoke `toRule()` to convert it into a Rule.
You can think of the Rule as containing three parts:
1. Pattern
2. transform function / MatchedAction
3. RuleType and RulePromise
So:
1. `innerLogicalJoin()` creates a `PatternDescriptor`, which contains a `Pattern`
2. `PatternDescriptor.then()` converts the `PatternDescriptor` to a `PatternMatcher`, which contains the Pattern and the MatchedAction
3. `PatternMatcher.toRule()` converts the `PatternMatcher` to a Rule
These three steps are inspired by currying in functional programming.
It should be noted that #9446 provides a generic type for TreeNode's children, so this pattern matching framework can infer types across multiple levels of the hierarchy, and you can get the real tree node type without an unsafe cast. Like this:
```java
logicalJoin(logicalJoin(), any()).then(j -> {
    // j is inferred as LogicalJoin<LogicalJoin<Plan, Plan>, Plan>,
    // so j.left() is inferred as LogicalJoin<Plan, Plan>
    // and there is no need to cast j.left() from Plan to LogicalJoin
    var node = j.left().left();
})
```
Currently, we use `UtFrameUtils` to start an FE server in the FE unit tests.
Each test class has to do some initialization and cleanup with the JUnit4
`@BeforeClass` and `@AfterClass` annotations. This is redundant and tedious.
Besides, almost all the APIs in `UtFrameUtils` have a `ConnectContext` parameter, which is not easy to use.
This PR proposes an inheritance-based approach, i.e., wrap all the common logic in a base class `TestWithFeService`,
leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow the setup and cleanup lifecycle to each test class instance.
At the same time, a derived concrete test class can directly use the utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We can remove these two classes
once this refactor has worked well for a while.
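A derived test class could then look roughly like this; this is a minimal sketch, and the hook and helper names (`runBeforeAll`, `createDatabase`, `createTable`, `getSQLPlanOrErrorMsg`) are assumptions about the base-class API:
```java
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

// Sketch: the base class starts an in-process FE once per test class
// (JUnit5 @BeforeAll) and tears it down afterwards (@AfterAll), so the
// test only overrides a hook and calls inherited helpers.
public class SimpleQueryTest extends TestWithFeService {
    @Override
    protected void runBeforeAll() throws Exception {
        createDatabase("test_db");
        createTable("create table test_db.t1 (k1 int) distributed by hash(k1) "
                + "buckets 1 properties('replication_num' = '1')");
    }

    @Test
    public void testQuery() throws Exception {
        // Inherited helpers already carry the ConnectContext, so there is
        // no context plumbing in the test body.
        String plan = getSQLPlanOrErrorMsg("select * from test_db.t1");
        Assertions.assertNotNull(plan);
    }
}
```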
Fix a bug: a WHERE condition cannot be pushed down because no predicate derivation is performed.
E.g.:
select * from tb1 left join tb2 on tb1.id = tb2.id where tb2.id = 1;
The correct behavior is to derive `tb1.id = 1` for tb1 from the condition `tb2.id = 1` and the join condition,
but the current implementation does not perform this deduction.
Nereids (new optimizer) code base
Nereids is the new query planner for Doris. It includes three main parts: parser, analyzer, and optimizer.
The parser, generated by ANTLR4, transforms SQL into a logical plan with a tree structure. Analysis and optimization are performed on the logical plan of the tree structure. Each transformation is defined as a rule. The rule is applied to the logical plan using pattern matching. The implementation of the optimizer follows the approach in the Cascades paper.
Doris couldn't resolve the defaultFS of HDFS with an HA configuration, so it couldn't query Hive tables on HA HDFS.
This is because there was no way to pass the HA configs to the Hive external table.
Overview of changes:
Pass the HA configs to the Hive external table through the create-table properties.
Usage:
Example of creating a Hive table with HA configuration properties:
```
CREATE TABLE region (
    r_regionkey integer NOT NULL,
    r_name char(25) NOT NULL,
    r_comment varchar(152)
) engine=hive properties (
    "database" = "default",
    "table" = "region",
    "hive.metastore.uris" = "thrift://172.21.16.11:7004",
    "dfs.nameservices" = "hacluster",
    "dfs.ha.namenodes.hacluster" = "3,4",
    "dfs.namenode.rpc-address.hacluster.3" = "192.168.0.93:8020",
    "dfs.namenode.rpc-address.hacluster.4" = "172.21.16.11:8020",
    "dfs.client.failover.proxy.provider.hacluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
```
Add an http header size parameter to avoid failures caused by too many fields when users import via stream load.
The usual default is 8192 bytes; it is set to 10 KB here.
- Force-change an existing olap table's storage format from V1 to V2
- Forbid creating new olap tables with storage format V1, and forbid schema changes that would create new V1-format data
1. Fix bug described in #9267
When reporting a replica with a missing version, set its last failed version to (replica version + 1)
2. Skip non-exist partition when handling transactions.
This pull request includes part of the statistics implementation (https://github.com/apache/incubator-doris/issues/6370). It will not affect any existing code, and users will not yet be able to create statistics jobs.
After receiving the statistics statement and dividing the collection work into tasks, we implement the scheduling of statistics tasks and the updating of job information here. This mainly includes the following (see the sketch after this list):
- Create a thread pool to schedule a certain number of tasks; the concurrency is controlled by the configuration `cbo_concurrency_statistics_task_num`.
- After a task is completed, update the information of the statistics job.
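A minimal sketch of that scheduling logic follows; only `cbo_concurrency_statistics_task_num` comes from the text above, while the task and job-manager types are hypothetical:
```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: concurrency is bounded by the config value; each finished task
// reports back so the owning statistics job can be updated. The
// StatisticsTask and StatisticsJobManager types are illustrative.
class StatisticsTaskScheduler {
    private final ExecutorService executor =
            Executors.newFixedThreadPool(Config.cbo_concurrency_statistics_task_num);

    void schedule(List<StatisticsTask> pendingTasks, StatisticsJobManager jobManager) {
        for (StatisticsTask task : pendingTasks) {
            executor.submit(() -> {
                try {
                    StatisticsTaskResult result = task.call();
                    // After the task is completed, update the statistics job.
                    jobManager.updateJobInfo(task.getJobId(), result);
                } catch (Exception e) {
                    jobManager.markTaskFailed(task.getJobId(), e);
                }
            });
        }
    }
}
```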
* (Refactor)[Statistics] Fix lock risks in Statistics Job
1. Remove lock nesting between job and task
2. Solve the deadlock problem during job updates
3. Avoid printing logs while holding the lock (see the sketch after this list)
* Add log
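Point 3 follows the usual pattern of snapshotting state under the lock and logging after releasing it; the sketch below is schematic, and the job type and method names are illustrative, not the actual code:
```java
import java.util.concurrent.locks.ReentrantLock;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Sketch: update the job and capture a snapshot under the lock, then log
// after unlocking. Logging inside the critical section lengthens it and
// can deadlock if the logging path blocks on another locked component.
class StatisticsJobUpdater {
    private static final Logger LOG = LogManager.getLogger(StatisticsJobUpdater.class);
    private final ReentrantLock lock = new ReentrantLock();

    void updateState(StatisticsJob job, StatisticsJob.JobState newState) {
        String snapshot;
        lock.lock();
        try {
            job.setState(newState);
            snapshot = job.toString();
        } finally {
            lock.unlock();
        }
        LOG.info("statistics job updated: {}", snapshot); // outside the lock
    }
}
```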
```
CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat
WITH APPEND PROPERTIES (
"desired_concurrent_number"="2",
"max_batch_interval" = "20",
"max_batch_rows" = "400000",
"max_batch_size" = "314572800",
"format" = "json",
"max_error_number" = "0"
)
FROM KAFKA (
"kafka_broker_list" = "xxxx:xxxx",
"kafka_topic" = "nat_nsq",
"property.kafka_default_offsets" = "2022-04-19 13:20:00"
);
```
In the CREATE ROUTINE LOAD example above, you can see:
The user didn't specify custom partitions,
so (1) the FE will fetch all Kafka partitions from the Kafka server in the routine load scheduler.
The user set the default offset by datetime,
so (2) the FE will fetch the Kafka offsets by time from the server in the routine load scheduler.
When (1) succeeds but (2) fails, the progress of this routine load may not contain any partitions or offsets.
Nevertheless, since newCurrentKafkaPartition, which is fetched from the Kafka server, may always be equal to currentKafkaPartitions,
the wrong progress will never be updated.
This pull request includes part of the statistics implementation (https://github.com/apache/incubator-doris/issues/6370). It will not affect any existing code, and users will not yet be able to create statistics jobs.
After receiving the statistics collection statement, a job is generated. This change implements the division of statistics collection jobs according to the following statistics categories:
table:
- `row_count`: the table row count is critical for estimating the cardinality and memory usage of scan nodes.
- `data_size`: the table size; not used by the CBO, mainly used to monitor and manage table size.
column:
- `num_distinct_value`: used to determine the selectivity of an equality predicate.
- `min`: the minimum value.
- `max`: the maximum value.
- `num_nulls`: the number of nulls.
- `avg_col_len`: the average length of the column, in bytes; used for memory and network IO estimation.
- `max_col_len`: the maximum length of the column, in bytes; used for memory and network IO estimation.
After the job is divided, statistics tasks are obtained.
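For instance, these column statistics feed the planner's estimates in the textbook way; the sketch below shows the classic formulas, not necessarily the exact Doris implementation:
```java
// Sketch: classic CBO estimates built from the statistics above.
// For an equality predicate "col = const", assume values are spread
// uniformly over the distinct values.
static double estimateEqualityOutputBytes(long rowCount, long numDistinctValue,
        float avgColLen) {
    double selectivity = 1.0 / numDistinctValue;   // from num_distinct_value
    double estimatedRows = rowCount * selectivity; // from row_count
    return estimatedRows * avgColLen;              // memory / network IO estimate
}
```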
Use this stmt to show the tablet storage format on BEs; if verbose is set,
detailed information about each tablet's storage format will be shown.
e.g.
```
MySQL [(none)]> admin show tablet storage format;
+-----------+---------+---------+
| BackendId | V1Count | V2Count |
+-----------+---------+---------+
| 10002     | 0       | 2867    |
+-----------+---------+---------+
1 row in set (0.003 sec)
MySQL [test_query_qa]> admin show tablet storage format verbose;
+-----------+----------+---------------+
| BackendId | TabletId | StorageFormat |
+-----------+----------+---------------+
| 10002     | 39227    | V2            |
| 10002     | 39221    | V2            |
| 10002     | 39215    | V2            |
| 10002     | 39199    | V2            |
+-----------+----------+---------------+
4 rows in set (0.034 sec)
```
Add storage format information to the show full tables statement.
```
MySQL [test_query_qa]> show full tables;
+-------------------------+------------+---------------+
| Tables_in_test_query_qa | Table_type | StorageFormat |
+-------------------------+------------+---------------+
| bigtable                | BASE TABLE | V2            |
| test_dup                | BASE TABLE | V2            |
| test                    | BASE TABLE | V2            |
| baseall                 | BASE TABLE | V2            |
| test_string             | BASE TABLE | V2            |
+-------------------------+------------+---------------+
5 rows in set (0.002 sec)
```