doris

Author	SHA1	Message	Date
HappenLee	aa1bcdbc18	[Bug] Show create table null pointer of storage policy and error htttp path of tablet info (#10950 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-22 20:55:35 +08:00
carlvinhust2012	49a17bea99	[regression]add the cases for csv/orc/parquet file format (#11082 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-22 20:54:34 +08:00
morrySnow	d17c906eb7	[chore](FE)add license header check in fe's checkstyle (#11076 ) Add license header check in fe's checkstyle	2022-07-22 18:37:32 +08:00
minghong	34f328aa57	[feature] (Nereids) Merge memo group recursively (#11043 ) In Memo.copyIn( plan, group1, isRewrite), one branch is that the plan is already recorded in Memo, and owned by group 'group2'. In such case, 'group1' should be merged with 'group2', because they are equivalent. After merge, the upper level of 'group1', saying 'p1 = group1.getLogicalExpression().getOwnerGroup()' of 'group1', and that of 'group2', saying 'p2', are equivalent. We need to merge 'p1' and 'p2'. And this process is recursive.	2022-07-22 18:31:32 +08:00
Shuo Wang	0681e4f04f	[Refactor](Nereids) Remove expression type. (#11066 ) ExpressionType is duplicated with Java class type info, so removed it.	2022-07-22 17:48:18 +08:00
huangzhaowei	6963c41a04	[dependency] Upgrade Apache Commons Validator version to the latest one (#10508 )	2022-07-22 17:03:46 +08:00
jakevin	764abfe72f	[enhancement](community): add ci for close outdated PR. (#11088 )	2022-07-22 17:00:17 +08:00
xy720	3744321f01	[feature-wip](array-type) add function array_union/array_except/array_intersect (#10781 ) Add array_union/array_except/array_intersect function.	2022-07-22 13:50:13 +08:00
lihangyu	9d21b2154d	[Fix](Array) correct the offset when using get_data_at from _item_convertor (#11094 ) get_data_at should use offset - offsets[start_index] since start_index may be changed after OlapColumnDataConvertorArray::set_source_column. Using just offset may access the memory out of _item_convertor's data range,	2022-07-22 11:25:17 +08:00
Mingyu Chen	4003489bd0	[fix](update) check LOAD priv for update stmt (#11099 )	2022-07-22 11:24:44 +08:00
xy720	40c8853c5d	[Fix] Fix select external table return “Lost connection to MySQL server during query” error	2022-07-22 11:24:09 +08:00
caoliang-web	552c0568fe	[sample]Add flink doris connector 1.1 sample code (#10970 ) * Add flink doris connector 1.1 sample code	2022-07-22 10:18:27 +08:00
chenlinzhong	900430f55e	[docs] update invaid links (#10272 )	2022-07-22 09:59:53 +08:00
Adonis Ling	9017afb7a8	[enhancement](workflow) Add a workflow for Clang check (#11083 ) * Add a workflow for Clang check	2022-07-22 09:14:21 +08:00
Dongyang Li	cac0dfcd63	[tools] opt tpch q9 (#11092 )	2022-07-22 08:38:43 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following will take an SQL as an example to illustrate the meaning of these three attributes ``` select t1. a from t1 left join t2 on t1. a=t2. b; ``` 1. vOutputTupleDesc：Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 2. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> The slot in intermediatetuple corresponds to the slot in output tuple one by one through the expr calculation of the left child in vsrctooutputsmap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the specific description of the first PR In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple. The following is the specific description of the second PR This pr mainly fixes the following problems: 1. Solve the query combined with inline view and outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` will be calculated in the HashJoinNode.computeOutputTuple() function. 2. Column nullable property error problem. At present, once the outer join occurs, the column on the null-side side will be planned to be nullable in the semantic parsing stage. For example： ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` At this time, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp will be used in constructing the output tuple of HashJoinNode (specifically, the function HashJoinNode.computeOutputTuple()). So the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table needs to perform a right join with the b table, as a null-side tmp side, it is necessary to change the column attributes involved in the tmp table to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is to say, the a column of the original table test is still non-null, and it does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. Outer join will change the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operator layer by layer. Since FE has no mechanism to modify the nullable attribute in the upper operator tuple layer by layer after the analyzer. So at present, we can only preset the attributes before the lower join as nullable in the analyzer stage in advance, so as to avoid the problem. (At the same time, be also wrote some evasive code in order to deal with the problem of null to non-null.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
huangzhaowei	4d158f9050	[dependency](arrow) Add GetRawORCReader function for arrow orc reader (#11069 ) Add a new function in arrow adapter to get the raw orc reader which we can get more information from such offset or min/max value. And this will be used in #1046 This modify is inspired by Clickhouse	2022-07-21 22:23:05 +08:00
Yongqiang YANG	08ebef2992	[Enhancement] check vm.max_map_count before starting (#11052 ) When vectorized engine is enabled, doris uses much more vmas than before, and it leads to core dump due to memory allocation failure.	2022-07-21 21:16:48 +08:00
huangzhaowei	7147a7c290	[feature-wip](multi-catalog) Support s3 storage for file scan node (#10977 ) This is an example of s3 hms_catalog: ```sql CREATE CATALOG hms_catalog properties( "type" = "hms", "hive.metastore.uris"="thrift://localhost:9083", "AWS_ACCESS_KEY" = "your access key", "AWS_SECRET_KEY"="your secret key", "AWS_ENDPOINT"="s3 endpoint", "AWS_REGION"="s3-region", "fs.s3a.paging.maximum"="1000"); ``` All these params are necessary;	2022-07-21 17:38:53 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
carlvinhust2012	5f6f35e886	Add the supported sub-type for array (#10824 ) 1.This pr is used for adding the supported sub-type for array which has been modified in #9916 2.add regression test for the supported sub-type Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-07-21 16:29:17 +08:00
Dongyang Li	ae53a8a7e9	[regression] sf1DataPath can be url or local path (#11065 )	2022-07-21 14:35:24 +08:00
924060929	03783ce551	[fix](Nereids) fix merge conflict caused compile error (#11064 ) fix merge conflict by #10882 and #10667 remove duplicate function hashCode	2022-07-21 14:14:26 +08:00
Compilation Success	a1758bd139	[feature-wip](unique-key-merge-on-write) Add agg cache for delete bitmap DSIP-018 (#10921 ) Use global LRU for delete bitmap cache	2022-07-21 12:48:44 +08:00
shee	f8ad2613cf	[Enhancement](Nereids) add some expr rewrite rule and plan rewrite rule of rewrite its expression (#10667 ) # first: Add two expr rewrite rule: 1. remove duplicate expr a = 1 and a = 1 -> a = 1 2. extract common expr (a or b) and (a or c) -> a or (b and c) # second: Add some plan rewrite rule of rewriting expr of operator 1. NormalizeExpressionOfPlan contains normalize expr rewrite rule. Using these normalizerule rewrite LogicalFilter、LogicalAggravate，LogicalProject，LogicalJoin exprs 2. OptimizeExpressionOfPlan contains optimize expr rewrite rule. Using these optimize rule rewrite LogicalFilter、LogicalAggravate，LogicalProject，LogicalJoin exprs	2022-07-21 12:35:28 +08:00
morrySnow	072479fa21	[enhancement](Nereids)expression equals and hashCode function (#10882 ) review and add all missing equals and hashCode function to Expression and its sub class. Alias Arithmetic BoundFunction CompoundPredicate Not UnboundFunction UnboundSlot UnboundStar	2022-07-21 12:20:53 +08:00
yinzhijian	329f70dc02	[enhancement](Nereids) support case when for TPC-H (#10947 ) support case when for TPC-H for example: CASE [expression] WHEN [value] THEN [expression] ... ELSE [expression] END or CASE WHEN [predicate] THEN [expression] ... ELSE [expression] END	2022-07-21 12:02:37 +08:00
Mingyu Chen	d36b927fdb	[improvement](fe-ut) use local journal to make FE ut run fast (#11038 ) * [improvement](fe-ut) use local journal to make FE ut run fast	2022-07-21 09:12:21 +08:00
carlvinhust2012	b59ce73e1d	fix the case fail when enable Hdfs (#11051 )	2022-07-21 07:09:09 +08:00
Dongyang Li	b35d5bc15c	[regressiontest] add tpcds_sf1 test (#10852 ) (#11042 ) * [regressiontest] add tpcds_sf1 test (#10852) Co-authored-by: smallhibiscus <844981280> Co-authored-by: stephen <hello-stephen@qq.com> * ignore q30 temporarily since it encounter latin character Ô Co-authored-by: stephen <hello-stephen@qq.com>	2022-07-21 07:07:05 +08:00
Gabriel	b115b362fb	[Bug] fix bug for function `unix_timestamp` (#11041 ) * [Bug] fix bug for function `unix_timestamp`	2022-07-20 20:17:41 +08:00
jiafeng.zhang	b95dedd07b	[doc]Gis function style (#11015 )	2022-07-20 19:18:35 +08:00
HappenLee	d9b6e07e9d	[Vectorized] Support ODBC sink for vec exec engine (#11045 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-20 19:09:41 +08:00
Xin Liao	c037066163	[fix](cache) fix that ShardedLRUCache may coredump when destructor was called (#10995 )	2022-07-20 19:07:04 +08:00
plat1ko	2df1822269	[bugfix]fix DCHECK failure in remove_all_remote_rowsets (#10994 )	2022-07-20 19:06:21 +08:00
Jibing-Li	6aadee9a2e	[data lake]Support hdfs ha for Iceberg table. (#11002 ) * Support Iceberg on HDFS with HA mode enabled.	2022-07-20 19:03:58 +08:00
AlexYue	a607c30ad4	[docs] Fe build idea doc (#10996 ) * [doc](fe): enhance the fe-idea-dev * [doc](fe)add solution for m1 mac compile error Co-authored-by: jackwener <jakevingoo@gmail.com>	2022-07-20 19:03:29 +08:00
Hong Liu	b62e3e7aa0	[regression test]Add ssb sf1 test under unique table with zstd (#11004 ) Co-authored-by: smallhibiscus <844981280>	2022-07-20 18:59:46 +08:00
camby	0a8ae6aeec	Refractor COLLECT_LIST and COLLECT_SET register logic (#10956 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-07-20 18:02:39 +08:00
Dongyang Li	1ca00e0107	[tools] add clickbench tools (#11009 ) * [tools] add clickbench tools Co-authored-by: stephen <hello-stephen@qq.com>	2022-07-20 17:59:04 +08:00
Adonis Ling	e5663f9872	[Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020 ) Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT	2022-07-20 16:37:27 +08:00
Lightman	a71822a74d	[refactor]remove col_unique_id (#11025 )	2022-07-20 16:35:14 +08:00
Mingyu Chen	7bdce8f572	[refactor](policy) refactor some policy create and check logic (#11007 ) * [refactor](policy) refactor some policy create and check logic	2022-07-20 16:20:59 +08:00
morrySnow	658a9f7531	[fix](planner)unnecessary cast will be added on children in InPredicate (#11033 )	2022-07-20 16:00:26 +08:00
minghong	6233b5200e	[refactor] (Nereids) rename GroupExpression.getParent() to getOwnerGroup() (#11027 ) GroupExpression.getParent() returns the group which contains this expr. This name is missleading especially in tree structures. So we change the name to getOwnerGroup.	2022-07-20 15:57:59 +08:00
zhannngchen	a1c1cfce47	Add some comments for the feature mow (#11028 )	2022-07-20 15:35:41 +08:00
zhannngchen	ec5471f048	[feature-wip](unique-key-merge-on-write) Implement tablet lookup interface, using rowset-tree, DSIP-018[3/5] (#10938 )	2022-07-20 14:52:14 +08:00
Shuo Wang	9b91f86c38	[Feature](Nereids) Reorder join to eliminate cross join. (#10890 ) Try to eliminate cross join via finding join conditions in filters and changing the join orders. For example: -- input: SELECT * FROM t1, t2, t3 WHERE t1.id=t3.id AND t2.id=t3.id -- output: SELECT * FROM t1 JOIN t3 ON t1.id=t3.id JOIN t2 ON t2.id=t3.id This feature is controlled by session variable enable_nereids_reorder_to_eliminate_cross_join with true by default. Simplify usage of Memo and rewrite rule application. Before this PR, if we want to apply a rewrite rule to a plan, the code is like the below: Memo memo = new Memo(); memo.initialize(root); PlannerContext plannerContext = new PlannerContext(memo, new ConnectContext()); JobContext jobContext = new JobContext(plannerContext, new PhysicalProperties(), 0); RewriteTopDownJob rewriteTopDownJob = new RewriteTopDownJob(memo.getRoot(), ImmutableList.of(new AggregateDisassemble().build()), jobContext); plannerContext.pushJob(rewriteTopDownJob); plannerContext.getJobScheduler().executeJobPool(plannerContext); Plan after = memo.copyOut(); After this PR, we could use chain style calling: new Memo(plan) .newPlannerContext(connectContext) .setDefaultJobContext() .topDownRewrite(new AggregateDisassemble()) .getMemo() .copyOut(); Rename the session variable enable_nereids to enable_nereids_planner to make it more meaningful.	2022-07-20 13:53:54 +08:00
Mingyu Chen	56e036e68b	[feature-wip](multi-catalog) Support runtime filter for file scan node (#11000 ) * [feature-wip](multi-catalog) Support runtime filter for file scan node Co-authored-by: morningman <morningman@apache.org>	2022-07-20 12:36:57 +08:00
Kikyou1997	a5a50726bf	[Ehancement](planner) Rewrite implicit cast to the predicates (#10920 ) During the analysis of BinaryPredicate, it will generate a CastExpr if the slot implicitly in the below case: SELECT * FROM t1 WHERE t1.col1 = '1'; col1 is integer column. This will prevent the binary predicate from pushing down to OlapScan which would impact the performance.	2022-07-20 12:28:29 +08:00

1 2 3 4 5 ...

5412 Commits