ZSTD compression is fast and has a high compression ratio. It can achieve a higher compression ratio
than the default LZ4F codec, which helps when storing cost-sensitive data such as logs.
Compared with the LZ4F codec, in the comparison test below the ZSTD codec produced a 35% smaller compressed size,
was 30% faster on the first read (without OS page cache), and 40% slower on the second read (with OS page cache).
- test data: 25GB text log, 110 million rows
- test table: test_table(ts varchar(30), log string)
- test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table
- be.conf: disable_storage_page_cache = true (disables the Doris page cache so data is not all cached in memory, to measure real decompression speed)
test results
master branch with lz4f codec:
- compressed size: 4.3G
- SQL first exec time (read data from disk + decompress + a little computation): 18.3s
- SQL second exec time (read data from OS page cache + decompress + a little computation): 2.4s
this branch with zstd codec (hardcoded on for this test):
- compressed size: 2.8G
- SQL first exec time: 12.8s
- SQL second exec time: 3.4s
This CL mainly changes:
1. Broker Load
When assigning backends, use the user-level resource tags to find available backends.
If no user-level resource tag is set, a broker load task can be assigned to any BE node;
otherwise, the task can only be assigned to BE nodes that match the user-level tags.
2. Routine Load
The current routine load job carries no user info, so it cannot get the user-level tags when assigning tasks.
There are therefore two cases:
1. For old routine load jobs, use the tags of the replica allocation info to select BE nodes.
2. For new routine load jobs, the user info will be added and persisted in the routine load job.
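The tag-matching selection described above can be sketched as follows. This is an illustrative sketch only, with hypothetical class and method names (`BackendSelector`, `selectBackends`), not the actual Doris FE code:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch of tag-based backend selection (not the actual Doris FE code).
class BackendSelector {
    static class Backend {
        final long id;
        final Set<String> tags;
        Backend(long id, Set<String> tags) { this.id = id; this.tags = tags; }
    }

    // If the user has no resource tags, any backend is a candidate;
    // otherwise only backends whose tags intersect the user's tags qualify.
    static List<Backend> selectBackends(List<Backend> all, Set<String> userTags) {
        if (userTags == null || userTags.isEmpty()) {
            return all;
        }
        return all.stream()
                .filter(be -> !Collections.disjoint(be.tags, userTags))
                .collect(Collectors.toList());
    }
}
```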
1. Fix the LRU Cache MemTracker consumption value being negative.
2. Fix the compaction Cache MemTracker not tracking memory.
3. Add a USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hooks are not stopped at any time.
Close #9623
Summary:
This PR refactors the plan node into plan + operator.
In the previous version of Nereids, a plan node consists of children and relational algebra, e.g.
```java
class LogicalJoin extends LogicalBinary {
    private Plan left, right;
}
```
The structure above is easy to understand, but it is difficult to optimize `Memo.copyIn`: rules generate a complete sub-plan,
and Memo must compare the complete sub-plan to de-duplicate GroupExpressions, which hurts performance.
First, we need to change the rules to generate a partial sub-plan, replacing some child plans with a placeholder (like LeafOp in the Columbia optimizer). We can then mark some children in the sub-plan as unchanged and bind the related group, so Memo does not have to compare and copy a sub-plan whose related group already exists.
Second, we need to separate the original `Plan` into `Plan` and `Operator`: a Plan contains children and an Operator, and an Operator denotes just the relational algebra (no children/input field). This design keeps operators and children from affecting each other. The plan-group binder can then generate a placeholder plan (containing the related group) for a sub-query, without generating the current plan node case by case, because plans are immutable (replacing children generates a new plan). Rule implementers can reuse the placeholder to generate a partial sub-plan.
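The placeholder idea can be sketched minimally as follows. The names (`GroupPlanSketch`, `GroupPlan`, `Join`) are illustrative, not Nereids' actual API; the point is that equality compares group ids instead of deep-comparing whole sub-plans:

```java
import java.util.*;

// Illustrative sketch of a placeholder plan bound to a memo group,
// so sub-plan comparison stops at the group id (like LeafOp in Columbia).
class GroupPlanSketch {
    interface Plan {}

    // Placeholder leaf bound to an existing memo group.
    static class GroupPlan implements Plan {
        final int groupId;
        GroupPlan(int groupId) { this.groupId = groupId; }
        @Override public boolean equals(Object o) {
            return o instanceof GroupPlan && ((GroupPlan) o).groupId == groupId;
        }
        @Override public int hashCode() { return groupId; }
    }

    // A partial sub-plan produced by a rule: a real root over placeholders.
    static class Join implements Plan {
        final Plan left, right;
        Join(Plan left, Plan right) { this.left = left; this.right = right; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Join)) return false;
            Join other = (Join) o;
            return left.equals(other.left) && right.equals(other.right);
        }
        @Override public int hashCode() { return Objects.hash(left, right); }
    }
}
```

With this shape, de-duplicating a rule's output against the memo only walks the small partial sub-plan; the unchanged children collapse to group-id comparisons.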
Operator and Plan have similar inheritance structures, as shown below. Each XxxPlan contains an XxxOperator, e.g. LogicalBinary contains a LogicalBinaryOperator.
```
TreeNode
│
│
┌───────┴────────┐ Operator
│ │ │
│ │ │
│ │ │
▼ ▼ ▼
Expression Plan PlanOperator
│ │
│ │
┌───────────┴─────────┐ │
│ │ ┌───────────┴──────────────────┐
│ │ │ │
│ │ │ │
▼ ▼ ▼ ▼
LogicalPlan PhysicalPlan LogicalPlanOperator PhysicalPlanOperator
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ │ │ │
├───►LogicalLeaf ├──►PhysicalLeaf ├──► LogicalLeafOperator ├───►PhysicalLeafOperator
│ │ │ │
│ │ │ │
│ │ │ │
├───►LogicalUnary ├──►PhysicalUnary ├──► LogicalUnaryOperator ├───►PhysicalUnaryOperator
│ │ │ │
│ │ │ │
│ │ │ │
└───►LogicalBinary └──►PhysicalBinary └──► LogicalBinaryOperator └───►PhysicalBinaryOperator
```
Each concrete operator extends an XxxNaryOperator, e.g.
```java
class LogicalJoin extends LogicalBinaryOperator;
class PhysicalProject extends PhysicalUnaryOperator;
class LogicalRelation extends LogicalLeafOperator;
```
So the first example changes to this:
```java
class LogicalBinary extends AbstractLogicalPlan implements BinaryPlan {
    private Plan left, right;
    private LogicalBinaryOperator operator;
}

class LogicalJoin extends LogicalBinaryOperator {}
```
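The immutability mentioned above (replacing children yields a new plan rather than mutating the old one) can be sketched like this. The names (`ImmutablePlanSketch`, `withChildren`) are illustrative, not the actual Nereids classes:

```java
import java.util.*;

// Illustrative sketch of an immutable plan wrapping an operator plus children.
class ImmutablePlanSketch {
    static class Operator {
        final String name;
        Operator(String name) { this.name = name; }
    }

    static class Plan {
        final Operator op;
        final List<Plan> children;
        Plan(Operator op, List<Plan> children) {
            this.op = op;
            this.children = List.copyOf(children); // defensive, unmodifiable copy
        }
        // Replacing children never mutates this plan; it builds a new plan
        // that shares the same operator instance.
        Plan withChildren(List<Plan> newChildren) {
            return new Plan(op, newChildren);
        }
    }
}
```

Because the operator carries no children, the same operator instance can be shared between the old plan and the new one, which is what lets the binder swap sub-trees for placeholders cheaply.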
With these changes, a Rule must build both the plan and the operator as needed, not only the plan as before.
For example, the JoinCommutative rule:
```java
public Rule<Plan> build() {
    // the plan() helper automatically builds the plan according to the operator's type,
    // so this returns a LogicalBinary(LogicalJoin, Plan, Plan)
    return innerLogicalJoin().then(join -> plan(
            // operator
            new LogicalJoin(join.op.getJoinType().swap(), join.op.getOnClause()),
            // children
            join.right(),
            join.left()
    )).toRule(RuleType.LOGICAL_JOIN_COMMUTATIVE);
}
```
* [Enhancement][Vectorized] Build the hash table in a new thread, as the non-vectorized engine did in the past
* Edit after review comments
* Format code with clang-format
Co-authored-by: lidongyang <dongyang.li@rateup.com.cn>
Co-authored-by: stephen <hello-stephen@qq.com>
Due to the current architecture, predicate derivation at rewrite time cannot satisfy all cases:
rewrite processes the `on` clauses first and then the `where` clause, and when there are subqueries, not all cases can be derived.
So the predicate pushdown method is kept here.
e.g.
select * from t1 left join t2 on t1 = t2 where t1 = 1;
InferFiltersRule cannot infer t2 = 1, because that case is outside its specification.
The expression (t2 = 1) can actually be deduced, so it is pushed down to the scan node here.
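The kind of derivation involved (from an equality between two columns plus a constant predicate on one of them, conclude the constant predicate on the other) can be sketched as a simple fixed-point propagation. This is illustrative code with hypothetical names (`PredicateInference`, `infer`), not Doris' actual InferFiltersRule:

```java
import java.util.*;

// Illustrative transitive predicate inference: given column equivalences from
// join conditions and known constant predicates, propagate the constants
// across equivalent columns until nothing new can be derived.
class PredicateInference {
    // equalPairs: column equivalences, e.g. "t1.c" -> "t2.c" (from on t1.c = t2.c).
    // constants: known constant bindings, e.g. "t1.c" -> "1" (from where t1.c = 1).
    static Map<String, String> infer(Map<String, String> equalPairs,
                                     Map<String, String> constants) {
        Map<String, String> result = new HashMap<>(constants);
        boolean changed = true;
        while (changed) { // iterate to a fixed point
            changed = false;
            for (Map.Entry<String, String> e : equalPairs.entrySet()) {
                String a = e.getKey(), b = e.getValue();
                if (result.containsKey(a) && !result.containsKey(b)) {
                    result.put(b, result.get(a));
                    changed = true;
                } else if (result.containsKey(b) && !result.containsKey(a)) {
                    result.put(a, result.get(b));
                    changed = true;
                }
            }
        }
        return result;
    }
}
```

In the example above, the equivalence t1 = t2 plus the constant t1 = 1 yields t2 = 1, which can then be pushed down to t2's scan node.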