doris

Author	SHA1	Message	Date
jakevin	db8bc80c36	[feature](Nereids): semi join transpose (#12590 ) * [feature](Nereids): semi join transpose and enable ZIG_ZAG join reorder.	2022-09-15 21:32:50 +08:00
Zhengguo Yang	c6c84a2784	[chore](build) add build param to version string (#12591 )	2022-09-15 17:09:22 +08:00
morrySnow	858e8234d7	[feature](Nereids) add predicates push down on all join type (#12571 )	2022-09-15 15:18:42 +08:00
yinzhijian	5b6d48ed5b	[feature](nereids) support distinct count (#12159 ) support distinct count with group by clause. for example: SELECT count(distinct c_custkey + 1) FROM customer group by c_nation; TODO: support distinct count without group by clause.	2022-09-15 13:01:47 +08:00
Shuo Wang	b11791b9a8	[Feature](Nereids) Limit pushdown. (#12518 ) This PR adds rewrite rules to push the limit down. The following two cases are handled: `limit -> join` and `limit -> project -> join`.	2022-09-15 12:12:10 +08:00
Shuo Wang	d2d5c19d51	[Improvement](Nereids) Avoid unsafe cast. (#12603 ) This PR changes some interfaces to avoid unsafe casts. - Modify `Plan.getExpressions()`'s return type from `List<Expression>` to `List<? extends Expression>`. Returning projects (a list of named expressions) from `getExpressions` then avoids an unsafe cast. See `LogicalProject.getExpression()` as an example. - Modify `EmptyRelation.getProjects()`'s return type from `List<NamedExpression>` to `List<? extends NamedExpression>`. Creating an empty relation with a list of slots then avoids an unsafe cast. See the `EliminateLimit` rule for an example.	2022-09-15 12:02:35 +08:00
mch_ucchi	5e0dc11f87	[feature](Nereids)add RelationId as a unique identifier of relations (#12461 ) In Nereids, we could not distinguish two relations from the same table in one PlanTree. This led to some tricky code to process them during planning, such as a separate branch to do equals in GroupExpression. This PR adds RelationId to LogicalRelation and PhysicalRelation. All relations' equals functions will then compare RelationId to help us distinguish two relations from the same table. TODO: add relation id to UnboundRelation, UnboundOneRowRelation, LogicalOneRowRelation, PhysicalOneRowRelation.	2022-09-15 11:56:56 +08:00
Gabriel	fc4298e85e	[feature](outfile) support parquet writer (#12492 )	2022-09-15 11:09:12 +08:00
zhangstar333	22a8d35999	[Feature](vectorized) support jdbc sink for insert into data to table (#12534 )	2022-09-15 11:08:41 +08:00
carlvinhust2012	33f5a86e69	[fix](array-type) forbid to create materialized view for array column (#12543 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-09-15 11:08:23 +08:00
HappenLee	e413a2b8e9	[Opt](vectorized) Use new way to do hash shuffle to speed up query (#12586 )	2022-09-15 11:08:04 +08:00
Mingyu Chen	353bb6fdfb	[doc] update docs (#12615 )	2022-09-15 11:07:34 +08:00
zhannngchen	1080095f46	[typo](doc) fix some typos (#12611 )	2022-09-15 11:07:19 +08:00
starocean999	8e4374b7ec	[enhancement](agg)remove unnecessary mem alloc and dealloc in agg node (#12535 )	2022-09-15 11:07:06 +08:00
Henry2SS	2ac790bf31	[enhancement](statistic) the calculation of routine load statistics is not accurate (#12594 ) Co-authored-by: wuhangze <wuhangze@jd.com>	2022-09-15 11:00:57 +08:00
yixiutt	b136d80e1a	[enhancement](compress) reuse compression ctx and buffer (#12573 ) Reuse compression ctx and buffer. Use a global instance for every compression algorithm, and use a thread-safe buffer pool to reuse compression buffers. The pool size equals the max number of parallel threads in compression, so it will not grow too large. Tests show a 5% improvement in data import and compaction. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-15 10:59:46 +08:00
jakevin	6543924790	[fix](Nereids): avoid commute cause dead-loop. (#12616 ) * [fix](Nereids): avoid commute cause dead-loop. * update best plan	2022-09-15 10:47:11 +08:00
Mingyu Chen	8aa5899484	[fix](load) add scan tuple for stream load scan node only when vectorization is enabled (#12578 )	2022-09-15 08:44:39 +08:00
Gabriel	beeb0ef3eb	[Bug](lead) fix wrong child expression of `lead` function (#12587 )	2022-09-15 08:44:18 +08:00
Luzhijing	2dad67ee3e	[docs](readme) update 1.1.2 released (#12596 )	2022-09-15 08:43:45 +08:00
Zhengguo Yang	d8b6f09cc1	[Bugfix](string_functions) fix heap-buffer-overflow on find_in_set (#12613 )	2022-09-15 08:43:10 +08:00
Xinyi Zou	47d43b34b3	[enhancement](thirdparty) Compile Jemalloc separately on thirdparty (#12577 ) Compile Jemalloc separately and optimize the configuration	2022-09-14 23:31:48 +08:00
lihangyu	f50054f547	[Enhancement](array-type) record offsets info to speed up the seek performance (#12293 ) Store the offset rather than the length in file for the data with array type. The new file format can improve the seek performance. Please refer to #12246 to get the performance report. Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>	2022-09-14 22:41:54 +08:00
Kikyou1997	d4cb0bbdd5	[test](nereids) Add TPC-H regression test cases for nereids (#12600 ) Disabled some test cases that could not run successfully. They will be re-enabled once the corresponding bugs are fixed.	2022-09-14 22:37:56 +08:00
Mingyu Chen	c5ad989065	[refactor](reader) refactor the interface of file reader (#12574 ) Currently, Doris has a variety of readers for different file formats, such as parquet reader, orc reader, csv reader, json reader and so on. The interfaces of these readers are not unified, which makes it impossible to call them through a unified method. In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class to use the `get_next_block()` method. This PR currently only modifies `arrow_reader` and `parquet reader`. Other readers will be modified one by one in subsequent PRs.	2022-09-14 22:31:11 +08:00
Yongqiang YANG	be0a0200cf	[fix](grpc-java) use pooled stub to call rpc on be instead of one stub (#10439 ) A channel is closed when a timeout or exception happens; if only one stub is used, all queries would then fail. If we don't close the channel, grpc-java sometimes gets stuck without sending any rpc.	2022-09-14 22:30:45 +08:00
Pxl	0ead048b93	[Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456 ) 1. remove ColumnString terminating zero 2. add a data_version for pblock 3. change EncryptionMode to enum class	2022-09-14 21:25:22 +08:00
liwenqi1996	c03f7c3ba4	[sample](flink-connector) add doris data delete function (#12599 )	2022-09-14 19:18:59 +08:00
deardeng	3130a19fe9	[feature](regression) Enhancement regression frame, support http post… (#12565 )	2022-09-14 15:31:59 +08:00
Kikyou1997	3543f85ae5	[feature](nereids) merge push down and remove redundant operator rules into one batch (#12569 ) 1. For some related rules, we need to execute them together to get the expected plan. 2. Add session variables to avoid fallback to stale planner when running regression tests of nereids for piggyback.	2022-09-14 14:37:36 +08:00
zy-kkk	08ee84ef67	[typo](docs)fix tablet-local-debug doc err #12572	2022-09-14 14:26:56 +08:00
Jerry Hu	501e7b9132	[chore][config] increase the default value of doris_blocking_priority_queue_wait_timeout_ms (#12580 ) The default value of Config::doris_blocking_priority_queue_wait_timeout_ms causes high CPU usage (about 8%) in PriorityWorkStealingThreadPool::work_thread.	2022-09-14 14:26:13 +08:00
HappenLee	a219a41dde	[dependency](xxhash) Add xxhash lib (#12566 ) Add xxhash lib for BE, which tests showed to be the faster hash method.	2022-09-14 12:30:09 +08:00
jakevin	fd0cf78aa7	[fix](Nereids): fix StatsCalculator compute project and correct commute join type. (#12539 )	2022-09-14 10:32:05 +08:00
ChPi	ead016e0d2	[Enhancement](execute) add timeout for executing fragment rpc (#12512 ) Co-authored-by: chenjie <chenjie@cecdat.com>	2022-09-14 09:12:33 +08:00
lsy3993	8448867bed	[regression-test](window-function) add big table in regression of window function #12562	2022-09-14 08:43:24 +08:00
Yongqiang YANG	5dcf933012	[Bug](column) ColumnNullable::replace_column_data should DCHECK size > sel… #12558	2022-09-14 08:42:15 +08:00
camby	56b2fc43d4	[enhancement](array-type) shrink column suffix zero for type ARRAY<CHAR> (#12443 ) In compute level, CHAR type will shrink suffix zeros. To keep the logic the same as CHAR type, we also shrink for ARRAY or ARRAY<ARRAY> types. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-13 23:24:48 +08:00
HappenLee	d913ca5731	[Opt](vectorized) Speed up bucket shuffle join hash compute (#12407 )	2022-09-13 20:19:22 +08:00
jakevin	9a5be4bab5	[feature](Nereids): Eliminate redundant filter and limit. (#12511 )	2022-09-13 20:08:13 +08:00
AlexYue	58508aea13	[enhance](information_schema) show hll type and bitmap type instead of unknown (#12519 ) Before this pr, when querying data type of hll/bitmap column, 'unknown' would be returned instead of the correct data type of queried column.	2022-09-13 19:43:42 +08:00
TengJianPing	6bf5fc6db5	[improvement](storage) For debugging problems: add session variable `skip_storage_engine_merge` to treat agg and unique data model as dup model (#11952 ) For debug purpose: Add session variable skip_storage_engine_merge, when set to true, tables of aggregate key model and unique key model will be read as duplicate key model. Add session variable skip_delete_predicate, when set to true, rows deleted with delete statement will be selected.	2022-09-13 19:18:56 +08:00
Henry2SS	6a3385437b	[fix](comments) modify comments of setting global variables #12514 Co-authored-by: wuhangze <wuhangze@jd.com>	2022-09-13 19:13:57 +08:00
Pxl	9e49f68663	[fix](new-scan) try to fix invalid call to nullptr slot (#12552 )	2022-09-13 18:54:29 +08:00
deardeng	b98a3ed86c	[fix](frontend) fix notify update storage policy agent task null exception #12470	2022-09-13 16:20:11 +08:00
Pxl	2306e46658	[Enhancement](compaction) reduce VMergeIterator copy block (#12316 ) This PR makes VMergeIterator support returning row references instead of copying a full block.	2022-09-13 16:19:34 +08:00
Jibing-Li	dc80a993bc	[feature-wip](new-scan) New load scanner. (#12275 ) Related pr: https://github.com/apache/doris/pull/11582 https://github.com/apache/doris/pull/12048 Use the new file scan node and new scheduling framework to do the load job, replacing the old broker scan node. The load part (BE part) is a work in progress. The query part (FE) has been tested using the tpch benchmark. Please review only the FE code in this PR; the BE code has been disabled by the enable_new_load_scan_node configuration. Another PR will follow soon to fix the BE-side code.	2022-09-13 13:36:34 +08:00
jakevin	5b4d3616a4	[feature](Nereids): semi join transpose. (#12515 ) * [feature](Nereids): semi join transpose. * fix conditionChecker and check lasscom	2022-09-13 13:32:47 +08:00
Kikyou1997	d35a8a24a5	[feature](nereids) push down Project through Limit (#12490 ) This rule rewrites project -> limit to limit -> project. The reason is that we could get a tree like project -> limit -> project -> other node. If we do not rewrite it, we cannot merge the two projects into one. And if we have more than one project on one node, the second one will overwrite the first one during translation. BE will then core dump or return a "slot cannot find" error.	2022-09-13 13:26:12 +08:00
jakevin	c3d7d4ce7a	[fix](Nereids): fix LAsscom project split. (#12506 )	2022-09-13 12:12:39 +08:00

6307 Commits