doris

Author	SHA1	Message	Date
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
Pxl	b727033906	[Chore](build) enable -Wextra and remove some -Wno (#15760 )	2023-01-15 10:40:35 +08:00
yiguolei	16862d9b43	[refactor](remove unused code) remove buffer pool and disk io mgr (#15853 ) * [refactor](remove buffer pool and disk io mgr) remove unused code Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-13 09:42:58 +08:00
yiguolei	d857b4af1b	[refactor](remove row batch) remove impala rowbatch structure (#15767 ) * [refactor](remove row batch) remove impala rowbatch structure Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-11 09:37:35 +08:00
Gabriel	124c8662e8	[Bug](schema scanner) Fix wrong type in schema scanner (#15768 )	2023-01-11 08:37:39 +08:00
slothever	90a92f0643	[feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618 ) Support a new table-valued function, `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`. The SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` returns the snapshot info of a table. Other Iceberg metadata will be supported later when needed. One use case: before running a time-travel query such as `select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"` or `select * from ice_table FOR VERSION AS OF "snapshot_id"`, we can read the snapshots metadata to get the `committed time` or `snapshot_id`, and then use it as the time or version in the time-travel clause.	2023-01-10 22:37:35 +08:00
zclllyybb	c3da5a687a	[fix]fixed dangerous usage of namespace std (#15741 ) Co-authored-by: zhaochangle <zhaochangle@selectdb.com>	2023-01-10 16:10:49 +08:00
Gabriel	d0e8f84279	[feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612 )	2023-01-10 10:38:35 +08:00
Mingyu Chen	9e3a61989b	[refactor](es) remove BE generated dsl for es query #15751 remove fe config enable_new_es_dsl and all related code. Now the DSL for es is always generated on FE side.	2023-01-10 08:40:32 +08:00
spaces-x	1018657d9d	[Enhancement](SparkLoad): avoid BE OOM in push task, fix #15572 (#15620 ) Release the memory pool held by the parquet reader once the data has been flushed by the rowset writer. Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-01-05 10:20:32 +08:00
Jibing-Li	17286861ef	[Fix](multi catalog)Skip non-vectorized init code for NewFileScanNode. #15550	2023-01-03 09:22:17 +08:00
AlexYue	87110ad3e3	[chore](Sink)remove useless OlapTablePartitionParam-related code (#15549 )	2023-01-02 22:47:16 +08:00
starocean999	100834df8b	[fix](nereids) fix some aggregate bugs in Nereids (#15326 ) 1. the agg function without distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct 2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate if an agg function is a "merge" function 3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases 4. AggregateExpression's nullable method should call inner function's nullable method. 5. add a bind slot rule to bind pattern "logicalSort(logicalHaving(logicalProject()))" 6. don't remove project node in PhysicalPlanTranslator 7. add a cast to bigint expr when count( distinct datelike type ) 8. fallback to old optimizer if bitmap runtime filter is enabled. 9. fix exchange node mem leak	2022-12-30 23:07:37 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node	2022-12-30 21:48:14 +08:00
zhangstar333	85c7c531f1	[vectorized](jdbc) support array type in jdbc external table (#15303 )	2022-12-30 00:29:08 +08:00
YueW	305dd15fea	[improvement](index) Support applying bitmap index to compound predicates when the vectorized engine is enabled (#13035 ) Currently, a bitmap index can only be applied to pushed-down predicates in AND conditions. Predicates in OR conditions and other complex compound conditions are not pushed down to the storage layer, which leads to reading more data. Based on that, this PR: 1. supports applying bitmap indexes to compound predicates, for queries like: select * from tb where a > 'hello' or b < 100; select * from tb where a > 'hello' or b < 100 or c > 'ok'; select * from tb where (a > 'hello' or b < 100) and (a < 'world' or b > 200); select * from tb where (not a > 'hello') or b < 100; ... In the SQL above, columns a, b and c each have a bitmap index. 2. reduces the amount of data read by using the index. 3. is enabled by setting the config enable_index_apply_compound_predicates.	2022-12-28 20:08:57 +08:00
yiguolei	a807978882	[refactor](non-vec) Remove rowbatch code from delta writer and some rowbatch related code (#15349 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-26 08:54:51 +08:00
luozenglin	8515a03ef9	[fix](compile) fix compile error caused by `mysql_scan_node.cpp` not being found when enabling `WITH_MYSQL` (#15277 )	2022-12-23 16:25:28 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) * [refactor](non-vec) delete non-vec data sink Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
Gabriel	e9a201e0ec	[refactor](non-vec) delete some non-vec exec node (#15239 )	2022-12-22 14:05:51 +08:00
Gabriel	af54299b26	[Pipeline](projection) Support projection on pipeline engine (#15220 )	2022-12-21 15:47:29 +08:00
Xin Liao	efdc73777a	[enhancement](load) verify the number of rows between different replicas when load data to avoid data inconsistency (#15101 ) It is very difficult to investigate the data inconsistency of multiple replicas. When loading data, the number of rows between replicas is checked to avoid some data inconsistency problems.	2022-12-21 09:50:13 +08:00
zhangstar333	494eb895d3	[vectorized](pipeline) support union node operator (#15031 )	2022-12-19 22:01:56 +08:00
xueweizhang	1597afcd67	[fix](multi-catalog) fix duplicate db/table names returned by show ... where (#15076 ) When running show databases/tables/table status where xxx, the statement is rewritten into a selectStmt that reads from information_schema. It needs the catalog info to scan the schema table; otherwise it may return database or table info from multiple catalogs. For example: mysql> show databases where schema_name='test'; +----------+ | Database | +----------+ | test | | test | +----------+ MySQL [internal.test]> show tables from test where table_name='test_dc'; +----------------+ | Tables_in_test | +----------------+ | test_dc | | test_dc | +----------------+	2022-12-19 14:27:48 +08:00
camby	401d5776b0	[fix](compile) compile error while with DORIS_WITH_MYSQL #15105 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-12-15 20:40:33 +08:00
Pxl	c25a7235f9	[Pipeline](load) support pipeline broker load (#14940 )	2022-12-13 00:28:36 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable custom err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Mingyu Chen	0b945e4ee3	[fix](csv-reader) fix be crash when reading invalid value (#14951 )	2022-12-10 18:45:47 +08:00
HappenLee	68092fe514	[pipeline](NLJ) support nested loop join for pipeline (#14966 )	2022-12-10 00:20:16 +08:00
Jerry Hu	873b128fde	[feature](pipeline) add intersect/except operators (#14868 )	2022-12-09 14:13:48 +08:00
lsy3993	5292880310	[refactor](odbc) move param to config (#14596 )	2022-12-06 17:38:52 +08:00
HappenLee	b30cd86e9e	[Refactor](pipeline) Refactor operator and builder code of pipeline (#14787 )	2022-12-05 18:35:00 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the session variable: `set enable_pipeline_engine = true;`	2022-12-02 17:11:34 +08:00
Xinyi Zou	176f519fa1	[enhancement](memtracker) Optimize exec node memory tracking (#14711 )	2022-12-01 14:52:21 +08:00
AlexYue	898d0d42f1	[improvement](load)add more log for better bug tracing experience for be write (#14424 ) While tracing a bug that occurred in version 1.1.4, I found several places where more logging would make tracing easier.	2022-11-29 22:28:39 +08:00
zhannngchen	39c47d930b	[improvement](load) add more log on rpc error (#14559 ) * [improvement](load) add more log on rpc error * update	2022-11-28 08:32:20 +08:00
Jerry Hu	9103ded1dd	[improvement](join)optimize sharing hash table for broadcast join (#14371 ) This PR is to make sharing hash table for broadcast more robust: Add a session variable to enable/disable this function. Do not block the hash join node's close function. Use shared pointer to share hash table and runtime filter in broadcast join nodes. The Hash join node that doesn't need to build the hash table will close the right child without reading any data(the child will close the corresponding sender).	2022-11-24 21:06:44 +08:00
starocean999	7f4cc61286	[fix](cast)prevent be from crashing when cast function is not available (#14540 ) * [fix](cast)prevent be from crashing when cast function is not available * format code	2022-11-24 14:17:49 +08:00
Pxl	bcd641877f	[Enhancement](scan) disable build key range and filters when push down agg work (#14248 )	2022-11-21 12:47:57 +08:00
zhannngchen	41dae8b6bb	[improvement](load) add a log when close OlapTableSink with error (#14257 )	2022-11-21 10:33:37 +08:00
Gabriel	2c42f0a905	[refactor](decimalv3) Refine code for DecimalV3 (#14394 )	2022-11-19 16:57:17 +08:00
Mingyu Chen	512b787559	[fix](parquet-reader) fix stack-use-after-return error (#14411 )	2022-11-19 10:52:50 +08:00
Xin Liao	a82896f420	[fix](broker-load) fix that broker load does not set be exec version and limit node channel memory (#14399 )	2022-11-18 23:38:37 +08:00
HappenLee	d5af4f6558	[Nereids](Profile) Add projection timer for Nereids (#14286 )	2022-11-17 22:17:55 +08:00
yiguolei	dba19e591c	[cherry-pick](scanner) using avg rowset to calculate batch size instead of using total_bytes since it costs a lot of cpu (#14345 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-17 18:57:21 +08:00
Ashin Gau	20634ab7e3	[feature-wip](multi-catalog) support partition&missing columns in parquet lazy read (#14264 ) PR https://github.com/apache/doris/pull/13917 added lazy read for non-predicate columns in ParquetReader, but lazy read could not be triggered when the predicate columns are partition or missing columns. This PR supports such cases, and fills partition and missing columns in `FileReader`.	2022-11-16 08:43:11 +08:00
camby	3ea9d3f2e1	[enhancement](array) support read list(Array) type from orc file (#14132 ) Before this PR, loading an ORC file with native list (or array) type data would crash the BE. Because a complex type in an ORC file spans multiple real columns, we need to filter columns by column name; otherwise we cannot read all the columns we need. Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to support creating a stripe reader by column names. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 17:48:17 +08:00
huangzhaowei	5badd70db2	[fix](csv-reader) Fix core dump when load text into doris with special delimiter (#14196 )	2022-11-15 16:06:59 +08:00
Adonis Ling	333c6390ee	[fix](be-ut) AddressSanitizer detects container-overflow issues (#14255 ) * [chore] Fix the container-overflow errors detected by address sanitizer * Fix compilation errors	2022-11-15 15:49:55 +08:00
Mingyu Chen	7eed5a292c	[feature-wip](multi-catalog) Support hive partition cache (#14134 )	2022-11-14 14:12:40 +08:00

775 Commits