doris

Author	SHA1	Message	Date
Xin Liao	971e10a9db	[fix](csv-reader) fix column split error when there is escape character (#34364 )	2024-05-07 07:38:35 +08:00
Mingyu Chen	35f8563a75	[feature](iceberg) support iceberg equality delete (#34223 ) (#34327 ) bp #34223 Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>	2024-04-30 11:51:29 +08:00
daidai	1bfe0f0393	[feature](iceberg) support reading iceberg complex types, iceberg.orc format and position delete. (#33935 ) (#34256 ) master #33935	2024-04-29 14:40:12 +08:00
Qi Chen	99af54f779	[Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146 ) (#34248 ) backport #34146	2024-04-28 19:43:57 +08:00
苏小刚	0f0c0a266b	[opt](parquet)Skip page with offset index (#33082 ) Make skip_page() in ColumnChunkReader more efficient. No more reading page headers if there are pagelocations in chunk.	2024-04-26 15:06:16 +08:00
Mryange	9f0a5690a6	[profile](scan) add projection time in scanner #34120	2024-04-26 07:43:40 +08:00
wangbo	47b54d4bd5	Fix remote scan pool (#33976 )	2024-04-25 15:04:43 +08:00
Mingyu Chen	799c43686c	[fix](jni-connector) avoid core dump if init connector failed (#34007 ) _jni_scanner_cls may be null if connector init failed. So need to check it before delete it.	2024-04-24 17:13:50 +08:00
Pxl	5a5063be20	[bug](fix) heap use after free when json parse failed (#33955 )	2024-04-22 22:33:24 +08:00
Gabriel	4d7ac82305	[profile](scanner) Fix wrong metrics (#33965 )	2024-04-22 22:33:24 +08:00
Ashin Gau	c631f4f8a8	[fix](schema change) resolve the use count check of source logical column (#33932 ) Fix error like: ``` 8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be 10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block, unsigned long, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514 11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333 12# doris::vectorized::VScanner::get_block(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132 13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState, doris::vectorized::Block, bool) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99 ``` Because source logical column is the destination logical column if logical converter is consistent. Previously, the reference of column was reset after the conversion was completed, but if an EOF occurred, it was returned in advance, but EOF is not a true error. ``` if (_logical_converter->is_consistent()) { // If logical converter is consistent, _src_logical_column is the final destination column, // other components will check the use count _src_logical_column.reset(); } ```	2024-04-22 12:31:46 +08:00
Tiewei Fang	36a70ba1e7	[Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_seperator and line_delimiter. (#33693 )	2024-04-20 20:06:48 +08:00
Mingyu Chen	0e3ad5cd9d	[fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675 ) (#33924 ) bp (#33675) Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>	2024-04-20 19:06:54 +08:00
zclllyybb	25358564ca	[Fix](compile) Fix gcc compile on master (#33864 ) This was introduced by #33511, which wrongly used ColumnStr<T> (); this violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but is still supported by clang for now (see llvm/llvm-project#58112)	2024-04-19 23:41:37 +08:00
HappenLee	1300317723	[Exec](join) Support column string64 to avoid join failed in string size overflow the uint32 (#33511 ) (#33850 )	2024-04-18 19:43:08 +08:00
amory	b07e0a2f06	[FIX](cast) fix full/right outer join for cast array (#33475 ) In some cases, we have code ``` if (_join_op == TJoinOp::RIGHT_OUTER_JOIN \|\| _join_op == TJoinOp::FULL_OUTER_JOIN) { _probe_column_convert_to_null = _convert_block_to_null(*input_block); } ``` and then call a subsequent function such as cast, but the cast function assumes the block column has the same type as from_type, which results in an error status	2024-04-17 23:42:13 +08:00
huanghaibin	59de97be5e	[improvement](mow) Add profile for delete_bitmap get_agg function (#33576 )	2024-04-17 23:42:13 +08:00
Ashin Gau	2cd4012541	[opt](scan) read scan ranges in the order of partitions (#33515 ) (#33657 ) backport: #33515	2024-04-17 23:42:12 +08:00
wangbo	8ee8de7857	[Fix](executor)reset remote scan thread num #33579	2024-04-17 23:42:11 +08:00
Ashin Gau	ae68cca07d	[fix](schema change) CastStringConverter is compiled failed in g++ (#33546 ) follow #32873, CastStringConverter is compiled failed in g++ for uninitialized value, which is ok in clang:	2024-04-17 23:42:00 +08:00
lihangyu	249a9c9875	[Feature](Variant) support aggregation model for Variant type (#33493 ) refactor use `insert_from` to replace `replace_column_data` for variable lengths columns	2024-04-17 23:42:00 +08:00
zhangstar333	6bcf24b1f6	[bug](not in) if not in (null) could eos early (#33482 )	2024-04-17 23:41:59 +08:00
Ashin Gau	9b7af4c0cf	[feature](schema change) unified schema change for parquet and orc reader (#32873 ) Following #25138, unified schema change interface for parquet and orc reader, and can be applied to other format readers as well. Unified schema change interface for all format readers: - First, read the data according to the column type of the file into source column; - Second, convert source column to the destination column with type planned by FE.	2024-04-12 15:09:25 +08:00
Pxl	5f30463bb3	[Chore](descriptors) remove unused codes for descriptors (#33408 ) remove unused codes for descriptors	2024-04-12 15:09:25 +08:00
Mryange	f7d52b5b1c	[feature](expr) add type check when expr prepare (#33330 )	2024-04-11 09:31:50 +08:00
Pxl	3081fc584d	[Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180 ) support sync join node build side's size to init bloom runtime filter	2024-04-11 09:31:50 +08:00
Mryange	28acfaed2b	[fix](pipeline)group by and output is empty (#33192 )	2024-04-10 16:23:20 +08:00
Pxl	8fd6d4c41b	[Chore](build) add -Wconversion and remove some unused code (#33127 ) add -Wconversion and remove some unused code	2024-04-10 15:26:08 +08:00
924060929	cc363f26c2	[fix](Nereids) fix group concat (#33091 ) Fix a failure in regression_test/suites/query_p0/group_concat/test_group_concat.groovy: select group_concat( distinct b1, '?'), group_concat( distinct b3, '?') from table_group_concat group by b2 exception: lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group. The root cause is that '?' is pushed down to a slot by NormalizeAggregate; AggregateStrategies treats the slot as a distinct parameter and generates an invalid PhysicalHashAggregate, which is then rejected by ChildOutputPropertyDeriver. This is fixed by not pushing literals down to slots in NormalizeAggregate and by forbidding generation of a stream aggregate node when the group-by slots are empty	2024-04-10 14:59:46 +08:00
超威老仲	b0b5f84e40	[feature](load) support compressed JSON format data for broker load (#30809 )	2024-04-10 14:20:53 +08:00
Pxl	e4993a19e5	[Chore](column) remove ColumnVectorHelper (#33036 ) remove ColumnVectorHelper	2024-04-10 11:56:41 +08:00
Mryange	8e19cdd745	[feature](expr) support common subexpression elimination be part (#32673 )	2024-04-10 11:56:21 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039 ) (#33140 )	2024-04-10 11:42:19 +08:00
Gabriel	c5a3af5c27	[partitionsort](fix) Fix DCHECK failure (#33035 )	2024-04-10 11:34:30 +08:00
yiguolei	3c4ccb3981	Revert "[opt](scan) read scan ranges in the order of partitions (#31630 )" This reverts commit 5d99dffe6f1a3fcb107ce56181aeff96ef222def.	2024-04-09 12:37:31 +08:00
Mingyu Chen	0c8d3d007d	[fix](jni) don't delete global ref if scanner is not opened (#33398 )	2024-04-09 09:06:16 +08:00
Gabriel	0234976ab7	[refactor](meta scan) Remove RPC from execute threads (#33378 )	2024-04-08 20:28:02 +08:00
Gabriel	a8232c67f9	[pipelineX](runtime filter) Fix task timeout caused by runtime filter (#33332 ) (#33369 )	2024-04-08 16:30:32 +08:00
Ashin Gau	29556f758e	[fix](parquet) fix time zone error in parquet reader (#33217 ) `isAdjustedToUTC` is interpreted in exactly the opposite way in the parquet reader (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md), so times with `isAdjustedToUTC=true` are shifted forward by eight hours (UTC+8). Parquet files with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration: ``` --conf spark.sql.session.timeZone=UTC --conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS ``` However, with the following configuration there is no logical or converted type in the parquet metadata, so the time read by doris will also be shifted forward by eight hours (UTC+8). Users need to set their own UTC time zone in doris (https://doris.apache.org/docs/dev/advanced/time-zone/) ``` --conf spark.sql.session.timeZone=UTC --conf spark.sql.parquet.outputTimestampType=INT96 ```	2024-04-07 23:24:22 +08:00
Mingyu Chen	ed93d6132f	[fix](jni) avoid coredump if failed to get jni env (#32950 ) PR #32217 found a problem where getting the jni env may fail, and added a workaround to avoid a BE crash. This PR follows up on that issue to avoid a BE crash when calling `close()` on JniConnector if getting the jni env fails. The `close()` method returns an error when: 1. Failed to get jni env 2. Failed to release jni resource. This PR ignores the first error and still logs fatal for the second error	2024-04-07 22:16:53 +08:00
Qi Chen	ecb4372479	[Fix](pipelinex) Fix `MaxScannerThreadNum` calculation error in file scan operator when turn on pipelinex. (#33037 ) MaxScannerThreadNum in the file scan operator is calculated incorrectly when pipelinex is turned on, which consumes a lot of memory and causes performance degradation. This PR fixes it.	2024-04-07 22:11:27 +08:00
Gabriel	6600e92b12	[scan](status) Finish execution if scanner failed (#32966 )	2024-03-29 10:51:15 +08:00
Ashin Gau	352617a34d	[fix](scanner) cached blocks may be empty when VFileScanner return NOT_FOUND (#32745 ) Cached blocks may be empty when VFileScanner return NOT_FOUND. This feature is introduced by https://github.com/apache/doris/pull/15226. Move this function inner `VFileScanner`.	2024-03-27 10:01:05 +08:00
Pxl	f579eceb34	[Improvementation](profile) add some profile on vcollect_iterator (#32794 ) add some profile on vcollect_iterator	2024-03-26 20:33:16 +08:00
zhangstar333	0a44de67bf	[bug](distinct agg) fix distinct streaming agg not output all data (#32760 ) fix distinct streaming agg not output all data	2024-03-26 20:19:36 +08:00
Mryange	ad2d20348a	[fix](pipeline) fix use error row desc when origin block clear #32803 (#32849 ) * fix * add case	2024-03-26 20:02:46 +08:00
Mryange	de3b99be00	[fix](pipeline) fix check failed in StatefulOperator	2024-03-25 22:33:30 +08:00
苏小刚	2f2d488668	[opt](parquet) Support hive struct schema change (#32438 ) Followup: #31128 This optimization allows doris to correctly read struct type data after changing the schema from hive. ## Changing struct schema in hive: ```sql hive> create table struct_test(id int,sf struct<f1: int, f2: string>) stored as parquet; hive> insert into struct_test values > (1, named_struct('f1', 1, 'f2', 's1')), > (2, named_struct('f1', 2, 'f2', 's2')), > (3, named_struct('f1', 3, 'f2', 's3')); hive> alter table struct_test change sf sf struct<f1:int, f3:string>; hive> select * from struct_test; OK 1 {"f1":1,"f3":null} 2 {"f1":2,"f3":null} 3 {"f1":3,"f3":null} Time taken: 5.298 seconds, Fetched: 3 row(s) ``` The previous result of doris was: ```sql mysql> select * from struct_test; +------+-----------------------+ \| id \| sf \| +------+-----------------------+ \| 1 \| {"f1": 1, "f3": "s1"} \| \| 2 \| {"f1": 2, "f3": "s2"} \| \| 3 \| {"f1": 3, "f3": "s3"} \| +------+-----------------------+ ``` Now the result is same as hive: ```sql mysql> select * from struct_test; +------+-----------------------+ \| id \| sf \| +------+-----------------------+ \| 1 \| {"f1": 1, "f3": null} \| \| 2 \| {"f1": 2, "f3": null} \| \| 3 \| {"f1": 3, "f3": null} \| +------+-----------------------+ ```	2024-03-22 16:35:47 +08:00
Mryange	baf3ae1a93	[refactor](nereids)unify outputTupleDesc and projection be part (#32439 )	2024-03-22 16:35:43 +08:00
Jerry Hu	23c12fd68f	[fix](join) core caused by null-safe-equal join (#32623 )	2024-03-22 08:53:47 +08:00

1388 Commits