7568 Commits

Author SHA1 Message Date
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
9b7e007ef6 [Bug](union) fix union operator set eos is not incorrect (#34250)
* [test](case) fix unstable case without order by distinct row

* [Bug](union) fix union operator set eos is not incorrect
2024-04-29 13:38:03 +08:00
5277a55791 (pick 34003) release fd for shutdown tablets (#34224) 2024-04-29 10:51:19 +08:00
946d28646a [fix](outfile)Fixed orcOutputStream.close() throwing an exception during destruction causing the program to hang. (#34254)
bp #34243
2024-04-28 19:54:34 +08:00
417431fd83 [Enhancement](hdfs-file-system) Change fs_handler ptr to shared_ptr and remove ref count operations. (#34049)
Backport #33959.
2024-04-28 19:45:30 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
341f5cd7a3 [fix](branch-2.1) Fix streamload profile not set (#34221) 2024-04-28 14:36:58 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
30a68c1240 [fix](spill) use different algorithm to avoid partition data skew (#34162) 2024-04-27 11:20:36 +08:00
970d0c80df [Improvement](agg) Improve count distinct distribute keys (#33167) 2024-04-27 02:29:33 +08:00
10e098845d [fix](compile) fix two compile errors on MacOS (#33834) (#34149) 2024-04-26 17:02:44 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient. Page headers are no longer read when the chunk has page locations (an offset index).
2024-04-26 15:06:16 +08:00
60e20a3afe [fix](pipeline_x) Crc32HashPartitioner should use ShuffleChannelIds (#34147) 2024-04-26 15:03:11 +08:00
9aa08d8deb [improve](disk) Not add disk path to broken list if check status is not IO_ERROR (#34111) 2024-04-26 07:44:12 +08:00
4f6b9db7a7 Update doris_main.cpp (#34128)
* Update doris_main.cpp

Log(FATAL) introduces a core dump, which is confusing for users. We should print error msg and exit without a core dump.

* Update doris_main.cpp
2024-04-26 07:43:40 +08:00
9f0a5690a6 [profile](scan) add projection time in scaner #34120 2024-04-26 07:43:40 +08:00
Pxl
7fbca522b7 [Bug](runtime-filter) fix bloom filter size error on rf merge (#34082)
fix bloom filter size error on rf merge

W20240424 11:28:56.826277 3494287 ref_count_closure.h:80] RPC meet error status: [INVALID_ARGUMENT]PStatus: (172.21.0.15)[INVALID_ARGUMENT]bloom filter size not the same: already allocated bytes 65536, expected allocated bytes 32768
2024-04-26 07:41:56 +08:00
47ded2c6a0 Revert "[fix](compile) fix two compile errors on MacOS (#33834) (#34005)"
This reverts commit 743fb62a2c42cc5cc662583c235f7336d5e6ddef.
2024-04-26 00:55:21 +08:00
9083bf7e14 revert "[Improvementation](join) empty_block shall be set true when build blo… (#33977)"
This reverts commit e3ed861e4b6a602ea874b6501998578952291f38.
2024-04-25 23:33:11 +08:00
743fb62a2c [fix](compile) fix two compile errors on MacOS (#33834) (#34005) 2024-04-25 19:39:35 +08:00
Pxl
e3ed861e4b [Improvementation](join) empty_block shall be set true when build blo… (#33977)
empty_block shall be set true when build block only one row
2024-04-25 15:07:56 +08:00
f34fe46bfa [fix](scan) fix ignore expr exec when _non_predicate_columns is empty (#33934)
fix ignore expr exec when _non_predicate_columns is empty
2024-04-25 15:06:57 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
5f2d0e3d53 [Fix](executor)Fix when Fe send empty wg list to be may cause query failed. (#34074) 2024-04-25 12:01:44 +08:00
f4deb42a80 [pipeline](fix) Prevent re-cancel pipeline tasks (#34073) 2024-04-25 12:01:44 +08:00
a17524b427 [bugfix](core) close method should check if the pointer is nullptr (#34067)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-25 12:01:44 +08:00
67b394f2b0 [feature](profile) sort pipelineX task by total time #34053 2024-04-25 12:01:44 +08:00
2c3e838971 [improvement](spill) improve config of spill thread pool (#33992) 2024-04-25 12:01:44 +08:00
f6ec64c6ad [fix](exception) Fix Block noexcept method not throw exception (#34002) 2024-04-24 17:13:50 +08:00
00d773117d [fix](stream agg) fix coredump when close if open failed (#33978) 2024-04-24 17:13:50 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect in random distribution table #33962 2024-04-24 17:13:50 +08:00
799c43686c [fix](jni-connector) avoid core dump if init connector failed (#34007)
_jni_scanner_cls may be null if connector init failed,
so it needs to be checked before it is deleted.
2024-04-24 17:13:50 +08:00
8d98c71079 [FIX]fix cidr func with const param (#33968) 2024-04-24 17:13:50 +08:00
df96f76f78 [featrue](pipelineX) check output type in some node (#33716) 2024-04-24 17:13:49 +08:00
9bb149b3be [fix](stream-load) fix query id is zero in stream load log (#33954) 2024-04-22 22:33:24 +08:00
Pxl
5a5063be20 [bug](fix) heap use after free when json parse failed (#33955) 2024-04-22 22:33:24 +08:00
4d7ac82305 [profile](scanner) Fix wrong metrics (#33965) 2024-04-22 22:33:24 +08:00
299d069da9 Fix alter policy failed (#33910) 2024-04-22 22:33:24 +08:00
a050513c91 [Fix](clean trash) Fix clean trash use agent task (#33912) (#33972)
* [Fix](clean trash) Fix clean trash use agent task (#33912)

* add .h
2024-04-22 17:14:21 +08:00
e384b495e3 [fix](pipeline_x) The execution loop of a task should be broken if the task is cancelled (#33938) 2024-04-22 12:31:55 +08:00
c631f4f8a8 [fix](schema change) resolve the use count check of source logical column (#33932)
Fix error like:
```
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
 9# doris::vectorized::Block::clear_column_data(int) in /mnt/hdd01/ci/master-deploy/be/lib/doris_be
10# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:514
11# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vfile_scanner.cpp:333
12# doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:132
13# doris::vectorized::VScanner::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/scan/vscanner.cpp:99
```

The source logical column is also the destination logical column when the logical converter is consistent. Previously, the column reference was reset only after the conversion completed, so an early return on EOF skipped the reset, even though EOF is not a true error.
```
if (_logical_converter->is_consistent()) {
    // If logical converter is consistent, _src_logical_column is the final destination column,
    // other components will check the use count
    _src_logical_column.reset();
}
```
2024-04-22 12:31:46 +08:00
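
The use-count problem described in c631f4f8a8 can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `ConverterSketch`, `read_buggy`, `read_fixed`, and `use_count_after` are stand-ins invented for illustration, not the Doris classes, and the actual patch may be structured differently. The sketch only shows why an early EOF return leaves an extra reference that a later use-count check would trip over.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the Doris types.
using Column = std::vector<int>;
using ColumnPtr = std::shared_ptr<Column>;

enum class ReadStatus { OK, END_OF_FILE };

struct ConverterSketch {
    ColumnPtr src_logical_column;  // extra reference held during conversion
    bool consistent = true;        // source column is also the destination

    // Buggy shape: the early EOF return skips the reset.
    ReadStatus read_buggy(const ColumnPtr& dst, bool hit_eof) {
        src_logical_column = dst;
        if (hit_eof) {
            return ReadStatus::END_OF_FILE;  // reference is still held here
        }
        if (consistent) {
            src_logical_column.reset();
        }
        return ReadStatus::OK;
    }

    // One possible fix: release the reference on every exit path.
    ReadStatus read_fixed(const ColumnPtr& dst, bool hit_eof) {
        src_logical_column = dst;
        ReadStatus st = hit_eof ? ReadStatus::END_OF_FILE : ReadStatus::OK;
        if (consistent) {
            src_logical_column.reset();
        }
        return st;
    }
};

// Returns the destination column's use count after one read call,
// measured while the converter object is still alive.
long use_count_after(bool fixed, bool hit_eof) {
    ColumnPtr dst = std::make_shared<Column>();
    ConverterSketch conv;
    if (fixed) {
        conv.read_fixed(dst, hit_eof);
    } else {
        conv.read_buggy(dst, hit_eof);
    }
    return dst.use_count();
}
```

In this sketch, a caller that expects to be the sole owner of the column after the read would see a second owner only on the buggy EOF path, which matches the kind of check failure shown in the stack trace of that commit.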
7f61626c8d [fix](arrow_flight_sql) Fix ArrowSchema column alias (#33490)
run: select TABLE_SCHEMA as a, sum(TABLE_ROWS) as b  from tables group by TABLE_SCHEMA limit 2;
old output:

   TABLE_SCHEMA                                             Nullable(Int64)_1
0  regression_test_mv_p0_sum_count                          9
1  regression_test_query_p0_sql_functions_string_functions  70414
now output:

   a                                                        b
0  regression_test_mv_p0_sum_count                          9
1  regression_test_query_p0_sql_functions_string_functions  70414
2024-04-22 11:28:22 +08:00
615765c1c0 [improvement](spill) improve spill directory and fix bugs (#33900)
* [improvement](spill) improve spill directory and fix bugs

* fix
2024-04-22 11:28:22 +08:00
00ff5f05d3 [chore](log) Avoid too many 'token parser result is empty' (#33921) 2024-04-21 13:22:26 +08:00
cb2598e814 [bugfix](memtracker) memtracker is attached duplicately (#33929)
fix:

F20240420 12:47:23.222411 31558 thread_context.h:164] Check failed: thread_mem_tracker()->label() == "Orphan" , thread mem tracker label: Load#Id=b43f342ae5564c23-b7b41daf24545f78, attach mem tracker label: Load#Id=4241cef180013366-1ba9f658007f339a
12:49:46   *** Check failure stack trace: ***
12:49:46       @     0x55584aae5d26  google::LogMessage::SendToLog()
12:49:46       @     0x55584aae2770  google::LogMessage::Flush()
12:49:46       @     0x55584aae6569  google::LogMessageFatal::~LogMessageFatal()
12:49:46       @     0x55581abce4ae  doris::ThreadContext::attach_task()
12:49:46       @     0x55581abc8e8e  doris::AttachTask::AttachTask()
12:49:46       @     0x5558170a055b  doris::MemTableWriter::flush_async()
12:49:46       @     0x5558170604ee  doris::MemTableMemoryLimiter::_flush_memtable()
12:49:46       @     0x55581705e8e6  doris::MemTableMemoryLimiter::_flush_active_memtables()
12:49:46       @     0x55581705d986  doris::MemTableMemoryLimiter::handle_memtable_flush()
12:49:46       @     0x555848c9a36d  doris::vectorized::VTabletWriterV2::_write_memtable()
12:49:46       @     0x555848c990c8  doris::vectorized::VTabletWriterV2::write()
2024-04-21 09:55:48 +08:00
687951202f [refactor](opt) move BE code of hll scalar functions together, optimize head files (#33757)
In this PR, we moved the BE code of the hll scalar functions into one place so it is easier to manage, as the bitmap functions file already does.

Also, we optimized the header files by:
removing the unneeded includes "vec/aggregate_functions/aggregate_function.h" and "boost/iterator/iterator_facade.hpp",
using cstddef and cstdint instead of stddef.h and stdint.h.
2024-04-21 09:55:19 +08:00
36a70ba1e7 [Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_seperator and line_delimiter. (#33693) 2024-04-20 20:06:48 +08:00
03c3419265 [Refactor](executor)Add workload schedule policy table (#33729) 2024-04-20 20:06:34 +08:00
0e3ad5cd9d [fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675) (#33924)
bp (#33675)

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-20 19:06:54 +08:00