Commit Graph

7591 Commits

Author SHA1 Message Date
d127d67ebe Revert "[fix](csv-reader) fix column split error when there is escape character (#34364)"
This reverts commit 971e10a9db782c9986b20e1209468e4d7aeedf71.
2024-05-07 13:36:11 +08:00
9d0d7293f0 [fix](json) fix be crash while load json data (#34283) 2024-05-07 07:42:53 +08:00
971e10a9db [fix](csv-reader) fix column split error when there is escape character (#34364) 2024-05-07 07:38:35 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
e19d57261c [improvement](spill) improve cancel (#34451)
* [improvement](spill) improve cancel

* fix
2024-05-07 00:07:20 +08:00
a81beb19c2 [fix](load) fix repeatedly open tablets_channel when tablets_channel already cancelled (#34442) 2024-05-06 23:15:33 +08:00
f7900b53ce [enhancement](function) floor/ceil/round/round_bankers can use column as scale argument (#34391) 2024-05-06 22:18:36 +08:00
c22f42121b [fix](compaction test) show single replica compaction status and fix test (#33076) (#34285) (#34438) 2024-05-06 21:00:34 +08:00
aa156f0781 [opt](memory) BE memory info compatible with Cgroup (#34262) 2024-05-06 20:11:20 +08:00
11ca738261 [fix](memory) Fix thread context init in MacOS and not use memory tracker (#34125) 2024-05-06 20:11:20 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
ab5ee81811 [fix](memory) Fix page cache memory tracker consumption in prune (#34320) 2024-05-06 12:53:11 +08:00
85ae773996 [fix](spill) incorrect revocable mem size of hash join (#34379) 2024-05-06 06:53:12 +08:00
7248420cfd [chore](session_variable) Add 'data_queue_max_blocks' to prevent the DataQueue from occupying too much memory. (#34017) (#34395) 2024-05-05 21:20:33 +08:00
Pxl
0d106fe4c2 [Bug](runtime-filter) release rf count dependency when query canceled (#34367)
* release rf count dependency when query canceled

* update

* update
2024-05-02 09:56:17 +08:00
8abd136ba2 [Improvement](executor)Refactor Workload group memory GC (#33797)
* just gc group's overcommit query when minor gc

* add process usage
2024-04-30 19:34:31 +08:00
35f8563a75 [feature](iceberg) support iceberg equality delete (#34223) (#34327)
bp #34223

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-30 11:51:29 +08:00
7d77fd0286 [fix](profile) Fix reporting the profile while building the pipeline profile. (#34215) (#34326) 2024-04-30 11:38:03 +08:00
843c89f109 [fix] fix nullptr when clear cache due to move (#34323) 2024-04-30 09:52:41 +08:00
6f873c5907 [improvement](join) Avoid merging blocks more than once on the build side (#34291) 2024-04-30 08:37:53 +08:00
53c06ad9d2 [fix](spill) handel canceled status in spill (#34268) 2024-04-30 08:35:52 +08:00
b15fc2a906 [Cherry-pick](branch-2.1) Pick #34043 and #34112 (#34318)
* [Enhancement](full compaction) Add run status support for full compaction (#34043)

* The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}`
e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084`

If full compaction is running, the output will be
```
{
"status" : "Success",
"run_status" : true,
"msg" : "compaction task for this tablet is running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
Otherwise, the output will be
```
{
"status" : "Success",
"run_status" : false,
"msg" : "compaction task for this tablet is not running",
"tablet_id" : 10084,
"compact_type" : "full"
}
```
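
A client could interpret the two responses above roughly as follows. This is a minimal sketch, not part of the commit: the helper names `run_status_url` and `is_full_compaction_running` are hypothetical, and treating any `status` other than `"Success"` as an error is an assumption, since the commit message only shows the two replies quoted above.

```python
import json


def run_status_url(host: str, port: int, tablet_id: int) -> str:
    """Build the URL in the form quoted in the commit message (hypothetical helper)."""
    return f"http://{host}:{port}/api/compaction/run_status?tablet_id={tablet_id}"


def is_full_compaction_running(body: str) -> bool:
    """Return True when the reply reports a running full compaction.

    Assumption: a "status" other than "Success" is treated as an error.
    """
    reply = json.loads(body)
    if reply.get("status") != "Success":
        raise ValueError(f"unexpected status: {reply.get('status')!r}")
    return reply.get("compact_type") == "full" and reply.get("run_status") is True


# The two sample replies from the commit message, used here as fixed inputs.
sample_running = """
{"status": "Success", "run_status": true,
 "msg": "compaction task for this tablet is running",
 "tablet_id": 10084, "compact_type": "full"}
"""
sample_idle = """
{"status": "Success", "run_status": false,
 "msg": "compaction task for this tablet is not running",
 "tablet_id": 10084, "compact_type": "full"}
"""

print(run_status_url("127.0.0.1", 8040, 10084))
print(is_full_compaction_running(sample_running))
print(is_full_compaction_running(sample_idle))
```

The sketch parses fixed strings instead of issuing an HTTP request, so it runs without a Backend; in practice the body would come from the `curl` call shown above.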

* 2

* 2

* [Fix](partial update) Fix rowset not found error when doing partial update (#34112)

Cause: A partial column update first reads the existing data columns, then fills in the missing data and writes the rows back. When the read is initialized, only the rowset IDs are fetched; the actual rowset object is fetched later, when it is needed. Compaction may run between these two steps and turn the old rowset into a stale rowset, and if enough time passes the stale rowset may be deleted outright. The rowset object then cannot be found when the update needs it. A partial column update should be able to read every key and should never encounter a new key, but when the rowset has disappeared the Backend (BE) treats those keys as missing. It then checks whether the other columns have default values or are nullable, and if that check fails, the aforementioned error is thrown.

Solution: To avoid such issues during partial column updates, the initialization step should involve fetching both the rowset IDs and the shared pointer to the rowset object simultaneously. This ensures that the rowset can always be found during data retrieval.
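
The race and the fix described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the Backend's code: `Rowset`, `Tablet`, and both lookup functions are hypothetical names, compaction here deletes the old rowset immediately rather than after a stale period, and a held Python reference stands in for the shared pointer.

```python
class Rowset:
    def __init__(self, rowset_id, rows):
        self.rowset_id = rowset_id
        self.rows = rows  # key -> existing column values


class Tablet:
    """Toy tablet: a registry of live rowsets keyed by ID (hypothetical)."""

    def __init__(self):
        self._rowsets = {}

    def add(self, rowset):
        self._rowsets[rowset.rowset_id] = rowset

    def get(self, rowset_id):
        return self._rowsets.get(rowset_id)

    def snapshot_ids(self):
        # Old pattern: remember only the IDs at initialization.
        return list(self._rowsets)

    def snapshot_rowsets(self):
        # Fixed pattern: remember the IDs and hold the objects themselves.
        return dict(self._rowsets)

    def compact(self, old_ids, merged):
        # Simplification: the old rowsets are dropped from the registry at once.
        for rowset_id in old_ids:
            del self._rowsets[rowset_id]
        self.add(merged)


def lookup_by_id(tablet, ids, key):
    """Resolve each ID late; a vanished rowset makes the key look missing."""
    for rowset_id in ids:
        rowset = tablet.get(rowset_id)
        if rowset is not None and key in rowset.rows:
            return rowset.rows[key]
    return None


def lookup_in_snapshot(snapshot, key):
    """Read from the rowset objects captured at initialization."""
    for rowset in snapshot.values():
        if key in rowset.rows:
            return rowset.rows[key]
    return None


tablet = Tablet()
tablet.add(Rowset("rs1", {"k1": "v1"}))

ids_only = tablet.snapshot_ids()
held = tablet.snapshot_rowsets()

# Compaction runs between initialization and the actual read.
tablet.compact(["rs1"], Rowset("rs2", {"k1": "v1"}))

print(lookup_by_id(tablet, ids_only, "k1"))
print(lookup_in_snapshot(held, "k1"))
```

With only the IDs captured, the key appears to be missing after compaction; with the objects held from the start, the read still succeeds.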
2024-04-30 07:26:23 +08:00
a173513e27 [fix](pipelinex) exchange sink not set ready when source limit #34241 2024-04-29 20:58:50 +08:00
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
9b7e007ef6 [Bug](union) fix union operator set eos is not incorrect (#34250)
* [test](case) fix unstable case without order by distinct row

* [Bug](union) fix union operator set eos is not incorrect
2024-04-29 13:38:03 +08:00
5277a55791 (pick 34003) release fd for shutdown tablets (#34224) 2024-04-29 10:51:19 +08:00
946d28646a [fix](outfile)Fixed orcOutputStream.close() throwing an exception during destruction causing the program to hang. (#34254)
bp #34243
2024-04-28 19:54:34 +08:00
417431fd83 [Enhancement](hdfs-file-system) Change fs_handler ptr to shared_ptr and remove ref count operations. (#34049)
Backport #33959.
2024-04-28 19:45:30 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
341f5cd7a3 [fix](branch-2.1) Fix streamload profile not set (#34221) 2024-04-28 14:36:58 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
30a68c1240 [fix](spill) use different algorithm to avoid partition data skew (#34162) 2024-04-27 11:20:36 +08:00
970d0c80df [Improvement](agg) Improve count distinct distribute keys (#33167) 2024-04-27 02:29:33 +08:00
10e098845d [fix](compile) fix two compile errors on MacOS (#33834) (#34149) 2024-04-26 17:02:44 +08:00
0f0c0a266b [opt](parquet)Skip page with offset index (#33082)
Make skip_page() in ColumnChunkReader more efficient: page headers are no longer read when the chunk has page locations.
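
The idea can be sketched as follows. This is an illustration only and not the code from the commit: the `skip_page` function below and its return shape are hypothetical. The `PageLocation` fields mirror those of the Parquet offset index, where `compressed_page_size` is assumed to include the page header, so the end of a page follows from arithmetic alone.

```python
from dataclasses import dataclass


@dataclass
class PageLocation:
    offset: int                # file offset where the page (header first) starts
    compressed_page_size: int  # page size in bytes, header included (assumption)
    first_row_index: int       # index of the first row stored in the page


def skip_page(locations, page_index):
    """Skip one page using only the offset index.

    Returns the index of the next page and the file offset to seek to.
    No page header is read or decoded.
    """
    loc = locations[page_index]
    return page_index + 1, loc.offset + loc.compressed_page_size


# Two made-up pages laid out back to back.
locations = [
    PageLocation(offset=4, compressed_page_size=100, first_row_index=0),
    PageLocation(offset=104, compressed_page_size=80, first_row_index=1000),
]

print(skip_page(locations, 0))
print(skip_page(locations, 1))
```

Without page locations, a reader would have to decode each page header to learn the page size before it could move on, which is the work this optimization avoids.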
2024-04-26 15:06:16 +08:00
60e20a3afe [fix](pipeline_x) Crc32HashPartitioner should use ShuffleChannelIds (#34147) 2024-04-26 15:03:11 +08:00
9aa08d8deb [improve](disk) Not add disk path to broken list if check status is not IO_ERROR (#34111) 2024-04-26 07:44:12 +08:00
4f6b9db7a7 Update doris_main.cpp (#34128)
* Update doris_main.cpp

Log(FATAL) triggers a core dump, which is confusing for users. We should print an error message and exit without a core dump.

* Update doris_main.cpp
2024-04-26 07:43:40 +08:00
9f0a5690a6 [profile](scan) add projection time in scaner #34120 2024-04-26 07:43:40 +08:00
Pxl
7fbca522b7 [Bug](runtime-filter) fix bloom filter size error on rf merge (#34082)
fix bloom filter size error on rf merge

W20240424 11:28:56.826277 3494287 ref_count_closure.h:80] RPC meet error status: [INVALID_ARGUMENT]PStatus: (172.21.0.15)[INVALID_ARGUMENT]bloom filter size not the same: already allocated bytes 65536, expected allocated bytes 32768
2024-04-26 07:41:56 +08:00
47ded2c6a0 Revert "[fix](compile) fix two compile errors on MacOS (#33834) (#34005)"
This reverts commit 743fb62a2c42cc5cc662583c235f7336d5e6ddef.
2024-04-26 00:55:21 +08:00
9083bf7e14 revert "[Improvementation](join) empty_block shall be set true when build blo… (#33977)"
This reverts commit e3ed861e4b6a602ea874b6501998578952291f38.
2024-04-25 23:33:11 +08:00
743fb62a2c [fix](compile) fix two compile errors on MacOS (#33834) (#34005) 2024-04-25 19:39:35 +08:00
Pxl
e3ed861e4b [Improvementation](join) empty_block shall be set true when build blo… (#33977)
empty_block shall be set true when build block only one row
2024-04-25 15:07:56 +08:00
f34fe46bfa [fix](scan) fix ignore expr exec when _non_predicate_columns is empty (#33934)
fix ignore expr exec when _non_predicate_columns is empty
2024-04-25 15:06:57 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
5f2d0e3d53 [Fix](executor)Fix when Fe send empty wg list to be may cause query failed. (#34074) 2024-04-25 12:01:44 +08:00
f4deb42a80 [pipeline](fix) Prevent re-cancel pipeline tasks (#34073) 2024-04-25 12:01:44 +08:00
a17524b427 [bugfix](core) close method should check if the pointer is nullptr (#34067)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-04-25 12:01:44 +08:00