Commit Graph

15942 Commits

Author SHA1 Message Date
efea006f3a [ut](move-memtable) add CLOSE_LOAD before EOS ut case (#29253)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-29 00:33:34 +08:00
99a1e066b5 [fix](group_commit) group_commit is not support on table with property light_schema_change=false (#29244) 2023-12-29 00:26:38 +08:00
9be0f04506 (improv)[group commit] refactor some group commit code (#29180) 2023-12-29 00:26:10 +08:00
9a277a6f11 [fix](move-memtable) don't abort in replica write layer unless all replica fails (#29257) 2023-12-29 00:03:28 +08:00
feebe3e6fb [FIX](literal) fix expression literal error #29157 2023-12-28 23:08:01 +08:00
a90304c208 [fix](parquet) complex type in parquet is case sensitive (#29245)
Change name of complex type in parquet to case-insensitive. Otherwise, uppercase column names of complex types will return null.
2023-12-28 22:43:11 +08:00
8a491e7b1d Fix workload scheduler start too early may cause npe (#29258) 2023-12-28 22:41:42 +08:00
e64c5687f2 [fix](index compaction)support compact multi segments in one index (#28889) 2023-12-28 21:33:21 +08:00
ffd178f5ff [feat](pipelinex) support parallel scan on pipeline x engine (#29070)
* [feat](pipelinex) support parallel scan on pipeline x engine

* make parallel scan be independent of shared scan
2023-12-28 21:29:07 +08:00
0912b137e6 [Improvement](pipelineX) optimize local exchange sink (#29250) 2023-12-28 21:22:29 +08:00
b093097bc3 [improvement](statistic)Improve auto analyze visibility. (#29046)
Show auto analyze now shows running jobs, not only finished/failed jobs.
Show analyze task status now shows auto tasks as well.
Remove some useless code.
Auto analyze processes catalogs/dbs/tables in order of id, smallest id first.
2023-12-28 21:21:17 +08:00
5129ab5738 [fix](decimalv2) fix decimalv2 agg errors (#29246) 2023-12-28 21:17:16 +08:00
c8a0d3e03c [fix](invert index) fix error handling for match_regexp resulting in an empty match. (#29233) 2023-12-28 19:58:41 +08:00
a14daca7ba [feature](inverted index)write separated index files in RAM directory to reduce IO(#28810)
Normally we write the separate index files to disk before merging them into an idx compound file.
In high-frequency load scenarios, disk IO can become a bottleneck.
To reduce the pressure on the disk, we first write the standalone index files to a RAM directory, and write them to disk only when merging them into the compound file.

Add config `index_inverted_index_by_ram_dir_enable`, default is `false`.
2023-12-28 17:18:59 +08:00
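
The write path described in the commit above can be sketched as follows. This is a minimal Python illustration, not Doris code: `RamDirectory`, `write_compound_file`, `read_compound_file`, `build_index`, the file names, and the compound-file layout are all hypothetical, chosen only to show that the separate files stay in memory and the disk is touched once, at merge time.

```python
import os
import struct
import tempfile


class RamDirectory:
    """Hypothetical in-memory stand-in for a directory of index files."""

    def __init__(self):
        self.files = {}  # file name -> bytes

    def write_file(self, name, data):
        self.files[name] = bytes(data)


def write_compound_file(ram_dir, path):
    """Merge all in-memory files into one on-disk file.

    Assumed layout: file count, then per file: name length, name,
    data length, data (all lengths are little-endian uint32).
    """
    with open(path, "wb") as out:
        out.write(struct.pack("<I", len(ram_dir.files)))
        for name in sorted(ram_dir.files):
            encoded = name.encode("utf-8")
            data = ram_dir.files[name]
            out.write(struct.pack("<I", len(encoded)))
            out.write(encoded)
            out.write(struct.pack("<I", len(data)))
            out.write(data)


def read_compound_file(path):
    """Read the compound file back into a {name: bytes} mapping."""
    files = {}
    with open(path, "rb") as f:
        (count,) = struct.unpack("<I", f.read(4))
        for _ in range(count):
            (name_len,) = struct.unpack("<I", f.read(4))
            name = f.read(name_len).decode("utf-8")
            (data_len,) = struct.unpack("<I", f.read(4))
            files[name] = f.read(data_len)
    return files


def build_index(separate_files, out_dir):
    """Write the separate files to RAM first; write to disk only when merging."""
    ram_dir = RamDirectory()
    for name, data in separate_files.items():
        ram_dir.write_file(name, data)
    path = os.path.join(out_dir, "index.idx")
    write_compound_file(ram_dir, path)
    return path
```

With this shape, a load that produces several small index files issues one sequential disk write instead of one write per file plus a later read-and-merge pass.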
e610044bae [Enhancement] (schema) add column type check (#28718) 2023-12-28 17:11:24 +08:00
6323c17ad5 [fix](test) fix wrong DDL in test pipeline load #29211 2023-12-28 16:51:48 +08:00
b31494b18c [test](regression) add fault injection cases for LoadStream (#29101)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-28 16:16:26 +08:00
03a6a2880a [fix](journal) Fix infinite block due to initial BDB journal failed (#29205)
Opening a BDBJournal will acquire the max journal id, but it doesn't
need to check whether the replica txn is matched with the master.
2023-12-28 15:57:51 +08:00
8becf053cb [fix](multi-catalog)unsupported hive input format should throw an exception and remove useless method (#29087)
Introduced by: #28644
2023-12-28 15:43:28 +08:00
ba7b7c1f60 [Chore](Job)It is forbidden to change the status of internal JOB through PAUSE/RESUME (#29036) 2023-12-28 15:40:16 +08:00
5171a77f9e [fix](Nereids): merge Offset in Limit Translator (#29100) 2023-12-28 15:32:45 +08:00
14c902b504 [fix](regression test) fix test_alter_colocate_table (#29009) 2023-12-28 15:09:21 +08:00
31b3be456c add workload scheduler in be (#29116) 2023-12-28 15:04:22 +08:00
Pxl
118775f913 [Bug](schame-change) fix wrong result after reorder mor table (#29045)
* fix wrong result after reorder mor table

* update
2023-12-28 14:57:31 +08:00
Pxl
c98489fc09 [Feature](materialized-view) support visitBitmapUnion mv rewrite (#29200)
* support visitBitmapUnion rewrite

* add case
2023-12-28 14:56:33 +08:00
29a7c0d677 [pipelineX](scan) ignore storage data distribution by default (#29192) 2023-12-28 14:54:09 +08:00
fe93a8f1d0 [cleanup](move-memtable) remove unused log in load stream stub (#29084) 2023-12-28 14:39:10 +08:00
2e910dac2a [enhencement](segcompaction) cancel inflight segcompaction tasks faster when load finish (#28901)
[Goal]
When building the rowset writer, avoid waiting for inflight segcompaction
to eliminate long tail latency for load.

[Current situation]
1. The segcompaction of a rowset is executed serially. During the build phase,
we need to wait for the completion of the inflight segcompaction task.

2. If the rowset writer finishes writing and starts building meta, then segments
that have not been compacted will not be submitted to segcompaction worker.
We simply ignore them to accelerate the build process.

3. But this is not enough. If a segcompaction task has already been submitted to
the worker thread pool, we set a cancelled flag for the worker, so that the task
does nothing during execution and completes ASAP.

4. But this is still not enough. Although the latency of the segcompaction task
has been shortened by the aforementioned method, tasks may still be queuing in the
thread pool.

[Solution]
We can increase the worker thread pool to avoid queuing congestion, but this is
not the best solution.
Segcompaction should be best-effort work and should not use too much CPU and
memory. So we adopted the strategy of decoupling build from segcompaction,
specifically:

1. For the segcompaction task that is performing compaction operations, we should
not interrupt it, otherwise it may cause file corruption.

2. For those tasks still queued, we no longer care about their results (because
these tasks will know they are cancelled and will not perform any actual operations),
so we just ignore them and continue with the subsequent rowset build process.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-28 14:32:29 +08:00
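
The cancellation strategy in the commit above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Doris implementation: `SegcompactionTask`, `build_rowset`, and the per-task lock are hypothetical, and the sketch assumes the build waits for a task that is already compacting (so it is never interrupted) while ignoring tasks that are still queued.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SegcompactionTask:
    """Hypothetical task wrapper with a cancelled flag."""

    def __init__(self, name):
        self.name = name
        self.cancelled = threading.Event()
        self.started = threading.Event()
        self.finished = threading.Event()
        self.compacted = False
        self._lock = threading.Lock()

    def run(self, work):
        # Decide atomically whether to start compacting or to skip.
        with self._lock:
            if self.cancelled.is_set():
                self.finished.set()  # cancelled while queued: do nothing
                return
            self.started.set()
        try:
            work()  # in-progress compaction is never interrupted
            self.compacted = True
        finally:
            self.finished.set()

    def cancel(self):
        """Set the cancelled flag; return True if the task had already started."""
        with self._lock:
            self.cancelled.set()
            return self.started.is_set()


def build_rowset(tasks):
    """Cancel all tasks, then wait only for those that were already running."""
    running = [t for t in tasks if t.cancel()]
    for t in running:
        t.finished.wait()
    # Queued tasks are ignored: they will see the flag and skip their work.
    return [t.name for t in running]
```

The lock makes the "start or skip" decision and the cancel request mutually exclusive, so a task is either waited for or guaranteed to do no work. The build therefore never blocks on the thread pool queue, only on compactions that are actually in progress.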
4f2d54d462 [fix](DatabaseTransactionMgr) Fix clean label bug which may cause inconsitent editlog operation (#29198) 2023-12-28 14:17:35 +08:00
f816d13c56 [feature](Nereids): eliminate groupby (#28615) 2023-12-28 14:00:41 +08:00
bc08535285 [fix](Nereids) throw readable exception when meet unsupport sup-query (#29147) 2023-12-28 13:26:09 +08:00
xy
fd90c3a6a6 [optimize](cooldown)Reduce the number of calls to the pick_cooldown_rowset (#27091)
Co-authored-by: xingying01 <xingying01@corp.netease.com>
2023-12-28 13:03:33 +08:00
a7c0dddbc9 [refactor](rename) Rename some variables in pipeline for better readability (#29140)
* rft-rename

* format
2023-12-28 12:54:47 +08:00
82a8232c8a [fix](expr) Fix BE core dump while common expr filter delete condition column (#29107)
Predicate columns also need to be filtered by expr, except for the delete condition column. The delete condition column does not need to be filtered because the query engine does not need it: after _output_column_by_sel_idx, the materialized delete condition column is erased at the end of the block.
Eg:
delete from table where a = 10;
select b from table;
Column a is only effective in the segment iterator; the block seen by the query engine contains only column b, so there is no need to filter column a by expr.
2023-12-28 11:39:54 +08:00
0562999f91 [fix](doc) spell errors fixes and align with code log for memory tracker. (#28000)
Spell corrected for LastestSuccessChannelCache and aligned that with the docs
2023-12-28 11:12:35 +08:00
3dc3e81734 [Improvement](datatype) Update Parser for IPv4/v6 data types (#29044)
Switch from parsing std::string to parsing char* to speed up parsing of IPv4/IPv6 data types.
2023-12-28 11:00:38 +08:00
8a169b9906 [case](regression) Test enable pipeline load (#28172)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-28 10:49:19 +08:00
e5b6826de6 [fix](partial update) update error code when failed to fill the missing fields (#29103)
1. InternalError is not clear for such error, use InvalidSchema Error instead
2. avoid some useless stacktrace on InternalError when load failed
2023-12-28 10:33:03 +08:00
1284975b9b [Improve](Job) Create task adds concurrency control (#29144) 2023-12-28 10:24:39 +08:00
8b225c6c3c [pipelineX](fix) Fix core dump if cancelled (#29138) 2023-12-28 10:04:51 +08:00
1aa9ac4fe4 Prevent making snapshot on remote rowset in single replica compaction (#28716) 2023-12-27 23:43:43 +08:00
f4c5ce260b [fix](statistics)Fix rowCount==0 while analyzing bug (#28969)
Sample analysis needs to get the row count by calling table.getRowCount(). This method is not updated in real time, which may cause the sample task to scan the whole table.
This PR fixes that: it sets a flag indicating that the analyze job is for an empty table and skips scanning the table. Meanwhile, updatedRows is not reset in this case.

Set hugeTableAutoAnalyzeIntervalInMillis = 0 because all default huge table size has been set to 0.
2023-12-27 23:04:37 +08:00
0bff387577 [fix](tablet stat) fix tablet stat thread block #29151 2023-12-27 22:02:42 +08:00
0cc4ee52bf [fix](move-memtable) fix streams for node memory leak in sink v2 (#29146) 2023-12-27 21:48:32 +08:00
d96278ab21 [bug](fix) show create table show comment error (#28346) 2023-12-27 21:17:20 +08:00
224677af7c delete unused code in delta writer v2 (#29131) 2023-12-27 21:04:58 +08:00
abbd2cedff [fix](Nereids) merge limit should use bottom phase (#29142) 2023-12-27 21:04:00 +08:00
0436013baf [fix](decimal) fix cast decimal overflow and add test cases for casting decimalv2 to decimalv3 (#29165) 2023-12-27 20:58:37 +08:00
5f71691401 [fix](read) fix unexpected overflow of uninitialized column data in VStatisticsIterator::next_batch (#29141) 2023-12-27 20:58:02 +08:00
9715db61d4 [FIX](complextype)fix count func with complex type (#28873) 2023-12-27 20:38:44 +08:00