Commit Graph

6945 Commits

Author SHA1 Message Date
90d3f0d805 [Fix](json) avoid print warn log when parse failed (#30656) 2024-02-01 19:00:50 +08:00
e9c112b843 [Refact](inverted index) refact inverted index cache to decouple with reader (#30574) 2024-02-01 19:00:50 +08:00
Pxl
1aa7a914e1 fix wrong profile on distinct agg and pass reference on uint136's compare (#30661) 2024-02-01 19:00:50 +08:00
4f1d76d646 handle create rowset error to avoid null pointer exception (#30670) 2024-02-01 11:51:51 +08:00
92cad69fc4 [Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535) 2024-01-31 23:53:40 +08:00
65076949ef [fix](compile)Fix Ambiguous regex Namespace Issue on MacOS Compilation (#30652) 2024-01-31 23:53:40 +08:00
73371d44f8 [fix][refactor] refactor schema init of external table and some parquet issue (#30325)
1. Skip parquet file which has only 4 bytes length: PAR1
2. Refactor the schema init method of iceberg/hudi/hive table in hms catalog
    1. Remove some redundant methods of `getIcebergTable`
    2. Fix issue described in #23771
3. Support HoodieParquetInputFormatBase, treat it as normal hive table format
4. When listing file, skip all hidden dirs and files
2024-01-31 23:53:40 +08:00
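
Items 1 and 4 of the commit above can be pictured as a filter over a file listing. The following is a minimal sketch, not the code from #30325: the function names, the `(relative_path, size, first_bytes)` tuple shape, and the rule that a name starting with `.` or `_` counts as hidden (the usual Hadoop convention) are all assumptions made for illustration. Paths are taken to be relative to the table location.

```python
from pathlib import PurePosixPath

PARQUET_MAGIC = b"PAR1"  # 4-byte magic that opens and closes every Parquet file


def is_hidden(relative_path: str) -> bool:
    """True if any component starts with '.' or '_' (assumed hidden-name rule)."""
    parts = PurePosixPath(relative_path).parts
    return any(part.startswith((".", "_")) for part in parts)


def is_magic_only(size: int, first_bytes: bytes) -> bool:
    """True for a file that is exactly the 4-byte magic and nothing else."""
    return size == len(PARQUET_MAGIC) and first_bytes[:4] == PARQUET_MAGIC


def files_to_scan(listing):
    """Keep paths that are neither hidden nor magic-only.

    `listing` is an iterable of (relative_path, size, first_bytes) tuples;
    this shape is hypothetical and chosen only for the sketch.
    """
    return [
        path
        for path, size, first_bytes in listing
        if not is_hidden(path) and not is_magic_only(size, first_bytes)
    ]
```

A file holding only `PAR1` has no footer and therefore no schema or row groups to read, which is why skipping it is safe in this sketch.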
77b366fc4b [fix](join) incorrect result of mark join (#30543)
incorrect result of mark join
2024-01-31 23:53:40 +08:00
711b156a78 [Refactor][Rf] remove useless code in RF (#30597) 2024-01-31 23:53:40 +08:00
c28ced1ebb [Feature](executor)Insert select limited by WorkloadGroup #30610 2024-01-31 23:53:40 +08:00
045225a096 [pipelineX](profile) Fix Tablet counter on pipelineX engine (#30613) 2024-01-31 23:53:39 +08:00
ef8d9ad9a4 [pipelinex](profile) improve memory counter of pipelineX (#30538) 2024-01-31 23:53:39 +08:00
9eeb7dc9e4 Refactor MemTableFlushExecutor::create_flush_token to improve (#30554)
readability && Add default type guard

Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2024-01-31 23:53:39 +08:00
19f57b544e support cosh math function (#30602)
Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>
2024-01-31 23:53:39 +08:00
f35803b7a0 [feature](pipeline-load) enable pipeline load by default (#30581) 2024-01-31 23:53:39 +08:00
e6fbccd3ed [Feature](Variant) support row store for variant type (#30052) 2024-01-31 23:53:39 +08:00
8b61b7c6cd [exec](function) Add tanh func (#30555) 2024-01-31 23:53:39 +08:00
a1ccf34ecc [fix](group commit) Fix replay wal core because undefined TLoadSourceType (#30571) 2024-01-31 23:53:39 +08:00
1d632f1af4 [improvement](move-memtable) enable stream write to socket in background bthread (#30586) 2024-01-31 23:53:39 +08:00
7d037c12bf [bugfix](paimon)fix paimon testcases (#30514)
1. set default timezone
2. do not push down the `char` type, which is not supported
2024-01-31 23:53:39 +08:00
378d9e7336 [Colo][Scan] delete the colo scan code (#30584) 2024-01-31 23:53:39 +08:00
221308f78a [fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261) 2024-01-31 23:53:39 +08:00
1a8e281255 [fix](cluster by) Fix cluster_by used-after-moved in compaction (#29273) 2024-01-31 23:53:39 +08:00
59b79d47ca fix compile bug 2024-01-30 16:17:55 +08:00
7fe4d00bb2 [fix](regex) use boost regex instead of std (#30462) 2024-01-30 15:33:40 +08:00
218fb80938 [fix](group commit) Fix group commit VOlapTablePartitionParam memory … (#30491) 2024-01-30 15:33:40 +08:00
08897b7e03 [Improvement](executor)Remove ThreadPoolToken from MemTableFlushExecutor #30529 2024-01-30 15:32:40 +08:00
0932dadcff [enhancement](log) print detail error for segment compaction failure (#30503) 2024-01-30 15:31:22 +08:00
28c4e69149 [fix](move-memtable) check load timeout before close wait (#30526) 2024-01-30 15:31:22 +08:00
129463f557 [Try_Fix](scan) try fix the scanner schedule logic to prevent excessive memory usage and timeout (#30515) 2024-01-30 15:31:22 +08:00
f7e01ceffa [bug](node) add dependency for set operation node (#30203)
These sinks must be completed one by one, in order; e.g. child(1) must wait for child(0) to finish building.
2024-01-30 15:30:39 +08:00
49d17f2be2 [fix](move-memtable) fix potential duplicate of TabletStream profile (#30397) 2024-01-30 15:30:14 +08:00
6eba030897 [fix](chore) path gc should consider tablet migration (#30095) (#30548)
Background:

Migration creates a new tablet in a different DataDir; the old tablet is moved to TabletManager::_shutdown_tablets.
The migration task does not copy the data in stale rowsets to the new tablet, so after migration the new tablet does not contain the stale rowsets of the old tablet.
The path GC process checks every path to determine whether it belongs to a useless tablet or a useless rowset. If it does, the data of that tablet or rowset is removed.
The issue:

When path GC gets a stale rowset path from the data dir of the old tablet, it extracts the tablet id and rowset id.
It then checks whether the tablet id exists in TabletManager, and the answer is yes.
It gets the tablet instance, which is the new tablet, and checks whether the stale rowset id from the old tablet path exists in that instance; the answer is no.
Path GC therefore treats the rowset as useless, since it cannot find anything holding a reference to it, and deletes the data of this stale rowset.
But some queries may still hold a reference to this stale rowset, so the deletion causes query failures.
Solution:

The lifecycle of all rowsets in a shutdown tablet should be tied to the lifecycle of that tablet.
While performing path GC, we need to differentiate between the old tablet and the new one created by the migration task.
2024-01-30 12:03:21 +08:00
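
The failure described in the commit above can be reproduced in a toy model. This is a sketch under stated assumptions, not the actual backend code: `Tablet`, `TabletManager`, `naive_is_garbage`, and `fixed_is_garbage` are hypothetical names, and identifying the old tablet by its data dir is one possible way to tell the old and new tablets apart, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    tablet_id: int
    data_dir: str                                  # DataDir holding this tablet
    rowset_ids: set = field(default_factory=set)   # rowsets the tablet knows about


@dataclass
class TabletManager:
    live: dict = field(default_factory=dict)       # tablet_id -> Tablet
    shutdown: list = field(default_factory=list)   # old tablets awaiting cleanup


def naive_is_garbage(mgr, data_dir, tablet_id, rowset_id):
    """Buggy check: looks the tablet up by id only and ignores the data dir."""
    tablet = mgr.live.get(tablet_id)
    if tablet is None:
        return True
    return rowset_id not in tablet.rowset_ids


def fixed_is_garbage(mgr, data_dir, tablet_id, rowset_id):
    """Keep every path that belongs to a shutdown tablet in the same data dir."""
    for old in mgr.shutdown:
        if old.tablet_id == tablet_id and old.data_dir == data_dir:
            return False  # rowset lifetime follows the shutdown tablet
    tablet = mgr.live.get(tablet_id)
    if tablet is None or tablet.data_dir != data_dir:
        return True
    return rowset_id not in tablet.rowset_ids


def migrated_manager():
    """Tablet 10 migrated from /disk1 to /disk2; 'rs_stale' was not copied."""
    mgr = TabletManager()
    mgr.shutdown.append(Tablet(10, "/disk1", {"rs_a", "rs_stale"}))
    mgr.live[10] = Tablet(10, "/disk2", {"rs_a"})
    return mgr
```

With `migrated_manager()`, the naive check reports the path `/disk1`, tablet 10, rowset `rs_stale` as garbage because it consults the new tablet, while the fixed check keeps it until the shutdown tablet itself is cleaned up.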
cc3c6d1479 [improvement](create tablet) backend create tablet round robin among … (#30530)
* [improvement](create tablet) backend create tablet round robin among … (#29818)

* [improvement](create tablet) be choose disk tolerate with little skew (#30354)

---------

Co-authored-by: yujun <yu.jun.reach@gmail.com>
2024-01-30 10:20:35 +08:00
6231300e9e [Fix](Rf) fix in_or_bloom filter merge error in broadcast join remote target tpcds q78 (#30492) 2024-01-29 19:03:47 +08:00
11f1b129c0 [optimize](invert index) avoid redundant checks for exist. (#30191) 2024-01-29 19:03:47 +08:00
cc963b0f71 [Refact](inverted index) use boost regex to resolve stack overflow issues (#30477) 2024-01-29 19:02:46 +08:00
ae38f28280 [feature](invert index) does not create an inverted index to support the match_phrase_prefix feature. (#30414) 2024-01-29 19:02:46 +08:00
7e19224a6c [fix](function) fix ipv4 funcs get failed error, improve an ipv6 func and exception message (#30269) 2024-01-28 18:25:31 +08:00
0433b8730d [Feature](profile)add shuffle send rows/bytes #30456 2024-01-28 18:25:08 +08:00
b1a9370004 [fix](glue)support access glue iceberg with credential list (#30473)
merge from #30292
2024-01-28 18:23:07 +08:00
96c4fcfb20 [improve](node) refactor partition sort node to reduce memory use
pipelineX
2024-01-27 10:54:29 +08:00
4f915129a9 [pipelineX](localexchange) Add local exchange before TabletFunction (#30446)
* [pipelineX](localexchange) Add local exchange before TabletFunction

* update
2024-01-27 10:29:41 +08:00
823c469c5c [fix](rowsetreader) determine merge iterator considering segment num (#29269) 2024-01-27 09:13:21 +08:00
bedad15f03 [enhancement](scanner) add a lower bound for bytes in scanner queue (#29624) 2024-01-27 09:13:21 +08:00
f571ffe57f (fix)[group commit] Row count is incorrect when enable pipeline load (#30447) 2024-01-27 09:13:21 +08:00
d191809372 [fix](pipeline) Fix non-prepared execute of UnionOperator (#30355) 2024-01-27 09:11:44 +08:00
46cadc9856 [minor](Prefetch) log slow prefetch io operation #30415 2024-01-27 09:11:02 +08:00
22322a3864 [refactor](join) remove unused RowRefListType from join (#30442) 2024-01-27 09:10:41 +08:00
904182685b [debug](move-memtable) enable brpc debug log in regression pipelines (#30389) 2024-01-27 09:10:41 +08:00