Background:
A migration creates a new tablet in a different DataDir, and the old tablet is moved to TabletManager::_shutdown_tablets.
The migration task does not copy the data of stale rowsets to the new tablet, so after migration the new tablet does not contain the stale rowsets of the old tablet.
The path GC process checks every path to determine whether it belongs to a useless tablet or a useless rowset. If it does, path GC removes the data of that tablet or rowset.
The issue:
When path GC gets a stale rowset path from the data dir of the old tablet, it extracts the tablet id and rowset id.
It then checks whether the tablet id exists in TabletManager, and the answer is YES.
The tablet instance it gets is the new tablet. It then checks whether the stale rowset id from the old tablet's path exists in the new tablet instance, and the answer is NO.
Path GC therefore treats the rowset as useless, since it cannot find anyone holding a reference to it, and deletes the data of this stale rowset.
But some queries may still hold a reference to this stale rowset, so the deletion causes those queries to fail.
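The faulty decision can be reproduced with a small model. This is only an illustrative sketch in Python, not the backend code: the `Tablet` class, the `looks_useless` function, the directory names, and the rowset ids are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    tablet_id: int
    data_dir: str
    rowset_ids: set = field(default_factory=set)


def looks_useless(live_tablets, tablet_id, rowset_id):
    """Flawed check: looks the tablet up by id only and ignores the data dir."""
    tablet = live_tablets.get(tablet_id)
    if tablet is None:
        return True  # no such tablet at all: the path is garbage
    # The tablet found here may live in a different data dir than the path.
    return rowset_id not in tablet.rowset_ids


# The old tablet stays on /disk1 after migration and still owns a stale rowset.
old_tablet = Tablet(10, "/disk1", {"rs_stale", "rs_visible"})
# The new tablet on /disk2 has the same tablet id but no stale rowsets.
new_tablet = Tablet(10, "/disk2", {"rs_visible"})
live_tablets = {10: new_tablet}

# Path GC scans /disk1 and finds the stale rowset of the old tablet.
verdict = looks_useless(live_tablets, 10, "rs_stale")
print(verdict)
```

In this model the stale rowset on `/disk1` is judged useless even though the old tablet, and any query reading from it, may still need it.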
Solution:
The lifecycle of all rowsets in a shutdown tablet should be tied to the lifecycle of that tablet.
While performing path GC, we need to differentiate the old tablet from the new one created by the migration task.
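One way to picture this rule is a check that takes the data dir of the scanned path into account. The following is a minimal Python sketch under assumed names (`Tablet`, `is_path_useless`, the directory names, and the rowset ids are hypothetical); the actual change is made in the backend and may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Tablet:
    tablet_id: int
    data_dir: str
    rowset_ids: set = field(default_factory=set)


def is_path_useless(live_tablets, shutdown_tablets, data_dir, tablet_id, rowset_id):
    """Decide whether a rowset path found under data_dir can be removed."""
    # A shutdown tablet in the same data dir owns every rowset under its path,
    # so nothing is removed until the shutdown tablet itself is dropped.
    for tablet in shutdown_tablets:
        if tablet.tablet_id == tablet_id and tablet.data_dir == data_dir:
            return False
    tablet = live_tablets.get(tablet_id)
    # A live tablet in another data dir says nothing about this path.
    if tablet is None or tablet.data_dir != data_dir:
        return True
    return rowset_id not in tablet.rowset_ids


old_tablet = Tablet(10, "/disk1", {"rs_stale", "rs_visible"})
new_tablet = Tablet(10, "/disk2", {"rs_visible"})
live_tablets = {10: new_tablet}
shutdown_tablets = [old_tablet]

# While the old tablet is still in the shutdown list, its stale rowset is kept.
kept = not is_path_useless(live_tablets, shutdown_tablets, "/disk1", 10, "rs_stale")
print(kept)
```

Once the old tablet leaves the shutdown list, the same path is reported as useless and can be collected together with the rest of the tablet's data.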
* [improvement](create tablet) backend create tablet round robin among … (#29818)
* [improvement](create tablet) be choose disk tolerate with little skew (#30354)
---------
Co-authored-by: yujun <yu.jun.reach@gmail.com>
The materialized view definition is:
> select
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as cnt_1,
> bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
The following query can be rewritten using the materialized view above.
It uses arithmetic on aggregate functions in the select list:
> select
> count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end) as cnt_2,
> (sum(o_totalprice) + min(o_totalprice)) * count(*),
> min(o_totalprice) + count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end)
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
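To see why the pre-aggregated columns are sufficient, the arithmetic can be checked on sample data. The sketch below is plain Python with made-up rows; it models the bitmap as a Python set and does not show how the optimizer performs the rewrite. It only checks that each select expression of the query can be derived from the columns `sum_total`, `min_total`, `count_all`, and `cnt_2` of the materialized view.

```python
# Made-up joined rows (lineitem left join orders); the last row models a
# lineitem row without a matching order, so the orders columns are NULL.
rows = [
    {"o_totalprice": 10.0, "o_shippriority": 3, "o_orderkey": 2, "o_custkey": 7},
    {"o_totalprice": 20.0, "o_shippriority": 3, "o_orderkey": 2, "o_custkey": 8},
    {"o_totalprice": 5.0, "o_shippriority": 2, "o_orderkey": 1, "o_custkey": 7},
    {"o_totalprice": 30.0, "o_shippriority": 3, "o_orderkey": 2, "o_custkey": 7},
    {"o_totalprice": None, "o_shippriority": None, "o_orderkey": None, "o_custkey": None},
]


def case_custkey(row, min_priority, order_keys):
    """CASE WHEN o_shippriority > p AND o_orderkey IN (...) THEN o_custkey ELSE NULL END."""
    prio = row["o_shippriority"]
    if prio is not None and prio > min_priority and row["o_orderkey"] in order_keys:
        return row["o_custkey"]
    return None


prices = [r["o_totalprice"] for r in rows if r["o_totalprice"] is not None]

# Columns of the materialized view (a single row, since there is no GROUP BY).
mv = {
    "sum_total": sum(prices),
    "max_total": max(prices),
    "min_total": min(prices),
    "count_all": len(rows),
    "cnt_1": {case_custkey(r, 1, (1, 3)) for r in rows} - {None},
    "cnt_2": {case_custkey(r, 2, (2,)) for r in rows} - {None},
}

# The query evaluated directly on the joined rows.
distinct_cnt_2 = len({case_custkey(r, 2, (2,)) for r in rows} - {None})
direct = (
    distinct_cnt_2,
    (sum(prices) + min(prices)) * len(rows),
    min(prices) + distinct_cnt_2,
)

# The same expressions derived only from the materialized view columns.
from_mv = (
    len(mv["cnt_2"]),
    (mv["sum_total"] + mv["min_total"]) * mv["count_all"],
    mv["min_total"] + len(mv["cnt_2"]),
)

print(direct == from_mv)
```

The `count(distinct ...)` expression maps to the cardinality of the `cnt_2` bitmap, and the arithmetic on top of the aggregates is applied after the aggregate values are read from the view.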
the target expression should be:
1. only one numeric slot, or
2. cast for any data type
example:
select * from T1 join T2 on abs(T1.a) = T2.a
RF T2.a->abs(T1.a)
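The two conditions can be read as a check over the expression tree. The sketch below is one possible reading in Python, with hypothetical names throughout (`Slot`, `Cast`, `Func`, `is_valid_rf_target`, and the list of numeric types are all assumptions). It takes rule 1 to mean that the expression references exactly one slot and that slot is numeric, and rule 2 to mean a cast applied directly to a slot of any type.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed set of numeric type names, for illustration only.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "largeint",
                 "float", "double", "decimal"}


@dataclass(frozen=True)
class Slot:
    name: str
    data_type: str


@dataclass(frozen=True)
class Cast:
    child: object
    target_type: str


@dataclass(frozen=True)
class Func:
    name: str
    args: Tuple[object, ...]


def input_slots(expr):
    """Collect the distinct slots referenced by an expression."""
    if isinstance(expr, Slot):
        return {expr}
    if isinstance(expr, Cast):
        return input_slots(expr.child)
    if isinstance(expr, Func):
        slots = set()
        for arg in expr.args:
            slots |= input_slots(arg)
        return slots
    return set()  # literals reference no slot


def is_valid_rf_target(expr):
    # Rule 2 (as read here): a cast directly over a slot, for any data type.
    if isinstance(expr, Cast) and isinstance(expr.child, Slot):
        return True
    # Rule 1 (as read here): exactly one referenced slot, and it is numeric.
    slots = input_slots(expr)
    return len(slots) == 1 and next(iter(slots)).data_type in NUMERIC_TYPES


t1_a = Slot("T1.a", "int")
t1_b = Slot("T1.b", "int")
t1_s = Slot("T1.s", "varchar")

print(is_valid_rf_target(Func("abs", (t1_a,))))
```

Under this reading, `abs(T1.a)` from the example qualifies, while an expression over two slots such as `T1.a + T1.b` does not.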