* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.
Related PR: #20732
There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can leave rowsets in the "stale rowsets" state, preventing timely deletion of rowset metadata, which may cause the rowset metadata to grow too large. A sketch of this pattern follows below.
* Do not use unused rowsets
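A minimal sketch of the delayed-deletion pattern described above, assuming the StorageEngine keeps an "unused rowset" registry; the names (`add_unused_rowset`, `_unused_rowsets`, `delete_unused_rowsets`) and signatures here are illustrative, not the exact Doris code:

```cpp
// Illustrative sketch, not the actual implementation: instead of removing a
// cooled-down rowset inside the Tablet, hand it to the StorageEngine and let a
// background GC delete it once nothing (e.g. an in-flight two-phase query)
// still references it.
void Tablet::on_rowset_cooled_down(const RowsetSharedPtr& rowset) {
    // Defer deletion: the StorageEngine now tracks this rowset as "unused".
    StorageEngine::instance()->add_unused_rowset(rowset);
}

void StorageEngine::delete_unused_rowsets() {
    std::lock_guard<std::mutex> lock(_unused_rowsets_lock);
    for (auto it = _unused_rowsets.begin(); it != _unused_rowsets.end();) {
        // use_count() == 1 means only this registry still holds the rowset, so
        // no query can be reading it in the second phase anymore.
        if (it->second.use_count() == 1) {
            it->second->remove();  // delete files and metadata
            it = _unused_rowsets.erase(it);
        } else {
            ++it;
        }
    }
}
```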
Problem:
Before this change, when running SHOW EXPORT, LIMIT was executed before the sort, so the result could be unexpected: LIMIT always cut the result set first, and we could not get the rows we wanted.
Example:
Suppose we have export job1 and job2 with JobId1 > JobId2, and we want to get the job with JobId1:
show export from db order by JobId desc limit 1;
LIMIT 1 was applied first, so we would probably get job2, because JobIds are assigned from small to large.
Solution:
If there is an ORDER BY clause, we must not cut the result set first; the result set should be cut after sorting.
Currently we find that MetastoreEventsProcessor cannot keep up with the event production rate in our cluster, so we need to merge some HMS events in every round.
Add session variable MAX_JOIN_NUMBER_BUSHY_TREE, with a default of 5.
If the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, Nereids tries a bushy tree; otherwise it uses a zigzag tree.
Add a new rule named "OrToIn", which converts a disjunction of multiple EqualTo predicates that compare the same slot against literals into an InPredicate, so that it can be pushed down to the storage engine.
for example:
```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and (col2 = 3 or col2 = 4)
```
would be converted to
```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
Before this PR, if a user connected to a follower and analyzed a table, the stats would not get cached in the follower FE, since the ANALYZE statement is forwarded to the master and, on the follower, loading into the cache is still lazy. After this PR, once the analysis finishes on the master, the master syncs the stats to all followers and updates the followers' stats caches.
Load partition stats into column stats.
Enlarge jetty_server_max_http_header_size to avoid the "Request Header Fields Too Large" error when stream loading to the FE.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
After the last call to scan_task.scan_func(), the scan should be finished, which means the PipelineFragmentContext could be released.
Then, after the PipelineFragmentContext is released, accessing its fields such as query_ctx or _state may cause a core dump.
But this can only explain core 2.
```cpp
void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler,
                                                taskgroup::ScanTaskTaskGroupQueue* scan_queue) {
    while (!_is_closed) {
        taskgroup::ScanTask scan_task;
        auto success = scan_queue->take(&scan_task);
        if (success) {
            int64_t time_spent = 0;
            {
                SCOPED_RAW_TIMER(&time_spent);
                scan_task.scan_func();
            }
            // If scan_func() was the last call for this fragment, the
            // PipelineFragmentContext may already be released here, so touching
            // it from update_statistics() is a use-after-free.
            scan_queue->update_statistics(scan_task, time_spent);
        }
    }
}
```
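One way to avoid this class of problem, shown only as a sketch under the assumption that the scan task can expose a weak reference to its fragment context (`fragment_ctx()` below is a hypothetical accessor, not the actual API), is to pin the context with shared ownership for the whole loop body:

```cpp
// Hypothetical sketch, not the actual fix: take shared ownership of the
// PipelineFragmentContext before running the scan, so that it cannot be
// released between scan_func() and update_statistics().
if (auto ctx = scan_task.fragment_ctx().lock()) {  // fragment_ctx(): assumed std::weak_ptr accessor
    int64_t time_spent = 0;
    {
        SCOPED_RAW_TIMER(&time_spent);
        scan_task.scan_func();
    }
    scan_queue->update_statistics(scan_task, time_spent);
}
```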
* [pipeline](ckb) also trigger the new ckb pipeline
* [pipeline](ckb) run the ckb pipeline for all PRs
* change required
---------
Co-authored-by: stephen <hello-stephen@qq.com>
1. When calculating the array hash, the element size does not need to be seeded into the hash:
```cpp
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
```
But we need to be careful with [[], [1]] vs [[1], []]: when an array nests another array and the nested array is empty, we should still seed the hash so that the two values hash differently (see the sketch below).
2. Hash a range of elements for one hash value to avoid a virtual function call per element in the loop, which doubles the performance. I measured this in a unit test:
Column: array<int64>, 50 rows, each array containing 100,000 elements.
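For point 1, a minimal sketch of my reading of the fix (not necessarily the exact code): seed the hash with the element size only when the nested array is empty, so that [[], [1]] and [[1], []] still hash differently while non-empty elements no longer pay for the extra seed.

```cpp
// Sketch: only an empty nested array contributes a size seed; non-empty nested
// arrays are hashed from their elements' contents.
if (elem_size == 0) {
    hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                                   sizeof(elem_size), hash);
}
```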
1. Expand the semantics of the variable strict_mode to control the behavior of stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows when the key is not present in the table.
2. When inserting a new row with a non-strict-mode stream load, the unmentioned columns must have a default value or be nullable (see the sketch below).
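A sketch of the two rules above as BE-side pseudocode; the types, helpers, and error constructors (`PartialUpdateRow`, `unmentioned_columns()`, `has_default_value()`, the Status errors) are hypothetical, and only the decision logic mirrors the description:

```cpp
// Hypothetical sketch of the intended behavior, not the actual Doris code.
Status handle_missing_key(const PartialUpdateRow& row, bool strict_mode) {
    if (strict_mode) {
        // Strict mode: a partial-update stream load may only update existing rows.
        return Status::DataQualityError("key does not exist under strict_mode=true");
    }
    // Non-strict mode: a new row may be inserted, but every column the load did
    // not mention must either have a default value or be nullable.
    for (const auto& col : row.unmentioned_columns()) {
        if (!col.has_default_value() && !col.is_nullable()) {
            return Status::DataQualityError(
                    "unmentioned column has no default value and is not nullable");
        }
    }
    return Status::OK();  // fill unmentioned columns with default/NULL and insert
}
```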