if BE's tablet not contains a txn, publish txn on them will no error, when check version exists it will indicate the tablet as error_tablet_id in task's response, so FE can know this tablet has fail.
Also for task, it's no need to set its status as "VERSION_NOT_EXIST". Because if set it as not ok, the BE will try this task two times. Since not contains this tablet's txn, the retry is in vain.
1. fix race condition problem when get tablet load index
2. change tablet search algorithm from random to round-robin for random distribution table when load_to_single_tablet set to false
Modify schedule_slot_num_per_path to schedule_slot_num_per_hdd_path and schedule_slot_num_per_ssd_path in the documentation
Co-authored-by: xingying01 <xingying01@corp.netease.com>
Reproduce:
DBA do following operations:
1. create user user1@['domain']; // the domain will be resolved as 2 ip: ip1 and ip2;
2. create user user1@'ip1';
3. wait at least 10 second
4. grant all on *.*.* to user1@'ip1'; // will return error: user1@'ip1' does not exist
This is because the daemon thread DomainResolver resolve the "domain" and overwrite the `user1@'ip1'`
which is created by DBA.
This PR fix it.
Effect: Client will see error message like below when BE meeting plan logical error.
RROR 1105 (HY000): errCode = 2, detailMessage = ([xxx]())[CANCELLED]Logical error during processing VNewOlapScanNode(dr_case_tag), output of projections 2 mismatches with exec node output 3
BitmapValue::write_to will get a string with size 1 for empty BitmapValue, however the size 1 string will reinterpret to BitmapValue* back in ColumnComplexType::insert:
void insert(const Field& x) override {
const String& s = doris::vectorized::get<const String&>(x);
data.push_back(reinterpret_cast<const T>(s.c_str()));
}
in data.push_back will goto BitmapValue copy constructor, as the _type is not first member in BitmapValue, cause access to an unknown memory location.
New session variable: runtime_filter_wait_infinitely. If set runtime_filter_wait_infinitely = true, consumer of rf will wait on receiving until query is timeout.
If preparation fails, the counter _peak_memory_usage_counter will be a wild pointer.
*** SIGSEGV address not mapped to object (@0x454d49545f) received by PID 16992 (TID 18856 OR 0x7f4d05444700) from PID 1296651359; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:417
1# os::Linux::chained_handler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo*, void*) in /app/doris/Nexchip-doris-1.2.4.2-bin-x86_64/java8/jre/lib/amd64/server/libjvm.so
4# 0x00007F55C85B9400 in /lib64/libc.so.6
5# doris::vectorized::VDataStreamSender::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/vec/sink/vdata_stream_sender.cpp:734
6# doris::PlanFragmentExecutor::close() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:543
7# doris::PlanFragmentExecutor::~PlanFragmentExecutor() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:95
8# doris::FragmentExecState::~FragmentExecState() at /root/doris/be/src/runtime/fragment_mgr.cpp:112
9# std::_Sp_counted_ptr<doris::FragmentExecState*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() at /root/ldb/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:348
10# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:855
11# doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&) at /root/doris/be/src/runtime/fragment_mgr.cpp:592
12# doris::PInternalServiceImpl::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool) at /root/doris/be/src/service/internal_service.cpp:463
13# doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) at /root/doris/be/src/service/internal_service.cpp:305
14# doris::WorkThreadPool<false>::work_thread(int) at /root/doris/be/src/util/work_thread_pool.hpp:160
15# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
16# start_thread in /lib64/libpthread.so.0
17# clone in /lib64/libc.so.6
## Motivation:
In the past, our JOB only supported Timer JOB, which could only provide scheduling for fixed-time tasks. Meanwhile, the JOB was solely responsible for execution, and during execution, there might be inconsistencies in states, where the task was executed successfully, but the JOB's recorded task status was not updated.
This inconsistency in task states recorded by the JOB could not guarantee the correctness of the JOB's status. With the gradual integration of various businesses into the JOB, such as the export job and mtmv job, we found that scaling became difficult, and the JOB became particularly bulky. Hence, we have decided to refactor JOB.
## Refactoring Goals:
- Provide a unified external registration interface so that all JOBs can be registered through this interface and be scheduled by the JobScheduler.
- The JobScheduler can schedule instant JOBs, timer JOBs, and manual JOBs.
- JOB should provide a unified external extension class. All JOBs can be extended through this extension class, which can provide special functionalities like JOB status restoration, Task execution, etc.
- Extended JOBs should manage task states on their own to avoid inconsistent state maintenance issues.
- Different JOBs should use their own thread pools for processing to prevent inter-JOB interference.
### Design:
- The JOBManager provides a unified registration interface through which all JOBs can register and then be scheduled by the JobScheduler.
- The TimerJob periodically fetches JOBs that need to be scheduled within a time window and hands them over to the Time Wheel for triggering. To prevent excessive tasks in the Time Wheel, it distributes the tasks to the dispatch thread pool, which then assigns them to corresponding thread pools for execution.
- ManualJob or Instant Job directly assigns tasks to the corresponding thread pool for execution.
- The JOB provides a unified extension class that all JOBs can utilize for extension, providing special functionalities like JOB status restoration, Task execution, etc.
- To implement a new JOB, one only needs to implement AbstractJob.class and AbstractTask.class.
<img width="926" alt="image" src="https://github.com/apache/doris/assets/16631152/3032e05d-133e-425b-b31e-4bb492f06ddc">
## NOTICE
This will cause the master's metadata to be incompatible