* Fix bug of listener
* Change txnStateChangeListener to txnStateChangeCallback
* Fix the logic of beforeAborted
1. It task is not belong to job, the txn attachment will be set to null.
* Txn will be abort normally without attachment.
* Job will not be updated by this task which attachment is null.
1. Add Config.max_routine_load_concurrent_task_num instead of the old one
2. Fix a bug that SHOW ALTER TABLE COLUMN may throw Nullpointer exception
3. Fix some misspelling of docs
1. Moving lock of routine load job from inside of lock of txn to outside.
2. The process of routine load task commit or abort is following:
* lock job
check task
lock txn
commit txn
unlock txn
commit task
* unlock job
3. The process of checking timeout txn will be ignored when there are related task of txn.
4. The relationship between task and txn will be removed when task timeout.
1. Use a data consumer group to share a single stream load pipe with multi data consumers. This will increase the consuming speed of Kafka messages, as well as reducing the task number of routine
load job.
Test results:
* 1 consumer, 1 partitions:
consume time: 4.469s, rows: 990140, bytes: 128737139. 221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
blocking get time(us): 1041639, blocking put time(us): 10356581
The next 2 cases show that we can achieve higher speed by adding more consumers. But the bottle neck transfers from Kafka consumer to Doris ingestion, so 3 consumers in a group is enough.
I also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group, and default value is 3.
In my test(1 Backend, 2 tablets, 1 replicas), 1 routine load task can achieve 10M/s, which is same as raw stream load.
2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
1. add show proc "/routine_loads" to show statistic of all of jobs and tasks
2. add show proc "/routine_loads/jobname" to show info of all of jobs named jobname
3. add show proc "/routine_loads/jobname/jobid" to show tasks belong to jobid
4. fix bug of allocateBeToTask
1. ShowRoutineLoadStmt is sames like class description. It does not support show all of routine load job in all of db
2. ShowRoutineLoadTaskStmt is sames like class description. It does not support show all of routine laod task in all of job
3. Init partitionIdsToOffset in constructor of KafkaProgress
4. Change Create/Pause/Resume/Stop routine load job to LabelName such as [db.]name
5. Exclude final job when updating job
6. Catch all of exception when scheduling one job. The exception will not block the another jobs.
1. the stopped and cancelled job will be cleaned after the interval of clean second
2. the interval of clean second * 1000 = current timestamp - end timestamp
3. if job could not fetch topic metadata when need_schedule, job will be cancelled
4. fix the deadlock of job and txn. the lock of txn must be in front of the lock of job
5. the job will be paused or cancelled depend on the abort reason of txn
6. the job will be cancelled immediately if the abort reason named offsets out of range
1. stream load executor will abort txn when no correct data in task
2. change txn label to DebugUtil.print(UUID) which is same as task id printed by be
3. change print uuid to hi-lo
1. init cmt offset in stream load context
2. init default max error num = 5000 rows / per 10000 rows
3. add log builder for routine load job and task
4. clone plan fragment param for every task
5. be does not throw too many filter rows while the init max error ratio is 1
1. Check if properties is null before check routine load properties
2. Change transactionStateChange reason to string
3. calculate current num by beId
4. Add kafka offset properties
5. Prefer to use previous be id
6. Add before commit listener of txn: if txn is committed after task is aborted, commit will be aborted
7. queryId of stream load plan = taskId
* Change routine load task sheduler interval to 0
1. change routine load task scheduler interval to 0
2. init progress when routine load scheduler
3. add unit test and function test of routine load scheduler and task commit
* Add checker of custom kafka partition
1. need scheduler to need schedule
2. add checker of custom kafka partition when create routine load job
3. fix unit test error
When Backend report unused replica, which means this replica
is bad, Frontend should set this replica as bad and repair it.
Also, when a disk is reported unused, Frontend should mark this
disk as OFFLINE. And no more replica will be assigned to this
disk.
We also add 3 new metrics: disk_state, tablet_num and scheduled_tablet_num
on Frontend to monitor the disk state and number of tablet on each Backend.
* Remove build rows counter in PartitionHashJoinNode
* Fix unit test fail in RuntimeProfileTest
* Add check for result type length in cast_to_string_val
* Add param of specified thirdparty path
1. The thirdparth path can be specify on build.sh: ./build.sh --thirdparty /specified/path/to/thirdparty
2. If there are only thirdparty param of build.sh, it will build both fe and be
3. Add unit test of routine load stmt
4. Remove source code in docker image
* Add DORIS_THIRDPARTY env in docker image
1. Set DORIS_THIRDPARTY env in docker image. The build.sh will use /var/local/thirdparty instead of /source/code/thirdparty
2. remove --thirdparty param of build.sh
* Change image workdir to /root
Record query consumption into fe audit log. Its basic mode of work is as follows, one of instance of parent plan is responsible for accumulating sub plan's consumption and send to it's parent, BE coordinator will get total consumption because it's a single instance.