Commit Graph

17549 Commits

24d22c6f6b Forward some stmts to master (#944)
1. SHOW PROC
2. SHOW PROC web action
3. ADMIN SHOW stmt
4. SHOW ROUTINE LOAD stmt
2019-04-28 10:33:50 +08:00
19b34129bb Fix bug when resuming a running job (#942) 2019-04-28 10:33:50 +08:00
0579540ba2 Fix routine load bugs (#940)
1. Plan each task separately, since the table schema may have changed
2. Add more detailed info for txn
2019-04-28 10:33:50 +08:00
f49c53ee5b Change some Throwables to UserException (#939) 2019-04-28 10:33:50 +08:00
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
8e0512e88d Move lock of routine load job (#934)
1. Move the lock of the routine load job from inside the txn lock to outside it.
2. The process of routine load task commit or abort is as follows:
* lock job
  * check task
  * lock txn
    * commit txn
  * unlock txn
  * commit task
* unlock job
3. The check for timed-out txns is skipped while a txn still has a related task.
4. The relationship between a task and its txn is removed when the task times out.
2019-04-28 10:33:50 +08:00
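The commit/abort lock ordering described in this commit can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Doris code; the helper names (commit_task, check_task, and so on) are hypothetical. The point is that the job lock is always acquired before the txn lock, so the commit and abort paths cannot deadlock against each other.

```python
import threading

# Illustrative locks standing in for the routine load job lock and txn lock.
job_lock = threading.Lock()
txn_lock = threading.Lock()

def commit_task(check_task, commit_txn, mark_task_committed):
    """lock job -> check task -> lock txn -> commit txn -> unlock txn
    -> commit task -> unlock job"""
    with job_lock:                 # lock job
        if not check_task():       # check task while holding the job lock
            return False
        with txn_lock:             # lock txn
            commit_txn()           # commit txn; txn lock released on exit
        mark_task_committed()      # commit task, still under the job lock
    return True                    # job lock released on exit
```

Because every path takes the locks in the same job-then-txn order, no thread can hold the txn lock while waiting for the job lock.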
0cccb5cc9c Fix bugs of routine load job (#917)
1. An uninitialized counter caused endless data consuming.
2. Incorrect handling of null values in column mapping.
2019-04-28 10:33:50 +08:00
75674753c2 Add unit test for RoutineLoadManager and RoutineLoadJob (#881)
1. Add UTs
2. Show history jobs even when the table has been deleted. Check auth whether tablename is null or not.
2019-04-28 10:33:50 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group to share a single stream load pipe among multiple data consumers. This increases the speed of consuming Kafka messages and reduces the number of tasks per routine load job.

Test results:

* 1 consumer, 1 partition:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The last 2 cases show that we can achieve higher speed by adding more consumers, but the bottleneck then shifts from the Kafka consumer to Doris ingestion, so 3 consumers in a group is enough.

I also added a Backend config max_consumer_num_per_group to change the number of consumers in a data consumer group; the default value is 3.

In my test (1 Backend, 2 tablets, 1 replica), 1 routine load task can achieve 10M/s, which is the same as raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
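The consumer-group idea in the commit above can be sketched as follows: several consumers in one group are each assigned a share of the Kafka partitions and all of them feed a single shared pipe. This is a rough Python sketch under stated assumptions, not the real Backend implementation; run_consumer_group and the queue-as-pipe are illustrative names, and each "partition" is modeled as a plain iterable of messages.

```python
import queue
import threading

def run_consumer_group(partitions, num_consumers, pipe):
    # Round-robin the partitions among the consumers in the group.
    assignments = [partitions[i::num_consumers] for i in range(num_consumers)]

    def consume(assigned):
        for partition in assigned:
            for msg in partition:   # each partition is an iterable of messages
                pipe.put(msg)       # blocking put into the single shared pipe

    threads = [threading.Thread(target=consume, args=(a,)) for a in assignments]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# All three consumers write into one shared pipe.
pipe = queue.Queue()
run_consumer_group([["a1", "a2"], ["b1"], ["c1", "c2"]], num_consumers=3, pipe=pipe)
```

The blocking put corresponds to the "blocking put time" in the test results above: when downstream ingestion is the bottleneck, consumers spend their time blocked on the pipe rather than on Kafka.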
22500b672c Add new keyword in ident (#860)
1. A new keyword is added to ident to avoid syntax errors when a keyword is used as a string
2019-04-28 10:33:50 +08:00
cef2078cb8 Fix FE UT (#850) 2019-04-28 10:33:50 +08:00
01e2a8d48d Fix bug that stream load use invalid file type (#835) 2019-04-28 10:33:50 +08:00
5a757ff5b2 Change column order in show routine load (#824)
1. show "/routine_loads/jobname" and show routine load jobname
2. change the column order from jobname | id to id | jobname
2019-04-28 10:33:50 +08:00
e1c6ba8397 Add show proc of routine load and task (#818)
1. add show proc "/routine_loads" to show statistics of all jobs and tasks
2. add show proc "/routine_loads/jobname" to show info of all jobs named jobname
3. add show proc "/routine_loads/jobname/jobid" to show the tasks belonging to jobid
4. fix bug of allocateBeToTask
2019-04-28 10:33:50 +08:00
c577b9397e Add help doc of routine load (#811) 2019-04-28 10:33:50 +08:00
2e250482fd Modify routine load fe unit test (#803) 2019-04-28 10:33:50 +08:00
e8b360d193 Merge master and fix BE ut 2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* limit the max number of routine load task in backend to 10
* Fix bug that some partitions would not be assigned
2019-04-28 10:33:50 +08:00
8d2de42b36 Fix some routine load bugs (#787)
1. Preserve the column order in the load stmt.
2. Fix some replay bugs of routine load tasks.
2019-04-28 10:33:50 +08:00
d213f922be Implement ShowRoutineLoadStmt and ShowRoutineLoadTaskStmt (#786)
1. ShowRoutineLoadStmt works as described in its class comment. It does not support showing all routine load jobs across all dbs
2. ShowRoutineLoadTaskStmt works as described in its class comment. It does not support showing all routine load tasks across all jobs
3. Init partitionIdsToOffset in the constructor of KafkaProgress
4. Change Create/Pause/Resume/Stop routine load job to use a LabelName such as [db.]name
5. Exclude final jobs when updating jobs
6. Catch all exceptions when scheduling one job, so that an exception does not block the other jobs.
2019-04-28 10:33:50 +08:00
9fa5e1b768 Add a cleaner bg thread to clean idle data consumer (#776) 2019-04-28 10:33:50 +08:00
ff6844b7d6 Fix routine load replay bugs (#770) 2019-04-28 10:33:50 +08:00
95d0186e18 Modify some task scheduler logic (#767)
1. add job id and cluster name to Task info
2. Simplify the logic of getting beIdToMaxConcurrentTaskNum
2019-04-28 10:33:50 +08:00
aa7f4c82da modify the replay logic of routine load job (#762) 2019-04-28 10:33:50 +08:00
8f781f95c7 Add persist operations for routine load job (#754) 2019-04-28 10:33:50 +08:00
e1fb02d4c0 Add routine load job cleaner (#742)
1. stopped and cancelled jobs are cleaned after the clean interval has passed
2. a job is cleaned when: current timestamp - end timestamp > clean interval (seconds) * 1000
3. if a job cannot fetch topic metadata when it needs to be scheduled, the job is cancelled
4. fix the deadlock between job and txn: the txn lock must be acquired before the job lock
5. the job is paused or cancelled depending on the abort reason of the txn
6. the job is cancelled immediately if the abort reason is "offsets out of range"
2019-04-28 10:33:50 +08:00
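The cleanup rule in the first two items of this commit can be sketched like this. A hedged Python sketch: should_clean, the state names, and the interval value are illustrative assumptions, not the actual Doris config or API.

```python
# Hypothetical default: clean finished jobs after one day; the real
# config name and default value may differ.
CLEAN_INTERVAL_SECOND = 24 * 3600

def should_clean(job_state, end_timestamp_ms, now_ms,
                 clean_interval_second=CLEAN_INTERVAL_SECOND):
    # Only jobs in a final state (stopped or cancelled) are eligible.
    if job_state not in ("STOPPED", "CANCELLED"):
        return False
    # Cleaned once: current timestamp - end timestamp > interval (s) * 1000.
    return now_ms - end_timestamp_ms > clean_interval_second * 1000
```

So a cancelled job lingers for the full interval after its end timestamp, which leaves its final state visible for inspection before removal.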
8b52787114 Stream load with no data will abort txn (#735)
1. the stream load executor aborts the txn when the task contains no correct data
2. change the txn label to DebugUtil.print(UUID), which is the same as the task id printed by the BE
3. print UUIDs as hi-lo
2019-04-28 10:33:50 +08:00
062f827b60 Add attachment in rollback txn (#725)
1. init the cmt offset in the stream load context
2. init the default max error num to 5000 rows per 10000 rows
3. add a log builder for routine load jobs and tasks
4. clone the plan fragment params for every task
5. the BE does not throw "too many filtered rows" while the initial max error ratio is 1
2019-04-28 10:33:50 +08:00
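The max-error behavior in this commit (a default of 5000 error rows per 10000 rows, and no error at all when the ratio is 1) can be sketched as a simple ratio check. This is an illustrative sketch; too_many_filtered and its parameter names are assumptions, not the BE's actual interface.

```python
def too_many_filtered(filtered_rows, total_rows,
                      max_error_num=5000, window=10000):
    # The job tolerates up to max_error_num filtered rows per window of
    # rows, i.e. a filtered-row ratio of max_error_num / window.
    if total_rows == 0:
        return False
    return filtered_rows / total_rows > max_error_num / window
```

With max_error_num equal to the window size the ratio is 1, so the check can never fire, matching item 5 above.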
192c8c5820 Fix bug that data consumer should be removed from pool when being got (#723) 2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
fbbe0d19ba Change the relationship between txn and task (#703)
1. Check if properties is null before checking routine load properties
2. Change the transactionStateChange reason to a string
3. Calculate the current num by beId
4. Add Kafka offset properties
5. Prefer to use the previous be id
6. Add a before-commit listener to the txn: if the txn is committed after the task is aborted, the commit is aborted
7. queryId of the stream load plan = taskId
2019-04-28 10:33:50 +08:00
a2c3fb1507 Fix missing auth code (#699) 2019-04-28 10:33:50 +08:00
10cee6ecff Add missing files (#696) 2019-04-28 10:33:50 +08:00
567d5de2de Add a data consumer pool to reuse the data consumer (#691) 2019-04-28 10:33:50 +08:00
2314a3ecd4 Put begin txn into task scheduler (#687)
1. fix the nested locking of db and txn
2. the txn of a task is initialized in the task scheduler before the task is taken from the queue
2019-04-28 10:33:50 +08:00
20b2b2c37f Modify interface (#684)
1. Add batch submit interface
2. Add Kafka Event callback to catch Kafka events
2019-04-28 10:33:50 +08:00
152606fbd6 Submit routine load task immediately (#682)
1. Use submit_routine_load_task instead of agentTaskQueue
2. Remove thrift dependency in StreamLoadPlanner and StreamLoadScanNode
2019-04-28 10:33:50 +08:00
9618d20a72 Add unit test (#675) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
da308da17c Fix bug that empty stream load return unexpected error msg (#1052) 2019-04-28 09:36:19 +08:00
88bd289caa Modify thirdparties (#1044)
1. Update snappy from 1.1.4 to 1.1.7
2. Disable SSL in librdkafka
2019-04-26 17:03:55 +08:00
642b2a3604 Add a java sample for stream load (#1039) 2019-04-26 13:30:50 +08:00
e8397d17f3 Change length of CHAR type when upon schema change (#1035)
CHAR is a fixed-length string type. When modifying the length
of a CHAR column, the size of the slice should equal the new length.
2019-04-25 21:28:32 +08:00
101106d83e Add left/right string function (#1032) 2019-04-25 17:29:34 +08:00
8031ab773b Support add hll column in Schemachange (#1033) 2019-04-25 16:09:19 +08:00
3cc78fd952 Add trim in hdfs broker name node string (#1028) 2019-04-25 13:48:20 +08:00
566243c4e3 Fix unknown node id bug when stream load after decommission BE (#993) 2019-04-24 20:58:13 +08:00
95a06dcd2a Change the buffer length for FloatToBuffer() method (#1019) 2019-04-24 20:11:06 +08:00
f852aa1cff Support enable_insert_strict (#1013)
Add a variable enable_insert_strict, whose default value is false. When
this value is set to true, an insert will fail if there is any filtered
data. When it is false, the insert ignores filtered data and succeeds.
2019-04-24 17:06:53 +08:00
7933cc2209 Make add_pending_segment_group() idempotent (#1001) 2019-04-23 17:33:45 +08:00