11 Commits

Author SHA1 Message Date
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group so that multiple data consumers share a single stream load pipe. This increases the consuming speed of Kafka messages and reduces the number of tasks in a routine load job.

Test results:

* 1 consumer, 1 partition:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

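The last 2 cases show that we can achieve higher speed by adding more consumers. With 1 consumer the blocking get time is the larger figure (about 12.3s), while with 3 consumers the blocking put time is the larger one (about 10.4s). The bottleneck therefore shifts from the Kafka consumer to Doris ingestion, so 3 consumers in a group are enough.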
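The rates quoted in the test results can be reproduced from the raw time, row, and byte figures. The short sketch below does that arithmetic. It assumes that "M" means 10^6 bytes and that the published values were truncated rather than rounded; both assumptions are inferred from the numbers, not stated in the commit.

```python
# Figures copied from the test results in the commit message.
RUNS = [
    # (label, seconds, rows, bytes)
    ("1 consumer, 1 partition", 4.469, 990140, 128737139),
    ("1 consumer, 3 partitions", 12.765, 2000143, 258631271),
    ("3 consumers, 3 partitions", 6.095, 2000503, 258631576),
]


def throughput(seconds, rows, nbytes):
    """Return (rows per second, megabytes per second), both truncated."""
    # Assumption: "M" means 10**6 bytes and values are truncated, not rounded.
    return int(rows / seconds), int(nbytes / seconds / 1_000_000)


RESULTS = {label: throughput(s, r, b) for label, s, r, b in RUNS}

if __name__ == "__main__":
    for label, (rows_s, mb_s) in RESULTS.items():
        print(f"{label}: {rows_s} rows/s, {mb_s}M/s")
```

On 3 partitions, going from 1 consumer to 3 raises the row rate from 156689 to 328220 rows/s, roughly a 2.1x improvement.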

I also added a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group; the default value is 3.

In my test (1 Backend, 2 tablets, 1 replica), 1 routine load task can achieve 10M/s, which is the same as raw stream load.
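The idea of a data consumer group, several consumers feeding one shared pipe, can be illustrated with a small Python model. This is a conceptual sketch only, not the Backend's actual implementation: the names `assign_partitions` and `run_group` are hypothetical, in-memory lists stand in for Kafka partitions, a bounded `queue.Queue` stands in for the stream load pipe, and the round-robin assignment is an assumed policy. Mapping the "blocking put" and "blocking get" metrics to the two queue calls is likewise an assumption.

```python
import queue
import threading

MAX_CONSUMER_NUM_PER_GROUP = 3  # mirrors the default named in the commit message
_DONE = object()  # sentinel a consumer puts on the pipe when it has finished


def assign_partitions(partitions, max_consumers=MAX_CONSUMER_NUM_PER_GROUP):
    """Spread partitions round-robin over at most max_consumers consumers."""
    n = min(max_consumers, len(partitions))
    groups = [[] for _ in range(n)]
    for i, p in enumerate(partitions):
        groups[i % n].append(p)
    return groups


def run_group(partition_data, max_consumers=MAX_CONSUMER_NUM_PER_GROUP, pipe_size=16):
    """partition_data maps a partition id to a list of messages (stand-in for Kafka)."""
    pipe = queue.Queue(maxsize=pipe_size)  # the single shared pipe
    groups = assign_partitions(sorted(partition_data), max_consumers)

    def consume(parts):
        for p in parts:
            for msg in partition_data[p]:
                pipe.put(msg)  # blocks while the pipe is full (reader is slower)
        pipe.put(_DONE)

    threads = [threading.Thread(target=consume, args=(g,)) for g in groups]
    for t in threads:
        t.start()

    loaded, finished = [], 0
    while finished < len(threads):
        item = pipe.get()  # blocks while the pipe is empty (consumers are slower)
        if item is _DONE:
            finished += 1
        else:
            loaded.append(item)
    for t in threads:
        t.join()
    return loaded
```

Calling `run_group({0: ["a0", "a1"], 1: ["b0"], 2: ["c0"]})` returns all four messages; the order across partitions is not deterministic because the consumers run concurrently.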

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
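As an illustration of the OFFSET_BEGINNING and OFFSET_END support mentioned in commit 400d8a906f: librdkafka represents these logical offsets with the sentinel values -2 (`RD_KAFKA_OFFSET_BEGINNING`) and -1 (`RD_KAFKA_OFFSET_END`). The sketch below shows how a user-supplied offset string might be resolved under that convention. The helper `parse_offset` is hypothetical, and whether the routine load code resolves offsets this way is an assumption.

```python
# Sentinel values used by librdkafka for logical offsets
# (RD_KAFKA_OFFSET_BEGINNING and RD_KAFKA_OFFSET_END).
OFFSET_BEGINNING = -2
OFFSET_END = -1

_SYMBOLIC = {"OFFSET_BEGINNING": OFFSET_BEGINNING, "OFFSET_END": OFFSET_END}


def parse_offset(text):
    """Hypothetical helper: turn a user-supplied offset string into an integer."""
    token = text.strip().upper()
    if token in _SYMBOLIC:
        return _SYMBOLIC[token]
    value = int(token)  # raises ValueError for anything that is not a number
    if value < 0:
        raise ValueError(f"offset must be non-negative: {text!r}")
    return value
```

Rejecting negative numbers keeps a literal offset from colliding with the two sentinel values.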
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* Limit the max number of routine load tasks in a Backend to 10
* Fix a bug where some partitions would not be assigned
2019-04-28 10:33:50 +08:00
8d2de42b36 Fix some routine load bugs (#787)
1. Preserve the column order in load stmt.
2. Fix some replay bugs of routine load task.
2019-04-28 10:33:50 +08:00
9fa5e1b768 Add a cleaner bg thread to clean idle data consumer (#776) 2019-04-28 10:33:50 +08:00
8f781f95c7 Add persist operations for routine load job (#754) 2019-04-28 10:33:50 +08:00
8b52787114 Stream load with no data will abort txn (#735)
1. Stream load executor will abort the txn when there is no correct data in the task
2. Change txn label to DebugUtil.print(UUID), which is the same as the task ID printed by BE
3. Change the printed UUID format to hi-lo
2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
567d5de2de Add a data consumer pool to reuse the data consumer (#691) 2019-04-28 10:33:50 +08:00
20b2b2c37f Modify interface (#684)
1. Add batch submit interface
2. Add Kafka Event callback to catch Kafka events
2019-04-28 10:33:50 +08:00
9618d20a72 Add unit test (#675) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00