Commit Graph

28 Commits

c3b5046940 Fix bug of invalid stream load task rollback (#1999)
If a stream load is committed with the result PUBLISH_TIMEOUT, the transaction should not be rolled back; only this message should be returned to the user.
2019-10-17 21:08:29 +08:00
f852f50acb Improve unique id performance (#1911)
Remove the default constructor of UniqueId.
Add a gen_uid method to UniqueId; callers that need a new uid should call this API explicitly.
Reuse the boost random generator instead of creating a new one every time.
2019-09-29 18:20:02 +08:00
2f0808137a Refactor FrontendHelper (#1888) 2019-09-27 13:21:14 +08:00
235cdb0ecd Commit kafka offset (#1734)
Commit kafka offset in routine load

Kafka decides whether data can be deleted based on whether all consumer groups have committed their offsets. If offsets are never committed, the Kafka server's disk may fill up.
2019-09-10 14:27:06 +08:00
044489b92f Optimize some kinds of load jobs (#1762)
1. Support specifying a label in the INSERT INTO statement:

    INSERT INTO tbl1 WITH LABEL label1 ...;

2. Return the state of the job corresponding to the existing label in the stream load result:

    ...
    "Status": "Label Already Exists",
    "ExistingJobStatus": "FINISHED"
    ...

3. Return the most recent 2000 transactions in SHOW PROC '/transactions' (see the sketch below).
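
As a rough illustration of how items 1 and 3 fit together (the source table name is hypothetical), a load submitted with an explicit label can later be found among the recent transactions:

    -- hypothetical source table; label1 as in the example above
    INSERT INTO tbl1 WITH LABEL label1 SELECT * FROM tbl_src;
    -- the recent 2000 transactions, including the one created above,
    -- can then be inspected with:
    SHOW PROC '/transactions';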
2019-09-09 22:11:12 +08:00
00f8040bf3 Fix bug that 2 identical stream load jobs may both execute successfully (#1690)
This bug would cause the 2 jobs to write to the same file, damaging it.
2019-08-22 19:38:16 +08:00
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load now uses the stream load framework, but its return behavior and result JSON format should stay the same as before. So the PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
2019-07-16 19:15:41 +08:00
6c246418fb Add timeout in stream load planner (#1480)
The mini load timeout needs to be added to the plan options. The timeout property has already been added to the process-put request; without it in the plan options, the mini load timeout has no effect.

Also log the label, txn id, and query id in mini load.
2019-07-15 22:14:59 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
7550b2f09b Convert mini load to streaming mini load (#1323)
* This commit introduces streaming mini load
The operation of streaming mini load is the same as before, and users can still check the load through the frontend.
The difference is that streaming mini load finishes the task before replying to the REST API, while the non-streaming version only registers a load.

* When upgrading Doris
Upgrading either the FE or the BE first is supported. After both FE and BE are upgraded, streaming mini load takes effect.

* For multi mini load
Multi mini load still uses the non-streaming mini load; its behavior has not been changed.

* Add an interface named isSupportedFunction
This function is used to protect the correctness of new features that span BE and FE during upgrading.
2019-06-21 19:34:50 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
ff0dd0d2da Support SSL authentication with Kafka in routine load job (#1235) 2019-06-07 16:29:01 +08:00
9d19c6c315 Support arbitrary kafka properties (#1204) 2019-05-28 10:03:50 +08:00
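The two commits above (#1235 and #1204) let a routine load job pass arbitrary client properties through to Kafka, which also covers the settings needed for SSL authentication. A minimal sketch of a CREATE ROUTINE LOAD statement using the 'property.' prefix (database, table, topic, broker, and file names are hypothetical):

    CREATE ROUTINE LOAD example_db.example_job ON example_tbl
    FROM KAFKA
    (
        "kafka_broker_list" = "broker1:9093",
        "kafka_topic" = "example_topic",
        -- arbitrary properties are forwarded to the Kafka client
        "property.client.id" = "doris_example_client",
        -- SSL authentication
        "property.security.protocol" = "ssl",
        "property.ssl.ca.location" = "FILE:ca.pem",
        "property.ssl.certificate.location" = "FILE:client.pem",
        "property.ssl.key.location" = "FILE:client.key",
        "property.ssl.key.password" = "example_password"
    );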
a08170fd50 Enhance the usabilities (#1100)
* Enhance the usabilities

1. Add metrics to monitor transactions and the streaming load process in the BE.
2. Modify the BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify the FE config 'enable_metric_calculator' to true.
4. Add more logs for tracing the broker load process.
5. Modify the query report process to cancel a query immediately if any instance fails.

* Fix bugs
1. Avoid a NullPointerException when colocation join is enabled with broker load.
2. Return immediately when pull load task coordinator execution fails.
2019-05-07 15:55:04 +08:00
3409ed41ac Reset commit offset if task aborted due to runtime error (#994) 2019-04-28 10:33:50 +08:00
b7b66527ce Fix some load bugs (#961)
1. Use the load job's timeout as its txn timeout.
2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN statements, as sketched below.
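For illustration only (a sketch, not taken from the commit), the variable can be enabled per session so that subsequent SHOW PROC statements are forwarded to the master FE:

    SET forward_to_master = true;
    -- subsequent SHOW PROC / ADMIN statements are forwarded to the master FE
    SHOW PROC '/transactions';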
2019-04-28 10:33:50 +08:00
2b4d02b2fa Add error load log url for routine load job (#938) 2019-04-28 10:33:50 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group so that multiple data consumers share a single stream load pipe. This increases the consuming speed of Kafka messages and reduces the number of tasks per routine load job.

Test results:

* 1 consumer, 1 partition:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The last 2 cases show that we can achieve higher speed by adding more consumers, but the bottleneck then shifts from the Kafka consumer to Doris ingestion, so 3 consumers per group is enough.

Also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group; the default value is 3.

In my test (1 Backend, 2 tablets, 1 replica), 1 routine load task can reach 10M/s, which is the same as raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
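
For point 2, a sketch of how these special offsets could be specified when creating a Kafka routine load job (database, table, topic, broker, and partition ids are hypothetical):

    CREATE ROUTINE LOAD example_db.example_job ON example_tbl
    FROM KAFKA
    (
        "kafka_broker_list" = "broker1:9092",
        "kafka_topic" = "example_topic",
        "kafka_partitions" = "0,1,2",
        -- partition 0 starts from the earliest offset, partitions 1 and 2 from the latest
        "kafka_offsets" = "OFFSET_BEGINNING,OFFSET_END,OFFSET_END"
    );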
2019-04-28 10:33:50 +08:00
e8b360d193 Merge master and fix BE ut 2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* Limit the max number of routine load tasks on a backend to 10
* Fix bug that some partitions would not be assigned
2019-04-28 10:33:50 +08:00
8f781f95c7 Add persist operations for routine load job (#754) 2019-04-28 10:33:50 +08:00
8b52787114 Stream load with no data will abort txn (#735)
1. The stream load executor will abort the txn when there is no correct data in the task.
2. Change the txn label to DebugUtil.print(UUID), which is the same as the task id printed by the BE.
3. Change uuid printing to hi-lo format.
2019-04-28 10:33:50 +08:00
062f827b60 Add attachment in rollback txn (#725)
1. Init the commit offset in the stream load context.
2. Init the default max error num to 5000 rows per 10000 rows.
3. Add a log builder for routine load jobs and tasks.
4. Clone the plan fragment params for every task.
5. The BE does not report 'too many filtered rows' while the initial max error ratio is 1.
2019-04-28 10:33:50 +08:00
8474061d63 Add some logs (#711) 2019-04-28 10:33:50 +08:00
fbbe0d19ba Change the relationship between txn and task (#703)
1. Check whether properties is null before checking routine load properties.
2. Change the transactionStateChange reason to a string.
3. Calculate the current num by beId.
4. Add Kafka offset properties.
5. Prefer to use the previous BE id.
6. Add a before-commit listener on the txn: if the txn is committed after the task has been aborted, the commit will be aborted.
7. The queryId of the stream load plan equals the taskId.
2019-04-28 10:33:50 +08:00
567d5de2de Add a data consumer pool to reuse the data consumer (#691) 2019-04-28 10:33:50 +08:00
20b2b2c37f Modify interface (#684)
1. Add a batch submit interface
2. Add a Kafka event callback to catch Kafka events
2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00