Commit Graph

21 Commits

Author SHA1 Message Date
aea719627d Revert "[enhencement](streamload) add on_close callback for httpserver (#20826)" (#20927)
This reverts commit 5b6761acb86852a93351b7b971eb2049fb567aaf.
2023-06-17 10:39:02 +08:00
5b6761acb8 [enhencement](streamload) add on_close callback for httpserver (#20826)
Sometimes connection cannot be released properly during on_free. We need
on_close callback as the last resort.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-15 13:44:02 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
b51ce415e7 [Feature](load) Add submitter and comments to load job (#16878)
* [Feature](load) Add submitter and comments to load job
2023-02-28 09:06:19 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
a6bf8c13eb [Feature](Transaction) Support two phase commit (2PC) for stream load (#7473)
The two phase batch commit means:
During Stream load, after data is written, the message will be returned to the client,
the data is invisible at this point and the transaction status is PRECOMMITTED.
The data will be visible only after COMMIT is triggered by client.
    
1. User can invoke the following interface to trigger commit operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc

    
2.User can invoke the following interface to trigger abort operations for transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc
2022-02-16 11:55:04 +08:00
c0e59e59aa [fix][refactor] fix bugs and refactor some code by lint (#7871)
1. Fix some `passedByValue` issues.
2. Fix some `dereferenceBeforeCheck` issues.
3. Fix some `uninitMemberVar` issues.
4. Fix some iterator `eraseDereference` issues.
5. Fix compile issue introduced from #7923 #7905 #7848
2022-02-01 14:31:14 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
f4ebac0210 [BUG] BE core when FE get_stream_load_record (#5913) 2021-05-27 22:06:26 +08:00
a4f8194111 [Audit][Stream Load] Support audit function for stream load (#5452)
Record finished stream load job (both successful job and failed job) into audit log
so that we can see when the stream load job was executed and check the details of stream load jobs.
2021-04-21 16:36:12 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
01c1de1870 [Load] Add more metric to trace the time cost in stream load and make brpc_num_threads configurable (#3703) 2020-06-04 13:37:28 +08:00
3c539aac54 [Refactor] Some tiny refactor on streaming-load related code (#2891)
Mainly contains the following modifications:
1. Use `std::unique_ptr` to replace some naked pointers
2. Modify some methods from member-method to local-static-function
3. Modify some methods do not need to be public to private
4. Some formatting changes: such as wrapping lines that are too long
5. Remove some useless variables
6. Add or modify some comments for easier understanding

No functional changes in this patch.
2020-02-13 10:42:52 +08:00
044489b92f Optimize some kinds of load jobs (#1762)
1. Support specifying label to Insert Into stmt.

    INSERT INTO tbl1 WITH LABEL label1 ...;

2. Return job' state corresponding to the existing label in result of stream load.

    ...
    "Status": "Label Already Exists",
    "ExistingJobStatus": "FINISHED"
    ...

3. Return the recent 2000 transactions in SHOW PROC '/transactions'
2019-09-09 22:11:12 +08:00
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load is now using stream load framework. But we should keep the
mini load return behavior and result json format be same as old.
So PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters for OlapTableSink profile:
SerializeBatchTime: time of serializing all row batch.
WaitInFlightPacketTime: time of waiting last send packet
2019-07-16 19:15:41 +08:00
b7b66527ce Fix some load bugs (#961)
1. Use load job's timeout as its txn timeout
2. Add a new session variable 'forward_to_master' for SHOW PROC and ADMIN stmt
2019-04-28 10:33:50 +08:00
400d8a906f Optimize the consumer assignment of Kafka routine load job (#870)
1. Use a data consumer group to share a single stream load pipe with multi data consumers. This will increase the consuming speed of Kafka messages, as well as reducing the task number of routine
load job. 

Test results:

* 1 consumer, 1 partitions:
    consume time: 4.469s, rows: 990140, bytes: 128737139.  221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
    consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
    blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
    consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
    blocking get time(us): 1041639, blocking put time(us): 10356581

The next 2 cases show that we can achieve higher speed by adding more consumers. But the bottle neck transfers from Kafka consumer to Doris ingestion, so 3 consumers in a group is enough.

I also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group, and default value is 3.

In my test(1 Backend, 2 tablets, 1 replicas), 1 routine load task can achieve 10M/s, which is same as raw stream load.

2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
2019-04-28 10:33:50 +08:00
9d08be3c5f Add metrics for routine load (#795)
* Add metrics for routine load
* limit the max number of routine load task in backend to 10
* Fix bug that some partitions will no be assigned
2019-04-28 10:33:50 +08:00
567d5de2de Add a data consumer pool to reuse the data consumer (#691) 2019-04-28 10:33:50 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00