Commit Graph

48 Commits

Author SHA1 Message Date
a6bf8c13eb [Feature](Transaction) Support two phase commit (2PC) for stream load (#7473)
The two phase batch commit means:
During stream load, after the data is written, a message is returned to the client;
at this point the data is invisible and the transaction status is PRECOMMITTED.
The data becomes visible only after the client triggers COMMIT.

1. The user can invoke the following interface to trigger a commit operation for the transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc

2. The user can invoke the following interface to trigger an abort operation for the transaction:

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://fe_host:http_port/api/{db}/_stream_load_2pc

or

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \
http://be_host:webserver_port/api/{db}/_stream_load_2pc
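
For context, a two phase stream load is initiated by adding the `two_phase_commit` header to an ordinary stream load request. A minimal sketch (host, port, file, and table names are placeholders); the `TxnId` in the load response is the value passed as `txn_id` above:

curl --location-trusted -u user:passwd -H "two_phase_commit:true" \
-T data.csv http://fe_host:http_port/api/{db}/{table}/_stream_load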
2022-02-16 11:55:04 +08:00
8d7a0d9747 [docs](routine-load) Update routine-load-manual.md (#8006) 2022-02-14 09:28:08 +08:00
83f6eef506 [improvement](routine-load) Make routine load work with old Kafka versions (#7630)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-01-10 17:30:24 +08:00
43ed54faa1 [docs] The name of the hidden column is incorrect in batch-delete-manual.md (#7465) (#7466) 2021-12-24 21:30:57 +08:00
06c38ce46e [enhancement] Allow concurrent_number for a routine load task to be larger than the number of BEs (#7386)

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2021-12-17 11:04:29 +08:00
2b90967c4c [fix][refactor](broker load) refactor the scheduling logic of broker load (#7371)
1. Refactor the scheduling logic of broker load. Details in #7367.
2. Fix a bug where loadedBytes in the SHOW LOAD result was wrong.
3. Cancel the LoadTimeoutChecker thread.
   Now PENDING load jobs have no timeout; a load job's timeout starts
   when its pending load task is scheduled.
4. Fix a bug where the loading task was never submitted to the pool.
   The logic of BlockedPolicy was wrong: we must make sure the task is submitted to the pool,
   or a RejectedExecutionException is thrown.
5. The transaction of a load job now begins in the pending task, instead of when the job is submitted.
2021-12-16 10:39:22 +08:00
e9282205f1 [feat-opt](spark-load) support bitmap binary data from hive in spark load (#6883)
Support loading the binary data of bitmap values from Hive into Doris.
fix #6461
2021-11-20 21:38:38 +08:00
e8cabfff27 [S3] Support path style endpoint (#6962)
Add a use_path_style property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
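
A sketch of how the property might appear in a LOAD statement (the WITH S3 clause and property names follow the Doris S3 load docs; endpoint, credentials, and table names are placeholders):

```
LOAD LABEL example_db.label_s3
(
    DATA INFILE("s3://bucket/path/to/file.csv")
    INTO TABLE tbl1
)
WITH S3
(
    "AWS_ENDPOINT" = "http://s3.example.com",
    "AWS_ACCESS_KEY" = "your_ak",
    "AWS_SECRET_KEY" = "your_sk",
    "AWS_REGION" = "us-east-1",
    "use_path_style" = "true"
);
```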
2021-11-01 10:48:10 +08:00
4170aabf83 [Optimize] optimize some session variable and profile (#6920)
1. Optimize the error message when using batch delete
2. Rename session variable is_report_success to enable_profile
3. Add table name to OlapScanner profile
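
For example, after the rename the profile is enabled per session with (a minimal sketch using standard session-variable syntax):

```
SET enable_profile = true;
```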
2021-10-27 18:03:12 +08:00
090d99b690 [Docs] fix urls and format in routine load docs (#6896) 2021-10-23 16:52:33 +08:00
7b50409ada [Bug][Binlog] Fix an issue where the number of versions may exceed the limit during data synchronization (#6889)
Bug detail: #6887 

To solve this problem, and to avoid committing too frequently, the transaction is committed only when it meets one of the following conditions:

1. The accumulated number of events is greater than `min_sync_commit_size`.
2. The accumulated data size is greater than `min_bytes_sync_commit`.

In addition, when the accumulated data size exceeds `max_bytes_sync_commit`, the transaction is committed immediately.

Before:
![a5e0a2ba01ec4935144253fe0a364af7](https://user-images.githubusercontent.com/22125576/137933545-77018e89-fa2e-4d45-ae5d-84638cc0506a.png)

After:
![4577ec53afa47452c847bd01fa7db56c](https://user-images.githubusercontent.com/22125576/137933592-146bef90-1346-47e4-996e-4f30a25d73bc.png)
2021-10-23 16:47:32 +08:00
bd25d1a828 [Doc] Add documents for MySQL Binlog Load (#6859)
* add zh-CN docs

* add en docs and image

2021-10-19 10:25:42 +08:00
f3d4c475b1 [DOC] Add connection reset exception solution (#6733)
Add solution for connection reset exception when doing stream load.
2021-09-25 12:27:35 +08:00
e01a845a4a [Doc] Update stream-load-manual.md (#6524)
The original stream load column order transformation doc was unclear; a user struggled with this part for a long time, so I modified some expressions to make it clearer.
2021-09-01 13:28:25 +08:00
42fedc0a56 [Docs] Support json file format in routine load doc (#6439) 2021-08-14 10:25:06 +08:00
07ad038870 [Feature][RoutineLoad] Support for consuming kafka from the point of time (#5832)
When creating a Kafka routine load, support starting consumption from a specified point in time instead of a specific offset.
e.g.:
```
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "property.kafka_default_offsets" = "2021-10-10 11:00:00"
);
```

or

```
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2",
    "kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00"
);
```

This PR also refactors the property-parsing logic used when creating or altering
routine load jobs, unifying it in the `RoutineLoadDataSourceProperties` class.
2021-05-22 23:37:53 +08:00
add8c4bb74 [Load] Support reading multi-line json objects for JsonScanner (#5774)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
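
A sketch of how multi-line (one object per line) json might be loaded via stream load; the `read_json_by_line` header name is an assumption here, and hosts, table, and file are placeholders:

curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" \
-T data.json http://fe_host:http_port/api/{db}/{table}/_stream_load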
2021-05-18 15:44:45 +08:00
de87f4ae84 [Feature] Add list partition support (#5529)
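A minimal sketch of the new syntax (table, column, and partition names are illustrative, following the Doris list-partition docs):

```
CREATE TABLE example_db.tbl_list
(
    user_id BIGINT,
    city VARCHAR(32)
)
DUPLICATE KEY(user_id, city)
PARTITION BY LIST (city)
(
    PARTITION p_bj VALUES IN ("beijing"),
    PARTITION p_sh VALUES IN ("shanghai", "hangzhou")
)
DISTRIBUTED BY HASH(user_id) BUCKETS 8;
```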
2021-04-24 17:42:27 +08:00
86af8c76a3 [DOC] Add docs of load and export using S3 protocol (#5551) 2021-03-27 18:58:29 +08:00
64fa305c06 [Doc] correct format errors in English doc (#5487)
Fix some format errors in the English docs.
They are very straightforward and should not break any existing build.
2021-03-11 22:34:54 +08:00
8855782aab [Doc] Fix page links (#5454) 2021-03-06 16:13:56 +08:00
e93a6da0e5 [Doc] correct format errors in English doc (#5321)
Fix some English doc format errors
2021-02-26 11:32:14 +08:00
780900ac9c [Feature] Support preceding filter original data when loading (#5338)
Support conditional filtering of the original data in broker load and routine load,
e.g.:

```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
	ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
2021-02-07 22:37:48 +08:00
62604dfeac Improve the processing logic of Load statement derived columns (#5140)
* Support transitive derivation in load expressions
2020-12-30 10:27:46 +08:00
b640991e43 [Enhance] Add profile for load job (#5052)
Add a viewable profile for broker load. Similar to the query profile,
the user can submit the import job with the session variable is_report_success set to true,
and then view the running profile of the job on the FE web page for easy analysis and debugging.
2020-12-16 23:52:10 +08:00
bc063ebce2 fix typo in docs (#5046) 2020-12-10 15:10:22 +08:00
b954dfd82d [Bug] Fix the bug that LargeInt and Decimal json load failed. (#4983)
Use the json load parameter "num_as_string", which enables the kParseNumbersAsStringsFlag flag to parse json data.
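
A sketch of the parameter in a stream load request (hosts, table, and file are placeholders):

curl --location-trusted -u user:passwd -H "format: json" -H "num_as_string: true" \
-T data.json http://fe_host:http_port/api/{db}/{table}/_stream_load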
2020-12-06 08:49:30 +08:00
64b219f04d Fix typo (#4923) 2020-11-20 09:48:27 +08:00
dd70653c91 [DOCS] Fix some docs typo (#4873) 2020-11-11 21:24:19 +08:00
32afb11458 [Doc] Add doc for sequence column (#4814) 2020-10-30 10:05:15 +08:00
0199055be7 [Document] Fix some errors in the insert document (#4749) 2020-10-17 13:40:40 +08:00
751aa05cc0 fix docs typo (#4725) 2020-10-14 09:27:50 +08:00
dec91a3d43 fix docs typo (#4723) 2020-10-14 09:27:31 +08:00
3f55c1425c fix docs typo (#4722) 2020-10-14 09:27:12 +08:00
2f0d725a25 [Batch Delete] Add a session variable to show or hide hidden columns (#4579)
Sometimes we need to show hidden columns for debugging,
so we add a session variable to show or hide hidden columns.
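
A sketch of toggling it (assuming the variable is named `show_hidden_columns`):

```
SET show_hidden_columns = true;
```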
2020-09-13 19:14:31 +08:00
81784d6471 Revert "Add a session variable to show or hide hidden columns (#4510)" (#4576)
This reverts commit fe0260e54f8dfa37260423cffcf42096de19ed1f.
2020-09-10 15:18:36 +08:00
fe0260e54f Add a session variable to show or hide hidden columns (#4510)
* add session variable to show hidden columns
2020-09-10 13:07:43 +08:00
f207036cad [Spark load][Document] Add docs about spark and yarn client for spark load (#4489) 2020-09-02 10:52:49 +08:00
174c9f89ea [DOCS] Add batch delete docs (#4435)
update documents for batch delete #4051
2020-08-28 09:24:07 +08:00
1410d4e623 [Doc] Add in predicate support content in delete-manual.md (#4404)
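A sketch of the documented usage (table, partition, and values are illustrative):

```
DELETE FROM example_tbl PARTITION p1
WHERE k1 IN ("a", "b", "c");
```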
2020-08-24 21:52:28 +08:00
05fa55047e [Doc][Json Load] Improve json data format load documents (#4337)
Add detailed explanations of the JsonPath and Columns parameters.
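
A sketch of the two parameters in a stream load request (paths, columns, and hosts are illustrative):

curl --location-trusted -u user:passwd -H "format: json" \
-H "jsonpaths: [\"$.id\", \"$.city\"]" -H "columns: id, city" \
-T data.json http://fe_host:http_port/api/{db}/{table}/_stream_load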
2020-08-13 23:39:57 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
3f31866169 [Bug][Load][Json] #4124 Load json format with stream load failed (#4217)
Stream load should read all the data completely before parsing the json.
Also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading json data.

Fix the bug of loading an empty json array [].

Add docs to explain certain cases of loading json format data.

Fix: #4124
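
For reference, the new limit would be set in be.conf; the value below is illustrative, not necessarily the default:

```
streaming_load_max_batch_read_mb = 100
```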
2020-08-04 12:55:53 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported json formats to two, making the import behavior more consistent.
2. Modified how the number of error rows is counted when loading json format data, so that error rows are counted correctly.
3. See `load-json-format.md` for details of loading json format data.
2020-07-07 23:07:28 +08:00
de91037d8c [Doc] Add some routine load docs (#3796)
Add some documentation about using routine load in the cloud environment
2020-06-10 22:57:00 +08:00
01c1de1870 [Load] Add more metrics to trace the time cost in stream load and make brpc_num_threads configurable (#3703) 2020-06-04 13:37:28 +08:00
dbfe8a067f [Doc] Add docs of max_running_txn_num_per_db (#3657)
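For reference, this FE config caps the number of concurrent load transactions per database; a sketch of setting it in fe.conf (the value is illustrative):

```
max_running_txn_num_per_db = 1000
```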
2020-05-22 10:22:11 +08:00
432965e360 [Enhancement] documents rebuild with Vuepress (#3408) (#3414) 2020-04-29 09:14:31 +08:00