doris

Author	SHA1	Message	Date
weizuo93	a6bf8c13eb	[Feature](Transaction) Support two phase commit (2PC) for stream load (#7473 ) The two phase batch commit means： During Stream load, after data is written, the message will be returned to the client, the data is invisible at this point and the transaction status is PRECOMMITTED. The data will be visible only after COMMIT is triggered by client. 1. User can invoke the following interface to trigger commit operations for transaction： curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \ http://fe_host:http_port/api/{db}/_stream_load_2pc or curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" \ http://be_host:webserver_port/api/{db}/_stream_load_2pc 2.User can invoke the following interface to trigger abort operations for transaction： curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \ http://fe_host:http_port/api/{db}/_stream_load_2pc or curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" \ http://be_host:webserver_port/api/{db}/_stream_load_2pc	2022-02-16 11:55:04 +08:00
sodamnsure	8d7a0d9747	[docs](routine-load)Update routine-load-manual.md (#8006 )	2022-02-14 09:28:08 +08:00
caiconghui	83f6eef506	[improvement](routine-load) Make routine load work with old kafka version (#7630 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2022-01-10 17:30:24 +08:00
Heng Zhao	43ed54faa1	[docs] The name of hidden column is incorrect in batch-delete-manual.md(#7465 ) (#7466 )	2021-12-24 21:30:57 +08:00
caiconghui	06c38ce46e	[enhancement] Make concurrent_number for routine load task can be larger than be num (#7386 ) * [enhancement] Make concurrent_number for routine load task can be larger than be num Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2021-12-17 11:04:29 +08:00
Mingyu Chen	2b90967c4c	[fix][refactor](broker load) refactor the scheduling logic of broker load (#7371 ) 1. Refactor the scheduling logic of broker load. Details see #7367 2. Fix bug that loadedBytes in SHOW LOAD result is wrong. 3. Cancel the thread of LoadTimeoutChecker Now for PENDING load jobs, there will be no timeout. And the timeout of a load job start when pending load task is scheduled. 4. Fix a bug that the loading task is never submitted to the pool. The logic of BlockedPolicy is wrong. We should make sure the task is submitted to the pool, or the RejectedExecutionException should be thrown. 5. Now the transaction of a load job will begin in pending task, instead of when submitting the job.	2021-12-16 10:39:22 +08:00
lihuigang	e9282205f1	[feat-opt](spark-load) support bitmap binary data from hive in spark load (#6883 ) Support to load the binary data of bitmap value from Hive into Doris. fix #6461	2021-11-20 21:38:38 +08:00
Mingyu Chen	e8cabfff27	[S3] Support path style endpoint (#6962 ) Add a use_path_style property for S3 Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property Fix some S3 URI bugs Add some logs for tracing load process.	2021-11-01 10:48:10 +08:00
Zhengguo Yang	4170aabf83	[Optimize] optimize some session variable and profile (#6920 ) 1. optimize error message when using batch delete 2. rename session variable is_report_success to enable_profile 3. add table name to OlapScanner profile	2021-10-27 18:03:12 +08:00
qiye	090d99b690	[Docs] fix urls and format in routine load docs (#6896 ) fix urls and format in routine load docs	2021-10-23 16:52:33 +08:00
xy720	7b50409ada	[Bug][Binlog] Fix the number of versions may exceed the limit during data synchronization (#6889 ) Bug detail: #6887 To solve this problem, the commit of transaction must meet any of the following conditions to avoid commit too freqently: 1. The current accumulated event quantity is greater than the `min_sync_commit_size`. 2. The current accumulated data size is greater than the `min_bytes_sync_commit`. In addition, when the accumulated data size exceeds `max_bytes_sync_commit`, the transaction needs to be committed immediately. Before: ![a5e0a2ba01ec4935144253fe0a364af7](https://user-images.githubusercontent.com/22125576/137933545-77018e89-fa2e-4d45-ae5d-84638cc0506a.png) After: ![4577ec53afa47452c847bd01fa7db56c](https://user-images.githubusercontent.com/22125576/137933592-146bef90-1346-47e4-996e-4f30a25d73bc.png)	2021-10-23 16:47:32 +08:00
xy720	bd25d1a828	[Doc] Add documents for MySQL Binlog Load (#6859 ) * add zh-CN docs * add en docs and image * fix * fix	2021-10-19 10:25:42 +08:00
jiafeng.zhang	f3d4c475b1	[DOC] Add connection reset exception solution (#6733 ) Add solution for connection reset exception when doing stream load.	2021-09-25 12:27:35 +08:00
zhoubintao	e01a845a4a	[Doc] Update stream-load-manual.md (#6524 ) Origin stream load column order transformation is unclear , a user is struggling for a long time in this part ,so i modified some expressions to make it clearer.	2021-09-01 13:28:25 +08:00
EmmyMiao87	42fedc0a56	[Docs] Support json file format in routine load doc (#6439 )	2021-08-14 10:25:06 +08:00
Mingyu Chen	07ad038870	[Feature][RoutineLoad] Support for consuming kafka from the point of time (#5832 ) Support when creating a kafka routine load, start consumption from a specified point in time instead of a specific offset. eg: ``` FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092", "kafka_topic" = "my_topic", "property.kafka_default_offsets" = "2021-10-10 11:00:00" ); or FROM KAFKA ( "kafka_broker_list" = "broker1:9092,broker2:9092", "kafka_topic" = "my_topic", "kafka_partitions" = "0,1,2", "kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00" ); ``` This PR also reconstructed the analysis method of properties when creating or altering routine load jobs, and unified the analysis process in the `RoutineLoadDataSourceProperties` class.	2021-05-22 23:37:53 +08:00
caiconghui	add8c4bb74	[Load] Support reading multi-line json objects for JsonScanner (#5774 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-05-18 15:44:45 +08:00
qiye	de87f4ae84	[Feature] Add list partition support (#5529 ) Add list partition support	2021-04-24 17:42:27 +08:00
Zhengguo Yang	86af8c76a3	[DOC] Add docs of load and export using S3 protocol (#5551 ) Add docs of load and export using S3 protocol	2021-03-27 18:58:29 +08:00
Ting Sun	64fa305c06	[Doc] correct format errors in English doc (#5487 ) Some formate errors in English doc. They are very straightforward and should not break any existing build.	2021-03-11 22:34:54 +08:00
qiye	8855782aab	[Doc] Fix page links (#5454 )	2021-03-06 16:13:56 +08:00
Ting Sun	e93a6da0e5	[Doc] correct format errors in English doc (#5321 ) Fix some English doc format errors	2021-02-26 11:32:14 +08:00
Mingyu Chen	780900ac9c	[Feature] Support preceding filter original data when loading (#5338 ) Support conditional filtering of original data in broker load and routine load eg: ``` LOAD LABEL `label1` ( DATA INFILE ('bos://cmy-repo/1.csv') INTO TABLE tbl2 COLUMNS TERMINATED BY '\t' (event_day, product_id, ocpc_stage, user_id) SET ( ocpc_stage = ocpc_stage + 100 ) PRECEDING FILTER user_id = 1381035 WHERE ocpc_stage > 30 ) ... ```	2021-02-07 22:37:48 +08:00
Zhengguo Yang	62604dfeac	Improve the processing logic of Load statement derived columns (#5140 ) * support transitive in load expr	2020-12-30 10:27:46 +08:00
Mingyu Chen	b640991e43	[Enhance] Add profile for load job (#5052 ) Add viewable profile for broker load. Similar to the query profile, the user can submit the import job by setting the session variable is_report_success to true, and then view the running profile of the job on the FE web page for easy analysis and debugging.	2020-12-16 23:52:10 +08:00
Zhengguo Yang	bc063ebce2	fix typo in docs (#5046 )	2020-12-10 15:10:22 +08:00
HappenLee	b954dfd82d	[Bug] Fix the bug of Largetint and Decimal json load failed. (#4983 ) Use param of json load "num_as_string" to use flag kParseNumbersAsStringsFlag to parse json data.	2020-12-06 08:49:30 +08:00
zh0122	64b219f04d	Fix typo (#4923 )	2020-11-20 09:48:27 +08:00
Zhengguo Yang	dd70653c91	[DOCS] Fix some docs typo (#4873 )	2020-11-11 21:24:19 +08:00
Youngwb	32afb11458	[Doc] Add doc for sequence column (#4814 )	2020-10-30 10:05:15 +08:00
xy720	0199055be7	[Document] Fix some errors in the insert document (#4749 )	2020-10-17 13:40:40 +08:00
Zhengguo Yang	751aa05cc0	fix docs typo (#4725 )	2020-10-14 09:27:50 +08:00
Zhengguo Yang	dec91a3d43	fix docs typo (#4723 )	2020-10-14 09:27:31 +08:00
Zhengguo Yang	3f55c1425c	fix docs typo (#4722 )	2020-10-14 09:27:12 +08:00
Zhengguo Yang	2f0d725a25	[Batch Delete] Add a session variable to show or hide hidden columns (#4579 ) Sometimes we need to show hidden columns for debug. So we need to add a session variable to show or hide hidden columns	2020-09-13 19:14:31 +08:00
Zhengguo Yang	81784d6471	Revert "Add a session variable to show or hide hidden columns (#4510 )" (#4576 ) This reverts commit fe0260e54f8dfa37260423cffcf42096de19ed1f.	2020-09-10 15:18:36 +08:00
Zhengguo Yang	fe0260e54f	Add a session variable to show or hide hidden columns (#4510 ) * add session variable to show hidden columns	2020-09-10 13:07:43 +08:00
xy720	f207036cad	[Spark load][Document] Add docs about spark and yarn client for spark load (#4489 ) Add docs about spark and yarn client for spark load	2020-09-02 10:52:49 +08:00
Zhengguo Yang	174c9f89ea	[DOCS] Add batch delete docs (#4435 ) update documents for batch delete #4051	2020-08-28 09:24:07 +08:00
caiconghui	1410d4e623	[Doc] Add in predicate support content in delete-manual.md (#4404 ) Add in predicate support content in delete-manual.md	2020-08-24 21:52:28 +08:00
Mingyu Chen	05fa55047e	[Doc][Json Load] Improve json data format load documents (#4337 ) And some detail explaination of JsonPath and Columns parameter	2020-08-13 23:39:57 +08:00
Mingyu Chen	237c0807a4	[RoutineLoad] Support modify routine load job (#4158 ) Support ALTER ROUTINE LOAD JOB stmt, for example: ``` alter routine load db1.label1 properties ( "desired_concurrent_number"="3", "max_batch_interval" = "5", "max_batch_rows" = "300000", "max_batch_size" = "209715200", "strict_mode" = "false", "timezone" = "+08:00" ) ``` Details can be found in `alter-routine-load.md`	2020-08-06 23:11:02 +08:00
Mingyu Chen	3f31866169	[Bug][Load][Json] #4124 Load json format with stream load failed (#4217 ) Stream load should read all the data completely before parsing the json. And also add a new BE config streaming_load_max_batch_read_mb to limit the data size when loading json data. Fix the bug of loading empty json array [] Add doc to explain some certain case of loading json format data. Fix: #4124	2020-08-04 12:55:53 +08:00
Mingyu Chen	c3d9feed75	[Load][Json] Refactor json load logic to make it more reasonable (#4020 ) This CL mainly changes: 1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent. 2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly. 3. See `load-json-format.md` to get details of loading json format.	2020-07-07 23:07:28 +08:00
yangzhg	de91037d8c	[Doc]Add some routine load docs (#3796 ) Add some documentation about using routine load in the cloud environment	2020-06-10 22:57:00 +08:00
caiconghui	01c1de1870	[Load] Add more metric to trace the time cost in stream load and make brpc_num_threads configurable (#3703 )	2020-06-04 13:37:28 +08:00
EmmyMiao87	dbfe8a067f	[Doc ]Add docs of max_running_txn_num_per_db (#3657 ) Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92	2020-05-22 10:22:11 +08:00
hffariel	432965e360	[Enhancement] documents rebuild with Vuepress (#3408 ) (#3414 )	2020-04-29 09:14:31 +08:00

48 Commits