Use the same UUID for the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID, and, as a plan, it also has a query ID.
Using the same UUID for both makes it easier to trace the load process.
Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old and new load requests.
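A minimal sketch of the two points above, using only the JDK (the LoadExecutionPlan class and its field names are hypothetical, not the actual Doris code):

```java
import java.util.UUID;

// Hypothetical holder for the IDs carried by a load execution plan.
class LoadExecutionPlan {
    UUID queryId;
    UUID loadId;

    // On first submission, one UUID serves as both query ID and load ID,
    // so grepping either ID in the FE/BE logs follows the whole load.
    static LoadExecutionPlan create() {
        LoadExecutionPlan plan = new LoadExecutionPlan();
        UUID id = UUID.randomUUID();
        plan.queryId = id;
        plan.loadId = id;
        return plan;
    }

    // On retry, the load ID is regenerated so BE can tell the old and new
    // requests apart.
    void prepareRetry() {
        this.loadId = UUID.randomUUID();
    }
}
```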
Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled, or
it may occupy a worker thread for a long time.
Remove the unnecessary query reports when executing a load plan.
Only the last query report is needed.
Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the tablet sink RPC. The default is 600 seconds, which is long enough to flush
about 6 GB of data. The longer timeout reduces the chance of hitting a 'fail to send batch' error when loading.
Use streaming_load_max_mb instead of mini_load_max_mb in BE config.
Add more logs for tracing a broker load process easily.
Operators want to know when the job is scheduled into the PENDING
and LOADING states, and how long it takes to finish these sub-states.
Also add 2 metrics on BE to monitor memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
When the query result reaches its limit, the Coordinator in FE sends a cancel
request to BE to cancel the query. When being cancelled, BE reports the
query status to FE for debugging purposes. But this is actually unnecessary
and generates too many logs.
So I add a CancelReason to distinguish a 'normal' cancellation
from an 'internal error' cancellation. If the query is cancelled 'normally',
no status is reported from BE.
When the query reaches its limit, or the user cancels it actively, it is cancelled 'normally'.
Otherwise, the query is cancelled due to an internal error, which requires
a report from BE.
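A rough Java sketch of the reporting rule (the enum values and method are made up for illustration; the real types live in the FE/BE RPC definitions):

```java
// Hypothetical cancel reasons; only the classification matters here.
enum CancelReason {
    LIMIT_REACH,      // query result hit its limit
    USER_CANCEL,      // user cancelled the query actively
    INTERNAL_ERROR    // some instance failed
}

final class CancelPolicy {
    // 'Normal' cancellations need no status report from BE, which avoids
    // flooding the logs; internal-error cancellations still report.
    static boolean needReportToFe(CancelReason reason) {
        return reason == CancelReason.INTERNAL_ERROR;
    }
}
```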
Mini load now uses the stream load framework, but we should keep the
mini load return behavior and the result JSON format the same as before.
So a PUBLISH_TIMEOUT error should be treated as OK in mini load (see the sketch after the counters below).
Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
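A Java sketch of the PUBLISH_TIMEOUT mapping, purely illustrative (the enum and the method are made up; the real mapping lives in the mini load handler):

```java
// Hypothetical status codes for the load result.
enum LoadStatus { OK, PUBLISH_TIMEOUT, FAILED }

final class MiniLoadResult {
    // Mini load keeps its old JSON contract, so a publish timeout is still
    // reported as success instead of surfacing the new framework's error.
    static String statusForJson(LoadStatus status) {
        if (status == LoadStatus.OK || status == LoadStatus.PUBLISH_TIMEOUT) {
            return "Success";
        }
        return "Fail";
    }
}
```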
The mini load timeout needs to be added to the plan options.
The timeout property has been added to the request of process put;
otherwise, the timeout of mini load is useless.
Also add logs of the label, txn, and query id in mini load.
NOTE: This patch will modify all Backends' data,
which will make restarting BE take a very long time.
So if you do not want to disturb your production environment,
you should upgrade the Backends one by one.
1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
Naming a rowset with tablet_id and version leads to
many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
with different formats.
Currently, the load error log on BE is cleaned up along with the
intermediate load data, as configured by 'load_data_reserve_hours'.
Sometimes users want to keep the error log for a longer time.
* This commit brings streaming to mini load
The operation of streaming mini load is the same as before, and users can still check the load through the frontend.
The difference is that streaming mini load finishes the task before the REST API replies, while the non-streaming version only registers a load.
* When upgrading Doris
Upgrading FE or BE first are both supported. After both FE and BE are upgraded, streaming mini load takes effect.
* For multi mini load
Multi mini load still uses the non-streaming mini load; its behavior has not been changed.
* Add an interface named isSupportedFunction
This function protects the correctness of a new feature that spans BE and FE during upgrading.
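A sketch of the compatibility check (only the name isSupportedFunction comes from this change; the interface shape, the feature string, and the router class are made up for illustration):

```java
// Hypothetical view of a backend as seen by the frontend.
interface BackendFeatureView {
    // Reports whether this backend already supports a named feature,
    // so FE can fall back to the old path during a rolling upgrade.
    boolean isSupportedFunction(String featureName);
}

final class MiniLoadRouter {
    // FE only plans a streaming mini load if the chosen backend supports it;
    // otherwise it keeps the old non-streaming behavior.
    static boolean useStreamingMiniLoad(BackendFeatureView backend) {
        return backend.isSupportedFunction("streaming_mini_load");
    }
}
```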
Previously, RowBlock reserved memory for all columns in the schema, even if
they were not queried, which caused bad performance when querying a wide
table.
With this patch, RowBlock reserves memory only for the needed columns. In one
case, this reduces ConvertBatchTime from 10s to 60ms when querying a wide
table with 178 columns.
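The BE change is in C++; the allocation idea can be sketched conceptually (hypothetical types, Java used for consistency with the other sketches):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Conceptual sketch: buffers are reserved only for the columns a query touches,
// instead of for every column in the schema of a wide table.
final class RowBlockSketch {
    static List<byte[]> reserveBuffers(List<Integer> schemaColumnSizes,
                                       Set<Integer> neededColumnIds,
                                       int numRows) {
        List<byte[]> buffers = new ArrayList<>(schemaColumnSizes.size());
        for (int colId = 0; colId < schemaColumnSizes.size(); colId++) {
            if (neededColumnIds.contains(colId)) {
                buffers.add(new byte[schemaColumnSizes.get(colId) * numRows]);
            } else {
                buffers.add(null); // untouched column: no memory reserved
            }
        }
        return buffers;
    }
}
```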
#1236
1. get_json_xxx() now supports using quotes to escape dots.
2. Implement a json_path_prepare() function to preprocess the json_path.
The runtime of get_json_string() on 1,000,000 rows drops from 2.27s to 0.27s.
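The gain comes from parsing the JSON path once instead of once per row. A simplified Java sketch of the same idea (the real functions are BE built-ins; the quoting rules here are approximations):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class JsonPathSketch {
    // "Prepare" step: split the path once, honoring "$.a."b.c"" style quoting,
    // so per-row evaluation only walks the pre-split segments.
    static List<String> prepare(String path) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : path.toCharArray()) {
            if (c == '"') {
                inQuote = !inQuote;            // a quoted segment may contain dots
            } else if (c == '.' && !inQuote) {
                segments.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        segments.add(current.toString());
        return segments;
    }

    // Per-row step: walk an already parsed JSON object (maps here) along the segments.
    static Object extract(Map<String, Object> json, List<String> preparedPath) {
        Object node = json;
        for (String key : preparedPath) {
            if (key.equals("$")) {
                continue;                      // root marker
            }
            if (!(node instanceof Map)) {
                return null;
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return node;
    }
}
```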
* Change the label of the broker load txn
1. Put the broker load label into the txn label.
2. Fix the `label is already used` bug.
3. Fix the partition error of the new broker load.
* Fix count errors in mini load and broker load
There are three parameters (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) used to compute dpp.norm.ALL and dpp.abnorm.ALL.
num_rows_load_total is the number of rows in the source file.
num_rows_load_unselected is the number of rows in num_rows_load_total that do not satisfy the WHERE conjuncts.
num_rows_load_filtered is the number of rows with insufficient quality among (num_rows_load_total - num_rows_load_unselected).
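A worked sketch of the counting, following the definitions above (the row type and the two predicates are hypothetical stand-ins for the WHERE conjuncts and the data-quality check):

```java
import java.util.List;
import java.util.function.Predicate;

final class LoadCounters {
    long numRowsLoadTotal;       // rows read from the source file
    long numRowsLoadUnselected;  // rows rejected by the WHERE conjuncts
    long numRowsLoadFiltered;    // bad-quality rows among the selected ones

    static <T> LoadCounters count(List<T> rows, Predicate<T> where, Predicate<T> qualityOk) {
        LoadCounters c = new LoadCounters();
        for (T row : rows) {
            c.numRowsLoadTotal++;
            if (!where.test(row)) {
                c.numRowsLoadUnselected++;   // unselected: counted against the total
            } else if (!qualityOk.test(row)) {
                c.numRowsLoadFiltered++;     // filtered: counted against (total - unselected)
            }
        }
        return c;
    }
}
```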
* Enhance usability
1. Add metrics to monitor transactions and the stream load process in BE.
2. Change the BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Change the FE config 'enable_metric_calculator' to true.
4. Add more logs for tracing the broker load process.
5. Modify the query report process to cancel the query immediately if some instance fails.
* Fix bugs
1. Avoid a NullPointerException when enabling colocation join with broker load.
2. Return immediately when the pull load task coordinator fails to execute.
1. Use a data consumer group to share a single stream load pipe among multiple data consumers. This increases the speed of consuming Kafka messages and reduces the number of tasks of a routine
load job. (A generic sketch of the pipe-sharing pattern follows this list.)
Test results:
* 1 consumer, 1 partition:
consume time: 4.469s, rows: 990140, bytes: 128737139. 221557 rows/s, 28 MB/s
* 1 consumer, 3 partitions:
consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20 MB/s
blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
consume time (all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42 MB/s
blocking get time(us): 1041639, blocking put time(us): 10356581
The last two cases show that we can achieve higher speed by adding more consumers, but the bottleneck then shifts from the Kafka consumers to Doris ingestion, so 3 consumers in a group is enough.
I also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group; the default value is 3.
In my test (1 Backend, 2 tablets, 1 replica), one routine load task can achieve 10 MB/s, which is the same as raw stream load.
2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
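The BE implementation is C++ around the Kafka client; this Java sketch only shows the sharing pattern with hypothetical names (a bounded queue stands in for the stream load pipe):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class ConsumerGroupSketch {
    // The shared "stream load pipe": every consumer in the group writes into it,
    // and a single downstream reader drains the merged stream.
    private final BlockingQueue<String> pipe = new ArrayBlockingQueue<>(10_000);

    // maxConsumerNumPerGroup plays the role of the new BE config (default 3).
    void run(int maxConsumerNumPerGroup) throws InterruptedException {
        for (int i = 0; i < maxConsumerNumPerGroup; i++) {
            final int consumerId = i;
            new Thread(() -> {
                // Stand-in for polling a subset of the Kafka partitions.
                for (int n = 0; n < 1000; n++) {
                    try {
                        pipe.put("consumer-" + consumerId + "-msg-" + n);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }).start();
        }
        // Single reader: the load execution consumes from the one shared pipe.
        for (int read = 0; read < 1000 * maxConsumerNumPerGroup; read++) {
            pipe.take();
        }
    }
}
```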
1. The stream load executor aborts the txn when the task contains no correct data.
2. Change the txn label to DebugUtil.print(UUID), which is the same as the task id printed by BE.
3. Print UUIDs in hi-lo form (see the sketch below).
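Printing a UUID in hi-lo form with the plain JDK looks roughly like this (DebugUtil.print is the FE helper; this standalone version only approximates its format):

```java
import java.util.UUID;

final class IdFormat {
    // The 128-bit id is shown as two 64-bit hex halves, so the label printed by
    // FE can be matched against the task id printed by BE.
    static String printHiLo(UUID id) {
        return Long.toHexString(id.getMostSignificantBits())
                + "-" + Long.toHexString(id.getLeastSignificantBits());
    }
}
```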
1. Initialize the committed offset in the stream load context.
2. Initialize the default max error num to 5000 rows per 10000 rows.
3. Add a log builder for routine load jobs and tasks.
4. Clone the plan fragment params for every task.
5. BE does not fail with 'too many filter rows' when the initial max error ratio is 1 (see the sketch below).
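A sketch of the error-threshold check implied by items 2 and 5, using the numbers from the list above (the method and its parameters are illustrative, not the actual routine load code):

```java
final class ErrorThresholdSketch {
    // Item 2: default tolerance is 5000 error rows per 10000 source rows.
    static final long DEFAULT_MAX_ERROR_NUM_PER_10K = 5000;

    // Item 5: when the configured ratio is 1 (10000 per 10000), the check is
    // effectively disabled and BE should not report "too many filter rows".
    static boolean tooManyFilteredRows(long totalRows, long filteredRows,
                                       long maxErrorNumPer10k) {
        if (maxErrorNumPer10k >= 10000) {
            return false;
        }
        long allowed = totalRows * maxErrorNumPer10k / 10000;
        return filteredRows > allowed;
    }
}
```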