* Change label of broker load txn
1. Put broker load label into txn label
2. Fix the bug of `label is already used`
3. Fix partition error of new broker load
* Fix count error in mini load and broker load
There are three counters (num_rows_load_total, num_rows_load_filtered, num_rows_load_unselected) which are used to compute dpp.norm.ALL and dpp.abnorm.ALL:
num_rows_load_total is the number of rows in the source file.
num_rows_load_unselected is the number of rows in num_rows_load_total that do not satisfy the WHERE conjuncts.
num_rows_load_filtered is the number of rows in (num_rows_load_total - num_rows_load_unselected) whose data quality is not good enough.
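A worked example from the definitions above (the exact mapping to the dpp counters is my reading, not stated here): suppose a source file has 100 rows, 10 rows fail the WHERE conjuncts, and 5 of the remaining 90 rows have bad quality. Then:
num_rows_load_total = 100
num_rows_load_unselected = 10
num_rows_load_filtered = 5
rows actually loaded = 100 - 10 - 5 = 85
so dpp.abnorm.ALL should reflect the 5 filtered rows and dpp.norm.ALL the 85 loaded rows.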
1. TaskScheduler will process one task per round
2. TaskScheduler will block until a new task is put into the queue
3. TaskScheduler will submit tasks when the queue is empty
4. Add an example of creating a broker table via BOS (see the sketch after this list)
5. Change the syntax of the show routine load job statement
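A minimal sketch of such a create statement, assuming the BOS broker is registered as 'bos_broker'; the table, bucket, path, endpoint, and credentials are placeholders:

CREATE EXTERNAL TABLE example_db.bos_tbl (
    k1 INT,
    v1 VARCHAR(32)
)
ENGINE = broker
PROPERTIES (
    "broker_name" = "bos_broker",
    "path" = "bos://example_bucket/path/to/file.csv",
    "column_separator" = ","
)
BROKER PROPERTIES (
    "bos_endpoint" = "http://bj.bcebos.com",
    "bos_accesskey" = "your_access_key",
    "bos_secret_accesskey" = "your_secret_key"
);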
1. Add Config.max_routine_load_concurrent_task_num to replace the old config
2. Fix a bug where SHOW ALTER TABLE COLUMN may throw a NullPointerException
3. Fix some misspellings in the docs
1. Use a data consumer group to share a single stream load pipe among multiple data consumers. This increases the Kafka message consuming speed and reduces the number of tasks of a routine load job.
Test results:
* 1 consumer, 1 partition:
consume time: 4.469s, rows: 990140, bytes: 128737139. 221557 rows/s, 28M/s
* 1 consumer, 3 partitions:
consume time: 12.765s, rows: 2000143, bytes: 258631271. 156689 rows/s, 20M/s
blocking get time(us): 12268241, blocking put time(us): 1886431
* 3 consumers, 3 partitions:
consume time(all 3): 6.095s, rows: 2000503, bytes: 258631576. 328220 rows/s, 42M/s
blocking get time(us): 1041639, blocking put time(us): 10356581
The last two cases show that we can achieve higher speed by adding more consumers, but the bottleneck then shifts from the Kafka consumer to Doris ingestion, so 3 consumers in a group is enough.
I also add a Backend config `max_consumer_num_per_group` to change the number of consumers in a data consumer group; the default value is 3.
In my test (1 Backend, 2 tablets, 1 replica), one routine load task can achieve 10M/s, which is the same as raw stream load.
2. Add OFFSET_BEGINNING and OFFSET_END support for Kafka routine load
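A sketch of a routine load job using the new offsets (job, table, topic, and broker addresses are placeholders):

CREATE ROUTINE LOAD example_db.kafka_job1 ON example_tbl
PROPERTIES (
    "desired_concurrent_number" = "3"
)
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2",
    "kafka_offsets" = "OFFSET_BEGINNING,OFFSET_BEGINNING,OFFSET_END"
);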
1. Some previous Doris versions may produce invalid 'last failed version' values on replicas.
2. Also modify the CREATE TABLE help doc: remove row storage type and random distribution.
1. Fix a bug where querying a restored table failed after schema change.
2. Fix a bug where adding a rollup to a restored table failed.
3. Optimize the info of the SHOW ALTER TABLE stmt.
4. Optimize the info of some PROCs.
5. Optimize the tablet checker to avoid adding too many tasks to the scheduler.
1. Add broker load error hub
A broker load error hub collects error messages during the load process and saves them as a file to the specified remote storage via broker. This covers the case where, in a broker/mini/streaming load, the user may not be able to access the error log file on the Backend directly.
We also add a new header option 'enable_hub' to the streaming load request; the default is false. Enabling the broker load error hub significantly slows down streaming load processing because of the remote storage access via broker, so users can leave the error hub disabled via this header option to avoid slowing down the load.
2. Show load error logs by using SHOW LOAD WARNINGS stmt
We also provide an easier way to get load error logs: the 'SHOW LOAD WARNINGS ON 'url'' stmt shows load error logs directly. The 'url' is the one given in the 'SHOW LOAD' result.
e.g.:
show load warnings on "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx";
3. Support now() function in broker load
Users can map a column to now() in a broker load stmt, which means this column will be filled with the time when the ETL started (see the sketch after this list).
4. Support more types of wildcards in broker load
Previously, we only supported the wildcard '*' to match file names; wildcards like '/path/to/20190[1-4]*' were not supported.
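A sketch combining both features (label, path, table, and broker name are placeholders; the column mapping via SET is how I read the now() support):

LOAD LABEL example_db.label1
(
    DATA INFILE("hdfs://host:port/path/to/20190[1-4]*")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
    (k1, k2, v1)
    SET (load_time = now())
)
WITH BROKER "hdfs_broker" ("username" = "user", "password" = "passwd");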
1. Remove all design docs. They will be pushed again after modification.
2. Add streaming load and privilege help docs.
3. Rename palo*.py to doris*.py in gensrc/script/.
1. No one can set the root password except for the root user itself
2. NODE_PRIV cannot be granted.
3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.* (see the examples after this list)
4. No one can modify the privileges of the default roles 'operator' and 'admin'.
5. No user can be granted the role 'operator'.
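For illustration (user name and host are placeholders): a table-level privilege can target a single table, while ADMIN_PRIV must target *.*:

GRANT SELECT_PRIV ON example_db.example_tbl TO 'jack'@'%';
GRANT ADMIN_PRIV ON *.* TO 'jack'@'%';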
Fixed: the running load limit should not be applied to the replay logic; otherwise replay or loading the image may fail.
Changed: mitigate the problem of too many directories under the mini load directory.
Fixed: missing password and auth check when handling mini load requests in Frontend.
Fixed: DomainResolver should start after Frontends transfer to a certain ROLE, not in the Catalog constructor.
Fixed: a bug where no one could set a password for the root user. Now only the root user can set the root password.
Fixed: null data read twice.
When reading data with a null value, in some cases the same data will be read twice by the storage engine, resulting in a wrong result. The reason is that when splitting, if the start key is the minimum value, the data with null is read.
Fixed: add a flag to prevent the DomainResolver thread from starting twice.
Fixed: a memory leak with ByteBuf when parsing the auth info of HTTP requests.
Added: a new config 'disable_hadoop_load'; the default is false, set it to true to disable hadoop load.
Changed: add detailed error messages for hadoop load job submission to the SHOW LOAD result.
Fixed: the Backend process should crash if it fails to save the header.
Added: expose Backend info to users when an error occurs on a Backend, to make debugging more convenient.
Fixed: remove the fd from the map when an input or output stream is closed in the Broker process.
Changed: convert all files' line endings to LF (Unix format).
Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8
2. Add two new procs, '/current_queries' and '/current_backend_instances', to monitor the currently running queries (see the sketch after this list).
3. Add a manual compaction API on the Backend to trigger cumulative or base compaction manually.
4. Add Frontend config 'max_bytes_per_broker_scanner' to limit the bytes per broker scanner. This limits the memory cost of a single broker load job.
5. Add Frontend config 'max_unfinished_load_job' to limit the number of load jobs: if the number of running load jobs exceeds the limit, no more load jobs are allowed to be submitted.
6. A lot of bugs fixed.
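A sketch of inspecting the new procs from the MySQL client (the proc names come from this list; the config values below are placeholders and go in fe.conf):

SHOW PROC '/current_queries';
SHOW PROC '/current_backend_instances';
-- in fe.conf, e.g.:
-- max_bytes_per_broker_scanner = 3221225472
-- max_unfinished_load_job = 1000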
1. The Apache HDFS broker supports HDFS HA and Hadoop Kerberos authentication.
2. New Backup and Restore functions. Use the FS Broker to back up your data to HDFS or restore it from HDFS (see the sketch after this list).
3. Table-level privileges. Grant fine-grained privileges at the table level to a specified user.
4. A lot of bugs fixed.
5. Performance improvement.
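A minimal sketch of the backup flow via FS Broker (repository name, location, table, and credentials are placeholders):

CREATE REPOSITORY example_repo
WITH BROKER "hdfs_broker"
ON LOCATION "hdfs://host:port/repo_dir"
PROPERTIES ("username" = "user", "password" = "passwd");

BACKUP SNAPSHOT example_db.snapshot_1
TO example_repo
ON (example_tbl);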
1. Implement the Backend HTTP server using libevent instead of mongoose.
2. Remove the old Hypertable RPC framework and use brpc instead.
3. Change the RPC from FE to BE to brpc.
4. The FS broker supports HDFS HA.
5. Add more metrics for monitoring.
6. Lots of bugs fixed.
1. Renaming a database does not modify the corresponding names in the cluster.
2. The result of SHOW BROKERS does not match that of SHOW PROC '/brokers'.
3. Bugs when setting properties for another user using an admin user.