Commit Graph

59 Commits

SHA1 Message Date
42a4fff562 Replace boost canonicalize (#2209) 2019-11-19 17:57:37 +08:00
59e9027f76 Fix bug that timeout does not take effect in stream load (#2217) 2019-11-16 22:29:55 +08:00
c3b5046940 Fix bug of invalid stream load task rollback (#1999)
If a stream load is committed with result PUBLISH_TIMEOUT, the transaction should not
be rolled back; only this message should be returned to the user.
2019-10-17 21:08:29 +08:00
62acf5d098 Limit the memory usage of Loading process (#1954) 2019-10-15 09:26:20 +08:00
f130bd3e7b Use Env function to operate directory (#1980)
Env now unifies all environment operations, such as file operations.
However, some of our old functions don't leverage it. This change makes
FileUtils::scan_dir use Env's functions.
2019-10-15 09:25:12 +08:00
2f0808137a Refactor FrontendHelper (#1888) 2019-09-27 13:21:14 +08:00
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
00f8040bf3 Fix bug that 2 identical stream load jobs may both execute successfully (#1690)
This caused the 2 jobs to write to the same file, damaging the file.
2019-08-22 19:38:16 +08:00
978b1ee1af Add strict mode in Routine load, Stream load and Mini load (#1677) 2019-08-20 21:56:45 +08:00
8e6814cfcd Support setting timeout for stream load (#1670) 2019-08-20 15:43:03 +08:00
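For context, the sketch below shows one way a per-load timeout could be passed to a stream load over its HTTP interface. It is a minimal illustration, not code from this commit: the host, port, database/table names, label, and the 'timeout' header name are all assumptions.

```python
# Hypothetical sketch of a stream load request with a per-load timeout.
# Host, port, db/table, label and the 'timeout' header name are assumptions
# for illustration; they are not taken from the commit itself.
import requests

def stream_load(fe_host, db, table, data_path, user, password, timeout_sec=300):
    url = f"http://{fe_host}:8030/api/{db}/{table}/_stream_load"
    headers = {
        "label": "example_label_001",   # unique label, lets retried requests be deduplicated
        "timeout": str(timeout_sec),    # per-load timeout in seconds (assumed header name)
        "Expect": "100-continue",
    }
    with open(data_path, "rb") as f:
        body = f.read()                 # read the file into memory and send it as the request body
    # Note: in practice the FE may redirect the request to a BE; a real client must
    # follow that redirect while re-sending credentials (as curl --location-trusted does).
    resp = requests.put(url, data=body, headers=headers, auth=(user, password))
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(stream_load("127.0.0.1", "example_db", "example_tbl", "data.csv", "root", ""))
```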
1e2a4c3b9b Fix tablet restore api in BE(#1623) (#1624) 2019-08-13 09:34:24 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use the same UUID as the query ID and load ID of a load execution plan.
Each load execution plan has a load ID, and, as a plan, it also has a query ID.
Using the same UUID for both makes it easier to trace the load process.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise the BE
cannot distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled, or
it may occupy a worker thread for a long time.

Remove the unnecessary query reports when executing a load plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. The long timeout reduces the possibility of hitting a "fail to send batch" error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs to make it easier to trace a broker load process.
2019-07-27 20:17:05 +08:00
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load now uses the stream load framework, but the mini load return
behavior and result JSON format should stay the same as before.
So the PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters to the OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
2019-07-16 19:15:41 +08:00
6c246418fb Add timeout in stream load planner (#1480)
The mini load timeout needs to be added to the plan options.
The timeout property has been added to the process put request;
otherwise, the mini load timeout has no effect.

Also add logging of the label, txn and query id in mini load.
2019-07-15 22:14:59 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all Backends' data,
and restarting a BE will take a very long time.
So if you don't want to interfere with your production environment,
you should upgrade Backends one by one.

1. Refactor the BE to clarify the structure of the code.
2. Use a unique id to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
734032d917 Fix the error unit of create timestamp in mini load (#1460)
The unit of the old create timestamp was microseconds, while the unit of the create timestamp in FE is milliseconds.
2019-07-11 19:29:18 +08:00
b0af97d8aa Change error msg of mini load when PUBLISH_TIMEOUT (#1415) 2019-07-01 16:05:49 +08:00
4f416b7e21 Fix a core dump in mini load (#1375)
A new struct named MiniLoadCtx is used to save is_streaming and ctx.
2019-06-25 20:54:58 +08:00
7550b2f09b Convert mini load to streaming mini load (#1323)
* This commit introduces streaming mini load
The operation of streaming mini load is the same as before, and the user can still check the load via the frontend.
The difference is that streaming mini load finishes the task before replying to the REST API, while the non-streaming version only registers a load.

* When upgrading Doris
Upgrading the FE or BE first are both supported. After the FE and BE are both upgraded, streaming mini load takes effect.

* For multi mini load
Non-streaming mini load is still used by multi mini load. The behavior of multi mini load has not been changed.

* Add an interface named isSupportedFunction
This function is used to protect the correctness of the new feature, which spans BE and FE, during upgrading.
2019-06-21 19:34:50 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
c20d62679e Add negative load from StreamLoad (#1227) 2019-05-31 07:14:06 +08:00
afa3aa9069 Add some pre-calculated metrics (#1079)
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
2019-04-30 11:12:23 +08:00
9c82d41981 Support Doris querying ES via HTTP (#925) 2019-04-28 17:14:44 +08:00
0820a29b8d Implement the routine load process of Kafka on Backend (#671) 2019-04-28 10:33:50 +08:00
d3251a19f7 Modify the method to obtain some metrics (#904) 2019-04-10 19:37:48 +08:00
fb4e77d6d6 Add http post feature for HttpClient (#773) 2019-03-19 22:05:33 +08:00
aba1b9e5d6 Reopen the thrift client when an exception occurs (#610)
To avoid a broken connection being reused.
2019-01-31 16:54:49 +08:00
af445b6cc2 Optimize something (#607)
1. Unify the thrift rpc timeout from BE to FE.
    Add a BE config 'thrift_rpc_timeout_ms', default is 5000 ms.
2. Add hostname in "show proc '/frontends';" stmt result.
3. Fix a lock order bug in Load.java
2019-01-31 13:30:45 +08:00
33b133c6ff Fix bug that an internal retry of stream load returns the wrong result (#541)
Add an internally generated timestamp as a unique identifier to distinguish a request from its retry request.
2019-01-16 18:59:19 +08:00
a51ce03595 Enhance the usability of Load operation (#490)
1. Add broker load error hub
A broker load error hub collects error messages produced during the load process and saves them as a file to the specified remote storage via broker. This is for cases where, in the broker/mini/streaming load process, the user may not be able to access the error log file on the Backend directly.
We also add a new header option 'enable_hub' to the streaming load request; the default is false. Enabling the broker load error hub significantly slows down streaming load because of the visits to remote storage via broker, so the user can leave it disabled via this header option to avoid slowing down the load.

2. Show load error logs by using the SHOW LOAD WARNINGS stmt
We also provide an easier way to get load error logs: the 'SHOW LOAD WARNINGS ON 'url'' stmt shows load error logs directly. The 'url' in the stmt is provided by the 'SHOW LOAD' stmt (see the fetch sketch after this entry).
eg:
show load warnings on "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx";

3. Support the now() function in broker load
The user can map a column to now() in the broker load stmt, which means this column will be filled with the time when the ETL started.

4. Support more types of wildcard in broker load
Previously, only the wildcard '*' was supported to match file names; wildcards like '/path/to/20190[1-4]*' were not supported.
2019-01-03 19:07:27 +08:00
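As referenced above, a minimal sketch (not part of the commit) of fetching the error log that 'SHOW LOAD' points to. The URL shape follows the example in the commit message; the concrete host and file values are placeholders.

```python
# Minimal sketch: download a load error log from the Backend's HTTP endpoint.
# The URL shape mirrors the example in the commit message above; the concrete
# host and file parameter are placeholders, not real values.
import requests

def fetch_load_error_log(error_log_url: str) -> str:
    """Fetch the error log text that the 'SHOW LOAD' url column points to."""
    resp = requests.get(error_log_url, timeout=10)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    url = "http://192.168.1.1:8040/api/_load_error_log?file=__shard_2/error_log_xxx"
    print(fetch_load_error_log(url))
```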
46c70a16b1 Add more detail logs to debug streaming load (#484)
* Add more detailed logs to debug streaming load

* Fix bugs
2018-12-28 19:42:09 +08:00
90d71508ff Add UserFunctionCache to cache UDF's library (#453)
This patch replaces LibCache with UserFunctionCache. LibCache used an HDFS
URL to identify a UDF's library, so when the BE process restarted, all
downloaded libraries had to be loaded again. We now use a function id
corresponding to each library, so when the process restarts, all downloaded
libraries can be loaded without downloading them again.
2018-12-21 22:07:21 +08:00
e99468c387 Add HttpClient class (#441)
Replace FileDownloader with HttpClient; this patch changes clone_copy and
the pusher's download.
2018-12-18 09:55:11 +08:00
dedfccfaf5 Optimize the publish logic of streaming load (#350)
1. Only collect all error replicas if the publish task times out.
2. Add 2 metrics to monitor the success or failure of txns.
3. Change the publish timeout to Config.load_straggler_wait_second
2018-11-26 19:01:50 +08:00
be6e9c393d Fix bug of using symbolic link dir as storage path (#340)
* Fix bug of #307

There is a bug when using a symbolic link directory as the storage root path.
The problem is whether the path is canonical.

In DownloadAction, the check fails by comparing a canonical path with a non-canonical path.
So fix the bug by converting all paths to canonical paths before comparison.
2018-11-23 10:12:26 +08:00
a2b299e3b9 Reduce UT binary size (#314)
Almost every module depends on ExecEnv, and ExecEnv contains all the
singletons, which makes the UT binary contain all object files.

This patch separates ExecEnv's init and destroy into another file to
avoid other files depending on it. Also, status.cc includes debug_util.h, which
depends on tuple.h and tuple_row.h, so get_stack_trace() is moved to
stack_util.cpp to reduce status.cc's dependencies.

USE_RTTI=1 is added when building rocksdb to avoid problems linking librocksdb.a.

Issue: #292
2018-11-15 16:17:23 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In the last commit, a lot of files were missed.
2018-10-31 16:19:21 +08:00
5d3fc80067 Added:
* Add streaming load feature. You can execute 'help stream load;' to see more information.

Changed:
* The loading phase of a table can be parallelized, to reduce the load job execution time when multiple load jobs target a single table.
* Use RocksDB to save the header info of tablets in Backends, to reduce IO operations and speed up restarting.

Fixed:
* A lot of bugs fixed.
2018-10-31 14:46:22 +08:00
65fe7f65c1 Fixed: privilege logic errors:
1. No one can set the root password except the root user itself.
    2. NODE_PRIV cannot be granted.
    3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.*
    4. No one can modify the privs of the default roles 'operator' and 'admin'.
    5. No user can be granted the role 'operator'.
Fixed: the running load limit should not be applied to the replay logic. It would cause replaying or loading the image to fail.
Changed: optimize the problem of too many directories under the mini load directory.
Fixed: missing password and auth check when handling mini load requests in the Frontend.
Fixed: DomainResolver should start after the Frontend transfers to a certain ROLE, not in the Catalog constructor.
Fixed: a stupid bug that no one could set the password for the root user... fix it: only the root user can set the password for root.
Fixed: read null data twice
    When reading data with a null value, in some cases the same data will be read twice by the storage engine,
    resulting in a wrong result. The reason for this problem is that when splitting,
    if the start key is the minimum value, the data with null is read.
Fixed: add a flag to prevent the DomainResolver thread from starting twice.
Fixed: a memory leak of using ByteBuf when parsing the auth info of an HTTP request.
Fixed: add a new config 'disable_hadoop_load', default is false; set it to true to disable hadoop load.
Changed: add a detailed error msg for submitting hadoop load jobs to the show load result.
Fixed: the Backend process should crash if it fails to save a header.
Added: expose Backend info to the user when an error is encountered on the Backend, making debugging more convenient.
Fixed: remove the fd from the map when an inputstream or outputstream is closed in the Broker process.
Fixed: change all files' line endings to unix format.

Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8
2018-10-01 19:58:41 +08:00
bea10e4f06 1. hide password and other sensitive information in log and audit log
2. add 2 new proc '/current_queries' and '/current_backend_instances' to monitor the current running queries.
3. add a manual compaction api on Backend to trigger cumulative or base compaction manually.
4. add Frontend config 'max_bytes_per_broker_scanner' to limit the bytes per broker scanner. This is to limit the memory cost of a single broker load job
5. add Frontend config 'max_unfinished_load_job' to limit the number of load jobs: if the number of running load jobs exceeds the limit, no more load jobs are allowed to be submitted.
6. a lot of bugs fixed
2018-09-19 20:04:01 +08:00
cc74efb3c5 merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224)
1. The Apache HDFS broker supports HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore functions. Use Fs Broker to back up your data to HDFS or restore it from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on the table level to a specified user.
4. A lot of bugs fixed.
5. Performance improvements.
2018-08-24 17:12:26 +08:00
19997510a6 merge to 9625ef157dd44c58802d63cb7547f037b75fd710 (#208)
1. Implement the Backend http server using libevent instead of mongoose.
2. Remove the old Hypertable rpc framework; use brpc instead.
3. Change the rpc from FE to BE to brpc.
4. The Fs broker supports HDFS HA.
5. Add more metrics to monitor.
6. Lots of bugs fixed.
2018-07-17 09:20:30 +08:00
6beea22d7b remove unused files (#205)
* modify the license

* remove unused files
2018-06-19 21:32:54 +08:00
81baef34f4 restore mongoose's license 2018-06-12 16:02:59 +08:00
7e2a3aa1b3 modify the license (#203)
Some licenses were replaced incorrectly.
2018-06-09 19:12:16 +08:00
611afcd125 restore license which was replaced incorrectly 2018-06-09 16:35:35 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
6cf2fb4d47 Bug fixes
merge to 2d4cc9e1358c980b4f726e17d036639bc31127aa (#188)
contains:
    fix first_value rewrite error with PRECEDING LEFT and NON-PRECEDING RIGHT, and count(*) materialized SlotDescriptor
    error when referring to slots of the current query and a subquery simultaneously.

    fix join and count(*) materialized SlotDescriptor error.

    fix a bug in materializing the scan node's conjuncts.

    remove unused materialization work.

    order by has to be evaluated in a subquery because we limit the number of rows returned by the subquery.

    the method of judging the limit was wrong.

    user info was missing when retrying the load check call.

    it's wrong to pass an aggregate function when its param is not materialized.

    InsertStmt did not pass the session param to the observer.
2018-04-11 16:05:33 +08:00