This config controls the maximum number of concurrent tasks per BE.
The cluster-wide max concurrent task num = max_concurrent_task_num_per_be * number of BEs.
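A worked example of the formula (the config value and cluster size are hypothetical):
```
# illustrative value only
max_concurrent_task_num_per_be = 10
# with 5 BEs: cluster max concurrent task num = 10 * 5 = 50
```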
Also, if the user has already specified target partitions to load, we should only include those
partitions' info in the BrokerScanNode params instead of adding all partitions' info;
otherwise the RPC packet may become too large.
Currently, we do not support parsing columns encoded in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.
This patch makes it possible to parse columns from the file path, as in Spark (Partition Discovery).
It parses the partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so that the broker reader on BE can read the value of the specified partition column.
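For illustration, a hedged sketch of how a broker load could reference a column parsed from the file path (the COLUMNS FROM PATH AS clause, names and paths are placeholders; the exact grammar may differ):
```
LOAD LABEL example_db.label_path_col
(
    DATA INFILE("hdfs://host:port/path/to/dir/*/*.csv")
    INTO TABLE my_table
    COLUMNS TERMINATED BY ","
    (k2, k3)
    COLUMNS FROM PATH AS (k1)    -- k1 is taken from .../k1=1/... in the file path
)
WITH BROKER "my_broker";
```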
If strict mode is true and at least one row is filtered, the insert operation will fail and a URL will be returned for retrieving the error rows.
```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```
If all rows are good, the insert will return OK with the number of affected rows:
```
Query OK, 1 row affected (0.26 sec)
```
If strict mode is false and at least one row is good, the insert operation will return OK with the number of affected rows and warnings. If there are error rows, a label will also be returned:
```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
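A minimal sketch of toggling this behavior per session (assuming the session variable is named enable_insert_strict, which may differ; table and values are placeholders):
```
SET enable_insert_strict = true;   -- filtered rows fail the whole insert and an error URL is returned
INSERT INTO tbl1 VALUES (1, 'a'), (2, NULL);
SET enable_insert_strict = false;  -- partially good data is accepted; a label is returned if rows were filtered
```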
When a replica recovers from the bad state on BE, the report process
should set the replica's bad status on FE to false; otherwise the replica
cannot be recovered.
The PathTrie cannot distinguish different param keys under the same prefix path.
So the prefix of the table info APIs, which are used by spark-doris-connector, has been changed to /api/external.
2 cases:
1. Sometimes a replica with missing versions cannot be repaired, which may cause queries to fail with the error: failed to initialize storage reader. tablet=xxx, res=-214
2. Cancelling a rollup job while there are load jobs on that table may cause the load jobs to fail. We should ignore the "table not found" exception when committing the txn.
* Broker load supports column functions
This commit supports column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
The old functions, such as default_value, strftime, etc., are still compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
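For context, a hedged sketch of where such a column mapping sits in a broker load statement (label, table, path and broker name are placeholders):
```
LOAD LABEL example_db.label_col_func
(
    DATA INFILE("hdfs://host:port/path/to/data.csv")
    INTO TABLE my_table
    COLUMNS TERMINATED BY ","
    (tmp_c1, tmp_c2)
    SET (c1 = tmp_c1 + tmp_c2)   -- derive c1 from the two temporary columns
)
WITH BROKER "my_broker";
```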
The bug is described in issue #1580. This patch fixes 2 cases of cluster balance.
1. After the new replica is added, its version may not have caught up with the visible version, so the new replica may be treated as a stale and redundant replica and be deleted in the next tablet checking round.
I add a flag named needFurtherRepair to the newly added replica, set only when that replica's version has not caught up with the visible version. Such a replica will receive a further repair in the next tablet checking round instead of being deleted.
2. When deleting redundant replicas, there may be load jobs running on them, and deleting these replicas may cause those load jobs to fail.
Before deleting a redundant replica, I first mark the next txn id on that replica and set the replica's
state to CLONE. The CLONE state ensures that no more load jobs will land on that replica, and we
wait for all load jobs before the marked txn id to finish. After that, the replica can be deleted safely.
When creating a table with the OLAP engine, users can specify multiple partition columns,
e.g.:
PARTITION BY RANGE(`date`, `id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
Note that load by Hadoop cluster does not support tables with multiple partition columns.
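For illustration, a more complete (hypothetical) CREATE TABLE using the partition clause above; the column definitions, distribution and properties are placeholders:
```
CREATE TABLE example_db.multi_part_tbl
(
    `date`  DATE,
    `id`    INT,
    `cost`  BIGINT SUM DEFAULT "0"
)
ENGINE = OLAP
AGGREGATE KEY(`date`, `id`)
PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES ("replication_num" = "3");
```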
TabletQuorumFailedException is thrown in commitTxn when the number of successful replicas of a tablet is less than the quorum replica number.
Hadoop load does not handle this exception because the push task will retry later.
Streaming broker load, insert, stream load and mini load will catch this exception and then abort the txn.
Doris supports deploying multiple BEs on one host. So when allocating BEs for the replicas of
a tablet, we should select different hosts. But there is a bug in the tablet scheduler
where the same host may be selected for one tablet. This patch fixes this problem.
There are some places related to this problem:
1. Create Table
There is no bug in the Create Table process.
2. Tablet Scheduler
Fixed when selecting BE for REPLICA_MISSING and REPLICA_RELOCATING.
Fixed when balancing tablets.
3. Colocate Table Balancer
Fixed when selecting BE for repairing colocate backend sequence.
Not fixed in colocate group balance; this is left to colocate repairing.
4. Tablet report
Tablet report may add replicas to the catalog. I did not check the host here;
the Tablet Scheduler will fix it.
The catch statement in the function createMiniLoad cancels the load job.
But sometimes the load job has not yet been created when the catch statement runs, so cancelling it throws a NullPointerException.
This commit fixes this bug.
* Fix bug that the <=> operator and the IN operator get wrong results
* Add some comments to get_result_for_null
* Add a new binary operator to replace is_safe_for_null for handling the '<=>' operator
* Add EQ_FOR_NULL to TExprOpcode
* Remove the trailing backslash of a macro definition
Use the same UUID as both the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID, and as a plan it also has a query ID.
Using the same UUID for both makes tracing the load process easier.
Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old and new load requests.
Cancel the running loading task when cancelling a broker load.
When a user cancels a broker load, the running loading task should also be cancelled; otherwise
it may occupy a worker thread for a long time.
Remove unnecessary query reports when executing a load plan.
Only the last query report is needed.
Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPCs of the tablet sink. The default is 600 seconds, which is long enough for flushing
about 6GB of data. The longer timeout reduces the possibility of hitting the 'fail to send batch' error when loading.
Use streaming_load_max_mb instead of mini_load_max_mb in the BE config.
Add more logs to make tracing a broker load process easier.
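A hedged be.conf sketch combining the settings mentioned above (the values are illustrative, not verified defaults):
```
# be.conf (illustrative values)
tablet_writer_rpc_timeout_sec = 600    # tablet sink RPC timeout, enough to flush roughly 6GB
streaming_load_max_mb = 10240          # replaces the old mini_load_max_mb
```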
Previously, a null default value was rejected in an INSERT INTO statement when a nullable column was not mentioned in the statement.
This is a bug: the unmentioned column does have a default value, so the values should be inserted successfully even though that default is null.
Now the column is simply not assigned a default value only when the column does not allow null and its default value is null.
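A small illustrative case of the behavior this fixes (table and column names are made up):
```
-- v1 is nullable and has no explicit default, so its default value is NULL
CREATE TABLE t (k1 INT, v1 VARCHAR(16) NULL) DISTRIBUTED BY HASH(k1) BUCKETS 1;
-- v1 is not mentioned; the insert should succeed with v1 set to NULL
INSERT INTO t (k1) VALUES (1);
```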
The catalogue of load docs:
---- load-manual.md
---- broker-load-manual.md
---- insert-into-manual.md
---- stream-load-manual.md
This commit also renames max/min_stream_load_timeout to max/min_load_timeout.
The old config stream_load_timeout actually meant the max timeout for all types of load,
so the config name has been changed.
The txn attachment may be null when a broker load has been cancelled without an attachment.
The end log of the broker load has been recorded, but the callback id of the txnState has not been removed,
so the callback of the txn is executed when the 'txn aborted' log is replayed.
The operator wants to know when the job is scheduled into the PENDING
and LOADING states, and how long it takes to finish these sub-states.
Also add 2 metrics on BE to monitor the memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
When a query result reaches its limit, the Coordinator in FE sends a cancel
request to BE to cancel the query. When being cancelled, BE reports the
query status to FE for debugging purposes, but this is actually unnecessary
and generates too many logs.
So I add a CancelReason to distinguish 'normal' cancellation from
'internal error' cancellation. If the query is cancelled 'normally',
no status will be reported from BE.
When a query reaches its limit, or the user cancels it actively, it is cancelled 'normally'.
Otherwise, the query is cancelled due to an internal error, which needs
a report from BE.
The function miniLoadBegin returns the txn_id.
If the backend sends a duplicated request to the frontend, the frontend returns the txn_id that was created by the same mini load.
The issue is that the frontend may return the txn_id while the previous identical request has not yet begun the txn.
In that case the frontend returns zero, the initial txn_id, and the BE cannot execute the load plan with an invalid txn_id.
This commit combines `createLoadJob` and `execute` under the write lock, which protects the atomicity of `create` and `beginTxn`.
So a duplicated request cannot get the txn_id before the previous identical request has finished.