This config controls the maximum number of concurrent tasks per BE.
The cluster-wide maximum concurrent task num = max_concurrent_task_num_per_be * number of BEs; for example, with 10 BEs and max_concurrent_task_num_per_be = 10, the cluster can run at most 100 concurrent tasks.
Previously, columns encoded in the file path could not be parsed, e.g. extracting column k1 from the path /path/to/dir/k1=1/xxx.csv.
This patch makes it possible to parse columns from the file path, similar to Spark's Partition Discovery.
The partition columns are parsed in BrokerScanNode.java and the result for each file path is saved as a property of TBrokerRangeDesc, so that the broker reader on the BE can read the value of the specified partition column.
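For illustration, a hedged broker load sketch for this feature with a Spark-style partitioned path; the COLUMNS FROM PATH AS clause, label, paths and broker name are assumptions for this example and may not match the exact syntax of this version:
```
LOAD LABEL example_db.label_path_parse
(
    DATA INFILE("hdfs://host:port/path/to/dir/k1=1/*")
    INTO TABLE my_table
    COLUMNS TERMINATED BY ","
    (k2, k3)
    -- k1 is not stored in the file; its value "1" is parsed from the directory name k1=1
    COLUMNS FROM PATH AS (k1)
)
WITH BROKER 'my_broker';
```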
If strict mode is enabled and at least one row is filtered, the insert operation fails and a URL is returned for retrieving the error rows:
```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```
If all rows are valid, the insert returns OK with the number of affected rows:
```
Query OK, 1 row affected (0.26 sec)
```
If strict mode is disabled and at least one row is valid, the insert returns OK with the number of affected rows and warnings. If some rows contain errors, a label is also returned:
```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
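A minimal sketch of the two behaviors above, assuming strict mode for INSERT is toggled via the enable_insert_strict session variable (table and column names are illustrative):
```
SET enable_insert_strict = true;
-- If any row is filtered (e.g. a NULL written into a NOT NULL column),
-- the whole INSERT fails and an error URL is returned, as shown above.
INSERT INTO example_tbl (k1, v1) VALUES (1, "a"), (2, NULL);

SET enable_insert_strict = false;
-- With strict mode off, the valid rows are loaded and a warning plus a
-- label is returned for the filtered rows.
INSERT INTO example_tbl (k1, v1) VALUES (1, "a"), (2, NULL);
```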
The PathTrie could not distinguish different parameter keys under the same prefix path.
So the prefix of the table info APIs used by spark-doris-connector has been changed to /api/external.
* Broker load supports function
This commit adds support for column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
The old functions such as default_value and strftime remain compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
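A hedged sketch of a complete broker load statement using the column expression above; the label, HDFS path, table and broker name are illustrative:
```
LOAD LABEL example_db.label_column_func
(
    DATA INFILE("hdfs://host:port/path/to/file.csv")
    INTO TABLE my_table
    COLUMNS TERMINATED BY ","
    (tmp_c1, tmp_c2)
    -- derive column c1 from the two temporary columns read from the file
    SET (c1 = tmp_c1 + tmp_c2)
)
WITH BROKER 'my_broker';
```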
When creating a table with the OLAP engine, the user can specify multiple partition columns, e.g.:
PARTITION BY RANGE(`date`, `id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
Note that load via a Hadoop cluster does not support tables with multiple partition columns.
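For context, a hedged sketch of a complete CREATE TABLE using the multi-column range partition above; the column definitions, distribution and properties are illustrative:
```
CREATE TABLE example_db.multi_partition_tbl
(
    `date` DATE,
    `id` BIGINT,
    `cost` BIGINT SUM DEFAULT "0"
)
AGGREGATE KEY(`date`, `id`)
PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES ("replication_num" = "3");
```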
TabletQuorumFailedException is thrown in commitTxn when the number of successful replicas of a tablet is less than the quorum replica number.
Hadoop load does not handle this exception because the push task will retry later.
Streaming broker load, insert, stream load and mini load catch this exception and then abort the txn.
Use the same UUID as the query ID and load ID of a load execution plan.
Each load execution plan has a load ID and, as a plan, also a query ID.
Using the same UUID for both makes it easier to trace the load process.
Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise the BE cannot
distinguish the old and new load requests.
Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled, or
it may occupy a worker thread for a long time.
Remove unnecessary query reports when executing a load plan.
Only the last query report is needed.
Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPCs of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. The long timeout reduces the possibility of hitting the "fail to send batch" error when loading.
Use streaming_load_max_mb instead of mini_load_max_mb in BE config.
Add more logs for tracing a broker load process easily.
The miniLoadBegin function returns the txn_id.
If the backend sends a duplicated request to the frontend, the frontend returns the txn_id that was created by the same mini load.
The issue is that the frontend may return the txn_id before the previous identical request has begun the txn.
In that case the frontend returns zero, the initial txn_id, and the BE cannot execute the load plan with an invalid txn_id.
This commit combines `createLoadJob` and `execute` under the write lock, which protects the atomicity of `create` and `beginTxn`.
As a result, a duplicated request cannot get the txn_id before the previous identical request has finished.
Currently, GRANT_PRIV can only be granted at the global level, which means
it can only be granted on *.*. Granting it on db.* or db.tbl is not allowed.
This cannot meet the requirement of creating a user who has the privilege
to grant privileges to other users on a specified database or table, such as:
GRANT SELECT_PRIV ON db1.* TO cmy@'%';
So I extend the scope of GRANT_PRIV. Users can now grant GRANT_PRIV at the
database or even the table level, such as:
GRANT GRANT_PRIV ON db1.* TO cmy@'%';
And after being granted, the user cmy@'%' can now grant GRANT_PRIV on db1.* to
other users.
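A short sketch of the resulting workflow; the user bob@'%' is hypothetical:
```
-- An admin grants GRANT_PRIV at the database level:
GRANT GRANT_PRIV ON db1.* TO cmy@'%';
-- cmy@'%' can then grant privileges on db1.* to other users,
-- e.g. to the hypothetical user bob@'%':
GRANT SELECT_PRIV ON db1.* TO bob@'%';
```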
NOTE: This patch modifies all Backends' data, which causes a BE restart to take a very long time.
To avoid disrupting your production environment, upgrade the Backends one by one.
1. Refactor BE to clarify the code structure.
2. Use a unique id to identify a rowset. Naming a rowset with tablet_id and version
leads to many conflicts among compaction, clone and restore.
3. Extract a rowset interface to encapsulate rowsets
with different formats.
The new class 'AuthorizationInfo' is used to save the auth info in jobs.
The job no longer needs to retrieve the auth info by meta id, which may throw an exception when the db or table has been dropped or renamed.
The persistence of 'AuthorizationInfo' takes effect in META_VERSION 56.
Before the default insert operation was changed to streaming load, if the SELECT result
of an INSERT statement was empty, a label was still returned to the user, and the user
could use this label to check the insert load job's status.
After the change, if the SELECT result is empty, an exception
is thrown to the user client directly without any label.
This new usage pattern is not friendly to existing users, as it forces
them to change the way they use the insert operation.
So I add a new FE config 'using_old_load_usage_pattern', default false.
If set to true, a label is returned to the user even if the SELECT result is empty.
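A hedged illustration of the old usage pattern, assuming 'using_old_load_usage_pattern' is set to true (table names are illustrative):
```
-- The SELECT below matches no rows; with the config enabled, the INSERT
-- still returns a label instead of throwing an exception, and that label
-- can be used to check the insert load job's status.
INSERT INTO target_tbl SELECT * FROM source_tbl WHERE k1 < 0;
```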
* This commit introduces streaming mini load
The operation of streaming mini load is the same as before, and the user can still check the load via the frontend.
The difference is that streaming mini load finishes the task before the REST API replies, while the non-streaming version only registers a load.
* When upgrading Doris
Upgrading either FE or BE first is supported. After both FE and BE are upgraded, streaming mini load takes effect.
* For multi mini load
Multi mini load still uses the non-streaming mini load; its behavior has not been changed.
* Add an interface named isSupportedFunction
This function is used to protect the correctness of new features that span BE and FE during upgrading.
* Enhance usability
1. Add metrics to monitor transactions and the streaming load process in BE.
2. Change the BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Change the FE config 'enable_metric_calculator' to true.
4. Add more logs for tracing the broker load process.
5. Modify the query report process to cancel the query immediately if some instance fails.
* Fix bugs
1. Avoid NullPointerException when enabling colocate join with broker load
2. Return immediately when pull load task coordinator execution fails