doris

Author	SHA1	Message	Date
HuangWei	88a5429165	[FE] Add db&tbl info in broker load log (#3837 ) stream load log in FE has db & tbl info, broker load log should have too.	2020-06-12 20:54:41 +08:00
wyb	7f7ee63723	Move check hive table from SelectStmt to FromClause and update doc	2020-06-11 16:53:41 +08:00
EmmyMiao87	2ce2cf78ac	Remove unused import (#3826 ) Change-Id: Ic6ef5a0d372a9b17ffa21cffb9027d2d7e856474	2020-06-11 11:44:51 +08:00
yangzhg	cd402a6827	[Restore] Fix mismatched error message of restore job when the job times out (#3798 ) In the current code, a restore job that times out is reported as canceled by the user, which is very misleading.	2020-06-10 23:12:04 +08:00
EmmyMiao87	4cb5f7a535	[Config]Remove max_user_connections from config (#3805 ) Update max_user_connections by user property: ``` set property `user` max_user_connections=100; ```	2020-06-10 22:56:05 +08:00
wyb	4c2e73a5fe	Add hive external table and update hive table syntax in loadstmt	2020-06-10 16:32:32 +08:00
Mingyu Chen	4fa9d8cbe9	[Spark load][Fe 3/5] Fe create job (#3715 ) * Add create spark load job * Remove unused import	2020-06-09 21:57:46 +08:00
Mingyu Chen	5b1589498a	[Bug] Fix SchemaChangeJobV2's meta persist bug (#3804 ) 1. Missing field `partitionIndexMap` in SchemaChangeJobV2 2. Pair in field `indexSchemaVersionAndHashMap` can not be persisted by GSON 3. Exit the FE process when replay edit log error. Fix: #3802	2020-06-09 21:55:46 +08:00
Yunfeng,Wu	acd7a58875	[Doris On ES] [1/3] Add ES QueryBuilders for debug mode (#3774 )	2020-06-09 16:45:16 +08:00
Mingyu Chen	8ada2559b7	[Bug] Fix bug that checkpoint thread failed to start (#3795 ) 1. Set thread id before starting the checkpoint thread 2. Init the CHECKPOINT catalog instance before visiting it.	2020-06-08 23:00:36 +08:00
kangkaisen	928379c5d8	[Bug] Fix colocate group replay NPE (#3790 ) Group id should also be persisted for replaying	2020-06-07 10:20:22 +08:00
Mingyu Chen	ea5b3b2d4c	[Bug] Fix bug that should not use "!=" to judge the equivalence of Type (#3786 ) org.apache.doris.catalog.Type is not an enum, so should not judge the equivalence of Type using "==" or "!="	2020-06-06 11:38:32 +08:00
WingC	a7bf006b51	Use BackendStatus to show BE's information in `show backends;` (#3713 ) The information is displayed in JSON format. For example: {"lastTabletReportTime":"2020-05-28 15:29:01"}	2020-06-06 11:37:48 +08:00
Xiang Wei	c51f20bb7a	Disable Bitmap or Hll type in keys or in values with incorrect agg-type (#3768 ) Bitmap and Hll types can not be used with incorrect aggregate functions, which would cause BE to crash. Add some logical checks in FE's ColumnDef#analyze to avoid creating tables or changing schemas incorrectly. Keys can never be of bitmap or hll type. Values of bitmap or hll type have to be associated with bitmap_union or hll_union.	2020-06-06 11:36:06 +08:00
Mingyu Chen	173dd3953d	[Code Refactor] Remove Catalog.getInstance() method (#3784 ) Use Catalog.getCurrentCatalog() instead, to avoid potential meta operation error.	2020-06-06 11:35:01 +08:00
HangyuanLiu	4cbce687b7	Add getValueFn and removeFn to properties (#3782 )	2020-06-06 11:34:32 +08:00
Yunfeng,Wu	5abef19be4	[Doris On ES] Add more detailed error message when fail to create es table (#3758 )	2020-06-05 23:06:46 +08:00
yangzhg	cdd17333ba	Add some log to make it easier to find out bug (#3770 ) Added some logs to record which BE a query was sent to, making it more efficient to trace problems.	2020-06-05 10:18:58 +08:00
EmmyMiao87	0a748661c1	Fix the error selectedIndexId when keysType of table is UNIQUE (#3772 ) The unique table also should be compensated candidate index. The reason is the same as the agg table type. Fixed #3771. Change-Id: Ic04b0360a0b178cb0b6ee635e56f48852092ec09	2020-06-04 19:26:50 +08:00
lichaoyong	9b2cf1c18e	[Bug] Clear Txn when load is cancelled (#3766 ) If a load task encounters an error, it will be cancelled. At this time, FE will clear the Txn according to the DbName. In FE, the DbName should include the cluster name. If the cluster name is missing, a NullPointerException occurs. As a result, the Txn will still exist until timeout.	2020-06-04 18:18:37 +08:00
Mingyu Chen	27046c5b61	[Enhancement] Improve the performance of query with IN predicate (#3694 ) This CL mainly changes: 1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to storage engine. 2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num` which can be set at session level and override the config value in BE.	2020-06-04 11:39:00 +08:00
Mingyu Chen	fc33ee3618	[Plugin] Add timeout of connection when downloading the plugins from URL (#3755 ) If no timeout is set, the download process may be blocked forever.	2020-06-04 11:37:18 +08:00
yangzhg	a8c95e7369	[Bug] Fix BinaryPredicate's equals function ignoring op (#3753 ) BinaryPredicate's equals function compares by opcode, but the opcode may not be initialized yet, so it returns true whenever the children are the same. For example, `a>1` and `a<1` are considered equal.	2020-06-04 09:29:19 +08:00
wyb	7f6a7c6807	Remove unused import	2020-06-03 22:32:52 +08:00
wangbo	7f6271c637	[Bug]Fix Query failed when fact table has no data in join case (#3604 ) Major work: 1. Correct the value of `numNodes` and `cardinality` when `OlapTableScan` computeStats so that the `broadcast cost` and `partition join cost` can be calculated correctly. 2. Find an input fragment with higher parallelism for the shuffle fragment to assign backends	2020-06-03 22:01:55 +08:00
wyb	edfa6683fc	Add create spark load job	2020-06-03 21:27:27 +08:00
wyb	ad7270b7ca	[Spark load][Fe 1/5] Add spark etl job config (#3712 ) Add spark etl job config, includes: 1. Schema of the load tables, including columns, partitions and rollups 2. Infos of the source file, including split rules, corresponding columns, and conversion rules 3. ETL output directory and file name format 4. Job properties 5. Version for further extension	2020-06-03 11:23:09 +08:00
yangzhg	3194aa129d	Add a link to Tablet Meta URL (#3745 )	2020-06-03 10:10:32 +08:00
HappenLee	761a0ccd12	[Bug] Fix bug that runningprofile show time problem in FE web page and add the runingprofile doc (#3722 )	2020-06-02 11:07:15 +08:00
EmmyMiao87	30df9fcae9	Serialize origin stmt in Rollup Job and MV Meta (#3705 ) * Serialize origin stmt in Rollup Job and MV Meta In materialized view 2.0, the define expr is serialized in column. The method is that Doris serializes the origin stmt of Create Materialized View Stmt in RollupJobV2 and MVMeta. The define expr will be extracted from the origin stmt after meta is deserialized. The define expr is necessary for bitmap and hll materialized view. For example: MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) Origin stmt: select bitmap_union(to_bitmap(k1)) from table Deserialize meta: __doris_mv_bitmap_k1, bitmap_union, null After extract: the define expr `to_bitmap(k1)` from origin stmt should be extracted. __doris_mv_bitmap_v1, bitmap_union, to_bitmap(k1) (which comes from the origin stmt) Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7 * Add comment of read method Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8 * Fix ut Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4 * Change code style Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9	2020-05-30 20:17:46 +08:00
Binglin Chang	5cb4063904	Fix UT ThreadPoolManagerTest failure (#3723 )	2020-05-30 10:35:07 +08:00
Binglin Chang	c967eaf496	[Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668 )	2020-05-29 20:20:44 +08:00
lichaoyong	5f1d25a31a	[Bug] Set the HttpResponseStatus for QueryProfile when query_id been not set (#3710 ) Doris can get query profile by HttpRequest ``` http://fe_host:web_port/query_profile?query_id=123456 ``` Now, if query_id is not found, the 404 error is not set in HttpHeader.	2020-05-29 10:06:43 +08:00
Mingyu Chen	bc35f3a31f	[DynamicPartition] Optimize the rule of creating dynamic partition (#3679 ) Problem is described in ISSUE #3678 This CL mainly changes the rule of creating dynamic partitions. 1. If time unit is DAY, the logic remains unchanged. 2. If time unit is WEEK, the logic changes as follows: 1. Allow setting the start day of every week. The default is Monday, and any day from Monday to Sunday can be chosen. 2. Assuming that the starting day is a Tuesday, the range of the partition is Tuesday of the week to Monday of the next week. 3. If time unit is MONTH, the logic changes as follows: 1. Allow setting the start date of each month. The default is the 1st, and any date from the 1st to the 28th can be chosen. 2. Assuming that the starting date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month. 4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month. It is recommended to refer to the example in `dynamic-partition.md` to understand. TODO: Better to support HOUR and YEAR time unit. Maybe in next PR. FIX: #3678	2020-05-27 16:42:41 +08:00
lichaoyong	1cc78fe69b	[Enhancement] Convert metric to Json format (#3635 ) Add a JSON format for existing metrics like this. ``` { "tags": { "metric":"thread_pool", "name":"thrift-server-pool", "type":"active_thread_num" }, "unit":"number", "value":3 } ``` I add a new JsonMetricVisitor to handle the transformation. It does not modify the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor. Also I add 1. A unit item to indicate the metric better 2. Cloning tablet statistics divided by database. 3. Use white space to replace newline in audit.log	2020-05-27 08:49:30 +08:00
Mingyu Chen	dcd5e5df12	[AuditPlugin] Modify load label of audit plugin to avoid load conflicts (#3681 ) Change the load label of audit plugin to: `audit_yyyyMMdd_HHmmss_feIdentity`. The `feIdentity` is obtained from the FE which runs this plugin; currently it just uses the FE's IP_editlog_port.	2020-05-26 18:23:07 +08:00
wyb	4978bd6c81	[Spark load] Add resource manager (#3418 ) 1. User interface: 1.1 Spark resource management Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources that will be used in Doris, for example, MapReduce is used for ETL, Spark/GPU is used for queries, HDFS/S3 is used for external storage. We introduced resource management to manage these external resources used by Doris. ```sql -- create spark resource CREATE EXTERNAL RESOURCE resource_name PROPERTIES ( type = spark, spark_conf_key = spark_conf_value, working_dir = path, broker = broker_name, broker.property_key = property_value ) -- drop spark resource DROP RESOURCE resource_name -- show resources SHOW RESOURCES SHOW PROC "/resources" -- privileges GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name ``` - CREATE EXTERNAL RESOURCE: FOR user_name is optional. If it is specified, the external resource belongs to this user. If not, the external resource belongs to the system and is available to all users. PROPERTIES: 1. type: resource type. Only spark is supported now. 2. spark configuration: follow the standard writing of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html. 3. working_dir: optional, used to store ETL intermediate results in spark ETL. 4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.
Example: ```sql CREATE EXTERNAL RESOURCE "spark0" PROPERTIES ( "type" = "spark", "spark.master" = "yarn", "spark.submit.deployMode" = "cluster", "spark.jars" = "xxx.jar,yyy.jar", "spark.files" = "/tmp/aaa,/tmp/bbb", "spark.yarn.queue" = "queue0", "spark.executor.memory" = "1g", "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999", "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000", "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris", "broker" = "broker0", "broker.username" = "user0", "broker.password" = "password0" ) ``` - SHOW RESOURCES: General users can only see their own resources. Admin and root users can show all resources. 1.2 Create spark load job ```sql LOAD LABEL db_name.label_name ( DATA INFILE ("/tmp/file1") INTO TABLE table_name, ... ) WITH RESOURCE resource_name [(key1 = value1, ...)] [PROPERTIES (key2 = value2, ... )] ``` Example: ```sql LOAD LABEL example_db.test_label ( DATA INFILE ("hdfs://127.0.0.1:10000/tmp/file1") INTO TABLE example_table ) WITH RESOURCE "spark0" ( "spark.executor.memory" = "1g", "spark.files" = "/tmp/aaa,/tmp/bbb" ) PROPERTIES ("timeout" = "3600") ``` The spark configurations in load stmt can override the existing configuration in the resource for temporary use. #3010	2020-05-26 18:21:21 +08:00
Mingyu Chen	77b9acc242	[Stmt] Add rowCount column to SHOW DATA stmt (#3676 ) User can see the row count of all materialized indexes of a table. ``` mysql> show data from test; +-----------+-----------+-----------+--------------+----------+ \| TableName \| IndexName \| Size \| ReplicaCount \| RowCount \| +-----------+-----------+-----------+--------------+----------+ \| test2 \| r1 \| 10.000MB \| 30 \| 10000 \| \| \| r2 \| 20.000MB \| 30 \| 20000 \| \| \| test2 \| 50.000MB \| 30 \| 50000 \| \| \| Total \| 80.000 \| 90 \| \| +-----------+-----------+-----------+--------------+----------+ ``` Fix #3675	2020-05-26 15:53:38 +08:00
EmmyMiao87	aa4ac2d078	[Bug] Serialize storage format in rollup job (#3686 ) The segment v2 rollup job should set the storage format v2 and serialize it. If it is not serialized, the rollup of segment v2 may use the wrong format 'segment v1'.	2020-05-26 15:35:12 +08:00
Mingyu Chen	3ffc447b38	[OUTFILE] Support `INTO OUTFILE` to export query result (#3584 ) This CL mainly changes: 1. Support `SELECT INTO OUTFILE` command. 2. Support export query result to a file via Broker. 3. Support CSV export format with specified column separator and line delimiter.	2020-05-25 21:24:56 +08:00
caiconghui	e6864a1cda	Allow user to set thrift_client_timeout_ms config for thrift server (#3670 ) 1. Allow user to set thrift_client_timeout_ms config for thrift server 2. Add doc for thrift_client_timeout_ms config	2020-05-25 11:32:14 +08:00
HangyuanLiu	2608f83bdc	[WIP] Add define expr for column (#3651 ) In materialized view 2.0, the define expr should be set in the column. For example, the to_bitmap function on an integer should be defined in the mv column. ``` create materialized view mv as select bitmap_union(to_bitmap(k1)) from table. ``` The meta of the mv is as follows: column name: __doris_materialized_view_bitmap_k1 column aggregate type: bitmap_union column define expr: to_bitmap(k1) This is a WIP PR for materialized view 2.0. #3344	2020-05-25 11:00:29 +08:00
Mingyu Chen	ec955b8a36	[Bug] Fix bug that runningTxnNum does not equal to the real running txn num. (#3674 ) This is because the logic for modifying the number of running transactions is wrong: we did not persist the previous status (preStatus) of a transaction. Therefore, when replaying the metadata log, we cannot decide whether to modify the `runningTxnNum` value based on `preStatus`. This info is lost.	2020-05-25 10:41:38 +08:00
HangyuanLiu	838c1e9212	Modify HLL functions return type (#3656 ) 1. Modify hll_hash function return type to HLL 2. Make HLL_RAW_AGG an alias of HLL_UNION	2020-05-24 21:22:43 +08:00
Mingyu Chen	ef9c716682	[Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal (#3662 )	2020-05-22 23:04:47 +08:00
Mingyu Chen	1124808fbc	[Enhancement] Add detail msg to show the reason of publish failure. (#3647 ) Add 2 new columns `PublishTime` and `ErrMsg` to show the publish version time and errors that happen during the transaction process. Can be seen by executing: `SHOW PROC "/transactions/dbId/";` or `SHOW TRANSACTION WHERE ID=xx;` Currently it only records errors that happen in the publish phase, which can help us to find out which txn is blocked. Fix #3646	2020-05-22 22:59:53 +08:00
yangzhg	00d563d014	[SQL] Support more syntax in case when clause (#3625 ) Support more syntax in case-when clause with subquery. Support queries like ` case when k1 > subquery1 and k2 < subquery2 then ... else ... ` or `case when subquery in null then ...`	2020-05-22 10:22:59 +08:00
morningman	74fb1b830a	[Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal	2020-05-22 10:01:34 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris support load json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
EmmyMiao87	0d66e6bd15	Support bitmap_intersect (#3571 ) * Support bitmap_intersect Support aggregate function Bitmap Intersect, it is mainly used to take intersection of grouped data. The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object. The definition is as follows: FunctionName: bitmap_intersect, InputType: bitmap, OutputType: bitmap The scenario is as follows: Query which users satisfy the three tags a, b, and c at the same time. ``` select bitmap_to_string(bitmap_intersect(user_id)) from ( select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag ) a ``` Closed #3552. * Add docs of bitmap_union and bitmap_intersect * Support null of bitmap_intersect	2020-05-20 21:12:02 +08:00
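
Note on the [DynamicPartition] entry (#3679) in the list above: the WEEK and MONTH rules it describes can be illustrated with a small sketch. This is not Doris's implementation, only the date arithmetic implied by the commit message. The function names and the inclusive (first day, last day) return shape are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Tuple


def week_partition_range(day: date, start_of_week: int = 1) -> Tuple[date, date]:
    """Inclusive (first, last) days of the weekly partition containing `day`.

    start_of_week uses ISO numbering: 1 = Monday ... 7 = Sunday (assumed encoding).
    """
    if not 1 <= start_of_week <= 7:
        raise ValueError("start_of_week must be in 1..7")
    # Days elapsed since the most recent configured start day.
    offset = (day.isoweekday() - start_of_week) % 7
    first = day - timedelta(days=offset)
    return first, first + timedelta(days=6)


def month_partition_range(day: date, start_of_month: int = 1) -> Tuple[date, date]:
    """Inclusive (first, last) days of the monthly partition containing `day`.

    start_of_month is limited to 1..28, as in the commit message.
    """
    if not 1 <= start_of_month <= 28:
        raise ValueError("start_of_month must be in 1..28")
    if day.day >= start_of_month:
        first = day.replace(day=start_of_month)
    else:
        # The partition began in the previous month.
        last_of_prev = day.replace(day=1) - timedelta(days=1)
        first = last_of_prev.replace(day=start_of_month)
    # The next partition starts on the same day number one month later.
    if first.month == 12:
        next_first = first.replace(year=first.year + 1, month=1)
    else:
        next_first = first.replace(month=first.month + 1)
    return first, next_first - timedelta(days=1)
```

With a Tuesday start, a Wednesday falls in the partition running from the preceding Tuesday to the following Monday; with a start date of the 2nd, the 1st of a month still belongs to the partition that began on the 2nd of the previous month.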


5755 Commits