doris

Author	SHA1	Message	Date
Mingyu Chen	173dd3953d	[Code Refactor] Remove Catalog.getInstance() method (#3784 ) Use Catalog.getCurrentCatalog() instead, to avoid potential meta operation error.	2020-06-06 11:35:01 +08:00
HangyuanLiu	4cbce687b7	Add getValueFn and removeFn to properties (#3782 )	2020-06-06 11:34:32 +08:00
HangyuanLiu	0f6e74f3f9	[BUG] Fix location url in agg_fn_evaluator (#3780 )	2020-06-06 11:34:12 +08:00
Yunfeng,Wu	5abef19be4	[Doris On ES] Add more detailed error message when fail to create es table (#3758 )	2020-06-05 23:06:46 +08:00
Xiang Wei	ed9022a908	Ignore broken disk when BE starts up (#3741 )	2020-06-05 10:26:07 +08:00
xy720	73719f263d	Fix document (#3773 )	2020-06-05 10:19:17 +08:00
yangzhg	cdd17333ba	Add some logs to make it easier to track down bugs (#3770 ) Added logs that record which BE a query was sent to, making problems easier to trace.	2020-06-05 10:18:58 +08:00
wyb	fdf3415d06	[Website] Fix CREATE RESOURCE sidebar text and link not right bug (#3777 )	2020-06-05 09:20:36 +08:00
EmmyMiao87	0a748661c1	Fix the wrong selectedIndexId when keysType of table is UNIQUE (#3772 ) Candidate indexes should also be compensated for unique tables, for the same reason as for the agg table type. Fixed #3771. Change-Id: Ic04b0360a0b178cb0b6ee635e56f48852092ec09	2020-06-04 19:26:50 +08:00
lichaoyong	9b2cf1c18e	[Bug] Clear Txn when a load is cancelled (#3766 ) If a load task encounters an error, it is cancelled. FE then clears the Txn according to the DbName. In FE, the DbName must include the cluster name. If the cluster name is missing, a NullPointerException is thrown and the Txn remains until it times out.	2020-06-04 18:18:37 +08:00
Yunfeng,Wu	484e7de3c5	[Doris On ES] Fix bug for sparse docvalue context and remove the mistaken usage of total (#3751 ) An earlier PR, https://github.com/apache/incubator-doris/pull/3513 (https://github.com/apache/incubator-doris/issues/3479), tried to resolve the `inner hits node is not an array` error. The error occurs when a query (batch-size) runs against a new segment that lacks the field: because filter_path only takes `hits.hits.fields` and `hits.hits._source` into account, the inner hits node is null: ``` { "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2", "hits": { "total": 1 } } ``` Unfortunately, that PR introduced another serious problem: results are inconsistent across different batch_size values because `total` is misused. To avoid both problems, we add `hits.hits._score` to filter_path when `docvalue_mode` is true. `_score` is always `null`, and it populates the inner hits node: ``` { "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2", "hits": { "total": 1, "hits": [ { "_score": null } ] } } ``` Related issue: https://github.com/apache/incubator-doris/issues/3752	2020-06-04 16:31:18 +08:00
caiconghui	01c1de1870	[Load] Add more metric to trace the time cost in stream load and make brpc_num_threads configurable (#3703 )	2020-06-04 13:37:28 +08:00
Mingyu Chen	27046c5b61	[Enhancement] Improve the performance of query with IN predicate (#3694 ) This CL mainly changes: 1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine. 2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at session level and override the config value in BE.	2020-06-04 11:39:00 +08:00
Mingyu Chen	fc33ee3618	[Plugin] Add timeout of connection when downloading the plugins from URL (#3755 ) If no timeout is set, the download process may be blocked forever.	2020-06-04 11:37:18 +08:00
Mingyu Chen	791f8fee49	[Bug][Outfile] Fix bug that column separator is missing in output file. (#3765 ) When outputting the result of a query using the `OUTFILE` statement, if some output column is null, the column separator that follows it is missing.	2020-06-04 10:35:32 +08:00
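The separator rule described in the OUTFILE fix above can be illustrated with a minimal sketch. This is not the Doris implementation; the function name `format_row` and the `\N` placeholder for NULL are assumptions made for the example.

```python
def format_row(values, column_separator="\t", null_repr="\\N"):
    """Join one result row into an output line.

    Hypothetical helper: a NULL (None) column still occupies a field,
    so the separator after it is never dropped.
    """
    fields = [null_repr if v is None else str(v) for v in values]
    return column_separator.join(fields)
```

For a row of three columns the output always contains two separators, whether or not the middle column is NULL.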
yangzhg	a8c95e7369	[Bug] Fix BinaryPredicate's equals function ignoring op (#3753 ) BinaryPredicate's equals function compares by opcode, but the opcode may not be initialized yet, so it returns true whenever the children are the same. For example, `a>1` and `a<1` are considered equal.	2020-06-04 09:29:19 +08:00
wangbo	7f6271c637	[Bug]Fix Query failed when fact table has no data in join case (#3604 ) major work 1. Correct the value of ```numNodes``` and ```cardinality``` when ```OlapTableScan``` computeStats so that the ```broadcast cost``` and ```partition join cost``` can be calculated correctly. 2. Find an input fragment with higher parallelism for shuffle fragment to assign backend	2020-06-03 22:01:55 +08:00
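A minimal sketch of the equality rule the commit above describes, assuming a simplified predicate made of an operator and two operands. The class below is hypothetical and is not the FE's Java class; it only shows that the operator must take part in the comparison.

```python
class BinaryPredicate:
    """Hypothetical, simplified predicate: <left> <op> <right>."""

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def __eq__(self, other):
        if not isinstance(other, BinaryPredicate):
            return NotImplemented
        # Compare the operator itself, not a lazily initialized opcode,
        # so that `a>1` and `a<1` are never treated as equal.
        return (self.op, self.left, self.right) == (
            other.op, other.left, other.right)

    def __hash__(self):
        return hash((self.op, self.left, self.right))
```

With this definition two predicates are equal only when both the operator and the children match.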
Mingyu Chen	2ad1b20b24	[Config] Add new BE config for tcmalloc (#3732 ) Add a new BE config tc_max_total_thread_cache_bytes	2020-06-03 21:58:13 +08:00
Yingchun Lai	73c3de4313	[refactor] Simple refactor on class Reader (#3691 ) This is a simple refactor patch on class Reader without any functional changes. Main refactor points: - Remove some useless return value - Use range loop - Use empty() instead of size() for some STL containers size judgement - Use in-class initialization instead of initialize in constructor function - Some other small refactor	2020-06-03 19:55:53 +08:00
HuangWei	ed886a485d	[HttpServer] capture convert exception (#3736 ) If parameter str is an empty string, it will throw exception too. Maybe we can add an ut for parsing parameters in http server.	2020-06-03 19:54:41 +08:00
EmmyMiao87	e16873a6c1	Fix large string val allocation failure (#3724 ) A large bitmap needs StringVal to allocate memory larger than MAX_INT. The overflow causes bitmap serialization to fail. Fixed #3600	2020-06-03 17:07:54 +08:00
Binglin Chang	70aa9d6ca8	[Memory Engine] Add MemTabletScan (#3734 )	2020-06-03 15:42:38 +08:00
wyb	ad7270b7ca	[Spark load][Fe 1/5] Add spark etl job config (#3712 ) Add spark etl job config, includes: 1. Schema of the load tables, including columns, partitions and rollups 2. Infos of the source file, including split rules, corresponding columns, and conversion rules 3. ETL output directory and file name format 4. Job properties 5. Version for further extension	2020-06-03 11:23:09 +08:00
yangzhg	3194aa129d	Add a link to Tablet Meta URL (#3745 )	2020-06-03 10:10:32 +08:00
HangyuanLiu	60f93b2142	Fix bitmap type (#3749 )	2020-06-03 10:07:58 +08:00
HappenLee	761a0ccd12	[Bug] Fix the time display problem of the running profile in FE web page and add the running profile doc (#3722 )	2020-06-02 11:07:15 +08:00
HuangWei	fdf66b8102	[MemTracker] add log depth & auto unregister (#3701 )	2020-06-01 23:16:25 +08:00
Mingyu Chen	ee260d5721	[Bug][FsBroker] NPE throw when username is empty (#3731 ) When using Broker with an empty username, a NPE is thrown, which is not expected.	2020-06-01 21:03:21 +08:00
wyb	8e71c0787c	[Spark load][Fe 2/5] Update push task thrift interface (#3718 ) 1. Add TBrokerScanRange and TDescriptorTable used by ParquetScanner 2. Add new TPushType LOAD_V2 for spark load	2020-06-01 18:21:43 +08:00
EmmyMiao87	30df9fcae9	Serialize origin stmt in Rollup Job and MV Meta (#3705 ) * Serialize origin stmt in Rollup Job and MV Meta In materialized view 2.0, the define expr is serialized in the column. Doris serializes the origin stmt of the Create Materialized View Stmt in RollupJobV2 and MVMeta, and the define expr is extracted from the origin stmt after the meta is deserialized. The define expr is necessary for bitmap and hll materialized views. For example: MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) Origin stmt: select bitmap_union(to_bitmap(k1)) from table Deserialize meta: __doris_mv_bitmap_k1, bitmap_union, null After extract: the define expr `to_bitmap(k1)` is extracted from the origin stmt, giving __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7 * Add comment of read method Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8 * Fix ut Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4 * Change code style Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9	2020-05-30 20:17:46 +08:00
Binglin Chang	5cb4063904	Fix UT ThreadPoolManagerTest failure (#3723 )	2020-05-30 10:35:07 +08:00
Yingchun Lai	43d25afa2c	[compaction] Update cumulative point calculate algorithm (#3690 ) The current cumulative point calculation algorithm may skip a singleton rowset when the rowset has only one segment and carries the NONOVERLAPPING flag. When a tablet is newly created and accumulates many singleton rowsets, the cumulative point is calculated as the max version + 1, so cumulative compaction cannot pick any rowsets and fails. The next base compaction on this tablet then covers all rowsets, which can also cause a memory consumption problem when there are thousands of rowsets. All singleton rowsets must have been newly written by the delta writer and have not undergone any compaction, so we should place the cumulative point before any of these rowsets.	2020-05-30 10:34:53 +08:00
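The placement rule in the compaction commit above can be sketched roughly as follows. This is a simplified illustration, not the BE algorithm: rowsets are modelled as hypothetical `(start_version, end_version)` tuples, a singleton is assumed to be a rowset whose start and end versions are equal, and details such as version gaps or delete predicates are ignored.

```python
def cumulative_point(rowsets):
    """Return the version at which cumulative compaction should start.

    rowsets: iterable of (start_version, end_version) tuples (hypothetical model).
    The point is placed before the first singleton rowset, so that no
    newly written singleton rowset is skipped.
    """
    point = 0
    for start, end in sorted(rowsets):
        if start == end:
            # First singleton rowset: stop here, it has never been compacted.
            return start
        # Already-compacted range: move the point past it.
        point = end + 1
    return point
```

With rowsets `[0-5], [6-6], [7-7]` the sketch yields 6, so both singleton rowsets remain candidates for cumulative compaction instead of being left to base compaction.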
Binglin Chang	7524c5ef63	[Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637 )	2020-05-30 10:33:10 +08:00
Binglin Chang	c967eaf496	[Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668 )	2020-05-29 20:20:44 +08:00
lichaoyong	93aae6bdff	[Bug] Fix mixed use of counters (#3720 ) MysqlResultWriter's _sent_rows_counter and _result_send_timer are mixed up, which results in a core dump when checking counter->type().	2020-05-29 15:36:21 +08:00
lichaoyong	5f1d25a31a	[Bug] Set the HttpResponseStatus for QueryProfile when query_id is not set (#3710 ) Doris can get a query profile by HttpRequest ``` http://fe_host:web_port/query_profile?query_id=123456 ``` Currently, if the query_id is not found, the 404 error is not set in the HttpHeader.	2020-05-29 10:06:43 +08:00
Dayue Gao	9c85d05e41	[Bug] RuntimeState should be destructed after DataSink (#3709 ) Fixes #3706 DataSink uses instance and query MemTracker from RuntimeState, therefore it should be destructed before RuntimeState. Otherwise memory corruption and segfault could happen.	2020-05-28 17:31:01 +08:00
worker24h	e76f712bb3	[Bug] Load data is error in json load	2020-05-28 17:28:33 +08:00
lichaoyong	8f71c7a331	Duplicate Key table core when predicate on metric column (#3699 ) ``` CREATE TABLE `query_detail` ( `query_id` varchar(100) NULL COMMENT "", `start_time` datetime NULL COMMENT "", `end_time` datetime NULL COMMENT "", `latency` int(11) NULL COMMENT "unit is milliseconds", `state` varchar(20) NULL COMMENT "RUNNING/FINISHED/FAILED", `sql` varchar(1024) NULL COMMENT "" ) DUPLICATE KEY(`query_id`) SELECT COUNT(*) FROM query_detail WHERE start_time >= '2020-05-27 14:52:16' AND start_time < '2020-05-27 14:52:31'; ``` The above query cores because the ZoneMap exists only for query_id. Using start_time to match the ZoneMap causes the core.	2020-05-28 14:35:40 +08:00
Mingyu Chen	f89d970cfd	[Bug][Metrics] Fix bug that some metrics cannot be fetched (#3708 ) The metrics in a metric collector need to have the same type, but they do not need to have the same unit.	2020-05-28 09:09:14 +08:00
Mingyu Chen	bc35f3a31f	[DynamicPartition] Optimize the rule of creating dynamic partition (#3679 ) The problem is described in ISSUE #3678. This CL mainly changes the rule of creating dynamic partitions. 1. If the time unit is DAY, the logic remains unchanged. 2. If the time unit is WEEK, the logic changes as follows: 1. The start day of every week can be set, from Monday to Sunday. The default is Monday. 2. Assuming that the start day is Tuesday, the range of the partition is from Tuesday of this week to Monday of the next week. 3. If the time unit is MONTH, the logic changes as follows: 1. The start date of each month can be set, from the 1st to the 28th. The default is the 1st. 2. Assuming that the start date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month. 4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of the week or month. It is recommended to refer to the example in `dynamic-partition.md`. TODO: Better to support HOUR and YEAR time units. Maybe in the next PR. FIX: #3678	2020-05-27 16:42:41 +08:00
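The WEEK and MONTH rules in the dynamic partition commit above can be worked through with a small sketch. The function names and the inclusive `(first_day, last_day)` return shape are assumptions for illustration; the actual FE logic and its partition naming are not shown here.

```python
from datetime import date, timedelta


def week_partition_range(today, start_day_of_week=1):
    """Range of the weekly partition containing `today`.

    start_day_of_week: 1 = Monday ... 7 = Sunday (hypothetical encoding).
    Returns (first_day, last_day), both inclusive.
    """
    if not 1 <= start_day_of_week <= 7:
        raise ValueError("start_day_of_week must be in 1..7")
    offset = (today.isoweekday() - start_day_of_week) % 7
    first = today - timedelta(days=offset)
    return first, first + timedelta(days=6)


def month_partition_range(today, start_day_of_month=1):
    """Range of the monthly partition containing `today`.

    start_day_of_month is limited to 1..28 so it exists in every month.
    Returns (first_day, last_day), both inclusive.
    """
    if not 1 <= start_day_of_month <= 28:
        raise ValueError("start_day_of_month must be in 1..28")
    if today.day >= start_day_of_month:
        first = today.replace(day=start_day_of_month)
    else:
        # `today` falls before the start date, so the partition began last month.
        last_of_prev_month = today.replace(day=1) - timedelta(days=1)
        first = last_of_prev_month.replace(day=start_day_of_month)
    if first.month == 12:
        next_first = date(first.year + 1, 1, start_day_of_month)
    else:
        next_first = date(first.year, first.month + 1, start_day_of_month)
    return first, next_first - timedelta(days=1)
```

For example, with Tuesday as the start day, Wednesday 2020-05-27 falls in the partition running from Tuesday 2020-05-26 to Monday 2020-06-01; with the 2nd as the start date, it falls in the partition running from 2020-05-02 to 2020-06-01.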
lichaoyong	1cc78fe69b	[Enhancement] Convert metric to Json format (#3635 ) Add a JSON format for existing metrics like this. ``` { "tags": { "metric":"thread_pool", "name":"thrift-server-pool", "type":"active_thread_num" }, "unit":"number", "value":3 } ``` A new JsonMetricVisitor handles the transformation. The existing PrometheusMetricVisitor and SimpleCoreMetricVisitor are not modified. This change also: 1. Adds a unit item to describe the metric better. 2. Divides cloning tablet statistics by database. 3. Uses white space to replace newlines in audit.log.	2020-05-27 08:49:30 +08:00
turbo jason	12c59ba889	[Thirdparty][glog][bug] convert init be log file length use fopen function (#3649 )	2020-05-26 22:42:50 +08:00
worker24h	fb66bac5fe	[Bug] Fix null pointer access in json-load (#3692 ) Add check for null pointer to avoid core dump	2020-05-26 22:41:30 +08:00
Mingyu Chen	dcd5e5df12	[AuditPlugin] Modify load label of audit plugin to avoid load conflicts (#3681 ) Change the load label of the audit plugin to `audit_yyyyMMdd_HHmmss_feIdentity`. The `feIdentity` is obtained from the FE that runs this plugin. Currently it is just the FE's IP_editlog_port.	2020-05-26 18:23:07 +08:00
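As a rough illustration of the label format in the audit plugin commit above, the sketch below builds `audit_yyyyMMdd_HHmmss_feIdentity` from a timestamp and an FE address. The function name and parameters are hypothetical, and any sanitizing of the IP that the plugin may perform is not modelled.

```python
from datetime import datetime


def audit_load_label(now, fe_ip, edit_log_port):
    """Build a load label that is unique per FE and per second.

    Hypothetical helper: feIdentity is taken to be "<IP>_<editlog_port>".
    """
    fe_identity = f"{fe_ip}_{edit_log_port}"
    return f"audit_{now.strftime('%Y%m%d_%H%M%S')}_{fe_identity}"
```

Because the FE identity is part of the label, two FEs that flush audit logs in the same second no longer produce the same label.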
wyb	4978bd6c81	[Spark load] Add resource manager (#3418 ) 1. User interface: 1.1 Spark resource management Spark is used as an external computing resource in Doris to do ETL work. In the future, other external resources may be used in Doris, for example, MapReduce for ETL, Spark/GPU for queries, and HDFS/S3 for external storage. We introduced resource management to manage these external resources used by Doris. ```sql -- create spark resource CREATE EXTERNAL RESOURCE resource_name PROPERTIES ( type = spark, spark_conf_key = spark_conf_value, working_dir = path, broker = broker_name, broker.property_key = property_value ) -- drop spark resource DROP RESOURCE resource_name -- show resources SHOW RESOURCES SHOW PROC "/resources" -- privileges GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name ``` - CREATE EXTERNAL RESOURCE: FOR user_name is optional. If present, the external resource belongs to this user. If not, the external resource belongs to the system and is available to all users. PROPERTIES: 1. type: resource type. Only spark is supported now. 2. spark configuration: follows the standard form of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html. 3. working_dir: optional, used to store ETL intermediate results in spark ETL. 4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE. Example: ```sql CREATE EXTERNAL RESOURCE "spark0" PROPERTIES ( "type" = "spark", "spark.master" = "yarn", "spark.submit.deployMode" = "cluster", "spark.jars" = "xxx.jar,yyy.jar", "spark.files" = "/tmp/aaa,/tmp/bbb", "spark.yarn.queue" = "queue0", "spark.executor.memory" = "1g", "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999", "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000", "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris", "broker" = "broker0", "broker.username" = "user0", "broker.password" = "password0" ) ``` - SHOW RESOURCES: General users can only see their own resources. Admin and root users can see all resources. 1.2 Create spark load job ```sql LOAD LABEL db_name.label_name ( DATA INFILE ("/tmp/file1") INTO TABLE table_name, ... ) WITH RESOURCE resource_name [(key1 = value1, ...)] [PROPERTIES (key2 = value2, ... )] ``` Example: ```sql LOAD LABEL example_db.test_label ( DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table ) WITH RESOURCE "spark0" ( "spark.executor.memory" = "1g", "spark.files" = "/tmp/aaa,/tmp/bbb" ) PROPERTIES ("timeout" = "3600") ``` The spark configurations in the load stmt can override the existing configuration in the resource for temporary use. #3010	2020-05-26 18:21:21 +08:00
Mingyu Chen	77b9acc242	[Stmt] Add rowCount column to SHOW DATA stmt (#3676 ) User can see the row count of all materialized indexes of a table. ``` mysql> show data from test; +-----------+-----------+-----------+--------------+----------+ | TableName | IndexName | Size | ReplicaCount | RowCount | +-----------+-----------+-----------+--------------+----------+ | test2 | r1 | 10.000MB | 30 | 10000 | | | r2 | 20.000MB | 30 | 20000 | | | test2 | 50.000MB | 30 | 50000 | | | Total | 80.000 | 90 | | +-----------+-----------+-----------+--------------+----------+ ``` Fix #3675	2020-05-26 15:53:38 +08:00
EmmyMiao87	aa4ac2d078	[Bug] Serialize storage format in rollup job (#3686 ) The segment v2 rollup job should set the storage format to v2 and serialize it. If it is not serialized, the rollup of segment v2 may use the wrong format 'segment v1'.	2020-05-26 15:35:12 +08:00
HappenLee	f4c03fe8e2	1. Delete the code of Sort Node that is no longer used. (#3666 ) Optimize the quick sort with find_the_median and try to reduce the recursion level of quick sort.	2020-05-26 10:20:57 +08:00
hffariel	963d4d48aa	Override the style of sidebar's sub-directory (#3683 )	2020-05-26 09:07:55 +08:00


1952 Commits