Commit Graph

1939 Commits

Author SHA1 Message Date
fc33ee3618 [Plugin] Add timeout of connection when downloading the plugins from URL (#3755)
If no timeout is set, the download process may be blocked forever.
2020-06-04 11:37:18 +08:00
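The fix described above amounts to passing a timeout when opening the connection. A minimal Python sketch of the idea (the actual plugin downloader lives in Doris's Java FE; `download_plugin` and the 30-second default here are illustrative):

```python
import shutil
import urllib.request

def download_plugin(url: str, dest: str, timeout_sec: float = 30.0) -> None:
    # Without timeout=, a stalled server can block this read forever.
    with urllib.request.urlopen(url, timeout=timeout_sec) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```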
791f8fee49 [Bug][Outfile] Fix bug that column separator is missing in output file. (#3765)
When outputting the result of a query using the `OUTFILE` statement, if some
output column is null, the following column separator is missing.
2020-06-04 10:35:32 +08:00
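The separator bug above can be sketched as a row formatter that always emits one separator between adjacent columns, rendering NULL with an explicit marker instead of dropping the separator (a hypothetical helper, not Doris's actual output code):

```python
def format_row(row, sep="\t", null_marker="\\N"):
    # Always emit the separator between columns; a NULL value becomes an
    # explicit marker rather than swallowing the separator after it.
    return sep.join(null_marker if v is None else str(v) for v in row)
```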
a8c95e7369 [Bug] Fix BinaryPredicate's equals function ignoring op (#3753)
BinaryPredicate's equals function compares by opcode,
but the opcode may not be initialized yet,
so it returns true when the children are the same even if the operators differ; for example, `a>1` and `a<1` are considered equal.
2020-06-04 09:29:19 +08:00
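The failure mode above can be shown with a toy predicate class (illustrative Python, not Doris's actual Java `BinaryPredicate`): comparing only the children makes `a>1` and `a<1` compare equal, while also comparing the operator fixes it.

```python
class BinaryPredicate:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def equals_buggy(self, other):
        # Compares only the children, so `a > 1` and `a < 1` look equal.
        return self.left == other.left and self.right == other.right

    def equals_fixed(self, other):
        # Also compare the operator, not just the (possibly uninitialized) opcode.
        return (self.op == other.op
                and self.left == other.left
                and self.right == other.right)
```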
7f6271c637 [Bug]Fix query failure when fact table has no data in join case (#3604)
Major work:
1. Correct the values of ```numNodes``` and ```cardinality``` in ```OlapTableScan``` computeStats so that the ```broadcast cost``` and ```partition join cost``` can be calculated correctly.
2. Find an input fragment with higher parallelism for the shuffle fragment to assign backends.
2020-06-03 22:01:55 +08:00
2ad1b20b24 [Config] Add new BE config for tcmalloc (#3732)
Add a new BE config tc_max_total_thread_cache_bytes
2020-06-03 21:58:13 +08:00
73c3de4313 [refactor] Simple refactor on class Reader (#3691)
This is a simple refactor patch on class Reader without any functional changes.
Main refactor points:
- Remove some useless return value
- Use range loop
- Use empty() instead of size() for some STL containers size judgement
- Use in-class initialization instead of initialize in constructor function
- Some other small refactor
2020-06-03 19:55:53 +08:00
ed886a485d [HttpServer] capture convert exception (#3736)
If parameter str is an empty string, it will throw an exception too. Maybe we can add a UT for parsing parameters in the HTTP server.
2020-06-03 19:54:41 +08:00
e16873a6c1 Fix large string val allocation failure (#3724)
* Fix large string val allocation failure

A large bitmap needs StringVal to allocate a large chunk of memory, which can be larger than MAX_INT.
The resulting overflow causes serialization of the bitmap to fail.

Fixed #3600
2020-06-03 17:07:54 +08:00
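The overflow above can be seen by modelling the length field as a signed 32-bit integer: a serialized bitmap of a few GiB wraps to a negative size. A Python sketch of the arithmetic (not Doris's actual StringVal code):

```python
def to_int32(n: int) -> int:
    # Simulate storing a byte count in a signed 32-bit int, as a C++
    # length field would.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

MAX_INT = 2**31 - 1  # 2147483647
```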
70aa9d6ca8 [Memory Engine] Add MemTabletScan (#3734) 2020-06-03 15:42:38 +08:00
wyb
ad7270b7ca [Spark load][Fe 1/5] Add spark etl job config (#3712)
Add spark etl job config, includes:

1. Schema of the load tables, including columns, partitions and rollups
2. Infos of the source file, including split rules, corresponding columns, and conversion rules
3. ETL output directory and file name format
4. Job properties
5. Version for further extension
2020-06-03 11:23:09 +08:00
3194aa129d Add a link to Tablet Meta URL (#3745) 2020-06-03 10:10:32 +08:00
60f93b2142 Fix bitmap type (#3749) 2020-06-03 10:07:58 +08:00
761a0ccd12 [Bug] Fix bug that runningprofile show time problem in FE web page and add the runingprofile doc (#3722) 2020-06-02 11:07:15 +08:00
fdf66b8102 [MemTracker] add log depth & auto unregister (#3701) 2020-06-01 23:16:25 +08:00
ee260d5721 [Bug][FsBroker] NPE throw when username is empty (#3731)
When using Broker with an empty username, an NPE is thrown, which is
not expected.
2020-06-01 21:03:21 +08:00
wyb
8e71c0787c [Spark load][Fe 2/5] Update push task thrift interface (#3718)
1. Add TBrokerScanRange and TDescriptorTable used by ParquetScanner
2. Add new TPushType LOAD_V2 for spark load
2020-06-01 18:21:43 +08:00
30df9fcae9 Serialize origin stmt in Rollup Job and MV Meta (#3705)
* Serialize origin stmt in Rollup Job and MV Meta

In materialized view 2.0, the define expr is serialized in the column.
The method is that Doris serializes the origin stmt of the Create Materialized View Stmt in RollupJobV2 and MVMeta.
The define expr will be extracted from the origin stmt after the meta is deserialized.

The define expr is necessary for bitmap and hll materialized view.
For example:
MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1)
Origin stmt: select bitmap_union(to_bitmap(k1)) from table
Deserialize meta: __doris_mv_bitmap_k1, bitmap_union, null
After extraction: the define expr `to_bitmap(k1)` is recovered from the origin stmt:
               __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) (which comes from the origin stmt)

Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7

* Add comment of read method

Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8

* Fix ut

Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4

* Change code style

Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9
2020-05-30 20:17:46 +08:00
5cb4063904 Fix UT ThreadPoolManagerTest failure (#3723) 2020-05-30 10:35:07 +08:00
43d25afa2c [compaction] Update cumulative point calculation algorithm (#3690)
The current cumulative point calculation algorithm may skip a singleton rowset when the rowset has only one segment and the NONOVERLAPPING flag. When a tablet is newly created and accumulates many singleton rowsets, the cumulative point is calculated as max version + 1. Cumulative compaction then cannot pick any rowsets and fails, and
the next base compaction on this tablet runs over all rowsets, which can also cause memory consumption problems if there are thousands of rowsets.
All singleton rowsets must have been newly written by the delta writer without any compaction, so we should place the cumulative point before any of these rowsets.
2020-05-30 10:34:53 +08:00
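The intended placement above can be sketched as follows (a hypothetical helper; the real logic lives in the BE compaction code, in C++): the point goes just before the first rowset that has never been compacted, even a singleton NONOVERLAPPING one.

```python
def cumulative_point(rowsets):
    """rowsets: list of (start_version, ever_compacted), sorted by version.

    Stop advancing the point at the first never-compacted rowset, so
    cumulative compaction can still pick it up.
    """
    point = 0
    for version, ever_compacted in rowsets:
        if not ever_compacted:
            break
        point = version + 1
    return point
```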
7524c5ef63 [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637) 2020-05-30 10:33:10 +08:00
c967eaf496 [Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668) 2020-05-29 20:20:44 +08:00
93aae6bdff [Bug] Fix mixed use of counters (#3720)
MysqlResultWriter's _sent_rows_counter and _result_send_timer are mixed up.
This results in a core dump when checking counter->type().
2020-05-29 15:36:21 +08:00
5f1d25a31a [Bug] Set the HttpResponseStatus for QueryProfile when query_id been not set (#3710)
Doris can get query profile by HttpRequest
```
http://fe_host:web_port/query_profile?query_id=123456
```
Now, if the query_id is not found, the 404 error is not set in the HTTP header.
2020-05-29 10:06:43 +08:00
9c85d05e41 [Bug] RuntimeState should be destructed after DataSink (#3709)
Fixes #3706 

DataSink uses instance and query MemTracker from RuntimeState, therefore it should be destructed before RuntimeState. Otherwise memory corruption and segfault could happen.
2020-05-28 17:31:01 +08:00
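The lifetime rule above can be illustrated with a toy model (Python stand-ins for the C++ classes; the names mirror the commit, the behavior is simplified): a sink touches its tracker while closing, so the tracker owned by RuntimeState must still be alive at that point.

```python
class MemTracker:
    def __init__(self):
        self.released = False

    def consume(self, nbytes):
        if self.released:
            raise RuntimeError("MemTracker used after release")

    def release(self):
        self.released = True

class DataSink:
    def __init__(self, tracker):
        self.tracker = tracker

    def close(self):
        # Closing the sink updates the query/instance tracker, so the
        # tracker (i.e. RuntimeState) must outlive the sink.
        self.tracker.consume(0)
```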
e76f712bb3 [Bug] Fix load data error in JSON load 2020-05-28 17:28:33 +08:00
8f71c7a331 Duplicate Key table core when predicate on metric column (#3699)
```
CREATE TABLE `query_detail` (
  `query_id` varchar(100) NULL COMMENT "",
  `start_time` datetime NULL COMMENT "",
  `end_time` datetime NULL COMMENT "",
  `latency` int(11) NULL COMMENT "unit is milliseconds",
  `state` varchar(20) NULL COMMENT "RUNNING/FINISHED/FAILED",
  `sql` varchar(1024) NULL COMMENT ""
)
DUPLICATE KEY(`query_id`)

SELECT COUNT(*) FROM query_detail WHERE start_time >= '2020-05-27 14:52:16' AND start_time < '2020-05-27 14:52:31';
```
The above query cores because the ZoneMap exists only for query_id;
using start_time to match the ZoneMap causes the core.
2020-05-28 14:35:40 +08:00
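The pruning rule above can be sketched as follows (a hypothetical helper; the real zone-map code is in the BE, in C++): a predicate on a column that has no zone map must never prune, and the bug was matching the predicate against the key column's zone map instead.

```python
def can_skip_by_zone_map(zone_maps, column, value):
    """zone_maps: {column_name: (min, max)}, built only for key columns
    on a DUPLICATE KEY table."""
    zm = zone_maps.get(column)
    if zm is None:
        return False  # metric column: no zone map, must scan
    lo, hi = zm
    return value < lo or value > hi
```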
f89d970cfd [Bug][Metrics] Fix bug that some of metrics can not be got (#3708)
The metrics in a metric collector must have the same type, but need not
have the same unit.
2020-05-28 09:09:14 +08:00
bc35f3a31f [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
Problem is described in ISSUE #3678 
This CL mainly changes the rule of creating dynamic partitions.

1. If the time unit is DAY, the logic remains unchanged.
2. If the time unit is WEEK, the logic changes as follows:

	1. Allow setting the start day of each week; the default is Monday, and any day from Monday to Sunday can be chosen.
	2. Assuming the start day is Tuesday, the range of the partition is from Tuesday of this week to Monday of the next week.

3. If the time unit is MONTH, the logic changes as follows:

	1. Allow setting the start date of each month; the default is the 1st, and any date from the 1st to the 28th can be chosen.
	2. Assuming the start date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month.

4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month.

It is recommended to refer to the examples in `dynamic-partition.md` to understand the new behavior.

TODO:
It would be better to also support HOUR and YEAR time units. Maybe in the next PR.

FIX: #3678
2020-05-27 16:42:41 +08:00
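The WEEK rule above is a small date computation. A sketch under the stated convention (Python; `week_partition_range` is a hypothetical helper, the real implementation is in the FE, in Java):

```python
from datetime import date, timedelta

def week_partition_range(day, start_dow):
    """Partition range for the WEEK time unit.

    start_dow: 0 = Monday ... 6 = Sunday (default Monday, per the rule
    above). Returns [start, end) -- the 7-day window containing `day`
    that begins on the configured weekday.
    """
    offset = (day.weekday() - start_dow) % 7
    start = day - timedelta(days=offset)
    return start, start + timedelta(days=7)
```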
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I added a new JsonMetricVisitor to handle the transformation;
the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor are not modified.
I also added:
1. A unit item to describe the metric better.
2. Cloning tablet statistics divided by database.
3. Replacing newlines with whitespace in audit.log.
2020-05-27 08:49:30 +08:00
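The JSON layout shown above can be produced by a small builder like the following (a hypothetical Python sketch; the actual JsonMetricVisitor is C++):

```python
import json

def metric_to_json(metric, tags, unit, value):
    # Mirror the layout above: metric name and extra tags under "tags",
    # plus top-level "unit" and "value".
    doc = {"tags": {"metric": metric, **tags}, "unit": unit, "value": value}
    return json.dumps(doc)
```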
12c59ba889 [Thirdparty][glog][bug] convert init be log file length use fopen function (#3649) 2020-05-26 22:42:50 +08:00
fb66bac5fe [Bug] Fix null pointer access in json-load (#3692)
Add check for null pointer to avoid core dump
2020-05-26 22:41:30 +08:00
dcd5e5df12 [AuditPlugin] Modify load label of audit plugin to avoid load confliction (#3681)
Change the load label of audit plugin as:

`audit_yyyyMMdd_HHmmss_feIdentity`.

The `feIdentity` is obtained from the FE that runs this plugin; currently it is just the FE's IP_editlog_port.
2020-05-26 18:23:07 +08:00
wyb
4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources that will be used in Doris, for example, MapReduce is used for ETL, Spark/GPU is used for queries, HDFS/S3  is used for external storage. We introduced resource management to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

`FOR user_name` is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.

PROPERTIES:
1. type: resource type. Only spark is supported now.
2. spark configuration: follow the standard writing of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in spark ETL.
4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs://127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The spark configurations in load stmt can override the existing configuration in the resource for temporary use.

#3010
2020-05-26 18:21:21 +08:00
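The override rule stated above (load stmt settings win over the resource's stored settings, for that job only) can be sketched as a simple merge (hypothetical helper; the real merge happens in the FE, in Java):

```python
def effective_spark_conf(resource_conf, stmt_conf):
    # Start from the resource's stored configuration, then let the LOAD
    # statement's settings override it without mutating the resource.
    merged = dict(resource_conf)
    merged.update(stmt_conf)
    return merged
```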
77b9acc242 [Stmt] Add rowCount column to SHOW DATA stmt (#3676)
User can see the row count of all materialized indexes of a table.

```
mysql> show data from test;
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
| test2     | r1        | 10.000MB  | 30           | 10000    |
|           | r2        | 20.000MB  | 30           | 20000    |
|           | test2     | 50.000MB  | 30           | 50000    |
|           | Total     | 80.000    | 90           |          |
+-----------+-----------+-----------+--------------+----------+
```

Fix #3675
2020-05-26 15:53:38 +08:00
aa4ac2d078 [Bug] Serialize storage format in rollup job (#3686)
The segment v2 rollup job should set the storage format v2 and serialize it.
If it is not serialized, the rollup of segment v2 may use the wrong format, 'segment v1'.
2020-05-26 15:35:12 +08:00
f4c03fe8e2 Delete the Sort Node code we no longer use. (#3666)
Optimize quick sort with find_the_median and try to reduce the recursion depth of quick sort.
2020-05-26 10:20:57 +08:00
963d4d48aa Override the style of sidebar's sub-directory (#3683)
Override the style of sidebar's sub-directory.
2020-05-26 09:07:55 +08:00
3ffc447b38 [OUTFILE] Support INTO OUTFILE to export query result (#3584)
This CL mainly changes:

1. Support `SELECT INTO OUTFILE` command.
2. Support export query result to a file via Broker.
3. Support CSV export format with specified column separator and line delimiter.
2020-05-25 21:24:56 +08:00
6788cacb94 Fix unit test failed (#3642)
Fix some unit test failures caused by glog. This may be because we changed the UT build dir and the log path did not exist in the new build dir, so we changed the log output from file to stdout.
2020-05-25 18:55:19 +08:00
e6864a1cda Allow user to set thrift_client_timeout_ms config for thrift server (#3670)
1. Allow user to set thrift_client_timeout_ms config for thrift server
2. Add doc for thrift_client_timeout_ms config
2020-05-25 11:32:14 +08:00
2608f83bdc [WIP] Add define expr for column (#3651)
In materialized view 2.0, the define expr should be set in the column.
For example, the to_bitmap function on an integer column should be defined in the MV column.

```
create materialized view mv as select bitmap_union(to_bitmap(k1)) from table.
```
The meta of mv as following:
column name: __doris_materialized_view_bitmap_k1
column aggregate type: bitmap_union
column define expr: to_bitmap(k1)

This is a WIP PR for materialized view 2.0.

#3344
2020-05-25 11:00:29 +08:00
ec955b8a36 [Bug] Fix bug that runningTxnNum does not equal to the real running txn num. (#3674)
This is because the logic for modifying the number of running transactions is wrong.

Because we did not persist the previous status(preStatus) of a transaction.
Therefore, when replaying the metadata log, we cannot decide whether to modify
the `runningTxnNum` value based on `preStatus`. This info is lost.
2020-05-25 10:41:38 +08:00
12ebd5d82b Remove some outdate test (#3672) 2020-05-25 09:23:56 +08:00
838c1e9212 Modify HLL functions return type (#3656)
1. Modify the hll_hash function return type to HLL
2. Make HLL_RAW_AGG an alias of HLL_UNION
2020-05-24 21:22:43 +08:00
ef9c716682 [Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal (#3662) 2020-05-22 23:04:47 +08:00
1124808fbc [Enhancement] Add detail msg to show the reason of publish failure. (#3647)
Add 2 new columns `PublishTime` and `ErrMsg` to show publish version time and  errors happen during the transaction process. Can be seen by executing: 

`SHOW PROC "/transactions/dbId/";`
or
`SHOW TRANSACTION WHERE ID=xx;`

Currently only errors that happen in the publish phase are recorded, which can help us find out which txn
is blocked.

Fix #3646
2020-05-22 22:59:53 +08:00
ba7d2dbf7b [Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad`
and add unit test for them
2020-05-22 14:34:26 +08:00
16deac96a9 [UT][Bug] Fix the ut error of bitmap_intersect (#3664)
Change-Id: Id32fd9381119f30786acae9b4ac61b0d5ef9df48
2020-05-22 10:29:12 +08:00
00d563d014 [SQL] Support more syntax in case when clause (#3625)
Support more syntax in case-when clauses with subqueries,
e.g. `case when k1 > subquery1 and k2 < subquery2 then ... else ...` or `case when subquery in null then ...`
2020-05-22 10:22:59 +08:00
dbfe8a067f [Doc ]Add docs of max_running_txn_num_per_db (#3657)
Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92
2020-05-22 10:22:11 +08:00