Commit Graph

1939 Commits

Author SHA1 Message Date
fc33ee3618 [Plugin] Add timeout of connection when downloading the plugins from URL (#3755)
If no timeout is set, the download process may be blocked forever.
2020-06-04 11:37:18 +08:00
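The fix described above amounts to passing a timeout when opening the connection. A minimal Python sketch of the idea (the actual plugin downloader lives in Doris's Java FE; `download_plugin` and the 30-second default here are illustrative):

```python
import shutil
import urllib.request

def download_plugin(url: str, dest: str, timeout_sec: float = 30.0) -> None:
    # Without timeout=, a stalled server can block this read forever.
    with urllib.request.urlopen(url, timeout=timeout_sec) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```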
791f8fee49 [Bug][Outfile] Fix bug that column separator is missing in output file. (#3765)
When outputting the result of a query using the `OUTFILE` statement, if some
output column is null, the following column separator is missing.
2020-06-04 10:35:32 +08:00
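The separator bug above can be sketched as a row formatter that always emits one separator between adjacent columns, rendering NULL with an explicit marker instead of dropping the separator (a hypothetical helper, not Doris's actual output code):

```python
def format_row(row, sep="\t", null_marker="\\N"):
    # Always emit the separator between columns; a NULL value becomes an
    # explicit marker rather than swallowing the separator after it.
    return sep.join(null_marker if v is None else str(v) for v in row)
```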
a8c95e7369 [Bug] Fix BinaryPredicate's equals function ignoring op (#3753)
BinaryPredicate's equals function compares by opcode,
but the opcode may not be initialized yet,
so it returns true when the children are the same even if the operators differ; for example, `a>1` and `a<1` are considered equal.
2020-06-04 09:29:19 +08:00
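The failure mode above can be shown with a toy predicate class (illustrative Python, not Doris's actual Java `BinaryPredicate`): comparing only the children makes `a>1` and `a<1` compare equal, while also comparing the operator fixes it.

```python
class BinaryPredicate:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def equals_buggy(self, other):
        # Compares only the children, so `a > 1` and `a < 1` look equal.
        return self.left == other.left and self.right == other.right

    def equals_fixed(self, other):
        # Also compare the operator, not just the (possibly uninitialized) opcode.
        return (self.op == other.op
                and self.left == other.left
                and self.right == other.right)
```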
7f6271c637 [Bug]Fix query failure when fact table has no data in join case (#3604)
Major work:
1. Correct the values of ```numNodes``` and ```cardinality``` in ```OlapTableScan``` computeStats so that the ```broadcast cost``` and ```partition join cost``` can be calculated correctly.
2. Find an input fragment with higher parallelism for the shuffle fragment to assign backends.
2020-06-03 22:01:55 +08:00
2ad1b20b24 [Config] Add new BE config for tcmalloc (#3732)
Add a new BE config tc_max_total_thread_cache_bytes
2020-06-03 21:58:13 +08:00
73c3de4313 [refactor] Simple refactor on class Reader (#3691)
This is a simple refactor patch on class Reader without any functional changes.
Main refactor points:
- Remove some useless return value
- Use range loop
- Use empty() instead of size() for some STL containers size judgement
- Use in-class initialization instead of initialize in constructor function
- Some other small refactor
2020-06-03 19:55:53 +08:00
ed886a485d [HttpServer] capture convert exception (#3736)
If parameter str is an empty string, it will throw an exception too. Maybe we can add a UT for parsing parameters in the HTTP server.
2020-06-03 19:54:41 +08:00
e16873a6c1 Fix large string val allocation failure (#3724)
* Fix large string val allocation failure

A large bitmap needs StringVal to allocate a large chunk of memory, which can be larger than MAX_INT.
The resulting overflow causes serialization of the bitmap to fail.

Fixed #3600
2020-06-03 17:07:54 +08:00
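The overflow above can be seen by modelling the length field as a signed 32-bit integer: a serialized bitmap of a few GiB wraps to a negative size. A Python sketch of the arithmetic (not Doris's actual StringVal code):

```python
def to_int32(n: int) -> int:
    # Simulate storing a byte count in a signed 32-bit int, as a C++
    # length field would.
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

MAX_INT = 2**31 - 1  # 2147483647
```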
70aa9d6ca8 [Memory Engine] Add MemTabletScan (#3734) 2020-06-03 15:42:38 +08:00
wyb
ad7270b7ca [Spark load][Fe 1/5] Add spark etl job config (#3712)
Add spark etl job config, includes:

1. Schema of the load tables, including columns, partitions and rollups
2. Infos of the source file, including split rules, corresponding columns, and conversion rules
3. ETL output directory and file name format
4. Job properties
5. Version for further extension
2020-06-03 11:23:09 +08:00
3194aa129d Add a link to Tablet Meta URL (#3745) 2020-06-03 10:10:32 +08:00
60f93b2142 Fix bitmap type (#3749) 2020-06-03 10:07:58 +08:00
761a0ccd12 [Bug] Fix bug that runningprofile show time problem in FE web page and add the runingprofile doc (#3722) 2020-06-02 11:07:15 +08:00
fdf66b8102 [MemTracker] add log depth & auto unregister (#3701) 2020-06-01 23:16:25 +08:00
ee260d5721 [Bug][FsBroker] NPE throw when username is empty (#3731)
When using Broker with an empty username, an NPE is thrown, which is
not expected.
2020-06-01 21:03:21 +08:00
wyb
8e71c0787c [Spark load][Fe 2/5] Update push task thrift interface (#3718)
1. Add TBrokerScanRange and TDescriptorTable used by ParquetScanner
2. Add new TPushType LOAD_V2 for spark load
2020-06-01 18:21:43 +08:00
30df9fcae9 Serialize origin stmt in Rollup Job and MV Meta (#3705)
* Serialize origin stmt in Rollup Job and MV Meta

In materialized view 2.0, the define expr is serialized in the column.
The method is that Doris serializes the origin stmt of the Create Materialized View Stmt in RollupJobV2 and MVMeta.
The define expr will be extracted from the origin stmt after the meta is deserialized.

The define expr is necessary for bitmap and hll materialized view.
For example:
MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1)
Origin stmt: select bitmap_union(to_bitmap(k1)) from table
Deserialize meta: __doris_mv_bitmap_k1, bitmap_union, null
After extraction: the define expr `to_bitmap(k1)` is recovered from the origin stmt:
               __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) (which comes from the origin stmt)

Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7

* Add comment of read method

Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8

* Fix ut

Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4

* Change code style

Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9
2020-05-30 20:17:46 +08:00
5cb4063904 Fix UT ThreadPoolManagerTest failure (#3723) 2020-05-30 10:35:07 +08:00
43d25afa2c [compaction] Update cumulative point calculation algorithm (#3690)
The current cumulative point calculation algorithm may skip a singleton rowset when the rowset has only one segment and the NONOVERLAPPING flag. When a tablet is newly created and accumulates many singleton rowsets, the cumulative point is calculated as max version + 1. Cumulative compaction then cannot pick any rowsets and fails, and
the next base compaction on this tablet runs over all rowsets, which can also cause memory consumption problems if there are thousands of rowsets.
All singleton rowsets must have been newly written by the delta writer without any compaction, so we should place the cumulative point before any of these rowsets.
2020-05-30 10:34:53 +08:00
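The intended placement above can be sketched as follows (a hypothetical helper; the real logic lives in the BE compaction code, in C++): the point goes just before the first rowset that has never been compacted, even a singleton NONOVERLAPPING one.

```python
def cumulative_point(rowsets):
    """rowsets: list of (start_version, ever_compacted), sorted by version.

    Stop advancing the point at the first never-compacted rowset, so
    cumulative compaction can still pick it up.
    """
    point = 0
    for version, ever_compacted in rowsets:
        if not ever_compacted:
            break
        point = version + 1
    return point
```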
7524c5ef63 [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637) 2020-05-30 10:33:10 +08:00
c967eaf496 [Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668) 2020-05-29 20:20:44 +08:00
93aae6bdff [Bug] Fix mixed use of counters (#3720)
MysqlResultWriter's _sent_rows_counter and _result_send_timer are mixed up.
This results in a core dump when checking counter->type().
2020-05-29 15:36:21 +08:00
5f1d25a31a [Bug] Set the HttpResponseStatus for QueryProfile when query_id been not set (#3710)
Doris can get query profile by HttpRequest
```
http://fe_host:web_port/query_profile?query_id=123456
```
Now, if the query_id is not found, the 404 error is not set in the HTTP header.
2020-05-29 10:06:43 +08:00
9c85d05e41 [Bug] RuntimeState should be destructed after DataSink (#3709)
Fixes #3706 

DataSink uses instance and query MemTracker from RuntimeState, therefore it should be destructed before RuntimeState. Otherwise memory corruption and segfault could happen.
2020-05-28 17:31:01 +08:00
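The lifetime rule above can be illustrated with a toy model (Python stand-ins for the C++ classes; the names mirror the commit, the behavior is simplified): a sink touches its tracker while closing, so the tracker owned by RuntimeState must still be alive at that point.

```python
class MemTracker:
    def __init__(self):
        self.released = False

    def consume(self, nbytes):
        if self.released:
            raise RuntimeError("MemTracker used after release")

    def release(self):
        self.released = True

class DataSink:
    def __init__(self, tracker):
        self.tracker = tracker

    def close(self):
        # Closing the sink updates the query/instance tracker, so the
        # tracker (i.e. RuntimeState) must outlive the sink.
        self.tracker.consume(0)
```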
e76f712bb3 [Bug] Fix load data error in JSON load 2020-05-28 17:28:33 +08:00
8f71c7a331 Duplicate Key table core when predicate on metric column (#3699)
```
CREATE TABLE `query_detail` (
  `query_id` varchar(100) NULL COMMENT "",
  `start_time` datetime NULL COMMENT "",
  `end_time` datetime NULL COMMENT "",
  `latency` int(11) NULL COMMENT "unit is milliseconds",
  `state` varchar(20) NULL COMMENT "RUNNING/FINISHED/FAILED",
  `sql` varchar(1024) NULL COMMENT ""
)
DUPLICATE KEY(`query_id`)

SELECT COUNT(*) FROM query_detail WHERE start_time >= '2020-05-27 14:52:16' AND start_time < '2020-05-27 14:52:31';
```
The above query cores because the ZoneMap exists only for query_id;
using start_time to match the ZoneMap causes the core.
2020-05-28 14:35:40 +08:00
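The pruning rule above can be sketched as follows (a hypothetical helper; the real zone-map code is in the BE, in C++): a predicate on a column that has no zone map must never prune, and the bug was matching the predicate against the key column's zone map instead.

```python
def can_skip_by_zone_map(zone_maps, column, value):
    """zone_maps: {column_name: (min, max)}, built only for key columns
    on a DUPLICATE KEY table."""
    zm = zone_maps.get(column)
    if zm is None:
        return False  # metric column: no zone map, must scan
    lo, hi = zm
    return value < lo or value > hi
```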
f89d970cfd [Bug][Metrics] Fix bug that some of metrics can not be got (#3708)
The metrics in a metric collector must have the same type, but need not
have the same unit.
2020-05-28 09:09:14 +08:00
bc35f3a31f [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
Problem is described in ISSUE #3678 
This CL mainly changes the rule of creating dynamic partitions.

1. If the time unit is DAY, the logic remains unchanged.
2. If the time unit is WEEK, the logic changes as follows:

	1. Allow setting the start day of each week; the default is Monday, and any day from Monday to Sunday can be chosen.
	2. Assuming the start day is Tuesday, the range of the partition is from Tuesday of this week to Monday of the next week.

3. If the time unit is MONTH, the logic changes as follows:

	1. Allow setting the start date of each month; the default is the 1st, and any date from the 1st to the 28th can be chosen.
	2. Assuming the start date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month.

4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month.

It is recommended to refer to the examples in `dynamic-partition.md` to understand the new behavior.

TODO:
It would be better to also support HOUR and YEAR time units. Maybe in the next PR.

FIX: #3678
2020-05-27 16:42:41 +08:00
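The WEEK rule above is a small date computation. A sketch under the stated convention (Python; `week_partition_range` is a hypothetical helper, the real implementation is in the FE, in Java):

```python
from datetime import date, timedelta

def week_partition_range(day, start_dow):
    """Partition range for the WEEK time unit.

    start_dow: 0 = Monday ... 6 = Sunday (default Monday, per the rule
    above). Returns [start, end) -- the 7-day window containing `day`
    that begins on the configured weekday.
    """
    offset = (day.weekday() - start_dow) % 7
    start = day - timedelta(days=offset)
    return start, start + timedelta(days=7)
```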
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I added a new JsonMetricVisitor to handle the transformation;
the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor are not modified.
I also added:
1. A unit item to describe the metric better.
2. Cloning tablet statistics divided by database.
3. Replacing newlines with whitespace in audit.log.
2020-05-27 08:49:30 +08:00
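The JSON layout shown above can be produced by a small builder like the following (a hypothetical Python sketch; the actual JsonMetricVisitor is C++):

```python
import json

def metric_to_json(metric, tags, unit, value):
    # Mirror the layout above: metric name and extra tags under "tags",
    # plus top-level "unit" and "value".
    doc = {"tags": {"metric": metric, **tags}, "unit": unit, "value": value}
    return json.dumps(doc)
```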
12c59ba889 [Thirdparty][glog][bug] convert init be log file length use fopen function (#3649) 2020-05-26 22:42:50 +08:00
fb66bac5fe [Bug] Fix null pointer access in json-load (#3692)
Add check for null pointer to avoid core dump
2020-05-26 22:41:30 +08:00
dcd5e5df12 [AuditPlugin] Modify load label of audit plugin to avoid load confliction (#3681)
Change the load label of audit plugin as:

`audit_yyyyMMdd_HHmmss_feIdentity`.

The `feIdentity` is obtained from the FE that runs this plugin; currently it is just the FE's IP_editlog_port.
2020-05-26 18:23:07 +08:00
wyb
4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources that will be used in Doris, for example, MapReduce is used for ETL, Spark/GPU is used for queries, HDFS/S3  is used for external storage. We introduced resource management to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

`FOR user_name` is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.

PROPERTIES:
1. type: resource type. Only spark is supported now.
2. spark configuration: follow the standard writing of Spark configurations, refer to: https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in spark ETL.
4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs://127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The spark configurations in load stmt can override the existing configuration in the resource for temporary use.

#3010
2020-05-26 18:21:21 +08:00
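The override rule stated above (load stmt settings win over the resource's stored settings, for that job only) can be sketched as a simple merge (hypothetical helper; the real merge happens in the FE, in Java):

```python
def effective_spark_conf(resource_conf, stmt_conf):
    # Start from the resource's stored configuration, then let the LOAD
    # statement's settings override it without mutating the resource.
    merged = dict(resource_conf)
    merged.update(stmt_conf)
    return merged
```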
77b9acc242 [Stmt] Add rowCount column to SHOW DATA stmt (#3676)
User can see the row count of all materialized indexes of a table.

```
mysql> show data from test;
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
| test2     | r1        | 10.000MB  | 30           | 10000    |
|           | r2        | 20.000MB  | 30           | 20000    |
|           | test2     | 50.000MB  | 30           | 50000    |
|           | Total     | 80.000    | 90           |          |
+-----------+-----------+-----------+--------------+----------+
```

Fix #3675
2020-05-26 15:53:38 +08:00
aa4ac2d078 [Bug] Serialize storage format in rollup job (#3686)
The segment v2 rollup job should set the storage format v2 and serialize it.
If it is not serialized, the rollup of segment v2 may use the wrong format, 'segment v1'.
2020-05-26 15:35:12 +08:00
f4c03fe8e2 Delete the Sort Node code we no longer use. (#3666)
Optimize quick sort with find_the_median and try to reduce the recursion depth of quick sort.
2020-05-26 10:20:57 +08:00
963d4d48aa Override the style of sidebar's sub-directory (#3683)
Override the style of sidebar's sub-directory.
2020-05-26 09:07:55 +08:00
3ffc447b38 [OUTFILE] Support INTO OUTFILE to export query result (#3584)
This CL mainly changes:

1. Support `SELECT INTO OUTFILE` command.
2. Support export query result to a file via Broker.
3. Support CSV export format with specified column separator and line delimiter.
2020-05-25 21:24:56 +08:00
6788cacb94 Fix unit test failed (#3642)
Fix some unit test failures caused by glog. This may be because we changed the UT build dir and the log path did not exist in the new build dir, so we changed the log output from file to stdout.
2020-05-25 18:55:19 +08:00
e6864a1cda Allow user to set thrift_client_timeout_ms config for thrift server (#3670)
1. Allow user to set thrift_client_timeout_ms config for thrift server
2. Add doc for thrift_client_timeout_ms config
2020-05-25 11:32:14 +08:00
2608f83bdc [WIP] Add define expr for column (#3651)
In materialized view 2.0, the define expr should be set in the column.
For example, the to_bitmap function on an integer column should be defined in the MV column.

```
create materialized view mv as select bitmap_union(to_bitmap(k1)) from table.
```
The meta of mv as following:
column name: __doris_materialized_view_bitmap_k1
column aggregate type: bitmap_union
column define expr: to_bitmap(k1)

This is a WIP PR for materialized view 2.0.

#3344
2020-05-25 11:00:29 +08:00
ec955b8a36 [Bug] Fix bug that runningTxnNum does not equal to the real running txn num. (#3674)
This is because the logic for modifying the number of running transactions is wrong.

Because we did not persist the previous status(preStatus) of a transaction.
Therefore, when replaying the metadata log, we cannot decide whether to modify
the `runningTxnNum` value based on `preStatus`. This info is lost.
2020-05-25 10:41:38 +08:00
12ebd5d82b Remove some outdate test (#3672) 2020-05-25 09:23:56 +08:00
838c1e9212 Modify HLL functions return type (#3656)
1. Modify the hll_hash function return type to HLL
2. Make HLL_RAW_AGG an alias of HLL_UNION
2020-05-24 21:22:43 +08:00
ef9c716682 [Bug] Fix bug that missing OP_SET_REPLICA_STATUS when reading journal (#3662) 2020-05-22 23:04:47 +08:00
1124808fbc [Enhancement] Add detail msg to show the reason of publish failure. (#3647)
Add 2 new columns `PublishTime` and `ErrMsg` to show publish version time and  errors happen during the transaction process. Can be seen by executing: 

`SHOW PROC "/transactions/dbId/";`
or
`SHOW TRANSACTION WHERE ID=xx;`

Currently only errors that happen in the publish phase are recorded, which can help us find out which txn
is blocked.

Fix #3646
2020-05-22 22:59:53 +08:00
ba7d2dbf7b [Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638)
Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad`
and add unit test for them
2020-05-22 14:34:26 +08:00
16deac96a9 [UT][Bug] Fix the ut error of bitmap_intersect (#3664)
Change-Id: Id32fd9381119f30786acae9b4ac61b0d5ef9df48
2020-05-22 10:29:12 +08:00
00d563d014 [SQL] Support more syntax in case when clause (#3625)
Support more syntax in case-when clauses with subqueries,
e.g. `case when k1 > subquery1 and k2 < subquery2 then ... else ...` or `case when subquery in null then ...`
2020-05-22 10:22:59 +08:00
dbfe8a067f [Doc ]Add docs of max_running_txn_num_per_db (#3657)
Change-Id: Ibdbc19a5558b0eb3f6a5fc4ef630de255b408a92
2020-05-22 10:22:11 +08:00