Commit Graph

13073 Commits

Author SHA1 Message Date
0554e89645 [Alter] Fix bug of assertion failure when submitting schema change job (#3181)
When creating a schema change job, we will create a corresponding shadow replica for each replica.
Here we should check the state of the replica and only create replicas in the normal state.

The process here may need to be modified later. We should completely allow users to submit alter jobs
under any circumstances, and then in the job scheduling process, dynamically detect changes in the replicas
and do replica repairs, instead of forcing a check on submission.
2020-03-31 12:06:30 +08:00
e9b3584d45 [Bug] Fix bug that desc tbl all stmt throw error: Malformed packet (#3233) 2020-03-31 10:29:53 +08:00
4131afe316 [Bug] NPE when using unknown function in broker load process (#3225)
This CL fixes the bug described in issue #3224 by:

1. Forbidding UDFs in the broker load process.
2. Improving the function checking logic to avoid an NPE when trying to
   get the default database from ConnectionContext.
2020-03-30 18:34:41 +08:00
2e1a0030bc Add some connect samples (#3221)
Add connect samples for golang, java, nodejs, php, and python.
2020-03-30 13:54:36 +08:00
5f9359d618 Use SleepFor() instead of usleep() (#3211) 2020-03-29 14:18:19 +08:00
e4682398bd [web] Dump configs on BE's website '/varz' (#3220)
Dump configs on BE's website '/varz'
Change NAVIGATION_BAR_PREFIX from 'Impala' to 'Doris'
Format the related files by clang-format
2020-03-28 16:26:38 +08:00
41f1ab006b Add curdate/now function in fe (#3215) 2020-03-28 13:39:54 +08:00
6cf217f0c7 Fix WARNING to WARN in fe.conf sys_log_level (#3218)
When I set sys_log_level to WARNING as the comment suggested, logging did not work, because there is no WARNING log level in Java.
2020-03-28 10:13:15 +08:00
4a5164ab9d Fix 'Filesystem closed' in broker load (#3216) 2020-03-28 09:14:45 +08:00
d3555e3624 [Conf][API Change] Change the default FE meta dir and BE storage_root_path
1. Replace the word palo with doris in the conf files.
2. Set the default meta_dir to ${DORIS_HOME}/doris-meta.
3. Comment out FE meta_dir and leave it as ${DORIS_HOME}/doris-meta, the default that already exists in FE Config.java.
4. Comment out BE storage_root_path and leave it as ${DORIS_HOME}/storage, the default that already exists in BE config.h.

NOTICE: default config is changed.
2020-03-27 20:42:12 +08:00
cb68e10217 [MaterializedView] Add 'IndexKeysType' field in 'Desc all table stmt' (#3209)
After Doris added support for aggregation materialized views on duplicate tables,
the metadata shown by the desc stmt is sometimes confusing, because it
contains no grouping information.

For example:
There are two materialized views, as follows.
    1. create materialized view k1_k2 as select k1, k2 from table;
    2. create materialized view deduplicated_k1_k2 as select k1, k2 from table group by k1, k2;
Before this commit, the metadata shown by the desc stmt is the same for both.

   ```
    +-----------------------+-------+----------+------+-------+---------+-------+
    | IndexName             | Field | Type     | Null | Key   | Default | Extra |
    +-----------------------+-------+----------+------+-------+---------+-------+
    | k1_k2                 | k1    | TINYINT  | Yes  | true  | N/A     |       |
    |                       | k2    | SMALLINT | Yes  | true  | N/A     |       |
    | deduplicated_k1_k2    | k1    | TINYINT  | Yes  | true  | N/A     |       |
    |                       | k2    | SMALLINT | Yes  | true  | N/A     |       |
    +-----------------------+-------+----------+------+-------+---------+-------+
   ```

So we need to show the KeysType of each materialized view in the desc stmt.
Now the desc stmt output for all materialized views is changed as follows:

    ```
    +-----------------------+---------------+-------+----------+------+-------+---------+-------+
    | IndexName             | IndexKeysType | Field | Type     | Null | Key   | Default | Extra |
    +-----------------------+---------------+-------+----------+------+-------+---------+-------+
    | k1_k2                 | DUP_KEYS      | k1    | TINYINT  | Yes  | true  | N/A     |       |
    |                       |               | k2    | SMALLINT | Yes  | true  | N/A     |       |
    | deduplicated_k1_k2    | AGG_KEYS      | k1    | TINYINT  | Yes  | true  | N/A     |       |
    |                       |               | k2    | SMALLINT | Yes  | true  | N/A     |       |
    +-----------------------+---------------+-------+----------+------+-------+---------+-------+
    ```

NOTICE: this modifies the columns of the `desc` stmt.
2020-03-27 20:36:02 +08:00
aa8b2f86c4 [Bug][Refactor] Fix the conflict of temp partition and dynamic partition operations (#3201)
The bug is described in issue: #3200.

This CL solves the problem by:
1. Refactoring the alter operation conflict checking logic by introducing new classes `AlterOperations` and `AlterOpType`.
2. Allowing add/drop of temporary partitions when the dynamic partition feature is enabled.
3. Allowing modification of a table's properties when the table has temporary partitions.
4. Making the property `dynamic_partition.enable` optional, with a default of true.
2020-03-27 20:25:15 +08:00
c1969a3fb3 [Conf] Make default_storage_medium configurable (#2980)
Doris supports choosing a storage medium when creating a table, and the cluster balance strategy
depends on the storage medium. Most users do not specify the storage medium when creating a table;
even if they know they should choose one, they have no idea which storage medium the
cluster uses. So I think storage_medium and storage_cooldown_time should be
configurable, and this should be the admin's responsibility.

For example, suppose the cluster's storage medium is HDD but we need to change some of the machines to SSD.
After the change, tablets created earlier are stored on HDD and cannot find a destination path
to migrate to, while users keep creating tables as usual, so all tablets end up on the old machines and
the new machines store only a few tablets. Without this config, the only option is for the admin
to traverse all partitions in the cluster and change their storage_medium property, which increases
operation and maintenance costs.

So I added an FE config, default_storage_medium, so that the default storage medium can be set.
2020-03-27 20:22:18 +08:00
32c4fc691c Support determine isPreviousLoadFinished for some alter jobs in table level (#3196)
This PR reduces the time spent waiting for transactions in the same db to complete by filtering the running transactions at the table level.

NOTICE: Update FE meta version to 79
2020-03-27 20:16:23 +08:00
0462607d8d StorageEngine: unused_rowsets use unordered_multimap (#3207) 2020-03-27 14:30:31 +08:00
16b61b62f5 [Spark] Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector (#3186)
Currently, in the Spark-Doris-Connector, when Spark iteratively obtains each row of data,
it needs to synchronously convert the Arrow format data into the row format required by Spark.
To speed up the conversion, we can add an asynchronous thread in the Connector
that is responsible for obtaining the Arrow format data from BE and converting it into the row
format required by Spark.

In our test environment, the Doris cluster used 1 FE and 7 BEs (32C+128G). When using Spark-Doris-Connector
to query a table containing 67 columns, the original query, which returned 69 million rows of data,
took about 2.5 min; after the improvement it took about 1.6 min, a reduction of about 30%.
2020-03-26 21:34:37 +08:00
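The asynchronous conversion described in commit 16b61b62f5 is a producer/consumer arrangement. The following is a minimal Python sketch of that idea, not the connector's actual code: `iter_rows_async`, `fetch_batches`, `convert`, and `max_pending` are hypothetical names, and plain lists stand in for Arrow batches.

```python
import queue
import threading
from typing import Any, Callable, Iterable, Iterator, List

_SENTINEL = object()  # marks the end of the stream


def iter_rows_async(
    fetch_batches: Callable[[], Iterable[Any]],
    convert: Callable[[Any], List[Any]],
    max_pending: int = 4,
) -> Iterator[Any]:
    """Yield rows while a background thread fetches and converts batches."""
    pending: "queue.Queue[Any]" = queue.Queue(maxsize=max_pending)

    def producer() -> None:
        try:
            for batch in fetch_batches():
                # Conversion happens off the consumer thread.
                pending.put(convert(batch))
        finally:
            # Always signal completion, even if fetching or converting fails.
            pending.put(_SENTINEL)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        rows = pending.get()
        if rows is _SENTINEL:
            return
        yield from rows


if __name__ == "__main__":
    batches = [[1, 2], [3]]
    print(list(iter_rows_async(lambda: iter(batches), lambda b: [x * 10 for x in b])))
```

The bounded queue keeps the producer from running arbitrarily far ahead of the consumer. This sketch does not propagate producer errors to the consumer, which a real implementation would need to do.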
c4c37a4394 Rewritten subquery in having clause (#3206)
The subquery in the having clause should be rewritten too.
If not, ExprRewriteRule will not be applied to the subquery.
For example:
select k1, sum (k2) from table group by k1 having sum(k2) > (select t1 from table2 where t2 between 1 and 2);
```t2 between 1 and 2``` should be rewritten to ```t2 >= 1 and t2 <= 2```.

Fixed #3205. TPC-DS query 14 passes after this commit.
2020-03-26 21:13:57 +08:00
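The point of commit c4c37a4394 is that a rewrite rule has to descend into the subquery of a having clause. Below is a minimal sketch of such a recursive rewrite on a toy expression tree. The classes `Between`, `And`, `Compare`, and `Subquery` and the function `rewrite` are hypothetical and are not Doris's ExprRewriteRule API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Between:
    column: str
    low: int
    high: int


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Subquery:
    select: str
    where: "Expr"


@dataclass(frozen=True)
class Compare:
    left: str
    op: str
    right: Union[int, Subquery]


Expr = Union[Between, And, Compare]


def rewrite(expr: Expr) -> Expr:
    """Expand BETWEEN into two comparisons, descending into subqueries."""
    if isinstance(expr, Between):
        return And(Compare(expr.column, ">=", expr.low),
                   Compare(expr.column, "<=", expr.high))
    if isinstance(expr, And):
        return And(rewrite(expr.left), rewrite(expr.right))
    if isinstance(expr, Compare) and isinstance(expr.right, Subquery):
        inner = expr.right
        # Without this branch the predicate inside the subquery is never visited.
        return Compare(expr.left, expr.op, Subquery(inner.select, rewrite(inner.where)))
    return expr


if __name__ == "__main__":
    having = Compare("sum(k2)", ">", Subquery("t1", Between("t2", 1, 2)))
    print(rewrite(having))
```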
cc31bf9cf9 [rowset id] A little improvement of rowset id generator (#3203)
The main optimization points:
1. Use std::unordered_set instead of std::set, and use RowsetId.hi as RowsetId's hash value.
2. Minimize the scope of SpinLock in UniqueRowsetIdGenerator.

Profile comparison:
* Run UniqueRowsetIdGeneratorTest.GenerateIdBenchmark 10 times
old version |  new version
6s962ms     |  3s647ms
6s139ms     |  3s393ms
6s234ms     |  3s686ms
6s060ms     |  3s447ms
5s966ms     |  4s127ms
5s786ms     |  3s994ms
5s778ms     |  4s072ms
6s193ms     |  4s082ms
6s159ms     |  3s560ms
5s591ms     |  3s654ms
2020-03-26 20:24:26 +08:00
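To summarize the ten benchmark runs listed in commit cc31bf9cf9, the timings can be averaged. This is only arithmetic on the numbers from the commit message; the helper `to_seconds` is a hypothetical name introduced here.

```python
import re
from statistics import mean


def to_seconds(text: str) -> float:
    """Parse a duration such as '6s962ms' into seconds."""
    match = re.fullmatch(r"(\d+)s(\d+)ms", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    return int(match.group(1)) + int(match.group(2)) / 1000


# Timings copied from the commit message, in run order.
OLD = ["6s962ms", "6s139ms", "6s234ms", "6s060ms", "5s966ms",
       "5s786ms", "5s778ms", "6s193ms", "6s159ms", "5s591ms"]
NEW = ["3s647ms", "3s393ms", "3s686ms", "3s447ms", "4s127ms",
       "3s994ms", "4s072ms", "4s082ms", "3s560ms", "3s654ms"]

old_mean = mean(to_seconds(t) for t in OLD)
new_mean = mean(to_seconds(t) for t in NEW)
speedup = old_mean / new_mean

if __name__ == "__main__":
    print(f"old mean: {old_mean:.3f}s, new mean: {new_mean:.3f}s, speedup: {speedup:.2f}x")
```

The averages come to roughly 6.09 s for the old version and 3.77 s for the new one, a speedup of about 1.6x.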
eda23b57f2 [Plugin] Create the FE plugin dir if missing (#3202)
The FE plugin dir should be created during initialization.
Also modify the pom.xml in the fe_plugins dir so that it can use a custom Maven setting.
2020-03-26 11:21:10 +08:00
a07fedd832 Fix unix_timestamp core where time less 1970 (#3198) 2020-03-25 23:16:58 +08:00
f585f30b1e [Plugin] Add FE plugin framework (#2463)
issue #2344 

* Add install/uninstall Plugin statements
* Add show plugin statement
* Support installing plugins in two ways:
    * Built-in Plugin: use PluginMgr's register method.
    * Dynamic Plugin: install by SQL statement, with the following process:
        1. check whether the Plugin is already installed
        2. download the Plugin file from a remote source or copy it from a local source
        3. extract the Plugin's .zip
        4. read the Plugin's plugin.properties and check the Plugin's values
        5. dynamically load the .jar and initialize the Plugin's main Class
        6. invoke the Plugin's init method
        7. register the Plugin into PluginMgr
        8. update meta

* Support the FE Plugin dynamic uninstall process:
    1. check whether the Plugin is installed
    2. invoke the Plugin's close method
    3. delete the Plugin from PluginMgr
    4. update meta

* Add audit plugin interface
* Add plugin enable flags in Config
* Add plugin install path in Config; by default plugins are installed in ${DORIS_FE_PATH}/plugins
* Add FE plugins project
* Add audit plugin demo

The usage:

```
// install plugin and show plugins;

mysql>
mysql> install plugin from "/home/users/seaven/auditplugin.zip";                                              
Query OK, 0 rows affected (0.05 sec)
mysql>
mysql> show plugins;
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| Name              | Type  | Description   | Version | JavaVersion | ClassName              | SoName | Sources                               |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| audit_plugin_demo | AUDIT | just for test | 0.11.0  | 1.8.31      | plugin.AuditPluginDemo | NULL   | /home/users/hekai/auditplugindemo.zip |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
1 row in set (0.00 sec)

mysql> show plugins;
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| Name              | Type  | Description   | Version | JavaVersion | ClassName              | SoName | Sources                               |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| audit_plugin_demo | AUDIT | just for test | 0.11.0  | 1.8.31      | plugin.AuditPluginDemo | NULL   | /home/users/hekai/auditplugindemo.zip |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
1 row in set (0.00 sec)

mysql> uninstall plugin audit_plugin_demo; 
Query OK, 0 rows affected (0.04 sec)
mysql> show plugins;
Empty set (0.00 sec)
```

TODO:

* Config.plugin_dir should be created if missing
2020-03-25 22:57:05 +08:00
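The install and uninstall sequences listed in commit f585f30b1e can be pictured with a small in-memory registry. This is a simplified Python sketch under stated assumptions: `Plugin` and `PluginRegistry` are hypothetical classes, not Doris's Java PluginMgr, and the download, extract, class-loading, and meta-update steps are left out.

```python
from typing import Dict, List


class Plugin:
    """Hypothetical plugin with the init/close lifecycle named in the commit."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.initialized = False

    def init(self) -> None:
        self.initialized = True

    def close(self) -> None:
        self.initialized = False


class PluginRegistry:
    """Hypothetical stand-in for a plugin manager."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def install(self, plugin: Plugin) -> None:
        # Check whether the plugin is already installed.
        if plugin.name in self._plugins:
            raise ValueError(f"plugin {plugin.name!r} is already installed")
        # Invoke init, then register.
        plugin.init()
        self._plugins[plugin.name] = plugin

    def uninstall(self, name: str) -> None:
        # Check whether the plugin is installed.
        plugin = self._plugins.get(name)
        if plugin is None:
            raise KeyError(f"plugin {name!r} is not installed")
        # Invoke close, then remove from the registry.
        plugin.close()
        del self._plugins[name]

    def names(self) -> List[str]:
        return sorted(self._plugins)


if __name__ == "__main__":
    registry = PluginRegistry()
    registry.install(Plugin("audit_plugin_demo"))
    print(registry.names())
    registry.uninstall("audit_plugin_demo")
    print(registry.names())
```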
8426669472 [Plugin] Add BE plugin framework (#2348) (#2618)
Support the BE plugin framework, including:

* update Plugin Manager, support Plugin find method
* support Builtin-Plugin register method

* plugin install/uninstall process
	* PluginLoader:
		* dynamic install and check Plugin .so file
		* dynamic uninstall and check Plugin status
	* PluginZip:
		* support plugin remote/local .zip file download and extract

TODO:

* We should support a PluginContext to transmit necessary system variables when the plugin's init/close method is invoked

* Add the entry point for the BE dynamic Plugin install/uninstall process, including:
	* The FE sends the install/uninstall Plugin statement (via RPC)
	* The FE meta update request with Plugin list information
	* The FE operation request (update/query) with Plugin (may not be needed)

* Add a way to upload the plugin status
* Load already installed Plugins when BE starts
2020-03-25 21:55:44 +08:00
8fa328c344 [Doc]Update doc for dynamic partition (#3093)
Add an explanation of dynamic partition dropping.
2020-03-25 20:45:13 +08:00
c0282bbc58 Solve the problem of mv selector when there is having clause in query (#3176)
All columns that belong to the top tupleIds of the query should be considered in the mv selector.
For example:

`select k1 from table group by k1 having sum(v1) >1;`

The candidate index should contain k1 and v1 columns instead of only k1.
The rollup which only has k1 column should not be selected.

Issue #3174 describes this in detail.
2020-03-25 20:42:39 +08:00
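The selection rule in commit c0282bbc58 amounts to a column-coverage check: an index is a candidate only if it contains every column the query references, including those used only in the having clause. A minimal sketch follows, where `candidate_indexes` and the index names are hypothetical.

```python
from typing import Dict, List, Set


def candidate_indexes(indexes: Dict[str, Set[str]], referenced: Set[str]) -> List[str]:
    """Return the names of indexes whose columns cover every referenced column."""
    return sorted(name for name, columns in indexes.items() if referenced <= columns)


if __name__ == "__main__":
    # Columns referenced by: select k1 from table group by k1 having sum(v1) > 1;
    referenced = {"k1", "v1"}
    indexes = {
        "base_table": {"k1", "k2", "v1"},
        "rollup_k1": {"k1"},
        "rollup_k1_v1": {"k1", "v1"},
    }
    print(candidate_indexes(indexes, referenced))
```

In this example `rollup_k1` is excluded because it lacks `v1`, which matches the behavior the commit describes.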
8aa8b8c96d [Code Refactor] Using block manager to unify the data file access. (#3189)
Earlier we introduced `BlockManager` to separate data access logic from
underlying file read and write logic.

This CL further unifies all `SegmentV2` data access to the `BlockManager`,
removes the previous `FileManager` class, and moves the file cache to the `FileBlockManager`.

There are no logical changes in this CL.

After this CL, all user table data is read through the `WritableBlock` and `ReadableBlock` 
returned by the `BlockManager`, and no file operations are performed directly.
2020-03-25 20:39:07 +08:00
dfd1a33712 [Dynamic Partition] Unify dynamic partition name and range (#3193)
Generate partition names based on the granularity.
e.g.:
Year: prefix2020
Day:  prefix20200325
Week: prefix2020_#, where # is the week of the year.

At the same time, for all granularities, align the partition range to 00:00:00.
2020-03-25 18:37:05 +08:00
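The naming scheme from commit dfd1a33712 can be sketched as a small formatting function. This is an illustration, not Doris's code: `partition_name` is a hypothetical helper, and the use of the ISO week number without zero padding is an assumption, since the commit message only says "# is the week of year".

```python
from datetime import date


def partition_name(prefix: str, day: date, granularity: str) -> str:
    """Build a partition name for the given day and granularity."""
    if granularity == "YEAR":
        return f"{prefix}{day.year:04d}"
    if granularity == "DAY":
        return f"{prefix}{day:%Y%m%d}"
    if granularity == "WEEK":
        # Assumption: ISO week numbering; the real definition may differ.
        iso_year, iso_week, _ = day.isocalendar()
        return f"{prefix}{iso_year:04d}_{iso_week}"
    raise ValueError(f"unsupported granularity: {granularity!r}")


if __name__ == "__main__":
    d = date(2020, 3, 25)
    for granularity in ("YEAR", "DAY", "WEEK"):
        print(granularity, partition_name("prefix", d, granularity))
```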
71bc815b20 [SQL] Support subquery in case when statement (#3135)
#3153
Implement support for subqueries in case when statements, like:
```
SELECT CASE
        WHEN (
            SELECT COUNT(*) / 2
            FROM t
        ) > k4 THEN (
            SELECT AVG(k4)
            FROM t
        )
        ELSE (
            SELECT SUM(k4)
            FROM t
        )
    END AS kk4
FROM t;
```

This statement will be rewritten to:
```
SELECT CASE
        WHEN t1.a > k4 THEN t2.a
        ELSE t3.a
    END AS kk4
FROM t, (
        SELECT COUNT(*) / 2 AS a
        FROM t
    ) t1,  (
        SELECT AVG(k4) AS a
        FROM t
    ) t2,  (
        SELECT SUM(k4) AS a
        FROM t
    ) t3;
```
2020-03-25 17:12:54 +08:00
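The equivalence of the two statements in commit 71bc815b20 can be checked on a toy table. The sketch below uses Python's built-in `sqlite3` module rather than Doris, so it only demonstrates that the original and rewritten forms return the same rows on SQLite; the function `case_when_results` and the sample data are assumptions. An `ORDER BY k4` is added to both queries to make the comparison deterministic.

```python
import sqlite3
from typing import List, Tuple

ORIGINAL = """
SELECT CASE
        WHEN (SELECT COUNT(*) / 2 FROM t) > k4 THEN (SELECT AVG(k4) FROM t)
        ELSE (SELECT SUM(k4) FROM t)
    END AS kk4
FROM t
ORDER BY k4
"""

REWRITTEN = """
SELECT CASE
        WHEN t1.a > k4 THEN t2.a
        ELSE t3.a
    END AS kk4
FROM t,
     (SELECT COUNT(*) / 2 AS a FROM t) t1,
     (SELECT AVG(k4) AS a FROM t) t2,
     (SELECT SUM(k4) AS a FROM t) t3
ORDER BY k4
"""


def case_when_results(values: List[int]) -> Tuple[list, list]:
    """Run both forms against an in-memory table and return their results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (k4 INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
        return conn.execute(ORIGINAL).fetchall(), conn.execute(REWRITTEN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    original, rewritten = case_when_results([1, 2, 3, 10])
    print(original == rewritten)
```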
b2518fc285 [SQL] Support non-correlated subquery in having clause (#3150)
This commit supports non-correlated subqueries in the having clause.
For example:

select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table);

Non-scalar subqueries are also supported in Doris.
For example:

select k1, sum(k2) from table group by k1 having sum(k2) > (select avg(k1) from table group by k2);

Doris checks the number of rows returned by the subquery during execution.
If the subquery returns more than one row, the query throws an exception.

Implementation:

The entire outer query is regarded as inline view of new query.
The subquery in having clause is changed to the where predicate in this new query.

After this commit, TPC-DS queries 23, 24, and 44 are supported.

This commit also supports subqueries in ArithmeticExpr.
For example:

select k1 from table where k1=0.9*(select k1 from t);
2020-03-25 16:29:09 +08:00
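The implementation idea in commit b2518fc285 (treat the outer query as an inline view and move the having subquery into a where predicate) can be illustrated on a toy table. The sketch below runs on Python's built-in `sqlite3`, not Doris; the rewritten SQL is one plausible shape of the transformation, and `having_results`, the alias `v`, and the sample rows are assumptions.

```python
import sqlite3
from typing import List, Tuple

ORIGINAL = """
SELECT k1, SUM(k2) FROM t GROUP BY k1
HAVING SUM(k2) > (SELECT AVG(k1) FROM t)
ORDER BY k1
"""

# The outer query becomes an inline view; the HAVING subquery becomes a WHERE predicate.
REWRITTEN = """
SELECT v.k1, v.s
FROM (SELECT k1, SUM(k2) AS s FROM t GROUP BY k1) v
WHERE v.s > (SELECT AVG(k1) FROM t)
ORDER BY v.k1
"""


def having_results(rows: List[Tuple[int, int]]) -> Tuple[list, list]:
    """Run both forms against an in-memory table and return their results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (k1 INTEGER, k2 INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(ORIGINAL).fetchall(), conn.execute(REWRITTEN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    original, rewritten = having_results([(1, 10), (1, 5), (2, 1), (3, 0), (3, 1)])
    print(original, rewritten)
```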
3cff89df7f [Dynamic Partition] Support for automatically drop partitions (#3081) 2020-03-25 10:24:46 +08:00
e794bb69b7 [BUG] Make default result ordering of SHOW PARTITIONS statement be consist with 0.11 (#3184) 2020-03-24 17:14:27 +08:00
e20d905d70 Remove unused KUDU codes (#3175)
KUDU tables have not been supported for a long time. Remove the code related to them.
2020-03-24 13:54:05 +08:00
3b32938140 [Doc] Create CONTRIBUTING.md (#3180) 2020-03-24 13:42:21 +08:00
d4c1938b5c Open datetime min value limit (#3158)
The min_value of datetime in olap/type.h is 0000-01-01 00:00:00, so we don't need to restrict the datetime minimum in tablet_sink.
2020-03-24 10:52:57 +08:00
dff3c0d57e Revert "Remove deep copy when doing hash table EvalRow (#3171)" (#3173) 2020-03-23 15:29:46 +08:00
d837231fca [RoutineLoad] Fix bug that job will be paused when table is altering (#3169)
Also add some debug logs to observe the time cost of the routine load task process.
2020-03-23 11:05:00 +08:00
473a67a5b8 [Syntax] Remove all EmptyStmt from the end of multi-statements list (#3140)
Resolves issue #3139.

When a user executes a query through a client library such as python MysqlDb, for example:

     "select * from tbl1;"  (with a semicolon at the end of the statement)
     
The sql parser will produce 2 statements: `SelectStmt` and `EmptyStmt`.
Here we discard the `EmptyStmt` to make it act like one single statement.

This is for compatibility. In python MysqlDb, if the first `SelectStmt` results in
some warnings, it will try to execute a `SHOW WARNINGS` statement right after the
SelectStmt, but before the execution of `EmptyStmt`. So there will be an exception:

     `(2014, "Commands out of sync; you can't run this command now")`

I think this is a flaw in python MysqlDb.
However, to keep the user experience consistent, we remove all EmptyStmt
at the end to prevent errors (leaving at least one statement).

But if the user executes statements like:

     `"select * from tbl1;;select 2"`

and the first `select * from tbl1` has warnings, python MysqlDb will still throw an exception.
2020-03-23 09:39:22 +08:00
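The trimming rule from commit 473a67a5b8 (drop every trailing EmptyStmt but keep at least one statement) is sketched below. `Stmt`, `parse`, and `trim_trailing_empty` are hypothetical names; the naive split on `;` ignores quoting and stands in for the real SQL parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stmt:
    """Hypothetical parsed statement; blank text plays the role of EmptyStmt."""
    sql: str

    @property
    def is_empty(self) -> bool:
        return self.sql.strip() == ""


def parse(text: str) -> List[Stmt]:
    """Naive split on ';' for illustration only (no handling of quoted semicolons)."""
    return [Stmt(part) for part in text.split(";")]


def trim_trailing_empty(stmts: List[Stmt]) -> List[Stmt]:
    """Remove empty statements from the end, always leaving at least one statement."""
    result = list(stmts)
    while len(result) > 1 and result[-1].is_empty:
        result.pop()
    return result


if __name__ == "__main__":
    for text in ("select * from tbl1;", "select * from tbl1;;select 2", ";"):
        print(repr(text), [s.sql for s in trim_trailing_empty(parse(text))])
```

As the commit notes, an EmptyStmt in the middle of the list (the `;;select 2` case) is not removed by this rule.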
wyb
dd8d748c55 Remove deep copy when doing hash table EvalRow (#3171)
Remove the varchar column deep copy in the partitioned hash table EvalRow function.
2020-03-21 09:52:49 +08:00
d29ed84b6a [Bug] Fix bug that right semi/anti join is not right (#3167)
This bug was introduced by PR #3148.
Right semi/anti join cannot use `insert_unique` in the build phase of the join.
2020-03-20 20:58:55 +08:00
47a3d5000b [UnitTest] Fix unit test bug in BetaRowset and PageCacheTest (#3157)
1. BlockManager has been added into StorageEngine.
   So StorageEngine should be initialized when starting BetaRowset unit test.

2. Cache should not use the same buf to store value, otherwise the address
   will be freed twice and crash.
2020-03-20 20:37:50 +08:00
6beadfda71 [Bug] Fix delete predicate bug for segment v2 (#3164)
This bug occurs because the min and max wrapper fields are not initialized
when there is no predicate on that column.
2020-03-20 20:35:55 +08:00
2dc995df7b [CodeStyle] Rename new_partition_aggregation_node and new_partitioned_hash_table (#3166) 2020-03-20 19:59:01 +08:00
5a8fcd263f [CodeStyle] Delete obsolete code of partition_aggregation_node and partitioned_hash_table (#3162) 2020-03-20 16:25:29 +08:00
c08d6e4708 [tablet meta] Do some refactor on TabletMeta (#3136)
Remove the return values of some functions that always return OLAP_SUCCESS.
Optimize some loops.
2020-03-20 15:03:22 +08:00
2d3dbc2c42 Revert "[CodeStyle] Del obsolete code of partition_aggregation_node (#3154)" (#3160)
This reverts commit dae013d797c1c2c9e54246d5ace4bdd90b297d43.
2020-03-20 14:47:25 +08:00
5f004cb009 Revert "[CodeStyle] Remove unused PartitionedHashTable (#3156)" (#3159)
This reverts commit d3fd44f0a2fe076d2c62851babc162fcebe4d63b.
2020-03-20 14:42:40 +08:00
d3fd44f0a2 [CodeStyle] Remove unused PartitionedHashTable (#3156) 2020-03-20 12:19:08 +08:00
dae013d797 [CodeStyle] Del obsolete code of partition_aggregation_node (#3154) 2020-03-20 11:33:55 +08:00
f0db9272dd [Performance] Improve performence of hash join in some case (#3148)
Improve the performance of hash join when the build table has too many duplicated rows, which cause hash table collisions and slow down probe performance.
In this PR, when the join type is semi join or anti join, we build a hash table without duplicated rows.
Benchmark:
Dataset: TPC-DS tables `store_sales` and `catalog_sales`
```
mysql> select count(*) from catalog_sales;
+----------+
| count(*) |
+----------+
| 14401261 |
+----------+
1 row in set (0.44 sec)

mysql> select count(distinct cs_bill_cdemo_sk) from catalog_sales;
+------------------------------------+
| count(DISTINCT `cs_bill_cdemo_sk`) |
+------------------------------------+
|                            1085080 |
+------------------------------------+
1 row in set (2.46 sec)

mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
| 28800991 |
+----------+
1 row in set (0.84 sec)

mysql> select count(distinct ss_addr_sk) from store_sales;
+------------------------------+
| count(DISTINCT `ss_addr_sk`) |
+------------------------------+
|                       249978 |
+------------------------------+
1 row in set (2.57 sec)
```

Test queries:
query1: `select count(*) from (select store_sales.ss_addr_sk  from store_sales left semi join catalog_sales  on catalog_sales.cs_bill_cdemo_sk = store_sales.ss_addr_sk) a;`

query2: `select count(*) from (select catalog_sales.cs_bill_cdemo_sk from catalog_sales left semi join store_sales on catalog_sales.cs_bill_cdemo_sk = store_sales.ss_addr_sk) a;`

benchmark result:


||query1|query2|
|:--:|:--:|:--:|
|before|14.76 sec|3 min 16.52 sec|
|after|12.64 sec|10.34 sec|
2020-03-20 10:31:14 +08:00
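The reason deduplicating the build side is safe for left semi and left anti joins in commit f0db9272dd is that these joins only ask whether a match exists, not how many. A minimal Python sketch, where a `set` stands in for the hash table without duplicated rows and the function names are hypothetical:

```python
from typing import Hashable, Iterable, List


def left_semi_join(probe_keys: Iterable[Hashable], build_keys: Iterable[Hashable]) -> List[Hashable]:
    """Keep each probe row that has at least one match on the build side."""
    build_set = set(build_keys)  # duplicated build rows collapse into one entry
    return [key for key in probe_keys if key in build_set]


def left_anti_join(probe_keys: Iterable[Hashable], build_keys: Iterable[Hashable]) -> List[Hashable]:
    """Keep each probe row that has no match on the build side."""
    build_set = set(build_keys)
    return [key for key in probe_keys if key not in build_set]


if __name__ == "__main__":
    probe = [1, 2, 2, 3]
    build = [2, 2, 2, 4]  # heavily duplicated build side
    print(left_semi_join(probe, build))
    print(left_anti_join(probe, build))
```

The sketch covers only the left variants, where the probe side is the output side. A later commit in this log (d29ed84b6a) reports that right semi/anti join cannot use the unique-insert build.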
12d1b072ef [Bug] Fix bug that of union statement (#3137)
Fix a bug in const union queries like `select null union select null`. The type of the SlotDescriptor is null when the clause is `select null`, which causes a BE core dump and makes FE find the wrong cast function.
2020-03-20 09:51:38 +08:00
c88e8ab1ab Add some system variables (#3144)
Add the event_scheduler and storage_engine system variables for compatibility with some MySQL clients, such as JetBrains DataGrip.
2020-03-20 09:28:34 +08:00