Commit Graph

1027 Commits

Author SHA1 Message Date
5032b7fe7a Support materialized view schema change in bitmap hll and count field [#3739] (#3873)
+ Build the materialized view function for schema change here, based on defineExpr.
+ This is a workaround, because the current storage layer does not support expression evaluation.
+ A count distinct materialized view sets mv_expr to to_bitmap or hll_hash.
+ A count materialized view sets mv_expr to count.
+ Support regenerating historical data in BE when a new materialized view is created.
    + Support to_bitmap function
    + Support hll_hash function
    + Support count(field) function
For #3344
2020-07-16 10:45:15 +08:00
78a1dea19d Support using B/K/KB/M/MB/G/GB/T/TB/P/PB as unit in session variable exec_mem_limit (#4063)
Allow the session variable exec_mem_limit to be given with a unit suffix: B, K/KB, M/MB, G/GB, T/TB or P/PB.
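
A minimal Python sketch of how such a value could be turned into a byte count. This is not the Doris implementation: `parse_mem_limit` and the unit table are hypothetical, and the sketch assumes binary multiples (1 KB = 1024 B), case-insensitive units, and that a bare number means bytes.

```python
import re

# Hypothetical unit table. Binary multiples are an assumption of this sketch.
_UNIT_BYTES = {
    "B": 1,
    "K": 1024, "KB": 1024,
    "M": 1024**2, "MB": 1024**2,
    "G": 1024**3, "GB": 1024**3,
    "T": 1024**4, "TB": 1024**4,
    "P": 1024**5, "PB": 1024**5,
}

# An integer followed by an optional alphabetic unit, e.g. "2G" or "512 MB".
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")


def parse_mem_limit(text: str) -> int:
    """Convert a string such as '2G' into a number of bytes."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid memory limit: {text!r}")
    number, unit = match.groups()
    if unit == "":
        # Assumption: a value without a unit is already in bytes.
        return int(number)
    try:
        return int(number) * _UNIT_BYTES[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit in memory limit: {text!r}") from None
```

Under these assumptions, `parse_mem_limit("2G")` and `parse_mem_limit("2GB")` return the same value, and an unknown unit raises `ValueError` instead of being silently ignored.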
2020-07-13 20:54:14 +08:00
e435e6f9a8 [Bug][Planner]Fix bug of count(*) in MV selector (#4060)
The output columns of a query should be collected from all tupleIds
in BaseTableRef rather than from the top tupleIds of the query.
The top tupleId of count(*) is the Agg tuple, which does not expand the star.

Fixed #4065
2020-07-13 20:53:10 +08:00
d7893f0fa7 [Bug]Fix some schema change not work right (#4009)
This CL mainly fixes schema changes to the varchar type that did not work
correctly because a logic check was missing. It also adds ConvertTypeResolver
to register the supported type conversions, so that the check cannot be forgotten again.
2020-07-11 10:18:29 +08:00
265c26f67d [Doris On ES] Add docvalue limitation for doc_values scan and enable doc_values scan default (#4055) 2020-07-10 18:37:36 +08:00
ebaa0c7137 [Bug][SQL]Fix predicate pushdown may incorrect when groupby with grouping sets (#4041)
Fixes #4040 
Fix predicate pushdown that may be incorrect when GROUP BY is used with grouping sets.
2020-07-09 21:49:37 +08:00
d2ab38a5e0 [Feature] Batch update partition's property in one command (#3981)
Support following command.
```
alter table tbl_name modify partition (p1, p2, p3) set ("replication_num" = "3");
```
2020-07-09 21:48:43 +08:00
5a27981e49 [Config] Add thrift_client_retry_interval_ms config in be for thrift client to avoid avalanche disaster in fe thrift server (#4022)
This PR mainly adds the `thrift_client_retry_interval_ms` config in BE for the thrift client,
to avoid an avalanche of retries against the FE thrift server. It also fixes some typos
and some RPC setting problems.
2020-07-08 21:07:00 +08:00
fb0ecb70fd [SQL]fix inline view join mysql choose shuffle join bug (#4048)
fix #4047 

#3886 is related to this case.

The SQL: `bigtable t1 join mysqltable t2 join mysqltable t3 on t1.k1 = t3.k1`
1. After reorder:
    t1, t2, t3

2. Choose the join of t1 with t2:
   t1 joins t2 with no conditions, so Doris chooses a cross join.

3. Choose the join of (t1 join t2) with t3:
   In the old code, t2 is a MySQL table, so its cardinality is zero.
   The cardinality of "the cross join of t1 with t2" is t1.cardinality multiplied by t2.cardinality,
   so it is also zero.
   t3 is a MySQL table too, so t3's cardinality is zero.

**If both tables to be joined have zero cardinality, we choose the shuffle join.**

So I changed the MySQL table's cardinality from 0 to 1, so that the cross join's cardinality is no longer zero.
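
The arithmetic behind this fix can be sketched in Python. This only models the rule described above and is not planner code: the function names and the row count of t1 are hypothetical.

```python
T1_ROWS = 1_000_000  # hypothetical cardinality of the big table t1


def cross_join_cardinality(left: int, right: int) -> int:
    # A cross join produces every pair of rows.
    return left * right


def zero_rule_picks_shuffle(left: int, right: int) -> bool:
    # The rule from the commit message: both inputs report zero rows.
    return left == 0 and right == 0


def third_join_uses_shuffle(mysql_cardinality: int) -> bool:
    # Step 2: t1 cross join t2, where t2 is a MySQL table.
    t1_t2 = cross_join_cardinality(T1_ROWS, mysql_cardinality)
    # Step 3: (t1 join t2) joined with t3, another MySQL table.
    return zero_rule_picks_shuffle(t1_t2, mysql_cardinality)


print(third_join_uses_shuffle(0))  # old MySQL cardinality: True
print(third_join_uses_shuffle(1))  # new MySQL cardinality: False
```

With a cardinality of 0, the product for the cross join collapses to 0 regardless of how large t1 is, which is what triggered the zero rule.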
2020-07-08 20:56:24 +08:00
7715a84d4d [Config] Enable some features by default (#4031)
It's time to enable some features by default.

1. Enable FE plugins by setting `plugin_enable=true`
2. Enable dynamic partition by setting `dynamic_partition_enable=true`
3. Enable nio mysql server by setting `mysql_service_nio_enabled=true`

Also modify installation doc, add download link of MySQL client.
2020-07-08 09:59:10 +08:00
b7051d0971 [Config]Make it easier for users to find configuration items needed (#3957)
This PR orders config items by key and supports a LIKE predicate in the ADMIN SHOW CONFIG statement.
2020-07-07 23:12:21 +08:00
1aa148da7f [Bug]Fix mini load NPE (#4026)
for #4025
2020-07-07 23:08:08 +08:00
5c42514a8f [Bug][SQL]Fix except node child not order correctly (#4003)
Fixes #3995 
## Why does it happen
When SetOperations finds that the previous node needs an Aggregate, the AggregationNode is added at the wrong time. The AggregationNode should be added first, before the other children are added.

## Why don't intersect and union have this problem
Intersect and union are commutative, so it doesn't matter if the order is wrong.

## Why was this problem not caught by tests before
The previous test cases did not cover the situation in which the previous node needs an AggregationNode.
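
The commutativity argument can be checked with plain Python sets, used here as a stand-in for SQL set operations (the sample values are arbitrary):

```python
a = {1, 2, 3}
b = {2, 3, 4}

# Union and intersect give the same result in either order.
union_commutes = (a | b) == (b | a)
intersect_commutes = (a & b) == (b & a)

# Except (set difference) does not: a - b is {1}, b - a is {4}.
except_commutes = (a - b) == (b - a)

print(union_commutes, intersect_commutes, except_commutes)  # True True False
```

This is why a wrong child order is harmless for union and intersect but changes the result of except.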
2020-07-07 23:06:36 +08:00
1cc9e1606f [Doris On ES] Add UT test for all search phase (#4035)
I forgot to push some unit tests in PR #4012.
Also remove the `_cluster/state` resource, because DOE does not rely on the full ES cluster state metadata.
2020-07-07 23:05:02 +08:00
c9a7c373a7 [Bug] Return actual json for ConnectionAction (#4016) 2020-07-07 20:14:55 +08:00
3ba38e3381 [Doris On ES][Refactor] Refactor and enhance ES sync meta logic (#4012)
After PR #3454 was merged, we should refactor and reorganize some logic so that Doris On ES can keep iterating sustainably in the long term.
To facilitate code review, I will divide this work into multiple PRs (there is other WIP work that I also need to think through carefully).

This PR includes:

1. Introduce SearchContext for all the state we need
2. Divide the meta-sync logic into three phases
3. Modify some processing logic
4. Introduce version detection logic for future use
2020-07-07 09:04:05 +08:00
913b2caac4 [Dynamic Partition]Support set replication number (#3965)
This CL mainly supports setting the replication_num property in a dynamic partition
table. If dynamic_partition.replication_num is not set, the value is the
same as the table's default replication_num.
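
The fallback can be sketched as a lookup with a default. This is an illustration only: the function name is hypothetical, and representing table properties as a plain dict of strings is an assumption.

```python
def dynamic_partition_replication_num(table_props: dict) -> int:
    """Return the replication number to use for dynamically created partitions."""
    value = table_props.get("dynamic_partition.replication_num")
    if value is None:
        # Not set: fall back to the table's default replication_num.
        return int(table_props["replication_num"])
    return int(value)
```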
2020-07-05 16:28:38 +08:00
fa338fb6d9 [Bug][Memory Leak]Fix bug TransactionState is not cleared from idToFinalStatusTransactionState (#4013)
This CL includes:
1. Fix the memory leak caused by transactionState not being removed.
2. Extract the clear logic into a method so that it is not forgotten.
2020-07-05 16:27:41 +08:00
ba120292ab [ShowIndex] Make Show Index stmt act same as MySQL behavior (#4010)
`SHOW INDEX FROM db2.tbl1 FROM db1;` is now treated the same as
`SHOW INDEX FROM db1.tbl1;`
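
A small Python sketch of this resolution rule, in which a trailing `FROM db` overrides the database qualifier on the table name. `resolve_show_index_target` is a hypothetical helper, not a Doris function.

```python
from typing import Optional


def resolve_show_index_target(table_ref: str, from_db: Optional[str] = None) -> str:
    """Resolve the table that SHOW INDEX FROM <table_ref> [FROM <from_db>] refers to."""
    # "db2.tbl1" -> ("db2", ".", "tbl1"); "tbl1" -> ("", "", "tbl1")
    db, _, table = table_ref.rpartition(".")
    if from_db:
        # The explicit FROM clause wins over the qualifier on the table name.
        db = from_db
    if not db:
        raise ValueError("no database specified")
    return f"{db}.{table}"


print(resolve_show_index_target("db2.tbl1", "db1"))  # db1.tbl1
```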
2020-07-05 16:26:54 +08:00
1fc82cd6e4 [Code Cleanup]Use ThreadPoolManager to manage some native thread (#3997)
FE now uses ThreadPoolManager to manage and monitor all threads,
but some threads are still not managed. FE also uses the `Timer` class
for some scheduled tasks, but `Timer` has known problems and is outdated;
it should be replaced by ScheduledThreadPool.
2020-07-05 16:26:22 +08:00
7351f7c237 [Config]Allow user to config different thrift server model (#3986)
Doris only supports the TThreadPoolServer model in the thrift server, but this
server model is not effective in some high-concurrency scenarios, so this
PR introduces a new config that lets users choose a different server model
for their scenario.
Add new FE config: `thrift_server_type`
2020-07-05 16:24:29 +08:00
f521507a46 [SQL] Explain verbose stmt to print tupleDesc/slotDesc information (#3970) 2020-07-05 16:22:43 +08:00
64f7a1fd1e [Log] Add log for loading image (#3996)
When FE fails to load an image, more logs should be printed to help users analyze the error.
2020-07-03 21:19:08 +08:00
9bb7e5d208 Fix some code & comments (#3999)
TPlanExecParams::volume_id is never used, so delete the print_volume_ids() function.
Fix log, and log if PlanFragmentExecutor::open() returns error.
Fix some comments
2020-07-03 21:18:47 +08:00
5ade21b55d [Load] Support load true or false as boolean value (#3898)
Fixes #3831
After this PR 
insert into: `1/"1" -> 1, 0/"0"->0, true/"true"->1, false/"false" -> 0, "10"->null, "xxxx" -> null`
load: `1/true -> 1, 0/false -> 0, other -> null`
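
The load mapping above can be sketched in Python, with `None` standing in for SQL null. `load_boolean` is a hypothetical name, and trimming whitespace and ignoring case are assumptions of this sketch, not behavior stated in the PR.

```python
from typing import Optional

# Accepted spellings and the stored value for each.
_BOOLEAN_VALUES = {"1": 1, "true": 1, "0": 0, "false": 0}


def load_boolean(text: str) -> Optional[int]:
    """Map a raw load field to 1, 0, or None (null) for a boolean column."""
    # Assumption: surrounding whitespace and letter case are ignored.
    return _BOOLEAN_VALUES.get(text.strip().lower())
```

Values such as `"10"` or `"xxxx"` are not in the table, so they map to `None`.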
2020-07-02 13:58:24 +08:00
707d03cbde [SQL] Remove order by for subquery in set operation clause (#3806)
Implements #3803
Support disabling some meaningless order by clauses.
The default limit of 65535 will not be disabled because it is added at the plan node;
after we support spill to disk, we can move this limit to the analyze phase.
2020-07-02 13:56:53 +08:00
2362500e77 [Doris On ES] Support create table with wildcard or alias index (#3968) 2020-07-01 22:08:06 +08:00
fdcbea480d [Enhancement] DO NOT increase report version for publish task (#3894)
Fixes #3893 

In a cluster with frequent load activity, FE ignores most tablet reports from BE,
because it currently handles only reports whose version >= BE's latest report version
(which is increased each time a transaction is published). This can be observed in FE's log,
which contains many entries like `out of date report version 15919277405765 from backend[177969252].
current report version[15919277405766]`.

However, many system functions rely on TabletReport processing to work properly. For example:
1. bad or version-missing replicas are detected and repaired during TabletReport
2. storage medium migration decisions and actions are made based on TabletReport
3. BE's old transactions are cleared/republished during TabletReport

In fact, it is not necessary to update the report version after the publish task;
this is a leftover from history. In the reporting logic of the current version,
we no longer decrease the version information of a replica in the FE metadata according to the report.
So even if we receive a stale version of the report, it does not matter.

This CL contains mainly two changes

1. do not increase report version for publish task
2. populate `tabletWithoutPartitionId` out of read lock of TabletInvertedIndex
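
The effect of the first change can be modelled in a few lines of Python. This is a deliberately simplified sketch, not FE code: `report_accepted`, its parameters, and the starting version are hypothetical, and only the `>=` comparison is taken from the description above.

```python
def report_accepted(bump_on_publish: bool, publishes_in_flight: int) -> bool:
    """Return True if a tablet report survives the version check.

    The report is stamped with the version that was current when it was
    generated. Transactions published while the report is in flight may
    raise the latest version before the report is checked.
    """
    latest_version = 100  # hypothetical starting value
    report_version = latest_version

    for _ in range(publishes_in_flight):
        if bump_on_publish:
            latest_version += 1

    # The check described above: stale reports are ignored.
    return report_version >= latest_version


print(report_accepted(bump_on_publish=True, publishes_in_flight=1))   # False
print(report_accepted(bump_on_publish=False, publishes_in_flight=1))  # True
```

In this model a single publish during the report's flight time is enough to make the report stale, which matches the observation that busy clusters drop most reports.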
2020-07-01 09:23:40 +08:00
1bfb105ec1 [Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979) 2020-07-01 09:22:33 +08:00
f9a52f5db4 [Bug] Insert may leak DeltaWriter when re-analyzed (#3973) 2020-06-30 11:09:53 +08:00
3ac459f0ca [UT] resolve metric ut fails (#3975) 2020-06-29 21:54:41 +08:00
48398232e7 [Bug] Fix bug that default_rowset_type have a session variable (#3953)
This PR mainly fixes the bug that `default_rowset_type` has a session variable.
2020-06-29 19:16:42 +08:00
48d947edf4 Support rpc_timeout property in stream load request to cancel request in fe in time when stream load request is timeout (#3948)
This PR lets FE cancel a stream load request promptly when the request
times out, which makes stream load more robust.
2020-06-29 19:16:16 +08:00
2c96d27fdc [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt (#3962)
* [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt

Add MetaUrl and CompactionUrl in result of following stmt:

`show tablet 10010`;

* fix ut

* add doc

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-06-29 19:15:38 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, so that the format string is handled only once.
2. Find the cctz timezone when initializing `runtime state`, so that the timezone does not need to be looked up for each row.
3. Add constant rewrite rule for `utc_timestamp()`
4. Add doc for `to_date()`
5. Comment out the `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`

The performance is shown below:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

Formatting with a date format string still seems slow; it may need further optimization.
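
The idea of the prepare phase in item 1 can be sketched in Python: the format string is translated once when the function object is built, and each row only pays for the formatting itself. This is an illustration and not the BE code. The class name is hypothetical, the specifier table is a partial assumption that covers only `%i` and `%s`, and UTC is used to keep the output deterministic.

```python
from datetime import datetime, timezone

# Partial, hypothetical mapping from MySQL-style specifiers to strftime ones.
# Other specifiers that differ between the two dialects are not handled here.
_SPECIFIER_MAP = {"%i": "%M", "%s": "%S"}


class FromUnixtime:
    def __init__(self, fmt: str = "%Y-%m-%d %H:%i:%s") -> None:
        # Prepare phase: translate the format string exactly once.
        for source, target in _SPECIFIER_MAP.items():
            fmt = fmt.replace(source, target)
        self._fmt = fmt

    def __call__(self, unix_seconds: int) -> str:
        # Per-row work: only the conversion and formatting remain.
        moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
        return moment.strftime(self._fmt)


from_unixtime = FromUnixtime()
print(from_unixtime(0))      # 1970-01-01 00:00:00
print(from_unixtime(86400))  # 1970-01-02 00:00:00
```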
2020-06-29 19:15:09 +08:00
0cbacaf01d [Refactor] Replace some boost to std in OlapScanNode (#3934)
Replace some boost usages with std in OlapScanNode.

This refactor seems to solve the problem described in #3929:
I found that BE crashed when calling `boost::condition_variable.notify_all()`,
but after this change BE no longer crashes.
2020-06-29 19:13:03 +08:00
d82d48da87 [Doris On ES][Bug-fix] Sync ES metadata failure after restart or upgrade FE (#3961)
ISSUE:#3960
PR #3454 introduced caching for EsClient, but the client was initialized only during editlog replay; the same work should also be done during image replay.

This happens when FE is restarted or upgraded.

BTW: fix a UT failure for metric
2020-06-29 14:13:07 +08:00
eecc0c5ec9 fix ut 2020-06-28 14:01:45 +08:00
55c058e4b1 [Compile] modify compile error (#3959) 2020-06-28 10:39:31 +08:00
566a7f1ac7 [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt
Add MetaUrl and CompactionUrl in result of following stmt:

`show tablet 10010`;
2020-06-28 10:11:46 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose BEs randomly without checking whether their disks are available,
so table creation does not fail until the create tablet task is sent to BE
and BE checks whether it has available capacity to create the tablet.
Checking backend disk availability by storage medium up front therefore reduces unnecessary RPC calls.
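
A rough Python sketch of such a pre-check. The data model is invented for illustration: `Disk`, `Backend`, `backends_with_capacity` and the byte threshold are all hypothetical and do not mirror the FE classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    medium: str            # e.g. "HDD" or "SSD"
    available_bytes: int


@dataclass
class Backend:
    backend_id: int
    disks: List[Disk]


def backends_with_capacity(
    backends: List[Backend], medium: str, min_available_bytes: int
) -> List[int]:
    """Return ids of backends with at least one usable disk of the given medium."""
    return [
        be.backend_id
        for be in backends
        if any(
            disk.medium == medium and disk.available_bytes >= min_available_bytes
            for disk in be.disks
        )
    ]
```

If the returned list is shorter than the requested replication number, table creation can be rejected before any create tablet task is sent.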
2020-06-28 09:36:31 +08:00
3be28460f7 [Bug]Dynamic partition check interval seconds is not right (#3951) 2020-06-27 10:07:39 +08:00
a894b1edc5 [Doris On ES] Split /_cluster/state to [indexName/_mappings, indexName/_search_shards] (#3454)
1. Split /_cluster/state into /_mapping and /_search_shards requests to reduce permissions and make the logic clearer
2. Rename some ES-related objects to make their names more accurate
3. Add simple support for docValue and Fields in alias mode, taking the first one by default

#3311
2020-06-26 17:46:43 +08:00
46c64f0861 [Bug] Enable to get TCP metrics for linux kernel 2.x (#3921)
Fix #3920 

CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
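
Item 1 can be illustrated in Python. In `/proc/net/snmp` each protocol has a header line of field names followed by a line of values, so reading positions from the header avoids hard-coding column indexes that differ between kernel versions. The sample text below is made up and shortened, and `parse_tcp_metrics` is a hypothetical helper rather than the BE implementation.

```python
SAMPLE_SNMP = """\
Ip: Forwarding DefaultTTL
Ip: 2 64
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 10 20 1 2 5 1000 900 7 0 3
"""


def parse_tcp_metrics(snmp_text: str) -> dict:
    """Pair each Tcp field name from the header line with its value."""
    header = None
    for line in snmp_text.splitlines():
        if not line.startswith("Tcp:"):
            continue
        fields = line.split()[1:]
        if header is None:
            # First Tcp line: the field names.
            header = fields
        else:
            # Second Tcp line: the values, in the same order as the names.
            return dict(zip(header, (int(value) for value in fields)))
    raise ValueError("no Tcp section found")


metrics = parse_tcp_metrics(SAMPLE_SNMP)
print(metrics["InSegs"], metrics["OutSegs"])  # 1000 900
```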
2020-06-24 21:29:07 +08:00
df8f9cc215 [Bug] Unify the timezone (#3910)
When we get the default system time zone, it may return `PRC`, which we do not support, and this
causes dynamic partition creation to fail. Fix #3919

This CL mainly changes:
1. Use a unified method to get the system default time zone
2. The default variables `system_time_zone` and `time_zone` are now set to the default system
time zone, which is `Asia/Shanghai`.
3. Modify related unit test.
4. Support time zone `PRC`.
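
One way to picture a unified lookup is a single normalization function that every caller goes through. The Python sketch below assumes that `PRC` is handled as an alias of `Asia/Shanghai`; the alias table and the function name are hypothetical and may not match how the CL implements it.

```python
# Hypothetical alias table. Only the PRC entry is suggested by the text above.
_TIME_ZONE_ALIASES = {"PRC": "Asia/Shanghai"}


def normalize_time_zone(name: str) -> str:
    """Map a legacy time zone name to its region-based equivalent."""
    return _TIME_ZONE_ALIASES.get(name, name)


print(normalize_time_zone("PRC"))  # Asia/Shanghai
```

Names without an alias pass through unchanged, so callers can apply the function unconditionally.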
2020-06-24 21:28:25 +08:00
wyb
3f7307d685 [Spark Load]Add spark etl job main class (#3927)
1. Add SparkEtlJob class
2. Remove DppResult comment
3. Support loading from hive table directly

#3433
2020-06-24 13:54:55 +08:00
8092aadc83 [Spark Load]Using SparkDpp to complete some calculation in Spark Load (#3729) 2020-06-22 19:58:34 +08:00
3a7b8e98a6 [Spark Load] Doris Support Using Hive Table to Build Global Dict (#3063) 2020-06-22 14:07:36 +08:00
f03abcdfb3 [Spark Load] Rollup Tree Builder (#3727)
1. A tree data structure that describes a Doris table's rollups
2. A builder that builds this data structure
2020-06-22 14:06:33 +08:00
56bb218148 [Bug] Can not use non-key column as partition column in duplicate table (#3916)
The following statement throws an error:
```
create table test.tbl2
(k1 int, k2 int, k3 float)
duplicate key(k1)
partition by range(k2)
(partition p1 values less than("10"))
distributed by hash(k3) buckets 1
properties('replication_num' = '1'); 
```
Error: `Only key column can be partition column`

But in a duplicate key table, columns can be used as partition or distribution columns
even if they are not among the duplicate keys.

This bug was introduced by #3812
2020-06-22 09:24:21 +08:00