1714 Commits

Author SHA1 Message Date
807499427c unregister fragment mem tracker in close() (#3286)
ref https://github.com/apache/incubator-doris/issues/3273

P.S.
614a76beea/be/src/runtime/plan_fragment_executor.cpp (L559-L562)
I think this piece of code is useless.
This `_mem_tracker` in `PlanFragmentExecutor` is set as fragment_mem_tracker of `RuntimeState`.

**direct use**
We use it in the following code; when the row batch is reset, the mem tracker's consumption is released.
7eab12a40e/be/src/exec/olap_rewrite_node.cpp (L57-L58)
839ec45197/be/src/exec/olap_scan_node.cpp (L1217-L1218)

**other usage** 
e.g.
6c33f80544/be/src/exec/olap_scanner.cpp (L245)
won't consume the fragment mem tracker. So we don't need to worry that the fragment mem tracker's consumption is non-zero when we want to destroy it.

Or we can add a consumption check before we close the mem tracker?
2020-04-13 23:15:56 +08:00
a467c6f81f [ES Connector] Add field context for string field keyword type (#3305)
This PR is only a transitional solution. It would be better to move the predicate transformation from Doris BE to Doris FE, so that Doris BE is responsible only for fetching data from ES.

Add an `enable_keyword_sniff` configuration item for creating an external Elasticsearch table. It defaults to true, sniffs the `keyword` type on `text` analyzed fields, and returns the `json_path` that replaces the original column name.

```
CREATE EXTERNAL TABLE `test` (
  `k1` varchar(20) COMMENT "",
  `create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
Note: `enable_keyword_sniff` defaults to "true".

run this SQL:

```
select * from test where k1 = "wu yun feng"
```
 Output predicate DSL:

```
{"term":{"k1.keyword":"wu yun feng"}}
```
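A minimal sketch of the rewrite shown above, not the actual BE code. `KEYWORD_FIELDS` and `term_predicate` are hypothetical names; the mapping stands in for whatever the sniffing step finds in the ES index mapping.

```python
import json

# Hypothetical result of sniffing the ES mapping: text fields that also
# expose a `keyword` sub-field, keyed by Doris column name.
KEYWORD_FIELDS = {"k1": "k1.keyword"}


def term_predicate(column, value, enable_keyword_sniff=True):
    """Build a term query, swapping in the keyword sub-field when sniffing is on."""
    if enable_keyword_sniff:
        field = KEYWORD_FIELDS.get(column, column)
    else:
        field = column
    return {"term": {field: value}}


# Compact separators so the output matches the DSL format used in this message.
dsl = json.dumps(term_predicate("k1", "wu yun feng"), separators=(",", ":"))
print(dsl)
```

With sniffing disabled the same call would keep the original column name `k1`.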
Also, in this PR I remove the Elasticsearch version detection logic, because it is useless for now; it may be needed in the future.
2020-04-13 23:07:33 +08:00
be090f5929 Use read lock when iterate tablet_map in TabletManager::start_trash_sweep (#3294) 2020-04-13 11:18:33 +08:00
7c07083cd5 Forbidden multi subquery in having clause (#3291)
Multiple subqueries in the having clause need to be rewritten into joins over multiple tables, and the current rewriting rules would have to be changed to support that.
This way of writing queries is not common, and there is no strong requirement from the business side.
The feature will be added later if it is required.
2020-04-11 21:56:08 +08:00
5b69c70f9a [Bug] Fix bug that user plugin dir is removed after installing the plugin (#3302)
When a user installs an FE plugin from a directory, the directory should not
be removed after installing.
2020-04-11 20:30:14 +08:00
3086790e06 Fix bug when using ZoneMap/BloomFilter on column with REPLACE/REPLACE_IF_NOT_NULL (#3288)
Currently, a column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter
when the rowset is a base rowset (its version starts with zero). We always thought of this as an optimization.
But in some cases it causes a bug.

create table test(
  k1 int,
  v1 int replace,
  v2 int sum
);
Suppose there are two records in two different versions:

1 2 2 on version [0-10]
1 3 1 on version 11
If I run this query:

select * from test where k1 = 1 and v1 = 3;
The result will be 1 3 1, which is wrong because the first record has been filtered out.
The right answer is 1 3 3, since v2 should be summed.
Removing this optimization is necessary to make the result correct.
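The example above can be replayed with a small model. This is only a sketch: `aggregate` and `matches` are hypothetical helpers that mimic REPLACE and SUM merging in version order, not the storage engine code.

```python
def aggregate(rows):
    """Merge rows with the same key in version order: v1 REPLACE, v2 SUM."""
    merged = {}
    for k1, v1, v2 in rows:
        if k1 in merged:
            _, old_v2 = merged[k1]
            merged[k1] = (v1, old_v2 + v2)  # newer v1 replaces, v2 accumulates
        else:
            merged[k1] = (v1, v2)
    return [(k, v1, v2) for k, (v1, v2) in merged.items()]


def matches(row):
    """where k1 = 1 and v1 = 3"""
    return row[0] == 1 and row[1] == 3


base = [(1, 2, 2)]   # version [0-10]
delta = [(1, 3, 1)]  # version 11

# Buggy order: filter on the REPLACE column inside each rowset, then merge.
wrong = aggregate([r for r in base + delta if matches(r)])
# Correct order: merge all versions first, then apply the predicate.
right = [r for r in aggregate(base + delta) if matches(r)]

print(wrong, right)
```

Filtering on `v1` before the merge drops the base row, so its `v2` contribution is lost.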
2020-04-10 10:22:21 +08:00
f39c8b156d [refactor] A small refactor on class DataDir (#3276)
main refactor points are:
- Use a single get_absolute_tablet_path function instead of 3
  independent functions
- Remove meaningless return value of register_tablet and deregister_tablet
- Fix some typos and formatting
2020-04-10 00:32:22 +08:00
ce1d5ab9ab [Bug] Fix some bugs of install/uninstall plugins (#3267)
1. Avoid losing plugin if plugin failed to load when replaying
    When in replay process, the plugin should always be added to the plugin manager,
    even if that plugin failed to be loaded.

2. `show plugin` statement should show all plugins, not only the successfully installed plugins.

3. plugin's name should be unique globally and case insensitive.

4. Avoid creating new instances of plugins when doing metadata checkpoint.

5. Add a __builtin_ prefix for builtin plugins.
2020-04-09 23:04:28 +08:00
a5703ef114 [Performance] Support sharding txn_map_lock into more small map locks to make good performance for txn manage task (#3222)
This PR improves the performance of txn management tasks. When there are many txns in
BE, the single txn_map_lock and the additional _txn_locks may cause poor performance.
We now remove the additional _txn_locks and split the txn_map_lock into many small locks.
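The general idea of lock sharding can be sketched as follows. This is an illustration only: `ShardedTxnMap`, the shard count, and the modulo sharding rule are assumptions, not the BE implementation.

```python
import threading


class ShardedTxnMap:
    """A txn map split into shards, each guarded by its own small lock."""

    def __init__(self, shard_count=16):
        self._locks = [threading.Lock() for _ in range(shard_count)]
        self._maps = [{} for _ in range(shard_count)]

    def _shard(self, txn_id):
        # Assumed sharding rule: txn id modulo the number of shards.
        return txn_id % len(self._locks)

    def put(self, txn_id, info):
        i = self._shard(txn_id)
        with self._locks[i]:  # only this shard is blocked
            self._maps[i][txn_id] = info

    def get(self, txn_id):
        i = self._shard(txn_id)
        with self._locks[i]:
            return self._maps[i].get(txn_id)


txn_map = ShardedTxnMap(shard_count=4)
txn_map.put(5, "prepared")
print(txn_map.get(5))
```

Threads working on txns that land in different shards no longer contend for the same lock.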
2020-04-09 22:35:15 +08:00
037bc53b54 [BUG] Fix cast result expr bug (#3279)
When the result type is a date type, the result expr type should not be cast.
Because in the FE function, the specific type of the date type is determined by the actual
type of the return value, not by the function return value type.

For example, the function `str_to_date` may return DATE or DATETIME, depending on the
format pattern.

DATE:
```
mysql> select str_to_date('11/09/2011', '%m/%d/%Y');
+---------------------------------------+
| str_to_date('11/09/2011', '%m/%d/%Y') |
+---------------------------------------+
| 2011-11-09                            |
+---------------------------------------+
```

DATETIME:
```
mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s');
+---------------------------------------------------------+
| str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') |
+---------------------------------------------------------+
| 2014-12-21 12:34:56                                     |
+---------------------------------------------------------+
```
2020-04-09 22:02:05 +08:00
8699bb7bd4 [Query] Optimize where clause by extracting the common predicate in the OR compound predicate. (#3278)
Queries like the one below cannot finish in an acceptable time. `store_sales` has 28 million rows and `customer_address` has 50,000 rows. For now Doris creates only one cross join node to execute this SQL.
Evaluating the where clause takes about 200-300 ns, and it is evaluated 28 million * 50,000 times, which is extremely large: 28 million * 50,000 * 250 ns is about 350,000 seconds (roughly four days).

```
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
 from store_sales, customer_address 
 where  ((ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ss_addr_sk = ca_address_sk
  and ca_country = 'United States'
  and ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
```

But this SQL can be rewritten to:
```
select avg(ss_quantity)
       ,avg(ss_ext_sales_price)
       ,avg(ss_ext_wholesale_cost)
       ,sum(ss_ext_wholesale_cost)
 from store_sales, customer_address 
 where ss_addr_sk = ca_address_sk
  and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
 )
```
Therefore we can do a hash join first and then use
```
(((ca_state in ('CO', 'IL', 'MN')
  and ss_net_profit between 100 and 200  
     ) or
     (ca_state in ('OH', 'MT', 'NM')
  and ss_net_profit between 150 and 300  
     ) or
     (ca_state in ('TX', 'MO', 'MI')
  and ss_net_profit between 50 and 250  
     ))
 )
```
to filter the rows.

On the TPCDS 10g dataset, the rewritten SQL takes only about 1 second.
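A minimal sketch of the extraction and of the cross join cost estimate (28 million * 50,000 evaluations at 250 ns each). `extract_common` is a hypothetical helper that treats each conjunct as an opaque string; the real FE rewrite works on expression trees.

```python
def extract_common(disjuncts):
    """Factor conjuncts shared by every OR branch out of the OR.

    Each branch is a list of conjunct strings.
    Returns (common conjuncts, remaining conjuncts per branch).
    """
    first, others = disjuncts[0], disjuncts[1:]
    common = [c for c in first if all(c in d for d in others)]
    rest = [[c for c in d if c not in common] for d in disjuncts]
    return common, rest


branches = [
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('CO', 'IL', 'MN')", "ss_net_profit between 100 and 200"],
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('OH', 'MT', 'NM')", "ss_net_profit between 150 and 300"],
    ["ss_addr_sk = ca_address_sk", "ca_country = 'United States'",
     "ca_state in ('TX', 'MO', 'MI')", "ss_net_profit between 50 and 250"],
]
common, rest = extract_common(branches)
print(common)

# Cost of evaluating the predicate once per row pair in a cross join.
store_sales_rows = 2800 * 10_000
customer_address_rows = 5 * 10_000
cross_join_seconds = store_sales_rows * customer_address_rows * 250 / 1e9
print(cross_join_seconds, "seconds,", round(cross_join_seconds / 86400, 1), "days")
```

The extracted `ss_addr_sk = ca_address_sk` is an equality between the two tables, which is what makes a hash join possible. If a branch becomes empty after extraction, the whole OR reduces to true and only the common conjuncts remain.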
2020-04-09 21:57:45 +08:00
3dc7ef634b [Dependency]Add cctz lib (#3280)
Add Google/CCTZ lib in Doris
2020-04-09 19:14:09 +08:00
e32ed28bf4 [Storage] Use getmntent_r() for thread safety (#3284) 2020-04-09 14:19:09 +08:00
614a76beea [Doris on ES] Support compound_and predicate push down to Elasticsearch (#3277)
Related issue: https://github.com/apache/incubator-doris/issues/3248


SQL:

```
select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing');
```

Output filter:

```
((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1
```

SQL:

```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter:

```
(k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1
```

SQL:

```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```

Output filter (`abs` cannot be pushed down to ES, so Doris on ES does not push down this predicate):

```
match_all
```
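The behaviour in the three examples can be sketched as a recursive translation. This is an assumption-laden illustration: the tuple-based predicate tree, `to_es`, and `build_query` are hypothetical, and the bool/term DSL shape is one plausible target rather than the exact query the BE builds.

```python
def to_es(node):
    """Translate a predicate tree to an ES bool query, or None if not pushable."""
    kind = node[0]
    if kind in ("and", "or"):
        children = [to_es(child) for child in node[1:]]
        if any(child is None for child in children):
            return None  # one unsupported child makes the whole tree unpushable
        if kind == "and":
            return {"bool": {"must": children}}
        return {"bool": {"should": children, "minimum_should_match": 1}}
    if kind == "eq":
        _, column, value = node
        return {"term": {column: value}}
    return None  # e.g. a function call such as abs(k3) = 3


def build_query(predicate):
    """Fall back to match_all when the predicate cannot be pushed down."""
    dsl = to_es(predicate)
    return dsl if dsl is not None else {"match_all": {}}


pushable = ("or",
            ("and", ("eq", "k2", 6), ("eq", "k3", 1)),
            ("and", ("eq", "k2", 2), ("eq", "k3", 3), ("eq", "k4", "beijing")))
not_pushable = ("or",
                ("eq", "k2", 6),
                ("and", ("eq", "k2", 2), ("func_eq", "abs", "k3", 3)))

print(build_query(pushable))
print(build_query(not_pushable))
```

In the fallback case ES returns all documents and the predicate still has to be evaluated on the Doris side.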
2020-04-08 21:09:39 +08:00
f37dbbc890 Fix openssl download url is not available (#3281) 2020-04-08 19:00:48 +08:00
3557b12de5 [Bug] Avoid compacting recently added rowset (#3271)
This CL fixes #3270 by skipping recently added version when performing cumulative compaction. A new config named "cumulative_compaction_skip_window_seconds" is added to adjust the time window.
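A rough sketch of a skip window, assuming each rowset carries a creation timestamp. `compaction_candidates` and the dictionary layout are hypothetical; the real candidate selection has more rules than this filter.

```python
import time


def compaction_candidates(rowsets, skip_window_seconds, now=None):
    """Keep only rowsets created at least `skip_window_seconds` ago."""
    now = time.time() if now is None else now
    return [r for r in rowsets
            if now - r["creation_time"] >= skip_window_seconds]


rowsets = [
    {"version": 10, "creation_time": 100},
    {"version": 11, "creation_time": 995},  # added 5 seconds before `now`
]
candidates = compaction_candidates(rowsets, skip_window_seconds=30, now=1000)
print([r["version"] for r in candidates])
```

Passing `now` explicitly keeps the example deterministic.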
2020-04-08 18:58:12 +08:00
8fc284d593 [config] Support to modify configs when BE is running without restarting (#3264)
In the past, when we wanted to modify some BE configs, we had to modify be.conf and then restart BE.
This patch provides a way to modify configs of the 'threshold', 'interval', and 'enable flag' types
while BE is running, without restarting it.
You can update one config at a time through BE's HTTP API: `be_host:be_http_port/api/update_config?config_name=new_value`
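A small sketch that only builds the request URL in the format given above. The host, the port, and the config name `some_threshold` are placeholders, and the message does not say which HTTP method the endpoint expects, so no request is sent here.

```python
from urllib.parse import urlencode


def update_config_url(be_host, be_http_port, config_name, new_value):
    """Build the update_config URL: one config name and its new value."""
    query = urlencode({config_name: new_value})
    return f"http://{be_host}:{be_http_port}/api/update_config?{query}"


# Placeholder host, port, and config name, for illustration only.
url = update_config_url("127.0.0.1", 8040, "some_threshold", 100)
print(url)
```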
2020-04-08 11:17:47 +08:00
d110629a5f Optimize performance of TxnManager::build_expire_txn_map (#3269)
It's not possible to insert duplicated transaction ids for a specific tablet, therefore we could use map<TabletInfo, vector<int64_t>> instead of map<TabletInfo, set<int64_t>> for expire_txn_map.
2020-04-07 23:54:05 +08:00
162b1c5d8b [Storage] Open data dirs parallelly (#3260) 2020-04-07 20:59:56 +08:00
d0f87728e0 [Doc] Add example of timeout property in alter table stmt (#3274) 2020-04-07 19:51:16 +08:00
c9c58342b2 [License] Add License to codes (#3272) 2020-04-07 16:35:13 +08:00
1ef4cb2d24 [Bug] Base compaction failed because of overlapping of input rowsets (#3262)
When calculating the cumulative point for the first time, we should stop increasing
the cumulative point when we meet a rowset with overlap flag as OVERLAPPING,
even if it has only one segment.
2020-04-07 11:26:57 +08:00
79bac50361 Fix the bug that 'username' in broker load is invalid (#3237) 2020-04-06 22:15:37 +08:00
2ed184e06a Add config: tablet writer open rpc timeout (#3258) 2020-04-03 16:43:56 +08:00
d2307c719c Fix be unit test error (#3259) 2020-04-03 15:02:49 +08:00
a86161f6ce [Bug]Fix compile error (#3257) 2020-04-03 13:38:44 +08:00
3f247b0d2d Fix cast date type return wrong result (#3214)
We have multiple date types, and we also need to cast between different date types.
Without the cast, it will cause problems in a BinaryPredicate.
2020-04-03 12:08:18 +08:00
881661ac10 Fix spell error (#3255) 2020-04-03 10:43:09 +08:00
fcb651329c [Plugin] Making FE audit module pluggable (#3219)
Currently we have implemented the plugin framework in FE.
This CL makes the original audit log logic pluggable.
The following classes are mainly implemented:

1. AuditPlugin
    The interface of audit plugin

2. AuditEvent
    An AuditEvent contains all information about an audit event, such as a query, or a connection.

3. AuditEventProcessor
    The audit event processor receives all audit events and delivers them to all installed audit plugins.
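How the three classes fit together can be sketched like this. It is a simplified model: the class names come from the list above, but every field and method name (`event_type`, `detail`, `install`, `handle`) is an assumption, and the real classes are Java.

```python
from dataclasses import dataclass


@dataclass
class AuditEvent:
    """All information about one audit event (fields here are assumed)."""
    event_type: str
    detail: str


class AuditPlugin:
    """Interface of an audit plugin."""

    def handle(self, event):
        raise NotImplementedError


class CollectingPlugin(AuditPlugin):
    """Toy plugin that just remembers the events it receives."""

    def __init__(self):
        self.events = []

    def handle(self, event):
        self.events.append(event)


class AuditEventProcessor:
    """Receives audit events and delivers them to every installed plugin."""

    def __init__(self):
        self._plugins = []

    def install(self, plugin):
        self._plugins.append(plugin)

    def handle(self, event):
        for plugin in self._plugins:
            plugin.handle(event)


processor = AuditEventProcessor()
first, second = CollectingPlugin(), CollectingPlugin()
processor.install(first)
processor.install(second)
processor.handle(AuditEvent("query", "select 1"))
print(len(first.events), len(second.events))
```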

This CL implements two audit module plugins:

1. The builtin plugin `AuditLogBuilder`, which acts the same as the previous logic and saves the
    audit log to `fe.audit.log`

2. An optional plugin `AuditLoader`, which periodically inserts the audit log into a Doris table
    specified by the user. In this way, users can conveniently use SQL to query and analyze this
    audit log table.

Some documents are added:

1. HELP docs of install/uninstall/show plugin.
2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move
    it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin.

ISSUE: #3226
2020-04-03 09:53:50 +08:00
c9ff6f68d1 Fix Rewrite count distinct bitmap and hll order by bug (#3251) 2020-04-03 09:08:27 +08:00
d14726e05b Fix join hints not work when need table reorder (#3188)
* fix join hints not work when need table reorder
fix cross join numNodes not computed

* fix some typo

* disable table reorder when has join hints
2020-04-02 17:13:35 +08:00
390f462f55 [Bug] Fix read schema change job meta bug (#3244) 2020-04-02 12:31:46 +08:00
6252a271dd Rewrite count distinct bitmap and hll in order by and having (#3232) 2020-04-02 09:11:42 +08:00
29b37dad49 Sql reference of materialized view (#3208)
* Sql reference of materialized view

Sql reference of Create and drop materialized view in English and Chinese.

* Change description
2020-04-01 21:22:19 +08:00
9c937180cd [Alter]Clean SchemaChangeJobV2 when schema change CANCELLED or FINISHED (#3212)
SchemaChangeJobV2 will use too much memory in FE, which may cause FullGC. But this data is useless after the job is done, so we need to clean it up.

NOTICE: update FE meta version to 80
2020-04-01 21:05:17 +08:00
63cee94c5c Fix possibly incorrect output results when using intersect and except statements (#3228)
Output results may be incorrect when using intersect and except statements.
2020-04-01 20:58:43 +08:00
34993a69a8 Fix colocate relocateGroup bug after decommission (#3239) 2020-04-01 18:50:36 +08:00
6a9a62901f Fix bug of memory limit when group by varchar columns. (#3242)
select date_format(k10, '%Y%m%d') as myk10 from baseall group by myk10;
The results of the date_format function in the query above are stored in a MemPool during
query execution. If the query handles millions of rows, it will
consume a lot of memory. The MemPool should be cleared at intervals.
2020-04-01 18:48:18 +08:00
8a2eb8fbcf [Bug][segment_v2] Fix a bug that NullBitmapBuilder is not reset when data page doesn't have null (#3240)
This CL fixes a bug that could cause wrong answer for beta rowset with nullable column. The root cause is that NullBitmapBuilder is not reset when the current page doesn't contain NULL, which leads to wrong null map to be written for the next page.

Added a test case to reproduce the problem.
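The failure mode can be reproduced with a toy model. This is a sketch only: the class name is borrowed from the message, while `finish_page`, `write_pages`, and the list-based bitmap are hypothetical simplifications of the segment_v2 writer.

```python
class NullBitmapBuilder:
    """Toy null map: one boolean per row of the current page."""

    def __init__(self):
        self.bits = []
        self.has_null = False

    def add(self, is_null):
        self.bits.append(is_null)
        self.has_null = self.has_null or is_null

    def reset(self):
        self.bits = []
        self.has_null = False


def finish_page(builder, reset_always):
    """Emit the page's null map (None when the page has no NULL)."""
    null_map = list(builder.bits) if builder.has_null else None
    # Buggy variant: the builder is only reset when the page contained a NULL.
    if builder.has_null or reset_always:
        builder.reset()
    return null_map


def write_pages(pages, reset_always):
    builder = NullBitmapBuilder()
    null_maps = []
    for page in pages:
        for is_null in page:
            builder.add(is_null)
        null_maps.append(finish_page(builder, reset_always))
    return null_maps


pages = [[False, False], [True, False]]  # page 1 has no NULL, page 2 has one
buggy = write_pages(pages, reset_always=False)
fixed = write_pages(pages, reset_always=True)
print(buggy[1], fixed[1])
```

In the buggy variant the second page's null map still carries the two stale entries from the first page.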
2020-04-01 18:39:04 +08:00
028da655a9 Increased compatibility with mysql (#3235)
Add divPrecisionIncrement and utf8-superset transform
2020-04-01 09:57:00 +08:00
68a801ffbe Support Java version 64 bits Integers for BITMAP type (#3090)
Fork from roaringbitmap's Roaring64NavigableMap, overwrite serialize/deserialize method to keep compatibility with be's bitmap storage format
2020-03-31 15:29:41 +08:00
0554e89645 [Alter] Fix bug of assertion failure when submitting schema change job (#3181)
When creating a schema change job, we will create a corresponding shadow replica for each replica.
Here we should check the state of the replica and only create replicas in the normal state.

The process here may need to be modified later. We should completely allow users to submit alter jobs
under any circumstances, and then in the job scheduling process, dynamically detect changes in the replicas
and do replica repairs, instead of forcing a check on submission.
2020-03-31 12:06:30 +08:00
e9b3584d45 [Bug] Fix bug that desc tbl all stmt throw error: Malformed packet (#3233) 2020-03-31 10:29:53 +08:00
4131afe316 [Bug] NPE when using unknown function in broker load process (#3225)
This CL fixes the bug described in issue #3224 by

1. Forbid UDF in broker load process
2. Improving the function checking logic to avoid NPE when trying to
   get default database from ConnectionContext.
2020-03-30 18:34:41 +08:00
2e1a0030bc Add some connect samples (#3221)
Add connect samples for golang, java, nodejs, php, python.
2020-03-30 13:54:36 +08:00
5f9359d618 Use SleepFor() instead of usleep() (#3211) 2020-03-29 14:18:19 +08:00
e4682398bd [web] Dump configs on BE's website '/varz' (#3220)
Dump configs on BE's website '/varz'
Change NAVIGATION_BAR_PREFIX from 'Impala' to 'Doris'
Format the related files by clang-format
2020-03-28 16:26:38 +08:00
41f1ab006b Add curdate/now function in fe (#3215) 2020-03-28 13:39:54 +08:00
6cf217f0c7 Fix WARNING to WARN in fe.conf sys_log_level (#3218)
When I followed the comment and set the level to WARNING, logging did not work, because there is no WARNING log level on the Java side.
2020-03-28 10:13:15 +08:00
4a5164ab9d Fix 'Filesystem closed' in broker load (#3216) 2020-03-28 09:14:45 +08:00