Commit Graph

13073 Commits

Author SHA1 Message Date
c3f4d9ba7f [DOC]In the official website operation manual, add the window function instruction document (#6094) 2021-07-01 09:18:13 +08:00
2a1b2394a2 [Feature] Extract wide common factors (#6083)
This PR mainly adds a rewrite rule 'ExtractCommonFactorsRule'
  used to extract wide common factors in the planning stage for 'Expr'.
The main purpose of this rule is to extract (Range or In) expressions
  that can be combined from each or clause.
E.g:
  Origin expr: (1<a<3 and b in ('a') ) or (2<a<4 and b in ('b'))
  Rewritten expr: (1<a<4 ) and (b in ('a', 'b')) and ((1<a<3 and b in ('a') ) or (2<a<4 and b in ('b')))
Although the range of the wide common factors is larger than the real range,
  the wide common factors only involve a single column, so it can be pushed down to the scan node,
  thereby reducing the amount of scanned data in advance and improving the query speed.

It should be noted that this optimization strategy is not for all scenarios.
When the filter rate of the wide common factors is too low,
  the query spends extra time evaluating them without reducing the scanned data much.

So this strategy can be switched on or off with the session variable 'extract_wide_range_expr'.
It is enabled by default, which means the strategy takes effect.
If you encounter an unsatisfactory filter rate, you can set the variable to false
to turn off the strategy.

Fixed #6082
2021-07-01 09:17:57 +08:00
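
The rewrite described in 2a1b2394a2 can be illustrated with a minimal sketch. The predicate model below (a tuple for an open range, a set for an IN list) and the function name `extract_wide_common_factors` are hypothetical and are not taken from Doris's 'Expr' classes. The sketch also assumes that ranges are widened to their enclosing interval, which may differ from what 'ExtractCommonFactorsRule' actually does.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical, simplified predicate model (not Doris's Expr classes):
# a tuple (low, high) means low < col < high; a set means col IN (...).
Range = Tuple[float, float]
Predicate = Union[Range, Set[str]]
Conjunct = Dict[str, Predicate]  # one OR branch: column name -> predicate


def extract_wide_common_factors(branches: List[Conjunct]) -> Conjunct:
    """Return one widened single-column predicate per column that every OR branch constrains."""
    if not branches:
        return {}
    # Only columns constrained in every branch can yield a common factor.
    common = set(branches[0])
    for branch in branches[1:]:
        common &= set(branch)

    wide: Conjunct = {}
    for col in sorted(common):
        preds = [branch[col] for branch in branches]
        if all(isinstance(p, tuple) for p in preds):
            # Enclosing interval: wider than the real range, but never excludes a matching row.
            wide[col] = (min(p[0] for p in preds), max(p[1] for p in preds))
        elif all(isinstance(p, set) for p in preds):
            wide[col] = set().union(*preds)
        # Mixed range/IN predicates are skipped in this sketch.
    return wide


# (1<a<3 and b in ('a')) or (2<a<4 and b in ('b'))
branches = [{"a": (1, 3), "b": {"a"}}, {"a": (2, 4), "b": {"b"}}]
print(extract_wide_common_factors(branches))
```

The returned factors correspond to `1<a<4` and `b in ('a', 'b')` from the commit message. They are ANDed with the original OR expression rather than replacing it, because they are only a superset of the real filter.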
b69ebc3ec4 [Extension] Add DataX doriswriter extension directory (#6111)
This CL only adds the script for building the DataX development environment
2021-06-30 09:55:19 +08:00
6441a4c0ca [Metrics] Add metrics for load average. (#6069)
Added load average metrics for monitoring system load.
See issue #6068 for a detailed explanation.
2021-06-30 09:28:45 +08:00
ce49fa5968 [Bug][DynamicPartition] Take table_id as key of runtimeInfo (#6053)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-06-30 09:28:26 +08:00
f254870aeb [Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036) 2021-06-30 09:27:53 +08:00
28e7d01ef7 [FlinkConnector] Support time interval for flink connector (#5934) 2021-06-30 09:27:12 +08:00
a475d3e3a4 [Bug][Export] ExportJob ErrorMsg is UNKNOWN when the job is cancelled due to timeout (#5915) 2021-06-30 09:26:48 +08:00
e3899aa3e7 [Metrics] Add tablet count on different status metrics (#5787)
Based on these metrics, we can add alerts to remind
admins whether the cluster is healthy or not.
2021-06-30 09:25:42 +08:00
ba84eacb8c (#6009) fix bucket key distribute error when using spark load (#6087) 2021-06-29 12:30:08 +08:00
b0c7cd4b2a [Docs] Add contact us in README (#6097) 2021-06-29 11:36:12 +08:00
513b1e7358 [Docs] ADD: fe-idea-dev.md add thrift version notice (#6104) 2021-06-29 11:35:55 +08:00
fe65a623c1 Fix timeout error when delete condition contains invalid datetime format (#6030)
* add date time format check in delete statement
2021-06-29 09:47:42 +08:00
dd455af844 fix stdfs link error in some gcc version (#6090) 2021-06-26 14:09:00 +08:00
6651d3bf2a SIMD instruction speed up the storage layer (#6089)
* SIMD instruction speed up the storage layer

* 1. add DECHECK in power of 2 int32
2. change vector to array to reduce the cost
2021-06-25 11:04:32 +08:00
45ec084a6d fix (#6081) 2021-06-24 09:44:24 +08:00
4870fd47fc [Docs] Fix README.md (#6084) 2021-06-24 09:44:14 +08:00
2998373354 [Bug] Fix bug that select into outfile in parquet format may cause NPE (#6054)
1. check the parquet schema property on FE side.
2. auto generate parquet schema if not specified.
2021-06-23 11:33:47 +08:00
c8899ee5bd [Build][ARM] Fix some compilation problems on ARM64 (#6076)
1. Disable libhdfs3 on ARM, because it doesn't support ARM now.
2. Add compilation doc for ARM64
2021-06-23 09:38:16 +08:00
72d1a3b39c fix spring boot web maximum upload file limit config (#6070)
Co-authored-by: zouxinyi <zouxinyi@baidu.com>
2021-06-22 10:46:26 +08:00
b9ad34736d [Feature] Support recording custom number of backup and restore task information (#5947)
* Record all backup jobs and support where clause
2021-06-22 09:19:54 +08:00
abcd56c6c8 [Enhance] Support show unrecoverable tablets (#6045)
* [Enhance] Support show unrecoverable tablets

Unrecoverable tablets are tablets none of whose replicas are healthy.
We should be able to identify these tablets so that they can be handled manually.

And these tablets should not be added to the tablet scheduler.
2021-06-22 09:19:12 +08:00
68bab73c35 [Bug] Fix random storage path selection possibly returning the same path for a long time (#6062)
random_shuffle generates the same random sequence when called multiple times.
Although we shuffle twice, when the size relationship between adjacent numbers
does not change, the result of the second shuffle does not change either.
2021-06-20 16:16:32 +08:00
882ebd3d7d [Bug] Fix show data bug (#6060) 2021-06-20 16:15:54 +08:00
5b2d07ca2f [Bug] Fix disk TotalUsedPct display error (#6059)
Fix TotalUsedPct display error
2021-06-20 16:15:39 +08:00
5dabf0bef5 [Alter] validate data file after alter operation success (#6022)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-06-20 16:15:14 +08:00
1d796d9aa4 [Bug] Fix bug that routine load job may cause dead lock (#6058)
Make sure the routine load job's lock is released after the txn is aborted
2021-06-20 16:14:47 +08:00
fe0912f6e5 [SQL] Compatible with mysql nulls order by (#6043) 2021-06-20 16:12:52 +08:00
bf2423c91a [httpv2] Spring boot http upload file maximum limit parameterization (#6013)
Make spring.servlet.multipart.max-file-size and spring.servlet.multipart.max-request-size configurable
2021-06-20 16:10:54 +08:00
9bc2df43a7 [Bug][Export] Fix bug of one more record shown in "show export limit n" (#6012) 2021-06-20 16:10:21 +08:00
4fe8bdfe1d [Doc] Update install-deploy.md (#5968)
Improve the doc to avoid errors when installing BE in a Hadoop cluster
2021-06-20 16:09:13 +08:00
bff6ede94e add data size field for partition cache (#6026)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-06-18 11:40:00 +08:00
1999a0c26b [optimization] open gcc strict-aliasing optimization (#6034)
* open gcc strict-aliasing optimization

* use -Werror=strict-aliasing
2021-06-18 11:39:24 +08:00
48bd680068 update download info for boost and datatables (#6008) 2021-06-18 11:38:41 +08:00
5cfe081b05 [Bug] Remove duplicate memtracker (#6041)
* [Enhance] Remove duplicate memtracker

This problem will cause frequent creation of memtracker and affect query concurrency.
2021-06-18 11:28:37 +08:00
ff47dc750d [Bug] Fix problem for thread safety issues and setting the status of non-existent replica does not prompt any error message (#6019)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-06-18 10:50:47 +08:00
0ddd5da926 [DOC]Organize FE configuration file description (#5975)
* Organize FE configuration file description

* Delete redundant numbers

* Add two configuration parameters of spring boot upload file

* Add configuration instructions

* Fix typos

* Add English documentation of BE configuration

* Modify style

* Modify punctuation

* Correct the errors in the text

* Modify some ads and content issues
2021-06-18 09:22:29 +08:00
99d8110972 [Bug-fix] Fix wrong data distribution judgment (#6029)
* [Bug-fix] Fix wrong data distribution judgment

The Fragment where OlapScanNode is located has three data distribution possibilities.
1. UNPARTITIONED: The scan range of the OlapScanNode contains only one instance (BE).
2. RANDOM: The OlapScanNode involves a multi-partition table.
3. HASH_PARTITIONED: The table involved is in a colocate group.

For a multi-partition table, although the data in each individual partition is distributed according to the bucketing column,
the same bucketing column value in different partitions is not necessarily on the same BE.
So the data distribution is RANDOM.

If Doris wrongly plans RANDOM as HASH_PARTITIONED, it produces a wrong colocate agg node,
and the query result is incorrect.
2021-06-18 09:21:46 +08:00
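
The reasoning in 99d8110972 about multi-partition tables can be shown with a small sketch. The placement tables, partition names, backend names and the modulo "hash" below are all hypothetical stand-ins; this is not Doris's planner logic, only an illustration of why the same bucketing value may live on different BEs in different partitions.

```python
from typing import Dict, Iterable, List

# Hypothetical placement: partition name -> backend that holds each bucket index.
Placement = Dict[str, List[str]]


def bucket_of(key: int, num_buckets: int) -> int:
    """Stand-in for the real bucketing hash."""
    return key % num_buckets


def backends_for_key(key: int, placement: Placement) -> Dict[str, str]:
    """For each partition, the backend that stores rows with this bucketing value."""
    return {
        partition: backends[bucket_of(key, len(backends))]
        for partition, backends in placement.items()
    }


def co_located(keys: Iterable[int], placement: Placement) -> bool:
    """True only if every key maps to a single backend across all partitions."""
    return all(len(set(backends_for_key(k, placement).values())) == 1 for k in keys)


# Buckets of each partition are placed independently (ordinary multi-partition table).
independent = {"p1": ["be1", "be2", "be3"], "p2": ["be3", "be1", "be2"]}
# Buckets with the same index share a backend (what a colocate group guarantees).
aligned = {"p1": ["be1", "be2", "be3"], "p2": ["be1", "be2", "be3"]}

print(backends_for_key(4, independent))  # same key, different backend per partition
print(co_located(range(6), independent), co_located(range(6), aligned))
```

With the independent placement, an aggregation that assumes each key is confined to one BE would combine partial results incorrectly, which is why treating such a fragment as RANDOM is the safe choice.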
9f52f4f9e5 fix stream load error msg missing (#6050)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-06-18 09:21:12 +08:00
d7e62e361f [Bug] Fix that build thirdparty of parallel-hashmap-1.33 failed on ubuntu18.04 (#6033)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-06-17 14:45:34 +08:00
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many memtrackers on the BE web pages,
MemTracker now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
2021-06-16 09:44:24 +08:00
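
The three levels from d57c2344e1 could be modelled roughly as below. The enum values, the tracker names and the `visible_trackers` helper are hypothetical; in particular, filtering by a maximum level is an assumption about how the levels are used to limit what the BE web page shows.

```python
from enum import IntEnum
from typing import List, Tuple


class MemTrackerLevel(IntEnum):
    """Level names come from the commit message; the numeric order is an assumption."""
    OVERVIEW = 0  # main memory-consuming modules: Query / Load / Metadata
    TASK = 1      # a single query, load, or compaction task
    VERBOSE = 2   # other, more detailed trackers


def visible_trackers(
    trackers: List[Tuple[str, MemTrackerLevel]], max_level: MemTrackerLevel
) -> List[str]:
    """Keep only trackers whose level is at or above the requested detail threshold."""
    return [name for name, level in trackers if level <= max_level]


# Hypothetical tracker names, for illustration only.
trackers = [
    ("Query", MemTrackerLevel.OVERVIEW),
    ("query-123", MemTrackerLevel.TASK),
    ("hash-table", MemTrackerLevel.VERBOSE),
]
print(visible_trackers(trackers, MemTrackerLevel.OVERVIEW))
print(visible_trackers(trackers, MemTrackerLevel.TASK))
```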
0145bdb1f0 [Doc] Fix a typo (#6025)
Fix a typo in udaf-orthogonal-bitmap-manual.md
2021-06-16 09:41:32 +08:00
d2c1cddd55 [Bug-fix] Avoid using 'QueryDetail' in planning stage (#6018)
QueryDetail is used to collect statistics about the current query.
This property will only be set when the query starts to execute.
So in the query planning stage, using this attribute in the first query will cause 'NullPointerException'.
After that, this attribute retains the value of the previous query
until it is updated by the subsequent process.
Because code of 'colocateagg' uses this attribute incorrectly in its planning,
it causes 'NullPointerException' when clients like pymysql connect to Doris and send the first query.
Fixed #6017
2021-06-16 09:40:53 +08:00
800c2c41bd [Docs] update data-model-rollup.md create table ddl (#6014)
update data-model-rollup.md create table ddl
2021-06-16 09:40:38 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
daf8ce29ca [Bug] Fix bucket shuffle bug when left table is without any data (#5965) 2021-06-16 09:39:31 +08:00
8b4721c941 [Bug] Fix kafka consumer reuse bug (#6007)
When judging whether consumer can be reused, it is necessary to judge whether the parameter content is equal.
2021-06-16 09:39:05 +08:00
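
The reuse check described in 8b4721c941 amounts to comparing consumer parameters by content. The sketch below is an assumption-laden illustration: `ConsumerConfig`, its fields and `can_reuse` are hypothetical names, and the commit message does not say what the previous comparison looked like.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConsumerConfig:
    """Hypothetical bundle of the parameters a consumer was created with."""
    brokers: str
    topic: str
    custom_properties: Dict[str, str] = field(default_factory=dict)


def can_reuse(existing: ConsumerConfig, wanted: ConsumerConfig) -> bool:
    """A cached consumer is reusable only if every parameter has equal content.

    The generated dataclass __eq__ compares field values, and dict equality
    compares keys and values, so two configs with the same number of
    properties but different values are correctly treated as different.
    """
    return existing == wanted


cached = ConsumerConfig("kafka:9092", "events", {"group.id": "g1"})
same = ConsumerConfig("kafka:9092", "events", {"group.id": "g1"})
other = ConsumerConfig("kafka:9092", "events", {"group.id": "g2"})
print(can_reuse(cached, same), can_reuse(cached, other))
```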
6d6c3d9703 [Enhancement] Reduce memory consumption by releasing readers earlier (#5811)
We create multiple rowset readers to read the data of one tablet.
After a rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
Likewise, we can release a segment reader when it reaches EOF.
2021-06-16 09:37:50 +08:00
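
The idea in 6d6c3d9703 can be sketched as follows. `RowsetReader` and `read_tablet` are hypothetical stand-ins, and the round-robin order is a simplification chosen for the example; the point is only that each reader is closed as soon as it reports EOF, not when the whole tablet scan finishes.

```python
from typing import Iterable, Iterator, List, Optional


class RowsetReader:
    """Hypothetical reader; the row list stands in for memory the reader holds."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._rows: List[int] = list(rows)
        self._pos = 0
        self.closed = False

    def next_row(self) -> Optional[int]:
        """Return the next row, or None at EOF."""
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self) -> None:
        """Release the resources held by this reader."""
        self._rows = []
        self.closed = True


def read_tablet(readers: Iterable[RowsetReader]) -> Iterator[int]:
    """Yield rows round-robin and release each reader the moment it hits EOF."""
    active = list(readers)
    while active:
        for reader in list(active):  # iterate over a copy so removal is safe
            row = reader.next_row()
            if row is None:
                reader.close()       # early release instead of waiting for the end
                active.remove(reader)
            else:
                yield row


long_reader, short_reader = RowsetReader([1, 2, 3]), RowsetReader([10])
for row in read_tablet([long_reader, short_reader]):
    print(row, "short reader released:", short_reader.closed)
```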
d0b60541af [Bug] Fix use of incorrect table name in expand star (#6003)
SelectStmt uses new TableName(null, tableRef.getAlias()) to expand the star expression, but tableRef.getAlias() is the full name, including both the database name and the table name.
Using it as the table name generates wrong SQL in CreateViewStmt.
This patch fixes the problem by using the correct database name and table name in the expand star method.
2021-06-15 14:18:00 +08:00
54c7d177f8 [Log] Fix a log issue in BDBJournalCursor (#6006) 2021-06-10 17:39:25 +08:00