The format of some docs is incorrect for building the doc website.
* Fix a bug where the `gensrc` dir cannot be built with `-j`.
* Fix a unit test bug in CreateFunctionTest.
This CL implements 3 new operations:
```
ALTER TABLE tbl ADD TEMPORARY PARTITION ...;
ALTER TABLE tbl DROP TEMPORARY PARTITION ...;
ALTER TABLE tbl REPLACE TEMPORARY PARTITION (p1, p2, ...);
```
The user manual can be found in:
`docs/documentation/cn/administrator-guide/alter-table/alter-table-temp-partition.md`
I did not update the grammar manual `alter-table.md`; it is too big and confusing, and I will reorganize it later.
This is the first part of the "overwrite load" feature mentioned in issue #2663.
I will implement the "load to temp partition" feature in the next PR.
This CL also adds GSON serialization methods for the following classes (not used yet):
```
Partition.java
MaterializedIndex.java
Tablet.java
Replica.java
```
The issue is #3011.
Reset the tablet and scan range info before computing it.
The old rollup selector has already computed the tablet and scan range info, and the new MV selector may sometimes compute it again. So we need to reset that info here.
Before this commit, the result was doubled for the query `select k1, k2 from aggregate_table`.
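The shape of the fix, as a sketch with hypothetical field and method names (not the actual scan node code):
```
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: clear state computed by an earlier selector pass
// before recomputing, so tablets are not counted twice.
class ScanNodeSketch {
    private final List<Long> scanTabletIds = new ArrayList<>();
    private final List<String> scanRangeLocations = new ArrayList<>();

    void computeTabletAndScanRangeInfo() {
        // The old rollup selector may already have filled these lists;
        // reset them so a second pass starts from a clean state.
        scanTabletIds.clear();
        scanRangeLocations.clear();
        // ... recompute tablets and scan ranges here ...
    }
}
```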
This PR removes some unused logging for unauthorized exceptions. Some unauthorized accesses, such as LVS probe requests, may cause connection exceptions that we should ignore.
There is a case where the META upload succeeded but the INFO upload failed. In that case the UPLOAD_INFO task will retry, but the META file has already succeeded and `filename.part` has been renamed to `filename.md5sum`. The retry keeps failing on the rename, and the backup job can never complete. Therefore, the `file.md5sum` file needs to be deleted in advance.
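A generic illustration of the retry-safe rename (class, method, and variable names are hypothetical; the actual change lives in the upload task):
```
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: make the final rename idempotent so a retried
// UPLOAD_INFO task does not fail on a leftover .md5sum file.
class UploadTaskSketch {
    void finishFile(File partFile, File md5sumFile) throws IOException {
        // A previous attempt may already have renamed filename.part to
        // filename.md5sum; delete the stale target so the rename can succeed.
        if (md5sumFile.exists() && !md5sumFile.delete()) {
            throw new IOException("failed to delete stale file: " + md5sumFile);
        }
        if (!partFile.renameTo(md5sumFile)) {
            throw new IOException("failed to rename " + partFile + " to " + md5sumFile);
        }
    }
}
```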
Fix #3001
This commit mainly implements the new materialized view selector, which supports SPJ<->SPJG.
Two parameters currently control this feature:
1. test_materialized_view: When this parameter is set to true, the user can create a materialized view on a duplicate table using the `CREATE MATERIALIZED VIEW` command.
At the same time, if the result of the new materialized view selector differs from the old version during a query, an error will be reported. This parameter is false by default, which means the new materialized view function is disabled.
2. use_old_mv_selector: When this parameter is set to true, the result of the old selector is used. If set to false, the result of the new selector is used. This parameter is true by default, which means the old selector is used.
If the default values of these two parameters are unchanged, there are no behavior changes in the current version.
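A condensed sketch of how the two flags could interact (all names here are assumptions for illustration, not the actual code):
```
// Hypothetical gating logic for the two parameters.
class SelectorGateSketch {
    boolean testMaterializedView = false; // default: new MV function disabled
    boolean useOldMvSelector = true;      // default: old selector's result is used

    long selectBestIndexId(long oldChoice, long newChoice) {
        if (testMaterializedView && oldChoice != newChoice) {
            // Report an error when the two selectors disagree during a query.
            throw new IllegalStateException("new MV selector result differs from old selector");
        }
        return useOldMvSelector ? oldChoice : newChoice;
    }
}
```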
The main steps for the new selector are as follows:
1. Predicates stage: This stage will mainly filter out all materialized views that do not meet the current query requirements.
2. Priorities stage: This stage will sort the results of the first stage and choose the best materialized view.
The predicates phase is divided into 6 steps:
1. Calculate the predicate gap (the compensating predicates) between the current query and the view.
2. Check whether the columns in the view can satisfy the compensating predicates.
3. Check whether the group by columns of the view match the group by columns of the query.
4. Check whether the aggregate columns of the view match the aggregate columns of the query.
5. Check whether the output columns of the view match the output columns of the query.
6. Add partial materialized views.
The priorities phase is divided into two steps:
1. Find the materialized view that matches the best prefix index
2. Find the materialized view with the least amount of data
The biggest difference between the current materialized view selector and the previous one is that it supports SPJ <-> SPJG.
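For orientation, a condensed sketch of the two-stage flow described above; every name here is illustrative, not the actual selector API:
```
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the two-stage materialized view selection.
class MvSelectorSketch {
    static Mv selectBest(List<Mv> candidates, Query query) {
        // Predicates stage: filter out views that cannot answer the query.
        candidates.removeIf(mv -> !mv.coversCompensatingPredicates(query)
                || !mv.matchesGroupBy(query)
                || !mv.matchesAggregates(query)
                || !mv.coversOutput(query));
        // Priorities stage: best prefix-index match first, then least data.
        candidates.sort(Comparator
                .comparingInt((Mv mv) -> -mv.prefixIndexMatchLen(query))
                .thenComparingLong(mv -> mv.dataSize(query)));
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    interface Query {}
    interface Mv {
        boolean coversCompensatingPredicates(Query q);
        boolean matchesGroupBy(Query q);
        boolean matchesAggregates(Query q);
        boolean coversOutput(Query q);
        int prefixIndexMatchLen(Query q);
        long dataSize(Query q);
    }
}
```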
In this CL, the isAlive field in the FsBroker class will be persisted in metadata, to solve the problem described in ISSUE: #2989.
Notice: this CL updates FeMetaVersion to 73.
This CL modifies 2 things:
1. When a routine load task fails to submit, it will not be put back into the task queue.
2. The rpc timeout for executing a routine load task in BE is set to the `query_timeout` of the task plan.
ISSUE: #2964
1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947.
The following SQL returns 0000, which is wrong; the result should be 1601:
```
select date_format('2020-02-19 16:01:12','%H%i');
```
2. Add constant expression plan tests to ensure the FE constant expression computation results are correct.
3. Remove the `castToInt` function in `FEFunctions`, which duplicates `CastExpr::getResultValue`.
4. Implement the `getNodeExplainString` method for `UnionNode`.
The chain of reasoning is as follows:
1. `date_format(if(, NULL, dt), '%Y%m%d')` is used as the HASH_PARTITIONED exprs, which is not right; we should use the Agg intermediate materialized slot.
2. We don't use the Agg intermediate materialized slot as the HASH_PARTITIONED exprs because, in
```
// the parent fragment is partitioned on the grouping exprs;
// substitute grouping exprs to reference the *output* of the agg, not the input
partitionExprs = Expr.substituteList(partitionExprs,
node.getAggInfo().getIntermediateSmap(), ctx_.getRootAnalyzer(), false);
parentPartition = DataPartition.hashPartitioned(partitionExprs);
```
the partitionExprs substitution fails.
3. The partitionExprs substitution fails because partitionExprs has a cast-to-date child, but the agg info's getIntermediateSmap has a cast-to-datetime child.
4. The cast-to-date or cast-to-datetime child exists because `TupleIsNullPredicate` inserts an `if` Expr. We don't have an `if` fn for the date type, so Doris uses the `if` fn for int.
5. The `date` in the cast-to-date depends on the slot `dt`'s date type; the `datetime` in the cast-to-datetime depends on the datetime argument type of the `date_format` function.
So we could fix this issue by making the `if` fn support the date type, or by making the `date_format` fn support the date type.
This CL modifies the `evalExpr()` of ExpressionFunctions so that it won't change the `FunctionCallExpr` to `NullLiteral` when a UDF has a null parameter. This fixes the problem described in ISSUE: #2913.
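Roughly, the guard looks like this; a sketch with stub types and assumed names, not the actual `ExpressionFunctions` code:
```
// Hypothetical sketch of the constant-folding guard.
class EvalExprSketch {
    interface Expr {}
    static class FunctionCallExpr implements Expr {
        boolean hasNullParameter;
        FunctionCallExpr(boolean hasNullParameter) { this.hasNullParameter = hasNullParameter; }
    }

    static Expr evalExpr(Expr expr) {
        // Keep a UDF call with a null parameter as-is instead of
        // folding the whole call into NullLiteral on the FE side.
        if (expr instanceof FunctionCallExpr && ((FunctionCallExpr) expr).hasNullParameter) {
            return expr;
        }
        // ... constant-fold other expressions as before ...
        return expr;
    }
}
```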
In the current implementation, the state of the table will not be set until the next round of job scheduling, so there may be tens of seconds between job completion and the table state changing to NORMAL.
Also, I made the synchronized scope smaller by replacing the synchronized methods with synchronized blocks, which may solve the problem described in #2903.
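A generic illustration of the locking change (not the actual Doris code; class and method names are made up):
```
// Hypothetical sketch: shrink the critical section from the whole
// method down to just the state mutation.
class AlterJobSketch {
    private final Object lock = new Object();
    private String tableState = "SCHEMA_CHANGE";

    // Before: public synchronized void finishJob() { ... } held the
    // monitor for the whole method, including slow work.
    public void finishJob() {
        doSlowCleanup(); // now runs outside the lock
        synchronized (lock) {
            tableState = "NORMAL"; // set immediately when the job completes
        }
    }

    private void doSlowCleanup() { /* ... */ }
}
```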
Fix a bug where using grouping sets with a grouping set item that does not include all columns produces wrong values.
Fix the grouping function check not working in the group by clause.
This CL implements a simulated FE process and a simulated BE service.
You can see example usage in
`fe/src/test/java/org/apache/doris/utframe/DemoTest.java`
At the same time, I modified the configuration of the maven-surefire-plugin so that
each unit test runs in a separate JVM, which avoids conflicts caused by the
various singleton classes in FE.
Starting a separate JVM for each unit test adds about 30% extra time overhead.
However, you can control the concurrency of the unit tests by setting the `forkCount`
configuration of the maven-surefire-plugin in `fe/pom.xml`. The default is still 1
for easy viewing of the output log. If set to 3, the entire FE unit test run takes about
4 minutes.
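For reference, the relevant fragment in `fe/pom.xml` might look like this; a sketch assuming the standard surefire options, so the actual values in the repo may differ:
```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- run up to 3 test JVMs in parallel -->
    <forkCount>3</forkCount>
    <!-- give each test class its own fresh JVM -->
    <reuseForks>false</reuseForks>
  </configuration>
</plugin>
```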
**Describe the bug**
**First**, in broker load, we allow users to add multiple data descriptions. Each data description
describes a file (or set of files), including the file path, delimiter, the table and
partitions to be loaded, and other information.
When the user specifies multiple data descriptions, Doris currently aggregates the data
descriptions belonging to the same table and generates a unified load task.
The problem is that although different data descriptions point to the same table,
they may specify different partitions. Therefore, the aggregation of data descriptions
should consider not only the table level but also the partition level.
Examples are as follows:
data description 1 is:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
```
data description 2 is:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p3, p4)
```
What the user expects is to load file1 into partitions p1 and p2 of tbl1, and file2 into partitions
p3 and p4 of the same table. But currently they are aggregated together, which results in loading
file1 and file2 into all of partitions p1, p2, p3 and p4.
**Second**, the following 2 data descriptions are not allowed:
```
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file1")
INTO TABLE `tbl1`
PARTITION (p1, p2)
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file2")
INTO TABLE `tbl1`
PARTITION (p2, p3)
```
They have an overlapping partition (p2), which is not supported yet, so we should throw an exception
to cancel the load job.
**Third**, there is a problem with the code implementation. In the constructor of
`OlapTableSink.java`, we pass in a string of partition names separated by commas.
But at the `OlapTableSink` level, we should be able to pass in a list of partition ids directly,
instead of names, as sketched below.
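An illustrative before/after of the constructor change (simplified, hypothetical signatures; the real constructor takes more parameters):
```
import java.util.List;

// Simplified sketch of the interface change, not the actual code.
class OlapTableSinkSketch {
    // Before: partitions arrive as a comma-separated string of names,
    // which must be parsed and resolved back to partitions.
    OlapTableSinkSketch(long tableId, String partitionNames) { /* ... */ }

    // After: callers pass resolved partition ids directly.
    OlapTableSinkSketch(long tableId, List<Long> partitionIds) { /* ... */ }
}
```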
ISSUE: #2823
Support setting replication_num at the table level, so there is no need for users to set replication_num in every alter table add partition statement.
e.g.:
`alter table tbl set ("default.replication_num" = "2");`
If we connect to a non-master FE and execute `show routine load;`, it may sometimes
throw an Unknown Exception, because some fields in the thrift result are not set.
Since January 15th, 2020, requests to http://repo1.maven.org/maven2/ return a 501 HTTPS Required status.
So switch the central repository URL from http to https.
Support Grouping Sets, Rollup and Cube to extend the group by statement.
Support GROUPING SETS syntax:
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
Semantically, GROUPING SETS computes the aggregation once per grouping set; the result is equivalent to the union of the per-set GROUP BY results, with NULL filling the columns absent from a set.
CUBE or ROLLUP are shorthand forms, like:
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```
[ADD] Support grouping functions in exprs like grouping(a) + grouping(b) (#2039)
[FIX] Fix analyzer error in window functions (#2039)