1761 Commits

Author SHA1 Message Date
200210e708 [fix](ut) Fix FE unit test failure caused by changing MAX_PHYSICAL_PACKET_LENGTH to 0xffffff 2021-12-06 11:13:01 +08:00
bffc2836d7 [fix](show) Fix bug that AdminShowDataSkew operation may cause FE OOM (#7297) 2021-12-06 10:32:00 +08:00
e080afa186 [typo] update comment of MasterDaemon (#7285)
The comment on MasterDaemon is out of date and may mislead readers.
2021-12-06 10:30:48 +08:00
974ab9b90c [improvement](bdbje) Clean up excessive bdbje logs (#7273)
In an HA environment, JE retains as many reserved files as possible,
so the bdbje log can become too large.
We therefore limit the total size of reserved files; the default is 1GB.
2021-12-06 10:28:36 +08:00
4bfee42ba1 [feature-wip](lateral view) Support lateral view based on subquery (#7269)
Support lateral view of the result column in subquery.
For example:
  ```
  select e1 from (select k2 as a from test_explode group by a) tmp1
  lateral view explode_split(a, ",") tmp2 as e1;
  ```
The lateral view will parse the inline view column
and put the table function node above the subquery.
2021-12-06 10:26:36 +08:00
845f931098 [fix](select outfile) Remove optional properties check of hdfs storage (#7272) 2021-12-03 13:42:56 +08:00
Wei
5f7c4f903f [refactor](log) Remove unused log instance creation (#7249) 2021-12-02 11:43:29 +08:00
dd36ccc3bf [feature](storage-format) Z-Order Implement (#7149)
Support sort data by Z-Order:

```
CREATE TABLE table2 (
siteid int(11) NULL DEFAULT "10" COMMENT "",
citycode int(11) NULL COMMENT "",
username varchar(32) NULL DEFAULT "" COMMENT "",
pv bigint(20) NULL DEFAULT "0" COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(siteid, citycode)
COMMENT "OLAP"
DISTRIBUTED BY HASH(siteid) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"data_sort.sort_type" = "ZORDER",
"data_sort.col_num" = "2",
"in_memory" = "false",
"storage_format" = "V2"
);
```
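As a rough illustration of what sorting by Z-Order means for the two sort columns (`data_sort.col_num` = 2), the sketch below interleaves the bits of two integer keys into a single sortable value. It is not the code added by this PR: the class and method names are hypothetical, and the keys are treated as unsigned bit patterns.

```java
// Hypothetical sketch of Z-Order (Morton) encoding for two int keys.
// Not the Doris storage-engine implementation.
final class ZOrderSketch {
    // Interleave the bits of a and b: bit i of a goes to bit 2*i,
    // bit i of b goes to bit 2*i + 1 of the result.
    static long interleave(int a, int b) {
        long z = 0L;
        for (int i = 0; i < 32; i++) {
            z |= ((long) ((a >>> i) & 1)) << (2 * i);
            z |= ((long) ((b >>> i) & 1)) << (2 * i + 1);
        }
        return z;
    }

    public static void main(String[] args) {
        // siteid = 3 (binary 011), citycode = 5 (binary 101)
        System.out.println(interleave(3, 5)); // 39 (binary 100111)
    }
}
```

Rows could then be ordered with `Long.compareUnsigned` on the interleaved value, so that rows close to each other in both columns tend to be stored together.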
2021-12-02 11:39:51 +08:00
d8ba6e3eb6 1. Fix an error where fetching a string type field may cause a malformed packet error. (#7262)
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
   but it was set to 2^24 - 2 by mistake.
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
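The arithmetic behind the first fix can be checked with a minimal sketch. The constant name comes from this commit; the class name and the `packetCount` helper are hypothetical. The helper assumes the MySQL protocol rule that the payload length is stored in 3 bytes and that a payload of exactly 2^24 - 1 bytes is followed by one more (possibly empty) packet.

```java
// Hypothetical sketch; only the constant name is taken from the commit message.
final class PacketLengthSketch {
    // 3-byte length field: the largest representable payload length is 2^24 - 1.
    static final int MAX_PHYSICAL_PACKET_LENGTH = (1 << 24) - 1;

    // Number of physical packets needed to send a logical payload,
    // assuming a full-sized packet is always followed by another packet.
    static int packetCount(long payloadLength) {
        return (int) (payloadLength / MAX_PHYSICAL_PACKET_LENGTH) + 1;
    }

    public static void main(String[] args) {
        System.out.println(MAX_PHYSICAL_PACKET_LENGTH == 0xffffff); // true
        System.out.println(packetCount(MAX_PHYSICAL_PACKET_LENGTH - 1L)); // 1
        System.out.println(packetCount(MAX_PHYSICAL_PACKET_LENGTH));      // 2
    }
}
```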
2021-12-01 10:02:34 +08:00
fbab8afe24 [feature] Support disable query and load for backend to make Doris more robust and set default value to 1 for max_query_retry_time (#7155)
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true");
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_load" = "true");
2021-11-30 22:08:32 +08:00
baa5d6089f [fix](alter) Fix bug that partition column of a unique key table can be modified (#7217)
The partition columns can not be modified.
2021-11-26 10:16:01 +08:00
52cd12a1f9 [fix](planner) fix preaggregation reason error (#7205)
This PR fixes #7204.
2021-11-26 10:13:53 +08:00
70670b5a42 [feat-wip](lateral-view) Pruning output slot of TableFunctionNode (#7148)
After the lateral view function is evaluated,
the result is returned directly to the upper layer,
which can cause a lot of memory copying and network transmission.
The reason is that the original column participating
in the lateral view is often a very long value.
If Doris retains this column after evaluating the lateral view,
it has to copy the column in memory.
However, in many cases the upper plan node does not need the original columns of the lateral view,
so column pruning should be performed after the lateral view is evaluated
to avoid useless memory copies and network transmission.
For example, the following query can prune the original column v1:

```select k1, e1 from table lateral view explode_split(v1, ",") tmp as e1;```

The `outputSlotIds` in TableFunctionNode is used to store the columns that should be retained after pruning.
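The pruning idea can be sketched as follows. Only the name `outputSlotIds` comes from this PR; the class, the `prune` method, and the representation of a row as a map from slot id to value are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of pruning a row down to the retained slots.
final class OutputSlotPruningSketch {
    // Keep only the slots whose ids are listed in outputSlotIds.
    static Map<Integer, String> prune(Map<Integer, String> row, Set<Integer> outputSlotIds) {
        Map<Integer, String> kept = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> slot : row.entrySet()) {
            if (outputSlotIds.contains(slot.getKey())) {
                kept.put(slot.getKey(), slot.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed slot ids: 0 = k1, 1 = v1 (the long original column), 2 = e1.
        Map<Integer, String> row = new LinkedHashMap<>();
        row.put(0, "1001");
        row.put(1, "a,b,c");
        row.put(2, "a");
        Set<Integer> outputSlotIds = new HashSet<>(Arrays.asList(0, 2));
        // v1 is dropped before the row is passed to the upper plan node.
        System.out.println(prune(row, outputSlotIds));
    }
}
```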

* Support scalar function in lateral view

The child 0 of explode_split function could be a scalar function
such as: concat(k1, ",", k2)

This PR mainly checks whether a lateral view with a function satisfies the following semantic rules.
1. The columns in the function must all belong to the original table
2. The function must be a scalar function
2021-11-26 10:10:05 +08:00
d3c020b3cb [feat-opt](fe-config) Add tablets number limit to avoid wrong usage (#7025)
1. Add new FE config `default_db_replica_quota_size`
2. Check replica quota after create table/partition
2021-11-24 10:37:54 +08:00
d420ff0afd Display current load bytes to show load progress (#7134)
This value may be greater than the file size when loading
Parquet or ORC files, and less than the file size when loading
CSV files.
2021-11-24 10:08:32 +08:00
836c95c2ca [feat](memory-track) Print peak memory use of all backend after query in audit log (#7030)
Add a new field `peakMemoryBytes` in fe.audit.log
2021-11-22 14:46:08 +08:00
07296a301b [chore](fe) Fix build error caused by Inaccessible pentaho-aggdesigner-algorithm jar (#7161) 2021-11-20 21:48:26 +08:00
49eac402e3 [fix](export) fix export retry error (#7143)
fix #7142
Clear the export status `alreadySentBackendIds` before the Coordinator retries the Export task.
2021-11-20 21:41:53 +08:00
a88541d2d4 [refactor] extract duplicate code to writePropertiesToFile (#7119)
Extract duplicate code to writePropertiesToFile in org/apache/doris/persist/Storage.java
2021-11-20 21:40:50 +08:00
143d3769b1 [feat](config) add FE config to limit the replica num per tablet (#7087) 2021-11-20 21:40:23 +08:00
52ebb3d8f5 [feat](mysql-compatibility) Increase compatibility with mysql (#7041)
Increase compatibility with mysql
  1. Added two system tables files and partitions
  2. Improved the return logic of mysql error code to make the error code more compatible with mysql
  3. Added lock/unlock tables statement and show columns statement for compatibility with mysql dump
  4. Compatible with mysqldump tool, now you can use mysql dump to dump data and table structure from doris

Currently, mysqldump may print an error message like:
```
$ mysqldump -h127.0.0.1 -P9130 -uroot test_query_qa > a
mysqldump: Error: 'errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `EXTRA`' when trying to dump tablespaces
```

This error message does not affect the exported file. You can add `--no-tablespaces` to avoid it.
2021-11-20 21:39:37 +08:00
e9282205f1 [feat-opt](spark-load) support bitmap binary data from hive in spark load (#6883)
Support to load the binary data of bitmap value from Hive into Doris.
fix #6461
2021-11-20 21:38:38 +08:00
1238f8de46 [fix](auth) do not allow drop or create root user (#7140)
root user should not be dropped or created
2021-11-18 14:39:33 +08:00
94fa6db196 [feat-opt](binlog-load) add how to open binlog load to the error message (#7138) 2021-11-18 14:38:42 +08:00
4f7d7a52bd [refactor] remove unused code (#7137)
Remove unused code in ImportAction.java
2021-11-18 14:37:31 +08:00
eaebe6a40b [typo] correct getLogger argument (#7127) 2021-11-18 14:33:54 +08:00
be89f0f77e [feat-opt](routine-load) Support show offset lag in show routine load stmt (#7114)
Add a new field `Lag` in result of `show routine load` stmt.

`Lag: {"0":10, "1":0}` means Kafka partition 0 is 10 messages behind and partition 1 is up-to-date.
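One way to read this field, sketched under the assumption that the lag of a partition is its latest offset minus the offset the job has consumed up to. The class, method, and parameter names are hypothetical and are not taken from the Doris code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a per-partition lag calculation.
final class RoutineLoadLagSketch {
    // lag = latest offset - consumed offset, never negative.
    static Map<Integer, Long> lag(Map<Integer, Long> latestOffsets,
                                  Map<Integer, Long> consumedOffsets) {
        Map<Integer, Long> result = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : latestOffsets.entrySet()) {
            long consumed = consumedOffsets.getOrDefault(e.getKey(), 0L);
            result.put(e.getKey(), Math.max(0L, e.getValue() - consumed));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> latest = new TreeMap<>();
        latest.put(0, 110L);
        latest.put(1, 50L);
        Map<Integer, Long> consumed = new TreeMap<>();
        consumed.put(0, 100L);
        consumed.put(1, 50L);
        System.out.println(lag(latest, consumed)); // {0=10, 1=0}
    }
}
```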
2021-11-18 14:31:16 +08:00
74e8264c48 [fix](session-var) Fix the incompatibility of sql mode between Doris and MySQL (#7108)
Introduced by PR #4359

The VariableMgr.fillValue() method should not be called in ExpressionFunctions.eval(),
because it has already been called once in the analyzeImpl() method of SysVariableDesc.

If VariableMgr.fillValue() is called twice, the type of SysVariableDesc becomes BigInt,
which is incorrect.
2021-11-18 14:30:31 +08:00
36360ba846 [BUG] fix profile not working with sql_cache enabled (#7105)
Fix profile not working when sql_cache is enabled; it throws a NullPointerException.
The reason is that the Coordinator used to initialize the profile is null when the cache is enabled.
Therefore, the profile should be processed differently for cache hits and cache misses to avoid the null pointer.

Fixed #7104
2021-11-17 14:38:00 +08:00
7b712925fc [Lateral View] Multi lateral views map one TableFunctionNode (#7000)
1. Forbid non-string columns as params of explode_view.
The first param of explode_view must be a string column (VARCHAR/CHAR/STRING)

2. N-to-1: n lateral views map to one TableFunctionNode
The TableFunctionNode includes all the fnExprs that belong to one table.
For example:
select pageid,mycol1, mycol2 from pageAds
    lateral view explode_string(col1) myTable1 as mycol1
    lateral view explode_string(col2) myTable2 as mycol2;
TableFunctionNode
|----
|- fnExprList: explode_string(col1), explode_string(col2)
2021-11-17 11:13:08 +08:00
dcad6ff5e5 [License] Add License header for missing files (#7130)
1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`
2021-11-16 18:37:54 +08:00
5b01f7bba2 [Feature] Support query hive table (#6569)
Users can directly query the data in the hive table in Doris, and can use join to perform complex queries without laboriously importing data from hive.

The main changes are listed below:

FE:

Extend HiveScanNode from BrokerScanNode
HiveMetaStoreClientHelper communicates with Hive and HDFS.
BE:
Treat HiveScanNode as BrokerScanNode and HiveTable as BrokerTable.

broker_scanner.cpp: support reading columns from the HDFS path.
orc_scanner.cpp: support reading HDFS files.
POM:

Add hive.version=2.3.7, hive-metastore and hive-exec
Add hadoop.version=2.8.0, hadoop-hdfs
Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:

Add THiveTable
Add read_by_column_def in TBrokerRangeDesc
2021-11-16 11:59:07 +08:00
ccb1ea801a [Refactor] logger error in BDBStateChangeListener.java (#7101)
The logger in BDBStateChangeListener.java should be created with BDBStateChangeListener.class instead of EditLog.class.
2021-11-16 10:03:53 +08:00
5aaf24bf55 [Compile] Remove unused import (#7112) 2021-11-15 11:57:35 +08:00
11cca0b15d [JoinReorder] Add session variable to close join order (#7076)
The new session variable 'close_join_reorder' is used to turn off all automatic join reorder algorithms.
If close_join_reorder is true, Doris executes the joins in the order written in the original query.
2021-11-13 17:10:44 +08:00
93ccef4ec7 [Feature] Add degradation strategy for local_replica_selection. (#7064)
When local_replica_selection is turned on, support selecting a non-local BE to serve the query
when the local BE is unavailable
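The fallback described above could look roughly like this. Everything in the sketch is hypothetical (the `Backend` class, its fields, and the `select` method); it only illustrates "prefer the local BE, otherwise take any available one" and is not the selection code from this PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of local replica selection with a fallback.
final class ReplicaSelectionSketch {
    static final class Backend {
        final String host;
        final boolean alive;

        Backend(String host, boolean alive) {
            this.host = host;
            this.alive = alive;
        }
    }

    // Return the alive BE on localHost if there is one; otherwise fall back
    // to the first alive non-local BE; empty if no BE is alive.
    static Optional<Backend> select(List<Backend> replicas, String localHost) {
        Backend fallback = null;
        for (Backend be : replicas) {
            if (!be.alive) {
                continue;
            }
            if (be.host.equals(localHost)) {
                return Optional.of(be);
            }
            if (fallback == null) {
                fallback = be;
            }
        }
        return Optional.ofNullable(fallback);
    }

    public static void main(String[] args) {
        List<Backend> replicas = Arrays.asList(
                new Backend("host1", false), // local BE is down
                new Backend("host2", true));
        System.out.println(select(replicas, "host1").get().host); // host2
    }
}
```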
2021-11-13 17:09:25 +08:00
3d8166504a [Alter] Support alter table engine type from MySQL to ODBC (#6993)
Support alter table engine type from MySQL to ODBC:

```
ALTER TABLE tbl MODIFY ENGINE TO odbc PROPERTIES("driver" = "odbc");
```
2021-11-12 15:12:41 +08:00
c55e7221dc [Bug] Fix bug when using tableId to get table in publish version (#7091)
If table has been dropped when finishing txn, skip it.
2021-11-12 10:56:33 +08:00
9692131abc [BUG] Fix CacheAnalyzer's bug when aggregate column contains expression. (#7085)
When partition_cache is enabled, if the query's aggregate columns contain an expression,
CacheAnalyzer may throw an exception and cause the query to fail.
2021-11-12 10:54:24 +08:00
890bcdf606 [Feature] Clean up old sync jobs regularly (#7061)
#7060
#6287

Each job that has been stopped for more than 3 days (set with Config.label_keep_max_second)
will be permanently cleaned up.
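A minimal sketch of such an expiry check, assuming the job records the time it stopped in milliseconds. Only the config name `label_keep_max_second` and the 3-day value come from this commit; the class, constant, and method names are hypothetical.

```java
// Hypothetical sketch of the expiry rule for stopped sync jobs.
final class SyncJobCleanupSketch {
    // 3 days expressed in seconds, mirroring Config.label_keep_max_second.
    static final long LABEL_KEEP_MAX_SECOND = 3 * 24 * 3600L;

    // A stopped job is expired once it has been stopped for longer than the limit.
    static boolean isExpired(long stopTimeMs, long nowMs) {
        return (nowMs - stopTimeMs) / 1000 > LABEL_KEEP_MAX_SECOND;
    }

    public static void main(String[] args) {
        long stoppedAt = 0L;
        System.out.println(isExpired(stoppedAt, 259_200_000L)); // false: exactly 3 days
        System.out.println(isExpired(stoppedAt, 259_201_000L)); // true: 3 days and 1 second
    }
}
```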
2021-11-12 10:53:50 +08:00
795d549eb3 [Proc] Add stream load info to system info of web site (#6970)
#6969
2021-11-12 10:44:09 +08:00
35da149ebe [SparkDpp]Add not() and xor() methods to bitmapValue (#6885)
Add not() and xor() methods to bitmapValue
2021-11-12 10:38:15 +08:00
667e8bdce3 [Bug] Fix NumberFormatException for partition cache (#6846)
Fix #6845
2021-11-12 10:36:58 +08:00
0ae6e92dd4 [Build] fix unused import (#7094)
Remove unused import introduced by #7065.
2021-11-11 19:59:43 +08:00
4dd77f602f [Bug] Fix bug that NPE thrown when adding partition for table with MV (#7069)
The `defineExpr` in `Column` must be analyzed before calling its `treeToThrift` method.
And for CreateReplicaTask, there is no need to set `defineExpr` in TColumn.
2021-11-11 15:43:16 +08:00
108914db92 [Log] fix log error for ActionController (#7065) 2021-11-11 15:42:57 +08:00
58804d3570 [Colocate] Fix bug that colocate group can not be redistributed after dropping a backend (#7020)
Main changes:

1. Fix [Bug] Colocate group can not be redistributed after dropping a backend #7019
2. Add detail msg about why a colocate group is unstable.
3. Add more suggestion when upgrading Doris cluster.
2021-11-11 15:41:49 +08:00
cf085b8b1a [RoutineLoad] Add "runningTxns" field in SHOW ROUTINE LOAD result (#6986)
Add a new field `runningTxns` in the result of `SHOW ROUTINE LOAD`, e.g.:

```
                  Id: 11001
                Name: test4
          CreateTime: 2021-11-02 00:04:54
           PauseTime: NULL
             EndTime: NULL
              DbName: default_cluster:db1
           TableName: tbl1
               State: RUNNING
      DataSourceType: KAFKA
      CurrentTaskNum: 1
       JobProperties: {xxx}
    CustomProperties: {"kafka_default_offsets":"OFFSET_BEGINNING","group.id":"test4"}
           Statistic: {"receivedBytes":6,"runningTxns":[1001, 1002],"errorRows":0,"committedTaskNum":1,"loadedRows":2,"loadRowsRate":0,"abortedTaskNum":13,"errorRowsAfterResumed":0,"totalRows":2,"unselectedRows":0,"receivedBytesRate":0,"taskExecuteTimeMs":20965}
            Progress: {"0":"10"}
ReasonOfStateChanged:
        ErrorLogUrls:
            OtherMsg:
```

This lets users view the status of the corresponding transactions of this job by executing `show transaction where id=xx`.
2021-11-11 15:41:13 +08:00
9c12060db3 [Compile] Fix FE compile problem (#7029)
Co-authored-by: morningman <chenmingyu@baidu.com>
2021-11-08 10:35:49 +08:00
ca8268f1c9 [Feature] Extend logger interface, support structured log output (#6600)
Support structured logging.
2021-11-07 17:39:53 +08:00