13073 Commits

Author SHA1 Message Date
32f6dec80f fix dup table don't schema change (#6791)
Co-authored-by: qzsee <shizhiqiang03@meituan.com>
2021-10-13 11:37:39 +08:00
6a058792af [Feature][Step1] Support lateral view FE part (#6745)
* [Feature] Support lateral view

The syntax:
```
select k1, e1 from test lateral view explode_split(k1, ",") tmp as e1;
```
`explode_split` is a Doris-specific function
that splits a string column by the specified separator
and then expands the resulting pieces into one row each.
It combines string splitting with a table function,
and its behavior is equivalent to `explode(split(string, string))` in Hive.

The implementation:
A table function operator is added to handle the lateral view syntax separately.
The query plan is as follows:
```
MySQL [test]> explain select k1, e1 from test_explode lateral view explode_split (k2, ",") tmp as e1;
+---------------------------------------------------------------------------+
| Explain String                                                            |
+---------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                           |
|  OUTPUT EXPRS:`k1` | `e1`                                                 |
|                                                                           |
|   RESULT SINK                                                             |
|                                                                           |
|   1:TABLE FUNCTION NODE                                                   |
|   |  table function: explode_split(`k2`, ',')                             |
|   |                                                                       |
|   0:OlapScanNode                                                          |
|      TABLE: test_explode                                                  |
+---------------------------------------------------------------------------+
```
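
The row expansion described above can be sketched in Python. This only illustrates the semantics and is not the FE or BE implementation; the helper name `lateral_view_explode_split` and the dict-based rows are assumptions made for the example, and Doris's handling of NULL or empty strings is not modeled.

```python
def explode_split(value, separator):
    """Split a string into pieces, like explode(split(...)) in Hive."""
    return value.split(separator)


def lateral_view_explode_split(rows, column, separator, alias):
    """Yield one output row per piece of row[column] (hypothetical helper)."""
    for row in rows:
        for piece in explode_split(row[column], separator):
            out = dict(row)      # keep the original columns
            out[alias] = piece   # add the exploded value under the alias
            yield out


rows = [{"k1": 1, "k2": "a,b"}, {"k1": 2, "k2": "c"}]
result = list(lateral_view_explode_split(rows, "k2", ",", "e1"))
```

Each input row appears once per piece of the split string, which is the behavior of the `lateral view explode_split(k2, ",") tmp as e1` clause.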

* Add ut

* Add multi table function node

* Add session variable 'enable_lateral_view'

* Fix ut
2021-10-13 11:37:12 +08:00
ad949c2f65 Optimize Hex and add related Doc (#6697)
I tested hex in a for loop of 10 million iterations (1000w) with random numbers:
the old hex took 4.92 s on average and the optimized hex took 0.46 s, which is roughly 10x faster.
2021-10-13 11:36:14 +08:00
6cbefa9f10 [Docs] Update materialized view document (#6710)
* [Docs] Update materialized view document
2021-10-13 11:35:23 +08:00
630e273d94 use segmentV2 as default storage format for old tables using storage format 'DEFAULT' (#6807) 2021-10-13 11:34:40 +08:00
30bf6c0d1d [DOC] minor update (#6820) 2021-10-13 09:14:56 +08:00
a6e905eae9 [Revert] "[Bug] When using view, make toSql method generates the final sql (#6736)" (#6793)
This reverts part of commit 11ec38dd6fd9f86632d83c47bd9d8bc05db69a2b (#6736)
because it causes the view query problem described in #6792.

The following bug fix is kept:
1. Fix the problem that the WITH statement cannot be printed when UNION is included in SQL
2021-10-11 10:29:50 +08:00
f439e5e533 [Doc] Documentation error (#6797)
Documentation error
2021-10-10 23:08:16 +08:00
ea17682d1f [Typo] Correct misspellings in SparkDpp (#6789)
Correct misspellings in SparkDpp
2021-10-10 23:07:39 +08:00
Wei
979df5635f [Code Refactor] Remove unnecessary return statement (#6786)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-10-10 23:07:18 +08:00
Wei
a679d04b3b [Typo] Modify code description (#6785)
Co-authored-by: wei.zhang2 <wei.zhang2@dmall.com>
2021-10-10 23:06:53 +08:00
bd19491b5b [Doc] Modify the description of dynamic partition hot partition (#6764)
Modify the description of dynamic partition hot partition
2021-10-10 23:06:14 +08:00
675aef7d75 [AliasFunction] Add support for cast in alias function (#6754)
support #6753
2021-10-10 23:05:44 +08:00
0941322dd6 [Optimize] Optimize HyperLogLog (#6625)
1. Replace std::max with a ternary expression; std::max is much heavier than the ternary operator
2. Replace std::set with arrays; std::set is based on red-black trees, so traversal follows pointers from node to node and cache locality is poor
3. Optimize the serialize function: compute num_non_zero_registers with fewer branches, which also makes the serialization of _registers faster
4. Testing shows a clear performance improvement
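
A minimal Python sketch of the two ideas in items 1 and 2, under stated assumptions: the registers live in one flat array instead of a tree-based set, and the register update is a plain conditional expression. The precision, the hash function and all names here are hypothetical and do not come from the Doris source.

```python
import hashlib

HLL_PRECISION = 14                    # assumption: 2**14 registers
NUM_REGISTERS = 1 << HLL_PRECISION
REST_BITS = 64 - HLL_PRECISION


def hash64(value: bytes) -> int:
    """64-bit hash used only for this illustration."""
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "big")


def update(registers: bytearray, value: bytes) -> None:
    h = hash64(value)
    idx = h >> REST_BITS                      # top bits pick the register
    rest = h & ((1 << REST_BITS) - 1)         # remaining bits give the rank
    rank = REST_BITS - rest.bit_length() + 1  # position of the first 1-bit
    # Conditional expression in place of a max() call.
    registers[idx] = rank if rank > registers[idx] else registers[idx]


def num_non_zero_registers(registers: bytearray) -> int:
    # One linear pass over a contiguous array.
    return sum(r != 0 for r in registers)


registers = bytearray(NUM_REGISTERS)
update(registers, b"doris")
```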
2021-10-10 23:04:39 +08:00
4232f787ad [Doc] datax doriswriter use case (#6612)
datax doriswriter use case
2021-10-10 23:03:12 +08:00
237a8ae948 [Feature] support spark connector sink data using sql (#6796)
Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-10-09 15:47:36 +08:00
9e67b3a392 [Bug] Fix bug that replayCreateLoadJob will cause fe memory leak in non master node because InsertLoadJob cannot be removed from TxnStateCallbackFactory (#6795) 2021-10-08 13:17:22 +08:00
53fed2d35e [BUG] Fix the bug of query in expr (#6767) (#6768) 2021-10-05 12:26:10 +08:00
5f3559a94c [Bug][Binlog] Fix Bug that multiple sync jobs can connect to the same canal instance (#6756)
When creating sync jobs, we should forbid different jobs from connecting to the same canal instance;
otherwise these jobs will compete with each other for the data produced by that canal instance,
which may cause data inconsistency.
2021-10-03 12:21:06 +08:00
8cf7ff78df [Bug] big_int * big_int product overflow (#6788)
When a query has multiple where conditions, such as `where dt in (20210926,20210919) and hour<=13`,
the int * int product overflows. As a result, the function extend_scan_key mistakenly calls
`range.convert_to_fixed_value()`. For a big `range[_low_value, _high_value)`,
a huge number of values is then inserted into _fixed_values, which finally results in OOM.
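
The failure mode can be illustrated with a small Python sketch. The actual check inside extend_scan_key is not shown in this commit message, so the function names, the limit and the shape of the comparison below are assumptions; the sketch only shows how a wrapped 32-bit product can pass a size check that the true product would fail.

```python
def wrap_int32(x: int) -> int:
    """Simulate the wraparound of a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def should_expand_32bit(a: int, b: int, limit: int) -> bool:
    # Hypothetical check using a 32-bit product, which can wrap around.
    return wrap_int32(a * b) <= limit


def should_expand_wide(a: int, b: int, limit: int) -> bool:
    # Same check with a product that cannot overflow.
    return a * b <= limit


# 65536 * 65536 = 2**32, which wraps to 0 in 32-bit arithmetic.
buggy = should_expand_32bit(65536, 65536, 1024)
fixed = should_expand_wide(65536, 65536, 1024)
```

With the wrapped product the check passes and the range would be expanded into fixed values; with the wide product it is correctly rejected.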
2021-10-03 12:17:03 +08:00
7297b275f1 [Optimize] Optimize cpu consumption when importing parquet files (#6782)
Remove some dynamic_cast calls to reduce the overhead caused by type conversion,
which probably reduces the CPU consumption of parquet file import by about 10%
2021-10-03 12:14:35 +08:00
fb7fc27a0a [Bug] Fix duplicate result in colocated agg node (#6727)
Fixed #6726

If the plan fragment contains a colocated agg plan node, it is a colocated fragment.
The scan ranges and backend ids of a colocated fragment instance must be assigned differently from the ordinary scheduling logic.
Tablets in the same bucket must fall on the same BE.
For example, for the same bucket in different partitions,
even though the tablet ids are different, they must be scheduled to the same BE for the scan node.
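
The scheduling constraint can be sketched as follows. This is a simplified illustration, not the FE coordinator code: the function name, the `(tablet_id, bucket_seq)` input shape and the round-robin choice of backend are all assumptions.

```python
def schedule_colocated_scan(tablets, backends):
    """Map each tablet to a backend so that equal buckets share one backend.

    tablets: list of (tablet_id, bucket_seq) pairs, possibly from
    different partitions. backends: list of backend identifiers.
    """
    bucket_to_backend = {}
    assignment = {}
    for tablet_id, bucket_seq in tablets:
        if bucket_seq not in bucket_to_backend:
            # First time this bucket is seen: pick a backend (round-robin here).
            chosen = backends[len(bucket_to_backend) % len(backends)]
            bucket_to_backend[bucket_seq] = chosen
        # Later tablets of the same bucket reuse the same backend.
        assignment[tablet_id] = bucket_to_backend[bucket_seq]
    return assignment


# Two partitions, two buckets each; tablet ids differ across partitions.
tablets = [(10001, 0), (10002, 1), (20001, 0), (20002, 1)]
assignment = schedule_colocated_scan(tablets, ["be1", "be2"])
```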
2021-10-03 11:59:38 +08:00
83003cc372 [Thirdparty] Change libhdfs3 download url to a stable one(#6744) 2021-10-03 11:56:36 +08:00
7a20d6d4c2 [Doc] Modify document of resource tag (#6778)
Fix typo
2021-10-03 11:37:45 +08:00
e7707c8180 [FOLLOWUP] create table like clause support copy rollup (#6580)
* Remove the `ALL` keyword to make the grammar clearer.

Co-authored-by: qzsee <shizhiqiang03@meituan.com>
2021-09-30 18:26:21 +08:00
ad3c9390a2 [Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769)
1. This bug is introduced from #6582
2. Improve the error log message for the "address used" error.
3. Add some documentation about compilation.
    1. Add a custom thirdparty download url.
    2. Add a custom com.alibaba maven jar package for DataX.
4. Fix bug that BE crash when closing scan node, introduced from #6622.
2021-09-29 11:11:28 +08:00
8d471007a6 [Feature] support spark connector sink stream data to doris (#6761)
* [Feature] support spark connector sink stream data to doris

* [Doc] Add spark-connector batch/stream writing instructions

* add license and remove meaningless blanks code

Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-09-28 17:46:19 +08:00
df5ba6b5a2 [Fix] Flink connector support json import and use httpclient to streamload (#6740)
* [Bug]: fix NullPointerException thrown when data is null

* [Bug]:Distinguish between null and empty string

* [Feature]:flink-connector supports streamload parameters

* [Fix]:code style

* [Fix]: support json format import and use httpclient to streamload

* [Fix]:remove System out

* [Fix]:upgrade httpclient version

* [Doc]: add json format import doc
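
A small Python sketch of what a JSON stream load request might look like, and of how JSON keeps null distinct from an empty string. The connector itself is written in Java and its code is not shown here; the helper name, host, port and label below are hypothetical, and the request is only assembled, not sent.

```python
import json


def build_stream_load_request(fe_host, http_port, db, table, rows, label):
    """Assemble URL, headers and body for a JSON stream load (illustrative)."""
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    headers = {
        "Expect": "100-continue",
        "format": "json",             # import the body as JSON
        "strip_outer_array": "true",  # body is a JSON array of rows
        "label": label,
    }
    # None becomes JSON null, while "" stays an empty string.
    body = json.dumps(rows).encode("utf-8")
    return url, headers, body


url, headers, body = build_stream_load_request(
    "fe.example.com", 8030, "test", "t1",
    [{"k1": None, "k2": ""}], "label_demo_1",
)
```

The body would be sent with an HTTP PUT and basic authentication.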

Co-authored-by: wudi <wud3@shuhaisc.com>
2021-09-28 17:37:03 +08:00
cdf9f9e980 [Dynamic Partition] reserve specific history periods by dynamic partition. (#6554)
Add RESERVED_HISTORY_STARTS and RESERVED_HISTORY_ENDS.
Fixes #6514
2021-09-28 11:39:35 +08:00
adf6510050 [docs] Update README.md (#6711) 2021-09-28 10:38:23 +08:00
982b76c3c0 [Bug] Fix resource tag bug, add documents and some other bug fix (#6708)
1. Fix bug of UNKNOWN Operation Type 91
2. Support using resource_tag property of user to limit the usage of BE
3. Add new FE config `disable_tablet_scheduler` to disable tablet scheduler.
4. Add documents for resource tag.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB
8. Fix `show backends` results error
9. Add new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to odbc and mysql table.
10. Modify issue template to enable blank issue, for release note or other specific usage.
11. Fix a bug in alpha_row_set split_range() function.
2021-09-28 10:37:42 +08:00
42c7d39faa [Revert] "[Enhancement] Modify the method of calculating compaction score (#6252)" (#6748)
This reverts commit dedb57f87e31305db3e2a13e374ba4fd58043fca.
Reverts #6252

This commit may cause a tablet whose segments are all empty to never be compacted, which results in a -235 error.
I will revert this commit, and the problem will be solved in #6671
2021-09-27 10:35:19 +08:00
e4d999274f [BUG] Fix a bug when modify table's colocate group with same name (#6695)
If the new group name is the same as the old group name when modifying a table's colocate group,
the group stays in an unstable state
2021-09-27 10:34:41 +08:00
850cf10991 [Refactor] refactor olap_scan_node: discard boost, remove dynamic_cast (#6622)
1. refactor olap_scan_node: discard boost, remove dynamic_cast
2. use move instead of copy version for push_back
2021-09-27 10:32:57 +08:00
3db8160400 [Bug] Fix Tuple is null predicate may cause be cores (#6466) 2021-09-27 10:31:48 +08:00
11ec38dd6f [Bug] When using view, make toSql method generates the final sql (#6736)
1. Fix the problem that the WITH statement cannot be printed when `UNION` is included in SQL
2. In the `toSql` method, convert the normal VIEW into the final statement
3. Replace `selectStmt.originSql` with `selectStmt.toSql`
2021-09-26 11:44:23 +08:00
ce7f9bef91 [Bug][bdbje] handle bdb rollbackexception (#6582)
When using 3 FE followers and restarting the FEs, regardless of order, there is a chance that an FE fails to start
and bdb throws a RollbackException.
In this scenario, bdb suggests catching the exception, simply closing all your ReplicatedEnvironment handles,
and then reopening.

So we catch the RollbackException and reopen the ReplicatedEnvironment
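
The catch, close and reopen pattern can be sketched in Python. The real code is Java and uses BDB JE; here `RollbackException` is a stand-in class, and `open_env`, `close_handles` and `max_attempts` are hypothetical names chosen for the example.

```python
class RollbackException(Exception):
    """Stand-in for the BDB JE exception of the same name."""


def open_environment(open_env, close_handles, max_attempts=3):
    """Open the environment, closing all handles and retrying on rollback."""
    for attempt in range(1, max_attempts + 1):
        try:
            return open_env()
        except RollbackException:
            # Close every existing handle before trying to reopen.
            close_handles()
            if attempt == max_attempts:
                raise
```

A caller would pass functions that create the replicated environment and close any handles that are still open.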
2021-09-26 11:43:58 +08:00
a121124fb2 [Doc] Update doris-on-es.md (#6734)
Typo
2021-09-25 12:28:03 +08:00
f3d4c475b1 [DOC] Add connection reset exception solution (#6733)
Add solution for connection reset exception when doing stream load.
2021-09-25 12:27:35 +08:00
ec777aa122 [DOCS] improve docs (#6718) 2021-09-25 12:26:41 +08:00
e5a4172b27 [Bug][Docs]Fix outfile docs for parquet (#6709)
Update outfile documents for parquet.
2021-09-25 12:24:52 +08:00
537a542dba [Bugs] Fix the bugs list of sync job (#6705)
1. Fix the bug that sync jobs are not cancelled after the database is deleted.
2. The MySQL and Doris tables should have a one-to-one correspondence.
   If they do not, creating the task should fail.
3. When the cluster has multiple FEs, a non-master FE cores when replaying the creation of a sync job.
4. Fix inconsistent data when updating a key column.
5. Fix the failure to synchronize data when there are multiple tables in a single sync job.
6. Fix the failure to resume a paused sync job after restarting the master.
2021-09-25 12:24:29 +08:00
36d6788bc3 [Optimize] Use compact mode to send query plan thrift data structure. (#6702)
In some cases, the query plan thrift structure of a query may be very large
(for example, when there are many columns in SQL), resulting in a large number
of "send fragment timeout" errors.

This PR adds an FE config to control whether to transmit the query plan in a compressed format.

Transmitting in the compressed format can reduce the size by ~50%, but it may reduce
the concurrency by ~10%. Therefore, in high-concurrency small-query scenarios,
you can choose to turn this option off.
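
One reason a compact encoding shrinks a plan is that small integers take fewer bytes. Assuming that "compact mode" refers to Thrift's compact protocol (the commit message does not say so explicitly), 32-bit integers are written as zigzag varints instead of fixed 4-byte values. The sketch below shows only that integer encoding, not the full protocol, and the function names are made up for the example.

```python
import struct


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to an unsigned one (small magnitude -> small value)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(u: int) -> bytes:
    """Encode an unsigned integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        group = u & 0x7F
        u >>= 7
        if u:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)
            return bytes(out)


def encode_i32_compact(n: int) -> bytes:
    return varint(zigzag32(n))


def encode_i32_fixed(n: int) -> bytes:
    return struct.pack(">i", n)  # fixed-width big-endian, always 4 bytes


compact = encode_i32_compact(300)
fixed = encode_i32_fixed(300)
```

The extra encoding work per field is also a plausible source of the CPU cost behind the concurrency drop mentioned above, although the commit message does not state the cause.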
2021-09-25 12:13:29 +08:00
56031cbbe1 [Doc] Change CN/EN sql-functions single quote in markdown (#6698) 2021-09-24 21:42:52 +08:00
f73af475ce [HTTP API] Add aggregation type information in table schema api (#6686)
```
{
	"msg": "success",
	"code": 0,
	"data": {
		"properties": [{
			"type": "INT",
			"name": "k1",
			"comment": "",
			"aggregation_type":""
		}, {
			"type": "INT",
			"name": "k2",
			"comment": "",
			"aggregation_type":"MAX"
		}],
		"status": 200
	},
	"count": 0
}
```
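
A client could read the new field as in the Python sketch below. The response text is the example from this commit message; the helper name `aggregation_types` is an assumption, and treating an empty `aggregation_type` as "no aggregation" is an interpretation of the example rather than documented behavior.

```python
import json

RESPONSE = """
{
    "msg": "success",
    "code": 0,
    "data": {
        "properties": [
            {"type": "INT", "name": "k1", "comment": "", "aggregation_type": ""},
            {"type": "INT", "name": "k2", "comment": "", "aggregation_type": "MAX"}
        ],
        "status": 200
    },
    "count": 0
}
"""


def aggregation_types(response_text: str) -> dict:
    """Return {column name: aggregation type} from a table schema response."""
    payload = json.loads(response_text)
    return {
        column["name"]: column["aggregation_type"]
        for column in payload["data"]["properties"]
    }


types = aggregation_types(RESPONSE)
```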
2021-09-24 21:42:24 +08:00
e03b74ebc1 [Doc] Add a document of the error codes returned by the OLAP functions on the BE side (#6666) 2021-09-24 21:40:20 +08:00
af771bee5a [Improvement] Try to finish transaction if all backends of unfinished tasks have been dead (#6662) 2021-09-24 21:39:20 +08:00
68529d20f3 [Flink] Fix bug of flink doris connector (#6655)
Flink-Doris-Connector does not support Flink 1.13. Refactor the Doris sink format
to use RowData::FieldGetter instead of GenericRowData.
2021-09-24 21:38:35 +08:00
39fd839cd1 [Bug] fix backup bug when comparisons case sensitive (#6648)
#6633
2021-09-24 21:35:50 +08:00
f49362b0d7 [Demo] Add Spark-Doris-Sink demo (#6570)
This demo covers three cases: reading HDFS files and writing them to Doris through stream load, reading Kafka message queues and writing them to Doris through stream load, and reading Doris tables through the Spark Doris connector to build a DataFrame dataset.
2021-09-24 21:35:08 +08:00