1292 Commits

Author SHA1 Message Date
a1bce25677 [BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491) 2021-03-17 09:27:05 +08:00
105a86d1cd Use O_SYNC instead of O_DIRECT to be fs agnostic. (#5518)
Some fs might not support O_DIRECT and O_SYNC is semantically the same to be used for disk checking.
2021-03-15 10:07:51 +08:00
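A minimal sketch of the idea in this commit, assuming a POSIX system where `os.O_SYNC` is available. The helper name `check_disk_writable` and the temporary file name are hypothetical and are not taken from the Doris code base; the sketch only illustrates a write-and-read-back disk check that does not depend on `O_DIRECT` support.

```python
import os


def check_disk_writable(directory: str, payload: bytes = b"disk-check") -> bool:
    """Write a small file with O_SYNC and read it back (hypothetical helper)."""
    path = os.path.join(directory, ".disk_check_tmp")
    # O_SYNC asks the kernel to complete each write synchronously.
    # Unlike O_DIRECT, it places no alignment requirements on the buffer.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    try:
        fd = os.open(path, flags, 0o600)
        try:
            os.write(fd, payload)
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read() == payload
    except OSError:
        # Any I/O error is treated as a failed disk check.
        return False
    finally:
        if os.path.exists(path):
            os.remove(path)
```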
1100a0f3a0 [Profile] Add more timer for scan thread (#5511)
1.
Add timer to count the time the transfer thread waits for the scanner thread to return rowbatch.
2.
Add timer to count the time that the scanner thread waits for the available worker threads in the thread pool.

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-03-15 10:07:11 +08:00
4b316e4c3f [Outfile] Support exporting query result to local disk (#5489)
1.
User can export query result to local disk like:

`select * from tbl into outfile ("file:///disk1/result_");`

And modify the return result to show the details of export:

```
mysql> select * from tbl1 limit 10 into outfile "file:///home/work/path/result_";
+------------+-----------+----------+--------------+
| FileNumber | TotalRows | FileSize | URL          |
+------------+-----------+----------+--------------+
|          1 |         2 |        8 | 192.168.1.10 |
+------------+-----------+----------+--------------+
```

2.
Support creating a mark file after the export finishes successfully.

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-03-14 15:39:46 +08:00
e9a73ee278 [Bug] Fix the 10~1000x memory expansion of the compression algorithm (#5504)
Fix the 10~1000x memory expansion of the compression algorithm in load and compaction
2021-03-12 23:04:07 +08:00
c9a25aa29e [UT] fix memory tracker ut (#5501)
* [UT] fix memory tracker ut

* Update mem_limit_test.cpp
2021-03-12 13:45:04 +08:00
8ead0aaad8 [Enhance] Sort directories by available space when do trash sweep (#5498)
* [Enhance] Sort directories by available space when do trash sweep

When one disk is about to fill up, we want to sweep the trash data on
that disk as quickly as possible. The current trash sweep function
removes trashed files ordered by path name. However, data directories
may differ widely in available space because of the load balance
algorithm, so this patch improves the sweep to remove files in order of
the directories' available space.

* add log
2021-03-12 13:43:27 +08:00
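The ordering described in this commit can be sketched as follows. This is an illustration only: `order_dirs_for_sweep` is a hypothetical name, and sweeping the directory with the least free space first is an assumption drawn from the stated goal of freeing a nearly full disk quickly.

```python
import shutil
from typing import Callable, Iterable, List


def order_dirs_for_sweep(
    dirs: Iterable[str],
    free_bytes: Callable[[str], int] = lambda d: shutil.disk_usage(d).free,
) -> List[str]:
    """Return data directories with the least available space first."""
    # The default key queries the filesystem; tests can inject a fake lookup.
    return sorted(dirs, key=free_bytes)
```

Injecting `free_bytes` keeps the ordering logic separate from the filesystem query, which makes the ordering easy to check in isolation.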
9254e78e57 Fix compatibility of glibc (#5502)
* fix compatibility

* remove eventfd.c because eventfd has different declarations in different glibc versions
2021-03-12 11:44:43 +08:00
689602e686 [Enhancement] Support Parallel Merge In Exchange Node (#5468)
Support Parallel Merge In Exchange Node
2021-03-11 22:34:18 +08:00
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' names, which makes the page be_ip:port/mem_tracker easier to read
2021-03-11 22:33:31 +08:00
e5c7a6dd9f [Bug] hll serialize 160 items cause backend crash(#5424) (#5425)
Co-authored-by: lanhuajian <lanhuajian@sankuai.com>
2021-03-11 22:24:01 +08:00
7a8fbe5db8 [internal] [doris-1084] support compressed csv file in stream load (#5463) 2021-03-11 10:53:05 +08:00
36002bec48 Fix bug that sql with limit query statistics are wrong (#5347)
Co-authored-by: weixiang <weixiang06@meituan.com>

When a SQL query has a limit, the query statistics are inflated because the query statistics of each batch are merged.

Fix #5340
2021-03-10 10:24:12 +08:00
e023ef5404 [Load] Support multi bytes LineDelimiter and ColumnSeparator (#5462)
* [Internal][Support Multibytes Separator] doris-1079
support multi bytes LineDelimiter and ColumnSeparator
2021-03-09 09:35:39 +08:00
43dd583cfc Fix dlopen failed by upgrade cmake (#5481)
* fix dlopen failed

* remove useless code
2021-03-08 09:02:53 +08:00
db2120a7f2 [Build][BE] Fix GLIBC_COMPATIBILITY can not compile in centos6 (#5472)
Add option to disable glibc_compatibility
2021-03-07 20:47:13 +08:00
35f5cb8e0c [Bug] Fix bug that BE failed to start when validating conf from be_custom.conf (#5465) 2021-03-07 17:37:14 +08:00
d6ac8f4e35 Masking glibc symbols for better portability (#4180)
* Masking glibc symbols for better portability

* Remove redundant files
2021-03-05 13:15:55 +08:00
805f98e0f9 [Bug] Set dest tuple to null when src_tuple is NULL. (#5431) 2021-03-04 22:26:05 +08:00
4e1b6b3eef [ODBC] Report the column information when type conversion fails in a query on an ODBC MySQL table (#5422)
When type conversion fails in a query on an ODBC MySQL table, include the information of the column in the error prompt
2021-03-04 22:23:37 +08:00
c38a1c799f [Config] Support config validating when BE bootstrap and update BE's config by API (#5379)
Some invalid config values may cause BE to behave unexpectedly.
This patch aims to support config validation when BE bootstraps and when BE's config is updated by API,
so that invalid values are rejected.
This work completes PR #4423
2021-03-04 22:21:49 +08:00
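A rough sketch of the validation flow described in this commit, under stated assumptions: the class `ConfigRegistry`, its methods, and the field name `scan_threads` are all hypothetical, and failing fast at bootstrap while keeping the old value on a rejected runtime update is one plausible reading of "reject invalid value", not a description of the actual BE code.

```python
from typing import Callable, Dict

Validator = Callable[[int], bool]


class ConfigRegistry:
    """Hypothetical registry in which every config field carries a validator."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._validators: Dict[str, Validator] = {}

    def register(self, name: str, default: int, validator: Validator) -> None:
        self._validators[name] = validator
        self._values[name] = default

    def _is_valid(self, name: str, value: int) -> bool:
        validator = self._validators.get(name)
        return validator is not None and validator(value)

    def bootstrap(self, conf: Dict[str, int]) -> None:
        # Validate everything first so a bad file never applies partially.
        for name, value in conf.items():
            if not self._is_valid(name, value):
                raise ValueError(f"invalid config value: {name}={value}")
        self._values.update(conf)

    def update(self, name: str, value: int) -> bool:
        # A runtime update with an invalid value is rejected; the old value stays.
        if not self._is_valid(name, value):
            return False
        self._values[name] = value
        return True

    def get(self, name: str) -> int:
        return self._values[name]
```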
47d6b1ff0b Fix ut failed for topn_function_test (#5449)
Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>
2021-03-04 21:53:52 +08:00
9c8766356a [Bug-Fix][Bitmap][Be] Resolve bitmap_not calculate wrong result(#5440) (#5441)
bitmap_not calculates a wrong result (#5440)

Execute the following SQL and expect the response ''
```
select bitmap_to_string(bitmap_not(bitmap_from_string('1'), bitmap_from_string('2,1'))); 
```

Co-authored-by: lanhuajian <lanhuajian@sankuai.com>
2021-03-04 15:46:42 +08:00
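The expected result in this commit can be reproduced with a small model that treats a bitmap as a set of integers. This assumes `bitmap_not(lhs, rhs)` means "elements of `lhs` that are not in `rhs`", which matches the expected empty response above. The Python functions only mirror the SQL function names and are not the BE implementation.

```python
from typing import Set


def bitmap_from_string(s: str) -> Set[int]:
    """Parse a comma-separated list of integers into a set."""
    return {int(part) for part in s.split(",") if part.strip()}


def bitmap_not(lhs: Set[int], rhs: Set[int]) -> Set[int]:
    """Elements of lhs that are not in rhs (set difference)."""
    return lhs - rhs


def bitmap_to_string(bitmap: Set[int]) -> str:
    """Render the set as a sorted, comma-separated string."""
    return ",".join(str(v) for v in sorted(bitmap))
```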
422456c31a Add warn log when client report be state failed and refactor some report code (#5342)
There is some redundant code for reporting task, disk and tablet in BE, and when FE returns an error report message, no warn log shows that the report failed.

Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>
2021-03-03 17:00:21 +08:00
577b62b3f9 [Internal][bug][doris-1091] Fix bug that compaction failed after deletion (#5413) 2021-02-24 13:22:55 +08:00
6dcc1b0a55 [Doris on ES] Fix query failed when ES field value is null (#5363)
* Update fe-idea-dev.md

use `brew install thrift@0.9` to install thrift 0.9.3.1
`brew edit thrift090 | head` shows thrift@0.9 uses thrift 0.9.3.1

* [Refactor] Remove the unnecessary if statement

Future<?> submit(Runnable task)
Submits a Runnable task for execution and returns a Future representing that task. The Future's get method will return null upon successful completion.

* Fix null type

* add comment

Co-authored-by: tanhao <tanhao.0902@bytedance.com>
2021-02-23 10:42:25 +08:00
6ede4c6ec1 [Feature] Support backup,restore,load,export directly connect to s3 (#5399)
* [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol

* [Internal][S3DirectAccess] Support backup, restore, load, export directly connect to s3
1. Support load and export data from/to s3 directly.
2. Add a config to auto convert broker access to s3 access when available

Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7

(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)

* [Internal][S3DirectAccess] File path glob compatible with broker

Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)

* [internal] [doris-1008] fix log4j class not found

Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)

* add poms

Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
2021-02-22 16:07:56 +08:00
7eae3e280a [optimization] use inline optimize ExprContext::get_value (#5385) 2021-02-16 22:35:14 +08:00
51ccd44865 [Load Parallel][3/3] Support parallel delta writer (#5369)
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.

In the test, with a size of 20G, the load time of 334 million rows of data on 3 nodes was
reduced from 9min to 5min, and after enabling a concurrency of 2, it was reduced to 3min.

Also modify the profile of load job.
2021-02-07 22:42:18 +08:00
462efeaf39 [Performance Optimization and Refactor] (#5358) (#5364)
1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2
2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2
2021-02-07 22:41:33 +08:00
6b0521032d [Bug] Fix the problem of floating point precision when importing parquet data (#5360)
The double data "4206.9" in parquet is converted to decimal data "4206.8999" in Doris,
which is not right.
2021-02-07 22:40:51 +08:00
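The symptom in this commit can be reproduced in isolation. The sketch below is not the Doris conversion code; it only shows, under the assumption of a four-digit decimal scale, how truncating the binary double nearest to 4206.9 yields 4206.8999 while rounding yields 4206.9000. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

# Four fractional digits, matching the "4206.8999" seen in the report.
SCALE = Decimal("0.0001")


def to_decimal_truncating(x: float) -> Decimal:
    """Convert the exact binary value of x and cut off the extra digits."""
    return Decimal(x).quantize(SCALE, rounding=ROUND_DOWN)


def to_decimal_rounding(x: float) -> Decimal:
    """Convert the exact binary value of x and round to the target scale."""
    return Decimal(x).quantize(SCALE, rounding=ROUND_HALF_EVEN)
```

The double closest to 4206.9 is slightly below it, so truncation drops to the next lower value at the target scale, while rounding recovers the intended digits.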
a1808c1a71 [Function] Add BE udf bitmap_not (#5346) (#5357)
This function returns the NOT result of the two input bitmaps.
2021-02-07 22:39:17 +08:00
aa5379cff5 [Doc] Modify cumulative_compaction_policy comment in config.h (#5354)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-02-07 22:38:34 +08:00
8ad50bf745 [Bug] Fix bug that BE will core when loading empty json array (#5349)
When loading json data like `[]` (an empty array), BE will crash with the stack:

```
*** Aborted at 1612273824 (unix time) try "date -d @1612273824" if you are using GNU date ***
PC: @           0xe0cce7 rapidjson::GenericValue<>::Accept<>()
*** SIGSEGV (@0xe) received by PID 36798 (TID 0x7f7812114700) from PID 14; stack trace: ***
    @     0x7f791b74b470 (unknown)
    @           0xe0cce7 rapidjson::GenericValue<>::Accept<>()
    @          0x169ff79 _ZN5doris10JsonReader17_print_json_valueB5cxx11ERKN9rapidjson12GenericValueINS1_4UTF8IcEENS1_19MemoryPoolAllocatorINS1_12CrtAllocatorEEEEE
    @          0x16a0689 doris::JsonReader::_write_values_by_jsonpath()
    @          0x16a2cb4 doris::JsonReader::_handle_flat_array_complex_json()
    @          0x16a3761 doris::JsonScanner::get_next()
    @          0x1659bd4 doris::BrokerScanNode::scanner_scan()
    @          0x165a671 doris::BrokerScanNode::scanner_worker()
    @          0x281f67f execute_native_thread_routine
    @     0x7f791b5001c3 start_thread
    @     0x7f791b7fd12d __clone
```
2021-02-07 22:38:15 +08:00
780900ac9c [Feature] Support preceding filter original data when loading (#5338)
Support conditional filtering of original data in broker load and routine load
eg:

```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
	ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
2021-02-07 22:37:48 +08:00
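The order of evaluation in the example above can be sketched as a three-stage pipeline. This assumes the preceding filter runs on the original rows, the `SET` transformation runs next, and `WHERE` runs on the transformed rows. `load_rows` is a hypothetical name and the constants are copied from the example statement.

```python
from typing import Dict, Iterable, List

Row = Dict[str, int]


def load_rows(raw_rows: Iterable[Row]) -> List[Row]:
    """Apply PRECEDING FILTER, then SET, then WHERE, as in the example."""
    loaded: List[Row] = []
    for raw in raw_rows:
        # PRECEDING FILTER user_id = 1381035: evaluated on the original data.
        if raw["user_id"] != 1381035:
            continue
        # SET (ocpc_stage = ocpc_stage + 100): column transformation.
        row = dict(raw, ocpc_stage=raw["ocpc_stage"] + 100)
        # WHERE ocpc_stage > 30: evaluated on the transformed row.
        if row["ocpc_stage"] > 30:
            loaded.append(row)
    return loaded
```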
fd7caf775c [Bug] Fix bug that deadlock may occur when dropping tablet concurrently with tablet distribution interface (#5278)
Fix a bug where a deadlock may occur when dropping a tablet concurrently with a call to the tablet distribution interface.
2021-02-06 23:14:43 +08:00
a6e2c3e3f1 [Bug][Clone] Fix the bug that incremental clone is not triggered (#5230)
In version 0.13, we support a more efficient compaction logic. 
This logic will maintain multiple version paths of the tablet.
This can avoid -230 errors and can also support incremental clone.

But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`.
At present, the incremental rowset meta recorded in `incr_rs_meta` and the records
in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the
new multi-version path, resulting in many cases not triggering incremental clone.

This CL mainly modified:

1. Removed `incr_rs_meta` metadata
2. Modified the clone logic. For an incremental clone, it will try to read the rowset in `stale_rs_meta`.
3. Delete a lot of code that was previously used for version compatibility.
2021-02-06 22:04:48 +08:00
a841905184 [optimization] use replace top instead of push pop in priority #5312 (#5313) 2021-02-04 09:21:54 +08:00
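The "replace top" technique named in this commit can be illustrated with Python's `heapq`, where `heapreplace` swaps the smallest element for a new one in a single sift instead of a pop followed by a push. The k-way merge below is a hypothetical stand-in for the Doris use site, which is not shown in the commit message.

```python
import heapq
from typing import List, Sequence, Tuple


def merge_sorted(runs: Sequence[Sequence[int]]) -> List[int]:
    """Merge already-sorted runs using replace-top on a min-heap."""
    # Each heap entry is (value, run index, position within the run).
    heap: List[Tuple[int, int, int]] = [
        (run[0], i, 0) for i, run in enumerate(runs) if run
    ]
    heapq.heapify(heap)
    merged: List[int] = []
    while heap:
        value, i, pos = heap[0]  # peek at the top without removing it
        merged.append(value)
        nxt = pos + 1
        if nxt < len(runs[i]):
            # Replace the top in one sift instead of heappop + heappush.
            heapq.heapreplace(heap, (runs[i][nxt], i, nxt))
        else:
            # The run is exhausted, so the top is simply removed.
            heapq.heappop(heap)
    return merged
```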
ea7f61e1c7 [Bug] Duplicate results when reading aggregation table (#5307)
Previously, we introduced an optimization logic for the aggr table,
that is, in the case of only one rowset and nonoverlapping,
the data can be read directly without merging.
But this logic has bugs.
2021-02-04 09:21:35 +08:00
wyb
128752b4f9 [Routine load] Fix kafka load too many task bug (#5327) 2021-02-03 13:23:30 +08:00
47e33c7987 Support create index on unique value column (#5305)
* support create index on unique table value columns
2021-02-03 13:22:00 +08:00
2d70cc532c [Bug] Fix CompactionPermitLimiter cv starve bug (#5274)
Fix a bug where `_permits_cv.wait` may starve forever.
2021-02-01 00:11:29 +08:00
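The commit message does not state the root cause of the starvation, so the following is only a generic sketch of a permit limiter built on a condition variable. `PermitLimiter` and its methods are hypothetical names. It shows the usual safeguards: waiting on a predicate that is re-checked after every wakeup, and waking all waiters when permits are released.

```python
import threading
from typing import Optional


class PermitLimiter:
    """Hypothetical limiter that hands out a bounded number of permits."""

    def __init__(self, total: int) -> None:
        self._total = total
        self._used = 0
        self._cv = threading.Condition()

    def request(self, permits: int, timeout: Optional[float] = None) -> bool:
        with self._cv:
            # wait_for re-evaluates the predicate after every wakeup, so a
            # spurious or stale notification cannot grant permits wrongly.
            granted = self._cv.wait_for(
                lambda: self._used + permits <= self._total, timeout
            )
            if granted:
                self._used += permits
            return granted

    def release(self, permits: int) -> None:
        with self._cv:
            self._used -= permits
            # Wake every waiter: a single notify could wake one whose request
            # is still too large while a smaller request keeps sleeping.
            self._cv.notify_all()
```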
f3aded9370 [Bug] System metric init failed cause be start failed (#5262)
System metric init failed cause be start failed
2021-02-01 00:10:57 +08:00
cd96ded1ad [Bugs] Fix bugs that FE heartbeat api of httpv2 does not return version info (#5306)
Co-authored-by: morningman <chenmingyu@baidu.com>
2021-01-30 20:34:33 +08:00
bf0cb78b67 [optimization] avoid extra memory copy while build hash table (#5301)
avoid extra memory copy while build hash table
2021-01-30 20:32:12 +08:00
90c2da54bd [Bug] Fix bug and add graceful exit for compaction producer (#5124)
1. Add a graceful exit mechanism for the compaction producer thread.
2. If a compaction task fails to submit, it should be popped from `_tablet_submitted_compaction`.
2021-01-30 16:35:36 +08:00
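Both points of this commit can be sketched together. The class below is hypothetical: `CompactionProducer`, `submit`, and `candidates` are illustrative names, and the set `submitted` merely stands in for `_tablet_submitted_compaction`. The sketch assumes a stop flag that the producer loop checks and sleeps on, so a stop request takes effect without waiting out the full interval.

```python
import threading
from typing import Callable, Iterable, Set


class CompactionProducer:
    """Hypothetical producer thread with a graceful stop."""

    def __init__(
        self,
        submit: Callable[[int], bool],
        candidates: Callable[[], Iterable[int]],
        interval_s: float = 0.01,
    ) -> None:
        self._submit = submit
        self._candidates = candidates
        self._interval_s = interval_s
        self._stop = threading.Event()
        self.submitted: Set[int] = set()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Signal the loop, then wait for the thread to finish its round.
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            for tablet_id in self._candidates():
                self.submitted.add(tablet_id)
                if not self._submit(tablet_id):
                    # Submission failed: undo the bookkeeping entry.
                    self.submitted.discard(tablet_id)
            # Sleeping on the event lets stop() interrupt the wait.
            self._stop.wait(self._interval_s)
```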
4ffc61be32 fix apply condition to unique table value columns incorrectly (#5302) 2021-01-29 10:34:47 +08:00
e774314ffb Fix some problems related to thrift rpc when use nonblocking IO model (#5117)
* Fix some problems related to thrift rpc when use nonblocking IO model

Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>
2021-01-28 10:57:30 +08:00
c084276600 Revert "[Bug] Fix row_number and group by are inconsistent with 0 and -0 partition (#5226)" (#5297)
This reverts commit 34bfc429868a9a22481d209c24ccd50d85cc3c9f.
The hash algorithm may overflow
2021-01-26 13:58:19 +08:00
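For context on the issue the reverted fix addressed, the sketch below shows why 0 and -0 can land in different partitions when a hash is computed over raw bytes: the two values compare equal but have different bit patterns. This is a generic IEEE 754 illustration with hypothetical helper names. It is neither the reverted change nor its replacement.

```python
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def normalize_zero(x: float) -> float:
    """Map -0.0 to +0.0 and leave every other value unchanged."""
    # Under the default rounding mode, -0.0 + 0.0 evaluates to +0.0.
    return x + 0.0
```

Hashing `float_bits(normalize_zero(x))` instead of the raw bytes would give both zeros the same hash input.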
8ee4c48f13 [Compile] fix compile error in gcc10 (#5294) 2021-01-26 09:13:11 +08:00