Commit Graph

422 Commits

Author SHA1 Message Date
f9be31d4bc [refactor](rowbatch) make RowBatch better (#7286)
1. Add the const keyword to RowBatch's read-only member functions.
2. Prefer member objects over pointers to member objects wherever possible.
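
A minimal sketch of the two points above, assuming a heavily simplified stand-in class (`MiniRowBatch`, `remaining` and all member names are hypothetical and are not the actual RowBatch code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for RowBatch; not the Doris class.
class MiniRowBatch {
public:
    explicit MiniRowBatch(std::size_t capacity) : _capacity(capacity) {
        _rows.reserve(capacity);
    }

    // Read-only accessors are const, so they work through a const reference.
    std::size_t num_rows() const { return _rows.size(); }
    std::size_t capacity() const { return _capacity; }
    bool is_full() const { return _rows.size() >= _capacity; }

    // Mutating member functions stay non-const.
    void add_row(int value) {
        assert(!is_full());
        _rows.push_back(value);
    }

private:
    std::size_t _capacity;
    // Held by value rather than through a pointer: no separate allocation,
    // no manual delete, and the lifetime is tied to the batch itself.
    std::vector<int> _rows;
};

// Compiles only because the accessors above are declared const.
inline std::size_t remaining(const MiniRowBatch& batch) {
    return batch.capacity() - batch.num_rows();
}
```

With this shape, code that only inspects a batch can take `const MiniRowBatch&` and the compiler rejects accidental mutation.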
2021-12-06 10:31:43 +08:00
8a6528a2fb [fix](executor) set the length of StringValue to 0 when it is null (#7284)
On the send side, the ptr and len of a tuple's String Slot are not always assigned properly, so the receive side may crash in some situations.

Detailed description:
On the send side, when RowBatch::serialize(PRowBatch* output_batch) packs a RowBatch, Tuple::deep_copy()
 is called. Only String Slots that are not null get ptr and len set to proper values; null String
 Slots keep their original state, so ptr may point anywhere and len may hold an unexpected value.

On the receive side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...). In this
function, every String Slot's offset is converted to a valid string_val->ptr, whether the String Slot is null or not.

However, some logic depends on string_val->len == 0. For example, AggregateFuncTraits::init() and HyperLogLog::deserialize()
return correctly if slice.size <= 0. So if string_val->len is set to 0 on the send side, everything works; otherwise the server
may crash.

From a network communication point of view, the sender is responsible for transferring correct data with proper
values, and should not make any assumption about how the receive side will use it.
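
The sender-side rule can be sketched roughly as follows. This is an illustration under stated assumptions: `StringSlot`, `PackedSlot` and `pack_slot` are hypothetical names, and the real StringValue, Tuple::deep_copy() and PRowBatch layouts differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical simplified slot; the real StringValue type differs.
struct StringSlot {
    const char* ptr = nullptr;
    std::size_t len = 0;
    bool is_null = false;
};

// What goes on the wire in this sketch: an offset into the shared buffer
// instead of a pointer, plus the length and the null flag.
struct PackedSlot {
    std::size_t offset = 0;
    std::size_t len = 0;
    bool is_null = false;
};

inline PackedSlot pack_slot(const StringSlot& slot, std::string* buffer) {
    assert(buffer != nullptr);
    PackedSlot out;
    out.is_null = slot.is_null;
    if (slot.is_null) {
        // Sender's responsibility: never forward stale ptr/len for a null slot.
        out.offset = 0;
        out.len = 0;
        return out;
    }
    out.offset = buffer->size();
    out.len = slot.len;
    buffer->append(slot.ptr, slot.len);
    return out;
}
```

A receiver that checks only `len <= 0` then behaves correctly for null slots without needing to consult the null flag.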
2021-12-06 10:30:26 +08:00
fc9e502b51 [improvement](brpc)(config) Support transfer RowBatch in Controller Attachment (#7164)
Transfer RowBatch in Protobuf Request to Controller Attachment,
when the maximum length of the RowBatch in the Protobuf Request is exceeded.
This can avoid reaching the upper limit of the Protobuf Request length (2G),
and it is expected that performance can be improved.
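
The selection rule described above might look like the sketch below. The function and enum names are hypothetical, and the threshold is passed in as a parameter because the commit message does not state the config name or its default.

```cpp
#include <cassert>
#include <cstddef>

// Where the serialized RowBatch travels in this sketch.
enum class Transport { kProtobufField, kControllerAttachment };

// Hypothetical helper: pick the attachment only when the serialized batch
// exceeds the configured maximum for the Protobuf request.
inline Transport choose_transport(std::size_t serialized_bytes,
                                  std::size_t max_in_request_bytes) {
    assert(max_in_request_bytes > 0);
    return serialized_bytes > max_in_request_bytes
                   ? Transport::kControllerAttachment
                   : Transport::kProtobufField;
}
```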
2021-12-02 11:41:38 +08:00
d8ba6e3eb6 1. Fix an error where fetching a string type field may cause a malformed packet error. (#7262)
This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1,
   but it was set to 2^24 - 2 by mistake.
2. Fix bitmap_to_string, which may fail when the result is larger than 2G
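
To illustrate why the exact constant matters: in the MySQL client/server protocol a packet payload is at most 2^24 - 1 bytes, and a payload of exactly that size must be followed by another (possibly empty) packet. The sketch below shows the splitting rule in C++; the FE constant itself lives in Java, and `split_payload` is a hypothetical helper, not Doris code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum payload of one physical MySQL packet: 2^24 - 1 = 16777215 bytes.
constexpr std::size_t kMaxPhysicalPacketLength = (1u << 24) - 1;

// Returns the payload length of each physical packet needed for a logical
// payload of `payload_len` bytes.
inline std::vector<std::size_t> split_payload(std::size_t payload_len) {
    std::vector<std::size_t> packets;
    std::size_t remaining = payload_len;
    while (remaining >= kMaxPhysicalPacketLength) {
        packets.push_back(kMaxPhysicalPacketLength);
        remaining -= kMaxPhysicalPacketLength;
    }
    // The final packet is always shorter than the maximum (possibly empty),
    // which is how the peer recognizes the end of the payload.
    packets.push_back(remaining);
    assert(packets.back() < kMaxPhysicalPacketLength);
    return packets;
}
```

If the sender used 2^24 - 2 as the limit instead, its full-size packets would be one byte shorter than the maximum, so the peer would treat the first of them as the final packet.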
2021-12-01 10:02:34 +08:00
948a2a738d [performance] Improve DeltaWriter's performance. (#7216)
1. Support batch write for DeltaWriter.
2. Use mutex instead of SpinLock.
2021-11-26 10:15:27 +08:00
fb5adaf18e [fix](mem-tracker) Fix mem limit -1 in partition aggregate node (#7181)
Make error message more clear.
2021-11-24 10:43:35 +08:00
d420ff0afd display current load bytes to show load progress, (#7134)
this value may be greater than the file size when loading
Parquet or ORC files, and less than the file size when loading
CSV files.
2021-11-24 10:08:32 +08:00
e2d3d0134e Add a method to get doris current memory usage (#6979)
Add all memory usage check when TryConsume memory
2021-11-24 10:07:54 +08:00
ad0d2b82ab [fix](memory) fix bug that ~BitShufflePageDecoder destroys uninitialized chunk (#7172)
Added a safe way to destroy Chunk.
2021-11-23 15:24:25 +08:00
836c95c2ca [feat](memory-track) Print peak memory use of all backend after query in audit log (#7030)
Add a new field `peakMemoryBytes` in fe.audit.log
2021-11-22 14:46:08 +08:00
fcd4f0b5c2 [fix](profile) fix some bugs about ReportProfile on BE (#7144)
1. Setting _report_thread_active to false does not need to be protected by _report_thread_lock, because
_report_thread_active is a bool, and writing data is thread-safe if its size <= the machine word length.

2. The report_profile thread may terminate early. In report_profile(), the loop while (_report_thread_active) may
exit if _report_thread_active is false, because the thread calling open() may be scheduled out between
_report_thread_started_cv.wait(l) and _report_thread_active = true. We should not make assumptions about how much time
elapses between two consecutive schedulings of a thread.
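
One way to avoid the early-termination race in point 2 is to set the flag before the worker thread can observe it. The sketch below is an assumption about the shape of such a fix, not the actual patch: `Reporter` and its members are hypothetical, and it uses `std::atomic<bool>` for the flag, which is a choice of this sketch rather than something the commit states.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical reporter; names do not match the Doris classes.
class Reporter {
public:
    ~Reporter() { stop(); }

    void start() {
        assert(!_thread.joinable());
        // Set the flag BEFORE the thread exists, so its loop cannot see a
        // stale `false` regardless of how the two threads are scheduled.
        _active.store(true);
        _thread = std::thread([this] { run(); });
        std::unique_lock<std::mutex> l(_lock);
        _started_cv.wait(l, [this] { return _started; });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> l(_lock);
            _active.store(false);
        }
        _stop_cv.notify_all();
        if (_thread.joinable()) _thread.join();
    }

    int reports() const { return _reports.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> l(_lock);
        _started = true;
        _started_cv.notify_one();
        while (_active.load()) {
            ++_reports;  // stands in for sending one profile report
            _stop_cv.wait_for(l, std::chrono::milliseconds(5),
                              [this] { return !_active.load(); });
        }
    }

    std::mutex _lock;
    std::condition_variable _started_cv;
    std::condition_variable _stop_cv;
    bool _started = false;  // guarded by _lock
    std::atomic<bool> _active{false};
    std::atomic<int> _reports{0};
    std::thread _thread;
};
```

Because `_active` is already true when the thread is created, the loop keeps running until `stop()` is called, however long the caller of `start()` is descheduled.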
2021-11-20 21:43:57 +08:00
a81f4da4e4 [feat](minidump) Add minidump support (#7124)
Now minidump file will be created when BE crashes.
And user can manually trigger a minidump by sending SIGUSR1 to BE process.

More details can be found in minidump.md documents
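
For readers unfamiliar with signal-triggered dumps, the manual trigger can be pictured with the POSIX-only sketch below. It only records that SIGUSR1 arrived; it does not write a minidump, and the function names are hypothetical rather than taken from the BE code.

```cpp
#include <cassert>
#include <csignal>

// Set from the signal handler; sig_atomic_t is safe to write there.
static volatile std::sig_atomic_t g_minidump_requested = 0;

static void on_sigusr1(int) { g_minidump_requested = 1; }

// Installs the SIGUSR1 handler; returns false if installation failed.
bool install_minidump_trigger() {
    return std::signal(SIGUSR1, on_sigusr1) != SIG_ERR;
}

// Returns true once per received request; a real server would poll this
// from a normal thread and write the dump there, outside the handler.
bool consume_minidump_request() {
    if (g_minidump_requested == 0) return false;
    g_minidump_requested = 0;
    return true;
}
```

Keeping the handler down to a single flag write is deliberate: very little is safe to do inside a signal handler, so the heavy work is left to ordinary code.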
2021-11-20 21:41:26 +08:00
f5a35c28e9 [Optimize] [Memory] BitShufflePageDecoder use memory allocated by ChunkAllocator instead of Faststring (#6515)
BitShufflePageDecoder reuses the memory that stores decoder results and allocates it directly from
`ChunkAllocator`, which improves performance to a certain extent.

In the case of #6285, the total time consumption is reduced by 13.5%, and the share of time spent in `~Reader()`
is reduced from 17.65% to 1.53%. Memory allocation is also unified under `ChunkAllocator` for centralized
management, which is conducive to subsequent memory optimization.

Allocating directly from `ChunkAllocator` avoids the memory waste caused by `Mempool`, because a chunk can be freed at any time, but the
performance is lower than allocating from `Mempool`. The guess is that, without `Mempool`'s secondary
allocation from large chunks, a large number of small chunks are requested directly from `ChunkAllocator`, and more
time is spent on locking in `pop_free_chunk` and `push_free_chunk` (but this is not proven by the flame graphs of BE's CPU and
contention).
2021-11-17 11:20:21 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr with std::shared_ptr
2. replace all boost::scoped_ptr with std::unique_ptr
3. replace all boost::scoped_array with std::unique_ptr<T[]>
4. replace all boost::thread with std::thread
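
A small sketch of what the four replacements look like side by side. The function `sum_scaled` is purely illustrative and does not come from the Doris code base; the comments name the Boost type each line would have used before.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>

// Sums scale * (1 + 2 + ... + n) on a worker thread.
inline long sum_scaled(std::size_t n, long scale) {
    assert(n > 0);
    std::shared_ptr<long> total = std::make_shared<long>(0);  // was boost::shared_ptr<long>
    std::unique_ptr<long> factor(new long(scale));            // was boost::scoped_ptr<long>
    std::unique_ptr<long[]> values(new long[n]);              // was boost::scoped_array<long>
    for (std::size_t i = 0; i < n; ++i) values[i] = static_cast<long>(i) + 1;

    std::thread worker([&] {                                  // was boost::thread
        for (std::size_t i = 0; i < n; ++i) *total += values[i] * *factor;
    });
    worker.join();
    return *total;
}
```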
2021-11-17 10:18:35 +08:00
dcad6ff5e5 [License] Add License header for missing files (#7130)
1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`
2021-11-16 18:37:54 +08:00
896a08cbcf [Enhancement] add thread id in be log (#6891)
Add the thread id to the BE log in order to quickly find the query id that caused BE to crash with a segmentation fault
See #6890
2021-11-14 18:52:01 +08:00
d751937828 [Optimize] Optimize mem_tracker (#6988)
1. Optimize HighWaterMarkCounter::add(): call `UpdateMax()` only if delta is greater than 0,
to reduce the number of function calls

2. Delete useless code to keep MemTracker clean
    Some data members are never set but their values are checked, so the if conditions are never met; remove this code.
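
Point 1 can be sketched as follows, assuming a simplified counter built on `std::atomic` (the class name `HighWaterMark` and its members are hypothetical; the real HighWaterMarkCounter lives in the runtime profile code and differs in detail):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical simplified high-water-mark counter.
class HighWaterMark {
public:
    void add(int64_t delta) {
        int64_t new_val = _current.fetch_add(delta) + delta;
        // A non-positive delta can never raise the peak, so skip the update.
        if (delta > 0) {
            update_max(new_val);
            assert(_max.load() >= new_val);
        }
    }

    int64_t current() const { return _current.load(); }
    int64_t peak() const { return _max.load(); }

private:
    void update_max(int64_t v) {
        int64_t old_max = _max.load();
        // Retry until the stored peak is at least v.
        while (v > old_max && !_max.compare_exchange_weak(old_max, v)) {
        }
    }

    std::atomic<int64_t> _current{0};
    std::atomic<int64_t> _max{0};
};
```

Releases of memory are at least as frequent as acquisitions, so skipping the compare-and-swap loop for them removes roughly half of the peak updates.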
2021-11-12 10:51:45 +08:00
8ba2d79fe1 [Bug] Change DateTimeValue Memory Layout To Old (#7022)
Change DateTimeValue Memory Layout To Old to fix compatibility problems
2021-11-08 21:56:14 +08:00
Pxl
29ca77622f [Refactor] Refactor part of RuntimeFilter's code (#6998)
#6997
2021-11-07 17:40:45 +08:00
ca8268f1c9 [Feature] Extend logger interface, support structured log output (#6600)
Support structured logging.
2021-11-07 17:39:53 +08:00
4f13f98424 [Bug] Fix bug that memtracker in delta writer will be visited before initialized. (#7013) 2021-11-06 13:29:49 +08:00
5ca271299a [refactor] set forward_to_master true by default (#7017)
* set forward_to_master true by default

* Update docs/zh-CN/administrator-guide/variables.md
2021-11-06 13:27:26 +08:00
760fc02bfe Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache (#6916)
Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache
Add a config used for auto check and reset of the brpc stub
2021-11-05 09:45:37 +08:00
db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947)
## Case

In the load process, each tablet has a memtable to hold the incoming data,
and if the data in a memtable is larger than 100MB, it is flushed to disk as a `segment` file. Then
a new memtable is created to hold the following data.

Assume that this is a table with N buckets(tablets). So the max size of all memtables will be `N * 100MB`.
If N is large, it will cost too much memory.

So to limit memory, when the total size of all memtables reaches a threshold (2GB by default), Doris will
try to flush all current memtables to disk (even if their size has not reached 100MB).

So you will see that a memtable is flushed when its size reaches `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.

## Solution

When deciding to flush memtables to reduce memory consumption, do NOT flush all memtables, but flush only part
of them.
For example, there are 50 tablets (with 50 memtables). The memory limit is 1GB, so when each memtable reaches
20MB, the total size reaches 1GB, and a flush occurs.

If I only flush 25 of the 50 memtables, then the next time the total size reaches 1GB, there will be 25 memtables of
size 10MB and another 25 memtables of size 30MB. So I can flush the memtables of size 30MB, which is larger
than 20MB.

The main idea is to introduce some jitter during flush so that memtable sizes are slightly uneven, which ensures that a flush is only triggered for memtables that are large enough.

In my test, loading a table with 48 buckets with a mem limit of 2G, the average memtable size was 44MB in the previous version;
after the modification, the average size is 82MB.
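
The 50-tablet example above can be replayed with a small sketch. It assumes one particular policy, flushing the largest half of the memtables, which is an illustration only; the commit message says just that part of the memtables are flushed, and `flush_largest_half` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Flushes (resets to 0) the largest half of the memtables and returns the
// sizes, in MB, of the memtables that were flushed.
inline std::vector<long> flush_largest_half(std::vector<long>* sizes_mb) {
    assert(sizes_mb != nullptr);
    std::vector<std::size_t> order(sizes_mb->size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Largest first; stable so ties are broken by tablet index.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return (*sizes_mb)[a] > (*sizes_mb)[b];
                     });
    std::vector<long> flushed;
    for (std::size_t i = 0; i < order.size() / 2; ++i) {
        flushed.push_back((*sizes_mb)[order[i]]);
        (*sizes_mb)[order[i]] = 0;
    }
    return flushed;
}
```

Starting from 50 memtables of 20MB, the first call flushes 25 of them. After every memtable receives another 10MB, the sizes are 25 x 10MB and 25 x 30MB, and the second call flushes only the 30MB ones, matching the arithmetic in the description.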
2021-11-01 10:51:50 +08:00
e8cabfff27 [S3] Support path style endpoint (#6962)
Add a use_path_style property for S3
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property
Fix some S3 URI bugs
Add some logs for tracing load process.
2021-11-01 10:48:10 +08:00
Pxl
d4249e4f2d [Bug] fix Runtime filter can't find fragment-id when apply_filter called early (#6923)
#6921
2021-10-27 09:54:52 +08:00
adb6bfdf74 [Bug] Fix bug that truncate table may change the storage medium property (#6905) 2021-10-25 10:07:27 +08:00
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
da99749e7f [Bug] Fix bug that BE will crash when backup using S3 (#6855) 2021-10-17 22:54:42 +08:00
eff076b355 [BUG] Fix printing ReservationTrackerCounters cause BE crash when mem_limit is reached (#6849)
When the memory usage of BE reaches mem_limit, printing ReservationTrackerCounters through MemTracker
may cause BE crash in high concurrency.

ReservationTrackerCounters is not actually used in the current Doris, and the memory tracker in Doris
will be redesigned in the future.
2021-10-16 21:57:09 +08:00
59017cebe6 [ARM64] Fix some problem when compiling on ARM64 platform (#6836)
1. Refactor the create method of hdfs reader & writer.

    libhdfs3 does not support arm64. So we should not support hdfs reader & writer on arm64.

2. Add macro for LowerUpperImpl
2021-10-16 21:56:49 +08:00
24d38614a0 [Dependency] Upgrade thirdparty libs (#6766)
Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, Doris should be compiled with build-env:1.4.0
2021-10-15 13:03:04 +08:00
5ef3f59928 [Optimize][RoutineLoad] Avoid sending tasks if there is no data to be consumed (#6805)
1 Avoid sending tasks if there is no data to be consumed
By fetching latest offset of partition before sending tasks.(Fix [Optimize] Avoid too many abort task in routine load job #6803 )

2 Add a preCheckNeedSchedule phase in update() of routine load.
To avoid taking write lock of job for long time when getting all kafka partitions from kafka server.

3 Upgrade librdkafka's version to 1.7.0 to fix a bug of "Local: Unknown partition"
See offsetsForTimes fails with 'Local: Unknown partition' edenhill/librdkafka#3295

4 Avoid unnecessary storage migration tasks if that storage medium does not exist on BE.
Fix [Bug] Too many unnecessary storage migration tasks #6804
2021-10-13 11:39:01 +08:00
ad3c9390a2 [Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769)
1. This bug is introduced from #6582
2. Optimize the error log for the "Address used" error message.
3. Add some document about compilation.
    1. Add a custom thirdparty download url.
    2. Add a custom com.alibaba maven jar package for DataX.
4. Fix bug that BE crash when closing scan node, introduced from #6622.
2021-09-29 11:11:28 +08:00
bdc8c98008 [Outfile] Support hdfs in select outfile clause (#6644)
Support hdfs in select outfile clause without broker.
This PR implements an HDFS writer in BE, which is used to write HDFS files directly without using the broker.
Also the hdfs outfile clause syntax check has been added in FE.
The syntax:
```
select * from xx into outfile "hdfs://user/outfile_" format as csv
properties ("hdfs.fs.defaultFS" = "xxx", "hdfs.hdfs_user" = "xxx");
```
Note that all hdfs configurations need to carry a prefix `hdfs.`.
2021-09-24 10:07:11 +08:00
5c45e26644 Fixed zone map init error for string type (#6667)
Fixed the problem that the StringValue memory generated by Expr may be released before use
Fixed a possible overflow in from_string for the String type
2021-09-23 09:44:22 +08:00
521fb15a9b [Bug] Fix some memory bugs (#6699)
1. Fix a memory leak in `collect_iterator.cpp` (Fix #6700)
2. Add a new BE config `max_segment_num_per_rowset` to limit the num of segment in new rowset.(Fix #6701)
3. Make the error msg of stream load more friendly.
2021-09-22 12:30:14 +08:00
332ba4cded [config] use thrift_rpc_timeout_ms config to replace hard-coded value (#6637)
use thrift_rpc_timeout_ms config to replace the hard-coded value
2021-09-16 10:22:57 +08:00
61c9d11fdb support change column type from decimal to string (#6643) 2021-09-14 15:56:44 +08:00
b3ae607fe9 [Spark-Doris-Connector] support boolean data type (#6601)
1. Support the boolean data type in spark-doris-connector, because Doris already supports the boolean data type
2. Fix a BE core dump when Spark requests data from BE
2021-09-12 10:07:23 +08:00
b2f1e21a3b [Bugs] Fix some bugs (#6586)
* fix regex lazy

* fix result file core

* fix dynamic partition replica and table name length bug

* fix replicanum 0

* fix delete bug

* renew proxy

Co-authored-by: morningman <chenmingyu@baidu.com>
2021-09-10 09:53:30 +08:00
4f744333c2 fix some core dumps in local test: (#6594)
1. inserting a very large string value may coredump
2. some analytic function and agg function results may be incorrect
3. string comparison may coredump when the string is too large
4. string type in delete conditions cannot be processed correctly
5. add text/blob as aliases of string to be compatible with mysql
6. fix string type min/max agg, which may be processed incorrectly
2021-09-10 09:52:03 +08:00
74ddea8d83 [Optimize] Remove some unused code to reduce lock contention (#6566)
1. Remove global runtime profile counter
2. Remove unused thread token register
2021-09-07 11:56:12 +08:00
9469b2ce1a [Outfile] Support concurrent export of query results (#6539)
This pr mainly supports
1. Export query result sets concurrently
2. Query result set export supports s3 protocol

Concurrent export of query result sets has several preconditions:
1. The concurrent export variable is enabled
2. The query itself can be exported concurrently
    (some queries containing sort nodes at the top level cannot be exported concurrently)
3. The export uses the s3 protocol instead of the broker

After exporting the result set concurrently,
the file prefix is changed to outfile_{query_instance_id}_filenumber.{file_format}
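
The naming pattern above could be produced by a helper along these lines. This is a sketch only: `outfile_name` is a hypothetical function, and whether file numbers start at 0 or 1 is not stated in the commit message.

```cpp
#include <cassert>
#include <string>

// Builds {prefix}{query_instance_id}_{file_number}.{file_format},
// where prefix is the user-supplied path prefix such as "outfile_".
inline std::string outfile_name(const std::string& prefix,
                                const std::string& query_instance_id,
                                int file_number,
                                const std::string& file_format) {
    assert(file_number >= 0);
    return prefix + query_instance_id + "_" + std::to_string(file_number) +
           "." + file_format;
}
```

Embedding the query instance id in the name is what lets several instances write to the same target path at once without overwriting each other's files.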
2021-09-07 11:53:32 +08:00
9f7d4cf741 [BUG] fix bugs with string type (#6538)
* fix bugs with string type
1. not support string with agg type min/max
2. agg_update with large string may coredump
3. stringval with large string may coredump
4. not support string as partition key
2021-09-01 15:59:55 +08:00
0393c9b3b9 [Optimize] Support send batch parallelism for olap table sink (#6397)
* Support send batch parallelism for olap table sink

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-30 11:03:09 +08:00
3f2fdd236f Add scan thread token (#6443) 2021-08-27 10:56:17 +08:00
fa290383dc [Doc] Modify README to add some statistical indicators (#6486)
1. Add license/total line/release badges.
2. Add monthly active contributor and contributor growth graph
3. fix a pom.xml bug
4. Modify some routine load log on BE side
2021-08-25 09:36:26 +08:00
7e30b28f3a [Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-24 22:30:58 +08:00
146060dfc0 [Bug]Fix result_writer may coredump (#6482)
fix result_writer may coredump, let BufferControlBlock owns the memory
2021-08-22 22:04:00 +08:00