1559 Commits

Author SHA1 Message Date
088a16d33b Chinese annotation modification (#6958)
* Modify Chinese comment (#6951)
2021-11-09 18:00:14 +08:00
Pxl
fc62090558 [Bug] fix Log tags empty reference core dump (#7043)
The key may already have been destroyed by the time its reference is used.
2021-11-09 10:00:08 +08:00
8ba2d79fe1 [Bug] Change DateTimeValue Memory Layout To Old (#7022)
Change the DateTimeValue memory layout back to the old one to fix compatibility problems.
2021-11-08 21:56:14 +08:00
Pxl
29ca77622f [Refactor] Refactor part of RuntimeFilter's code (#6998)
#6997
2021-11-07 17:40:45 +08:00
9b1a80114e [Bug] Fix some return logic error in init BE encoding_map (#6936)
The original code checked _encoding_map and returned early, which prevented some encoding methods from being added to default_encoding_type_map_ or value_seek_encoding_map_ in the EncodingInfoResolver constructor.
E.g:
EncodingInfoResolver::EncodingInfoResolver() {
....
    _add_map<OLAP_FIELD_TYPE_BOOL, PLAIN_ENCODING>();
    _add_map<OLAP_FIELD_TYPE_BOOL, PLAIN_ENCODING, true>();
...
}
The second line of code has no effect.
2021-11-07 17:40:18 +08:00
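The early-return problem described in #6936 can be illustrated with a toy model. This is only a sketch under assumptions: `ToyResolver`, `add_buggy`, `add_fixed`, and the two enums are hypothetical names, and the real `_add_map` logic in Doris may differ in detail.

```cpp
#include <map>
#include <set>
#include <utility>

// Hypothetical stand-ins for the real field and encoding types.
enum FieldType { BOOL_TYPE };
enum Encoding { PLAIN };

struct ToyResolver {
    std::set<std::pair<FieldType, Encoding>> encoding_map;
    std::map<FieldType, Encoding> default_encoding;
    std::map<FieldType, Encoding> value_seek_encoding;

    // Buggy variant: returning early when the pair is already known means a
    // second call with value_seek = true never reaches the value-seek map.
    void add_buggy(FieldType type, Encoding encoding, bool value_seek = false) {
        if (encoding_map.count({type, encoding})) return;
        encoding_map.insert({type, encoding});
        register_defaults(type, encoding, value_seek);
    }

    // Fixed variant: always update the per-type maps; inserting an existing
    // pair into encoding_map is a harmless no-op.
    void add_fixed(FieldType type, Encoding encoding, bool value_seek = false) {
        register_defaults(type, encoding, value_seek);
        encoding_map.insert({type, encoding});
    }

private:
    void register_defaults(FieldType type, Encoding encoding, bool value_seek) {
        if (value_seek) {
            value_seek_encoding.emplace(type, encoding);
        } else {
            default_encoding.emplace(type, encoding);
        }
    }
};
```

With the buggy variant, registering the same pair twice (once without and once with `value_seek`) leaves `value_seek_encoding` empty; the fixed variant fills both maps.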
ca8268f1c9 [Feature] Extend logger interface, support structured log output (#6600)
Support structured logging.
2021-11-07 17:39:53 +08:00
e69249c082 sub_bitmap (#6977)
Starting from the offset position, take up to the specified limit of bitmap elements and return them as a bitmap subset.

Types of chang
2021-11-06 13:31:03 +08:00
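The offset/limit behavior described for sub_bitmap in #6977 might look roughly like the sketch below. It uses `std::set` as a stand-in for a bitmap and assumes a non-negative, zero-based offset; `sub_bitmap_sketch` is a hypothetical name, not the Doris implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>

// std::set keeps elements in ascending order, like iterating a bitmap.
std::set<uint64_t> sub_bitmap_sketch(const std::set<uint64_t>& bitmap,
                                     std::size_t offset, std::size_t limit) {
    std::set<uint64_t> result;
    if (offset >= bitmap.size()) return result;  // nothing at or after offset

    // Skip `offset` elements, then copy at most `limit` elements.
    auto it = std::next(bitmap.begin(), static_cast<std::ptrdiff_t>(offset));
    for (std::size_t taken = 0; it != bitmap.end() && taken < limit; ++it, ++taken) {
        result.insert(*it);
    }
    return result;
}
```

For the set {1, 2, 3, 4, 5}, an offset of 1 and a limit of 3 selects {2, 3, 4} under these assumptions.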
4f13f98424 [Bug] Fix bug that memtracker in delta writer will be visited before initialized. (#7013) 2021-11-06 13:29:49 +08:00
5ca271299a [refactor] set forward_to_master true by default (#7017)
* ot set forward_to_master true by default

* Update docs/zh-CN/administrator-guide/variables.md
2021-11-06 13:27:26 +08:00
760fc02bfe Added brpc stub cache check and reset API, used to test whether the brpc stub cache is available, and to reset the brpc stub cache (#6916)
Added brpc stub cache check and reset API, used to test whether the brpc stub cache is available, and to reset the brpc stub cache.
Add a config for automatically checking and resetting the brpc stub.
2021-11-05 09:45:37 +08:00
1f196442f7 [Bug] Fix the nullptr of core in schema change (#7003)
Schema change fails when memory allocation fails during row block sorting. However, it should perform internal sorting first before failing, because there may be enough memory after internal sorting.
2021-11-05 09:44:08 +08:00
599ecb1f30 [Function] Add bitmap function bitmap_subset_limit (#6980)
Add bitmap function bitmap_subset_limit.
This function will return subset in specified index.
2021-11-04 12:14:47 +08:00
aeec9c45e6 [Function] Add bitmap-xor-count function for doris (#6982)
Add bitmap-xor-count function for doris

Related to #6875
2021-11-02 16:37:00 +08:00
f0a71a067b [Build] Generate compile_commands.json (#6976)
Set cmake to generate compile_commands.json, which is useful for LSP servers such as clangd, cquery, etc.
2021-11-02 16:36:35 +08:00
2d10300547 [Bug] Fix schema change fail as memory allocation on row block sorting (#6932)
Schema change fails when memory allocation fails during row block sorting.
However, it should perform internal sorting first before failing,
because there may be enough memory after internal sorting.
2021-11-02 16:33:38 +08:00
1ff3d708ca [Function] add functions of bitmap_and/or_count (#6912)
issue #6875
add bitmap_and_count/ bitmap_or_count
2021-11-01 14:00:07 +08:00
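As a rough illustration of what bitmap_and_count and bitmap_or_count compute, the sketch below counts the intersection and union of two sets without materializing the result. `std::set` stands in for a bitmap and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Number of elements present in both sets.
std::size_t bitmap_and_count_sketch(const std::set<uint64_t>& a,
                                    const std::set<uint64_t>& b) {
    std::size_t count = 0;
    for (uint64_t value : a) {
        count += b.count(value);  // count() is 0 or 1 for std::set
    }
    return count;
}

// Number of elements present in either set, by inclusion-exclusion.
std::size_t bitmap_or_count_sketch(const std::set<uint64_t>& a,
                                   const std::set<uint64_t>& b) {
    return a.size() + b.size() - bitmap_and_count_sketch(a, b);
}
```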
c7a3116f98 [Function] add bitmap function of bitmap_has_all (#6918)
The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.
2021-11-01 12:50:47 +08:00
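The containment check described for bitmap_has_all can be sketched with `std::includes` over two sorted sets. This is an illustration only; `bitmap_has_all_sketch` is a hypothetical name and `std::set` stands in for a bitmap.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

// True if every element of `second` also appears in `first`.
// std::includes requires sorted ranges, which std::set guarantees.
bool bitmap_has_all_sketch(const std::set<uint64_t>& first,
                           const std::set<uint64_t>& second) {
    return std::includes(first.begin(), first.end(),
                         second.begin(), second.end());
}
```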
65ded82778 [Function] add BE bitmap function bitmap_subset_in_range (#6917)
Add bitmap function bitmap_subset_in_range.
This function returns the subset within the specified range (excluding range_end).
2021-11-01 11:05:19 +08:00
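The half-open range selection of bitmap_subset_in_range might be sketched as follows. The commit only states that range_end is excluded; treating range_start as inclusive is an assumption here, and `bitmap_subset_in_range_sketch` is a hypothetical name using `std::set` in place of a bitmap.

```cpp
#include <cstdint>
#include <set>

// Returns the elements e with range_start <= e < range_end.
std::set<uint64_t> bitmap_subset_in_range_sketch(const std::set<uint64_t>& bitmap,
                                                 uint64_t range_start,
                                                 uint64_t range_end) {
    if (range_start >= range_end) return {};  // empty or inverted range
    // lower_bound(x) is the first element >= x, so the second bound
    // stops just before range_end.
    return std::set<uint64_t>(bitmap.lower_bound(range_start),
                              bitmap.lower_bound(range_end));
}
```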
db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947)
## Case

In the load process, each tablet has a memtable that holds the incoming data.
When the data in a memtable exceeds 100MB, it is flushed to disk as a `segment` file, and
a new memtable is created to hold the data that follows.

Assume a table with N buckets (tablets). The maximum size of all memtables is then `N * 100MB`.
If N is large, this costs too much memory.

So, to limit memory, when the total size of all memtables reaches a threshold (2GB by default), Doris
tries to flush all current memtables to disk (even if they have not reached 100MB).

As a result, a memtable is flushed when its size reaches `2GB/N`, which may be much smaller
than 100MB, producing too many small segment files.

## Solution

When flushing memtables to reduce memory consumption, do NOT flush all of them; flush only part
of them.
For example, suppose there are 50 tablets (with 50 memtables) and the memory limit is 1GB. When each memtable reaches
20MB, the total size reaches 1GB and a flush occurs.

If only 25 of the 50 memtables are flushed, then the next time the total size reaches 1GB there will be 25 memtables of
size 10MB and another 25 memtables of size 30MB. The 30MB memtables can then be flushed, and they are larger
than 20MB.

The main idea is to introduce some jitter during flush so that memtable sizes are slightly uneven, which ensures that a flush is only triggered when the memtable is large enough.

In my test, loading a table with 48 buckets and a 2G memory limit, the average memtable size was 44MB in the previous version
and 82MB after the modification.
2021-11-01 10:51:50 +08:00
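The 50-tablet example from #6947 can be reproduced with a small simulation. This is a sketch under assumptions, not Doris code: `average_flush_size_mb` is a hypothetical function, every memtable grows by the same whole number of megabytes per round, and the partial strategy is assumed to flush the larger half.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Returns the average size (in MB) of the memtables that were flushed.
double average_flush_size_mb(int num_memtables, int64_t total_limit_mb,
                             int64_t memtable_cap_mb, bool flush_all,
                             int rounds, int64_t step_mb = 1) {
    std::vector<int64_t> sizes(num_memtables, 0);
    int64_t flushed_mb = 0;
    int64_t flush_count = 0;

    auto flush = [&](int64_t& size) {
        if (size == 0) return;  // nothing to write
        flushed_mb += size;
        ++flush_count;
        size = 0;
    };

    for (int r = 0; r < rounds; ++r) {
        int64_t total = 0;
        for (auto& s : sizes) {
            s += step_mb;
            if (s >= memtable_cap_mb) flush(s);  // regular per-memtable flush
            total += s;
        }
        if (total < total_limit_mb) continue;  // memory limit not reached

        if (flush_all) {
            for (auto& s : sizes) flush(s);
        } else {
            // Flush only the larger half; the rest keeps growing.
            std::sort(sizes.begin(), sizes.end(), std::greater<int64_t>());
            for (int i = 0; i < num_memtables / 2; ++i) flush(sizes[i]);
        }
    }
    return flush_count == 0 ? 0.0
                            : static_cast<double>(flushed_mb) / flush_count;
}
```

With 50 memtables, a 1000MB limit, and a 100MB cap, flushing everything always writes 20MB memtables, while flushing half writes 20MB, then 30MB, and continues with sizes above 20MB, so the average flushed size is larger.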
e8cabfff27 [S3] Support path style endpoint (#6962)
Add a use_path_style property for S3
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property
Fix some S3 URI bugs
Add some logs for tracing load process.
2021-11-01 10:48:10 +08:00
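For context on what a path style endpoint means, the sketch below builds an object URL in both addressing styles. `s3_object_url` is a hypothetical helper for illustration only; it is not how Doris or the Hadoop S3 connector constructs requests, and it assumes an HTTPS endpoint given as a bare host name.

```cpp
#include <string>

// Path style:           https://<endpoint>/<bucket>/<key>
// Virtual-hosted style: https://<bucket>.<endpoint>/<key>
std::string s3_object_url(const std::string& endpoint, const std::string& bucket,
                          const std::string& key, bool use_path_style) {
    if (use_path_style) {
        return "https://" + endpoint + "/" + bucket + "/" + key;
    }
    return "https://" + bucket + "." + endpoint + "/" + key;
}
```

Path style is typically needed for S3-compatible services that do not resolve the bucket name as a subdomain of the endpoint.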
Pxl
28030294f7 [Feature] Support bitmap_and_not & bitmap_and_not_count (#6910)
Support bitmap_and_not & bitmap_and_not_count.
2021-11-01 10:11:54 +08:00
a842d41b87 [Function] add BE bitmap function bitmap_max (#6942)
Support bitmap_max.
2021-10-30 18:16:38 +08:00
c3b133bdb3 [Refactor] Refactor the reader code (#6866)
1. Removed useless redundant code logic
2. Change reader to interface, add tuple reader to simplify the structure of reader
2021-10-30 18:15:28 +08:00
b0926a317e Modify Chinese comment (#6951)
Modify Chinese comment
2021-10-28 13:56:59 +08:00
4170aabf83 [Optimize] optimize some session variable and profile (#6920)
1. optimize error message when using batch delete
2. rename session variable is_report_success to enable_profile
3. add table name to OlapScanner profile
2021-10-27 18:03:12 +08:00
00fe9deaeb [Benchmark] Add star schema benchmark tools (#6925)
This CL mainly changes:

1. Add star schema benchmark tools in `tools/ssb-tools`, so users can easily load and test with the SSB data set.
2. Disable the segment cache for some read scenarios, such as compaction and alter operations. (Fix #6924)
3. Fix a bug that `max_segment_num_per_rowset` does not work. (Fix #6926)
4. Enable `enable_batch_delete_by_default` by default.
2021-10-27 09:55:36 +08:00
Pxl
d4249e4f2d [Bug] fix Runtime filter can't find fragment-id when apply_filter called early (#6923)
#6921
2021-10-27 09:54:52 +08:00
77a954d02c [Bug] Fix core dump caused by treating tuple_is_null_predicate as a const expr (#6919)
Fix a core dump caused by treating tuple_is_null_predicate as a constant expression.
2021-10-27 09:54:25 +08:00
4f9b46d403 Fix possible core dump when filtering String type columns with zonemap (#6939)
Fix a possible core dump when a String type column uses a zonemap to filter data, caused by not allocating memory before parsing the string type zonemap.
2021-10-27 09:25:38 +08:00
adb6bfdf74 [Bug] Fix bug that truncate table may change the storage medium property (#6905) 2021-10-25 10:07:27 +08:00
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
88760d66d1 [MetaTool]add error message when loading meta by meta tool (#6893)
When loading meta with meta_tool goes wrong, we only get an error code from `json2pb`,
which makes it hard to locate the problem.

This change adds an error message when loading meta fails.

The log output changes as follows.

```
# before
./meta_tool --root_path=/home/disk1/qjl/mydoris/be/storage --operation=load_meta --json_meta_path=/home/disk1/qjl/data/meta-json.json
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1020 11:41:56.564241 74937 data_dir.cpp:837] path: /home/disk1/qjl/mydoris/be/storage total capacity: 7750843404288, available capacity: 7583325925376
I1020 11:41:56.564415 74937 data_dir.cpp:275] path: /home/disk1/qjl/mydoris/be/storage, hash: 7528840506668047470
load meta failed, status:-1410

# after 
./meta_tool --root_path=/home/disk1/qjl/mydoris/be/storage --operation=load_meta --json_meta_path=/home/disk1/qjl/data/meta-json.json
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1020 14:41:40.084342 50727 data_dir.cpp:837] path: /home/disk1/qjl/mydoris/be/storage total capacity: 7750843404288, available capacity: 7584601022464
I1020 14:41:40.084496 50727 data_dir.cpp:275] path: /home/disk1/qjl/mydoris/be/storage, hash: 7528840506668047470
E1020 14:41:40.163007 50727 tablet_meta_manager.cpp:161] JSON to protobuf message failed: Fail to decode base64 string=0
load meta failed, status:-1410
```
2021-10-23 16:51:58 +08:00
51e210869a [ARM64] Fix some problem when compiling on ARM64 platform (#6836) (#6872)
With thirdparties 1.4.0 to 1.4.1

1. Add patch for aws-c-cal-0.4.5
2. Add some solutions for `undefined reference libpsl`
3. Move libgsasl to fix the link problem of libcurl.
4. Downgrade openssl to 1.0.2k to fix problem of low version glibc
2021-10-19 13:26:02 +08:00
63dbcbc4e1 [UT] Fix ut bugs (#6862)
Co-authored-by: morningman <chenmingyu@baidu.com>
2021-10-18 10:12:55 +08:00
da99749e7f [Bug] Fix bug that BE will crash when backup using S3 (#6855) 2021-10-17 22:54:42 +08:00
eff076b355 [BUG] Fix printing ReservationTrackerCounters cause BE crash when mem_limit is reached (#6849)
When the memory usage of BE reaches mem_limit, printing ReservationTrackerCounters through MemTracker
may cause BE crash in high concurrency.

ReservationTrackerCounters is not actually used in the current Doris, and the memory tracker in Doris
will be redesigned in the future.
2021-10-16 21:57:09 +08:00
59017cebe6 [ARM64] Fix some problem when compiling on ARM64 platform (#6836)
1. Refactor the create method of hdfs reader & writer.

    libhdfs3 does not support arm64. So we should not support hdfs reader & writer on arm64.

2. Add macro for LowerUpperImpl
2021-10-16 21:56:49 +08:00
a0b3840daa [MemorySave] Change TabletSchema in tablet to reference to save mem (#6814)
Change TabletSchema in tablet to reference to save memory
2021-10-16 21:54:32 +08:00
24d38614a0 [Dependency] Upgrade thirdparty libs (#6766)
Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, Doris should be compiled with build-env:1.4.0.
2021-10-15 13:03:04 +08:00
adb9b0d9c6 [Bug] Return 0 when hex(0) (#6837) 2021-10-15 10:18:55 +08:00
58440b90f0 [Bug] Left() string function behaves not identically to the mysql implementation (#6811)
See Fix #6810
2021-10-15 10:17:21 +08:00
5ef3f59928 [Optimize][RoutineLoad] Avoid sending tasks if there is no data to be consumed (#6805)
1 Avoid sending tasks if there is no data to be consumed
By fetching latest offset of partition before sending tasks.(Fix [Optimize] Avoid too many abort task in routine load job #6803 )

2 Add a preCheckNeedSchedule phase in update() of routine load.
To avoid taking write lock of job for long time when getting all kafka partitions from kafka server.

3 Upgrade librdkafka's version to 1.7.0 to fix a bug of "Local: Unknown partition"
See offsetsForTimes fails with 'Local: Unknown partition' edenhill/librdkafka#3295

4 Avoid unnecessary storage migration tasks if the BE has no such storage medium.
Fix [Bug] Too many unnecessary storage migration tasks #6804
2021-10-13 11:39:01 +08:00
ad949c2f65 Optimize Hex and add related Doc (#6697)
I tested hex in a loop of 10 million iterations with random numbers.
The old hex took 4.92 s on average and the optimized hex takes 0.46 s, roughly 10x faster.
2021-10-13 11:36:14 +08:00
630e273d94 use segmentV2 as default storage format for old tables using storage format 'DEFAULT' (#6807) 2021-10-13 11:34:40 +08:00
0941322dd6 [Optimize] Optimize HyperLogLog (#6625)
1. Replace std::max with a ternary expression; std::max is much heavier than the ternary operator
2. Replace std::set with arrays; std::set is based on red-black trees, so traversal follows node pointers and cache hit rates are poor
3. Optimize the serialize function: reducing branches speeds up the calculation of num_non_zero_registers, and serialization of _registers is faster after the optimization
4. Testing shows a noticeable performance improvement
2021-10-10 23:04:39 +08:00
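Two of the ideas listed in #6625, the ternary register update and counting non-zero registers with fewer branches, might look roughly like this. The function names are hypothetical and the code is an illustration, not the actual HyperLogLog implementation in Doris; any speedup depends on the compiler and flags.

```cpp
#include <cstddef>
#include <cstdint>

// Keep the larger of the stored rank and the new rank, written as a
// ternary expression instead of std::max.
inline void update_register(uint8_t* registers, std::size_t index, uint8_t rank) {
    registers[index] = registers[index] < rank ? rank : registers[index];
}

// Count non-zero registers by adding the comparison result (0 or 1)
// rather than branching on it.
inline std::size_t count_non_zero_registers(const uint8_t* registers, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        count += (registers[i] != 0);
    }
    return count;
}
```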
8cf7ff78df [Bug] big_int * big_int product overflow (#6788)
Queries with multiple where conditions, such as `where dt in (20210926,20210919) and hour<=13`,
cause an int * int product overflow. As a result, the function extend_scan_key mistakenly calls
`range.convert_to_fixed_value()`. For a large `range[_low_value, _high_value)`,
a huge number of values is then inserted into _fixed_values, eventually resulting in OOM.
2021-10-03 12:17:03 +08:00
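The kind of overflow described in #6788 can be shown in isolation. In this sketch a 32-bit product wraps to a small value and passes a size check that the 64-bit product fails. The function names and the limit check are assumptions for illustration; the actual condition in extend_scan_key may differ.

```cpp
#include <cstdint>

// 32-bit product with wraparound, computed in unsigned arithmetic to avoid
// the undefined behavior of signed overflow.
int32_t wrapped_product(int32_t a, int32_t b) {
    return static_cast<int32_t>(static_cast<uint32_t>(a) * static_cast<uint32_t>(b));
}

// Widen both operands before multiplying so the comparison sees the real product.
bool product_exceeds_limit(int32_t a, int32_t b, int64_t limit) {
    return static_cast<int64_t>(a) * static_cast<int64_t>(b) > limit;
}
```

For example, 65536 * 65536 wraps to 0 in 32 bits, which looks smaller than any positive limit, while the widened product correctly exceeds it.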
7297b275f1 [Optimize] Optimize cpu consumption when importing parquet files (#6782)
Remove part of dynamic_cast, reduce the overhead caused by type conversion,
and probably reduce the cpu consumption of parquet file import by about 10%
2021-10-03 12:14:35 +08:00
ad3c9390a2 [Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769)
1. This bug is introduced from #6582
2. Optimize the error log of the "Address used" error msg.
3. Add some document about compilation.
    1. Add a custom thirdparty download url.
    2. Add a custom com.alibaba maven jar package for DataX.
4. Fix bug that BE crash when closing scan node, introduced from #6622.
2021-09-29 11:11:28 +08:00
982b76c3c0 [Bug] Fix resource tag bug, add documents and some other bug fix (#6708)
1. Fix bug of UNKNOWN Operation Type 91
2. Support using resource_tag property of user to limit the usage of BE
3. Add new FE config `disable_tablet_scheduler` to disable tablet scheduler.
4. Add documents for resource tag.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB
8. Fix `show backends` results error
9. Add new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to odbc and mysql table.
10. Modify issue template to enable blank issue, for release note or other specific usage.
11. Fix a bug in alpha_row_set split_range() function.
2021-09-28 10:37:42 +08:00
42c7d39faa [Revert] "[Enhancement] Modify the method of calculating compaction score (#6252)" (#6748)
This reverts commit dedb57f87e31305db3e2a13e374ba4fd58043fca.
Reverts #6252

This commit may cause tablets whose segments are all empty to never be compacted, resulting in a -235 error.
I will revert this commit, and the problem will be solved in #6671
2021-09-27 10:35:19 +08:00