Commit Graph

414 Commits

Author SHA1 Message Date
370b92a2ea [Bug](compile) fix clang build for be (#12183) 2022-08-30 17:58:54 +08:00
a16cf0e2c8 [feature-wip](scan) add profile for new olap scan node (#12042)
Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner.
Fix some blocking bug of new scan framework.
TODO:

Memtracker
Opentelemetry spen
The new framework is still disabled by default, so it will not effect other feature.
2022-08-30 10:55:48 +08:00
db07e51cd3 [refactor](status) Refactor status handling in agent task (#11940)
Refactor TaggableLogger
Refactor status handling in agent task:
Unify log format in TaskWorkerPool
Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status
Premature return with the opposite condition to reduce indention
2022-08-29 12:06:01 +08:00
a6e2e2f3bc [feature](remote)Add cache files cleaner for remote olap files (#11959) 2022-08-26 23:59:36 +08:00
0b5bb565a7 [feature-wip](parquet-reader) parquet dictionary decoder (#11981)
Parse parquet data with dictionary encoding.

Using the PLAIN_DICTIONARY enum value is deprecated in the Parquet 2.0 specification.
Prefer using RLE_DICTIONARY in a data page and PLAIN in a dictionary page for Parquet 2.0+ files.
refer: https://github.com/apache/parquet-format/blob/master/Encodings.md
2022-08-26 19:24:37 +08:00
54fc038dc5 [Fix](remote) Fix thread safety issue in cache (#11984) 2022-08-24 18:14:14 +08:00
3abc4f357f [Bug](bitmap) intersect_count function use in string cause ASAN error (#11936) 2022-08-24 08:51:53 +08:00
5d627e41a4 [fix](array-type) fix the be core dump when import number larger than uint64 (#11853)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-24 08:51:12 +08:00
982c5f06b5 [fix](build) Resolve the conflicts when building be with java-udf (#11938) 2022-08-20 18:24:32 +08:00
288b440b14 [improvement](vectorized) Improve count distinct performance by using fastunion (#11516)
Improve count distinct performance by using fastunion.
Testing our user real data has a 10-40% performance improvement.
2022-08-16 12:18:46 +08:00
5104982614 [enhancement](tracing) append the profile counter to trace. (#11458)
1. append the profile counter and infos to span attributes.
2. output traceid to audit log.
2022-08-15 21:36:38 +08:00
77e241cbb0 [refactor](date) Use uint32 as predicate type for date type (#11708)
Use uint32 as predicate type for date type
2022-08-15 11:12:33 +08:00
4047c3577d [enhancement](Status) Optimize Status implementation 2022-08-12 11:39:35 +08:00
8f5aed27ec [feature-wip](parquet-reader)read and decode parquet physical type (#11637)
# Proposed changes

Read and decode parquet physical type.
1. The encoding type of boolean is bit-packing, this PR introduces the implementation of bit-packing from Impala
2. Create a parquet including all the primitive types supported by hive

## Remaining Problems
1. At present, only physical types are decoded, and there is no corresponding and conversion methods with doris logical.
2. No parsing and processing Decimal type / Timestamp / Date.
3. Int_8 / Int_16 is stored as Int_32. How to resolve these types.
2022-08-11 10:17:32 +08:00
37d1180cca [feature-wip](parquet-reader)decode parquet data (#11536) 2022-08-08 12:44:06 +08:00
bd4048f8fb [enhancement](compaction) add idle schedule and max_size limit for base compaction (#11542)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-08-07 16:21:57 +08:00
abbf75d302 [doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307) 2022-08-02 11:34:06 +08:00
44a1a20e65 [feature-wip](parquet-reader)parse parquet schema (#11381)
Analyze schema elements in parquet FileMetaData, and generate the hierarchy of nested fields.
For exmpale:
1. primitive type
```
// thrift:
optional int32 <column-name>;
// sql definition:
<column-name> int32;
```
2. nested type
```
// thrift:
optional group <column-name> (LIST) {
  repeated group bag {
    optional group array_element (LIST) {
      repeated group bag {
        optional int32 array_element
      }
    }
  }
}
// sql definition:
<column-name> array<array<int32>>
```
2022-08-02 10:56:13 +08:00
4ccdd65bf6 [Fix](array) fix mysql_row_buffer may use after free when reserve() delete original address in dynamic_mode (#11395)
```
if (!_dynamic_mode) {
	int8store(_len_pos, _pos - _len_pos - 8);
	_len_pos = nullptr;
}
```

_len_pos may be pointed to the pos which already deleted in reserve, int8store will asign value to the freed address,
and lead to use after free when build in ASAN.So I changed _len_pos to the offset of _buf
2022-08-01 22:52:19 +08:00
d3c88471ad [tracing] Support opentelemtry collector. (#10864)
* [tracing] Support opentelemtry collector.

1. support for exporting traces to multiple distributed tracing system via collector;
2. support using collector to process traces.
2022-07-29 16:49:40 +08:00
Pxl
1b4a2c287e [Improvement][chore] replace from_decv2_to_packed128 to decv2.value (#11261) 2022-07-28 10:41:27 +08:00
Pxl
4e6a59df4c [Improvement][chore] add const to all operator== (#11251) 2022-07-27 21:46:47 +08:00
eab8382b4a [feature-wip](unique-key-merge-on-write) add the implementation of primary key index update, DSIP-018 (#11057) 2022-07-27 14:17:56 +08:00
babab5d535 [feature-wip] support datetimev2 (#11085) 2022-07-23 16:07:59 +08:00
0b6d2ae290 [fix] Move s3 fs connect outside the lock critical area (#11026)
* fix potential bug of S3FileSystem

* move s3 fs connect outside the lock critical area
2022-07-23 16:06:29 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
e5663f9872 [Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020)
Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT
2022-07-20 16:37:27 +08:00
d5fa66d9a3 [Enhancement] [Memory] Limit memory usage use process actual physical memory (#10924) 2022-07-19 11:08:39 +08:00
842ff2b1e2 [refactor] Refactor time LUT (#10982) 2022-07-19 08:23:29 +08:00
3bc6655069 [refactor] remove BlockManager (#10913)
* remove BlockManager
* remove deprecated field in tablet meta
2022-07-17 14:10:06 +08:00
dc6fbcce14 [feature-wip] (datev2) modify datev2 format in memory (#10873)
* [feature-wip] (datev2) modify datev2 format in memory

* update
2022-07-15 19:57:38 +08:00
ad4751972c [feature-wip] Support in predicate for datev2 type (#10810) 2022-07-15 14:32:40 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
b04a791895 [Enhancement] support compile with jemalloc (#10542)
A test feature to use jemalloc as default malloc.
2022-07-11 12:15:35 +08:00
d5ea677282 [feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533)
The collection of query traces is implemented in fe and be, and the spans are exported to zipkin.
DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry
2022-07-09 15:50:40 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
86502b014d [feature-wip](unique-key-merge-on-write)port IntervalTree from kudu (#10511)
See the DISP-18:https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
This patch is for step 3.1 in scheduling.
2022-07-05 17:43:01 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
d259770b86 [Fix] avoid core dump cause by malformed bitmap type data (#10458) 2022-06-30 11:27:22 +08:00
8cbdbb5658 [Enhancement] a better vec version for count_zero_num (#10472) 2022-06-29 10:26:42 +08:00
ca94867b4e [Feature-wip] add date v2 type (#9916) 2022-06-26 16:07:56 +08:00
8a49c7ef04 [chore] Rename Doris binary output format 2022-06-24 15:30:05 +08:00
1bd0d7ded5 [typo] Fix typos in comments (#10252) 2022-06-24 08:57:54 +08:00
588634ddf6 [feature] support runtime filter on vectorized engine (#10103) 2022-06-20 09:46:38 +08:00
1d3496c6ab [feature] support backup/restore connect to HDFS (#10081) 2022-06-19 10:26:20 +08:00
44e979e43b [Vectorized][Function] add orthogonal bitmap agg functions (#10126)
* [Vectorized][Function] add orthogonal bitmap agg functions
save some file about orthogonal bitmap function
add some file to rebase
update functions file

* refactor union_count function
refactor orthogonal union count functions

* remove bool is_variadic
2022-06-17 08:48:41 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
Pxl
5d624dfe6c [bugfix]fix segmentation fault at unalign address cast to int128 (#10094) 2022-06-14 15:32:58 +08:00
39a2785ce2 [enhancement] support simd instructions on arm cpus through sse2neon (#10068)
* [enhancement] support simd instructions on arm cpus through sse2neon
2022-06-14 09:17:09 +08:00