Commit Graph

4142 Commits

Author SHA1 Message Date
aaaaae53b5 [feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (#8605)
In PR #8476, all memory usage of a process is recorded in the process mem tracker
and all memory usage of a query is recorded in the query mem tracker,
but it is still necessary to manually call `transfer to` to track the cached memory size.

We want to break memory usage down in more detail, based on the TCMalloc new/delete hook plus the TLS mem tracker.

In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately
counts more detailed memory usage than before.
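
A minimal sketch of the TLS idea described above; it is not the Doris implementation. The names `MemTracker`, `tls_tracker`, `ScopedTrackerSwitch`, `on_alloc` and `on_free` are hypothetical, and `on_alloc`/`on_free` only stand in for what a TCMalloc new/delete hook would call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified tracker; the real MemTracker is richer.
struct MemTracker {
    MemTracker* parent = nullptr;  // e.g. cache tracker -> query tracker -> process tracker
    int64_t consumption = 0;

    // Apply a (possibly negative) delta to this tracker and all its ancestors.
    void consume(int64_t bytes) {
        for (MemTracker* t = this; t != nullptr; t = t->parent) {
            t->consumption += bytes;
        }
    }
};

// Each thread remembers which tracker is currently active.
thread_local MemTracker* tls_tracker = nullptr;

// RAII guard: switch the thread's active tracker for the lifetime of one scope.
class ScopedTrackerSwitch {
public:
    explicit ScopedTrackerSwitch(MemTracker* t) : _old(tls_tracker) { tls_tracker = t; }
    ~ScopedTrackerSwitch() { tls_tracker = _old; }
    ScopedTrackerSwitch(const ScopedTrackerSwitch&) = delete;
    ScopedTrackerSwitch& operator=(const ScopedTrackerSwitch&) = delete;

private:
    MemTracker* _old;
};

// Stand-ins for the allocation hooks (assumption: a hook reports the byte count).
inline void on_alloc(int64_t bytes) {
    if (tls_tracker != nullptr) tls_tracker->consume(bytes);
}
inline void on_free(int64_t bytes) {
    if (tls_tracker != nullptr) tls_tracker->consume(-bytes);
}
```

With this shape, code that enters a more specific scope (for example a cache) only has to create a `ScopedTrackerSwitch`; allocations inside the scope are attributed to that tracker and rolled up to its parents without a manual transfer call.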
2022-03-24 14:29:34 +08:00
5f606c9d57 [fix] Fix coredump of stddev function (#8543)
This is only a temporary fix; its performance is not ideal. Eventually,
we need to rework the `stddev` functions and remove the `insert_to_null_default()` interface.
2022-03-24 11:39:29 +08:00
Pxl
0292b9ad9e [Enhancement] add build parameters ENABLE_JAVAUDF, BUILD_DOCS (#8612)
* add build parameters ENABLE_JAVAUDF, BUILD_DOCS
2022-03-24 10:53:52 +08:00
Pxl
2760bcbcc1 [fix] fix core dump on deep_copy_tuple when data is null (#8620) 2022-03-24 09:15:38 +08:00
6e1147206e [doc] fix help module failed (#8617)
Introduced by #8509.
The docs title is duplicated.
2022-03-24 09:15:06 +08:00
286ee8e1d4 [doc] fix typo for session (#8610) 2022-03-24 09:14:44 +08:00
a58e56f0b4 [fix](load) fix another bug that BE may crash when calling mark_as_failed (#8607)
Same as #8501
2022-03-24 09:13:54 +08:00
Pxl
7fc22c2456 [fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids (#8581) 2022-03-24 09:12:42 +08:00
bea9a7ba4f [feature] Support pre-aggregation for quantile type (#8234)
Add a new column-type to speed up the approximation of quantiles.
1. The new column-type is named `quantile_state` with fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximation calculations for quantiles.
2. Support pre-aggregation of the new column-type and `quantile_state`-related functions.
2022-03-24 09:11:34 +08:00
36c85d2f06 [fix][vectorized] Fix bug of left semi/anti with other join conjunct (#8596) 2022-03-23 10:34:47 +08:00
72dfdb9a6c [fix] Fix Check_time returning a wrong value when executing show table status (#8578) 2022-03-23 10:34:23 +08:00
92feb9c6c8 [fix] Fix incorrect crc32 method for calculating uint128 and int128 (#8577) 2022-03-23 10:33:32 +08:00
b89e4c7bba [feature-wip](java-udf) support java UDF with fixed-length input and output (#8516)
This feature is proposed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF).
This PR supports Java UDFs with fixed-length input and output. Phase I of DSIP-1 is done after this PR.

To support Java UDFs efficiently, I avoid data copies in JNI calls, and all compute operations are off-heap in Java.
To achieve that, I use a UdfExecutor instead.

For users, a UDF class must have a public evaluate method.
2022-03-23 10:32:50 +08:00
9f0b93e3c6 [feature-wip](array-type) Fix conflict while merging array-type branch (#8594) 2022-03-22 16:35:30 +08:00
b522de884c [feature-wip](array-type) Fix compilation error. (#8556) (#8591) 2022-03-22 15:52:34 +08:00
2580da4f72 [feature-wip](array-type) Support insertion for vectorized engine. (#8494) (#8590)
Please refer to #8493
2022-03-22 15:48:13 +08:00
71ce3c4a6e [feature-wip](array-type) Add codes and UT for array_contains and array_position functions (#8401) (#8589)
Usage example for the array_contains function:
1. create table with ARRAY column, and insert some data:
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
> set enable_vectorized_engine=true;
```
3. select with array_contains:
```
> select k1,array_contains(k3,1) from array_test;
+------+-------------------------+
| k1   | array_contains(`k3`, 1) |
+------+-------------------------+
|    3 |                    NULL |
|    1 |                       1 |
|    2 |                    NULL |
|    4 |                       0 |
+------+-------------------------+
```
4. array_contains can also be used in a where condition:
```
> select * from array_test where array_contains(k3,1);
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
+------+------+--------+
```
5. array_position usage example:
```
> select k1,k3,array_position(k3,2) from array_test;
+------+--------+-------------------------+
| k1   | k3     | array_position(`k3`, 2) |
+------+--------+-------------------------+
|    3 | NULL   |                    NULL |
|    1 | [1, 2] |                       2 |
|    2 | NULL   |                    NULL |
|    4 | []     |                       0 |
+------+--------+-------------------------+
```
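
The NULL and empty-array behavior visible in the result sets above can be sketched as follows. This is an illustration inferred from those outputs only, not the vectorized implementation: `IntArray` and both function bodies are hypothetical, `std::nullopt` models SQL NULL, and NULL elements inside an array are not modeled. It requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// std::nullopt models a NULL array; an empty vector models [].
using IntArray = std::optional<std::vector<int32_t>>;

// NULL array -> NULL; otherwise true/false (shown as 1/0 in the result sets).
std::optional<bool> array_contains(const IntArray& arr, int32_t target) {
    if (!arr.has_value()) return std::nullopt;
    for (int32_t v : *arr) {
        if (v == target) return true;
    }
    return false;
}

// NULL array -> NULL; otherwise the 1-based position, or 0 when not found.
std::optional<int64_t> array_position(const IntArray& arr, int32_t target) {
    if (!arr.has_value()) return std::nullopt;
    for (std::size_t i = 0; i < arr->size(); ++i) {
        if ((*arr)[i] == target) return static_cast<int64_t>(i) + 1;
    }
    return int64_t{0};
}
```

For the sample rows, `[1, 2]` contains 1 and has 2 at position 2, `[]` yields 0 for both functions, and a NULL array yields NULL for both.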
2022-03-22 15:42:40 +08:00
a9f51b5b65 [feature-wip](array-type) Fix compilation error. (#8422) (#8587) 2022-03-22 15:31:16 +08:00
b638c07533 [feature-wip](array-type) Support nested array insertion. (#8305) (#8586)
Please refer to #8304 .
2022-03-22 15:28:26 +08:00
e44038caf3 [feature-wip](array-type) Array data can be loaded in stream load. (#8368) (#8585)
Please refer to #8367 .
2022-03-22 15:25:40 +08:00
a498463ab5 [feature-wip](array-type) support select ARRAY data type on vectorized engine (#8217) (#8584)
Usage Example:
1. create table for test;
```
CREATE TABLE `array_test` (
  `k1` tinyint(4) NOT NULL COMMENT "",
  `k2` smallint(6) NULL COMMENT "",
  `k3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```

2. insert some data
```
insert into array_test values(1, 2, [1, 2]);
insert into array_test values(2, 3, null);
insert into array_test values(3, null, null);
insert into array_test values(4, null, []);
```

3. enable the vectorized engine
`set enable_vectorized_engine=true;`

4. query array data
`select * from array_test;`
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    4 | NULL | []     |
|    2 |    3 | NULL   |
|    1 |    2 | [1, 2] |
|    3 | NULL | NULL   |
+------+------+--------+
4 rows in set (0.061 sec)

Code changes include:
1. add column_array and data_type_array code;
2. move the code that creates a data_type from Field, TabletColumn, TypeDescriptor, and PColumnMeta into DataTypeFactory;
3. support creating a data_type for the ARRAY data type;
4. RowBlockV2::convert_to_vec_block supports the ARRAY data type;
5. VMysqlResultWriter::append_block supports the ARRAY data type;
6. vectorized::Block serialization and deserialization support the ARRAY data type;
2022-03-22 15:21:44 +08:00
38ec3cbbdf [feature-wip](array-type) Support ArrayLiteral in SQL. (#8089) (#8582)
Please refer to #8074
2022-03-22 15:07:06 +08:00
cf0a9fd177 [feature-wip](array-type) Create table with nested array type. (#8003) (#8575)
```
create table array_type_table(k1 INT, k2 Array<Array<int>>) duplicate key (k1)
distributed by hash(k1) buckets 1 properties('replication_num' = '1');
```
2022-03-22 15:03:32 +08:00
106d7c2e41 [fix] Wrong conf is used for Filesystem in S3Storage (#8568)
The wrong conf was used for the Filesystem in S3Storage to disable the cache.
This leads to wrong behavior when the Filesystem is used to list objects in the object store.
2022-03-22 11:42:38 +08:00
54aaa8a56a [doc] update star-schema-benchmark.md (#8565) 2022-03-22 11:42:10 +08:00
4335c07c35 [doc] update star-schema-benchmark.md (#8564) 2022-03-22 11:41:45 +08:00
9a0a1c693e [fix] fix NPE in thrift when forwarding stmt to master FE 2022-03-22 11:41:13 +08:00
Pxl
be3d203289 [feature][vectorized] support table function explode_numbers() (#8509) 2022-03-22 11:38:00 +08:00
989e03ddf9 [improvement] Improve sig handler (#8545)
* Refactor glog's default signal handler

Co-authored-by: Zhengguo Yang <780531911@qq.com>
2022-03-22 10:40:31 +08:00
011985e7e3 fix en broker load (#8566)
fix en broker load
2022-03-21 22:53:51 +08:00
905b9a6289 [fix](lru_cache) fix heap-use-after-free problem for lru cache(#8569) 2022-03-21 21:23:43 +08:00
04004021b5 [chore] Separate debugging information from BE binaries (#8544)
Currently, the compiled output of BE mainly consists of two binaries:
palo_be and meta_tool, which are both around 1.6G in size.
However, the debug information is only needed for debugging purposes.

So I separate the debug info from the binaries.
After BE is built, the debug info file will be saved in the `be/lib/debug_info/` dir.
The sizes of `palo_be` and `meta_tool` then drop to about 100MB.

This is optional and disabled by default.
To enable it, use:

`STRIP_DEBUG_INFO=ON sh build.sh`
2022-03-21 16:33:01 +08:00
7c1c2b1d17 [chore] fix compile error when using clang as compiler and a be ut problem (#8554) 2022-03-21 15:38:59 +08:00
337d174c14 [Refactor](schema_change) Remove tablet instances since tablet id is unique between base tablet and new schema change tablet (#8486) 2022-03-21 12:43:54 +08:00
f06780249a fix some fe ut failed (#8547) 2022-03-21 10:36:06 +08:00
c772020db4 [fix] fix bug in WindowFunctionLastData::data, it keeps the first data not the last. (#8536)
WindowFunctionLastData::add should keep the last value,
but current implementation keeps the first one.
Obviously, this code is copied from WindowFunctionFirstData::add.
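
The difference between the two behaviors can be sketched as follows. These are hypothetical, simplified stand-ins (`FirstValueState`, `LastValueState`), not the actual `WindowFunctionFirstData`/`WindowFunctionLastData` code, and the sketch assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>

// Keeps the first value added: later rows are ignored once a value is stored.
template <typename T>
struct FirstValueState {
    std::optional<T> value;
    void add(const T& v) {
        if (!value.has_value()) value = v;
    }
};

// Keeps the last value added: every row overwrites the stored value.
// Copying the "first" logic here would reproduce the bug described above.
template <typename T>
struct LastValueState {
    std::optional<T> value;
    void add(const T& v) {
        value = v;
    }
};
```

Feeding the rows 3, 7, 5 into both states leaves 3 in the first-value state and 5 in the last-value state.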
2022-03-21 09:51:56 +08:00
dde50fb2bf [doc] change http to https in download page (#8546) 2022-03-20 23:36:17 +08:00
Pxl
fc3ad371c8 [fix](vec) fix regexp_replace get wrong result on clang (#8505) 2022-03-20 23:11:24 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
276792daeb [feature](benchmark) Add TPC-H benchmark tools (#8408) 2022-03-20 23:06:10 +08:00
2ec0b81030 [improvement](storage) Low cardinality string optimization in storage layer (#8318)
Low cardinality string optimization in storage layer
2022-03-20 23:04:25 +08:00
ed47e20eea [license] Update license for thirdparties (#8537) 2022-03-19 16:24:27 +08:00
f91d78bf8d [doc] fix backup doc (#8529) 2022-03-19 15:45:45 +08:00
12bd967846 [doc] Fix some typo about spark load and broker load (#8520)
1. add hive-bitmap-udf link
2. modify preceding-filter
2022-03-19 15:45:17 +08:00
ef852d6a26 [release] Add download link for flink/spark connector (#8535)
Add Releases:
1. Flink Connector 1.0.3
2. Spark Connector 1.0.1
2022-03-19 15:44:35 +08:00
58a4c70fd4 [fix] fix String type compaction or agg may crash when string is null (#8515) 2022-03-18 11:27:28 +08:00
4da1718147 [fix] memory leak in ResourceTls (#8517) 2022-03-18 09:42:19 +08:00
8765759a18 [doc] add flink 1.14 support (#8511)
flink 1.14 support
2022-03-18 09:41:28 +08:00
94991864f5 [fix] Fix bug that __set_ missing for thrift optional fields in be (#8507) 2022-03-18 09:41:06 +08:00
035ca5240f [fix] Fix possible coredump when checking whether all rowsets of a tablet are beta-rowsets (#8503)
The core dump looks like:
```
*** Aborted at 1647468467 (unix time) try "date -d @1647468467" if you are using GNU date ***
PC: @     0x5555576940b0 doris::OlapScanNode::start_scan_thread()
*** SIGSEGV (@0x84) received by PID 39139 (TID 0x7ffee8388700) from PID 132; stack trace: ***
    @     0x555558926212 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7ffff753d400 (unknown)
    @     0x5555576940b0 doris::OlapScanNode::start_scan_thread()
    @     0x555557696e1b doris::OlapScanNode::start_scan()
    @     0x55555769737d doris::OlapScanNode::get_next()
    @     0x5555570784f5 doris::PlanFragmentExecutor::get_next_internal()
    @     0x55555707d24c doris::PlanFragmentExecutor::open_internal()
    @     0x55555707e72f doris::PlanFragmentExecutor::open()
    @     0x555556ffab95 doris::FragmentExecState::execute()
    @     0x555556fff0ed doris::FragmentMgr::_exec_actual()
    @     0x5555570088ec std::_Function_handler<>::_M_invoke()
    @     0x55555719a099 doris::ThreadPool::dispatch_thread()
    @     0x555557193a8f doris::Thread::supervise_thread()
    @     0x7ffff72f2ea5 start_thread
    @     0x7ffff76058dd __clone
    @                0x0 (unknown)
```
2022-03-18 09:39:13 +08:00