aaaaae53b5
[feature] (memory) Switch TLS mem tracker to separate more detailed memory usage ( #8605 )
...
In PR #8476, all memory usage of a process is recorded in the process mem tracker
and all memory usage of a query is recorded in the query mem tracker,
but the cached memory size still has to be tracked by manually calling `transfer to`.
We want to break memory usage down in more detail, based on the TCMalloc new/delete hook plus the TLS mem tracker.
In this PR, the more detailed mem trackers are switched through TLS, so that more detailed
memory usage is counted automatically and more accurately than before.
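A minimal sketch of the idea, not the actual Doris code: a thread-local pointer names the tracker that the current thread charges, a scoped guard switches it, and the allocator hook only ever talks to that pointer. All names here (`MemTracker`, `tls_tracker`, `ScopedSwitchTracker`, `on_alloc`, `on_free`) are hypothetical and simplified for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified tracker: a byte counter that also charges its parent.
class MemTracker {
public:
    explicit MemTracker(MemTracker* parent = nullptr) : parent_(parent) {}

    // Add (or, with a negative value, release) bytes on this tracker and all ancestors.
    void consume(int64_t bytes) {
        for (MemTracker* t = this; t != nullptr; t = t->parent_) {
            t->consumption_.fetch_add(bytes, std::memory_order_relaxed);
        }
    }

    int64_t consumption() const { return consumption_.load(std::memory_order_relaxed); }

private:
    MemTracker* parent_;
    std::atomic<int64_t> consumption_{0};
};

// The tracker that allocations made on the current thread are charged to.
thread_local MemTracker* tls_tracker = nullptr;

// Stand-ins for what a new/delete hook would call (assumption: the hook knows the size).
void on_alloc(int64_t bytes) {
    if (tls_tracker != nullptr) tls_tracker->consume(bytes);
}
void on_free(int64_t bytes) {
    if (tls_tracker != nullptr) tls_tracker->consume(-bytes);
}

// RAII guard: switch the thread's tracker for a scope, restore the previous one on exit.
class ScopedSwitchTracker {
public:
    explicit ScopedSwitchTracker(MemTracker* tracker) : saved_(tls_tracker) {
        tls_tracker = tracker;
    }
    ~ScopedSwitchTracker() { tls_tracker = saved_; }
    ScopedSwitchTracker(const ScopedSwitchTracker&) = delete;
    ScopedSwitchTracker& operator=(const ScopedSwitchTracker&) = delete;

private:
    MemTracker* saved_;
};
```

With a guard like this, code that fills a cache only has to open a scope on the cache tracker; no manual transfer between trackers is needed afterwards.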
2022-03-24 14:29:34 +08:00
5f606c9d57
[fix] Fix coredump of stddev function ( #8543 )
...
This is only a temporary fix; its performance is not ideal. Eventually,
we need to rework the `stddev` functions and remove the `insert_to_null_default()` interface.
2022-03-24 11:39:29 +08:00
0292b9ad9e
[Enhancement] add build parameters ENABLE_JAVAUDF, BUILD_DOCS ( #8612 )
...
* add build parameters ENABLE_JAVAUDF, BUILD_DOCS
2022-03-24 10:53:52 +08:00
2760bcbcc1
[fix] fix core dump on deep_copy_tuple when data is null ( #8620 )
2022-03-24 09:15:38 +08:00
6e1147206e
[doc] fix help module failed ( #8617 )
...
Introduced by #8509.
The docs title was duplicated.
2022-03-24 09:15:06 +08:00
286ee8e1d4
[doc] fix typo for session ( #8610 )
2022-03-24 09:14:44 +08:00
a58e56f0b4
[fix](load) fix another bug that BE may crash when calling mark_as_failed ( #8607 )
...
Same as #8501
2022-03-24 09:13:54 +08:00
7fc22c2456
[fix][vectorized] fix core on get_predicate_column_ptr && fix double copy on _read_columns_by_rowids ( #8581 )
2022-03-24 09:12:42 +08:00
bea9a7ba4f
[feature] Support pre-aggregation for quantile type ( #8234 )
...
Add a new column-type to speed up the approximation of quantiles.
1. The new column-type is named `quantile_state`, with the fixed aggregation function `quantile_union`; it stores the intermediate results of pre-aggregated approximate quantile calculations.
2. Support pre-aggregation of the new column-type and the `quantile_state`-related functions.
2022-03-24 09:11:34 +08:00
36c85d2f06
[fix][vectorized] Fix bug of left semi/anti with other join conjunct ( #8596 )
2022-03-23 10:34:47 +08:00
72dfdb9a6c
[fix] Fix Check_time return wrong value when exec show table status ( #8578 )
2022-03-23 10:34:23 +08:00
92feb9c6c8
[fix] Fix error crc32 method to cal uint128 and int128 ( #8577 )
2022-03-23 10:33:32 +08:00
b89e4c7bba
[feature-wip](java-udf) support java UDF with fixed-length input and output ( #8516 )
...
This feature is proposed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF).
This PR supports Java UDFs with fixed-length input and output. Phase I of DSIP-1 is done after this PR.
To support Java UDFs efficiently, I use no data copy in the JNI call, and all compute operations are off-heap in Java.
To achieve that, I use a UdfExecutor instead.
For users, a UDF class must have a public evaluate method.
2022-03-23 10:32:50 +08:00
9f0b93e3c6
[feature-wip](array-type) Fix conflict while merge array-type branch ( #8594 )
2022-03-22 16:35:30 +08:00
b522de884c
[feature-wip](array-type) Fix compilation error. ( #8556 ) ( #8591 )
2022-03-22 15:52:34 +08:00
2580da4f72
[feature-wip](array-type) Support insertion for vectorized engine. ( #8494 ) ( #8590 )
...
Please refer to #8493
2022-03-22 15:48:13 +08:00
71ce3c4a6e
[feature-wip](array-type) Add codes and UT for array_contains and array_position functions ( #8401 ) ( #8589 )
...
array_contains usage example:
1. create table with ARRAY column, and insert some data:
```
> select * from array_test;
+------+------+--------+
| k1 | k2 | k3 |
+------+------+--------+
| 1 | 2 | [1, 2] |
| 2 | 3 | NULL |
| 4 | NULL | [] |
| 3 | NULL | NULL |
+------+------+--------+
```
2. enable vectorized:
```
> set enable_vectorized_engine=true;
```
3. select with array_contains:
```
> select k1,array_contains(k3,1) from array_test;
+------+-------------------------+
| k1 | array_contains(`k3`, 1) |
+------+-------------------------+
| 3 | NULL |
| 1 | 1 |
| 2 | NULL |
| 4 | 0 |
+------+-------------------------+
```
4. array_contains can also be used in a where condition:
```
> select * from array_test where array_contains(k3,1);
+------+------+--------+
| k1 | k2 | k3 |
+------+------+--------+
| 1 | 2 | [1, 2] |
+------+------+--------+
```
5. array_position usage example
```
> select k1,k3,array_position(k3,2) from array_test;
+------+--------+-------------------------+
| k1 | k3 | array_position(`k3`, 2) |
+------+--------+-------------------------+
| 3 | NULL | NULL |
| 1 | [1, 2] | 2 |
| 2 | NULL | NULL |
| 4 | [] | 0 |
+------+--------+-------------------------+
```
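The results above imply the following semantics: a NULL array gives NULL, `array_contains` gives 1 or 0, and `array_position` gives a 1-based index, or 0 when the value is absent. A minimal sketch of just those rules, assuming integer arrays and modelling SQL NULL with `std::nullopt`; this is an illustration, not the vectorized implementation, and NULL elements or a NULL search value are not covered:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// std::nullopt stands for a SQL NULL array; an empty vector stands for [].
using IntArray = std::optional<std::vector<int>>;

// NULL array -> NULL; otherwise 1 if target is present, 0 if not.
std::optional<int> array_contains(const IntArray& arr, int target) {
    if (!arr.has_value()) return std::nullopt;
    for (int v : *arr) {
        if (v == target) return 1;
    }
    return 0;
}

// NULL array -> NULL; otherwise the 1-based position of the first match, 0 if absent.
std::optional<int> array_position(const IntArray& arr, int target) {
    if (!arr.has_value()) return std::nullopt;
    for (std::size_t i = 0; i < arr->size(); ++i) {
        if ((*arr)[i] == target) return static_cast<int>(i + 1);
    }
    return 0;
}
```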
2022-03-22 15:42:40 +08:00
a9f51b5b65
[feature-wip](array-type) Fix compilation error. ( #8422 ) ( #8587 )
2022-03-22 15:31:16 +08:00
b638c07533
[feature-wip](array-type) Support nested array insertion. ( #8305 ) ( #8586 )
...
Please refer to #8304 .
2022-03-22 15:28:26 +08:00
e44038caf3
[feature-wip](array-type) Array data can be loaded in stream load. ( #8368 ) ( #8585 )
...
Please refer to #8367 .
2022-03-22 15:25:40 +08:00
a498463ab5
[feature-wip](array-type)support select ARRAY data type on vectorized engine ( #8217 ) ( #8584 )
...
Usage Example:
1. create table for test;
```
CREATE TABLE `array_test` (
`k1` tinyint(4) NOT NULL COMMENT "",
`k2` smallint(6) NULL COMMENT "",
`k3` ARRAY<int(11)> NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 5
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```
2. insert some data
```
insert into array_test values(1, 2, [1, 2]);
insert into array_test values(2, 3, null);
insert into array_test values(3, null, null);
insert into array_test values(4, null, []);
```
3. enable the vectorized engine
`set enable_vectorized_engine=true;`
4. query array data
`select * from array_test;`
+------+------+--------+
| k1 | k2 | k3 |
+------+------+--------+
| 4 | NULL | [] |
| 2 | 3 | NULL |
| 1 | 2 | [1, 2] |
| 3 | NULL | NULL |
+------+------+--------+
4 rows in set (0.061 sec)
Code changes include:
1. add column_array and data_type_array code;
2. move the code that creates a data_type from Field, TabletColumn, TypeDescriptor and PColumnMeta to DataTypeFactory;
3. support creating a data_type for the ARRAY data type;
4. RowBlockV2::convert_to_vec_block supports the ARRAY data type;
5. VMysqlResultWriter::append_block supports the ARRAY data type;
6. vectorized::Block serialization and deserialization support the ARRAY data type;
2022-03-22 15:21:44 +08:00
38ec3cbbdf
[feature-wip](array-type) Support ArrayLiteral in SQL. ( #8089 ) ( #8582 )
...
Please refer to #8074
2022-03-22 15:07:06 +08:00
cf0a9fd177
[feature-wip](array-type) Create table with nested array type. ( #8003 ) ( #8575 )
...
```
create table array_type_table(k1 INT, k2 Array<Array<int>>) duplicate key (k1)
distributed by hash(k1) buckets 1 properties('replication_num' = '1');
```
2022-03-22 15:03:32 +08:00
106d7c2e41
[fix] Wrong conf used for Filesystem in S3Storage ( #8568 )
...
The wrong conf was used to disable the cache for the Filesystem in S3Storage.
This leads to wrong behavior when it is used to list objects in the object store.
2022-03-22 11:42:38 +08:00
54aaa8a56a
[doc] update star-schema-benchmark.md ( #8565 )
2022-03-22 11:42:10 +08:00
4335c07c35
[doc] update star-schema-benchmark.md ( #8564 )
2022-03-22 11:41:45 +08:00
9a0a1c693e
[fix] fix NPE in thrift when forwarding stmt to master FE
2022-03-22 11:41:13 +08:00
be3d203289
[feature][vectorized] support table function explode_numbers() ( #8509 )
2022-03-22 11:38:00 +08:00
989e03ddf9
[improvement] Improve sig handler ( #8545 )
...
* Refactor glog's default signal handler
Co-authored-by: Zhengguo Yang <780531911@qq.com >
2022-03-22 10:40:31 +08:00
011985e7e3
fix en broker load ( #8566 )
...
fix en broker load
2022-03-21 22:53:51 +08:00
905b9a6289
[fix](lru_cache) fix heap-use-after-free problem for lru cache( #8569 )
2022-03-21 21:23:43 +08:00
04004021b5
[chore] Separate debugging information from BE binaries ( #8544 )
...
Currently, the compiled output of BE mainly consists of two binaries:
palo_be and meta_tool, which are both around 1.6G in size.
However, the debug information is only needed for debugging purposes.
So I separate the debug info from binaries.
After BE is built, the debug info file will be saved in `be/lib/debug_info/` dir.
The sizes of `palo_be` and `meta_tool` decrease to about 100MB.
This is optional and disabled by default.
To enable it, use:
`STRIP_DEBUG_INFO=ON sh build.sh`
2022-03-21 16:33:01 +08:00
7c1c2b1d17
[chore] fix compile error when use clang as compiler and a be ut problem ( #8554 )
2022-03-21 15:38:59 +08:00
337d174c14
[Refactor](schema_change) Remove tablet instances since tablet id is unique between base tablet and new schema change tablet ( #8486 )
2022-03-21 12:43:54 +08:00
f06780249a
fix some fe ut failed ( #8547 )
2022-03-21 10:36:06 +08:00
c772020db4
[fix] fix bug in WindowFunctionLastData::data, it keeps the first data not the last. ( #8536 )
...
WindowFunctionLastData::add should keep the last value,
but the current implementation keeps the first one.
Apparently, this code was copied from WindowFunctionFirstData::add.
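The difference between the two behaviors is a single guard. A minimal sketch, assuming integer values and hypothetical state types (`FirstValueState`, `LastValueState`) rather than the real WindowFunctionFirstData / WindowFunctionLastData classes:

```cpp
#include <cassert>
#include <optional>

// Keeps the first value it sees: later calls to add() are ignored.
struct FirstValueState {
    std::optional<int> value;
    void add(int v) {
        if (!value.has_value()) value = v;
    }
};

// Keeps the last value it sees: every call to add() overwrites the stored value.
// Copying the has_value() guard from FirstValueState here would reproduce the bug
// described above.
struct LastValueState {
    std::optional<int> value;
    void add(int v) { value = v; }
};
```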
2022-03-21 09:51:56 +08:00
dde50fb2bf
[doc] change http to https in download page ( #8546 )
2022-03-20 23:36:17 +08:00
fc3ad371c8
[fix](vec) fix regexp_replace get wrong result on clang ( #8505 )
2022-03-20 23:11:24 +08:00
eeae516e37
[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker ( #8476 )
...
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
Implement a new way of collecting memory statistics based on the TCMalloc new/delete hook,
MemTracker and TLS. It is expected that every new/delete/malloc/free
in the BE process can be counted.
2022-03-20 23:06:54 +08:00
276792daeb
[feature](benchmark) Add TPC-H benchmark tools ( #8408 )
2022-03-20 23:06:10 +08:00
2ec0b81030
[improvement](storage) Low cardinality string optimization in storage layer ( #8318 )
...
Low cardinality string optimization in storage layer
2022-03-20 23:04:25 +08:00
ed47e20eea
[license] Update license for thirdparties ( #8537 )
2022-03-19 16:24:27 +08:00
f91d78bf8d
[doc] fix backup doc ( #8529 )
2022-03-19 15:45:45 +08:00
12bd967846
[doc] Fix some typo about spark load and broker load ( #8520 )
...
1. add hive-bitmap-udf link
2. modify preceding-filter
2022-03-19 15:45:17 +08:00
ef852d6a26
[release] Add download link for flink/spark connector ( #8535 )
...
Add Releases:
1. Flink Connector 1.0.3
2. Spark Connector 1.0.1
2022-03-19 15:44:35 +08:00
58a4c70fd4
[fix] fix String type compaction or agg may crash when string is null ( #8515 )
2022-03-18 11:27:28 +08:00
4da1718147
[fix] memory leak in ResourceTls ( #8517 )
2022-03-18 09:42:19 +08:00
8765759a18
[doc] add flink 1.14 support ( #8511 )
...
flink 1.14 support
2022-03-18 09:41:28 +08:00
94991864f5
[fix] Fix bug that __set_ missing for thrift optional fields in be ( #8507 )
2022-03-18 09:41:06 +08:00
035ca5240f
[fix] Fix possible coredump when checking whether all rowsets of a tablet are beta-rowsets ( #8503 )
...
The core dump looks like:
```
*** Aborted at 1647468467 (unix time) try "date -d @1647468467" if you are using GNU date ***
PC: @ 0x5555576940b0 doris::OlapScanNode::start_scan_thread()
*** SIGSEGV (@0x84) received by PID 39139 (TID 0x7ffee8388700) from PID 132; stack trace: ***
@ 0x555558926212 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7ffff753d400 (unknown)
@ 0x5555576940b0 doris::OlapScanNode::start_scan_thread()
@ 0x555557696e1b doris::OlapScanNode::start_scan()
@ 0x55555769737d doris::OlapScanNode::get_next()
@ 0x5555570784f5 doris::PlanFragmentExecutor::get_next_internal()
@ 0x55555707d24c doris::PlanFragmentExecutor::open_internal()
@ 0x55555707e72f doris::PlanFragmentExecutor::open()
@ 0x555556ffab95 doris::FragmentExecState::execute()
@ 0x555556fff0ed doris::FragmentMgr::_exec_actual()
@ 0x5555570088ec std::_Function_handler<>::_M_invoke()
@ 0x55555719a099 doris::ThreadPool::dispatch_thread()
@ 0x555557193a8f doris::Thread::supervise_thread()
@ 0x7ffff72f2ea5 start_thread
@ 0x7ffff76058dd __clone
@ 0x0 (unknown)
```
2022-03-18 09:39:13 +08:00