067309c466
[fix](compile) fix compilation bug ( #8950 )
2022-04-11 13:12:34 +08:00
8a066e2586
[fix](vectorized) core dump on ST_AsText ( #8870 )
2022-04-11 09:39:32 +08:00
8158b05ea0
[fix] Fix bug where tablet data size and row num info fail to be reported. ( #8945 )
...
Introduced from #8146
2022-04-11 09:38:28 +08:00
7f7172807f
[feature](function)(vectorized) Support all geolocation functions on vectorized engine ( #8846 )
2022-04-11 09:36:53 +08:00
0d761f9909
[feature-wip][UDF][DIP-1] Support variable-size input and output for Java UDF ( #8678 )
...
This feature is proposed in DSIP-1. This PR supports variable-length input and output for Java UDFs.
2022-04-11 09:36:16 +08:00
6ed59bb98b
[refactor](code_style) remove useless inline #8933
...
1. Member functions defined in a class are implicitly inline by default, so the keyword does not need to be added.
2. inline is a keyword used for the implementation; it has no effect when placed before the function declaration.
2022-04-10 18:29:55 +08:00
1fe4ea4c7c
[Refactor-step1] Add OLAPInternalError to status ( #8900 )
2022-04-10 00:16:43 +08:00
ce6b5169c2
[fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition ( #8891 )
2022-04-09 19:11:44 +08:00
c5718928df
[feature-wip](array-type) support explode and explode_outer table function ( #8766 )
...
explode(ArrayColumn) desc:
> Create a row for each element in the array column.
explode_outer(ArrayColumn) desc:
> Create a row for each element in the array column. Unlike explode, if the array is null or empty, it returns null.
Usage example:
1. Create a table with an array column and insert some data;
2. Enable enable_lateral_view and enable_vectorized_engine:
```
set enable_lateral_view = true;
set enable_vectorized_engine=true;
```
3. Use explode_outer:
```
> select * from array_test;
+------+------+--------+
| k1 | k2 | k3 |
+------+------+--------+
| 3 | NULL | NULL |
| 1 | 2 | [1, 2] |
| 2 | 3 | NULL |
| 4 | NULL | [] |
+------+------+--------+
> select k1,explode_column from array_test LATERAL VIEW explode_outer(k3) TempExplodeView as explode_column;
+------+----------------+
| k1 | explode_column |
+------+----------------+
| 1 | 1 |
| 1 | 2 |
| 2 | NULL |
| 4 | NULL |
| 3 | NULL |
+------+----------------+
```
4. explode usage example. explode returns no rows when the ARRAY is NULL or empty:
```
> select k1,explode_column from array_test LATERAL VIEW explode(k3) TempExplodeView as explode_column;
+------+----------------+
| k1 | explode_column |
+------+----------------+
| 1 | 1 |
| 1 | 2 |
+------+----------------+
```
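The difference between the two table functions can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not the BE implementation; the helper names `explode` and `explode_outer` below are plain Python functions written for illustration, and row order is not meant to match the SQL output.

```python
def explode(rows, array_key):
    """Yield (row, element) pairs; a NULL (None) or empty array yields nothing."""
    for row in rows:
        for element in row[array_key] or []:
            yield row, element


def explode_outer(rows, array_key):
    """Like explode, but a NULL or empty array yields one pair with a NULL element."""
    for row in rows:
        elements = row[array_key]
        if not elements:
            yield row, None
        else:
            for element in elements:
                yield row, element


# Same data as the array_test table in the example above (k2 omitted).
array_test = [
    {"k1": 3, "k3": None},
    {"k1": 1, "k3": [1, 2]},
    {"k1": 2, "k3": None},
    {"k1": 4, "k3": []},
]

inner = [(row["k1"], e) for row, e in explode(array_test, "k3")]
outer = [(row["k1"], e) for row, e in explode_outer(array_test, "k3")]
```

Here `inner` keeps only the two rows produced by k1 = 1, while `outer` additionally keeps one NULL row for each of k1 = 2, 3 and 4.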
2022-04-08 12:11:04 +08:00
bd0a3369b7
[fix] check disk capacity before writing data ( #8887 )
...
1. We forgot to check disk capacity when writing data.
2. TODO: the user-specified disk capacity is not used yet. We need to find a way to use it.
3. Avoid printing too many compaction logs when there is no suitable version for compaction.
2022-04-08 11:29:49 +08:00
f854f0e83e
remove unreadable char in comment ( #8909 )
2022-04-08 09:26:53 +08:00
dbbc6549bd
[feature](vectorized) support vexplode_bitmap ( #8890 )
2022-04-08 09:20:26 +08:00
3f04220d49
[typo] Fix typo in function.cpp ( #8873 )
2022-04-08 09:09:19 +08:00
0b98d78664
[improvement](hll) Optimize Hyperloglog ( #8829 )
...
At Meituan, PR #6625 was reverted due to an OOM problem.
We are now modifying the old HyperLogLog instead, building on PR #8555.
In our tests, it performs better than both the old HLL and the HLL in apache:master.
Changes summary:
- use SIMD max to speed up the heavy function _merge_registers
- use phmap::flat_hash_set rather than std::set
- replace std::max
- other small changes
2022-04-08 09:06:08 +08:00
519305cb22
[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage ( #8669 )
...
Based on #8605, separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
7fb4b6a6e2
[chore](tsan) add file mremap_fallback for tsan ( #8665 )
2022-04-08 09:01:53 +08:00
d51545a952
[fix](ut)(memory-leak) Fix be asan ut failed and hdfs file reader memory leak ( #8905 )
2022-04-08 00:07:00 +08:00
02be8176c3
[fix] access parallel_flat_hash_map via thread safely methods ( #8854 )
...
The iterator of parallel_flat_hash_map is not thread-safe, so
we should use if_contains instead.
2022-04-07 11:35:59 +08:00
ca4055244e
[fix](storage) Fix core bug of convert to predicate column ( #8833 )
...
Reproduction:
When `enable_low_cardinality_optimize = true`, running the following SQL query on the TPCH dataset causes a core dump:
```sql
select count(*) from lineitem where l_comment = 'ously even exc';
```
This SQL triggers the execution of `ColumnDictionary::convert_to_predicate_column_if_dictionary`, where `res->reserve(_codes.size())` is problematic because the current `_codes.size()` is smaller than its reserve value, so inserting a value into `PredicateColumn` causes a core dump.
2022-04-07 11:29:26 +08:00
98cab78320
[refactor](schema_hash) remove schema_hash since every tablet id in be is unique ( #8574 )
2022-04-07 08:37:45 +08:00
e53c90fbef
min and max window function bug fix ( #8822 )
...
[Fix bug] min and max window function bug fix #8822
2022-04-07 08:36:33 +08:00
f90a1a1919
[fix](ut)(compile) Fix ut failure at functions_geo and compilation bug ( #8843 )
2022-04-05 21:30:40 +08:00
03c5d5d677
fix some error on build.sh && fix build fail with clang on runtime_profile ( #8748 )
2022-04-05 15:52:53 +08:00
d07b49247e
rm sequential file ( #8713 )
...
[refactor]remove sequential file reader from env
2022-04-04 17:49:06 +08:00
fcefed7c1c
[Bug][Vectorized] Fix core bug of segment vectorized ( #8800 )
...
* [Bug][Vectorized] Fix core bug of segment vectorized
1. Read table with delete condition
2. Read table with default value HLL/Bitmap Column
* refactor some code
Co-authored-by: lihaopeng <lihaopeng@baidu.com >
2022-04-03 19:50:25 +08:00
33736e45fa
[fix](table-function) Fixed unreasonable nullable conversion ( #8818 )
2022-04-03 11:02:35 +08:00
78b85414d6
[fix](debug) get_hash_value_fvn DCHECK failed ( #8811 )
...
* fix_get_hash_value_fvn
* fix compile
2022-04-03 10:55:15 +08:00
f3c6ddf651
[feature](function) Support geolocation functions on vectorized engine ( #8790 )
2022-04-03 10:50:54 +08:00
586bec79f5
[fix](storage) Fix query result error due to find code by bound ( #8787 )
...
Reproducing the problem:
On the SSB single table `lineorder_flat`, run the following SQL query:
```sql
SELECT
sum(LO_REVENUE),
(LO_ORDERDATE DIV 10000) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#22211111' AND P_BRAND <= 'MFGR#22281111' AND S_REGION = 'ASIA' and (LO_ORDERDATE DIV 10000) = 1992
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;
```
when `enable_low_cardinality_optimize=false`, query result:
```sql
+-------------------+------+-----------+
| sum(`LO_REVENUE`) | year | P_BRAND |
+-------------------+------+-----------+
| 65423264312 | 1992 | MFGR#2222 |
| 66936772687 | 1992 | MFGR#2223 |
| 64047191934 | 1992 | MFGR#2224 |
| 65744559138 | 1992 | MFGR#2225 |
| 66993045668 | 1992 | MFGR#2226 |
| 67411226147 | 1992 | MFGR#2227 |
| 69390885970 | 1992 | MFGR#2228 |
+-------------------+------+-----------+
```
when `enable_low_cardinality_optimize=true`, query result:
```sql
+-------------------+------+-----------+
| sum(`LO_REVENUE`) | year | P_BRAND |
+-------------------+------+-----------+
| 66936772687 | 1992 | MFGR#2223 |
| 64047191934 | 1992 | MFGR#2224 |
| 65744559138 | 1992 | MFGR#2225 |
| 66993045668 | 1992 | MFGR#2226 |
| 67411226147 | 1992 | MFGR#2227 |
| 69390885970 | 1992 | MFGR#2228 |
+-------------------+------+-----------+
```
This is one row fewer than the correct result.
The reason is that 'MFGR#22211111' is not in the dictionary, so the boundary code is looked up instead (the `find_code_by_bound` method), and that lookup has a bug.
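To illustrate what a correct boundary lookup has to return, here is a small Python model of range filtering on a sorted dictionary. It is a sketch under assumptions, not the Doris code: the dictionary contents (`brands`) and the helper `matching_codes` are hypothetical, and the standard-library `bisect` functions stand in for `find_code_by_bound`.

```python
from bisect import bisect_left, bisect_right


def matching_codes(sorted_dict, lower, upper):
    """Return the codes (indexes) of entries with lower <= entry <= upper.

    Neither bound has to be present in the dictionary: bisect_left gives the
    first entry >= lower, bisect_right gives one past the last entry <= upper.
    """
    first = bisect_left(sorted_dict, lower)
    end = bisect_right(sorted_dict, upper)
    return range(first, end)


# Hypothetical sorted dictionary: MFGR#2221 .. MFGR#2229.
brands = [f"MFGR#222{i}" for i in range(1, 10)]

codes = matching_codes(brands, "MFGR#22211111", "MFGR#22281111")
matched = [brands[c] for c in codes]
```

With these bounds, `matched` runs from MFGR#2222 through MFGR#2228, which is the seven-brand set returned when `enable_low_cardinality_optimize=false`. An off-by-one at the lower bound would drop MFGR#2222, as in the incorrect result.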
2022-04-03 10:38:14 +08:00
6cc8762ce7
[fix](load) fix concurrent synchronization problem in NodeChannel::try_send_batch ( #8728 )
...
The patch fixes two problems.
1. A memory order problem when accessing _last_patch_processed_finished and in_flight. Since _last_patch_processed_finished is actually redundant, the patch removes it.
2. Synchronization in join on cid.
Fix for #8725 .
2022-04-03 10:15:45 +08:00
4076c5466b
[refactor][improvement](type_info) use template and single instance to refactor get type info logic ( #8680 )
...
1. use const pointer instead of shared_ptr
2. Restrict array types to support only primitive types and nest up to 9 levels.
2022-04-03 10:10:36 +08:00
6b0a642390
[feature][vectorized] Support explode json array func #8526 ( #8539 )
2022-04-03 10:06:47 +08:00
a75e4a1469
Window funnel ( #8485 )
...
Add new feature window funnel
2022-04-02 22:08:50 +08:00
0c98c1ee03
[Improvement][fix](compaction) Change min_compaction_failure_interval_sec to 5 and fix a bug of log ( #8781 )
...
see issue #8767
2022-04-02 13:00:56 +08:00
4d516bece8
[feature-wip](array-type)Add element_at and subscript functions ( #8597 )
...
Overview of changes:
1. Add function element_at;
2. Support element_subscript([]) to get an element of an array: col_array[N] <==> element_at(col_array, N);
3. Return an error message instead of crashing BE when an array function fails to execute;
element_at(array, index) desc:
> Returns the element of the array at the given **(1-based)** index.
If **index < 0**, accesses elements from the last to the first.
Returns NULL if the index exceeds the length of the array or the array is NULL.
Usage example:
1. create table with ARRAY type column and insert some data:
```
+------+------+--------+
| k1 | k2 | k3 |
+------+------+--------+
| 1 | 2 | [1, 2] |
| 2 | 3 | NULL |
| 4 | NULL | [] |
| 3 | NULL | NULL |
+------+------+--------+
```
2. enable vectorized:
```
set enable_vectorized_engine=true;
```
3. element_subscript([]) usage example:
```
> select k1,k3,k3[1] from array_test;
+------+--------+----------------------------+
| k1 | k3 | %element_extract%(`k3`, 1) |
+------+--------+----------------------------+
| 3 | NULL | NULL |
| 1 | [1, 2] | 1 |
| 2 | NULL | NULL |
| 4 | [] | NULL |
+------+--------+----------------------------+
```
4. element_at function usage example:
```
> select k1,k3 from array_test where element_at(k3, -1) = 2;
+------+--------+
| k1 | k3 |
+------+--------+
| 1 | [1, 2] |
+------+--------+
```
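The indexing rules described above can be written down as a short Python model. This is a sketch of the stated semantics only, not the BE implementation; in particular, returning NULL for index 0 is an assumption, since the description does not cover that case.

```python
def element_at(array, index):
    """Model of element_at: 1-based index, negative index counts from the end.

    Returns None (NULL) if the array is NULL or the index is out of range.
    Assumption: index 0 also returns None; the description does not say.
    """
    if array is None or index == 0:
        return None
    if abs(index) > len(array):
        return None
    # Positive indexes are 1-based; negative ones map directly onto Python's.
    return array[index - 1] if index > 0 else array[index]


first = element_at([1, 2], 1)    # first element
last = element_at([1, 2], -1)    # last element
missing = element_at([], 1)      # out of range
null_array = element_at(None, 1) # NULL array
```

This matches the examples above: `k3[1]` on `[1, 2]` is 1, `element_at(k3, -1)` on `[1, 2]` is 2, and both the empty and the NULL array give NULL.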
2022-04-02 12:03:56 +08:00
6c5bbc6e4c
fix agg functions check failed from empty table ( #8785 )
2022-04-02 10:44:55 +08:00
3698176c40
use row_size as name of variable indicating rows rather than column_size ( #8803 )
2022-04-02 10:38:16 +08:00
c31c6ae91a
[improvement](storage) Add more detailed timer on SegmentIter in profile ( #8768 )
...
* [improvement](storage) Add more detailed timer on SegmentIter in profile
* add OutputColumnTime
2022-04-02 10:35:28 +08:00
f3539cd3ba
[refactor] remove useless code ( #8773 )
2022-04-02 10:28:16 +08:00
f315fbd5ac
[fix] vectorization decimal avg inconsistent ( #8746 )
2022-03-31 23:00:40 +08:00
71ac86b183
[improvement](join) Support join project in query engine ( #8722 )
2022-03-31 23:00:07 +08:00
2c774f5c79
[ubsan] avoid null bit offset to be 255 ( #8675 )
...
For now, the invalid null bit offset is -1, but bit_offset in
NullIndicator is of type uint8_t, so the invalid null bit offset
would be 255. Ubsan detects it.
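The wrap-around from -1 to 255 can be reproduced outside C++ with Python's `ctypes`, whose fixed-width integer types do no overflow checking. This is only an illustration of the arithmetic; the constant name `INVALID_NULL_BIT_OFFSET` is hypothetical and not taken from the code base.

```python
import ctypes

# Hypothetical name for the sentinel described above.
INVALID_NULL_BIT_OFFSET = -1

# Storing -1 in an unsigned 8-bit field keeps only the low 8 bits.
stored = ctypes.c_uint8(INVALID_NULL_BIT_OFFSET).value

# The same result via explicit masking.
masked = INVALID_NULL_BIT_OFFSET & 0xFF
```

Both `stored` and `masked` are 255, which is the value the sanitizer flags because it is not a meaningful bit offset.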
2022-03-31 22:58:51 +08:00
01cc0573aa
[Bug][Vectorized] fix core dump with HLL and some refactor of Decompressor ( #8668 )
2022-03-31 17:05:08 +08:00
e684ffa6f5
[fix](compile) fix bug for StorageMediumPB type error ( #8777 )
2022-03-31 16:40:20 +08:00
71d050d0bc
[improvement][test] (log)Add more error message on connect to hdfs failure, and corresponding ut ( #8755 )
...
I hit a failure when reading HDFS files in broker load; the error message was unclear and
I spent a lot of time locating the problem.
```
W0330 11:08:01.093812 2755268 broker_scan_node.cpp:364] Scanner[0] process failed. status=connect failed.
W0330 11:08:01.097682 2018787 fragment_mgr.cpp:234] Got error while opening fragment 712ae2b848324cb6-94a83d646173c1e9: Internal error: connect failed.
W0330 11:08:01.097702 2018787 tablet_sink.cpp:148] connect failed.
```
We should add more information when connecting to HDFS fails.
2022-03-31 13:56:25 +08:00
d24735e95a
[refactor] add some clang-tidy checks && some code style fix ( #8752 )
2022-03-31 13:53:41 +08:00
82792726ab
[ubsan] fix some ubsan complains on vector and pointer ( #8733 )
2022-03-31 13:50:25 +08:00
835cf1fe20
[fix](data-sink) Sinks call DataSink::close instead of operating _closed directly ( #8727 )
...
TabletSink::_is_closed is duplicated with DataSink::_closed and
all sinks should call DataSink::close rather than set _closed
directly.
Fix for https://github.com/apache/incubator-doris/issues/8726 .
2022-03-31 12:36:33 +08:00
9e3af471e5
[refactor] comment code converting decimal format ( #8708 )
...
The comment can help newbies read code much more quickly.
2022-03-31 12:32:49 +08:00
13b7af27b6
[refactor] remove useless code in DataTypeDecimal ( #8707 )
2022-03-31 12:30:35 +08:00