Commit Graph

1679 Commits

Author SHA1 Message Date
53ae24912f [vectorized](feature) support partition sort node (#19708) 2023-05-25 11:22:02 +08:00
2ec1d282c5 [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)
* [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof
2023-05-25 10:29:35 +08:00
c730033595 [improvement](exchange) data stream sender stop sending data to receiver if it returns eos early (#19847)
For broadcast join, only one build fragment instance will build hash table, other fragment instances just receive and throw away build side data, this is waste of memory and cpu.

This PR improve this condition, data stream receiver tells sender that it does not need data from sender, and sender stops sending anydata to it.
2023-05-24 15:11:32 +08:00
14b4c7abf9 [fix](hashtable) Check query cancel status during build hash table #19970
should cancel query during hash table build stage if the query is cancelled.
2023-05-24 14:24:03 +08:00
cf7a74f6ec [fix](memory) query check cancel while waiting for memory in Allocator, and optimize log (#19967)
After the query check process memory exceed limit in Allocator, it will wait up to 5s.
Before, Allocator will not check whether the query is canceled while waiting for memory, this causes the query to not end quickly.
2023-05-24 11:08:48 +08:00
08ec5e2eb5 [fix](function) fix result column is nullable type when fast execute (#19889) 2023-05-24 10:27:50 +08:00
a434a49f71 [Bug](decimal) fix mod function (#19925)
Bug:
select id, kdcml * ktint, kdcml / ktint, kdcml % ktint from expr_test order by id;
+------+-------------------+-------------------+-----------------------+
| id | kdcml * ktint | kdcml / ktint | kdcml % ktint |
+------+-------------------+-------------------+-----------------------+
| NULL | NULL | NULL | NULL |
| 1 | 24.395 | 24.395 | -4702111234474983.74 |
| 2 | 68.968 | 17.242 | -4702111234474983.74 |
| 3 | 146.268 | 16.252 | -4702111234474983.74 |
| 4 | 275.772 | 17.235 | -4702111234474983.74 |
| 5 | 487.470 | 19.498 | -4702111234474983.74 |
| 6 | 827.244 | 22.979 | -4702111234474983.74 |
| 7 | 1364.860 | 27.854 | -4702111234474983.74 |
| 8 | 2205.928 | 34.467 | -4702111234474983.74 |
| 9 | 3509.595 | 43.328 | -4702111234474983.74 |
| 10 | 5514.790 | 55.147 | -4702111234474983.74 |
| 11 | 8578.988 | 70.900 | -4702111234474983.74 |
| 12 | 13235.484 | 91.913 | -4702111234474983.74 |
| 13 | 24.395 | 24.395 | -4702111234474983.74 |
| 14 | 68.968 | 17.242 | -4702111234474983.74 |
| 15 | 146.268 | 16.252 | -4702111234474983.74 |
| 16 | 275.772 | 17.235 | -4702111234474983.74 |
| 17 | 487.470 | 19.498 | -4702111234474983.74 |
| 18 | 827.244 | 22.979 | -4702111234474983.74 |
| 19 | 1364.860 | 27.854 | -4702111234474983.74 |
| 20 | 2205.928 | 34.467 | -4702111234474983.74 |
| 21 | 3509.595 | 43.328 | -4702111234474983.74 |
| 22 | 5514.790 | 55.147 | -4702111234474983.74 |
| 23 | 8578.988 | 70.900 | -4702111234474983.74 |
| 24 | 13235.484 | 91.913 | -4702111234474983.74 |
2023-05-23 18:24:31 +08:00
6efe6ef6e8 [Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#19389)
Firstly, to reduce memory usage, we do not pre-allocate blocks, instead we lazily allocate block when upper call get_free_block. And when upper call return_free_block to return free block, we add the block to a queue for memory reuse, and we will free the blocks in the queue when the scanner_context was closed instead of destructed.
Secondly, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity to indicate the current number of free blocks available to the scanners. The number of scanners that can be scheduled will be calculated based on this value.

ssb flat test
previous
lineorder 1.2G:
load time: 3s, query time: 0.355s
lineorder 5.8G:
load time: 330s, query time: 0.970s
load time: 349s, query time: 0.949s
load time: 349s, query time: 0.955s
load time: 360s, query time: 0.889s (pipeline enabled)
after
lineorder 1.2G:
load time: 3s, query time: 0.349s
lineorder 5.8G:
load time: 342s, query time: 0.929s
load time: 337s, query time: 0.913s
load time: 345s, query time: 0.946s
load time: 346s, query time: 0.865s (pipeline enabled)
2023-05-23 18:17:21 +08:00
fe111207a9 [Fix](lazy_open) Fix lazy open null point (#19829) 2023-05-23 09:17:46 +08:00
3dcdadcea6 [Improvement](function) support decimalv3 for function least and greatest (#19931) 2023-05-22 22:48:44 +08:00
53ba46e404 [Fix][Refactor] Fix 'not member call on null pointer of type 'doris::TextConverter' error in ubsan env and refactor text converter. (#19849)
Fix 'not member call on null pointer of type doris::TextConverter' error in ubsan env and refactor text converter.
2023-05-22 21:00:19 +08:00
6762af3c9b [Improve](struct)improve struct support into outfile (#19894)
support select into outfile for struct type
2023-05-22 18:45:56 +08:00
Pxl
9945067e3c [Bug](function) make VcompoundPred optimization work well (#19870)
make VcompoundPred optimization work well
#19818 this pr try to enable VcompoundPred optimization but get wrong result on tpcds q28.
The reason is some nullable logic on mysql need special handling.

mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
2023-05-22 18:32:17 +08:00
Pxl
d64be9565d [Bug](function) fix function in get wrong result when input const column (#19791)
fix function in get wrong result when input const column
2023-05-22 10:58:29 +08:00
5547bbbaef [decimalv3](function) support function width_bucket (#19806) 2023-05-19 20:28:59 +08:00
c4900eb658 [Bug](DecimalV3) fix decimalv3 functions (#19801) 2023-05-19 14:10:01 +08:00
3e010bbee7 [improvement](profile) add profile counter 'BytesSent' for VDataBufferSender (#19826) 2023-05-19 08:46:50 +08:00
1d01136b1b [Fix](parquet-reader) Fix partition field conjuncts not work. (#19837)
Fix partition field conjuncts not work.
Add predicate_partition_columns in _slot_id_to_filter_conjuncts(single slot conjuncts) to _filter_conjuncts, others should had been added from not_single_slot_filter_conjuncts.
2023-05-19 08:44:02 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
fd4fa5c64e [Optimize](row store) optimize serialization and deserialization (#19691)
1. Get DataTypeSerde in advance to avoid get temporary DataTypeSerde iterate each column
2. Iterate the original row once is enoungh for deserializing by introducing a map for record the index of each column's unique id
2023-05-18 16:22:38 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.
2023-05-18 16:16:52 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator.

Currently page body compress will catch memory alloc failure exception
2023-05-18 15:00:49 +08:00
62458ed0f4 [enhancement](compaction) not core when init failed (#19754) 2023-05-18 12:06:22 +08:00
fe42e52851 [pipeline](CTE) Support multi stream data sink in pipeline (#19519) 2023-05-18 10:34:37 +08:00
88ca4f3e6b [feature](like) make like regexp used as a sql function (#19755) 2023-05-18 10:03:12 +08:00
5fa956b0d6 [Bug](pipeline) RegressionTest failed release resouce cause DCHECK failed #19773 2023-05-18 08:35:57 +08:00
79d30cfe46 [feature](compact) Duplicate with no keys tables compaction coredump (#19490)
Co-authored-by: yuxianbing <yuxianbing@yy.com>
2023-05-17 22:22:14 +08:00
49c6bbce84 [improvement](load) do not create pthread in tablet_sink (#19465)
add bvar stat for streamload.
2023-05-17 22:05:54 +08:00
dc18da2ce4 [Log](expr) add DCHECK info for expr close DCHECK (#19683) 2023-05-17 21:37:38 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix threes bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog use datetimev1 to parse timestamp, and convert to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from meta data of file format.
2023-05-17 20:50:15 +08:00
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span;
2. Fix some problems with tracing in pipeline.
2023-05-17 19:04:52 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
56809230d1 [Improvement](string function) optimize substring and in string set (#19257)
* [Improvement](string function) optimize substring and in string set

* update
2023-05-17 14:09:52 +08:00
8fd1eb0d1e [minor](hash table) parameterize hash table (#19653) 2023-05-17 09:58:26 +08:00
2bdfaac609 [fix](ubsan) fix ubsan errors (#19658)
ixu ubsan errors:

doris/be/src/util/string_parser.hpp:275:58: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

doris/be/src/vec/functions/functions_comparison.h:214:51: runtime error: addition of unsigned offset to 0x7fea6c6b7010 overflowed to 0x7fea6c6b700c

doris/be/src/vec/functions/multiply.cpp:67:50: runtime error: signed integer overflow: 1295699415680000000 * 0x0000000000015401d0a4cd4890a77700 cannot be represented in type '__int128

doris/be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h:445:73: runtime error: addition of unsigned offset to 0x7feca3343d10 overflowed to 0x7feca3343d08 

doris/be/src/exec/schema_scanner/schema_tables_scanner.cpp:330:24: run
2023-05-17 09:32:03 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correct when outputColumnUnique
2023-05-17 00:13:10 +08:00
16f5d3d5b3 [Improvement](memory) new page use Allocator (#19472) 2023-05-16 19:09:17 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support array_count function.
array_count:Returns the number of non-zero and non-null elements in the given array.
2023-05-16 17:00:01 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
Pxl
b927f8cd37 [Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636)
change asan_suppr from interceptor_via_lib to interceptor_via_fun
2023-05-16 10:51:43 +08:00
c87e78dc35 [bug](jsonb) fix jsonb query bug When the json key value contains "." (#19185)
Issue Number: close #19173

mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31" |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
2023-05-15 15:43:12 +08:00
Pxl
2a02561863 [Bug](ubsan) fix some wrong downcast founded by ubsan (#19591)
fix some wrong downcast founded by ubsan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE/TYPE_DATETIME have same data format, so I change the cast about bloom filter to reinterpret cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. doris use ColumnDecimal to store decimal elements.
2023-05-15 14:27:48 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logic type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
cb943ae7ca [pipeline](bug) DCHECK may failed in pip sender queue (#19545)
DCHECK may failed in pip sender queue
2023-05-12 20:39:18 +08:00
8ef9212ddc [enhancement](exceptionsafe) force check exec node method's return value (#19538) 2023-05-12 10:21:00 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1.some encrypt and decrypt functions have wrong blockEncryptionMode
2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )
2023-05-12 09:06:16 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00