For a broadcast join, only one build fragment instance builds the hash table; the other fragment instances just receive and throw away the build-side data, which wastes memory and CPU.
This PR improves this: the data stream receiver tells the sender that it does not need its data, and the sender stops sending any data to it.
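A minimal sketch of the mechanism, using hypothetical `TransmitAck`/`Channel` names rather than Doris's actual RPC interface:

```cpp
// Hypothetical sketch: the receiver acks a transmit with "I don't need your
// data", and the sender marks that channel closed so no further build-side
// blocks are serialized or sent to it.
struct TransmitAck {
    bool receiver_needs_data = true;
};

class Channel {
public:
    // Called when the receiver's ack for a transmit RPC arrives.
    void handle_ack(const TransmitAck& ack) {
        if (!ack.receiver_needs_data) _receiver_closed = true;
    }
    // The sender checks this before preparing the next block for the channel.
    bool should_send() const { return !_receiver_closed; }

private:
    bool _receiver_closed = false;
};
```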
When a query's check in the Allocator finds that process memory exceeds the limit, it waits up to 5s for memory to be freed.
Previously, the Allocator did not check whether the query had been cancelled while waiting for memory, so a cancelled query could not end quickly.
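A sketch of the fixed wait loop, assuming hypothetical `is_cancelled`/`memory_available` callbacks in place of Doris's real query state:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Wait up to 5s for memory to be released, but re-check cancellation on
// every iteration so a cancelled query stops waiting immediately.
bool wait_for_memory(const std::function<bool()>& is_cancelled,
                     const std::function<bool()>& memory_available) {
    using namespace std::chrono;
    const auto deadline = steady_clock::now() + seconds(5);
    while (steady_clock::now() < deadline) {
        if (is_cancelled()) return false;    // fail fast instead of sleeping out 5s
        if (memory_available()) return true; // enough memory was freed
        std::this_thread::sleep_for(milliseconds(100));
    }
    return memory_available();
}
```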
First, to reduce memory usage, we do not pre-allocate blocks; instead we lazily allocate a block when the upper layer calls get_free_block. When the upper layer calls return_free_block to return a free block, we add the block to a queue for memory reuse, and we free the blocks in the queue when the scanner_context is closed rather than destructed.
Second, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity indicating the current number of free blocks available to the scanners. The number of scanners that can be scheduled is calculated from this value.
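A simplified sketch of the scheme; `Block` is a stand-in and the real ScannerContext differs:

```cpp
#include <deque>
#include <memory>
#include <mutex>

struct Block {}; // stand-in for vectorized::Block

class ScannerContext {
public:
    explicit ScannerContext(int capacity) : _free_blocks_capacity(capacity) {}

    // Lazily allocate: reuse a returned block if one is queued, otherwise
    // allocate a new one only while capacity remains.
    std::unique_ptr<Block> get_free_block() {
        std::lock_guard<std::mutex> l(_mutex);
        if (!_free_blocks.empty()) {
            auto block = std::move(_free_blocks.front());
            _free_blocks.pop_front();
            return block;
        }
        if (_free_blocks_capacity > 0) {
            --_free_blocks_capacity;
            return std::make_unique<Block>();
        }
        return nullptr; // no capacity left: don't schedule another scanner
    }

    // Returned blocks are queued for reuse rather than freed immediately.
    void return_free_block(std::unique_ptr<Block> block) {
        std::lock_guard<std::mutex> l(_mutex);
        _free_blocks.push_back(std::move(block));
    }

    // Free the queued blocks on close() instead of in the destructor.
    void close() {
        std::lock_guard<std::mutex> l(_mutex);
        _free_blocks.clear();
    }

private:
    std::mutex _mutex;
    std::deque<std::unique_ptr<Block>> _free_blocks;
    int _free_blocks_capacity;
};
```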
SSB flat test:

| dataset | load time (before) | query time (before) | load time (after) | query time (after) |
|---|---|---|---|---|
| lineorder 1.2G | 3s | 0.355s | 3s | 0.349s |
| lineorder 5.8G | 330s | 0.970s | 342s | 0.929s |
| lineorder 5.8G | 349s | 0.949s | 337s | 0.913s |
| lineorder 5.8G | 349s | 0.955s | 345s | 0.946s |
| lineorder 5.8G (pipeline enabled) | 360s | 0.889s | 346s | 0.865s |
Make the VcompoundPred optimization work correctly.
#19818 tried to enable the VcompoundPred optimization but got a wrong result on TPC-DS q28.
The reason is that some of MySQL's nullable logic needs special handling, as the examples below (and the sketch after them) show:
```
mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
| 0 |
+----------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
| NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
| NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)
```
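The same three-valued (Kleene) semantics sketched with `std::optional<bool>` as a nullable boolean; this is the behavior a vectorized compound predicate must preserve:

```cpp
#include <optional>

// NULL AND FALSE = FALSE, NULL AND TRUE = NULL,
// NULL OR  TRUE  = TRUE,  NULL OR  FALSE = NULL.
std::optional<bool> sql_and(std::optional<bool> a, std::optional<bool> b) {
    if ((a && !*a) || (b && !*b)) return false; // any FALSE decides AND
    if (a && b) return true;                    // both TRUE
    return std::nullopt;                        // otherwise NULL
}

std::optional<bool> sql_or(std::optional<bool> a, std::optional<bool> b) {
    if ((a && *a) || (b && *b)) return true;    // any TRUE decides OR
    if (a && b) return false;                   // both FALSE
    return std::nullopt;                        // otherwise NULL
}
```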
Fix partition field conjuncts not working.
Add the conjuncts of predicate_partition_columns from _slot_id_to_filter_conjuncts (single-slot conjuncts) to _filter_conjuncts; the others should already have been added from not_single_slot_filter_conjuncts.
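A hedged sketch of the fix with stand-in types; the maps mirror the members named above but are simplified placeholders, not Doris's real structures:

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Expr {}; // stand-in for a conjunct expression
using ExprPtr = std::shared_ptr<Expr>;

// For every partition column that carries a predicate, append its
// single-slot conjuncts (looked up by slot id) to the filter conjuncts.
void append_partition_conjuncts(
        const std::map<std::string, int>& predicate_partition_columns, // column -> slot id
        const std::map<int, std::vector<ExprPtr>>& slot_id_to_filter_conjuncts,
        std::vector<ExprPtr>* filter_conjuncts) {
    for (const auto& entry : predicate_partition_columns) {
        auto it = slot_id_to_filter_conjuncts.find(entry.second);
        if (it == slot_id_to_filter_conjuncts.end()) continue;
        filter_conjuncts->insert(filter_conjuncts->end(),
                                 it->second.begin(), it->second.end());
    }
}
```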
1. Get the DataTypeSerde in advance, to avoid fetching a temporary DataTypeSerde while iterating each column.
2. Iterating the original row once is enough for deserialization, by introducing a map that records the index of each column's unique id.
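A hedged sketch of point 2; `Cell` and the string payloads are illustrative, not Doris's actual row format:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

struct Cell {
    int col_unique_id;
    std::string data;
};

// Build the unique-id -> index map once, then a single pass over the
// original row places every cell, instead of re-scanning the row per column.
void deserialize_row(const std::vector<Cell>& row,
                     const std::vector<int>& target_unique_ids,
                     std::vector<std::string>* out) {
    std::unordered_map<int, size_t> uid_to_index;
    uid_to_index.reserve(target_unique_ids.size());
    for (size_t i = 0; i < target_unique_ids.size(); ++i) {
        uid_to_index.emplace(target_unique_ids[i], i);
    }
    out->assign(target_unique_ids.size(), "");
    for (const Cell& cell : row) { // one pass over the original row
        auto it = uid_to_index.find(cell.col_unique_id);
        if (it != uid_to_index.end()) (*out)[it->second] = cell.data;
    }
}
```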
To be more compatible with MySQL, rename the JSONB type name and function names to JSON.
The old JSONB type name and jsonb_xx functions can still be used for backward compatibility.
The function jsonb_extract remains, since json_extract is already used by the JSON string functions and more work is needed to change it; it will be changed later.
Even with an outer exception catch, faststring resize/reserve/build may throw a memory alloc failure exception from the Allocator.
Currently, page body compression catches the memory alloc failure exception.
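A minimal sketch of the catch, assuming the Allocator surfaces the failure as a C++ exception (shown here as `std::bad_alloc`; the `Status`/`compress_page_body` names are illustrative):

```cpp
#include <new>
#include <string>

struct Status {
    bool ok = true;
    std::string msg;
};

// Turn an allocation failure thrown during the output-buffer resize into an
// error status instead of letting it escape the caller's catch.
Status compress_page_body(const std::string& body, std::string* out) {
    try {
        out->reserve(body.size()); // may throw on allocation failure
        *out = body;               // stand-in for the real compression step
        return Status{};
    } catch (const std::bad_alloc& e) {
        return Status{false, e.what()};
    }
}
```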
Fix three bugs of timestampv2 precision:
1. The Hive catalog doesn't set the precision of timestampv2 and can't get the precision from the Hive metastore, so set the largest precision for timestampv2;
2. The JDBC catalog uses datetimev1 to parse timestamps and converts to timestampv2, so the precision is lost;
3. TVF doesn't use the precision from the metadata of the file format.
1. Replace the one-span-per-exec-node-method tracing with one span per exec node;
2. Fix some problems with tracing in the pipeline engine.
Fix ubsan errors:
```
doris/be/src/util/string_parser.hpp:275:58: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
doris/be/src/vec/functions/functions_comparison.h:214:51: runtime error: addition of unsigned offset to 0x7fea6c6b7010 overflowed to 0x7fea6c6b700c
doris/be/src/vec/functions/multiply.cpp:67:50: runtime error: signed integer overflow: 1295699415680000000 * 0x0000000000015401d0a4cd4890a77700 cannot be represented in type '__int128'
doris/be/src/vec/aggregate_functions/aggregate_function_percentile_approx.h:445:73: runtime error: addition of unsigned offset to 0x7feca3343d10 overflowed to 0x7feca3343d08
doris/be/src/exec/schema_scanner/schema_tables_scanner.cpp:330:24: run
```
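One conventional way to remove the signed-overflow UB flagged in string_parser.hpp (not necessarily the exact change made here) is to detect overflow explicitly with the GCC/Clang builtins:

```cpp
#include <cstdint>

// Accumulate a decimal digit into *value without ever executing a wrapping
// signed add/multiply; returns false when the result would overflow int32.
bool accumulate_digit(int32_t* value, int digit) {
    int32_t scaled;
    if (__builtin_mul_overflow(*value, 10, &scaled)) return false;
    if (__builtin_add_overflow(scaled, digit, &scaled)) return false;
    *value = scaled;
    return true;
}
```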
Fix some wrong downcasts found by ubsan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
e5 55 00 00 10 74 58 42 e5 55 00 00 00 00 10 00 8e 7f 00 00 20 07 6f cc 8e 7f 00 00 80 fe 68 cc
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
```
1. TYPE_DATE and TYPE_DATETIME have the same data format, so the bloom filter cast is changed to a reinterpret cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
74 65 00 00 20 91 70 f5 ca 55 00 00 02 00 00 00 00 00 00 00 f0 d4 4c 2f 56 7f 00 00 f0 d4 4c 2f
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal to store decimal elements, so the ORC reader must cast to ColumnDecimal rather than ColumnVector<int>.
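A sketch of the idea behind fix 1, with stand-in types: both bloom filter instantiations store their data identically and the probe is non-virtual, so a reinterpret_cast relies only on the shared layout instead of performing the invalid downcast that ubsan flags:

```cpp
#include <cstdint>

struct BloomFilterBase {
    virtual ~BloomFilterBase() = default;
};

// Stand-in for BloomFilterFunc<TYPE_DATE> / BloomFilterFunc<TYPE_DATETIME>,
// which share one data format.
template <int PrimitiveType>
struct BloomFilterSketch : BloomFilterBase {
    uint64_t bits = 0;
    bool find(uint64_t v) const { return (bits >> (v % 64)) & 1; } // non-virtual
};

bool probe_as_date(BloomFilterBase* base, uint64_t v) {
    // static_cast<BloomFilterSketch<11>*>(base) is UB when the object is
    // really BloomFilterSketch<12>; the layouts match, so reinterpret the
    // pointer instead of downcasting through the hierarchy.
    return reinterpret_cast<BloomFilterSketch<11>*>(base)->find(v);
}
```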
1. Fix the inconsistent definition of `Retention`: this function returns tinyint on `FE` but uint8 on `BE`;
2. Make assert_cast support casting to a derived class (see the sketch after this list);
3. Change some static casts to assert casts;
4. Support sum(bool)/avg(bool).
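A common shape for item 2's assert_cast, as a simplified sketch rather than Doris's exact helper: the downcast is verified with dynamic_cast in debug builds and compiles to a plain static_cast in release builds:

```cpp
#include <cassert>

template <typename To, typename From>
To assert_cast(From* from) {
#ifndef NDEBUG
    To checked = dynamic_cast<To>(from); // verify the dynamic type in debug
    assert(checked != nullptr && "assert_cast failed: wrong derived type");
    return checked;
#else
    return static_cast<To>(from);        // zero-cost in release
#endif
}

// Usage: cast a base pointer to the derived type it really points to.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int x = 42; };

int demo() {
    Derived d;
    Base* b = &d;
    return assert_cast<Derived*>(b)->x;
}
```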
1. Support exporting the `LARGEINT` data type to the parquet/orc file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the parquet file format.
3. Fix incorrect data when `DATE`-type data is exported to ORC.
1. Some encrypt and decrypt functions had a wrong blockEncryptionMode.
2. The topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id.
3. Must keep the limit if it's an uncorrelated IN-subquery with a limit on sort, like `select a from t1 where a in (select b from t2 order by xx limit yy)`.