Fix json reader DCHECK failure caused by missing TYPE_STRING.
Fix a bug where the TVF throws an NPE if no file is found.
Predicate conjuncts cannot be pushed down to the Parquet reader for a load task,
because the predicates should be applied to the columns of the destination table, not to the columns of the source file.
Add a temporary broker load property "use_new_load_scan_node" to make the regression tests happy,
so that we can enable the new load scan node for a specific job without setting a global FE config.
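As a sketch, the property can be set per job in the load statement's PROPERTIES clause; the label, path, table, broker name, and credentials below are placeholders, not from this PR:
```
LOAD LABEL example_db.label_use_new_scan
(
    DATA INFILE("hdfs://host:port/path/to/file.parquet")
    INTO TABLE dst_tbl
    FORMAT AS "parquet"
)
WITH BROKER "my_broker"
(
    "username" = "user",
    "password" = "passwd"
)
PROPERTIES
(
    "use_new_load_scan_node" = "true"
);
```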
```
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num,
                   SUM(a.r[2]) as active_user_num_1day,
                   SUM(a.r[3]) as active_user_num_3day,
                   SUM(a.r[4]) as active_user_num_7day
            FROM (
                SELECT user_id,
                       retention(day = '2022-11-01', day = '2022-11-02',
                                 day = '2022-11-04', day = '2022-11-07') as r
                FROM login_event
                WHERE (day >= '2022-11-01') AND (day <= '2022-11-21')
                GROUP BY user_id
            ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
```
If there are too many deleted tablets in RocksDB, many OLAP_ERR_TABLE_ALREADY_DELETED_ERROR errors occur during startup, and each one tries to get an error stack. This costs a lot of time and makes the start process take very long.
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Change jsonb_extract_string behavior: convert the value to a string instead of returning NULL if the type at the JSON path is not string (see the example after this list).
2. Move the JSONB tutorial doc into the JSONB data type doc.
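A minimal sketch of the new behavior; the JSON value and path are illustrative, and depending on the version the string literal may need an explicit cast to JSONB:
```
-- $.a points to an int; previously this returned NULL, now it returns the string '1'
SELECT jsonb_extract_string(CAST('{"a": 1}' AS JSONB), '$.a');
```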
BE will crash when querying a partitioned Hive table in text format
with a partition column placed first in the select items.
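A hypothetical query shape that triggers the crash (the catalog, database, table, and column names below are made up for illustration):
```
-- part_col is a partition column of a text-format Hive table
SELECT part_col, data_col FROM hive_catalog.hive_db.text_tbl;
```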
1. FE should use file slots to set the column mapping index of the CSV file.
2. BE should use the block's `get_by_name` to get the right column in the CSV reader.
1. Support `in` bitmap syntax, like `where k1 in (select bitmap_column from tbl)` (see the sketch after this list).
2. Support bitmap runtime filter: generate a bitmap filter using the right table's bitmap and push it down to the left table's storage layer for filtering.
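A minimal sketch of the feature; the table and column names are illustrative:
```
-- the bitmap built from dim_tbl.user_bitmap is pushed down to filter fact_tbl.user_id
SELECT count(*) FROM fact_tbl WHERE user_id IN (SELECT user_bitmap FROM dim_tbl);
```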
Problem:
We frequently got the following error while running SELECT xxx INTO OUTFILE:
ERROR 1064 (HY000): RpcException, msg: Fail to write to broker, broker:TNetworkAddress(hostname=a.b.c.d, port=8111) failed:write() send(): Broken pipe
Reason:
we cache the broker thrift client in BE;
the thrift client's isOpen check only returns a cached flag, and does not check whether the real socket is open or closed;
after we get a client from the cache, the socket may already be closed, and then pwrite will fail.
How to fix:
Other interfaces such as open and close will reopen the connection and retry, but pwrite does not.
Since pwrite carries a write offset, and the broker (server) side also checks the write offset, it is safe to retry pwrite.
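For context, a sketch of the kind of statement that hit the error; the table, output path, and broker name are placeholders:
```
SELECT * FROM src_tbl
INTO OUTFILE "hdfs://host:port/user/doris/result_"
FORMAT AS CSV
PROPERTIES ("broker.name" = "my_broker");
```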
Previously, the result was:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
| NULL                                 |
+--------------------------------------+
1 row in set (0.02 sec)
```
but after this commit, the result becomes:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
| 2                                    |
+--------------------------------------+
1 row in set (0.02 sec)
```
The run length of the null map is saved as `uint16_t`. Previously, the run length of the null map was
limited by `batch_size` in the `ParquetReader`, by setting `batch_size = std::min(batch_size, (size_t)USHRT_MAX)`.
This works well when the batch size is less than `USHRT_MAX`.
However, [Lazy read](https://github.com/apache/doris/pull/13917) will merge empty batches until reading
a non-empty batch or reaching the EOF of a row group, so `batch_size` may be greater than `USHRT_MAX`
for non-predicate columns.
In addition, even if `batch_size` does not exceed `USHRT_MAX`, adjacent batches may still make
the run length exceed `USHRT_MAX` in `ColumnSelectVector::get_next_run`.
This PR makes sharing the hash table for broadcast join more robust:
1. Add a session variable to enable/disable this feature (see the example after this list).
2. Do not block the hash join node's close function.
3. Use a shared pointer to share the hash table and runtime filter among broadcast join nodes.
4. A hash join node that doesn't need to build the hash table closes the right child without reading any data (the child will then close the corresponding sender).
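For example, toggling the feature per session might look like the following; the variable name is an assumption, since this PR text does not name it:
```
-- assumed variable name, not confirmed by this PR text
SET enable_share_hash_table_for_broadcast_join = true;
```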
An array column should not have encoding info because it uses its sub-columns' encoding info.
This encoding info is never used and easily confuses people, so we should remove it.
When a unique key merge-on-write (MOW) table has a sequence column, query results with predicates may be wrong. There are two problems (an example table shape follows the list):
1. The sequence column needs to be removed from the primary key index when comparing keys.
2. The sequence column needs to be removed from the min/max key.
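A sketch of an affected table shape, assuming the standard MOW and sequence-column table properties (names and types are illustrative):
```
-- unique key MOW table with a sequence column
CREATE TABLE t (k1 INT, v1 INT)
UNIQUE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true",
    "function_column.sequence_type" = "int"
);
```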
Currently, BE prints `fail to get master client from cache. host=xxxxx, port=9228, code=THRIFT_RPC_ERROR`, but we could not tell which step generated this error. So I refactored the error status in BE and added an error stack for RPC_ERROR.
```
W1122 10:19:21.130796 30405 utils.cpp:89] fail to get master client from cache. host=xxxx, port=9228, code=RPC error(error -1): Couldn't open transport for xxxx:9228 (open() timed out)
@ 0x559af8f774ea doris::Status::ConstructErrorStatus()
@ 0x559af9aacbee _ZN5doris16ThriftClientImpl4openEv.cold
@ 0x559af97f563a doris::ClientCacheHelper::_create_client()
@ 0x559af97f78cd doris::ClientCacheHelper::get_client()
@ 0x559af934f38b doris::MasterServerClient::report()
@ 0x559af932e7a7 doris::TaskWorkerPool::_handle_report()
@ 0x559af932f07c doris::TaskWorkerPool::_report_task_worker_thread_callback()
@ 0x559af9b223c5 doris::ThreadPool::dispatch_thread()
@ 0x559af9b187af doris::Thread::supervise_thread()
@ 0x7f661bd8bea5 start_thread
@ 0x7f661c09eb0d __clone
```
Co-authored-by: yiguolei <yiguolei@gmail.com>