FunctionStringConcat::execute_impl resized the result column with a size that included the string null terminator, so ColumnString::chars.size() no longer matched ColumnString::offsets.back(). This breaks string functions that rely on that invariant, e.g. `like` and `regexp`.
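A minimal sketch of the invariant with a simplified stand-in for `ColumnString` (hypothetical type, not the Doris class): after concat, `chars` is sized to exactly `offsets.back()`, with no extra `'\0'` byte per row.

```
#include <cstdint>
#include <vector>

struct SimpleStringColumn {
    std::vector<char> chars;       // all string bytes, back to back
    std::vector<uint32_t> offsets; // offsets[i] = end of row i inside chars
};

SimpleStringColumn concat(const SimpleStringColumn& a, const SimpleStringColumn& b) {
    SimpleStringColumn res;
    const size_t rows = a.offsets.size();
    res.offsets.resize(rows);
    uint32_t prev_a = 0, prev_b = 0, total = 0;
    for (size_t i = 0; i < rows; ++i) {
        total += (a.offsets[i] - prev_a) + (b.offsets[i] - prev_b);
        res.offsets[i] = total;
        prev_a = a.offsets[i];
        prev_b = b.offsets[i];
    }
    // Correct: chars.size() == offsets.back(). Resizing to total + rows (one
    // '\0' per row) breaks the invariant and confuses functions such as
    // like/regexp that walk chars via offsets.
    res.chars.resize(total);
    uint32_t dst = 0;
    prev_a = prev_b = 0;
    for (size_t i = 0; i < rows; ++i) {
        for (uint32_t p = prev_a; p < a.offsets[i]; ++p) res.chars[dst++] = a.chars[p];
        for (uint32_t p = prev_b; p < b.offsets[i]; ++p) res.chars[dst++] = b.chars[p];
        prev_a = a.offsets[i];
        prev_b = b.offsets[i];
    }
    return res;
}
```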
Currently decimal parsing from string reports overflow incorrectly: if the given string's precision is bigger than the defined decimal precision we return an overflow error, but we should only return an overflow error when the digit (integer) part is longer than the type's digit length. This check should be made while traversing the given string and converting it to a decimal value.
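A minimal sketch of the intended check, using a hypothetical helper (not the Doris parser): overflow is decided by counting significant integer digits while walking the string, not by comparing the raw string length against the precision.

```
#include <cctype>
#include <string_view>

// Returns true when the string cannot fit a decimal(precision, scale) target.
bool parse_overflows(std::string_view s, int precision, int scale) {
    int int_digits = 0;
    bool seen_nonzero = false;
    size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) ++i;
    for (; i < s.size() && s[i] != '.'; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false; // detailed error handling out of scope
        if (s[i] != '0') seen_nonzero = true;
        if (seen_nonzero) ++int_digits; // leading zeros never cause overflow
    }
    // e.g. "0000012.3" into decimal(3,1): only 2 significant integer digits,
    // so it fits even though the string has 7 characters before the dot.
    return int_digits > precision - scale;
}
```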
Use a weak_ptr to cache the file handle of each file segment; a sketch of the idea follows the metrics example below. The maximum number of cached file handles can be configured by `file_cache_max_file_reader_cache_size`, default `1000000`.
Users can inspect the number of cached file handles by requesting the BE metrics endpoint `http://be_host:be_webserver_port/metrics`:
```
# TYPE doris_be_file_cache_segment_reader_cache_size gauge
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500
```
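A minimal sketch of the weak_ptr idea, with hypothetical types (not the Doris file cache code): a cached handle is reused while any reader still holds it and expires automatically once every reader releases it; LRU trimming is omitted.

```
#include <map>
#include <memory>
#include <string>

struct SegmentFileReader {                // stands in for the real segment reader
    explicit SegmentFileReader(std::string p) : path(std::move(p)) {}
    std::string path;
};

class FileReaderCache {
public:
    explicit FileReaderCache(size_t capacity) : _capacity(capacity) {}

    std::shared_ptr<SegmentFileReader> get_or_open(const std::string& path) {
        if (auto it = _cache.find(path); it != _cache.end()) {
            if (auto reader = it->second.lock()) return reader; // still alive: reuse
            _cache.erase(it);                                   // expired entry
        }
        auto reader = std::make_shared<SegmentFileReader>(path);
        // capacity mirrors file_cache_max_file_reader_cache_size
        if (_cache.size() < _capacity) _cache.emplace(path, reader);
        return reader;
    }

private:
    size_t _capacity;
    std::map<std::string, std::weak_ptr<SegmentFileReader>> _cache;
};
```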
Arrow does not support null elements in a map's key column, but the Doris map key column is nullable by default. So we need to handle the case where a Doris map row contains a null key: for such a row we write null to Arrow.
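A minimal sketch of the rule, with hypothetical types and the Arrow builder calls omitted: a row whose key column contains a null element is emitted as a null map row.

```
#include <optional>
#include <string>
#include <vector>

struct DorisMapRow {
    std::vector<std::optional<std::string>> keys;   // nullable by default in Doris
    std::vector<std::optional<std::string>> values;
};

// std::nullopt signals "append a null map row to the Arrow result".
std::optional<DorisMapRow> to_arrow_row(const DorisMapRow& row) {
    for (const auto& k : row.keys) {
        if (!k.has_value()) return std::nullopt; // null key -> whole row becomes null
    }
    return row; // all keys non-null: the row can be appended as a normal Arrow map
}
```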
Issue Number: close #xxx
When calculating an array hash, the element size does not need to be folded into the hash seed:
```
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
```
But we need to be careful about `[[], [1]]` vs `[[1], []]`: when an array nests another array and the nested array is empty, we should still mix something into the hash seed so the two cases hash differently (see the sketch after this list).
2. Use a range for one hash value to avoid a virtual function call per element in the loop, which doubles the performance. Measured in a unit test with a column of `array[int64]`, 50 rows, each array holding 100,000 elements.
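A minimal sketch of the two points, using a stand-in hash instead of `HashUtil::zlib_crc_hash` (hypothetical helpers, not the Doris `ColumnArray` code): element data is hashed range by range, and only an empty nested array mixes its size into the seed.

```
#include <cstdint>
#include <vector>

// Stand-in for HashUtil::zlib_crc_hash: any seeded hash works for the sketch.
inline uint32_t crc_hash(const void* data, size_t len, uint32_t seed) {
    const auto* p = static_cast<const uint8_t*>(data);
    uint32_t h = seed;
    for (size_t i = 0; i < len; ++i) h = h * 131 + p[i];
    return h;
}

// Hash of one outer-array row, where each inner vector is a nested array.
uint32_t hash_nested_array_row(const std::vector<std::vector<int64_t>>& row, uint32_t seed) {
    uint32_t h = seed;
    for (const auto& nested : row) {
        if (nested.empty()) {
            // Empty nested array: mix its size (0) into the seed, otherwise it
            // would contribute nothing and [[], [1]] would equal [[1], []].
            uint64_t size = 0;
            h = crc_hash(&size, sizeof(size), h);
        } else {
            // Hash the whole element range in one call instead of issuing one
            // virtual call per element.
            h = crc_hash(nested.data(), nested.size() * sizeof(int64_t), h);
        }
    }
    return h;
}
```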
Here we calculate the delete bitmaps of all rowsets that are committed but not yet published, to reduce the calculation pressure of the publish phase (sketched after the steps).
Step 1: collect all of this tablet's committed rowsets' delete bitmaps.
Step 2: calculate the delete bitmaps of all rowsets that are published during compaction.
Step 3: write back the updated delete bitmap and tablet info.
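A skeletal sketch of the three steps, with hypothetical type and function names (not the Doris compaction code); persistence and the actual bitmap arithmetic are omitted.

```
#include <cstdint>
#include <map>
#include <set>

using RowsetId = int64_t;
struct DeleteBitmap { std::set<uint32_t> deleted_rows; };

struct TabletUpdateContext {
    std::map<RowsetId, DeleteBitmap> committed_bitmaps; // committed but not yet published
    std::map<RowsetId, DeleteBitmap> published_bitmaps; // published while compaction ran

    // Hypothetical: remap a bitmap onto the compaction output rowset.
    DeleteBitmap remap_to_output(const DeleteBitmap& bm) const { return bm; }

    // Hypothetical persistence of the merged bitmaps plus tablet meta.
    void save(const std::map<RowsetId, DeleteBitmap>& merged) { (void)merged; }

    void update_after_compaction() {
        // Step 1: collect all committed rowsets' delete bitmaps of this tablet.
        std::map<RowsetId, DeleteBitmap> merged = committed_bitmaps;
        // Step 2: also recalculate bitmaps of rowsets published during compaction.
        for (const auto& [rid, bm] : published_bitmaps) {
            merged[rid] = remap_to_output(bm);
        }
        // Step 3: write back the updated delete bitmaps and tablet info.
        save(merged);
    }
};
```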
Fix problem:
For the same column, there can be a concurrent drop index request and a build index request. If the build index request obtains the lock before the drop index request, it builds a new index file; but when the drop index request executes, the linked files do not include all index files for the column, so the new index file is lost.
Based on the above issue, use the index id instead of the column unique id to determine whether a hard link is required when doing build index (see the sketch below).
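A minimal sketch of the changed decision, with hypothetical names (not the Doris build-index code): the hard-link check is keyed by index id, so dropping one index on a column no longer hides a freshly built index file for another index on the same column.

```
#include <cstdint>
#include <set>

struct IndexFile {
    int64_t index_id;
    int64_t column_unique_id;
};

// alive_index_ids: indexes that still exist in the tablet schema when linking runs.
bool need_hard_link(const IndexFile& f, const std::set<int64_t>& alive_index_ids) {
    // Old behavior (buggy): keyed by column unique id, so dropping one index on
    // the column made every index file of that column look droppable:
    //   return alive_column_uids.count(f.column_unique_id) > 0;
    return alive_index_ids.count(f.index_id) > 0; // new behavior: decided per index id
}
```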
Refactor the interface of `create_file_reader`:
`file_size` and `mtime` are merged into `FileDescription` and are no longer in `FileReaderOptions`.
Now the file handle cache can get the correct file modification time from `FileDescription`.
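A minimal sketch of the refactored call shape, with hypothetical signatures (not the exact Doris API): `file_size` and `mtime` travel in `FileDescription`, so the file handle cache can build its {path, mtime} key without touching `FileReaderOptions`.

```
#include <cstdint>
#include <memory>
#include <string>

struct FileDescription {
    std::string path;
    int64_t file_size = -1; // -1 means "ask the filesystem"
    int64_t mtime = 0;      // modification time, part of the handle cache key
};

struct FileReaderOptions {  // now carries only reader behavior, not file metadata
    bool enable_cache = true;
};

struct FileReader { FileDescription desc; };

std::shared_ptr<FileReader> create_file_reader(const FileDescription& desc,
                                               const FileReaderOptions& opts) {
    (void)opts;
    return std::make_shared<FileReader>(FileReader{desc});
}
```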
Add HdfsIO for the hdfs file reader.
Picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
We do not implement any hash functions for array/map/struct columns, so a SQL statement like the following makes the BE core dump:
```
select * from (
    select
        bdp.nc_num,
        collect_list(distinct(bd.catalog_name)) as catalog_name,
        material_qty
    from
        dataease.bu_delivery_product bdp
        left join dataease.bu_trans_transfer btt on bdp.delivery_product_id = btt.delivery_product_id
        left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
    where
        bd.val_status in ('10', '20', '30', '90')
        and bd.delivery_type in (0, 1, 2)
    group by
        nc_num,
        material_qty
    union all
    select
        bdp.nc_num,
        collect_list(distinct(bd.catalog_name)) as catalog_name,
        material_qty
    from
        dataease.bu_trans_transfer btt
        left join dataease.bu_delivery_product bdp on bdp.delivery_product_id = btt.delivery_product_id
        left join dataease.bu_delivery bd on bdp.delivery_id = bd.delivery_id
    where
        bd.val_status in ('10', '20', '30', '90')
        and bd.delivery_type in (0, 1, 2)
    group by
        nc_num,
        material_qty
) aa;
```
core :
1. Add an hdfs file handle cache for the hdfs file reader
Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h` (thanks to the Impala team).
This is an LRU cache that can store multiple entries with the same key.
The key is built from {file name + modification time}.
The value is the hdfsFile pointer that points to a certain hdfs file.
This cache avoids reopening the same hdfs file multiple times, which saves query time.
Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number of cached file handles, default 20000.
2. Add a file meta cache
The file meta cache is an LRU cache. The key is {file name + modification time}, and the value is the parsed file meta info of that file, which saves the time of re-parsing the file meta every time.
Currently it is only used for caching the parquet file footer.
Tests show that when the cache is hit, `FileOpenTime` and `ParseFooterTime` drop to almost 0 in the query profile, which saves time when there are many files to read.
A sketch of both caches follows below.
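A minimal sketch of both caches, with hypothetical types (the real handle cache is adapted from Impala's lru-multi-cache and stores libhdfs handles): both are keyed by {file name, modification time}; the handle cache may hold several open handles per key, the meta cache one parsed footer per key. LRU eviction is omitted for brevity.

```
#include <cstdint>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using CacheKey = std::pair<std::string, int64_t>;   // {file name, modification time}

struct HdfsHandle { std::string path; };            // stands in for an hdfsFile pointer
struct ParquetFooter { std::vector<char> bytes; };  // stands in for the parsed footer

// 1. Multi-entry handle cache: several open handles may be cached under one
//    key, so concurrent scanners of the same file do not each reopen it.
class HdfsHandleMultiCache {
public:
    // capacity mirrors max_hdfs_file_handle_cache_num
    explicit HdfsHandleMultiCache(size_t capacity) : _capacity(capacity) {}

    std::shared_ptr<HdfsHandle> get_or_open(const std::string& path, int64_t mtime) {
        auto& handles = _cache[{path, mtime}];
        if (!handles.empty()) {
            auto h = handles.front();   // cache hit: reuse an already-open handle
            handles.pop_front();
            --_size;
            return h;
        }
        return std::make_shared<HdfsHandle>(HdfsHandle{path}); // miss: would open the hdfs file here
    }

    void release(const std::string& path, int64_t mtime, std::shared_ptr<HdfsHandle> h) {
        if (_size >= _capacity) return;                // full: let the handle close instead of caching it
        _cache[{path, mtime}].push_back(std::move(h)); // same key may hold many handles
        ++_size;
    }

private:
    size_t _capacity;
    size_t _size = 0;
    std::map<CacheKey, std::list<std::shared_ptr<HdfsHandle>>> _cache;
};

// 2. File meta cache: one parsed footer per key, so the footer of a hot
//    parquet file is parsed only once across queries.
class FileMetaCache {
public:
    std::shared_ptr<ParquetFooter> get(const std::string& path, int64_t mtime) {
        auto it = _cache.find({path, mtime});
        return it == _cache.end() ? nullptr : it->second;
    }
    void put(const std::string& path, int64_t mtime, std::shared_ptr<ParquetFooter> footer) {
        _cache[{path, mtime}] = std::move(footer);
    }

private:
    std::map<CacheKey, std::shared_ptr<ParquetFooter>> _cache;
};
```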
For routine load (Kafka load), users can produce data for different tables into a single topic, and Doris will dispatch each record to the corresponding table.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>