doris

Author	SHA1	Message	Date
amory	268c867679	[Improve](serde) replace function_cast from_string with serde (#24087 ) Stream load previously could not handle columns whose type is a map/array nested inside a map/array. Serde can do this now, so from_string is replaced with it. Note: if an item in complex-type data is empty, an error is returned instead of a made-up default value, because a correct default cannot currently be defined for complex types.	2023-09-14 13:53:16 +08:00
Kaijie Chen	563c3f75ff	[feature](move-memtable) share delta writer v2 among sinks (#24066 )	2023-09-13 14:39:29 +08:00
zhiqqqq	c7ae2a7d22	[Refactor & Bugfix](static variables) move some static variables to exec_env (#24029 )	2023-09-13 09:27:03 +08:00
plat1ko	d8ef9dda59	[feature](merge-cloud) Rewrite FS interface (#23953 )	2023-09-12 19:20:25 +08:00
TengJianPing	4bb9a12038	[function](bitmap) support bitmap_remove (#24190 )	2023-09-12 14:52:04 +08:00
bobhan1	bdacefa734	[Fix](status)Fix leaky abstraction and shield the status code `END_OF_FILE` from upper layers (#24165 )	2023-09-12 11:10:52 +08:00
Yongqiang YANG	1228995dec	[improvement](segment) reduce memory footprint of column_reader and segment (#24140 )	2023-09-11 21:54:00 +08:00
zhangdong	dbb9365556	[Enhance](ip) optimize priority_network matching logic for be (#23795 ) If the user has configured a wrong priority_network, fail startup directly so that users do not mistakenly assume the configuration is correct. If the user has not configured priority_network, select only the first IP from the IPv4 list, rather than selecting from all IPs, to avoid users' servers not supporting IPv4. Extends #23784	2023-09-11 18:32:31 +08:00
bobhan1	a0fcc30764	[Fix](Status) Handle status code correctly and add a new error code `ENTRY_NOT_FOUND` (#24139 )	2023-09-11 09:32:11 +08:00
daidai	f9a75b5c4f	[feature](csv_serde) 1. add csv serde for serializing to csv and deserializing from csv. 2. let csvReader use csv serde instead of text_converter. (#23352 )	2023-09-10 00:16:21 +08:00
Jack Drogon	537369f4e2	[Fix](http) Fix curl return HTTP_ERROR && Add not_found HttpClientTest, fix (#23984 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-09-07 10:10:51 +08:00
TengJianPing	2f8b075b71	[improvement](bitmap) support version for ser/deser of bitmap (#23959 )	2023-09-07 09:55:29 +08:00
Ashin Gau	d183e08f6d	[opt](MergedIO) optimize merge small IO, prevent amplified read (#23849 ) There were two flaws in the previous fix (https://github.com/apache/doris/pull/20305): 1. `next_content` may not necessarily be a truly readable range. 2. The last range of the merged data may be a hollow. This PR fundamentally solves the read amplification problem by rechecking the calculated range. According to the algorithm, read amplification can now arise in only one case: when there is only a small amount of data within the 4k (`MIN_READ_SIZE`) range. However, 4k is generally the minimum IO size, so there is no need for further segmentation.	2023-09-06 22:45:31 +08:00
bobhan1	95ae5376f3	[Fix](BinaryPrefixPage) stop to read values when current pos reached the end of the page in `BinaryPrefixPageDecoder::next_batch` (#23855 )	2023-09-06 16:34:38 +08:00
Pxl	a96adc01aa	[Chore](function) refactor of quantile_state (#23862 )	2023-09-06 15:39:19 +08:00
plat1ko	09bcedb116	[feature](merge-cloud) Remove deprecated old cache (#23881 )	2023-09-06 08:07:05 +08:00
Kaijie Chen	a542f107db	[feature](move-memtable) buffer messages in load stream stub (#23721 )	2023-09-02 13:42:34 +08:00
TengJianPing	75e2bc8a25	[function](bitmap) support bitmap_to_base64 and bitmap_from_base64 (#23759 )	2023-09-02 00:58:48 +08:00
Ashin Gau	eaf2a6a80e	[fix](date) return right date value even if out of the range of date dictionary(#23664 ) PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, hive supports dates outside 1970~2038, which led to wrong date values in the tpcds benchmark. How to fix: 1. Increase the dictionary range to 1900 ~ 2038. 2. Dates outside 1900 ~ 2038 are regenerated.	2023-09-01 14:40:20 +08:00
plat1ko	25b6e4deb2	[fix](daemon) Fix incorrect initialization order of daemon services (#23578 ) Current initialization dependency: Daemon and BackendService both depend on StorageEngine, which depends on ExecEnv, which in turn depends on Disk/Mem/CpuInfo. However, the original code incorrectly initializes Daemon before StorageEngine. This PR also stops and joins the threads of daemon services in their dtor, to ensure that Daemon services release resources in reverse order of initialization via RAII.	2023-08-31 19:46:38 +08:00
Kaijie Chen	b3a9c247af	[refactor](move-memtable) add load stream stub (#23642 )	2023-08-31 19:39:34 +08:00
Xinyi Zou	f1e43fcaa4	[opt](cache) Support segment cache dynamic opening and closing (#23659 ) The cache is cleared by modifying the config dynamically; each time the cache is disabled, it is cleared only once. TODO: support page cache and other caches. Example: curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true	2023-08-31 18:48:26 +08:00
TengJianPing	62c075bf7e	[improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672 )	2023-08-31 14:44:17 +08:00
zy-kkk	3e4ee3c1e6	[fix](jdbc catalog) fix jdbc driver cache load error (#23656 ) log error: `W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar` When the jdbc driver had `.` in its name we failed to split it properly	2023-08-31 10:17:15 +08:00
Kaijie Chen	14310ad30b	[improvement](move-memtable) wait StreamClose from remote (#23605 ) * [fix](move-memtable) wait StreamClose from remote	2023-08-30 18:03:36 +08:00
Siyang Tang	1ac0ff0ea9	[feature](delete-predicate) support delete sub predicate v2 (#22442 ) New structure for delete sub predicate. A delete sub predicate is currently stored temporarily in a string-typed condition_str, and its fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (a bug in libc). We now use a new PB structure to hold the delete sub predicate, to avoid that problem. message DeleteSubPredicatePB { optional int32 column_unique_id = 1; optional string column_name = 2; optional string op = 3; optional string cond_value = 4; } Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. For old rowset meta whose delete predicates have sub predicate v1, an attempt is made to convert them to v2 when they are read from PB. Moreover, efforts will be made to rewrite these meta with the new delete sub predicate. This also prepares for using the column unique id to specify a column globally. Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will carry the column unique id.	2023-08-29 19:37:23 +08:00
zhangstar333	94a8fa6bc9	[bug](function) fix explode_number function return wrong rows (#23603 ) Previously, the result of the explode_number function was random with a const value: because _cur_size was reset, values could not be inserted into the column.	2023-08-29 19:02:49 +08:00
abmdocrt	da9eb79ac4	[Enhancement](Schema hash) Remove schema hash in tablet info (#23516 )	2023-08-29 10:05:12 +08:00
Yongqiang YANG	9c65b7ab96	[improvement](column_reader) move load once to index reader to reduce memory footprint of column reader (#23537 )	2023-08-29 09:34:27 +08:00
Jerry Hu	5be8d57f52	[fix](be-ut) fix ColumnFixedLenghtObjectTest on 32 bits system (#23519 )	2023-08-28 14:02:05 +08:00
Adonis Ling	e0bf621fe0	[chore](build) Fix compilation errors for BE UT (#23535 ) Issue Number: close #23536 This issue was introduced by #23414 .	2023-08-27 11:52:13 +08:00
Jerry Hu	f80b067990	[fix](column) add unimplemented function of ColumnFixedLengthObject (#23468 )	2023-08-25 17:38:01 +08:00
Kaijie Chen	d8e499cb55	[fix](UT) fix flaky test in LoadStreamMgrTest (#23459 )	2023-08-25 13:53:20 +08:00
zclllyybb	9cacf9535a	[Opt](functions) Use preloaded cache to accelerate timezone parsing (#22694 ) * opt * bugfix * fix ut * fix stylecheck	2023-08-25 10:00:48 +08:00
Kaijie Chen	71071ba057	[feature](move-memtable)[4/7] add stream sink file writer (#23416 ) Co-authored-by: laihui <1353307710@qq.com>	2023-08-25 00:08:27 +08:00
Kaijie Chen	98d0a2f6c1	[feature](move-memtable)[3/7] add load stream manager and rpc service (#23415 ) Co-authored-by: zhengyu <freeman.zhang1992@gmail.com> Co-authored-by: Yongqiang YANG <dataroaring@gmail.com> Co-authored-by: laihui <1353307710@qq.com>	2023-08-25 00:08:04 +08:00
zclllyybb	51ac92f65c	Revert "[fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236 )" (#23368 ) This reverts commit 1c3cc77a54938ed948ad8186b8dea8385977d23c.	2023-08-23 18:27:35 +08:00
Pxl	8ed4045df9	[Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842 )	2023-08-23 08:37:40 +08:00
yiguolei	bcdb481374	[refactor](fragment) refactor non pipeline fragment executor (#23281 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-08-22 16:00:34 +08:00
Ashin Gau	5ff7b57fc1	[fix](parquet) parquet reader confuses logical/physical/slot id of columns (#23198 ) `ParquetReader` confuses the logical/physical/slot id of columns. When reading only scalar types nothing goes wrong, but when reading complex types, `RowGroup` and `PageIndex` get the wrong statistics. Therefore, if the query contains complex types and pushed-down predicates, the result set is likely to be incorrect.	2023-08-22 13:35:29 +08:00
Kaijie Chen	0d7a61ae8c	[fix](load) fix duplicate register of memtable writer in memory limiter (#23205 )	2023-08-22 10:05:17 +08:00
Gabriel	12075f9853	[pipelineX](projection) Support projection and blocking agg (#23256 )	2023-08-21 22:23:02 +08:00
amory	33dfa0c454	[Improve](serde) support text serde for nested type-array/map (#22738 ) Nested array/map types are not currently supported, so this PR aims to: 1. add a format option for converting a string to the defined data type, to stay consistent with the original from_string; 2. support arrays and maps nested inside arrays and maps.	2023-08-21 10:32:28 +08:00
ZenoYang	1c3cc77a54	[fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236 ) * [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty * add ut * fix nereids * fix regression-test	2023-08-18 14:37:49 +08:00
Mingyu Chen	330f369764	[enhancement](file-cache) limit the file cache handle num and init the file cache concurrently (#22919 ) 1. The real value of the BE config `file_cache_max_file_reader_cache_size` will be 1/3 of the process's max open file number. 2. Use a thread pool to create or init the file cache concurrently. This solves the issue that BE starts very slowly when there are lots of files in the file cache dir, because it traverses all file cache dirs sequentially.	2023-08-17 16:52:08 +08:00
Kaijie Chen	6cf1efc997	[refactor](load) use smart pointers to manage writers in memtable memory limiter (#23019 )	2023-08-16 16:34:57 +08:00
bobhan1	4510e16845	[improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933 ) Previously, delete statements with conditions on value columns were only supported on duplicate tables. After the delete sign mechanism was introduced for batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique tables with merge-on-write enabled, the overhead of inserting this data can be eliminated. So this PR adds the ability to use delete predicates on value columns for merge-on-write unique tables.	2023-08-16 12:18:05 +08:00
flynn	4e880288c6	[refactor]use clear concept to replace std::enable_if_t (#22801 ) Signed-off-by: flynn <fenglv15@mails.ucas.ac.cn>	2023-08-12 15:10:30 +08:00
yujun	b9b9071c9b	[improvement](create partition) create partition require quorum replicas succ (#22554 )	2023-08-11 11:59:05 +08:00
Chuanle Chen	71807ceb5f	[Enhancement](tvf) Table value function support reading local file (#17404 ) I tested the local tvf with tpch queries. First, generate a `lineitem` dataset with 6001215 rows, and load it into the `lineitem` table by: ``` insert into lineitem select c11, c1, c4, c2, c3, c5, c6, c7, c8, c9, c10, c12, c13, c14, c15, c16 from local( "file_path" = "tools/tpch-tools/bin/tpch-data/lineitem.tbl.1", "backend_id" = "10003", "format" = "csv", "column_separator" = "\|" ); ``` Then, run the `q1` and `q16` tpch queries; the query results are correct. The local tvf can also be used to analyze the BE's log directly, like: ``` mysql> select * from local( "file_path" = "log/be.out", "backend_id" = "10006", "format" = "csv") where c1 like "%start_time%" limit 10; +--------------------------------------------------------+ \| c1 \| +--------------------------------------------------------+ \| start time: 2023年 08月 07日星期一 23:20:32 CST \| \| start time: 2023年 08月 07日星期一 23:32:10 CST \| \| start time: 2023年 08月 08日星期二 00:20:50 CST \| \| start time: 2023年 08月 08日星期二 00:29:15 CST \| +--------------------------------------------------------+ ```	2023-08-10 20:07:42 +08:00

1195 Commits