## Proposed changes
This PR enables `delete sub predicate v2` for compaction; legacy versions of the delete predicate will still be processed in the original way.
New structure for the delete sub predicate.
A delete sub predicate is currently stored temporarily as a string-typed condition_str, and its fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (a libc bug).
We now use a new PB structure to hold the delete sub predicate, avoiding that problem:
```protobuf
message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
```
Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. Old rowset metas whose delete predicates carry sub predicate v1 will be converted to v2 when read from PB; moreover, efforts will be made to rewrite these metas with the new delete sub predicate.
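Converting the v1 string form to the v2 PB form means parsing condition_str without std::regex, to avoid the stack-overflow issue above. Below is a minimal sketch of such a parser; the struct, function name, and operator list are illustrative assumptions, not the actual Doris code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Illustrative stand-in for the fields of DeleteSubPredicatePB.
struct DeleteSubPredicate {
    std::string column_name;
    std::string op;
    std::string cond_value;
};

// Hypothetical regex-free parser for a v1 sub predicate string such as
// "k1='10'". Multi-character operators are tried before single-character
// ones so that e.g. ">=" is not mistaken for "=".
std::optional<DeleteSubPredicate> parse_sub_predicate_v1(const std::string& cond_str) {
    static const std::vector<std::string> ops = {">=", "<=", "!=", "=", ">", "<"};
    for (const auto& op : ops) {
        size_t pos = cond_str.find(op);
        if (pos == std::string::npos) continue;
        DeleteSubPredicate pred;
        pred.column_name = cond_str.substr(0, pos);
        pred.op = op;
        std::string value = cond_str.substr(pos + op.size());
        // Strip the quotes that the v1 string form keeps around the value.
        if (value.size() >= 2 && value.front() == '\'' && value.back() == '\'') {
            value = value.substr(1, value.size() - 2);
        }
        pred.cond_value = std::move(value);
        return pred;
    }
    return std::nullopt;
}
```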
Make preparations to use the column unique id to identify a column globally.
Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will carry the column unique id.
Refactored the interface of the parse_to_predicate function; we will use this interface to support the decomposition of variant columns.
The "column" parameter might represent a column resulting from the decomposition of a variant column. Instead of a "unique_id", we use a "path" to identify such a column.
Previously, delete statements with conditions on value columns were only supported on duplicate tables. After the delete sign mechanism was introduced for batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique tables with merge-on-write enabled, the overhead of inserting these rows can be eliminated. So this PR adds the ability to apply delete predicates on value columns for merge-on-write unique tables.
Currently, there are some useless includes in the codebase. We can use the tool include-what-you-use to optimize these includes; by enforcing a strict include-what-you-use policy, we gain many benefits, such as faster builds and clearer dependencies.
Arena can replace MemPool in most scenarios, except for memory reuse: MemPool supports reusing previous memory chunks after clear(), but Arena does not.
Some comparisons between MemPool and Arena:
1. Expansion
Arena: below 128M, chunks grow exponentially (each new chunk doubles the previous one); for a request larger than 128M, it allocates 128M * n, where n is the minimum value such that 128M * n > `size`.
MemPool: below 512K, chunks also grow exponentially; a request larger than 512K gets a dedicated chunk of exactly `size`.
Once Arena has allocated a chunk larger than 128M, every subsequent chunk is at least 128M. Does this seem to be a waste of memory? MemPool is similar: once a 512K chunk has been allocated, subsequent chunks are at least 512K. (See the chunk-growth sketch after this list.)
2. Alignment
MemPool defaults to 16-byte alignment, because the memtable and other places that use int128 require it.
Arena has no default alignment.
3. Memory reuse
Arena only supports `rollback`, which reuses memory within the current chunk, usually the most recently requested memory.
MemPool supports clear(), after which all chunks can be reused; it can also call ReturnPartialAllocation() to roll back the last requested memory, and if the last chunk has no free memory, it searches for the chunk with the most free space for the next allocation.
4. Realloc
Arena supports reallocating contiguous memory; it also supports extending contiguous memory from any position of the last allocation. The differences between `alloc_continue` and `realloc` are (see the toy example after this list):
1. alloc_continue does not need the old size to be specified; by default, old size = head->pos - range_start.
2. alloc_continue extends the range in place from range_start when the additional_bytes still fit between range_start and pos, which effectively reuses part of the memory, while realloc always allocates completely new memory.
MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools.
5. Memory limit check
MemPool checks the memory limit itself, while Arena checks it at the Allocator layer.
6. Support for ASan
Arena does some extra work for ASan.
7. Error handling
MemPool reports allocation failure directly through `Status`, while Arena throws an Exception.
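The growth policies in item 1 can be summarized in a few lines. This is a simplified sketch of the behavior described above, not the actual Arena/MemPool implementation; constants and function names are illustrative.

```cpp
#include <cstddef>
#include <iostream>

constexpr size_t kArenaLinearThreshold = 128UL << 20;  // 128M
constexpr size_t kMemPoolMaxChunkSize = 512UL << 10;   // 512K

// Arena: below 128M, double the previous chunk until the request fits; above
// 128M, allocate the smallest multiple of 128M that covers the request.
// Assumes prev_chunk > 0.
size_t arena_next_chunk(size_t prev_chunk, size_t size) {
    if (size <= kArenaLinearThreshold) {
        size_t next = prev_chunk * 2;
        while (next < size) next *= 2;
        return next;
    }
    size_t n = (size + kArenaLinearThreshold - 1) / kArenaLinearThreshold;
    return n * kArenaLinearThreshold;
}

// MemPool: double the previous chunk up to 512K; a request above 512K gets a
// dedicated chunk of exactly `size`.
size_t mempool_next_chunk(size_t prev_chunk, size_t size) {
    if (size > kMemPoolMaxChunkSize) return size;
    size_t next = prev_chunk * 2;
    if (next > kMemPoolMaxChunkSize) next = kMemPoolMaxChunkSize;
    return next < size ? size : next;
}

int main() {
    std::cout << arena_next_chunk(4096, 1000) << "\n";         // 8192: doubled
    std::cout << arena_next_chunk(4096, 300UL << 20) << "\n";  // 3 * 128M
    std::cout << mempool_next_chunk(4096, 1UL << 20) << "\n";  // 1M: dedicated chunk
}
```

And a toy model of the difference in item 4 between `alloc_continue` and `realloc`, again a sketch under simplifying assumptions (a single fixed chunk, no overflow handling):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct ToyChunk {
    char buf[1024];
    char* pos = buf;  // bump pointer: next free byte
};

// realloc: the old size must be passed in, and the data is always copied to a
// freshly allocated region.
char* toy_realloc(ToyChunk& c, const char* old_data, size_t old_size, size_t new_size) {
    char* res = c.pos;
    c.pos += new_size;
    std::memcpy(res, old_data, old_size);
    return res;
}

// alloc_continue: the old size is implicit (pos - range_start); a range that
// ends at pos is grown in place instead of being copied.
char* toy_alloc_continue(ToyChunk& c, size_t additional_bytes, const char*& range_start) {
    if (range_start == nullptr) range_start = c.pos;  // start a new range
    c.pos += additional_bytes;                        // grow the current range in place
    return const_cast<char*>(range_start);
}

int main() {
    ToyChunk c;
    const char* range = nullptr;
    char* p = toy_alloc_continue(c, 16, range);  // a 16-byte range
    char* q = toy_alloc_continue(c, 16, range);  // grown to 32 bytes, no copy
    assert(p == q);                              // same start: memory was reused

    char* r = toy_realloc(c, p, 32, 64);         // realloc: new region plus a copy
    assert(r != p);
    (void)r;
}
```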
Possible improvements to consider for Arena:
1. After the last allocated chunk is larger than 128M, the minimum size of subsequent chunks is 128M, which seems to waste memory.
2. Support clear() for memory reuse.
3. Add a large-allocation list: allocate memory larger than 128M in a chunk whose size equals `size` exactly, to avoid leaving the current chunk partially unused, which is wasteful.
4. In some cases, it may be possible to allocate backwards to find chunks that still have free space.
* no need to call the delete handler to filter rows, since they are filtered in the rowset reader
* no need to call delete eval in schema change; remove the related code
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, there are 2 status codes in BE: one is common/Status.h,
and the other, in olap/olap_define.h, is called OLAPStatus.
OLAPStatus is just an enum type; it is very simple and cannot carry much information.
I will unify these codes to common/Status.
1. add a config string_type_soft_limit to soft-limit the max length of the string type
2. disable using the String type in key columns, partition columns, and distribution columns
3. remove the String type alias BLOB for future use
1. Considering the responsibility of Reader, rename Reader to TabletReader. The new name represents its function exactly; it is more suitable and meaningful.
2. Add the virtual keyword to the destructor of OlapScanner, because VOlapScanner derives from it.
3. Refactor struct ReaderParams and KeysParam as TabletReader's inner structs, guarded by the TabletReader name scope; this is also more reasonable.
4. Reduce the amount of OlapScanner's member data; just using _parent->member_data is simpler.
5. Bugfix: TupleReader had a member _collect_iter with the same name as one in its parent class Reader. Such shadowing is dangerous and easy to get wrong, so TupleReader::_collect_iter is deleted to fix it (see the sketch after this list).
6. Call set_tablet_reader() in OlapScanner::prepare() to set up _tablet_reader; VOlapScanner overrides set_tablet_reader to new a BlockReader instead. This avoids newing a Reader twice through resetting the unique_ptr _tablet_reader.
7. If member data is an inseparable part of a class, I suggest using a normal variable rather than a pointer, because a pointer adds an indirection layer and requires careful handling of copying and destruction, which is unnecessary here.
8. Some other small changes for readability or design.
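A minimal illustration of the shadowing pitfall in item 5 (names are illustrative, not the actual Doris classes): the derived class re-declares a field with the same name, so base and derived code silently operate on two different objects.

```cpp
#include <iostream>

struct Reader {
    int _collect_iter = 0;
    void base_advance() { ++_collect_iter; }  // touches Reader::_collect_iter
};

struct TupleReader : Reader {
    int _collect_iter = 0;                       // shadows the base member!
    void derived_advance() { ++_collect_iter; }  // touches TupleReader::_collect_iter
};

int main() {
    TupleReader r;
    r.base_advance();
    r.derived_advance();
    // Two independent counters, each incremented once: prints "1 1".
    std::cout << r.Reader::_collect_iter << " " << r._collect_iter << "\n";
}
```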
1. inserting a very large string value may core dump
2. some analytic function and aggregate function results may be incorrect
3. string comparison may core dump when the string type is too large
4. string types in delete conditions cannot be processed correctly
5. add text/blob as aliases of string to be compatible with MySQL
6. fix string type min/max aggregation that may be processed incorrectly
1. Add BlockColumnPredicate to support OR and AND column predicates in RowBlockV2 (see the sketch after this list)
2. Support evaluating vectorized delete predicates in the storage engine (SegmentV2) instead of in the Reader
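A hedged sketch of the composite-predicate idea in item 1: a block-level predicate is either a single column predicate or an AND/OR combination of children, evaluated against a whole block of rows. Everything here (the toy Block, class shapes, method names) is illustrative, not the actual Doris code.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <memory>
#include <vector>

// Toy "block": a set of integer columns (stand-in for RowBlockV2).
using Block = std::vector<std::vector<int>>;

struct BlockColumnPredicate {
    virtual ~BlockColumnPredicate() = default;
    // Narrows `selected`: rows that fail the predicate are cleared.
    virtual void evaluate(const Block& block, std::vector<bool>& selected) const = 0;
};

// Leaf: tests one column against a condition.
struct ColumnPredicate : BlockColumnPredicate {
    size_t column;
    std::function<bool(int)> cond;
    ColumnPredicate(size_t c, std::function<bool(int)> f) : column(c), cond(std::move(f)) {}
    void evaluate(const Block& block, std::vector<bool>& selected) const override {
        for (size_t i = 0; i < selected.size(); ++i) {
            if (selected[i] && !cond(block[column][i])) selected[i] = false;
        }
    }
};

struct AndBlockColumnPredicate : BlockColumnPredicate {
    std::vector<std::unique_ptr<BlockColumnPredicate>> children;
    void evaluate(const Block& block, std::vector<bool>& selected) const override {
        for (const auto& c : children) c->evaluate(block, selected);  // each child narrows further
    }
};

struct OrBlockColumnPredicate : BlockColumnPredicate {
    std::vector<std::unique_ptr<BlockColumnPredicate>> children;
    void evaluate(const Block& block, std::vector<bool>& selected) const override {
        std::vector<bool> acc(selected.size(), false);
        for (const auto& c : children) {
            std::vector<bool> tmp = selected;  // each child starts from the incoming selection
            c->evaluate(block, tmp);
            for (size_t i = 0; i < acc.size(); ++i) acc[i] = acc[i] || tmp[i];
        }
        selected = acc;
    }
};

int main() {
    Block block = {{1, 2, 3, 4}, {10, 20, 30, 40}};  // two columns, four rows
    auto pred = std::make_unique<OrBlockColumnPredicate>();
    pred->children.push_back(std::make_unique<ColumnPredicate>(0, [](int v) { return v < 2; }));
    pred->children.push_back(std::make_unique<ColumnPredicate>(1, [](int v) { return v > 30; }));

    std::vector<bool> selected(4, true);
    pred->evaluate(block, selected);  // OR of the two conditions: rows 0 and 3 survive
    for (size_t i = 0; i < selected.size(); ++i) {
        std::cout << "row " << i << ": " << (selected[i] ? "kept" : "deleted") << "\n";
    }
}
```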
At present, the use of VLOG in the code is quite confusing: the VLOG_XX format is inherited from Impala, and there is also the VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG.
* [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactoring:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if...else... for DeleteConditionHandler::is_condition_value_valid (see the sketch after this list)
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests
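A hedged sketch of the switch...case style mentioned for DeleteConditionHandler::is_condition_value_valid; the enum, the helper, and the validation rules are illustrative assumptions, not the actual Doris code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class FieldType { TINYINT, INT, BIGINT, CHAR, VARCHAR, DATE };

// Toy helper: an optional leading '-' followed by digits only.
static bool valid_signed_number(const std::string& v) {
    size_t i = (!v.empty() && v[0] == '-') ? 1 : 0;
    if (i == v.size()) return false;
    for (; i < v.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(v[i]))) return false;
    }
    return true;
}

bool is_condition_value_valid(FieldType type, const std::string& value) {
    switch (type) {
    case FieldType::TINYINT:
    case FieldType::INT:
    case FieldType::BIGINT:
        return valid_signed_number(value);  // one branch covers all integer types
    case FieldType::CHAR:
    case FieldType::VARCHAR:
        return true;                        // any string is acceptable here
    case FieldType::DATE:
        return value.size() == 10;          // toy rule, e.g. "2020-01-01"
    }
    return false;
}

int main() {
    // Exits 0 on success: "-123" is a valid INT condition value, "12a" is not.
    return is_condition_value_valid(FieldType::INT, "-123") &&
                   !is_condition_value_valid(FieldType::INT, "12a")
               ? 0
               : 1;
}
```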