doris

Author	SHA1	Message	Date
bobhan1	64b69ed1ba	[branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487 " (#38682 ) ## Proposed changes picks https://github.com/apache/doris/pull/38487	2024-08-02 20:05:31 +08:00
Kaijie Chen	0152a4e86f	[config](be) add be config migration_lock_timeout_ms (#38000 ) (#38337 ) backport #38000	2024-07-25 17:36:34 +08:00
lihangyu	217eac790b	[pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526 )	2024-07-11 21:25:34 +08:00
zhannngchen	5541fd11e9	[branch-2.1](partial update)add logs for partial update (#35416 ) add logs for partial update the master PR is #35802	2024-06-04 22:47:48 +08:00
Xinyi Zou	2ed6a00fd1	[opt](memory) Add GlobalMemoryArbitrator and support ReserveMemory (#34985 ) (#35070 )	2024-05-22 09:53:45 +08:00
abmdocrt	42425808a1	[Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788 " (#35056 ) * [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788) * Problem: For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas. Cause: Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE. Solution: Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs. * [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)	2024-05-20 15:43:46 +08:00
abmdocrt	b15fc2a906	[Cherry-pick](branch-2.1) Pick #34043 and #34112 (#34318 ) * [Enhancement](full compaction) Add run status support for full compaction (#34043) * The usage is `curl http://{ip}:{port}/api/compaction/run_status?tablet_id={tablet_id}` e.g. `curl http://127.0.0.1:8040/api/compaction/run_status?tablet_id=10084` If full compaction is running, the output will be ``` { "status" : "Success", "run_status" : true, "msg" : "compaction task for this tablet is running", "tablet_id" : 10084, "compact_type" : "full" } ``` else the output will be ``` { "status" : "Success", "run_status" : false, "msg" : "compaction task for this tablet is not running", "tablet_id" : 10084, "compact_type" : "full" } ``` * [Fix](partial update) Fix rowset not found error when doing partial update (#34112) Cause: In the logic of partial column updates, the existing data columns are read first, and then the data is supplemented and written back. During the reading process, initialization involves initially fetching rowset IDs, and the actual rowset object is fetched only when needed later. However, between fetching the rowset IDs and the rowset object, compaction may occur, turning the old rowset into a stale rowset. If too much time passes, the stale rowset might be directly deleted. Thus, when the rowset object is needed for an update, it cannot be found. Although the update operation with partial column logic should be able to read all keys and should not encounter new keys, if the rowset disappears, the Backend (BE) will consider these keys as missing. Consequently, it will check whether other columns have default values or are nullable. If this check fails, the aforementioned error is thrown. Solution: To avoid such issues during partial column updates, the initialization step should involve fetching both the rowset IDs and the shared pointer to the rowset object simultaneously. This ensures that the rowset can always be found during data retrieval.	2024-04-30 07:26:23 +08:00
abmdocrt	06a155abb0	[branch-2.1](cherry-pick) Pick some partial-update PR from master (#33639 ) * [Fix](partial-update) Fix partial update fail when the datetime default value is 'current_time' (#32926) * Problem: When importing data that includes datetime with a default value of current time for partial column updates, the import fails. Reason: Partial column updates do not handle the logic for datetime default values. Solution: During partial column updates, when the default value is set to current time, read the current time from the runtime state and write it into the data. * [Enhancement](partial update)Add timezone case for partial update timestamp #33177 * [fix](partial update) Support partial update when the date default value is 'current_date'. This PR is an extension of PR #32926. (#33394)	2024-04-17 23:42:12 +08:00
Kaijie Chen	1da1fac4ee	[improve](load) try lock 30ms to get base_migration_lock in rowset builder (#32243 )	2024-04-12 15:09:25 +08:00
lihangyu	fdc19b4892	[Fix](Variant) Initialize original_tablet_schema in _expand_variant_to_subcolumns to address potential nullptr issue (#32184 ) (#32678 )	2024-03-22 18:02:58 +08:00
abmdocrt	609761567c	[Fix](partial-update) Fix wrong column number passing to BE when partial and enable nereids (#31461 ) * Problem: Inconsistent behavior occurs when executing partial column update `UPDATE` statements and `INSERT` statements on merge-on-write tables with the Nereids optimizer enabled. The number of columns passed to BE differs; `UPDATE` operations incorrectly pass all columns, while `INSERT` operations correctly pass only the updated columns. Reason: The Nereids optimizer does not handle partial column update `UPDATE` statements properly. The processing logic for `UPDATE` statements rewrites them as equivalent `INSERT` statements, which are then processed according to the logic of `INSERT` statements. For example, assuming a MoW table structure with columns k1, k2, v1, v2, the correct rewrite should be: * `UPDATE` table t1 set v1 = v1 + 1 where k1 = 1 and k2 = 2 * => * `INSERT` into table (v1) select v1 + 1 from table t1 where k1 = 1 and k2 = 2 However, the actual rewriting process does not consider the logic for partial column updates, leading to all columns being included in the `INSERT` statement, i.e., the result is: * `INSERT` into table (k1, k2, v1, v2) select k1, k2, v1 + 1, v2 from table t1 where k1 = 1 and k2 = 2 This results in `UPDATE` operations incorrectly passing all columns to BE. Solution: Having analyzed the cause, the solution is straightforward: when rewriting partial column update `UPDATE` statements to `INSERT` statements, only retain the updated columns and all key columns (as partial column updates must include all key columns). Additionally, this PR includes error injection cases to verify the number of columns passed to BE is correct.	2024-03-09 19:45:42 +08:00
abmdocrt	7c30cb20fd	[Fix](partial update) Fix partial update load false when schema includes auto increment column (#31725 ) Problem: When partially updating columns without specifying the auto-increment column, and the imported data contains new keys, an error stating the auto-increment column could not be found occurs. Reason: The logic for partial column updates does not account for new keys in auto-increment columns. Since auto-increment columns can be generated by the system, it's possible to omit this column data during import. However, partial column updates treat this as a regular column, expecting it to be nullable or have a default value for automatic filling, overlooking the fact that auto-increment columns can also be auto-filled. This oversight leads to the error. Solution: Incorporate a check for auto-increment columns into the partial column update logic, and include the logic for generating auto-increment column values in the process of completing partial updates.	2024-03-06 13:06:27 +08:00
Kaijie Chen	1a51d04cb8	[fix](move-memtable) fix schema use-after-free in delta writer v2 (#30254 )	2024-01-24 10:00:25 +08:00
zhengyu	b31494b18c	[test](regression) add fault injection cases for LoadStream (#29101 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2023-12-28 16:16:26 +08:00
bobhan1	c4e484916b	[Fix](table property) Fix table property `disable_auto_compaction` (#27853 )	2023-12-11 20:48:11 +08:00
Sun Chenyang	573b594df3	[improvement](Variant Type) Support displaying subcolumns expanded for the variant column (#27764 )	2023-12-08 20:34:58 +08:00
plat1ko	6da36e1077	[feature](merge-cloud) Refactor write path code by abstract base class (#26537 ) Refactor write path code by abstract base class. Whether to use `StorageEngine` or `CloudStorageEngine` will be determined during compilation instead of runtime `config::cloud_mode` to avoid unexpected null pointer or undefined behavior issues caused by merging code. Class that depend on `StorageEngine` but are shared by the cloud mode need to have an abstract base class. Common code should be extracted into the base class, while the code that depends on `StorageEngine` should be implemented in a `StorageEngine` mix-in class of the base class.	2023-12-08 14:50:36 +08:00
lihangyu	48935c14e2	[Improvement](variant) limit the column size on tablet schema (#27399 ) (#27785 ) 1. limit the column count to default 2048 2. fix get_inverted_index return nullptr when variant's unique id is -1, using its parent unique id instead 3. avoid adding the same path subcolumn to the tablet schema more than once 4. make extracted column unique id -1	2023-12-04 14:47:36 +08:00
lihangyu	7398c3daf1	[Feature-Variant](Variant Type) support variant type query and index (#27676 )	2023-11-29 10:37:28 +08:00
plat1ko	d767804815	[feature](merge-cloud) Decouple rowset id generator and local rowsets gc implementation (#25921 )	2023-11-10 10:07:02 +08:00
zhiqiang	a5565f68b2	[Refactor](opentelemetry) Remove opentelemetry (#26605 )	2023-11-09 18:05:34 +08:00
lihangyu	44b51bf0b9	[Feature](Variant) support variant load (#26572 )	2023-11-08 00:37:57 -06:00
Xin Liao	f31c1d858a	[fix](merge-on-write) fix duplicate key in schema change (#25705 ) It should be ensured that the obtained versions are continuous when calculating delete bitmaps in publish. The remaining NOTREADY tablet in the schema change failure should be dropped. When a rowset was deleted, the delete bitmap cannot be deleted until there are no read requests to use the rowset.	2023-10-25 05:59:48 -05:00
zhannngchen	0c8bce4292	[fix](partial update) fix some bugs about delete sign (#25712 )	2023-10-24 14:33:33 +08:00
plat1ko	9c9fc84f39	[feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929 )	2023-10-18 20:29:04 +08:00
bobhan1	1514f78b87	[refactor](partial-update) Split partial update infos from tablet schema (#25147 )	2023-10-17 14:21:40 +08:00
Kaijie Chen	cda8fb6b8b	[fix](load) return Status when error in RowsetWriter::build (#25381 )	2023-10-17 09:40:23 +08:00
zhannngchen	239df5860b	[enhancement](tablet_meta_lock) add more trace for write lock of tablet's _meta_lock (#25095 )	2023-10-08 10:28:10 +08:00
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
bobhan1	58ab25ccaa	Revert "[Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773 )" (#24731 ) This reverts commit 3ee89aea35726197cb7e94bb4f2c36bc9d50da84.	2023-09-21 21:01:28 +08:00
bobhan1	2098670001	[Fix](merge-on-write) Skip to check delete bitmap correctness in commit phase if the current tablet is converting (#24675 )	2023-09-21 15:48:02 +08:00
plat1ko	b9ddcbf729	[feature](merge-cloud) Rewrite code related to IOContext (#24269 )	2023-09-15 19:57:58 +08:00
bobhan1	3ee89aea35	[Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773 )	2023-09-14 18:03:51 +08:00
bobhan1	9898c08620	[enhancement](merge-on-write) Add delete bitmap correctness check in commit phase (#23316 )	2023-09-02 20:03:00 +08:00
abmdocrt	da9eb79ac4	[Enhancement](Schema hash) Remove schema hash in tablet info (#23516 )	2023-08-29 10:05:12 +08:00
Kaijie Chen	29fbe749cd	[refactor](load) split rowset builder out of delta writer (#22805 )	2023-08-14 10:32:58 +08:00

36 Commits
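
The column-selection rule stated in the solution for #31461 (when a partial-update `UPDATE` is rewritten as an `INSERT`, keep only the updated columns plus all key columns) can be sketched as below. This is a minimal illustration in Python, not code from Doris: the function name `partial_update_insert_columns` and its parameters are hypothetical, and the real rewrite is performed inside the Nereids optimizer.

```python
from typing import Iterable, List


def partial_update_insert_columns(
    schema_columns: Iterable[str],
    key_columns: Iterable[str],
    assigned_columns: Iterable[str],
) -> List[str]:
    """Return the column list for the rewritten INSERT (hypothetical helper).

    Keeps every key column plus every column assigned in the UPDATE's SET
    clause, in schema order, and drops all other value columns.
    """
    schema = list(schema_columns)
    keys = set(key_columns)
    assigned = set(assigned_columns)

    # Reject names that are not part of the table schema.
    unknown = (keys | assigned) - set(schema)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")

    return [col for col in schema if col in keys or col in assigned]


# MoW table with columns k1, k2, v1, v2; the statement is
# UPDATE t1 SET v1 = v1 + 1 WHERE k1 = 1 AND k2 = 2
columns = partial_update_insert_columns(
    schema_columns=["k1", "k2", "v1", "v2"],
    key_columns=["k1", "k2"],
    assigned_columns=["v1"],
)
print(columns)
```

With the example table from the commit message, the untouched value column `v2` is left out, so it is no longer passed to BE.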