Commit Graph

201 Commits

Author SHA1 Message Date
0fada66e03 [fix](cooldown) Fix deadlock in tablet clone (#17252) 2023-03-03 15:53:12 +08:00
cc5fa509ad [fix](cooldown) Fix bug in concurrent update_cooldown_conf and operations that update cooldowned data (#17086) 2023-03-03 14:36:58 +08:00
26a46d8c3f [fix](cooldown) Handle full clone with cooldowned rowsets (#17069) 2023-02-28 11:04:01 +08:00
00723e36cf [enhancement](merge-on-write) add delete bitmap correctness check for single load (#17147)
For a Unique Key MoW table, if a single load job contains duplicate keys spread across multiple segments, we need to calculate the delete bitmap to mark these duplicate keys as deleted.
Add a check here to detect any bugs that might cause duplicate keys.
2023-02-28 10:06:36 +08:00
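The idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the Doris implementation: the function names, the `(segment_id, row_id)` row location, and the "last occurrence wins" rule are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row location: (segment_id, row_id). Not the real Doris layout.
RowLocation = Tuple[int, int]


def calc_delete_bitmap(segments: List[List[str]]) -> Set[RowLocation]:
    """Mark every earlier occurrence of a key as deleted, keeping the last one."""
    latest: Dict[str, RowLocation] = {}
    deleted: Set[RowLocation] = set()
    for seg_id, keys in enumerate(segments):
        for row_id, key in enumerate(keys):
            if key in latest:
                # An older row with the same key exists: mark it deleted.
                deleted.add(latest[key])
            latest[key] = (seg_id, row_id)
    return deleted


def check_no_duplicate_keys(segments: List[List[str]],
                            deleted: Set[RowLocation]) -> bool:
    """Correctness check: rows not marked deleted must have unique keys."""
    seen: Set[str] = set()
    for seg_id, keys in enumerate(segments):
        for row_id, key in enumerate(keys):
            if (seg_id, row_id) in deleted:
                continue
            if key in seen:
                return False
            seen.add(key)
    return True
```

Running the check with an empty delete bitmap on segments that share keys returns `False`, which is the kind of inconsistency the added check is meant to surface.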
d5b1d3403f [fix](merge-on-write) fix that the version of delete bitmap is incorrect when calculate delete bitmap between segments (#17095)
Different version numbers are used to calculate the delete bitmap between segments and rowsets, resulting in the failure of the last update of the delete bitmap.
2023-02-27 17:17:25 +08:00
8eeb435963 [improvement](meta) Enhance Doris's fault tolerance to disk error (#16472)
Detect IO errors.
Retry the query when an IO error occurs.
Greylist: when one disk is found to be completely broken, or the difference between the tablet counts in BE and FE meta is too large, reduce the query priority of the BE.
2023-02-23 08:40:45 +08:00
3636d0a561 [feature](merge-on-write) add DCHECK in compaction to detect data inconsistency (#16564)
MoW marks all duplicate primary keys as deleted, so we can add a DCHECK during compaction; if MoW's delete bitmap works incorrectly, we can detect this kind of issue as soon as possible.
In a debug build, the DCHECK makes BE crash; in a release build, compaction fails and the load eventually fails due to -235.
2023-02-22 14:59:18 +08:00
52f9e03eea [fix](cooldown) Use pending_remote_rowsets to avoid deleting rowset files being uploaded (#16803) 2023-02-21 21:58:20 +08:00
a1799e5506 [improve](point query) reuse rowset from lookup_row_key to eliminate tablet lock (#16770)
Reuse the rowset for 2 reasons:
1. eliminate the tablet lock for performance: if another thread holds the lock too long, point query latency could suffer
2. the rowset should be acquired during the lookup procedure
2023-02-20 18:38:11 +08:00
c98a0bf803 [Enchancement](merge-on-write) check the correctness of rowid conversion after compaction (#16689)
MoW updates the delete bitmap of the imported data during compaction by rowid conversion. The correctness of rowid conversion is critical to the resulting delete bitmap, so this adds a check of the rowid conversion result.
2023-02-20 16:27:18 +08:00
6a1e3d3435 [fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812)
* fix bug, add remote meta for compaction
2023-02-17 15:13:06 +08:00
2a9e748073 [enhancement](merge-on-write) do compaction with merge on read (#16799)
To avoid irrecoverable data caused by a delete bitmap calculation error, do compaction with merge on read. This way, even if the delete bitmap calculation is wrong, the data can be recovered by a full compaction.
2023-02-16 19:20:15 +08:00
7482b6bad2 [fix](cooldown) Add cold_compaction_lock to serialize any operations which may delete the input rowsets of cold data compaction (#16742)
Add cold_compaction_lock to serialize tablet clone, cold data compaction and follow cooldowned data
2023-02-14 21:38:33 +08:00
f1b9185830 [feature](cooldown) Implement cold data compaction (#16681) 2023-02-14 15:21:54 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
6a8fc35b78 [Bug](Cooldown) fix load balance causing no cooldown replica (#16641) 2023-02-12 16:47:38 +08:00
8749aedbae [Bug](point query) make get_rowset thread safe (#16609)
Calling `get_rowset` from `lookup_row_data` without a lock leads to a core dump if `_rs_version_map` or `_stale_rs_version_map` changes concurrently.
2023-02-10 23:54:56 +08:00
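A rough sketch of the locking pattern this fix describes, under stated assumptions: `TabletSketch`, `add_rowset`, `mark_stale`, and the plain mutex are hypothetical stand-ins for illustration, and only the map names and `get_rowset` come from the commit message. The real code may well use a different lock type.

```python
import threading
from typing import Dict, Optional, Tuple

# Simplified stand-in for a rowset version range: (start, end).
Version = Tuple[int, int]


class TabletSketch:
    """Hypothetical tablet holding live and stale rowsets behind one lock."""

    def __init__(self) -> None:
        self._meta_lock = threading.Lock()
        self._rs_version_map: Dict[Version, str] = {}
        self._stale_rs_version_map: Dict[Version, str] = {}

    def add_rowset(self, version: Version, rowset: str) -> None:
        with self._meta_lock:
            self._rs_version_map[version] = rowset

    def mark_stale(self, version: Version) -> None:
        # Move a rowset from the live map to the stale map atomically.
        with self._meta_lock:
            rowset = self._rs_version_map.pop(version, None)
            if rowset is not None:
                self._stale_rs_version_map[version] = rowset

    def get_rowset(self, version: Version) -> Optional[str]:
        # Both maps are read under the lock, so a concurrent writer
        # cannot change them in the middle of the lookup.
        with self._meta_lock:
            rowset = self._rs_version_map.get(version)
            if rowset is None:
                rowset = self._stale_rs_version_map.get(version)
            return rowset
```

The point of the sketch is only that readers and writers of the two maps take the same lock; it says nothing about how the actual fix scopes that lock.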
c3110f8153 [fix](merge-on-write) fix that the query result has duplicate keys when load with sequence column (#16587) 2023-02-10 22:31:05 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove the error-prone CooldownJob and use CooldownConfHandler to update Tablet's cooldown conf.
Also fix some cooldown bugs.
2023-02-09 09:12:55 +08:00
f90d844a53 [improvement](compaction) enable compaction in TABLET_NOTREADY (#16470)
If an alter task is waiting in the queue, compaction is not enabled, which may cause too many versions.
Keep the last 10 versions in the new tablet so that the base tablet's max version will not be merged, and then we can copy data from the base tablet to the new tablet.
2023-02-07 19:58:23 +08:00
f2fd47f238 [Improve](row-store) support row cache (#16263) 2023-02-06 11:16:39 +08:00
bd8ef4edeb [fix](cooldown) Fix core in remove_all_remote_rowsets (#16374) 2023-02-04 22:31:38 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This could simplify the storage engine logic and make the code more readable, and we could analyze
the length of the hidden `__DORIS_ROW_STORE_COL__`, etc.
2023-02-02 20:56:13 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
Pxl
ca73c60442 [Chore](build) enable ignored-qualifiers check (#16196)
enable ignored-qualifiers check
2023-02-01 15:15:59 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
116e17428b [Enhancement](point query optimize) improve performace of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
0b5e71d3b4 [refactor](refactor field) remove unused method (#16068) 2023-01-19 10:16:09 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns), the ORDER BY clause usually covers just a few columns, but the ScanNode needs to scan all data from the storage engine even if the limit is very small. This may lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode because it is the central node for TopN nodes in the cluster. The ExchangeNode issues an RPC to other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads row by row from the storage engine.

After the second-phase read, the Block will contain all the data needed for the query.
2023-01-19 10:01:33 +08:00
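The two phases described above can be illustrated with a small in-memory sketch. It assumes the table is a list of dict rows and that the list index plays the role of the RowId; `two_phase_topn` is a hypothetical name, and the real implementation works across nodes via RPC rather than on a local list.

```python
import heapq
from typing import Any, Dict, List


def two_phase_topn(table: List[Dict[str, Any]], sort_col: str,
                   limit: int, descending: bool = False) -> List[Dict[str, Any]]:
    # Phase 1: read only the sort column plus a row id; all other
    # columns are pruned, so wide rows are not materialized here.
    keyed = [(row[sort_col], row_id) for row_id, row in enumerate(table)]
    pick = heapq.nlargest if descending else heapq.nsmallest
    top = pick(limit, keyed)

    # Phase 2: fetch the full rows for the surviving row ids only.
    return [table[row_id] for _, row_id in top]
```

With `limit` much smaller than the table size, only `limit` full rows are read in the second phase, which is where the reduction in read amplification comes from.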
Pxl
b727033906 [Chore](build) enable -Wextra and remove some -Wno (#15760)
enable -Wextra and remove some -Wno
2023-01-15 10:40:35 +08:00
58c520dbfd [Feature](remote) Cooldown cold data to object storage only one replica (#15832) 2023-01-14 23:58:00 +08:00
ab186a60ce [enhancement](compaction) Optimize judging delete rowset and picking candidate rowsets for compaction #15631
Tablet::version_for_delete_predicate has to traverse all rowset metas in the tablet meta, which has O(N) complexity; however, we can directly judge whether a rowset is a delete rowset with RowsetMeta::has_delete_predicate, which has O(1) complexity.
As we no longer call Tablet::version_for_delete_predicate when picking input rowsets for compaction, we can reduce the critical section of Tablet::_meta_lock.
2023-01-10 08:32:15 +08:00
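The complexity difference in the commit above can be sketched as follows. The Python class and functions only mirror the names in the commit message; their fields and signatures are assumptions for illustration, not the actual C++ declarations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowsetMeta:
    version: int
    has_delete_predicate: bool = False


def version_for_delete_predicate(all_metas: List[RowsetMeta], version: int) -> bool:
    # O(N): scans every rowset meta in the tablet to answer the question.
    return any(m.version == version and m.has_delete_predicate for m in all_metas)


def is_delete_rowset(meta: RowsetMeta) -> bool:
    # O(1): the rowset meta already carries the answer.
    return meta.has_delete_predicate
```

Because the O(1) check needs only the rowset's own meta, it can run without scanning shared tablet state, which is consistent with the smaller critical section the commit mentions.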
3c2dee1d10 [fix](typo) Fix typo in variable name (#15538) 2023-01-01 11:03:45 +08:00
cc7a9d92ad [refactor](non-vec) remove non vec code for indexed column reader (#15409) 2022-12-30 23:01:54 +08:00
ad68764977 [enhancement](tablet) Unify redundant create_rowset_writer methods (#15519)
* Remove redundant create_rowset_writer methods

* Set resource id when setting FS in rowset meta

* fix

* fix ut
2022-12-30 22:57:12 +08:00
73957a028c [fix](mow-uniquekey) fix dereference to nullptr in Tablet::calc_delete_bitmap (#15375) 2022-12-27 11:14:25 +08:00
1ed5ad3a16 [fix](merge-on-write) delete all rows with same key in all pre segments (#14995) 2022-12-19 10:08:38 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
6a26435e8d [bugfix](compaction) fix promotion size bug (#14836) 2022-12-07 18:54:30 +08:00
58bc254529 [enhancement](BE)add metric for too many version (#14735)
* add one function to check whether the version limit is exceeded

add bvar to indicate that the version limit is exceeded

* resolve

* remove unnecessary header file
2022-12-05 11:37:14 +08:00
3dde97bff1 (compaction) opt compaction task producer and quick compaction (#13495) (#14535)
1. remove quick_compaction's rowset pick policy; call cu compaction when quick compaction is triggered
2. skip a tablet's compaction task when its compaction score is too small

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-12-02 10:07:44 +08:00
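Point 2 of the commit above, skipping tablets whose compaction score is too small, might look roughly like this. The function name, the `min_score` parameter, and the highest-score-first ordering are assumptions for the sketch; the commit does not state the actual threshold or ordering.

```python
from typing import Dict, List


def pick_compaction_candidates(scores: Dict[int, int], min_score: int) -> List[int]:
    """Return tablet ids worth compacting, highest score first."""
    eligible = [(score, tablet_id)
                for tablet_id, score in scores.items()
                if score >= min_score]  # skip tablets with too small a score
    # Sort by score descending, then tablet id ascending for a stable order.
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tablet_id for _, tablet_id in eligible]
```

For example, with scores `{101: 3, 102: 50, 103: 12}` and `min_score=10`, tablet 101 is skipped and no task is produced for it.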
94a6ffb906 [feature](compaction) support vertical_compaction & ordered_data_compaction (#14524) 2022-12-01 22:15:41 +08:00
1f9fb4dc8b [Bugfix] Fix upgrade from 1.1 coredump (#14163)
When upgrading from 1.1 to master, then rolling back to 1.1, and then upgrading to master again, BE will coredump because some rowsets have a schema and some do not. During the first upgrade from 1.1, BE flushes the schema into all rowsets; after the rollback to 1.1, BE runs compaction and creates some new rowsets without a schema. On the second upgrade from 1.1, BE coredumps because some conditions depend on either all or none of the rowsets having a schema.
2022-11-11 10:29:34 +08:00
942611c185 Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495)" (#13833)
This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.
2022-11-01 14:22:12 +08:00
4f2ea0776c [enhancement](compaction) opt compaction task producer and quick compaction (#13495)
1. remove quick_compaction's rowset pick policy; call cu compaction when quick compaction is triggered
2. skip a tablet's compaction task when its compaction score is too small

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-31 12:24:05 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
87864e40bf [doc](random_sink) Add some doc content about random sink (#13577)
1. Add some doc content about random sink
2. Fix bug of showing missing rowsets info
2022-10-23 22:51:56 +08:00
6d322f85ac [improvement](compaction) delete num based compaction policy (#13409)
Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-18 16:13:28 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Anyone interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00