For a Unique Key Merge-on-Write (MoW) table, if a single load job produces duplicate keys across multiple segments, we need to calculate the delete bitmap to mark those duplicate keys as deleted.
Add a check here to detect any bugs that might cause duplicate keys.
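A minimal sketch of the idea, assuming simplified types (`Segment`, `DeleteBitmap`, and `calc_delete_bitmap_for_load` are illustrative names, not the actual Doris implementation): within one load job, segments are ordered by segment id, and a key that appears again in a later segment supersedes the earlier occurrence, whose row is marked deleted.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct RowLocation {
    uint32_t segment_id;
    uint32_t row_id;
};

// Delete bitmap modeled as a set of (segment_id, row_id) pairs marked deleted.
using DeleteBitmap = std::set<std::pair<uint32_t, uint32_t>>;

struct Segment {
    uint32_t id;
    std::vector<std::string> keys;  // primary keys of the segment's rows
};

// `segments` must be sorted by segment id ascending (load order).
DeleteBitmap calc_delete_bitmap_for_load(const std::vector<Segment>& segments) {
    DeleteBitmap bitmap;
    std::map<std::string, RowLocation> latest;  // key -> newest row location seen so far
    for (const auto& seg : segments) {
        for (uint32_t row = 0; row < seg.keys.size(); ++row) {
            auto it = latest.find(seg.keys[row]);
            if (it != latest.end()) {
                // Same key was written by an earlier segment: mark the old row deleted.
                bitmap.emplace(it->second.segment_id, it->second.row_id);
            }
            latest[seg.keys[row]] = RowLocation{seg.id, row};
        }
    }
    return bitmap;
}
```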
Different version numbers were used when calculating the delete bitmap for segments and for rowsets, which caused the final update of the delete bitmap to fail.
Sense I/O errors, and retry the query when an I/O error occurs.
Greylist: when one disk is found to be completely broken, or the difference between the tablet counts in BE metadata and FE metadata is too large, reduce the query priority of that BE.
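A conceptual sketch of how such a retry-plus-greylist policy could look; `Backend`, `run_with_retry`, and the `run_query` callback are hypothetical names, not Doris's actual scheduling code:

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

struct Backend {
    std::string host;
    bool greylisted = false;  // broken disk, or BE/FE tablet-count diff too large
};

std::optional<std::string> run_with_retry(
        std::vector<Backend> replicas, const std::string& sql,
        const std::function<std::string(const Backend&, const std::string&)>& run_query) {
    // Prefer healthy backends; greylisted ones are only used as a last resort.
    std::stable_sort(replicas.begin(), replicas.end(),
                     [](const Backend& a, const Backend& b) {
                         return !a.greylisted && b.greylisted;
                     });
    for (const auto& be : replicas) {
        try {
            return run_query(be, sql);  // success: return the result
        } catch (const std::runtime_error&) {
            // I/O error sensed on this replica: fall through and retry the next one.
        }
    }
    return std::nullopt;  // all replicas failed
}
```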
MoW marks all duplicate primary keys as deleted, so we can add a DCHECK during compaction; if MoW's delete bitmap works incorrectly, we can detect this kind of issue as early as possible.
In a debug build, the DCHECK will crash BE; in a release build, the compaction will fail and the load will eventually fail with error -235.
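A minimal sketch of such a check, assuming simplified types (`Status`, `check_no_duplicate_keys`, and the local `DCHECK` macro are illustrative, not the actual Doris compaction code): after the delete bitmap is applied, the keys fed into the compaction merge should contain no duplicates, so adjacent equal keys indicate a delete-bitmap bug.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

#ifndef NDEBUG
#define DCHECK(cond) assert(cond)   // debug build: abort immediately
#else
#define DCHECK(cond) ((void)0)      // release build: no-op
#endif

enum class Status { OK, CORRUPTION };

// `merged_keys` are the primary keys in merge (sorted) order after applying
// the delete bitmap.
Status check_no_duplicate_keys(const std::vector<std::string>& merged_keys) {
    for (std::size_t i = 1; i < merged_keys.size(); ++i) {
        const bool duplicated = (merged_keys[i] == merged_keys[i - 1]);
        DCHECK(!duplicated);
        if (duplicated) {
            return Status::CORRUPTION;  // release build: fail the compaction
        }
    }
    return Status::OK;
}
```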
Reuse the rowset for two reasons:
1. To avoid taking the tablet lock, for performance: if another thread holds the lock for too long, point query latency suffers.
2. The rowset must be acquired and held during the lookup procedure.
During compaction, MoW updates the delete bitmap of the imported data through rowid conversion. The correctness of the rowid conversion is critical to the resulting delete bitmap, so this adds a check on the rowid conversion result.
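A hedged sketch of what such a check could verify, with assumed types (`RowIdConversion` and `check_rowid_conversion` are illustrative, not the real Doris structures): every surviving source row must map to a distinct destination row, and the number of mapped rows must match the compaction output.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using RowLoc = std::pair<uint32_t, uint32_t>;  // (segment_id, row_id)
// Source row -> destination row after compaction; deleted rows are simply absent.
using RowIdConversion = std::map<RowLoc, RowLoc>;

bool check_rowid_conversion(const RowIdConversion& conv, uint64_t expected_output_rows) {
    std::set<RowLoc> seen_dst;
    for (const auto& [src, dst] : conv) {
        (void)src;  // only the destination side needs checking here
        if (!seen_dst.insert(dst).second) {
            return false;  // two source rows mapped to the same output row
        }
    }
    // Every surviving source row must account for exactly one output row.
    return seen_dst.size() == expected_output_rows;
}
```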
* fix bug, add remote meta for compaction
To avoid irrecoverable data loss caused by a delete bitmap calculation error, perform compaction with merge-on-read. This way, even if the delete bitmap calculation is wrong, the data can still be recovered by a full compaction.
If an alter task is in the queue, compaction is disabled, which can lead to too many versions. Keep the last 10 versions in the new tablet so that the base tablet's max version will not be merged away, and we can then copy data from the base tablet to the new tablet.
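A rough sketch of such a rowset-picking rule, under assumed names (`Rowset`, `pick_compaction_input`, and the default of 10 kept versions follow the description above, not the actual Doris picker):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rowset {
    int64_t start_version;
    int64_t end_version;
};

// `rowsets` are sorted by version, oldest first.
std::vector<Rowset> pick_compaction_input(const std::vector<Rowset>& rowsets,
                                          bool alter_task_in_queue,
                                          std::size_t versions_to_keep = 10) {
    if (!alter_task_in_queue) {
        return rowsets;  // no alter pending: every rowset is a candidate
    }
    if (rowsets.size() <= versions_to_keep) {
        return {};  // nothing old enough to merge safely
    }
    // Merge everything except the newest `versions_to_keep` rowsets, so the
    // base tablet's max version is never compacted away before the data copy.
    return {rowsets.begin(),
            rowsets.end() - static_cast<std::ptrdiff_t>(versions_to_keep)};
}
```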
1. Support a row-store format encoded with the jsonb codec.
2. Short-path optimization for point queries.
3. Support prepared statements for point queries.
4. Support the MySQL binary protocol format.
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns) and the ORDER BY clause covers only a few columns, the ScanNode still has to scan all columns from the storage engine even if the limit is very small, which leads to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode because it is the central node for the TopN nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the required rows one by one from the storage engine.
After the second-phase read, the Block contains all the data needed for the query.
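A conceptual sketch of the two-phase idea with made-up helper names (the storage hooks below are stubs, not Doris's planner or executor code): phase 1 scans only the sort column plus a row id, phase 2 fetches the full rows only for the surviving top-N row ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RowId {            // plays the role of __DORIS_ROWID_COL__
    uint32_t tablet_id;
    uint32_t row_id;
};

struct SortKeyWithRowId {
    int64_t sort_key;     // value of columnA
    RowId rid;
};

struct Row { /* all columns of the wide table */ };

// Hypothetical storage hooks; the stubs stand in for the real storage engine.
std::vector<SortKeyWithRowId> scan_sort_column_with_rowid() { return {}; }  // phase 1: narrow scan
Row fetch_row_by_rowid(const RowId&) { return {}; }                         // phase 2: point fetch

std::vector<Row> topn_two_phase(std::size_t limit, bool asc) {
    // Phase 1: read only columnA plus the row id column, then sort and truncate to N.
    std::vector<SortKeyWithRowId> keys = scan_sort_column_with_rowid();
    std::sort(keys.begin(), keys.end(),
              [asc](const SortKeyWithRowId& a, const SortKeyWithRowId& b) {
                  return asc ? a.sort_key < b.sort_key : a.sort_key > b.sort_key;
              });
    if (keys.size() > limit) {
        keys.resize(limit);
    }

    // Phase 2: fetch the full rows only for the N surviving row ids.
    std::vector<Row> result;
    result.reserve(keys.size());
    for (const auto& k : keys) {
        result.push_back(fetch_row_by_rowid(k.rid));
    }
    return result;
}
```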
Tablet::version_for_delete_predicate has to traverse all rowset metas in the tablet meta, which is O(N); instead, we can directly determine whether a rowset is a delete rowset via RowsetMeta::has_delete_predicate, which is O(1).
Since we no longer call Tablet::version_for_delete_predicate when picking input rowsets for compaction, the critical section guarded by Tablet::_meta_lock also shrinks.
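An illustrative sketch of the difference, with simplified stand-ins for the real classes (the Doris methods mentioned above are only mirrored here by name):

```cpp
#include <cstdint>
#include <memory>
#include <vector>

struct Version {
    int64_t first;
    int64_t second;
};

struct RowsetMeta {
    Version version;
    bool delete_predicate = false;
    bool has_delete_predicate() const { return delete_predicate; }  // O(1)
};
using RowsetMetaSharedPtr = std::shared_ptr<RowsetMeta>;

// Old path, O(N): walk every rowset meta (done while holding Tablet::_meta_lock).
bool is_delete_version_o_n(const std::vector<RowsetMetaSharedPtr>& rs_metas,
                           const Version& version) {
    for (const auto& rs : rs_metas) {
        if (rs->version.first == version.first && rs->version.second == version.second) {
            return rs->has_delete_predicate();
        }
    }
    return false;
}

// New path, O(1): when the candidate rowset's meta is already in hand, ask it
// directly, so the compaction picker no longer needs this lookup under the lock.
bool is_delete_rowset(const RowsetMetaSharedPtr& rs_meta) {
    return rs_meta->has_delete_predicate();
}
```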
1. Remove quick_compaction's rowset pick policy; run cumulative compaction when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.
Co-authored-by: yixiutt <yixiu@selectdb.com>
When upgrading from 1.1 to master, then rolling back to 1.1, and then upgrading to master again, BE will coredump because some rowsets have a schema while others have none. On the first upgrade from 1.1, BE flushes the schema into all rowsets; after rolling back to 1.1, BE runs compaction and creates some new rowsets without a schema. On the second upgrade from 1.1, BE coredumps because some code paths assume that either all rowsets have a schema or none of them do.
# Proposed changes
This PR fixes many issues encountered when building from source on macOS with the Apple M1 chip.
## ATTENTION
The work to support macOS with the Apple M1 chip is large, and there are still many unresolved runtime issues:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the work. Anyone interested in it can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third parties) on macOS with the M1 chip. The development experience on macOS will improve greatly once the adaptation work is finished.