Commit Graph

11 Commits

SHA1 Message Date
cc924c9e6a [Rowset Reader] Improve the merge read efficiency of alpha rowsets (#2632)
When merge-reading from one rowset with multiple overlapping segments,
I introduce a priority queue (a min-heap) for multi-way merge sort,
replacing the old algorithm with N*M time complexity.

This can significantly improve read efficiency when merging a large amount of
overlapping data.
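A minimal sketch of the min-heap multi-way merge, assuming each segment is modeled as a sorted `std::vector<int>` of row keys. `Segment` and `merge_segments` are hypothetical names, not the actual rowset reader code. The heap holds one entry per segment (its current smallest key), so producing each output row costs O(log K) heap work for K segments instead of scanning every segment head.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a segment: a sorted run of row keys.
using Segment = std::vector<int>;

// Merge K sorted segments into one sorted output using a min-heap.
// Each heap entry is (current key, segment index, position in that segment).
std::vector<int> merge_segments(const std::vector<Segment>& segments) {
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    // std::greater turns std::priority_queue (a max-heap by default) into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // Seed the heap with the first row of every non-empty segment.
    for (std::size_t i = 0; i < segments.size(); ++i) {
        if (!segments[i].empty()) {
            heap.emplace(segments[i][0], i, 0);
        }
    }

    std::vector<int> out;
    while (!heap.empty()) {
        auto [key, seg, pos] = heap.top();  // smallest key across all segments
        heap.pop();
        out.push_back(key);
        // Advance only the segment that produced the smallest key.
        if (pos + 1 < segments[seg].size()) {
            heap.emplace(segments[seg][pos + 1], seg, pos + 1);
        }
    }
    return out;
}
```

For example, merging `{1, 4, 7}`, `{2, 5}` and `{3, 6, 8}` yields `{1, 2, 3, 4, 5, 6, 7, 8}`. Requires C++17 for the structured binding.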

In my tests:
1. Compaction with 187 segments dropped from 75 seconds to 42 seconds.
2. Compaction with 3574 segments took 43 seconds; with the old version, I killed the
process after waiting more than 10 minutes.

This CL only changes reads of alpha rowsets. Beta rowsets will be changed in another CL.

ISSUE: #2631
2020-01-02 14:10:05 +08:00
fbee3c7722 Remove VersionHash used for comparison in BE (#2358) 2019-12-04 20:09:03 +08:00
d0316d158d Refactor and reorganize the file utils (#2089) 2019-11-11 20:25:41 +08:00
6b4ef34162 fix AlphaRowsetTest by removing StorageEngine #2078 (#2091) 2019-10-30 19:39:41 +08:00
2cecf5901f Fix segment v2 bug (#1904) 2019-09-30 13:50:39 +08:00
a63989cc61 Use RowsetFactory to create and init RowsetWriter (#1740) 2019-09-04 17:02:43 +08:00
f76dad289e Basic implementation for BetaRowsetReader (#1718) 2019-09-03 13:52:16 +08:00
6f4feca3dc Add rowset id generator to FE and BE (#1678) 2019-09-02 18:51:31 +08:00
da8b9aad9a Remove preaggregation and index stream cache stuff out of RowsetReaderContext (#1698) 2019-08-26 14:19:03 +08:00
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have both Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently holds offset information, which is a row
attribute, so we remove the offset from Field.

RowCursor currently contains some logic that belongs to Schema, so in this
patch I add a Schema attribute to RowCursor to simplify it. After this
change, only Schema handles Field/ColumnSchema.

I extract some logic from RowCursor into be/src/olap/row.h, so that the
same logic can handle different types of rows. Each row type provides
the same function to get a Cell of the row. A cell represents a column's
content together with a null indicator.
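A minimal sketch of the "same cell accessor for every row type" idea, assuming a hypothetical layout where each cell is one null-indicator byte followed by the value bytes. `Cell`, `Int32Row`, `set_int32` and `count_null_cells` are illustrative names, not the actual contents of be/src/olap/row.h. Generic helpers are written once as templates and work for any row type that exposes `num_columns()` and `cell(cid)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cell view: one null-indicator byte followed by the column value.
class Cell {
public:
    explicit Cell(std::uint8_t* ptr) : ptr_(ptr) {}
    bool is_null() const { return ptr_[0] != 0; }
    void set_is_null(bool null) { ptr_[0] = null ? 1 : 0; }
    std::uint8_t* value_ptr() const { return ptr_ + 1; }
private:
    std::uint8_t* ptr_;
};

// Hypothetical row type: fixed-width int32 columns stored back to back.
class Int32Row {
public:
    explicit Int32Row(std::size_t num_columns)
        : num_columns_(num_columns), buf_(num_columns * kCellSize, 0) {}
    std::size_t num_columns() const { return num_columns_; }
    Cell cell(std::size_t cid) { return Cell(buf_.data() + cid * kCellSize); }
private:
    static constexpr std::size_t kCellSize = 1 + sizeof(std::int32_t);
    std::size_t num_columns_;
    std::vector<std::uint8_t> buf_;
};

// Generic helpers written once against the cell interface.
template <typename RowType>
void set_int32(RowType& row, std::size_t cid, std::int32_t value) {
    Cell c = row.cell(cid);
    c.set_is_null(false);
    std::memcpy(c.value_ptr(), &value, sizeof(value));
}

template <typename RowType>
std::size_t count_null_cells(RowType& row) {
    std::size_t nulls = 0;
    for (std::size_t cid = 0; cid < row.num_columns(); ++cid) {
        if (row.cell(cid).is_null()) ++nulls;
    }
    return nulls;
}
```

Another row type (for instance one backed by a different memory layout) would only need its own `cell(cid)` to reuse both helpers unchanged.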
2019-07-30 14:01:57 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies the data of every Backend,
so restarting a BE will take a very long time.
To avoid disrupting your production environment,
upgrade Backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique id to identify a rowset.
   Naming rowsets by tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   of different formats.
2019-07-15 21:18:22 +08:00