Commit Graph

127 Commits

SHA1 Message Date
720808fda5 Remove config::max_file_descriptor_number (#1833) 2019-09-20 07:50:57 +08:00
315f762523 Seek block when starts a ScanKey (#1828)
In Doris, one block holds 1024 rows. Consider these two conditions:
1. The previous ScanKey scanned rows spanning multiple blocks,
   and its final block contains exactly 1024 rows.
2. The current ScanKey scans fewer rows than one block.
When both hold and the reader does not seek to the block, the position of the prefix short key columns is wrong.
2019-09-19 20:08:03 +08:00
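The boundary case in #1828 comes down to block arithmetic. A minimal sketch, assuming a reader that remembers which 1024-row block it is positioned on; `kRowsPerBlock`, `block_of`, `offset_in_block` and `needs_seek` are hypothetical names, not Doris APIs:

```cpp
#include <cstdint>

// Hypothetical: rows are grouped into fixed-size blocks of 1024 rows.
constexpr uint32_t kRowsPerBlock = 1024;

// Index of the block that contains a given row ordinal.
inline uint32_t block_of(uint32_t row_ordinal) { return row_ordinal / kRowsPerBlock; }

// Position of the row inside its block.
inline uint32_t offset_in_block(uint32_t row_ordinal) { return row_ordinal % kRowsPerBlock; }

// A reader positioned on `current_block` must seek before reading `start_row`
// whenever that row lives in a different block. If the previous scan ended
// exactly on a block boundary, the next start row is the first row of the
// following block, so skipping the seek leaves the reader on stale data.
inline bool needs_seek(uint32_t current_block, uint32_t start_row) {
    return block_of(start_row) != current_block;
}
```

For example, if the previous ScanKey ended at row 2047 (the last row of a full block 1), the next ScanKey starts at row 2048 in block 2, so a reader still positioned on block 1 must seek.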
17e52a4bac Improve LRUCache to get better performance (#1826)
In this CL, I move the call to the entry's deleter out of the LRUCache's mutex-protected block,
so other threads can access the cache without waiting for cache entries to be freed.
2019-09-19 17:37:02 +08:00
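The pattern in #1826 can be sketched as follows. This hypothetical `SimpleLRUCache` is not Doris's LRUCache (which also uses handles and reference counts); it only shows evicted entries being unlinked while the mutex is held and their deleter being called after the mutex is released:

```cpp
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal LRU cache illustrating "deleter outside the lock".
class SimpleLRUCache {
public:
    using Deleter = std::function<void(const std::string& key, int value)>;

    SimpleLRUCache(size_t capacity, Deleter deleter)
        : _capacity(capacity), _deleter(std::move(deleter)) {}

    void insert(const std::string& key, int value) {
        std::vector<std::pair<std::string, int>> evicted;
        {
            std::lock_guard<std::mutex> guard(_mutex);
            auto it = _index.find(key);
            if (it != _index.end()) {  // replacing an existing entry
                evicted.push_back(*it->second);
                _lru.erase(it->second);
                _index.erase(it);
            }
            _lru.emplace_front(key, value);
            _index[key] = _lru.begin();
            while (_lru.size() > _capacity) {  // evict least recently used
                evicted.push_back(_lru.back());
                _index.erase(_lru.back().first);
                _lru.pop_back();
            }
        }
        // Deleters run after the lock is released, so slow cleanup
        // does not block other threads using the cache.
        for (const auto& [k, v] : evicted) {
            _deleter(k, v);
        }
    }

    bool lookup(const std::string& key, int* value) {
        std::lock_guard<std::mutex> guard(_mutex);
        auto it = _index.find(key);
        if (it == _index.end()) return false;
        _lru.splice(_lru.begin(), _lru, it->second);  // mark most recently used
        *value = it->second->second;
        return true;
    }

private:
    using Entry = std::pair<std::string, int>;
    size_t _capacity;
    Deleter _deleter;
    std::mutex _mutex;
    std::list<Entry> _lru;
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};
```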
11eafe524f Add ChunkAllocator to accelerate chunk allocation (#1792)
I add ChunkAllocator in this CL to put unused memory chunks into a chunk
pool rather than returning them to the system allocator. For now, only
MemPool's chunk allocation and free paths use it.

Two configuration items are also introduced. 'chunk_reserved_bytes_limit'
is the limit on how many bytes the chunk pool can reserve in total; its
default value is 2147483648 (2GB). 'use_mmap_allocate_chunk' controls whether
chunks are allocated via mmap; its default value is false.

In my test with the default configuration, a simple query like
"select * from table limit 10" improves throughput from 280 QPS
to 650 QPS. When I set 'chunk_reserved_bytes_limit' to 0,
which disables the pool, the throughput is the same as the original's.
2019-09-13 08:27:24 +08:00
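A minimal sketch of the pooling idea in #1792, assuming exact-size free lists and malloc/free as the system allocator. `ChunkPool` and its methods are hypothetical, not the actual ChunkAllocator interface, and the mmap path controlled by 'use_mmap_allocate_chunk' is not modeled:

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical chunk pool: freed chunks are kept in per-size free lists
// instead of being returned to the system allocator, as long as the total
// reserved bytes stay within `reserved_bytes_limit`.
class ChunkPool {
public:
    explicit ChunkPool(size_t reserved_bytes_limit) : _limit(reserved_bytes_limit) {}

    ~ChunkPool() {
        for (auto& [size, chunks] : _free_lists) {
            for (void* p : chunks) std::free(p);
        }
    }

    void* allocate(size_t size) {
        {
            std::lock_guard<std::mutex> guard(_mutex);
            auto it = _free_lists.find(size);
            if (it != _free_lists.end() && !it->second.empty()) {
                void* p = it->second.back();  // reuse a pooled chunk
                it->second.pop_back();
                _reserved_bytes -= size;
                return p;
            }
        }
        return std::malloc(size);  // pool miss: ask the system allocator
    }

    void free(void* p, size_t size) {
        {
            std::lock_guard<std::mutex> guard(_mutex);
            if (_reserved_bytes + size <= _limit) {  // keep it if under the limit
                _free_lists[size].push_back(p);
                _reserved_bytes += size;
                return;
            }
        }
        std::free(p);  // over the limit: release to the system allocator
    }

    size_t reserved_bytes() {
        std::lock_guard<std::mutex> guard(_mutex);
        return _reserved_bytes;
    }

private:
    size_t _limit;
    size_t _reserved_bytes = 0;
    std::mutex _mutex;
    std::unordered_map<size_t, std::vector<void*>> _free_lists;
};
```

With a limit of 0, every `free` goes straight back to the system allocator, which corresponds to the disabled case described above.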
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
dad4def708 Support estimate size for v2 segment writer (#1787) 2019-09-12 15:15:39 +08:00
5653822298 Write magic number in footer instead of header (#1771) 2019-09-10 09:54:13 +08:00
cd5cfea5cc Encapsulate HLL logic (#1756) 2019-09-09 15:52:10 +08:00
a349409838 Move compare from RowCursor to row (#1764) 2019-09-09 14:51:13 +08:00
65dcabf1df Use crc32c checksum for segment v2 (#1753) 2019-09-06 15:23:57 +08:00
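For reference, CRC-32C is the CRC-32 variant based on the Castagnoli polynomial (reflected form 0x82F63B78). The commit title does not say how segment v2 applies it, so the following is only a generic bitwise sketch, not the Doris implementation; production code typically uses a lookup table or the SSE4.2 `crc32` instruction:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78).
// Passing a previous result as `crc` continues the checksum over more data.
inline uint32_t crc32c(const uint8_t* data, size_t len, uint32_t crc = 0) {
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

inline uint32_t crc32c(std::string_view s) {
    return crc32c(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}
```

The standard check value is `crc32c("123456789") == 0xE3069283`.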
54fd3652e6 Fix bug in BetaRowsetReader which results in empty result (#1754) 2019-09-06 15:07:23 +08:00
3f22238012 Add check for to_bitmap function argument (#1747) 2019-09-05 18:11:38 +08:00
85940a292b RowsetFactory as a single entry for Rowset creation (#1748) 2019-09-05 18:10:18 +08:00
a63989cc61 Use RowsetFactory to create and init RowsetWriter (#1740) 2019-09-04 17:02:43 +08:00
f76dad289e Basic implementation for BetaRowsetReader (#1718) 2019-09-03 13:52:16 +08:00
a80e9996a6 Move version to high 8 bits (#1736) 2019-09-02 19:43:04 +08:00
b4f6f755f1 Add exchange in MemPool to reduce alloc/free operation (#1732)
Reuse allocated chunks during storage read operations.
2019-09-02 19:29:30 +08:00
6f4feca3dc Add rowset id generator to FE and BE (#1678) 2019-09-02 18:51:31 +08:00
76987275b9 Fix result of unix_timestamp() (#1727) 2019-08-30 21:39:16 +08:00
6865f4238b Add limit to show tablet stmt (#1547)
Also add some WHERE predicates for filtering results.
ISSUE #1687
2019-08-28 16:25:12 +08:00
34a6e06cb1 fix from string bug(#1710) (#1713) 2019-08-27 18:43:49 +08:00
ae22d5e682 Support multiple key ranges in RowwiseIterator and StorageReadOptions (#1704)
1. Support multiple key ranges in RowwiseIterator and StorageReadOptions.
2. Remove unused fields and member functions in RowBlock and ColumnData.
3. Read num_rows_per_block from the short key index footer.
2019-08-27 17:57:42 +08:00
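One way to picture multiple key ranges (#1704) is a list of bounded intervals, each resolved by a binary search over sorted keys, much as a short key index lets a reader seek to each range's start. `KeyRange` and `select_rows` are hypothetical, and the sketch assumes the ranges are sorted and do not overlap:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical key range with per-bound inclusiveness.
struct KeyRange {
    int64_t lower;
    bool include_lower;
    int64_t upper;
    bool include_upper;
};

// Returns the ordinals of rows in `sorted_keys` that fall into any range.
std::vector<size_t> select_rows(const std::vector<int64_t>& sorted_keys,
                                const std::vector<KeyRange>& ranges) {
    std::vector<size_t> result;
    for (const KeyRange& r : ranges) {
        // Seek to the first row >= lower (or > lower if the bound is exclusive).
        auto first = r.include_lower
                ? std::lower_bound(sorted_keys.begin(), sorted_keys.end(), r.lower)
                : std::upper_bound(sorted_keys.begin(), sorted_keys.end(), r.lower);
        // Stop before the first row > upper (or >= upper if exclusive).
        auto last = r.include_upper
                ? std::upper_bound(sorted_keys.begin(), sorted_keys.end(), r.upper)
                : std::lower_bound(sorted_keys.begin(), sorted_keys.end(), r.upper);
        for (auto it = first; it < last; ++it) {
            result.push_back(static_cast<size_t>(it - sorted_keys.begin()));
        }
    }
    return result;
}
```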
58801c6ab0 Support converting RowBatch and RowBlockV2 to/from Arrow (#1699) 2019-08-27 11:30:00 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
da8b9aad9a Remove preaggregation and index stream cache stuff out of RowsetReaderContext (#1698) 2019-08-26 14:19:03 +08:00
6d040a33af Add zone map page(#1390) (#1633) 2019-08-24 00:57:30 +08:00
acf868c9d0 Support page compression and checksum in BetaRowset (#1646) 2019-08-19 09:40:47 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing columns encoded in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.

This patch parses columns from the file path, similar to Spark's partition discovery.

This patch parses partition columns in BrokerScanNode.java and saves the parsing result for each file path as a property of TBrokerRangeDesc, so the BE's broker reader can read the value of the specified partition column.
2019-08-19 09:39:21 +08:00
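The path parsing in #1635 is done in the FE (BrokerScanNode.java); the following C++ sketch only illustrates the idea of extracting `name=value` directory components for the requested columns. `parse_columns_from_path` is a hypothetical helper, and it performs no unescaping of values:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical helper: collect "name=value" pairs from the directory
// components of `path`, keeping only the requested partition columns.
std::map<std::string, std::string> parse_columns_from_path(
        const std::string& path, const std::set<std::string>& columns) {
    std::map<std::string, std::string> result;
    // Only directory components are considered; the last one is the file name.
    size_t end = path.rfind('/');
    if (end == std::string::npos) return result;
    size_t start = 0;
    while (start < end) {
        size_t slash = path.find('/', start);
        if (slash == std::string::npos || slash > end) slash = end;
        std::string component = path.substr(start, slash - start);
        size_t eq = component.find('=');
        if (eq != std::string::npos) {
            std::string name = component.substr(0, eq);
            if (columns.count(name) > 0) {
                result[name] = component.substr(eq + 1);
            }
        }
        start = slash + 1;
    }
    return result;
}
```

For the example path above, `parse_columns_from_path("/path/to/dir/k1=1/xxx.csv", {"k1"})` yields a single entry mapping `k1` to `"1"`.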
82d0afc1ba FROM_UNIXTIME should only convert timestamp from 0 to 253402271999 (#1658)
That range corresponds to 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59; for any other timestamp, the function returns NULL.
2019-08-16 18:29:57 +08:00
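The range check in #1658 can be sketched as below; `checked_unix_timestamp` is a hypothetical helper, not the Doris function. Note that 253402271999 is 9999-12-31 23:59:59 at UTC+8 (9999-12-31 15:59:59 UTC), so the upper bound matches the stated date in a UTC+8 time zone:

```cpp
#include <cstdint>
#include <optional>

// Bounds quoted in #1658.
constexpr int64_t kMinUnixTimestamp = 0;
constexpr int64_t kMaxUnixTimestamp = 253402271999;

// Hypothetical guard: returns the timestamp if FROM_UNIXTIME should convert
// it, or std::nullopt where the function returns NULL.
std::optional<int64_t> checked_unix_timestamp(int64_t ts) {
    if (ts < kMinUnixTimestamp || ts > kMaxUnixTimestamp) {
        return std::nullopt;
    }
    return ts;
}
```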
199ff968dc Fix time zone compatibility (#1631) 2019-08-13 18:44:35 +08:00
032d0b41bb Fix compile error (#1630) 2019-08-13 10:00:18 +08:00
69af50aa8c Time zone related BE function (#1598)
Details can be found in the time-zone.md document.
2019-08-12 20:57:59 +08:00
c0253a17fc Add block compression codec and remove not used codec (#1622) 2019-08-12 20:47:16 +08:00
af8256be2a Implement BetaRowsetWriter (#1590)
BetaRowsetWriter is used to write rowset in V2 segment format.

This PR contains several interface changes:
1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy tasks, linked schema changes, etc.
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` for consistency with other names.
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool.
4. RowsetWriter.garbage_collection() is removed because clients do not use it.
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality.
2019-08-12 16:41:47 +08:00
2bd01b23c7 Add page cache for column page in BetaRowset (#1607) 2019-08-12 10:42:00 +08:00
e3348c46a9 Expose data pruned-filter-scan ability (#1527) 2019-08-11 12:59:24 +08:00
b2e678dfc1 Support Segment for BetaRowset (#1577)
We create a new segment format for BetaRowset. The new format merges
the data file and the index file into one file. We also create a new format
for the short key index. In the original code, the index is stored in a
RowCursor-like format, which is inefficient to compare. Now we encode multiple
columns into one binary string, and we ensure that these strings sort in the
same order as the key columns.
2019-08-06 17:15:11 +08:00
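The property described in #1577 (encoded keys sort like the key columns) is usually achieved with an order-preserving, "memcomparable" encoding. A minimal sketch for a hypothetical (INT, VARCHAR) key; the actual Doris short key format may differ, and `encode_key` and `compare_encoded` are illustrative names:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <string>

// Flip the sign bit so negative numbers sort before positive ones, then write
// big-endian so byte order matches numeric order.
void encode_int32(int32_t v, std::string* out) {
    uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;
    for (int shift = 24; shift >= 0; shift -= 8) {
        out->push_back(static_cast<char>((u >> shift) & 0xFFu));
    }
}

// The string is the last key column, so its raw bytes already sort correctly.
// A string placed before other columns would need an escaped or terminated form.
std::string encode_key(int32_t k1, const std::string& k2) {
    std::string out;
    encode_int32(k1, &out);
    out.append(k2);
    return out;
}

// Byte-wise comparison: the only operation needed on encoded keys.
int compare_encoded(const std::string& a, const std::string& b) {
    size_t n = std::min(a.size(), b.size());
    int c = std::memcmp(a.data(), b.data(), n);
    if (c != 0) return c;
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}
```

The benefit over a RowCursor-style comparison is that index lookups reduce to `memcmp` on contiguous bytes, with no per-column type dispatch.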
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently holds offset information, which is a row attribute,
so we remove the offset from Field.

RowCursor currently contains some logic that belongs to Schema, so in this patch I
add a Schema attribute to RowCursor to simplify RowCursor. After this
change, only Schema handles Field/ColumnSchema.

I extract some logic from RowCursor into be/src/olap/row.h so that we can
use the same logic to handle different types of rows. Each row type provides
the same function to get a Cell of the row. A cell represents a column's
content together with a null indicator.
2019-07-30 14:01:57 +08:00
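The row.h idea from #1561 can be sketched with a template: any row type that exposes `cell(i)` can share one comparison routine. `Cell`, `VectorRow`, `PackedRow` and `compare_row` are hypothetical simplifications (values are fixed to `int64_t`), not the real row.h definitions:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A column value plus a null indicator.
struct Cell {
    bool is_null;
    int64_t value;
};

// One row representation: a vector of cells.
struct VectorRow {
    std::vector<Cell> cells;
    Cell cell(size_t i) const { return cells[i]; }
};

// Another representation: two values plus a null bitmap.
struct PackedRow {
    int64_t values[2];
    uint8_t null_bits;
    Cell cell(size_t i) const { return Cell{((null_bits >> i) & 1) != 0, values[i]}; }
};

// Shared logic for any pair of row types; NULL sorts before any value.
template <typename LhsRow, typename RhsRow>
int compare_row(const LhsRow& lhs, const RhsRow& rhs, size_t num_columns) {
    for (size_t i = 0; i < num_columns; ++i) {
        Cell l = lhs.cell(i);
        Cell r = rhs.cell(i);
        if (l.is_null != r.is_null) return l.is_null ? -1 : 1;
        if (l.is_null) continue;
        if (l.value != r.value) return l.value < r.value ? -1 : 1;
    }
    return 0;
}
```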
0694b6a6fa Fix bugs of Broker load (#1546)
Use the same UUID as both the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID and, being a plan, also a query ID.
Using the same UUID for both makes the load process easier to trace.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise the BE cannot
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled; otherwise
it may occupy the worker thread for a long time.

Remove the unnecessary query reports when running a load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. A longer timeout reduces the chance of hitting a "fail to send batch" error during loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs to make tracing a broker load process easier.
2019-07-27 20:17:05 +08:00
e8561d71a6 Add dict page (#1409)
Add a dict encoding page for binary/string type data.
Construct a dictionary for the original data and store the encoded IDs instead of
the original data to save space. If the dictionary grows too big, the page automatically
falls back to plain encoding.
2019-07-26 09:47:11 +08:00
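A minimal sketch of the dictionary encoding with fallback described in #1409. `DictEncodedPage`, `encode_page`, `decode_row` and the `max_dict_bytes` threshold are hypothetical; how the real page decides that the dictionary is "too big" is not stated in the commit:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct DictEncodedPage {
    bool is_plain = false;            // true if we fell back to plain encoding
    std::vector<std::string> dict;    // id -> original value
    std::vector<uint32_t> codes;      // one id per row (dict mode only)
    std::vector<std::string> plain;   // original values (plain mode only)
};

DictEncodedPage encode_page(const std::vector<std::string>& values, size_t max_dict_bytes) {
    DictEncodedPage page;
    std::unordered_map<std::string, uint32_t> ids;
    size_t dict_bytes = 0;
    for (const std::string& v : values) {
        auto it = ids.find(v);
        if (it == ids.end()) {
            dict_bytes += v.size();
            if (dict_bytes > max_dict_bytes) {
                // Dictionary too large: store the raw values instead.
                DictEncodedPage plain_page;
                plain_page.is_plain = true;
                plain_page.plain = values;
                return plain_page;
            }
            it = ids.emplace(v, static_cast<uint32_t>(page.dict.size())).first;
            page.dict.push_back(v);
        }
        page.codes.push_back(it->second);  // store the id, not the string
    }
    return page;
}

std::string decode_row(const DictEncodedPage& page, size_t row) {
    return page.is_plain ? page.plain[row] : page.dict[page.codes[row]];
}
```

Low-cardinality columns (e.g. city names repeated across many rows) benefit most, since each repeat costs one integer instead of a full string.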
dbc912d2df Unify ColumnSchemaV2 and ColumnSchema to one (#1545)
Currently, we have two versions of ColumnSchema; in this patch, we unify
these two classes into one class.
2019-07-25 10:48:16 +08:00
0805b05d81 Remove unused FieldInfo (#1540) 2019-07-24 19:33:30 +08:00
68782be7a6 Refactor storage aggregate framework (#1529)
Add AggregateInfo to encapsulate all the functions used to aggregate value
columns.
2019-07-24 10:02:35 +08:00
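One way to read "enclose all functions used to aggregate value columns" (#1529) is a function table that the storage layer calls without switching on the aggregation type. This is a hypothetical sketch: the member names, the `int64_t`-only signatures and `aggregate` are assumptions, not the real AggregateInfo:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical function table for one aggregation type.
struct AggregateInfo {
    void (*init)(int64_t* dst, int64_t first);
    void (*update)(int64_t* dst, int64_t src);
    void (*finalize)(int64_t* dst);
};

static void init_copy(int64_t* dst, int64_t first) { *dst = first; }
static void update_sum(int64_t* dst, int64_t src) { *dst += src; }
static void update_max(int64_t* dst, int64_t src) { if (src > *dst) *dst = src; }
static void finalize_noop(int64_t*) {}

constexpr AggregateInfo kSumInfo{init_copy, update_sum, finalize_noop};
constexpr AggregateInfo kMaxInfo{init_copy, update_max, finalize_noop};

// Aggregates the values of one key group with whichever info is supplied.
int64_t aggregate(const AggregateInfo& info, const int64_t* values, size_t n) {
    int64_t acc = 0;
    if (n == 0) return acc;
    info.init(&acc, values[0]);
    for (size_t i = 1; i < n; ++i) info.update(&acc, values[i]);
    info.finalize(&acc);
    return acc;
}
```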
4aedaea84e Support TIME type and timediff function (#1505) 2019-07-23 13:42:39 +08:00
0c8e91adf4 Add storage rowwise iterator (#1515)
Use RowwiseIterator to unify all data fetching in the storage engine.
All objects in the storage engine can be read through the iterator interface,
for example Segment and Rowset.

This patch implements two generic iterators: UnionRowwiseIterator and
MergeRowwiseIterator. Both classes take other iterators as their inputs.

To implement the iterators, we define a new class, RowBlockV2; all data read
from an iterator is in this format. We define a new class instead of reusing
the old RowBlock because we want to keep the old code working
normally.
2019-07-22 14:35:11 +08:00
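The difference between the two generic iterators in #1515 can be sketched with a simplified interface: rows are plain `int`s and `next()` yields one row at a time rather than filling a RowBlockV2. All class names here are hypothetical stand-ins, and the merge variant assumes each input is already sorted:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

class RowwiseIterator {
public:
    virtual ~RowwiseIterator() = default;
    virtual std::optional<int> next() = 0;  // std::nullopt at end of data
};

class VectorIterator : public RowwiseIterator {  // stands in for one segment
public:
    explicit VectorIterator(std::vector<int> rows) : _rows(std::move(rows)) {}
    std::optional<int> next() override {
        if (_pos == _rows.size()) return std::nullopt;
        return _rows[_pos++];
    }
private:
    std::vector<int> _rows;
    size_t _pos = 0;
};

// Union: drain inputs one after another; output order is input order.
class UnionIterator : public RowwiseIterator {
public:
    explicit UnionIterator(std::vector<std::unique_ptr<RowwiseIterator>> inputs)
        : _inputs(std::move(inputs)) {}
    std::optional<int> next() override {
        while (_cur < _inputs.size()) {
            if (auto row = _inputs[_cur]->next()) return row;
            ++_cur;
        }
        return std::nullopt;
    }
private:
    std::vector<std::unique_ptr<RowwiseIterator>> _inputs;
    size_t _cur = 0;
};

// Merge: k-way merge of sorted inputs using a min-heap of (row, input index).
class MergeIterator : public RowwiseIterator {
public:
    explicit MergeIterator(std::vector<std::unique_ptr<RowwiseIterator>> inputs)
        : _inputs(std::move(inputs)) {
        for (size_t i = 0; i < _inputs.size(); ++i) {
            if (auto row = _inputs[i]->next()) _heap.push({*row, i});
        }
    }
    std::optional<int> next() override {
        if (_heap.empty()) return std::nullopt;
        auto [row, idx] = _heap.top();
        _heap.pop();
        if (auto following = _inputs[idx]->next()) _heap.push({*following, idx});
        return row;
    }
private:
    using Entry = std::pair<int, size_t>;
    std::vector<std::unique_ptr<RowwiseIterator>> _inputs;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> _heap;
};
```

Union is the cheaper choice when output order does not matter; merge is needed when the combined stream must stay sorted by key.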
41499061ac Refactor types.h to reduce code and add UT (#1498) 2019-07-18 12:24:41 +08:00
a9e8113b82 Fix heap-buffer-overflow in split_part() function in StringFunctions (#1482) 2019-07-15 23:00:37 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies the data on every Backend,
so restarting a BE after the upgrade will take a very long time.
To avoid disrupting your production environment,
upgrade Backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use a unique id to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
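Point 2 of #1478 replaces (tablet_id, version) naming with a unique rowset id. A minimal sketch of one way to generate such ids, assuming a per-backend prefix plus an atomic counter; `RowsetIdGenerator` is hypothetical and not necessarily the scheme Doris uses. A real generator must also stay unique across restarts, which this sketch leaves to the caller's choice of prefix:

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical generator: ids never repeat within one generator instance,
// even when compaction, clone and restore produce rowsets for the same version.
class RowsetIdGenerator {
public:
    explicit RowsetIdGenerator(uint64_t backend_prefix) : _prefix(backend_prefix) {}

    std::string next_id() {
        uint64_t seq = _next.fetch_add(1, std::memory_order_relaxed);
        return std::to_string(_prefix) + "_" + std::to_string(seq);
    }

private:
    uint64_t _prefix;
    std::atomic<uint64_t> _next{1};
};
```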
a7390c03f4 Add percentile_approx aggregate function (#1432) 2019-07-11 16:44:43 +08:00
98bd4b4565 Add string function split_part (#1451) 2019-07-10 09:47:33 +08:00