Commit Graph

556 Commits

Author SHA1 Message Date
e1a8f9d30f Segment v2 stream load core dump(#2037) (#2075)
[STORAGE]
1. Fix a memory leak when calling get_dictionary_page on the string builder.
2. Fix deletion of an invalid memory address in bitshuffleBuilder when the array does not grow.
When bitshuffleBuilder does not grow its array, a data page that was not allocated with new is
returned to ColumnWriter.
When ColumnWriter is destructed, it deletes that data page, which causes a core dump.
2019-11-01 22:52:58 +08:00
713e04624f Modify the lower bound of percentile_approx compression param to 2048 (#2111) 2019-11-01 13:07:39 +08:00
45df6aae08 Fix some routine load bugs (#2093)
Mainly fix the following issues:

1. A null pointer exception is raised when a database or table is dropped. The expected behavior is that the routine load job is stopped.

2. Memory leaks. Batch routine load task submissions are no longer performed, and modifications are submitted separately for each task.

3. Unreasonable task timeout.
    Routine load tasks should not be queued in the BE thread pool for execution. A task sent to the BE should be executed immediately; otherwise the task times out in the FE first, which eventually leads to constant timeouts for all subsequent tasks.

4. Every routine load job should be scheduled as soon as it is submitted, without waiting for an available BE slot. Otherwise, jobs submitted later may never be scheduled.
2019-10-31 21:53:03 +08:00
95a3b4ccfe Add object type (#1948)
Add a new type: Object. Currently, it is mainly for complex aggregate metrics (HLL, Bitmap).

The Object type has the following constraints:
1. The Object type cannot be used as a key column type.
2. The Object type doesn't support any indices (BloomFilter, short key, zone map, invert index).
3. The Object type doesn't support filter and group by.

In the implementation:

The Object type reuses StringValue and StringVal because, in the storage engine, the Object type is binary: it has a pointer and a length.
2019-10-31 21:42:58 +08:00
5e8c96f28b Optimize FE start logic (#2052) 2019-10-31 11:11:50 +08:00
f53f188c5d Add arrow IPC serialization for Doris-Spark-Connector (#2013) 2019-10-31 10:32:06 +08:00
6b4ef34162 Fix AlphaRowsetTest by removing StorageEngine #2078 (#2091) 2019-10-30 19:39:41 +08:00
0a0da8292f Fix BE could not start (#2104) 2019-10-30 18:53:39 +08:00
b006d58f5c Fix SegmentIterator lost data when there are multiple RowRanges (#2092) 2019-10-30 12:27:50 +08:00
2ae54250e7 Fix null stats when beta rowset schema change (#2085)
BetaRowsetReader's _context->stats is null when schema change calls next_block
2019-10-28 22:15:33 +08:00
ebdcfc21df Multi distinct + no group by + big data is stuck (#2079)
ISSUE-2069: This kind of query could get stuck.
The sender fails to send the last packet to the receiver.
Also, the failure is not reported to the FE, so the query is not cancelled.
The error log looks like "body_size=xxxx from xxx:xxx is too large".
The reason for the send failure is that the packet of the query is too big: it exceeds the max_body_size of brpc.

This commit adds a config named brpc_max_body_size, which is used to change the max_body_size of brpc.
Users can also change max_body_size on the fly via "http://host:brpc_port/flags".
2019-10-28 18:51:05 +08:00
9408ad67e9 Fix predicate error when reading BetaRowset (#2067) 2019-10-27 12:12:41 +08:00
13fde9fce3 Add stats to BetaRowsetReader (#2074) 2019-10-27 12:06:39 +08:00
52a176b229 Remove stats in SchemaChange (#2071) 2019-10-25 19:25:18 +08:00
b6e3725c5d Fix bug that tablet failed to be committed when no data is loaded (#2064) 2019-10-25 16:36:35 +08:00
189e08faa5 Replace NewStatus with Status (#2046) 2019-10-24 22:48:59 +08:00
78bf825e73 Optimize the convert of row block v2 to v1 #2011 (#2058)
Use MemPool exchange to avoid string copy
Use batch convert to replace row by row
2019-10-24 22:36:30 +08:00
0bcfddab92 Remove clear_alter_task (#2056)
Alter task has been refactored and clear_alter_task is not necessary.
2019-10-24 18:57:14 +08:00
e3c39a192c Fix schema change core dump because of null stats (#2049) 2019-10-23 23:06:29 +08:00
d33e1693b0 Initialize DeltaWriter lazily (#2044)
The delta writer is initialized only when load data is actually passed to
it. Otherwise, there will be lots of unnecessary transaction adding
and removing on the BE.
2019-10-23 18:51:38 +08:00
9bc2325c6a Fix incorrect scan bytes in metrics (#2034) 2019-10-23 18:13:40 +08:00
e6bd1855e2 fix default compaction rowset type bug (#2042) 2019-10-23 11:08:14 +08:00
d25f0ba69a Make ColumnReader load lazily (#2026)
[Storage][SegmentV2]
Currently `segment_v2::Segment::open` will eagerly initialize all column readers, regardless of whether the column is queried or not. Initializing `segment_v2::ColumnReader` incurs additional I/O cost to read ordinal index and zonemap index and should be delayed to the time it's needed.
2019-10-23 10:25:28 +08:00
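The lazy loading described above can be sketched as follows. This is an illustration only: `FakeColumnReader` and `LazySegment` are hypothetical stand-ins, not the `segment_v2` classes, and this single-threaded version leaves out the synchronization a real reader would need.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reader whose construction is expensive
// (in the real system, initialization reads index data from disk).
struct FakeColumnReader {
    explicit FakeColumnReader(size_t id) : column_id(id) {}
    size_t column_id;
};

// Hypothetical segment that creates a column reader on first access only.
// Not thread-safe; concurrent callers would need extra synchronization.
class LazySegment {
public:
    // Opening the segment allocates empty slots and initializes no reader.
    explicit LazySegment(size_t num_columns) : _readers(num_columns) {}

    // Returns the reader for a column, initializing it on first use.
    const FakeColumnReader& reader(size_t column_id) {
        std::unique_ptr<FakeColumnReader>& slot = _readers.at(column_id);
        if (!slot) {
            slot.reset(new FakeColumnReader(column_id));
            ++_initialized;
        }
        return *slot;
    }

    // Number of readers that have actually been initialized so far.
    size_t initialized_count() const { return _initialized; }

private:
    std::vector<std::unique_ptr<FakeColumnReader>> _readers;
    size_t _initialized = 0;
};

// Usage: a query touching one of three columns initializes one reader.
inline void lazy_segment_demo() {
    LazySegment segment(3);
    assert(segment.initialized_count() == 0);
    segment.reader(1);
    segment.reader(1);
    assert(segment.initialized_count() == 1);
}
```

The trade-off is that the first access to a column pays the initialization cost during the query rather than at open time.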
0f94b685ab Add ES7.x compatibility for doris on es (#2033) 2019-10-22 17:23:33 +08:00
9c2d149c36 add profile for segment v2 (#2015) 2019-10-22 09:43:16 +08:00
6634051359 Make default rowset type to config (#2020) 2019-10-21 21:44:00 +08:00
8aa2cbe12d Load Rowset only once in a thread-safe manner (#2022)
[Storage]
This PR implements thread-safe `Rowset::load()` for both AlphaRowset and BetaRowset. The main changes are 

1. Introduce `DorisCallOnce<ReturnType>` to be the replacement for `DorisInitOnce` . It works for both Status and OLAPStatus.
2. `segment_v2::ColumnReader::init()` is now implemented by DorisCallOnce.
3. `segment_v2::Segment` is now created by a factory open() method. This guarantees all Segment instances are in opened state.
4. `segment_v2::Segment::_load_index()` is now implemented by DorisCallOnce.
5. Implement thread-safe load() for AlphaRowset and BetaRowset
2019-10-21 16:05:12 +08:00
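A call-once helper that caches its result, as item 1 above describes, could look roughly like the sketch below. It is a simplified mutex-based illustration under stated assumptions: the class name `CallOnce`, its `call` method, and the requirement that `ReturnType` be default-constructible are all hypothetical and do not reflect the real `DorisCallOnce` interface.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical sketch, not the Doris implementation: runs an initializer at
// most once and hands the stored result to every caller.
// Assumes ReturnType is default-constructible and copyable.
template <typename ReturnType>
class CallOnce {
public:
    // Invokes fn on the first call only; later calls return the stored
    // result without invoking fn again. The mutex serializes callers.
    template <typename Fn>
    ReturnType call(Fn&& fn) {
        std::lock_guard<std::mutex> guard(_mutex);
        if (!_called) {
            _result = std::forward<Fn>(fn)();
            _called = true;
        }
        return _result;
    }

private:
    std::mutex _mutex;
    bool _called = false;
    ReturnType _result{};
};

// Usage: the second initializer is never run, and both calls see 7.
inline void call_once_demo() {
    CallOnce<int> once;
    int runs = 0;
    int first = once.call([&] { ++runs; return 7; });
    int second = once.call([&] { ++runs; return 9; });
    assert(first == 7 && second == 7 && runs == 1);
}
```

Returning the stored result on every call is what lets a failed initialization be reported consistently to all callers instead of only to the first one.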
58c882fa2a Remove SchemaChangeV1 (#2014) 2019-10-21 15:07:28 +08:00
05643dc403 Replace Arena with MemPool (#2012)
After replacing Arena with MemPool, we can achieve one copy for string
value read from segment v2. We can exchange MemPool's chunk between
RowBlockV2 and RowBlock. This change only replaces Arena; the remaining
work will be done in another change list.
2019-10-19 15:53:24 +08:00
292273be2e Fix string bug in segment v2 (#2005) 2019-10-18 15:53:01 +08:00
c3b5046940 Fix bug of invalid stream load task rollback (#1999)
If a stream load is committed with result PUBLISH_TIMEOUT, the transaction should
not be rolled back; the message should only be returned to the user.
2019-10-17 21:08:29 +08:00
4f7cc7e033 add predicate filter(#1652) (#1775) 2019-10-17 19:20:00 +08:00
3bca253fb3 Fix beta rowset read slow (#1994)
[Bug][BetaRowset] Fix slow beta rowset reads with limit

Beta rowset does not update raw_rows_read in statistics and so reads all
data in the tablet when querying with limit, which leads to long query times.
2019-10-17 19:19:46 +08:00
3c12af4dcc Limit the memory consumption of broker scan node (#1996)
If memory usage exceeds the limit, no more row batches will be pushed to the batch queue.
2019-10-17 14:40:16 +08:00
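The idea of refusing new batches once a memory limit is reached can be sketched as below. `BoundedBatchQueue` is a hypothetical type, batches are modelled as strings, and the check is a slightly stricter variant (it refuses a batch that would push usage over the limit). The real scan node is concurrent and tracks memory differently, so treat this as an illustration of the gating logic only.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical sketch, not Doris code: a queue that tracks the bytes it
// holds and refuses new batches that would exceed a memory limit.
// Not thread-safe; a real producer/consumer queue would need locking.
class BoundedBatchQueue {
public:
    explicit BoundedBatchQueue(size_t mem_limit_bytes) : _limit(mem_limit_bytes) {}

    // Returns false and leaves the queue unchanged if adding the batch
    // would take tracked usage above the limit.
    bool try_push(std::string batch) {
        if (_used + batch.size() > _limit) return false;
        _used += batch.size();
        _batches.push_back(std::move(batch));
        return true;
    }

    // Moves the oldest batch into *out and releases its tracked bytes.
    // Returns false if the queue is empty.
    bool try_pop(std::string* out) {
        if (_batches.empty()) return false;
        *out = std::move(_batches.front());
        _batches.pop_front();
        _used -= out->size();
        return true;
    }

    size_t used_bytes() const { return _used; }

private:
    size_t _limit;
    size_t _used = 0;
    std::deque<std::string> _batches;
};

// Usage: the second push is refused until the consumer frees memory.
inline void bounded_queue_demo() {
    BoundedBatchQueue queue(10);
    assert(queue.try_push(std::string(6, 'a')));
    assert(!queue.try_push(std::string(6, 'b')));
    std::string out;
    assert(queue.try_pop(&out));
    assert(queue.try_push(std::string(6, 'b')));
}
```

A producer that gets `false` back would typically wait and retry, which is what keeps the scanner from running ahead of the consumer.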
41e55cfca9 Modify fixed partition feature (#1989)
1. MAXVALUE is not supported with multiple partition columns.
2. Fix the incorrect show create table stmt.
2019-10-16 16:03:46 +08:00
2fcb79e3ef Fix wrong group by result bug (#1987) 2019-10-16 07:19:53 +08:00
63fa260d3f Support prepare/close in UDF (#1985)
The prepare/close step of scalar functions is already supported in the execution framework; we only need to support it in the syntax and metadata in the frontend.

In addition, the 'Hive' binary type of scalar function does NOT support the prepare/close step; we need to make it supported.
2019-10-16 07:19:20 +08:00
ee5b79ac2b Fix bug that memtable should be destroyed before finishing the load process (#1983)
The parent mem tracker may be released before the child mem tracker visits it,
which causes a segfault.
2019-10-15 22:46:19 +08:00
62acf5d098 Limit the memory usage of Loading process (#1954) 2019-10-15 09:26:20 +08:00
f130bd3e7b Use Env function to operate directory (#1980)
Env now unifies all environment operations, such as file operations.
However, some of our old functions don't leverage it. This change makes
FileUtils::scan_dir use Env's function.
2019-10-15 09:25:12 +08:00
4391152168 Make variable argument UDAF work (#1982) 2019-10-15 09:24:53 +08:00
9fb9dbefca Get rid of compaction on rowset when making snapshot (#1977)
When making snapshot for incremental clone, missed singleton versions may have been compacted.
So get_rowset_by_version() will not acquire any rowset.
This rowset should be acquired from _inc_rs_version_map.
2019-10-14 22:07:31 +08:00
01e71def63 Update engine_clone_task.cpp (#1979) 2019-10-14 16:23:04 +08:00
a6a9b0021f Check tablet state before update it (#1974)
After an alter, the BE tries to set the tablet state to running, but the tablet may have been dropped.
2019-10-14 16:04:47 +08:00
fb7e63038b Fix compile fail (#1971) 2019-10-14 10:24:13 +08:00
e3cc0ee93e Fix empty string bug in dict encoding (#1970) 2019-10-14 10:05:00 +08:00
d68b1b287c Support segment-level zone map (#1931) 2019-10-13 22:06:09 +08:00
7eece1e9e2 Support variable arguments for UDAF (#1968) 2019-10-13 22:04:23 +08:00
80e9b21fb0 Make Segment v2 use string's real length(#1943) (#1944) 2019-10-13 13:23:43 +08:00
8232261df1 Lost rowset during tablet revise tablet meta (#1967) 2019-10-12 23:30:11 +08:00