Commit Graph

42 Commits

Author SHA1 Message Date
068707484d Support sequence column for UNIQUE_KEYS Table (#4256)
* add sequence  col

Co-authored-by: yangwenbo6 <yangwenbo3@jd.com>
2020-09-04 10:10:17 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
8b0b120aca [Profile] Add 2 Segment related metrics in query profile (#4348)
Total number of segments and filterd number of segment
2020-08-27 12:07:21 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-08-18 16:56:12 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
725ebafd99 [Bug] Cancel the query if OlapScanner prepare failed (#4002) 2020-07-03 21:33:07 +08:00
b8ee84a120 [Doc] Add docs to OLAP_SCAN_NODE query profile (#3808) 2020-06-13 16:25:40 +08:00
4f79036a7e Add error code into error message (#3645) 2020-05-21 19:14:35 +08:00
b58b1b3953 [metrics] Make DorisMetrics to be a real singleton (#3417) 2020-05-04 09:20:53 +08:00
6c33f80544 Add disable_storage_page_cache config (#2890)
1. when read column data page:
    for compaction, schema_change, check_sum: we don't use page cache
    for query and config::disable_storage_page_cache is false, we use page cache
2. when read column index page
    if config::disable_storage_page_cache is false, we use page cache
2020-02-16 19:13:30 +08:00
4c5b0b6dc9 Remove VersionHash used to comparison in BE (#2622) 2019-12-31 19:38:45 +08:00
d31f774852 Add block split bloom filter (#2471)
[STORAGE][SEGMENTV2]

    use block split bloom filter
    build bloom filter against data page
    add distinct value to bloom filter
    add ordinal index to bloom filter index
2019-12-18 12:57:44 +08:00
f828670245 Add Bitmap index reader (#2319)
[STORAGE] [INDEX]

For #2061 and #2062

Add bitmap index reader
SegmentIterator support bitmap index
Add some metrics
2019-12-03 23:01:40 +08:00
95a3b4ccfe Add object type (#1948)
Add a new type: Object. Currently, it's mainly for complex aggregate metrics(HLL , Bitmap).

The Object type has the following constraints:
1 Object type could not as key column type
2 Object type doesn't support all indices (BloomFilter, short key, zone map, invert index)
3 Object type doesn't support filter and group by

In the implementation:

The Object type reuse the StringValue and StringVal, because in storage engine, the Object type is binary, it has a pointer and length.
2019-10-31 21:42:58 +08:00
9bc2325c6a Fix incorrect scan bytes in metrics (#2034) 2019-10-23 18:13:40 +08:00
9c2d149c36 add profile for segment v2 (#2015) 2019-10-22 09:43:16 +08:00
69d0a34bfd Remove unused _request_columns_size from olap_scanner (#1916) 2019-09-30 15:25:10 +08:00
cafb9f1e62 Replace Arena with MemPool first step (#1899) 2019-09-28 01:12:22 +08:00
b246d93128 Avoid SerDe for aggregation query with object pool (#1854) 2019-09-26 13:51:13 +08:00
0dc0dadad1 Reduce unnecessary memory allocat and copy in OlapScanNode (#1742) 2019-09-04 21:05:12 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
c5edf9dae0 Unify Field and ColumnSchema in Storage (#1561)
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same. So we should unify these to
one class. Now, Field has offset information, which is an row attribute,
so we remove offset in Field.

RowCursor now has some logic which belong to Schema, so in this patch I
add Schema attribute to RowCursor to make RowCursor simple. After this
change, only Schema will handle Field/ColumnSchema.

I extract some logic from RowCursor to be/src/olap/row.h, then we can
use same logic to handle different types of row. Each type of row has
same function that to get Cell of this row. A cell represent a column
content with a null indicator.
2019-07-30 14:01:57 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.

1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
   Nameing rowset with tablet_id and version will lead to
   many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
   with different format.
2019-07-15 21:18:22 +08:00
ae6f2d99c5 Fix bug when use SELECT * FROM TABLE LIMIT 1 (#1469) 2019-07-13 23:57:14 +08:00
687d57be66 Fix bug that query statistics in audit log are wrong (#1354) 2019-06-21 19:16:05 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
9f5f44ec48 Reduce memory RowBlock needed (#1238)
Before RowBlock will reserve memory for all columns in schema, even if
it is not queried. Which will cause bad performance when quering wide
table.

In this patch, RowBlock will reserve memory for needed columns. In a
case, this reduce ConvertBatchTime from 10s to 60ms when quering a wide
table who has 178 columns.

 #1236
2019-06-04 12:58:41 +08:00
02f36c23ed Set tablet as bad when loading index failed (#1146)
Bad tablet will be reported to FE and be handled

And add a config auto_recover_index_loading_failure to control the index loading failure processing
2019-05-13 10:22:04 +08:00
c0fbc84381 Fix bug that ScanBytes is when collect executing query's infos (#869) 2019-04-03 18:27:50 +08:00
c34b306b4f Decimal optimize branch #695 (#727) 2019-03-22 17:22:16 +08:00
e8360f5eee Add counters to OlapScanNode (#538)
There is unnegligible cost to covnert VectorRowBatch to RowBatch,
When we seek block, we only read one row from engine to minimize
this convert cost.

This patch can optimize some query's time from 5s to 2s
2019-01-16 18:57:04 +08:00
6b4049e21c Unify Slice code path (#380) 2018-12-03 18:11:47 +08:00
a2b299e3b9 Reduce UT binary size (#314)
* Reduce UT binary size

Almost every module depend on ExecEnv, and ExecEnv contains all
singleton, which make UT binary contains all object files.

This patch seperate ExecEnv's initial and destory to anthor file to
avoid other file's dependence. And status.cc include debug_util.h which
depend tuple.h tuple_row.h, and I move get_stack_trace() to
stack_util.cpp to reduce status.cc's dependence.

I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a

Issue: #292

* Update
2018-11-15 16:17:23 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In last commit, a lot of files has been missed
2018-10-31 16:19:21 +08:00
65fe7f65c1 Fixed: privilege logic error:
1. No one can set root password expect for root user itself
    2. NODE_PRIV cannot be granted.
    3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.*
    4. No one can modifly privs of default role 'operator' and 'admin'.
    5. No user can be granted to role 'operator'.
Fixed: the running load limit should not be applied to replay logic. It will cause replay or loading image fail.
Changed: optimize the problem of too many directories under mini load directory.
Fixed: missing password and auth check when handling mini load request in Frontend.
Fixed: DomainResolver should start after Frontends transfer to a certain ROLE, not in Catalog construction methods.
Fixed: a stupid bug that no one can set password for root user... fix it: only root user can set password for root.
Fixed: read null data twice
    When reading data with a null value, in some cases, the same data will be read twice by the storage engine,
    resulting in a wrong result.The reason for this problem is that when splitting,
    and the start key is the minimum value, the data with null is read.
Fixed: add a flag to prevent DomainResovler thread start twice.
Fixed: fixed a mem leak of using ByteBuf when parsing auth info of http request.
Fixed: add a new config 'disable_hadoop_load', default is false, set to true to disable hadoop load.
Changed: add detail error msg of submitting hadoop load job in show load result.
Fixed: Backend process should be crashed if failed to saving header.
Added: exposure backend info to user when encounter error on Backend. for debugging it more convenient.
Fixed: Should remove fd from map when inputstream or outputstream is closed in Broker process.
Fixed: Change all files' LF to unix format.

Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8
2018-10-01 19:58:41 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
cf99230f9e Close #19 fix machine hostname is resolved to loopback address (#34) 2017-08-19 21:35:10 +08:00
6486be64c3 fix license statement (#29)
* change picture to word

* change picture to word

* SHOW FULL TABLES WHERE Table_type != VIEW sql can not execute

* change license description
2017-08-18 19:16:23 +08:00
e2311f656e baidu palo 2017-08-11 17:51:21 +08:00