Commit Graph

354 Commits

Author SHA1 Message Date
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
123237afb7 [Compaction] Persistence stale rowsets meta (#4454)
Persistence stale rowsets meta. When BE reboots, stale rowsets meta
can resume and the stale version can also be readable before stale gc time.

ISSUE: #4453
2020-08-30 21:05:48 +08:00
ad738fa198 Add OLAP_ERR_DATE_QUALITY_ERR error status to display schema change failure (#4388)
In the process of historical data transformation of materialized views, it may occur that the transformation fails due to data quality.
Add an error status code :OLAP_ERR_DATE_QUALITY_ERR to determine if a data problem is causing the failure

#3344
2020-08-27 17:52:53 +08:00
97d963468a [Code Cleanup] Template nest convert to c++11 syntax and style (#4442) 2020-08-26 10:51:52 +08:00
67b842ce04 [License] Organize and modify the license of the code (#4371)
1. Disable the MySQL client and LZO library by default when building the Doris.

    MySQL client library is used for MySQL external table feature.
    This feature will be replaced by the new ODBC external table soon.

    LZO library is used to compress/decompress data of some old data format of Doris,
    which is no longer used anymore.

2. Add missing license to some files.

3. For all non-Apache-License code, all are explained in NOTICE file and the corresponding license is declared.

4. Remove the js source code from webroot, it will be downloaded as thirdparty
2020-08-24 21:51:55 +08:00
d61c10b761 [Delete] Support batch delete [part 1] (#4310)
* Implements the grammar of the batch delete #4051 
* Process create, alter table when table has delete sign column
* Support the syntax for enabling the delete column
* Automatically filtered deleted data in the select statement.
* Automatically add delete sign when create  rollup table
TODO:
 * Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction
2020-08-21 22:57:16 +08:00
a8fe54b7b9 [ODBC SCAN NODE] 1/4 Add unix odbc library. (#4377) 2020-08-21 21:26:14 +08:00
a7422ee142 [UT][Bug-Fix] Resolve UT memory leak problem (#4406)
Fix ut memory leak on Fix #4164
2020-08-21 10:41:54 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is an user defined function, which is to replace all old substrings with a new substring in a string, as follow:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+------------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+------------------------------------------------------+
| http://www.baidu.com: |
+------------------------------------------------------+
2020-08-20 09:28:53 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
dc3ed1c525 [Compaction]Compaction rules optimization (#4212)
Compaction rules optimization, the detail problem description and design to see #4164.
This pr commits 2 functions:
(1) add the cumulative policy configable, and implement original policy.
(2) implement universal policy, the optimization version in #4164.
2020-08-19 09:34:13 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-08-18 16:56:12 +08:00
11ec7bbe24 [Bug]Add LargeInt cast to Date and Datatime, add timezone to stale_version_path_json_doc (#4321)
(1) Add LargeInt cast to date and datatime,  see #3864 
    LargeInt can cast to  date and datatime.  Fix this error: 
Unable to find _ZN5doris13CastFunctions16cast_to_date_valEPN9doris_udf15FunctionContextERKNS1_11LargeIntValE
    
(2) Add local timezone info to stale_version_path_json_doc rest api
Add timezone to "last create time" field.
{
        "path id": "1",
        "last create time": "1970-01-01 10:46:40 +0800",
        "path list": "1 -> [2-3] -> [4-5]"
 },
 and add timezone to the test unix,  see #4121 .
2020-08-13 23:38:30 +08:00
10e3fc2778 [BUG] Fix abs function cannot handle bigint or bigger data type (#4326) 2020-08-12 20:58:35 +08:00
912547260a [UnitTest] Refactor BE unit test script (#4266)
1. Rename run-ut.sh to run-be-ut.sh
2. Find all test files from build dir instead of declaring separately in the script
3. Add gtest output to collect the result of unit test.
2020-08-11 10:23:51 +08:00
0f30e03914 [BUG] Fix TabletSinkTest unit test (#4318) 2020-08-10 21:28:28 +08:00
e71152132c [metrics] Redesign metrics to 3 layers (#4115)
Redesign metrics to 3 layers:
    MetricRegistry - MetricEntity - Metrics
    MetricRegistry : the register center
    MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several 
        MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift 
        clients and servers, and so on. 
    Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, 
        thrift_opened_clients on a thrift_client entity.
    MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across 
        different MetricEntities.
2020-08-08 11:23:01 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR is to add inPredicate support to delete statement,
and add max_allowed_in_element_num_of_delete variable to
limit element num of InPredicate in delete statement.
2020-08-06 23:19:40 +08:00
bfb8c654c1 [Bug] Fix UT bug after making MemTracker shared (#4243)
after making MemTracker shared(#4135), some code haven't been fixed,
and add some useless ut back to build. Fixed in this pr.
2020-08-04 17:52:11 +08:00
16c89c7d56 [BUG]Fix remove expired stale rowset path order error (#4214)
Delete stale rowset path order error. This bug leads to stale rowsets version inconsistents. #4213
2020-08-01 17:44:39 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bug
1. Add `json_root` for nest json data.
2. Remove `_jmap` to make the logic reasonable.
2020-07-30 10:36:34 +08:00
50e6a2c8a0 [SQL][Function] Fix from/to_base64 may return incorrect value (#4183)
from/to_base64 may return incorrect value when the value is null #4130 
remove the duplicated base64 code
fix the base64 encoded string length is wrong, and this will cause the memory error
2020-07-27 22:55:05 +08:00
46c8c250a6 [Bug] fix use-after-poison bug in ut schema_change_test (#4118)
Using slice->data to create HyperLogLog, it will exec HyperLogLog(Slice(const char*)). Then Slice(const char*) will use strlen(data) to calc the size. But the slice in this unit test isn't a C-string. Need to use Slice.
2020-07-22 09:33:41 +08:00
03cf9b2a24 [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
Related issue #4017, main changes as follows:
1. Add expired_snapshot_rs_version_map,_expired_snapshot_rs_metas,
2. Add  VersionedRowsetTracker record compacted path version
3. Record path version when rowsets compact
4. In gc process, add expired snapshot rowsets to unused set to remove.
2020-07-19 22:03:59 +08:00
8500d8b695 [metrics] Use atomic instead of SpinLock for integer metric (#4036) 2020-07-17 11:01:33 +08:00
d07a23ece3 [webserver] Introduce mustache to simplify BE's website render (#4062)
cpp-mustache is a C++ implementation of a Mustache template engine
with support for RapidJSON, and in order to simplify RapidJSON object
building, we introduce class EasyJson from Apache Kudu.
2020-07-16 22:39:51 +08:00
3a4a38c2fc [Bug] Fix orc decimal (#4097)
Result may error when ORC load negative decimal value

When load negative decimal which has pre zero , the result is wrong.
eg -0.0014, the orc result is -14(precision ... 0)
2020-07-16 22:36:52 +08:00
5032b7fe7a Support materialized view schema change in bitmap hll and count field [#3739] (#3873)
+ Building the materialized view function for schema_change here based on defineExpr.
+ This is a trick because the current storage layer does not support expression evaluation.
+ count distinct materialized view will set mv_expr with to_bitmap or hll_hash.
+ count materialized view will set mv_expr with count.
+ Support to regenerate historical data when a new materialized view is created in BE。
    + Support to_bitmap function
    + Support hll_hash function
    + Support count(field) function
For #3344
2020-07-16 10:45:15 +08:00
9b0ad66b78 [runtime] Replace the thread pool in FragmentMgr (#4057) 2020-07-15 10:03:48 +08:00
1bfb105ec1 [Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979) 2020-07-01 09:22:33 +08:00
93a0b47d22 Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762)" (#3931)
This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.
2020-06-24 10:13:45 +08:00
c50a310f8f [optimize] Optimize spark load/broker load reading parquet format file (#3878)
Add BufferedReader for reading parquet file via broker
2020-06-23 13:42:22 +08:00
f189a2e7b8 [Spark load][Be 1/1] Be handle push task (#3742)
1、Add a PushBrokerReader in push_handle.cpp.
2、PushBrokerReader wraps the ParquetScanner to support reading data from parquet format file through broker.
2020-06-22 19:57:58 +08:00
8cd36f1c5d [Spark Load] Support java version hyperloglog (#3320)
mainly used for Spark Load process to calculate approximate deduplication value and then serialize to parquet file.
Try to keep the same calculation semantic with be's C++ version
2020-06-21 09:37:05 +08:00
fdd65c50c4 [Bug] fix mem_tracker use-after-free & add UT for it (#3899) 2020-06-20 19:08:53 +08:00
51367abce7 [Bug] Fix bug that BE crash when doing Insert Operation (#3872)
Mainly change:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of exception.
3. Protect the `_status` in RuntimeState with lock
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be
deconstructed after the object pool.
5. Remove the unused `ObjectPool` param in RuntimeProfile constructor. If I don't remove it,
RuntimeProfile will depends on the `_obj_pool` in RuntimeProfile.
2020-06-19 17:09:04 +08:00
ca96ea3056 [Memory Engine] MemTablet creation and compatibility handling in BE (#3762) 2020-06-18 09:56:07 +08:00
0224d49842 [Fix][Bug] Fix compile bug (#3888)
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-06-16 18:42:04 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Supporting RESTFUL interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
8caedadb67 use scoped_refptr to new HashIndex (#3818) 2020-06-10 23:47:10 +08:00
e4dc2ec440 [StorageEngine] Make StorageEngine::open return more detailed info (#3761)
StorageEngine::open just return a very vague status info when failed,
we have to check logs to find out the root reason, and it's not
convenient to check logs if we run unit tests in CI dockers.
It would be better to return more detailed failure info to point out
the root reason, for example, it may return error status with message
"file descriptors limit is too small".
2020-06-07 10:21:33 +08:00
3b6a781862 [Bug] Fix a bug that tablet's _preferred_rowset_type may be modified to BETA_ROWSET after cloned (#3750)
TabletMeta's _preferred_rowset_type is not initialized after object constructing and
may be a random value, and this field is not updated when create ALPHA_ROWSET tablet,
and it will not be serialized into pb in this case. So if cloning an ALPHA_ROWSET
tablet from another BE, this new created local tablet's _preferred_rowset_type field
may be random as BETA_ROWSET and can not be overwrote after cloned, then new input
rows will be wrote as BETA_ROWSET format which is not we expect.
This patch fix this bug by giving _preferred_rowset_type a default value and updating
this field when create any type of tablet, and add an unit test and related overwrite
equal operator functions.
2020-06-06 11:36:28 +08:00
70aa9d6ca8 [Memory Engine] Add MemTabletScan (#3734) 2020-06-03 15:42:38 +08:00
60f93b2142 Fix bitmap type (#3749) 2020-06-03 10:07:58 +08:00
7524c5ef63 [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch (#3637) 2020-05-30 10:33:10 +08:00
c967eaf496 [Memory Engine] Add TabletType to PartitionInfo and TabletMeta (#3668) 2020-05-29 20:20:44 +08:00
e76f712bb3 [Bug] Load data is error in json load 2020-05-28 17:28:33 +08:00
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1.  A unit item to indicate the metric better 
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log
2020-05-27 08:49:30 +08:00