BE cannot exit gracefully because some threads run in endless loops. This patch makes the following optimizations:
- Use the well-encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread>
- Use CountDownLatch in each thread's loop condition to avoid endless loops
- Introduce a new Daemon class for daemon work, such as tcmalloc_gc, memory_maintenance and calculate_metrics
- Decouple the statistics-type TaskWorkerPool from StorageEngine notification by submitting tasks to TaskWorkerPool's queue
- Reorder the stop and destruction of objects in main(), i.e. stop network services first, then internal services
- Use libevent in pthreads mode by calling evthread_use_pthreads(), so that EvHttpServer can exit gracefully with multiple threads
- Call brpc::Server's Stop() and ClearServices() explicitly
Sometimes we want to detect hotspots in a cluster, for example heavily scanned or heavily written tablets, but we have no insight into the tablets in the cluster.
This patch introduces tablet-level metrics to help achieve this. It currently supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request, and tablet-level metrics are not returned by default.
1. Support the convert(expr, target_type) function, which is the same as CastExpr.
2. Support cast(expr as signed/unsigned int).
This is just for compatibility; the signed/unsigned specification is meaningless.
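For illustration, a minimal hedged sketch of both forms, using arbitrary literals and the target types named above:
```
SELECT convert('1', signed int);   -- equivalent to a CAST, via the new convert() function
SELECT cast('1' AS signed int);    -- accepted for compatibility; signed/unsigned does not change the result
```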
1. Fix a core-dump bug caused by a wild pointer in PlanFragmentExecutor, fixes issue #4447.
2. Fix a core-dump bug caused by a wild pointer in JSON load, fixes issue #4452.
3. Change the declaration order of the ODBC type in Thrift for compatibility.
1. The base column of bitmap_union must be an integer type; largeint is not supported either.
2. The base column of hll_union cannot be decimal.
Check the error message of a const expr in Union Node.
If a user tries to insert a negative number into a bitmap materialized view, Doris throws an 'invalid input' exception.
The const value in Union Node is checked in this commit.
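A hedged sketch of the rejected pattern, with hypothetical table names; the target table is assumed to have a bitmap_union materialized view on its value column:
```
-- the constant -1 in the second UNION branch is a const expr in the Union Node
-- and is now rejected with 'invalid input' instead of being fed into the bitmap column
INSERT INTO tbl_with_bitmap_mv
SELECT k1, 1 FROM src
UNION ALL
SELECT k1, -1 FROM src;
```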
* Implement the grammar of batch delete #4051
* Process create table and alter table when the table has a delete sign column
* Support the syntax for enabling the delete column (see the sketch after this list)
* Automatically filter deleted data in select statements
* Automatically add the delete sign when creating a rollup table
TODO:
* Optimize the reading and compaction logic on the BE side, so that data marked as deleted is completely removed during base compaction
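A hedged sketch of enabling the delete column; the exact feature name and table name are illustrative assumptions, not taken from this entry:
```
-- assumption: the delete sign column is enabled per table via a feature switch
ALTER TABLE example_tbl ENABLE FEATURE "BATCH_DELETE";
-- after enabling, SELECT automatically filters rows whose delete sign is set
SELECT * FROM example_tbl;
```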
```
SELECT *
FROM
    (SELECT cs_order_number,
            cs_warehouse_sk
     FROM catalog_sales
     WHERE cs_order_number = 125005
       AND cs_warehouse_sk = 4) cs1
LEFT SEMI JOIN
    (SELECT cs_order_number,
            cs_warehouse_sk
     FROM catalog_sales
     WHERE cs_order_number = 125005) cs2
ON cs1.cs_order_number = cs2.cs_order_number
    AND cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk;
```
The above query has an equality predicate and a non-equality predicate.
If a non-equality predicate exists, the build table should be kept as it is, so the deduplication should be removed.
After PR #4135, if a mem tracker has a parent, it should be created by 'CreateTracker'.
So I removed the other unused constructors.
This also fixes the bug described in #4344.
Doris uses a HashTable to implement EXCEPT.
If a user sends A except B except C, Doris first evaluates A except B and then applies except C.
After A except B, the HashTable is rebuilt.
There was a bug here that incorrectly discarded some rows.
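A minimal example of the affected query shape, with hypothetical table names:
```
-- A except B is evaluated first and its HashTable is rebuilt
-- before applying "except C"; the rebuild previously dropped rows
SELECT k1 FROM a
EXCEPT
SELECT k1 FROM b
EXCEPT
SELECT k1 FROM c;
```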
Redesign metrics into 3 layers:
MetricRegistry - MetricEntity - Metric
MetricRegistry: the registration center.
MetricEntity: an entity registered on the MetricRegistry. Generally, a MetricRegistry can have several MetricEntities registered on it; each MetricEntity is an independent entity, such as the server, disk_devices, data_directories, thrift clients and servers, and so on.
Metric: a metric of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity, or thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. A MetricPrototype is a global variable and can be shared by the same metric across different MetricEntities.
Stream load should read all the data completely before parsing the JSON.
Also add a new BE config streaming_load_max_batch_read_mb
to limit the data size when loading JSON data.
Fix the bug of loading an empty JSON array [].
Add docs to explain certain cases of loading data in JSON format.
Fix: #4124
We make all MemTrackers shared, in order to show MemTracker real-time consumption on the web.
As follows:
1. Nearly all MemTracker raw pointers -> shared_ptr
2. Use CreateTracker() to create a new MemTracker (so that it adds itself to its parent)
3. RowBatch & MemPool still use raw MemTracker pointers; it is easy to ensure that the RowBatch & MemPool destructors run before the MemTracker's destructor, so we don't change that code.
4. MemTracker can use RuntimeProfile's counters to calculate consumption, so RuntimeProfile's counters need to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile.
Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers consume little memory but are too time-consuming, you can make them orphans; then it's fine to use the raw pointer.
The result may be wrong when loading a negative decimal value from ORC.
When loading a negative decimal that has leading zeros after the decimal point, the result is wrong.
e.g. for -0.0014, the ORC result is -14 (precision ... 0).
[Bug] Fix the memory-limit-exceeded problem when the analytic eval node needs to spill to disk with a low mem limit.
Also call clear_reservations to release the Analytic node's reservation in the block manager.
[Running Profile] Add a Spilled flag to the Running Profile when the analytic eval node and sort node spill to disk.
This CL mainly changes:
1. Reorganized the code logic to limit the supported JSON formats to two, making the import behavior more consistent.
2. Modified how the number of error rows is counted when loading in JSON format, so that error rows can be counted correctly.
3. See `load-json-format.md` for details on loading data in JSON format.
TPlanExecParams::volume_id is never used, so delete the print_volume_ids() function.
Fix logging, and log when PlanFragmentExecutor::open() returns an error.
Fix some comments.
fix: https://github.com/apache/incubator-doris/issues/3984
1. Add `conjunct.size` checking and `slot_desc nullptr` checking logic.
2. For historical reasons, the function predicates were added one by one; I just refactored the processing to make the logic for function predicate handling clearer.
Fix: #3946
CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all rows.
2. Find the cctz timezone when initializing the `runtime state`, so that we don't need to look up the timezone for each row.
3. Add a constant rewrite rule for `utc_timestamp()`.
4. Add a doc for `to_date()`.
5. Comment out `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`.
The performance is shown below:
11,000,000 rows
SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s
SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s
Formatting the date string still seems slow; we may need a further enhancement for it.
Replace some boost usages with std in OlapScanNode.
This refactor seems to solve the problem described in #3929,
because I found that BE crashed when calling `boost::condition_variable.notify_all()`.
After this upgrade, BE no longer crashes.
https://github.com/apache/incubator-doris/issues/3936
Doris On ES can obtain field values from `_source` or `docvalues`:
1. From `_source`, you get the original value as you ingested it; when ES processes indexing, the docvalues for a `date` field are converted to milliseconds.
2. From `docvalues`, before 6.4 you get a `millisecond timestamp` value; from 6.4 on (inclusive) you get the formatted `date` value, e.g. 2020-06-18T12:10:30.000Z. ES (>=6.4) provides a `format` parameter for the `docvalue` field request; support for this is coming soon in Doris On ES.
After this PR is merged into Doris, Doris On ES will only correctly support processing `millisecond` timestamps and string-format dates. If you provide a `seconds` timestamp, Doris On ES will process it incorrectly (it is divided by 1000 internally).
ES mapping:
```
{
  "timestamp_test": {
    "mappings": {
      "doc": {
        "properties": {
          "k1": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
          }
        }
      }
    }
  }
}
```
ES documents:
```
{
  "_index": "timestamp_test",
  "_type": "doc",
  "_id": "AXLbzdJY516Vuc7SL51m",
  "_score": 1,
  "_source": {
    "k1": "2020-6-25"
  }
},
{
  "_index": "timestamp_test",
  "_type": "doc",
  "_id": "AXLbzddn516Vuc7SL51n",
  "_score": 1,
  "_source": {
    "k1": 1592816393000 -> 2020/6/22 16:59:53
  }
}
```
Doris Table:
```
CREATE EXTERNAL TABLE `timestamp_source` (
    `k1` date NULL COMMENT ""
) ENGINE=ELASTICSEARCH
```
### enable_docvalue_scan = false
**For ES 5.5**:
```
mysql> select k1 from timestamp_source;
+------------+
| k1 |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```
**For ES 6.5 or above**:
```
mysql> select * from timestamp_source;
+------------+
| k1 |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```
### enable_docvalue_scan = true
**For ES 5.5**:
```
mysql> select k1 from timestamp_dv;
+------------+
| k1 |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```
**For ES 6.5 or above**:
```
mysql> select * from timestamp_dv;
+------------+
| k1 |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```
Prior to this PR, Doris On ES merged another PR, https://github.com/apache/incubator-doris/pull/3513, which misused the `total` node. After Doris On ES introduced `terminate_after` (https://github.com/apache/incubator-doris/issues/2576), the `total` document count is no longer computed, so relying on the `total` field is dangerous. We should rely on the actual document count by counting the `inner hits` node, which is what it is meant for. So we simply remove all `total` parsing and related logic from Doris On ES; this may also improve performance slightly because the `total` JSON node is ignored and skipped.
Main changes:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()` and finds an RPC error, return a Future with an error code instead of throwing an exception.
3. Protect the `_status` in RuntimeState with a lock.
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile is destructed after the object pool.
5. Remove the unused `ObjectPool` param from the RuntimeProfile constructor. If it were not removed, RuntimeProfile would depend on the `_obj_pool` in RuntimeState.
* 1. Add enable_spilling to the query options and support spilling to disk in Analytic_Eval_Node; FE can enable spilling via
set enable_spilling = true; (a usage sketch follows this list)
Now both the Sort Node and Analytic_Eval_Node can spill to disk.
2. Delete the merge_sorter code that we no longer use.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node and support spilling to disk. Delete the useless buffered_block_mgr and buffered_tuple_stream code.
4. Add a DataStreamRecvr Profile. Move the counters belonging to DataStreamRecvr from the fragment to the DataStreamRecvr Profile to make the Running Profile clearer.
* Change some hints in the code
* Replace disable_spill with enable_spill, which is more compatible with FE
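A minimal usage sketch; the window query and table below are only illustrative:
```
-- enable spilling for the current session, as described in item 1
SET enable_spilling = true;
-- an analytic (window) query that can now spill to disk under a low memory limit
SELECT k1, sum(v1) OVER (PARTITION BY k2 ORDER BY k1) FROM tbl1;
```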
[Doris On ES] Skip function_call exprs when processing predicates
Fixes #3801
Do not push down function_call predicates such as split_xxx when processing predicates; Doris BE is responsible for processing these predicates.
All rows in table:
```
+------+------+------+------------+------------+
| k1 | k2 | k3 | UpdateTime | ArriveTime |
+------+------+------+------------+------------+
| NULL | NULL | kkk1 | 123456789 | NULL |
| kkk1 | NULL | NULL | 123456789 | NULL |
| NULL | kkk2 | NULL | 123456789 | NULL |
+------+------+------+------------+------------+
```
The following predicates cannot be pushed down to ES.
```
SQL 1:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk is not null;
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.02 sec)
SQL 2:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > 'a';
+------+
| kk |
+------+
| kkk |
+------+
SQL 3:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > '2';
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.03 sec)
```
Another PR, https://github.com/apache/incubator-doris/pull/3513 (https://github.com/apache/incubator-doris/issues/3479), tried to resolve the `inner hits node is not an array` error that occurs when a query (batch-size) runs against a new segment without this field. In addition, filter_path only took `hits.hits.fields` and `hits.hits._source` into account, which could produce a null inner hits node:
```
{
"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
"hits": {
"total": 1
}
}
```
Unfortunately, that PR introduced another serious problem: inconsistent results with different batch_size values, because of the misuse of `total`.
To avoid both problems, we simply add `hits.hits._score` to filter_path when `docvalue_mode` is true; `_score` is always `null`, and this populates the inner hits node:
```
{
"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
"hits": {
"total": 1,
"hits": [
{
"_score": null
}
]
}
}
```
Related issue: https://github.com/apache/incubator-doris/issues/3752
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.
2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE (see the example below).
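For example, a hedged sketch of overriding the BE value at the session level; the value 48 is arbitrary:
```
-- session-level override of the scan key limit described above
SET doris_max_scan_key_num = 48;
```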
Add a JSON format for the existing metrics, like this:
```
{
  "tags": {
    "metric": "thread_pool",
    "name": "thrift-server-pool",
    "type": "active_thread_num"
  },
  "unit": "number",
  "value": 3
}
```
I added a new JsonMetricVisitor to handle the transformation.
It does not modify the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
I also added:
1. A unit item to describe the metric better
2. Cloning tablet statistics, broken down by database
3. Use of white space instead of newlines in audit.log
This CL mainly changes:
1. Support the `SELECT INTO OUTFILE` command (see the sketch after this list).
2. Support exporting query results to a file via Broker.
3. Support the CSV export format with a specified column separator and line delimiter.
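A hedged sketch of the command shape; the table, broker name, path, and property keys below are illustrative assumptions:
```
-- export the query result as CSV files via a broker
SELECT k1, v1 FROM tbl1
INTO OUTFILE "hdfs://host:port/user/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```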