Commit Graph

1015 Commits

Author SHA1 Message Date
5032b7fe7a Support materialized view schema change for bitmap, hll and count fields [#3739] (#3873)
+ Builds the materialized view function for schema change here based on defineExpr.
+ This is a workaround because the current storage layer does not support expression evaluation.
+ A count distinct materialized view sets mv_expr to to_bitmap or hll_hash.
+ A count materialized view sets mv_expr to count.
+ Support regenerating historical data in BE when a new materialized view is created:
    + Support to_bitmap function
    + Support hll_hash function
    + Support count(field) function
For #3344
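A minimal C++ sketch of the mv_expr mapping described above; the enum and function names are hypothetical, not the actual schema-change code:
```
// Hypothetical sketch: which rewrite function a schema-change task applies
// when regenerating historical data for a materialized-view column.
#include <stdexcept>
#include <string>

enum class MVAggregate { BITMAP_UNION, HLL_UNION, COUNT };

std::string mv_expr_for(MVAggregate agg) {
    switch (agg) {
        case MVAggregate::BITMAP_UNION: return "to_bitmap"; // count distinct via bitmap
        case MVAggregate::HLL_UNION:    return "hll_hash";  // approximate count distinct
        case MVAggregate::COUNT:        return "count";     // plain count(field)
    }
    throw std::logic_error("unknown aggregate");
}
```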
2020-07-16 10:45:15 +08:00
14ac49dde5 Fix possible BE core dump during linked schema change (#4079)
* Fix a core dump

* Update be/src/olap/rowset/segment_group.cpp
2020-07-15 10:14:42 +08:00
9b0ad66b78 [runtime] Replace the thread pool in FragmentMgr (#4057) 2020-07-15 10:03:48 +08:00
da921928d0 [Code] Fix some spelling problems (#4066)
Fix some spelling problems
2020-07-13 20:54:31 +08:00
2e460f581c [Bug] Support getting all in-memory rowset meta info from the tablet meta URL (#4061)
This PR fixes a bug where we could not get the newest tablet meta info from the tablet meta URL.
2020-07-13 20:53:51 +08:00
cd4fec8ab1 [Bug] Fix double-delete core when RowBatch calls transfer_resource_ownership (#4052)
Resource release should be done by the destination RowBatch.
When we call transfer_resource_ownership(), if we don't clear the
corresponding resources in the source batch, both batches will free
them, causing a double-delete core.
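An illustrative C++ sketch of the double-delete hazard, using a simplified stand-in for RowBatch rather than the real API:
```
// Simplified stand-in for RowBatch (not Doris's actual API): after handing
// resources to the destination batch, the source must drop its references
// so that only one owner ever frees them.
#include <vector>

struct Batch {
    std::vector<char*> buffers;

    void transfer_resource_ownership(Batch* dest) {
        for (char* buf : buffers) {
            dest->buffers.push_back(buf);
        }
        buffers.clear(); // without this, both destructors free the same buffers
    }

    ~Batch() {
        for (char* buf : buffers) delete[] buf;
    }
};
```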
2020-07-13 20:52:22 +08:00
d7893f0fa7 [Bug] Fix some schema changes that did not work right (#4009)
This CL mainly fixes schema changes to the varchar type that did not work
correctly because a logic check was forgotten. It also adds a
ConvertTypeResolver that registers the supported conversion types, to avoid
forgetting the logic check in the future.
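A hedged sketch of a conversion whitelist in the spirit of ConvertTypeResolver; the enum values and method names are illustrative, not the actual definitions:
```
// Illustrative conversion whitelist: every supported (from, to) conversion
// must be registered explicitly, so a forgotten logic check shows up as an
// unsupported conversion instead of silently misbehaving.
#include <set>
#include <utility>

enum class FieldType { CHAR, VARCHAR, INT, DATE };

class ConvertTypeResolver {
public:
    ConvertTypeResolver() {
        add(FieldType::CHAR, FieldType::VARCHAR);
        add(FieldType::INT, FieldType::VARCHAR);
    }

    bool supported(FieldType from, FieldType to) const {
        return _pairs.count({from, to}) > 0;
    }

private:
    void add(FieldType from, FieldType to) { _pairs.insert({from, to}); }
    std::set<std::pair<FieldType, FieldType>> _pairs;
};
```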
2020-07-11 10:18:29 +08:00
265c26f67d [Doris On ES] Add docvalue limitation for doc_values scan and enable doc_values scan by default (#4055) 2020-07-10 18:37:36 +08:00
42cb11901b [webserver] Make BE webserver prettier (#4050)
Add some CSS and JS files, and use the boost-table framework to make BE's website prettier
2020-07-09 21:50:52 +08:00
efef067f2d [Bug] Fix mem_pool NPE (#4045)
Fix mem_pool NPE in column reader.
Add a safe allocation method.
2020-07-09 21:50:22 +08:00
fafc7e406e [Spill] Fix memory exceeded when the analytic eval node needs to spill to disk with a low mem limit (#3991)
[Bug] Fix the memory exceeded problem when the analytic eval node needs to spill to disk with a low mem limit,
and clear the analytic node's reservations in the block manager.
[Running Profile] Add a Spilled flag in the running profile when the analytic eval node or sort node spills to disk.
2020-07-09 09:30:22 +08:00
5a27981e49 [Config] Add thrift_client_retry_interval_ms config in BE for the thrift client to avoid an avalanche in the FE thrift server (#4022)
This PR mainly adds the `thrift_client_retry_interval_ms` config in BE for the thrift client
to avoid an avalanche in the FE thrift server, and fixes some typos and RPC
setting problems at the same time.
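A minimal sketch of the retry idea, assuming a config value like `thrift_client_retry_interval_ms`; the function and parameter names are illustrative, not the actual client code:
```
// Illustrative retry loop with a configurable interval between attempts.
#include <chrono>
#include <functional>
#include <thread>

bool call_with_retry(const std::function<bool()>& rpc,
                     int max_retries, int retry_interval_ms) {
    for (int attempt = 0; ; ++attempt) {
        if (rpc()) return true;
        if (attempt >= max_retries) return false;
        // Sleeping between retries spreads reconnect attempts out, so a busy
        // or restarted FE thrift server is not hammered by all BEs at once.
        std::this_thread::sleep_for(std::chrono::milliseconds(retry_interval_ms));
    }
}
```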
2020-07-08 21:07:00 +08:00
413d6d2f22 [Bug] Fix core when modifying char to varchar and loading boolean with replace aggregation (#4042)
1. Doris supports modifying char to varchar. There was a bug in the use of a two-level pointer when converting the data.
2. Boolean can be used as a metric value with the REPLACE and REPLACE_IF_NOT_NULL aggregation functions. The aggregation function needed to be added to the aggregation map.
2020-07-08 11:12:42 +08:00
ab8851f7aa [webserver] Make BE webserver handle static files (#4021)
Make the BE webserver handle static files, e.g. CSS, JS, and ICO files, so that we
can make the BE website prettier.
2020-07-07 23:08:29 +08:00
c3d9feed75 [Load][Json] Refactor json load logic to make it more reasonable (#4020)
This CL mainly changes:

1. Reorganized the code logic to limit the supported JSON formats to two, making the import behavior more consistent.
2. Modified how error rows are counted when loading in JSON format, so that the error rows can be counted correctly.
3. See `load-json-format.md` to get details of loading json format.
2020-07-07 23:07:28 +08:00
2e111c05ac [Bug] Fix bug that BE crashes when doing an alter table task (#4015)
We need to check the delete condition first.
2020-07-05 16:28:03 +08:00
6699be2ac8 [Bug] Keep the read order from segments consistent with the write order (#3993)
Fixes #3989
Add the segment id to the comparator when merging rows read from a UNIQUE key table.
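A small C++ sketch of the comparator idea, with illustrative types:
```
// When merging sorted rows from multiple segments of a UNIQUE key table,
// break key ties by segment id so the later-written row is seen in write
// order. Types here are simplified stand-ins.
#include <cstdint>
#include <string>
#include <tuple>

struct RowSource {
    std::string key;
    uint32_t segment_id; // a higher id means the row was written later
};

// Returns true if a should be ordered before b in the merge.
bool merge_less(const RowSource& a, const RowSource& b) {
    // Compare keys first; on equal keys, fall back to segment id so reads
    // stay consistent with the write order.
    return std::tie(a.key, a.segment_id) < std::tie(b.key, b.segment_id);
}
```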
2020-07-05 16:25:28 +08:00
725ebafd99 [Bug] Cancel the query if OlapScanner prepare failed (#4002) 2020-07-03 21:33:07 +08:00
bbb7782702 [Bug] Fix bug that linked schema change for alpha rowset will cause BE to crash (#3983)
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-07-03 21:19:31 +08:00
9bb7e5d208 Fix some code & comments (#3999)
TPlanExecParams::volume_id is never used, so delete the print_volume_ids() function.
Fix logging, and log when PlanFragmentExecutor::open() returns an error.
Fix some comments.
2020-07-03 21:18:47 +08:00
a16236f22f [refactor] Remove useless return value of class RowsetGraph (#3977) 2020-07-03 09:59:51 +08:00
1e813df3fd [Doris On ES] [Bug-Fix][Refactor] Fix potential null pointer exception and refactor function process logic (#3985)
fix: https://github.com/apache/incubator-doris/issues/3984

1. Add `conjunct.size` checking and `slot_desc nullptr` checking logic.
2. For historical reasons, the function predicates were added one by one; I just refactored the processing to make the logic for function predicates clearer.
2020-07-02 22:32:16 +08:00
5ade21b55d [Load] Support load true or false as boolean value (#3898)
Fixes #3831
After this PR:
insert into: `1/"1" -> 1, 0/"0" -> 0, true/"true" -> 1, false/"false" -> 0, "10" -> null, "xxxx" -> null`
load: `1/true -> 1, 0/false -> 0`, everything else -> null
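A hedged C++ sketch of the accepted literals; the real conversion lives in the load path, this only illustrates the mapping above:
```
// Illustrative parser: 0/1 and (case-insensitive) true/false map to boolean,
// anything else loads as null.
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

std::optional<int8_t> parse_bool(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    if (s == "1" || s == "true") return 1;
    if (s == "0" || s == "false") return 0;
    return std::nullopt; // anything else loads as null
}
```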
2020-07-02 13:58:24 +08:00
d3d835844f [Performance] Improve performance of unique table read (#3974)
Implements #3971 
The test table is as follows:
```
mysql> desc test;
+------------+---------+------+-------+---------+---------+
| Field      | Type    | Null | Key   | Default | Extra   |
+------------+---------+------+-------+---------+---------+
| rid        | BIGINT  | No   | true  | 0       |         |
| qid        | BIGINT  | No   | true  | 0       |         |
| qidDeleted | TINYINT | No   | false | 0       | REPLACE |
| type       | TINYINT | No   | false | 0       | REPLACE |
| uid        | BIGINT  | No   | false | 0       | REPLACE |
| toUid      | BIGINT  | No   | false | 0       | REPLACE |
| status     | INT     | No   | false | 0       | REPLACE |
| createTime | INT     | No   | false | 0       | REPLACE |
| source     | INT     | No   | false | 0       | REPLACE |
| misFlag    | INT     | No   | false | 0       | REPLACE |
| anonymous  | TINYINT | No   | false | 0       | REPLACE |
| uv         | TINYINT | No   | false | 1       | REPLACE |
+------------+---------+------+-------+---------+---------+
12 rows in set (0.00 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  1093760 |
+----------+
1 row in set (1.00 sec)
```
There are 29 versions at present.
![image](https://user-images.githubusercontent.com/9098473/85992244-2aa26c80-ba27-11ea-918a-04701a58dbdf.png)
I ran the query `select sum(uv) from test` 10 times;
the average ScanTime was reduced from `9s277ms` to `8s206ms`.
2020-07-02 13:56:08 +08:00
9785e103ea [Bug] Fix bug that a delete stmt with filter conditions deletes all data from the table on segment v2 (#3943)
When we get different columns' row ranges from column_delete_conditions, we should use a union operation instead of an intersection operation to get the final row ranges.
The root cause is that we lost the relationship between the two delete conditions in the same delete stmt.
Base data:
```
    k1, k2
    1,  2
    1,  3
case 1:
   delete from tbl where k1=1 and k2=2;
case 2:
   delete from tbl where k1=1;
   delete from tbl where k2=2;
```
We treated the above 2 cases as the same, which is incorrect.
So we need to process each delete stmt's set of delete conditions separately.
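An illustrative C++ sketch of the distinction, with hypothetical types: conditions within one delete stmt must be intersected, while conditions from different delete stmts must be unioned, which is exactly the relationship the old code lost.
```
#include <set>

using RowIds = std::set<int>;

RowIds intersect(const RowIds& a, const RowIds& b) {
    RowIds out;
    for (int r : a) {
        if (b.count(r)) out.insert(r);
    }
    return out;
}

RowIds unite(const RowIds& a, const RowIds& b) {
    RowIds out = a;
    out.insert(b.begin(), b.end());
    return out;
}

// case 1: delete ... where k1=1 and k2=2 -> intersect(rows_k1_eq_1, rows_k2_eq_2)
// case 2: two separate delete stmts      -> unite(rows_k1_eq_1, rows_k2_eq_2)
```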
2020-07-02 11:07:23 +08:00
fdcbea480d [Enhancement] DO NOT increase report version for publish task (#3894)
Fixes #3893 

In a cluster with frequent load activities, FE will ignore most tablet reports from BE
because currently it only handles reports whose version >= BE's latest report version
(which is increased each time a transaction is published). This can be observed in FE's log,
with many lines like `out of date report version 15919277405765 from backend[177969252].
current report version[15919277405766]` in it.

However, many system functionalities rely on tablet report processing to work properly. For example:
1. bad or version-missing replicas are detected and repaired during tablet report
2. storage medium migration decisions and actions are made based on tablet report
3. BE's old transactions are cleared/republished during tablet report

In fact, it is not necessary to update the report version after the publish task,
because this is actually a problem left over from history. In the reporting logic of the current version,
we no longer decrease the version information of a replica in the FE metadata according to the report,
so even if we receive a report with a stale version, it does not matter.

This CL mainly contains two changes:

1. do not increase the report version for publish tasks
2. populate `tabletWithoutPartitionId` outside the read lock of TabletInvertedIndex
2020-07-01 09:23:40 +08:00
1bfb105ec1 [Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979) 2020-07-01 09:22:33 +08:00
f9a52f5db4 [Bug] Insert may leak DeltaWriter when re-analyzed (#3973) 2020-06-30 11:09:53 +08:00
48d947edf4 Support rpc_timeout property in stream load request to cancel the request in FE in time when the stream load request times out (#3948)
This PR enables cancelling a stream load request in FE in time
when the request times out, making stream load more robust.
2020-06-29 19:16:16 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all rows.
2. Find the cctz timezone when initializing the `runtime state`, so that we don't need to look up the timezone for each row.
3. Add constant rewrite rule for `utc_timestamp()`
4. Add doc for `to_date()`
5. Comment out `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`

The performance results are shown below:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

Formatting the date string still seems slow; we may need a further enhancement for it.
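A sketch of the "prepare once, execute per row" pattern from change 1, using a hypothetical UDF-style interface rather than the actual FunctionContext API, and a strftime-style format for simplicity:
```
// Illustrative only: the prepare phase processes the constant format string
// once; the per-row function then reuses that state instead of re-parsing.
#include <ctime>
#include <string>

struct PreparedFormat {
    std::string format; // validated/pre-processed once in the prepare phase
};

// prepare: runs once per fragment when the format argument is a constant.
PreparedFormat prepare(const std::string& constant_format) {
    return PreparedFormat{constant_format}; // real code would parse it here
}

// execute: runs per row, reusing the prepared state.
std::string from_unixtime(const PreparedFormat& prepared, long ts) {
    time_t t = static_cast<time_t>(ts);
    struct tm tm_buf;
    localtime_r(&t, &tm_buf);
    char buf[64];
    strftime(buf, sizeof(buf), prepared.format.c_str(), &tm_buf);
    return buf;
}
```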
2020-06-29 19:15:09 +08:00
0cbacaf01d [Refactor] Replace some boost with std in OlapScanNode (#3934)
Replace some boost usages with std in OlapScanNode.

This refactor seems to solve the problem described in #3929,
because I found that BE would crash when calling `boost::condition_variable.notify_all()`.
After this upgrade, BE does not crash any more.
2020-06-29 19:13:03 +08:00
2c8fdb6134 [BUG] Make segment V1 and V2 share the same file cache (#3945)
This commit makes segment V1 and V2 share the same file cache,
so that segment V2's file descriptors stored in the cache can be cleaned up just as V1's are.
2020-06-29 18:43:09 +08:00
dc603de4bd [Doris On ES][Bug-fix] Solve the problem of time format processing (#3941)
https://github.com/apache/incubator-doris/issues/3936
Doris On ES can obtain field values from `_source` or `docvalues`:
1. From `_source`, you get the original value as you put it; during indexing, ES converts the docvalues for a `date` field to milliseconds.
2. From `docvalues`, before 6.4 you get a `millisecond timestamp` value; from 6.4 on (inclusive) you get the formatted `date` value, e.g. 2020-06-18T12:10:30.000Z. ES (>= 6.4) provides a `format` parameter for `docvalue` field requests; support for this is coming soon for Doris On ES.

After this PR is merged into Doris, Doris On ES will only correctly support processing `millisecond` timestamps and string-formatted dates. If you provide a `seconds` timestamp, Doris On ES will process it wrongly (the value is divided by 1000 internally).

ES mapping:

```
{
   "timestamp_test": {
      "mappings": {
         "doc": {
            "properties": {
               "k1": {
                  "type": "date",
                  "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
               }
            }
         }
      }
   }
}
```

ES documents:

```
       {
            "_index": "timestamp_test",
            "_type": "doc",
            "_id": "AXLbzdJY516Vuc7SL51m",
            "_score": 1,
            "_source": {
               "k1": "2020-6-25"
            }
         },
         {
            "_index": "timestamp_test",
            "_type": "doc",
            "_id": "AXLbzddn516Vuc7SL51n",
            "_score": 1,
            "_source": {
               "k1": 1592816393000  ->  2020/6/22 16:59:53
            }
         }
```
Doris Table:

```
CREATE EXTERNAL TABLE `timestamp_source` (
  `k1` date NULL COMMENT ""
) ENGINE=ELASTICSEARCH
```


### enable_docvalue_scan = false

**For ES 5.5**:
```
mysql> select k1 from timestamp_source;
+------------+
| k1         |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```

**For ES 6.5 or above**:

```
mysql> select * from timestamp_source;
+------------+
| k1         |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```

###  enable_docvalue_scan = true 

**For ES 5.5**:

```
mysql> select k1 from timestamp_dv; 
+------------+
| k1         |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```

**For ES 6.5 or above**:

```
mysql> select * from timestamp_dv; 
+------------+
| k1         |
+------------+
| 2020-06-25 |
| 2020-06-22 |
+------------+
```
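A tiny C++ sketch of the millisecond-vs-second pitfall described above (illustrative only, not the actual conversion code):
```
// The numeric ES value is assumed to be milliseconds and divided by 1000.
#include <cstdio>
#include <ctime>

void print_date_from_es_value(long long v) {
    time_t secs = static_cast<time_t>(v / 1000); // assumes v is milliseconds
    struct tm tm_buf;
    gmtime_r(&secs, &tm_buf);
    char buf[16];
    strftime(buf, sizeof(buf), "%Y-%m-%d", &tm_buf);
    std::printf("%s\n", buf);
}

// print_date_from_es_value(1592816393000) -> 2020-06-22 (milliseconds, correct)
// print_date_from_es_value(1592816393)    -> 1970-01-19 (seconds, mishandled)
```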
2020-06-28 09:21:22 +08:00
be5fc76557 [Doris On ES][Optimization] Ignore _total node for efficiency (#3932)
Prior to this PR, Doris On ES merged another PR (https://github.com/apache/incubator-doris/pull/3513) that misused the `total` node. After Doris On ES introduced `terminate_after` (https://github.com/apache/incubator-doris/issues/2576), the `total` document count is no longer computed, so relying on the `total` field would be dangerous; we should instead rely on the actual document count, obtained by counting the `inner hits` node, which is what it is meant for. So we remove all total parsing and related logic from Doris On ES. This may improve performance slightly, because the `total` JSON node is now ignored and skipped.
2020-06-26 17:42:33 +08:00
5e5696fda2 [Bug] Fix the core in data_stream_recvr. Remove the map in DataStreamRecvr and replace by vector<pair> (#3928)
Previously we used a map in DataStreamRecvr to save the StopWatch corresponding to each pending closure.
But we had to take care of the consistency between the map and the pending closures queue, which is very error-prone:
if they become inconsistent, BE will crash.
So we remove the map in DataStreamRecvr and replace it with a vector<pair<Closure*, MonotonicStopWatch>>.
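A C++ sketch of the data-structure change, with simplified stand-ins for Closure and MonotonicStopWatch:
```
// Keeping each closure and its stopwatch in a single vector entry removes
// the map/queue consistency problem: one insertion updates both together.
#include <chrono>
#include <utility>
#include <vector>

struct Closure {
    void run() {}
};
using StartTime = std::chrono::steady_clock::time_point;

std::vector<std::pair<Closure*, StartTime>> pending;

void add_pending(Closure* done) {
    // There is no separate map that could drift out of sync with the queue.
    pending.emplace_back(done, std::chrono::steady_clock::now());
}

void drain_pending() {
    for (auto& [closure, started] : pending) {
        auto waited = std::chrono::steady_clock::now() - started;
        (void)waited; // real code records the wait time in a counter
        closure->run();
    }
    pending.clear();
}
```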
2020-06-25 16:29:07 +08:00
b29cb4b126 [log] Downgrade a log in RunLengthByteReader from WARNING to VLOG (#3925)
There are too many logs in be.WARNING that look like:
```
W0622 17:47:52.513341 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705]
W0622 17:47:52.513417 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705]
W0622 17:47:52.513466 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705]
```
It's a normal case when a run length hits EOF, so we can downgrade the log from
WARNING to VLOG to reduce useless logs in be.WARNING.
2020-06-25 16:23:48 +08:00
4a44c457a3 [Bug] Fix bug that a query plan is not correctly cancelled (#3933)
This bug is introduced by #3872.
It caused some queries that were expected to be cancelled to not be cancelled.
2020-06-25 16:23:13 +08:00
46c64f0861 [Bug] Enable to get TCP metrics for linux kernel 2.x (#3921)
Fix #3920 

CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
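A hedged C++ sketch of the header-position parsing from point 1 (illustrative, not the actual metrics code):
```
// /proc/net/snmp has a "Tcp:" header line naming the fields followed by a
// "Tcp:" value line, and the field order can differ across kernels, so we
// locate each column by name instead of by a fixed position.
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

std::unordered_map<std::string, long long> read_tcp_metrics() {
    std::ifstream in("/proc/net/snmp");
    std::string line, header;
    std::unordered_map<std::string, long long> metrics;
    while (std::getline(in, line)) {
        if (line.rfind("Tcp:", 0) != 0) continue;
        if (header.empty()) { header = line; continue; } // first Tcp: line = names
        std::istringstream names(header), values(line);
        std::string name, value;
        names >> name; values >> value; // skip the "Tcp:" prefix on both lines
        while (names >> name && values >> value) {
            metrics[name] = std::stoll(value); // e.g. metrics["InSegs"], ["OutSegs"]
        }
        break;
    }
    return metrics;
}
```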
2020-06-24 21:29:07 +08:00
wyb
3f7307d685 [Spark Load]Add spark etl job main class (#3927)
1. Add SparkEtlJob class
2. Remove DppResult comment
3. Support loading from hive table directly

#3433
2020-06-24 13:54:55 +08:00
93a0b47d22 Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762)" (#3931)
This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.
2020-06-24 10:13:45 +08:00
feec4ee5bf [UDF] Support external users contributing UDFs (#3760) 2020-06-23 13:43:08 +08:00
c50a310f8f [optimize] Optimize spark load/broker load reading parquet format file (#3878)
Add BufferedReader for reading parquet files via broker
2020-06-23 13:42:22 +08:00
f189a2e7b8 [Spark load][Be 1/1] BE handles push task (#3742)
1. Add a PushBrokerReader in push_handler.cpp.
2. PushBrokerReader wraps the ParquetScanner to support reading data from parquet format files through the broker.
2020-06-22 19:57:58 +08:00
66a8383ac0 [Running_Profile] Fix all counters in DataStreamRecv and change the image path in docs (#3858) 2020-06-22 09:20:22 +08:00
1e42c4adb7 [Bug] Fix bug that BE crash when doing some queries (#3918)
This bug is introduced by PR #3872

In that PR, I removed the obj_pool param of the RuntimeProfile constructor,
so the first param is a std::string.
But in DataStreamRecv, a nullptr was accidentally passed for that std::string; it compiles
OK but causes a runtime error.

Fix #3917
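A minimal reproduction of the pitfall (illustrative, not the actual Doris code):
```
// Constructing a std::string from a null char* compiles but is undefined
// behavior at runtime.
#include <string>

void register_profile(const std::string& name) { /* ... */ }

int main() {
    const char* title = nullptr;
    // register_profile(title);           // compiles fine, crashes at runtime
    register_profile(title ? title : ""); // guard against null instead
    return 0;
}
```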
2020-06-21 15:25:15 +08:00
8cd36f1c5d [Spark Load] Support java version hyperloglog (#3320)
Mainly used in the Spark Load process to calculate approximate deduplication values and then serialize them to a parquet file.
It tries to keep the same calculation semantics as BE's C++ version.
2020-06-21 09:37:05 +08:00
fdd65c50c4 [Bug] fix mem_tracker use-after-free & add UT for it (#3899) 2020-06-20 19:08:53 +08:00
51367abce7 [Bug] Fix bug that BE crashes when doing an insert operation (#3872)
Main changes:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of throwing an exception.
3. Protect the `_status` in RuntimeState with a lock.
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be
destructed after the object pool.
5. Remove the unused `ObjectPool` param in the RuntimeProfile constructor. If it is not removed,
RuntimeProfile would depend on the `_obj_pool` in RuntimeProfile.
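A small C++ sketch of the destruction-order point in change 4, with stand-in types:
```
// C++ destroys members in reverse declaration order, so declaring the
// profile first guarantees it is destructed after the pool.
#include <memory>

struct ObjectPool {};
struct RuntimeProfile {};

struct RuntimeState {
    // Declared first  -> destroyed last, after _obj_pool.
    std::unique_ptr<RuntimeProfile> _runtime_profile;
    // Declared second -> destroyed first, while _runtime_profile still exists.
    std::unique_ptr<ObjectPool> _obj_pool;
};
```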
2020-06-19 17:09:04 +08:00
355df127b7 [Doris On ES] Support fetch _id field from ES (#3900)
More information can be found at: https://github.com/apache/incubator-doris/issues/3901

The created ES external table must contain an `_id` column if you want to fetch the Elasticsearch document `_id`.
```
CREATE EXTERNAL TABLE `doe_id2` (
  `_id` varchar COMMENT "",
   `city`  varchar COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "doe",
"type" = "doc",
"version" = "6.5.3",
"enable_docvalue_scan" = "true",
"transport" = "http"
);
```

Query:

```
mysql> select * from doe_id2 limit 10;
+----------------------+------+
| _id                  | city |
+----------------------+------+
| iRHNc3IB8XwmcbhB7lEB | gz   |
| jBHNc3IB8XwmcbhB71Ef | gz   |
| jRHNc3IB8XwmcbhB71GI | gz   |
| jhHNc3IB8XwmcbhB71Hx | gz   |
| ThHNc3IB8XwmcbhBkFHB | sh   |
| TxHNc3IB8XwmcbhBkFH9 | sh   |
| URHNc3IB8XwmcbhBklFA | sh   |
| ahHNc3IB8XwmcbhBxlFq | gz   |
| axHNc3IB8XwmcbhBxlHw | gz   |
| bxHNc3IB8XwmcbhByVFO | gz   |
+----------------------+------+
```

NOTICE:
This changes the column name format to support column names starting with "_".
2020-06-19 17:07:07 +08:00
e0461cc7f4 [bug] Make compaction metrics value right (#3903)
Now _input_rowsets is cleared when gc_used_rowsets() is called.
After that, the metrics are not right when they are calculated.
2020-06-19 11:22:06 +08:00