doris

Author	SHA1	Message	Date
Yunfeng,Wu	d82d48da87	[Doris On ES][Bug-fix] Sync ES metadata failure after restart or upgrade FE (#3961 ) ISSUE: #3960 PR #3454 introduced caching for EsClient, but the client was only initialized during editlog replay; it must also be initialized during image replay. The failure occurs when the FE is restarted or upgraded. BTW: fix a UT failure for metrics.	2020-06-29 14:13:07 +08:00
Yunfeng,Wu	55c058e4b1	[Compile] modify compile error (#3959 )	2020-06-28 10:39:31 +08:00
WingC	b2b9e22b24	[CreateTable] Check backend disk has available capacity by storage medium before create table (#3519 ) Currently we choose a BE at random without checking whether its disks have available capacity, so table creation does not fail until the create-tablet task has been sent to the BE and the BE finds that it has no capacity to create the tablet. Checking backend disk availability by storage medium up front avoids these unnecessary RPC calls.	2020-06-28 09:36:31 +08:00
Yunfeng,Wu	dc603de4bd	[Doris On ES][Bug-fix] Solve the problem of time format processing (#3941 ) https://github.com/apache/incubator-doris/issues/3936 Doris On ES can obtain a field value from `_source` or `docvalues`: 1. From `_source`, you get the original value as you put it; for indexing and docvalues, ES converts a `date` field to milliseconds. 2. From `docvalues`, before ES 6.4 you get a `millisecond timestamp` value; from 6.4 onwards you get the formatted `date` value, e.g. 2020-06-18T12:10:30.000Z. ES (>=6.4) also provides a `format` parameter for `docvalue` field requests; support for this is coming soon in Doris On ES. After this PR is merged, Doris On ES correctly processes only `millisecond` timestamps and string-format dates; if you provide a `seconds` timestamp, Doris On ES processes it incorrectly (the value is divided by 1000 internally). ES mapping: ``` { "timestamp_test": { "mappings": { "doc": { "properties": { "k1": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss\|\|yyyy-MM-dd\|\|epoch_millis" } } } } } } ``` ES documents: ``` { "_index": "timestamp_test", "_type": "doc", "_id": "AXLbzdJY516Vuc7SL51m", "_score": 1, "_source": { "k1": "2020-6-25" } }, { "_index": "timestamp_test", "_type": "doc", "_id": "AXLbzddn516Vuc7SL51n", "_score": 1, "_source": { "k1": 1592816393000 -> 2020/6/22 16:59:53 } } ``` Doris Table: ``` CREATE EXTERNAL TABLE `timestamp_source` ( `k1` date NULL COMMENT "" ) ENGINE=ELASTICSEARCH ``` ### enable_docvalue_scan = false For ES 5.5: ``` mysql> select k1 from timestamp_source; +------------+ \| k1 \| +------------+ \| 2020-06-25 \| \| 2020-06-22 \| +------------+ ``` For ES 6.5 or above: ``` mysql> select * from timestamp_source; +------------+ \| k1 \| +------------+ \| 2020-06-25 \| \| 2020-06-22 \| +------------+ ``` ### enable_docvalue_scan = true For ES 5.5: ``` mysql> select k1 from timestamp_dv; +------------+ \| k1 \| +------------+ \| 2020-06-25 \| \| 2020-06-22 \| +------------+ ``` For ES 6.5 or above: ``` mysql> select * from timestamp_dv; +------------+ \| k1 \| +------------+ \| 2020-06-25 \| \| 2020-06-22 \| +------------+ ```	2020-06-28 09:21:22 +08:00
WingC	3be28460f7	[Bug]Dynamic partition check interval seconds is not right (#3951 )	2020-06-27 10:07:39 +08:00
Stalary	a894b1edc5	[Doris On ES] Split /_cluster/state to [indexName/_mappings, indexName/_search_shards] (#3454 ) 1. Split /_cluster/state into /_mapping and /_search_shards requests to reduce the permissions required and make the logic clearer 2. Rename some ES-related objects to make their names more accurate 3. Add simple support for docValue and Fields in alias mode, taking the first one by default #3311	2020-06-26 17:46:43 +08:00
Yunfeng,Wu	be5fc76557	[Doris On ES][Optimization] Ignore _total node for efficiency (#3932 ) Prior to this PR, Doris On ES merged another PR https://github.com/apache/incubator-doris/pull/3513 which misused the `total` node. Since Doris On ES introduced `terminate_after` (https://github.com/apache/incubator-doris/issues/2576), the `total` document count is not computed, so relying on the `total` field is dangerous; instead we rely on the actual document count obtained by counting the `inner hits` node. So we remove all `total` parsing and related logic from Doris On ES; this may improve performance slightly because the `total` JSON node is ignored and skipped.	2020-06-26 17:42:33 +08:00
张家锋	b956bd5c8e	[Doc] Add document for setting up IntelliJ IDEA (#3939 )	2020-06-26 14:35:02 +08:00
HappenLee	5e5696fda2	[Bug] Fix the core in data_stream_recvr. Remove the map in DataStreamRecvr and replace by vector<pair> (#3928 ) Previously we used a map in DataStreamRecvr to save the StopWatch corresponding to each pending closure. But the map had to be kept consistent with the pending closures queue, which is very error-prone; if they are inconsistent, BE crashes. So we remove the map in DataStreamRecvr and replace it with vector<pair<Closure*, MonotonicStopWatch>>.	2020-06-25 16:29:07 +08:00
EmmyMiao87	6c768f5303	[Doc] Add doc of `enable_materialized_view` (#3940 )	2020-06-25 16:26:18 +08:00
Yingchun Lai	b29cb4b126	[log] Downgrade a log in RunLengthByteReader from WARNING to VLOG (#3925 ) There are too many logs in be.WARNING that look like: ``` W0622 17:47:52.513341 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705] W0622 17:47:52.513417 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705] W0622 17:47:52.513466 26554 run_length_byte_reader.cpp:102] fail to ReadOnlyFileStream seek.[res = -1705] ``` This is a normal case when a run length reaches EOF, so we can downgrade it from WARNING to INFO to reduce useless logs in be.WARNING	2020-06-25 16:23:48 +08:00
Mingyu Chen	4a44c457a3	[Bug] Fix bug that a query plan is not correctly cancelled (#3933 ) This bug was introduced by #3872. It causes some queries that are expected to be cancelled not to be cancelled.	2020-06-25 16:23:13 +08:00
Mingyu Chen	46c64f0861	[Bug] Enable to get TCP metrics for linux kernel 2.x (#3921 ) Fix #3920 CL: 1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics. 2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`	2020-06-24 21:29:07 +08:00
Mingyu Chen	df8f9cc215	[Bug] Unify the timezone (#3910 ) When we get the default system time zone, it returns `PRC`, which we do not support, and this causes dynamic partition creation to fail. Fix #3919 This CL mainly changes: 1. Use a unified method to get the system default time zone 2. The default variables `system_time_zone` and `time_zone` are now set to the default system time zone, which is `Asia/Shanghai`. 3. Modify related unit tests. 4. Support time zone `PRC`.	2020-06-24 21:28:25 +08:00
wyb	3f7307d685	[Spark Load]Add spark etl job main class (#3927 ) 1. Add SparkEtlJob class 2. Remove DppResult comment 3. Support loading from hive table directly #3433	2020-06-24 13:54:55 +08:00
lichaoyong	93a0b47d22	Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )" (#3931 ) This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.	2020-06-24 10:13:45 +08:00
EmmyMiao87	feec4ee5bf	[UDF] Support external users to contribute udf (#3760 )	2020-06-23 13:43:08 +08:00
xy720	c50a310f8f	[optimize] Optimize spark load/broker load reading parquet format file (#3878 ) Add BufferedReader for reading parquet file via broker	2020-06-23 13:42:22 +08:00
Yunfeng,Wu	e5da108110	[Doris On ES][Docs] update document for best practices (#3924 ) Add best practices for #3559 and update feature for #3901	2020-06-23 13:39:56 +08:00
wangbo	8092aadc83	[Spark Load]Using SparkDpp to complete some calculation in Spark Load (#3729 )	2020-06-22 19:58:34 +08:00
xy720	f189a2e7b8	[Spark load][Be 1/1] Be handle push task (#3742 ) 1. Add a PushBrokerReader in push_handle.cpp. 2. PushBrokerReader wraps the ParquetScanner to support reading data from parquet format files through broker.	2020-06-22 19:57:58 +08:00
wangbo	3a7b8e98a6	[Spark Load] Doris Support Using Hive Table to Build Global Dict (#3063 )	2020-06-22 14:07:36 +08:00
wangbo	f03abcdfb3	[Spark Load] Rollup Tree Builder (#3727 ) 1 A tree data structure to describe doris table's rollup 2 A builder to build the data structure	2020-06-22 14:06:33 +08:00
Mingyu Chen	56bb218148	[Bug] Can not use non-key column as partition column in duplicate table (#3916 ) The following statement will throw an error: ``` create table test.tbl2 (k1 int, k2 int, k3 float) duplicate key(k1) partition by range(k2) (partition p1 values less than("10")) distributed by hash(k3) buckets 1 properties('replication_num' = '1'); ``` Error: `Only key column can be partition column` But in a duplicate key table, columns can be partition or distribution columns even if they are not in the duplicate keys. This bug was introduced by #3812	2020-06-22 09:24:21 +08:00
Mingyu Chen	4c3ccfb906	[FE] Prohibit pointing helper to itself when starting FE (#3850 ) When starting FE with the `start_fe.sh --helper xxx` command, do not allow the helper to point to the FE itself, because this is meaningless and may cause confusing problems.	2020-06-22 09:21:08 +08:00
HappenLee	66a8383ac0	[Running_Profile] Fix all counter in DataStreamRecv and change the image path in docs (#3858 )	2020-06-22 09:20:22 +08:00
Yuance.Li	35d07d8012	[Doc] Fix audit-plugin doc error (#3922 ) Fix audit-plugin doc error Droris -> Doris	2020-06-22 09:09:56 +08:00
wyb	a63fa88294	[Spark load][Fe 6/6] Fe process etl and loading state job (#3717 ) 1. Fe checks the status of etl job regularly 1.1 If status is RUNNING, update etl job progress 1.2 If status is CANCELLED, cancel load job 1.3 If status is FINISHED, get the etl output file paths, update job state to LOADING and log job update info 2. Fe sends PushTask to Be and commits transaction after all push tasks execute successfully #3433	2020-06-21 22:17:03 +08:00
YuJun	03fa1fefa9	[Doc] Fix doc-bug (#3914 )	2020-06-21 16:39:27 +08:00
Mingyu Chen	1e42c4adb7	[Bug] Fix bug that BE crash when doing some queries (#3918 ) This bug was introduced by PR #3872. In that PR, I removed the obj_pool param of the RuntimeProfile constructor, so the first param is now a std::string. But DataStreamRecv accidentally passes a nullptr as the std::string; this compiles OK but causes a runtime error. Fix #3917	2020-06-21 15:25:15 +08:00
wangbo	8cd36f1c5d	[Spark Load] Support java version hyperloglog (#3320 ) Mainly used by the Spark Load process to calculate approximate deduplication values and then serialize them to parquet files. Tries to keep the same calculation semantics as BE's C++ version	2020-06-21 09:37:05 +08:00
HuangWei	fdd65c50c4	[Bug] fix mem_tracker use-after-free & add UT for it (#3899 )	2020-06-20 19:08:53 +08:00
Mingyu Chen	8e895958d6	[Bug] Checkpoint thread is not running (#3913 ) This bug was introduced by PR #3784. In #3784, I removed `Catalog.getInstance()` and used `Catalog.getCurrentCatalog()` instead. But there are actually some places that should use the serving catalog explicitly. Main changes: 1. Add a new method `getServingCatalog()` to explicitly return the real catalog instance. 2. Fix a compile bug of broker introduced by #3881	2020-06-20 09:32:14 +08:00
wyb	532d15d381	[Spark Load]Fe submit spark etl job (#3716 ) After user creates a spark load job which status is PENDING, Fe will schedule and submit the spark etl job. 1. Begin transaction 2. Create a SparkLoadPendingTask for submitting etl job 2.1 Create etl job configuration according to https://github.com/apache/incubator-doris/issues/3010#issuecomment-635174675 2.2 Upload the configuration file and job jar to HDFS with broker 2.3 Submit etl job to spark cluster 2.4 Wait for etl job submission result 3. Update job state to ETL and log job update info if etl job is submitted successfully #3433	2020-06-19 17:44:47 +08:00
caiconghui	5d40218ae6	[Config] Support max_stream_load_timeout_second config in fe (#3902 ) This configuration is specifically used to limit the timeout setting for stream load. It prevents failed stream load transactions from staying uncancelled for a long time because of a user's large timeout setting.	2020-06-19 17:09:27 +08:00
Mingyu Chen	51367abce7	[Bug] Fix bug that BE crash when doing Insert Operation (#3872 ) Main changes: 1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`. 2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of throwing an exception. 3. Protect the `_status` in RuntimeState with a lock 4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be destructed after the object pool. 5. Remove the unused `ObjectPool` param in the RuntimeProfile constructor. If I don't remove it, RuntimeProfile will depend on the `_obj_pool` in RuntimeProfile.	2020-06-19 17:09:04 +08:00
Yunfeng,Wu	355df127b7	[Doris On ES] Support fetch `_id` field from ES (#3900 ) More information can be found at: https://github.com/apache/incubator-doris/issues/3901 The created ES external table must contain an `_id` column if you want to fetch the Elasticsearch document `_id`. ``` CREATE EXTERNAL TABLE `doe_id2` ( `_id` varchar COMMENT "", `city` varchar COMMENT "" ) ENGINE=ELASTICSEARCH PROPERTIES ( "hosts" = "http://10.74.167.16:8200", "user" = "root", "password" = "root", "index" = "doe", "type" = "doc", "version" = "6.5.3", "enable_docvalue_scan" = "true", "transport" = "http" ); ``` Query: ``` mysql> select * from doe_id2 limit 10; +----------------------+------+ \| _id \| city \| +----------------------+------+ \| iRHNc3IB8XwmcbhB7lEB \| gz \| \| jBHNc3IB8XwmcbhB71Ef \| gz \| \| jRHNc3IB8XwmcbhB71GI \| gz \| \| jhHNc3IB8XwmcbhB71Hx \| gz \| \| ThHNc3IB8XwmcbhBkFHB \| sh \| \| TxHNc3IB8XwmcbhBkFH9 \| sh \| \| URHNc3IB8XwmcbhBklFA \| sh \| \| ahHNc3IB8XwmcbhBxlFq \| gz \| \| axHNc3IB8XwmcbhBxlHw \| gz \| \| bxHNc3IB8XwmcbhByVFO \| gz \| +----------------------+------+ ``` NOTICE: This changes the column name format to support column names that start with "_".	2020-06-19 17:07:07 +08:00
lichaoyong	e0461cc7f4	[bug] Make compaction metrics value is right (#3903 ) Now _input_rowsets is cleared when gc_used_rowsets() is called. Metrics calculated from it after that point are not right.	2020-06-19 11:22:06 +08:00
xy720	1d9fa5071d	[BUG][Broker] Fix broker read buffer size from input stream (#3881 ) This commit fixes a bug where the broker cannot read the full buffer size when the buffer size is set larger than 128K. The bug causes the data size returned by a pread request to always be less than 128K.	2020-06-19 09:33:09 +08:00
yangzhg	5a253bc2c6	[BE][Tool] Add segment v2 footer meta viewer (#3822 ) Add segment v2 footer meta viewer tool	2020-06-19 09:32:11 +08:00
Binglin Chang	ca96ea3056	[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )	2020-06-18 09:56:07 +08:00
ZhangYu0123	2f99f632e8	Modify docs format (#3896 )	2020-06-18 09:43:28 +08:00
EmmyMiao87	a62cebfccf	Forbidden float column in short key (#3812 ) * Forbidden float column in short key When the user does not specify the short key columns, Doris automatically supplements them. However, Doris does not support float or double as a short key column, so when adding short key columns, Doris should avoid setting such columns as key columns. The short key columns must be less than 3 columns and less than 36 bytes. CreateMaterializedView, AddRollup and CreateDuplicateTable need to forbid float columns in the short key. If a float column is encountered directly during the supplement process, all subsequent columns are value columns. Float and double also cannot be short key columns. At the same time, Doris must have at least one short key column, so the type of the first column cannot be float or double. If a varchar is a short key column, it can only be the last short key column. Fixed #3811 For a duplicate table without order by columns, the order by columns are the same as the short key columns. If the order by columns have been designated, the count of short key columns must be <= the count of order by columns.	2020-06-17 14:16:48 +08:00
lichaoyong	e9f7576b9d	[Enhancement] make metrics api more clear (#3891 )	2020-06-17 12:17:54 +08:00
Yunfeng,Wu	c6f2b5ef0d	[Doris On ES][Docs] refactor documentation for doe (#3867 )	2020-06-17 10:54:28 +08:00
Mingyu Chen	d659167d6d	[Planner] Set MysqlScanNode's cardinality to avoid unexpected shuffle join (#3886 )	2020-06-17 10:53:36 +08:00
Mingyu Chen	a2df29efe9	[Bug][RoutineLoad] Fix bug that exception thrown when txn of a routineload task become visible (#3890 )	2020-06-17 10:52:51 +08:00
WingC	bfbe22526f	Show create table result with bitmap column should not return default value (#3882 )	2020-06-17 09:43:17 +08:00
lichaoyong	ae7028bee4	[Enhancement] Replace N/A with NULL in ShowStmt result (#3851 )	2020-06-17 09:41:51 +08:00
Mingyu Chen	0224d49842	[Fix][Bug] Fix compile bug (#3888 ) Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2020-06-16 18:42:04 +08:00
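
A note on commit dc603de4bd (#3941) above: its message says that numeric `date` values are treated as millisecond timestamps and that string values are parsed as formatted dates. The following is a minimal Python sketch of that rule, not the actual Doris implementation. The function name `es_value_to_date`, the list of accepted string formats, and the use of UTC are all assumptions made for illustration.

```python
from datetime import date, datetime, timezone

# String layouts tried in order. This list is an assumption modelled on the
# mapping format shown in the commit message, not taken from Doris itself.
STRING_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%fZ",  # formatted docvalue, e.g. 2020-06-18T12:10:30.000Z
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
)


def es_value_to_date(value):
    """Convert a raw ES `date` field value to a calendar date (hypothetical helper)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not date values")
    if isinstance(value, (int, float)):
        # Numbers are always interpreted as epoch milliseconds, so a value
        # given in seconds ends up in January 1970.
        return datetime.fromtimestamp(value / 1000, tz=timezone.utc).date()
    if isinstance(value, str):
        for fmt in STRING_FORMATS:
            try:
                return datetime.strptime(value, fmt).date()
            except ValueError:
                continue
        raise ValueError(f"unsupported date string: {value!r}")
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

With the two documents from the commit message, `es_value_to_date("2020-6-25")` and `es_value_to_date(1592816393000)` give 2020-06-25 and 2020-06-22, matching the query results shown there. Passing the same instant in seconds (`1592816393`) gives a date in January 1970, which is the incorrect handling the commit message warns about.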

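A note on commit 46c64f0861 (#3921) above: it describes parsing the TCP header line in `/proc/net/snmp` to find the position of each metric instead of assuming fixed columns. A minimal Python sketch of that idea follows; it is not the Doris code. The function name `parse_snmp_section` and the sample counter values are hypothetical, and the mapping of the kernel fields `InSegs` and `OutSegs` to the new `tcp_in_segs` and `tcp_out_segs` metrics is an assumption.

```python
def parse_snmp_section(text, protocol="Tcp"):
    """Return {field name: value} for one protocol section of /proc/net/snmp.

    Each section is a pair of lines with the same prefix: a header line with
    field names, then a line with the values in the same order.
    """
    prefix = protocol + ":"
    rows = [line.split()[1:] for line in text.splitlines() if line.startswith(prefix)]
    if len(rows) < 2:
        raise ValueError(f"no header/value pair found for {protocol!r}")
    header, values = rows[0], rows[1]
    # Pair names with values by position, so the lookup keeps working when a
    # kernel version has fewer or more columns.
    return {name: int(value) for name, value in zip(header, values)}


# Sample input with made-up counter values, for illustration only.
SAMPLE = """Ip: Forwarding DefaultTTL
Ip: 1 64
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs
Tcp: 1 200 120000 -1 10 20 0 1 5 1000 900 3
"""

tcp = parse_snmp_section(SAMPLE)
metrics = {"tcp_in_segs": tcp["InSegs"], "tcp_out_segs": tcp["OutSegs"]}
```

Looking fields up by name rather than by index is what lets the same parser handle kernels whose TCP line has a different number of columns.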
2042 Commits