Commit Graph

5755 Commits

64f7a1fd1e [Log] Add log for loading image (#3996)
When FE fails to load an image, more logs should be printed to help users analyze the error.
2020-07-03 21:19:08 +08:00
9bb7e5d208 Fix some code & comments (#3999)
TPlanExecParams::volume_id is never used, so delete the print_volume_ids() function.
Fix logging, and log an error if PlanFragmentExecutor::open() returns an error.
Fix some comments.
2020-07-03 21:18:47 +08:00
5ade21b55d [Load] Support load true or false as boolean value (#3898)
Fixes #3831
After this PR:
insert into: `1/"1" -> 1, 0/"0" -> 0, true/"true" -> 1, false/"false" -> 0, "10" -> null, "xxxx" -> null`
load: `1/true -> 1, 0/false -> 0`, anything else -> null
2020-07-02 13:58:24 +08:00
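A minimal sketch of the new insert behavior; the table `test.tbl_bool` and its schema are assumptions for illustration, not part of this PR:

```
-- Hypothetical table with a boolean column:
CREATE TABLE test.tbl_bool (k1 INT, v1 BOOLEAN)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES('replication_num' = '1');

-- true/"true" and false/"false" are now accepted as boolean values:
INSERT INTO test.tbl_bool VALUES (1, true), (2, "false");
```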
707d03cbde [SQL] Remove order by for subquery in set operation clause (#3806)
Implements #3803.
Support removing meaningless ORDER BY clauses from subqueries in set operations.
The default limit of 65535 will not be removed, because it is added at the plan node level;
after we support spill to disk we can move this limit to the analyze phase.
2020-07-02 13:56:53 +08:00
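A hedged sketch of the kind of query affected; `tbl1` is hypothetical. An ORDER BY inside a set-operation subquery (without LIMIT) has no effect on the final result, so the planner can now drop it instead of sorting needlessly:

```
SELECT k1 FROM tbl1
UNION ALL
(SELECT k1 FROM tbl1 ORDER BY k1);  -- this ORDER BY can be removed
```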
2362500e77 [Doris On ES] Support create table with wildcard or aliased index (#3968) 2020-07-01 22:08:06 +08:00
fdcbea480d [Enhancement] DO NOT increase report version for publish task (#3894)
Fixes #3893 

In a cluster with frequent load activities, FE will ignore most tablet reports from BEs,
because currently it only handles reports whose version >= the BE's latest report version
(which is increased each time a transaction is published). This can be observed in FE's log,
with many entries like `out of date report version 15919277405765 from backend[177969252].
current report version[15919277405766]`.

However, many system functionalities rely on tablet report processing to work properly. For example:
1. bad or version-missing replicas are detected and repaired during tablet report processing
2. storage medium migration decisions and actions are made based on tablet reports
3. a BE's old transactions are cleared/republished during tablet report processing

In fact, it is not necessary to update the report version after the publish task;
this is a problem left over from history. In the current version's reporting logic,
we no longer decrease a replica's version information in the FE metadata according to the report,
so even if we receive a stale report version, it does not matter.

This CL mainly contains two changes:

1. do not increase the report version for publish tasks
2. populate `tabletWithoutPartitionId` outside the read lock of TabletInvertedIndex
2020-07-01 09:23:40 +08:00
1bfb105ec1 [Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979) 2020-07-01 09:22:33 +08:00
f9a52f5db4 [Bug] Insert may leak DeltaWriter when re-analyzed (#3973) 2020-06-30 11:09:53 +08:00
3ac459f0ca [UT] resolve metric ut fails (#3975) 2020-06-29 21:54:41 +08:00
48398232e7 [Bug] Fix bug that `default_rowset_type` has a session variable (#3953)
This PR mainly fixes the bug that `default_rowset_type` has a session variable.
2020-06-29 19:16:42 +08:00
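For context, `default_rowset_type` is exposed as a session variable; a hedged sketch of inspecting and setting it from a MySQL client (the value `beta` is an assumption for illustration):

```
-- Inspect the session variable mentioned in the fix:
SHOW VARIABLES LIKE 'default_rowset_type';

-- Set it for the current session (value illustrative):
SET default_rowset_type = 'beta';
```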
48d947edf4 Support rpc_timeout property in stream load request to cancel the request in FE in time when it times out (#3948)
This PR enables FE to cancel a stream load request in time
when the request times out, making stream load more robust.
2020-06-29 19:16:16 +08:00
2c96d27fdc [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt (#3962)
* [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt

Add MetaUrl and CompactionUrl to the result of the following stmt:

`show tablet 10010;`

* fix ut

* add doc

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-06-29 19:15:38 +08:00
af1beb6ce4 [Enhance] Add prepare phase for some timestamp functions (#3947)
Fix: #3946 

CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all rows.
2. Find the cctz timezone when initializing the runtime state, so that we don't need to look up the timezone for each row.
3. Add a constant rewrite rule for `utc_timestamp()`.
4. Add a doc for `to_date()`.
5. Comment out `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`.

The performance results are shown below:

11,000,000 rows

SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s

SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s

Date string formatting still seems slow; it may need further enhancement.
2020-06-29 19:15:09 +08:00
0cbacaf01d [Refactor] Replace some boost usages with std in OlapScanNode (#3934)
Replace some boost usages with their std equivalents in OlapScanNode.

This refactor seems to solve the problem described in #3929:
I found that BE would crash when calling `boost::condition_variable.notify_all()`,
but after this upgrade, BE no longer crashes.
2020-06-29 19:13:03 +08:00
d82d48da87 [Doris On ES][Bug-fix] Sync ES metadata failure after restart or upgrade FE (#3961)
ISSUE: #3960
PR #3454 introduced caching for EsClient, but the client was only initialized during editlog replay; this work should also be done during image replay.

This happens when restarting or upgrading FE.

BTW: also fix a UT failure for metrics.
2020-06-29 14:13:07 +08:00
eecc0c5ec9 fix ut 2020-06-28 14:01:45 +08:00
55c058e4b1 [Compile] modify compile error (#3959) 2020-06-28 10:39:31 +08:00
566a7f1ac7 [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt
Add MetaUrl and CompactionUrl to the result of the following stmt:

`show tablet 10010;`
2020-06-28 10:11:46 +08:00
b2b9e22b24 [CreateTable] Check backend disk has available capacity by storage medium before create table (#3519)
Currently we choose a BE at random without checking whether its disks are available,
so create table fails only after the create tablet task is sent to the BE
and the BE checks whether there is available capacity to create the tablet.
Checking backend disk availability by storage medium up front reduces unnecessary RPC calls.
2020-06-28 09:36:31 +08:00
3be28460f7 [Bug]Dynamic partition check interval seconds is not right (#3951) 2020-06-27 10:07:39 +08:00
a894b1edc5 [Doris On ES] Split /_cluster/state to [indexName/_mappings, indexName/_search_shards] (#3454)
1. Split /_cluster/state into /_mapping and /_search_shards requests, to reduce required permissions and make the logic clearer
2. Rename some ES-related objects to make their representation more accurate
3. Simply support docValue and fields in alias mode, taking the first one by default

#3311
2020-06-26 17:46:43 +08:00
46c64f0861 [Bug] Enable to get TCP metrics for linux kernel 2.x (#3921)
Fix #3920 

CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
2020-06-24 21:29:07 +08:00
df8f9cc215 [Bug] Unify the timezone (#3910)
When we get the default system time zone, it returns `PRC`, which we did not support, and thus
dynamic partition creation failed. Fix #3919

This CL mainly changes:
1. Use a unified method to get the system default time zone.
2. The default variables `system_time_zone` and `time_zone` are now set to the system default
time zone, which is `Asia/Shanghai`.
3. Modify related unit tests.
4. Support time zone `PRC`.
2020-06-24 21:28:25 +08:00
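The unified time zone variables can be inspected and set from a MySQL client; a minimal sketch (values illustrative):

```
-- Inspect the time zone variables mentioned above:
SHOW VARIABLES LIKE '%time_zone%';

-- `PRC` is now supported alongside the canonical name:
SET time_zone = 'PRC';
SET time_zone = 'Asia/Shanghai';
```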
wyb
3f7307d685 [Spark Load]Add spark etl job main class (#3927)
1. Add SparkEtlJob class
2. Remove DppResult comment
3. Support loading from hive table directly

#3433
2020-06-24 13:54:55 +08:00
8092aadc83 [Spark Load]Using SparkDpp to complete some calculation in Spark Load (#3729) 2020-06-22 19:58:34 +08:00
3a7b8e98a6 [Spark Load] Doris Support Using Hive Table to Build Global Dict (#3063) 2020-06-22 14:07:36 +08:00
f03abcdfb3 [Spark Load] Rollup Tree Builder (#3727)
1. A tree data structure to describe a Doris table's rollups
2. A builder to build the data structure
2020-06-22 14:06:33 +08:00
56bb218148 [Bug] Can not use non-key column as partition column in duplicate table (#3916)
The following statement throws an error:
```
create table test.tbl2
(k1 int, k2 int, k3 float)
duplicate key(k1)
partition by range(k2)
(partition p1 values less than("10"))
distributed by hash(k3) buckets 1
properties('replication_num' = '1'); 
```
Error: `Only key column can be partition column`

But in a duplicate key table, columns can be partition or distribution columns
even if they are not in the duplicate keys.

This bug was introduced by #3812
2020-06-22 09:24:21 +08:00
4c3ccfb906 [FE] Prohibit pointing helper to itself when starting FE (#3850)
When starting FE with the `start_fe.sh --helper xxx` command, do not allow the
helper to point to the FE itself, because this is meaningless and may cause some
confusing problems.
2020-06-22 09:21:08 +08:00
wyb
a63fa88294 [Spark load][Fe 6/6] Fe process etl and loading state job (#3717)
1. Fe checks the status of etl job regularly 
1.1 If status is RUNNING, update etl job progress
1.2 If status is CANCELLED, cancel load job
1.3 If status is FINISHED, get the etl output file paths, update job state to LOADING and log job update info

2. Fe sends PushTask to Be and commits transaction after all push tasks execute successfully

#3433
2020-06-21 22:17:03 +08:00
8cd36f1c5d [Spark Load] Support java version hyperloglog (#3320)
Mainly used in the Spark Load process to calculate approximate deduplication values and then serialize them to a parquet file.
Try to keep the same calculation semantics as BE's C++ version.
2020-06-21 09:37:05 +08:00
8e895958d6 [Bug] Checkpoint thread is not running (#3913)
This bug was introduced by PR #3784.
In #3784, I removed `Catalog.getInstance()` and used `Catalog.getCurrentCatalog()` instead.

But actually, there are some places that should use the serving catalog explicitly.

Mainly changed:

1. Add a new method `getServingCatalog()` to explicitly return the real catalog instance.
2. Fix a compile bug of broker introduced by #3881
2020-06-20 09:32:14 +08:00
wyb
532d15d381 [Spark Load]Fe submit spark etl job (#3716)
After a user creates a spark load job whose status is PENDING, FE will schedule and submit the spark etl job.
1. Begin transaction
2. Create a SparkLoadPendingTask for submitting etl job
2.1 Create etl job configuration according to https://github.com/apache/incubator-doris/issues/3010#issuecomment-635174675
2.2 Upload the configuration file and job jar to HDFS with broker
2.3 Submit etl job to spark cluster
2.4 Wait for etl job submission result
3. Update job state to ETL and log job update info if etl job is submitted successfully

#3433
2020-06-19 17:44:47 +08:00
5d40218ae6 [Config] Support max_stream_load_timeout_second config in fe (#3902)
This configuration specifically limits the timeout setting for stream load.
It prevents failed stream load transactions from being uncancelable for
a long time because of a user's large timeout setting.
2020-06-19 17:09:27 +08:00
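A hedged sketch of adjusting the new FE config from a MySQL client; whether `max_stream_load_timeout_second` is runtime-mutable via `ADMIN SET FRONTEND CONFIG` is an assumption — it can otherwise be set in `fe.conf`:

```
-- Illustrative only; assumes the config can be changed at runtime:
ADMIN SET FRONTEND CONFIG ("max_stream_load_timeout_second" = "259200");
```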
51367abce7 [Bug] Fix bug that BE crash when doing Insert Operation (#3872)
Mainly change:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of exception.
3. Protect the `_status` in RuntimeState with lock
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile is
destructed after the object pool.
5. Remove the unused `ObjectPool` param from the RuntimeProfile constructor; otherwise
RuntimeProfile would depend on the `_obj_pool` in RuntimeState.
2020-06-19 17:09:04 +08:00
355df127b7 [Doris On ES] Support fetch _id field from ES (#3900)
More information can be found: https://github.com/apache/incubator-doris/issues/3901

The created ES external table must contain an `_id` column if you want to fetch the Elasticsearch document `_id`.
```
CREATE EXTERNAL TABLE `doe_id2` (
  `_id` varchar COMMENT "",
   `city`  varchar COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "doe",
"type" = "doc",
"version" = "6.5.3",
"enable_docvalue_scan" = "true",
"transport" = "http"
);
```

Query:

```
mysql> select * from doe_id2 limit 10;
+----------------------+------+
| _id                  | city |
+----------------------+------+
| iRHNc3IB8XwmcbhB7lEB | gz   |
| jBHNc3IB8XwmcbhB71Ef | gz   |
| jRHNc3IB8XwmcbhB71GI | gz   |
| jhHNc3IB8XwmcbhB71Hx | gz   |
| ThHNc3IB8XwmcbhBkFHB | sh   |
| TxHNc3IB8XwmcbhBkFH9 | sh   |
| URHNc3IB8XwmcbhBklFA | sh   |
| ahHNc3IB8XwmcbhBxlFq | gz   |
| axHNc3IB8XwmcbhBxlHw | gz   |
| bxHNc3IB8XwmcbhByVFO | gz   |
+----------------------+------+
```

NOTICE:
This changes the column name format to support column names starting with "_".
2020-06-19 17:07:07 +08:00
a62cebfccf Forbidden float column in short key (#3812)
* Forbidden float column in short key

When the user does not specify the short key columns, Doris automatically supplements them.
However, Doris does not support float or double as a short key column, so when supplementing the short key columns, Doris should avoid choosing such columns as key columns.
The short key columns must be no more than 3 columns and no more than 36 bytes.

CreateMaterializedView, AddRollup and CreateDuplicateTable all need to forbid float columns in the short key.
If a float column is encountered directly during the supplement process, all subsequent columns are value columns.

Since float and double cannot be short key columns, and Doris must have at least one short key column,
the type of the first column cannot be float or double.
If a varchar column is a short key column, it can only be the last short key column.

Fixed #3811

For a duplicate table without order by columns, the order by columns are the same as the short key columns.
If the order by columns have been designated, the number of short key columns must be <= the number of order by columns.
2020-06-17 14:16:48 +08:00
e9f7576b9d [Enhancement] make metrics api more clear (#3891) 2020-06-17 12:17:54 +08:00
d659167d6d [Planner] Set MysqlScanNode's cardinality to avoid unexpected shuffle join (#3886) 2020-06-17 10:53:36 +08:00
a2df29efe9 [Bug][RoutineLoad] Fix bug that exception thrown when txn of a routineload task become visible (#3890) 2020-06-17 10:52:51 +08:00
bfbe22526f Show create table result with bitmap column should not return default value (#3882) 2020-06-17 09:43:17 +08:00
ae7028bee4 [Enhancement] Replace N/A with NULL in ShowStmt result (#3851) 2020-06-17 09:41:51 +08:00
0224d49842 [Fix][Bug] Fix compile bug (#3888)
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2020-06-16 18:42:04 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Support a RESTful interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
b3811f910f [Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819)
* Add hive external table and update hive table syntax in loadstmt

* Move check hive table from SelectStmt to FromClause and update doc

* Update hive external table en sql reference
2020-06-13 16:28:24 +08:00
414a0a35e5 [Dynamic Partition] Use ZonedDateTime to support set timezone (#3799)
This CL mainly supports time zones in dynamic partitioning:
1. use the new Java Time API (ZonedDateTime) to replace Calendar.
2. support setting the time zone in dynamic partition parameters.
2020-06-13 16:27:09 +08:00
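A hedged sketch of setting a time zone in the dynamic partition parameters; the table, columns, and the exact property name `dynamic_partition.time_zone` are assumptions for illustration:

```
CREATE TABLE test.tbl_dp (k1 DATE, v1 INT)
DUPLICATE KEY(k1)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES (
  'replication_num' = '1',
  'dynamic_partition.enable' = 'true',
  'dynamic_partition.time_unit' = 'DAY',
  -- hypothetical use of the new time zone parameter:
  'dynamic_partition.time_zone' = 'Asia/Shanghai',
  'dynamic_partition.end' = '3',
  'dynamic_partition.prefix' = 'p'
);
```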
b8ee84a120 [Doc] Add docs to OLAP_SCAN_NODE query profile (#3808) 2020-06-13 16:25:40 +08:00
6928c72703 Optimize the logic for getting TabletMeta from TabletInvertedIndex to reduce frequency of getting read lock (#3815)
This PR optimizes the logic for getting TabletMeta from TabletInvertedIndex to reduce the frequency of taking the read lock.
2020-06-13 12:46:59 +08:00
dac156b6b1 [Spill To Disk] Analytic_Eval_Node Support Spill Disk and Del Some Unless Code (#3820)
* 1. Add enable_spilling to the query options; Analytic_Eval_Node now supports spilling to disk, and FE can turn it on with

         set enable_spilling = true;

Now both Sort Node and Analytic_Eval_Node can spill to disk.
2. Delete the merge_sorter code we no longer use.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node to support spilling to disk. Delete the useless buffered_block_mgr and buffered_tuple_stream code.
4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment to the DataStreamRecvr profile to make the running profile clearer.

* change some hints in code

* replace disable_spill with enable_spill, which is more compatible with FE
2020-06-13 10:19:02 +08:00