doris

Author	SHA1	Message	Date
wangbo	f03abcdfb3	[Spark Load] Rollup Tree Builder (#3727 ) 1 A tree data structure to describe doris table's rollup 2 A builder to build the data structure	2020-06-22 14:06:33 +08:00
Mingyu Chen	56bb218148	[Bug] Can not use non-key column as partition column in duplicate table (#3916 ) The following statement will throw error: ``` create table test.tbl2 (k1 int, k2 int, k3 float) duplicate key(k1) partition by range(k2) (partition p1 values less than("10")) distributed by hash(k3) buckets 1 properties('replication_num' = '1'); ``` Error: `Only key column can be partition column` But in duplicate key table, columns can be partition or distribution column even if they are not in duplicate keys. This bug is introduced by #3812	2020-06-22 09:24:21 +08:00
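The rule described in this commit can be sketched as a small predicate. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and are not taken from the Doris FE code, and the requirement that non-duplicate tables use key columns is inferred from the quoted error message.

```java
// Hypothetical sketch; names are illustrative, not the Doris FE API.
final class PartitionColumnCheckSketch {
    enum KeysType { DUP_KEYS, AGG_KEYS, UNIQUE_KEYS }

    /**
     * Returns true if a column may be used as a partition column.
     * Duplicate-key tables accept any column; other table types are
     * assumed to require a key column.
     */
    static boolean canBePartitionColumn(KeysType keysType, boolean isKeyColumn) {
        if (keysType == KeysType.DUP_KEYS) {
            return true;
        }
        return isKeyColumn;
    }
}
```

With this check, `k2` in the statement above would be accepted because the table uses `duplicate key(k1)`.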
Mingyu Chen	4c3ccfb906	[FE] Prohibit pointing helper to itself when starting FE (#3850 ) When starting FE with the `start_fe.sh --helper xxx` command, do not allow the helper to point to the FE itself, because this is meaningless and may cause confusing problems.	2020-06-22 09:21:08 +08:00
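A minimal sketch of such a check, assuming the helper is given as `host:port` and compared against the FE's own address. The class and method names are hypothetical, and the real FE may resolve host names differently.

```java
// Hypothetical sketch; not the actual Doris FE startup code.
final class HelperCheckSketch {
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Parses a "host:port" string such as the value passed to --helper. */
    static HostPort parse(String s) {
        int idx = s.lastIndexOf(':');
        if (idx <= 0 || idx == s.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + s);
        }
        return new HostPort(s.substring(0, idx), Integer.parseInt(s.substring(idx + 1)));
    }

    /** True if the helper address is the FE's own address (host compared case-insensitively). */
    static boolean pointsToSelf(HostPort self, HostPort helper) {
        return self.port == helper.port && self.host.equalsIgnoreCase(helper.host);
    }
}
```

A startup script wrapper could call `pointsToSelf` and exit with an error before the FE tries to join itself.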
wyb	a63fa88294	[Spark load][Fe 6/6] Fe process etl and loading state job (#3717 ) 1. Fe checks the status of etl job regularly 1.1 If status is RUNNING, update etl job progress 1.2 If status is CANCELLED, cancel load job 1.3 If status is FINISHED, get the etl output file paths, update job state to LOADING and log job update info 2. Fe sends PushTask to Be and commits transaction after all push tasks execute successfully #3433	2020-06-21 22:17:03 +08:00
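The status handling in step 1 can be sketched as a small state transition. The enum values mirror the statuses named in the commit message; the class, field, and method names are hypothetical, and transaction handling, logging, and the push tasks of step 2 are omitted.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the periodic ETL status check; not the Doris FE code.
final class EtlStatusHandlerSketch {
    enum EtlStatus { RUNNING, CANCELLED, FINISHED }
    enum JobState { ETL, LOADING, CANCELLED }

    static final class LoadJob {
        JobState state = JobState.ETL;
        int progress = 0;                                  // percent reported by the ETL job
        List<String> outputPaths = Collections.emptyList();
    }

    /** Applies one polled ETL status to the load job. */
    static void onEtlStatus(LoadJob job, EtlStatus status, int progress, List<String> paths) {
        switch (status) {
            case RUNNING:
                job.progress = progress;                   // 1.1 update progress only
                break;
            case CANCELLED:
                job.state = JobState.CANCELLED;            // 1.2 cancel the load job
                break;
            case FINISHED:
                job.outputPaths = paths;                   // 1.3 record output files
                job.progress = 100;
                job.state = JobState.LOADING;              //     and move to LOADING
                break;
        }
    }
}
```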
wangbo	8cd36f1c5d	[Spark Load] Support java version hyperloglog (#3320 ) mainly used for Spark Load process to calculate approximate deduplication value and then serialize to parquet file. Try to keep the same calculation semantic with be's C++ version	2020-06-21 09:37:05 +08:00
Mingyu Chen	8e895958d6	[Bug] Checkpoint thread is not running (#3913 ) This bug was introduced by PR #3784. In #3784, I removed `Catalog.getInstance()` and used `Catalog.getCurrentCatalog()` instead. But there are some places that should use the serving catalog explicitly. Main changes: 1. Add a new method `getServingCatalog()` to explicitly return the real catalog instance. 2. Fix a compile bug of broker introduced by #3881	2020-06-20 09:32:14 +08:00
wyb	532d15d381	[Spark Load]Fe submit spark etl job (#3716 ) After a user creates a spark load job whose status is PENDING, Fe will schedule and submit the spark etl job. 1. Begin transaction 2. Create a SparkLoadPendingTask for submitting etl job 2.1 Create etl job configuration according to https://github.com/apache/incubator-doris/issues/3010#issuecomment-635174675 2.2 Upload the configuration file and job jar to HDFS with broker 2.3 Submit etl job to spark cluster 2.4 Wait for etl job submission result 3. Update job state to ETL and log job update info if etl job is submitted successfully #3433	2020-06-19 17:44:47 +08:00
caiconghui	5d40218ae6	[Config] Support max_stream_load_timeout_second config in fe (#3902 ) This configuration is specifically used to limit timeout setting for stream load. It is to prevent that failed stream load transactions cannot be canceled within a short time because of the user's large timeout setting.	2020-06-19 17:09:27 +08:00
Mingyu Chen	51367abce7	[Bug] Fix bug that BE crash when doing Insert Operation (#3872 ) Main changes: 1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`. 2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of an exception. 3. Protect the `_status` in RuntimeState with a lock 4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be deconstructed after the object pool. 5. Remove the unused `ObjectPool` param in RuntimeProfile constructor. If I don't remove it, RuntimeProfile will depend on the `_obj_pool` in RuntimeProfile.	2020-06-19 17:09:04 +08:00
Yunfeng,Wu	355df127b7	[Doris On ES] Support fetch `_id` field from ES (#3900 ) More information can be found: https://github.com/apache/incubator-doris/issues/3901 The created ES external table must contain an `_id` column if you want to fetch the Elasticsearch document `_id`. ``` CREATE EXTERNAL TABLE `doe_id2` ( `_id` varchar COMMENT "", `city` varchar COMMENT "" ) ENGINE=ELASTICSEARCH PROPERTIES ( "hosts" = "http://10.74.167.16:8200", "user" = "root", "password" = "root", "index" = "doe", "type" = "doc", "version" = "6.5.3", "enable_docvalue_scan" = "true", "transport" = "http" ); ``` Query: ``` mysql> select * from doe_id2 limit 10; +----------------------+------+ \| _id \| city \| +----------------------+------+ \| iRHNc3IB8XwmcbhB7lEB \| gz \| \| jBHNc3IB8XwmcbhB71Ef \| gz \| \| jRHNc3IB8XwmcbhB71GI \| gz \| \| jhHNc3IB8XwmcbhB71Hx \| gz \| \| ThHNc3IB8XwmcbhBkFHB \| sh \| \| TxHNc3IB8XwmcbhBkFH9 \| sh \| \| URHNc3IB8XwmcbhBklFA \| sh \| \| ahHNc3IB8XwmcbhBxlFq \| gz \| \| axHNc3IB8XwmcbhBxlHw \| gz \| \| bxHNc3IB8XwmcbhByVFO \| gz \| +----------------------+------+ ``` NOTICE: This changes the column name format to support column names that start with "_".	2020-06-19 17:07:07 +08:00
EmmyMiao87	a62cebfccf	Forbidden float column in short key (#3812 ) * Forbidden float column in short key When the user does not specify the short key columns, Doris automatically supplements them. However, Doris does not support float or double as a short key column, so when adding short key columns, Doris should avoid setting those columns as key columns. The short key columns must be less than 3 columns and less than 36 bytes. CreateMaterializedView, AddRollup and CreateDuplicateTable need to forbid float columns in the short key. If a float column is encountered during the supplement process, it and all subsequent columns are value columns. Float and double columns cannot be short key columns either. At the same time, Doris must have at least one short key column, so the type of the first column cannot be float or double. If a varchar column is a short key column, it can only be the last short key column. Fixed #3811 For a duplicate table without order by columns, the order by columns are the same as the short key columns. If the order by columns have been designated, the count of short key columns must be <= the count of order by columns.	2020-06-17 14:16:48 +08:00
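The supplement rules in this commit can be sketched as a single pass over the columns. This is an illustration, not the Doris implementation: all names are hypothetical, and the sketch treats 3 columns and 36 bytes as inclusive upper bounds, which is an assumption (adjust the constants if the limits are strict).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of automatic short key selection; not the Doris FE code.
final class ShortKeySketch {
    enum ColType { INT, BIGINT, VARCHAR, FLOAT, DOUBLE }

    static final class Col {
        final String name;
        final ColType type;
        final int bytes;

        Col(String name, ColType type, int bytes) {
            this.name = name;
            this.type = type;
            this.bytes = bytes;
        }
    }

    static final int MAX_COLUMNS = 3;   // assumed inclusive limit
    static final int MAX_BYTES = 36;    // assumed inclusive limit

    /** Picks a prefix of the columns as the short key, following the rules in the commit message. */
    static List<String> chooseShortKey(List<Col> cols) {
        List<String> keys = new ArrayList<>();
        int usedBytes = 0;
        for (Col c : cols) {
            if (keys.size() >= MAX_COLUMNS) {
                break;
            }
            // A float or double column ends the short key; it and the rest are value columns.
            if (c.type == ColType.FLOAT || c.type == ColType.DOUBLE) {
                break;
            }
            // A varchar column may join the short key, but only as its last column.
            if (c.type == ColType.VARCHAR) {
                keys.add(c.name);
                break;
            }
            if (usedBytes + c.bytes > MAX_BYTES) {
                break;
            }
            keys.add(c.name);
            usedBytes += c.bytes;
        }
        if (keys.isEmpty()) {
            throw new IllegalArgumentException(
                    "no valid short key column; the first column must not be FLOAT or DOUBLE");
        }
        return keys;
    }
}
```

For columns `(k1 INT, k2 FLOAT, k3 INT)` the sketch selects only `k1`, because the float column stops the scan.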
lichaoyong	e9f7576b9d	[Enhancement] make metrics api more clear (#3891 )	2020-06-17 12:17:54 +08:00
Mingyu Chen	d659167d6d	[Planner] Set MysqlScanNode's cardinality to avoid unexpected shuffle join (#3886 )	2020-06-17 10:53:36 +08:00
Mingyu Chen	a2df29efe9	[Bug][RoutineLoad] Fix bug that exception thrown when txn of a routineload task become visible (#3890 )	2020-06-17 10:52:51 +08:00
WingC	bfbe22526f	Show create table result with bitmap column should not return default value (#3882 )	2020-06-17 09:43:17 +08:00
lichaoyong	ae7028bee4	[Enhancement] Replace N/A with NULL in ShowStmt result (#3851 )	2020-06-17 09:41:51 +08:00
Mingyu Chen	0224d49842	[Fix][Bug] Fix compile bug (#3888 ) Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2020-06-16 18:42:04 +08:00
lichaoyong	6c4d7c60dd	[Feature] Add QueryDetail to store query statistics. (#3744 ) 1. Store the query statistics in memory. 2. Supporting RESTFUL interface to get the statistics.	2020-06-15 18:16:54 +08:00
Mingyu Chen	2211cb0ee0	[Metrics] Add metrics document and 2 new metrics of TCP (#3835 )	2020-06-15 09:48:09 +08:00
Mingyu Chen	b3811f910f	[Spark load][Fe 4/6] Add hive external table and update hive table syntax in loadstmt (#3819 ) * Add hive external table and update hive table syntax in loadstmt * Move check hive table from SelectStmt to FromClause and update doc * Update hive external table en sql reference	2020-06-13 16:28:24 +08:00
WingC	414a0a35e5	[Dynamic Partition] Use ZonedDateTime to support set timezone (#3799 ) This CL mainly support timezone in dynamic partition: 1. use new Java Time API to replace Calendar. 2. support set time zone in dynamic partition parameters.	2020-06-13 16:27:09 +08:00
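To illustrate why a time-zone-aware API matters here, the sketch below derives a day-partition name from an instant with `java.time.ZonedDateTime`. The `prefix + yyyyMMdd` naming scheme and the method name are assumptions made for the example, not details taken from this commit.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch; the naming scheme is an assumption for illustration.
final class PartitionNameSketch {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /**
     * Builds a day-partition name for the calendar day that is offsetDays
     * after "now", as seen in the given time zone.
     */
    static String dayPartitionName(String prefix, ZonedDateTime now, String zone, int offsetDays) {
        ZonedDateTime local = now.withZoneSameInstant(ZoneId.of(zone)).plusDays(offsetDays);
        return prefix + local.format(DAY);
    }
}
```

The same instant can fall on different calendar days depending on the zone: 20:00 UTC on 2020-06-12 is already 2020-06-13 in `Asia/Shanghai`, so the two zones produce different partition names.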
Mingyu Chen	b8ee84a120	[Doc] Add docs to OLAP_SCAN_NODE query profile (#3808 )	2020-06-13 16:25:40 +08:00
caiconghui	6928c72703	Optimize the logic for getting TabletMeta from TabletInvertedIndex to reduce frequency of getting read lock (#3815 ) This PR optimizes the logic for getting TabletMeta from TabletInvertedIndex to reduce the frequency of acquiring the read lock.	2020-06-13 12:46:59 +08:00
HappenLee	dac156b6b1	[Spill To Disk] Analytic_Eval_Node Support Spill Disk and Del Some Useless Code (#3820 ) * 1. Add enable spilling to the query options and support spilling to disk in Analytic_Eval_Node. FE can enable spilling with set enable_spilling = true; Sort Node and Analytic_Eval_Node can now spill to disk. 2. Delete the merge_sorter code that is no longer used. 3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node to support spilling to disk. Delete the useless code of buffered_block_mgr and buffered_tuple_stream. 4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment to the DataStreamRecvr profile to make the running profile clearer. * change some hint in code * replace disable_spill with enable_spill, which is more compatible with FE	2020-06-13 10:19:02 +08:00
HuangWei	88a5429165	[FE] Add db&tbl info in broker load log (#3837 ) stream load log in FE has db & tbl info, broker load log should have too.	2020-06-12 20:54:41 +08:00
wyb	7f7ee63723	Move check hive table from SelectStmt to FromClause and update doc	2020-06-11 16:53:41 +08:00
EmmyMiao87	2ce2cf78ac	Remove unused import (#3826 ) Change-Id: Ic6ef5a0d372a9b17ffa21cffb9027d2d7e856474	2020-06-11 11:44:51 +08:00
yangzhg	cd402a6827	[Restore] Fix error message not match of restore job when job is time out (#3798 ) In the current code, if a restore job times out, it is reported as canceled by the user. This error message is very misleading.	2020-06-10 23:12:04 +08:00
EmmyMiao87	4cb5f7a535	[Config]Remove max_user_connections from config (#3805 ) Update max_user_connections by user property: ``` set property `user` max_user_connections=100; ```	2020-06-10 22:56:05 +08:00
wyb	4c2e73a5fe	Add hive external table and update hive table syntax in loadstmt	2020-06-10 16:32:32 +08:00
Mingyu Chen	4fa9d8cbe9	[Spark load][Fe 3/5] Fe create job (#3715 ) * Add create spark load job * Remove unused import	2020-06-09 21:57:46 +08:00
Mingyu Chen	5b1589498a	[Bug] Fix SchemaChangeJobV2's meta persist bug (#3804 ) 1. Missing field `partitionIndexMap` in SchemaChangeJobV2 2. Pair in field `indexSchemaVersionAndHashMap` can not be persisted by GSON 3. Exit the FE process when replay edit log error. Fix: #3802	2020-06-09 21:55:46 +08:00
Yunfeng,Wu	acd7a58875	[Doris On ES] [1/3] Add ES QueryBuilders for debug mode (#3774 )	2020-06-09 16:45:16 +08:00
Mingyu Chen	8ada2559b7	[Bug] Fix bug that checkpoint thread failed to start (#3795 ) 1. Set thread id before starting the checkpoint thread 2. Init the CHECKPOINT catalog instance before visiting it.	2020-06-08 23:00:36 +08:00
kangkaisen	928379c5d8	[Bug] Fix colocate group replay NPE (#3790 ) Group id should also be persisted for replaying	2020-06-07 10:20:22 +08:00
Mingyu Chen	ea5b3b2d4c	[Bug] Fix bug that should not use "!=" to judge the equivalence of Type (#3786 ) org.apache.doris.catalog.Type is not an enum, so should not judge the equivalence of Type using "==" or "!="	2020-06-06 11:38:32 +08:00
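The pitfall can be shown with a small stand-in class. `ScalarTypeSketch` below is hypothetical and is not `org.apache.doris.catalog.Type`; it only demonstrates that two instances of a non-enum class can be logically equal while `==` reports them as different.

```java
import java.util.Objects;

// Hypothetical stand-in for a non-enum type class.
final class ScalarTypeSketch {
    final String name;
    final int length;

    ScalarTypeSketch(String name, int length) {
        this.name = name;
        this.length = length;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ScalarTypeSketch)) {
            return false;
        }
        ScalarTypeSketch other = (ScalarTypeSketch) o;
        return length == other.length && name.equals(other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, length);
    }
}
```

Two separately constructed `VARCHAR(10)` instances compare unequal with `==` (reference comparison) but equal with `equals()`, which is why `!=` is the wrong test for such a class.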
WingC	a7bf006b51	Use BackendStatus to show BE's information in `show backends;` (#3713 ) The information is displayed in JSON format. For example: {"lastTabletReportTime":"2020-05-28 15:29:01"}	2020-06-06 11:37:48 +08:00
Xiang Wei	c51f20bb7a	Disable Bitmap or Hll type in keys or in values with incorrect agg-type (#3768 ) Bitmap and Hll types cannot be used with incorrect aggregate functions, which would cause BE to crash. Add some logical checks in FE's ColumnDef#analyze to avoid creating tables or changing schemas incorrectly. Keys can never be of bitmap or hll type. Values of bitmap or hll type have to be associated with bitmap_union or hll_union.	2020-06-06 11:36:06 +08:00
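The two rules can be sketched as a validation predicate. This is a simplified illustration for aggregate-style column definitions: the enums and the method are hypothetical and do not reproduce the actual `ColumnDef#analyze` logic.

```java
// Hypothetical sketch of the key and aggregate-type checks; not the Doris FE code.
final class ColumnDefCheckSketch {
    enum ColType { INT, VARCHAR, BITMAP, HLL }
    enum AggType { NONE, SUM, REPLACE, BITMAP_UNION, HLL_UNION }

    /** Returns true if the column definition passes the bitmap/hll checks. */
    static boolean isValid(ColType type, boolean isKey, AggType agg) {
        if (type != ColType.BITMAP && type != ColType.HLL) {
            return true;                       // the checks only concern bitmap and hll columns
        }
        if (isKey) {
            return false;                      // keys can never be bitmap or hll
        }
        AggType required = (type == ColType.BITMAP) ? AggType.BITMAP_UNION : AggType.HLL_UNION;
        return agg == required;                // value columns need the matching union function
    }
}
```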
Mingyu Chen	173dd3953d	[Code Refactor] Remove Catalog.getInstance() method (#3784 ) Use Catalog.getCurrentCatalog() instead, to avoid potential meta operation error.	2020-06-06 11:35:01 +08:00
HangyuanLiu	4cbce687b7	Add getValueFn and removeFn to properties (#3782 )	2020-06-06 11:34:32 +08:00
Yunfeng,Wu	5abef19be4	[Doris On ES] Add more detailed error message when fail to create es table (#3758 )	2020-06-05 23:06:46 +08:00
yangzhg	cdd17333ba	Add some log to make it easier to find out bug (#3770 ) Added some logs to record which BE a query was sent to, making it more efficient to trace problems.	2020-06-05 10:18:58 +08:00
EmmyMiao87	0a748661c1	Fix the error selectedIndexId when keysType of table is UNIQUE (#3772 ) The unique table also should be compensated candidate index. The reason is the same as the agg table type. Fixed #3771. Change-Id: Ic04b0360a0b178cb0b6ee635e56f48852092ec09	2020-06-04 19:26:50 +08:00
lichaoyong	9b2cf1c18e	[Bug] Clear Txn when load been cancelled (#3766 ) If a load task encounters an error, it will be cancelled. At this time, FE will clear the Txn according to the DbName. In FE, the DbName should be prefixed with the cluster name. If the cluster name is missing, a NullPointer is encountered. As a result, the Txn will still exist until timeout.	2020-06-04 18:18:37 +08:00
Mingyu Chen	27046c5b61	[Enhancement] Improve the performance of query with IN predicate (#3694 ) This CL mainly changes: 1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to the storage engine. 2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at session level and override the config value in BE.	2020-06-04 11:39:00 +08:00
Mingyu Chen	fc33ee3618	[Plugin] Add timeout of connection when downloading the plugins from URL (#3755 ) If no timeout is set, the download process may be blocked forever.	2020-06-04 11:37:18 +08:00
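A minimal sketch of setting download timeouts with the standard `java.net.URLConnection` API. The class name, method name, and timeout values are illustrative assumptions; the commit does not state which values or which HTTP client the plugin downloader uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch; not the Doris plugin downloader.
final class DownloadTimeoutSketch {
    /**
     * Opens (but does not yet connect) a URL connection with explicit timeouts,
     * so a stalled server cannot block the download forever.
     */
    static URLConnection open(String url, int connectTimeoutMs, int readTimeoutMs) {
        try {
            URLConnection conn = new URL(url).openConnection();
            conn.setConnectTimeout(connectTimeoutMs);   // limit on establishing the connection
            conn.setReadTimeout(readTimeoutMs);         // limit on waiting for data while reading
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both timeouts default to 0 in `URLConnection`, which means wait indefinitely, so setting them explicitly is what prevents the hang described above.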
yangzhg	a8c95e7369	[Bug] Fix BinaryPredicate's equals function ignore op (#3753 ) BinaryPredicate's equals function compares by opcode, but the opcode may not be initialized yet, so it returns true if the children are the same. For example, `a>1` and `a<1` are considered equal.	2020-06-04 09:29:19 +08:00
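The intended behaviour can be sketched with a simplified predicate class whose `equals` compares the operator as well as the children. The class below is a hypothetical stand-in and does not reproduce the real `BinaryPredicate` or the exact fix in this commit.

```java
import java.util.Objects;

// Hypothetical stand-in for a binary predicate such as "a > 1".
final class BinaryPredSketch {
    enum Op { EQ, LT, GT }

    final Op op;
    final String left;
    final String right;

    BinaryPredSketch(Op op, String left, String right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof BinaryPredSketch)) {
            return false;
        }
        BinaryPredSketch other = (BinaryPredSketch) o;
        // Compare the operator itself, which is always set at construction,
        // in addition to both children.
        return op == other.op && left.equals(other.left) && right.equals(other.right);
    }

    @Override
    public int hashCode() {
        return Objects.hash(op, left, right);
    }
}
```

With the operator included in the comparison, `a>1` and `a<1` are no longer reported as equal.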
wyb	7f6a7c6807	Remove unused import	2020-06-03 22:32:52 +08:00
wangbo	7f6271c637	[Bug]Fix Query failed when fact table has no data in join case (#3604 ) major work 1. Correct the value of ```numNodes``` and ```cardinality``` when ```OlapTableScan``` computeStats so that the ``` broadcast cost``` and ```partition join cost ``` can be calculated correctly. 2. Find an input fragment with higher parallelism for shuffle fragment to assign backend	2020-06-03 22:01:55 +08:00
wyb	edfa6683fc	Add create spark load job	2020-06-03 21:27:27 +08:00


974 Commits