Persist stale rowset meta. When BE reboots, stale rowset meta can be restored, so stale versions remain readable before the stale GC time.
ISSUE: #4453
1. Fix a core dump caused by a wild pointer in PlanFragmentExecutor, fixes issue #4447
2. Fix a core dump caused by a wild pointer in json load, fixes issue #4452
3. Change the declaration order of the ODBC type in thrift for compatibility
* Implement the grammar of batch delete #4051
* Handle CREATE TABLE and ALTER TABLE when the table has a delete sign column
* Support the syntax for enabling the delete sign column (see the sketch after the TODO list below)
* Automatically filter deleted data in SELECT statements
* Automatically add the delete sign column when creating a rollup table
TODO:
* Optimize the reading and compaction logic on the BE side, so that data marked as deleted is completely removed during base compaction
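A rough usage sketch of the feature above. The exact statement and the hidden column name `__DORIS_DELETE_SIGN__` are assumptions for illustration, not confirmed by this change log:
```
-- Assumed syntax for enabling the delete sign column on an existing table.
ALTER TABLE tbl ENABLE FEATURE "BATCH_DELETE";

-- Ordinary queries automatically filter rows whose delete sign is set,
-- roughly as if the hidden predicate below were appended.
SELECT k1, v1 FROM tbl;
-- SELECT k1, v1 FROM tbl WHERE __DORIS_DELETE_SIGN__ = 0;
```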
`replace` is a newly added function that replaces all occurrences of an old substring with a new substring in a string, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                             |
+--------------------------------------------------+
In some very special circumstances, such as code bugs or human misoperation, all replicas of some tablets may be lost. In this case, the data is substantially lost.
However, in some scenarios, the business still wants queries not to report errors even if there is data loss, to reduce the impact on the user layer.
In this case, we can use a blank tablet to fill in for the missing replica, so that the query can be executed normally.
Add a new FE config `recover_with_empty_tablet`. Default is false; true means an empty tablet is used to fill in for the missing one.
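A minimal sketch of switching the config on, assuming it can be changed at runtime through the standard admin statement (otherwise it would go into fe.conf):
```
-- Assumed runtime toggle; the config name comes from this change.
ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "true");
```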
Also fix a bug in #4274
* [Feature][Cache] Cache proxy and coordinator #2581
1. Abstract cache proxy class and the BE cache implementation
2. Cache coordinator implemented with consistent hashing
* Adjusted code formatting, naming and variables according to review comments
This PR adds IN-predicate support to the DELETE statement,
and adds the `max_allowed_in_element_num_of_delete` variable to
limit the number of elements of an IN predicate in a DELETE statement.
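A minimal sketch of the new capability; the table, partition and value list are illustrative:
```
-- Delete rows using an IN predicate (support added by this PR).
DELETE FROM tbl PARTITION p1 WHERE k1 IN (1, 2, 3);

-- Cap the number of elements allowed in such an IN predicate
-- (variable name from this PR; the value is only an example).
SET max_allowed_in_element_num_of_delete = 1024;
```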
+ Build the materialized view function for schema_change here based on defineExpr.
+ This is a workaround because the current storage layer does not support expression evaluation.
+ A count-distinct materialized view sets mv_expr to to_bitmap or hll_hash.
+ A count materialized view sets mv_expr to count.
+ Support regenerating historical data in BE when a new materialized view is created.
+ Support the to_bitmap function
+ Support the hll_hash function
+ Support the count(field) function (see the examples after this list)
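Hedged examples of materialized views using these functions; the table and column names are illustrative:
```
-- count distinct via to_bitmap
CREATE MATERIALIZED VIEW mv_bitmap AS
SELECT k1, bitmap_union(to_bitmap(k2)) FROM tbl GROUP BY k1;

-- approximate count distinct via hll_hash
CREATE MATERIALIZED VIEW mv_hll AS
SELECT k1, hll_union(hll_hash(k2)) FROM tbl GROUP BY k1;

-- count(field)
CREATE MATERIALIZED VIEW mv_count AS
SELECT k1, count(k2) FROM tbl GROUP BY k1;
```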
For #3344
This CL mainly changes:
1. Reorganized the code logic to limit the supported JSON formats to two, making the import behavior more consistent.
2. Modified how error rows are counted when loading data in JSON format, so that error rows are counted correctly.
3. See `load-json-format.md` for details on loading data in JSON format.
Fix: #3946
CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all rows.
2. Look up the cctz timezone when initializing the `runtime state`, so we don't need to look up the timezone for each row.
3. Add a constant rewrite rule for `utc_timestamp()`
4. Add a doc for `to_date()`
5. Comment out `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`
The performance is shown below:
11,000,000 rows
SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s
SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s
Date string formatting still seems slow; we may need a further enhancement for it.
* 1. Add `enable_spilling` to the query options and support spilling to disk in Analytic_Eval_Node. FE can turn on spilling via
set enable_spilling = true;
Now both Sort Node and Analytic_Eval_Node can spill to disk (see the example after this list).
2. Delete the unused merge_sorter code.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node and support spilling to disk. Delete the now-useless code of buffered_block_mgr and buffered_tuple_stream.
4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment to the DataStreamRecvr profile to make the running profile clearer.
* Change some hints in code
* Replace disable_spill with enable_spill, which is more compatible with FE
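A minimal illustrative session, assuming a sort that is too large for memory; the table and column names are made up:
```
-- Allow the current session to spill large sorts / analytic evaluations to disk.
SET enable_spilling = true;

-- A query whose sort and window evaluation may now spill instead of failing on memory.
SELECT k1, row_number() OVER (PARTITION BY k2 ORDER BY k3) AS rn
FROM big_tbl
ORDER BY k1;
```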
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to the storage engine.
2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE, as shown below.
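A minimal sketch of the session-level override; the variable name comes from this change and the value is only illustrative:
```
-- Raise the per-session scan key limit for queries with many pushed-down conditions.
SET max_scan_key_num = 48;
```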
This CL mainly changes:
1. Support the `SELECT INTO OUTFILE` command.
2. Support exporting query results to a file via Broker.
3. Support the CSV export format with a specified column separator and line delimiter.
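A sketch of exporting a query result through a broker; the broker name, HDFS path and property keys are illustrative assumptions:
```
SELECT k1, v1 FROM tbl
INTO OUTFILE "hdfs://host:port/user/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```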
In materialized view 2.0 the define expr should be set in the column.
For example, the to_bitmap function on an integer column should be defined in the mv column.
```
create materialized view mv as select bitmap_union(to_bitmap(k1)) from table.
```
The meta of the mv is as follows:
column name: __doris_materialized_view_bitmap_k1
column aggregate type: bitmap_union
column define expr: to_bitmap(k1)
This is a WIP PR for materialized view 2.0.
#3344
Fix #3390
This CL adds more info to the `JobDetails` column of the `SHOW LOAD` result for Broker Load jobs.
For example:
```
{
    "Unfinished backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002]
    },
    "All backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002, 10004, 10006]
    },
    "ScannedRows": 2390016,
    "TaskNumber": 1,
    "FileNumber": 1,
    "FileSize": 1073741824
}
```
2 newly added keys:
`Unfinished backends` indicates the BEs whose tasks are not yet finished.
`All backends` indicates the BEs on which this job has tasks.
One more thing: I pass the backend id along with the heartbeat msg from FE to BE, so that each BE can
know its own id.
This PR is just a transitional step, but it is better to move the predicate transformation from Doris BE to Doris FE; in this way, Doris BE is only responsible for fetching data from ES.
Add an `enable_keyword_sniff` configuration item when creating an external Elasticsearch table. It defaults to true, and it sniffs the `keyword` sub-field of an analyzed `text` field and returns the `json_path` which substitutes the original column name.
```
CREATE EXTERNAL TABLE `test` (
`k1` varchar(20) COMMENT "",
`create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
Note: `enable_keyword_sniff` defaults to "true".
Run this SQL:
```
select * from test where k1 = "wu yun feng"
```
Output predicate DSL:
```
{"term":{"k1.keyword":"wu yun feng"}}
```
Also in this PR, I removed the Elasticsearch version detection logic, since it is useless for now; it may be needed again in the future.
The main refactoring points are:
- Use a single get_absolute_tablet_path function instead of 3 independent functions
- Remove the meaningless return values of register_tablet and deregister_tablet
- Fix some typos and formatting
Support the BE plugin framework, including:
* Update PluginManager, support a plugin find method
* Support the builtin-plugin register method
* Plugin install/uninstall process
* PluginLoader:
  * dynamically install and check the plugin .so file
  * dynamically uninstall and check the plugin status
* PluginZip:
  * support downloading and extracting remote/local plugin .zip files
TODO:
* Support a PluginContext to transmit the necessary system variables when the plugin's init/close methods are invoked
* Add the entry for the BE dynamic plugin install/uninstall process, including:
  * The FE sends install/uninstall plugin statements (via RPC)
  * The FE meta update request with the plugin list information
  * The FE operation request (update/query) with the plugin (maybe not needed)
* Add a way to upload the plugin status
* Load already-installed plugins when BE starts
Related issues: #2663, #2828.
This CL supports loading data into specified temporary partitions.
```
INSERT INTO tbl TEMPORARY PARTITIONS(tp1, tp2, ..) ....;
curl .... -H "temporary_partition: tp1, tp, .. " ....
LOAD LABEL db1.label1 (
DATA INFILE("xxxx")
INTO TABLE `tbl2`
TEMPORARY PARTITION(tp1, tp2, ...)
...
```
NOTICE: this CL changes the FE meta version to 77.
There are 3 major changes in this CL:
## Syntax reorganization
Reorganized the syntax related to specifying partitions. Removed some redundant syntax
definitions, and unified the `specify-partitions` syntax under one syntax entry, as sketched below.
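Assumed shapes of the unified syntax; the table and partition names are illustrative:
```
-- Query a formal partition and a temporary partition with the same syntax entry.
SELECT * FROM tbl PARTITION (p1, p2);
SELECT * FROM tbl TEMPORARY PARTITION (tp1);

-- Load into temporary partitions through INSERT.
INSERT INTO tbl TEMPORARY PARTITION (tp1, tp2) SELECT * FROM src_tbl;
```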
## Meta refactor
In order to be able to support specifying temporary partitions,
I made some changes to the way the partition information in the table is stored.
Partition information is now organized as follows:
The following two maps are reserved in OlapTable for storing formal partitions:
```
idToPartition
nameToPartition
```
Use the `TempPartitions` class for storing temporary partitions.
All the partition attributes of the formal partition and the temporary partition,
such as the range, the number of replicas, and the storage medium, are all stored
in the `partitionInfo` of the OlapTable.
In `partitionInfo`, we use two maps to store the range of formal partition
and temporary partition:
```
idToRange
idToTempRange
```
Separate maps are used because the partition ranges of formal partitions and
temporary partitions may overlap; separate maps make it easier to check the partition range.
All partition attributes except the partition range are stored using the same map,
and the partition id is used as the map key.
## Method to get partition
A table may contain both formal and temporary partitions.
There are several methods to get the partition of a table.
Typically divided into two categories:
1. Get partition by id
2. Get partition by name
Depending on the requirement, the caller may want to obtain
a formal partition or a temporary partition. The methods are
described below so that the partition is obtained in the correct way.
1. Get by name
This type of request usually comes from a user with partition names. Such as
`select * from tbl partition(p1);`.
This type of request has clear information to indicate whether to obtain a
formal or temporary partition.
Therefore, we need to get the partition through this method:
`getPartition(String partitionName, boolean isTemp)`
To avoid modifying too much code, we keep `getPartition(String
partitionName)`, which is the same as:
`getPartition(partitionName, false)`
2. Get by id
This type of request usually means that the previous step has obtained
certain partition ids in some way,
so we only need to get the corresponding partition through this method:
`getPartition(long partitionId)`.
This method will try to get both formal partitions and temporary partitions.
3. Get all partition instances
Depending on the requirements, the caller may want to obtain all formal
partitions,
all temporary partitions, or all partitions. Therefore we provide 3 methods,
the caller chooses according to needs.
`getPartitions()`
`getTempPartitions()`
`getAllPartitions()`
2 Changes in this CL:
## Support multiple statements in one request like:
```
select 10; select 20; select 30;
```
ISSUE: #3049
To simply test this CL, you can use the mysql-client shell command tool:
```
mysql> delimiter //
mysql> select 1; select 2; //
+------+
| 1 |
+------+
| 1 |
+------+
1 row in set (0.01 sec)
+------+
| 2 |
+------+
| 2 |
+------+
1 row in set (0.02 sec)
Query OK, 0 rows affected (0.02 sec)
```
I added a new class called `OriginStatement.java` to save the original statement string together with an index. This class mainly serves the following case:
1. A user sends a multi-statement to a non-master FE:
`DDL1; DDL2; DDL3`
2. Currently we cannot separate the original string of a single statement from the multi-statement, so we have to forward the entire statement to the Master FE. Therefore I add an index in the forward request: `DDL1`'s index is 0, `DDL2`'s index is 1, ...
3. When the Master FE handles the forwarded request, it parses the entire statement, gets 3 DDL statements, and uses the `index` to pick the specified statement.
## Optimized the display of syntax errors
I have also optimized the display of syntax errors so that longer syntax errors can be fully displayed.
In a large-scale cluster, we may rolling-upgrade BEs. This patch adds a
column named 'Version' to the 'show backends;' command, as well as to the web page
'/system?path=//backends', to provide a way to check whether
any BE has not been upgraded.
This bug occurred when BE made a snapshot: the version required by FE had already been merged into the cumulative version, so the snapshot task could not complete even if it retried. To solve this problem, the BackupJob can be set to CANCELLED, and the user can then retry the job.
Fix #3057
Fixes #2892
IMPORTANT NOTICE: this CL makes incompatible changes to the V2 storage format; developers need to create new tables for testing.
This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page types
* make it easy to add new page types without sacrificing code reuse
* make it possible to use SIMD to speed up page decoding
Here we summarize the main code changes:
* Page and index metadata is redesigned, please see `segment_v2.proto`
* The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are now useless and removed.
* The type of the value ordinal is changed from `rowid_t` to 64-bit `ordinal_t`; this affects the ordinal index as well.
* A column's ordinal index is now implemented with IndexPage, the same as IndexedColumn.
* Zone map index is now implemented with IndexedColumn
The `TStatusCode` struct is used in all FEs and BEs. In order to
avoid errors when identifying status codes in RPC while upgrading Doris
(updating and restarting the servers one by one), we must ensure that each element
always has a fixed value.
If each element is not explicitly assigned a constant, the value of
each element is assigned starting from 0 in turn, which requires us to be very
careful when adding and removing elements, to avoid the same element being
recognized as a different value on different machines. I.e., new elements
can only be added at the end, and only elements at the end can be deleted.
Unfortunately, this implicit constraint is likely to be ignored by
programmers when coding, especially those who are new to Doris.
No functional change in this patch.
1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947.
The following SQL returns 0000, which is wrong; the result should be 1601:
```
select date_format('2020-02-19 16:01:12','%H%i');
```
2. Add a constant expression plan test to ensure that the FE constant expression computation result is right.
3. Remove the `castToInt` function in `FEFunctions`, which duplicates `CastExpr::getResultValue`.
4. Implement the `getNodeExplainString` method for `UnionNode`.
The logic chain is as follows:
1. `date_format(if(, NULL, `dt`), '%Y%m%d')` is used as the HASH_PARTITIONED exprs, which is not right; we should use the Agg intermediate materialized slot.
2. We don't use the Agg intermediate materialized slot as the HASH_PARTITIONED exprs, because
```
// the parent fragment is partitioned on the grouping exprs;
// substitute grouping exprs to reference the *output* of the agg, not the input
partitionExprs = Expr.substituteList(partitionExprs,
node.getAggInfo().getIntermediateSmap(), ctx_.getRootAnalyzer(), false);
parentPartition = DataPartition.hashPartitioned(partitionExprs);
```
the partitionExprs substitution failed.
3. The partitionExprs substitution failed because partitionExprs has a cast-to-date child, but the agg info's getIntermediateSmap has a cast-to-datetime child.
4. The cast-to-date or cast-to-datetime child exists because `TupleIsNullPredicate` inserts an `if` Expr. We don't have an `if date` fn, so Doris uses the `if int` Expr.
5. The `date` in the cast-to-date depends on the slot dt's date type. The `datetime` in the cast-to-datetime depends on the datetime arg type of the `date_format` function.
So we could fix this issue by making the `if` fn support the date type, or making the `date_format` fn support the date type.