doris

Author	SHA1	Message	Date
xinghuayu007	44325ae850	[Bug-Fix] Bucket shuffle join fails when both tables have no data (#5145 ) Bucket shuffle join is an algorithm for joining two tables. The left table is distributed by a column, and the right table sends its data to the left table for the join, which reduces network cost. However, when both tables are empty, bucket shuffle join fails. Related issue: #5144	2020-12-31 09:49:35 +08:00
xinghuayu007	2e95b1c389	[Enhancement]Make Colocate table join more load balanced (#5104 ) When two colocate tables are joined, the tablets that belong to the same bucket sequence are distributed to the same host so that the join can run locally. Previously, the host for each bucket sequence was chosen at random, which cannot balance the load of a single query. This patch therefore uses a round-robin strategy so that buckets are distributed evenly. For example, with 6 bucket sequences and 3 hosts, it is better to assign 2 bucket sequences to each host.	2020-12-31 09:47:06 +08:00
HuangWei	d7a584ac59	[Rebalancer] support partition rebalancer (#5010 ) RebalancerType can be configured via Config.rebalancer_type (BeLoad, Partition). PartitionRebalancer is based on TwoDimensionalGreedyAlgo. The two dimensions in Doris are cluster and partition. Only the replica count is considered, not the replica size. See #4845 for further details.	2020-12-31 09:41:38 +08:00
xinghuayu007	fd6fb90a5a	[Bug] Hit none partition cache, but hit range is still right (#5065 ) Doris supports two cache modes: sql_cache and partition_cache. sql_cache uses the SQL string as the key and caches the whole result. partition_cache splits the data by partition and caches each partition separately, so a query may hit only part of the partition_cache data. If a query hits the left part of the data, the hit range is left; if it hits the right part, the hit range is right; and if it hits all of the data, the hit range is full. When a query does not hit any partition cache, the algorithm still returns hit range right. It should return hit range none. Related issue: #5136	2020-12-31 09:40:31 +08:00
Zhengguo Yang	62604dfeac	Improve the processing logic of Load statement derived columns (#5140 ) * support transitive in load expr	2020-12-30 10:27:46 +08:00
924060929	cd865c95e0	Follower doesn't forward non-query statements to master repeatedly (#5160 ) Co-authored-by: lanhuajian <lanhuajian@sankuai.com>	2020-12-29 10:29:26 +08:00
xinghuayu007	f7a325a08f	[Refactor]Refactor function computeScanRangeAssignmentByColocate (#5097 )	2020-12-26 14:38:39 +08:00
Zhengguo Yang	279ae1cb75	Add fuzzy_parse option to speed up json import (#5114 ) Add a fuzzy_parse flag: if all objects in the json file have the same keys in the same order, we only need to parse the keys of the first row and can then use the index instead of the key to parse values	2020-12-25 09:19:42 +08:00
曹建华	cf3f830e9a	[Bug-Fix] Fix 'Malformed packet' error when desc OlapTable with Rollup (#4455 ) (#5115 ) Fix 'Malformed packet' error when desc OlapTable with Rollup #4455	2020-12-23 09:34:12 +08:00
Mingyu Chen	c57145b4c2	[Bug] Fix bug that routine load may lose some data (#5093 ) In the previous implementation, whether a subtask was committed or aborted, we tried to update the job progress, such as the consumed Kafka offset. Under normal circumstances, an aborted transaction does not consume any data and all progress is 0, so updating the progress leaves it unchanged. However, under high cluster load, a subtask may fail partway through execution on the BE side. In that case, although the task is aborted, part of the progress is still updated, which causes the next subtask to skip that data and results in data loss.	2020-12-23 09:33:52 +08:00
Lijia Liu	6673306fda	[DOC] fix toSql of ShowPartitionsStmt (#5070 )	2020-12-19 11:18:00 +08:00
ccoffline	5bf84814cc	[Doc] Improve broadcast instructions (#5048 )	2020-12-19 11:16:59 +08:00
HappenLee	b485c10d56	[ODBC] ODBC Catalog does not show password in 'show resource' (#5088 ) issue:#5087	2020-12-17 00:34:04 +08:00
EmmyMiao87	9864a5d818	[Enhance] Modify the error message when mv column is transformed from base column in agg family table (#5084 ) When a user tries to create a materialized view with an mv column that is transformed from an original column in an agg family table, Doris now throws the error message "The mv column of agg or uniq table cannot be transformed from original column" instead of "column not exists".	2020-12-17 00:33:27 +08:00
stdpain	ef15c5151c	[BUG] Fix colocate balance bug when no available BE (#5079 )	2020-12-17 00:32:42 +08:00
Mingyu Chen	b640991e43	[Enhance] Add profile for load job (#5052 ) Add viewable profile for broker load. Similar to the query profile, the user can submit the import job by setting the session variable is_report_success to true, and then view the running profile of the job on the FE web page for easy analysis and debugging.	2020-12-16 23:52:10 +08:00
EmmyMiao87	74bfd69595	[Bug] Forbid creating table with dynamic partition when FE.config dynamic_partition_enable=false (#5043 ) - There is an FE configuration called dynamic_partition_enable which turns the dynamic partition function on and off. When this configuration is false, no table supports dynamic partitioning. - But when a user tried to create a dynamic partition table, Doris did not check this parameter. As a result, the user could create a dynamic partition table normally, but Doris could not actually create partitions for it. - This PR checks this config when building the table. A dynamic partition table can be created only when dynamic_partition_enable is true. If the configuration is false, the command to create a dynamic partition table directly reports an error.	2020-12-16 23:44:20 +08:00
caiconghui	dfa413335f	[Heartbeat] Support fe heartbeat use thrift protocol to get stable response (#5027 ) This PR is to support fe master get fe heartbeat response by thrift protocol instead of http protocol.	2020-12-16 23:38:04 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using the Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement topN functions similar to those supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution. Element cardinality is the data cardinality, and the columns 20X, 50X, 100X give the value of space_expand_rate (20, 50, 100), which sets the number of counters in the Space-Saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6, 1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
Lijia Liu	ff4bd1223f	[Profile] Add cpu time cost in query audit (#5051 )	2020-12-13 22:22:15 +08:00
Lijia Liu	f847e22eeb	[AuditLog] Send queryId to master FE (#5064 ) To fix #4977, #4978 made the master FE return the queryId when a query finishes so that the non-master FE can audit it. But when the query fails (timeout), the client may not receive the right queryId for auditing. In this PR: the non-master FE sends the queryId to the master for querying; add more logging.	2020-12-13 22:05:35 +08:00
HappenLee	115d4332aa	[ODBC] Support ODBC Sink for insert into data to ODBC external table (#5033 ) issue:#5031 1. Support ODBC Sink for inserting data into ODBC external tables. 2. Support transactions for the ODBC sink to make sure inserts are atomic. 3. The document about ODBC sink has been modified	2020-12-13 21:53:27 +08:00
Zhengguo Yang	1267d6bf66	[Bug][MultiLoad] Fix multiload missing userinfo and rebase error (#5058 )	2020-12-11 12:01:32 +08:00
sduzh	e47fb502b2	[Compatibility] Support embedded quote in string literal (#5045 ) ``` mysql> select 'I''m a student'; +-----------------+ | 'I'm a student' | +-----------------+ | I'm a student | +-----------------+ mysql> select "I""m a student"; +-----------------+ | 'I"m a student' | +-----------------+ | I"m a student | +-----------------+ mysql> select 'I""m a student'; +------------------+ | 'I""m a student' | +------------------+ | I""m a student | +------------------+ mysql> select "I''m a student"; +------------------+ | 'I''m a student' | +------------------+ | I''m a student | +------------------+ ```	2020-12-10 21:34:06 +08:00
Zhengguo Yang	e278e0b3db	[Load] Support full StreamLoad feature in multiload (#4717 )	2020-12-10 09:37:18 +08:00
Mingyu Chen	2dbcb726ac	[Bug] Fix bug that failed to write meta image of load job (#5029 ) In #4863, we add userInfo in load job, but the userInfo must be analyzed so that it can be written to the image.	2020-12-08 10:00:42 +08:00
HappenLee	6021d6fc7f	[Performance Optimization] Remove push down conjuncts in olap scan node (#4999 ) Conjuncts are pushed down to the storage engine as much as possible, so the olap scan node does not need to filter data with the pushed-down conjuncts again. Fixes #4986	2020-12-06 08:50:08 +08:00
HappenLee	b954dfd82d	[Bug] Fix the bug that json load of Largeint and Decimal fails. (#4983 ) Use the json load param "num_as_string" to enable the flag kParseNumbersAsStringsFlag when parsing json data.	2020-12-06 08:49:30 +08:00
Xinyi Zou	b1b99ae884	[Function] Support Decimal to calculate variance and standard deviation (#4959 )	2020-12-06 08:49:01 +08:00
Hao Tan	42dd821021	[Refactor] Private constructor for singleton (#4956 )	2020-12-06 08:47:29 +08:00
Xinyi Zou	c5f780305e	[Repair] Add an option whether to allow the partition column to be NULL (#5013 )	2020-12-05 14:58:32 +08:00
songchuangyuan	1ae6de7117	[Enhance] Add "statistics" meta table and fix some mysql compatibility problem (#4991 ) 1. Add metadata table 'statistics' to store index information; 2. In the header information returned by mysql, the data type length is returned according to the actual type.	2020-12-03 09:38:18 +08:00
Yunfeng,Wu	bd558f1895	[Doris][Doris On ES] support prefix @ symbol for column name (#5006 ) Support `@` leading column name, such as: ``` CREATE EXTERNAL TABLE `es_10` ( `@k3` bigint(20) NULL COMMENT "", `@k1` boolean NULL COMMENT "", `@k2` varchar(20) NULL COMMENT "" ) ENGINE=ELASTICSEARCH COMMENT "ELASTICSEARCH" PROPERTIES ( "hosts" = "ip:port", "user" = "root", "password" = "", "index" = "data_type_test", "type" = "doc", "transport" = "http" ); ```	2020-12-03 09:33:49 +08:00
Mingyu Chen	5215727b45	[Function] Let "str_to_date" return correct type (#5004 ) The return type of str_to_date depends on whether the time part is included in the format. If included, it is DATETIME, otherwise it is DATE. If the format parameter is not constant, the return type will be DATETIME. The above judgment has been completed in the FE query planning stage, so here we directly set the value type to the return type set in the query plan. For example: A table with one column k1 varchar, and has 2 lines: "%Y-%m-%d" "%Y-%m-%d %H:%i:%s" Query: SELECT str_to_date("2020-09-01", k1) from tbl; Result will be: 2020-09-01 00:00:00 2020-09-01 00:00:00 Query: SELECT str_to_date("2020-09-01", "%Y-%m-%d"); Return type is DATE Query: SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s"); Return type is DATETIME	2020-12-03 09:33:26 +08:00
wangbo	204c15119f	[Bug] ConcurrentModificationException when finish transaction (#5003 )	2020-12-03 09:33:04 +08:00
qiye	b4c1eabe3f	[Bug] fix finished load jobs cost too much heap (#4993 ) Since the plan is retained in the task, if the task is not cleaned up, memory usage grows too large and causes a memory leak or OOM. When a load job finishes, there is no need to hold the tasks, which are the biggest memory consumers. Fixes #4992	2020-12-02 17:11:27 +08:00
Mingyu Chen	99404df8b2	[Bug][Compaction] Fix bug that output rowset is not deleted after compaction failure (#4964 ) This CL fixes 2 bugs: 1. When the compaction fails, we must explicitly delete the output rowset, otherwise the GC logic cannot process these rows. 2. Base compaction fails if the compaction process includes a delete version in SegmentV2, because the number of filtered rows is wrong.	2020-11-30 22:02:03 +08:00
Lijia Liu	27ef5b4d2c	[Bug] Use the right queryId to audit master only query in non master (#4978 ) Add queryId in TMasterOpResult. Audit it in non master FE.	2020-11-29 11:14:17 +08:00
Mingyu Chen	f944bf4d44	[Compile][Bug] Fix FE compilation bug (#4979 ) [Bug] Fix compile failed that cannot find symbol for variable scanRangeLength, Introduced by #4914 #4912	2020-11-28 16:19:54 +08:00
gengjun-git	f1248cb10e	[BUG] Fix colocate balance bug when there is decommissioned be (#4955 ) We should ignore decommissioned BEs when selecting BEs to balance group bucketSeq.	2020-11-28 09:59:25 +08:00
Yunfeng,Wu	2e9c8dda04	[Doris On ES][Bug-Fix] fix problem for selecting random be (#4972 ) 1. `Random().nextInt()` may return a negative value, which results in `java.lang.ArrayIndexOutOfBoundsException`; passing a positive bound avoids this problem. ``` int seed = new Random().nextInt(Short.MAX_VALUE) % nodesInfo.size() ``` 2. `EsNodeInfo[] nodeInfos = (EsNodeInfo[]) nodesInfo.values().toArray()` may lead to `java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.doris.external.elasticsearch.EsNodeInfo` in some JDK versions; passing an array of the original class type resolves this. ``` EsNodeInfo[] nodeInfos = nodesInfo.values().toArray(new EsNodeInfo[0]); ```	2020-11-28 09:57:44 +08:00
Zhengguo Yang	c6bc30e375	[Bug] Fix httpv2 append extra useless information in get_small_file api (#4953 )	2020-11-28 09:52:52 +08:00
HappenLee	55ce88da34	[Schema change] Support More column type in schema change (#4938 ) 1. Support modifying column type CHAR to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DATE, and converting TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE to a wider range of numeric types (#4937) 2. Use templates to refactor types.h and schema_change.cpp and delete redundant code.	2020-11-28 09:52:28 +08:00
ccoffline	3b56b601fb	Show fe commit hash on proc (#4943 ) Show FE's commit hash in SHOW PROC "/frontends" result.	2020-11-28 09:50:48 +08:00
xinghuayu007	0493eb172f	[Optimize] optimize host selection strategy (#4914 ) When a tablet selects which replica's host should execute the scan operation, it uses a `round-robin` strategy to balance load. `minAssignedBytes` is the current load of one host. If a backend is temporarily not alive, one of the other replicas is chosen at random, but the dead backend's `minAssignedBytes` is not decreased and the new choice's `minAssignedBytes` is not increased. This makes the recorded load of the backends incorrect.	2020-11-28 09:48:13 +08:00
xinghuayu007	68db176013	[Refactor]Fix typos in code (#4950 ) * fix typo in udf: replace function Co-authored-by: wangxixu <wangxixu@xiaomi.com>	2020-11-27 12:16:45 +08:00
gengjun-git	37a6731244	[BUG] Fix Colocate table balance bug (#4936 ) Fix bug that colocation group is always in unstable status.	2020-11-22 21:22:44 +08:00
HappenLee	584b33f95b	[Bug] Fix the bug that NULL is not shown in CTE statement. (#4932 ) All columns created in an inlineView are set to `allowNull = false`, which causes `NULL` data in a CTE to be ignored during processing. So we should set allowNull on columns in the inlineView to ensure the query is correct.	2020-11-22 20:58:03 +08:00
ccoffline	c28769c512	[Bug] Avoid partition prune if predicate is not with SlotRef (#4833 ) (#4921 )	2020-11-22 20:49:20 +08:00
xinghuayu007	4f7c6da1f5	[Refactor] Refactor function getScanRangeLength (#4912 ) getScanRangeLength always returns 1, so there is no need to maintain a function like this.	2020-11-22 20:44:11 +08:00
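
The round-robin strategy described in commit 2e95b1c389 (#5104) can be illustrated with a minimal sketch. This is not the Doris implementation: the class name `RoundRobinBuckets`, the method `assign`, and the host names are hypothetical and chosen only for illustration. It shows how assigning bucket sequences in rotation gives 6 bucket sequences on 3 hosts exactly 2 buckets per host, as in the example from the commit message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names are not taken from the Doris code base.
public class RoundRobinBuckets {

    // Assigns bucket sequences 0..bucketCount-1 to the hosts in rotation,
    // so that per-host bucket counts differ by at most one.
    static Map<String, List<Integer>> assign(int bucketCount, List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts available");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String host : hosts) {
            result.put(host, new ArrayList<>());
        }
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            String host = hosts.get(bucket % hosts.size());
            result.get(host).add(bucket);
        }
        return result;
    }

    public static void main(String[] args) {
        // 6 bucket sequences over 3 hosts: each host receives 2 buckets.
        System.out.println(assign(6, Arrays.asList("be1", "be2", "be3")));
    }
}
```

With a random choice per bucket, one host could receive most of the buckets for a single query; the rotation above bounds the imbalance at one bucket.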

263 Commits