doris

Author	SHA1	Message	Date
HuangWei	d46b57fae4	[Thirdparty] Use the bundle source of zstd when build_arrow (#5179 ) use bundle source of zstd & double-conversion, instead of system.	2021-01-04 09:33:40 +08:00
Skysheepwang	0d3564c2e1	[Feature] Implementation of histogram metric (#5148 ) #5146 Add histogram metrics to util/metrics.h. The histogram data structure is implemented in util/histogram.h and can also be used in other situations that need a histogram. Unit tests are added as well.	2021-01-04 09:32:46 +08:00
gengjun-git	d5768cf7d9	[Bug] Fix create colocate table bug (#5139 ) Fix #5138 1. Fix bug when creating a colocate table with an empty partition. 2. Move groupName2Id.put(fullGroupName, groupId) to the end to avoid inconsistent state when an exception is thrown. 3. Do not check whether backendsPerBucketSeq is empty in replayAddTableToGroup(), because backendsPerBucketSeq can be empty for a colocate table with an empty partition.	2021-01-04 09:32:05 +08:00
HappenLee	5807413ad0	[UT] Add UT for column predicate of ColumnBlock (#5123 ) Add UT for column predicate of ColumnBlock	2021-01-04 09:29:30 +08:00
HuangWei	17d939b789	[Bug] Fix scanner threads heap-use-after-free (#5111 ) Scanner threads may still be running and using the member variables of OlapScanNode after the OlapScanNode has already been destroyed. We can make `_running_thread` the last accessed member variable, and `transfer_thread` needs to wait for `_running_thread==0`. After `transfer_thread` is joined, `OlapScanNode::close()` can continue.	2021-01-04 09:28:51 +08:00
lihuigang	05ac7fcd4a	[Function] Add BE udf bitmap_xor (#5098 ) This function returns the XOR result of two input bitmaps.	2021-01-04 09:27:46 +08:00
blueChild	a8b8c4760c	[Doc] Fix some spelling mistakes and default value mistakes in document (#5180 )	2021-01-03 15:45:56 +08:00
HappenLee	f2cf8d2c5e	[Bug-Fix] Fix the bug of `PERCENTILE_APPROX` return error result `nan` and add `PERCENTILE_APPROX` UT (#5172 )	2021-01-03 15:45:22 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092 ) This patch mainly does the following: - Support #5086 - Refactor ColumnRangeValue to support containing null	2021-01-03 15:45:07 +08:00
xinghuayu007	44325ae850	[Bug-Fix] Bucket shuffle join fails when both tables have no data (#5145 ) Bucket shuffle join is an algorithm for joining two tables. The left table is distributed by a column, and the right table sends its data to the left table for the join, which reduces network cost. But when both tables are empty, bucket shuffle join fails. Related Issue: #5144	2020-12-31 09:49:35 +08:00
blueChild	bebbc27a83	[Thirdparty] Fix the DataTables.zip download issue (#5128 ) Modify the third-party library DataTables.zip download url from dt-1.10.22 to dt-1.10.23, resolved the download failure issue.	2020-12-31 09:47:43 +08:00
xinghuayu007	2e95b1c389	[Enhancement] Make colocate table join more load balanced (#5104 ) When two colocate tables are joined, the tablets that belong to the same bucket sequence are distributed to the same host so that the join can run locally. Previously the host for each bucket sequence was chosen at random, which cannot balance the load of a single query. Therefore, this patch uses a round-robin strategy so that buckets are distributed evenly. For example, with 6 bucket sequences and 3 hosts, it is better to assign 2 bucket sequences to every host.	2020-12-31 09:47:06 +08:00
HuangWei	d7a584ac59	[Rebalancer] support partition rebalancer (#5010 ) RebalancerType could be configured via Config.rebalancer_type(BeLoad, Partition). PartitionRebalancer is based on TwoDimensionalGreedyAlgo. Two dims of Doris should be cluster & partition. And we only consider about the replica count, do not consider replica size. #4845 for further details.	2020-12-31 09:41:38 +08:00
xinghuayu007	fd6fb90a5a	[Bug] No partition cache is hit, but hit range is still right (#5065 ) Doris supports two cache modes: sql_cache and partition_cache. sql_cache uses the SQL string as the key and caches the whole result. partition_cache splits the data by partition and caches each partition separately, so a query may hit only part of the partition_cache data. If a query hits the left part of the data, the hit range is left; if it hits the right part, the hit range is right; and if it hits all of the data, the hit range is full. When a query does not hit any partition cache, the algorithm still returns hit range right; it should return hit range none. Related issue: #5136	2020-12-31 09:40:31 +08:00
Zhengguo Yang	62604dfeac	Improve the processing logic of Load statement derived columns (#5140 ) * support transitive in load expr	2020-12-30 10:27:46 +08:00
924060929	cd865c95e0	Follower don't forward non-query statement to master repeatedly (#5160 ) Co-authored-by: lanhuajian <lanhuajian@sankuai.com>	2020-12-29 10:29:26 +08:00
HuangWei	5e1a80bb22	[UT][Bug] fix LOOP_LESS_OR_MORE (#5157 ) This bug introduced by #5131. When AllowSlowTests() is true, we should loop more.	2020-12-29 09:48:19 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) Some unit tests contain long loops and sleeps, so running all unit tests takes a very long time, especially in TSAN mode. This patch speeds up unit tests by shortening long loops and sleeps; in my environment all unit tests finish in 1 minute. This is useful for basic functional unit testing. You can switch to run in this mode by setting a new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
HuangWei	85076b5678	[UT] fix test_env & add a sample (#5085 ) Easily create tests.	2020-12-27 22:14:30 +08:00
xinghuayu007	f7a325a08f	[Refactor]Refactor function computeScanRangeAssignmentByColocate (#5097 )	2020-12-26 14:38:39 +08:00
wangbo	d9f1ffe9a0	(#5151 ) An already merged rowset should skip window check (#5152 )	2020-12-26 11:40:44 +08:00
HuangWei	16d52651f3	[Docs] some brpc configs can't be modified at runtime (#5137 ) brpc_max_body_size & brpc_socket_max_unwritten_bytes can't be modified at runtime. Only flags which have (R)(has_validator_fn) can.	2020-12-25 15:31:14 +08:00
Zhengguo Yang	279ae1cb75	Add fuzzy_parse option to speed up json import (#5114 ) Add a fuzzy_parse flag: if all objects in the JSON file have the same keys in the same order, we only need to parse the first row, and can then use the index instead of the key to parse values	2020-12-25 09:19:42 +08:00
Skysheepwang	86e40dd3e5	Fix old tablet inserting bug (#5113 ) #4996 When BE is restarting, an older tablet may have been added to the garbage collection queue but not deleted yet. In this case, since the data_dirs are loaded in parallel, a later loaded tablet may be older than a previously loaded one, which should not be treated as a failure. Note that the _add_tablet_unlocked() method is also called when creating a new tablet. In that case, the changes in this pull request are not reached, so there is no effect on the tablet creation process.	2020-12-24 15:20:54 +08:00
令狐少侠	80209ef1b6	Update outfile to support cos.md (#5129 ) update doc to add how to export query result on cos	2020-12-23 20:21:10 +08:00
令狐少侠	7199bcc88b	Update outfile(en) to support cos.md (#5130 ) Export query result to `COS` (Tencent Cloud Object Storage)	2020-12-23 15:39:45 +08:00
曹建华	cf3f830e9a	[Bug-Fix] Fix 'Malformed packet' error when desc OlapTable with Rollup (#4455 ) (#5115 ) Fix 'Malformed packet' error when desc OlapTable with Rollup #4455	2020-12-23 09:34:12 +08:00
Mingyu Chen	c57145b4c2	[Bug] Fix bug that routine load may lose some data (#5093 ) In the previous implementation, whether a subtask is in commit or abort state, we try to update the job progress, such as the consumed Kafka offset. Under normal circumstances, an aborted transaction does not consume any data and all progress is 0, so even if we update the progress, it remains unchanged. However, under high cluster load, the subtask may fail halfway through execution on the BE side. In that case, although the task is aborted, part of the progress is updated. This causes the next subtask to skip that data, resulting in data loss.	2020-12-23 09:33:52 +08:00
Zhengguo Yang	5f2868667a	[Script] Check if ninja exist in build.sh (#5099 ) Add a check of ninja exist to build.sh	2020-12-19 11:18:50 +08:00
Yingchun Lai	176dcf8bd9	[Trace] Add trace for create tablet tasks (#5091 ) Add trace for create tablet tasks; it is a useful tool for admins to find the bottleneck when creating tablets times out. For example, an admin could enlarge 'tablet_map_shard_size' when the 'got tablets shard lock' procedure is found to cost too much time.	2020-12-19 11:18:12 +08:00
Lijia Liu	6673306fda	[DOC] fix toSql of ShowPartitionsStmt (#5070 )	2020-12-19 11:18:00 +08:00
xinghuayu007	9ddf434f6b	[Bug-Fix] Fix partition cache match bug (#5060 ) When partitions are not cached contiguously, a range query may fail. For example, if partition keys 20201011 and 20201013 are cached but the range query is between 20201011 and 20201013, the query will not hit the cache. issue:#5059	2020-12-19 11:17:44 +08:00
ccoffline	5bf84814cc	[Doc] Improve broadcast instructions (#5048 )	2020-12-19 11:16:59 +08:00
Mingyu Chen	984807910f	[Bug] Fix bug when delete condition is null but zonemap is not null (#5109 ) If a column does not have any null value and a delete operation with "where k1 is null" is executed on it, BE may crash. This bug was introduced by #5030	2020-12-18 21:39:52 +08:00
Mingyu Chen	3d4b2cb1ae	[Bug] Fix tablet shared ptr circular reference causing the tablet not to be cleared (#5100 ) Regardless of whether the tablet is submitted for compaction or not, we need to call 'reset_compaction' to clean up the base_compaction or cumulative_compaction objects in the tablet, because these two objects store the tablet's own shared_ptr. If it is not cleaned up, the reference count of the tablet will always be greater than 1, thus cannot be collected by the garbage collector. (TabletManager::start_trash_sweep) This bug is introduced from #4891	2020-12-18 21:17:18 +08:00
Yingchun Lai	f6881d2f7b	[Bug] Fix coredump bug when create new tablets (#5089 ) A bug may cause BE to coredump when creating a tablet; access to the tablet_set of a data dir should be protected by a lock.	2020-12-17 00:34:31 +08:00
HappenLee	b485c10d56	[ODBC] ODBC Catalog do not show password in 'show resource' (#5088 ) issue:#5087	2020-12-17 00:34:04 +08:00
EmmyMiao87	9864a5d818	[Enhance] Modify the error message when mv column is transformed from base column in agg family table (#5084 ) When user wants to create materialized view with a mv column which is transformed from original column in agg family table, Doris will throw a new error message "The mv column of agg or uniq table cannot be transformed from original column" instead of "column not exists".	2020-12-17 00:33:27 +08:00
stdpain	ef15c5151c	[BUG] Fix colocate balance bug when no available BE (#5079 )	2020-12-17 00:32:42 +08:00
Mingyu Chen	b640991e43	[Enhance] Add profile for load job (#5052 ) Add viewable profile for broker load. Similar to the query profile, the user can submit the import job by setting the session variable is_report_success to true, and then view the running profile of the job on the FE web page for easy analysis and debugging.	2020-12-16 23:52:10 +08:00
EmmyMiao87	74bfd69595	[Bug] Forbid creating table with dynamic partition when FE.config dynamic_partition_enable=false (#5043 ) - There is an FE configuration called dynamic_partition_enable which turns the dynamic partition function on and off. When this configuration is false, no table supports dynamic partitioning. - But when the user tried to create a dynamic partition table, Doris did not check this parameter. As a result, the user could create a dynamic partition table normally, but Doris could not actually create partitions for it. - This PR checks this config when building the table. A dynamic partition table can be created only when the dynamic_partition_enable configuration is true. If the configuration is false, the command to create a dynamic partition table directly reports an error.	2020-12-16 23:44:20 +08:00
caiconghui	dfa413335f	[Heartbeat] Support fe heartbeat use thrift protocol to get stable response (#5027 ) This PR is to support fe master get fe heartbeat response by thrift protocol instead of http protocol.	2020-12-16 23:38:04 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a UDAF for approximate topn using the Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement topN functions similar to those supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution, where Element cardinality is the data cardinality and 20X, 50X, ... are space_expand_rate values of 20, 50, ..., which set the number of counters in the Space-Saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6, 1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
stdpain	6afa14cda7	[Bug] Fix Memory Leak in Json Load (#5073 ) fix json load memory leak #5069	2020-12-15 22:55:47 +08:00
Mingyu Chen	81c7c0360e	[Bug] Fix a core dump of counter in BE (#5078 ) Introduced by PR #5051. As @liutang123 said, when PlanFragmentExecutor is destructed, it will call `close -> ExecNode::close -> OlapScanNode::close`. OlapScanNode will wait for `_transfer_thread`. `_transfer_thread` will wait for all OlapScanner processing to complete. OlapScanner is processed by the scanner thread. When the last scanner processing is completed, `_transfer_thread` will break out of the loop, and PlanFragmentExecutor will continue to destruct. And if it is completed, its RuntimeProfile::Counter will also be destructed. At this time, the ScopedTimer in the Scan thread may still use this Counter when it is destructed. So we must make sure that the timer is deconstructed before deconstructing the runtime profile.	2020-12-15 09:33:38 +08:00
HuangWei	49f26f4413	[UT] cleanup storage engine creation in tablet_mgr_test etc (#5077 ) Mistakenly use the string '_engine_data_path' as the path, actually the storage engine is not open, so option/path is needless. Cleanup it to avoid any doubt about the file path management.	2020-12-15 09:30:32 +08:00
HappenLee	0a0e46fd53	[Bug] Fix the bug that where condition a in ('A', 'B', 'V') and a in ('A') returns a wrong result (#5072 ) and refactor ColumnRangeValue and OlapScanNode. This patch mainly does the following: - Fix issue #5071 - Change type_min in ColumnRangeValue to static - Add class type_limit to make the code clearer - Refactor the function normalize_in_and_eq_predicate	2020-12-15 09:29:10 +08:00
Mingyu Chen	90e7f7005e	[Bug] Fix bug that query multi mysql external table with union will get incomplete result (#5067 ) The `eos` flag should be reset to false after opening next child of union node.	2020-12-15 09:28:39 +08:00
Zhengguo Yang	193db4207e	[enhancement] improve performance of json load (#5055 ) * improve performance of json load	2020-12-15 09:27:51 +08:00
Zhengguo Yang	2e5126cc09	support ninja build system (#5076 ) Add ninja build system support. If ninja is installed, you can build BE with ninja using bash build.sh --be --ninja. The ninja build is faster than make	2020-12-15 09:27:20 +08:00
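
The round-robin strategy described in the message of #5104 (6 bucket sequences spread over 3 hosts, 2 per host) can be illustrated with a small sketch. This is not the Doris FE code: the function name `assign_buckets_round_robin` and the host names are hypothetical, and the real implementation may order hosts and buckets differently.

```python
from collections import Counter


def assign_buckets_round_robin(bucket_seqs, hosts):
    """Assign each bucket sequence to a host in round-robin order.

    Hypothetical helper for illustration only; not part of Doris.
    """
    if not hosts:
        raise ValueError("no available hosts")
    # The i-th bucket sequence goes to host number (i mod number of hosts).
    return {seq: hosts[i % len(hosts)] for i, seq in enumerate(bucket_seqs)}


# 6 bucket sequences and 3 hosts, as in the commit message example.
assignment = assign_buckets_round_robin(range(6), ["host_a", "host_b", "host_c"])
per_host = Counter(assignment.values())
for host, count in sorted(per_host.items()):
    print(host, count)
```

With a random choice, one host could receive several bucket sequences of the same query while another receives none; the modulo assignment bounds the difference between any two hosts to at most one bucket sequence.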

13073 Commits
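
The Space-Saving algorithm named in the message of #4803 can be sketched in a few lines. This is a generic textbook version under stated assumptions, not the Doris UDAF: the function name `space_saving_topn` and the `capacity` parameter (the number of counters kept) are hypothetical, and how Doris derives the counter number from space_expand_rate is not shown here.

```python
def space_saving_topn(stream, capacity):
    """Approximate the most frequent items using at most `capacity` counters.

    Generic Space-Saving sketch for illustration; not the Doris implementation.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Already monitored: just count it.
            counters[item] += 1
        elif len(counters) < capacity:
            # Free slot available: start monitoring the item.
            counters[item] = 1
        else:
            # No free slot: evict the item with the smallest count and let
            # the new item inherit that count plus one (may overestimate).
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    # Highest estimated frequency first.
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)


stream = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
result = space_saving_topn(stream, capacity=3)
print(result)
```

Because an evicted counter's value is inherited by the newcomer, counts for rare items can be overestimated, while frequent items stay near the top. Keeping more counters reduces this error, which is consistent with the accuracy figures in the commit message improving as the expand rate grows.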