Commit Graph

3279 Commits

Author SHA1 Message Date
7db8841ae2 [Feature][ResourceTag] Support Resource Tag (#6203)
#5902 
This CL mainly changes:

1. Support setting tags for BE nodes:

    ```
    alter system add backend "1272:9050, 1212:9050" properties("tag.location": "zoneA");
    alter system modify backend "1272:9050, 1212:9050" set ("tag.location": "zoneB");
    ```
    And for compatibility, all BE nodes will be assigned a "default" tag when upgrading: `"tag.location": "default"`.

2. Create a new class `ReplicaAllocation` to replace the previous `replication_num`.

    `ReplicaAllocation` represents the allocation of the replicas of a tablet. It contains a map from
    Tag to the number of replicas.
    For example, if a user sets a table's replication num to 3, it will be converted to a `ReplicaAllocation`
    like `"tag.location.default" : "3"`, which means the tablet will have 3 replicas, all of them
    allocated on BE nodes with tag "default".
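
    As a rough illustration (a hypothetical, simplified sketch, not the actual Doris class), a replica allocation is conceptually just a map from tag to replica count:

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch: a ReplicaAllocation-like structure mapping a
    // location tag to the number of replicas placed on that tag.
    public class ReplicaAllocationSketch {
        private final Map<String, Integer> tagToReplicaNum = new HashMap<>();

        public void put(String tag, int replicaNum) {
            tagToReplicaNum.put(tag, replicaNum);
        }

        // Total replica count across all tags.
        public int totalReplicaNum() {
            return tagToReplicaNum.values().stream().mapToInt(Integer::intValue).sum();
        }

        public static void main(String[] args) {
            ReplicaAllocationSketch alloc = new ReplicaAllocationSketch();
            // "replication_num" = "3" is equivalent to a single "default" tag with 3 replicas.
            alloc.put("default", 3);
            System.out.println(alloc.tagToReplicaNum + ", total = " + alloc.totalReplicaNum());
        }
    }
    ```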

3. Support create table with replication allocation:

    ```
    CREATE TABLE example_db.table_hash
    (
    k1 TINYINT
    )
    DISTRIBUTED BY HASH(k1) BUCKETS 32
    PROPERTIES (
        "replication_allocation"="tag.location.zone1:1, tag.location.zone2:2"
    );
    ```
    
    Also supports setting replica allocation for dynamic tables and modifying replica allocation at runtime.

    For compatibility, users can still set "replication_num" = "3", and it will be automatically converted to:
    `"replication_allocation"="tag.location.default:3"`

4. Support tablet repair and balance based on Tag

    1. For tablets of non-colocate tables, most of the logic is the same as before,
       but when selecting the destination node for a clone task, the tag of the node is considered.
       If no node with the required tag exists, the tablet cannot be repaired.
       Similarly, while ensuring that the replicas remain complete, tablets will be
       reallocated according to their tags, or the replicas will be balanced.

       Balancing is performed separately within each resource group.

    2. For tablets of colocate tables, the backend sequence of the buckets is split by tag.
       For example, if the replica allocation is "tag.location.zone1:1, tag.location.zone2:2",

       and zone1 has 2 BEs: A, B; zone2 has 3 BEs: C, D, F,

       there will be 2 backend sequences: one for zone1 and the other for zone2.
       One possible pair of sequences is:


       zone1: [A]   [B]   [A]   [B]
       zone2: [C, D][D, F][F, C][C, D]
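
       A toy sketch of generating such per-tag sequences round-robin (hypothetical and much simpler than the real colocate allocation logic; the exact output may differ):

       ```java
       import java.util.ArrayList;
       import java.util.List;

       // Toy example: for each tag, assign `replicaNum` backends to every bucket
       // by walking that tag's backend list round-robin.
       public class ColocateSequenceSketch {
           static List<List<String>> buildSequence(List<String> backends, int replicaNum, int bucketNum) {
               List<List<String>> sequence = new ArrayList<>();
               int cursor = 0;
               for (int bucket = 0; bucket < bucketNum; bucket++) {
                   List<String> slot = new ArrayList<>();
                   for (int r = 0; r < replicaNum; r++) {
                       slot.add(backends.get(cursor % backends.size()));
                       cursor++;
                   }
                   sequence.add(slot);
               }
               return sequence;
           }

           public static void main(String[] args) {
               // zone1 places 1 replica per bucket, zone2 places 2 replicas per bucket.
               System.out.println("zone1: " + buildSequence(List.of("A", "B"), 1, 4));
               System.out.println("zone2: " + buildSequence(List.of("C", "D", "F"), 2, 4));
           }
       }
       ```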

5. Support setting tags for users and restricting execution nodes by tag:


    ```
    set property for 'cmy' 'resource_tags.location' : 'zone1, zone2';
    ```

    After setting this, the user 'cmy' can only query data stored on backends with tag zone1 or zone2,
    and queries can only be executed on backends with tag zone1 or zone2.


    For compatibility, after upgrading, the property `resource_tags.location` will be empty,
    so that users can still query data stored on any backend.
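
    A hedged sketch of the idea (hypothetical names): only backends whose tag is in the user's resource tag set are usable, and an empty set means no restriction:

    ```java
    import java.util.List;
    import java.util.Set;
    import java.util.stream.Collectors;

    // Illustration only: filter candidate backends by the user's resource tags.
    // An empty tag set (the post-upgrade default) means "no restriction".
    public class ResourceTagFilterSketch {
        record Backend(String host, String locationTag) {}

        static List<Backend> allowedBackends(List<Backend> all, Set<String> userTags) {
            if (userTags.isEmpty()) {
                return all;
            }
            return all.stream()
                    .filter(be -> userTags.contains(be.locationTag()))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<Backend> backends = List.of(
                    new Backend("be1", "zone1"),
                    new Backend("be2", "zone2"),
                    new Backend("be3", "zone3"));
            // User 'cmy' is restricted to zone1 and zone2: be3 is filtered out.
            System.out.println(allowedBackends(backends, Set.of("zone1", "zone2")));
            // Empty set: all backends remain usable.
            System.out.println(allowedBackends(backends, Set.of()));
        }
    }
    ```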

6. Modify the FE unit test framework so that multiple backends with different mocked IPs can be created in unit tests.

    This helps us easily test distributed cases such as query, tablet repair, and balance.

The document will be added in another PR.

Also fix a bug described in #6194
2021-09-04 10:59:35 +08:00
df54b34f98 [Catalog] Enforce null check at Catalog.getDb and Database.getTable (#6416)
Fixes #5378, #5391, #5688, #5973, #6155 and all replay NPEs. All replay methods can now throw MetaNotFoundException, which is caught and logged as a warning for potentially inconsistent metadata cases.

This tries to establish a clear convention for future developers to check for null.
2021-09-03 13:34:49 +08:00
79fd117d60 Update load-json-format.md (#6546)
change stripe_outer_array to strip_outer_array
2021-09-02 16:08:09 +08:00
6ac0ab6b29 fix(sparkload): bitmap deep copy in or operator (#6480)
* fix(sparkload): bitmap deep copy in `or` operator

Fix multiple rollups holding the same reference to a BitmapValue, which may be updated repeatedly.

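A hedged illustration of the fix, using org.roaringbitmap.RoaringBitmap in place of Doris's own bitmap class (simplified, not the actual spark-load code):

```java
import org.roaringbitmap.RoaringBitmap;

// Illustration only: when merging a shared bitmap into several rollups,
// OR into a private deep copy instead of mutating the shared instance.
public class BitmapDeepCopyExample {
    public static void main(String[] args) {
        // Buggy pattern: both rollups keep the same reference, so an in-place
        // or() on one rollup silently changes the other as well.
        RoaringBitmap shared = RoaringBitmap.bitmapOf(1, 2, 3);
        RoaringBitmap rollupA = shared;
        RoaringBitmap rollupB = shared;
        rollupA.or(RoaringBitmap.bitmapOf(4));
        System.out.println(rollupB.contains(4)); // true: rollupB was also modified

        // Fixed pattern: deep-copy before or(), so each rollup owns its bitmap.
        RoaringBitmap base = RoaringBitmap.bitmapOf(1, 2, 3);
        RoaringBitmap rollupC = base.clone();
        RoaringBitmap rollupD = base.clone();
        rollupC.or(RoaringBitmap.bitmapOf(4));
        System.out.println(rollupD.contains(4)); // false: rollupD is unaffected
    }
}
```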

Co-authored-by: weixiang <weixiang06@meituan.com>
2021-09-02 12:15:02 +08:00
57199955d6 [Compaction][ThreadPool]Support adjust compaction threads num at runtime (#5781)
* adjust thread number of compaction thread pool dynamically

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-09-02 10:01:44 +08:00
d8cde8c044 (#6454) Remove useless code for Segment V2 (#6455) 2021-09-02 09:59:21 +08:00
7a15e583a7 [Feature]Support functions of json_array, json_object, json_quote (#6504) 2021-09-02 09:59:02 +08:00
Pxl
4dd610c28d [Feature] Support for storage layer benchmark (#6506)
* add benchmark tool
2021-09-02 09:57:19 +08:00
9f7d4cf741 [BUG] fix bugs with string type (#6538)
* Fix bugs with the string type:
1. string is not supported with agg type min/max
2. agg_update with a large string may core dump
3. StringVal with a large string may core dump
4. string is not supported as a partition key
2021-09-01 15:59:55 +08:00
e01a845a4a [Doc] Update stream-load-manual.md (#6524)
The original description of stream load column order transformation was unclear, and a user struggled with this part for a long time, so I modified some expressions to make it clearer.
2021-09-01 13:28:25 +08:00
e795c7d2cc [Community] Add new template for issues (#6534)
* [Community] Add new template for issues

Inspired by Apache SkyWalking

https://github.com/apache/skywalking/issues/new?assignees=&labels=bug&template=bug-report.yml&title=%5BBug%5D+
2021-09-01 09:59:44 +08:00
a949dcd9f6 [Feature] Create table like clause support copy rollup (#6475)
for issue #6474

```sql
create table test.table1 like test.table with rollup r1,r2 -- copy some rollups

create table test.table1 like test.table with rollup all -- copy all rollups

create table test.table1 like test.table -- only copy the base table
```
2021-08-31 20:33:26 +08:00
138e7e896d Fix min(string) Unable to find symbol (#6531) 2021-08-31 11:19:13 +08:00
d5d8316ff3 [Optimize][Clone] Take version count into consideration when choosing src replica for clone task (#6513)
Fix #6512 

If a tablet is missing a replica, a clone task is executed to restore the missing replica from a healthy one. Previously, the src replica selector randomly chose a healthy replica as the src replica.

It is better to choose the healthy replica with the minimum version count as the src replica, so that repetitive compaction tasks can be avoided. In addition, a replica with a smaller version count is better for query performance.
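
A simplified sketch of that selection policy (hypothetical names, not the actual tablet scheduler code):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustration only: among healthy replicas, prefer the one with the
// smallest version count as the source of the clone task.
public class CloneSrcSelectorSketch {
    record Replica(long backendId, boolean healthy, int versionCount) {}

    static Optional<Replica> chooseSrc(List<Replica> replicas) {
        return replicas.stream()
                .filter(Replica::healthy)
                .min(Comparator.comparingInt(Replica::versionCount));
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica(10001L, true, 820),
                new Replica(10002L, true, 95),
                new Replica(10003L, false, 40));
        // Picks backend 10002: healthy and fewest versions.
        System.out.println(chooseSrc(replicas));
    }
}
```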
2021-08-30 18:52:41 +08:00
7324f4b0ae [Bug] Regularly clean up old DeleteInfos in the DeleteHandler (#6448)
fix #6447
1. The FE master regularly triggers the remove operation.
2. After the master removes a DeleteInfo, the removal is synchronized to the Followers through the edit log.
3. A DeleteInfo is cleaned up when its age exceeds the `delete_info_keep_max_second` configuration.
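
A rough sketch of the age-based cleanup rule (hypothetical names and config handling):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: drop DeleteInfos whose age exceeds
// delete_info_keep_max_second, as the FE master would do periodically.
public class DeleteInfoCleanupSketch {
    record DeleteInfo(String tableName, long createTimeMs) {}

    static List<DeleteInfo> removeExpired(List<DeleteInfo> infos, long nowMs, long keepMaxSecond) {
        List<DeleteInfo> kept = new ArrayList<>();
        for (DeleteInfo info : infos) {
            long ageSecond = (nowMs - info.createTimeMs()) / 1000;
            if (ageSecond <= keepMaxSecond) {
                kept.add(info);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<DeleteInfo> infos = List.of(
                new DeleteInfo("t1", now - 10L * 24 * 3600 * 1000),  // 10 days old
                new DeleteInfo("t2", now - 3600L * 1000));           // 1 hour old
        // With a 3-day retention, only t2 is kept.
        System.out.println(removeExpired(infos, now, 3L * 24 * 3600));
    }
}
```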
2021-08-30 18:52:18 +08:00
abbc9202af support routine load isolation_level read_committed (#6191)
Co-authored-by: Geoffrey <gaofeng01@rd.netease.com>
2021-08-30 17:22:08 +08:00
0393c9b3b9 [Optimize] Support send batch parallelism for olap table sink (#6397)
* Support send batch parallelism for olap table sink

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-30 11:03:09 +08:00
Pxl
5eed1f897a [Document] update docker env version to 1.3.1 (#6517)
* update docker env version
2021-08-30 11:01:39 +08:00
a2a13dadba [Optimize] Make light schema change complete faster under concurrent conditions (#6292)
* [Optimize] Make schema change complete faster under concurrent conditions

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-29 09:41:56 +08:00
dedb57f87e [Enhancement] Modify the method of calculating compaction score (#6252)
* Optimize the calculation method of compaction score to lower the priority of rowsets with 0 segments

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-08-27 11:10:41 +08:00
ace21ebf83 Doris-spark connector examples (#6485)
* doris spark connector examples

* add usage documentation and license

Co-authored-by: shengy <whyMy2017>
2021-08-27 10:57:11 +08:00
3f2fdd236f Add scan thread token (#6443) 2021-08-27 10:56:17 +08:00
4cfebc35a7 Flink reads multiple data sources to doris (#6490)
* Flink reads multiple data sources to doris

Co-authored-by: caol <caol@shuhaisc.com>
2021-08-27 10:55:53 +08:00
a7b8d110a0 Spark 2.x and 3.x version compilation instructions (#6503)
Spark 2.x and 3.x version compilation instructions
2021-08-27 10:55:29 +08:00
7235d86331 [Bug] Support show load for insert 0 row (#6510)
* support show load for insert 0 row

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-08-27 10:55:02 +08:00
bfb2252175 [RuntimeFilter] Provide a non-SIMD block bloom filter implementation to support ARM (#6511) 2021-08-27 10:22:36 +08:00
ca3eb6490e push down conditions on unique table value columns to base rowset (#6457) 2021-08-26 09:14:49 +08:00
acc5fd2f21 [BUG] Fix string type cast bug and runtime filter may core when not support avx2 (#6495)
* Fix a string type cast bug and a runtime filter crash when the required SIMD instructions are not supported

* add arm support
2021-08-26 09:14:31 +08:00
5419d74abf [Doc]Update hit-the-rollup.md (#6430) 2021-08-25 22:35:05 +08:00
92e50504e5 [Feature] Supports case-insensitive table names. (#6403)
Implement the lower_case_table_names variable of MySQL. The values mean the following:
0: table names are case-sensitive.
1: table names are stored in lowercase and comparisons are not case-sensitive.
2: table names are stored as given but compared case-insensitively.
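
A hedged sketch of how the three modes affect storage and comparison of table names (illustrative only, not the actual FE implementation):

```java
// Illustration only: name storage and comparison under lower_case_table_names = 0, 1, 2.
public class LowerCaseTableNamesSketch {
    static String storedName(String name, int mode) {
        return mode == 1 ? name.toLowerCase() : name;  // modes 0 and 2 store the name as given
    }

    static boolean sameTable(String a, String b, int mode) {
        return mode == 0 ? a.equals(b) : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        System.out.println(storedName("MyTable", 1));              // mytable
        System.out.println(storedName("MyTable", 2));              // MyTable
        System.out.println(sameTable("MyTable", "mytable", 0));    // false
        System.out.println(sameTable("MyTable", "mytable", 2));    // true
    }
}
```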
2021-08-25 22:34:45 +08:00
96013decd3 [BUG] Fixed the materialized number of resultExprs/constExprs and output slot of Union Node is inconsistent (#6380) 2021-08-25 22:33:49 +08:00
fa290383dc [Doc] Modify README to add some statistical indicators (#6486)
1. Add license/total lines/release badges.
2. Add monthly active contributor and contributor growth graphs.
3. Fix a pom.xml bug.
4. Modify some routine load logs on the BE side.
2021-08-25 09:36:26 +08:00
7e30b28f3a [Optimize] Speed up converting the data of other types to string in mysql_result_writer (#6384)
Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-24 22:30:58 +08:00
146060dfc0 [Bug]Fix result_writer may coredump (#6482)
Fix a possible core dump in result_writer by letting BufferControlBlock own the memory.
2021-08-22 22:04:00 +08:00
4ff6eb55d0 [FlinkConnector] Make flink datastream source parameterized (#6473)
make flink datastream source parameterized as List<?> instead of Object.
2021-08-22 22:03:32 +08:00
c71f58fef9 [Doc] Add sidebar for percentile doc (#6470) 2021-08-22 22:03:07 +08:00
0cf2bc6644 [Doc] Refactor all grammar help documents (#6337)
See #6336 for details
2021-08-22 22:02:51 +08:00
6c23f8d413 [Bug] Fix bug that checkpoint load image failed in some circumstances (#6465)
Fix bug that checkpoint load image failed in some circumstances
2021-08-19 14:17:57 +08:00
52f39e3fde [Bug][SparkLoad]: bitmap value in or operator in spark load should be deep copied (#6453)
Fix multiple rollups holding the same reference to a BitmapValue, which may be updated repeatedly.
fix #6452
2021-08-19 14:17:31 +08:00
fa382f8602 [Bug][MemLimit] Modify the memory limit of storage page cache (#6451)
This CL mainly changes:

1. the `storage_page_cache_limit` is based on config `mem_limit`

    the default is 20% of `mem_limit`. 

2. the `buffer_pool_limit` is based on config `mem_limit`

    the default is 20% of `mem_limit`. 

3. the `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit`

    the default is 50% of `buffer_pool_limit`

4. Fix some display bugs of LRU cache hit ratio and usage ratio.
5. Fix a create view bug: `notEvalNondeterministicFunction` should be reset after analysis.
2021-08-19 14:16:53 +08:00
c65ec3136b [Improvement] spark load without agg and de/serialization (#6270)
fix #6269 

The outline of our changes: reduce memory usage in BE to avoid OOM and speed up the calculation.
1. We do not need to do aggregation during load, since it has already been done in the ETL Spark job.
2. Based on 1, we do not need to serialize/deserialize bitmap/HLL objects.
2021-08-19 14:15:01 +08:00
4ea2fcefbc [Improve] The connector supports Spark 3.0 and Flink 1.13 (#6449)
Modify the flink/spark compilation documentation
2021-08-18 15:57:50 +08:00
66a7a4b294 [Feature] Support exact percentile aggregate function (#6410)
Support calculating the exact percentile value(s) of a numeric column `col` at the given percentage(s).
2021-08-18 15:56:06 +08:00
9148bcb673 [Build] Reduce the parallel of build (#6469) 2021-08-18 15:24:19 +08:00
Pxl
999eaeb276 Fix wrong use of SCOPED_RAW_TIMER (#6459) 2021-08-18 09:06:18 +08:00
0c5c3f7d87 Fixed the problem that there may be redundant retries when the query result export fails (#6436) 2021-08-18 09:06:02 +08:00
8738ce380b Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391) 2021-08-18 09:05:40 +08:00
2f90aaab8e [Doc] flink/spark connector: add sources/javadoc plugins (#6435)
spark-doris-connector/flink-doris-connector: add plugins to generate javadoc and sources jars,
so they are easier to distribute and debug.
2021-08-16 22:41:24 +08:00
b13e512a65 [Feature] Support spark connector sink data to Doris (#6256)
Support the Spark connector writing a DataFrame to Doris.
2021-08-16 22:40:43 +08:00
63a0d9d23a Add statistics struct and Support manually inject statistics (#6420)
* Add statistics struct and Support manually inject statistics

This PR mainly adds the data structures used for statistical information
and the ability to manually modify statistics.
We use a separate statistics package to store statistical information,
and use the 'statistics manager' as the unified entry point for statistics.
For detailed data structures and explanations, please refer to the comments on the classes.

Manually modifying statistics includes: manually modifying table statistics and column statistics.
The syntax is explained in issue #6370.

* Show table and column statistics

'SHOW TABLE STATS' is used to show the statistics of a table.
'SHOW COLUMN STATS' is used to show the statistics of columns.

Currently, only the tables and columns whose statistics have been set
will be displayed in the results.
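
A hedged sketch of a manager holding manually injected statistics (hypothetical names and fields, not the actual statistics package):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a unified entry point that stores manually injected
// table-level (row count) and column-level (ndv, null count) statistics.
public class StatisticsManagerSketch {
    static class TableStats {
        long rowCount;
        final Map<String, ColumnStats> columnStats = new HashMap<>();
    }

    static class ColumnStats {
        long ndv;        // number of distinct values
        long nullCount;
    }

    private final Map<String, TableStats> tableStats = new HashMap<>();

    void updateTableStats(String table, long rowCount) {
        tableStats.computeIfAbsent(table, t -> new TableStats()).rowCount = rowCount;
    }

    void updateColumnStats(String table, String column, long ndv, long nullCount) {
        ColumnStats stats = tableStats.computeIfAbsent(table, t -> new TableStats())
                .columnStats.computeIfAbsent(column, c -> new ColumnStats());
        stats.ndv = ndv;
        stats.nullCount = nullCount;
    }

    // Roughly what SHOW TABLE STATS would read for one table.
    long tableRowCount(String table) {
        TableStats stats = tableStats.get(table);
        return stats == null ? -1 : stats.rowCount;
    }

    public static void main(String[] args) {
        StatisticsManagerSketch manager = new StatisticsManagerSketch();
        manager.updateTableStats("example_db.table_hash", 1_000_000L);
        manager.updateColumnStats("example_db.table_hash", "k1", 128, 0);
        System.out.println(manager.tableRowCount("example_db.table_hash"));
    }
}
```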
2021-08-16 17:20:05 +08:00