```
CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat
WITH APPEND
PROPERTIES (
    "desired_concurrent_number" = "2",
    "max_batch_interval" = "20",
    "max_batch_rows" = "400000",
    "max_batch_size" = "314572800",
    "format" = "json",
    "max_error_number" = "0"
)
FROM KAFKA (
    "kafka_broker_list" = "xxxx:xxxx",
    "kafka_topic" = "nat_nsq",
    "property.kafka_default_offsets" = "2022-04-19 13:20:00"
);
```
In the create statement example above, you can see that:
1. The user did not specify custom partitions, so the FE fetches all Kafka partitions from the server in the routine load scheduler.
2. The user set the default offset by datetime, so the FE fetches the Kafka offsets by time from the server in the routine load scheduler.
When step 1 succeeds but step 2 fails, the progress of this routine load may not contain any partitions or offsets. Worse, since newCurrentKafkaPartition, which is fetched from the Kafka server, may always equal currentKafkaPartitions, the wrong progress will never be updated.
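A hedged sketch of the problematic update pattern, using simplified hypothetical names (the real FE scheduler code differs): when resolving offsets by time fails, the progress is left empty, yet progress is only rebuilt when the partition list changes, so the empty progress is never repaired.
```java
import java.util.List;
import java.util.Map;

public class RoutineLoadProgressSketch {
    // Partitions the job is currently consuming.
    private List<Integer> currentKafkaPartitions;
    // partition id -> next offset; left empty when resolving offsets
    // by datetime from the Kafka server failed (step 2 above).
    private final Map<Integer, Long> progress;

    RoutineLoadProgressSketch(List<Integer> partitions, Map<Integer, Long> progress) {
        this.currentKafkaPartitions = partitions;
        this.progress = progress;
    }

    // Called by the scheduler with the partition list fetched in step 1.
    void onSchedule(List<Integer> newCurrentKafkaPartition, Map<Integer, Long> offsetsByTime) {
        if (!newCurrentKafkaPartition.equals(currentKafkaPartitions)) {
            // Progress is refreshed only when the partition list changes.
            // Because the list fetched from the server usually equals the
            // current one, this branch never runs and the empty progress
            // is never corrected.
            currentKafkaPartitions = newCurrentKafkaPartition;
            progress.putAll(offsetsByTime);
        }
    }
}
```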
1. fix track bthread
- Bthread is a high-performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly across multiple pthreads, while MemTracker consumption currently relies on pthread-local variables (TLS).
- This confused the pthread-local MemTracker when a brpc server response migrated between pthreads. Replacing pthread TLS with bthread TLS in the brpc server response keeps the MemTracker consistent.
Ref: 731730da85/docs/en/server.md (bthread-local)
2. fix track vectorized query
- Added tracking for mmap. Currently, mmap allocates memory in many places in the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and to make it easier to debug.
- Fixed some bugs.
This pull request includes part of the statistics implementation (https://github.com/apache/incubator-doris/issues/6370). It does not affect any existing code, and users are not yet able to create statistics jobs.
After receiving a statistics collection statement, the FE generates a job. This PR implements the division of statistics collection jobs according to the following statistics categories:
table:
- `row_count`: the table row count is critical for estimating the cardinality and memory usage of scan nodes.
- `data_size`: the table size; not used by the CBO, mainly used to monitor and manage table size.
column:
- `num_distinct_value`: used to determine the selectivity of an equality predicate.
- `min`: the minimum value.
- `max`: the maximum value.
- `num_nulls`: the number of nulls.
- `avg_col_len`: the average length of the column, in bytes; used for memory and network IO estimation.
- `max_col_len`: the maximum length of the column, in bytes; used for memory and network IO estimation.
After the job is divided, the statistics tasks are obtained, as sketched below.
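A hedged sketch of the division step, with hypothetical names rather than the actual Doris classes: one job fans out into a table-level task plus one column-level task per column, each carrying the statistics it must collect.
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StatsJobDivisionSketch {
    // The statistics categories listed above.
    enum TableStat { ROW_COUNT, DATA_SIZE }
    enum ColumnStat { NUM_DISTINCT_VALUE, MIN, MAX, NUM_NULLS, AVG_COL_LEN, MAX_COL_LEN }

    // A task covers one table or one column and the stats to collect for it.
    record StatsTask(String table, String column, List<? extends Enum<?>> stats) {}

    // Divide one statistics job into per-granularity tasks.
    static List<StatsTask> divide(String table, List<String> columns) {
        List<StatsTask> tasks = new ArrayList<>();
        // One table-level task collecting row_count and data_size.
        tasks.add(new StatsTask(table, null, Arrays.asList(TableStat.values())));
        // One column-level task per column for the column statistics.
        for (String column : columns) {
            tasks.add(new StatsTask(table, column, Arrays.asList(ColumnStat.values())));
        }
        return tasks;
    }
}
```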
Use this statement to show the tablet storage format on the BEs. If verbose is set, detailed information about each tablet's storage format is shown, e.g.:
```
MySQL [(none)]> admin show tablet storage format;
+-----------+---------+---------+
| BackendId | V1Count | V2Count |
+-----------+---------+---------+
|     10002 |       0 |    2867 |
+-----------+---------+---------+
1 row in set (0.003 sec)
MySQL [test_query_qa]> admin show tablet storage format verbose;
+-----------+----------+---------------+
| BackendId | TabletId | StorageFormat |
+-----------+----------+---------------+
|     10002 |    39227 | V2            |
|     10002 |    39221 | V2            |
|     10002 |    39215 | V2            |
|     10002 |    39199 | V2            |
+-----------+----------+---------------+
4 rows in set (0.034 sec)
```
Add storage format information to the show full tables statement.
```
MySQL [test_query_qa]> show full tables;
+-------------------------+------------+---------------+
| Tables_in_test_query_qa | Table_type | StorageFormat |
+-------------------------+------------+---------------+
| bigtable                | BASE TABLE | V2            |
| test_dup                | BASE TABLE | V2            |
| test                    | BASE TABLE | V2            |
| baseall                 | BASE TABLE | V2            |
| test_string             | BASE TABLE | V2            |
+-------------------------+------------+---------------+
5 rows in set (0.002 sec)
```
Buffer flip was used incorrectly. When the hash key is a string type, the hash value was always zero.
The reason is that the buffer for a string is obtained via wrap(), which must not be flipped:
a wrapped buffer is already positioned for reading, so flipping it sets the read limit to zero.
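A minimal, self-contained illustration of the ByteBuffer semantics involved (not the original Doris code): wrap() already yields a readable buffer, and flipping it leaves zero readable bytes for the hash.
```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FlipDemo {
    public static void main(String[] args) {
        byte[] key = "hello".getBytes(StandardCharsets.UTF_8);

        // wrap() returns a buffer ready for reading:
        // position = 0, limit = capacity = key.length.
        ByteBuffer wrapped = ByteBuffer.wrap(key);
        System.out.println(wrapped.remaining()); // 5

        // flip() sets limit = position (which is 0) and position = 0,
        // so a hash computed over [position, limit) sees no bytes.
        wrapped.flip();
        System.out.println(wrapped.remaining()); // 0
    }
}
```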
* load the newly generated image file as soon as it is generated to check whether it is valid
* delete the latest invalid image file
* fix
* fix
* get filePath from saveImage() to ensure the correct file is deleted when an exception happens
* fix
Co-authored-by: wuhangze <wuhangze@jd.com>
* avoid a corrupt image file when an image.ckpt with non-zero size already exists
For now, saveImage writes data to image.ckpt via an appending FileOutputStream.
If a file named image.ckpt with non-zero size already exists, a disaster happens
due to the resulting corrupt image file. Even worse, the FE only keeps the latest
image file and removes the others.
Additionally, the image file should be synced to disk.
It is dangerous to keep only the latest image file, because an image file is
only validated when the next image file is generated. We would then keep a
non-validated image file but remove the validated ones. So I will open a PR
that keeps at least two image files.
* append other data after MetaHeader
* use channel.force instead of sync
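A minimal sketch of the two points above, with hypothetical names (not the actual saveImage code): open image.ckpt in truncating mode so a leftover non-zero-size file cannot corrupt the image, and force the channel to disk before closing.
```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class ImageWriterSketch {
    static void writeImage(File ckpt, byte[] imageBytes) throws IOException {
        // append = false truncates any existing image.ckpt instead of
        // appending to it, avoiding the corrupt-image disaster above.
        try (FileOutputStream out = new FileOutputStream(ckpt, false)) {
            out.write(imageBytes);
            // Sync content and metadata to the storage device
            // (channel.force instead of sync).
            out.getChannel().force(true);
        }
    }
}
```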
For now, dbTransactionManager::getTransactionNum is only used by checkpoint to
get the transaction count to put into an image file. However, the transactions
written into the image file do not come from the same data structure the count
comes from. Thus, we would have to pay close attention to keeping the two data
structures consistent in size, which in practice is very difficult.
This patch simply lets getTransactionNum read the count from the same data
structure that the write method uses.
The change was introduced by b93e841688.
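A hedged sketch of the idea with hypothetical names (the real Doris transaction manager differs): derive the count from the same collection the write method serializes, so the two can never diverge.
```java
import java.io.DataOutput;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class TxnImageSketch {
    // Stand-in for the structure the write method actually serializes.
    private final Map<Long, String> idToTxn = new TreeMap<>();

    // Count from the same map that write() iterates, instead of a
    // separately maintained counter that could drift out of sync.
    int getTransactionNum() {
        return idToTxn.size();
    }

    // The caller is assumed to hold the appropriate lock, as checkpointing does.
    void write(DataOutput out) throws IOException {
        out.writeInt(getTransactionNum());
        for (Map.Entry<Long, String> entry : idToTxn.entrySet()) {
            out.writeLong(entry.getKey());
            out.writeUTF(entry.getValue());
        }
    }
}
```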
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
This is the first PR for statistics collection and includes part of the statistics implementation (#6370). It does not affect any existing code, and users are not yet able to create statistics jobs.
It mainly implements the semantic checking module for statistics collection jobs and the job creation module.
The syntax is:
ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [ PROPERTIES(...) ]
e.g.
ANALYZE;
ANALYZE tbl1;
ANALYZE tbl1(col1, col2) PROPERTIES("cbo_statistics_task_timeout" = "10");
Two configurations have been added:
- `max_cbo_statistics_task_timeout_sec`: the timeout of a single task
- `cbo_max_statistics_job_num`: the maximum number of running jobs the system can accept
Co-authored-by: weizhengte <1141550741@qq.com>
Co-authored-by: weizhengte <weizhengte@foxmail.com>
Co-authored-by: EmmyMiao87 <522274284@qq.com>
Co-authored-by: frankywei <frankywei@tencent.com>