doris

Author	SHA1	Message	Date
Xinyi Zou	f2a34dde52	[fix](memory) Fix memory leak due to incorrect block reuse of AggregateFunctionSortData #19214	2023-05-05 14:29:34 +08:00
Ashin Gau	b6c7f3aeb8	[opt](FileCache) Add file cache metrics and management (#19177 ) Add file cache metrics and management. 1. Get file cache metrics > If the file cache performs poorly, there are currently no metrics to investigate the cause. In practice, hit ratio, disk usage, and the number of removed segments are very important information. API: `http://be_host:be_webserver_port/metrics` File cache metrics for each base path start with the `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the number of removed segment files since BE startup. Every cache path has three queues: index, normal and disposable. The capacity ratio of the three queues is 1:17:2. ``` doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000 doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000 doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0 doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0 doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400 doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000 doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600 doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400 doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846 doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904 doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18 doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22 ... ``` 2. Release file cache > Frequent swapping of segment files can seriously affect the performance of the file cache. Adding a deletion interface helps users clean up the file cache. API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}` Returns the number of released segment files. If `base_path` is not provided in the URL, all cache paths are released. Calling this API is thread-safe: only segment files that are not currently being read are released. ``` {"released_elements":22} ``` 3. Specify the base path to store cache data > Currently, regression testing lacks test cases for the file cache, so the stability of the file cache cannot be guaranteed. This interface is generally used in regression testing scenarios. Different queries use different paths to verify different usage cases and performance. Users can set the session variable `file_cache_base_path` to specify the base path to store cache data. The default, `file_cache_base_path="random"`, means choosing a random path from the cache paths to store cache data. If `file_cache_base_path` is not one of the base paths in the BE configuration, a random path is used.	2023-05-05 14:28:01 +08:00
minghong	817f3ce510	[fix](nereids) plan shape on tpch_sf1T q21 case #19291	2023-05-05 14:24:28 +08:00
Liqf	525ede54cb	[doc](fix)fix array_map doc tag wrong #19249	2023-05-05 12:44:46 +08:00
ZhangYu0123	63602f9f06	[Chore](thrift) prevent BE to be recompiled many files #19272 Prevent many BE files from being recompiled: when build.sh runs, it cleans the generated thrift code, which forces many BE files to be recompiled. This behavior was added by PR #19217. Use build.sh --clean to clean the thrift code; there is no need to clean it on every build.	2023-05-05 12:28:00 +08:00
Pxl	09b9aba243	[Bug](web) fix web of frontend meet error (#19279 ) * fix web of frontend meet error * upgrade servlet api version	2023-05-05 12:26:50 +08:00
Kang	8286098b19	[community](release) add download scripts for 2.0.0-alpha1 release #19289	2023-05-05 12:17:09 +08:00
Gabriel	9dd6c8f87b	[refactor](function) ignore DST for function `from_unixtime` (#19151 )	2023-05-05 11:51:49 +08:00
奕冷	1a1aee3886	[fix](load) exclude canceled job when canceling load (#19268 )	2023-05-05 10:31:16 +08:00
zclllyybb	693a3651c1	[bugfix](rpc) fix read-after-free problem of DeleteClosure (#19250 ) 1. fix read-after-free problem of DeleteClosure. 2. modified fresh_exec_timer for operators	2023-05-05 09:57:54 +08:00
lsy3993	44d95aa3d9	[typo](docs)add new attention of str_to_date function (#19264 )	2023-05-05 09:40:06 +08:00
xiaojunjie	9813406757	[Enhancement](HttpServer) Add http interface authentication for BE (#17753 )	2023-05-04 23:46:49 +08:00
xy720	4b85c2738e	[bug](function)fix potential npe in getFunction() when fe restart (#18989 ) fix potential npe in getFunction() when fe restart	2023-05-04 23:45:22 +08:00
Mingyu Chen	ddd67dba8c	[chore](release) build-for-release.sh support arm (#19270 ) Use `uname -m` to get arch	2023-05-04 19:48:41 +08:00
yiguolei	4e4fb33995	[refactor](conjuncts) simplify conjuncts in exec node (#19254 ) Co-authored-by: yiguolei <yiguolei@gmail.com> Currently, the exec node saves exprcontext*, but the object lives in the object pool, which makes the code very unclear. We could just use exprcontext.	2023-05-04 18:04:32 +08:00
Yongqiang YANG	fa7d86efbd	[improvement](log) log timeout seconds when creating partitions timeout (#19223 )	2023-05-04 17:18:42 +08:00
amory	e9a4cbcdf9	[Refact](type system) refact column with arrow serde (#19091 ) * refact arrow serde * add date serde * update arrow and fix nullable and date type	2023-05-04 15:28:46 +08:00
Yueyang Zhan	feeda7230a	[Enhancement](storage engine) avoid deleting tablets on unused disk (#19010 )	2023-05-04 15:15:43 +08:00
Xinyi Zou	e17a171a3c	[fix](vertical_compaction) Fix continuous_agg_count PODArray wrong boundary judgment #19187	2023-05-04 14:50:30 +08:00
starocean999	a573e1093a	[fix](planner) insubquery should always be converted to semi or anti join (#19240 )	2023-05-04 11:16:18 +08:00
ZhangYu0123	aaf0ef741e	[fix](regression) fix inverted_index_p1 q72.sql timeout error (#19241 ) Fix inverted_index_p1 q72.sql timeout error 1. The runtime filter exceeds its wait time, which leads to a join of 1 million * 10 million rows	2023-05-04 11:05:15 +08:00
Adonis Ling	2c1a5bb352	Revert "[chore](third-party) Fix the checksums of mysql (#19047 )" (#19189 ) This reverts commit c93d6ba3be2f2448b824d36da61835e2cd1235cd.	2023-05-04 10:09:37 +08:00
Calvin Kirs	5459cd9c30	[Improve](fe)Upgrade dependencies and optimize jar package management (#18882 ) bind netty-version to 4.1.89-final bind jettison to 1.5.4 upgrade hadoop version to 3.3.5 upgrade range-plugins-common to 2.4.0 bind bcprov-jdk15on to 2.4.0 upgrade and bind woodstox to 6.5.1 upgrade and bind kerby to 2.0.3 upgrade hudi to 0.13.0 upgrade parquet to 1.13.0 upgrade maven-source-plugin to 3.2.1 upgrade maven-assembly-plugin to 3.3.0 upgrade maven-javadoc-plugin to 3.3.2 upgrade maven-shade-plugin to 3.3.4 upgrade maven-clean-plugin to 3.1.0 Remove meaningless plugins Optimize doris maven path Unify the Java modules for management in fe	2023-05-04 10:07:37 +08:00
zzzzzzzs	ffd50b6aeb	[improvement](broker) TOperationStatus determines that a null pointer is redundant. (#18712 ) TOperationStatus determines that a null pointer is redundant. If tOperationStatus is a null pointer, then tOperationStatus.getMessage() will have a null pointer exception.	2023-05-04 10:03:09 +08:00
DuRipeng	52d25f41a4	[feature](multi-catalog) Rename multi-catalog config 'specified_database_list' to 'include_database_list', and introduce new multi-catalog config 'exclude_database_list' (#18834 ) In our scenario, we need to specify databases that are excluded from synchronization to Doris, such as databases that store temporary tables. Since #17803 introduced `specified_database_list` to specify the databases to include, this PR introduces a new config `exclude_database_list` to specify the databases to exclude, and renames `specified_database_list` to `include_database_list` for naming symmetry. When `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` takes precedence over `include_database_list`.	2023-05-04 09:30:02 +08:00
minghong	7652d8649b	[regression](nereids) check tpc-h 1G/500G/1T plan if backend_num == 1 #18848 cases in nereids_tpch_shape_sf1_p0, nereids_tpch_shape_sf500_p0 and nereids_tpch_shape_sf1000_p0 are only for one be environment	2023-05-04 08:55:06 +08:00
Yongqiang YANG	c98829c94b	[improvement](load) log time consumed by waiting flush (#19226 )	2023-05-03 17:48:13 +08:00
zhangdong	72d937ad52	[fix](auth)fix es catalog show table (#19202 )	2023-05-02 20:22:07 +08:00
Mingyu Chen	9d18be9dd3	[doc](thrift) update doc for thrift 0.16 (#19217 ) * 1 update doc for thrift 0.16	2023-05-02 16:00:10 +08:00
TsukiokaKogane	145b94531f	[Fix](load) fix request_slave_tablet_pull_rowset get wrong url in case of ipv6 address (#19026 )	2023-05-02 09:55:09 +08:00
hechao	224bca3794	[docker](hudi) add hudi docker compose (#19048 )	2023-05-02 09:54:52 +08:00
AlexYue	b0c215e694	[enhance](be)add more profile in prefetched buffered reader (#19119 )	2023-05-02 09:53:39 +08:00
Xiangyu Wang	05beb8538e	[Fix](multi-catalog) fix FE abnormal exit when replay OP_REFRESH_EXTERNAL_TABLE (#19120 ) When slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log, they invoke `org.apache.doris.datasource.hive.HiveMetaStoreCache#invalidateTableCache`, and if the table is a non-partitioned table, that method invokes `catalog.getClient().getTable`. If a network problem occurs or the table does not exist, an exception is thrown and FE exits right away. The solution is to use a dummy key as the file cache key which only contains the db name and table name. Then, when slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log, they do not rely on the hms client and no exception occurs.	2023-05-02 09:53:20 +08:00
abmdocrt	43803940f5	[community](collaborator) add more collaborators (#19229 ) Add @TangSiyang2001 as collaborator, and he helped a lot in good first issue.	2023-05-01 23:34:06 +08:00
zhangstar333	eac61dc410	[vectorized](function) add some check about result type in array map (#19228 )	2023-05-01 16:28:11 +08:00
Yongqiang YANG	a978be32a6	[fix](schema_change) remove shadow prefix of schema for tablesink (#18822 ) LSC updates the tablet's schema during writes. BE optimizes adding columns via linked schema change and identifies added columns by comparing column names, e.g. if a new column's name is not found in the old schema, then it is a newly added column. When a table is under schema change, the __doris_shadow_ prefix is added to the names of columns in the shadow index, so writes during the schema change bring a schema with __doris_shadow_ names to BE. If the schema change request arrives at BE after those writes, BE handles it as an add-column schema change because the __doris_shadow_ names are not in the base tablet.	2023-04-30 22:46:36 +08:00
nanfeng	da4de37dec	[feature-wip](mv lifecycle) separate life cycle of base table and its materialized views (#19210 ) support related syntax and add:regress-test case --------- Co-authored-by: yzy <yzy@nanfeng_yzy@163.com>	2023-04-30 17:42:02 +08:00
yiguolei	8eab20d3df	[bugfix](low cardinality) cached code is wrong will result wrong query result when many null pages (#19221 ) Sometimes the dict is not initialized when the comparison predicate runs here. For example, if a full page is null, the reader skips reading it, so the dictionary is not initialized. The cached code is wrong in this case, because a following page may not be null and the dict will have items in the future. As a result, a query on a dict string column returns wrong results if there are many null values in the column. This commit also adds regression tests for equal, greater-than, and less-than queries on dict columns. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-29 21:28:41 +08:00
zclllyybb	d383f1f3d7	[optimization](simd) optimize count_zero_num for ColumnNullable #19124	2023-04-29 14:50:39 +08:00
wangbo	f2b15c03ca	[fix]disable enable_resource_group for regression test (#19206 ) When regression tests run with enable_resource_group = true, the setting is shared by other test cases and may cause regression tests to fail. So we should not set it to true until it has been fully tested.	2023-04-29 14:47:50 +08:00
Mingyu Chen	8c6ccc092a	[fix](test) fix 2 unstable test (#19220 )	2023-04-29 14:42:47 +08:00
Mingyu Chen	fc3728c6ab	[fix](dynamic-partition) create HOUR unit partition with DATEV2 throw exception (#19213 ) Creating an HOUR unit partition with a partition column of type DATEV2 needs to be forbidden ``` Unexpected exception: String index out of range: 10 ```	2023-04-29 08:23:06 +08:00
Tiewei Fang	c74c2a4f8e	[fix](Metadata tvf) Metadata TVF supports read the specified columns from Fe (#19110 )	2023-04-29 00:06:08 +08:00
slothever	d006143330	[fix](multi-catalog) when endpoint has no region, need a suggestion (#19203 ) Solves the following problem: ``` mysql> CREATE CATALOG iceberg PROPERTIES ( 'type'='iceberg', 'iceberg.catalog.type'='rest', 'uri' = 'http://0.0.0.0:8888', "AWS_ACCESS_KEY" = "admin", "AWS_SECRET_KEY" = "password", "AWS_REGION" = "us-east-1", "AWS_ENDPOINT" = "http://minio:9000" ); show databases; ERROR 1105 (HY000): IllegalArgumentException, msg: java.lang.IllegalArgumentException: The value of property fs.s3a.endpoint.region must not be null ```	2023-04-29 00:05:41 +08:00
HappenLee	4a10d146bf	[pipeline](exec) fix regression prepare failed cause query core dump (#19208 ) fix regression prepare failed cause query core dump	2023-04-28 20:46:39 +08:00
yongjinhou	bee3aa3007	be conf action supports specify item (#19159 )	2023-04-28 19:12:51 +08:00
Xinyi Zou	a324ee794c	[fix](memory) Fix Aggregation null key memory leak due to incorrect aggfunc destroy #19201	2023-04-28 18:41:41 +08:00
liujinhui	b87d21d836	[doc](spark-load)add spark load ha EN docs (#19194 ) * 15000-doc-spark-ha english doc * Update spark-load-manual.md format --------- Co-authored-by: liujh <liujh@t3go.cn> Co-authored-by: Luzhijing <82810928+luzhijing@users.noreply.github.com>	2023-04-28 18:18:42 +08:00
zgxme	fd3c132d91	[enhancement](test) split large data of p2 cases (#19186 )	2023-04-28 18:18:25 +08:00
Xinyi Zou	1379d7f3e0	[fix](memory) mmap threshold can be modified in conf, Increase to 128M	2023-04-28 18:17:22 +08:00
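
The file cache commit b6c7f3aeb8 in the table above describes a metrics endpoint and a release endpoint on the BE web server. As a rough illustration only, the sketch below parses metric lines in the format quoted in that commit message and builds the release URL it documents. The function names `parse_file_cache_metrics` and `release_url` are hypothetical helpers, not part of Doris, and URL-encoding `base_path` is an assumption about what the BE accepts.

```python
import re
from urllib.parse import urlencode

# Matches lines such as:
#   doris_be_file_cache_hits_ratio{path="/some/cache/dir"} 0.500000
METRIC_RE = re.compile(r'^(doris_be_file_cache_\w+)\{path="([^"]*)"\}\s+(\S+)$')
PREFIX = "doris_be_file_cache_"

# A few sample lines copied from the commit message.
SAMPLE = """\
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22
...
"""


def parse_file_cache_metrics(text):
    """Group doris_be_file_cache_* samples by cache base path (hypothetical helper)."""
    by_path = {}
    for line in text.splitlines():
        match = METRIC_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not file cache metrics
        name, path, value = match.groups()
        by_path.setdefault(path, {})[name[len(PREFIX):]] = float(value)
    return by_path


def release_url(be_host, be_webserver_port, base_path=None):
    """Build the release URL from the commit message (hypothetical helper).

    Omitting base_path corresponds to releasing all cache paths.
    Percent-encoding the path is an assumption, not confirmed by the source.
    """
    query = {"op": "release"}
    if base_path is not None:
        query["base_path"] = base_path
    return f"http://{be_host}:{be_webserver_port}/api/file_cache?{urlencode(query)}"
```

Feeding `SAMPLE` to `parse_file_cache_metrics` yields one dictionary per cache path, keyed by the metric name without the `doris_be_file_cache_` prefix.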


10283 Commits
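
Commit 52d25f41a4 in the log above states that when `include_database_list` and `exclude_database_list` overlap, the exclude list wins. The sketch below is only a model of that precedence rule, not Doris code: the function `visible_databases` is hypothetical, and treating an empty include list as "include everything" is an assumption that the commit message does not confirm.

```python
def visible_databases(all_dbs, include_database_list=(), exclude_database_list=()):
    """Model of the include/exclude precedence rule (hypothetical helper).

    Assumption: an empty include list means every database is a candidate.
    From the commit message: a database named in both lists is excluded.
    """
    include = set(include_database_list)
    exclude = set(exclude_database_list)
    return [
        db
        for db in all_dbs
        if (not include or db in include) and db not in exclude
    ]
```

For example, with databases `sales`, `ods` and `tmp`, including `sales` and `tmp` while excluding `tmp` leaves only `sales` under this model.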