doris

Author	SHA1	Message	Date
Tiewei Fang	e78149cb65	[Enhancement](Export) add property for outfile/export and add test (#18997 ) This PR does three things: 1. Adds the `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile first deletes all files under file_path. 2. Adds a p2 test for export. 3. Updates the docs.	2023-05-08 14:02:20 +08:00
Ashin Gau	05c5c5949c	[refactor](FileCache) set FE session variable enable_file_cache=false as default (#19327 ) Users should set `enable_file_cache=true` in FE session variables and BE configuration to enable file cache.	2023-05-08 13:53:51 +08:00
Mingyu Chen	fb5b3029a7	[fix](meta) fix image file checksum error (#19363 )	2023-05-08 10:00:09 +08:00
yongkang.zhong	32273a7a9b	[improvement](backend)Optimized error messages for insufficient replication (#19211 ) optimized the error message for creating insufficient table replications	2023-05-07 20:45:21 +08:00
Mingyu Chen	abc73ac1eb	[refactor](cluster)(step-1) remove cluster related stmt (#19355 ) * [refactor](cluster)(step-1) remove cluster stmt	2023-05-07 18:44:42 +08:00
Yusheng Xu	9edbfa37cd	[Enhancement](Broker Load) New progress manager for showing loading progress status (#19170 ) This work is at an early stage. The reported progress is not yet accurate because the scan range is too large for gathering information; in addition, only the file scan node and import jobs support the new progress manager. ## How it works For example, when we use the following load query: ``` LOAD LABEL test_broker_load ( DATA INFILE("XXX") INTO TABLE `XXX` ...... ) ``` Initial Progress: the query calls `BrokerLoadJob` to create the job, then `coordinator` is called to calculate the scan ranges and their locations. Update Progress: BE reports runtime_state to FE, and FE updates the progress status according to jobID and fragmentID. We can use `show load` to see the progress. PENDING: ``` State: PENDING Progress: 0.00% ``` LOADING: ``` State: LOADING Progress: 14.29% (1/7) ``` FINISH: ``` State: FINISHED Progress: 100.00% (7/7) ``` At the current time, the full output of `show load\G` looks like: ``` ************************* 1. row ************************* JobId: 25052 Label: test_broker State: LOADING Progress: 0.00% (0/7) Type: BROKER EtlInfo: NULL TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0 ErrorMsg: NULL CreateTime: 2023-05-03 20:53:13 EtlStartTime: 2023-05-03 20:53:15 EtlFinishTime: 2023-05-03 20:53:15 LoadStartTime: 2023-05-03 20:53:15 LoadFinishTime: NULL URL: NULL JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364} TransactionId: 14015 ErrorTablets: {} User: root Comment: ``` ## TODO: 1. The current partition granularity of the scan range is too large, resulting in uneven progress during loading. 2. Only broker load supports the new progress manager; support progress for other queries.	2023-05-06 22:44:40 +08:00
yongkang.zhong	2fe9ba7c2a	[fix](jdbc catalog) fix trino jdbc catalog varchar type err (#19298 )	2023-05-06 17:16:28 +08:00
Gabriel	4c6ca88088	Revert "[refactor](function) ignore DST for function `from_unixtime` (#19151 )" (#19333 ) This reverts commit 9dd6c8f87b73db238bfd38fb1d76f3796910f398.	2023-05-06 16:33:58 +08:00
ElvinWei	3f6e5118e6	[enhancement](statistics) support periodic collection of statistics (#19247 ) This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following: Support periodic collection of statistics. Change the type of Date in statistics p0 to DateV2 (see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for local testing. Complement the test cases (remove Chinese characters, optimize code, etc.) and improve stability. Support setting whether to keep records of statistics synchronization job info, which is convenient for p0 testing. The statistics job table was modified, and some auxiliary checks were added so that users do not notice the modification. This function is to be removed once the table schema is stable.	2023-05-06 14:53:06 +08:00
Luwei	3287f350de	[feature](table) implement round-robin BE selection when creating tablets (#19167 )	2023-05-06 14:46:48 +08:00
Yulei-Yang	ff6e0d3943	[Improvement](meta) support return no partition info for show_create_table (#19030 ) Some tables have a large number of partitions. Running show create table on them returns so many lines that a whole screen cannot show them all, even if you scroll up to the top. show create table table2; \| table2 \| CREATE TABLE `table2` ( `k1` int(11) NULL COMMENT 'test column k1', `k2` int(11) NULL COMMENT 'test column k2' ) ENGINE=OLAP DUPLICATE KEY(`k1`, `k2`) COMMENT 'test table1' PARTITION BY RANGE(`k1`) (PARTITION p01 VALUES [("-2147483648"), ("10")), PARTITION p02 VALUES [("10"), ("100"))) DISTRIBUTED BY HASH(`k1`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false" ); show brief create table table2; \| table2 \| CREATE TABLE `table2` ( `k1` int(11) NULL COMMENT 'test column k1', `k2` int(11) NULL COMMENT 'test column k2' ) ENGINE=OLAP DUPLICATE KEY(`k1`, `k2`) COMMENT 'test table1' DISTRIBUTED BY HASH(`k1`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false" ); \|	2023-05-06 14:45:08 +08:00
AKIRA	bd23db762d	[minor](stats) Add doc for stats framework (#19311 )	2023-05-06 13:30:55 +08:00
plat1ko	cdfbfd1f6b	[fix](replica) Fix inconsistent replica id between FE and BE (#18688 )	2023-05-06 11:06:29 +08:00
starocean999	a72eee24f1	[fix](nereids) fix merge project with window function bug (#19280 ) 1. don't merge projects if any window function exists 2. bypass SimplifyArithmeticRule for decimalV3 type	2023-05-06 10:38:14 +08:00
Mingyu Chen	42bac3343d	[Refactor](StmtExecutor)(step-1) Extract profile logic from StmtExecutor and Coordinator (#19219 ) Previously, we used the RuntimeProfile class directly, and because a profile has multiple levels, several RuntimeProfile instances had to be maintained. I created several new classes for profile: class Profile: The root profile of an execution task (query or load). class SummaryProfile: The profile that contains summary info of an execution task, such as start time, end time, query id, etc. class ExecutionProfile: The profile for a single Coordinator. Each Coordinator will have an ExecutionProfile. The profile structure is as follows: Profile: SummaryProfile: ExecutionProfile 1: Fragment 0: Instance 0: Instance 1: ... Fragment 1: ... ExecutionProfile 2: ... As you can see, each Profile has a SummaryProfile and one or more ExecutionProfiles. For most kinds of job, such as query/insert, there is only one ExecutionProfile. But for a broker load job, there may be more than one ExecutionProfile, one for each sub task of the load job. How to use For query/insert, etc: Each StmtExecutor will have a Profile instance. Each Coordinator will have an ExecutionProfile instance. StmtExecutor is responsible for the SummaryProfile and updates it during the execution. Coordinator is responsible for the ExecutionProfile: it first adds the ExecutionProfile as a child of the Profile, then updates the ExecutionProfile periodically during the execution. For Load/Export, etc: Each job will have a Profile instance. For each Coordinator of this job, add its ExecutionProfile to the children of the job's Profile. 
Behavior Change The columns of show load profile/show query profile and the QueryProfile Web UI have changed to: \| Profile ID \| Task Type \| Start Time \| End Time \| Total \| Task State \| User \| Default Db\| Sql Statement \| Is Cached \| Total Instances Num \| Instances Num Per BE \| Parallel Fragment Exec Instance Num \| Trace ID \| The Query Id and Job Id columns are removed and replaced by Profile ID. For a load job, the profile id is the job id; for query/insert, it is the query id.	2023-05-06 09:01:51 +08:00
Jibing-Li	5210c04241	[Refactor](ScanNode) Split interface refactor (#19133 ) Move the getSplits function to ScanNode and remove the Splitter interface. For each kind of data source, create a specific ScanNode and implement the getSplits interface, for example HiveScanNode. Remove FileScanProviderIf and move its code to each ScanNode.	2023-05-05 23:20:29 +08:00
jakevin	159344792f	[enhance](Nereids) make getExplorationRule static (#19278 ) make getExplorationRule static to avoid new ArrayList() multiple times.	2023-05-05 19:58:24 +08:00
starocean999	3e3262361c	[fix](fe)havingClause should be substituted the same way as resultExprs (#19261 ) substituted havingClause in the same way as resultExprs to prevent " HAVING clause not produced by aggregation output" error	2023-05-05 18:03:43 +08:00
slothever	96d729f719	[refactor](fs)(step3)use filesystem instead of old storage, new storage just access remote object storage (#19098 ) see #18960 PR1: add new storage file system template and move old storage to new package PR2: extract some method in old storage to new file system. PR3: use storages to access remote object storage, and use file systems to access file in local or remote location. Will add some unit tests. this is PR3.	2023-05-05 16:20:20 +08:00
Mingyu Chen	70236adc1f	[Refactor](doc)(config)(variable) use script to generate doc for FE config and session variables (#19246 ) The documentation of configs (FE and BE) and session variables is hard to maintain, because developers need to modify both the code and the document, and some of the configs' documentation is missing. So I plan to write the documentation of configs and variables directly in the code and use a script to generate the documents automatically. How To This CL mainly changes: Add fields to the Config and Session Variables annotations: description: The description of the config or variable item. It is a String array; the first element is in Chinese, the second is in English. options: the valid options if the config or variable is an enum. Add a script docs/generate-config-and-variable-doc.sh. Simply run sh docs/generate-config-and-variable-doc.sh and it will generate the docs of FE config and variables and save them under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md, both in Chinese and in English. There are template markdown files for this script to read and fill with the real doc content. TODO Too many descriptions need to be filled in. I will finish them in the next PR. For now the original docs remain unchanged. Find a way to check the description field of configs and variables, to make sure we don't miss any. Generate doc for BE config.	2023-05-05 14:42:43 +08:00
Ashin Gau	b6c7f3aeb8	[opt](FileCache) Add file cache metrics and management (#19177 ) Add file cache metrics and management. 1. Get file cache metrics > If the performance of file cache is not efficient, there are currently no metrics to investigate the cause. In practice, hit ratio, disk usage, and segments removed status are very important information. API: `http://be_host:be_webserver_port/metrics` File cache metrics for each base path start with `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the num of removed segment files since BE startup; Every cache path has three queues: index, normal and disposable. The capacity ratio of the three queues is 1:17:2. ``` doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000 doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000 doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0 doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0 doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400 doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000 doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600 doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400 doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846 doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904 doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18 doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22 ... ``` 2. Release file cache > Frequent segment files swapping can seriously affect the performance of file cache. 
Adding a deletion interface helps users clean up the file cache. API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}` Returns the number of released segment files. If `base_path` is not provided in the URL, all cache paths will be released. It's thread-safe to call this API: only the segment files that are not currently being read can be released. ``` {"released_elements":22} ``` 3. Specify the base path to store cache data > Currently, regression testing lacks test cases for the file cache, so the stability of the file cache cannot be guaranteed. This interface is generally used in regression testing scenarios. Different queries use different paths to verify different usage cases and performance. Users can set the session variable `file_cache_base_path` to specify the base path to store cache data. The default, `file_cache_base_path="random"`, means choosing a random path from the cache paths to store cache data. If `file_cache_base_path` is not one of the base paths in the BE configuration, a random path is used.	2023-05-05 14:28:01 +08:00
Gabriel	9dd6c8f87b	[refactor](function) ignore DST for function `from_unixtime` (#19151 )	2023-05-05 11:51:49 +08:00
奕冷	1a1aee3886	[fix](load) exclude canceled job when canceling load (#19268 )	2023-05-05 10:31:16 +08:00
xiaojunjie	9813406757	[Enhancement](HttpServer) Add http interface authentication for BE (#17753 )	2023-05-04 23:46:49 +08:00
xy720	4b85c2738e	[bug](function)fix potential npe in getFunction() when fe restart (#18989 ) fix potential npe in getFunction() when fe restart	2023-05-04 23:45:22 +08:00
Yongqiang YANG	fa7d86efbd	[improvement](log) log timeout seconds when creating partitions timeout (#19223 )	2023-05-04 17:18:42 +08:00
starocean999	a573e1093a	[fix](planner) insubquery should always be converted to semi or anti join (#19240 )	2023-05-04 11:16:18 +08:00
zzzzzzzs	ffd50b6aeb	[improvement](broker) TOperationStatus determines that a null pointer is redundant. (#18712 ) TOperationStatus determines that a null pointer is redundant. If tOperationStatus is a null pointer, then tOperationStatus.getMessage() will have a null pointer exception.	2023-05-04 10:03:09 +08:00
DuRipeng	52d25f41a4	[feature](multi-catalog) Rename multi-catalog config 'specified_database_list' to 'include_database_list', and introduce new multi-catalog config 'exclude_database_list' (#18834 ) In my scenario, we need to specify databases that are excluded from synchronization to Doris, such as databases that store temporary tables. Since #17803 introduced `specified_database_list` to specify the databases to include, this PR introduces a new config `exclude_database_list` to specify the databases to exclude, and renames `specified_database_list` to `include_database_list` for naming symmetry. BTW, when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` takes precedence over `include_database_list`.	2023-05-04 09:30:02 +08:00
zhangdong	72d937ad52	[fix](auth)fix es catalog show table (#19202 )	2023-05-02 20:22:07 +08:00
Xiangyu Wang	05beb8538e	[Fix](multi-catalog) fix FE abnormal exit when replay OP_REFRESH_EXTERNAL_TABLE (#19120 ) When slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log, they invoke `org.apache.doris.datasource.hive.HiveMetaStoreCache#invalidateTableCache`, but if the table is a non-partitioned table, this invokes `catalog.getClient().getTable`. If a network problem occurs or the table does not exist, an exception is thrown and FE exits right away. The solution is to use a dummy key as the file cache key, containing only the db name and table name. Then, when slave FE nodes replay the OP_REFRESH_EXTERNAL_TABLE log, they do not rely on the HMS client and no exception occurs.	2023-05-02 09:53:20 +08:00
Yongqiang YANG	a978be32a6	[fix](schema_change) remove shadow prefix of schema for tablesink (#18822 ) LSC updates a tablet's schema during writes. BE optimizes adding columns via linked schema change and detects additions by comparing column names, e.g. if a new column's name is not found in the old schema, it is a newly added column. While a table is undergoing schema change, the __doris_shadow_ prefix is added to the names of columns in the shadow index. Writes during the schema change would then bring a schema with __doris_shadow_ to BE. If the schema change request arrives at BE after the writes, BE treats it as an add-column schema change because __doris_shadow_ is not in the base tablet.	2023-04-30 22:46:36 +08:00
nanfeng	da4de37dec	[feature-wip](mv lifecycle) separate life cycle of base table and its materialized views (#19210 ) support related syntax and add:regress-test case --------- Co-authored-by: yzy <yzy@nanfeng_yzy@163.com>	2023-04-30 17:42:02 +08:00
Mingyu Chen	fc3728c6ab	[fix](dynamic-partition) create HOUR unit partition with DATEV2 throw exception (#19213 ) Need to forbid create HOUR unit partition with partition column type DATEV2 ``` Unexpected exception: String index out of range: 10 ```	2023-04-29 08:23:06 +08:00
Tiewei Fang	c74c2a4f8e	[fix](Metadata tvf) Metadata TVF supports read the specified columns from Fe (#19110 )	2023-04-29 00:06:08 +08:00
slothever	d006143330	[fix](multi-catalog) when endpoint has no region, need a suggestion (#19203 ) solve the problem ``` mysql> CREATE CATALOG iceberg PROPERTIES ( 'type'='iceberg', 'iceberg.catalog.type'='rest', 'uri' = 'http://0.0.0.0:8888, "AWS_ACCESS_KEY" = "admin", "AWS_SECRET_KEY" = "password", "AWS_REGION" = "us-east-1", "AWS_ENDPOINT" = "http://minio:9000" ); show databases; ERROR 1105 (HY000): IllegalArgumentException, msg: java.lang.IllegalArgumentException: The value of property fs.s3a.endpoint.region must not be null ```	2023-04-29 00:05:41 +08:00
Zhengguo Yang	43e70ab252	[chore](recover) add a config to recover remaining data in emergency (#18986 )	2023-04-28 17:42:00 +08:00
yixiutt	aef9355cd3	[feature-wip](partial update) PART1: support basic partial write (#17542 )	2023-04-28 17:17:57 +08:00
ElvinWei	718297d3c1	[test](statistics) add p0 test of sampling statistics (#19176 ) 1. Added test p0 for sampling collection statistics 2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions 3. Solve the problem that incremental statistics p0 is less stable	2023-04-28 15:50:05 +08:00
starocean999	f0852f2ac9	[fix](fe)fix bug if left table is empty and there are multiple right tables need do bucket shuffle to left side (#19169 ) * [fix](fe)fix bug if left table is empty and there are multiple right tables need do bucket shuffle to left side * fix bug * fix test cases	2023-04-28 15:06:38 +08:00
WenYao	5e9c0c3500	[Enhancement](data-type) add FE config to prohibit create date and decimalv2 type (#19077 ) * prohibits date and decimal type * add config in test	2023-04-28 11:31:51 +08:00
Tiewei Fang	86be6d27e7	[Enhancement](Cancel Export) Cancel export supports canceling export jobs in the IN_QUEUE state (#19058 )	2023-04-28 09:27:23 +08:00
jakevin	a35fc02bd4	[enhance](Nereids): handle project of OuterJoin in Reorder. (#19137 )	2023-04-27 22:17:03 +08:00
morrySnow	0f895640d9	[opt](Nereids)(WIP) optimize agg and window normalization step 1 (#19168 ) 1. move SimplifyAggGroupBy behind NormalizeAggregate 2. fix project to agg rule for the project containing window expression	2023-04-27 21:42:23 +08:00
AKIRA	7cf1ffa0b4	[fix](planner) ctas should not clone queryStmt after parse (#19114 ) Remove redundant clone in the constructor of CTAS stmt Error message: ``` NullPointerException, msg: java.lang.NullPointerException: null ```	2023-04-27 20:09:11 +08:00
starocean999	8288494e8e	[fix](planner) AnalyticEvalNode should call child's getOutputTupleIds method to get the correct output tuple id (#19163 )	2023-04-27 20:04:51 +08:00
yongkang.zhong	f3f0496b99	[feature](multi-catalog) support oceanbase jdbc catalog and jdbc external table (#18943 ) * [feature](multi-catalog) support oceanbase jdbc catalog and jdbc external table	2023-04-27 17:14:48 +08:00
AKIRA	7d89b57706	[enhancement](stats) Optimize stats pre-load logic #19138 1. Don't do pre-load until stats table gets ready 2. Don't put pre-loaded unknown stats to cache	2023-04-27 16:01:31 +08:00
morrySnow	9de2ec5aa5	[fix](Nereids) topn two phase read do not process child correct (#19136 )	2023-04-27 13:23:15 +08:00
ElvinWei	484612a0af	[opt](statistics) optimize Incremental statistics collection and statistics cleaning (#18971 ) This pr mainly optimizes the following items: - the collection of statistics: clear up invalid historical statistics before collecting them, so as not to affect the final table statistics. - the incremental collection of statistics: in the case of incremental collection, only the corresponding partition statistics need to be collected. TODO: Supports incremental collection of materialized view statistics.	2023-04-27 11:51:47 +08:00
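
The `Progress` values shown in the Broker Load commit (#19170), such as `14.29% (1/7)`, are consistent with a simple finished/total ratio. The sketch below reproduces that formatting under that assumption; `format_progress` and its parameters are hypothetical names chosen for illustration and are not taken from the Doris source.

```python
def format_progress(finished: int, total: int) -> str:
    """Format a progress string in the style shown by `show load`.

    Assumption: progress is the share of finished units out of the total.
    The commit message does not spell out the formula.
    """
    if total <= 0:
        # No units known yet (e.g. a PENDING job): report zero progress.
        return "0.00%"
    pct = 100.0 * finished / total
    return f"{pct:.2f}% ({finished}/{total})"


print(format_progress(1, 7))
print(format_progress(7, 7))
```

With 1 of 7 units finished this yields the same string as the LOADING example in the commit message.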

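The file cache metrics listed in commit #19177 are plain text lines of the form `name{path="..."} value`. A minimal sketch of grouping them by cache path is shown below, using a few lines copied from the commit message as sample input. The function name `parse_file_cache_metrics` and the regular expression are illustrative assumptions; fetching the text from `http://be_host:be_webserver_port/metrics` is left out.

```python
import re

# Sample lines copied from the commit message of #19177.
SAMPLE = '''doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22
'''

# Matches only metrics carrying the doris_be_file_cache_ prefix and a path label.
LINE = re.compile(r'^(doris_be_file_cache_\w+)\{path="([^"]*)"\}\s+(\S+)$')


def parse_file_cache_metrics(text):
    """Return {cache_path: {metric_name: value}} for file cache metric lines."""
    metrics = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, path, value = match.groups()
            metrics.setdefault(path, {})[name] = float(value)
    return metrics


parsed = parse_file_cache_metrics(SAMPLE)
for path, values in parsed.items():
    print(path, values)
```

Lines that do not match the pattern are skipped, so the same function can be fed the full metrics page.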

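Commit #18834 states that when `include_database_list` and `exclude_database_list` overlap, the exclude list wins. The sketch below illustrates that precedence rule only. The function `synced_databases` is a hypothetical helper, and treating an empty include list as "no include filter" is an assumption not stated in the commit message.

```python
def synced_databases(all_dbs, include_database_list=(), exclude_database_list=()):
    """Return the databases that would be synchronized, in input order.

    Assumption: an empty include list means every database is a candidate.
    Per the commit message, exclusion takes precedence on overlap.
    """
    include = set(include_database_list)
    exclude = set(exclude_database_list)
    result = []
    for db in all_dbs:
        if db in exclude:
            # Excluded databases are dropped even if they are also included.
            continue
        if include and db not in include:
            continue
        result.append(db)
    return result


print(synced_databases(["sales", "logs", "tmp"], ["sales", "tmp"], ["tmp"]))
```

Here `tmp` appears in both lists and is dropped, leaving only `sales`.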
3268 Commits