computeColumnsFilter computes a filter over every column in the table's base schema. However, if the table is very wide (e.g., 5000 columns), this takes a long time. This PR compares the number of conjuncts with the number of columns: if there are fewer conjuncts than columns, it collects slots from the conjuncts instead of traversing all columns.
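The FE planner code here is Java; the following is only a minimal C++ sketch of the size comparison, with every type and name (Conjunct, compute_filter_slots, slot_ids) invented for illustration:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical predicate node: each conjunct references a few slots (columns).
struct Conjunct {
    std::vector<int> slot_ids;
};

// Decide which columns need a filter. When the predicate list is shorter
// than the schema, walk the conjuncts instead of every column.
std::unordered_set<int> compute_filter_slots(const std::vector<Conjunct>& conjuncts,
                                             std::size_t num_columns) {
    std::unordered_set<int> slots;
    if (conjuncts.size() < num_columns) {
        // Wide table, few predicates: cost is O(#conjuncts), not O(#columns).
        for (const Conjunct& c : conjuncts) {
            slots.insert(c.slot_ids.begin(), c.slot_ids.end());
        }
    } else {
        // Narrow table: traversing the whole schema is cheap, keep the old path.
        for (int id = 0; id < static_cast<int>(num_columns); ++id) {
            slots.insert(id);
        }
    }
    return slots;
}
```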
1. Fix a bug where auto analyze of external tables recursively loaded the schema cache.
2. Move some functions from the StatisticsAutoAnalyzer class to TableIf, so that external tables and internal tables can implement the logic separately.
3. Disable auto analyze for external catalogs by default; it can be enabled by adding the catalog property "enable.auto.analyze"="true".
Before, `show column stats` ignored columns with errors.
With this PR, when the min or max value fails to deserialize, `show column stats` uses N/A as the min or max value and still shows the remaining stats (count, null_count, ndv, and so on).
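The stats code lives in the Java FE; this hedged C++ sketch only illustrates the fallback behavior, with both helper names (deserialize_bound, bound_or_na) invented:

```cpp
#include <optional>
#include <string>

// Hypothetical deserializer: returns std::nullopt when the stored literal
// is corrupt (a stand-in for the real min/max decoding).
std::optional<std::string> deserialize_bound(const std::string& raw) {
    if (raw.empty()) return std::nullopt;  // simulate a decode failure
    return raw;
}

// Fall back to "N/A" for a broken min/max instead of dropping the whole
// column; the remaining stats (count, null_count, ndv, ...) stay visible.
std::string bound_or_na(const std::string& raw) {
    auto v = deserialize_bound(raw);
    return v.has_value() ? *v : std::string("N/A");
}
```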
Fix for PR #23582.
Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back.
Fix the Mac build failure caused by a wrong Thrift declaration order.
1. "set" may overwrite the original ID.
2.A bloom filter may not necessarily be an IN_OR_BLOOM_FILTER.
before may be
RuntimeFilterInfo id -1: [type = BF, input = 25, filtered = 0]
now
RuntimeFilterInfo id 0: [type = BF, input = 25, filtered = 0]
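For point 1, assuming the per-filter stats are tracked in an associative container (an assumption about the bookkeeping, not a reading of the actual code), the overwrite pitfall and one way around it look like this:

```cpp
#include <cstdint>
#include <map>

// Register a runtime filter's stats under its id. Plain assignment via
// operator[] would clobber an entry that is already present; emplace()
// keeps the original entry and lets us merge instead.
void register_filter(std::map<int32_t, int64_t>& input_rows_by_id,
                     int32_t filter_id, int64_t input_rows) {
    auto [it, inserted] = input_rows_by_id.emplace(filter_id, input_rows);
    if (!inserted) {
        it->second += input_rows;  // merge, do not overwrite the original id
    }
}
```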
Current initialization dependency:
```
Daemon ─────────┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
BackendService ─┘
```
However, the original code incorrectly initialized Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their dtors, so that daemon services release resources in reverse order of initialization via RAII.
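A minimal sketch of the RAII pattern described, with an invented DaemonService class: the destructor stops and joins the worker thread, so simply destroying the services in reverse order of construction releases their resources.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Invented daemon service: the destructor stops and joins the worker, so
// destroying the object is all it takes to release the thread (RAII).
class DaemonService {
public:
    DaemonService() : _worker([this] { run(); }) {}

    ~DaemonService() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();  // no worker thread outlives its service
        }
    }

private:
    void run() {
        while (!_stop.load()) {
            // ... periodic daemon work ...
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
    }

    std::atomic<bool> _stop{false};  // declared before _worker on purpose
    std::thread _worker;             // destroyed first, after being joined
};
```

Holding such services as members declared in initialization order then yields reverse-order teardown for free, since C++ destroys members bottom-up.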
1. When the file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf), the parquet meta info is owned by the parquet reader and released when `reader->close()` is called. But the underlying file reader of this parquet reader is released after `reader->close()`, which may cause a `heap-use-after-free` bug because parts of the meta info may still be referenced by the file reader. This PR fixes it by making sure the meta info is released after the file reader is released (see the sketch after this list).
2. Add the modification time to the file meta cache in BE, to avoid parquet read errors like `Failed to deserialize parquet page header`.
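A sketch of the ownership fix for item 1, under invented names: the file reader must be released before the meta info it may still reference, whether by explicit reset order in close() or by member declaration order.

```cpp
#include <memory>

struct FileMetaData { /* parsed parquet footer (invented stand-in) */ };
struct FileReader {
    FileMetaData* meta = nullptr;  // may point into the meta's memory
};

class ParquetReader {
public:
    void close() {
        // Release in dependency order: the file reader may still reference
        // parts of the meta info, so the reader has to go first.
        _file_reader.reset();
        _meta.reset();
    }

private:
    // Declaration order matters too: members are destroyed in reverse, so
    // _file_reader (declared last) dies before _meta even without close().
    std::unique_ptr<FileMetaData> _meta;
    std::unique_ptr<FileReader> _file_reader;
};
```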
1. add the scalar subquery's output to LogicalApply's output
2. for in and exists subqueries, add the mark join slot to LogicalApply's output
3. forbid pushing down an alias through a join if the project list contains any mark join slots
4. move the normalize aggregate rule to the analysis phase
The reason is that the SQL cache only uses partitionKey, latestVersion, and latestTime to check whether the cached result can be returned. If we delete some partition(s) that are not the latest updated partition, none of these values change, so the cache still hits.
Use a field to save the partition num of these tables, sum the partition nums, and send the sum to BE (see the sketch after this list). There are two situations that involve delete-partition ops:
- just deleting some partition(s): the sum of partition nums will be lower than before.
- deleting some partition(s) while also adding some partition(s): the latest time or latest version will be higher than before.
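The SQL cache lives in the Java FE; this hypothetical C++ struct only illustrates why the extra field closes the gap: a pure deletion now changes the key even though latestVersion and latestTime do not move. All field names are invented.

```cpp
#include <cstdint>

// Hypothetical cache-validity key. Before the fix it held only the first
// three fields, so deleting a non-latest partition changed none of them
// and the stale cache entry still matched.
struct SqlCacheKey {
    int64_t partition_key;
    int64_t latest_version;
    int64_t latest_time;
    int64_t sum_partition_num;  // new: total partitions across the tables

    bool operator==(const SqlCacheKey& o) const {
        return partition_key == o.partition_key &&
               latest_version == o.latest_version &&
               latest_time == o.latest_time &&
               sum_partition_num == o.sum_partition_num;
    }
};
```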
If resolving an inline view column fails, we try to resolve it again by removing the table name. But that is wrong if the table name (which may be the inline view's alias) is the same as the name of some table inside the inline view. So this PR checks the table name and only removes it when no table inside the inline view has the same name as the column's table name.
For example:
```sql
WITH A AS (SELECT * FROM B)
SELECT * FROM C
UNION
SELECT * FROM D
```
The scope of the CTE in Nereids is the first set operand.
The scope of the CTE in the legacy planner is the whole statement.
log error:
`W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar`
When the JDBC driver had `.` in its name, we failed to split it properly.
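One way to make the split robust (whether the actual fix does exactly this is an assumption, and the helper name is invented): treat the first two and the last dot-separated parts as fixed, and rejoin whatever sits between them as the file name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split "function_id.checksum[.file_name].file_type" without assuming the
// file name itself is dot-free, e.g. "postgresql-42.5.0.jar".
bool parse_function_name(const std::string& s, std::string& function_id,
                         std::string& checksum, std::string& file_name,
                         std::string& file_type) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string part;
    while (std::getline(ss, part, '.')) parts.push_back(part);
    if (parts.size() < 3) return false;  // need at least id.checksum.type

    function_id = parts[0];
    checksum = parts[1];
    file_type = parts.back();
    // Everything between checksum and file_type is the (optional) file
    // name; rejoin it with the dots that the split removed.
    file_name.clear();
    for (size_t i = 2; i + 1 < parts.size(); ++i) {
        if (!file_name.empty()) file_name += '.';
        file_name += parts[i];
    }
    return true;
}
```

On the name from the log above, this yields file_name `postgresql-42.5.0` and file_type `jar` instead of failing the part-count check.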