Commit Graph

5440 Commits

Author SHA1 Message Date
9d2fc78bd5 [fix](cooldown) Fix potential data loss when clone task's dst tablet is cooldown replica (#17644)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2023-09-01 15:27:52 +08:00
91c5640cae [fix](tablet clone) fix clone backend chose wrong disk (#23729) 2023-09-01 15:12:35 +08:00
Pxl
32853a529c [Bug](cte) fix multi cast data stream source not open expr (#23740)
2023-09-01 14:57:12 +08:00
eaf2a6a80e [fix](date) return the right date value even if it is out of the range of the date dictionary (#23664)
PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, Hive supports dates outside 1970~2038, leading to wrong date values in the TPC-DS benchmark.
How to fix (a sketch follows the list):
1. Increase the dictionary range to 1900 ~ 2038.
2. Dates outside 1900 ~ 2038 are regenerated.
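A minimal sketch of the lookup-or-fallback idea, using a hypothetical `resolve_date` helper (the names and bounds are illustrative, not the actual Doris code): days inside the 1900~2038 window would hit the precomputed dictionary, anything outside is regenerated arithmetically so the value stays correct.
```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch (not the actual Doris code).
struct DateValue { int year, month, day; };

// Howard Hinnant's civil_from_days algorithm, correct for any day count.
DateValue compute_from_days(int64_t z) {
    z += 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const int64_t doe = z - era * 146097;
    const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const int64_t y = yoe + era * 400;
    const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const int64_t mp = (5 * doy + 2) / 153;
    const int64_t d = doy - (153 * mp + 2) / 5 + 1;
    const int64_t m = mp < 10 ? mp + 3 : mp - 9;
    return {static_cast<int>(y + (m <= 2)), static_cast<int>(m), static_cast<int>(d)};
}

DateValue resolve_date(int64_t days_since_epoch) {
    // Illustrative bounds: roughly 1900-01-01 and 2038-12-31 relative to 1970-01-01.
    constexpr int64_t kDictFirstDay = -25567;
    constexpr int64_t kDictLastDay = 25202;
    if (days_since_epoch >= kDictFirstDay && days_since_epoch <= kDictLastDay) {
        // In the real code this would be a precomputed dictionary lookup; the
        // sketch reuses the arithmetic path to stay self-contained.
        return compute_from_days(days_since_epoch);
    }
    // Dates outside 1900~2038 are regenerated instead of returning a wrong value.
    return compute_from_days(days_since_epoch);
}

int main() {
    DateValue v = resolve_date(-30000);  // a date well before 1900
    std::printf("%04d-%02d-%02d\n", v.year, v.month, v.day);
    return 0;
}
```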
2023-09-01 14:40:20 +08:00
c31cb5fd11 [enhance] use correct default value for show config action (#19284) 2023-09-01 11:28:26 +08:00
e1090d6a63 [Fix](column predicate) separate CHAR primitive type for column predicate (#23581) 2023-09-01 09:41:53 +08:00
hzq
16d6357266 [fix] (mac compile) Fix mac compile error & FE start time related issue (#23727)
Fix of PR #23582

Some FE code was deleted by [Improvement](pipeline) Cancel outdated query if original fe restarts #23582 and needs to be added back;
Fix mac build failure caused by wrong thrift declaration order.
2023-09-01 08:02:30 +08:00
65f41f71c1 [pipelineX](refactor) refine codes (#23726) 2023-09-01 07:57:35 +08:00
c74ca15753 [pipeline](sink) Support Async Writer Sink of result file sink and memory scratch sink (#23589) 2023-08-31 22:44:25 +08:00
e680d42fe7 [feature](information_schema) add metadata_name_ids for quickly getting catalogs, dbs and tables, and add a profiling table to be compatible with MySQL (#22702)
Add information_schema.metadata_name_ids for quickly getting catalogs, databases, and tables.

1. Table structure:
```mysql
mysql> desc  internal.information_schema.metadata_name_ids;
+---------------+--------------+------+-------+---------+-------+
| Field         | Type         | Null | Key   | Default | Extra |
+---------------+--------------+------+-------+---------+-------+
| CATALOG_ID    | BIGINT       | Yes  | false | NULL    |       |
| CATALOG_NAME  | VARCHAR(512) | Yes  | false | NULL    |       |
| DATABASE_ID   | BIGINT       | Yes  | false | NULL    |       |
| DATABASE_NAME | VARCHAR(64)  | Yes  | false | NULL    |       |
| TABLE_ID      | BIGINT       | Yes  | false | NULL    |       |
| TABLE_NAME    | VARCHAR(64)  | Yes  | false | NULL    |       |
+---------------+--------------+------+-------+---------+-------+
6 rows in set (0.00 sec) 


mysql> select * from internal.information_schema.metadata_name_ids where CATALOG_NAME="hive1" limit 1 \G;
*************************** 1. row ***************************
   CATALOG_ID: 113008
 CATALOG_NAME: hive1
  DATABASE_ID: 113042
DATABASE_NAME: ssb1_parquet
     TABLE_ID: 114009
   TABLE_NAME: dates
1 row in set (0.07 sec)
```

2. When you create or drop a catalog, there is no need to refresh the catalog.
```mysql
mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.34 sec)


mysql> drop catalog hive2;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 


mysql> create catalog hive3 ... 
mysql> select count(*) from internal.information_schema.metadata_name_ids\G;                                                                        
*************************** 1. row ***************************
count(*): 21301
1 row in set (0.32 sec)
```

3. When you create or drop a table, there is no need to refresh the catalog.
```mysql
mysql> CREATE TABLE IF NOT EXISTS demo.example_tbl ... ;


mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10666
1 row in set (0.04 sec)

mysql> drop table demo.example_tbl;
Query OK, 0 rows affected (0.01 sec)

mysql> select count(*) from internal.information_schema.metadata_name_ids\G; 
*************************** 1. row ***************************
count(*): 10665
1 row in set (0.04 sec) 

```

4. You can set a query timeout to prevent queries from taking too long.
```
fe.conf: query_metadata_name_ids_timeout
the time allowed for obtaining all tables in one database
```
5. Add information_schema.profiling to be compatible with MySQL.

```mysql
mysql> select * from information_schema.profiling;
Empty set (0.07 sec)

mysql> set profiling=1;                                                                                 
Query OK, 0 rows affected (0.01 sec)
```
2023-08-31 21:22:26 +08:00
6fe2418cfc [fix](filter) fix error id in bloomfilter (#23564)
1. "set" may overwrite the original ID.
2.A bloom filter may not necessarily be an IN_OR_BLOOM_FILTER.

before may be
RuntimeFilterInfo  id  -1:  [type  =  BF,  input  =  25,  filtered  =  0]
now 
 RuntimeFilterInfo  id  0:  [type  =  BF,  input  =  25,  filtered  =  0]
2023-08-31 21:12:09 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, the original code incorrectly initialized Daemon before StorageEngine.
This PR also stops and joins the threads of daemon services in their destructors, to ensure daemon services release resources in reverse order of initialization via RAII.
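A minimal sketch of that RAII pattern, using a hypothetical DaemonService class (illustrative, not the actual Doris classes): each service owns its worker thread and stops and joins it in its destructor, so constructing services in dependency order tears them down in reverse order automatically.
```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical sketch (not the actual Doris daemon code).
class DaemonService {
public:
    explicit DaemonService(const char* name) : _name(name) {
        _thread = std::thread([this] {
            while (!_stop.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
        });
    }
    ~DaemonService() {
        _stop.store(true);       // stop ...
        if (_thread.joinable()) {
            _thread.join();      // ... and join in the destructor (RAII)
        }
        std::printf("%s stopped\n", _name);
    }

private:
    const char* _name;
    std::atomic<bool> _stop{false};
    std::thread _thread;
};

int main() {
    // Construct in dependency order: StorageEngine first, then the Daemon that
    // depends on it. Destructors run in reverse, so Daemon stops before StorageEngine.
    DaemonService storage_engine("StorageEngine");
    DaemonService daemon("Daemon");
    return 0;
}
```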
2023-08-31 19:46:38 +08:00
b3a9c247af [refactor](move-memtable) add load stream stub (#23642) 2023-08-31 19:39:34 +08:00
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Dynamically modify the config to clear the cache; each time the cache is disabled, it will only be cleared once.
TODO: support the page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
3a2c0d16f7 [fix](parquet) fix potential heap-use-after-free issue and cache issue (#23638)
1. When the file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf),
the parquet meta info is owned by the parquet reader and is released when calling `reader->close()`.

But the underlying file reader of this parquet reader is released after `reader->close()`,
which may cause a `heap-use-after-free` bug because part of the meta info may still be referenced by the file reader.

This PR fixes it by making sure the meta info is released after the file reader is released (see the sketch below).

2. Add a modification time for the file meta cache in BE, to avoid parquet read errors like:
`Failed to deserialize parquet page header`
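A minimal sketch of the release-order fix, using hypothetical FileMetaData/FileReader/ParquetReader types (illustrative, not the actual Doris reader classes): the file reader may hold non-owning references into the meta, so the meta must be destroyed last.
```cpp
#include <memory>
#include <string>

// Hypothetical sketch (not the actual Doris classes).
struct FileMetaData {
    std::string schema;  // some part of this may be referenced by the file reader
};

struct FileReader {
    // Non-owning pointer back into the meta: if the meta is freed first,
    // touching this is a heap-use-after-free.
    const FileMetaData* meta = nullptr;
};

class ParquetReader {
public:
    void open() {
        _meta = std::make_unique<FileMetaData>();
        _file_reader = std::make_unique<FileReader>();
        _file_reader->meta = _meta.get();
    }
    void close() {
        // Fix: release the file reader first, then the meta it may reference.
        _file_reader.reset();
        _meta.reset();
    }

private:
    // Declaration order matters: members are destroyed in reverse declaration
    // order, so the default destructor also tears down _file_reader before _meta.
    std::unique_ptr<FileMetaData> _meta;
    std::unique_ptr<FileReader> _file_reader;
};

int main() {
    ParquetReader reader;
    reader.open();
    reader.close();
    return 0;
}
```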
2023-08-31 18:23:05 +08:00
hzq
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If any FE restarts, queries emitted from this FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
cb2515b7c8 [Fix](meta lock) Should not acquire wlock twice (#23666) 2023-08-31 15:53:35 +08:00
62c075bf7e [improvement](Block) Replace Block(const PBlock&) with deserialize because it has heavy operations in ctor (#23672) 2023-08-31 14:44:17 +08:00
409640ac46 [Bug](decimal) Prevent invalid decimal value (#23677) 2023-08-31 14:43:10 +08:00
126606cb4d [Fix](cache) fix query cache returns wrong result after deleting partitions. (#23555)
The reason is that the SQL cache only uses partitionKey, latestVersion and latestTime to check whether the cached result should be returned. If we delete some partition(s) other than the latest updated partition, none of these values change, so the cache still hits.
Use a field to save the partition num of these tables, sum the partition nums, and send the sum to BE (see the sketch below). There are two situations that involve delete-partition ops:

- just deleting some partition(s): the sum of partition nums will be lower than before.
- deleting some partition(s) together with adding some partition(s): the latest time or latest version will be higher than before.
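A minimal sketch of the extended cache check, using a hypothetical SqlCacheKey struct (illustrative, not the actual Doris cache code): adding the summed partition count makes a partition-only deletion change the key even when version and time stay the same.
```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical sketch (not the actual Doris cache code): before the fix the key
// only covered version/time, so deleting a non-latest partition left the key
// unchanged and the stale cache still hit.
struct SqlCacheKey {
    int64_t partition_key;
    int64_t latest_version;
    int64_t latest_time;
    int64_t sum_of_partition_num;  // added field: summed partition count of the tables

    bool operator==(const SqlCacheKey& other) const {
        return std::tie(partition_key, latest_version, latest_time, sum_of_partition_num) ==
               std::tie(other.partition_key, other.latest_version, other.latest_time,
                        other.sum_of_partition_num);
    }
};

// The cached result is only returned when every field still matches.
bool cache_still_valid(const SqlCacheKey& cached, const SqlCacheKey& current) {
    return cached == current;
}

int main() {
    SqlCacheKey cached{1, 10, 100, 8};
    SqlCacheKey after_drop{1, 10, 100, 7};  // dropped a non-latest partition
    assert(!cache_still_valid(cached, after_drop));
    return 0;
}
```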
2023-08-31 14:22:52 +08:00
46eb0c7796 [Fix](status) fix printing too many logs in VNodeChannel::try_send_and_fetch_status #23693
After #23425, Status::InternalError(...) prints a stacktrace and warning logs, so we can't use it in VNodeChannel::try_send_and_fetch_status.
2023-08-31 13:54:23 +08:00
d22290e548 [pipelineX](join) support hash join (#23689) 2023-08-31 13:01:26 +08:00
Pxl
f35ab37e1e [Bug](materialized-view) fix load db using analyzer to analyze different metaindex (#23673)
2023-08-31 12:35:38 +08:00
3e4ee3c1e6 [fix](jdbc catalog) fix jdbc driver cache load error (#23656)
log error:
`W20230830 11:19:47.495721 3046231 status.h:363] meet error status: [INTERNAL_ERROR]user function's name should be function_id.checksum[.file_name].file_type, now the all split parts are by delimiter(.): 7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar`

When the JDBC driver has `.` in its file name, we failed to split it properly (see the sketch below).
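A minimal sketch of a split that tolerates dots in the file name, assuming the `function_id.checksum[.file_name].file_type` layout from the error message (illustrative, not the actual Doris parsing code): fix the first two parts and the last part, and keep everything in between, dots included, as the file name.
```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch (not the actual Doris code): split
// "function_id.checksum[.file_name].file_type" without breaking on driver file
// names that contain '.', e.g. "postgresql-42.5.0.jar".
struct ParsedName {
    std::string function_id;
    std::string checksum;
    std::string file_name;  // optional, may itself contain '.'
    std::string file_type;
};

bool parse_driver_name(const std::string& s, ParsedName* out) {
    const size_t first = s.find('.');
    if (first == std::string::npos) return false;
    const size_t second = s.find('.', first + 1);
    if (second == std::string::npos) return false;
    const size_t last = s.rfind('.');

    out->function_id = s.substr(0, first);
    out->checksum = s.substr(first + 1, second - first - 1);
    if (last == second) {
        out->file_name.clear();  // the optional file_name part is absent
    } else {
        out->file_name = s.substr(second + 1, last - second - 1);  // dots preserved
    }
    out->file_type = s.substr(last + 1);
    return true;
}

int main() {
    ParsedName p;
    const std::string name =
        "7119053928154065546.20c8228267b6c9ce620fddb39467d3eb.postgresql-42.5.0.jar";
    if (parse_driver_name(name, &p)) {
        std::printf("file_name=%s file_type=%s\n", p.file_name.c_str(), p.file_type.c_str());
    }
    return 0;
}
```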
2023-08-31 10:17:15 +08:00
449c595f9d [opt](FileReader) InMemoryReader is only used in s3 (#23486)
If the file size is < 8MB, the file will be read into memory; this idea comes from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md#s3inmemoryinputstream. However, in some cases we only read one or two columns of a file, and the bytes actually required are only 1% of it, resulting in a multi-fold increase in the amount of data read. Therefore, `InMemoryReader` should only be used for object storage, and the threshold is reduced.
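A minimal sketch of the selection rule, with hypothetical enums and a threshold constant (illustrative, not the actual Doris FileReader factory): buffer a file fully in memory only when it lives on object storage and is below a reduced size threshold.
```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch (not the actual Doris factory): prefetch a whole file into
// memory only for object storage and only below a reduced size threshold; other
// storages, or larger files where only a few columns may be read, stay lazy.
enum class StorageType { LOCAL, HDFS, S3 };
enum class ReaderKind { IN_MEMORY, RANGE_READ };

constexpr int64_t kInMemoryThresholdBytes = 1 * 1024 * 1024;  // illustrative value, reduced from 8MB

ReaderKind choose_reader(StorageType type, int64_t file_size) {
    if (type == StorageType::S3 && file_size < kInMemoryThresholdBytes) {
        return ReaderKind::IN_MEMORY;
    }
    return ReaderKind::RANGE_READ;
}

int main() {
    std::printf("s3 small file in memory: %d\n",
                choose_reader(StorageType::S3, 512 * 1024) == ReaderKind::IN_MEMORY);
    std::printf("hdfs small file in memory: %d\n",
                choose_reader(StorageType::HDFS, 512 * 1024) == ReaderKind::IN_MEMORY);
    return 0;
}
```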
2023-08-30 20:43:39 +08:00
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Using stream load in SQL mode

for example:
example.csv

10000,北京
10001,天津
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00
25b8831afd [fix](Outfile) fix core dump when export data to orc file format using outfile (#23586)
* fix

* add test
2023-08-30 19:01:44 +08:00
f7caae08d5 [fix](union) should open/alloc_resource in sink operator instead of source (#23637) 2023-08-30 18:58:59 +08:00
6d41272421 [Opt](pipeline) Refactor the short circuit of join pipeline (#23639)
* [Opt](pipeline) Refactor the short circuit of join pipeline

* change core by cr
2023-08-30 18:44:14 +08:00
14310ad30b [improvement](move-memtable) wait StreamClose from remote (#23605)
* [fix](move-memtable) wait StreamClose from remote
2023-08-30 18:03:36 +08:00
1ce783fb23 [fix](stacktrace) Temporary fix ARM and MacOS stacktrace #23650 2023-08-30 14:51:20 +08:00
942a119881 [bug](java-udf) fix java-udf not returning a const column when all args are const values (#23188)
e.g. udf('asd') needs to return a const column; otherwise, using the returned column as another function's parameter will fail.

mysql> select concat('a', 'b', cuuid9('a'), ':c');
ERROR 1105 (HY000): errCode = 2, detailMessage = (10.16.10.6)[CANCELLED][INTERNAL_ERROR]const check failed, expr=VectorizedFn[VectorizedFnCallconcat]{
VLiteral (name = String, type = String, value = (a)),
VLiteral (name = String, type = String, value = (b)),
VectorizedFn[VectorizedFnCallcuuid9]
{ VLiteral (name = String, type = String, value = (a))},
VLiteral (name = String, type = String, value = (:c))}
2023-08-30 10:46:47 +08:00
e05a0466f2 [improve](Status) Add new status codes KEY_NOT_FOUND and KEY_ALREADY_EXISTS for merge on write (#23619) 2023-08-30 08:50:07 +08:00
d1dbe7bfc8 [fix](reader) fix leak in Level1Iterator (#23612)
_merge_next() and _normal_next() leak _cur_child when _cur_child->next()
returns failure.
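A minimal sketch of the leak and one way to avoid it, using hypothetical iterator types (illustrative, not the actual Doris Level1Iterator): the child iterator must be released on the error path too, for example by owning it through std::unique_ptr.
```cpp
#include <memory>

// Hypothetical sketch (not the actual Doris Level1Iterator): an early return on
// a failed next() used to leak the raw _cur_child pointer; owning the child via
// std::unique_ptr releases it on every path.
struct Status {
    bool ok;
    static Status OK() { return {true}; }
    static Status Error() { return {false}; }
};

struct ChildIterator {
    Status next() { return Status::Error(); }  // simulate a failing child
};

class Level1Iterator {
public:
    Level1Iterator() : _cur_child(std::make_unique<ChildIterator>()) {}

    Status merge_next() {
        Status st = _cur_child->next();
        if (!st.ok) {
            _cur_child.reset();  // previously leaked here when _cur_child was a raw pointer
            return st;
        }
        return Status::OK();
    }

private:
    std::unique_ptr<ChildIterator> _cur_child;
};

int main() {
    Level1Iterator it;
    (void)it.merge_next();
    return 0;
}
```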
2023-08-29 23:32:24 +08:00
030df6db35 [fix](odbc) fix odbc insert string data to sqlserver (#23364) 2023-08-29 21:47:50 +08:00
1ac0ff0ea9 [feature](delete-predicate) support delete sub predicate v2 (#22442)
New structure for the delete sub predicate.
The delete sub predicate currently stores a temporary string condition_str, and fields are extracted from it using std::regex, which may cause a stack overflow when matching an extremely large string (a libc bug).

Now we attempt to use a new PB structure to hold the delete sub predicate, to avoid that problem.

```
message DeleteSubPredicatePB {
    optional int32 column_unique_id = 1;
    optional string column_name = 2;
    optional string op = 3;
    optional string cond_value = 4;
}
```
Currently, both versions of the sub predicate are filled. Queries use v2, while compaction still uses v1. Old rowset meta whose delete predicates only contain sub predicate v1 will be converted to v2 when read from PB; we will also attempt to rewrite these meta with the new delete sub predicate.

This prepares for using the column unique id to identify a column globally.
Using the column unique id rather than the column name to identify a column is vital for flexible schema change. The rewritten delete predicate will attach the column unique id.
2023-08-29 19:37:23 +08:00
94a8fa6bc9 [bug](function) fix explode_number function returning wrong rows (#23603)
Before, the explode_number function result was random with a const value,
because _cur_size is reset, so it can't insert values into the column.
2023-08-29 19:02:49 +08:00
82a4f114e4 [improvement](compaction) add an option on delete stale rowset by judging _stale_rs_metas size when doing compaction (#23448) 2023-08-29 17:40:37 +08:00
1410a15a61 [fix](compaction) print column name when checking block ColumnPtr is nullptr on get block byte (#23338) 2023-08-29 17:24:48 +08:00
0cece561f9 [refactor](segment iterator) remove std::map in iterator, use std::vector instead, and do not rely on unique id to identify position (#23505) 2023-08-29 16:43:32 +08:00
f7a3d2778a [FIX](array) update array olapconvertor and support array nested with other complex types (#23489)
* update array olapconvertor and support array nested other complex type

* update for inverted index
2023-08-29 16:18:11 +08:00
993659cd0b [FIX](serde) fix handle serde error #23565 2023-08-29 14:55:35 +08:00
97eb2b9172 [Fix](multi-catalog) Fix broker load reader and hdfs reader issue. (#23529)
Broker load with a broker sometimes throws 'Invalid orc post script length'.
HDFS queries sometimes throw 'Invalid orc post script length'.
2023-08-29 13:45:48 +08:00
7dcde4d529 [bug](decimal) Use max value as result if overflow (#23602)
* [bug](decimal) Use max value as result if overflow

* update
2023-08-29 13:26:25 +08:00
Pxl
7913354f78 add column number check for vsorted_run_merger (#23584) 2023-08-29 10:41:59 +08:00
0128dd42d9 [fix](regexp_extract_all) fix be OOM when querying with regexp_extrac… (#23284) 2023-08-29 10:34:12 +08:00
da9eb79ac4 [Enhancement](Schema hash) Remove schema hash in tablet info (#23516) 2023-08-29 10:05:12 +08:00
d863cc3a12 [fix](move-memtable) fix tablets to commit (#23577) 2023-08-29 09:49:07 +08:00
9c65b7ab96 [improvement](column_reader) move load once to index reader to reduce (#23537)
memory footprint of column reader
2023-08-29 09:34:27 +08:00
fbf8499999 [improvement](compaction) reduce the memory using on vertical compaction (#23388) 2023-08-28 21:54:21 +08:00