Commit Graph

470 Commits

Author SHA1 Message Date
e396f853a0 Pick "[enhance](Cooldown) Use config to control whether to use cooldown replica for scanning first" (#38322)
## Proposed changes

Same as master #37492
2024-07-25 12:17:38 +08:00
81a7542cae [pick]Add audit log event queue size limit (#37914)
## Proposed changes
pick #37786
2024-07-16 19:00:22 +08:00
63c2d22513 [cherry-pick](branch-2.1) Pick "[Fix](delete command) Mark delete sign when do delete command in MoW table (#35917)" (#37594)
Pick #35917 and #37151
2024-07-15 18:54:01 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
- [regression](kerberos) fix regression pipeline env when write hosts (#37057)
- [regression](kerberos) add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
259d28407e [improvement](statistics)Enable estimate hive table row count using file size. (#37218) (#37694)
backport: https://github.com/apache/doris/pull/37218
2024-07-12 13:47:27 +08:00
6214d6421f [Fix](planner) fix bug of char(255) toSql (#37340) (#37671)
cherry-pick #37340 from master
2024-07-12 10:33:24 +08:00
dd18652861 [branch-2.1](routine-load) make get Kafka meta timeout configurable (#37399)
pick #36619
2024-07-08 10:39:17 +08:00
d08a418dd8 [branch-2.1](routine-load) optimize routine load job auto resume policy (#37373)
pick #35266
2024-07-07 18:16:56 +08:00
b3eaf0e4d2 [bugfix](hive)Prevent multiple fs from being generated for 2.1 (#37142)
pick #36954
2024-07-02 22:54:40 +08:00
e25717458e [opt](catalog) add some profile for parquet reader and change meta cache config (#37040) (#37146)
bp #37040
2024-07-02 20:58:43 +08:00
3f382b797a [branch-2.1][improvement](sqlserver catalog) Configurable whether to use encryption when connecting to SQL Server using the catalog (#36971)
pick #36659
pick #37015
In previous versions, we used Druid as the default JDBC connection pool, which could use custom decryption to parse the certificate when SQL Server encryption is turned on. However, after switching to HikariCP as the default connection pool in the new version, the SQL Server certificate can no longer be parsed, so encryption needs to be turned off for normal use. Therefore, a parameter is added to decide whether to disable SQL Server encryption. It is not disabled by default.
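
As a hedged illustration (the exact property name added by this PR is not quoted here), encryption can also be disabled through the standard SQL Server JDBC driver URL parameter when creating the catalog:

```sql
-- Hedged sketch: catalog name, credentials, and driver JAR are illustrative;
-- encrypt=false is the standard Microsoft JDBC driver parameter, not the
-- dedicated property introduced by this PR.
CREATE CATALOG sqlserver_demo PROPERTIES (
    "type" = "jdbc",
    "user" = "sa",
    "password" = "***",
    "jdbc_url" = "jdbc:sqlserver://127.0.0.1:1433;databaseName=demo;encrypt=false",
    "driver_url" = "mssql-jdbc-11.2.3.jre8.jar",
    "driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
```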
2024-07-02 10:14:43 +08:00
22cb7b8fcb [improvement](compaction) BE does not compact invisible versions to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
58cc1dca7f [improve](fe) Support to config max msg/frame size of the thrift server (#36594)
Cherry-pick #35845
2024-06-21 00:15:15 +08:00
74162a1b7e [enhancement](prepared statement) Handle unsigned numeric type in prepare statement (#36388)
## Proposed changes

Issue Number: bp #36133

2024-06-18 19:33:12 +08:00
7c0ec4ea2e [fix](autobucket) fix autobucket config masterOnly=true #36116 (#36286)
cherry pick from #36116
2024-06-14 14:26:23 +08:00
9708ca8fcb [Feature](Prepared Statement) Implement in nereids planner (#35318) (#36172) 2024-06-12 19:54:17 +08:00
b5a35b9cef [FIX] Pick array inverted index bugfix (#35837)
Picks several inverted-index bugfixes for the array type; see also:
https://github.com/apache/doris/pull/34766
https://github.com/apache/doris/pull/35086
https://github.com/apache/doris/pull/34683
https://github.com/apache/doris/pull/34076
2024-06-06 09:54:14 +08:00
5c8f87e01e [opt](log) refine the FE logger (#35679)
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.

This PR made the following changes:

1. Modified the log4j configuration template

- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:

  - `StdoutLogger`: logs for standard output
  - `StderrLogger`: logs for standard error output
  - `RuntimeLogger`: logs for fe.log or fe.warn.log
  - `AuditLogger`: logs for fe.audit.log
  - No prefix: logs for fe.gc.log

  Examples are as follows:

  ```
  RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62) [BinlogManager.gc():359] begin gc binlog
  ```

2. Added a new FE config: `enable_file_logger`

Defaults to true, meaning logs are written to files regardless of how FE is started. For example, when started with `--console`, logs go to both the files and the standard output. If set to `false`, logs are not written to files regardless of the startup method (see the sketch at the end of this entry).

3. Optimized the log format of standard output

The byte streams of stdout and stderr are captured, so logs previously written with `System.out` are now captured in fe.log for unified management.
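
A minimal sketch of turning the new switch off, assuming `enable_file_logger` is a runtime-mutable FE config; if it is not, set it in fe.conf before startup:

```sql
-- Hedged sketch: assumes enable_file_logger can be changed at runtime.
ADMIN SET FRONTEND CONFIG ("enable_file_logger" = "false");
```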
2024-06-04 18:20:30 +08:00
f94222a04e [fix](log) Support fe log rollover size strategy (#34446) 2024-06-04 18:18:16 +08:00
db3bbc2437 [feature](merge-cloud) Change fe log rolling max size (#32777) 2024-06-04 18:17:33 +08:00
bc6b316e87 [chore](index) add config enable_create_bitmap_index_as_inverted_index default true #33434 (#35521) 2024-06-04 12:07:03 +08:00
4f0365e0bf [fix](s3) move s3 providers to fe-common to be accessible for jni reader (#35779)
backport: #35690

`PropertyConverter.setS3FsAccess` adds customized S3 providers:
```
public static final List<String> AWS_CREDENTIALS_PROVIDERS = Arrays.asList(
            DataLakeAWSCredentialsProvider.class.getName(),
            TemporaryAWSCredentialsProvider.class.getName(),
            SimpleAWSCredentialsProvider.class.getName(),
            EnvironmentVariableCredentialsProvider.class.getName(),
            IAMInstanceCredentialsProvider.class.getName());
```
These providers are set as the value of `fs.s3a.aws.credentials.provider`, which is used when building the S3 reader in JNI readers. However, `DataLakeAWSCredentialsProvider` lives in `fe-core`, which JNI readers do not depend on, so we have to move the S3 providers to `fe-common`.
2024-06-03 14:04:39 +08:00
d83c714824 [branch-2.1](routine-load) adjusting the default configuration of routine load (#35753)
#34898
2024-06-01 11:22:21 +08:00
fd23386ec5 [fix](auth)fix simple auth check and default username (#35620)
Fix the simple auth check and the default username: simple auth should be treated as valid by default, and whether to set the default username should be decided in `loginWithUGI`.
2024-05-30 19:59:37 +08:00
b0e2461181 [branch-2.1][improvement](JdbcScan) Change the mysql function that does not support pushdown in JdbcScan to Config (#35631)
pick #35196
2024-05-30 15:40:08 +08:00
bddaeb9261 [Fix](JobSchedule) Modify the default value of async_task_consumer_thread_num (#35456)
When `Export` statements are executed concurrently, the background uses
`Job schedule` to manage export tasks. Previously, the default value of
`async_task_consumer_thread_num` was 5, meaning that regardless of the
concurrency setting, a maximum of only 5 threads could execute
concurrently.

On the other hand, `Export` is not the only user of `Job schedule`; other scheduled tasks may use it as well, which can lead to a shortage of thread resources.

Now, we have found that in many scenarios `Export` needs a high concurrency setting and should run at that concurrency. Clearly, `async_task_consumer_thread_num = 5` is no longer sufficient, so we have changed the default value of `async_task_consumer_thread_num` to 64.
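
A hedged sketch of overriding the new default, assuming the config is runtime-mutable (otherwise set it in fe.conf and restart):

```sql
-- Hedged sketch: raise the Job schedule worker pool above the new default
-- of 64; assumes async_task_consumer_thread_num is runtime-mutable.
ADMIN SET FRONTEND CONFIG ("async_task_consumer_thread_num" = "128");
```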
2024-05-28 18:54:06 +08:00
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is of AGG_STATE type, the `desc tbl` statement does not show a clearly defined data type.

```sql
create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
```

Before the optimization:

```
mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)
```

After the optimization:

```
mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+
```


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00
5012ddd87a [fix](Nereids) fix sql cache return old value when truncate partition (#34698)
1. fix the SQL cache returning stale values after a partition is truncated
2. use `expire_sql_cache_in_fe_second` to control the expiration time of SQL cache entries in the `NereidsSqlCacheManager`
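
A hedged sketch of adjusting the expiration time, assuming `expire_sql_cache_in_fe_second` is exposed as a session variable (if it is an FE config instead, it would be set via `ADMIN SET FRONTEND CONFIG`):

```sql
-- Hedged sketch: assumes expire_sql_cache_in_fe_second is a session variable.
SET expire_sql_cache_in_fe_second = 300;
```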
2024-05-18 18:05:31 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param (#32282) (#34896) 2024-05-15 12:42:02 +08:00
f9c42f34dd [fix](auth)Compatible with previously enabled ldap configuration (#34891) 2024-05-15 12:36:47 +08:00
cadbbdd2c0 [fix](config) for compatibility issue of log dir config (#34734)
2024-05-12 09:44:50 +08:00
ec34bc0386 [bug](config) Fix modifying label_num_threshold does not take effect (#34575) 2024-05-10 22:12:17 +08:00
9a94681b29 [refactor](type) AggStateType should not extends ScalarType (#34463)
1. let AggStateType extend Type
2. remove the unused interfaces isFixedLengthType and supportsTablePartitioning
3. let MapType implement the isSupported interface
4. let VariantType extend ScalarType
2024-05-10 22:10:42 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
6c11dd2231 [Fix](planner) fix ScalarType.getAssignmentCompatibleType() when deal boolean and decimal (#34435)
The legacy planner encounters issues when handling filters such as c1 (boolean type) = 0.0 (decimalv3). The literal 0.0 is interpreted as decimalv3(1,1), and the boolean column c1 is coerced to decimalv3(1,1). decimalv3(1,1) can only hold values in the range [0, 1), while the boolean true is represented as 1, exceeding the upper bound and causing an overflow. This pull request addresses the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimalv3(2,1).
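
A hedged sketch of the pattern this fixes (table and column names are illustrative):

```sql
-- Hedged sketch: c1 is a BOOLEAN column; the literal is DECIMALV3(1,1).
-- Before the fix, true (1) coerced to DECIMALV3(1,1) overflowed; after
-- the fix both sides are cast to DECIMALV3(2,1).
SELECT * FROM t WHERE c1 = 0.0;
```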


Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-10 22:07:16 +08:00
07207b7b51 [feature](shuffle) enable strict consistency dml by default (#32958) (#34641) 2024-05-10 14:31:50 +08:00
3ae3f9d6e1 [opt](catalog) support using loading cache for db/table list in external catalog (#33610) (#34596)
bp #33610
2024-05-09 17:50:39 +08:00
8fa1b78d7b Revert "[feature](shuffle) enable strict consistency dml by default (#32958)"
This reverts commit 400105a92182755bdd95a58a7d378d67c6b27f51.
2024-05-08 23:00:46 +08:00
400105a921 [feature](shuffle) enable strict consistency dml by default (#32958) 2024-05-08 11:00:14 +08:00
182177def0 [Improve](config) The stream_load label length is changed to be configurable (#34459)
pick from #33745
2024-05-07 20:43:16 +08:00
8fdfbcb3c4 Revert "[Opt](func) opt the percentile func performance (#34373) (#34416)"
This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.
2024-05-07 07:23:48 +08:00
2d4da7d177 [fix](kerberos)enable hadoop auto renew tgt (#34439) 2024-05-07 00:36:20 +08:00
509ae425e4 [Opt](func) opt the percentile func performance (#34373) (#34416) 2024-05-06 20:10:35 +08:00
91887a285e Implement HLL with 128 buckets to support statistics cache. (#34124) 2024-04-26 15:05:36 +08:00
2a1fbfd72c [feat](fe) Add ignore_bdbje_log_checksum_read for BDBEnvironment (#31247)
* https://forums.oracle.com/ords/apexds/post/je-log-checksumexception-2812

* When disk damage or another cause described in the Oracle forums post prevents FE from starting due to `com.sleepycat.je.log.ChecksumException`, the new param `ignore_bdbje_log_checksum_read` (set in fe.conf before restarting) makes FE ignore the exception, but there is no guarantee of correctness for the BDBJE KV data

Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-04-22 22:33:24 +08:00
88b3d61eca [refactor](Mysql) Refactoring the process of using external components to authenticate in MySQL connections (#32875) (#33958)
bp #32875

Co-authored-by: LompleZ Liu <47652868+LompleZ@users.noreply.github.com>
2024-04-22 16:41:49 +08:00
15f8014e4e [enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867)
* [enhancement](Nereids) Enable parse sql from sql cache (#33262)

Before this PR, a query had to pass through the parser, analyzer, rewriter, optimizer, and translator before we could check whether it can use the SQL cache; if the query is long or joins many tables, the plan time is often >= 500 ms.

This PR reduces that time by skipping the usual plan path, because we can reuse the previous physical plan and query result when nothing has changed. In some cases we must not serve SQL from the cache, e.g. when the table structure changed, data changed, user policies changed, privileges changed, the query contains non-deterministic functions, or user variables changed.

In my test case (querying a view with many joins and unions, over tables with empty partitions), the query latency is about 3 ms; without parsing SQL from the cache, the plan time is about 550 ms.

## Features
1. use `Config.sql_cache_manage_num` to control how many SQL cache entries are reused on one FE
2. if the explain output contains a `LogicalSqlCache` or `PhysicalSqlCache` plan, the query can use the SQL cache, like this:
```sql
mysql> set enable_sql_cache=true;
Query OK, 0 rows affected (0.00 sec)

mysql> explain physical plan select * from test.t;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                  |
+----------------------------------------------------------------------------------+
| cost = 3.135                                                                     |
| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] )                              |
| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
|    +--PhysicalOlapScan[t]@0 ( stats=3 )                                          |
+----------------------------------------------------------------------------------+
4 rows in set (0.02 sec)

mysql> select * from test.t;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    2 |
|   -2 |   -2 |
| NULL |   30 |
+------+------+
3 rows in set (0.05 sec)

mysql> explain physical plan select * from test.t;
+-------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                           |
+-------------------------------------------------------------------------------------------+
| cost = 0.0                                                                                |
| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) |
| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] )                                    |
|    +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather )       |
|       +--PhysicalOlapScan[t]@0 ( stats=3 )                                                |
+-------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
```

(cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20)

* fix

* [fix](Nereids) fix some sql cache consistency bugs between multiple frontends (#33722)

Fix some SQL cache consistency bugs between multiple frontends, introduced by "[enhancement](Nereids) Enable parse sql from sql cache" (#33262); fixed by using row policies as part of the SQL cache key. Also support dynamically updating the number of SQL cache keys each FE manages.

(cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30)

* [fix](Nereids) fix bug of dry run query with sql cache (#33799)

1. dry-run queries should not use the SQL cache
2. fix the SQL cache test in cloud mode
3. enable caching OneRowRelation and EmptyRelation in the frontend to skip parsing SQL

(cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe)

* remove cloud mode

* remove @NotNull
2024-04-19 15:22:14 +08:00
ad75b9b142 [opt](auto bucket) add fe config autobucket_max_buckets (#33842) 2024-04-19 15:03:06 +08:00
6776a3ad1b [Fix](planner) fix create view star except and modify cast to sql (#33726) 2024-04-19 15:02:49 +08:00
56b7839447 [feature](backup) ignore table that not support type when backup, and… (#33158)
* [feature](backup) ignore tables whose type is not supported during backup, and do not report an exception

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

* fix

Signed-off-by: nextdreamblue <zxw520blue1@163.com>

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
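
A hedged sketch of the affected operation (repository, database, and snapshot names are illustrative): after this change, tables of unsupported types inside the backup scope are skipped instead of failing the job.

```sql
-- Hedged sketch: back up a database to an existing repository; tables of
-- unsupported types are now skipped rather than raising an exception.
BACKUP SNAPSHOT example_db.snapshot_2024
TO example_repo;
```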
2024-04-17 23:42:11 +08:00