doris

Author	SHA1	Message	Date
Xujian Duan	af7b16f213	[optimize](desc) display the correct data type of aggStateType (#34968 ) If a table column is AGG_STATE type, we can't get the clear defined data type if we use `desc tbl` statement. create table a_table( k1 int null, k2 agg_state<max_by(int not null,int)> generic, k3 agg_state<group_concat(string)> generic ) aggregate key (k1) distributed BY hash(k1) buckets 3 properties("replication_num" = "1"); before optimize: mysql> desc a_table; +-------+------------------------------------------------+------+-------+---------+---------+ \| Field \| Type \| Null \| Key \| Default \| Extra \| +-------+------------------------------------------------+------+-------+---------+---------+ \| k1 \| INT \| Yes \| true \| NULL \| \| \| k2 \| org.apache.doris.catalog.AggStateType@239f771c \| No \| false \| NULL \| GENERIC \| \| k3 \| org.apache.doris.catalog.AggStateType@2e535f50 \| No \| false \| NULL \| GENERIC \| +-------+------------------------------------------------+------+-------+---------+---------+ 3 rows in set (0.00 sec) after optimize: mysql> desc a_table; +-------+------------------------------------+------+-------+---------+---------+ \| Field \| Type \| Null \| Key \| Default \| Extra \| +-------+------------------------------------+------+-------+---------+---------+ \| k1 \| INT \| Yes \| true \| NULL \| \| \| k2 \| AGG_STATE<max_by(INT, INT NULL)> \| No \| false \| NULL \| GENERIC \| \| k3 \| AGG_STATE<group_concat(TEXT NULL)> \| No \| false \| NULL \| GENERIC \| +-------+------------------------------------+------+-------+---------+---------+ Co-authored-by: duanxujian <duanxujian@jd.com>	2024-05-22 10:03:31 +08:00
924060929	5012ddd87a	[fix](Nereids) fix sql cache return old value when truncate partition (#34698 ) 1. fix sql cache return old value when truncate partition 2. use expire_sql_cache_in_fe_second to control the expire time of the sql cache which in the NereidsSqlCacheManager	2024-05-18 18:05:31 +08:00
HHoflittlefish777	1a24895257	[opt](routine-load) optimize routine load task thread pool and related param(#32282 ) (#34896 )	2024-05-15 12:42:02 +08:00
zhangdong	f9c42f34dd	[fix](auth)Compatible with previously enabled ldap configuration (#34891 )	2024-05-15 12:36:47 +08:00
Mingyu Chen	cadbbdd2c0	[fix](config) for compatibility issue of log dir config (#34734 ) * [fix](config) for compatibility issue of log dir config * 1	2024-05-12 09:44:50 +08:00
xy720	ec34bc0386	[bug](config) Fix modifying label_num_threshold does not take effect (#34575 )	2024-05-10 22:12:17 +08:00
morrySnow	9a94681b29	[refactor](type) AggStateType should not extends ScalarType (#34463 ) 1. let AggStateType extends Type 2. remove useless interface isFixedLengthType and supportsTablePartitioning 3. let MapType implement interface isSupported 4. let VariantType extends ScalarType	2024-05-10 22:10:42 +08:00
lihangyu	853dbdcb00	[Feature](PreparedStatement) implement general server side prepared (#33807 )	2024-05-10 22:10:11 +08:00
feiniaofeiafei	6c11dd2231	[Fix](planner) fix ScalarType.getAssignmentCompatibleType() when deal boolean and decimal (#34435 ) The legacy planner mishandles filters such as c1(boolean type)=0.0(decimalv3). The literal 0.0 is interpreted as decimalv3(1,1), and the boolean column c1 is coerced to decimalv3(1,1) as well. decimalv3(1,1) can only hold values in the range [0,1), while the boolean true is represented as 1, which exceeds the upper bound and causes an overflow. This pull request fixes the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimal(2,1). Co-authored-by: feiniaofeiafei <moailing@selectdb.com>	2024-05-10 22:07:16 +08:00
Kaijie Chen	07207b7b51	[feature](shuffle) enable strict consistency dml by default (#32958 ) (#34641 )	2024-05-10 14:31:50 +08:00
Mingyu Chen	3ae3f9d6e1	[opt](catalog) support using loading cache for db/table list in external catalog (#33610 ) (#34596 ) bp #33610	2024-05-09 17:50:39 +08:00
yiguolei	8fa1b78d7b	Revert "[feature](shuffle) enable strict consistency dml by default (#32958 )" This reverts commit 400105a92182755bdd95a58a7d378d67c6b27f51.	2024-05-08 23:00:46 +08:00
Kaijie Chen	400105a921	[feature](shuffle) enable strict consistency dml by default (#32958 )	2024-05-08 11:00:14 +08:00
wudongliang	182177def0	[Improve](config)The stream_load label length is changed to a configurable (#34459 ) pick from #33745	2024-05-07 20:43:16 +08:00
yiguolei	8fdfbcb3c4	Revert "[Opt](func) opt the percentile func performance (#34373 ) (#34416 )" This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.	2024-05-07 07:23:48 +08:00
slothever	2d4da7d177	[fix](kerberos)enable hadoop auto renew tgt (#34439 )	2024-05-07 00:36:20 +08:00
HappenLee	509ae425e4	[Opt](func) opt the percentile func performance (#34373 ) (#34416 )	2024-05-06 20:10:35 +08:00
Jibing-Li	91887a285e	Implement HLL with 128 buckets to support statistics cache. (#34124 )	2024-04-26 15:05:36 +08:00
Lei Zhang	2a1fbfd72c	[feat](fe) Add `ignore_bdbje_log_checksum_read` for BDBEnvironment (#31247 ) * https://forums.oracle.com/ords/apexds/post/je-log-checksumexception-2812 * When FE cannot start due to `com.sleepycat.je.log.ChecksumException`, caused by disk damage or another reason described in the Oracle forum post, the new param `ignore_bdbje_log_checksum_read` makes FE ignore the exception, but the correctness of the bdbje kv data is then not guaranteed Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2024-04-22 22:33:24 +08:00
Mingyu Chen	88b3d61eca	[refactor](Mysql) Refactoring the process of using external components to authenticate in MySQL connections (#32875 ) (#33958 ) bp #32875 Co-authored-by: LompleZ Liu <47652868+LompleZ@users.noreply.github.com>	2024-04-22 16:41:49 +08:00
924060929	15f8014e4e	[enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867 ) * [enhancement](Nereids) Enable parse sql from sql cache (#33262) Before this PR, a query had to pass through the parser, analyzer, rewriter, optimizer and translator before we could check whether it can use the sql cache. If the query is very long, or joins many tables, the plan time is usually >= 500ms. This PR reduces that time by skipping the usual planning path, because the previous physical plan and query result can be reused if nothing has changed. In some cases we should not parse sql from the sql cache, e.g. when the table structure, data, user policies, privileges, or user variables have changed, or when the query contains non-deterministic functions. In my test case (query a view that has many joins and unions, over tables with empty partitions), the query latency is about 3ms. Without parsing sql from the sql cache, the plan time is about 550ms ## Features 1. use Config.sql_cache_manage_num to control how many sql caches can be reused in one fe 2.
if explain plan appear some plans contains `LogicalSqlCache` or `PhysicalSqlCache`, it means the query can use sql cache, like this: ```sql mysql> set enable_sql_cache=true; Query OK, 0 rows affected (0.00 sec) mysql> explain physical plan select * from test.t; +----------------------------------------------------------------------------------+ \| Explain String(Nereids Planner) \| +----------------------------------------------------------------------------------+ \| cost = 3.135 \| \| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] ) \| \| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) \| \| +--PhysicalOlapScan[t]@0 ( stats=3 ) \| +----------------------------------------------------------------------------------+ 4 rows in set (0.02 sec) mysql> select * from test.t; +------+------+ \| c1 \| c2 \| +------+------+ \| 1 \| 2 \| \| -2 \| -2 \| \| NULL \| 30 \| +------+------+ 3 rows in set (0.05 sec) mysql> explain physical plan select * from test.t; +-------------------------------------------------------------------------------------------+ \| Explain String(Nereids Planner) \| +-------------------------------------------------------------------------------------------+ \| cost = 0.0 \| \| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) \| \| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] ) \| \| +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather ) \| \| +--PhysicalOlapScan[t]@0 ( stats=3 ) \| +-------------------------------------------------------------------------------------------+ 5 rows in set (0.01 sec) ``` (cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20) * fix * [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722) fix some sql cache consistence bug between multiple frontends which introduced by [enhancement](Nereids) Enable parse sql from sql cache #33262, fix by use row policy as the part of 
sql cache key. support dynamic update the num of fe manage sql cache key (cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30) * [fix](Nereids) fix bug of dry run query with sql cache (#33799) 1. dry run query should not use sql cache 2. fix test sql cache in cloud mode 3. enable cache OneRowRelation and EmptyRelation in frontend to skip parse sql (cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe) * remove cloud mode * remove @NotNull	2024-04-19 15:22:14 +08:00
Kang	ad75b9b142	[opt](auto bucket) add fe config autobucket_max_buckets (#33842 )	2024-04-19 15:03:06 +08:00
feiniaofeiafei	6776a3ad1b	[Fix](planner) fix create view star except and modify cast to sql (#33726 )	2024-04-19 15:02:49 +08:00
xueweizhang	56b7839447	[feature](backup) ignore table that not support type when backup, and… (#33158 ) * [feature](backup) ignore table that not support type when backup, and not report exception Signed-off-by: nextdreamblue <zxw520blue1@163.com> * fix Signed-off-by: nextdreamblue <zxw520blue1@163.com> --------- Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2024-04-17 23:42:11 +08:00
zhangstar333	b2b385a4ff	[improve](fold) support complex type for constant folding (#32867 )	2024-04-17 23:41:59 +08:00
Mingyu Chen	38c5030f97	[opt](log) refactor the log dir config (#32933 ) Refactor the config for log dir of FE and BE TLDR: - Use env variable `LOG_DIR` to set root log dir - Remove `sys_log_dir` for FE and BE Details: 1. FE 1. The root log dir is set by env variable `LOG_DIR` in `fe.conf` 2. The default value of `audit_log_dir` is same as `${LOG_DIR}/` 3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log` 4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log` 5. The origin `sys_log_dir` is deprecated, and default value is `""`. But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir. 2. BE 1. The root log dir is set by env variable `LOG_DIR` in `be.conf` 2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly. 3. The origin `sys_log_dir` is deprecated, and default value is `""`. But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.	2024-04-17 23:41:59 +08:00
slothever	07f296734a	[regression](insert)add hive DDL and CTAS regression case (#32924 ) Issue Number: #31442 dependent on #32824 add ddl(create and drop) test add ctas test add complex type test TODO: bucketed table test truncate test add/drop partition test	2024-04-12 10:24:23 +08:00
slothever	36a1bf1d73	[feature][insert]Adapt the create table statement to the nereids sql (#32458 ) issue: #31442 1. adapt create table statement from doris to hive 2. fix insert overwrite for table sink > The doris create hive table statement: ``` mysql> CREATE TABLE buck2( -> id int COMMENT 'col1', -> name string COMMENT 'col2', -> dt string COMMENT 'part1', -> dtm string COMMENT 'part2' -> ) ENGINE=hive -> COMMENT "create tbl" -> PARTITION BY LIST (dt, dtm) () -> DISTRIBUTED BY HASH (id) BUCKETS 16 -> PROPERTIES( -> "file_format" = "orc" -> ); ``` > generated hive create table statement: ``` CREATE TABLE `buck2`( `id` int COMMENT 'col1', `name` string COMMENT 'col2') PARTITIONED BY ( `dt` string, `dtm` string) CLUSTERED BY ( id) INTO 16 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2' TBLPROPERTIES ( 'transient_lastDdlTime'='1710840747', 'doris.file_format'='orc') ```	2024-04-12 09:57:37 +08:00
camby	14c5247fb7	[feature](replica) support force set replicate allocation for olap tables (#32916 ) Add a config to force set replication allocation for all OLAP tables and partitions.	2024-04-10 16:00:15 +08:00
yiguolei	16f8afc408	[refactor](coordinator) split profile logic and instance report logic (#32010 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-04-10 15:51:32 +08:00
Xinyi Zou	80cdc74908	[fix](arrow-flight) Fix reach limit of connections error (#32911 ) Fix the `Reach limit of connections` error: in fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol and connections are usually not actively disconnected, so the ConnectContext is unregistered when its bearer token is evicted from the cache. Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout. Fix the bearer token eviction log and exception. TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH	2024-04-10 11:34:29 +08:00
zhangdong	1d4e5a1c58	[enhance](auth)enable col auth (#32659 )	2024-03-24 08:07:01 +08:00
deardeng	ab467f53db	[fix](partition) Fix be tablet partition id eq 0 By report tablet (#32179 ) (#32667 )	2024-03-22 15:38:58 +08:00
Mingyu Chen	279ea2f366	[feature](proxy-protocol) Support proxy protocol v1 (#32338 ) Enable proxy protocol to support IP transparency. See: `IP Transparency` in `f57387b502/docs/en/docs/admin-manual/cluster-management/load-balancing.md` for details	2024-03-21 14:07:22 +08:00
airborne12	ecadb60bcd	[Pick 2.1](inverted index) support inverted index format v2 (#30145 ) (#32418 )	2024-03-19 08:11:33 +08:00
slothever	711c0cd55c	[feature](insert)implement hive table sink plan (#31765 ) (#32386 ) from #31765	2024-03-18 22:49:30 +08:00
Mingyu Chen	4732aae628	[Refactor](insert) refactor insert command to support other type of table (#31610 ) (#32345 ) bp #31610	2024-03-17 20:46:07 +08:00
deardeng	844a1b53b7	[fix](retry) Set query encounter rpc exception default retry times to 3 (#28555 )	2024-03-16 20:53:46 +08:00
morrySnow	ea2fbfaffa	[feature](Nereids) support agg state type in create table (#32171 ) this PR introduce a behavior change, syntax of create table with agg_state type is changed.	2024-03-15 18:04:49 +08:00
zclllyybb	847ec368be	[Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32220 )	2024-03-14 11:23:05 +08:00
walter	b9a87c63f7	[chore](catalog recycle bin) Add option to ignore min erase latency for testing (#31417 )	2024-02-29 16:44:40 +08:00
slothever	9243b3eeee	[fix](multi-catalog) add config to disable external DDL (#31528 ) from #31453	2024-02-29 08:42:35 +08:00
Pxl	6737fdea64	[Chore](agg-state) adjust AggStateType constructor check input (#31401 ) adjust AggStateType constructor check input	2024-02-28 17:52:11 +08:00
Mingyu Chen	883d022f84	[fix](paimon) fix hadoop.username does not take effect in paimon catalog (#31478 )	2024-02-28 13:08:41 +08:00
morrySnow	a371a10603	[fix](Nereids) let time type coercion same with legacy planner (#31472 )	2024-02-28 13:07:47 +08:00
slothever	eb0416032b	[feature](multi-catalog)support hms catalog create and drop table/db (#30198 ) (#31499 ) 1. rename old create/drop table to add/removeMemoryTable 2. add new create/drop table/db method 3. support hms catalog create/drop table/db (cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)	2024-02-28 09:33:54 +08:00
wangbo	1127b0065a	[Improment](executor)Add scanbytes/scanrows condition (#31364 ) * Add scanbytes/scanrows condition * fix reg	2024-02-27 10:12:33 +08:00
ZhongJinHacker	e48f4f38d0	[Fix](fe-common) Fix the Pair.java code about the hidden danger of NullPointException (#31371 ) * Fix the NullPointException thrown by equals and toString in the Pair class when first or second is null * add license	2024-02-26 19:07:10 +08:00
yiguolei	7a1caf4718	[refactor](wg) enable wg by default and init normal wg in constructor (#31373 ) Workload groups should always be enabled because other operations depend on them, for example MTMV and spill to disk. The normal workload group should be created in the constructor.	2024-02-25 18:08:19 +08:00
Mingyu Chen	aee49adf1e	[opt](compute-node) refactor compute node doc and opt some default config (#31325 ) * [opt](compute-node) refactor compute node doc and opt some default config * 1 * 1	2024-02-24 11:44:53 +08:00
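
Note on 6c11dd2231: the boolean/decimal coercion described in that commit message can be checked with a little arithmetic. The sketch below is not Doris code; it assumes the usual widening rule for two decimal types (keep the larger scale and the larger count of integer digits), and the helper names `common_decimal_type` and `fits` are hypothetical. Under that assumption, decimalv3(1,0) and decimalv3(1,1) widen to decimal(2,1), which matches the result stated in the message.

```python
from decimal import Decimal


def common_decimal_type(a, b):
    """Widen two (precision, scale) pairs so that values of both types fit.

    Assumed rule (not taken from the Doris source): keep the larger scale
    and the larger number of integer digits.
    """
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (int_digits + scale, scale)


def fits(value, precision, scale):
    """True if the integer part of value fits in decimal(precision, scale)."""
    limit = Decimal(10) ** (precision - scale)
    return abs(Decimal(value)) < limit


# Before the fix: boolean true (1) is coerced to decimalv3(1,1), which has
# no integer digits, so the value overflows.
overflow_before = not fits(1, 1, 1)

# After the fix: boolean is treated as decimalv3(1,0); the common type with
# the literal 0.0, typed decimalv3(1,1), has one integer digit.
widened = common_decimal_type((1, 0), (1, 1))
ok_after = fits(1, *widened)
```

Here `widened` is `(2, 1)`, and the value 1 fits once both sides are cast to that type.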


444 Commits
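
Note on 38c5030f97: the FE log directory rules listed in that commit message can be read as a small resolution procedure. The sketch below is only an illustration in Python, not the FE implementation. The function name `resolve_fe_log_dirs` and the dictionary shape are hypothetical, and it assumes that `LOG_DIR` is set and that the sub-directory defaults are derived from `LOG_DIR` exactly as the message states, even when the deprecated `sys_log_dir` is used as the root.

```python
import posixpath


def resolve_fe_log_dirs(log_dir_env, conf):
    """Resolve FE log directories from LOG_DIR and fe.conf-style settings.

    log_dir_env: value of the LOG_DIR environment variable (assumed set).
    conf: dict of explicitly configured values; missing keys mean "default".
    """
    # sys_log_dir is deprecated and defaults to ""; if the user already set
    # it, it is still used as the root log dir for compatibility.
    root = conf.get("sys_log_dir") or log_dir_env

    dirs = {
        "root": root,
        "audit_log_dir": log_dir_env,
        "spark_launcher_log_dir": posixpath.join(log_dir_env, "spark_launcher_log"),
        "nereids_trace_log_dir": posixpath.join(log_dir_env, "nereids_trace_log"),
    }

    # Explicitly configured values override the defaults (assumption).
    for key in ("audit_log_dir", "spark_launcher_log_dir", "nereids_trace_log_dir"):
        if conf.get(key):
            dirs[key] = conf[key]
    return dirs


fresh = resolve_fe_log_dirs("/opt/doris/log", {})
legacy = resolve_fe_log_dirs("/opt/doris/log", {"sys_log_dir": "/data/old_log"})
```

In this sketch a fresh install takes its root from `LOG_DIR`, while a deployment that already set `sys_log_dir` keeps its old root.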