16407 Commits

Author SHA1 Message Date
3f20e28f3d [doc](jvm) Correct the jvm parameters of be (#29958) 2024-01-16 18:37:44 +08:00
b47e289560 [refactor](Nereids): avoid ConnectContext.get() ASAP to improve performance (#29952) 2024-01-16 18:37:44 +08:00
54d23e0f8e [typo](docs) fix default value display exception (#29859)
---------

Co-authored-by: hechao <hechao@selectdb.com>
2024-01-16 18:37:44 +08:00
7a574df9fc [fix](pipelineX) fix multi be may be missing profiles #29914 2024-01-16 18:37:44 +08:00
fd66ce0928 [enhancement](docs) Clarify the requirement for JDK exact version 8 in install guide (#29921)
* docs: Clarified the requirement for JDK exact version 8

* fix: Update code
2024-01-16 18:37:44 +08:00
c8845c9e07 [opt](scanner) Improve the efficiency of TOPN opt (#29937) 2024-01-16 18:37:44 +08:00
fd59a3e3a8 [chore](Fix) Fix uninitialized buffer in read_cluster_id() (#29949) 2024-01-16 18:37:40 +08:00
9773fef4a1 [fix](class-loader) fix class loader conflict on BE side (#29942)
1. make `hadoop-common` in be java extension as `provided`.
2. must load be java extension jars before hadoop jars
2024-01-16 18:37:06 +08:00
4b4fd1a290 [improvement](log) add txn log (#28875) 2024-01-16 18:37:06 +08:00
8ca807578f [fix](migrate disk) fix migrate disk lost data during publish version (#29887)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2024-01-16 18:37:06 +08:00
4efc8647de [deps](hadoop) update hadoop on BE side to 3.3.6 (#29939)
Same as on FE side
2024-01-16 18:37:02 +08:00
2bb83f7621 [fix](schema cache) adjust the destruction order of _tablet_schema_cache and storage engine (#29923) 2024-01-16 18:36:51 +08:00
74e4486c65 [fix](partition) Add more log for single replica load when partition id eq 0 (#28707) 2024-01-16 18:35:32 +08:00
615d94bbc7 [log](insert) add log in parse insert into values data (#29903) 2024-01-16 18:35:32 +08:00
7309061db4 [pipelineX](improvement) Adjust local exchange strategy (#29915) 2024-01-16 18:35:32 +08:00
25428bd7fb [fix](kerberos) fix BE kerberos ccache renew, optimize kerberos options (#29291)
1. Remove BE kinit and use JNI login with a keytab, because kinit cannot renew the TGT for Doris in many complex cases.
> This pull request will support creating a new instance from a keytab: https://github.com/apache/doris-thirdparty/pull/173, so the kinit command is no longer needed; just log in with the keytab and principal.

2. add `kerberos_ccache_path` to set kerberos credentials cache path manually.

3. add `max_hdfs_file_handle_cache_time_ms` to set hdfs fs handle cache time.
2024-01-16 18:35:29 +08:00
5e697990a8 [bugfix](timeout) serving_blocks_num may cause timeout, try to fix it (#29912)
Although serving_blocks_num is an atomic variable, its ++ and -- are not protected by the transfer lock.
I am not sure about the memory order of ++ and --.
I think this may be the root cause of the query timeout, so I removed the check and am testing it in the GitHub pipeline.
2024-01-16 18:34:19 +08:00
a836f41854 [enhance](serde)update slice reserve and deduce slice back usage #29879 2024-01-16 18:33:51 +08:00
620cfc3cd7 [fix](move-memtable) set idle timeout equal to load timeout (#29839) 2024-01-16 18:33:51 +08:00
a974e96841 [community](tool) add a tool to pick pr from one branch to another (#29764) 2024-01-16 18:33:51 +08:00
d48c8a1dce [test](ut) added UT cases for show partitions external table (#29565) 2024-01-16 18:33:51 +08:00
e5c8192d47 [ut](stats) Added tests for JDBC analysis tasks (#28591) 2024-01-16 18:33:51 +08:00
612186e657 [doc](sql-dialect)(audit) add doc for sql dialect and audit plugin (#29775) 2024-01-16 18:33:51 +08:00
c599cf311d [fix](migrate) migrate check old tablet had deleted (#29909) 2024-01-16 18:33:51 +08:00
e3a1138da7 [fix](migrate disk) fix tablet disk migration timeout too large (#29895) 2024-01-16 18:33:51 +08:00
7c493b08c5 [refactor](dialect) make http sql converter plugin and audit loader as builtin plugin (#29692)
Followup #28890

Make HttpSqlConverterPlugin and AuditLoader Doris builtin plugins,
so that it is simpler for users to use SQL dialect conversion and the audit loader.

HttpSqlConverterPlugin

By default, there is nothing changed.

There is a new global variable sql_converter_service, default is empty, if set, the HttpSqlConverterPlugin will be enabled

set global sql_converter_service = "http://127.0.0.1:5001/api/v1/convert"

AuditLoader

By default, there is nothing changed.

There is a new global variable enable_audit_plugin. Its default is false; if set to true, the audit loader plugin will be enabled.

Doris will create the audit_log table in __internal_schema at startup.

If enable_audit_plugin is true, audit logs will be inserted into the audit_log table.

3 other global variables related to this plugin:

audit_plugin_max_batch_interval_sec: The max interval for audit loader to insert a batch of audit log.
audit_plugin_max_batch_bytes: The max batch size for audit loader to insert a batch of audit log.
audit_plugin_max_sql_length: The max length of statement in audit log
2024-01-16 18:31:59 +08:00
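The three audit variables described in #29692 suggest a flush-on-size-or-age batching scheme. The following is a minimal sketch of that idea, not the AuditLoader implementation: the class `AuditBatcher`, its methods, and the choice to truncate statements by character count are all hypothetical, and only the three variable names come from the commit message.

```python
import time


class AuditBatcher:
    """Hypothetical batcher: flushes when the batch is too large or too old."""

    def __init__(self, max_batch_bytes, max_batch_interval_sec,
                 max_sql_length, clock=time.monotonic):
        self.max_batch_bytes = max_batch_bytes                # audit_plugin_max_batch_bytes
        self.max_batch_interval_sec = max_batch_interval_sec  # audit_plugin_max_batch_interval_sec
        self.max_sql_length = max_sql_length                  # audit_plugin_max_sql_length
        self._clock = clock
        self._batch = []
        self._bytes = 0
        self._last_flush = clock()
        self.flushed = []  # stands in for inserts into the audit_log table

    def add(self, stmt):
        # Assumption: the statement is truncated by character count.
        stmt = stmt[: self.max_sql_length]
        self._batch.append(stmt)
        self._bytes += len(stmt.encode("utf-8"))
        too_big = self._bytes >= self.max_batch_bytes
        too_old = self._clock() - self._last_flush >= self.max_batch_interval_sec
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._batch:
            self.flushed.append(self._batch)
        self._batch = []
        self._bytes = 0
        self._last_flush = self._clock()
```

Passing the clock in as a parameter keeps the sketch testable without sleeping; a real loader would more likely flush from a background thread.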
a69ce49b07 [fix](Nereids) adjust min/max stats for cast function if types are comparable (#28166)
estimate column stats for "cast(col, XXXType)"

-----cast-est------
query4 41169 40335 40267 40267
query58 463 361 401 361
Total cold run time: 41632 ms
Total hot run time: 40628 ms

----master------
query4 40624 40180 40299 40180
query58 487 389 420 389
Total cold run time: 41111 ms
Total hot run time: 40569 ms
2024-01-16 18:31:59 +08:00
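The totals quoted in #28166 can be rechecked with a few lines of Python. This is only a sketch: the commit message does not label the four columns, so the script assumes the first number is the cold run and the last is the best hot run, which is consistent with the reported totals.

```python
# Rows copied from the commit message: query name, then four timings in ms.
# Assumption: column 1 is the cold run, column 4 is the best hot run.
RESULTS = {
    "cast-est": {
        "query4": (41169, 40335, 40267, 40267),
        "query58": (463, 361, 401, 361),
    },
    "master": {
        "query4": (40624, 40180, 40299, 40180),
        "query58": (487, 389, 420, 389),
    },
}


def totals(rows):
    """Return (total cold ms, total hot ms) for one branch."""
    cold = sum(timings[0] for timings in rows.values())
    hot = sum(timings[-1] for timings in rows.values())
    return cold, hot


for branch, rows in RESULTS.items():
    cold, hot = totals(rows)
    print(f"{branch}: cold={cold} ms, hot={hot} ms")
```

Under that reading, the cast-est branch is 521 ms slower cold and 59 ms slower hot than master on these two queries.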
bbfc3d037e [doc](auto-inc) Add user oriented doc for auto increment column (#29230) 2024-01-16 18:31:59 +08:00
0b16938b7f [Fix](Nereids) Fix datatype length wrong when string contains chinese (#29885)
When a varchar literal contains Chinese, the length of the varchar type should not be the character count of the literal; it should be the number of bytes actually used.
Chinese is represented in Unicode, and a Chinese character occupies at most 4 bytes. So if a varchar literal contains Chinese, the length is set to 4 * length.

For example:
>        CREATE MATERIALIZED VIEW test_varchar_literal_mv
>             BUILD IMMEDIATE REFRESH AUTO ON MANUAL
>             DISTRIBUTED BY RANDOM BUCKETS 2
>             PROPERTIES ('replication_num' = '1')
>             AS
>             select case when l_orderkey > 1 then "一二三四" else "五六七八" end as field_1 from lineitem;

The definition of the materialized view is as follows:
mysql> desc test_varchar_literal_mv;
+---------+-------------+------+-------+---------+-------+
| Field   | Type        | Null | Key   | Default | Extra |
+---------+-------------+------+-------+---------+-------+
| field_1 | VARCHAR(16) | No   | false | NULL    | NONE  |
+---------+-------------+------+-------+---------+-------+
2024-01-16 18:31:59 +08:00
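The length rule from #29885 can be sketched in Python. This is an illustration, not the Nereids code: the function name is hypothetical, and it assumes that any non-ASCII character triggers the 4x rule, since the commit message does not show the actual check.

```python
def literal_varchar_length(literal: str) -> int:
    """Length to declare for the VARCHAR type of a string literal.

    Assumption: any non-ASCII character triggers the 4x rule; the real
    check inside Doris is not shown in the commit message.
    """
    if literal.isascii():
        return len(literal)
    return 4 * len(literal)


print(literal_varchar_length("一二三四"))  # 4 characters, declared as VARCHAR(16)
print(len("一二三四".encode("utf-8")))      # bytes actually used in UTF-8: 12
```

The declared length is an upper bound: these four characters need 3 bytes each in UTF-8, so 16 leaves headroom over the 12 bytes actually used.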
115815739c [bugfix](fe) add check for lag/lead function params (#29617) 2024-01-16 18:31:59 +08:00
c92648cb27 [ut](meta) added unit test for frontend service impl (#28455) 2024-01-16 18:31:59 +08:00
91e546cfe3 [deps](hadoop) upgrade hadoop deps to 3.3.4.6 (#29908) 2024-01-16 18:31:55 +08:00
56f0dc77fe [doc](unique key) update description for unique key in data model doc (#28902) 2024-01-16 18:31:27 +08:00
d127527af3 [feat](meta) Reuse HMS statistics analyzed by Spark engine for Analyze Task. (#28525)
Takes the idea from PR #24853 further.
Column statistics that Spark has already analyzed are available in HMS. This PR proposes reusing those statistics from the external source when the analyze command is executed with the WITH SQL clause.

Spark analyzes and stores the statistics in Table properties instead of HiveColumnStatistics. In this PR, we try to get the statistics from these properties and make it available to Doris.
2024-01-16 18:31:27 +08:00
7b30119537 [improve](multi-table-load) pause job when can not find table #29870
If no table can be found, the task will cycle forever and no data will be loaded. To avoid invalid scheduled tasks, it is better to pause the job than to keep running it.
2024-01-16 18:31:27 +08:00
6598b4f7c8 [fix](http) fix exception when querying map data through http #29686
The MySQL type code that the map type maps to is 400, but 400 is an unknown type to MySQL.
With the MariaDB JDBC driver, or when querying through the /api/query HTTP API, an exception occurs.
The MySQL JDBC driver converts the value into binary form, and the correct data can be read through the string type.
Therefore, the custom MySQL type for map was removed and changed to the string type, so that both the MariaDB and MySQL JDBC drivers work normally.
2024-01-16 18:31:27 +08:00
e1a12cf222 [improvement](auth)Not allowed to operate internal_schema database (#29790)
Only the root user can operate on the __internal_schema database.
The affected operations are:
create database
drop database
alter database
create table
drop table
alter table
truncate table
insert overwrite
insert
delete
update
load (not allowed even for root)

delete support check auth
2024-01-16 18:31:27 +08:00
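The rule described in #29790 can be read as a small gate in front of the normal privilege checks. The sketch below is hypothetical throughout: the function name and the way operations are represented as strings are illustrative, and only the database name, the operation list, and the root/load rules come from the commit message.

```python
INTERNAL_DB = "__internal_schema"

# Operations listed in the commit message as affected.
GUARDED_OPS = {
    "create database", "drop database", "alter database",
    "create table", "drop table", "alter table", "truncate table",
    "insert overwrite", "insert", "delete", "update", "load",
}


def internal_schema_check(user: str, db: str, op: str) -> bool:
    """Return True if this extra check lets the operation through.

    Hypothetical helper: normal privilege checks would still apply afterwards.
    """
    if db != INTERNAL_DB or op not in GUARDED_OPS:
        return True
    if op == "load":
        return False  # the commit notes that load is not allowed even for root
    return user == "root"
```

For example, `internal_schema_check("root", "__internal_schema", "insert")` passes, while the same call for any other user, or for `"load"` as root, is rejected.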
8b4ffcc8f7 [typo](docs) fix typo of outfile and export md (#29804) 2024-01-16 18:31:27 +08:00
1dc0c74ad9 [improvement](statistics)Stop analyze quickly after user close auto analyze. #29809 2024-01-16 18:31:27 +08:00
9d3a017706 [fix](doriswriter)Fix the problem that specifying multiple loadurls does not take effect #29865 2024-01-16 18:31:27 +08:00
b3e37b3efa [unit test](statistics)Add unit test case for auto analyze. #29904
Add unit and p0 test case for auto analyze.
2024-01-16 18:31:27 +08:00
d47adbb81f [Fix](nereids) Fix cte rewrite by mv failure and predicates compensation by mistake (#29820)
Fix CTE being wrongly rewritten by a materialized view when the query has a scalar aggregate but the view does not.
For example, the following query should not be rewritten by the materialized view:

// materialized view definition
def mv20_1 = """
select
l_shipmode,
l_shipinstruct,
sum(l_extendedprice),
count()
from lineitem
left join
orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY
group by
l_shipmode,
l_shipinstruct;
"""
// query sql
def query20_1 =
"""
select
sum(l_extendedprice),
count()
from lineitem
left join
orders
on lineitem.L_ORDERKEY = orders.O_ORDERKEY
"""

Fix mistaken predicate compensation.
For example, the following query now returns the correct result; previously the result was wrong.

// materialized view definition
def mv7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from lineitem
left join orders
on lineitem.l_orderkey = orders.o_orderkey
where l_shipdate = '2023-12-08' and o_orderdate = '2023-12-08';
"""
// query sql
def query7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from (select * from lineitem where l_shipdate = '2023-10-17' ) t1
left join orders
on t1.l_orderkey = orders.o_orderkey;
"""

Also optimize some code usage and add more comments for methods.
2024-01-16 18:31:27 +08:00
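One well-known hazard when answering a scalar aggregate from a grouped view, as in the mv20_1 example of #29820, is the empty-input case. Whether this is the exact case the commit guards against is an assumption. The sketch below uses SQLite from Python as a stand-in for Doris, models the materialized view as a plain table, and writes `count(*)` where the commit writes `count()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lineitem (
        l_orderkey INT, l_shipmode TEXT, l_shipinstruct TEXT, l_extendedprice REAL
    );
    CREATE TABLE orders (o_orderkey INT);
    -- Stand-in for the materialized view: a table filled from the grouped query.
    CREATE TABLE mv20_1 AS
        SELECT l_shipmode, l_shipinstruct,
               sum(l_extendedprice) AS sum_price, count(*) AS cnt
        FROM lineitem LEFT JOIN orders ON l_orderkey = o_orderkey
        GROUP BY l_shipmode, l_shipinstruct;
""")

# Scalar aggregate over the base tables: always exactly one row.
direct = conn.execute(
    "SELECT sum(l_extendedprice), count(*) "
    "FROM lineitem LEFT JOIN orders ON l_orderkey = o_orderkey"
).fetchone()

# Naive roll-up of the same query from the grouped view.
rolled_up = conn.execute("SELECT sum(sum_price), sum(cnt) FROM mv20_1").fetchone()

print(direct, rolled_up)
```

With both tables empty, the direct query reports a count of 0, while the roll-up reports NULL because there are no groups to sum, so the two answers differ.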
e417128fb9 [bug](bitmap) should return error status when execute failed (#29841) 2024-01-16 18:30:23 +08:00
1e225b56ab [fix](doc)Added english translation for monitoring Metric description page (#28435)
Added english translation for monitoring Metric description page
2024-01-16 18:30:23 +08:00
12f936558e [fix](doc) spell errors fixes for debug-point-action (#28152) 2024-01-16 18:30:23 +08:00
1998735432 [Improvement](function) enable ipv6_num_to_string function to support handling of IPv6 type (#29886)
Enable ipv6_num_to_string function to handle IPv6 type normally in addition to handling 16 byte string types
2024-01-16 18:30:23 +08:00
e7b221ba66 [fix](be-ut) Fix unstable test cases (#29896)
The following cases are unstable.

1. LoadStreamMgrTest
2. TaskWorkerPoolTest.PriorTaskWorkerPool

Rationales

1. LoadStreamMgrTest
It is related to timeout. If we investigate the examples in BRPC, we will find the timeout is usually set to 0 rather than a specific number.
2. TaskWorkerPoolTest.PriorTaskWorkerPool
The order of the threads for the lock contentions is undetermined.
2024-01-16 18:30:23 +08:00
88eab1b4b9 [doc](hight-concurrent-point-query) Improve and supplement hight-concurrent-point-query documentation (#29396) 2024-01-16 18:30:23 +08:00
41875a0bf5 [fix](move-memtable) check segment id in add_segment (#29898) 2024-01-16 18:30:23 +08:00
ee66f1563e [fix](Nereids) fix rf push down union (#29847)
Currently, union rf push down only supports an rf from the parent join, not from an ancestor join.
This PR fixes the problem in the rf push-down check on project/distribute nodes.
2024-01-16 18:30:22 +08:00