Commit Graph

13319 Commits

Author SHA1 Message Date
fca34ec337 [fix](multi-catalog)support bit type and hidden mc secret key (#24124)
support max compute bit type and mask mc secret key
bool type will use the Arrow bit vector
should mask the secret key: close #24019
2023-09-12 10:36:48 +08:00
aa850fc9c3 [doc](hive) add faq for multi delimiter config (#24179)
When a user sets `serde` to `org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe`, a corresponding configuration must be added to hive-site.xml
2023-09-12 10:34:23 +08:00
484215e1cc [fix](Nereids): datetime - offset is wrong & support Two-Digital date (#24201)
- Bug: datetime - offset gave a wrong result
- Support two-digit dates
- Remove useless overridden code
2023-09-12 10:17:56 +08:00
6913d68ba0 [Enhancement](merge-on-write) use delete bitmap to mark delete for rows with delete sign when sequence column doesn't exist (#24011) 2023-09-12 08:56:46 +08:00
11e052c7a4 [fix](invert index) fix overloaded-virtual compiler warning (#24174) 2023-09-11 23:47:19 +08:00
1228995dec [improvement](segment) reduce memory footprint of column_reader and segment (#24140) 2023-09-11 21:54:00 +08:00
0c30fff811 add navigator for vector distance functions (#24081) 2023-09-11 19:55:36 +08:00
6e28d878b5 [fix](hudi) compatible with hudi spark configuration and support skip merge (#24067)
Fix three bugs:
1. A Hudi slice may have only log files, so `new Path(filePath)` will throw errors.
2. Hive column names are lowercase only, so column names are matched case-insensitively.
3. Compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merging log files (see the sketch after this entry).
2023-09-11 19:54:59 +08:00
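For the Hudi skip-merge option above, a minimal sketch of setting it as a catalog property. Only `hoodie.datasource.merge.type=skip_merge` comes from the commit; the catalog name, catalog type, and metastore URI are assumptions.

```sql
-- Hypothetical Hudi catalog backed by HMS; names and the URI are placeholders.
CREATE CATALOG hudi_hms PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    -- read base files only and skip merging Hudi log files
    "hoodie.datasource.merge.type" = "skip_merge"
);
```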
115969c3fb [opt](nereids) improve eliminate outerjoin in cascades (#24120)
* eliminate outer joins in the Cascades optimizer (see the sketch after this entry)
2023-09-11 19:42:05 +08:00
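A hedged illustration of the kind of rewrite this rule family targets; whether this exact pattern is covered by the PR is an assumption, and the table and column names are placeholders.

```sql
-- The WHERE predicate rejects NULLs produced by the right side of the LEFT JOIN,
-- so the outer join can be eliminated and replaced by an inner join:
SELECT t1.id, t2.v
FROM t1 LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.v > 0;
-- expected equivalent plan:
-- SELECT t1.id, t2.v FROM t1 INNER JOIN t2 ON t1.id = t2.id WHERE t2.v > 0;
```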
3d6d40db33 [docker] Add kafka relate case (#24180)
Add Kafka-related cases.
2023-09-11 19:41:21 +08:00
86a064284e (improvement)[inverted-index] add and optimize checks when IO error occurs. (#24167)
When a disk I/O error occurs, reading and writing inverted index files may also fail. This PR adds error checks to prevent empty files from being generated.
2023-09-11 19:10:52 +08:00
dbb9365556 [Enhance](ip)optimize priority_network matching logic for be (#23795)
If the user has configured a wrong priority_network, fail startup directly, to avoid the user mistakenly assuming that the configuration is correct.
If the user has not configured priority_network, select only the first IP from the IPv4 list rather than from all IPs, to avoid the case where the user's server does not support IPv4.
extends #23784
2023-09-11 18:32:31 +08:00
a538b4922c [fix](block rule) throw npe when use Nereids explain or fallback (#24182) 2023-09-11 18:03:46 +08:00
b5227af6a1 [Feature](partitions) Support auto partition FE part (#24079) 2023-09-11 17:48:19 +08:00
229ee50c93 [Docs](StreamLoad)Add partial columns docs (#24184) 2023-09-11 17:16:29 +08:00
6384198136 [minor](fe) optimize some log info and imports issue (#24138) 2023-09-11 16:16:58 +08:00
8ae7e67623 [Docs](Ldap)Add Jdbc connect docs (#24181) 2023-09-11 15:53:37 +08:00
f27f486e8d fix missing stats in physical plan (#24159) 2023-09-11 15:41:32 +08:00
9b4338f66a [refactor](pipelineX) Split init and open for local state (#24166) 2023-09-11 14:50:41 +08:00
be3618316f [Fix](Nereids) fix infer predicate lost cast of source expression (#23692)
Problem:
When inferring predicates, we lost the cast of the source expression and some datatype derivation.

Example:
a = b and cast(a as targetType) = constant
The expression `cast(a as targetType) = constant` is defined as the source expression.
We expect to get `cast(b as targetType) = constant` instead of `b = constant`.

Reason:
When inferring a predicate, we compare the original types of a and b; if they can be cast
without precision loss, a new predicate is created. But the created predicate forgot
to cast to the target type.

Solved:
Add the cast to the target type, and also make other data types valid (see the SQL sketch after this entry).
2023-09-11 14:30:31 +08:00
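A SQL sketch of the inference described above; the table and column names and the BIGINT target type are assumptions.

```sql
-- Given a join condition a = b plus a casted predicate on a ...
SELECT *
FROM t1 JOIN t2 ON t1.a = t2.b
WHERE CAST(t1.a AS BIGINT) = 10;
-- ... the predicate inferred for t2 should keep the cast:
--   CAST(t2.b AS BIGINT) = 10   -- expected after this fix
--   t2.b = 10                   -- what was produced before
```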
e847091dfe [fix](Nereids): add DateTimeFormatterUtils and fix bug (#24171)
Bugs:
- should reject `20200219 010101`
- datetime should be compatible with date
2023-09-11 14:28:03 +08:00
8b5453296e [fix](optimizer) Fix sql block when new optimizer is enabled (#23804)
The check would be skipped because, when checkBlockPolicy is invoked, the new optimizer has not produced a plan yet.
2023-09-11 14:27:11 +08:00
b4020a13ef [Improve](Routineload)Set the maximum timeout for obtaining partition to 60s (#24173) 2023-09-11 14:15:06 +08:00
8f7e7a7b31 [Fix](signal) fix signal handler (#24144) 2023-09-11 13:18:49 +08:00
7abd88f1b4 remove editlogport in frontends disks (#24047) 2023-09-11 12:38:56 +08:00
9c441a4a16 [feature](Nereids) support create table and ctas (#24150)
Co-authored-by: sohardforaname <organic_chemistry@foxmail.com>
2023-09-11 12:37:58 +08:00
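A minimal sketch of the two statement forms the feature above covers, CREATE TABLE and CTAS; the table names, columns, distribution, and `replication_num` are placeholders.

```sql
-- Plain CREATE TABLE
CREATE TABLE t_src (
    id BIGINT,
    name VARCHAR(32)
)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- CTAS: the target schema is derived from the query result
CREATE TABLE t_copy
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1")
AS SELECT id, name FROM t_src;
```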
134b210c03 [improvement](shutdown) not print thread pool error stack trace when shutdown (#24155)
* [improvement](shutdown) not print thread pool error stack trace when shutdown

When the thread pool shuts down, it should not print an error stack trace; that is very confusing.
The Arrow Flight server should not call shutdown if it is not enabled, because it will print an error stack.
Remove "service unavailable" from thrift because it is useless.
Part of this PR needs to be picked to the 2.0 branch.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-09-11 12:20:07 +08:00
db139cfd6e [fix](log) delete useless log (#24161)
useless log in #23635
2023-09-11 12:08:59 +08:00
c94e47583c [fix](join) avoid DCHECK failed in '_filter_data_and_build_output' (#24162)
avoid DCHECK failed in '_filter_data_and_build_output'
2023-09-11 11:54:44 +08:00
fb6cb88341 [feature-wip](dbt) dbt view columns comment and view rename change (#23917)
1. Support comments on dbt view columns.
2. View rename change: adjust the view override logic.
2023-09-11 11:15:23 +08:00
d18d272ac2 [improvement](jdbc catalog) Added create jdbc catalog properties validation (#23764) 2023-09-11 10:38:53 +08:00
d2cd0c30c7 [improvement](jdbc catalog) optimize the JDBC Catalog connection error message (#23868) 2023-09-11 10:26:54 +08:00
480fcef0a1 [typo](errmsg) Improve partition error message (#23968) 2023-09-11 10:25:06 +08:00
cd13f9e8c6 [BUG](view) fix can't create view with lambda function (#23942)
Previously the lambda function Expr did not implement the toSqlImpl() function,
so it called the parent's implementation, which is not suitable for lambda functions,
and creating a view would fail (see the sketch after this entry).
2023-09-11 10:04:00 +08:00
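A hedged illustration of the failing case; the view, table, and column names are placeholders, and `array_map` with a lambda is just one example of such an expression.

```sql
-- Before this fix, serializing the view definition went through the parent
-- Expr's toSqlImpl(), which cannot render the lambda, so CREATE VIEW failed.
CREATE VIEW v_bumped AS
SELECT array_map(x -> x + 1, int_arr) AS bumped
FROM t_arrays;
```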
0896aefce3 [fix](local exchange) fix bug of accesssing released counter of local data stream receiver (#24148) 2023-09-11 09:52:31 +08:00
a0fcc30764 [Fix](Status) Handle status code correctly and add a new error code ENTRY_NOT_FOUND (#24139) 2023-09-11 09:32:11 +08:00
dcde83d6e6 [Improve](regresstests)add boundary regress tests for map & array #24133 2023-09-11 08:28:11 +08:00
31bffdb5fc [enhancement](stats) audit for stats collection #24074
log stats collection SQLs in the audit log
2023-09-11 08:26:12 +08:00
71db844c64 [feature](invert index) add tokenizer CharFilter preprocessing (#24102) 2023-09-10 23:08:28 +08:00
ebac816e85 Revert "[improvement](bitshuffle)Enable avx512 support in bitshuffle for performance boost (#15972)" (#24146)
This reverts commit 28fcc093a8958a6870fec9802b23db07a42bbd7b.
2023-09-10 23:06:21 +08:00
586492c124 [Feature](multi-catalog) Support sql cache for hms catalog (#23391)
**Support SQL cache for the HMS catalog. Both the legacy planner and the Nereids planner are supported.
Partition cache and federated queries are not supported yet.**
2023-09-10 21:56:35 +08:00
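A minimal usage sketch for the feature above, assuming the `enable_sql_cache` session variable and an HMS catalog named `hive_catalog`; the catalog, database, and table names are placeholders.

```sql
SET enable_sql_cache = true;
-- Repeated executions of the same statement against the HMS catalog
-- can now be answered from the SQL cache.
SELECT count(*) FROM hive_catalog.db1.tbl1;
```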
9b3be0ba7a [Fix](multi-catalog) Do not throw exceptions when file not exists for external hive tables. (#23799)
A bug similar to #22140.

When executing a query against an HMS catalog, the query may fail because some HDFS files no longer exist. We should distinguish this kind of error and skip it.

```
errCode = 2, detailMessage = (xxx.xxx.xxx.xxx)[CANCELLED][INTERNAL_ERROR]failed to init reader for file hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: [INTERNAL_ERROR]Init OrcReader failed. reason = Failed to read hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc: [INTERNAL_ERROR]Read hdfs file failed. (BE: xxx.xxx.xxx.xxx) namenode:hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: (2), No such file or directory), reason: RemoteException: File does not exist: /xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86) 
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76) 
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) 
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) 
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426) 
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) 
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) 
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
```
2023-09-10 21:55:09 +08:00
f85da7d942 [improvement](jdbc) add profile for jdbc read and convert phase (#23962)
Add 2 metrics in jdbc scan node profile:
- `CallJniNextTime`: time of calling get next on the JDBC result set
- `ConvertBatchTime`: time of converting jobject to column block

Also fixes a potential concurrency issue when initializing the JDBC connection cache pool.
2023-09-10 21:42:06 +08:00
ff92b5bc29 [Bug](pipelineX) Fix runtime filter on pipelineX (#24141) 2023-09-10 20:53:54 +08:00
a05003fbe1 [fix](pipeline) fix remove pipeline_x_context from fragment manager (#24062) 2023-09-10 20:53:26 +08:00
1df2e4454f [improvement](file-cache) increase virtual node number to make file cache more even (#24143)
The original virtual node number is `Math.max(Math.min(512 / backends.size(), 32), 2)`, which is too small and causes uneven cache distribution when file cache is enabled.
2023-09-10 19:56:53 +08:00
102abff071 [Fix](spark-load) ignore column name case in spark load (#23947)
Doris is not case-sensitive with field names, so when doing Spark load we can convert all field names to lowercase for matching and loading.
2023-09-10 19:45:01 +08:00
8e171f5cbf [Enhancement](multi-catalog) merge hms partition events. (#22869)
This PR mainly has two changes:

1. Add some merge processing for partition events.
2. Add a UT for `MetastoreEventFactory`. First, add some mock classes (`MockCatalog`/`MockDatabase` ...) to simulate real HMS catalogs/databases/tables/partitions, then create an event producer that can randomly produce every kind of `MetastoreEvent`. Two catalogs are used for the test, `testCatalog` and `validateCatalog`: the event producer generates many events, `validateCatalog` handles all of them, while `testCatalog` handles only the events merged by `MetastoreEventFactory`; finally, check that `validateCatalog` equals `testCatalog`.
2023-09-10 18:29:54 +08:00
32a7eef96a [schedule](pipeline) Remove wait schedule time in pipeline query engine (#23994)
Co-authored-by: yiguolei <676222867@qq.com>
2023-09-10 17:06:51 +08:00
648bf77c72 [Fix](MemtableMemoryLimiter) fix memtable memory limiter trigger flush log (#24137) 2023-09-10 16:33:35 +08:00