Commit Graph

5558 Commits

Author SHA1 Message Date
6913d68ba0 [Enhancement](merge-on-write) use delete bitmap to mark delete for rows with delete sign when sequence column doesn't exist (#24011) 2023-09-12 08:56:46 +08:00
11e052c7a4 [fix](invert index) fix overloaded-virtual compiler warning (#24174) 2023-09-11 23:47:19 +08:00
1228995dec [improvement](segment) reduce memory footprint of column_reader and segment (#24140) 2023-09-11 21:54:00 +08:00
6e28d878b5 [fix](hudi) compatible with hudi spark configuration and support skip merge (#24067)
Fix three bugs:
1. A Hudi slice may contain log files only, so `new Path(filePath)` can throw errors.
2. Hive column names are always lowercase, so match column names case-insensitively.
3. Compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merging log files.
2023-09-11 19:54:59 +08:00
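The ignore-case matching in fix 2 above can be sketched as follows. This is a minimal Python illustration of the idea, not Doris's actual reader code; the function and variable names are hypothetical:

```python
# Hive stores column names in lowercase, so a reader that receives
# mixed-case column names must match them case-insensitively.
def match_columns(requested, hive_schema):
    """Map each requested column name to the actual Hive schema name, ignoring case."""
    lower_to_actual = {name.lower(): name for name in hive_schema}
    matched = []
    for col in requested:
        actual = lower_to_actual.get(col.lower())
        if actual is None:
            raise KeyError(f"column not found: {col}")
        matched.append(actual)
    return matched
```

For example, `match_columns(["UserId", "Name"], ["userid", "name", "age"])` resolves to `["userid", "name"]`.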
86a064284e [improvement](inverted-index) add and optimize checks when IO errors occur (#24167)
When a disk I/O error occurs, reads and writes of inverted index files may fail. This PR adds error checking to prevent empty files from being generated.
2023-09-11 19:10:52 +08:00
dbb9365556 [Enhance](ip) optimize priority_network matching logic for BE (#23795)
Issue Number: close #xxx

If the user has configured a wrong priority_network, fail startup directly, to avoid users mistakenly assuming the configuration is correct.
If the user has not configured priority_network, select only the first IP from the IPv4 list, rather than selecting from all IPs, to avoid issues on users' servers that do not support IPv4.
extends #23784
2023-09-11 18:32:31 +08:00
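The selection logic described above can be sketched roughly as follows. This is a hypothetical Python illustration of the behavior, not the BE's actual C++ code:

```python
import ipaddress

def pick_be_ip(local_ips, priority_network=None):
    """If priority_network (a CIDR string) is configured, require a matching
    local IP and fail startup otherwise; if it is not configured, take the
    first IPv4 address rather than scanning all IPs."""
    if priority_network:
        net = ipaddress.ip_network(priority_network)
        for ip in local_ips:
            if ipaddress.ip_address(ip) in net:
                return ip
        # Wrong configuration: fail fast instead of silently picking another IP.
        raise SystemExit(f"no local IP matches priority_network {priority_network}")
    for ip in local_ips:
        if ipaddress.ip_address(ip).version == 4:
            return ip
    raise SystemExit("no IPv4 address available")
```

With `priority_network="192.168.1.0/24"` and local IPs `["10.0.0.2", "192.168.1.5"]`, this selects `192.168.1.5`; with no configuration it skips IPv6 entries and returns the first IPv4 address.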
9b4338f66a [refactor](pipelineX) Split init and open for local state (#24166) 2023-09-11 14:50:41 +08:00
8f7e7a7b31 [Fix](signal) fix signal handler (#24144) 2023-09-11 13:18:49 +08:00
134b210c03 [improvement](shutdown) not print thread pool error stack trace when shutdown (#24155)

When the thread pool is shutting down, it should not print an error stack trace; that is very confusing.
The Arrow Flight server should not call shutdown if it is not enabled, because doing so also prints an error stack.
Remove the service-unavailable status from Thrift because it is unused.
Part of this PR needs to be picked to the 2.0 branch.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-09-11 12:20:07 +08:00
c94e47583c [fix](join) avoid DCHECK failed in '_filter_data_and_build_output' (#24162)
2023-09-11 11:54:44 +08:00
cd13f9e8c6 [BUG](view) fix can't create view with lambda function (#23942)
Previously, the lambda function Expr did not implement toSqlImpl(),
so the parent implementation was called, which is not suitable for lambda functions
and caused an error when creating a view.
2023-09-11 10:04:00 +08:00
0896aefce3 [fix](local exchange) fix bug of accesssing released counter of local data stream receiver (#24148) 2023-09-11 09:52:31 +08:00
a0fcc30764 [Fix](Status) Handle status code correctly and add a new error code ENTRY_NOT_FOUND (#24139) 2023-09-11 09:32:11 +08:00
71db844c64 [feature](invert index) add tokenizer CharFilter preprocessing (#24102) 2023-09-10 23:08:28 +08:00
ebac816e85 Revert "[improvement](bitshuffle)Enable avx512 support in bitshuffle for performance boost (#15972)" (#24146)
This reverts commit 28fcc093a8958a6870fec9802b23db07a42bbd7b.
2023-09-10 23:06:21 +08:00
9b3be0ba7a [Fix](multi-catalog) Do not throw exceptions when file not exists for external hive tables. (#23799)
A bug similar to #22140.

When executing a query through an HMS catalog, the query may fail because some HDFS files no longer exist. We should distinguish this kind of error and skip the missing files.

```
errCode = 2, detailMessage = (xxx.xxx.xxx.xxx)[CANCELLED][INTERNAL_ERROR]failed to init reader for file hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: [INTERNAL_ERROR]Init OrcReader failed. reason = Failed to read hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc: [INTERNAL_ERROR]Read hdfs file failed. (BE: xxx.xxx.xxx.xxx) namenode:hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: (2), No such file or directory), reason: RemoteException: File does not exist: /xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86) 
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76) 
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) 
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) 
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426) 
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) 
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) 
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
```
2023-09-10 21:55:09 +08:00
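The skip behavior described above can be sketched as follows. This is illustrative Python, with a hypothetical `FileNotExistError` standing in for the real error classification in Doris:

```python
class FileNotExistError(Exception):
    """Hypothetical error type for files deleted after query planning."""

def read_splits(splits, read_file):
    """Read each file split, skipping files that vanished between planning
    and scanning instead of failing the whole query."""
    rows = []
    for path in splits:
        try:
            rows.extend(read_file(path))
        except FileNotExistError:
            continue  # file no longer exists; skip it rather than cancel the query
    return rows
```

The key design choice is to classify "file does not exist" separately from other I/O errors: genuine read failures still abort the query, while stale splits are silently skipped.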
f85da7d942 [improvement](jdbc) add profile for jdbc read and convert phase (#23962)
Add 2 metrics to the JDBC scan node profile:
- `CallJniNextTime`: time spent calling get next on the JDBC result set
- `ConvertBatchTime`: time spent converting jobjects to column blocks

Also fix a potential concurrency issue when initializing the JDBC connection cache pool.
2023-09-10 21:42:06 +08:00
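The split between the two timed phases can be sketched like this. This is a hypothetical Python illustration of the metric breakdown, not the BE implementation:

```python
import time

class JdbcScanProfile:
    """Accumulate time spent fetching batches (CallJniNextTime) separately
    from time spent converting them to column blocks (ConvertBatchTime)."""
    def __init__(self):
        self.call_jni_next_ns = 0
        self.convert_batch_ns = 0

    def scan(self, fetch_batch, convert_batch):
        blocks = []
        while True:
            t0 = time.monotonic_ns()
            batch = fetch_batch()          # phase 1: pull rows from the result set
            self.call_jni_next_ns += time.monotonic_ns() - t0
            if batch is None:
                break
            t0 = time.monotonic_ns()
            blocks.append(convert_batch(batch))  # phase 2: convert to a column block
            self.convert_batch_ns += time.monotonic_ns() - t0
        return blocks
```

Separating the two counters makes it visible in the profile whether a slow JDBC scan is bound by the remote database (fetch) or by local conversion.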
ff92b5bc29 [Bug](pipelineX) Fix runtime filter on pipelineX (#24141) 2023-09-10 20:53:54 +08:00
a05003fbe1 [fix](pipeline) fix remove pipeline_x_context from fragment manager (#24062) 2023-09-10 20:53:26 +08:00
102abff071 [Fix](spark-load) ignore column name case in spark load (#23947)
Doris is not case-sensitive about field names, so when doing a Spark load we can convert all field names to lowercase for matching and loading.
2023-09-10 19:45:01 +08:00
32a7eef96a [schedule](pipeline) Remove wait schedule time in pipeline query engine (#23994)
Co-authored-by: yiguolei <676222867@qq.com>
2023-09-10 17:06:51 +08:00
648bf77c72 [Fix](MemtableMemoryLimiter) fix memtable memory limiter trigger flush log (#24137) 2023-09-10 16:33:35 +08:00
14f8f0cae0 [Improvement](errorcode) use error code when disk exceed capacity limit (#24136) 2023-09-10 16:32:17 +08:00
71645a391c [debug](FileCache) fail over to remote file reader if local cache failed (#24097)
Fail over to the remote file reader if the local file cache fails. This increases the robustness of the file cache.
2023-09-10 12:26:17 +08:00
262c669918 [fix](jdbc catalog) fix jdbc catalog creating json columns when reading json data (#24122) 2023-09-10 12:00:53 +08:00
953958c486 [fix](create tablet) fix backend create tablet timeout (#23879) 2023-09-10 11:41:00 +08:00
93c1151f1a [fix](join) incorrect result of mark join (#24112) 2023-09-10 11:30:45 +08:00
5f2ca8c84c [log](load) print more message about load job on tablet error (#24096) 2023-09-10 10:30:43 +08:00
f9a75b5c4f [feature](csv_serde) add csv serde and use it in csvReader (#23352)
1. Add a csv serde for serializing to and deserializing from csv.
2. Make csvReader use the csv serde instead of text_converter.
2023-09-10 00:16:21 +08:00
5eb9e10b51 [pipelineX](pick) pick 2 PRs to fix bugs (#24117) 2023-09-09 23:10:46 +08:00
6b9698a248 [bugfix](insert into) should not send profile during report process (#24127)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-09-09 17:12:35 +08:00
c3f3195721 [Fix](clucene) fix clucene build error in arm (#24130) 2023-09-09 15:31:40 +08:00
03757d0672 [bug](explode) fix table node not implementing the alloc_resource function (#24031)
2023-09-09 08:25:28 +08:00
153c7982f3 [Optimize](invert index) Optimize multiple terms conjunction query (#23871) 2023-09-09 01:52:58 +08:00
0f408d1192 [improvement](executor)Add name for task scheduler #23983 2023-09-09 00:56:39 +08:00
0f0ffa3482 [Fix](Parquet Reader) fix parquet read issue (#24092) 2023-09-09 00:35:18 +08:00
0143ae8266 [fix]Add logging before _builtin_unreachable() (#24101)
Co-authored-by: 宋光璠 <songguangfan@sf.com>
2023-09-09 00:30:11 +08:00
e140938d81 [Performance][export] Optimize the export CSV transformer (#24003) 2023-09-08 20:26:54 +08:00
0b24bd6a42 [Bug](pipelineX) init runtime filter profile at first (#24106) 2023-09-08 20:01:02 +08:00
2638ad0550 [fix](compaction) rowid_conversion should ignore deleted row on normal compaction (#24005) 2023-09-08 19:44:24 +08:00
f8fd8a3d17 [fix](trash) fix clean trash not working (#23936)
When executing ADMIN CLEAN TRASH, if the backend daemon clean thread is already cleaning trash, the SQL command returns immediately. But the backend daemon thread does not clean all the trash; it cleans only the expired trash.
Also, if there is a lot of trash, the daemon clean thread stays busy handling it for a long time.
2023-09-08 18:13:22 +08:00
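The distinction between the daemon's expired-only sweep and a full admin-triggered sweep can be sketched as follows (illustrative Python with hypothetical names, not the BE's trash-cleaning code):

```python
import time

def sweep_trash(entries, now=None, force=False):
    """Daemon sweep removes only expired entries; an admin-triggered sweep
    (force=True) removes everything.

    entries: list of (path, expire_at) pairs, expire_at in epoch seconds.
    Returns (kept, removed) lists of paths."""
    now = time.time() if now is None else now
    kept, removed = [], []
    for path, expire_at in entries:
        if force or expire_at <= now:
            removed.append(path)
        else:
            kept.append(path)
    return kept, removed
```

With `force=False` and `now=200`, the entry `("a", 100)` is removed while `("b", 300)` is kept; with `force=True` both are removed.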
76ca57cf21 [bug](join) fix outer join not adding the tuple-is-null column when build rows is 0 (#23974)
2023-09-08 17:55:03 +08:00
69868f18d6 [Bug](join) fix nested loop join some problems (#24034) 2023-09-08 17:40:41 +08:00
1abf5e779d [pipelineX](refactor) refactor debug string (#24083) 2023-09-08 16:58:53 +08:00
82dc970916 [feature](insert) Support group commit insert (#22829) 2023-09-08 15:51:03 +08:00
2965b9b3b4 fix update delete bitmap when rowset is blank (#24075)
If the rowset (derived from a clone) does not have a segment, there is no need to update the delete bitmap.
2023-09-08 12:43:42 +08:00
cb29d1a395 fix compile error with gcc12 (#24049) 2023-09-08 10:36:30 +08:00
b73f345479 [fix](intersect) fix wrong result of intersect node (#24044)
Issue Number: close #24046
2023-09-08 10:27:37 +08:00
3927ceac95 [Bug](runtime filter) Fix runtime filter initialization (#24063)
be.WARNING prints lots of logs like 'runtime filter params meet error', which is a misleading message.
2023-09-08 10:27:20 +08:00
cdb1b341c7 [pipelineX](runtime filter) Support runtime filter (#24054) 2023-09-08 10:17:22 +08:00