Commit Graph

14060 Commits

Author SHA1 Message Date
b56eecb341 update secure flag to false (#25412)
2023-10-13 17:00:58 +08:00
789210bc38 [chore](format) Refactor BaseTablet _full_name by using fmt replacing stringstream (#25400)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-13 03:59:03 -05:00
ac8fbdd53c [pipelineX](fix) Fix use-after-free in shuffling (#25409) 2023-10-13 16:57:34 +08:00
37dbda6209 [pipelineX](refactor) Use class template to simplify join (#25369) 2023-10-13 16:51:55 +08:00
Pxl
f4e2eb6564 remove unused code and adjust clang-tidy checks (#25405)
2023-10-13 16:27:37 +08:00
1a25bb65b0 [fix](case) change dynamic_partition.time_unit from day to month to avoid the error that the insert data not in partition (#25361)
Co-authored-by: stephen <hello-stephen@qq.com>
2023-10-13 16:02:17 +08:00
2ec53ff60e [fix](multi-table) fix single stream multi table load can not finish (#25379) 2023-10-13 15:47:16 +08:00
283bd59eba [improvement](scanner) Remove the predicate that is always true for the segment (#25366)
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
2023-10-13 15:25:38 +08:00
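As a rough illustration of the zonemap check described in #25366, the Python sketch below decides from a segment's minimum and maximum values whether a comparison predicate is always true for that segment. `ZoneMap` and `is_always_true` are hypothetical names, not Doris code, and NULL handling is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneMap:
    """Hypothetical per-segment min/max summary for one column."""
    min_value: int
    max_value: int


def is_always_true(op: str, literal: int, zone: ZoneMap) -> bool:
    """Return True when `col <op> literal` holds for every value in the zone."""
    if op == "<":
        return zone.max_value < literal
    if op == "<=":
        return zone.max_value <= literal
    if op == ">":
        return zone.min_value > literal
    if op == ">=":
        return zone.min_value >= literal
    # Unknown operators are conservatively treated as "not always true".
    return False
```

With a segment maximum of 100, `is_always_true("<", 101, ZoneMap(0, 100))` returns `True`, matching the example in the commit message, so such a predicate could be dropped for that segment.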
cee7a6889f [test](fix) case bug (#25363) 2023-10-13 15:25:15 +08:00
9cc0e9526a [enhancement](merge-on-write) consider version count on size-based cu compaction policy (#25352) 2023-10-13 14:52:21 +08:00
96f31ae9a7 [Docs](merge-on-write) Add more docs for partial update using native insert statement (#25356) 2023-10-13 14:48:51 +08:00
6298f90347 [ecosystem](doc) mysql synchronization example add mysql-conf port (#24666) 2023-10-13 01:36:26 -05:00
522faa8cd2 [fix](jni) the offset in map type is int64 (#25394)
The offset in a map type column is int64, but #24810 stored it as int32, causing an error like:
2023-10-13 14:23:17 +08:00
fc40788018 [enhancement](merge-on-write) refine tablet meta_lock usage and add some trace log (#25124) 2023-10-13 14:22:07 +08:00
6757d2f361 Revert "[Enhancement](show-backends-disks) Add show backends disks (#24229)" (#25389)
This reverts commit 21223e65c59c23cfcb9e8ab610ea321168bcb75a.
2023-10-13 14:08:45 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
4f65a9c425 [fix](auth)fix not display be_port (#25197)
Fix be_port not being displayed for users who have ADMIN_PRIV.
2023-10-13 11:56:00 +08:00
509a79988e [FIX](regresstest) fix cases for test_nested_types_insert_into_with_s3 (#25228) 2023-10-13 11:39:29 +08:00
ffacbe7d74 [feature](thrift) Add FE thrift rpc redirect master address (#25371)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-13 11:17:46 +08:00
aa0b74d63a [improvement](fe and broker) support specify broker to getSplits, check isSplitable, file scan for HMS Multi-catalog (#24830)
I want to use Doris Multi-catalog to accelerate HMS queries. My organization has a custom distributed file system, and we think wrapping the fs access differences into the broker (listLocatedFiles, openReader..) would be an elegant approach.

This PR introduces the HMS catalog conf `bind.broker.name`. If this conf is set, file split and query scan operations are sent to the broker.

usage:
Create an HMS catalog that uses a broker:
```
CREATE CATALOG hive_catalog_broker PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://xxx',
    'broker.name' = 'hdfs_broker'
);
```
When we query this catalog, file split and query scan requests are sent to the broker `hdfs_broker`.

More details about this pr:
1. Introduce HMS catalog property `bind.broker.name` to specify the broker name used for remote path work. When `broker.name` is set, `enable.self.splitter` must be `true` to ensure the file splitting process is executed in FE.
2. Introduce 2 more interfaces to broker service:
- `TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request)`, helps to invoke input format `isSplitable` interface.
- `TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request)`, helps to do `listFiles` or `listLocatedStatus` for remote file system
3. Three parts of the processing are executed in the broker:
- Check whether the path with specified input format name `isSplittable`
- `listLocatedFiles` of table / partition locations.
- `OpenReader` for specified file splits.

Co-authored-by: chenlinzhong <490103404@qq.com>
2023-10-13 11:04:38 +08:00
ed67d5a2c2 [docs](developer-guide) Improve the be-vscode-gdb document (#25192)
Add miDebuggerPath to the document so that users can set the gdb path.
If miDebuggerPath is not set, vscode may choose a low-version gdb.

ref: https://code.visualstudio.com/docs/cpp/launch-json-reference#_midebuggerpath
2023-10-13 11:03:46 +08:00
a30d30e7b5 [improvement](resource-tag) limit the default user's resource tag to 'default' (#25331)
Previously, if the user property `'resource_tags.location'` was not set, the user could use Backends with any resource tag.
This could be confusing: when the DBA assigns some Backends to resource group A, existing users
should not be able to use group A until their `'resource_tags.location'` is set.

So this PR changes the behavior: if the user property `'resource_tags.location'` is not set, the user can only use
Backends with the `default` tag.
2023-10-13 10:50:00 +08:00
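The behavior change in #25331 can be pictured with a small Python sketch. It is only a model of the rule stated in the commit message, not the FE implementation; `usable_backends`, the `backends` mapping and the tag names other than `default` are made up for illustration.

```python
DEFAULT_TAG = "default"


def usable_backends(user_tags, backends):
    """Return the names of Backends a user may use.

    user_tags: set of resource tags from the user's 'resource_tags.location'
               property, or None/empty when the property is not set.
    backends:  hypothetical mapping of Backend name -> resource tag.
    """
    # When the property is not set, fall back to the `default` tag only
    # (before the change, every Backend would have been usable).
    allowed = set(user_tags) if user_tags else {DEFAULT_TAG}
    return sorted(name for name, tag in backends.items() if tag in allowed)
```

For example, with `{"be1": "default", "be2": "group_a"}`, a user without the property gets only `be1`, while a user whose property is `{"group_a"}` gets only `be2`.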
11bbeb9a21 [Enhance](resource group)db support replication_allocation (#25195)
- Databases support `replication_allocation`; when creating a table, if neither `replication_num` nor `replication_allocation` is set, the database's value is used
- Fix partition properties disappearing when the table partition is not null
2023-10-13 10:24:01 +08:00
Pxl
26f50f4f0f fix heap-use-after-free on map_agg (#25380)
2023-10-13 00:19:25 +08:00
1073ef22f3 [fix](insert) improve group_commit related tests (#25319) 2023-10-12 21:19:29 +08:00
21223e65c5 [Enhancement](show-backends-disks) Add show backends disks (#24229)
* Add a statement to query disk information for the data directories of a BE node

[mysql]->'show backends disks;'
+-----------+-------------+------------------------------+---------+----------+---------------+-------------+-------------------+---------+
| BackendId | Host        | RootPath                     | DirType | DiskState| TotalCapacity | UsedCapacity| AvailableCapacity | UsedPct |
+-----------+-------------+------------------------------+---------+----------+---------------+-------------+-------------------+---------+
| 10002     | 10.xx.xx.90 | /home/work/output/be/storage | STORAGE | ONLINE   | 7.049 TB      | 2.478 TB    | 4.571 TB          | 35.16 % |
| 10002     | 10.xx.xx.90 | /home/work/output/be         | DEPLOY  | ONLINE   | 7.049 TB      | 2.478 TB    | 4.571 TB          | 35.16 % |
| 10002     | 10.xx.xx.90 | /home/work/output/be/log     | LOG     | ONLINE   | 7.049 TB      | 2.478 TB    | 4.571 TB          | 35.16 % |
+-----------+-------------+------------------------------+---------+----------+---------------+-------------+-------------------+---------+
2023-10-12 20:24:45 +08:00
8825aa7543 [fix](regression test) use double quota for numbers #25365 2023-10-12 19:25:20 +08:00
66db3c9deb [Fix](mvn source) Fix fe compile java-cup and cup-maven-plugin not found #25348
use official address
2023-10-12 19:21:55 +08:00
0a38546596 [opt](Nereids) reject group commit insert temporarily (#25359)
Group commit insert was introduced by PR #22829. Since Nereids does not
support it yet, we forbid it temporarily on Nereids until it is implemented.
2023-10-12 06:20:59 -05:00
Pxl
1a0344df16 [Improvement](hash) refactor of hash map context (#24966)
2023-10-12 18:10:21 +08:00
04bda138d6 [Enhance](regression)add broker load case (#25350) 2023-10-12 17:59:21 +08:00
b7c06b2c0c [chore](workflow) add back 'License Check' and 'Clang Formatter' (#25349)
2023-10-12 17:49:31 +08:00
be27d4d921 [fix](broker-load) fix use_count() issue when doing broker load in debug mode (#25288)
When executing broker load in ASAN mode, BE may crash with error:
```
F20231010 18:18:17.044978 185490 block.cpp:694] Check failed: d.column->use_count() == 1 (3 vs. 1)
*** Check failure stack trace: ***
    @     0x55e9d94c4e46  google::LogMessage::SendToLog()
    @     0x55e9d94c1410  google::LogMessage::Flush()
    @     0x55e9d94c5689  google::LogMessageFatal::~LogMessageFatal()
    @     0x55e9c509f80d  doris::vectorized::Block::clear_column_data()
    @     0x55e9b6c170b3  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x55e9b6c147e6  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x55e9b6c12d9a  doris::PlanFragmentExecutor::open()
    @     0x55e9b6c18426  doris::PlanFragmentExecutor::execute()
    @     0x55e9b6945cca  doris::FragmentMgr::_exec_actual()
    @     0x55e9b696456c  doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
```

It may happen when there is a column mapping like:
```
(k1,v2,v3,v4,v5,v6,v7,v8)
set (k2=v4,k3=v4,k4=v4)
```

in load stmt.

Case is covered by Baidu test cases
2023-10-12 17:04:29 +08:00
013eafc1d7 [Enhancement](filter) support only min/max runtime filter in BE (#25290)
PR #25193 implemented the FE side.
For example, `select count() from lineorder join supplier on lo_partkey < s_suppkey;`
produces a max filter after the hash table is built, which can then be used to filter probe-table data.
2023-10-12 16:59:52 +08:00
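To make the max-filter idea from #25290 concrete, here is a minimal Python sketch under simplifying assumptions: keys are plain integers, the join condition is the strict `probe_key < build_key` from the example query, and `max_filter_probe` is a hypothetical helper, not a BE function.

```python
def max_filter_probe(probe_keys, build_keys):
    """Drop probe keys that cannot satisfy `probe_key < build_key`.

    A probe row can only match some build row if its key is strictly
    smaller than the largest build key, so the maximum of the build side
    acts as a cheap runtime filter on the probe side.
    """
    if not build_keys:
        # An empty build side can never produce a match.
        return []
    upper_bound = max(build_keys)
    return [key for key in probe_keys if key < upper_bound]
```

With build keys `[3, 10]`, probe keys `[1, 5, 9, 12]` are reduced to `[1, 5, 9]` before the join condition itself is evaluated; the filter only prunes rows and does not replace the join.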
e17f3b72dd [fix](load) handle Status in beta rowset writer (#25293) 2023-10-12 16:58:53 +08:00
c6824ce1ae [test](fix) unstable case test_jdbc_query_mysql (#25279) 2023-10-12 03:56:38 -05:00
bdb64eab73 [feature](meta) queries as table valued function (#25052) (#25052)
1. Add queries view as table function.
2. Proxy result to other FEs and return merged results back to BE.

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-12 16:26:14 +08:00
1c3ecbbae9 [docker] [fix] add kafka log collector (#25326)
2023-10-12 15:23:10 +08:00
d6ff9744c9 [feature](Nereids) convert predicate to SARGABLE (#25180)
Convert predicates to SARGABLE form:
1. support formats like `1 - a`
2. support rearranging `year/month/week/day/minutes/seconds_sub/add` functions
2023-10-12 14:46:56 +08:00
c63bf24c84 [Improvement](statistics) Improve sample count accuracy (#25175)
During sample analysis, the row count, null count and data size results need to be multiplied by a coefficient based on
the sample percent/rows. This PR calculates the coefficient from the ratio of total size to sampled file size.
2023-10-12 14:42:02 +08:00
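The scaling described in #25175 amounts to simple arithmetic, sketched below in Python. This is an assumption-laden illustration: `scale_sample_stats` and the statistic names are hypothetical, and the coefficient is taken to be total size divided by sampled size, as the commit message suggests.

```python
def scale_sample_stats(sampled_stats, sampled_bytes, total_bytes):
    """Extrapolate sampled statistics to the whole table.

    sampled_stats: dict of statistics measured on the sample
                   (for example row count, null count, data size).
    sampled_bytes: total size of the files that were actually sampled.
    total_bytes:   total size of all files of the table.
    """
    if sampled_bytes <= 0:
        raise ValueError("sampled_bytes must be positive")
    # Coefficient based on sampled file size over total size.
    factor = total_bytes / sampled_bytes
    return {name: round(value * factor) for name, value in sampled_stats.items()}
```

If 250 bytes out of 1000 were sampled, the factor is 4, so a sampled row count of 1000 is extrapolated to 4000.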
22684dedff [pipelineX](pick) pick PRs from pipeline (#25340) 2023-10-12 14:35:32 +08:00
80a49ed97a [fix](nereids)fix some function signature issue (#25301)
1. remove wrong signature of nvl
2. the promoted type datetimev2 for datetime should be datetimev2(0)
2023-10-12 01:23:20 -05:00
a0d3206d78 [fix](Nereids) support nested complex type literal (#25287) 2023-10-12 01:17:38 -05:00
2664d1cffb [chore](vec) Make this copy constructor of StringRef explicit (#25337) 2023-10-12 14:12:46 +08:00
42f8b253aa [function](nereids) support array_apply/array_repeat/group_uniq_array/ipv4numtostring (#25249)
nereids support functions: array_apply/array_repeat/group_uniq_array/ipv4numtostring
2023-10-12 11:08:42 +08:00
Pxl
a0d2b1ec56 [Bug](materialized-view) fix not match mv when some alias on agg (#25321)
2023-10-12 11:02:55 +08:00
Pxl
f14e4311c4 [Chore](check) add length check for BufferWritable (#25322)
2023-10-12 10:51:50 +08:00
7447ac71b5 [minor](format) fix BE code format (#25328) 2023-10-12 10:34:36 +08:00
022762d5f0 [fix](memory) Fix work load group GC and add logs to locate slow GC #24975
Fix work load group GC, add cancel load and add logs.
Unify the format of GC logs and change them all to lowercase, to avoid unnecessary trouble when using grep or less.
Add logs to help locate the cause of slow GC.
2023-10-12 10:33:56 +08:00
2014e16cfb [fix](es catalog)fix es http timeout (#25273) 2023-10-12 10:21:55 +08:00