1. Support `SHOW LOAD WARNINGS` for MySQL load so that users can get the detailed error message.
2. Fix `fillByteBufferAsync` not marking the load as finished in the same data load.
3. Fix drain data only in client mode.
Co-authored-by: ByteYue <yj976240184@gmail.com>
This PR is an optimization for https://github.com/apache/doris/pull/17478:
1. Change the buffer size of `LineReader` to 4MB to align with the size of prefetch buffer.
2. Lazily prefetch data in the first read to prevent wasted reading.
3. The S3 block size is only 32MB, which is too small for a file split. Set 128MB as the default file split size.
4. Add `_end_offset` for prefetch buffer to prevent wasted reading.
The query performance of reading data on object storage is improved by more than 3x.
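The lazy prefetch and the `_end_offset` bound can be illustrated with a minimal, synchronous sketch. The class `PrefetchReader`, its fields, and the small buffer size are hypothetical and chosen for illustration only; the actual BE implementation is C++ and prefetches asynchronously.

```python
import io


class PrefetchReader:
    """Hypothetical sketch: fetch lazily and never read past end_offset."""

    def __init__(self, source, start, end_offset, buffer_size=4 * 1024 * 1024):
        self.source = source
        self.pos = start
        self.end_offset = end_offset
        self.buffer_size = buffer_size
        self.buf = b""          # nothing is fetched until the first read
        self.buf_start = start
        self.fetches = 0        # counts reads issued to the underlying source

    def _fetch(self):
        # Clamp the request so that bytes beyond end_offset are never read.
        want = min(self.buffer_size, self.end_offset - self.pos)
        self.source.seek(self.pos)
        self.buf = self.source.read(want)
        self.buf_start = self.pos
        self.fetches += 1

    def read(self, n):
        n = min(n, self.end_offset - self.pos)
        out = bytearray()
        while n > 0:
            off = self.pos - self.buf_start
            if off >= len(self.buf):
                self._fetch()
                if not self.buf:
                    break
                off = 0
            chunk = self.buf[off:off + n]
            out += chunk
            self.pos += len(chunk)
            n -= len(chunk)
        return bytes(out)
```

In this sketch, constructing the reader issues no I/O, and a split covering bytes 10 to 30 never touches data outside that range.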
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
Inspired by the C++ function `std::vector::emplace_back()`, we can use a variadic template for this issue.
e.g.
```
[['struct'], 'STRUCT<TYPES>', ['TYPES'], 'ALWAYS_NOT_NULLABLE', ['TYPES...']]
```
`TYPES...` in template_types defines a variadic template `TYPES`. The variadic template is then expanded into multiple normal templates based on the actual input arguments at runtime in FE.
Make sure `TYPES...` is placed in the last position among all template type arguments.
BTW, the original template function logic is not affected.
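The expansion step can be sketched as follows. This is a minimal illustration, not the FE code: the function name `expand_variadic` and the `NAME_i` naming of the expanded templates are assumptions, and it assumes one template type per actual argument.

```python
def expand_variadic(template_types, actual_arg_count):
    """Expand a trailing 'NAME...' entry into NAME_0, NAME_1, ... (hypothetical naming)."""
    # The variadic entry is only allowed in the last position.
    if any(t.endswith("...") for t in template_types[:-1]):
        raise ValueError("variadic template must be the last template type")
    if not template_types or not template_types[-1].endswith("..."):
        return list(template_types)  # plain template: nothing to expand
    fixed = list(template_types[:-1])
    base = template_types[-1][:-3]
    extra = actual_arg_count - len(fixed)
    if extra < 0:
        raise ValueError("not enough arguments for the fixed template types")
    return fixed + [f"{base}_{i}" for i in range(extra)]
```

For a call such as `struct(a, b, c)`, `expand_variadic(['TYPES...'], 3)` yields three ordinary template types, while a non-variadic list such as `['K', 'V']` is returned unchanged.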
Lock contention in DatabaseTransactionMgr is sometimes fierce and may lead to publish timeouts. I think we should add a log to hint at this lock contention.
1. Close (https://github.com/apache/doris/issues/16458) for Nereids.
2. The varchar and string types should be treated as the same type in the bucket shuffle join scenario.
```
create table shuffle_join_t1 ( a varchar(10) not null )
create table shuffle_join_t2 ( a varchar(5) not null, b string not null, c char(3) not null )
```
The two queries below can use bucket shuffle join:
```
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.a;
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.b;
```
3. PushdownExpressionsInHashCondition should consider both hash and other conjuncts
4. visitPhysicalProject should handle MarkJoinSlotReference
When the FE config `default_storage_medium` is set to SSD and all BE storage paths are set as SSD,
tables are stored with storage medium SSD.
But the FE config `storage_cooldown_second` has a default value of 30 days,
so after 30 days the storage medium of the table is changed to HDD, which is unexpected.
This PR removes `storage_cooldown_second` and uses a max value as the cooldown time of the SSD
storage medium when `default_storage_medium` is SSD.
Support creating and dropping global functions.
A custom function can currently be used only within one database. It cannot be used in other databases or catalogs, so when there are many databases or catalogs, the function has to be created in each of them one by one.
## Problem summary
1. Add the `GLOBAL` keyword to the statements that create or drop a function:
CREATE [GLOBAL] [AGGREGATE] [ALIAS] FUNCTION function_name (arg_type [, ...]) [RETURNS ret_type] [INTERMEDIATE inter_type] [WITH PARAMETER(param [,...]) AS origin_function] [PROPERTIES ("key" = "value" [, ...]) ]
DROP [GLOBAL] FUNCTION function_name (arg_type [, ...])
2. Global functions are fully global, and their metadata is stored in the image. Function lookup searches the current database first and falls back to the global functions if nothing is found there.
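The lookup order described above can be sketched as follows. The `FunctionRegistry` class and its methods are hypothetical and ignore argument types for brevity; they only illustrate the "database first, then global" strategy.

```python
class FunctionRegistry:
    """Hypothetical sketch of database-first, global-fallback function lookup."""

    def __init__(self):
        self.db_functions = {}      # {db_name: {function_name: impl}}
        self.global_functions = {}  # {function_name: impl}

    def create(self, name, impl, db=None, is_global=False):
        if is_global:
            self.global_functions[name] = impl
        else:
            self.db_functions.setdefault(db, {})[name] = impl

    def lookup(self, db, name):
        # 1. Look in the current database first.
        local = self.db_functions.get(db, {})
        if name in local:
            return local[name]
        # 2. Fall back to the global functions.
        if name in self.global_functions:
            return self.global_functions[name]
        raise KeyError(f"function {name} not found")
```

With this order, a database-level function shadows a global function of the same name, while every other database sees the global one.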
Co-authored-by: lexluo <lexluo@tencent.com>
Basic functions for the map data type:
- MAP<K, V> map(K k1, V v1, ...)
- BIGINT map_size(MAP<K, V> m)
- BOOL map_contains_key(MAP<K, V> m, K k1)
- BOOL map_contains_value(MAP<K, V> m, V v1)
- ARRAY<K> map_keys(MAP<K, V> m)
- ARRAY<V> map_values(MAP<K, V> m)
How does it work?
AspectJ is used to implement the aspect behavior of the annotations. During compilation, the aspectj-maven-plugin automatically weaves the aspect code for the annotations into the generated class files.
When should it be used?
When a method needs a try/catch to record exception information, the LogException annotation can be used. When a method must not fail, the NoException annotation can be used.
What is the result of adding these annotations?
The LogException annotation automatically writes caught exceptions to the log file, which makes the code more concise. The NoException annotation automatically writes the exception to the log file and exits the program when an exception occurs.
During compaction, the map offsets passed to the same olap convertor restart from 0 for each block (0 to 0+size),
but they should be continuous across pages within one segment writer.
e.g.:
last block with map offset : [3, 6, 8, ... 100]
this block with map offset : [5, 10, 15 ..., 100]
The same convertor should record the last offset so that later offsets continue from it.
So after the convertor:
the current offsets should be [105, 110, 115, ... 200], and the column writer can then just call append_data() to append the correct offset data to the pages.
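The rebasing can be sketched as follows. The class `OffsetConvertor` is a hypothetical stand-in for the olap convertor and only models the offset bookkeeping, using shortened versions of the example blocks above.

```python
class OffsetConvertor:
    """Hypothetical sketch: keep map offsets continuous across blocks."""

    def __init__(self):
        self.last_offset = 0  # end offset of everything converted so far

    def convert(self, block_offsets):
        # Each incoming block starts counting from 0, so shift it by the
        # last offset recorded from the previous blocks.
        rebased = [self.last_offset + off for off in block_offsets]
        if rebased:
            self.last_offset = rebased[-1]
        return rebased


convertor = OffsetConvertor()
first = convertor.convert([3, 6, 8, 100])
second = convertor.convert([5, 10, 15, 100])
```

Here `first` is unchanged because no earlier block exists, and `second` starts at 105 and ends at 200, matching the example.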
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. Quantile columns currently support only NOT NULL.
2. Add some regression test cases.
3. set default enable_quantile_state_type = true
---------
Co-authored-by: spaces-x <weixiang06@meituan.com>
Notice some changes:
1. Support cancel query for mysql load
2. Change the thread pool for mysql load manager.
3. Fix the secure path check logic
4. Fix some doc errors
This PR proposes a new approach similar to C++ templates. The previous functions can be defined much more simply using template functions.
# array element extract template function
[['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],
# map element extract template function
[['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],
BTW, plain-type functions are not affected, and the legacy ARRAY_X and MAP_K_V forms are still supported for compatibility.
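How such a template signature might be bound to concrete argument types can be sketched as follows. The functions `bind` and `resolve` are hypothetical, perform exact matching only (no implicit casts), and substitute only a bare template name in the return type; the real FE resolution is more involved.

```python
def split_top_level(s):
    """Split 'K, V' on commas that are not nested inside <...>."""
    parts, depth, cur = [], 0, []
    for ch in s:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur).strip())
    return parts


def bind(pattern, concrete, template_types, bindings):
    """Match one argument pattern against a concrete type, recording bindings."""
    pattern, concrete = pattern.strip(), concrete.strip()
    if pattern in template_types:
        # A template variable must bind to the same type everywhere.
        return bindings.setdefault(pattern, concrete) == concrete
    if "<" not in pattern:
        return pattern == concrete
    if "<" not in concrete:
        return False
    p_head, p_body = pattern[:-1].split("<", 1)
    c_head, c_body = concrete[:-1].split("<", 1)
    p_args, c_args = split_top_level(p_body), split_top_level(c_body)
    return (p_head == c_head and len(p_args) == len(c_args)
            and all(bind(p, c, template_types, bindings)
                    for p, c in zip(p_args, c_args)))


def resolve(return_type, arg_patterns, template_types, actual_args):
    """Return the concrete return type, or None if the signature does not match."""
    bindings = {}
    if len(arg_patterns) != len(actual_args):
        return None
    for p, a in zip(arg_patterns, actual_args):
        if not bind(p, a, template_types, bindings):
            return None
    return bindings.get(return_type, return_type)
```

For example, `resolve('V', ['MAP<K, V>', 'K'], ['K', 'V'], ['MAP<STRING, DOUBLE>', 'STRING'])` binds `K` to `STRING` and `V` to `DOUBLE`.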
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 tries to read this hdfs-site.xml.
If the file does not exist, libhdfs3 throws an error. Doris does not handle this error, which causes BE to crash.
This CL mainly changes:
1. Modify start_be.sh to set LIBHDFS3_CONF only if hdfs-site.xml exists.
2. Refactor the HDFSCommonBuilder so that it can return errors correctly.
3. Add BE IP info to the status, so that we can get the IP from an error message like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err:
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic of preferring compute nodes is wrong, which causes external table queries to be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:
prefer_compute_node_for_external_table
If set to true, queries on external tables prefer to be assigned to compute nodes.
The max number of compute nodes is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables are assigned to any node.
min_backend_num_for_external_table
Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables try to get some mix nodes
as well, so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables are assigned to compute nodes only.
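The selection rule for these two configs can be sketched as follows. The function `pick_backends` and its list-based node model are hypothetical; the sketch only mirrors the rules stated above, not the actual FE scheduler.

```python
def pick_backends(compute_nodes, mix_nodes, prefer_compute_node, min_backend_num):
    """Hypothetical sketch of candidate backend selection for external table queries."""
    if not prefer_compute_node:
        # Any node may be used.
        return list(compute_nodes) + list(mix_nodes)
    if len(compute_nodes) >= min_backend_num:
        # Enough compute nodes: use compute nodes only.
        return list(compute_nodes)
    # Too few compute nodes: top up with mix nodes to reach min_backend_num.
    needed = min_backend_num - len(compute_nodes)
    return list(compute_nodes) + list(mix_nodes)[:needed]
```

For example, with 2 compute nodes, 5 mix nodes, and `min_backend_num_for_external_table = 3`, the sketch returns both compute nodes plus one mix node.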
Loading a big local file causes the `[INTERNAL_ERROR]too many filtered rows` issue because the byte buffer from the MySQL client always uses the same byte array.
Later bytes overwrite the previous ones, which produces a wrong byte order on the network.
The fix is to copy the byte array before filling it into the network.
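The aliasing problem and the fix can be reproduced in a minimal sketch. The functions below are hypothetical and use a Python `bytearray` in place of the Java byte buffer; they only illustrate why queuing a reused buffer corrupts earlier packets.

```python
def queue_without_copy(packets):
    """Buggy variant: every queued entry aliases the same reused buffer."""
    shared = bytearray(4)
    queued = []
    for p in packets:
        shared[:len(p)] = p
        queued.append(shared)  # same object each time, later writes overwrite it
    return [bytes(q) for q in queued]


def queue_with_copy(packets):
    """Fixed variant: copy the bytes before handing them to the queue."""
    shared = bytearray(4)
    queued = []
    for p in packets:
        shared[:len(p)] = p
        queued.append(bytes(shared[:len(p)]))  # independent copy
    return queued
```

With the packets `[b"aaaa", b"bbbb"]`, the first variant returns the second packet twice, while the copying variant preserves both packets in order.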
- change for Nereids
1. add a variable-length parameter to the constructor of Count for better error reporting of Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate, so that they behave the same as in the legacy planner.
- change for legacy planner
1. change the common type of floating and Decimal from Decimal to Double
This commit forbids struct and map types from being used as distribution keys or aggregation keys.
SQL such as:
select distinct struct_col from struct_table
will report an error.
Currently, inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is not supported.
This commit supports implicitly casting the char type in the struct to varchar.
Add implicit cast for the struct type.
MySQL load can load files that are local to the FE server node, but this causes a security issue because users can use it to probe local files on the FE node.
For this reason, add a configuration named `mysql_load_server_secure_path` to set a secure path from which data can be loaded.
By default, loading FE local files is disabled by this configuration.
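A secure-path check of this kind can be sketched as follows. The function `is_allowed_load_path` is hypothetical, and treating an empty `secure_path` as "feature disabled" is an assumption about the default value; the real check lives in the FE Java code.

```python
import os


def is_allowed_load_path(file_path, secure_path):
    """Hypothetical sketch: allow only files that resolve inside secure_path."""
    if not secure_path:
        return False  # assumption: an empty config value disables the feature
    # Resolve symlinks and '..' segments before comparing.
    root = os.path.realpath(secure_path)
    target = os.path.realpath(file_path)
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised when the paths cannot be compared, e.g. different drives.
        return False
```

Resolving the path first matters: a request such as `<secure_path>/../etc/passwd` would pass a naive prefix check but is rejected here.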