1. Fix a replay bug when a catalog is created with a resource.
If a user creates a catalog using `create catalog hive with resource xxx`, then when replaying the edit log,
there is a bug where the resource may already have been dropped, causing an NPE that prevents FE from starting.
In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default true,
so that `with resource` is no longer allowed; the syntax will be deprecated later.
The replay logic is also fixed to avoid the NPE.
2. Fix an issue when creating two hive catalogs, one with and one without kerberos authentication.
When a user creates two hive catalogs, one using simple auth and the other using kerberos auth,
queries may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`
So a default property is added for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
This property is set automatically when the user creates a hive catalog, to avoid the problem.
3. Fix the handling of `hdfsExists()` return codes.
When `hdfsExists()` returns a non-zero code, the caller should check whether a real error occurred or the file simply does not exist; see the sketch after this list.
4. Some code refactoring.
Avoid importing `org.apache.parquet.Strings`.
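A minimal sketch of the `hdfsExists()` check, assuming libhdfs's convention that a failing call sets `errno` and that a missing path is reported as `ENOENT` (the helper name is illustrative):

```cpp
#include <hdfs/hdfs.h>  // libhdfs C API
#include <cerrno>

// Returns true and fills *exists when the call is conclusive; returns false on
// a real I/O or RPC error, which must not be treated as "file not found".
bool check_path_exists(hdfsFS fs, const char* path, bool* exists) {
    errno = 0;
    if (hdfsExists(fs, path) == 0) {
        *exists = true;
        return true;
    }
    // Non-zero return: distinguish an absent path from a genuine failure.
    if (errno == ENOENT || errno == 0) {
        *exists = false;
        return true;
    }
    return false;  // propagate the error instead of swallowing it
}
```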
To reproduce:
./run-regression-test.sh --run -suiteParallel 1 -actionParallel 1 -parallel 1 -d query_p0/sql_functions/window_functions
select /*+ SET_VAR(query_timeout = 600) */ subq_0.`c1` as c0 from (select ref_1.`s_name` as c0, ref_1.`s_suppkey` as c1, ref_1.`s_address` as c2, ref_1.`s_address` as c3 from regression_test_query_p0_sql_functions_window_functions.tpch_tiny_supplier as ref_1 where (ref_1.`s_name` is NULL) or (ref_1.`s_acctbal` is not NULL)) as subq_0 where (subq_0.`c3` is NULL) or (subq_0.`c2` is not NULL)
Reason:
`FunctionIsNull` and `FunctionIsNotNull` return a const column from `execute`, but their `VectorizedFnCall::is_constant` returns false, which breaks const-column handling in `VCompoundPred::execute`.
This PR converts the const column to a full column in `VCompoundPred::execute`; a more thorough solution to this class of problems will follow later.
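A minimal sketch of the fix, assuming the column API Doris inherits from ClickHouse, where `convert_to_full_column_if_const()` materializes a `ColumnConst` into a full column and is a no-op otherwise:

```cpp
// Conceptual location: inside VCompoundPred::execute, after evaluating a child
// such as FunctionIsNull. The child may hand back a ColumnConst even though
// is_constant() reported false, so materialize it before the predicate logic.
ColumnPtr child_result = block->get_by_position(child_column_id).column;
child_result = child_result->convert_to_full_column_if_const();
```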
* [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema
1. When the system is under high-concurrency wide-table point-query load, the frequent allocation and deallocation of Schema objects becomes an evident bottleneck, and the initialization of TabletSchema and Schema is also a CPU hotspot. Therefore a SchemaCache is introduced to cache these objects for reuse (a minimal sketch follows the commit list below).
2. Wrap some variables in `std::unique_ptr`.
Performance:
| Setting | QPS | Avg latency | P99 latency |
|--------------------|-----|------|------|
| SchemaCache on | 501 | 20ms | 34ms |
| SchemaCache off | 321 | 31ms | 61ms |
* handle schema change with schema version
* remove useless header
* rebase
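A self-contained sketch of the caching idea (generic code, not the actual Doris `SchemaCache` API): schemas are cached under a key derived from the tablet id and schema version, so concurrent point queries reuse one shared instance, and a schema change simply maps to a new key:

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct Schema {};  // stand-in for the real TabletSchema/Schema types

class SchemaCache {
public:
    // The key combines tablet id and schema version, so a schema change
    // (new version) naturally produces a new cache entry.
    static std::string key(int64_t tablet_id, int32_t schema_version) {
        return std::to_string(tablet_id) + "-" + std::to_string(schema_version);
    }

    std::shared_ptr<Schema> get_or_create(const std::string& key) {
        std::lock_guard<std::mutex> lock(_mu);
        auto it = _cache.find(key);
        if (it != _cache.end()) return it->second;  // reuse, no re-initialization
        auto schema = std::make_shared<Schema>();   // built once per version
        _cache.emplace(key, schema);
        return schema;
    }

private:
    std::mutex _mu;
    std::unordered_map<std::string, std::shared_ptr<Schema>> _cache;
};
```

A real cache would also bound its size (e.g. LRU eviction); this sketch only shows the keying and reuse.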
Before:
mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 00:00:00 |
+-------------------------------+
After:
mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 10:10:10 |
+-------------------------------+
In the past, we supported this syntax.
mysql [(none)]>select cast("2023:05:01 13:14:15" as time);
+------------------------------------------+
| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) |
+------------------------------------------+
| 13:14:15 |
+------------------------------------------+
However, "10:10:10" is also a valid datetime.
mysql [(none)]>select cast("10:10:10" as datetime);
+-----------------------------------+
| CAST('10:10:10' AS DATETIMEV2(0)) |
+-----------------------------------+
| 2010-10-10 00:00:00 |
+-----------------------------------+
So the parsing order has been adjusted: when the target type is TIME, the string is tried as a time first, and only interpreted as a datetime when that fails.
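A conceptual sketch of the adjusted order (illustrative only, not Doris's actual parser):

```cpp
#include <cstdio>
#include <string>

// Try the TIME pattern first; fall back to the datetime pattern and take its
// time portion. The space check keeps "2023:05:01 13:14:15" out of the first
// branch, while "10:10:10" now parses as the time 10:10:10 instead of the
// datetime 2010-10-10 00:00:00.
bool cast_string_to_time(const std::string& s, int* h, int* m, int* sec) {
    if (s.find(' ') == std::string::npos &&
        std::sscanf(s.c_str(), "%d:%d:%d", h, m, sec) == 3) {
        return true;  // "10:10:10" -> 10:10:10
    }
    int y, mo, d;     // "2023:05:01 13:14:15" -> 13:14:15
    return std::sscanf(s.c_str(), "%d:%d:%d %d:%d:%d", &y, &mo, &d, h, m, sec) == 6;
}
```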
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array simplifies adding runtime filters: it eliminates the complex merge operations and removes the need for the frontend to combine expressions into a single entity.
With the conditions stored as an array, each one can be treated individually; the runtime-filter logic simply appends to the array, and evaluation iterates through it, applying each filter as needed. This simplifies the codebase, improves readability, and makes the conditions easier to manipulate and manage within the execution node.
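A self-contained illustration of the idea (generic code, not the actual ExecNode or VExpr types):

```cpp
#include <functional>
#include <utility>
#include <vector>

using Row = std::vector<int>;
using Conjunct = std::function<bool(const Row&)>;

// The node keeps its filter conditions as an array of independent conjuncts.
std::vector<Conjunct> conjuncts;

// Adding a runtime filter is a push_back; no expression-tree merge is needed.
void add_runtime_filter(Conjunct filter) {
    conjuncts.push_back(std::move(filter));
}

// A row passes only if every conjunct accepts it.
bool eval_conjuncts(const Row& row) {
    for (const auto& c : conjuncts) {
        if (!c(row)) return false;
    }
    return true;
}
```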
Before: the node waited until it had retrieved all data from its child, then sent data to its parent.
Now: child data that does not require sorting can be sent to the parent immediately.
#18976 introduced a facility that merges small IOs to optimize performance, used by the parquet reader.
This PR supports the facility in the ORC reader. The current ORC reader implementation needs to reposition the parent present stream when reading lazy columns during lazy materialization, so make that work by removing `DCHECK_GE(offset, cached_data.end_offset)`.
/home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType'
/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment
/home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0
/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null
* Revert "[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)"
This reverts commit 2ec1d282c5e27b25d37baf91cacde082cca4ec31.
* [fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847)
This reverts commit c73003359567067ea7d44e4a06c1670c9ec37902.
For broadcast join, only one build fragment instance builds the hash table; the other fragment instances just receive and throw away build-side data, which wastes memory and CPU.
This PR improves the situation: the data stream receiver tells the sender that it does not need the data, and the sender stops sending any data to it.
After a query sees the process memory limit exceeded in Allocator, it waits up to 5s for memory to be released.
Previously, Allocator did not check whether the query had been canceled while waiting, so a canceled query could not end quickly.
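A conceptual sketch of the fix (the names are illustrative, not Doris's actual Allocator code): the wait loop polls in small steps and bails out as soon as the query is canceled, instead of sleeping through the full timeout:

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Illustrative stand-in for the query's cancellation flag.
std::atomic<bool> query_canceled{false};

// Wait up to 5s for memory, re-checking cancellation on every step so a
// canceled query stops waiting immediately.
bool wait_for_memory(const std::function<bool()>& memory_available) {
    using namespace std::chrono;
    const auto deadline = steady_clock::now() + seconds(5);
    while (steady_clock::now() < deadline) {
        if (query_canceled.load()) return false;  // the newly added check
        if (memory_available()) return true;
        std::this_thread::sleep_for(milliseconds(100));
    }
    return memory_available();
}
```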
Firstly, to reduce memory usage, we no longer pre-allocate blocks; instead, a block is allocated lazily when the upper layer calls get_free_block. When the upper layer calls return_free_block to hand a block back, it is added to a queue for memory reuse, and the queued blocks are freed when the scanner context is closed rather than when it is destructed.
Secondly, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity indicating the current number of free blocks available to the scanners; the number of scanners that can be scheduled is calculated from this value.
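A self-contained sketch of the block-reuse scheme (generic types, not the actual ScannerContext API):

```cpp
#include <memory>
#include <mutex>
#include <queue>

struct Block {};  // stand-in for vectorized::Block

class ScannerContext {
public:
    // Lazy allocation: hand out a recycled block if one is queued, otherwise
    // create a new one. Nothing is pre-allocated.
    std::unique_ptr<Block> get_free_block() {
        std::lock_guard<std::mutex> lock(_mu);
        if (!_free_blocks.empty()) {
            auto b = std::move(_free_blocks.front());
            _free_blocks.pop();
            return b;
        }
        return std::make_unique<Block>();
    }

    // Returned blocks are queued for reuse instead of being freed immediately.
    void return_free_block(std::unique_ptr<Block> b) {
        std::lock_guard<std::mutex> lock(_mu);
        _free_blocks.push(std::move(b));
    }

    // The queued blocks are released at close time, not in the destructor.
    void close() {
        std::lock_guard<std::mutex> lock(_mu);
        std::queue<std::unique_ptr<Block>>().swap(_free_blocks);
    }

private:
    std::mutex _mu;
    std::queue<std::unique_ptr<Block>> _free_blocks;
};
```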
ssb flat test (load time / query time, before vs. after):

| dataset | run | before | after |
|---------|-----|--------|-------|
| lineorder 1.2G | 1 | 3s / 0.355s | 3s / 0.349s |
| lineorder 5.8G | 1 | 330s / 0.970s | 342s / 0.929s |
| lineorder 5.8G | 2 | 349s / 0.949s | 337s / 0.913s |
| lineorder 5.8G | 3 | 349s / 0.955s | 345s / 0.946s |
| lineorder 5.8G | pipeline enabled | 360s / 0.889s | 346s / 0.865s |
Make the VCompoundPred optimization work correctly.
#19818 tried to enable the VCompoundPred optimization but produced wrong results on TPC-DS q28.
The reason is that MySQL's nullable logic needs special handling: under SQL three-valued logic, `NULL AND FALSE` is `FALSE` and `NULL OR TRUE` is `TRUE`, so the result can be non-NULL even when an operand is NULL.
mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
| 0 |
+----------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
| NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
| NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)
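A self-contained sketch of the required handling (illustrative, not the VCompoundPred code): the result's null flag depends on the operand values, not only on the operand null flags:

```cpp
#include <optional>

using TriBool = std::optional<bool>;  // nullopt models SQL NULL

// Three-valued AND: FALSE dominates NULL, matching `NULL AND FALSE` -> 0.
TriBool sql_and(TriBool a, TriBool b) {
    if ((a && !*a) || (b && !*b)) return false;  // any FALSE operand -> FALSE
    if (!a || !b) return std::nullopt;           // otherwise NULL propagates
    return true;
}

// Three-valued OR: TRUE dominates NULL, matching `NULL OR TRUE` -> 1.
TriBool sql_or(TriBool a, TriBool b) {
    if ((a && *a) || (b && *b)) return true;     // any TRUE operand -> TRUE
    if (!a || !b) return std::nullopt;
    return false;
}
```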
Fix partition field conjuncts not working.
Add the predicate partition columns' conjuncts from _slot_id_to_filter_conjuncts (single-slot conjuncts) to _filter_conjuncts; the others should already have been added from not_single_slot_filter_conjuncts.
1. Get the DataTypeSerDe instances in advance, to avoid constructing a temporary DataTypeSerDe while iterating each column.
2. Iterating the original row once is enough for deserialization, achieved by introducing a map that records the index of each column's unique id; see the sketch below.
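A self-contained sketch of the idea (generic types, not the actual Doris classes):

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct DataTypeSerDe {};  // stand-in for the real per-type serializer

struct RowDeserializer {
    std::vector<std::shared_ptr<DataTypeSerDe>> serdes;  // fetched once, reused per row
    std::unordered_map<int32_t, size_t> uid_to_index;    // column unique id -> position

    void init(const std::vector<int32_t>& column_unique_ids) {
        serdes.reserve(column_unique_ids.size());
        for (size_t i = 0; i < column_unique_ids.size(); ++i) {
            serdes.push_back(std::make_shared<DataTypeSerDe>());
            // With this map, a single pass over a serialized row can dispatch
            // each cell to the right serde without per-column lookups.
            uid_to_index[column_unique_ids[i]] = i;
        }
    }
};
```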
To be more compatible with MySQL, rename the JSONB type name and function names to JSON.
The old JSONB type name and jsonb_xx functions can still be used for backward compatibility.
One function, jsonb_extract, remains as-is, since json_extract is already used by the JSON string function and more work is needed to change it; it will be changed later.