Iceberg has its own metadata information, which includes count statistics for table data. If the table does not contain equli'ty delete, we can get the count data of the current table directly from the count statistics.
1. should not use ((LogicalPlanAdapter)parsedStmt).getStatementContext().getOriginStatement().originStmt.toLowerCase() as the cache key (do not invoke toLowerCase()), for example: select * from tbl1 where k1 = 'a' is different with select * from tbl1 where k1 = 'A', so the cache should be missed.
2. according to issue 6735 , the cache key should contains all views' s ddl sql (including nested views)
[Fix](orc-reader) Fix filling partition or missing column used incorrect row count.
`_row_reader->nextBatch` returns number of read rows. When orc lazy materialization is turned on, the number of read rows includes filtered rows, so caller must look at `numElements` in the row batch to determine how
many rows were not filtered which will to fill to the block.
In this case, filling partition or missing column used incorrect row count which will cause be crash by `filter.size() != offsets.size()` in filter column step.
When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.
1. the real value of BE config `file_cache_max_file_reader_cache_size` will be the 1/3 of process's max open file number.
2. use thread pool to create or init the file cache concurrently.
To solve the issue that when there are lots of files in file cache dir, the starting time of BE will be very slow because
it will traverse all file cache dirs sequentially.
should use getHashJoinConjuncts() and getOtherJoinConjuncts() to get hash and other conjuncts of hash join node instead of categorizing them by checking if it's 'EqualTo' expression
slot bind failed for following querys:
select tpch.lineitem.* from lineitem
select tpch.lineitem.l_partkey from lineitem
the unbound slot is tpch.lineitem.l_partkey, but the bounded slot is default_cluster:tpch.lineitem.l_partkey. They are not matched.
we need to ignore default_cluster: when compare dbName
Problem:
When executing group_concat with order by inside in view, column can not be found when analyze.
Example:
create view if not exists test_view as select group_concat(c1,',' order by c1 asc) from table_group_concat;
select * from test_view;
it will return an error like: "can not find c1 in table_list"
Reason:
When we executing this sql in multi-instance environment, Planner would try to create plan in multi phase
aggregation. And because we analyze test_view independent with tables outside view. So we can not get
table informations inside view.
Solution:
Substitute order by expression of merge aggregation expressions.
insert into table command run at a follower node, it will forward to the master node, and the parsed statement is not set to the cascades context, but set to the executor::parsedStmt, we use the latter to get the user info.