Before this PR, Paimon created the schema of `VectorTable` from meta information. However, if the schema of `VectorTable` on the Java side is not the same as the `Block` on the C++ side, the BE crashes, and there is no good way to troubleshoot the error.
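A minimal sketch of the kind of schema check that turns such a mismatch into a debuggable error instead of a BE crash; the class and method names (`VectorTableSchemaCheck`, `checkSchemaConsistency`) are hypothetical and not the actual Paimon/Doris code:
```
import java.util.List;

public class VectorTableSchemaCheck {
    /**
     * Hypothetical guard: compare the Java-side column types against the types
     * the C++ Block expects, and fail with a descriptive message instead of
     * letting the BE crash on a silent mismatch.
     */
    public static void checkSchemaConsistency(List<String> javaTypes, List<String> blockTypes) {
        if (javaTypes.size() != blockTypes.size()) {
            throw new IllegalStateException("VectorTable has " + javaTypes.size()
                    + " columns but the BE Block expects " + blockTypes.size());
        }
        for (int i = 0; i < javaTypes.size(); i++) {
            if (!javaTypes.get(i).equalsIgnoreCase(blockTypes.get(i))) {
                throw new IllegalStateException("Column " + i + " type mismatch: java="
                        + javaTypes.get(i) + ", block=" + blockTypes.get(i));
            }
        }
    }
}
```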
Use `id` as the WHERE predicate when loading the column statistics cache. This can improve performance because `id` is the leading ordering key of the column statistics table.
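A hedged sketch of what loading by `id` looks like as a query; the internal table name and exact statement below are assumptions based on the description above, not the actual loader code:
```
public class ColumnStatsCacheLoaderSketch {
    /**
     * Build the statistics-loading query with an equality predicate on id, the
     * leading key of the statistics table, so the scan becomes a point lookup
     * instead of a full scan. Table and column names are illustrative.
     */
    public static String buildLoadSql(String id) {
        return "SELECT * FROM __internal_schema.column_statistics WHERE id = '" + id + "'";
    }

    public static void main(String[] args) {
        System.out.println(buildLoadSql("10001-10002-col1"));
    }
}
```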
Add UTs for the subquery pull-up rule group (an illustrative rewrite is sketched after this list):
Single-table part:
(1) subquery_basic_pullup_basic: basic part
(2) subquery_basic_pullup_uk: using unique key
(3) subquery_basic_pullup_and: with and
(4) subquery_basic_pullup_or: with or
(5) subquery_basic_pullup_and_subquery: with and subquery
(6) subquery_basic_pullup_or_subquery: with or subquery
Multi-table part:
(1) subquery_multitable_pullup_basic: basic part
(2) subquery_multitable_pullup_and: with and
(3) subquery_multitable_pullup_or: with or
(4) subquery_multitable_pullup_and_subquery: with and subquery
(5) subquery_multitable_pullup_or_subquery: with or subquery
Top-operator related, such as GROUP BY/HAVING, ORDER BY, window functions, and the select list:
(1) subquery_topop_pullup_groupby: group by
(2) subquery_topop_pullup_having: having
(3) subquery_topop_pullup_orderby: order by
(4) subquery_topop_pullup_winfunc: window function
(5) subquery_topop_pullup_selectlist: select list
Misc, such as correlated GROUP BY, views, DML, etc.:
(1) subquery_misc_pullup_misc: correlated GROUP BY, views, etc.
(2) subquery_misc_pullup_dml: insert/update/delete
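For context, a hedged illustration of the kind of rewrite these pull-up cases exercise; the tables and queries are invented for this sketch and are not taken from the test files:
```
public class SubqueryPullupExample {
    // Original query: an IN-subquery in the WHERE clause.
    static final String BEFORE =
            "SELECT t1.id FROM t1 WHERE t1.id IN (SELECT t2.id FROM t2 WHERE t2.v > 10)";

    // After pull-up/unnesting, the subquery becomes a semi-join on the outer table.
    static final String AFTER =
            "SELECT t1.id FROM t1 LEFT SEMI JOIN (SELECT id FROM t2 WHERE v > 10) tt ON t1.id = tt.id";
}
```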
Remove the retry when loading the stats cache fails. This usually happens when a BE is down or OOM; retrying does not work in these cases and may increase the BE workload.
We change the memtable size from 200MB to 100MB to achieve smoother flush
performance. We change loadStreamPerNode from 20 to 60 so that stream
RPC does not become the bottleneck when memtable_on_sink_node is enabled. We change
the default S3 and broker load parallelism to make the most of the CPUs on modern
multi-core systems.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
A project's child expression is not always a NamedExpression, which led to the ClassCastException below.
Failure message:
```
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = class org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral cannot be cast to class org.apache.doris.nereids.trees.expressions.NamedExpression (org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral and org.apache.doris.nereids.trees.expressions.NamedExpression are in unnamed module of loader 'app')
at org.apache.doris.qe.StmtExecutor.executeByNereids(StmtExecutor.java:623) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:478) ~[classes/:?]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:457) ~[classes/:?]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:245) ~[classes/:?]
at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:166) ~[classes/:?]
at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:193) ~[classes/:?]
at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:246) ~[classes/:?]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[classes/:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.ClassCastException: class org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral cannot be cast to class org.apache.doris.nereids.trees.expressions.NamedExpression (org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral and org.apache.doris.nereids.trees.expressions.NamedExpression are in unnamed module of loader 'app')
at org.apache.doris.nereids.trees.plans.physical.PhysicalSetOperation.pushDownRuntimeFilter(PhysicalSetOperation.java:178) ~[classes/:?]
at org.apache.doris.nereids.trees.plans.physical.PhysicalHashJoin.pushDownRuntimeFilter(PhysicalHashJoin.java:229) ~[classes/:?]
at org.apache.doris.nereids.processor.post.RuntimeFilterGenerator.pushDownRuntimeFilterCommon(RuntimeFilterGenerator.java:386) ~[classes/:?]
```
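A hedged sketch of the guarded cast that avoids this ClassCastException; the method and type stubs are simplified illustrations, not the actual PhysicalSetOperation code:
```
import java.util.ArrayList;
import java.util.List;

public class GuardedCastSketch {
    interface Expression {}
    interface NamedExpression extends Expression {}
    static class VarcharLiteral implements Expression {}

    /**
     * Only treat child outputs as NamedExpression when they really are one;
     * literals (e.g. a VarcharLiteral from a constant select list) are skipped
     * instead of being cast blindly.
     */
    static List<NamedExpression> namedOutputs(List<Expression> childOutputs) {
        List<NamedExpression> result = new ArrayList<>();
        for (Expression expr : childOutputs) {
            if (expr instanceof NamedExpression) {
                result.add((NamedExpression) expr);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints 0 instead of throwing a ClassCastException.
        System.out.println(namedOutputs(List.of(new VarcharLiteral())).size());
    }
}
```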
In the old caching logic, only the JDBC URL, user, and password were used as the cache key. This may cause the old connection pool to still be used after the driver JAR package is replaced, so we now concatenate all the parameters required by the connection pool into the key.
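A minimal sketch of the broader cache key; the exact parameter set (driver URL, driver class, pool sizing) is an assumption about "all the parameters required for the connection pool", not the real field list:
```
public class JdbcClientCacheKeySketch {
    /**
     * Concatenate every parameter the connection pool depends on, so replacing
     * the driver JAR (different driver URL) yields a different key and a fresh
     * pool instead of silently reusing the old one.
     */
    public static String cacheKey(String jdbcUrl, String user, String password,
                                  String driverUrl, String driverClass,
                                  int minPoolSize, int maxPoolSize) {
        return String.join("#",
                jdbcUrl, user, password, driverUrl, driverClass,
                String.valueOf(minPoolSize), String.valueOf(maxPoolSize));
    }

    public static void main(String[] args) {
        System.out.println(cacheKey("jdbc:mysql://127.0.0.1:3306/db", "root", "",
                "http://host/mysql-connector-j-8.0.33.jar",
                "com.mysql.cj.jdbc.Driver", 1, 10));
    }
}
```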
We will integrate the new load job manager into the new job scheduling framework, so that the INSERT INTO task can be scheduled after the broker load SQL is converted into INSERT INTO TVF (table value function) SQL.
issue: https://github.com/apache/doris/issues/24221
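A hedged sketch of the broker-load-to-TVF conversion the framework schedules; the S3 TVF property names and the rewrite shape below are illustrative assumptions, not the exact conversion code:
```
public class BrokerLoadToTvfSketch {
    /**
     * Conceptually, a broker load of an S3 file into a table is rewritten into
     * an INSERT INTO ... SELECT over the S3 table value function, which the new
     * job framework can then schedule as an ordinary insert task.
     */
    public static String toInsertTvfSql(String targetTable, String uri, String format) {
        return "INSERT INTO " + targetTable
                + " SELECT * FROM S3(\"uri\" = \"" + uri + "\", \"format\" = \"" + format + "\")";
    }

    public static void main(String[] args) {
        System.out.println(toInsertTvfSql("db1.tbl1", "s3://bucket/path/data.csv", "csv"));
    }
}
```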
Now supported:
1. Load data via TVF INSERT INTO SQL, but only for simple loads (columns need to be defined in the table)
2. SHOW LOAD statement
- job id, label name, job state, time info
- simple progress
3. CANCEL LOAD from a db
4. Enable the new load path through Config.enable_nereids_load
5. Jobs can be replayed after restarting Doris
TODO:
- support partition insert jobs
- support showing statistics from the BE
- support multiple tasks and collecting task statistics
- support transactional tasks
- add UT cases
1. Remove `doris_max_remote_scanner_thread_pool_thread_num` and use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value of `doris_scanner_thread_pool_thread_num` to `std::max(48, CpuInfo::num_cores() * 4)`.
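The new default is computed in the BE in C++; below is a small Java rendering of the same formula for illustration only, not the actual implementation:
```
public class ScannerPoolDefaultSketch {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Same shape as std::max(48, CpuInfo::num_cores() * 4):
        // at least 48 scanner threads, and 4 per core on larger machines.
        int threads = Math.max(48, cores * 4);
        System.out.println("doris_scanner_thread_pool_thread_num default = " + threads);
    }
}
```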
* [improvement] (nereids) When getting the partition-related table, disallow a nullable partition field, modify the regression tests, and complete the aggregate MV rules.
* Make the field not null in order to create a partition MV