In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users.
When translating an InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly.
Previously, the insert prepared statement used for writing was built with placeholders for all columns. With this change, the columns processed by the execution plan are passed in during the sink task generation stage, and the statement is generated dynamically from them.
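As a rough illustration of generating the prepared statement from only the specified columns, here is a minimal Python sketch. It is not the Doris implementation: the function name `build_insert_sql` and the table and column names are hypothetical, and MySQL-style backtick quoting is assumed.

```python
def quote_ident(name):
    """Quote an identifier the MySQL way, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_insert_sql(table, columns):
    """Build a parameterized INSERT with one placeholder per given column."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_ident(table)} ({column_list}) VALUES ({placeholders})"


# Hypothetical table: `id` is auto-increment and `created_at` has a default,
# so neither is listed and MySQL fills them in itself.
sql = build_insert_sql("orders", ["user_id", "amount"])
print(sql)
```

Because the unspecified columns never appear in the statement, the database applies its own auto-increment and default rules instead of receiving values filled in by the planner.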
We add a penalty for broadcast join (abbreviated as bc below).
The intuition behind the penalty is as follows:
1. If the build side is very small (< 1M), we prefer bc and set `penalty=1`, which means no penalty.
2. If the build side is larger than 1M, we consider the ratio of the probe row count to the build row count. The smaller the ratio, the higher the penalty.
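The two rules can be sketched as below. This is only an illustration of the intuition, not the formula used in the PR: the reading of "1M" as a row count, the `max_penalty` cap, and the inverse-ratio mapping are all assumptions made for the example.

```python
SMALL_BUILD_ROWS = 1_000_000  # assumption: "1M" is read as a row count


def broadcast_penalty(probe_rows, build_rows, max_penalty=10.0):
    """Return a cost multiplier for broadcast join (1.0 means no penalty)."""
    # Rule 1: a very small build side is always a good broadcast candidate.
    if build_rows < SMALL_BUILD_ROWS:
        return 1.0
    # Rule 2: the smaller probe/build is, the higher the penalty.
    ratio = probe_rows / build_rows
    # Hypothetical mapping: inverse of the ratio, clamped to [1, max_penalty].
    return min(max_penalty, max(1.0, max_penalty / max(ratio, 1e-9)))


print(broadcast_penalty(1_000_000_000, 1_000))    # small build side
print(broadcast_penalty(2_000_000, 2_000_000))    # probe as small as build
print(broadcast_penalty(10_000_000, 2_000_000))   # probe 5x the build side
```

With this mapping, a probe side that is much larger than the build side ends up with no penalty, while a probe side comparable to the build side gets the maximum one.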
This PR has a positive impact on TPC-H queries. Only q3 is changed: in our test (TPC-H 1T, 3 BE), q3 improved from 5.1 sec to 2.5 sec.
This PR also has a positive impact on TPC-DS queries. In a test on TPC-DS sf100 (3 BE), the cold run improved from 163 sec to 156 sec and the hot run improved from 155 sec to 149 sec.
1. Add more checks for match expressions in Nereids:
- match expressions are only supported in filters
- the left and right children of a match expression must both be of string type
- the left child of a match expression must be a SlotRef, and the right child must be a Literal
2. Fix regression cases test_index_match_select and test_index_match_phrase
1. Add a regression test case for analyze to make sure show/drop/analyze stats work as expected
2. Remove useless code that would block the cleanup of expired stats
3. Fix a bug in DropStats: before this PR, dropping the stats of a whole table would cause an NPE when parsing the stmt
Currently, the expression cast('20230631' as date) is incorrectly evaluated to 2023-06-30, while '20230632' evaluates to null. This PR fixes the bug so that every invalid date evaluates to null.
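The intended behavior can be illustrated with a small Python sketch. It is not the Doris cast implementation; the helper name `cast_to_date` is hypothetical, it only handles the 8-digit `YYYYMMDD` form used in the example, and `None` stands in for SQL NULL.

```python
from datetime import date


def cast_to_date(text):
    """Return a date for a valid 'YYYYMMDD' string, otherwise None."""
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        # date() rejects out-of-range days instead of clamping them,
        # so June 31 fails here rather than becoming June 30.
        return date(int(text[:4]), int(text[4:6]), int(text[6:]))
    except ValueError:
        return None


print(cast_to_date("20230630"))
print(cast_to_date("20230631"))
print(cast_to_date("20230632"))
```

The key point is that an out-of-range day is rejected outright, so '20230631' and '20230632' are treated the same way.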
Fix the Hive transactional table regression test test_transactional_hive by adding the hive-docker configurations that were missing from #20679. Hive needs these configurations to be set in order to do compaction.
In the workflow BE UT (Clang), we set up the ldb_toolchain before building the UT. While generating the toolchain tools, the script modifies the RPATH and dynamic linker of the executables, which makes the mtime of the compilers change in every build.
By default, Ccache hashes the compilers' mtime and size to check whether they are consistent with the ones that were used to generate the caches. If the compilers change, the caches cannot be hit. We should change this default behavior by setting the configuration `compiler_check (CCACHE_COMPILERCHECK)` to increase the cache hit ratio in the workflow.
Reference: https://ccache.dev/manual/latest.html#_configuration_options
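To see why an mtime-based check misses the cache here, the Python sketch below compares two fingerprints of the same file. It is a simplified model, not Ccache's actual hashing code: the file name `fake-clang` is made up, and it assumes that two workflow runs produce byte-identical compilers that differ only in mtime. The Ccache manual lists `content` as one alternative value for `compiler_check`; which value the workflow ends up using is not stated here.

```python
import hashlib
import os
import tempfile


def mtime_fingerprint(path):
    """Fingerprint based on size and mtime, like the default check."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def content_fingerprint(path):
    """Fingerprint based on the file's bytes only."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    compiler = os.path.join(tmp, "fake-clang")  # stand-in for a compiler binary
    with open(compiler, "wb") as f:
        f.write(b"pretend compiler binary")
    before = (mtime_fingerprint(compiler), content_fingerprint(compiler))

    # Simulate a later build: same bytes, but a newer modification time.
    st = os.stat(compiler)
    os.utime(compiler, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
    after = (mtime_fingerprint(compiler), content_fingerprint(compiler))

mtime_changed = before[0] != after[0]
content_changed = before[1] != after[1]
print("mtime fingerprint changed:", mtime_changed)
print("content fingerprint changed:", content_changed)
```

Only the mtime-based fingerprint is affected by the timestamp change, which is why a check that ignores mtime can keep hitting the cache across builds.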
Fix pipeline task calling finish_p_dependency more than once.
When a pipeline task goes through eos->PENDING_FINISH->CANCELED, it calls finish_p_dependency twice.
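One common way to make such a call safe to repeat is a guard flag. The Python sketch below only illustrates that general idea; the PR description does not say how the fix is implemented, and the class `PipelineTaskSketch` and its fields are hypothetical.

```python
class PipelineTaskSketch:
    """Toy model of a task whose dependency must be finished exactly once."""

    def __init__(self):
        self.dependency_finish_calls = 0   # how often the real work ran
        self._dependency_finished = False  # guard flag

    def finish_p_dependency(self):
        # Ignore repeated calls so the dependency is only finished once.
        if self._dependency_finished:
            return
        self._dependency_finished = True
        self.dependency_finish_calls += 1


task = PipelineTaskSketch()
task.finish_p_dependency()  # first call along the state transitions
task.finish_p_dependency()  # second call, e.g. after cancellation
print(task.dependency_finish_calls)
```

In multi-threaded code the flag would need to be set atomically; that detail is left out of this single-threaded sketch.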