1. Switch from string-based function matching to Expr-based matching
2. Replace the `nvl` function with `ifnull` when pushing down to MySQL
3. Adapt ClickHouse's `from_unixtime` function for pushdown
4. Allow non-function filters to still be pushed down when `enable_func_pushdown` is set to false, as illustrated below
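A minimal sketch of the new behavior, assuming a JDBC MySQL catalog named `jdbc_mysql` and a hypothetical table `db.t` with columns `k` and `v`:

```sql
-- With function pushdown disabled, the plain predicate on k is still pushed
-- down to MySQL; only the function call nvl(v, 0) is evaluated in Doris.
-- With pushdown enabled, nvl is rewritten to MySQL's ifnull before pushdown.
SET enable_func_pushdown = false;
SELECT * FROM jdbc_mysql.db.t WHERE k > 10 AND nvl(v, 0) = 1;
```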
1. Change the external Hive docker network mode from bridge to host to support external testing of a multi-node Doris cluster
2. Add more Hive test data in various formats
3. Add a test case for Hive
Issue Number: close #24315
The root cause of this issue is that Elasticsearch's `long` type allows floats and strings to be inserted. Doris did not handle these cases during type conversion. The current strategy is to truncate to the integer part (the digits before the decimal point) when a float or string is encountered, as sketched below.
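A hedged illustration, assuming an ES catalog named `es_catalog` and a hypothetical index `my_index` whose `cnt` field is mapped as `long` but holds mixed documents:

```sql
-- Documents such as {"cnt": 3}, {"cnt": 4.7}, {"cnt": "5.2"} are all legal
-- in an ES long field. After this fix, Doris truncates to the integer part
-- instead of failing the conversion, so this query returns 3, 4, and 5:
SELECT cnt FROM es_catalog.default_db.my_index;
```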
This PR makes the following changes to JdbcOracleClient:
#### Bug fixes:
1. Fix a problem where, for a table whose name contains `/` characters, schema synchronization could match a similarly named table and synchronize the wrong columns
2. Fix an NPE during metadata synchronization when the `lower_case_table_names` configuration is enabled
#### Improvements:
1. Change how the Oracle user to Doris database mapping is synchronized: use `metadata.getSchemas` instead of `SELECT DISTINCT OWNER FROM all_tables`
2. When synchronizing metadata, pass `conn.getCatalog()` instead of `null` at the catalog level
Loading data from HDFS in Hive moves the source directory into the table's location directory, which leads to errors like `Can not get first file, please check uri` in the TVF test.
First of all, MySQL does not have a true boolean type; its `boolean` is actually `tinyint(1)`. In the previous logic, we forced `tinyint(1)` to be a boolean by passing `tinyInt1isBit=true`, which causes an error if a `tinyint(1)` value is not 0 or 1. Therefore, we need to map `tinyint(1)` as tinyint instead of boolean. This change does not affect the correctness of `where k = 1` or `where k = true` queries, as the sketch below shows.
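A minimal sketch, assuming a MySQL-side table `db.t` exposed through a JDBC catalog named `jdbc_mysql` (both names hypothetical):

```sql
-- On the MySQL side, tinyint(1) can legally store values other than 0/1:
--   CREATE TABLE t (k TINYINT(1));
--   INSERT INTO t VALUES (0), (1), (2);  -- 2 broke the old boolean mapping
-- With k now mapped to TINYINT, both predicates still return the expected rows:
SELECT * FROM jdbc_mysql.db.t WHERE k = 1;
SELECT * FROM jdbc_mysql.db.t WHERE k = true;
```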
A `Wrong data type for column` error occurs when the column order in a Hive table differs from the column order in the ORC file schema.
The root cause is the logic added to handle the following case:
ORC tables written by Hive 1.x may use synthetic column names such as `_col0`, `_col1`, `_col2`... in the underlying ORC file schema, which must be mapped to the column names in the Hive table.
### Solution
The current fix is to apply that handling only when the Hive version is specified as 1.x.x in the Hive catalog configuration:
```sql
CREATE CATALOG hive PROPERTIES (
    'type' = 'hms',
    -- placeholder metastore address
    'hive.metastore.uris' = 'thrift://127.0.0.1:9083',
    'hive.version' = '1.x.x'
);
```
In JdbcMysqlClient, I've added methods to retrieve auto-increment and default-value columns from MySQL. These columns are then mapped into Doris metadata so they are visible to users.
When turning an InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly.
For the INSERT prepared statement required for writing, the previous behavior was to generate placeholders for all columns. The change is to pass the columns processed by the execution plan into the sink task generation stage, so the statement is generated dynamically; see the sketch below.
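A hedged sketch, assuming a JDBC catalog `jdbc_mysql` and a hypothetical MySQL table `db.t(id INT AUTO_INCREMENT PRIMARY KEY, v INT DEFAULT 10, name VARCHAR(20))`:

```sql
-- Only the specified column reaches the prepared statement, so MySQL fills
-- in the auto-increment id and the default for v itself. The generated
-- statement is now "INSERT INTO t (name) VALUES (?)" rather than one with
-- placeholders for every column.
INSERT INTO jdbc_mysql.db.t (name) VALUES ('a');
```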
Fix the Hive transactional table regression test `test_transactional_hive` by adding the hive-docker configurations missing from #20679. Hive needs these configurations to run compactions.
After supporting insert-only transactional Hive tables (#19518, #19419), this PR supports transactional Hive full-ACID tables.
Hive 3 transactional full-ACID tables are supported.
Hive 2 transactional full-ACID tables need major compactions to be run; see the example below.
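A hedged example, assuming a Hive catalog named `hive` and a hypothetical full-ACID table `db.orders`:

```sql
-- For Hive 2 tables, run a major compaction on the Hive side first, e.g.:
--   ALTER TABLE db.orders COMPACT 'major';
-- Then the table can be read through the Doris Hive catalog:
SELECT count(*) FROM hive.db.orders;
```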
Make the FE and BE containers execute the stop script logic when the exit command is issued, to ensure that metadata is written successfully and to minimize restart exceptions caused by BDBJE.
We can support this by adding a new property to the TVF, like:
`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`
Users can:
1. Specify the compression explicitly by setting `"compress_type" = "xxx"`.
2. Let Doris infer the compression type from the file name suffix (e.g. `file1.gz`).
Currently, we only support reading compressed files in `csv` format, and the BE side already supports this.
All that remains is to parse `"compress_type"` on the FE side and pass it to the BE, as in the sketch below.
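A hedged usage sketch (the URI and `fs.defaultFS` values are placeholders):

```sql
-- Read an lz4-compressed CSV via the hdfs TVF; compress_type could also be
-- inferred from the ".lz4" suffix.
SELECT * FROM hdfs(
    "uri" = "hdfs://nn:8020/path/data.csv.lz4",
    "fs.defaultFS" = "hdfs://nn:8020",
    "format" = "csv",
    "compress_type" = "lz4"
);
```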
Fix some Hive partition issues.
1. Fix a BE crash when using Hive partition fields of `date`, `timestamp`, or `decimal` type.
2. Fix an HDFS URI decoding error when using a `timestamp` partition field, whose special characters are URL-encoded in the partition path (e.g. `:` is encoded as `%3A`); see the example below.
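A hedged example, assuming a Hive catalog named `hive` and a hypothetical table `db.events` partitioned by a `timestamp` column `ts`:

```sql
-- The partition directory for this value looks like
--   .../ts=2023-01-01 00%3A00%3A00/...
-- and is now decoded correctly when listing partition files:
SELECT * FROM hive.db.events WHERE ts = '2023-01-01 00:00:00';
```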