BE crashes when querying a partitioned Hive table in text format
if a partition column is the first item in the select list.
1. FE should use the file slots to set the column mapping index of the csv file.
2. The csv reader in BE should use `get_by_name` on the block to get the right column.
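The difference between positional and name-based lookup can be illustrated with a small toy model. This is only a sketch, not the BE code: the `Block` class, `get_by_position`, the fill helpers, and the column names `dt`, `id`, `name` are hypothetical, and the sketch assumes the problem was a mismatch between file column order and block column order. It shows the misalignment only, not the crash itself.

```python
from typing import Dict, List


class Block:
    """Toy stand-in for a BE block: an ordered set of named columns."""

    def __init__(self, column_names: List[str]) -> None:
        self.column_names = list(column_names)
        self.columns: Dict[str, List[str]] = {name: [] for name in column_names}

    def get_by_name(self, name: str) -> List[str]:
        return self.columns[name]

    def get_by_position(self, position: int) -> List[str]:
        return self.columns[self.column_names[position]]


def fill_by_position(block: Block, file_slots: List[str], row: List[str]) -> None:
    # Assumes file column i is also block column i, which is wrong
    # whenever a partition column comes first in the select list.
    for i in range(len(file_slots)):
        block.get_by_position(i).append(row[i])


def fill_by_name(block: Block, file_slots: List[str], row: List[str]) -> None:
    # File column i goes to the block column named file_slots[i].
    for i, slot_name in enumerate(file_slots):
        block.get_by_name(slot_name).append(row[i])


# Hypothetical query: SELECT dt, id, name ... where dt is the partition column.
select_items = ["dt", "id", "name"]
file_slots = ["id", "name"]  # only these columns exist in the text file
row = ["1", "alice"]         # one parsed line of the file

wrong = Block(select_items)
fill_by_position(wrong, file_slots, row)

right = Block(select_items)
fill_by_name(right, file_slots, row)
```

With positional lookup the file value for `id` lands in the `dt` column; with name-based lookup each value lands in its own column and `dt` is left for the partition value to fill.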
Add a regression test for external Hive ORC tables. This PR generates data for all basic types supported by Hive ORC and creates a Hive external table in the docker environment to exercise them.
Functions to be tested:
1. Ensure that all types are parsed correctly
2. Ensure that the null maps of all types are parsed correctly
3. Ensure that the `SearchArgument` of `OrcReader` works correctly
4. Ensure that queries selecting only partition columns work
In the previous implementation, list partition pruning had to generate `rangeToId`
every time it ran.
But `rangeToId` is actually static data that should be created once and reused everywhere.
So for Hive partitions, I now create `rangeToId` and all other data structures needed for partition pruning
in the partition cache, so that they can be used directly.
In my test, the cost of partition pruning over 10000 partitions dropped from 8s to 0.2s.
Also add "partition" info to the explain string for Hive tables.
```
| 0:VEXTERNAL_FILE_SCAN_NODE |
| predicates: `nation` = '0024c95b' |
| inputSplitNum=1, totalFileSize=4750, scanRanges=1 |
| partition=1/10000 |
| numNodes=1 |
| limit: 10 |
```
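The build-once idea can be sketched as follows. This is a simplified illustration, not the FE implementation: `HivePartitionValues`, `PartitionCache`, `prune_eq`, `load_partitions`, and the table and partition names are all hypothetical, and the sketch assumes a single partition column with equality predicates only.

```python
from typing import Callable, Dict, List, Tuple


class HivePartitionValues:
    """Hypothetical cache entry holding the structures needed for pruning."""

    def __init__(self, id_to_value: Dict[int, str]) -> None:
        self.id_to_value = dict(id_to_value)
        # Each list partition value is stored as a singleton range (v, v).
        self.range_to_id: Dict[Tuple[str, str], List[int]] = {}
        for pid, value in self.id_to_value.items():
            self.range_to_id.setdefault((value, value), []).append(pid)


class PartitionCache:
    """Builds the pruning structures once per table and reuses them."""

    def __init__(self) -> None:
        self._entries: Dict[str, HivePartitionValues] = {}
        self.build_count = 0

    def get(self, table: str, loader: Callable[[], Dict[int, str]]) -> HivePartitionValues:
        if table not in self._entries:
            self._entries[table] = HivePartitionValues(loader())
            self.build_count += 1
        return self._entries[table]


def prune_eq(entry: HivePartitionValues, value: str) -> List[int]:
    # Pruning is a lookup in the prebuilt map, not a rebuild.
    return sorted(entry.range_to_id.get((value, value), []))


def load_partitions() -> Dict[int, str]:
    # Stand-in for fetching partition metadata from the Hive metastore.
    return {i: f"nation_{i}" for i in range(10000)}


cache = PartitionCache()
first = prune_eq(cache.get("hive_tbl", load_partitions), "nation_42")
second = prune_eq(cache.get("hive_tbl", load_partitions), "nation_7")
entry = cache.get("hive_tbl", load_partitions)
summary = f"partition={len(first)}/{len(entry.id_to_value)}"
```

In this sketch the map is built once even though pruning runs twice, and `summary` has the same `selected/total` shape as the `partition=` line in the explain output above.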
Bug fix:
1. Fix a bug where the es scan node could not filter data
2. Fix a bug where querying es with a predicate like `where substring(test2,2) = "ext2";` failed in the planner phase with:
`Unexpected exception: org.apache.doris.analysis.FunctionCallExpr cannot be cast to org.apache.doris.analysis.SlotRef`
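One common way to avoid this kind of cast failure is to check the expression type before treating the left side of a predicate as a column reference, and to keep any other predicate for local evaluation. The sketch below illustrates that pattern only; it does not claim to be the actual fix. The Python classes merely mirror the names in the error message, and `split_conjuncts` and the column `test1` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class SlotRef:
    column: str


@dataclass
class FunctionCallExpr:
    name: str
    args: list


@dataclass
class BinaryPredicate:
    op: str
    left: Union[SlotRef, FunctionCallExpr]
    right: object


def split_conjuncts(
    conjuncts: List[BinaryPredicate],
) -> Tuple[List[BinaryPredicate], List[BinaryPredicate]]:
    """Split predicates into ones that can be pushed down and the rest."""
    pushable: List[BinaryPredicate] = []
    remaining: List[BinaryPredicate] = []
    for pred in conjuncts:
        # Check the type instead of casting unconditionally to SlotRef.
        if isinstance(pred.left, SlotRef):
            pushable.append(pred)
        else:
            remaining.append(pred)
    return pushable, remaining


conjuncts = [
    BinaryPredicate("=", SlotRef("test1"), "text1"),
    BinaryPredicate("=", FunctionCallExpr("substring", [SlotRef("test2"), 2]), "ext2"),
]
pushable, remaining = split_conjuncts(conjuncts)
```

Here the plain column comparison is eligible for pushdown, while the `substring(...)` predicate stays behind to be evaluated after the scan.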
TODO:
1. Querying es version 8 still fails with `Unexpected exception: Index: 0, Size: 0`; this will be fixed later.
Issue Number: close#12574
This PR adds `NewJsonReader`, which implements the `GenericReader` interface to support reading json format files.
TODO:
1. Modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
1. Modify default behavior of `build.sh`
`BUILD_JAVA_UDF` now defaults to ON, so a JVM is needed for both compilation and runtime.
2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
See `docker/thirdparties/docker-compose`.
3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
The default is `false`; if set to true, you first need to start the docker containers for MySQL/PG/Hive.
4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
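The intended semantics of `if not exists` and `if exists` can be sketched with a toy registry. This is an illustration of the usual idempotent behavior, not the FE code: `ResourceRegistry`, its methods, the error messages, and the resource name are hypothetical.

```python
from typing import Dict


class ResourceRegistry:
    """Toy registry showing idempotent create/drop semantics."""

    def __init__(self) -> None:
        self._resources: Dict[str, dict] = {}

    def create(self, name: str, props: dict, if_not_exists: bool = False) -> bool:
        if name in self._resources:
            if if_not_exists:
                return False  # already present: silently do nothing
            raise ValueError(f"resource {name!r} already exists")
        self._resources[name] = dict(props)
        return True

    def drop(self, name: str, if_exists: bool = False) -> bool:
        if name not in self._resources:
            if if_exists:
                return False  # nothing to drop: silently do nothing
            raise KeyError(f"resource {name!r} does not exist")
        del self._resources[name]
        return True


registry = ResourceRegistry()
created_first = registry.create("jdbc_mysql", {"type": "jdbc"})
created_again = registry.create("jdbc_mysql", {"type": "jdbc"}, if_not_exists=True)
dropped_first = registry.drop("jdbc_mysql", if_exists=True)
dropped_again = registry.drop("jdbc_mysql", if_exists=True)
```

Without the flags, the second create and the second drop would raise an error; with them, repeated statements become no-ops, which is what makes setup scripts safe to rerun.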
1. `git clone` may not be executed when docker uses a cached layer, so I changed it to copy the latest code from the local machine.
2. Before building the docker image, please clone the latest source code first.
* Add a param for a specified thirdparty path
1. The thirdparty path can be specified on build.sh: `./build.sh --thirdparty /specified/path/to/thirdparty`
2. If `--thirdparty` is the only param passed to build.sh, it builds both fe and be
3. Add unit test of routine load stmt
4. Remove the source code from the docker image
* Add DORIS_THIRDPARTY env in docker image
1. Set the DORIS_THIRDPARTY env in the docker image. build.sh will use /var/local/thirdparty instead of /source/code/thirdparty
2. Remove the --thirdparty param of build.sh
* Change image workdir to /root