Fix a bug that produced incorrect statistics when analyzing a table incrementally and the number of partitions is less than 512
Fix a bug where the cron expression was lost during analysis
Mark a system job as running once it is registered with AnalysisManager, to avoid submitting the same job again when the previous run takes a long time
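To illustrate the last fix, here is a minimal Python sketch of why marking a job as running at registration time prevents duplicate submissions. `SystemJobRegistry` and its methods are hypothetical names, not the actual `AnalysisManager` API; the point is only that the "already running?" check and the state change happen atomically under one lock.

```python
import threading


class SystemJobRegistry:
    """Hypothetical stand-in for the job bookkeeping in AnalysisManager."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def register_and_mark_running(self, job_key):
        """Atomically register a job and mark it running.

        Returns False when a job with the same key is still running, so the
        scheduler can skip submitting a duplicate.
        """
        with self._lock:
            if job_key in self._running:
                return False
            self._running.add(job_key)
            return True

    def mark_finished(self, job_key):
        with self._lock:
            self._running.discard(job_key)


registry = SystemJobRegistry()
key = ("db1", "tbl1")
first = registry.register_and_mark_running(key)   # submitted
second = registry.register_and_mark_running(key)  # previous run still slow: skipped
registry.mark_finished(key)
third = registry.register_and_mark_running(key)   # next cycle: submitted again
```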
Due to certain bugs, jobs can get stuck in the FE because of the table state. For example, one bug leaves a table in the ROLLUP state after a rollup job is added; later alter jobs then never succeed, because the table state stays ROLLUP instead of returning to NORMAL.
This commit adds a statement for manually setting the state of a specified table, so that a table stuck in such a state can be recovered.
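A minimal Python model of the problem and the fix follows. `TableState`, `Table`, `can_submit_alter_job`, and `set_table_state` are hypothetical names; the sketch does not reproduce the FE code or the new statement's SQL syntax, only the idea that an administrator can reset a stuck state directly.

```python
from enum import Enum


class TableState(Enum):
    NORMAL = "NORMAL"
    ROLLUP = "ROLLUP"


class Table:
    def __init__(self, name):
        self.name = name
        self.state = TableState.NORMAL


def can_submit_alter_job(table):
    # Alter jobs are only accepted while the table is NORMAL.
    return table.state is TableState.NORMAL


def set_table_state(table, state):
    # Hypothetical equivalent of the new statement: an administrator
    # overrides the state directly, without waiting for any job.
    table.state = state


t = Table("tbl1")
t.state = TableState.ROLLUP            # simulate the bug: rollup never resets the state
blocked = not can_submit_alter_job(t)  # True: every later alter job is rejected
set_table_state(t, TableState.NORMAL)
unblocked = can_submit_alter_job(t)    # True after the manual reset
```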
1. Do not split compressed data files
Some data files in Hive are compressed with gzip, deflate, etc.
Files in these formats cannot be split, because a compressed stream must be decoded from its beginning.
2. Support lz4 block codec
For the Hive scan node, use the lz4 block codec instead of the lz4 frame codec.
3. Support the snappy block codec
Used for files compressed with Hadoop's snappy codec.
4. Optimize `count(*)` queries on CSV files
For a query like `select count(*) from tbl`, the reader only needs to split the data into lines; it does not need to split each line into columns.
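The following Python sketch illustrates the compressed-file splitting rule, the Hadoop block codecs, and the `count(*)` optimization. Assumptions: the non-splittable suffix list is illustrative, and the framing follows our understanding of Hadoop's `BlockCompressorStream` layout (unlike the self-describing LZ4 frame format). All function names are hypothetical.

```python
import struct

# Assumption: treat these suffixes as non-splittable. A gzip or deflate
# stream must be decoded from its first byte, so one reader gets the file.
NON_SPLITTABLE_SUFFIXES = (".gz", ".deflate", ".lz4", ".snappy")


def plan_splits(path, file_size, split_size):
    """Return (offset, length) scan ranges for one file."""
    if path.endswith(NON_SPLITTABLE_SUFFIXES):
        return [(0, file_size)]
    return [(off, min(split_size, file_size - off))
            for off in range(0, file_size, split_size)]


def decode_hadoop_block_stream(data, decompress_chunk):
    """Decode Hadoop block-codec framing (assumed layout).

    Each block: 4-byte big-endian uncompressed size, then chunks of
    [4-byte big-endian compressed length][compressed bytes] until that
    many uncompressed bytes have been produced.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        (raw_len,) = struct.unpack_from(">I", data, pos)
        pos += 4
        produced = 0
        while produced < raw_len:
            (chunk_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            chunk = decompress_chunk(data[pos:pos + chunk_len])
            pos += chunk_len
            out += chunk
            produced += len(chunk)
    return bytes(out)


def count_rows(data, line_delimiter=b"\n"):
    """count(*) needs only line boundaries; columns are never parsed."""
    if not data:
        return 0
    rows = data.count(line_delimiter)
    if not data.endswith(line_delimiter):
        rows += 1  # last line has no trailing delimiter
    return rows


# Identity "decompressor" keeps the example dependency-free; a real reader
# would call an lz4 or snappy block decompressor here.
framed = (struct.pack(">II", 4, 4) + b"a,1\n"
          + struct.pack(">II", 7, 7) + b"b,2\nc,3")
decoded = decode_hadoop_block_stream(framed, lambda chunk: chunk)
print(decoded, count_rows(decoded))  # b'a,1\nb,2\nc,3' 3
print(plan_splits("tbl/000000_0.gz", 300, 128))  # [(0, 300)]
print(plan_splits("tbl/000000_0", 300, 128))  # [(0, 128), (128, 128), (256, 44)]
```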
This needs to be cherry-picked to branch-2.0 after PR #22304.
```sql
-- add `TO ROLE role`
CREATE ROW POLICY test_row_policy_1 ON test.table1
AS {RESTRICTIVE|PERMISSIVE} [TO user] [TO ROLE role] USING (id in (1, 2));
-- remove `FOR user` and `ON table`
DROP [ROW] POLICY [IF EXISTS] test_row_policy;
-- add `FOR ROLE role`
SHOW ROW POLICY [FOR user] [FOR ROLE role];
```
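As an illustration of how role targets might combine with user targets when a query is rewritten, here is a Python sketch. It assumes RESTRICTIVE policies are combined with AND and PERMISSIVE policies with OR, similar to PostgreSQL row security; `RowPolicy`, `applies_to`, and `effective_filter` are hypothetical names, not Doris APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowPolicy:
    name: str
    kind: str                   # "RESTRICTIVE" or "PERMISSIVE"
    predicate: str              # the USING (...) expression
    user: Optional[str] = None  # set by TO user
    role: Optional[str] = None  # set by TO ROLE role


def applies_to(policy, user, roles):
    return policy.user == user or (policy.role is not None and policy.role in roles)


def effective_filter(policies, user, roles):
    """Combine matching policies: RESTRICTIVE with AND, PERMISSIVE with OR (assumed)."""
    matched = [p for p in policies if applies_to(p, user, roles)]
    parts = [f"({p.predicate})" for p in matched if p.kind == "RESTRICTIVE"]
    permissive = [f"({p.predicate})" for p in matched if p.kind == "PERMISSIVE"]
    if permissive:
        parts.append("(" + " OR ".join(permissive) + ")")
    return " AND ".join(parts) if parts else None


policies = [
    RowPolicy("test_row_policy_1", "RESTRICTIVE", "id in (1, 2)", role="analyst"),
    RowPolicy("p_bj", "PERMISSIVE", "region = 'bj'", user="alice"),
    RowPolicy("p_sh", "PERMISSIVE", "region = 'sh'", role="analyst"),
]
alice_filter = effective_filter(policies, "alice", {"analyst"})
carol_filter = effective_filter(policies, "carol", set())  # no policy applies
```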
Problem:
The query returns a successful result even when the AK/SK or bucket name is wrong, for example:
```sql
mysql> select * from demo.student
-> into outfile "s3://xxxx/exp_"
-> format as csv
-> properties(
-> "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
-> "s3.region" = "ap-beijing",
-> "s3.access_key"= "xxx",
-> "s3.secret_key" = "yyyy"
-> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| 1 | 3 | 26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```
The reason is that the error returned during the `close()` phase was not checked.
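A Python sketch of the failure mode, assuming (as with buffered object-store writers) that data is only uploaded, and therefore AK/SK or bucket errors only detected, when the writer is closed. `HypotheticalS3Writer` and the two export functions are illustrative names; `close()` returns an error string or `None` to mimic a status return value.

```python
class WriteError(Exception):
    pass


class HypotheticalS3Writer:
    """Buffers rows locally and uploads them only in close()."""

    def __init__(self, bucket_exists, credentials_valid):
        self._upload_ok = bucket_exists and credentials_valid
        self._buffer = []

    def write(self, row):
        self._buffer.append(row)  # no network I/O yet, so this never fails

    def close(self):
        # The upload, and therefore any AK/SK or bucket error, happens here.
        if not self._upload_ok:
            return "AccessDenied or NoSuchBucket"
        return None


def export_ignoring_close(writer, rows):
    for row in rows:
        writer.write(row)
    writer.close()  # bug: the returned error is dropped
    return len(rows)


def export_checking_close(writer, rows):
    for row in rows:
        writer.write(row)
    err = writer.close()
    if err is not None:
        raise WriteError(f"close failed: {err}")
    return len(rows)


rows = ["1,a", "2,b", "3,c"]
silent_rows = export_ignoring_close(HypotheticalS3Writer(False, True), rows)  # looks successful
try:
    export_checking_close(HypotheticalS3Writer(False, True), rows)
    surfaced = False
except WriteError:
    surfaced = True
```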
Sometimes the partitions of a Hive table reside on different storage systems; for example, some are on HDFS and others on object storage (COS, etc.).
This PR mainly makes the following changes:
1. Fix a bug when accessing files via cosn.
2. Add a new field `fs_name` to `TFileRangeDesc`.
This is needed because, when accessing a file, the BE gets an HDFS client from the HDFS client cache, and different files in one query request may have different fs names (for example, some are `hdfs://` and some are `cosn://`). So the fs name must be specified for each file; otherwise, the query may fail with:
`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`
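A Python sketch of why one client per request fails and a per-file `fs_name` fixes it. `FileRange` stands in for `TFileRangeDesc`; `FsClient`, `ClientCache`, and `fs_name_of` are hypothetical, and the hosts and paths are made up.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def fs_name_of(path):
    """'cosn://bucket/a/b' -> 'cosn://bucket'"""
    parts = urlparse(path)
    return f"{parts.scheme}://{parts.netloc}"


@dataclass
class FileRange:      # hypothetical stand-in for TFileRangeDesc
    path: str
    fs_name: str      # the new per-file field


class FsClient:       # hypothetical client bound to one file system
    def __init__(self, fs_name):
        self.fs_name = fs_name

    def open(self, path):
        if fs_name_of(path) != self.fs_name:
            raise ValueError(f"Wrong FS: {path}, expected: {self.fs_name}")
        return path


class ClientCache:
    def __init__(self):
        self.clients = {}

    def get(self, fs_name):
        if fs_name not in self.clients:
            self.clients[fs_name] = FsClient(fs_name)
        return self.clients[fs_name]


paths = ["hdfs://nn:8020/warehouse/t/p=1/f0", "cosn://bucket/warehouse/t/p=2/f0"]
ranges = [FileRange(p, fs_name_of(p)) for p in paths]
cache = ClientCache()

# Before: one client chosen for the whole request fails on the cosn:// file.
shared = cache.get(ranges[0].fs_name)
try:
    for r in ranges:
        shared.open(r.path)
    shared_client_failed = False
except ValueError:
    shared_client_failed = True

# After: each range looks up the client for its own fs_name.
opened = [cache.get(r.fs_name).open(r.path) for r in ranges]
```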
Fix incorrect results when partition fields stored in ORC files contain null values.
### Root Cause
In theory, the underlying files of a partitioned Hive table should not contain the partition fields. However, we found that in some user scenarios the partition fields do exist in the underlying ORC/Parquet files, with null values. As a result, filters on partition fields that are pushed down to the reader are evaluated against these null values and filter rows incorrectly.
### Solution
We handle this case by reading only non-partition fields from the file and taking the partition values from the partition path. The Parquet reader already works this way; this PR applies the same handling to the ORC reader.
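A Python sketch of the approach, with hypothetical names and a simplified `key=value` partition path parser (no unescaping). It contrasts filtering on the null partition values stored in the file with filling partition columns from the path.

```python
def parse_partition_values(partition_path):
    """'dt=2023-01-01/region=bj' -> {'dt': '2023-01-01', 'region': 'bj'}"""
    values = {}
    for segment in partition_path.strip("/").split("/"):
        key, _, value = segment.partition("=")
        values[key] = value
    return values


def read_rows(file_rows, columns, partition_columns, partition_path):
    """Read only non-partition columns from the file; fill partition columns
    from the partition path, ignoring whatever the file stores for them."""
    part_values = parse_partition_values(partition_path)
    rows = []
    for file_row in file_rows:
        row = {}
        for c in columns:
            row[c] = part_values[c] if c in partition_columns else file_row.get(c)
        rows.append(row)
    return rows


# The ORC file unexpectedly contains the partition column `dt`, all null.
file_rows = [{"id": 1, "dt": None}, {"id": 2, "dt": None}]

# Before: filtering on the file's own `dt` values drops every row.
naive = [r for r in file_rows if r["dt"] == "2023-01-01"]

# After: `dt` comes from the partition path, so the filter matches.
rows = read_rows(file_rows, ["id", "dt"], {"dt"}, "dt=2023-01-01")
fixed = [r for r in rows if r["dt"] == "2023-01-01"]
```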