select c_name from customer union select c_name from customer
This SQL uses an agg node to get the distinct rows of c_name,
so there is no need to wait until all data has been inserted into the hash map.
A row can be output as soon as it has been successfully inserted into the hash map.
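A minimal Python sketch of the idea, not the actual agg node code: a set stands in for the hash map, and each row is emitted the first time it is inserted instead of after the whole input has been consumed. The function name and the sample data are hypothetical.

```python
from typing import Hashable, Iterable, Iterator


def streaming_distinct(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Yield each distinct row the first time it is seen."""
    seen = set()  # stands in for the agg node's hash map
    for row in rows:
        if row not in seen:
            seen.add(row)
            # Emit right after a successful insert; no need to wait
            # for the rest of the input.
            yield row


# Hypothetical data: two scans of the same table feeding the union.
customer = ["alice", "bob", "alice"]
result = list(streaming_distinct(customer + customer))
```

Because the function is a generator, a downstream consumer can pull the first distinct row before the upstream input is exhausted.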
Fix two bugs:
1. Unexpected null values in array column. If 65535 consecutive values are not null in a nullable array column, this error is triggered. The reason is that the array parser did not handle boundary conditions.
2. The number of rows of the key field and that of the value field in a map column are not equal. Similarly, the numbers of rows among the fields in a struct column are not the same. This is triggered when the numbers of rows are not equal among the parquet pages of different columns in a row group.
### Issue
When a table has null partition values, the query throws an error:
`Failed to fill partition column: t_int=null`
### Resolution
- Fix the null partition error above in iceberg tables by replacing the null partition value with '\N'.
- Add regression test for hive null partition.
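A rough Python sketch of the replacement described in the resolution, not the real implementation. It assumes the partition is given as a `k=v/k=v` path and that a null value is spelled `null`, as in the error message above. The function name and the `null_names` parameter are hypothetical.

```python
NULL_MARKER = r"\N"  # the '\N' marker named in the fix


def fill_null_partitions(partition_path: str, null_names=("null",)) -> dict:
    """Parse 'k1=v1/k2=v2' and replace null partition values with NULL_MARKER."""
    values = {}
    for part in partition_path.split("/"):
        key, _, value = part.partition("=")
        # Assumption: a null partition value is spelled 'null' in the path.
        values[key] = NULL_MARKER if value in null_names else value
    return values


filled = fill_null_partitions("t_int=null/t_str=a")
```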
configs
1. Because vertical compaction is enabled by default and consumes less
memory, we can enlarge the default values of compaction-related configs.
2. Enlarge the default value of the lock-related shard size.
When a brpc client makes a request to a server and the server does not respond, and may never respond (for example, when BE restarts), the query can be cancelled at once, but the ExchangeSinkBuffer cannot be cancelled until the rpc times out.
So when the query is cancelled, the ExchangeSinkBuffer should be closed at once.
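The intended behavior can be sketched in Python as below. This is an illustration only, not the ExchangeSinkBuffer code: the `SinkBuffer` class and its methods are hypothetical. The waiter blocks on one event that is set either by the rpc response or by query cancellation, so cancellation does not have to wait for the rpc timeout.

```python
import threading


class SinkBuffer:
    """Hypothetical stand-in for a buffer waiting on an rpc response."""

    def __init__(self, rpc_timeout_s: float):
        self._rpc_timeout_s = rpc_timeout_s
        self._wake = threading.Event()  # set on response or on cancel
        self.cancelled = False

    def on_response(self) -> None:
        self._wake.set()

    def cancel(self) -> None:
        # Mark as cancelled first, then wake the waiter immediately.
        self.cancelled = True
        self._wake.set()

    def wait_for_response(self) -> str:
        finished = self._wake.wait(timeout=self._rpc_timeout_s)
        if self.cancelled:
            return "cancelled"
        return "done" if finished else "timeout"


# The server never responds; the query is cancelled after 50 ms.
buf = SinkBuffer(rpc_timeout_s=30.0)
threading.Timer(0.05, buf.cancel).start()
outcome = buf.wait_for_response()  # returns long before the 30 s timeout
```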
configs
The bdbje election timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms
and txn_commit_rpc_timeout_ms to 60s.
Also enlarge bdbje_lock_timeout_second from 1 to 5.
The default maxConnection of s3 client is 25.
It should be increased to improve the query performance.
In my test, a tpch 300 benchmark with data stored on object storage, the total time
dropped from 430s to 330s.
1. If only the partition columns are read, the `JniConnector` produces empty required fields, so `HudiJniScanner` should read at least the "_hoodie_record_key" field to know how many rows are in the current hoodie split. Even if the `JniConnector` doesn't read this field, the call of `releaseTable` in `JniConnector` will reclaim the resource.
2. To prevent BE from failing and exiting, `JniConnector` should call the release methods only after `HudiJniScanner` is initialized. Note that `VectorTable` is created lazily in `JniScanner`, so we don't need to reclaim the resource when `HudiJniScanner` fails to initialize.
## Remaining work
Other jni readers like `paimon` and `maxcompute` may encounter the same problems. Each jni reader needs to handle this abnormal situation on its own; currently this fix only ensures that BE will not exit.
* [Fix](multi-catalog) Do not throw exceptions when a file does not exist for queries on hms catalog.
---------
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
First of all, MySQL does not have a boolean type; its boolean type is actually tinyint(1). In the previous logic, we forced tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if a tinyint(1) value is not 0 or 1. Therefore, we need to match tinyint(1) as tinyint instead of boolean. This change does not affect the correctness of `where k = 1` or `where k = true` queries.
In this PR, we introduce the TOKENIZE function for inverted index. It is used as follows:
```
SELECT TOKENIZE('I love my country', 'english');
```
It has two arguments: the first is the text to be tokenized, and the second is the parser type, which can be **english**, **chinese** or **unicode**.
It can also be used with an existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese') |
+---------------------------------------+
| ["来到", "北京", "清华大学"] |
| ["我爱你", "中国"] |
| ["人民", "得到", "更", "实惠"] |
+---------------------------------------+
```
New aggregation function: map_agg.
This function requires two arguments: a key and a value, which are used to build a map.
select map_agg(column1, column2) from t group by column3;
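To illustrate the semantics of the query above, here is a small Python model of it, not the engine implementation. Each row is a `(column1, column2, column3)` tuple and one map is built per `column3` group. How duplicate keys inside a group are resolved is not stated here; in this sketch the later value overwrites the earlier one, which is an assumption. The helper name and sample rows are hypothetical.

```python
from collections import defaultdict


def map_agg_by_group(rows):
    """Model of: select map_agg(column1, column2) from t group by column3."""
    groups = defaultdict(dict)
    for key, value, group in rows:
        # Assumption: on a duplicate key within a group, the later value wins.
        groups[group][key] = value
    return dict(groups)


# Hypothetical rows of table t: (column1, column2, column3)
t = [
    ("a", 1, "g1"),
    ("b", 2, "g1"),
    ("a", 3, "g2"),
]
maps = map_agg_by_group(t)
```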