Support aggregate functions in SELECT statements without a FROM clause, for example:
```sql
SELECT 1,
'a',
COUNT(),
SUM(1) + 1,
AVG(2) / COUNT(),
MAX(3),
MIN(4),
RANK() OVER() AS w_rank,
DENSE_RANK() OVER() AS w_dense_rank,
ROW_NUMBER() OVER() AS w_row_number,
SUM(5) OVER() AS w_sum,
AVG(6) OVER() AS w_avg,
COUNT() OVER() AS w_count,
MAX(7) OVER() AS w_max,
MIN(8) OVER() AS w_min;
```
1. Add the checks and handling of the sequence column from #21896 to the INSERT statement in both the original planner and the Nereids planner.
2. Disallow dropping the sequence mapping column during schema change.
This PR fixes the alignment process in the publish phase when a conflict occurs between concurrent partial updates: if we encounter a row with the same key and a larger value in the sequence column, it means that another load which introduced a row with the same keys and a larger sequence column value was published successfully after the commit phase of the current load. We should act as follows:
- If the columns we update include the sequence column, we should delete the current row, because the partial update on it has already been overwritten by the previously published row with the larger sequence column value.
- Otherwise, we should combine the values of the missing columns from the previously published row with the values of the updated columns in the current row into a new row, as sketched below.
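A minimal pseudocode sketch of this decision, with illustrative names rather than Doris's actual publish-phase code:

```cpp
// Illustrative sketch only; not Doris's actual publish-phase implementation.
// "Conflict" here means the published data already contains a row with the same
// key and a larger sequence column value than the row of the current partial update.
enum class Action { DeleteCurrentRow, MergeIntoNewRow };

Action resolve_conflict(bool update_includes_sequence_column) {
    if (update_includes_sequence_column) {
        // The current update carries its own (smaller) sequence value, so it has
        // been fully overwritten by the newer row: drop the current row.
        return Action::DeleteCurrentRow;
    }
    // The current update does not touch the sequence column: keep its updated
    // columns and fill the missing columns from the previously published row.
    return Action::MergeIntoNewRow;
}
```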
Use the unified JNI framework to refactor Java UDFs.
The unified JNI framework uses VectorTable as the container for transferring data between C++ and Java, and hides the details of data format conversion.
In addition, the unified framework supports complex and nested types.
Performance for basic types remains unchanged, while string types improve by about 30% and complex types improve by an order of magnitude.
_do_evaluate adds a temporary result column to the original table block, so convert_block_to_null must be called before _do_evaluate to ensure that only the correct columns are converted to nullable.
### Before:
Errors were returned when a TVF queried an empty file or an invalid URI:
1. `get parsed schema failed, empty csv file`
2. `Can not get first file, please check uri.`
### Now:
We now simply return an empty result set when a TVF queries an empty file or an invalid URI:
```sql
mysql> select * from s3(
"uri" = "https://error_uri/exp_1.csv",
"s3.access_key"= "xx",
"s3.secret_key" = "yy",
"format" = "csv") limit 10;
Empty set (1.29 sec)
```
At present, the `load_to_single_tablet` implementation picks a tablet by taking a random number modulo the tablet count, which cannot achieve a truly even distribution. This leads to uneven disk IO and uneven use of cluster resources. To solve this problem, we implement round-robin selection over the tablets of each partition for each load, so that data is spread evenly across the tablets.
When the load query plan is generated, the currently recorded tablet index is passed to the BE.
Add a daemon task in the FE to periodically clean up the `loadTabletRecordMap`. When `getCurrentLoadTabletIndex` is called, the map looks up the partition's `bucket_number` and updates the `load_tablet_index`.
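A minimal sketch of the round-robin idea (illustrative only; the actual bookkeeping lives in the FE's `loadTabletRecordMap` in Java, and the names below are hypothetical):

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative only: round-robin tablet selection per partition.
// partition_id -> next tablet index to hand out for a load.
static std::unordered_map<int64_t, int64_t> load_tablet_index_by_partition;

// Returns the tablet index to use for the current load of this partition and
// advances the counter, wrapping around at the partition's bucket number.
int64_t get_current_load_tablet_index(int64_t partition_id, int64_t bucket_number) {
    int64_t& index = load_tablet_index_by_partition[partition_id];
    int64_t chosen = index % bucket_number;
    index = (index + 1) % bucket_number;
    return chosen;
}
```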
Problem:
When running the pipeline, test_leading fails randomly.
Reason:
A physical distribute plan was generated and chosen as the best plan because no statistics can be obtained for an empty table, so we get unexpected results, since the order of plans in the memo cannot be relied on.
Solution:
Add statistics for the columns used in test_leading and retry repeatedly in the pipeline.
This PR:
1. Fixes a heap-use-after-free caused by calling PODArray push_back() with a reference obtained from back(): when the PODArray has reached capacity, push_back() reallocates and frees the old buffer, leaving that reference dangling (see the sketch after this list).
2. Adds CSV-format test cases for nested types; the CSV file covers two representations, one without quotes and one written like JSON text.
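A minimal sketch of the hazard, using a simplified PODArray-like container to keep the example self-contained (illustrative only, not Doris's actual PODArray):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified PODArray-like container: unlike std::vector, it does not guard
// against being passed a reference to one of its own elements.
struct PodArrayLike {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void push_back(const int& value) {
        if (size == capacity) {
            std::size_t new_capacity = capacity == 0 ? 1 : capacity * 2;
            int* new_data = static_cast<int*>(malloc(new_capacity * sizeof(int)));
            if (data != nullptr) {
                memcpy(new_data, data, size * sizeof(int));
                free(data);          // old buffer is freed here ...
            }
            data = new_data;
            capacity = new_capacity;
        }
        data[size++] = value;        // ... so if `value` pointed into it, this
                                     // read is a heap-use-after-free.
    }

    int& back() { return data[size - 1]; }
    ~PodArrayLike() { free(data); }
};

int main() {
    PodArrayLike arr;
    arr.push_back(1);                // size == capacity == 1, next push reallocates
    // Buggy pattern: back() returns a reference into the current buffer, which
    // push_back() frees during reallocation before reading `value`:
    // arr.push_back(arr.back());    // heap-use-after-free under ASAN
    // Fix: copy the value into a local first so no dangling reference is read.
    int last = arr.back();
    arr.push_back(last);
    return 0;
}
```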
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
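A minimal sketch of this check for a `col < value` predicate, assuming a zonemap that only stores the segment's min and max (illustrative names, not Doris's actual zonemap API):

```cpp
#include <cstdint>

// Illustrative zonemap for one column of one segment: just its min and max values.
struct ZoneMap {
    int64_t min;
    int64_t max;
};

// For a predicate `col < value`, the predicate holds for every row of the segment
// exactly when even the segment's maximum value is still below `value`; the
// predicate can then be skipped for the whole segment.
bool always_true_less_than(const ZoneMap& zone_map, int64_t value) {
    return zone_map.max < value;
}

// Example from the text: max == 100, predicate `col < 101` -> always true.
```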