1. Support for more expression types
2. Support deriving statistics with histograms
3. Use StatisticRange to abstract the logic
4. Use Statistics rather than StatsDeriveResult
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. For now, the quantile column only supports non-nullable values
2. Add regression test cases
3. Set enable_quantile_state_type to true by default (usage sketch below)
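A hedged sketch of how the vectorized quantile state functions are exercised; the table, column, and compression value are illustrative:
```
-- Build a quantile state per row, merge the states, then read p99.
-- to_quantile_state(value, compression), quantile_union, and
-- quantile_percent are the quantile state function family assumed here.
SELECT quantile_percent(quantile_union(to_quantile_state(latency, 2048)), 0.99)
FROM access_log;
```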
---------
Co-authored-by: spaces-x <weixiang06@meituan.com>
1. Introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of the newly generated columns
2. Introduce a new expression `SchemaChangeExpr` for performing schema changes, for extensibility
This pull request fixes the following problem:
Regression pipeline failure cases always hit the error "Failed to create partition. Timeout. Unfinished mark: 10003=57059", so adjust tablet_create_timeout_second to 100.
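A hedged sketch of applying the new value without restarting FE, assuming the config is registered as mutable:
```
ADMIN SET FRONTEND CONFIG ("tablet_create_timeout_second" = "100");
```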
WITH t0 AS(
SELECT report.date1 AS date2 FROM(
SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
) report GROUP BY report.date1
),
t3 AS(
SELECT date_format(date, '%Y%m%d') AS date3
FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;
The DATE_FORMAT(date, '%Y%m%d') was calculated in the GROUP BY node, which is wrong. This expression should be calculated inside the subquery.
Support delta encoding and RLE (bool) to read Glue data:
add delta bit pack decoder,
add delta length byte array decoder,
add delta byte array decoder,
add RLE bool decoder.
We found that some data is written with delta encoding on AWS Glue, so it should be supported.
For the definition of delta encoding, refer to the delta encoding described in the Parquet format specification.
1. Add support for `CAST AS Struct` from Struct type;
2. Fix crash while executing `CAST('{}' AS Struct)`;
3. `CAST('' AS complex_type)` should return NULL instead of an empty object (see the examples below);
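Hedged illustrations of the intended behavior; the struct literal format and field types are illustrative:
```
SELECT CAST('{1, 2}' AS STRUCT<a:INT, b:INT>);  -- cast from string to struct
SELECT CAST('{}' AS STRUCT<a:INT>);             -- no longer crashes
SELECT CAST('' AS STRUCT<a:INT>);               -- returns NULL, not an empty struct
```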
---------
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
The input replicate_offsets should be the same size as ColumnArray's offsets.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
// [0, begin) [begin, begin + count_sz) [begin + count_sz, size())
// do not need to copy copy counts[n] times do not need to copy
```
We should check that the two sizes match before doing the copy.
The current distribution model for Doris is as follows:
OlapTableSink separates the original Block into several sub-blocks per node (BE) according to the tablet distribution and sends the sub-blocks to the storage engines of the backends; the storage engine then splits each sub-block across multiple tablet channels, and each delta writer handles part of the block.
This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks as complete blocks. The advantage of doing so is reduced memory copying and improved write locality, similar to appending the entire block to the memtable.
This optimization can save 10% ~ 20% of the CPU cost of loads into RANDOM-distribution tables when load_to_single_tablet is enabled.
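A hedged sketch of a table eligible for this path; the schema is illustrative, and load_to_single_tablet is assumed to be set on the load job (for example as a stream load header):
```
CREATE TABLE sink_random (
    id BIGINT,
    payload VARCHAR(256)
) DUPLICATE KEY(id)
DISTRIBUTED BY RANDOM BUCKETS 10
PROPERTIES ("replication_num" = "1");
```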
Example:
SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT,
       NULL AS PROCEDURE_SCHEM,
       ROUTINE_NAME AS PROCEDURE_NAME,
       NULL AS NUM_INPUT_PARAMS,
       NULL AS NUM_OUTPUT_PARAMS,
       NULL AS NUM_RESULT_SETS,
       ROUTINE_COMMENT AS REMARKS,
       IF(ROUTINE_TYPE = 'FUNCTION', 2, IF(ROUTINE_TYPE = 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_SCHEMA = DATABASE();
ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter
This is wrong, and some BI tools cannot work correctly because of it.
In the past, only simple predicates (slot = const, AND, LIKE, and OR only with a bitmap index) could be pushed down to the storage layer. The scan process was:
1. Read part of the columns first, and calculate the row ids with the pushed-down simple predicates.
2. Use the row ids to read the remaining columns and pass them to the scanner; the scanner filters with the remaining predicates.
This PR also pushes the remaining predicates (functions, nested predicates, ...) from the scanner down to the storage layer for filtering. The scan process becomes (see the example below):
1. Read part of the columns first, and use the pushed-down simple predicates to calculate the row ids (same as above).
2. Use the row ids to read the columns needed by the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
3. Use the row ids to read the remaining columns and pass them to the scanner.
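A hedged illustration; the table and column names are made up:
```
-- Before: only k = 3 is evaluated in the storage layer;
-- length(comment) > 10 is left to the scanner.
-- After: both reduce the row ids in the storage layer, so fewer rows
-- are materialized for the remaining columns.
SELECT * FROM t WHERE k = 3 AND length(comment) > 10;
```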
Before this PR, adding or dropping an inverted index did not change the table state, so multiple alter jobs might execute at the same time, which could lead to unexpected problems.
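A hedged example of the alter jobs involved; the table, column, and index names are illustrative:
```
-- With this change, these statements set the table state, so two such
-- jobs can no longer run on the same table at the same time.
ALTER TABLE t ADD INDEX idx_comment (comment) USING INVERTED;
ALTER TABLE t DROP INDEX idx_comment;
```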
This pull request fixes the following problem:
When the FE config sets sys_log_verbose_modules = org.apache.doris, FE holds the write lock longer. With this config, a stream load will fail with this message: ([ANALYSIS_ERROR]errCode = 2, detailMessage = get tableList write lock timeout, tableList=(Table [id=86135, name=flink_connector, type=OLAP]))
With the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided.
Solved: AES-192/256 now support calculation without an init vector.
For the other algorithms, an error should be reported when there is no init vector.
Reference (MySQL manual on the initialization vector): The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.
https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt
Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found.
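A hedged sketch of the resulting behavior; the key and IV literals are illustrative:
```
-- Non-ECB modes require an init vector (the 3rd argument).
SET block_encryption_mode = 'AES_256_CBC';
SELECT TO_BASE64(AES_ENCRYPT('text', 'key_str', '1234567890123456'));
-- With this fix, AES_192/AES_256 in ECB mode also work without an IV.
SET block_encryption_mode = 'AES_256_ECB';
SELECT TO_BASE64(AES_ENCRYPT('text', 'key_str'));
```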
* A in (B) -> bitmap_contains(bitmap_union(B), A)
* support bitmap runtime filter on Nereids (see the sketch below)
* GroupPlan -> Plan
* fmt
* fix target cast problem
* remove test code
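A hedged sketch of the query shape this rewrite targets, assuming an IN-subquery over a bitmap column; names are illustrative:
```
-- user_tag.members is a BITMAP column. The predicate becomes
-- bitmap_contains(bitmap_union(members), user_id) and can feed a
-- bitmap runtime filter on the probe side.
SELECT count(*)
FROM user_log
WHERE user_id IN (SELECT members FROM user_tag WHERE tag_id = 10);
```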
Function esquery does not work because there is a problem parsing the first parameter's type.
The first parameter, which is a SlotRef, will be cast to a CastExpr. This causes an error while generating the ES DSL.
Add more types to adapt the esquery function.
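A hedged example of esquery on an Elasticsearch external table; the table and field names are illustrative:
```
-- The first argument is a column (SlotRef); the second is the ES
-- query DSL to push down.
SELECT * FROM es_table
WHERE esquery(k4, '{"match": {"k4": "doris on es"}}');
```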
The result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.
This PR fixes the logic of bitmap_or and bitmap_or_count.
Other count-related functions should also be checked and fixed; they will be handled in another PR.
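Expected results after this fix, per the description above:
```
SELECT bitmap_to_string(bitmap_or(to_bitmap(1), NULL)); -- "1", not NULL
SELECT bitmap_or_count(to_bitmap(1), NULL);             -- 1, not NULL
```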
* Upgrade log4j to 2.x
- bind log4j version to 2.18.0
- use log4j-1.2-api to complete a smooth upgrade
* Upgrade commons-fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.Final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as the hive dependency, and add the missing hive-serde jar separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded; spring-boot-devtools has the same functionality
* Change the spark-related dependency scope to provided, since they are provided at runtime
# Proposed changes
1. The new optimizer supports the combination of subqueries and disjunctions by way of MarkJoin; it behaves the same as the old optimizer. For design details see: https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html (a hedged example follows below).
2. Implicit type conversion is performed when conjuncts are generated after subquery parsing.
3. Convert the unnesting of scalar subqueries in filters from filter + join to join + conjuncts.
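A hedged example of the subquery-plus-disjunction pattern MarkJoin handles; the tables and columns are illustrative:
```
-- The EXISTS subquery cannot become a plain semi join because of the OR;
-- a mark join instead produces a boolean mark column that feeds the
-- disjunction.
SELECT * FROM t1
WHERE t1.a > 10 OR EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.b);
```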