1. fix concurrency bug of s3 fs benchmark tool, to avoid crash on multi thread.
2. Add `prefetch_read` operation to test prefetch reader.
3. add `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid call ec2 metadata when creating s3 client.
4. add `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning log of s3 sdk.
Support estimate table row count based on file size.
With sample size=3000 (total partition number is 87491), load cache time is 45s.
With sample size=100000 (more than total partition number 87505), load cache time is 388s.
Current runtime filter pushing down to cte internal, we construct the runtime filter expr_order with incremental number, which is not correct. For cte internal rf pushing down, the join node will be always different, the expr_order should be fixed as 0 without incrementation, otherwise, it will lead the checking for expr_order and probe_expr_size illegal or wrong query result.
This pr will revert 2827bc1 temporarily, it will break the cte rf pushing down plan pattern.
{{ config(unique_key='id') }}
{{ config(unique_key=['id','name']) }}
Follow the dbt habit, use string for a single column name, and use array for multiple columns
Add a new BE config `kerberos_ticket_lifetime_seconds`, default is 86400.
Better set it same as the value of `ticket_lifetime` in `krb5.conf`
If a HDFS fs handle in cache is live longer than HALF of this time, it will be set as invalid and recreated.
And the kerberos ticket will be renewed.
For non-master FE, must set Backend's status based on the content of edit log.
There is a bug that if we set fe config: `max_backend_heartbeat_failure_tolerance_count` larger that one,
the non-master FE will not set Backend as dead until it receive enough number of heartbeat edit log,
which is wrong.
This will causing the Backend is dead on Master FE, but is alive on non-master FE
Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.
**Advantage** for using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm
3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris
**Disadvantage** for using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M)
2. spark-bundle only provides `RDD` interface and cannot be used directly
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
exclude old netty version
upgrade spring-boot version to 2.7.13
used ojdbc8 replace ojdbc6
upgrade jackson version to 2.15.2
upgrade fabric8 version to 6.7.2
For pipeline, olap table sink close is divided into three stages, try_close() --> pending_finish() --> close()
only after all node channels are done or canceled, pending_finish() will return false, close() will start.
this will avoid block pipeline on close().
In close, check the index channel intolerable failure status after each node channel failure,
if intolerable failure is true, the close will be terminated in advance, and all node channels will be canceled to avoid meaningless blocking.
The syntax for supporting partition updates in the future has not been investigated yet and there are issues with partition syntax. Therefore, the partition syntax has been temporarily removed in the current version and will be added after future research.
after running EliminateNotNull rule, the join conjuncts may be removed from inner join node.
So need run ConvertInnerOrCrossJoin rule to convert inner join with no join conjuncts to cross join node.