Convert a binary predicate of the form
`<CastExpr<SlotRef(ResultType=BIGINT)>> <op> <DecimalLiteral>`
to a binary predicate of the form
`<SlotRef(ResultType=BIGINT)> <new op> <new DecimalLiteral>`,
thereby allowing the predicate to be pushed down and bucket pruning to be performed.
For the query `select * from T where t1 = 2.0`, when the ResultType of column t1 is BIGINT,
binary predicate analysis unifies the types to DECIMALV2, so the predicate is converted to the form
`<CastExpr<SlotRef>> <op> <DecimalLiteral>`. Because the cast wraps column t1, the predicate cannot be pushed
down, resulting in poor performance. We convert it to the equivalent query `select * from T where t1 = 2` so it
can be pushed down, improving performance (a sketch of the rewrite rules follows the SSB results below).
SSB test:
1. query `select * from LINEORDER3 where LO_ORDERKEY < 2.2`
Performance improvement: `1.587s` -> `0.012s`.
The result and performance are equivalent to `select * from LINEORDER3 where LO_ORDERKEY < 3`; the other comparison operators behave the same way.
2. query `select * from LINEORDER3 where LO_ORDERKEY = 2.2`
Performance improvement: `0.012s` -> `0.006s`.
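A minimal Java sketch of the rewrite rules (hypothetical names, not the actual Doris FE code): for `=` the cast can be dropped only when the literal has no fractional part, otherwise the predicate can never hold for a BIGINT column; for inequalities the literal is rounded toward the side that preserves the integer solution set.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative sketch only: hypothetical names, not the actual Doris FE rewrite code.
public class DecimalPredicateRewrite {

    enum Op { EQ, LT, LE, GT, GE }

    // Rewrite "CAST(col AS DECIMAL) <op> lit" into an equivalent predicate
    // on the raw BIGINT column, rendered here as a SQL fragment.
    static String rewrite(String col, Op op, BigDecimal lit) {
        boolean integral = lit.stripTrailingZeros().scale() <= 0;
        switch (op) {
            case EQ:
                // t1 = 2.0 -> t1 = 2; t1 = 2.2 can never hold for a BIGINT column
                return integral ? col + " = " + lit.toBigInteger() : "FALSE";
            case LT:
                // t1 < 2.2 -> t1 < 3 (same rows, no cast)
                return col + " < " + lit.setScale(0, RoundingMode.CEILING).toBigInteger();
            case LE:
                // t1 <= 2.2 -> t1 <= 2
                return col + " <= " + lit.setScale(0, RoundingMode.FLOOR).toBigInteger();
            case GT:
                // t1 > 2.2 -> t1 > 2
                return col + " > " + lit.setScale(0, RoundingMode.FLOOR).toBigInteger();
            case GE:
                // t1 >= 2.2 -> t1 >= 3
                return col + " >= " + lit.setScale(0, RoundingMode.CEILING).toBigInteger();
        }
        throw new IllegalStateException("unreachable");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("LO_ORDERKEY", Op.LT, new BigDecimal("2.2"))); // LO_ORDERKEY < 3
        System.out.println(rewrite("t1", Op.EQ, new BigDecimal("2.0")));          // t1 = 2
    }
}
```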
First, we need to add a parameter that describes whether the data is local or remote.
Then, we need to support some basic functions for operating on remote storage.
Reverts apache/incubator-doris#7351
The reverted commit causes wrong results with aggregate tables.
For example, given an aggregate table `(k1, k2, v1 sum)` with a single non-overlapping rowset,
`select count(k1) from tbl1;` should use `_direct_agg_key_next_row` instead of `_agg_key_next_row`.
Otherwise it returns fewer rows than expected, because `_agg_key_next_row` only aggregates on `k1`.
This PR mainly prohibits operations such as aggregation/sorting/window functions
on lateral views containing subqueries.
For example:
select min(e1) from (select c1 from table group by c1)tmp1 lateral view explode_split(c1, ",") tmp2 as e1
However, the query can be rewritten in another way that produces the same result:
select min(e1) from (select e1 from (select c1 from table group by c1)tmp1 lateral view explode_split(c1, ",") tmp2 as e1) tmp3
The reason is that when the result of an inline view is fed into a lateral view
and the outer query performs aggregation or sorting on non-table-function columns,
the output slot ids of the table function node are empty or contain fewer columns than expected.
The root cause is that when the inner layer contains an inline view,
the outer expression needs to be mapped to the correct tuple through the substitute method
according to the smap, rather than through the virtual tuple.
But the substitute method of SlotRef cannot recurse into its own source exprs.
E.g.
SlotRef: c2 <source expr min(c1)> from the agg tuple
smap: <c1, c3>
before: c2 <source expr min(c1)>
after: c2 <source expr min(c1)>, unchanged (the source expr should have become min(c3))
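A simplified Java sketch of the problem (the class shapes are illustrative stand-ins, not the real Doris FE classes): a slot-level substitute that stops at the SlotRef never touches its source exprs, so the fix is to recurse into them as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: simplified stand-ins for the FE's Expr/SlotRef.
abstract class Expr {
    // Rewrite this expr according to the substitution map (smap).
    abstract Expr substitute(Map<String, Expr> smap);
}

class FunctionCallExpr extends Expr {
    final String fn;   // e.g. "min"
    final Expr child;
    FunctionCallExpr(String fn, Expr child) { this.fn = fn; this.child = child; }
    @Override
    Expr substitute(Map<String, Expr> smap) {
        return new FunctionCallExpr(fn, child.substitute(smap)); // recurses normally
    }
}

class SlotRef extends Expr {
    final String slot;            // e.g. "c2", materialized in the agg tuple
    final List<Expr> sourceExprs; // e.g. [min(c1)]
    SlotRef(String slot, List<Expr> sourceExprs) {
        this.slot = slot;
        this.sourceExprs = sourceExprs;
    }
    @Override
    Expr substitute(Map<String, Expr> smap) {
        Expr mapped = smap.get(slot);
        if (mapped != null) {
            return mapped; // the slot itself is in the smap, e.g. c1 -> c3
        }
        // Without the recursion below, c2 <source expr min(c1)> comes back
        // unchanged even though the smap maps c1 -> c3: plain slot-level
        // substitution never reaches the source exprs.
        List<Expr> rewritten = new ArrayList<>();
        for (Expr src : sourceExprs) {
            rewritten.add(src.substitute(smap));
        }
        return new SlotRef(slot, rewritten);
    }
}
```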
1. Refactor the scheduling logic of broker load. See #7367 for details.
2. Fix a bug where loadedBytes in the SHOW LOAD result was wrong.
3. Cancel the LoadTimeoutChecker thread.
Now PENDING load jobs have no timeout; a load job's timeout starts when its pending load task is scheduled.
4. Fix a bug where the loading task was never submitted to the pool.
The logic of BlockedPolicy was wrong. We must make sure the task is actually submitted to the pool,
or throw RejectedExecutionException (see the sketch after this list).
5. The transaction of a load job now begins in the pending task, instead of when the job is submitted.
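A hedged Java sketch of what item 4 describes (hypothetical, not the exact Doris class): a RejectedExecutionHandler that blocks the submitter until the task really enters the queue, and throws RejectedExecutionException when it cannot, instead of silently dropping the task.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the intended BlockedPolicy behavior.
class BlockedPolicy implements RejectedExecutionHandler {
    private final long timeoutSeconds;

    BlockedPolicy(long timeoutSeconds) {
        this.timeoutSeconds = timeoutSeconds;
    }

    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("executor has been shut down");
        }
        try {
            // offer() with a timeout blocks the caller; on timeout we must throw,
            // otherwise the caller would wrongly assume the task was accepted.
            if (!executor.getQueue().offer(task, timeoutSeconds, TimeUnit.SECONDS)) {
                throw new RejectedExecutionException(
                        "failed to submit task after waiting " + timeoutSeconds + "s");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("interrupted while submitting task", e);
        }
    }
}
```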
1. Delete useless variables.
2. Add the const modifier to read-only functions.
3. Delete empty destructors; the compiler generates them automatically. See the rule of three/five/zero:
https://en.cppreference.com/w/cpp/language/rule_of_three
4. Add the override keyword (instead of repeating the virtual keyword) on subclass virtual functions.
override lets the compiler check the signatures and improves safety; this is also why C++11 introduced it.
The previous DataSourceFunction inherited from RichSourceFunction.
As a result, no matter how high Flink's parallelism was set, the parallelism of DataSourceFunction was always 1.
It now inherits from RichParallelSourceFunction,
and when Flink runs with multiple parallel subtasks, the Doris partitions are distributed across them.
For example, with dorisPartitions.size() = 10 and flink parallelism = 4,
the partitions are split as follows (a sketch of the assignment follows the list):
task0: dorisPartitions[0],[4],[8]
task1: dorisPartitions[1],[5],[9]
task2: dorisPartitions[2],[6]
task3: dorisPartitions[3],[7]
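A minimal Java sketch of that round-robin assignment (`PartitionAssigner.assign` is a hypothetical helper, not the connector's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative round-robin split: subtask i of n takes every n-th partition,
// which reproduces the task0..task3 layout above for 10 partitions at parallelism 4.
public class PartitionAssigner {
    public static <T> List<T> assign(List<T> dorisPartitions, int subtaskIndex, int parallelism) {
        List<T> assigned = new ArrayList<>();
        for (int i = subtaskIndex; i < dorisPartitions.size(); i += parallelism) {
            assigned.add(dorisPartitions.get(i));
        }
        return assigned;
    }
}
```

Inside a RichParallelSourceFunction, subtaskIndex and parallelism would come from getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getNumberOfParallelSubtasks().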
If the memory limit is exceeded while BE generates a materialized view or performs a schema change,
a more detailed log message about the limit and the relevant configuration will now be printed.
1. Fix some memory leaks.
2. Remove redundant and invalid code.
3. Fix some buggy writes to reduce extra memory copies and avoid returning null pointers as strings.
4. Rework the naming to make the structure clearer.
At present, the chapter on debugging FE in the docs has some gaps. My colleagues and I ran into several
pitfalls while setting up the debugging environment, so I want to improve this chapter based on that
experience.
The following explains my changes:
1. mkdir -p ./thirdparty/installed/bin
explain: When I downloaded versions 0.14 and 0.15, there were no files under thirdparty, so it was unclear
whether to create the directory myself or do something else. I eventually created it myself, and I think an
instruction should be added here.
2. Add a handling method for failures when installing thrift@0.13.0.
explain: My colleagues and I could not find the installation package when running the install command, and finally
found a solution on GitHub. I therefore added the workaround for this problem so that other Mac users do not
get stuck at this step.
3. Fix an error in the code-generation description.
explain: Before the build was finished, I tried to debug FE and kept failing; IDEA reported that files could not be found.
After consulting morningman in the WeChat group, I learned that after executing `mvn install -DskipTests` there is no
need to run `mvn generate-sources`. This is inconsistent with the description in the document and
needed to be corrected.