PR #7936 changed some FE log levels to debug, so when an error happens it is not easy to find out
which SQL caused the error.
So I add the stmt id and query id to the error log, so that users can use these identifiers to find the SQL in fe.audit.log.
If a tableRef represents a CTE or a view,
the tableRef will be reset during semantic analysis.
The new tableRef needs to inherit the lateral view property of the original tableRef
to ensure that the lateral view is not accidentally lost during parsing.
Fix ltrim returning incorrect results in some cases.
According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html, for the built-in functions
`int __builtin_clz/__builtin_ctz (unsigned int x)`, the result is undefined if x is 0,
and gcc and clang actually return different values in that case.
So we handle the case of 0 separately (see the sketch below).
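A minimal sketch of the guard, using a hypothetical helper (not the actual Doris ltrim code):
```
#include <cstdint>

// Hypothetical helper for illustration: guard __builtin_ctz against the
// undefined x == 0 case before using it to locate the first non-matching
// byte in a bitmask produced by an ltrim-style scan.
inline int count_trailing_zeros_safe(uint32_t x) {
    // __builtin_ctz(0) is undefined behavior: gcc and clang may return
    // different values, which made the result differ between builds.
    if (x == 0) {
        return 32; // all 32 bits are zero
    }
    return __builtin_ctz(x);
}
```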
1. Reuse the Schema to avoid copying, because cloning a Schema generates a lot of sub Field objects (see the sketch below).
2. Call the interfaces provided by Block to reduce the number of code lines.
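For illustration, a minimal sketch of the first point, using hypothetical simplified types (not the actual Doris classes): sharing one immutable Schema instance instead of cloning it avoids re-creating every sub Field.
```
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified types, for illustration only.
struct Field { std::string name; };
struct Schema { std::vector<Field> fields; };

struct Reader {
    // Before: Schema schema_;           // clone: re-creates every sub Field
    // After:  share one immutable Schema across all readers.
    std::shared_ptr<const Schema> schema;
    explicit Reader(std::shared_ptr<const Schema> s) : schema(std::move(s)) {}
};
```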
This PR mainly changes:
1. Change the definition of PBlock
The new PBlock consists of a set of PColumnMeta and a binary buffer.
The PColumnMeta records the metadata of all columns in the Block,
while the buffer stores the serialized binary data of all columns.
2. Refactor the serialize/deserialize methods of the data types
Rewrite `serialize()/deserialize()` of IDataType, and add a new method
`get_uncompressed_serialized_bytes()` that returns the total length
of the uncompressed serialized data of a column.
3. Rewrite the serialize/deserialize methods of Block
Now, when serializing a Block to a PBlock, we first get the total length
of the uncompressed serialized data of all columns in the Block, then allocate
the buffer and write the serialized data into it (see the sketch after this list).
4. Use brpc attachment to transmit the serialized column data
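A minimal sketch of the two-pass scheme in point 3, using hypothetical simplified signatures (the real IDataType/Block/PBlock interfaces differ):
```
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified column type, for illustration only.
struct Column {
    // Length the column needs in the output buffer, uncompressed.
    size_t get_uncompressed_serialized_bytes() const;
    // Writes the column at `dst` and returns the advanced pointer.
    char* serialize(char* dst) const;
};

// Two-pass serialization: size everything first, allocate once, then write.
std::string serialize_block(const std::vector<Column>& columns) {
    size_t total = 0;
    for (const auto& col : columns) {
        total += col.get_uncompressed_serialized_bytes(); // pass 1: sizing
    }
    std::string buffer(total, '\0'); // single allocation for all columns
    char* dst = buffer.data();
    for (const auto& col : columns) {
        dst = col.serialize(dst); // pass 2: write each column in order
    }
    return buffer;
}
```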
1. If the table or db has been dropped, acquiring the write lock fails and we either just skip or throw an exception.
2. If we recover a table or db, we must ensure that the dropped state is unmarked only after the recover journal has been written.
3. db.dropTable corresponds to db.createTable. I don't move the table.markDropped method into db.dropTable,
because all meta added to a db or catalog must come after the recover journal is written, so we must invoke the markDropped
and unmarkDropped methods outside the dropTable and createTable methods.
CMAKE_BUILD_DIR is set while building BE. "build.sh --clean" just
cleans and exits, but clean_be does not work without
CMAKE_BUILD_DIR set. This patch sets CMAKE_BUILD_DIR in clean_be
so that "build.sh --clean" works correctly.
Support implementing UDFs through the GRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can use any language they are familiar with to implement a UDF.
2. The UDF is decoupled from Doris: a UDF cannot cause a Doris coredump, the UDF computing resources are separated from Doris, and the Doris services are not affected.
However, an RPC UDF has a fixed per-call overhead, so its performance is much slower than a C++ UDF, especially when the amount of data is large.
Create a function like:
```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int",
"OBJECT_FILE"="127.0.0.1:9999",
"TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods, as in the sketch below.
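For illustration only, here is a minimal sketch of such a service in C++ with gRPC; the proto below is hypothetical, and the real Doris function-service proto and message types may differ:
```
// Hypothetical proto, for illustration (not the actual Doris definition):
//   service FunctionService {
//     rpc check_fn (CheckFnRequest) returns (CheckFnResponse);
//     rpc fn_call  (FnCallRequest)  returns (FnCallResponse);
//   }
#include <grpcpp/grpcpp.h>
#include "function_service.grpc.pb.h" // generated from the proto above

class FunctionServiceImpl final : public FunctionService::Service {
    // check_fn: verify the requested symbol exists with matching types.
    grpc::Status check_fn(grpc::ServerContext* ctx, const CheckFnRequest* req,
                          CheckFnResponse* resp) override {
        resp->set_ok(req->symbol() == "add_int");
        return grpc::Status::OK;
    }
    // fn_call: evaluate the function over a batch of argument values.
    grpc::Status fn_call(grpc::ServerContext* ctx, const FnCallRequest* req,
                         FnCallResponse* resp) override {
        for (int i = 0; i < req->lhs_size(); ++i) {
            resp->add_result(req->lhs(i) + req->rhs(i)); // add_int semantics
        }
        return grpc::Status::OK;
    }
};

int main() {
    FunctionServiceImpl service;
    grpc::ServerBuilder builder;
    // Matches the "OBJECT_FILE"="127.0.0.1:9999" address in the example above.
    builder.AddListeningPort("0.0.0.0:9999", grpc::InsecureServerCredentials());
    builder.RegisterService(&service);
    builder.BuildAndStart()->Wait();
}
```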
Note:
THIS IS AN EXPERIMENTAL FEATURE; THE INTERFACE AND DATA STRUCTURES MAY CHANGE IN THE FUTURE !!!
This PR mainly changes:
1. Fix a bug when `transfer_data_by_brpc_attachment` is enabled
In `data_stream_sender`, we send one serialized PRowBatch to multiple Channels.
If `transfer_data_by_brpc_attachment` is enabled, we mistakenly clear the data in the PRowBatch
after sending it to the first Channel.
As a result, the following Channels cannot receive the correct data, causing an error.
So I use a separate buffer instead of `tuple_data` in PRowBatch to store the serialized data
and reuse it across the channels.
2. Fix a bug where the offsets in a serialized row batch may overflow
Use int64 offsets instead of int32. For compatibility, add a new field `new_tuple_offsets` to PRowBatch (see the sketch below).
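A minimal sketch of the compatibility path on the receiving side, using a hypothetical simplified stand-in for the protobuf-generated PRowBatch:
```
#include <cstdint>
#include <vector>

// Hypothetical simplified stand-in for the protobuf-generated PRowBatch.
struct PRowBatchView {
    std::vector<int32_t> tuple_offsets;     // legacy 32-bit offsets
    std::vector<int64_t> new_tuple_offsets; // added to avoid int32 overflow
};

// Prefer the 64-bit offsets when the sender filled them in; fall back to
// the legacy 32-bit field so batches from old senders keep working.
std::vector<int64_t> read_offsets(const PRowBatchView& batch) {
    if (!batch.new_tuple_offsets.empty()) {
        return batch.new_tuple_offsets;
    }
    return {batch.tuple_offsets.begin(), batch.tuple_offsets.end()};
}
```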
This PR mainly includes the following two changes:
1. Shorten the FE single-measurement time
In Doris's FE unit tests, starting a Doris cluster is a time-consuming operation.
In this PR, the unit tests of some small features are merged into QueryPlanTest
so that they share one cluster,
avoiding the problem of the overall FE unit test time being too long.
2. Refine the logic of PR #7851
Although the feature is implemented correctly in PR #7851,
the logic is not concise enough.
This PR mainly simplifies the redundant code in terms of engineering implementation.
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
The processing logic is (see the sketch below):
If the number of rows in the right table < runtime_filter_max_in_num, the IN predicate takes effect.
If the number of rows in the right table >= runtime_filter_max_in_num, the bloom filter takes effect.
Change 2: The default runtime filter is changed to IN_OR_BLOOM_FILTER
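A minimal sketch of the selection rule, using hypothetical names (the real runtime-filter code is more involved):
```
#include <cstddef>

// Hypothetical enum, for illustration only.
enum class RuntimeFilterType { IN_FILTER, BLOOM_FILTER };

// IN_OR_BLOOM_FILTER adapts on the build (right) side row count:
// small build sides get the cheap, exact IN predicate; large ones
// fall back to a bloom filter whose cost does not grow with the set size.
RuntimeFilterType choose_filter(size_t right_table_rows,
                                size_t runtime_filter_max_in_num) {
    if (right_table_rows < runtime_filter_max_in_num) {
        return RuntimeFilterType::IN_FILTER;
    }
    return RuntimeFilterType::BLOOM_FILTER;
}
```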
* [improvement](show) Support that users can use the show data skew statement instead of the admin one
This PR mainly does two things:
1. Support that users can use the show data skew statement instead of the admin one.
2. Fix the FE UT failures caused by PR [improvement](rewrite) Make RewriteDateLiteralRule to be compatible with mysql #7876 and PR [feature-wip](iceberg) Step1: Support create Iceberg external table #7391.
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
The tuple ids of the empty set node must be exactly the same as the tuple ids of the original root node.
In the issue, we found that once the tree where the root node is located has a window function,
the tuple ids of the empty set node cannot be calculated correctly.
This PR mostly fixes that problem.
In order to calculate the correct tuple ids,
instead of taking the tuple ids from the SelectStmt.getMaterializedTupleIds() function as in the past,
we now directly use the tuple ids of the original root node.
Although we tried to fix #7929 by modifying the SelectStmt.getMaterializedTupleIds() function,
that method cannot get the tuple of the last correct window function.
So we use another way to construct the tuple ids of the empty set node.