When executing `show databases/tables/table status where xxx`, the statement is rewritten into a SelectStmt that selects the result from information_schema. This rewrite needs catalog info when scanning the schema table; otherwise it may return database or table info from multiple catalogs.
For example:
mysql> show databases where schema_name='test';
+----------+
| Database |
+----------+
| test     |
| test     |
+----------+
MySQL [internal.test]> show tables from test where table_name='test_dc';
+----------------+
| Tables_in_test |
+----------------+
| test_dc        |
| test_dc        |
+----------------+
Fix three bugs:
1. DataTypeFactory::create_data_type is missing the conversion for the binary type, so OrcReader will fail
2. ScalarType#createType is missing the conversion for the binary type, so ExternalFileTableValuedFunction will fail
3. fmt::format can't generate the right format string and will fail (see the sketch after this list)
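For bug 3, a minimal sketch of correct fmt usage, assuming the failure came from a malformed placeholder pattern (the actual call site is not shown here):

```cpp
#include <fmt/format.h>
#include <cstdint>
#include <iostream>

int main() {
    int64_t tablet_id = 10001;
    // fmt expects "{}" placeholders matched one-to-one with the arguments;
    // a malformed pattern (e.g. an unmatched "{") raises fmt::format_error
    // instead of producing the intended message.
    std::string msg = fmt::format("close tablet {} failed, reason: {}",
                                  tablet_id, "not inited");
    std::cout << msg << '\n';  // close tablet 10001 failed, reason: not inited
}
```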
The result of a load should be failure when all tablet delta writers fail to close on a single node, but the result returned to the client is success.
The reason is that the committed tablets and the error tablets are both empty, so publish succeeds.
We should add the tablet to the error tablets when its delta writer fails to close; the transaction will then fail.
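A toy model of the commit decision, with hypothetical names rather than the actual Doris structures, showing why recording the failed close matters:

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical model: each delta writer on the node either closes or fails.
struct CloseResult {
    int64_t tablet_id;
    bool ok;
};

int main() {
    // All delta writers on this node fail to close.
    std::vector<CloseResult> results = {{1001, false}, {1002, false}};
    std::vector<int64_t> committed_tablets;
    std::vector<int64_t> error_tablets;
    for (const auto& r : results) {
        if (r.ok) {
            committed_tablets.push_back(r.tablet_id);
        } else {
            error_tablets.push_back(r.tablet_id);  // the fix: record the failure
        }
    }
    // Before the fix, both lists stayed empty, so this check reported
    // success and publish went through.
    bool txn_ok = error_tablets.empty();
    std::cout << (txn_ok ? "commit" : "abort") << '\n';  // prints "abort"
}
```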
Currently, a newly created segment can be chosen as a compaction candidate, which is prone to bugs and segment file open failures. We should skip the last (possibly still active) segment while doing segcompaction.
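A minimal sketch of the candidate selection, assuming segments are numbered in creation order (the function and names are illustrative, not the Doris segcompaction API):

```cpp
#include <cstdint>
#include <vector>

// Choose segcompaction candidates only among sealed segments: the segment
// with the highest id may still be receiving writes, so it is skipped.
std::vector<uint32_t> pick_candidates(uint32_t num_segments) {
    std::vector<uint32_t> candidates;
    for (uint32_t id = 0; id + 1 < num_segments; ++id) {
        candidates.push_back(id);  // every segment except the last one
    }
    return candidates;
}

int main() {
    auto candidates = pick_candidates(5);  // {0, 1, 2, 3}; segment 4 skipped
    return candidates.size() == 4 ? 0 : 1;
}
```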
In the following case, data inconsistency can happen between multiple replicas:
1. The delta writer only writes a few rows of data, which means the write() method is only called once.
2. The writer fails in init() (which is called the first time write() is called), and the current tablet is recorded in _broken_tablets.
3. The delta writer is closed. In the close() method, the delta writer finds it is not inited and treats this case as an empty load, so it tries to init again, which creates an empty rowset (see the sketch after these steps).
4. The tablet sink receives the error report in the rpc response and marks the replica as failed, but since a quorum of replicas succeeded, the following load commit operation succeeds.
5. FE sends the publish version task to each BE, and the one with the empty rowset publishes the version successfully.
6. We end up with 2 replicas with data and 1 empty replica.
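A hedged sketch of the guard in close(), with simplified stand-in types (the real DeltaWriter differs):

```cpp
#include <string>
#include <utility>

// Simplified stand-ins for the real status and writer types.
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {}; }
    static Status Error(std::string m) { return {false, std::move(m)}; }
};

struct DeltaWriter {
    bool inited = false;
    bool broken = false;  // set when init() fails inside the first write()

    Status close() {
        if (broken) {
            // The fix: surface the earlier init failure instead of
            // re-initializing and committing an empty rowset that would
            // later publish successfully on this replica.
            return Status::Error("tablet writer is broken");
        }
        if (!inited) {
            // A genuinely empty load: init here and flush an empty rowset.
        }
        return Status::OK();
    }
};
```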
Set the FE config `enable_new_load_scan_node` to true by default, so that all load tasks (broker load, stream load, routine load, insert into) will use FileScanNode instead of BrokerScanNode to read data.
1. Support loading parquet files in stream load with the new load scan node.
2. Fix a bug where the new parquet reader cannot read a column without a logical or converted type.
3. Change the jsonb parser function to "jsonb_parse_error_to_null", so that if the input string is not a valid json string, it returns null for the jsonb column in the load task.
The FE file path cache for external tables may be out of date. In this case, BE may fail to find a file listed in the FE cache that no longer exists.
This pr handles this case: instead of throwing an error message to the user, we return an empty result set.
In some cases (requiring changes to be.conf), a memtable may flush many segments and then calculate the delete bitmap with the new data. But currently only the one segment with the max segment id is loaded, so the delete bitmap is not calculated over the data of all segments of the memtable, and a merge-on-write table can return many rows with the same key (see the sketch below).
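A minimal sketch of the corrected loop, assuming the segment ids produced by one memtable flush are contiguous (names are illustrative):

```cpp
#include <cstdint>

// Illustrative stub: load one segment and mark rows whose keys are
// overwritten by the new data.
void calc_delete_bitmap_for_segment(uint32_t /*seg_id*/) {}

// One memtable flush may produce several segments, and every one of them
// must contribute to the delete bitmap, not only the segment with the
// max segment id.
void calc_delete_bitmap_for_flush(uint32_t first_seg_id, uint32_t num_segments) {
    for (uint32_t seg_id = first_seg_id;
         seg_id < first_seg_id + num_segments; ++seg_id) {
        calc_delete_bitmap_for_segment(seg_id);
    }
}
```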
1. An Iceberg v2 delete file may cover many data paths, so the path of a file split is required as a predicate to filter the rows of the delete file (see the sketch after this list):
- create a delete file structure to save the predicate parameters
- create a predicate for the file path
2. Add some logs to print the row range.
3. Fix a bug when creating the file metadata.
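A sketch of the file path predicate for item 1, based on the Iceberg v2 position-delete layout where each delete row carries a (file_path, pos) pair; the structures and function are illustrative:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One row of an Iceberg v2 position-delete file: which data file, which row.
struct DeleteRow {
    std::string file_path;
    int64_t pos;
};

// Keep only the delete rows targeting the current split's data file;
// without this predicate, positions meant for other data files would be
// applied to this split as well.
std::vector<int64_t> deleted_positions(const std::vector<DeleteRow>& rows,
                                       const std::string& split_path) {
    std::vector<int64_t> positions;
    for (const auto& row : rows) {
        if (row.file_path == split_path) {
            positions.push_back(row.pos);
        }
    }
    return positions;
}
```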
Currently, there are two sets of file readers in Doris; this pr rewrites the old broker reader with the new file reader.
TODO:
1. Rewrite the stream load pipe and the kafka consumer pipe.
Fix the error when parsing the page index: `Couldn't deserialize thrift msg`.
Use two buffers to store the column index and offset index messages, to avoid parsing both from a single buffer (see the sketch below).
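A sketch of the intent, assuming the two thrift messages are located by their own (offset, length) pairs in the file, as in the parquet page index layout (the reader call is an illustrative stub):

```cpp
#include <cstdint>
#include <vector>

// Illustrative stub: read `length` bytes starting at `offset` in the file.
std::vector<uint8_t> read_at(int64_t /*offset*/, int64_t length) {
    return std::vector<uint8_t>(static_cast<size_t>(length));
}

// Each thrift message gets its own exactly-sized buffer; deserializing both
// messages out of one shared buffer let the thrift parser run into the
// bytes of the other message and fail with "Couldn't deserialize thrift msg".
void parse_page_index(int64_t column_index_offset, int64_t column_index_length,
                      int64_t offset_index_offset, int64_t offset_index_length) {
    std::vector<uint8_t> column_index_buf =
            read_at(column_index_offset, column_index_length);
    std::vector<uint8_t> offset_index_buf =
            read_at(offset_index_offset, offset_index_length);
    // deserialize the ColumnIndex from column_index_buf and the
    // OffsetIndex from offset_index_buf, each from its own buffer
}
```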
The external table file path cache may be out of date, which causes the orc reader to visit non-existent files.
In this case, the orc file reader is nullptr.
This pr checks the reader before using it, to avoid a core dump from dereferencing nullptr (see the sketch below).
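A minimal sketch of the guard, with illustrative names and a simplified Status:

```cpp
#include <memory>

// Simplified stand-ins for the real reader and status types.
struct OrcFileReader {};
struct Status {
    bool ok = true;
    static Status OK() { return {}; }
};

// If the cached path pointed at a file that no longer exists, the reader was
// never created; report EOF so the scan returns an empty result set instead
// of dereferencing a null pointer.
Status get_next_block(const std::unique_ptr<OrcFileReader>& reader, bool* eof) {
    if (reader == nullptr) {
        *eof = true;
        return Status::OK();
    }
    // ... normal read path using *reader ...
    return Status::OK();
}
```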