Fix json reader DCHECK failure caused by missing TYPE_STRING.
Fix a bug where the TVF throws an NPE if no file is found.
Predicate conjuncts cannot be pushed down to the Parquet reader for a load task,
because the predicates should be applied to the columns of the destination table, not to the columns of the source file.
Add a temporary broker load property "use_new_load_scan_node" to make the regression tests happy,
so that we can enable the new load scan node for a specific job without setting a global FE config.
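As a sketch, the property can be set per job in the load statement's PROPERTIES clause; the label, path, table, broker name, and credentials below are placeholders, not from this PR:
```
LOAD LABEL example_db.label_use_new_scan
(
    DATA INFILE("hdfs://host:port/path/to/file.parquet")
    INTO TABLE dst_tbl
    FORMAT AS "parquet"
)
WITH BROKER "my_broker"
(
    "username" = "user",
    "password" = "passwd"
)
PROPERTIES
(
    "use_new_load_scan_node" = "true"
);
```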
```
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num,
                   SUM(a.r[2]) as active_user_num_1day,
                   SUM(a.r[3]) as active_user_num_3day,
                   SUM(a.r[4]) as active_user_num_7day
            FROM (
                SELECT user_id,
                       retention(day = '2022-11-01', day = '2022-11-02',
                                 day = '2022-11-04', day = '2022-11-07') as r
                FROM login_event
                WHERE (day >= '2022-11-01') AND (day <= '2022-11-21')
                GROUP BY user_id
            ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
```
If there are too many deleted tablets in RocksDB, many OLAP_ERR_TABLE_ALREADY_DELETED_ERROR errors occur during startup, and each one tries to get an error stack. This costs a lot of time and makes the start process take very long.
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Change jsonb_extract_string behavior: convert the value to a string instead of returning NULL if the type at the JSON path is not string (see the example after this list).
2. Move the JSONB tutorial doc into the JSONB data type doc.
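A minimal sketch of the new behavior; the JSON value and path are illustrative, and depending on the version the string literal may need an explicit cast to JSONB:
```
-- $.a points to an int; previously this returned NULL, now it returns the string '1'
SELECT jsonb_extract_string(CAST('{"a": 1}' AS JSONB), '$.a');
```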
BE will crash when querying a partitioned Hive table in text format
with a partition column placed first in the select items.
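A hypothetical query shape that triggers the crash (the catalog, database, table, and column names below are made up for illustration):
```
-- part_col is a partition column of a text-format Hive table
SELECT part_col, data_col FROM hive_catalog.hive_db.text_tbl;
```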
1. FE should use file slots to set the column mapping index of the CSV file.
2. BE should use the block's `get_by_name` to get the right column in the CSV reader.
1. Support `in` bitmap syntax, like `where k1 in (select bitmap_column from tbl)` (see the sketch after this list).
2. Support bitmap runtime filter: generate a bitmap filter using the right table's bitmap and push it down to the left table's storage layer for filtering.
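A minimal sketch of the feature; the table and column names are illustrative:
```
-- the bitmap built from dim_tbl.user_bitmap is pushed down to filter fact_tbl.user_id
SELECT count(*) FROM fact_tbl WHERE user_id IN (SELECT user_bitmap FROM dim_tbl);
```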
Problem:
We frequently got the following error while running SELECT xxx INTO OUTFILE:
ERROR 1064 (HY000): RpcException, msg: Fail to write to broker, broker:TNetworkAddress(hostname=a.b.c.d, port=8111) failed:write() send(): Broken pipe
Reason:
we cache the broker thrift client in BE;
the thrift client's isOpen check only returns a cached flag, and does not check whether the real socket is open or closed;
after we get a client from the cache, the socket may already be closed, and then pwrite will fail.
How to fix:
Other interfaces such as open and close will reopen the connection and retry, but pwrite does not.
Since pwrite carries a write offset, and the broker (server) side also checks the write offset, it is safe to retry pwrite.
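For context, a sketch of the kind of statement that hit the error; the table, output path, and broker name are placeholders:
```
SELECT * FROM src_tbl
INTO OUTFILE "hdfs://host:port/user/doris/result_"
FORMAT AS CSV
PROPERTIES ("broker.name" = "my_broker");
```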
Previously, the result was:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
| NULL                                 |
+--------------------------------------+
1 row in set (0.02 sec)
```
but after this commit, the result becomes:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
| 2                                    |
+--------------------------------------+
1 row in set (0.02 sec)
```
The run length of the null map is saved as `uint16_t`. Previously, the run length of the null map was
limited by `batch_size` in the `ParquetReader`, by setting `batch_size = std::min(batch_size, (size_t)USHRT_MAX)`.
This works well when the batch size is less than `USHRT_MAX`.
However, [Lazy read](https://github.com/apache/doris/pull/13917) will merge empty batches until reading
a non-empty batch or reaching the EOF of a row group, so `batch_size` may be greater than `USHRT_MAX`
for non-predicate columns.
In addition, even if `batch_size` does not exceed `USHRT_MAX`, adjacent batches may still make
the run length exceed `USHRT_MAX` in `ColumnSelectVector::get_next_run`.
This PR makes sharing the hash table for broadcast join more robust:
1. Add a session variable to enable/disable this feature (see the example after this list).
2. Do not block the hash join node's close function.
3. Use a shared pointer to share the hash table and runtime filter among broadcast join nodes.
4. A hash join node that doesn't need to build the hash table closes the right child without reading any data (the child will then close the corresponding sender).
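For example, toggling the feature per session might look like the following; the variable name is an assumption, since this PR text does not name it:
```
-- assumed variable name, not confirmed by this PR text
SET enable_share_hash_table_for_broadcast_join = true;
```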
An array column should not have encoding info because it uses its sub-columns' encoding info.
This encoding info is never used and easily confuses people, so we should remove it.
When a unique key merge-on-write (MOW) table has a sequence column, query results with predicates may be wrong. There are two problems (an example table shape follows the list):
1. The sequence column needs to be removed from the primary key index when comparing keys.
2. The sequence column needs to be removed from the min/max key.
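A sketch of an affected table shape, assuming the standard MOW and sequence-column table properties (names and types are illustrative):
```
-- unique key MOW table with a sequence column
CREATE TABLE t (k1 INT, v1 INT)
UNIQUE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true",
    "function_column.sequence_type" = "int"
);
```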
Currently, BE prints `fail to get master client from cache. host=xxxxx, port=9228, code=THRIFT_RPC_ERROR`, but we could not tell which step generated this error. So I refactored the error status in BE and added an error stack for RPC_ERROR.
```
W1122 10:19:21.130796 30405 utils.cpp:89] fail to get master client from cache. host=xxxx, port=9228, code=RPC error(error -1): Couldn't open transport for xxxx:9228 (open() timed out)
@ 0x559af8f774ea doris::Status::ConstructErrorStatus()
@ 0x559af9aacbee _ZN5doris16ThriftClientImpl4openEv.cold
@ 0x559af97f563a doris::ClientCacheHelper::_create_client()
@ 0x559af97f78cd doris::ClientCacheHelper::get_client()
@ 0x559af934f38b doris::MasterServerClient::report()
@ 0x559af932e7a7 doris::TaskWorkerPool::_handle_report()
@ 0x559af932f07c doris::TaskWorkerPool::_report_task_worker_thread_callback()
@ 0x559af9b223c5 doris::ThreadPool::dispatch_thread()
@ 0x559af9b187af doris::Thread::supervise_thread()
@ 0x7f661bd8bea5 start_thread
@ 0x7f661c09eb0d __clone
```
Co-authored-by: yiguolei <yiguolei@gmail.com>