Support running transactional insert operation with new scan framework. eg:
admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert
Do not support non-literal value in insert stmt
Fix some issue about array type:
Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
* [fix](streamload) report exactly error message when be does not receive heartbeat from fe
Http service is started before hearbeat from fe, so if a streamload comes before heartbeat
from fe, then be report an error like below because master address is not set.
"Couldn't open transport for :0 (Could not resolve host for client socket.)".
We should consider memory which are being flushed from memtable to disk when trying to reduce memory by flushing memtable. Otherwise, we might not release memory space as expected. (e.g. lots of large memtable is in flush, the reduce_mem_usage method picks some small memtables to flush, it can't release enough memory and also can generate lots of small segments, which can cause -238 error)
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because it's logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests
# Proposed changes
This PR fixed lots of issues when building from source on macOS with Apple M1 chip.
## ATTENTION
The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
1. Refactor the file reader creation in FileFactory, for simplicity.
Previously, FileFactory had too many `create_file_reader` interfaces.
Now unified into two categories: the interface used by the previous BrokerScanNode,
and the interface used by the new FileScanNode.
And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.
2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode
3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.
4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.
Previously, bthread_getspecific was called every time bthread local was used. In the test at #10823, it was found that frequent calls to bthread_getspecific had performance problems.
So a cache is implemented on pthread local based on the btls key, but the btls key cannot correctly sense bthread switching.
So, based on bthread_self to get the bthread id to implement the cache.
disable page cache by default
disable chunk allocator by default
not use chunk allocator for vectorized allocator by default
add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
This config is never used online and there exist bugs if enable this config. So that I remove this config and related tests.
Co-authored-by: yiguolei <yiguolei@gmail.com>
Now, every preare put a runtime filter controller, so it takes the
mutex lock on the controller map. Init of bloom filter takes some
time in allocate and memset. If we run p1 tests with -parallel=20
-suiteParallel=20 -actionParallel=20, then we get error message like
'send fragment timeout 5s'.
The patch fixes the problem in the following 2 ways:
1. Replace one mutex block with 128s.
2. If a plan fragment does not have a runtime filter, it does not need to take
the locks.
If a scan node has predicate, we can not limit the concurrency of scanner.
Because we don't know how much data need to be scan.
If we limit the concurrency, this will cause query to be very slow.
For exmple:
select * from tbl limit 1, the concurrency will be 1;
select * from tbl where k1=1 limit 1, the concurrency will not limit.