There are several configs related to tcmalloc, users do know how to config them. Actually users just want two modes, performance or compact, in performance mode, users want doris run query and load quickly while in compact mode, users want doris run with less memory usage.
If we want to config tcmalloc individually, we can use env variables which are supported by tcmalloc.
When upgrade 1.2 version from 1.1, FE version will don't match BE version for a period of time. After upgrade BE and doing schema change, BE will use a field desc_tbl that add in 1.2 version FE. BE will coredump because the field desc_tbl is nullptr. So it need to refuse the request.
- this pr is used to fix the be core dump when import array.
- before the change, we import array by rapidjson string will core dump under the non-vectorized scenario.
- after the change, we can import array by rapidjson string successfully.
1.
In the previous implementation, the max thread num of olap scanner was set relatively small, such as 3.
which would slow down some of queries.
In this PR, I changed the max thread num to a quarter of the scaner thread pool(default is 12),
which is less than the old scan node's max thread num, but larger than the previous implementation.
The upper limit of the max thread num of the old scan node is too high, which is not reasonable.
2.
Lower down the number of pre allocated free blocks.
1.remove quick_compaction's rowset pick policy, call cu compaction when trigger
quick compaction
2. skip tablet's compaction task when compaction score is too small
Co-authored-by: yixiutt <yixiu@selectdb.com>
PR(https://github.com/apache/doris/pull/13404) introduced that ParquetReader
will break up batch insertion when encountering null values, which leads to the bad performance
compared to OrcReader.
So this PR has pushed null map into decode function, reduce the time of virtual function call
when encountering null values.
Further more, reuse hdfsFS among file readers to reduce the time of building connection to hdfs.
When execute shell command bash build.sh --be to build the backend, the cmake tool will show can't find the boost library, because the variable of BOOST_ROOT has some spelling mistake.
OS: Ubuntu 22.04 x86_64
CMake: 3.22.1
compiler: gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Previous logic of reverse function might not be strong enough to handle illegal character. For example, one one byte size character would be mistaken as one utf-8 character which occupies more than one byte space. And unfortunately exceeding the buffer space during future process.
1. this pr is used to fix the be core dump when select the invalid array.
2. before the change, we run "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" will cause be core dump.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
ERROR 1105 (HY000): RpcException, msg: io.grpc.StatusRuntimeException: UNAVAILABLE: Network closed for unknown reason
3. after the change, we run "select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');" will get error message.
MySQL [example_db]> select array_intersect([1, 2, 3, 1, 2, 3], '1[3, 2, 5]');
errCode = 2, detailMessage = No matching function with signature: array_intersect(array<tinyint(4)>, varchar(-1))"
Co-authored-by: hucheng01 <hucheng01@baidu.com>
There are some issues with the compile flag `-Wno-unused-but-set-variable` for clang.
1. `-Wno-unused-but-set-variable` should be set when building source by clang-15 on Linux. (#13000#13016)
2. On macOS Monterey, Apple Clang 13 may treat it as a unknown warning option and the compilation process may interrupt.
This PR introduces a better way to make this compile flag more portable.
1. Test whether the compiler recognizes this flag.
2. Add this flag if the compiler recognizes it.
Issue Number: close#12574
This pr adds `NewJsonReader` which implements GenericReader interface to support read json format file.
TODO:
1. modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
The index for external table columns from path is incorrect in new scanner. This is a fix for it.
e.g. In the next query, nation and city columns are from path
```
mysql> select nation, city, count(*) from parquet_two_part group by nation, city;
+--------+------------+----------+
| nation | city | count(*) |
+--------+------------+----------+
| cn | beijing | 1199969 |
| cn | shanghai | 1199771 |
| jp | tokyo | 599715 |
| rus | moscow | 600659 |
| us | chicago | 1199805 |
| us | washington | 1201296 |
+--------+------------+----------+
6 rows in set (0.39 sec)
```