This PR was originally #16940, but it has not been updated for a long time by the original author @Cai-Yao. For now, we will merge part of the code into master first.
Thanks @Cai-Yao @yiguolei
For a load request, there are two tuples on the scan node: the input tuple and the output tuple.
The input tuple is used for reading the file, and it is converted to the output tuple based on the user-specified column mappings.
Broker load supports different column mappings in different data descriptions targeting the same table (or partition).
So for each scanner the output tuple is the same, but the input tuple can be different.
The previous implementation saved the input tuple at the scan node level, causing different scanners to use the same input tuple, which is incorrect.
This PR removes the input tuple from the scan node and saves it in each scanner. A sketch of such a load is shown below.
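A hedged sketch of a broker load with two data descriptions and different column mappings, assuming a hypothetical table `tbl` with columns `k1` and `k2`; the label, paths, broker name, and source column names are illustrative only:

```sql
LOAD LABEL example_db.label_two_mappings
(
    -- first data description: source columns map 1:1 to table columns
    DATA INFILE("hdfs://host:port/path/file1.csv")
    INTO TABLE tbl
    (k1, k2),
    -- second data description: a different source layout, mapped via SET
    DATA INFILE("hdfs://host:port/path/file2.csv")
    INTO TABLE tbl
    (c1, c2, c3)
    SET (k1 = c1, k2 = c2 + c3)
)
WITH BROKER "hdfs_broker";
```

Each data description produces its own input tuple, while both feed the same output tuple of `tbl`.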
If there is a core dump here, it may cover up the real stack. If the stack trace indicates heap corruption
(which led to invalid jemalloc metadata), such as a double free or use-after-free in the application,
try sanitizers such as ASAN, or build jemalloc with --enable-debug to investigate further.
count_by_enum(expr1, expr2, ..., exprN);
Treats the data in a column as an enumeration and counts the number of occurrences of each distinct value. For each column it returns the count per enumerated value, together with the number of non-null values and the number of null values.
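A minimal usage sketch, assuming a hypothetical table `users` with a `gender` column; the table and column names are illustrative only:

```sql
-- counts the occurrences of each distinct value of `gender`,
-- together with its non-null and null totals
SELECT count_by_enum(gender) FROM users;
```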
In some cases, high load on HDFS may cause reads from HDFS to take a long time,
thereby slowing down overall query efficiency. The HDFS client provides Hedged Read:
when a read request exceeds a certain threshold without returning, another read thread
is started to read the same data, and whichever returns first is used as the result.
Example:
create catalog regression properties (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
    'dfs.client.hedged.read.threadpool.size' = '128',
    'dfs.client.hedged.read.threshold.millis' = '500'
);
If a column is defined as `col VARCHAR/CHAR NULL` with no default value, and we load JSON data that is missing column `col`, the queried result is incorrect:
+------+
| col |
+------+
| 1 |
+------+
But the expected result is:
+------+
| col |
+------+
| NULL |
+------+
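A hedged repro sketch, assuming a hypothetical table `t`; the schema and the loaded JSON row are illustrative only:

```sql
CREATE TABLE t (
    id  INT,
    col VARCHAR(20) NULL
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- load a JSON row such as {"id": 1} that has no "col" key (e.g. via stream load),
-- then query the missing column:
SELECT col FROM t;
-- expected result: NULL
```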
---------
Co-authored-by: duanxujian <duanxujian@jd.com>
The be_custom.conf persistence path is ${doris_home}/conf/be_custom.conf, but if ${custom_config_dir} is set to a different path, BE cannot read be_custom.conf from ${custom_config_dir}.
This PR sets the be_custom.conf persistence path to ${custom_config_dir}.
FunctionStringConcat::execute_impl resized the result with a size that includes the string null terminator, which makes ColumnString::chars.size() not match ColumnString::offsets.back(). This causes problems for some string functions, e.g. like and regexp.
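A hedged repro sketch; the table and column names are illustrative only, the point being that a concatenated string is then consumed by like/regexp:

```sql
-- the inconsistent chars/offsets produced by concat() could make the
-- subsequent LIKE / REGEXP evaluation misbehave
SELECT * FROM t WHERE concat(prefix_col, suffix_col) LIKE '%abc%';
SELECT * FROM t WHERE concat(prefix_col, suffix_col) REGEXP 'abc[0-9]+';
```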
Return an empty result instead of an error for an empty match query, as follows:
`SELECT * FROM t WHERE msg MATCH ''`
`SELECT * FROM t WHERE msg MATCH 'stop_word'`
Memtrackers are usually bound to operators in a query/load. If a large number of queries/loads are stuck, the number of memtrackers becomes very large, and the memory tracker profile refresh thread gets stuck on the lock.
This PR is for branch-2.0; I will rewrite the memory profile in the next PR.
Bug:
When the value of some ES column is empty, querying these values in doc_values mode returns an error.
Reason:
In doc_values mode these values are empty, so we need to check whether the array is empty.
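A hedged repro sketch, assuming an ES external table `es_tbl` scanned in doc_values mode and a field `tags` that is empty for some documents; all names are illustrative:

```sql
-- rows whose `tags` field has no doc value used to return an error
SELECT tags FROM es_tbl WHERE id = 1;
```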
Currently, parsing a decimal from a string is wrong:
if the given string's precision is bigger than the defined decimal precision, we return an overflow error. However, overflow should only be reported when the digit part is longer than the type's digit length, and it should be detected while traversing the given string into the decimal value.
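A hedged illustration of the intended behavior, assuming a DECIMALV3(5, 2) target type; the literals are illustrative only:

```sql
-- extra fractional digits alone should not be reported as overflow
SELECT CAST('123.456789' AS DECIMALV3(5, 2));
-- the integer digit part (6 digits) exceeds what DECIMALV3(5, 2) can hold, so this should overflow
SELECT CAST('123456.78' AS DECIMALV3(5, 2));
```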
Jemalloc heap profiling follows libgcc's way of backtracing by default.
Rewriting dl_iterate_phdr causes Jemalloc to fail to run after profiling is enabled.
TODO, two possible solutions:
- Have Jemalloc use GNU libunwind as the prof backtracing method, but my test failed; --enable-prof-libunwind did not work: jemalloc/jemalloc#2504
- ClickHouse/libunwind solves Jemalloc profile backtracing, but the ClickHouse/libunwind branch has diverged from GNU libunwind and LLVM libunwind, which would leave the fate to others.