This fix is based on a UBSAN unit test. So if we create and use the class object in a different way, we may see runtime errors like `load of value XX, which is not a valid value for type 'YYY'` again.
Unit tests should be built in DEBUG or *SAN mode (at least DEBUG). RELEASE mode adds -DNDEBUG, which turns off dchecks/asserts/debug code.
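As a minimal illustration (standard C/C++ behavior, not Doris-specific code): defining NDEBUG compiles assert away, so a check that would catch such an invalid value in a DEBUG or sanitizer build silently disappears in RELEASE.
```
#include <cassert>

// Toy enum mirroring the "not a valid value for type 'YYY'" situation.
enum class Color { kRed = 0, kBlue = 1 };

int main() {
    int raw = 7;  // an out-of-range value coming from somewhere untrusted
    Color c = static_cast<Color>(raw);
    // With -DNDEBUG (RELEASE) this assert compiles to nothing, so the invalid
    // value flows on silently; only a DEBUG or UBSAN build will flag it.
    assert(c == Color::kRed || c == Color::kBlue);
    return static_cast<int>(c);
}
```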
* 1. Add an `enable_spilling` query option and support spilling to disk in Analytic_Eval_Node. FE can enable spilling via
set enable_spilling = true;
Now both Sort Node and Analytic_Eval_Node can spill to disk.
2. Delete the merge_sorter code that is no longer used.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node and support spilling to disk. Delete the now-useless buffered_block_mgr and buffered_tuple_stream code.
4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment profile into the DataStreamRecvr profile to make the runtime profile clearer.
* Update some hints in the code.
* Replace `disable_spill` with `enable_spill`, which is more compatible with FE.
1. The table includes key columns of double/float type.
2. When running the checksum task, all key columns are used for comparison.
3. schema.column(idx) of double/float type is NULL.
#3735
[Doris On ES] Skip function_call expr when processing predicates
Fixed #3801
Do not push down function_call exprs such as split_xxx when processing predicates; Doris BE is responsible for evaluating these predicates.
All rows in table:
```
+------+------+------+------------+------------+
| k1 | k2 | k3 | UpdateTime | ArriveTime |
+------+------+------+------------+------------+
| NULL | NULL | kkk1 | 123456789 | NULL |
| kkk1 | NULL | NULL | 123456789 | NULL |
| NULL | kkk2 | NULL | 123456789 | NULL |
+------+------+------+------------+------------+
```
The following predicates cannot be pushed down to ES.
```
SQL 1:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk is not null;
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.02 sec)
SQL 2:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > 'a';
+------+
| kk |
+------+
| kkk |
+------+
SQL 3:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > '2';
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.03 sec)
```
StorageEngine::open just returns a very vague status when it fails,
so we have to check logs to find the root cause, which is not
convenient when running unit tests in CI dockers.
It would be better to return more detailed failure info that points out
the root cause, for example an error status with the message
"file descriptors limit is too small".
TabletMeta's _preferred_rowset_type is not initialized after object construction and
may hold a random value. This field is not updated when creating an ALPHA_ROWSET tablet,
and in that case it will not be serialized into the pb. So when cloning an ALPHA_ROWSET
tablet from another BE, the newly created local tablet's _preferred_rowset_type field
may randomly be BETA_ROWSET and cannot be overwritten after the clone; new input
rows will then be written in BETA_ROWSET format, which is not what we expect.
This patch fixes the bug by giving _preferred_rowset_type a default value and updating
this field when creating any type of tablet, and adds a unit test and the related overloaded
equality operator functions.
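As a minimal sketch of the fix (hypothetical declarations; the real TabletMeta class and the protobuf-generated RowsetTypePB differ), the key change is an in-class default instead of an uninitialized member:
```
// Hypothetical sketch, not the actual Doris code.
enum RowsetTypePB { ALPHA_ROWSET = 1, BETA_ROWSET = 2 };

class TabletMeta {
public:
    RowsetTypePB preferred_rowset_type() const { return _preferred_rowset_type; }
    void set_preferred_rowset_type(RowsetTypePB type) { _preferred_rowset_type = type; }

private:
    // Before the fix this member had no initializer, so a cloned ALPHA_ROWSET
    // tablet could read a garbage value such as BETA_ROWSET. An in-class default
    // removes the randomness; ALPHA_ROWSET as the default is an assumption here.
    RowsetTypePB _preferred_rowset_type = ALPHA_ROWSET;
};
```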
The other PR, https://github.com/apache/incubator-doris/pull/3513 (https://github.com/apache/incubator-doris/issues/3479), tried to resolve the `inner hits node is not an array` error: when a query (with a given batch size) runs against a new segment that does not contain this field, and the filter_path only takes `hits.hits.fields` and `hits.hits._source` into account, the response ends up with a null inner hits node:
```
{
"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
"hits": {
"total": 1
}
}
```
Unfortunately that PR introduced another serious problem: inconsistent results for different batch_size values, because it misused `total`.
To avoid both problems, we simply add `hits.hits._score` to the filter_path when `docvalue_mode` is true. `_score` is always `null` in that mode, so the inner hits node is always populated:
```
{
"_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAHaUWY1ExUVd0ZWlRY2",
"hits": {
"total": 1,
"hits": [
{
"_score": null
}
]
}
}
```
related issue: https://github.com/apache/incubator-doris/issues/3752
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.
2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE.
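A minimal sketch of the override rule in point 2 (hypothetical names and default; "unset" is modeled as a non-positive value): the session value, when set, takes precedence over the BE config.
```
#include <cstdint>

// Hypothetical default standing in for the BE config value.
constexpr int64_t kBeConfigMaxScanKeyNum = 1024;

// If the session variable is set (> 0 here, by assumption), it overrides the
// BE config value; otherwise the config default applies.
int64_t effective_max_scan_key_num(int64_t session_value) {
    return session_value > 0 ? session_value : kBeConfigMaxScanKeyNum;
}
```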
This is a simple refactoring patch on class Reader without any functional changes.
Main refactoring points:
- Remove some useless return values
- Use range-based for loops
- Use empty() instead of size() when checking whether an STL container is empty
- Use in-class initialization instead of initializing in the constructor
- Some other small refactorings
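For illustration only (a hypothetical class, not the actual Reader code), the idioms listed above look like this:
```
#include <string>
#include <vector>

// Hypothetical class used only to illustrate the refactoring idioms.
class Demo {
public:
    void process(const std::vector<std::string>& names) {
        // empty() instead of size() == 0 for the emptiness check.
        if (names.empty()) {
            return;
        }
        // Range-based for loop instead of index-based iteration.
        for (const auto& name : names) {
            _seen.push_back(name);
        }
        ++_processed_batches;
    }

private:
    // In-class initialization instead of assigning in the constructor.
    int _processed_batches = 0;
    std::vector<std::string> _seen;
};
```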
* Fix large string val allocation failure
A large bitmap needs StringVal to allocate a large chunk of memory, which can be larger than MAX_INT.
The overflow causes bitmap serialization to fail.
Fixed #3600
The current cumulative point calculation algorithm may skip a singleton rowset when the rowset has only one segment and the NONOVERLAPPING flag. When a tablet is newly created and accumulates many singleton rowsets, the cumulative point is calculated as the max version + 1, so cumulative compaction cannot pick any rowsets and fails, and
the next base compaction on this tablet will then take all rowsets, which can also cause a memory consumption problem when there are thousands of rowsets.
All singleton rowsets must have been newly written by the delta writer and have not
gone through any compaction, so we should place the cumulative point before any of these rowsets.
Fixes #3706
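A minimal sketch of the intended rule (hypothetical types and helper; the real logic lives in the BE compaction code): scan rowsets in version order and place the cumulative point before the first singleton, never-compacted rowset.
```
#include <cstdint>
#include <vector>

// Hypothetical rowset summary used only to illustrate the rule described above.
struct RowsetInfo {
    int64_t start_version;
    int64_t end_version;
    bool is_singleton() const { return start_version == end_version; }
};

// Place the cumulative point right before the first singleton (never-compacted)
// rowset; if every rowset has already been compacted, fall back to max version + 1.
int64_t calc_cumulative_point(const std::vector<RowsetInfo>& rowsets_by_version) {
    for (const auto& rs : rowsets_by_version) {
        if (rs.is_singleton()) {
            return rs.start_version;
        }
    }
    return rowsets_by_version.empty() ? 0 : rowsets_by_version.back().end_version + 1;
}
```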
DataSink uses the instance and query MemTrackers from RuntimeState, so it must be destroyed before RuntimeState; otherwise memory corruption and a segfault could happen.
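A minimal sketch of the underlying C++ rule with stand-in types (not the actual Doris classes): members are destroyed in reverse declaration order, so an object that borrows a MemTracker must be declared after the object that owns it, and will therefore be destroyed first.
```
#include <memory>

// Hypothetical stand-ins used only to illustrate destruction order.
struct MemTracker {};

struct RuntimeState {
    MemTracker instance_mem_tracker;
};

struct DataSink {
    explicit DataSink(MemTracker* tracker) : _tracker(tracker) {}
    ~DataSink() { /* may touch *_tracker, so it must still be alive here */ }
    MemTracker* _tracker;
};

struct FragmentExecutor {
    // Members are destroyed in reverse declaration order: _sink first, then
    // _state, so the tracker that _sink points at outlives the sink.
    std::unique_ptr<RuntimeState> _state = std::make_unique<RuntimeState>();
    std::unique_ptr<DataSink> _sink =
            std::make_unique<DataSink>(&_state->instance_mem_tracker);
};
```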
```
CREATE TABLE `query_detail` (
`query_id` varchar(100) NULL COMMENT "",
`start_time` datetime NULL COMMENT "",
`end_time` datetime NULL COMMENT "",
`latency` int(11) NULL COMMENT "unit is milliseconds",
`state` varchar(20) NULL COMMENT "RUNNING/FINISHED/FAILED",
`sql` varchar(1024) NULL COMMENT ""
)
DUPLICATE KEY(`query_id`)
SELECT COUNT(*) FROM query_detail WHERE start_time >= '2020-05-27 14:52:16' AND start_time < '2020-05-27 14:52:31';
```
The above query causes a core dump because only `query_id` has a ZoneMap;
using `start_time` to match against the ZoneMap triggers the crash.
Add a JSON format for existing metrics, like this:
```
{
"tags":
{
"metric":"thread_pool",
"name":"thrift-server-pool",
"type":"active_thread_num"
},
"unit":"number",
"value":3
}
```
I added a new JsonMetricVisitor to handle the transformation;
the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor are not modified.
I also added:
1. A unit item to describe the metric better.
2. Cloning tablet statistics grouped by database.
3. Replacing newlines with whitespace in audit.log.
Change the load label of the audit plugin to:
`audit_yyyyMMdd_HHmmss_feIdentity`.
The `feIdentity` is obtained from the FE that runs this plugin; currently it is just the FE's IP_editlog_port.
This CL mainly changes:
1. Support the `SELECT INTO OUTFILE` command.
2. Support exporting query results to a file via Broker.
3. Support the CSV export format with a specified column separator and line delimiter.
Fix some unit test failures caused by glog. This may be because we changed the UT build dir and the log path does not exist in the new build dir, so we change logging from file to stdout.