RandomAccessFileOptions, WritableFileOptions, RandomRWFileOptions
defined as a struct but previously declared as a class; this is valid,
but will result in compile warning or error under clang compiler
1. reserve `SegmentWriter::_column_writers` before writing it
2. remove some condition branchs in SegmentWriter::init
3. fix hard-coded library names in build-thirdpary.sh
The resource pool in runtime state will be automatically unregistered
when deconstructing the RuntimeState. So no need to unregister it when
closing the plan fragment executor.
* [Fix] Fix bug that rowset meta is deleted after compaction
After compaction, the tablet rowset meta will be modified by
adding to new output rowsets and deleting the old input rowsets.
The output version may equals to the input version.
So we should delete the "input" version from _rs_version_map
before adding the "output" version to _rs_version_map. Otherwise,
the new "output" version will be lost in _rs_version_map.
Adding a new storage engine, we need to make an extensible tablet interface, so olap/StorageEngine can support and manage new tablet types.
To start, this commit creates a class BaseTablet and make Tablet and new MemTablet inherit
this base class, some common fields & methods are moved to BaseTablet class, which fields
and methods belong to base/old class is not finalized yet, it will change as the project evolves.
Fix#3384
This CL mainly made the following modifications:
1. Delete Invalid MemoryUsed Counter and Add PeakMemUsage in each exec node and datastreamsender
2. Add intent in child execnode profile,make it is easily to know the relationship between execnode
3. Del _is_result_order we not support any more in olap_scan_node.h and olap_scan_node.cpp
4. Add scan_disk method to olap_scanner to fix the counter _num_disks_accessed_counter
5. Now we do not use buffer pool to read and write disk, so annotation eadio counter and
6. Delete the MemUsed counter in exec node.
The child.open() function is not called before this commit.
If the assert num rows node has child which process data in open function, the assert num rows node will fetch no data from child. So the result will be empty(incorrect).
This error only appear in inner subquery which has a aggregation function.
For example:
`select * from table where k1=(select k1 from (select avg(k1) from table) a);`
The first level of subquery returns a non-scalar value, so the assert num rows node is needed.
The second level of subquery has a aggregation function, so the child of assert node is aggregate node.
However, if the open stage of the aggregate node is not called, the get next state of aggregate node will return empty set.
So the result is wrong.
Fixed#3435.
Fix#3390
This CL add more info in `JobDetails` column of `SHOW LOAD` result for Broker Load Job.
For example:
```
{
"Unfinished backends": {
"9c3441027ff948a0-8287923329a2b6a7": [10002]
},
"All backends": {
"9c3441027ff948a0-8287923329a2b6a7": [10002, 10004, 10006]
},
"ScannedRows": 2390016,
"TaskNumber": 1,
"FileNumber": 1,
"FileSize": 1073741824
}
```
2 newly added keys:
`Unfinished backends` indicates the BE which task on them are not finished.
`All backends` indicates the BE which this job has tasks on it.
One more thing, I pass the Backend Id along with the heartbeat msg from FE to BE, so that BE can
know the Id of themselves.
We can observe the workload of BE, and also it's a way to check
whether there is any problem in BE, like some container increase
too large and lead to OOM.
This patch add the following metrics:
```
Name Description
rowset_count_generated_and_in_use The total count of rowset id generated and in use since BE last start
unused_rowsets_count The total count of unused rowset waiting to be GC
broker_count The total count of brokers in management
data_stream_receiver_count The total count of data stream receivers in management
fragment_endpoint_count The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count The total count of active scan contexts
plan_fragment_count The total count of plan fragments in executing
load_channel_count The total count of load channels in management
result_buffer_block_count The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count The total count of routine load tasks in executing
small_file_cache_count The total count of cached small files' digest info
stream_load_pipe_count The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count The total count of tablet writers
brpc_endpoint_stub_count The total count of brpc endpoints
```
There is no functional changes in this patch.
Key refactor points are:
- Remove meaningless return value of functions in class Tablet, and
also some related functions in other classes
- Allow RowsetGraph::capture_consistent_versions to pass a nullptr
to the output parameter
- Use CHECK instead of LOG(FATAL) to simplify code
This CL mainly made the following modifications:
1. Delete Invalid method in Running Profile Class.
2. Move Memlimit Counter from blockmgr to fragment and add PeakMemUsage Counter
3. Fix the bug of buffer pool memlimit counter
4. Call compute_time_in_profile() before pretty_print() to show the _local_time_percent without child running profile
5. Add TransferThread ThreadToken count in AveThreadToken Counter
Process castexpr, such as: k (float) > 2.0, k(int) > 3.2, Doris On Es should ignore this doris native cast transformation for every row's col value, we push down this `cast semantic` to Elasticsearch.
I believe in this `predicate` situation, would decrease the mount of data for transmission。
k1 is float:
````
k1 >= 5
````
push-down filter:
```
{"range":{"k1":{"gte":"5.000000"}}}
```
k2 is int :
```
k2 > 3.2
```
push-down filter:
```
{"range":{"k2":{"gte":"3.2"}}}
```
related issue: #3306
Note: this PR just remove the es_scan_node_test.cpp which is useless
For the moment, just add a simple explain syntax for EsTable without translating the native predicates to ES queryDSL which is better to finished with moving the predicate translating from Doris BE to Doris FE, the whole work is still WIP.
This PR is just a transitional way,but it is better to move the predicates transformation from Doris BE to Doris BE, in this way, Doris BE is responsible for fetching data from ES.
Add a `enable_keyword_sniff ` configuration item in creating External Elasticsearch Table ,it default to true , would to sniff the `keyword` type on the `text analyzed` Field and return the `json_path` which substitute the origin col name.
```
CREATE EXTERNAL TABLE `test` (
`k1` varchar(20) COMMENT "",
`create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
note: `enable_keyword_sniff` default to "true"
run this SQL:
```
select * from test where k1 = "wu yun feng"
```
Output predicate DSL:
```
{"term":{"k1.keyword":"wu yun feng"}}
```
and in this PR, I remove the elasticsearch version detected logic for now this is useless, maybe future is needed.
Now, column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter
when the rowset is base(version starts with zero). Always we think is an optimization.
But when some case, it will occurs bug.
create table test(
k1 int,
v1 int replace,
v2 int sum
);
If I have two records on different two versions
1 2 2 on version [0-10]
1 3 1 on version 11
If I perform a query
select * from test where k1 = 1 and v1 = 3;
The result will be 1 3 1, this is not right because of the first record is filtered.
The right answer is 1 3 3, the v2 should be summed.
Remove this optimization is necessity to make the result is right.
main refactor points are:
- Use a single get_absolute_tablet_path function instead of 3
independent functions
- Remove meaningless return value of register_tablet and deregister_tablet
- Some typo and format
This PR is to enhance the performance for txn manage task, when there are so many txn in
BE, the only one txn_map_lock and additional _txn_locks may cause poor performance, and
now we remove the additional _txn_locks and split the txn_map_lock into many small locks.
Relate Issue: https://github.com/apache/incubator-doris/issues/3248
SQL:
```
select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing');
```
Output filter:
```
((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter:
```
(k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter (`abs` can not be pushed down to es, so doris on es would not process this scenario ):
```
match_all
```
This CL fixes#3270 by skipping recently added version when performing cumulative compaction. A new config named "cumulative_compaction_skip_window_seconds" is added to adjust the time window.
In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE.
This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag'
when BE is running without restarting it.
You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`
It's not possible to insert duplicated transaction ids for a specific tablet, therefore we could use map<TabletInfo, vector<int64_t>> instead of map<TabletInfo, set<int64_t>> for expire_txn_map.
When calculating the cumulative point at first time, we should stop increasing
the cumulative point when we meet a rowset with overlap flag as OVERLAPPING,
even if it has only one segments.
select date_format(k10, '%Y%m%d') as myk10 from baseall group by myk10;
The date_format function in query above will be stored in MemPool during
the query execution. If the query handles millions of rows, it will
consume much memory. Should clear the MemPool at interval.
This CL fixes a bug that could cause wrong answer for beta rowset with nullable column. The root cause is that NullBitmapBuilder is not reset when the current page doesn't contain NULL, which leads to wrong null map to be written for the next page.
Added a test case to reproduce the problem.