When the physical memory of the process reaches 90% of the mem limit, trigger the load channel mgr to brush down
The default value of be.conf mem_limit is changed from 90% to 80%, and stability is the priority.
Fix deadlock in arena_locks in BufferPool::BufferAllocator::ScavengeBuffers and _lock in DebugString
In order to avoid different mem tracker consumption values of multiple queries/loads, and the difference between the virtual memory of alloc and the physical memory actually increased by the process.
The memory alloc in PODArray and mempool will not be recorded in the query/load mem tracker immediately, but will be gradually recorded in the mem tracker during the memory usage.
But mem pool allocates memory from chunk allocator. If this chunk is used after the second time, it may have used physical memory. The above mechanism will cause the load channel memory statistics to be less than the actual value.
* [fix](parquet) fix write error data as parquet format.
Fix incorrect data conversion when writing tiny int and small int data
to parquet files in non-vectorized engine.
Add `JSON` datatype, following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE
3. `SELECT` JSON columns
Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)
* add JSONB data storage format type
* fix JsonLiteral resolve bug
* add DataTypeJson case in data_type_factory
* add JSON syntax check in FE
* add operators for jsonb_document, currently not support comparison between any JSON type value
* add ColumnJson and DataTypeJson
* add JsonField to store JsonValue
* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral
* add push_json for MysqlResultWriter
* JSON column need no zone_map_index
* Revert "JSON column need no zone_map_index"
This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.
* add JSON writer and reader, ignore zone-map for JSON column
* add json_to_string for DataTypeJson
* add olap_data_convertor for JSON type
* add some enum
* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions
* fix column_json offsets overflow bug, format code
* remove useless TODOs, add CmpType cases for JSON type
* add license header
* format license
* format be codes
* resolve rebase master conflicts
* fix bugs for CREATE and meta related code
* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes
* modification be codes along code review advice
* fix rebase conflicts with master
* add unit test for json_value and column_json
* fix rebase error
* rename json to jsonb
* fix some data convert bugs, set Mysql type to JSON
Fix bugs:
1. Fe need to send file format (e.g. parquet, orc ...) to be while processing load jobs using new scanner.
2. Try to get parquet file column type from SchemaElement.type before getting from Logical type and Converted type.
1. Remove single load channel mem limit, only use load channel mgr mem limit
2. Default load channel mgr mem limit from 50% to 80%
3. load channel mgr add soft mem limit. When the soft limit is exceeded, other threads will not hang, only current thread triggers flush
4. When exceed load channel mgr mem limit, find a load channel with the largest mem usage, continue to find a tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.
Above problems leads to wrong result of memtracker.
The like predicate process data in block perform better than in row. Currently, only not null column is optimized, nullable column will be handled later.
SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
before: ~680ms
after: ~570ms
Get schema from parquet reader.
The new VFileScanner need to get file schema (column name to type map) from parquet file while processing load job,
this pr is to set the type information for parquet columns.
refactor some arguments for parquet reader
1. Add new parquet context to wrap reader arguments
2. Reduced some arguments for function call
Co-authored-by: jinzhe <jinzhe@selectdb.com>
The new scanner (VFileScanner) need a counter to record two values in load job.
1. The number of rows unselected by pre-filter, and
2. The number of rows filtered by unmatched schema or other error. This pr is to implement the counter.
The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence.
In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0.
In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.
In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.
Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.
Test
hash table items count: 3.35M
Before this optimization: insert keys into column takes 500ms
With this optimization only takes 80ms
In current policy, if mem-limit exceeded, load channel will pick tablets that consume most memory, but mem_consumption contains memory in flush, if some delta writer flushing a full memtable(default 200MB), the current memtable might be very small, we should avoid flush such memtable, which can generate a very small segment.
Refactor of scanners. Support broker load.
This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner.
Work still in progress.