Buffer flip is used incorrectly.
When the hash key is string type, the hash value is always zero.
The reason is that the buffer of string type is obtained by wrap, which is not needed to flip.
If we do so, the buffer limit for read will be zero.
* load newly generated image file as soon as generated to check if it is valid.
* delete the latest invalid image file
* fix
* fix
* get filePath from saveImage() to ensure deleting the correct file while exception happens
* fix
Co-authored-by: wuhangze <wuhangze@jd.com>
* avoiding a corrupt image file when there is image.ckpt with non-zero size
For now, saveImage writes data to image.ckpt via an append FileOutputStream,
when there is a non-zero size file named image.ckpt, a disaster would happen
due to a corrupt image file. Even worse, fe only keeps the lastest image file
and removes others.
BTW, image file should be synced to disk.
It is dangerous to only keep the latest image file, because an image file is
validated when generating the next image file. Then we keep an non validated
image file but remove validated ones. So I will issue a pr which keeps at least
2 image file.
* append other data after MetaHeader
* use channel.force instead of sync
For now, dbTransactionManager::getTransactionNum is only used by
checkpoint to get transaction num to put into a image file. However,
transactions written into a image file do not come from the same
data structure as the num comes. Thus, we should pay much attention to
assure two data structue is consistent on size. Actually, it is
very difficult to do so.
This patch just let getTransactionNum get number from the same data
structure as write method.
The change was introduced by b93e841688.
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
This is the first PR for statistics collection includes some implementations of the statistics(#6370), it will not affect any existing code and users will not be able to create statistics job.
It mainly implements the semantic checking module for statistical information collection jobs, and the job creation module.
The syntax is:
ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [ PROPERTIES(...) ]
e.g.
ANALYZE;
ANALYZE tbl1;
ANALYZE tbl1(col1, col2) PROPERTIES("cbo_ statistics_ task_ timeout" = "10");
Two configurations have been added:
Timeout time of a single task max_cbo_statistics_task_timeout_sec
The maximum number of running jobs the system can receive cbo_max_statistics_job_num
Co-authored-by: weizhengte <1141550741@qq.com>
Co-authored-by: weizhengte <weizhengte@foxmail.com>
Co-authored-by: EmmyMiao87 <522274284@qq.com>
Co-authored-by: frankywei <frankywei@tencent.com>
SQL with JOIN and columns ARRAY, will call function ColumnArray::replicate. At this pr,
we implement replicate for ARRAY type, to support SQL like this:
`SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`
## Design:
For now, there are two categories of types in Doris, one is for scalar types (such as int, char and etc.) and the other is for composite types (array and etc.). For the sake of performance, we can cache type info of scalar types globally (unique objects) due to the limited number of scalar types. When we consider the composite types, normally, the type info is generated in runtime (we can also use some cache strategy to speed up). The memory thereby should be reclaimed when we create type info for composite types.
There are a lots of interfaces to get the type info of a specific type. I reorganized those as the following describes.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
The function is used to get the type info of scalar types. Due to the cache, the caller uses the result **WITHOUT** considering the problems about memory reclaim.
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
The function is used to get the type info of array types with just **ONE** depth. Due to the cache, the caller uses the result **WITHOUT** considering the problems about memory reclaim.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
These functions are used to get the type info of **BOTH** scalar types and composite types. The caller should be responsible to manage the resources returned.
#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias type to `unique_ptr` with a custom deleter.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaim the memory.
By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`
Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance.
1. `ScalarColumnWriter` - holds `Field`
1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, use `type_info` from the field in `ScalarColumnWriter`
1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and `BitmapIndexWriter` doesn't support `ArrayType`.
3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types).
3. `ColumnVectorBatch`
1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader`
2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader`
3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`