In our production environment, we use LVS to dispatch requests to FEs. LVS sends probes to check whether an FE is alive and then closes the connection immediately, which produces a lot of verbose logging. This patch reduces these logs by catching the related exceptions.
In version 0.13, we introduced a more efficient compaction logic that maintains multiple version paths for a tablet. This avoids -230 errors and also supports incremental clone. However, the previous incremental clone relied on the incremental rowset meta recorded in `incr_rs_meta`. At present, the rowset meta recorded in `incr_rs_meta` duplicates the records in `stale_rs_meta`, and the current clone logic has not been adapted to the new multi-version path, so in many cases incremental clone is not triggered.
This CL mainly:
1. Removes the `incr_rs_meta` metadata.
2. Modifies the clone logic: for an incremental clone, it now tries to read the rowsets in `stale_rs_meta` (see the sketch after this list).
3. Deletes a lot of code that was previously kept for version compatibility.
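A heavily simplified sketch of the fallback idea, using hypothetical types and names (the actual Doris rowset meta structures and clone code are not reproduced here): when the requested versions are missing from the active rowsets, the clone logic looks for a consecutive chain in `stale_rs_meta` that covers the gap, and falls back to a full clone when no such chain exists.
```cpp
// Hypothetical sketch: look for a consecutive chain of stale rowsets that
// covers the missing version range [start, end] of an incremental clone.
#include <cstdint>
#include <optional>
#include <vector>

struct RowsetMetaLite {  // hypothetical stand-in for illustration only
    int64_t start_version;
    int64_t end_version;
};

// Returns the chain of stale rowset metas covering [start, end], if any.
std::optional<std::vector<RowsetMetaLite>> find_stale_path(
        const std::vector<RowsetMetaLite>& stale_rs_metas,
        int64_t start, int64_t end) {
    std::vector<RowsetMetaLite> path;
    int64_t next = start;
    while (next <= end) {
        bool found = false;
        for (const auto& rs : stale_rs_metas) {
            if (rs.start_version == next && rs.end_version <= end) {
                path.push_back(rs);
                next = rs.end_version + 1;
                found = true;
                break;
            }
        }
        if (!found) {
            return std::nullopt;  // no stale path: fall back to a full clone
        }
    }
    return path;
}
```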
A tablet has three replicas whose compaction progress differs. Consider the following scenario: replica A has 3 versions (1, 2, 3), replica B has 2 versions (1-2, 3), and replica C has 1 version (1-3). Now a SUM aggregation column named city is added with default value 1. Because each un-compacted version materializes the default once, aggregation yields 3 on replica A, 2 on replica B, and 1 on replica C, so the replicas diverge. Therefore it is necessary to restrict the default value of a SUM aggregation column to zero.
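A minimal arithmetic sketch of the divergence, with hypothetical names (none of the actual aggregation code is reproduced): a key present in k un-compacted rowsets accumulates k times the default value after SUM aggregation, so the replicas only agree when the default is zero.
```cpp
// Hypothetical illustration: with a SUM aggregation column whose default is d,
// a key that appears in k un-compacted rowsets accumulates k * d after
// aggregation, so replicas with different compaction progress diverge
// unless d == 0.
#include <iostream>

int sum_for_key(int num_rowsets_with_key, int default_value) {
    int sum = 0;
    for (int i = 0; i < num_rowsets_with_key; ++i) {
        sum += default_value;  // each rowset materializes the default once
    }
    return sum;
}

int main() {
    // Replica A: 3 rowsets, B: 2, C: 1; default = 1 -> results 3, 2, 1.
    const int rowset_counts[] = {3, 2, 1};
    for (int rowsets : rowset_counts) {
        std::cout << "replica with " << rowsets << " rowsets -> "
                  << sum_for_key(rowsets, 1) << "\n";
    }
    // With default = 0 all three replicas agree (the column sums to 0),
    // which is why the default of a SUM aggregation column must be zero.
    return 0;
}
```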
The IO-related code may be used by new modules, so it is better to move it to fe-common. fe-core is modified frequently, but the many Java files generated by thrift slow down its compilation, so it is also better to move the thrift generation process to fe-common.
Currently both log4j1 and log4j2 are used, which leads to logs being written to the wrong files. This modification removes log4j1 from the dependencies and uses slf4j with an slf4j -> log4j2 binding instead.
Previously, we introduced an optimization for the aggregate table: when there is only one non-overlapping rowset, the data can be read directly without merging. However, this logic has bugs.
This PR supports the following functions:
1. Support the content property in the backup statement. Users can back up only metadata, or metadata plus data, distinguished by the content attribute [METADATA_ONLY | ALL].
2. Support excluding some tables in the backup and restore statements, so that very large and unimportant tables can be excluded when the entire database is backed up.
3. Support backing up and restoring a whole database instead of declaring each table name in the backup and restore statement.
The backup and restore API has changed as follows:
```
BACKUP SNAPSHOT [db_name].{snapshot_name}
TO 'repo_name'
[ON|EXCLUDE (
    'table_name' [partition (p1,...)]
)]
[properties (
    "content" = "metadata_only|all"
)]

RESTORE SNAPSHOT [db_name].{snapshot_name}
TO 'repo_name'
[EXCLUDE|ON (
    'table_name' [partition (p1,...)]
)]
[properties (
)]
```
Support delete statements like:
1. delete from table partitions(p1, p2) where xxx; // apply to p1, p2
2. delete from table where xxx; // apply to all partitions
Also remove the code for the deprecated sync/async delete job.
This CL changes FE meta version to 94
1. Add a graceful exit mechanism for the compaction producer thread (a sketch of the exit mechanism follows this list).
2. If a compaction task fails to be submitted, it should be popped from `_tablet_submitted_compaction`.
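A minimal sketch of one graceful-exit pattern for such a producer thread, assuming a stop flag plus condition variable; the class and member names below are hypothetical and do not reproduce the actual Doris code.
```cpp
// Hypothetical sketch: a compaction producer loop that can be woken up and
// asked to exit cleanly instead of being cut off mid-iteration.
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class CompactionProducer {
public:
    void start() { _thread = std::thread([this] { run(); }); }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();  // wake the producer so it can exit promptly
        if (_thread.joinable()) {
            _thread.join();
        }
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(_mutex);
        while (!_stopped) {
            // produce_compaction_tasks();  // on submit failure, the tablet
            //                              // would be removed from the
            //                              // "submitted" set again
            _cv.wait_for(lock, std::chrono::seconds(1),
                         [this] { return _stopped; });
        }
    }

    std::thread _thread;
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _stopped = false;
};

int main() {
    CompactionProducer producer;
    producer.start();
    producer.stop();  // graceful exit: the loop is woken and joins cleanly
    return 0;
}
```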
[BackupAndRestore] Support backup and restore view and external odbc table
1. Support backing up and restoring views and odbc tables. The syntax is the same as for backing up and restoring a table.
2. If a table referenced by a view does not exist in the snapshot, the view can still be backed up successfully, but a TableNotFound exception will be thrown when the view is queried.
3. If an odbc table is associated with an odbc resource, the odbc resource will be backed up and restored together with it.
4. If the same view, odbc table, or resource already exists in the database, its metadata is compared with the metadata in the snapshot; if they are inconsistent, the restoration will fail.
5. This PR also modifies the JSON format of the backup information. A `new_backup_objects` object is added to the root node to store backup meta-information other than OLAP tables, such as views and external tables.
```
{
    "backup_objects": {},
    "new_backup_objects": {
        "view": [
            {"name": "view1", "id": "10001"}
        ],
        "odbc_table": [
            {"name": "xxx", xxx}
        ],
        "odbc_resources": [
            {"name": "bj_oracle"}
        ]
    }
}
```
6. This PR changes the serialization and deserialization of the backup information from manual construction to automatic handling by Gson.
Change-Id: I216469bf2a6484177185d8354dcca2dc19f653f3
If a table has very large fields, there may be only one row in each page, and each page still carries a zone map index entry. Because the zone map records the page's min and max values, a large field value can end up stored roughly three times the original size, and it also takes up more memory when reading those segments. Therefore, we need to disable the creation of zone map indexes for segments with too few rows.
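A minimal sketch of the kind of check this implies; the config name and threshold value below are hypothetical, not the ones used by the actual implementation.
```cpp
// Hypothetical sketch: skip building a zone map index for a segment whose row
// count is below a configurable threshold, since the index overhead would
// outweigh its pruning benefit.
#include <cstddef>

namespace config {
// Hypothetical knob: minimum number of rows before a zone map is worthwhile.
constexpr size_t zone_map_row_threshold = 20;
}  // namespace config

bool should_build_zone_map(size_t num_rows_in_segment) {
    return num_rows_in_segment >= config::zone_map_row_threshold;
}
```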
In the previous implementation, within a load job, multiple memtables of the same tablet were written to disk sequentially. In fact, multiple memtables can be flushed out of order and in parallel; we only need to ensure that each memtable uses a different segment writer (a sketch follows).
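A minimal sketch of the idea using std::async as a stand-in for the real flush thread pool; the memtable and segment-writer types below are hypothetical, and the point is only that each memtable owns its own writer so flushes can complete out of order.
```cpp
// Hypothetical sketch: flush several memtables of one tablet in parallel,
// each with its own segment writer, instead of flushing them sequentially.
#include <future>
#include <memory>
#include <vector>

struct MemTableLite {};  // hypothetical stand-ins for illustration only
struct SegmentWriterLite {
    explicit SegmentWriterLite(int segment_id) : id(segment_id) {}
    void write(const MemTableLite&) { /* serialize rows to this segment */ }
    int id;
};

void flush_memtables_in_parallel(const std::vector<MemTableLite>& memtables) {
    std::vector<std::future<void>> flushes;
    int next_segment_id = 0;
    for (const auto& mt : memtables) {
        // Each memtable gets its own writer, so flush order does not matter.
        auto writer = std::make_shared<SegmentWriterLite>(next_segment_id++);
        flushes.push_back(std::async(std::launch::async,
                                     [writer, &mt] { writer->write(mt); }));
    }
    for (auto& f : flushes) {
        f.get();  // wait for all flushes to finish
    }
}

int main() {
    std::vector<MemTableLite> memtables(3);  // e.g. three memtables of one tablet
    flush_memtables_in_parallel(memtables);
    return 0;
}
```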
* [Load] Broker Load supports setting the load parallelism
Similar to the parallel_fragment_exec_instance_num parameter,
it allows the user to set the parallelism of the load execution plan
on a single node when the broker load is submitted.
e.g.:
```
...
properties (
"load_parallelism" = "4";
...
)
```
This parameter currently only adds the load parallelism setting; it does not yet significantly improve load speed. The speed improvement will be completed in subsequent code submissions, and documentation will also be added then. This PR also updates the FE meta version.
The essence of the problem is the behavior of negative zero (-0.0) in comparison with positive zero (+0.0). Currently, in GroupBy and HashPartition, -0.0 and 0.0 produce different hash values, so they are divided into two partitions. In the row_number analytic function, a new partition is opened for sorted data when the values of adjacent rows are not equal; but in C++ the comparison 0.0 == -0.0 is true, so 0.0 and -0.0 fall into the same partition for row_number.
(Floating-point arithmetic in C++ generally follows IEEE 754, which defines two representations of the value zero, positive zero and negative zero, and requires that they compare equal. Refer to https://stackoverflow.com/questions/45795397)
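A minimal sketch of the mismatch and one possible fix, assuming the hash works over the raw bit pattern of the double (the actual Doris hash functions are not reproduced): normalizing -0.0 to +0.0 before hashing makes hashing consistent with operator==.
```cpp
// Illustrates that 0.0 == -0.0 is true while a bit-pattern hash treats them
// differently, and that normalizing -0.0 to +0.0 restores consistency.
#include <cstdint>
#include <cstring>
#include <iostream>

static uint64_t hash_bits(double v) {
    // Illustrative hash over the raw 8-byte representation of the double.
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return bits * 0x9E3779B97F4A7C15ULL;
}

static uint64_t hash_normalized(double v) {
    if (v == 0.0) v = 0.0;  // maps -0.0 to +0.0 before hashing
    return hash_bits(v);
}

int main() {
    double pz = 0.0, nz = -0.0;
    std::cout << std::boolalpha;
    std::cout << "0.0 == -0.0            : " << (pz == nz) << "\n";  // true
    std::cout << "raw hashes equal       : "
              << (hash_bits(pz) == hash_bits(nz)) << "\n";           // false
    std::cout << "normalized hashes equal: "
              << (hash_normalized(pz) == hash_normalized(nz)) << "\n";  // true
    return 0;
}
```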
At present, the use of VLOG in the code is quite confusing. Part of it uses the VLOG_XX format inherited from Impala, and part of it uses the VLOG(number) format. The VLOG(number) format does not follow a unified specification, so this PR standardizes the use of VLOG (a sketch of the idea follows).
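A minimal sketch of what such a standardization can look like, assuming glog-style VLOG; the level names and numbers below are hypothetical and are not necessarily the ones chosen by this PR.
```cpp
// Hypothetical mapping of named verbose-log macros onto fixed glog VLOG
// levels, so that bare VLOG(number) calls with arbitrary numbers are no
// longer used directly.
#include <glog/logging.h>

// Named levels (the numbers here are illustrative only).
#define VLOG_CRITICAL VLOG(1)
#define VLOG_NOTICE   VLOG(3)
#define VLOG_DEBUG    VLOG(7)
#define VLOG_TRACE    VLOG(10)

void do_compaction_step(int tablet_id) {
    VLOG_NOTICE << "start compaction step for tablet " << tablet_id;
    // ... work ...
    VLOG_TRACE << "finished compaction step for tablet " << tablet_id;
}

int main(int argc, char** argv) {
    (void)argc;
    google::InitGoogleLogging(argv[0]);
    FLAGS_v = 10;  // enable all verbose levels up to VLOG_TRACE
    do_compaction_step(10001);
    return 0;
}
```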
1. Schema hash has been useless for a long time.
Currently, the schema hash only needs to be generated as a random integer; there is no need to calculate it from the real schema.
2. The CRC32 algorithm is not sufficient to generate a table's signature.
A table's signature is used to determine whether two tables have the same schema, and the current CRC32 algorithm may return the same signature even when the schemas differ.
So I changed it to compute the MD5 of a signature string assembled from the table's schema info (a sketch follows).
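A minimal sketch of the idea, not the actual implementation: the fields, separator, and ordering used to assemble the signature string below are hypothetical, and OpenSSL's EVP interface is used only for illustration.
```cpp
// Hypothetical sketch: assemble a deterministic string from schema info and
// take its MD5 as the table signature, instead of CRC32 over the same fields.
#include <openssl/evp.h>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

struct ColumnDesc {  // hypothetical, for illustration only
    std::string name;
    std::string type;
};

std::string table_signature(const std::string& table_name,
                            const std::vector<ColumnDesc>& columns,
                            int bucket_num) {
    // Assemble the signature string; any schema difference changes it.
    std::string sig = table_name + "|" + std::to_string(bucket_num);
    for (const auto& c : columns) {
        sig += "|" + c.name + ":" + c.type;
    }

    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int len = 0;
    EVP_Digest(sig.data(), sig.size(), digest, &len, EVP_md5(), nullptr);

    char hex[2 * EVP_MAX_MD_SIZE + 1];
    for (unsigned int i = 0; i < len; ++i) {
        std::snprintf(hex + 2 * i, 3, "%02x", digest[i]);
    }
    return std::string(hex, 2 * len);
}

int main() {
    std::vector<ColumnDesc> cols = {{"k1", "INT"}, {"city", "VARCHAR(32)"}};
    std::cout << table_signature("t1", cols, 10) << std::endl;
    return 0;
}
```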