The thread context stores information about a worker thread:
1. thread_id: The current thread's id, generated automatically.
2. type: An enum value indicating which type of task the current thread is running,
for example QUERY, LOAD, COMPACTION, ...
3. task id: A unique id that identifies the task, such as a query id or a load job id.
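The three fields above could be laid out roughly as follows. This is only a minimal sketch: `ThreadContextSketch`, `TaskType`, `attach`, `detach` and `thread_ctx` are hypothetical names, not the actual BE definitions, and the plain `thread_local` used here does not address the GLIBC issue described below.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical task types; the real enum may contain more values.
enum class TaskType { UNKNOWN, QUERY, LOAD, COMPACTION };

// Hypothetical per-thread record holding the three fields listed above.
struct ThreadContextSketch {
    // Captured automatically from the thread that constructs the context.
    std::thread::id thread_id = std::this_thread::get_id();
    // Which kind of task the thread is currently running.
    TaskType type = TaskType::UNKNOWN;
    // Query id, load job id, etc., depending on `type`.
    std::string task_id;

    // Bind the current thread to a task.
    void attach(TaskType t, const std::string& id) {
        assert(!id.empty()); // assumption: an attached task always has an id
        type = t;
        task_id = id;
    }

    // Clear the binding once the task is finished.
    void detach() {
        type = TaskType::UNKNOWN;
        task_id.clear();
    }
};

// One context per thread, created lazily on first access.
inline ThreadContextSketch& thread_ctx() {
    thread_local ThreadContextSketch ctx;
    return ctx;
}
```

A worker would call `thread_ctx().attach(TaskType::QUERY, query_id)` when it starts a task and `thread_ctx().detach()` when it finishes.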
Compiling thread_local variables with gcc11 against older versions of GLIBC reports an error, see https://github.com/apache/incubator-doris/pull/7911
This is very difficult to solve directly, so the Class-scoped static thread local implementation from kudu was introduced.
The problem above is solved by combining Thread-scoped thread local with Class-scoped thread local.
See the comments for ThreadContextPtr for details.
There are 3 error code types in BE: OLAPStatus, AgentStatus, and Status.
This is confusing, and the types sometimes conflict when writing code.
I will try to unify them into Status.
In the original tablet report, the version missing information was derived by combining
two pieces of information:
1. the maximum consecutive version number
2. the `version_miss` field
The logic of this approach is confusing and inconsistent with the logic that checks for missing versions when querying.
After the change, we directly reuse the version checking logic from the query path and set `version_miss` to true
if a missing version is found.
On the FE processing side, originally only the **bad replica** information was synchronized among FEs,
but not the **version missing** information. As a result, non-master FEs were not aware of missing versions.
In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new
`BackendReplicasInfo` to record replica reporting information and write both **bad** and **version missing**
information to metadata so that other FEs can synchronize this information.
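As an illustration of the BE-side check, the sketch below flags a tablet as `version_miss` when its rowset version ranges leave a gap. It is a simplified stand-in, not the actual query-path logic: `VersionRange` and `has_missing_version` are hypothetical names, and it assumes versions are inclusive ranges that should cover everything from 0 up to the largest end version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical representation: each rowset covers an inclusive
// version range [first, second].
using VersionRange = std::pair<int64_t, int64_t>;

// Returns true if the ranges leave a gap anywhere between version 0
// and the largest end version (assumption: versions start at 0).
bool has_missing_version(std::vector<VersionRange> ranges) {
    // Sort by start version so the ranges can be walked in order.
    std::sort(ranges.begin(), ranges.end());
    int64_t next_expected = 0; // smallest version not yet covered
    for (const VersionRange& r : ranges) {
        assert(r.first <= r.second);
        if (r.first > next_expected) {
            return true; // versions [next_expected, r.first - 1] are missing
        }
        next_expected = std::max(next_expected, r.second + 1);
    }
    return false;
}
```

With this shape, the report only needs to carry the boolean result, rather than a maximum consecutive version that the FE has to combine with a separate flag.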
1. No longer use short-circuit evaluation for the date type: reading the date type is cheap,
so lazy materialization costs more than it saves.
2. Fix incorrect results when reading hll/bitmap/date types.
In some scenarios, users cannot find a suitable hash key to avoid data skew, so we need to provide an additional data distribution method for OLAP tables that avoids data skew.
Example:
CREATE TABLE random_table
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY RANDOM BUCKETS 10
PROPERTIES("replication_num" = "1");
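To show why a key-independent distribution avoids skew, the sketch below compares hash bucketing with a round-robin assigner. Round-robin is used here only as a deterministic stand-in for random assignment; how Doris actually picks a bucket is not described in this change, and `hash_bucket` and `RoundRobinAssigner` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hash distribution: the bucket depends only on the key, so rows that
// share a hot key all land in the same bucket.
std::size_t hash_bucket(const std::string& key, std::size_t num_buckets) {
    assert(num_buckets > 0);
    return std::hash<std::string>{}(key) % num_buckets;
}

// Key-independent distribution: buckets are handed out in turn, so the
// row count per bucket stays even regardless of the key values.
class RoundRobinAssigner {
public:
    explicit RoundRobinAssigner(std::size_t num_buckets)
            : num_buckets_(num_buckets) {
        assert(num_buckets > 0);
    }

    // Returns the bucket for the next row and advances the cursor.
    std::size_t next() {
        std::size_t bucket = next_;
        next_ = (next_ + 1) % num_buckets_;
        return bucket;
    }

private:
    std::size_t num_buckets_;
    std::size_t next_ = 0;
};
```

If 1000 rows with the same key are written to 10 buckets, `hash_bucket` sends all of them to one bucket, while the assigner places 100 rows in each.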
Co-authored-by: caiconghui1 <caiconghui1@jd.com>