1. upgrade gutil code from imapla to new verison, include `cpuinfo`, `spinlock` and `linux_syscall_support `
2. impliments arm version utf8 check code
3. remove incompatible code from stopwatch
This CL supports arrow's zero copy read interface, which can make code
comply with arrow 0.15.
And the schema change unit test has some problem, I disable it in run-ut.sh
In some scenarios, when a user creates an olap table that is range partition by time, the user needs to periodically add and remove partitions to ensure that the data is valid. As a result, adding and removing partitions dynamically can be very useful for users.
* [Alter Table] No need to check whether table is stable when doing some kinds of alter operation.
Not all alter table operation require table to be stable. Such as rename, modify meta data.
Compaction task may sometimes consume much memory and results in OOM.
And currently, there is no good way to predict the mem consumption of
a compaction task, so I add a new BE config: max_compaction_concurrency
to limit the max concurrency of running compaction tasks manually.
When merge reads from one rowset with multi overlapping segments,
I introduce a priority queue(A Minimum heap data structure) for multipath merge sort,
to replace the old N*M time complexity algorithm.
This can significantly improve the read efficiency when merging large number of
overlapping data.
In mytest:
1. Compaction with 187 segments reduce time from 75 seconds to 42 seconds
2. Compaction with 3574 segments cost 43 seconds, and with old version, I kill the
process after waiting more than 10 minutes...
This CL only change the reads of alpha rowset. Beta rowset will be changed in another CL.
ISSUE: #2631
ISSUE #1553
This commit will remove function count_distinct().
We already have function multi_distinct_count as an alternative to help us calculate "count distinct" of any type value.
Besides, the count_distinct() function is with the the same symbol as count() function, which fails to express the meaning.
So I suggest to remove count_distinct() function.
Doris support AGG_KEYS/UNIQUE_KEYS/DUP_KEYS/ three storage model.
Among these three model, UNIQUE_KYES/DUP_KEYS is added after AGG_KEYS.
For historical tablet, the keys_type field to indicate storage model
may be missed for AGG_KEYS.
So upgrade from historical tablet, this situation should be taken into
consideration and set to be AGG_KEYS.
The upcoming patch will use CREATE_OR_OPEN mode
This patch also remove virtual dtors to cpp file.
* Move the dtors back to env.h
Generally, placing the dtor in an `.h` file(inline) or in a `cpp` file
depends on the trade-off between code expansion and function call overhead.
The code expansion rate is closely related to the number of class members
and the inheritance level.
For the several classes here: `Env`, `ReadableFile`, and `WritableFile`
have no members and are the top level of the inheritance hierarchy, But
for now I have no obvious evidence to prove that make their dtors inline
will cause serious code expansion and more instruction cache-misses,
even if there are thousands of `ReadableFile` objects kept being created
and released during running.
Only the Pages in the linked-list can be destructed in the
ColumnWriter dtor, but if we meet something wrong, we will
return directly, which causes a memory leak
* Unify the names of methods in `TabletManager` which do not require locks
Currently, there are several naming patterns in `TabletManager` class
for methods (mainly private methods) that needs to be executed inside the lock:
1. **`xxx_with_no_lock()`**:
The "with_no_lock" suffix has two meanings: one is not needed,
and the other is that a lock has been added externally;
2. **`xxx_unlock()`**:
"unlock" is a verb and may be mistaken for the need to unlock
a mutex in this method.
3. **`xxx_unlocked()`**:
Note that "unlocked" is an adjective that means the operation
in this method is not locked.
4. **`xxx_locked()`**:
"locked" is also an adjective, meaning that the method is locked.
This is also more likely to be misunderstood: one is already
locked externally; the other is locked internally by the method.
Actually what we really want is `xxx_already_locked`, but this way
the name is a little longer.
5. There is no identification in the method name:
the reader cannot intuitively know whether the method needs to be locked
This patch unifies all the above pattern to be `xxx_unlocked()`, and adjust
some indentation in code style.
Additionally, this patch also remove an unused `add_tablet()` method, because
a new version has already been used.
This patch doesn't contain any functional modifications.
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
Support compaction operation to compact only one rowset.
After the modification, the last rowset of the tablet will
also be compacted.
At the same time, we added a `segments_overlap_pb` field to
the rowset meta. Used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`.
Initially UNKNOWN for compatibility with existing data.
In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of last rowset
participating in compaction, to ensure that the tablet's
version hash remains unchanged after compaction.