This commit adds a new statement, ALTER VIEW, for example:
ALTER VIEW view_name
(
col_1,
col_2,
col_3
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1, k2
Support a compaction operation that compacts only one rowset.
After this modification, the last rowset of a tablet can also
be compacted.
At the same time, we added a `segments_overlap_pb` field to
the rowset meta, used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`
and is initially UNKNOWN for compatibility with existing data.
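For illustration only, the flag and the rule a rowset writer might use to set it could look like the following sketch; all names here are hypothetical, not the actual Doris definitions:

```cpp
// Illustrative sketch only; the real field/enum names in Doris may differ.
enum SegmentsOverlap {
    OVERLAP_UNKNOWN = 0,   // legacy rowsets written before the field existed
    OVERLAPPING = 1,
    NONOVERLAPPING = 2,
};

// Hypothetical rule: a rowset with at most one segment, or whose segments
// came from a single sorted write, cannot have overlapping key ranges.
SegmentsOverlap decide_overlap(int num_segments, bool sorted_write) {
    if (num_segments <= 1 || sorted_write) {
        return NONOVERLAPPING;
    }
    return OVERLAPPING;
}
```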
In addition, the version hash of the rowset generated by
compaction is directly set to the version hash of the last rowset
participating in the compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
To solve issue #2246, the scheme is as follows:
1. Add an optional `preferred_rowset_type` field in TabletMeta for V2-format rollup index tablets.
2. Add a boolean session variable `use_v2_rollup`; if set to true, the query will use the V2 storage format rollup index to process the query (a sketch of this check follows the list).
3. Test queries will be sent to the online service to verify the correctness of segment-v2, by sending the same query to the FE with and without `use_v2_rollup` set and checking whether the returned results are the same.
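For illustration, the index-selection decision might look like the sketch below; the names mirror the ones above, but the code is hypothetical (the real logic lives in the FE, in Java):

```cpp
// Sketch: choose the V2-format rollup index only when the session asks for
// it and the tablet meta records a preferred (beta/V2) rowset type.
enum RowsetType { ALPHA_ROWSET = 0, BETA_ROWSET = 1 };

struct TabletMetaView {
    bool has_preferred_rowset_type = false;
    RowsetType preferred_rowset_type = ALPHA_ROWSET;
};

bool use_v2_rollup_index(const TabletMetaView& meta, bool session_use_v2_rollup) {
    return session_use_v2_rollup
        && meta.has_preferred_rowset_type
        && meta.preferred_rowset_type == BETA_ROWSET;
}
```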
When there are too many segments in one rowset, more than the
BE config 'max_cumulative_compaction_num_singleton_deltas', the
cumulative compaction will not work and will just increase the
cumulative point, because only one rowset gets selected.
So when selecting rowsets for cumulative compaction, we should meet 2
requirements before finishing the selection logic (see the sketch after this list):
1. the compaction score is larger than 'max_cumulative_compaction_num_singleton_deltas'
2. at least 2 rowsets are selected.
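A hedged sketch of such a selection loop, with hypothetical names rather than the exact Doris implementation:

```cpp
#include <vector>

struct Rowset {
    int compaction_score;  // e.g. number of segments ("singleton deltas")
};

// Keep taking candidate rowsets until BOTH requirements hold, so a rowset
// with many segments is compacted together with at least one neighbor
// instead of being skipped.
std::vector<Rowset*> pick_for_cumulative(const std::vector<Rowset*>& candidates,
                                         int max_singleton_deltas) {
    std::vector<Rowset*> picked;
    int score = 0;
    for (Rowset* rs : candidates) {
        picked.push_back(rs);
        score += rs->compaction_score;
        if (score > max_singleton_deltas && picked.size() >= 2) {
            return picked;  // both requirements satisfied
        }
    }
    return {};  // requirements not met; do not advance the cumulative point
}
```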
The current compaction selection strategy and cumulative point update logic
will cause the cumulative compaction to not work, and all compaction tasks
will be completed only by the base compaction. This can cause a large number
of data versions to pile up.
In the current cumulative point update logic, when a cumulative compaction
cannot select enough rowsets, it directly increases the cumulative point.
Therefore, when data versions are generated at the same speed as the
cumulative compaction polling, the cumulative point will continuously increase
without a cumulative compaction ever being triggered.
The new strategy mainly modifies the update logic of the cumulative point to ensure
that the above problem does not occur. At the same time, the new strategy also
takes into account the case where compaction cannot be performed if the cumulative
point stagnates for a long time: the cumulative point will be forced to increase
through a threshold setting, to ensure that compaction has a chance to execute
(see the sketch below).
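The gist of the new update rule, as a hedged sketch (names and threshold handling are hypothetical):

```cpp
#include <cstdint>

// Only advance the cumulative point when a compaction actually consumed
// rowsets; a point stuck for too many polling rounds is force-advanced so
// those rowsets eventually fall to base compaction.
struct CumulativeState {
    int64_t point = 0;
    int64_t rounds_without_progress = 0;
};

void update_cumulative_point(CumulativeState* st, bool compaction_ran,
                             int64_t point_after_compaction,
                             int64_t stagnation_threshold) {
    if (compaction_ran) {
        st->point = point_after_compaction;  // normal promotion
        st->rounds_without_progress = 0;
    } else if (++st->rounds_without_progress >= stagnation_threshold) {
        st->point += 1;  // forced promotion to avoid long-term stagnation
        st->rounds_without_progress = 0;
    }
    // otherwise: keep the point unchanged instead of blindly increasing it
}
```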
Also add a new HTTP API to view the compaction status of a specified tablet.
See `compaction-action.md` for details.
The control framework is implemented through heartbeat messages, using a uint64_t as a set of flags to control different functions.
We now add a flag to set the default rowset type to beta.
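For illustration, the flag word might be handled like this (a sketch; the flag name and bit position are ours, not the actual Doris constants):

```cpp
#include <cstdint>

// Each bit of the 64-bit heartbeat flag word toggles one function.
constexpr uint64_t FLAG_SET_DEFAULT_ROWSET_TYPE_TO_BETA = 1ULL << 0;

void set_default_rowset_type_to_beta();  // provided by the storage engine

void handle_heartbeat_flags(uint64_t flags) {
    if (flags & FLAG_SET_DEFAULT_ROWSET_TYPE_TO_BETA) {
        // switch newly created tablets to the beta (v2) rowset format
        set_default_rowset_type_to_beta();
    }
}
```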
**Authorization checking logic**
There are some problems with the current password and permission checking logic. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`
And then 'cmy' can log in with password '12345' from any host.
Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`
Because "192.168.%" has a higher priority in the permission table than "%". So when "cmy" try
to login in by password "12345" from host "192.168.1.1", it should match the second permission
entry, and will be rejected because of invalid password.
But in current implementation, Doris will continue to check password on first entry, than let it pass. So we should change it.
**Permission checking logic**
After a user logs in, it should have a unique identity, which is obtained from the permission table. For example,
when "cmy" logs in from host "192.168.1.1", its identity should be `cmy@"192.168.%"`. And Doris
should use this identity to check other permissions, not the user's real identity, which is
`cmy@"192.168.1.1"`.
**Black list**
Functionally speaking, Doris only supports a WHITE LIST, which allows users to log in from
the hosts in the white list. But in some cases, we do need a BLACK LIST function.
Fortunately, with the logic change described above, we can simulate the effect of a BLACK LIST.
For example, First we add a user by:
`create user cmy@'%' identified by '12345';`
And now user 'cmy' can log in from any host. If we don't want 'cmy' to log in from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`
Because "A" has a higher priority in the permission table than "%". If 'cmy' try to login from A using password '12345', it will be rejected.
This variable is mainly for the INSERT operation, because an INSERT has both a query part and a load part.
Using only the exec_mem_limit variable does not distinguish the memory limits of the two parts well.
This commit will add a new sql mode named MODE_PIPES_AS_CONCAT:
Description:
1. If this mode is active, '||' is handled differently from the original behavior (where '||' and 'OR' are treated as the same symbol in Doris): it concatenates two expressions and returns a new string. For example, 'a' || 'b' = 'ab' and 1 || 0 = '10'.
2. Users can activate this mode with "SET sql_mode = PIPES_AS_CONCAT", and deactivate it with "SET sql_mode = '' ".
1. Upgrade log4j to 2.12.1.
2. Add 2 new FE configs:
'sys_log_delete_age', default '7d', for sys logs;
'audit_log_delete_age', default '30d', for audit logs.
This means that if a log's last modification time is more than 7/30 days ago, it will be deleted (see the sketch below).
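The age check amounts to comparing a file's last modification time against the configured age; a minimal sketch (in C++ for illustration, though the FE itself is Java):

```cpp
#include <chrono>
#include <filesystem>

// Delete a log file once its last modification time is older than the
// configured age, e.g. 7 days for sys logs, 30 days for audit logs.
bool should_delete(const std::filesystem::path& log,
                   std::chrono::hours delete_age) {
    auto mtime = std::filesystem::last_write_time(log);
    auto now = std::filesystem::file_time_type::clock::now();
    return now - mtime > delete_age;
}
// e.g. '7d' corresponds to std::chrono::hours(24 * 7)
```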
Some users have the requirement that only some of the columns be updated in
one load operation, while the others retain their original values. However, Doris
can't handle this situation, because users must specify a value for every
column. So if a column's aggregation method is REPLACE, users must query the
original value in order to overwrite it, which often requires extra work.
If this CL is applied, users can use REPLACE_IF_NOT_NULL instead of
REPLACE. Then, when loading data into the table, if a user doesn't intend to
change the value of a column, they can specify NULL for that column and Doris
will retain its original value.
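The aggregation rule is essentially the following (a sketch; the storage engine's actual cell types differ):

```cpp
#include <optional>
#include <string>

// REPLACE: always take the newly loaded value.
// REPLACE_IF_NOT_NULL: take it only when the loaded value is not NULL,
// otherwise keep whatever the column already holds.
void replace_if_not_null(std::optional<std::string>* current,
                         const std::optional<std::string>& loaded) {
    if (loaded.has_value()) {
        *current = loaded;
    }
    // loaded is NULL: retain the original value
}
```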
At present, we do not support a SQL MODE similar to MySQL's. In MySQL, SQL MODE is stored in the global and session scopes as a 64-bit value, where each bit represents a mode state. Besides, MySQL supports combined modes composed of several modes.
We should support SQL MODE to deal with SQL dialect problems. We can follow the MySQL approach: store SQL MODE in the session as a bit set and parse it into a string when we need to return it to the client.
This commit suggests a solution to support SQL MODE. But it's just a sample, and the mode types in SqlModeHelper.java are not really meaningful for now.
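The representation is just a 64-bit flag set, along these lines (a sketch in C++ to show the idea; the real constants live in SqlModeHelper.java, and MODE_ANSI_QUOTES/MODE_ANSI here are borrowed from MySQL purely as examples):

```cpp
#include <cstdint>
#include <string>

// Each mode is one bit; a combined mode is the OR of several bits.
constexpr uint64_t MODE_PIPES_AS_CONCAT = 1ULL << 0;
constexpr uint64_t MODE_ANSI_QUOTES     = 1ULL << 1;
constexpr uint64_t MODE_ANSI = MODE_PIPES_AS_CONCAT | MODE_ANSI_QUOTES;

bool mode_active(uint64_t sql_mode, uint64_t flag) {
    return (sql_mode & flag) != 0;
}

// Decode the bit set back to the string form returned to the client.
std::string decode(uint64_t sql_mode) {
    std::string s;
    if (mode_active(sql_mode, MODE_PIPES_AS_CONCAT)) s += "PIPES_AS_CONCAT,";
    if (mode_active(sql_mode, MODE_ANSI_QUOTES))     s += "ANSI_QUOTES,";
    if (!s.empty()) s.pop_back();  // drop the trailing comma
    return s;
}
```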
Mainly fix the following issues:
1. A null pointer exception is raised when a database or table is dropped. The expected behavior is that the routine load job is stopped.
2. Memory leaks. Batch routine load task submission is no longer performed; the modification submits each task separately.
3. Unreasonable task timeout.
Routine load tasks should not be queued in the BE thread pool for execution. A task sent to a BE should be executed immediately; otherwise the task will first time out on the FE side, eventually leading to constant timeouts for all subsequent tasks.
4. All routine load jobs should be scheduled as soon as they are submitted, rather than waiting for an available BE slot. Otherwise, jobs submitted later may never be scheduled.
Add a new type: Object. Currently, it's mainly for complex aggregate metrics (HLL, Bitmap).
The Object type has the following constraints:
1. The Object type cannot be used as a key column type.
2. The Object type doesn't support any of the indices (BloomFilter, short key, zone map, inverted index).
3. The Object type doesn't support filtering or GROUP BY.
In the implementation:
The Object type reuses StringValue and StringVal, because in the storage engine the Object type is binary: it has a pointer and a length.
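Since the engine treats an Object as opaque binary, reusing the string value layout is natural; a sketch (the struct name here is ours, the real code reuses StringValue/StringVal):

```cpp
#include <cstddef>

// StringValue-style layout reused for Object: the engine only needs a
// pointer to the serialized bytes (e.g. an HLL or bitmap) and their length.
// It never interprets the content, which is why key columns, indices,
// filters and GROUP BY are all disallowed for this type.
struct ObjectValue {
    const char* ptr = nullptr;  // serialized HLL / bitmap bytes
    size_t len = 0;
};
```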
ISSUE-2069: This kind of query could get stuck.
The sender failed to send the last packet to the receiver.
Also, the failure is not reported to the FE, so the query is not cancelled.
The error log looks like "body_size=xxxx from xxx:xxx is too large".
The root cause is that a packet of the query is bigger than brpc's max_body_size.
This commit adds a config named brpc_max_body_size which is used to change brpc's max_body_size (see the sketch below).
Users can also change max_body_size directly on the fly via "http://host:brpc_port/flags".
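Applying the config roughly amounts to overriding brpc's `max_body_size` gflag at startup; a hedged sketch, assuming gflags' `SetCommandLineOption` (the function name below is ours):

```cpp
#include <cstdint>
#include <string>
#include <gflags/gflags.h>

// Propagate the BE config to brpc's gflag so larger query packets are
// accepted; brpc consults max_body_size when parsing incoming messages.
void apply_brpc_max_body_size(int64_t brpc_max_body_size) {
    google::SetCommandLineOption("max_body_size",
                                 std::to_string(brpc_max_body_size).c_str());
}
```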