Make database, table, column, and other names support Unicode by changing the LABEL_REGEX, COMMON_NAME_REGEX, COMMON_TABLE_NAME_REGEX, and COLUMN_NAME_REGEX regular expressions in class FeNameFormat.
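As a rough illustration of what relaxing a name regex to Unicode means, here is a minimal sketch. Both patterns are hypothetical and are not the actual FeNameFormat expressions; the length limit of 64 is also an assumption.

```python
import re

# Hypothetical ASCII-only pattern of the kind such a change would relax.
ASCII_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_-]{0,63}")

# Hypothetical Unicode-aware pattern: the first character is a letter in any
# script ([^\W\d_] = word character that is neither a digit nor '_'),
# followed by word characters or '-'.
UNICODE_NAME = re.compile(r"[^\W\d_][\w-]{0,63}")


def is_valid_name(name: str) -> bool:
    """Return True when the whole name matches the Unicode-aware pattern."""
    return UNICODE_NAME.fullmatch(name) is not None
```

With these patterns, a name such as `用户表` is rejected by `ASCII_NAME` but accepted by `is_valid_name`, while a name starting with a digit is rejected by both.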
P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.
This PR refactors the rewrite framework from memo to plan tree and speeds up the analyze/rewrite stage.
Changes:
- Abandon the memo in the analysis/rewrite stage, so that we can skip some actions, such as creating new GroupExpressions, deduplicating GroupExpressions in the memo (high cost), and updating children to GroupPlan.
- Change most rules to static rules, so that we can skip initializing lots of rules in Analyzer/Rewriter for every query. Some rules still need context, such as visitor rules; creating them at runtime keeps them easy to use, so the `custom` rule helps us create them.
- Remove the `logger` field from Job. Jobs are generated in large quantities at runtime and don't need a logger, so this saves the considerable time spent initializing loggers.
- Skip rules wherever possible, e.g. `SelectMaterializedIndexWithoutAggregate`: skip selecting the mv if the table has no rollup.
- Add caches for frequent operations, such as Job.getDisableRules and Plan.getUnboundExpression.
- Add a new bottom-up rewrite rule that can keep traversing the new plans returned by rules. This feature depends on `Plan.mutableState`, so it is necessary to add this variable field to the plan. If the plan were fully immutable, we would have to use withXxx to renew the plan and set the state for it, which costs more runtime overhead and development effort. Another reason is that we need multiple mutable states, e.g. whether a rule has been applied and whether the plan is managed by the rewrite framework. The upside of mutable state is efficiency, but I suggest we avoid using mutable state directly in rules as far as possible; if we need it, please wrap the mutable state in the framework so that it is updated and released correctly. A good example is `AppliedAwareRuleCondition`, which can update and get the state of whether a rule has been applied to this plan before.
- Merge some rules and invoke multiple rules in one traversal.
- Refactor `EliminateUnnecessaryProject` with CustomRewritor, and fix the problem where it eliminated a Project that decided the query output order; the cases are limit(project) and sort(project).
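The bottom-up rewrite with framework-managed mutable state can be sketched roughly as follows. This is a simplified illustration, not the Nereids implementation: `Plan`, `already_applied`, `rewrite_bottom_up`, and the toy rule `merge_projects` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Plan:
    name: str
    children: List["Plan"] = field(default_factory=list)
    # Framework-owned mutable state; rules should not touch it directly.
    mutable_state: dict = field(default_factory=dict)


Rule = Callable[[Plan], Optional[Plan]]


def already_applied(plan: Plan, rule_name: str) -> bool:
    """Record that rule_name visited plan; return True if it had before."""
    applied = plan.mutable_state.setdefault("applied_rules", set())
    if rule_name in applied:
        return True
    applied.add(rule_name)
    return False


def rewrite_bottom_up(plan: Plan, rules: Dict[str, Rule]) -> Plan:
    # Children first, so rules always see rewritten subtrees.
    plan.children = [rewrite_bottom_up(c, rules) for c in plan.children]
    for rule_name, rule in rules.items():
        if already_applied(plan, rule_name):
            continue  # skip nodes this rule has already seen
        new_plan = rule(plan)
        if new_plan is not None and new_plan is not plan:
            # Keep traversing the new plan returned by the rule.
            return rewrite_bottom_up(new_plan, rules)
    return plan


def merge_projects(plan: Plan) -> Optional[Plan]:
    """Toy rule: collapse Project(Project(x)) into Project(x)."""
    if plan.name == "Project" and plan.children and plan.children[0].name == "Project":
        return Plan("Project", plan.children[0].children)
    return None
```

When a rule returns a new node, only that node carries fresh state; its already-rewritten children are skipped on the second pass because their `applied_rules` set remembers the rule.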
TODO: add trace for new rewrite framework
benchmark:
legacy optimizer:
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 1.39 ms | 0 ms | 9 ms |
| SQL 2 | 1.38 ms | 0 ms | 10 ms |
| SQL 3 | 2.05 ms | 1 ms | 18 ms |
| SQL 4 | 0.89 ms | 0 ms | 9 ms |
| SQL 5 | 1.74 ms | 1 ms | 11 ms |
| SQL 6 | 2.00 ms | 1 ms | 13 ms |
| SQL 7 | 1.83 ms | 1 ms | 15 ms |
| SQL 8 | 0.92 ms | 0 ms | 7 ms |
| SQL 9 | 2.60 ms | 1 ms | 19 ms |
| SQL 10 | 3.54 ms | 2 ms | 28 ms |
| SQL 11 | 3.04 ms | 1 ms | 18 ms |
| SQL 12 | 3.26 ms | 2 ms | 16 ms |
| SQL 13 | 1.10 ms | 0 ms | 10 ms |
| SQL 14 | 2.90 ms | 1 ms | 13 ms |
| SQL 15 | 1.18 ms | 0 ms | 9 ms |
| SQL 16 | 1.05 ms | 0 ms | 13 ms |
| SQL 17 | 1.03 ms | 0 ms | 7 ms |
| SQL 18 | 0.94 ms | 0 ms | 7 ms |
| SQL 19 | 1.47 ms | 0 ms | 13 ms |
| SQL 20 | 0.47 ms | 0 ms | 4 ms |
| SQL 21 | 0.54 ms | 0 ms | 5 ms |
| SQL 22 | 3.34 ms | 1 ms | 19 ms |
| SQL 23 | 7.97 ms | 4 ms | 44 ms |
| SQL 24 | 11.11 ms | 7 ms | 28 ms |
| SQL 25 | 0.98 ms | 0 ms | 8 ms |
| SQL 26 | 0.83 ms | 0 ms | 7 ms |
| SQL 27 | 0.93 ms | 0 ms | 16 ms |
| SQL 28 | 2.19 ms | 1 ms | 18 ms |
| SQL 29 | 3.23 ms | 1 ms | 20 ms |
| SQL 30 | 59.99 ms | 51 ms | 81 ms |
| SQL 31 | 2.65 ms | 1 ms | 18 ms |
| SQL 32 | 2.47 ms | 1 ms | 17 ms |
| SQL 33 | 2.30 ms | 1 ms | 16 ms |
| SQL 34 | 0.66 ms | 0 ms | 8 ms |
| SQL 35 | 0.63 ms | 0 ms | 6 ms |
| SQL 36 | 2.25 ms | 1 ms | 15 ms |
| SQL 37 | 5.97 ms | 3 ms | 20 ms |
| SQL 38 | 5.73 ms | 3 ms | 21 ms |
| SQL 39 | 6.32 ms | 4 ms | 23 ms |
| SQL 40 | 8.61 ms | 5 ms | 35 ms |
| SQL 41 | 6.29 ms | 4 ms | 28 ms |
| SQL 42 | 6.04 ms | 4 ms | 15 ms |
| SQL 43 | 5.81 ms | 3 ms | 16 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 4.22 ms | 2.47 ms | 17.05 ms |
| TOTAL SUM | 181.62 ms | 106 ms | 733 ms |
+-----------+---------------+---------------+---------------+
```
nereids with memo rewrite framework(old):
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 3.61 ms | 1 ms | 20 ms |
| SQL 2 | 3.47 ms | 2 ms | 16 ms |
| SQL 3 | 3.27 ms | 1 ms | 18 ms |
| SQL 4 | 2.23 ms | 1 ms | 12 ms |
| SQL 5 | 3.60 ms | 1 ms | 20 ms |
| SQL 6 | 2.73 ms | 1 ms | 17 ms |
| SQL 7 | 3.04 ms | 1 ms | 23 ms |
| SQL 8 | 3.53 ms | 2 ms | 20 ms |
| SQL 9 | 3.74 ms | 2 ms | 22 ms |
| SQL 10 | 3.66 ms | 2 ms | 18 ms |
| SQL 11 | 3.93 ms | 2 ms | 15 ms |
| SQL 12 | 4.85 ms | 2 ms | 27 ms |
| SQL 13 | 4.41 ms | 2 ms | 28 ms |
| SQL 14 | 5.16 ms | 2 ms | 41 ms |
| SQL 15 | 4.33 ms | 2 ms | 33 ms |
| SQL 16 | 4.94 ms | 2 ms | 51 ms |
| SQL 17 | 3.27 ms | 1 ms | 25 ms |
| SQL 18 | 2.78 ms | 1 ms | 22 ms |
| SQL 19 | 3.51 ms | 1 ms | 42 ms |
| SQL 20 | 1.84 ms | 1 ms | 13 ms |
| SQL 21 | 3.47 ms | 1 ms | 66 ms |
| SQL 22 | 5.21 ms | 2 ms | 29 ms |
| SQL 23 | 5.55 ms | 3 ms | 25 ms |
| SQL 24 | 4.21 ms | 2 ms | 28 ms |
| SQL 25 | 3.47 ms | 1 ms | 23 ms |
| SQL 26 | 3.03 ms | 2 ms | 21 ms |
| SQL 27 | 3.07 ms | 1 ms | 17 ms |
| SQL 28 | 4.51 ms | 3 ms | 22 ms |
| SQL 29 | 4.97 ms | 3 ms | 21 ms |
| SQL 30 | 11.95 ms | 8 ms | 33 ms |
| SQL 31 | 3.92 ms | 2 ms | 23 ms |
| SQL 32 | 3.74 ms | 2 ms | 15 ms |
| SQL 33 | 3.62 ms | 2 ms | 22 ms |
| SQL 34 | 4.60 ms | 1 ms | 55 ms |
| SQL 35 | 3.47 ms | 2 ms | 25 ms |
| SQL 36 | 3.34 ms | 2 ms | 18 ms |
| SQL 37 | 4.77 ms | 2 ms | 23 ms |
| SQL 38 | 4.44 ms | 2 ms | 39 ms |
| SQL 39 | 4.52 ms | 2 ms | 23 ms |
| SQL 40 | 5.50 ms | 3 ms | 30 ms |
| SQL 41 | 5.01 ms | 2 ms | 24 ms |
| SQL 42 | 4.32 ms | 2 ms | 24 ms |
| SQL 43 | 4.29 ms | 2 ms | 42 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 4.11 ms | 1.91 ms | 26.30 ms |
| TOTAL SUM | 176.88 ms | 82 ms | 1131 ms |
+-----------+---------------+---------------+---------------+
```
nereids with plan tree rewrite framework(new):
```
+-----------+---------------+---------------+---------------+
| SQL ID | avg | min | max |
+-----------+---------------+---------------+---------------+
| SQL 1 | 3.21 ms | 1 ms | 18 ms |
| SQL 2 | 3.99 ms | 1 ms | 76 ms |
| SQL 3 | 2.93 ms | 1 ms | 21 ms |
| SQL 4 | 2.13 ms | 1 ms | 21 ms |
| SQL 5 | 2.43 ms | 1 ms | 30 ms |
| SQL 6 | 2.08 ms | 1 ms | 11 ms |
| SQL 7 | 2.03 ms | 1 ms | 11 ms |
| SQL 8 | 2.27 ms | 1 ms | 22 ms |
| SQL 9 | 2.42 ms | 1 ms | 16 ms |
| SQL 10 | 2.65 ms | 1 ms | 14 ms |
| SQL 11 | 2.78 ms | 1 ms | 14 ms |
| SQL 12 | 3.09 ms | 1 ms | 19 ms |
| SQL 13 | 2.33 ms | 1 ms | 13 ms |
| SQL 14 | 2.66 ms | 1 ms | 16 ms |
| SQL 15 | 2.34 ms | 1 ms | 15 ms |
| SQL 16 | 2.04 ms | 1 ms | 30 ms |
| SQL 17 | 2.09 ms | 1 ms | 17 ms |
| SQL 18 | 1.87 ms | 1 ms | 15 ms |
| SQL 19 | 2.21 ms | 1 ms | 50 ms |
| SQL 20 | 1.32 ms | 0 ms | 12 ms |
| SQL 21 | 1.63 ms | 1 ms | 11 ms |
| SQL 22 | 2.75 ms | 1 ms | 30 ms |
| SQL 23 | 3.44 ms | 2 ms | 17 ms |
| SQL 24 | 2.01 ms | 1 ms | 14 ms |
| SQL 25 | 1.58 ms | 1 ms | 11 ms |
| SQL 26 | 1.53 ms | 0 ms | 13 ms |
| SQL 27 | 1.62 ms | 1 ms | 12 ms |
| SQL 28 | 2.90 ms | 1 ms | 21 ms |
| SQL 29 | 3.04 ms | 2 ms | 17 ms |
| SQL 30 | 10.54 ms | 7 ms | 49 ms |
| SQL 31 | 2.61 ms | 1 ms | 21 ms |
| SQL 32 | 2.42 ms | 1 ms | 14 ms |
| SQL 33 | 2.13 ms | 1 ms | 14 ms |
| SQL 34 | 1.69 ms | 1 ms | 14 ms |
| SQL 35 | 1.87 ms | 1 ms | 15 ms |
| SQL 36 | 2.37 ms | 1 ms | 21 ms |
| SQL 37 | 3.06 ms | 1 ms | 15 ms |
| SQL 38 | 4.09 ms | 1 ms | 31 ms |
| SQL 39 | 5.81 ms | 2 ms | 43 ms |
| SQL 40 | 4.55 ms | 2 ms | 34 ms |
| SQL 41 | 3.49 ms | 1 ms | 20 ms |
| SQL 42 | 2.75 ms | 1 ms | 26 ms |
| SQL 43 | 2.81 ms | 1 ms | 14 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG | 2.78 ms | 1.19 ms | 21.35 ms |
| TOTAL SUM | 119.56 ms | 51 ms | 918 ms |
+-----------+---------------+---------------+---------------+
```
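For a quick reading of the three tables, the relative reduction in the `TOTAL SUM` average column can be computed as below. The numbers are copied from the tables above; the helper name `reduction` is only for illustration.

```python
# TOTAL SUM (avg column) in milliseconds, copied from the three tables above.
totals_ms = {
    "legacy": 181.62,
    "nereids_memo": 176.88,
    "nereids_plan_tree": 119.56,
}


def reduction(before: float, after: float) -> float:
    """Fraction of time saved when going from `before` to `after`."""
    return (before - after) / before


memo_to_tree = reduction(totals_ms["nereids_memo"], totals_ms["nereids_plan_tree"])
legacy_to_tree = reduction(totals_ms["legacy"], totals_ms["nereids_plan_tree"])
print(f"vs. memo framework: {memo_to_tree:.1%}, vs. legacy: {legacy_to_tree:.1%}")
```

On these totals the plan tree framework takes roughly a third less time than either the memo framework or the legacy optimizer.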
1. Make sure all subtypes supported by STRUCT work correctly;
2. Remove the unused variable `_need_validate_data`;
3. Lazily initialize the min and max decimal values to support validating nested DecimalV2 columns;
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Hive stores all data with missing partition column values in a default partition named __HIVE_DEFAULT_PARTITION__.
Doris fails to read this partition when the partition column type is INT or any other type that
__HIVE_DEFAULT_PARTITION__ cannot be converted to.
This PR supports the Hive default partition by setting the value of the missing partition columns to NULL.
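The idea can be sketched as a small conversion helper. This is only an illustration, not the Doris code: `partition_value` and its handling of a single `INT` type are assumptions.

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_value(raw: str, column_type: str):
    """Convert a partition path value to a typed column value.

    The Hive default partition name is mapped to None (NULL) instead of
    being parsed, so a conversion such as int(raw) never sees it.
    """
    if raw == HIVE_DEFAULT_PARTITION:
        return None
    if column_type == "INT":
        return int(raw)
    return raw  # other types are left as strings in this sketch
```

Without the first check, `int("__HIVE_DEFAULT_PARTITION__")` would raise, which mirrors the failure described above.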
Loading a big local file causes the `[INTERNAL_ERROR]too many filtered rows` issue, because the byte buffer from the MySQL client always reuses the same byte array.
Later bytes overwrite the earlier ones, so the bytes reach the network in the wrong order.
The fix is to copy the byte array before filling it into the network.
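The aliasing problem and the fix can be reproduced in a few lines. This is a minimal sketch of the general pattern, not the Doris code; `forward_packets` and the 4-byte buffer are made up for the illustration.

```python
def forward_packets(packets, copy_before_send: bool):
    """Queue packets for sending while reading them through one shared buffer."""
    shared = bytearray(4)  # one buffer reused for every packet
    outgoing = []          # chunks queued for the network, sent later
    for packet in packets:
        shared[:] = packet  # the next read overwrites the same memory
        # Without a copy, every queued entry aliases `shared`.
        outgoing.append(bytes(shared) if copy_before_send else shared)
    return [bytes(chunk) for chunk in outgoing]
```

Calling `forward_packets([b"AAAA", b"BBBB"], copy_before_send=False)` yields the second packet twice, because both queued entries point at the same buffer; with `copy_before_send=True` the original order and content are preserved.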
* disable setting storage policy on MoW table
* fix error in regression test
* make the name of test table unique
* use Strings.isNullOrEmpty to replace equals
* fix error in if statement
* Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis
* Replace simple json with jackson, resolve column order random problem
* Add es array doc version
Enhance the aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param,
which limits the number of elements in the result array.
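The intended semantics can be sketched in plain Python. This is an illustration under assumptions, not the Doris implementation: here a negative `max_size` means no limit and the first elements encountered are kept, neither of which is stated above.

```python
def collect_list(values, max_size: int = -1):
    """Collect values into a list, keeping at most max_size elements."""
    result = []
    for value in values:
        if 0 <= max_size <= len(result):
            break  # limit reached
        result.append(value)
    return result


def collect_set(values, max_size: int = -1):
    """Collect distinct values, keeping at most max_size elements."""
    result = []
    seen = set()
    for value in values:
        if 0 <= max_size <= len(result):
            break  # limit reached
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result
```

For example, `collect_set([1, 1, 2, 3], max_size=2)` returns two distinct elements, while omitting `max_size` returns all three.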
Demo:
```
# HELP doris_fe_mtmv_job Total job number of mtmv.
# TYPE doris_fe_mtmv_job gauge
doris_fe_mtmv_job{type="TOTAL-JOB"} 1
doris_fe_mtmv_job{type="ACTIVE-JOB"} 1
# HELP doris_fe_mtmv_task Running task number of mtmv.
# TYPE doris_fe_mtmv_task gauge
doris_fe_mtmv_task{type="RUNNING-TASK"} 0
doris_fe_mtmv_task{type="PENDING-TASK"} 0
doris_fe_mtmv_task{type="FAILED-TASK"} 0
doris_fe_mtmv_task{type="TOTAL-TASK"} 1
```
In emitCsgCmp, we should check whether any missed edges should be used as connection edges. If there is a missed edge that can't be used as a connection edge, emitCsgCmp should return and look for another plan.
Add the use_fix_replica session variable, so that we can better debug replica inconsistency problems.
The default value of use_fix_replica is -1, which means no replica is fixed;
otherwise we choose the {use_fix_replica}-th smallest replica.
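A rough sketch of the selection logic follows. Several details are assumptions and are not confirmed by the description above: that replicas are ordered by id, that the index is 0-based, and that an out-of-range value falls back to normal selection. `choose_fixed_replica` is a hypothetical name.

```python
def choose_fixed_replica(replica_ids, use_fix_replica: int = -1):
    """Return the replica id to pin queries to, or None for normal selection."""
    if use_fix_replica == -1:
        return None  # default: do not fix the replica
    ordered = sorted(replica_ids)  # assumption: order replicas by id
    if not 0 <= use_fix_replica < len(ordered):
        return None  # assumption: out-of-range falls back to normal selection
    return ordered[use_fix_replica]  # assumption: 0-based index
```

Pinning every query to the same replica makes results reproducible, which is what helps when comparing replicas that may have diverged.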
function pushdown: #10355
NGram BloomFilter Index apply like pushdown: #11579
Enabled by default, make sure it stays active.
If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.
[WARNING:gensrc/thrift/parquet.thrift:22] Uncaptured doctext at on line 18.
[WARNING:gensrc/thrift/parquet.thrift:23] Uncaptured doctext at on line 22.
[WARNING:gensrc/thrift/parquet.thrift:436] Uncaptured doctext at on line 428.
WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).
This can impact web performance
WARNING in entrypoint size limit: The following entrypoint(s) combined asset size exceeds the recommended limit
Warning : Macro "NonTerminator" has been declared but never used.
When a stream load uses 2PC and the table is dropped before commit, both commit and abort return an error, so the transaction cannot finish.
Committing or aborting returns this error:
{
"status": "ANALYSIS_ERROR",
"msg": "errCode = 7, detailMessage = unknown table, tableId=52579"
}
After this PR, the abort succeeds.
The Oracle data type `NUMBER(p,s)` differs semantically from the Doris decimal type.
For the Oracle `NUMBER(p,s)` type:
1. If s<0, the value is an integer. This `NUMBER(p,s)` has (p+|s|) significant digits, and rounding is performed at position s. E.g. if we insert 1234567 into `NUMBER(5,-2)`, Oracle stores 1234600. In this case, Doris uses an int type (`TINYINT/SMALLINT/INT/.../LARGEINT`).
2. If s>=0 && s<p, it behaves just like Doris `Decimal(p,s)`.
3. If s>=0 && s>p, the value is a decimal fraction (like 0.xxxxx). p is the number of significant digits kept after the decimal point, and digits beyond position s are rounded. E.g. we cannot insert 0.0123456 into `NUMBER(5,7)`, because there must be two zeros immediately to the right of the decimal point, but we can insert 0.0012345 into `NUMBER(5,7)`. In this case, Doris uses `DECIMAL(s,s)`.
4. If p and s are not specified, as in plain `NUMBER`, the p and s of `NUMBER` are uncertain. In this case, Doris cannot determine p and s, so Doris cannot determine the data type.
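The four cases above can be sketched as a mapping function. This is an illustration, not the Doris code: the function name and the digit thresholds used to pick an integer type are assumptions, chosen so that every value with that many digits fits the type.

```python
# Assumed thresholds: the largest digit count that always fits each type.
INT_TYPES = [(2, "TINYINT"), (4, "SMALLINT"), (9, "INT"), (18, "BIGINT"), (38, "LARGEINT")]


def oracle_number_to_doris(p=None, s=None):
    """Map Oracle NUMBER(p,s) to a Doris type name, or None if undetermined."""
    if p is None or s is None:
        return None  # case 4: plain NUMBER, p and s are uncertain
    if s < 0:
        digits = p + abs(s)  # case 1: integer with p+|s| significant digits
        for max_digits, type_name in INT_TYPES:
            if digits <= max_digits:
                return type_name
        return None
    if s <= p:
        # case 2 (s == p yields the same result under either rule)
        return f"DECIMAL({p},{s})"
    return f"DECIMAL({s},{s})"  # case 3: s > p, values like 0.00xxxxx
```

For instance, `NUMBER(5,-2)` has 7 significant digits and maps to `INT` under these thresholds, and `NUMBER(5,7)` maps to `DECIMAL(7,7)`.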
1. Enhancement:
For a single-character column separator, csv_reader uses another method to split values.
2. BugFix:
Set `json` file format loading to be sensitive.
Support parsing map and struct types in the parquet and orc readers.
## Remaining Problems
1. Doris uses array types to build the key and value columns of a `map`, but doesn't fill the offsets in the value column, so those offsets are wasted.
2. Parquet supports reading only the key or only the value column of a `map`; this PR doesn't support that yet.
3. Parquet supports reading partial columns of a `struct`; this PR doesn't support that yet.
This PR mainly changes:
When upgrading from an old version to master, the ADMIN_PRIV of a normal user may be lost.
This may only happen if:
1. A user is created with the ADMIN_PRIV privilege.
2. Doris is upgraded to v1.2.x or master before the meta image containing the edit log from step 1 is generated.
Then the ADMIN_PRIV will be lost in Global Privileges.
This PR fixes this bug and sets ADMIN_PRIV in the right place.
Refactor the user's implicit role name
In [feature](auth)Implementing privilege management with rbac model #16091, we refactored the Doris auth model by introducing RBAC. Each user has an implicit role
whose name has the prefix default_role_rbac_. But the name has a wrong format, like:
default_role_rbac_'default_cluster:user1'@'%'
This PR changes the role name's format to, e.g.:
default_role_rbac_user1@%
default_role_rbac_user2@[domain]
NOTICE: this change may cause incompatible metadata, but since [feature](auth)Implementing privilege management with rbac model #16091 is not released yet, we should fix it soon.
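The new naming can be sketched as a small formatter. This is only an illustration of the format shown in the examples above; `implicit_role_name`, its parameters, and the handling of the `default_cluster:` prefix are assumptions, not the Doris code.

```python
ROLE_PREFIX = "default_role_rbac_"
CLUSTER_PREFIX = "default_cluster:"


def implicit_role_name(user: str, host: str, is_domain: bool = False) -> str:
    """Build the implicit role name, e.g. default_role_rbac_user1@%."""
    if user.startswith(CLUSTER_PREFIX):
        user = user[len(CLUSTER_PREFIX):]  # drop the cluster qualifier
    host_part = f"[{host}]" if is_domain else host  # domains are bracketed
    return f"{ROLE_PREFIX}{user}@{host_part}"
```

Compared with the old format, the result has no quotes and no cluster qualifier, so `implicit_role_name("default_cluster:user1", "%")` gives `default_role_rbac_user1@%`.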
Add a new session variable show_user_default_role
When set to true, the implicit roles of users are shown in the result of the show roles stmt. Default is false.