Currently, setting a variable with the `GLOBAL` keyword does not affect the current
session's value of that variable, which often confuses users.
This CL mainly changes:
1. Also update the session variable when a global variable is set (see the sketch below)
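A minimal sketch of the intended behavior; `wait_timeout` is only an illustrative variable, the change applies to session variables in general:
```sql
-- Illustrative only: wait_timeout stands in for any session variable.
SET GLOBAL wait_timeout = 1000;

-- Previously only the global value changed; with this CL the current session
-- picks up the new value as well.
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
SHOW SESSION VARIABLES LIKE 'wait_timeout';
```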
Move the array function docs to the correct directory,
because the docs directory has already been refactored.
```
docs/en/sql-manual/sql-functions/array-functions/ ===>
docs/en/docs/sql-manual/sql-functions/array-functions
```
```
docs/zh-CN/sql-manual/sql-functions/array-functions/ ===>
docs/zh-CN/docs/sql-manual/sql-functions/array-functions/
```
1. fix all checkstyle warnings
2. change all checkstyle rules from warning to error
3. remove some Javadoc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old code
a. all Javadoc rules only affect Nereids
b. DeclarationOrder only affects Nereids
c. OverloadMethodsDeclarationOrder only affects Nereids
d. VariableDeclarationUsageDistance only affects Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
Add some logic to optimize compaction:
1. Separate base and cumulative compaction, in case base compaction runs too long and
blocks cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64M get the right level
size. When choosing rowsets for compaction, the policy now ignores big rowsets, which
reduces CPU usage by about 25% under high-frequency concurrent load.
3. Remove the skip-window restriction so a rowset can be compacted right after it is
generated, because we will not delete rowsets after compaction. This greatly reduces the
compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction; we always choose
consecutive rowsets to compact, so this check is useless.
With the changes above, the compaction score and CPU cost improve substantially under
concurrent load.
Co-authored-by: yixiutt <yixiu@selectdb.com>
* [Vectorized][Function] add orthogonal bitmap agg functions (usage sketch below)
save some files for the orthogonal bitmap functions
add some files to rebase
update the functions file
* refactor the union_count function
refactor the orthogonal union count functions
* remove bool is_variadic
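For context, a hedged usage sketch of the orthogonal bitmap aggregates; the `user_tags` table and its columns are made up for illustration:
```sql
-- orthogonal_bitmap_union_count aggregates a bitmap column and returns the
-- cardinality of the union; it is designed so the heavy work runs in the first
-- aggregation stage when the data is bucketed by the bitmap key.
SELECT orthogonal_bitmap_union_count(user_id_bitmap)
FROM user_tags
WHERE tag_id IN (100, 200);
```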
Support querying Hive tables on S3. Pass the AK/SK, region, and S3 endpoint to the Hive table when creating the external table.
Example CREATE TABLE SQL:
```
CREATE TABLE `region_s3` (
    `r_regionkey` integer NOT NULL,
    `r_name` char(25) NOT NULL,
    `r_comment` varchar(152)
)
ENGINE=HIVE
PROPERTIES (
    "database" = "default",
    "table" = "region_s3",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "AWS_ACCESS_KEY" = "YOUR_ACCESS_KEY",
    "AWS_SECRET_KEY" = "YOUR_SECRET_KEY",
    "AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
    "AWS_REGION" = "us-east-1"
);
```
At present, Doris can only access a Hadoop cluster with Kerberos authentication enabled through the broker; Doris BE itself
does not support access to a Kerberos-authenticated HDFS file.
This PR aims to solve that problem.
When creating a Hive external table, users just specify the following properties to access HDFS data with Kerberos authentication enabled:
```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
If you want to `select into outfile` to an HDFS cluster with Kerberos authentication enabled, you can refer to the following SQL statement:
```sql
select * from test into outfile "hdfs://tmp/outfile1"
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the
`Tuple/Block data` into the controller attachment and transmit it through http brpc.
This is to avoid errors when the length of the protoBuf request exceeds 2G:
`Bad request, error_text=[E1003]Fail to compress request`.
In #7164, `Tuple/Block data` was put into the attachment and sent via the default `baidu_std brpc`,
but when the attachment exceeds 2G, it is truncated. There is no 2G limit when sending via `http brpc`.
Also, #7921 considered putting `Tuple/Block data` into the attachment for transport by default, as this theoretically
saves one serialization and improves performance. However, tests found that performance did not improve,
while peak memory increased due to an additional memory copy.
Add the ntile window function.
For the non-vectorized engine, it is implemented the same way as in Impala: ntile is rewritten into row_number and count.
For the vectorized engine, it is implemented as WindowFunctionNTile.
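A brief usage sketch; the `orders` table and its columns are made up for illustration:
```sql
-- Split each user's orders into 3 buckets by order time;
-- NTILE(3) assigns bucket numbers 1..3 within each partition.
SELECT
    user_id,
    order_id,
    NTILE(3) OVER (PARTITION BY user_id ORDER BY order_time) AS bucket
FROM orders;
```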
To go along with Doris's successful graduation from the Apache Incubator, the Doris official website also needs a new look
and more powerful features, so we decided to redesign the Doris official website.
The code and documents of the new official website are included in this PR.
Since the new website is completely rewritten, the content and structure of the project are different from the previous one.
In particular, the directory structure of documents has changed, and the number of documents is large, so the number of
files in this PR is very large.
In the old website, all English documents are in the en/ directory and all Chinese documents in the zh-CN/ directory;
in the new website, the documents are split into multiple directories according to the nav.
The document directory structure changes as follows:
```
docs (old website)
|   |—— .vuepress (library)
|   |—— en
|   |   |—— admin-manual
|   |   |—— advanced
|   |   |—— article
|   |   |—— benchmark
|   |   |—— case-user
|   |   |—— community
|   |   |—— data-operate
|   |   |—— data-table
|   |   |—— design
|   |   |—— developer-guide
|   |   |—— downloads
|   |   |—— ecosystem
|   |   |—— faq
|   |   |—— get-starting
|   |   |—— install
|   |   |—— sql-manual
|   |   |—— summary
|   |   |___ README.md
|   |—— zh-CN
...
docs (new website)
|   |—— .vuepress (library)
|   |—— en
|   |   |—— community (unchanged, community nav)
|   |   |—— developer (new directory, developer nav)
|   |   |   |—— design (moved from en/design)
|   |   |   |___ developer-guide (moved from en/developer-guide)
|   |   |—— docs (new directory, all children directories moved from en/, document nav)
|   |   |   |—— admin-manual
|   |   |   |—— advanced
|   |   |   |—— benchmark
|   |   |   |—— data-operate
|   |   |   |—— data-table
|   |   |   |—— ecosystem
|   |   |   |—— faq
|   |   |   |—— get-starting
|   |   |   |—— install
|   |   |   |—— sql-manual
|   |   |   |—— summary
|   |   |—— downloads (unchanged, downloads nav)
|   |   |—— userCase (moved from en/case-user, user nav)
|   |   |___ README.md
|   |—— zh-CN
...
```
Currently, only the root user has node_priv.
That is, only the root user can add and remove nodes.
In the original design of Doris, there is an Operator role, which can hold node_priv for node operations.
This PR supports assigning node_priv to users other than root.
However, only users who have both grant_priv and node_priv can assign node_priv to other users.
This ensures that, by default, only the root user can hand out this privilege, and users who are merely given node_priv
cannot continue to spread this privilege to others.
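A hedged sketch of how granting might look after this change; the user identity is a placeholder, and the exact GRANT syntax for your version should be checked against the GRANT docs:
```sql
-- Run as root (or any user holding both GRANT_PRIV and NODE_PRIV).
-- 'op_user'@'%' is a placeholder identity for illustration.
GRANT NODE_PRIV ON *.* TO 'op_user'@'%';

-- 'op_user' can now operate on nodes, but without GRANT_PRIV it cannot
-- pass NODE_PRIV on to other users.
```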