Commit Graph

4909 Commits

Author SHA1 Message Date
dad953bc08 [doc](website)fix SSR bug and add algolia search (#10178)
* fix ssr bug and add algolia search
2022-06-16 14:25:46 +08:00
3f9436c6a8 [compile]fix simdjson compile flags (#10054) 2022-06-16 11:28:51 +08:00
28e8effc52 [Refactor] Refactor vectorized scan node (#9968) 2022-06-16 11:10:56 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
Pxl
3b6451273b [regression test]fix test_outfile to use user regression conf (#10123) 2022-06-16 10:58:36 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
90f229c038 [refactor] remove useless plugin test code (#10061)
* remove plugin test code

* remove plugin test

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-16 10:43:28 +08:00
bc431f2806 [typo] Fix typos in comments (#10142) 2022-06-16 10:13:59 +08:00
9217223cc5 [doc] update sequence en and zh-CN doc. (#10164)
* update sequence en and zh-CN doc.
2022-06-16 09:32:52 +08:00
dff1f09406 [doc](website)update Chinese home page text (#10168)
update Chinese home page text
2022-06-16 08:04:21 +08:00
ca88f258d9 [improvement] remove unused codes and docs for SHOW USER (#10107)
* remove unused codes and docs for `SHOW USER`
2022-06-15 21:49:08 +08:00
4dfebb9852 [Feature] compaction quickly for small data import (#9804)
* compaction quickly for small data import #9791
1. Merge small versions of rowsets as soon as possible to increase the import frequency of small-version data.
2. A "small" version is one whose row count is less than config::small_compaction_rowset_rows (default 1000). A sketch of the selection idea follows.
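A minimal sketch of the selection idea, assuming hypothetical names (`Rowset`, `pickSmallRowsets`); the real logic lives in the Doris BE (C++), so this Java sketch only illustrates the policy:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the "compact small versions quickly" idea.
// Rowset and the method names are illustrative, not the Doris BE code.
public class SmallCompactionPicker {
    record Rowset(long version, long numRows) {}

    // Mirrors config::small_compaction_rowset_rows (default 1000).
    static final long SMALL_COMPACTION_ROWSET_ROWS = 1000;

    // Pick consecutive small rowsets so they can be merged as soon as possible,
    // keeping the number of tiny versions low and allowing a higher import frequency.
    static List<Rowset> pickSmallRowsets(List<Rowset> candidates) {
        List<Rowset> picked = new ArrayList<>();
        for (Rowset rs : candidates) {
            if (rs.numRows() < SMALL_COMPACTION_ROWSET_ROWS) {
                picked.add(rs);
            } else if (!picked.isEmpty()) {
                break; // stop at the first large rowset so picked versions stay consecutive
            }
        }
        return picked.size() >= 2 ? picked : List.of(); // need at least two to merge
    }
}
```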
2022-06-15 21:48:34 +08:00
c4871fb306 [doc](website)remove translate warning from Chinese docs (#10157)
* modify home page text
2022-06-15 18:17:37 +08:00
4005b34a52 [doc] add tpc-h benchmark (#10150)
[doc] add tpc-h benchmark
2022-06-15 16:43:10 +08:00
49f4437396 [fix] Fix disk used pct to only consider the data used by Doris (#9705) 2022-06-15 16:28:56 +08:00
f1d0c231b9 [Opt][Vectorized] Opt vectorized the unique_table in storage vectorized (#10132)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-15 15:32:15 +08:00
606c32cc30 [doc](website)add translate warning in docs (#10152)
* fix docs bugs where the sidebar could not be displayed, plus some style problems
2022-06-15 14:51:53 +08:00
983cdc7b0d [feature-wip](array-type) Support loading data in vectorized format (#10065) 2022-06-15 14:40:28 +08:00
96b54dd1d5 [doc](website)modify home page text and navbar (#10148)
* fix docs bugs where the sidebar could not be displayed, plus some style problems
2022-06-15 12:21:40 +08:00
76a968d1dd [Enhancement][Refactor](Nereids) generate pattern by operator and refactor Plan and NODE_TYPE generic type (#10019)
This PR:
1. removes the generic type from Operator, and some NODE_TYPE parameters from Plan and Expression
2. refactors the Plan and NODE_TYPE generic types
3. supports child-class matching via TypePattern
4. analyzes the Operator code and generates patterns, making it easy to create rules


e.g. 
```java
class LogicalJoin extends LogicalBinaryOperator;
class PhysicalFilter extends PhysicalUnaryOperator;
```

will generate the following code:
```java
interface GeneratedPatterns extends Patterns {
  default PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan> logicalJoin() {
      return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan>(
          new TypePattern(LogicalJoin.class, Pattern.FIXED, Pattern.FIXED),
          defaultPromise()
      );
  }
  
  default <C1 extends Plan, C2 extends Plan>
  PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>
          logicalJoin(PatternDescriptor<C1, Plan> child1, PatternDescriptor<C2, Plan> child2) {
      return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>(
          new TypePattern(LogicalJoin.class, child1.pattern, child2.pattern),
          defaultPromise()
      );
  }

  default PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan> physicalFilter() {
      return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan>(
          new TypePattern(PhysicalFilter.class, Pattern.FIXED),
          defaultPromise()
      );
  }
  
  default <C1 extends Plan>
  PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>
          physicalFilter(PatternDescriptor<C1, Plan> child1) {
      return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>(
          new TypePattern(PhysicalFilter.class, child1.pattern),
          defaultPromise()
      );
  }
}
```
so we no longer have to add patterns for new operators by hand.

This feature uses JSR 269 annotation processing to hook into compilation, and ANTLR4 to parse the source of `Operator`, from which the corresponding patterns are generated.


Pattern generation steps:
1. maven-compiler-plugin in the pom.xml compiles fe-core in three passes. The first pass compiles `PatternDescribable.java` and `PatternDescribableProcessor.java`.
2. The second pass compiles `PatternDescribableProcessPoint.java` with the annotation processor `PatternDescribableProcessor` enabled; the processor receives the compile event and sees that the `PatternDescribableProcessPoint` class carries the `PatternDescribable` annotation.
3. `PatternDescribableProcessor` does not process `PatternDescribableProcessPoint` itself; instead it finds all Java files under the `operatorPath` specified in pom.xml and parses them into Java ASTs (abstract syntax trees).
4. `PatternDescribableProcessor` collects the ASTs and uses `PatternGeneratorAnalyzer` to analyze them, finds the child classes of `PlanOperator`, and generates `GeneratedPatterns.java` from the ASTs.
5. The third pass compiles `GeneratedPatterns.java` and the remaining Java files. A minimal processor skeleton is sketched below.
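A minimal JSR 269 skeleton of the kind of processor the steps describe; the class names come from the PR text, but the package, annotation path, and body are assumptions:

```java
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.element.TypeElement;

// Illustrative JSR 269 skeleton: when a compilation pass sees the
// @PatternDescribable annotation (package assumed here), the processor can
// read the Operator sources (the PR parses them with ANTLR4) and emit
// GeneratedPatterns.java for the next pass to compile.
@SupportedAnnotationTypes("org.apache.doris.nereids.pattern.PatternDescribable")
public class PatternDescribableProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        if (annotations.isEmpty()) {
            return false; // nothing to do on rounds without the trigger annotation
        }
        try (Writer writer = processingEnv.getFiler()
                .createSourceFile("org.apache.doris.nereids.pattern.GeneratedPatterns")
                .openWriter()) {
            // In the real processor the body comes from the ANTLR4 analysis
            // of the Operator sources under operatorPath.
            writer.write("package org.apache.doris.nereids.pattern;\n"
                    + "public interface GeneratedPatterns extends Patterns { /* generated */ }\n");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return true;
    }
}
```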
2022-06-15 11:44:54 +08:00
c9f33fa051 [test] add cast array regression test (#10069)
* [test] add cast array regression test
2022-06-15 11:29:28 +08:00
c4d0fba713 Add storage policy for remote storage migration (#9997) 2022-06-15 11:00:06 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
f4e2f78a1a [fix] Fix the bug that data balance causes tablet loss (#9971)
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. Apply the fix from #6063, largely as-is, to the current code.
2022-06-15 09:52:56 +08:00
7ab64f9155 [doc][website]update home page content and add slack button (#10091)
* fix docs bugs where the sidebar could not be displayed, plus some style problems
2022-06-15 09:31:40 +08:00
02b1908ce4 [modify default config] add BE 2PC config, enabled by default (#10110)
Co-authored-by: wudi <>
2022-06-15 09:08:28 +08:00
34ea6ce850 [doc]Added be enable_stream_load_record configuration description (#10130) 2022-06-15 08:14:47 +08:00
be3aa2aa37 [enhancement](community): polish doc to reformat (#10137) 2022-06-15 08:14:13 +08:00
85362a907e [fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072)
1. Fix a memory leak: when a load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers could not be destructed in time.
2. Fix load tasks being frequently canceled by OOM due to an inaccurate `LoadChannel` mem tracker limit, and rename the `mem limit` variable in `LoadChannel`.
3. Fix a core dump: when logging out a task mem tracker, a failed phmap erase caused the same tracker to be logged out repeatedly.
4. Fix a deadlock: when add_child_tracker exceeded the mem limit, calling log_usage deadlocked on `_child_trackers_lock` (see the sketch after this list).
5. Fix frequent log printing when the thread mem tracker limit is exceeded, which hurt readability and performance.
6. Optimize some details of the mem tracker display.
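A hedged Java sketch of the item-4 fix; all names are hypothetical and the real mem tracker is C++ in the BE, where the mutex is non-reentrant so re-locking deadlocks. The discipline shown (snapshot under the lock, log outside it) is the point:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: never call a logging routine that re-acquires
// childTrackersLock while already holding it.
public class MemTracker {
    private final Object childTrackersLock = new Object();
    private final List<MemTracker> childTrackers = new ArrayList<>();
    private final String label;

    public MemTracker(String label) { this.label = label; }

    public void addChildTracker(MemTracker child, boolean limitExceeded) {
        List<MemTracker> snapshot = null;
        synchronized (childTrackersLock) {
            childTrackers.add(child);
            if (limitExceeded) {
                // Deadlock-prone version: calling logUsage() here would take
                // childTrackersLock again. Instead, copy what we need...
                snapshot = new ArrayList<>(childTrackers);
            }
        }
        if (snapshot != null) {
            logUsage(snapshot); // ...and log after the lock is released.
        }
    }

    private void logUsage(List<MemTracker> trackers) {
        trackers.forEach(t -> System.out.println("tracker " + t.label));
    }
}
```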
2022-06-14 21:38:37 +08:00
f7b5f36da4 [feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579)
At present, Doris can only access a hadoop cluster with kerberos authentication enabled through a broker; Doris BE itself
does not support access to kerberos-authenticated HDFS files.

This PR solves that problem.

When creating a Hive external table, users just specify the following properties to access HDFS data with kerberos authentication enabled:

```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```

If you want to `select into outfile` to an HDFS cluster with kerberos authentication enabled, you can refer to the following SQL statement:

```sql
select * from test into outfile "hdfs://tmp/outfile1" 
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
2022-06-14 20:07:03 +08:00
25b9d6eba2 [feature](nereids) Plan Translator (#9993)
Issue Number: close #9621

Add the following physical operators: PhysicalAgg, PhysicalSort, PhysicalHashJoin

Add the basic logic of the plan translator:

1. add new agg phase enum for nereids
2. remove the Analyzer from PlanContext.java
3. implement PlanTranslator::visitPhysicalFilter (a shape sketch follows)
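A hedged sketch of the translator shape the list describes, with simplified stand-ins for the real Nereids classes (only the name `visitPhysicalFilter` comes from the PR):

```java
// Illustrative only: a visitor that walks a Nereids-style physical plan and
// emits a legacy exec-plan representation (a String here for brevity).
public class PlanTranslatorSketch {
    interface PhysicalPlan { <R, C> R accept(Visitor<R, C> v, C ctx); }

    record PhysicalFilter(PhysicalPlan child, String predicate) implements PhysicalPlan {
        public <R, C> R accept(Visitor<R, C> v, C ctx) { return v.visitPhysicalFilter(this, ctx); }
    }

    record ScanNode(String table) implements PhysicalPlan {
        public <R, C> R accept(Visitor<R, C> v, C ctx) { return v.visitScan(this, ctx); }
    }

    interface Visitor<R, C> {
        R visitPhysicalFilter(PhysicalFilter filter, C ctx);
        R visitScan(ScanNode scan, C ctx);
    }

    // Translate by recursing into children first, then attaching this node's work.
    static class PlanTranslator implements Visitor<String, Void> {
        public String visitPhysicalFilter(PhysicalFilter filter, Void ctx) {
            String child = filter.child().accept(this, ctx);
            return "Filter(" + filter.predicate() + ") -> " + child;
        }
        public String visitScan(ScanNode scan, Void ctx) {
            return "Scan(" + scan.table() + ")";
        }
    }

    public static void main(String[] args) {
        PhysicalPlan plan = new PhysicalFilter(new ScanNode("lineitem"), "l_quantity < 10");
        System.out.println(plan.accept(new PlanTranslator(), null));
        // prints: Filter(l_quantity < 10) -> Scan(lineitem)
    }
}
```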
2022-06-14 19:39:55 +08:00
15e1bb448f [test] tpch q3 rewrite, change join order, make lineitem on left side (#10055)
Rewrite the SQL in the TPC-H test tools.
2022-06-14 17:16:33 +08:00
c2af14fc61 [Bug] The return type of a function is not always nullable (#10116)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 16:32:35 +08:00
2fadaddda0 [Enhancement] (Nereids) scalar expression rewrite framework (#9942)
Issue Number: close #9633

Scalar expressions are rewritten by traversing them with the visitor pattern.

The abstract class ExpressionVisitor contains visit methods for all the predicates we rewrite.

We provide a rewrite-rule interface, ExpressionRewriteRule. The AbstractExpressionRewriteRule class implements this interface and extends ExpressionVisitor; to implement an expression rewrite rule, directly extend AbstractExpressionRewriteRule and override the predicate-traversal methods it provides.

Two rules serve as references: NormalizeExpressionRule and SimplifyNotExprRule. A simplified sketch follows.
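A simplified sketch of such a rule, with stand-ins for the Nereids classes; only the rule names above come from the PR, everything else is illustrative. It shows the visitor-based rewrite shape on a double negation, `not(not(x)) => x`:

```java
// Illustrative only: Expression, Not, and Rewriter are simplified stand-ins.
public class SimplifyNotSketch {
    interface Expression { default Expression accept(Rewriter r) { return r.visit(this); } }
    record Not(Expression child) implements Expression {
        public Expression accept(Rewriter r) { return r.visitNot(this); }
    }
    record Literal(boolean value) implements Expression {}

    // Plays the role of AbstractExpressionRewriteRule: default behavior is the
    // identity, and a concrete rule overrides only the predicates it cares about.
    static class Rewriter {
        Expression visit(Expression expr) { return expr; }
        Expression visitNot(Not not) { return not; }
    }

    // SimplifyNotExprRule-style rule: not(not(x)) => x.
    static class SimplifyNotExprRule extends Rewriter {
        @Override
        Expression visitNot(Not not) {
            Expression child = not.child().accept(this); // rewrite bottom-up
            if (child instanceof Not inner) {
                return inner.child(); // collapse the double negation
            }
            return new Not(child);
        }
    }

    public static void main(String[] args) {
        Expression e = new Not(new Not(new Literal(true)));
        System.out.println(e.accept(new SimplifyNotExprRule())); // Literal[value=true]
    }
}
```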
2022-06-14 16:20:48 +08:00
14bc971159 [Bug] Fix bug pushing value predicates on a unique table that has a sequence column (#10060)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 15:35:31 +08:00
59b3023adf [fix](regression)bucket shuffle join with collocate table should use order_qt (#10082) 2022-06-14 15:34:39 +08:00
81e0a348a7 [fix] fix bug that show proc "/cluster_balance/history_tablets" return malformat error (#10073) 2022-06-14 15:34:16 +08:00
Pxl
5d624dfe6c [bugfix]fix segmentation fault when an unaligned address is cast to int128 (#10094) 2022-06-14 15:32:58 +08:00
eb4d0f508a [doc] Add docs for SHOW TABLETS (#10105)
* add docs for SHOW TABLETS

* update

* add more examples for SHOW TABLETS

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-06-14 15:29:46 +08:00
2a96d7ffde [spell] Fix spell error in row_batch.h (#10109) 2022-06-14 15:28:29 +08:00
622143f87c [typo] Fix typos in comments (#10111) 2022-06-14 15:28:11 +08:00
9203a235e0 [typo] Fix typos in runtime_state.cpp (#10112) 2022-06-14 15:27:40 +08:00
dc4761593b [docs] Add common error messages to doris backup (#10048) 2022-06-14 09:20:04 +08:00
Pxl
e58cac1f00 [build] use inline to replace static (#10087) 2022-06-14 09:18:15 +08:00
39a2785ce2 [enhancement] support simd instructions on arm cpus through sse2neon (#10068)
* [enhancement] support simd instructions on arm cpus through sse2neon
2022-06-14 09:17:09 +08:00
7cf0cc7dd6 [deps] update libhdfs3 to fix a uuid set problem (#10092) 2022-06-14 09:16:32 +08:00
bdcf2e7ed2 [Improvement] set table name in olap scanner (#10102) 2022-06-14 08:18:18 +08:00
d4d2e82bdf [typo] Fix typos in comments (#10106) 2022-06-14 08:17:19 +08:00
ce730293c0 [improvement] send merged runtime filter asynchronously (#10080) 2022-06-14 08:16:25 +08:00
d58e00c49c [fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803)
When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the
`Tuple/Block data` into the controller attachment and transmit it through http brpc.

This is to avoid errors when the length of the protoBuf request exceeds 2G:
`Bad request, error_text=[E1003]Fail to compress request`.

In #7164, `Tuple/Block data` was put into the attachment and sent via the default `baidu_std brpc`,
but when the attachment exceeds 2G, it gets truncated. There is no 2G limit when sending via `http brpc`.

Also, #7921 considered putting `Tuple/Block data` into the attachment transport by default, as this theoretically
saves one serialization and improves performance. However, testing found that performance did not improve,
while the memory peak increased due to the extra memory copy. A sketch of the transport choice follows.
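A hedged sketch of the size-based transport choice the message describes; the names and exact threshold handling are assumptions, and the real code is C++ using brpc:

```java
// Illustrative decision logic only, not the Doris BE implementation.
public class TransmitBlockSender {
    // protobuf requests (and baidu_std brpc attachments) break down past 2 GB.
    static final long MAX_RPC_BODY_BYTES = 2L * 1024 * 1024 * 1024;

    enum Transport { BAIDU_STD_RPC, HTTP_RPC_WITH_ATTACHMENT }

    // Small payloads stay on the default baidu_std channel; oversized
    // Tuple/Block data rides as an attachment over http brpc, which has no
    // 2 GB limit and avoids "[E1003] Fail to compress request".
    static Transport choose(long serializedBlockBytes) {
        return serializedBlockBytes < MAX_RPC_BODY_BYTES
                ? Transport.BAIDU_STD_RPC
                : Transport.HTTP_RPC_WITH_ATTACHMENT;
    }
}
```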
2022-06-13 20:41:48 +08:00