2187 Commits

Author SHA1 Message Date
8d98c17c4e [Bug][Vectorized] Fix DCHECK failed in VExchangeNode close twice (#10184)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-16 23:56:49 +08:00
75a7e72402 [Refactor] Use iequal to replace boost::iequals (#10146)
* [Refactor] Use iequal to replace boost::iequals

* remove unused include
2022-06-16 18:18:38 +08:00
Pxl
ae9c231925 [Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139)
* refactor in_list_pred

* update
2022-06-16 18:09:29 +08:00
f49a4535c4 [Fix] fix vjson_scanner heap use after free when meet object or array type (#10179)
quick merge. It is a serious bug in 1.1.
2022-06-16 16:01:18 +08:00
33921c5e75 [Bug] Fix _add_block_closure do not delete in ~VNodeChannel() (#10180)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-16 15:56:07 +08:00
28e8effc52 [Refactor] Refactor vectorized scan node (#9968) 2022-06-16 11:10:56 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
90f229c038 [refactor] remove useless plugin test code (#10061)
* remove plugin test code

* remove plugin test

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-16 10:43:28 +08:00
bc431f2806 [typo] Fix typos in comments (#10142) 2022-06-16 10:13:59 +08:00
4dfebb9852 [Feature] compaction quickly for small data import (#9804)
* compaction quickly for small data import #9791
1. Merge small rowset versions as soon as possible to increase the import frequency of small version data.
2. A small version is one whose row count is less than config::small_compaction_rowset_rows (default 1000).
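
The selection rule described above can be pictured with a minimal Python sketch. It is not the Doris BE implementation: `Rowset`, `pick_small_rowsets`, the "trailing run" policy, and the two-rowset minimum are assumptions made for illustration; only the default threshold of 1000 rows comes from this commit message.

```python
from dataclasses import dataclass
from typing import List

# Mirrors the default quoted for config::small_compaction_rowset_rows.
SMALL_COMPACTION_ROWSET_ROWS = 1000


@dataclass
class Rowset:
    """Hypothetical stand-in for a rowset: one version and its row count."""
    version: int
    num_rows: int


def pick_small_rowsets(rowsets: List[Rowset],
                       threshold: int = SMALL_COMPACTION_ROWSET_ROWS) -> List[Rowset]:
    """Return the newest run of consecutive rowsets with fewer rows than threshold.

    Assumption: rowsets are ordered by version, and only the trailing
    (most recent) run of small rowsets is a merge candidate.
    """
    picked: List[Rowset] = []
    for rowset in reversed(rowsets):
        if rowset.num_rows >= threshold:
            break  # a large rowset ends the run
        picked.append(rowset)
    picked.reverse()
    # Merging a single rowset gains nothing, so require at least two.
    return picked if len(picked) >= 2 else []
```

With rowsets of 5000, 10, 999, and 1 rows, the sketch selects the last three for merging and leaves the large one alone.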
2022-06-15 21:48:34 +08:00
f1d0c231b9 [Opt][Vectorized] Opt vectorized the unique_table in storage vectorized (#10132)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-15 15:32:15 +08:00
983cdc7b0d [feature-wip](array-type) Support loading data in vectorized format (#10065) 2022-06-15 14:40:28 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
f4e2f78a1a [fix] Fix the bug that data balance causes tablet loss (#9971)
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. Apply essentially the same fix as #6063 to the current code.
2022-06-15 09:52:56 +08:00
02b1908ce4 [modify default config]add be 2pc config enbale defalut (#10110)
Co-authored-by: wudi <>
2022-06-15 09:08:28 +08:00
85362a907e [fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072)
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix load tasks being frequently canceled by OOM and the inaccurate `LoadChannel` mem tracker limit, and rename the `mem limit` variable in `LoadChannel`.
3. Fix the core dump: when logging out a task mem tracker, a failed phmap erase causes the same tracker to be logged out repeatedly.
4. Fix the deadlock: when the mem limit is exceeded in add_child_tracker, calling log_usage deadlocks on `_child_trackers_lock`.
5. Fix frequent log printing when the thread mem tracker limit is exceeded, which hurts readability and performance.
6. Optimize some details of mem tracker display.
2022-06-14 21:38:37 +08:00
f7b5f36da4 [feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579)
At present, Doris can access a Hadoop cluster with Kerberos authentication enabled only through the broker; Doris BE itself
does not support access to a Kerberos-authenticated HDFS file.

This PR aims to solve that problem.

When creating a Hive external table, users only need to specify the following properties to access HDFS data with Kerberos authentication enabled:

```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```

To `select into outfile` to an HDFS cluster with Kerberos authentication enabled, you can refer to the following SQL statement:

```sql
select * from test into outfile "hdfs://tmp/outfile1" 
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
2022-06-14 20:07:03 +08:00
c2af14fc61 [Bug] return type is not always nullable of function (#10116)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 16:32:35 +08:00
14bc971159 [Bug] Fix bug push value predicate of unique table when have sequence column (#10060)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 15:35:31 +08:00
Pxl
5d624dfe6c [bugfix]fix segmentation fault at unalign address cast to int128 (#10094) 2022-06-14 15:32:58 +08:00
2a96d7ffde [spell] Fix spell error in row_batch.h (#10109) 2022-06-14 15:28:29 +08:00
622143f87c [typo] Fix typos in comments (#10111) 2022-06-14 15:28:11 +08:00
9203a235e0 [typo] Fix typos in runtime_state.cpp (#10112) 2022-06-14 15:27:40 +08:00
Pxl
e58cac1f00 [build] use inline to replace static (#10087) 2022-06-14 09:18:15 +08:00
39a2785ce2 [enhancement] support simd instructions on arm cpus through sse2neon (#10068)
* [enhancement] support simd instructions on arm cpus through sse2neon
2022-06-14 09:17:09 +08:00
d4d2e82bdf [typo] Fix typos in comments (#10106) 2022-06-14 08:17:19 +08:00
ce730293c0 [improvement] send merged runtime filter asynchrously (#10080) 2022-06-14 08:16:25 +08:00
d58e00c49c [fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803)
When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the
`Tuple/Block data` into the controller attachment and transmit it through http brpc.

This is to avoid errors when the length of the protoBuf request exceeds 2G:
`Bad request, error_text=[E1003]Fail to compress request`.

In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`,
but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`.

Also, #7921 considered putting `Tuple/Block data` into the attachment by default, as this theoretically
saves one serialization and improves performance. However, testing found that performance did not improve,
while the memory peak increased because of an additional memory copy.
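
The size-based decision described above can be sketched in a few lines of Python. This is an illustration only, not the BE code: `choose_transport`, the returned labels, and treating "2G" as exactly 2 GiB are assumptions.

```python
# Assumption: "2G" in the message above is read as 2 GiB.
MAX_PROTOBUF_BYTES = 2 * 1024 ** 3


def choose_transport(payload_bytes: int) -> str:
    """Pick a transport label for a Tuple/Block payload of the given size.

    Payloads above the limit go into the attachment over http brpc;
    everything else stays on the default baidu_std protocol.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if payload_bytes > MAX_PROTOBUF_BYTES:
        return "http_attachment"
    return "baidu_std"
```

Keeping small payloads on the default path matches the observation above that moving data into the attachment by default did not improve performance.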
2022-06-13 20:41:48 +08:00
8af9339b00 [BUGFIX] Fix wrong column types in result file sink (#10079) 2022-06-13 09:05:11 +08:00
797f6e1472 [Enhancement]Decode bitshuffle data before adding it into PageCache (#10036)
* [Enhancement]Decode bitshuffle data before add into PageCache

* Fix be ut failed
2022-06-13 09:04:23 +08:00
415b6b8086 [feature-wip](array-type) Support array type which doesn't contain null (#9809) 2022-06-12 23:35:28 +08:00
990a2940ca [metric] add some metrics for cpu and memory (#9887)
1. add some metrics for cpu monitor;
2. add metrics for process state monitor;
3. add metrics for memory monitor;

This makes it convenient to filter by different conditions in Grafana.

With this change, the CPU metrics look like this:
doris_be_cpu{device="cpu1",mode="guest_nice"} 0
doris_be_cpu{device="cpu1",mode="guest"} 0
doris_be_cpu{device="cpu1",mode="steal"} 0
doris_be_cpu{device="cpu1",mode="soft_irq"} 107168
doris_be_cpu{device="cpu1",mode="irq"} 0
doris_be_cpu{device="cpu1",mode="iowait"} 3726931
doris_be_cpu{device="cpu1",mode="idle"} 2358039214
doris_be_cpu{device="cpu1",mode="system"} 58699464
doris_be_cpu{device="cpu1",mode="nice"} 1700438
doris_be_cpu{device="cpu1",mode="user"} 54974091

The memory metrics look like this:
doris_be_memory_pswpin 167785
doris_be_memory_pswpout 203724
doris_be_memory_pgpgin 22308762092
doris_be_memory_pgpgout 152101956232


The process metrics look like this:
doris_be_proc{mode="interrupt"} 421721020416
doris_be_proc{mode="ctxt_switch"} 2806640907317
doris_be_proc{mode="procs_running"} 8
doris_be_proc{mode="procs_blocked"} 3
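
Lines in the shape shown above can be split into a metric name, labels, and a value for ad hoc filtering. The Python sketch below is a minimal illustration: `parse_metric` is a hypothetical helper, and it handles only the simple `name{key="value",...} number` form in these samples, not the full Prometheus exposition format.

```python
import re
from typing import Dict, Tuple

# name, optional {labels}, then a numeric value
_METRIC_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metric(line: str) -> Tuple[str, Dict[str, str], float]:
    """Parse one metric line into (name, labels, value)."""
    match = _METRIC_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a metric line: {line!r}")
    name, raw_labels, raw_value = match.groups()
    labels = dict(_LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(raw_value)


name, labels, value = parse_metric('doris_be_cpu{device="cpu1",mode="idle"} 2358039214')
# Filtering by label then works the same way as a label matcher in a dashboard query.
is_idle_cpu1 = labels.get("device") == "cpu1" and labels.get("mode") == "idle"
```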
2022-06-10 19:45:31 +08:00
e0cf2677a0 [dependency][enhancement] support build libhdfs in arm cpus (#10018)
Supports native HDFS functionality on ARM CPUs.
This PR mainly upgrades libhdfs3 so that it runs on ARM, and builds libhdfs3 with Kerberos by default.
2022-06-10 19:40:41 +08:00
Pxl
979c81b066 [bugfix] signed long will be converted to signed long during dcheck and cause dcheck fail (#10047) 2022-06-10 14:26:38 +08:00
4a474420c8 [feature](function) Add ntile function (#9867)
Add ntile function.
For the non-vectorized engine, I followed Impala and rewrote ntile in terms of row_number and count.
For the vectorized engine, I implemented WindowFunctionNTile.
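
To illustrate how a bucket number can be derived from a row number and a row count alone, here is a minimal Python sketch of standard NTILE semantics (earlier buckets take the extra rows when the count does not divide evenly). `ntile_from_row_number` is a hypothetical name, and the formula is one way to express the idea, not necessarily the exact expression the Doris or Impala rewrite produces.

```python
def ntile_from_row_number(row_number: int, count: int, buckets: int) -> int:
    """Return the 1-based NTILE bucket for a 1-based row_number.

    count is the number of rows in the partition and buckets is the
    NTILE argument. The first (count % buckets) buckets hold one extra row.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    if not 1 <= row_number <= count:
        raise ValueError("row_number must be within 1..count")
    small_size, num_large = divmod(count, buckets)
    large_size = small_size + 1
    rows_in_large = num_large * large_size
    if row_number <= rows_in_large:
        # Row falls in one of the larger leading buckets.
        return (row_number - 1) // large_size + 1
    # Row falls in one of the remaining, smaller buckets.
    return num_large + (row_number - rows_in_large - 1) // small_size + 1


buckets_for_ten_rows = [ntile_from_row_number(r, 10, 3) for r in range(1, 11)]
```

When there are fewer rows than buckets, every row lands in the first branch, so the division by `small_size` (which would be zero) is never reached.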
2022-06-10 10:32:40 +08:00
1220cc147d [feature](vectorized) Support outfile on vectorized engine (#10013)
This PR supports CSV-format output files on the vectorized engine.

**Parquet is still not supported.**
2022-06-10 09:15:53 +08:00
3363b3aa19 [fix](load) fix streamload failure due to false unhealthy replica in concurrent stream load (#10007)
In concurrent stream load, FE runs publish version tasks concurrently,
which can cause publish tasks to be handled out of order in BE.
For example:
FE publishes tasks with versions 1 2 3 4
BE may handle the tasks in the sequence 1 2 4 3
In the case above, when reporting tablet info, BE finds that version 4
is published but version 3 is not visible, so it reports a version miss to FE.
FE then sets the replica's lastFailedVersion, and transaction
commits eventually fail because there is no quorum of healthy replicas.

Add a time condition: report a version miss only if the version has been missing for 60 seconds.
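
The time condition can be sketched as a small tracker that remembers when a tablet was first seen with a missing version. This Python sketch is illustrative only: `VersionMissTracker` and its method are hypothetical names, and only the 60-second window comes from the message above.

```python
from typing import Dict

# The 60-second window named in the commit message.
VERSION_MISS_GRACE_SECONDS = 60.0


class VersionMissTracker:
    """Suppress version-miss reports until the gap has lasted long enough."""

    def __init__(self, grace_seconds: float = VERSION_MISS_GRACE_SECONDS) -> None:
        self._grace = grace_seconds
        self._first_seen: Dict[int, float] = {}  # tablet_id -> time the miss was first seen

    def should_report(self, tablet_id: int, version_miss: bool, now: float) -> bool:
        """Return True only if the tablet has been missing a version for the full window."""
        if not version_miss:
            # The gap closed (e.g. version 3 was published after 4); forget it.
            self._first_seen.pop(tablet_id, None)
            return False
        first = self._first_seen.setdefault(tablet_id, now)
        return now - first >= self._grace
```

In the 1 2 4 3 example, the miss observed after version 4 is recorded but not reported; once version 3 is published shortly after, the entry is cleared and FE never sees a false unhealthy replica.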
2022-06-10 09:15:14 +08:00
d247d06180 [Improvement] refine codes in TabletReader (#10042)
fast merge: just removes some code that has no effect on other components
2022-06-10 09:12:33 +08:00
9c1ba771da [fix](be) fix asan be set_storage_medium core (#9986) (#9987)
https://github.com/apache/incubator-doris/issues/9986
2022-06-09 23:12:58 +08:00
dc874709d7 [feature-wip](array-type) support array<decimal128> in mysql_result_writer (#9998) 2022-06-09 15:15:26 +08:00
6fab1cbf3c [feature-wip](array-type) Add array functions size and cardinality (#9921)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-06-09 15:03:03 +08:00
19bc14cf8d [feature-wip](array-type) Add array type support for vectorized parquet-orc scanner (#9856)
Only one level of array nesting is supported for now.
For example:
- nullable(array(nullable(tinyint))) is **supported**.
- nullable(array(nullable(array(xx)))) is **not supported**.
2022-06-09 12:11:47 +08:00
bf8b4fb2d3 [Bugfix] be crash when executing sql contains bitmap_intersect function (#9910)
* fix bitmap serialize bug

* add regression test for bitmap serialize bugfix

* add missing regression test out file

* fix regression test failed issue
2022-06-09 08:45:46 +08:00
9c52b4a508 [enhance] improve dict in-predicate evaluate (#10009) 2022-06-09 00:25:30 +08:00
d9bbf67b9e [DefaultConfigChange]enable query vectorization and storage vectorization and storage low cardinality optimization by default (#9848)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-08 15:29:43 +08:00
94089b9192 [Refactor] Use file factory to replace create file reader/writer (#9505)
1. Simplify code logic and improve abstraction
2. Fix the mem leak of raw pointer

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-08 15:07:39 +08:00
fc9afda97a [enhancement][diagnostics] Add a diagnostic: detect unused includes (#9117) 2022-06-08 11:52:48 +08:00
35c3e4e33c [Bug] runtime filter is not used as expected (#10001)
* [Bug] runtime filter is not used as expected

* update
2022-06-08 11:10:39 +08:00
Pxl
f2aa5f32b8 [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change (#9811)
Some pre-refactorings or interface additions for schema change
2022-06-07 15:04:57 +08:00