2213 Commits

Author SHA1 Message Date
f5e5880fb6 [Improvement] make expression for template argument a constexpr (#10268) 2022-06-21 07:42:02 +08:00
5974e452bc [enhancement] CRC32 instructions compatible arm arch (#10261)
The performance of some CPUs that do not implement CRC instructions is particularly poor
2022-06-20 17:49:06 +08:00
c3743ec9aa [enhancement] optimize 2 cases in seg_iter: all/none rows passed predicate (#10259)
* [enhancement] optimize 2 cases: all/none rows passed predicate in seg_iter.

* format
2022-06-20 17:47:52 +08:00
57327e6236 [improvement]Separate input and output parameters in ColumnPredicate (#10249)
```cpp
for (uint16_t i = 0; i < *size; ++i) {
	// some code here
}
```
The value of `*size` is re-read on every evaluation of the loop condition, which can also prevent vectorization.
2022-06-20 15:04:57 +08:00
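The point made in #10249 can be illustrated with a minimal sketch. The function names, the selection-vector layout, and the "positive value" predicate below are hypothetical and are not taken from Doris's `ColumnPredicate`. In the in/out form, the store to `sel` may alias `*size` (both are `uint16_t`), so the compiler generally has to reload the loop bound on each iteration. Passing the input count by value and returning the output count removes that dependency.

```cpp
#include <cstdint>

// Hypothetical "before" shape: `size` is both input and output.
// The write to sel[...] may alias *size, so the bound is reloaded each iteration.
void filter_positive_inout(const int32_t* values, uint16_t* sel, uint16_t* size) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < *size; ++i) {
        uint16_t idx = sel[i];
        sel[new_size] = idx;
        new_size += (values[idx] > 0) ? 1 : 0;
    }
    *size = new_size;
}

// Hypothetical "after" shape: input count by value, output count returned.
// The loop bound is a local that no store inside the loop can modify.
uint16_t filter_positive(const int32_t* values, uint16_t* sel, uint16_t size) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {
        uint16_t idx = sel[i];
        sel[new_size] = idx;
        new_size += (values[idx] > 0) ? 1 : 0;
    }
    return new_size;
}
```

Both functions compact `sel` in place so that it holds only the indices whose value passes the predicate. Whether the second form actually vectorizes depends on the compiler and on the predicate body.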
588634ddf6 [feature] support runtime filter on vectorized engine (#10103) 2022-06-20 09:46:38 +08:00
ecdf8bcfdd [comments]Replace some Chinese comments in product Code (#10243) 2022-06-20 09:24:19 +08:00
1c9ce29440 [improvement]Avoid frequently allocating and releasing flags in InListPredicate (#10248) 2022-06-20 09:08:02 +08:00
ab29ad2144 [typo] Fix typos in comments (#10247) 2022-06-20 09:06:29 +08:00
67f341f44e [TLP](step-1) Remove incubator prefix (#10230)
Remove some `incubator-` prefix in source code.
The documentation is not modified; that will be done in the next PR.
2022-06-19 19:34:52 +08:00
6ad024a2bf [fix] (mem tracker) Refactor memtable mem tracker, fix flush memtable DCHECK failed (#10156)
1. Add memory leak detection for the `DeltaWriter` and `MemTable` mem trackers.
2. Make the memtable mem tracker virtual to avoid frequent recursive consumption of the parent tracker.
3. Stop the memtable flush thread from attaching the memtable tracker, to ensure that the memtable mem tracker is completely accurate.
4. Set `memory_verbose_track=false`. At present, frequently switching the thread mem tracker causes a performance problem.
      - The mem tracker is stored as a shared_ptr in thread-local storage. On each switch, the atomic use_count of the current tracker's shared_ptr is decremented and the use_count of the replacement tracker is incremented; when multiple threads frequently change the same tracker's shared_ptr, this is slow.
      - TODO: 1. Reduce unnecessary thread mem tracker switches. 2. Consider using raw pointers for the mem tracker in thread local.
2022-06-19 16:48:42 +08:00
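The reference-count traffic described in #10156 can be made visible with a small sketch. `Tracker`, `SharedSlot`, and `RawSlot` are hypothetical stand-ins, not Doris types. They only contrast a slot that owns its tracker through `std::shared_ptr` with one that holds a raw pointer, as the TODO suggests considering.

```cpp
#include <memory>

// Hypothetical stand-in for a mem tracker.
struct Tracker {
    long consumed = 0;
};

// Slot that holds the current tracker by shared_ptr. Each switch bumps the
// reference count of the incoming tracker and drops that of the outgoing one.
// In multithreaded programs these count updates are atomic operations on
// memory shared by every thread that references the same tracker.
struct SharedSlot {
    std::shared_ptr<Tracker> current;
    void switch_to(const std::shared_ptr<Tracker>& next) { current = next; }
};

// Slot that holds a raw pointer. Switching touches no shared counter, but the
// caller must guarantee that the tracker outlives the slot's use of it.
struct RawSlot {
    Tracker* current = nullptr;
    void switch_to(Tracker* next) { current = next; }
};
```

The raw-pointer variant trades the shared counter updates for a lifetime obligation, which is presumably why the commit lists it only as something to consider.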
70450d04ba [typo] Fix typos in comments (#10172) 2022-06-19 10:30:17 +08:00
ffe466cbc7 [fix](reader)replace an auto with size_t to avoid integer overflow (#10163) 2022-06-19 10:29:01 +08:00
5fdd995b4c [fix] Fix heap-use-after-free when using type array<string> (#10127) 2022-06-19 10:27:36 +08:00
1d3496c6ab [feature] support backup/restore connect to HDFS (#10081) 2022-06-19 10:26:20 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Currently, column `Array<T>` contains the columns `offsets` and `data`, and the type of column `offsets` is UInt32.
If we call array_union to merge arrays repeatedly, the size of the array may overflow.
So we need to widen the offset type before the `Array Data Type` is released.
2022-06-19 10:24:08 +08:00
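The overflow that #10070 guards against can be reproduced with a minimal sketch. `build_offsets` is a hypothetical helper, not Doris code, and it assumes that offsets are cumulative end positions into the flattened `data` column. Once the running total exceeds the range of a 32-bit offset, the stored value wraps and the offsets stop being monotonically increasing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: turns per-row array lengths into cumulative end offsets.
// The Offset template parameter models the width of the `offsets` column.
template <typename Offset>
std::vector<Offset> build_offsets(const std::vector<uint64_t>& lengths) {
    std::vector<Offset> offsets;
    offsets.reserve(lengths.size());
    uint64_t total = 0;
    for (uint64_t len : lengths) {
        total += len;
        // Truncates silently when Offset is too narrow for the running total.
        offsets.push_back(static_cast<Offset>(total));
    }
    return offsets;
}
```

With two rows of 3,000,000,000 elements each, `build_offsets<uint32_t>` yields a second offset smaller than the first, while `build_offsets<uint64_t>` yields 6,000,000,000 as expected.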
7a85e8d525 [bug](be) fix be block_reader.cc::_update_agg_value() mem leak.(#10216) (#10218) 2022-06-17 21:25:52 +08:00
f7789f4bc4 [fix]InListPredicate wrong result (#10211)
* fix

* reg test

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-17 18:34:25 +08:00
f35b235c3b [opt](compaction) optimize compaction in concurrent load (#10153)
Add some logic to optimize compaction:
1. Separate base and cumulative compaction, in case base compaction runs too long and affects cumulative compaction.
2. Fix the level size in cumulative compaction so that files smaller than 64M get the right level size. When choosing rowsets to compact, the policy ignores big rowsets; this reduces CPU usage by about 25% under high-frequency concurrent load.
3. Remove the skip-window restriction so that a rowset can be compacted right after it is generated, because we will not delete the rowset after compaction. This greatly reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction: consecutive rowsets are chosen for compaction, so this check is useless.

With the logic above, compaction score and CPU cost improve substantially under concurrent load.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-06-17 17:49:45 +08:00
60147ad7a5 [Improvement] build runtime filters asynchronously (#10186) 2022-06-17 11:09:13 +08:00
5e47b03595 [feature-wip](array-type) Add array aggregation functions (#10108) 2022-06-17 11:07:49 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
44e979e43b [Vectorized][Function] add orthogonal bitmap agg functions (#10126)
* [Vectorized][Function] add orthogonal bitmap agg functions
save some file about orthogonal bitmap function
add some file to rebase
update functions file

* refactor union_count function
refactor orthogonal union count functions

* remove bool is_variadic
2022-06-17 08:48:41 +08:00
1cca319d18 [fix](vectorized) intersect operator takes too long time to execute (#10183)
* fix intersect operator takes too long time to execute

* modify code based on review comments
2022-06-17 08:43:53 +08:00
6f5f447aa3 [FOLLOWUP] cherrypick after refactoring scan nodes (#10177) 2022-06-17 08:41:47 +08:00
96de99525e [compile&build]clang compile errors fix (#10201)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-06-17 08:41:25 +08:00
c784fb3ddd [fix] (mem tracker) Fix core dump during transmit_block (#10133)
In some cases, the query mem tracker does not exist in BE when a block is transmitted. This results in a null pointer when getting the query mem tracker in brpc transmit_block.
2022-06-17 00:01:30 +08:00
8d98c17c4e [Bug][Vectorized] Fix DCHECK failed in VExchangeNode close twice (#10184)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-16 23:56:49 +08:00
75a7e72402 [Refactor] Use iequal to replace boost::iequals (#10146)
* [Refactor] Use iequal to replace boost::iequals

* remove unused include
2022-06-16 18:18:38 +08:00
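For readers unfamiliar with what replaces `boost::iequals` in #10146, a case-insensitive comparison can be written with the standard library alone. The sketch below is an assumption about the general shape of such a helper. The signature and implementation of the actual `iequal` in Doris are not shown in this log and may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical standalone helper: ASCII case-insensitive string equality.
// Characters are converted to unsigned char before std::tolower, which has
// undefined behavior for negative values other than EOF.
inline bool iequal(const std::string& lhs, const std::string& rhs) {
    return lhs.size() == rhs.size() &&
           std::equal(lhs.begin(), lhs.end(), rhs.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}
```

A helper like this avoids pulling in the Boost string algorithm headers for a single comparison, at the cost of ignoring the locale handling that `boost::iequals` offers.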
Pxl
ae9c231925 [Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139)
* refactor in_list_pred

* update
2022-06-16 18:09:29 +08:00
f49a4535c4 [Fix] fix vjson_scanner heap use after free when meet object or array type (#10179)
quick merge. It is a serious bug in 1.1.
2022-06-16 16:01:18 +08:00
33921c5e75 [Bug] Fix _add_block_closure do not delete in ~VNodeChannel() (#10180)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-16 15:56:07 +08:00
28e8effc52 [Refactor] Refactor vectorized scan node (#9968) 2022-06-16 11:10:56 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
90f229c038 [refactor] remove useless plugin test code (#10061)
* remove plugin test code

* remove plugin test

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-16 10:43:28 +08:00
bc431f2806 [typo] Fix typos in comments (#10142) 2022-06-16 10:13:59 +08:00
4dfebb9852 [Feature] compaction quickly for small data import (#9804)
* compaction quickly for small data import #9791
1. Merge small versions of rowsets as soon as possible to increase the import frequency of small-version data.
2. A small version is one whose number of rows is less than config::small_compaction_rowset_rows (default 1000).
2022-06-15 21:48:34 +08:00
f1d0c231b9 [Opt][Vectorized] Opt vectorized the unique_table in storage vectorized (#10132)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-15 15:32:15 +08:00
983cdc7b0d [feature-wip](array-type) Support loading data in vectorized format (#10065) 2022-06-15 14:40:28 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
f4e2f78a1a [fix] Fix the bug that data balance causes tablet loss (#9971)
1. Provide an FE conf to test reliability in the single-replica case when tablet scheduling is frequent.
2. Apply nearly the same fix as #6063 to the current code.
2022-06-15 09:52:56 +08:00
02b1908ce4 [modify default config]add be 2pc config enable default (#10110)
Co-authored-by: wudi <>
2022-06-15 09:08:28 +08:00
85362a907e [fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072)
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`.
3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker.
4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock.
5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance.
6. Optimize some details of mem tracker display.
2022-06-14 21:38:37 +08:00
f7b5f36da4 [feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579)
At present, Doris can only access a Hadoop cluster with Kerberos authentication enabled through the broker; Doris BE itself does not support access to Kerberos-authenticated HDFS files.

This PR aims to solve that problem.

When creating a Hive external table, users only need to specify the following properties to access HDFS data with Kerberos authentication enabled:

```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```

To `select into outfile` to an HDFS cluster with Kerberos authentication enabled, refer to the following SQL statement:

```sql
select * from test into outfile "hdfs://tmp/outfile1" 
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
2022-06-14 20:07:03 +08:00
c2af14fc61 [Bug] return type is not always nullable of function (#10116)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 16:32:35 +08:00
14bc971159 [Bug] Fix bug push value predicate of unique table when have sequence column (#10060)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-14 15:35:31 +08:00
Pxl
5d624dfe6c [bugfix]fix segmentation fault at unalign address cast to int128 (#10094) 2022-06-14 15:32:58 +08:00
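The class of fault named in #10094 typically arises when a byte pointer is cast to `__int128*` and dereferenced: the compiler may assume 16-byte alignment and emit an aligned load, which faults on an unaligned address. A common remedy, sketched below, is to copy the bytes with `std::memcpy`. The helper name `load_int128` is hypothetical, the actual change in the commit is not shown in this log, and `__int128` is a GCC/Clang extension.

```cpp
#include <cstring>

using int128_t = __int128;  // GCC/Clang extension, assumed to be available

// Hypothetical helper: reads a 128-bit integer from a possibly unaligned address.
// std::memcpy makes no alignment assumption about `p`; compilers usually lower
// it to an unaligned load instruction rather than a function call.
inline int128_t load_int128(const char* p) {
    int128_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}
```

Reading through `load_int128(buf + 1)` is well defined, whereas `*reinterpret_cast<const int128_t*>(buf + 1)` is undefined behavior.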
2a96d7ffde [spell] Fix spell error in row_batch.h (#10109) 2022-06-14 15:28:29 +08:00
622143f87c [typo] Fix typos in comments (#10111) 2022-06-14 15:28:11 +08:00
9203a235e0 [typo] Fix typos in runtime_state.cpp (#10112) 2022-06-14 15:27:40 +08:00