doris

Author	SHA1	Message	Date
Adonis Ling	5fdd995b4c	[fix] Fix heap-use-after-free when using type array<string> (#10127 )	2022-06-19 10:27:36 +08:00
xiepengcheng01	1d3496c6ab	[feature] support backup/restore connect to HDFS (#10081 )	2022-06-19 10:26:20 +08:00
camby	0e404edf54	[improvement] Change array offset type from UInt32 to UInt64 (#10070 ) Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now. If we call array_union to merge arrays repeatedly, the size of array may overflow. So we need to extend it before `Array Data Type` release.	2022-06-19 10:24:08 +08:00
Lei Zhang	7a85e8d525	[bug](be) fix be block_reader.cc::_update_agg_value() mem leak.(#10216 ) (#10218 )	2022-06-17 21:25:52 +08:00
wangbo	f7789f4bc4	[fix]InListPredicate wrong result (#10211 ) * fix * reg test Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-06-17 18:34:25 +08:00
yixiutt	f35b235c3b	[opt](compaction) optimize compaction in concurrent load (#10153 ) Add some logic to optimize compaction: 1. Separate base and cumulative compaction, in case base compaction runs too long and affects cumulative compaction. 2. Fix level size in cumulative compaction so that files below 64M get the right level size; when choosing rowsets to compact, the policy ignores big rowsets, which reduces CPU by about 25% in high-frequency concurrent load. 3. Remove the skip window restriction so a rowset can be compacted right after it is generated, because we no longer delete rowsets after compaction. This greatly reduces the compaction score in concurrent log. 4. Remove the version consistency check in can_do_compaction; we choose consecutive rowsets for compaction, so this check is useless. After adding the logic above, compaction score and CPU cost improve substantially in concurrent load. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-06-17 17:49:45 +08:00
Gabriel	60147ad7a5	[Improvement] build runtime filters asynchronously (#10186 )	2022-06-17 11:09:13 +08:00
Adonis Ling	5e47b03595	[feature-wip](array-type) Add array aggregation functions (#10108 )	2022-06-17 11:07:49 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
zhangstar333	44e979e43b	[Vectorized][Function] add orthogonal bitmap agg functions (#10126 ) * [Vectorized][Function] add orthogonal bitmap agg functions save some file about orthogonal bitmap function add some file to rebase update functions file * refactor union_count function refactor orthogonal union count functions * remove bool is_variadic	2022-06-17 08:48:41 +08:00
starocean999	1cca319d18	[fix](vectorized) intersect operator takes too long time to execute (#10183 ) * fix intersect operator takes too long time to execute * modify code based on review comments	2022-06-17 08:43:53 +08:00
Gabriel	6f5f447aa3	[FOLLOWUP] cherrypick after refactoring scan nodes (#10177 )	2022-06-17 08:41:47 +08:00
camby	96de99525e	[compile&build]clang compile errors fix (#10201 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-06-17 08:41:25 +08:00
Xinyi Zou	c784fb3ddd	[fix] (mem tracker) Fix core dump during transmit_block (#10133 ) In some cases, the query mem tracker does not exist in BE when transmitting a block. This results in a null pointer when getting the query mem tracker in brpc transmit_block.	2022-06-17 00:01:30 +08:00
HappenLee	8d98c17c4e	[Bug][Vectorized] Fix DCHECK failed in VExchangeNode close twice (#10184 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-16 23:56:49 +08:00
yinzhijian	75a7e72402	[Refactor] Use iequal to replace boost::iequals (#10146 ) * [Refactor] Use iequal to replace boost::iequals * remove unused include	2022-06-16 18:18:38 +08:00
Pxl	ae9c231925	[Enhancement][Storage] refactor InListPredicate/NotInListPredicate (#10139 ) * refactor in_list_pred * update	2022-06-16 18:09:29 +08:00
lihangyu	f49a4535c4	[Fix] fix vjson_scanner heap use after free when meet object or array type (#10179 ) quick merge. It is a serious bug in 1.1.	2022-06-16 16:01:18 +08:00
HappenLee	33921c5e75	[Bug] Fix _add_block_closure do not delete in ~VNodeChannel() (#10180 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-16 15:56:07 +08:00
Gabriel	28e8effc52	[Refactor] Refactor vectorized scan node (#9968 )	2022-06-16 11:10:56 +08:00
Jerry Hu	4b9d500425	[improvement](profile) Add table name and predicates (#10093 )	2022-06-16 10:59:31 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
yiguolei	90f229c038	[refactor] remove useless plugin test code (#10061 ) * remove plugin test code * remove plugin test Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-16 10:43:28 +08:00
yinzhijian	bc431f2806	[typo] Fix typos in comments (#10142 )	2022-06-16 10:13:59 +08:00
chenlinzhong	4dfebb9852	[Feature] compaction quickly for small data import (#9804 ) * compaction quickly for small data import #9791 1.merge small versions of rowset as soon as possible to increase the import frequency of small version data 2.small version means that the number of rows is less than config::small_compaction_rowset_rows default 1000	2022-06-15 21:48:34 +08:00
HappenLee	f1d0c231b9	[Opt][Vectorized] Opt vectorized the unique_table in storage vectorized (#10132 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-15 15:32:15 +08:00
Adonis Ling	983cdc7b0d	[feature-wip](array-type) Support loading data in vectorized format (#10065 )	2022-06-15 14:40:28 +08:00
zhangstar333	4c24586865	[Vectorized][UDF] support java-udaf (#9930 )	2022-06-15 10:53:44 +08:00
plat1ko	f4e2f78a1a	[fix] Fix the bug that data balance causes tablet loss (#9971 ) 1. Provide a FE conf to test reliability in the single-replica case when tablet scheduling is frequent. 2. Apply most of the fix from #6063 to the current code.	2022-06-15 09:52:56 +08:00
wudi	02b1908ce4	[modify default config]add be 2pc config enable default (#10110 ) Co-authored-by: wudi <>	2022-06-15 09:08:28 +08:00
Xinyi Zou	85362a907e	[fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072 ) 1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time. 2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`. 3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker. 4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock. 5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance. 6. Optimize some details of mem tracker display.	2022-06-14 21:38:37 +08:00
gtchaos	f7b5f36da4	[feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579 ) At present, Doris can only access a hadoop cluster with kerberos authentication enabled through the broker; Doris BE itself does not support access to a kerberos-authenticated HDFS file. This PR aims to solve the problem. When creating a hive external table, users just specify the following properties to access hdfs data with kerberos authentication enabled: ```sql CREATE EXTERNAL TABLE t_hive ( k1 int NOT NULL COMMENT "", k2 char(10) NOT NULL COMMENT "", k3 datetime NOT NULL COMMENT "", k5 varchar(20) NOT NULL COMMENT "", k6 double NOT NULL COMMENT "" ) ENGINE=HIVE COMMENT "HIVE" PROPERTIES ( 'hive.metastore.uris' = 'thrift://192.168.0.1:9083', 'database' = 'hive_db', 'table' = 'hive_table', 'dfs.nameservices'='hacluster', 'dfs.ha.namenodes.hacluster'='n1,n2', 'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020', 'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020', 'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', 'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM', 'hadoop.security.authentication'='kerberos', 'hadoop.kerberos.principal'='doris_test@REALM.COM', 'hadoop.kerberos.keytab'='/path/to/doris_test.keytab' ); ``` If you want to `select into outfile` to an HDFS with kerberos authentication enabled, you can refer to the following SQL statement: ```sql select * from test into outfile "hdfs://tmp/outfile1" format as csv properties ( 'fs.defaultFS'='hdfs://hacluster/', 'dfs.nameservices'='hacluster', 'dfs.ha.namenodes.hacluster'='n1,n2', 'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020', 'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020', 'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', 'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM', 'hadoop.security.authentication'='kerberos', 'hadoop.kerberos.principal'='doris_test@REALM.COM', 'hadoop.kerberos.keytab'='/path/to/doris_test.keytab' ); ```	2022-06-14 20:07:03 +08:00
HappenLee	c2af14fc61	[Bug] return type is not always nullable of function (#10116 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-14 16:32:35 +08:00
HappenLee	14bc971159	[Bug] Fix bug push value predicate of unique table when have sequence column (#10060 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-14 15:35:31 +08:00
Pxl	5d624dfe6c	[bugfix]fix segmentation fault at unalign address cast to int128 (#10094 )	2022-06-14 15:32:58 +08:00
yinzhijian	2a96d7ffde	[spell] Fix spell error in row_batch.h (#10109 )	2022-06-14 15:28:29 +08:00
yinzhijian	622143f87c	[typo] Fix typos in comments (#10111 )	2022-06-14 15:28:11 +08:00
yinzhijian	9203a235e0	[typo] Fix typos in runtime_state.cpp (#10112 )	2022-06-14 15:27:40 +08:00
Pxl	e58cac1f00	[build] use inline to replace static (#10087 )	2022-06-14 09:18:15 +08:00
Zhengguo Yang	39a2785ce2	[enhancement] support simd instructions on arm cpus through sse2neon (#10068 ) * [enhancement] support simd instructions on arm cpus through sse2neon	2022-06-14 09:17:09 +08:00
zxealous	d4d2e82bdf	[typo] Fix typos in comments (#10106 )	2022-06-14 08:17:19 +08:00
jacktengg	ce730293c0	[improvement] send merged runtime filter asynchronously (#10080 )	2022-06-14 08:16:25 +08:00
Xinyi Zou	d58e00c49c	[fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803 ) When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the `Tuple/Block data` into the controller attachment and transmit it through http brpc. This is to avoid errors when the length of the protoBuf request exceeds 2G: `Bad request, error_text=[E1003]Fail to compress request`. In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`, but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`. Also, in #7921, consider putting `Tuple/Block data` into attachment transport by default, as this theoretically reduces one serialization and improves performance. However, the test found that the performance did not improve, but the memory peak increased due to the addition of a memory copy.	2022-06-13 20:41:48 +08:00
Gabriel	8af9339b00	[BUGFIX] Fix wrong column types in result file sink (#10079 )	2022-06-13 09:05:11 +08:00
Jerry Hu	797f6e1472	[Enhancement]Decode bitshuffle data before adding it into PageCache (#10036 ) * [Enhancement]Decode bitshuffle data before add into PageCache * Fix be ut failed	2022-06-13 09:04:23 +08:00
Adonis Ling	415b6b8086	[feature-wip](array-type) Support array type which doesn't contain null (#9809 )	2022-06-12 23:35:28 +08:00
carlvinhust2012	990a2940ca	[metric] add some metrics for cpu and memory (#9887 ) 1. add some metrics for cpu monitor; 2. add metrics for process state monitor; 3. add metrics for memory monitor; This makes it convenient to filter by different conditions in grafana. After this change, the cpu metrics look like this: doris_be_cpu{device="cpu1",mode="guest_nice"} 0 doris_be_cpu{device="cpu1",mode="guest"} 0 doris_be_cpu{device="cpu1",mode="steal"} 0 doris_be_cpu{device="cpu1",mode="soft_irq"} 107168 doris_be_cpu{device="cpu1",mode="irq"} 0 doris_be_cpu{device="cpu1",mode="iowait"} 3726931 doris_be_cpu{device="cpu1",mode="idle"} 2358039214 doris_be_cpu{device="cpu1",mode="system"} 58699464 doris_be_cpu{device="cpu1",mode="nice"} 1700438 doris_be_cpu{device="cpu1",mode="user"} 54974091 The memory metrics are as follows: doris_be_memory_pswpin 167785 doris_be_memory_pswpout 203724 doris_be_memory_pgpgin 22308762092 doris_be_memory_pgpgout 152101956232 The process metrics are as follows: doris_be_proc{mode="interrupt"} 421721020416 doris_be_proc{mode="ctxt_switch"} 2806640907317 doris_be_proc{mode="procs_running"} 8 doris_be_proc{mode="procs_blocked"} 3	2022-06-10 19:45:31 +08:00
Zhengguo Yang	e0cf2677a0	[dependency][enhancement] support build libhdfs in arm cpus (#10018 ) Supports native hdfs functionality on arm cpu. This pr mainly upgrades libhdfs3 and supports running on arm, and makes libhdfs3 with kerberos the default	2022-06-10 19:40:41 +08:00
Pxl	979c81b066	[bugfix] signed long will be converted to signed long during dcheck and cause dcheck fail (#10047 )	2022-06-10 14:26:38 +08:00
Jing Shen	4a474420c8	[feature](function) Add ntile function (#9867 ) Add ntile function. For the non-vectorized engine, I implemented it like Impala, rewriting ntile to row_number and count. For the vectorized engine, I implemented WindowFunctionNTile.	2022-06-10 10:32:40 +08:00
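
The ntile commit (#9867) mentions rewriting ntile to row_number and count. As a rough illustration of why those two values are enough, the sketch below computes a bucket from a 1-based row number and the partition's row count, following the standard SQL NTILE rule that the first `count % buckets` buckets each receive one extra row. The function name `ntile_bucket` is hypothetical, and this is not the expression Doris or Impala actually generates.

```python
def ntile_bucket(row_number: int, count: int, buckets: int) -> int:
    """Return the 1-based NTILE bucket for a row in a partition.

    row_number: 1-based position of the row (as ROW_NUMBER() would give).
    count:      total rows in the partition (as COUNT(*) would give).
    buckets:    the NTILE argument; must be positive.
    Hypothetical helper for illustration only.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    if not 1 <= row_number <= count:
        raise ValueError("row_number must be within 1..count")

    small = count // buckets   # size of the smaller buckets
    extra = count % buckets    # number of buckets holding one extra row
    large = small + 1          # size of the larger buckets

    # Rows that fall into the leading, larger buckets.
    if row_number <= extra * large:
        return (row_number - 1) // large + 1

    # Remaining rows fall into the smaller buckets that follow.
    # small cannot be 0 here: if it were, extra == count and the
    # branch above would already have returned.
    return extra + (row_number - extra * large - 1) // small + 1


if __name__ == "__main__":
    # Ten rows split into three buckets.
    print([ntile_bucket(r, 10, 3) for r in range(1, 11)])
```

When `buckets` exceeds `count`, each row lands in its own bucket and the trailing buckets stay empty, which matches the usual NTILE behaviour.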

2201 Commits