doris

Author	SHA1	Message	Date
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
yiguolei	90f229c038	[refactor] remove useless plugin test code (#10061 ) * remove plugin test code * remove plugin test Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-16 10:43:28 +08:00
Adonis Ling	983cdc7b0d	[feature-wip](array-type) Support loading data in vectorized format (#10065 )	2022-06-15 14:40:28 +08:00
plat1ko	f4e2f78a1a	[fix] Fix the bug that data balance causes tablet loss (#9971 ) 1. Provide an FE conf to test reliability in the single-replica case when tablet scheduling is frequent. 2. Apply nearly the same fix as #6063 to the current code.	2022-06-15 09:52:56 +08:00
gtchaos	f7b5f36da4	[feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579 ) At present, Doris can only access a Hadoop cluster with Kerberos authentication enabled through the broker; Doris BE itself does not support access to Kerberos-authenticated HDFS files. This PR aims to solve the problem. When creating a Hive external table, users just specify the following properties to access HDFS data with Kerberos authentication enabled: ```sql CREATE EXTERNAL TABLE t_hive ( k1 int NOT NULL COMMENT "", k2 char(10) NOT NULL COMMENT "", k3 datetime NOT NULL COMMENT "", k5 varchar(20) NOT NULL COMMENT "", k6 double NOT NULL COMMENT "" ) ENGINE=HIVE COMMENT "HIVE" PROPERTIES ( 'hive.metastore.uris' = 'thrift://192.168.0.1:9083', 'database' = 'hive_db', 'table' = 'hive_table', 'dfs.nameservices'='hacluster', 'dfs.ha.namenodes.hacluster'='n1,n2', 'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020', 'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020', 'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', 'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM', 'hadoop.security.authentication'='kerberos', 'hadoop.kerberos.principal'='doris_test@REALM.COM', 'hadoop.kerberos.keytab'='/path/to/doris_test.keytab' ); ``` If you want to `select into outfile` to an HDFS with Kerberos authentication enabled, you can refer to the following SQL statement: ```sql select * from test into outfile "hdfs://tmp/outfile1" format as csv properties ( 'fs.defaultFS'='hdfs://hacluster/', 'dfs.nameservices'='hacluster', 'dfs.ha.namenodes.hacluster'='n1,n2', 'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020', 'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020', 'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider', 'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM', 'hadoop.security.authentication'='kerberos', 'hadoop.kerberos.principal'='doris_test@REALM.COM', 'hadoop.kerberos.keytab'='/path/to/doris_test.keytab' ); ```	2022-06-14 20:07:03 +08:00
Pxl	5d624dfe6c	[bugfix]fix segmentation fault at unalign address cast to int128 (#10094 )	2022-06-14 15:32:58 +08:00
Xinyi Zou	d58e00c49c	[fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803 ) When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the `Tuple/Block data` into the controller attachment and transmit it through http brpc. This is to avoid errors when the length of the protoBuf request exceeds 2G: `Bad request, error_text=[E1003]Fail to compress request`. In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`, but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`. Also, in #7921, consider putting `Tuple/Block data` into attachment transport by default, as this theoretically reduces one serialization and improves performance. However, the test found that the performance did not improve, but the memory peak increased due to the addition of a memory copy.	2022-06-13 20:41:48 +08:00
Jerry Hu	797f6e1472	[Enhancement]Decode bitshuffle data before adding it into PageCache (#10036 ) * [Enhancement]Decode bitshuffle data before add into PageCache * Fix be ut failed	2022-06-13 09:04:23 +08:00
Adonis Ling	415b6b8086	[feature-wip](array-type) Support array type which doesn't contain null (#9809 )	2022-06-12 23:35:28 +08:00
carlvinhust2012	990a2940ca	[metric] add some metrics for cpu and memory (#9887 ) 1. add some metrics for cpu monitor; 2. add metrics for process state monitor; 3. add metrics for memory monitor; This makes it convenient to filter by different conditions in Grafana. After this change, the cpu metrics look like this: doris_be_cpu{device="cpu1",mode="guest_nice"} 0 doris_be_cpu{device="cpu1",mode="guest"} 0 doris_be_cpu{device="cpu1",mode="steal"} 0 doris_be_cpu{device="cpu1",mode="soft_irq"} 107168 doris_be_cpu{device="cpu1",mode="irq"} 0 doris_be_cpu{device="cpu1",mode="iowait"} 3726931 doris_be_cpu{device="cpu1",mode="idle"} 2358039214 doris_be_cpu{device="cpu1",mode="system"} 58699464 doris_be_cpu{device="cpu1",mode="nice"} 1700438 doris_be_cpu{device="cpu1",mode="user"} 54974091 The memory metrics look as follows: doris_be_memory_pswpin 167785 doris_be_memory_pswpout 203724 doris_be_memory_pgpgin 22308762092 doris_be_memory_pgpgout 152101956232 The process metrics look as follows: doris_be_proc{mode="interrupt"} 421721020416 doris_be_proc{mode="ctxt_switch"} 2806640907317 doris_be_proc{mode="procs_running"} 8 doris_be_proc{mode="procs_blocked"} 3	2022-06-10 19:45:31 +08:00
camby	6fab1cbf3c	[feature-wip](array-type) Add array functions size and cardinality (#9921 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-06-09 15:03:03 +08:00
yinzhijian	19bc14cf8d	[feature-wip](array-type) Add array type support for vectorized parquet-orc scanner (#9856 ) Only one level of array is supported now. For example: - nullable(array(nullable(tinyint))) is supported. - nullable(array(nullable(array(xx))) is not supported.	2022-06-09 12:11:47 +08:00
HappenLee	94089b9192	[Refactor] Use file factory to replace create file reader/writer (#9505 ) 1. Simplify code logic and improve abstraction 2. Fix the mem leak of raw pointer Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-08 15:07:39 +08:00
Pxl	c0ad1be1bd	[Enhancement][Chore] remove breakpad and unused variable (#9937 )	2022-06-02 20:52:17 +08:00
HappenLee	c426c2e4b1	[Vectorized-Load] Support vectorized load table with materialized view (#9923 ) * [Vectorized-Load] Support vectorized load table with materialized view * fix ut Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-02 14:59:01 +08:00
Gabriel	632f7a3d3d	[Feature] add `weekday` function on vectorized engine (#9901 )	2022-06-01 14:47:37 +08:00
Xinyi Zou	0376ca17f3	[Enhancement] Remove minidump (#9894 )	2022-06-01 08:04:24 +08:00
HappenLee	0cba6b7d95	[Bug][Fix] One Rowset have same key output in unique table (#9858 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-31 12:29:16 +08:00
Adonis Ling	f377c26bf7	[refactor][be] Optimize headers (#9708 )	2022-05-30 16:12:10 +08:00
Jing Shen	7b98dd438d	[feature](function) Add nvl function (#9726 )	2022-05-30 09:43:00 +08:00
Mingyu Chen	9fe3827239	[fix](ut) fix BE ut (#9831 ) The failure was introduced by #8923; the GitHub checks had a problem and failed to check the BE ut in #8923.	2022-05-29 12:25:41 +08:00
Pxl	f33ef32d92	[Bug] [Bitmap] change to_bitmap to always_not_nullable (#9716 )	2022-05-28 17:33:55 +08:00
Dayue Gao	4d1e926b6c	[feature][config] introduce a new BE config storage_page_cache_shard_size (#9821 ) Co-authored-by: gaodayue <gaodayue@bytedance.com>	2022-05-28 10:17:09 +08:00
Kang	efdb3b79a5	[feature] add zstd compression codec (#9747 ) ZSTD compression is fast with a high compression ratio. It can be used to achieve a higher compression ratio than the default Lz4f codec for storage-cost-sensitive data such as logs. Compared to the Lz4f codec, the zstd codec gives a 35% smaller compressed size, is 30% faster on the first read without OS page cache, and is 40% slower on the second read with OS page cache in the following comparison test. test data: 25GB text log, 110 million rows test table: test_table(ts varchar(30), log string) test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table be.conf: disable_storage_page_cache = true (set this config to disable the Doris page cache, so that the data is not all cached in memory and the test measures real decompression speed). test result master branch with lz4f codec result: - compressed size 4.3G - SQL first exec time(read data from disk + decompress + little computation) : 18.3s - SQL second exec time(read data from OS pagecache + decompress + little computation) : 2.4s this branch with zstd codec (hardcode enable it) result: - compressed size: 2.8G - SQL first exec time: 12.8s - SQL second exec time: 3.4s	2022-05-27 21:56:18 +08:00
yinzhijian	cbbda7857b	[feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541 )	2022-05-26 21:39:12 +08:00
Pxl	13c1d20426	[Bug] [Vectorized] add padding when load char type data (#9734 )	2022-05-26 16:51:01 +08:00
jacktengg	9236c2efc9	[improvement] Show detail status code string for be http api (#9771 ) 1. move to_json method to common/status 2. modify related usage in http folder	2022-05-26 15:09:21 +08:00
Adonis Ling	2a11a4ab99	[feature-wip][array-type] Support more sub types. (#9466 ) Please refer to #9465	2022-05-26 08:41:34 +08:00
Gabriel	8470543144	[Improvement] fix typo (#9743 )	2022-05-25 19:29:01 +08:00
Xinyi Zou	ca05d1ee01	[fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661 ) 1. Fix the negative consumption value of the Lru Cache MemTracker. 2. Fix the compaction Cache MemTracker not tracking anything. 3. Add the USE_MEM_TRACKER compile option. 4. Make sure the malloc/free hook is not stopped at any time.	2022-05-25 08:56:17 +08:00
pengxiangyu	75b3707a28	[refactor](load) add tablet errors when close_wait return error (#9619 )	2022-05-22 21:27:42 +08:00
xiepengcheng01	31e40191a8	[Refactor] add vpre_filter_expr for vectorized to improve performance (#9508 )	2022-05-22 11:45:57 +08:00
HappenLee	8fa677b59c	[Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666 ) * [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner 1. fix bug of vjson scanner not supporting `range_from_file_path` 2. fix bug of vjson/vbroker scanner core dump when src/dest slot nullable differs 3. fix bug of vparquet filter_block where the reference count of a column is not 1 4. refactor to simplify the code It only changes vectorized load, not the original row-based load. Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-05-20 11:43:03 +08:00
Dayue Gao	c09858671d	[improvement][performance] improve lru cache resize performance and memory usage (#9521 )	2022-05-19 23:37:59 +08:00
Shuangchi He	73c4ec7167	Fix some typos in be/. (#9681 )	2022-05-19 20:55:39 +08:00
Adonis Ling	ec2cd0083a	[code format]Upgrade clang-format in BE Code Formatter from 8 to 13 (#9602 )	2022-05-17 19:28:15 +08:00
yinzhijian	bee5c2f8aa	[feature-wip](parquet-vec) Support parquet scanner in vectorized engine (#9433 )	2022-05-17 09:37:17 +08:00
zhangstar333	953429e370	[fix](function) fix last_value get wrong result when have order by clause (#9247 )	2022-05-15 23:56:01 +08:00
Kang	e0c790094c	[enhancement][betarowset]optimize lz4 compress and decompress speed by reusing context (#9566 )	2022-05-15 21:18:32 +08:00
yiguolei	cd105bee0a	[refactor](es) Clean es tcp scannode and related thrift definitions (#9553 ) PaloExternalSourcesService was designed for es_scan_node using the tcp protocol. But the es tcp protocol requires deploying a tcp jar into the es code. Both the es version and the lucene version have been upgraded, and the tcp jar is no longer maintained. So this change removes all the related code and thrift definitions.	2022-05-14 10:03:55 +08:00
carlvinhust2012	b817efd652	[feature] add vectorized vjson_scanner (#9311 ) This pr is used to add the vectorized vjson_scanner, which can support vectorized json import in stream load flow.	2022-05-14 09:50:05 +08:00
camby	650e3a6ba0	[feature-wip](array-type) array_contains support more nested data types (#9170 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-05-13 12:42:40 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
xiepengcheng01	eec1dfde3a	[feature] (vec) instead of converting line to src tuple for stream load in vectorized. (#9314 ) Co-authored-by: xiepengcheng01 <xiepengcheng01@xafj-palo-rpm64.xafj.baidu.com>	2022-05-09 11:24:07 +08:00
Xinyi Zou	ae01862ae4	[fix](ut) fix DeltaWriter::close_wait parameter mismatch in delta_writer_test (#9457 )	2022-05-09 09:38:12 +08:00
Mingyu Chen	dce18cb325	[doc] Add window functions sql help doc (#9393 )	2022-05-07 08:43:51 +08:00
Mingyu Chen	e5d4cf01ed	[fix](ut) fix a potential memory leak in BE ut (#9362 )	2022-05-05 20:47:31 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709 ) (#9280 ) Implement vectorized stream load. Added fe configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
Mingyu Chen	7cfebd05fd	[fix](hierarchical-storage) Fix bug that storage medium property change back to SSD (#9158 ) 1. fix bug described in #9159 2. fix a `fill_tuple` bug introduced from #9173	2022-04-26 10:15:19 +08:00

678 Commits