1. Add metrics for the CPU monitor.
2. Add metrics for the process state monitor.
3. Add metrics for the memory monitor.
The labels make it convenient to filter on different conditions in Grafana.
After this change, the CPU metrics look like this:
```
doris_be_cpu{device="cpu1",mode="guest_nice"} 0
doris_be_cpu{device="cpu1",mode="guest"} 0
doris_be_cpu{device="cpu1",mode="steal"} 0
doris_be_cpu{device="cpu1",mode="soft_irq"} 107168
doris_be_cpu{device="cpu1",mode="irq"} 0
doris_be_cpu{device="cpu1",mode="iowait"} 3726931
doris_be_cpu{device="cpu1",mode="idle"} 2358039214
doris_be_cpu{device="cpu1",mode="system"} 58699464
doris_be_cpu{device="cpu1",mode="nice"} 1700438
doris_be_cpu{device="cpu1",mode="user"} 54974091
```
The memory metrics look as follows:
```
doris_be_memory_pswpin 167785
doris_be_memory_pswpout 203724
doris_be_memory_pgpgin 22308762092
doris_be_memory_pgpgout 152101956232
```
And the process metrics look as follows:
```
doris_be_proc{mode="interrupt"} 421721020416
doris_be_proc{mode="ctxt_switch"} 2806640907317
doris_be_proc{mode="procs_running"} 8
doris_be_proc{mode="procs_blocked"} 3
```
ZSTD compression is fast and has a high compression ratio. It can achieve a higher compression ratio than the default lz4f codec, which helps when storing cost-sensitive data such as logs.
Compared to the lz4f codec, in the comparison test below the zstd codec produced about 35% smaller compressed data, was about 30% faster on the first read (without OS page cache), and about 40% slower on the second read (with OS page cache).
test data: 25GB text log, 110 million rows
test table: test_table(ts varchar(30), log string)
test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table
be.conf: disable_storage_page_cache = true
Set this config to disable the Doris page cache, so the data is not all cached in memory and the test measures real decompression speed.
test result
master branch with lz4f codec result:
- compressed size 4.3G
- SQL first exec time (read data from disk + decompress + a little computation): 18.3s
- SQL second exec time (read data from OS page cache + decompress + a little computation): 2.4s
this branch with zstd codec (hard-coded on) result:
- compressed size: 2.8G
- SQL first exec time: 12.8s
- SQL second exec time: 3.4s
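For reference, a standalone sketch of the two codecs' one-shot APIs, which can reproduce a size comparison on arbitrary data (this is not Doris's codec code; build with `-lzstd -llz4`):
```
#include <lz4frame.h>
#include <zstd.h>
#include <cstdio>
#include <string>
#include <vector>

int main() {
    // Some compressible, log-like input.
    std::string src;
    for (int i = 0; i < 100000; ++i) {
        src += "2022-01-01 00:00:00 INFO query finished, rows=" + std::to_string(i) + "\n";
    }

    // zstd: one-shot compression at the default level (3).
    std::vector<char> zbuf(ZSTD_compressBound(src.size()));
    size_t zsize = ZSTD_compress(zbuf.data(), zbuf.size(), src.data(), src.size(), 3);
    if (ZSTD_isError(zsize)) return 1;

    // lz4 frame: one-shot compression with default preferences.
    std::vector<char> lbuf(LZ4F_compressFrameBound(src.size(), nullptr));
    size_t lsize = LZ4F_compressFrame(lbuf.data(), lbuf.size(), src.data(), src.size(), nullptr);
    if (LZ4F_isError(lsize)) return 1;

    std::printf("raw: %zu bytes, zstd: %zu bytes, lz4f: %zu bytes\n",
                src.size(), zsize, lsize);
    return 0;
}
```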
## Design:
For now, there are two categories of types in Doris: scalar types (such as int, char, etc.) and composite types (array, etc.). For the sake of performance, we can cache the type info of scalar types globally (as unique objects), since the number of scalar types is limited. For composite types, the type info is normally generated at runtime (a cache strategy can also be used to speed this up), so the memory must be reclaimed when we create type info for composite types.
There are a lot of interfaces for getting the type info of a specific type. I reorganized them as described below.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
This function gets the type info of scalar types. Because the result is cached, the caller can use it **WITHOUT** worrying about memory reclamation.
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
This function gets the type info of array types with exactly **ONE** level of depth. Because the result is cached, the caller can use it **WITHOUT** worrying about memory reclamation.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
These functions get the type info of **BOTH** scalar types and composite types. The caller is responsible for managing the returned resources.
#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias for `unique_ptr` with a custom deleter.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaims the memory.
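A minimal sketch of how such an alias can look (the helper names are hypothetical, not Doris's actual code):
```
#include <memory>

// Minimal stand-in for Doris's TypeInfo, just to make the sketch self-contained.
struct TypeInfo {
    virtual ~TypeInfo() = default;
};

// The alias: a unique_ptr whose deleter is a plain function pointer, so the
// same type can carry either a no-op deleter or a real delete.
using TypeInfoPtr = std::unique_ptr<const TypeInfo, void (*)(const TypeInfo*)>;

// Scalar types: cached global objects, so the deleter does nothing.
inline TypeInfoPtr borrow_type_info(const TypeInfo* cached) {
    return TypeInfoPtr(cached, [](const TypeInfo*) {});
}

// Composite types: built at runtime, so the deleter reclaims the memory.
inline TypeInfoPtr own_type_info(const TypeInfo* created) {
    return TypeInfoPtr(created, [](const TypeInfo* p) { delete p; });
}
```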
By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`
Other classes are either constructed by the foregoing classes or hold one of them, so they can just use the raw `TypeInfo` pointer directly for the sake of performance.
1. `ScalarColumnWriter` - holds `Field`
   1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
      1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
   2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
      1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter`; `BitmapIndexWriter` doesn't support `ArrayType`.
   3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
      1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` - initializes `type_info` from the field type in the meta (scalar types only).
3. `ColumnVectorBatch`
   1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, which uses the `type_info` in `IndexedColumnReader`.
   2. `BitmapIndexReader` supports scalar types only; it creates `ColumnVectorBatch`, which uses the `type_info` in `BitmapIndexReader`.
   3. `BloomFilterIndexReader` supports scalar types only; it creates `ColumnVectorBatch`, which uses the `type_info` in `BloomFilterIndexReader`.
1. Solve the earlier problems that the unit test binary file was too large (1.7G+) and the unit test link time was too long.
2. Unify all unit tests into one file, significantly reducing unit test execution time to less than 3 minutes.
3. Temporarily disable stream_load_test.cpp, metrics_action_test.cpp, and load_channel_mgr_test.cpp, because they re-implement part of the code and affect other tests.
Add a new column type to speed up the approximation of quantiles.
1. The new column type is named `quantile_state`, with the fixed aggregation function `quantile_union`; it stores the intermediate results of pre-aggregated approximate quantile calculations.
2. Support pre-aggregation of the new column type and the quantile_state-related functions.
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
Implement a new way of memory statistics based on the TCMalloc new/delete hooks, MemTracker, and TLS; the expectation is that all new/delete/malloc/free calls of the BE process can be counted.
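A minimal sketch of the hook idea using gperftools' MallocHook interface (an illustration of the counting mechanism, not Doris's implementation; requires linking against tcmalloc):
```
#include <gperftools/malloc_hook.h>
#include <atomic>
#include <cstdio>
#include <cstdlib>

std::atomic<long long> g_tracked_bytes{0};

// Called on every allocation the hook sees; accumulate the size.
void OnNew(const void* /*ptr*/, size_t size) {
    g_tracked_bytes.fetch_add(static_cast<long long>(size), std::memory_order_relaxed);
}

int main() {
    MallocHook::AddNewHook(&OnNew);
    void* p = malloc(4096); // observed by the hook
    free(p);
    MallocHook::RemoveNewHook(&OnNew);
    // The delete-side hook only receives the pointer; the real implementation
    // asks the allocator for the allocation size, which is omitted here.
    std::printf("tracked: %lld bytes\n", g_tracked_bytes.load());
    return 0;
}
```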
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Add MemTrackerTaskPool as the ancestor of all query and load trackers; it is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache that triggers a consume/release only when the accumulated amount exceeds the parameter mem_tracker_consume_min_size_bytes (see the sketch after this list);
4. Add a new memory leak detection mode (experimental feature): when a MemTracker is destructed with a remaining tracked value outside the specified range, throw an exception, and print the accurate value over HTTP; controlled by the parameter memory_leak_detection;
5. Add a virtual MemTracker whose consume/release is not synced to the parent; it will be used to record specific memory independently when the TCMalloc hooks are introduced later;
6. Modify the GC logic: register the buffers cached in DiskIoMgr as a GC function, with more GC functions to be added later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker from exec_env;
8. Modify the macro that detects whether memory has reached the upper limit, change the parameters and default behavior for creating a MemTracker, change the error message format in mem_limit_exceeded, extend and apply transfer_to, remove the Metric in MemTracker, etc.;
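A minimal sketch of the consume/release cache in item 3, with hypothetical names (not Doris's actual MemTracker code): deltas accumulate in a thread-local counter and are flushed to the shared tracker only once they exceed the threshold:
```
#include <atomic>
#include <cstdint>
#include <cstdio>

// Stand-in tracker: only the shared counter matters for this sketch.
struct MemTracker {
    std::atomic<int64_t> consumption{0};
    void consume(int64_t bytes) { consumption.fetch_add(bytes); }
};

// Hypothetical stand-in for the mem_tracker_consume_min_size_bytes parameter.
constexpr int64_t kConsumeMinSizeBytes = 1 << 20; // 1MB

// Thread-local cache: small deltas accumulate here and only touch the shared
// atomic once the accumulated amount is large enough.
thread_local int64_t g_untracked_delta = 0;

void cached_consume(MemTracker* tracker, int64_t bytes) {
    g_untracked_delta += bytes; // negative values represent releases
    if (g_untracked_delta >= kConsumeMinSizeBytes ||
        g_untracked_delta <= -kConsumeMinSizeBytes) {
        tracker->consume(g_untracked_delta);
        g_untracked_delta = 0;
    }
}

int main() {
    MemTracker tracker;
    for (int i = 0; i < 1000; ++i) cached_consume(&tracker, 4096);
    std::printf("tracked: %lld bytes\n", (long long)tracker.consumption.load());
    return 0;
}
```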
Modify where MemTracker is used:
1. MemPool adds a constructor that creates a temporary tracker, avoiding a lot of redundant code;
2. Add trackers for global objects such as ChunkAllocator and StorageEngine;
3. Add more fine-grained trackers, such as for ExprContext;
4. RuntimeState removes FragmentMemTracker (the PlanFragmentExecutor mem_tracker), which was previously used to track scan memory independently; it is replaced by _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker; ReservationTracker will be removed later.
* [refactor] remove types_test
1. Remove types_test: it causes a core dump with newer GCC or clang because of memory alignment, since some of the code gets vectorized by newer GCC or clang.
2. Change the string type length to 2 GB instead of -1.
3. Modify unreachable code.
Support implementing UDFs through the gRPC protocol. This brings several benefits:
1. The UDF implementation language is not limited to C++; users can implement UDFs in any language they are familiar with.
2. UDFs are decoupled from Doris: a UDF cannot cause a Doris core dump, and UDF computing resources are separated from Doris, so Doris services are not affected.
However, an RPC UDF has a fixed per-call overhead, so it is much slower than a C++ UDF, especially when the amount of data is large.
Create a function like:
```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
"SYMBOL"="add_int",
"OBJECT_FILE"="127.0.0.1:9999",
"TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods, for example:
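A sketch of what such a service can look like in C++ (any language with gRPC support works). The service and message type names below are assumptions for illustration only; the authoritative definitions live in Doris's function service proto file:
```
#include <grpcpp/grpcpp.h>
#include "function_service.grpc.pb.h" // generated from the (assumed) proto

class FunctionServiceImpl final : public PFunctionService::Service {
public:
    // Called by Doris to verify that the requested function exists.
    grpc::Status check_fn(grpc::ServerContext* /*ctx*/,
                          const PCheckFunctionRequest* request,
                          PCheckFunctionResponse* response) override {
        // Verify the requested symbol is one we serve, e.g. "add_int".
        return grpc::Status::OK;
    }

    // Called by Doris with a batch of argument columns; fill the result column.
    grpc::Status fn_call(grpc::ServerContext* /*ctx*/,
                         const PFunctionCallRequest* request,
                         PFunctionCallResponse* response) override {
        // For rpc_add: read the two INT argument columns from the request,
        // add them row by row, and append the sums to the response column.
        return grpc::Status::OK;
    }
};

int main() {
    FunctionServiceImpl service;
    grpc::ServerBuilder builder;
    // Listen on the address given as OBJECT_FILE in CREATE FUNCTION.
    builder.AddListeningPort("0.0.0.0:9999", grpc::InsecureServerCredentials());
    builder.RegisterService(&service);
    builder.BuildAndStart()->Wait();
    return 0;
}
```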
Note:
THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!
When using linked schema change, we need to check whether all rowsets are of the same type, ALPHA or BETA; otherwise, we need to use direct schema change to convert the data (see the sketch below).
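A hypothetical sketch of that check (illustrative names, not Doris's actual API): linked schema change is only usable when every rowset in the tablet has the same rowset type.
```
#include <vector>

enum class RowsetType { ALPHA, BETA };

bool can_use_linked_schema_change(const std::vector<RowsetType>& rowset_types) {
    if (rowset_types.empty()) return true;
    for (const RowsetType t : rowset_types) {
        if (t != rowset_types.front()) {
            return false; // mixed ALPHA/BETA: fall back to direct schema change
        }
    }
    return true;
}
```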
First, we need to add a parameter that describes whether the data is local or remote.
Then, we need to support some basic functions for operating on remote storage.
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]>
4. Replace all boost::thread with std::thread
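The mapping in code form (an illustrative snippet, not actual Doris code):
```
#include <memory>
#include <thread>

struct Foo {};

int main() {
    // boost::shared_ptr<Foo>  -> std::shared_ptr<Foo>
    std::shared_ptr<Foo> p = std::make_shared<Foo>();
    // boost::scoped_ptr<Foo>  -> std::unique_ptr<Foo>
    std::unique_ptr<Foo> q = std::make_unique<Foo>();
    // boost::scoped_array<char> -> std::unique_ptr<char[]> (array form)
    std::unique_ptr<char[]> a = std::make_unique<char[]>(16);
    // boost::thread -> std::thread
    std::thread t([] {});
    t.join();
    return 0;
}
```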
Add a brpc stub cache check-and-reset API, used to test whether the brpc stubs in the cache are available and to reset the brpc stub cache.
Add a config for automatically checking and resetting brpc stubs.
Add a use_path_style property for S3. Path-style access puts the bucket name in the URL path (e.g. https://s3.example.com/bucket/key) instead of in the hostname, which many S3-compatible object stores require.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path-style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.