doris

Author	SHA1	Message	Date
Kang	bd9a9a32f5	[bugfix](s3 fs) fix s3 uri parsing for http/https uri (#20656 )	2023-06-11 14:00:04 +08:00
Pxl	7dc7ed97eb	[Chore](build) remove some unused code and remove some wno (#20326 ) remove some unused code about spinlock remove some wno and fix warning remove varadic macro usage	2023-06-05 10:48:07 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
Jerry Hu	e5eed53b89	[improvement](bitmap) Use shared_ptr in BitmapValue to avoid deep copying (#19101 ) Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.	2023-05-24 16:13:01 +08:00
ZhangYu0123	1c950d6930	[fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898	2023-05-22 00:32:20 +08:00
Pxl	9b7a419aed	[Chore](build) update some doc about build enviroment (#19325 ) update some doc about build enviroment	2023-05-10 16:18:44 +08:00
Adonis Ling	16a394da0e	[chore](build) Use include-what-you-use to optimize includes (PART III) (#18958 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-24 14:51:51 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: Remove env/ Remove FileUtils/FilesystemUtils Some methods are moved to LocalFileSystem Remove olap/file_cache Add s3 client cache for s3 file system In my test, the time of open s3 file can be reduced significantly Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operation should be in io/fs. Except for tests in #17586, I also tested some case related to fs io: clone concurrency query on local/s3/hdfs load error log create and clean disk metrics	2023-03-29 09:00:52 +08:00
zclllyybb	5191b4f473	[fix](ut)support run be-ut on release mode (#18119 ) Fixed improper usage. So now be ut could be run on release mode. btw, split be build type environment variable to be/be-ut.	2023-03-27 23:00:03 +08:00
Mingyu Chen	cb79e42e5c	[refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586 ) See #17764 for details I have tested: - Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp - Outfile to local/s3/hdfs/broker. - Load from local/s3/hdfs/broker. - Query file on local/s3/hdfs/broker file system, with table value function and catalog. - Backup/Restore with local/s3/hdfs/broker file system Not test: - cold & host data separation case.	2023-03-21 21:08:38 +08:00
yiguolei	4692d6764c	[refactor](remove string val) remove string val structure, it is same with string ref (#17461 ) remove stringval, decimalv2val, bigintval	2023-03-08 10:42:20 +08:00
yiguolei	9477c48ef8	[refactor](functioncontext) remove duplicate type definition in function context (#17421 ) remove duplicate type definition in function context remove unused method in function context not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned. remove useless slot_size in all tuple or slot descriptor. remove doris_udf namespace, it is useless. remove some unused macro definitions. init v_conjuncts in vscanner, not need write the same code in every scanner. using unique ptr to manage function context since it could only belong to a single expr context. Issue Number: close #xxx --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-06 16:07:09 +08:00
Adonis Ling	3b94ca5ceb	[chore](macOS) Use LLVM Clang by default (#17292 ) Use LLVM Clang by default	2023-03-03 14:18:02 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) * [improvement](lock raii) use raii to lock and unlock This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 ) enable -Wpedantic and update lowest gcc version to 11.1	2023-02-03 11:28:48 +08:00
HappenLee	7c145faa80	[Enhance] use fast_float::from_chars to do str cast to float/double to avoid lose precision (#16190 )	2023-02-01 23:53:34 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
yiguolei	adb758dcac	[refactor](remove non vec code) remove json functions string functions match functions and some code (#16141 ) remove json functions code remove string functions code remove math functions code move MatchPredicate to olap since it is only used in storage predicate process remove some code in tuple, Tuple structure should be removed in the future. remove many code in collection value structure, they are useless	2023-01-26 16:21:12 +08:00
yiguolei	615a5e7b51	[refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-25 12:53:05 +08:00
yiguolei	6e8eedc521	[refactor](remove unused code) remove storage buffer and orc reader (#16137 ) remove olap storage byte buffer remove orc reader remove time operator remove read_write_util remove aggregate funcs remove compress.h and cpp remove bhp_lib Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-24 22:29:32 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
luozenglin	b1fb1277dd	[fix](bitmap) fix bitmap iterator comparison error (#15779 ) Fix the bug that bitmap.begin() == bitmap.end() is always true when the bitmap contains a single value.	2023-01-13 11:37:07 +08:00
Zhengguo Yang	aa0f38f864	[chore](gutil) remove some gutil files and use c++ stl instead (#15357 ) * [chore](gutil) remove some gutil files and use c++ stl instead * fix * fix	2022-12-26 21:25:09 +08:00
yiguolei	06d0035c02	[refactor](non-vec)remove schema change related non-vec code (#15313 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-23 18:33:04 +08:00
Gabriel	e9a201e0ec	[refactor](non-vec) delete some non-vec exec node (#15239 ) * [refactor](non-vec) delete some non-vec exec node	2022-12-22 14:05:51 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Adonis Ling	2b6f85ab96	[chore](macOS) Fix BE UT (#14307 ) #13195 left some unresolved issues. One of them is that some BE unit tests fail. This PR fixes this issue. Now, we can run the command ./run-be-ut.sh --run successfully on macOS.	2022-11-18 10:13:38 +08:00
xy720	035657c5a1	[typo](comment) Fix a lot of spell errors in be comments (#14208 ) fix typos.	2022-11-12 16:06:15 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
yiguolei	32fea672b0	[chore](gutil) remove some gutil macros and solve some macro conflict with brpc (#13954 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-11-07 13:39:52 +08:00
yixiutt	b136d80e1a	[enhancement](compress) reuse compression ctx and buffer (#12573 ) Reuse compression ctx and buffer. Use a global instance for every compression algorithm, and use a thread saft buffer pool to reuse compression buffer, pool size is equal to max parallel thread num in compression, and this will not be too large. Test shows this feature increase 5% of data import and compaction. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-15 10:59:46 +08:00
Pxl	0ead048b93	[Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456 ) 1. remove ColumnString terminating zero 2. add a data_version for pblock 3. change EncryptionMode to enum class	2022-09-14 21:25:22 +08:00
Yongqiang YANG	c3af60eff8	[fix](threadpool) threadpool schedules does not work right on concurr… (#12370 ) * [fix](threadpool) threadpool schedules does not work right on concurrent token Assuming there is a concurrent thread token whose concurrency is 2, and the 1st submit on the token is submitted to threadpool while the 2nd is not submitted due to busy. The token's active_threads is 1, then thread pool does not schedule the token. The patch fixes the problem.	2022-09-08 14:54:46 +08:00
plat1ko	db07e51cd3	[refactor](status) Refactor status handling in agent task (#11940 ) Refactor TaggableLogger Refactor status handling in agent task: Unify log format in TaskWorkerPool Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status Premature return with the opposite condition to reduce indention	2022-08-29 12:06:01 +08:00
Xinyi Zou	1fc5515a78	[enhancement](memory) Remove unused reservation tracker (#11969 )	2022-08-24 08:49:34 +08:00
ZenoYang	288b440b14	[improvement](vectorized) Improve count distinct performance by using fastunion (#11516 ) Improve count distinct performance by using fastunion. Testing our user real data has a 10-40% performance improvement.	2022-08-16 12:18:46 +08:00
yiguolei	321107cb40	[refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475 ) * Using tabletschema shared ptr instead of raw ptrs Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-05 11:04:38 +08:00
Mingyu Chen	abbf75d302	[doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307 )	2022-08-02 11:34:06 +08:00
Pxl	4e6a59df4c	[Improvement][chore] add const to all operator== (#11251 )	2022-07-27 21:46:47 +08:00
Xin Liao	eab8382b4a	[feature-wip](unique-key-merge-on-write) add the implementation of primary key index update, DSIP-018 (#11057 )	2022-07-27 14:17:56 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Adonis Ling	e5663f9872	[Bug](array-type) Fix the core dump caused by unaligned __int128 (#11020 ) Fix the core dump caused by unaligned __int128 and change DEFAULT_ALIGNMENT	2022-07-20 16:37:27 +08:00
zhannngchen	86502b014d	[feature-wip](unique-key-merge-on-write)port IntervalTree from kudu (#10511 ) See the DISP-18:https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model This patch is for step 3.1 in scheduling.	2022-07-05 17:43:01 +08:00
xiepengcheng01	1d3496c6ab	[feature] support backup/restore connect to HDFS (#10081 )	2022-06-19 10:26:20 +08:00
carlvinhust2012	990a2940ca	[metric] add some metrics for cpu and memory (#9887 ) 1. add some metrics for cpu monitor; 2. add metrics for process state monitor; 3. add metrics for memory monitor; It is convenient for us to use grafana to filter through different conditions. After the added, we can find the cpu metrics like this： doris_be_cpu{device="cpu1",mode="guest_nice"} 0 doris_be_cpu{device="cpu1",mode="guest"} 0 doris_be_cpu{device="cpu1",mode="steal"} 0 doris_be_cpu{device="cpu1",mode="soft_irq"} 107168 doris_be_cpu{device="cpu1",mode="irq"} 0 doris_be_cpu{device="cpu1",mode="iowait"} 3726931 doris_be_cpu{device="cpu1",mode="idle"} 2358039214 doris_be_cpu{device="cpu1",mode="system"} 58699464 doris_be_cpu{device="cpu1",mode="nice"} 1700438 doris_be_cpu{device="cpu1",mode="user"} 54974091 we can find the memory metrics as follow： doris_be_memory_pswpin 167785 doris_be_memory_pswpout 203724 doris_be_memory_pgpgin 22308762092 doris_be_memory_pgpgout 152101956232 we also can find the process metrics as follow: doris_be_proc{mode="interrupt"} 421721020416 doris_be_proc{mode="ctxt_switch"} 2806640907317 doris_be_proc{mode="procs_running"} 8 doris_be_proc{mode="procs_blocked"} 3	2022-06-10 19:45:31 +08:00
Kang	efdb3b79a5	[feature] add zstd compression codec (#9747 ) ZSTD compression is fast with high compression ratio. It can be used to archive higher compression ratio than default Lz4f codec for storing cost sensitive data such as logs. Compared to Lz4f codec, we see zstd codec get 35% compressed size off, 30% faster at first time read without OS page cache, 40% slower at second time read with OS page cache in the following comparison test. test data: 25GB text log, 110 million rows test table: test_table(ts varchar(30), log string) test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table be.conf: disable_storage_page_cache = true set this config to disable doris page cache to avoid all data cached in memory for test real decompression speed. test result master branch with lz4f codec result: - compressed size 4.3G - SQL first exec time(read data from disk + decompress + little computation) : 18.3s - SQL second exec time(read data from OS pagecache + decompress + little computation) : 2.4s this branch with zstd codec (hardcode enable it) result: - compressed size: 2.8G - SQL first exec time: 12.8s - SQL second exec time: 3.4s	2022-05-27 21:56:18 +08:00

1 2 3 4 5

207 Commits