doris

Author	SHA1	Message	Date
Pxl	d7e84b7ee3	[Enchancement](bitmap) optimize bitmap deserialize and remove some unused code (#37623 ) ## Proposed changes pick from #35789	2024-07-16 11:21:54 +08:00
Jerry Hu	f27ae8fa09	[fix](bitmap) incorrect type of BitmapValue with fastunion (#36834 ) (#36896 )	2024-06-28 11:29:03 +08:00
yiguolei	9dd573888a	[bugfix](stdcallonce) replace std callonce with a lock because it is not exception safe (#35126 )	2024-06-01 08:00:42 +08:00
zhiqiang	0ae1b9c70a	[chore](remove code) Remove dragonbox related (#34528 ) * Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)" This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019. * run buildall * MORE * FIX	2024-05-13 22:16:57 +08:00
yiguolei	8fdfbcb3c4	Revert "[Opt](func) opt the percentile func performance (#34373 ) (#34416 )" This reverts commit 509ae425e416b4779ae94eab9c2b21f9850e03c3.	2024-05-07 07:23:48 +08:00
HappenLee	509ae425e4	[Opt](func) opt the percentile func performance (#34373 ) (#34416 )	2024-05-06 20:10:35 +08:00
yujun	bc929686e3	[feature](debug point) add macro DBUG_RUN_CALLBACK (#33407 )	2024-04-11 09:31:50 +08:00
yujun	e2ad7149c3	[feature](debug point) Add handler to debug point (#33350 )	2024-04-10 16:24:13 +08:00
Mingyu Chen	ef2151ae66	[Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306 ) (#32364 ) bp #32306 Co-authored-by: Qi Chen <kaka11.chen@gmail.com>	2024-03-18 11:23:01 +08:00
plat1ko	1afdbfe723	[enhance](BE) Refactor TaskWorkerPool (#27555 )	2023-12-04 21:46:10 +08:00
Jerry Hu	3ad865fef9	[refactor](storage) Expressing the types of computation layer and storage layer in PrimitiveTypeTraits (#26191 )	2023-11-15 21:34:49 +08:00
Adonis Ling	d2eea9b3ae	[chore](macOS) Reduce the size of executables on macOS arm64 (#26894 ) Like #15641, we should reduce the size of executables on macOS arm64. Otherwise, we can not run doris_be and doris_be_test with ASAN build type on macOS arm64 now.	2023-11-14 12:21:08 +08:00
zhangstar333	74e452f19c	[bug](bitmap) fix bitmap value copy operator not call reset (#26451 ) when a empty bitmap assign to other bitmap the other bitmap should reset self firstly, and then set empty type.	2023-11-09 10:05:09 +08:00
Gabriel	6761dc4113	[coverage](test) improve test coverage (#26096 ) improve test coverage	2023-10-30 18:01:55 +08:00
Jerry Hu	cedab51676	[enhancement](UT) add unit test cases about bitmap (#25867 ) * [fix](bitmap) incorrect result of operator == * [enhancement](UT) add unit test cases about bitmap	2023-10-27 11:27:14 +08:00
yujun	2679fa4ea7	[improvement](tablet clone) furthur repair replicas should be check even if they are versions catchup (#25551 )	2023-10-26 18:14:40 +08:00
zhiqiang	e5ef0aa6d4	[refactor](mysql result format) use new serde framework to tuple convert (#25006 )	2023-10-14 19:46:42 +08:00
yujun	73c3e3ab55	[Feature](x-load) support config min replica num for loading data (#21118 )	2023-10-11 21:07:35 +08:00
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
yujun	8679095e5c	[feature](debug) support debug point used in debug code (#24502 )	2023-09-25 17:56:12 +08:00
zhangdong	dbb9365556	[Enhance](ip)optimize priority_ network matching logic for be (#23795 ) Issue Number: close #xxx If the user has configured the wrong priority_network, direct startup failure to avoid users mistakenly assuming that the configuration is correct If the user has not configured p_ n. Select only the first IP from the IPv4 list, rather than selecting from all IPs, to avoid users' servers not supporting IPv4 extends #23784	2023-09-11 18:32:31 +08:00
TengJianPing	2f8b075b71	[improvement](bitmap) support version for ser/deser of bitmap (#23959 )	2023-09-07 09:55:29 +08:00
Pxl	a96adc01aa	[Chore](function) refactor of quantile_state (#23862 ) refactor of quantile_state	2023-09-06 15:39:19 +08:00
zzzzzzzs	765f1b6efe	[Refactor](load) Extract load public code (#22304 )	2023-07-29 12:56:31 +08:00
HHoflittlefish777	9e16c69925	[improvement](compression) support LZ4_HC algorithm and parse LZ4_RAW (#22165 )	2023-07-26 18:23:39 +08:00
Mingyu Chen	4b15185e25	[improvement](hdfs) add parquet footer cache and hdfs file handle cache (#20544 ) 1. Add hdfs file handle cache for hdfs file reader Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks for the Impala team) This is a lru cache that can store multi entries with same key. The key is build with {file name + modification time} The value is the hdfsFile pointer that point to a certain hdfs file. This cache is to avoid reopen same hdfs file mutli time, which can save query time. Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number of file handle cache, default is 20000. 2. Add file meta cache The file meta cache is a lru cache. the key is {file name + modification time}, the value is the parsed file meta info of the certain file, which can save the time of re-parsing file meta everytime. Currently, it is only used for caching parquet file footer. The test show that is cache is hit, the `FileOpenTime` and `ParseFooterTime` is reduce to almost 0 in query profile, which can save time when there are lots of files to read.	2023-06-13 15:13:57 +08:00
Kang	bd9a9a32f5	[bugfix](s3 fs) fix s3 uri parsing for http/https uri (#20656 )	2023-06-11 14:00:04 +08:00
Pxl	7dc7ed97eb	[Chore](build) remove some unused code and remove some wno (#20326 ) remove some unused code about spinlock remove some wno and fix warning remove varadic macro usage	2023-06-05 10:48:07 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
Jerry Hu	e5eed53b89	[improvement](bitmap) Use shared_ptr in BitmapValue to avoid deep copying (#19101 ) Currently bitmapvalue type is copied between columns, it cost a lot of memory. Use a shared ptr in bitmap value to avoid copy data.	2023-05-24 16:13:01 +08:00
ZhangYu0123	1c950d6930	[fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898	2023-05-22 00:32:20 +08:00
Pxl	9b7a419aed	[Chore](build) update some doc about build enviroment (#19325 ) update some doc about build enviroment	2023-05-10 16:18:44 +08:00
Adonis Ling	16a394da0e	[chore](build) Use include-what-you-use to optimize includes (PART III) (#18958 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-24 14:51:51 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: Remove env/ Remove FileUtils/FilesystemUtils Some methods are moved to LocalFileSystem Remove olap/file_cache Add s3 client cache for s3 file system In my test, the time of open s3 file can be reduced significantly Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operation should be in io/fs. Except for tests in #17586, I also tested some case related to fs io: clone concurrency query on local/s3/hdfs load error log create and clean disk metrics	2023-03-29 09:00:52 +08:00
zclllyybb	5191b4f473	[fix](ut)support run be-ut on release mode (#18119 ) Fixed improper usage. So now be ut could be run on release mode. btw, split be build type environment variable to be/be-ut.	2023-03-27 23:00:03 +08:00
Mingyu Chen	cb79e42e5c	[refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586 ) See #17764 for details I have tested: - Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp - Outfile to local/s3/hdfs/broker. - Load from local/s3/hdfs/broker. - Query file on local/s3/hdfs/broker file system, with table value function and catalog. - Backup/Restore with local/s3/hdfs/broker file system Not test: - cold & host data separation case.	2023-03-21 21:08:38 +08:00
yiguolei	4692d6764c	[refactor](remove string val) remove string val structure, it is same with string ref (#17461 ) remove stringval, decimalv2val, bigintval	2023-03-08 10:42:20 +08:00
yiguolei	9477c48ef8	[refactor](functioncontext) remove duplicate type definition in function context (#17421 ) remove duplicate type definition in function context remove unused method in function context not need stale state in vexpr context because vexpr is stateless and function context saves state and they are cloned. remove useless slot_size in all tuple or slot descriptor. remove doris_udf namespace, it is useless. remove some unused macro definitions. init v_conjuncts in vscanner, not need write the same code in every scanner. using unique ptr to manage function context since it could only belong to a single expr context. Issue Number: close #xxx --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-06 16:07:09 +08:00
Adonis Ling	3b94ca5ceb	[chore](macOS) Use LLVM Clang by default (#17292 ) Use LLVM Clang by default	2023-03-03 14:18:02 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) * [improvement](lock raii) use raii to lock and unlock This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 ) enable -Wpedantic and update lowest gcc version to 11.1	2023-02-03 11:28:48 +08:00
HappenLee	7c145faa80	[Enhance] use fast_float::from_chars to do str cast to float/double to avoid lose precision (#16190 )	2023-02-01 23:53:34 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
yiguolei	adb758dcac	[refactor](remove non vec code) remove json functions string functions match functions and some code (#16141 ) remove json functions code remove string functions code remove math functions code move MatchPredicate to olap since it is only used in storage predicate process remove some code in tuple, Tuple structure should be removed in the future. remove many code in collection value structure, they are useless	2023-01-26 16:21:12 +08:00
yiguolei	615a5e7b51	[refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-25 12:53:05 +08:00
yiguolei	6e8eedc521	[refactor](remove unused code) remove storage buffer and orc reader (#16137 ) remove olap storage byte buffer remove orc reader remove time operator remove read_write_util remove aggregate funcs remove compress.h and cpp remove bhp_lib Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-24 22:29:32 +08:00

233 Commits