doris

Author	SHA1	Message	Date
Pxl	bfa6bc3b0a	[fix](function) fix aggregate function min() at type varchar (#7437 )	2021-12-24 21:27:01 +08:00
Pxl	6d1cf599f8	[fix] DCHECK fail at BitmapValue getSizeInBytes (#7430 )	2021-12-24 21:23:58 +08:00
shee	3ba6dcf236	[fix](function) fix round function for inaccuracy (#7421 )	2021-12-24 21:23:11 +08:00
Pxl	ff5a0e98b0	[improvement](planner) make BinaryPredicate do not cast date to datetime/varchar (#7045 )	2021-12-24 21:22:43 +08:00
pengxiangyu	20ef8a6e21	[feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098 ) First, we need a parameter that describes whether the data is local or remote. Then, we need some basic functions to support operations on remote storage.	2021-12-22 22:58:23 +08:00
Mingyu Chen	97749ed85b	[community][chore] Modify .asf.yaml and fix BE build warning (#7439 )	2021-12-21 11:06:12 +08:00
Zhengguo Yang	2d72c039ad	[deps](openssl) upgrade openssl to 1.1.1m (#7446 ) Upgrade openssl to 1.1.1m in preparation for supporting the SM2 / SM3 / SM4 national commercial cryptography algorithms.	2021-12-21 10:09:36 +08:00
Mingyu Chen	f6e598dca2	Revert "[improvement](reader) optimize for single rowset reading (#7351 )" (#7427 ) Reverts apache/incubator-doris#7351. That commit causes wrong results with agg tables. For example, for an agg table `(k1, k2, v1 sum)` with a single non-overlapping rowset, `select count(k1) from tbl1;` should use `_direct_agg_key_next_row` instead of `_agg_key_next_row`. Otherwise it returns fewer rows than expected (because `_agg_key_next_row` will only do aggregation with `k1`).	2021-12-19 18:31:11 +08:00
caiconghui	06c38ce46e	[enhancement] Make concurrent_number for routine load task can be larger than be num (#7386 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2021-12-17 11:04:29 +08:00
Xinyi Zou	7d4da7af5c	[fix](rpc) fix BE crash in SendRpcResponse when high concurrency (#7413 ) The response is accessed when done->Run is called in transmit_data(), give response a default value to avoid null pointers in high concurrency.	2021-12-16 20:27:24 +08:00
Mingyu Chen	0499b2211b	[feat](lateral-view) Support execution of lateral view stmt (#7255 ) 1. Add table function node 2. Add 3 table functions: explode_split, explode_bitmap and explode_json_array	2021-12-16 10:46:15 +08:00
Mingyu Chen	2b90967c4c	[fix][refactor](broker load) refactor the scheduling logic of broker load (#7371 ) 1. Refactor the scheduling logic of broker load. For details, see #7367. 2. Fix a bug where loadedBytes in the SHOW LOAD result is wrong. 3. Cancel the LoadTimeoutChecker thread. PENDING load jobs now have no timeout; the timeout of a load job starts when its pending load task is scheduled. 4. Fix a bug where the loading task is never submitted to the pool. The logic of BlockedPolicy was wrong: we should make sure the task is submitted to the pool, or else a RejectedExecutionException should be thrown. 5. The transaction of a load job now begins in the pending task instead of when the job is submitted.	2021-12-16 10:39:22 +08:00
HappenLee	4afdcdb939	[performance](reader) Opt the unique reader to reduce unnecessary compare and function call (#7348 )	2021-12-16 10:36:43 +08:00
zhoubintao	85521944dd	[refactor](olap-scan-node) Refactor olap scannode (#7131 ) 1. Delete useless variables 2. Add const modifier for read-only function 3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule: [https://en.cppreference.com/w/cpp/language/rule_of_three] 4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function. Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override	2021-12-16 10:33:41 +08:00
Zhengguo Yang	926540c561	[feature] Support return bitmap/hll data in select statement (#7276 ) Support returning bitmap/hll data in select statements. This can be used when show_object_data=true is set.	2021-12-15 09:48:27 +08:00
EmmyMiao87	d9c927fdc6	[improvement](log)(schema change) Add a clear memory description in the log (#7378 ) If memory exceeds the limit when BE generates a materialized view or performs a schema change, a more detailed log message about the limit and the configuration is printed.	2021-12-14 15:56:50 +08:00
HappenLee	4e02109926	[refactor][fix](constants-fold) Refactor the code of fold constant mgr and fix some undefined behavior and mem leak (#7373 ) 1. Fix some memory leaks 2. Remove redundant and invalid code 3. Fix some buggy writes to reduce extra memory copies and return null pointers to string 4. Reframing the naming to make the structure clearer	2021-12-14 15:53:56 +08:00
Dayue Gao	414c5a8b5a	[fix] LRUCache::prune_if may not remove all the entries matching the predicate (#7383 ) Co-authored-by: gaodayue <gaodayue@bytedance.com>	2021-12-13 21:09:47 +08:00
SleepyBear	e0889aee1e	[typo](load) correct the error of ‘EtlJobMgr::get_job_status’ function (#7353 )	2021-12-11 16:54:25 +08:00
GoGoWen	5745adb26c	[improvement](reader) optimize for single rowset reading (#7351 ) Read a single rowset without doing aggregation when reading all columns; otherwise use `_agg_key_next_row`.	2021-12-11 16:53:56 +08:00
thinker	80c11da3df	[refactor] modify the implements of Tuple & RowBatch (#7319 ) Code refactor: improve readability and avoid const_cast. 1. Make loops simpler and clearer by using range-based for loops, which are safer than the old loop style 2. Iterate over _row_desc.tuple_descriptors() by index only, instead of mixing index and iterator 3. Add a new function To cast_to(From from) that uses union-based casting between two types to replace reinterpret_cast; the new cast is more readable 4. Avoid using the same variable name in nested loops, which is dangerous 5. Add the const keyword to member functions, following CppCoreGuidelines	2021-12-09 22:36:37 +08:00
Mingyu Chen	db57c42c83	[improvement](compaction)(tablet repair) Add missing rowsets in compaction status url and support force dropping redundant replica (#7283 ) 1. Add missing rowsets in compaction status url 2. Add a new config `force_drop_redundant_replica` to force drop redundant replicas. 3. Fix FE ut	2021-12-09 22:34:57 +08:00
weizuo93	6f91741628	[Bug]Fix BE coredump when manual compaction task is triggered (#7260 ) * fix compaction action bug Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-12-08 17:10:34 +08:00
Zhengguo Yang	62d12067aa	[feature](udf) make orthogonal bitmap udaf as built-in functions (#7211 ) Move the orthogonal bitmap UDAFs into built-in functions. Add three built-in bitmap functions: - orthogonal_bitmap_intersect - orthogonal_bitmap_intersect_count - orthogonal_bitmap_union_count	2021-12-07 09:57:26 +08:00
thinker	f9be31d4bc	[refactor](rowbatch) make RowBatch better (#7286 ) 1. Add the const keyword to RowBatch's read-only member functions 2. Use member objects rather than pointers to member objects wherever possible	2021-12-06 10:31:43 +08:00
thinker	8a6528a2fb	[fix](executor) set the length of StringValue to 0 when it is null (#7284 ) The ptr and len of a tuple's String Slot are not assigned appropriately on the send side, so the receive side may crash in some situations. Detailed description: on the send side, when RowBatch::serialize(PRowBatch* output_batch) is called to pack a RowBatch, Tuple::deep_copy() is called. Only String Slots that are not null get ptr and len set to proper values; null String Slots keep their original state, so ptr may point anywhere and len may hold an unexpected value. On the receive side, unpacking is done by RowBatch::RowBatch(const RowDescriptor&, const PRowBatch&...), which converts each String Slot's offset to a valid string_val->ptr whether the String Slot is null or not. However, some business logic depends on string_val->len=0, such as AggregateFuncTraits::init(), and HyperLogLog::deserialize() returns correctly if slice.size<=0. So if string_val->len is set to 0 on the send side, everything is fine; otherwise the server may crash. From a network communication viewpoint, it is the sender's responsibility to transfer correct data with proper values, without making any assumptions about how the receive side will use it.	2021-12-06 10:30:26 +08:00
HappenLee	d3316ff567	[performance](function) Support SIMD function in some string function (#7236 ) Support SIMD in some string functions: ltrim, rtrim, trim, reverse, hex	2021-12-06 10:24:26 +08:00
Xinyi Zou	fc9e502b51	[improvement](brpc)(config) Support transfer RowBatch in Controller Attachment (#7164 ) Transfer RowBatch in Protobuf Request to Controller Attachment, when the maximum length of the RowBatch in the Protobuf Request is exceeded. This can avoid reaching the upper limit of the Protobuf Request length (2G), and it is expected that performance can be improved.	2021-12-02 11:41:38 +08:00
xinghuayu007	dd36ccc3bf	[feature](storage-format) Z-Order Implement (#7149 ) Support sort data by Z-Order: ``` CREATE TABLE table2 ( siteid int(11) NULL DEFAULT "10" COMMENT "", citycode int(11) NULL COMMENT "", username varchar(32) NULL DEFAULT "" COMMENT "", pv bigint(20) NULL DEFAULT "0" COMMENT "" ) ENGINE=OLAP DUPLICATE KEY(siteid, citycode) COMMENT "OLAP" DISTRIBUTED BY HASH(siteid) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "data_sort.sort_type" = "ZORDER", "data_sort.col_num" = "2", "in_memory" = "false", "storage_format" = "V2" ); ```	2021-12-02 11:39:51 +08:00
Zhengguo Yang	d8ba6e3eb6	1. Fix an error where fetching a string-type field may cause a malformed packet error. (#7262 ) This is because the constant MAX_PHYSICAL_PACKET_LENGTH in FE should be 2^24 - 1, but it was set to 2^24 - 2 by mistake. 2. Fix bitmap_to_string failing when the result is larger than 2G	2021-12-01 10:02:34 +08:00
Mingyu Chen	6c4aeab06f	[fix](broker-load) BE may crash when using preceding filter in broker or routine load (#7193 ) The broker scan node has two tuple descriptors: One is dest tuple and the other is src tuple. The src tuple is used to read the lines of the original file, and the dest tuple is used to save the converted lines. The preceding filter is executed on the src tuple, so src tuple descriptor should be used to initialize the filter expression	2021-11-30 22:04:05 +08:00
HappenLee	91a3150910	[fix](reader) Fix the bug that reader call _capture_rs_readers function twice (#7224 )	2021-11-26 10:17:33 +08:00
曹建华	948a2a738d	[performance] Improve DeltaWriter's performance. (#7216 ) 1. Support batch write for DeltaWriter. 2. Use mutex instead of SpinLock.	2021-11-26 10:15:27 +08:00
Hao Tan	a1bf2878c0	[feat-opt](json-function) optimize get_json_xx function (#7157 ) Avoid repeatedly parsing the JSON string if the first parameter of the function is constant.	2021-11-26 10:12:55 +08:00
Pxl	2445f10868	[fix](bitmap-function) fix core dump at some bitmap function (#7221 )	2021-11-25 22:52:50 +08:00
Zhengguo Yang	c9e578032b	Optimize the bitmap count function by using the roaring cardinality method, which is faster than the current version (#7151 )	2021-11-24 14:42:48 +08:00
HappenLee	fb5adaf18e	[fix](mem-tracker) Fix mem limit -1 in partition aggregate node (#7181 ) Make error message more clear.	2021-11-24 10:43:35 +08:00
Pxl	3fcb3db57a	[fix](vectorized-engine) fix core when enable_vectorized_engine open (#7159 )	2021-11-24 10:42:12 +08:00
Pxl	a74fdf184c	[refactor](be) refactor predicate function creator (#7054 ) Refactor predicate function creator, make MinMaxFunction/HybridSet/BloomFilter use a unified interface through template to get function.	2021-11-24 10:39:29 +08:00
Zhengguo Yang	d420ff0afd	Display current load bytes to show load progress (#7134 ) This value may be greater than the file size when loading a parquet or orc file, and less than the file size when loading a csv file.	2021-11-24 10:08:32 +08:00
Zhengguo Yang	e2d3d0134e	Add a method to get doris current memory usage (#6979 ) Add a check of all memory usage when calling TryConsume	2021-11-24 10:07:54 +08:00
Xinyi Zou	ad0d2b82ab	[fix](memory) fix bug that ~BitShufflePageDecoder destroys uninitialized chunk (#7172 ) Added a safe way to destroy Chunk.	2021-11-23 15:24:25 +08:00
xy720	836c95c2ca	[feat](memory-track) Print peak memory use of all backend after query in audit log (#7030 ) Add a new field `peakMemoryBytes` in fe.audit.log	2021-11-22 14:46:08 +08:00
thinker	fcd4f0b5c2	[fix](profile) fix some bugs about ReportProfile on BE (#7144 ) 1. Setting _report_thread_active to false does not need to be protected by _report_thread_lock, because _report_thread_active is a bool, and writing data is safe across threads if its size <= machine word length 2. The report_profile thread may terminate early: in report_profile(), the while (_report_thread_active) loop may exit if _report_thread_active is false, since the thread calling open() may be scheduled out between _report_thread_started_cv.wait(l) and _report_thread_active = true. We should not make assumptions about how much time elapses between two schedulings of a thread	2021-11-20 21:43:57 +08:00
Mingyu Chen	a81f4da4e4	[feat](minidump) Add minidump support (#7124 ) Now minidump file will be created when BE crashes. And user can manually trigger a minidump by sending SIGUSR1 to BE process. More details can be found in minidump.md documents	2021-11-20 21:41:26 +08:00
Zhengguo Yang	52ebb3d8f5	[feat](mysql-compatibility) Increase compatibility with mysql (#7041 ) 1. Added two system tables, files and partitions 2. Improved the return logic of mysql error codes to make them more compatible with mysql 3. Added the lock/unlock tables statements and the show columns statement for compatibility with mysqldump 4. Compatible with the mysqldump tool: you can now use mysqldump to dump data and table structure from doris. For now, mysqldump may print an error message like ``` $ mysqldump -h127.0.0.1 -P9130 -uroot test_query_qa > a mysqldump: Error: 'errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `EXTRA`' when trying to dump tablespaces ``` This error message does not affect the export file; you can add `--no-tablespaces` to avoid this error	2021-11-20 21:39:37 +08:00
Xinyi Zou	f5a35c28e9	[Optimize] [Memory] BitShufflePageDecoder use memory allocated by ChunkAllocator instead of Faststring (#6515 ) BitShufflePageDecoder reuses the memory for storing decoder results and allocates memory directly from the `ChunkAllocator`, which improves performance to a certain extent. In the case of #6285, the total time consumption is reduced by 13.5%, the time consumption ratio of `~Reader()` is reduced from 17.65% to 1.53%, and memory allocation is unified in `ChunkAllocator` for centralized management, which is conducive to subsequent memory optimization. This can avoid the memory waste caused by `Mempool`, because the chunk can be freed at any time, but the performance is lower than allocation from `Mempool`. The guess is that, without `Mempool`'s secondary allocation of large chunks, a large number of small chunks are requested directly from `ChunkAllocator`, and more time is spent locking in `pop_free_chunk` and `push_free_chunk` (but this is not proven from the flame graphs of BE's cpu and contention).	2021-11-17 11:20:21 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr with std::shared_ptr 2. replace all boost::scoped_ptr with std::unique_ptr 3. replace all boost::scoped_array with std::unique_ptr<T[]> 4. replace all boost::thread with std::thread	2021-11-17 10:18:35 +08:00
Zhengguo Yang	4bc5ba8819	Mark the load job as failed when more than half of a tablet's replicas fail to write (#7126 ) Previously, the check was whether more than half of all replicas had failed to write.	2021-11-17 10:18:04 +08:00
Mingyu Chen	dcad6ff5e5	[License] Add License header for missing files (#7130 ) 1. Add License header for missing files 2. Modify the spark pom.xml to correct the location of `thrift`	2021-11-16 18:37:54 +08:00
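
The per-tablet rule described in commit 4bc5ba8819 (#7126) can be illustrated with a small sketch. This is not the Doris code: `TabletWriteResult`, `tablet_failed` and `load_should_fail` are hypothetical names, and the sketch assumes each tablet reports its replica count and its number of failed replica writes.

```cpp
#include <vector>

// Hypothetical per-tablet summary; not a Doris type.
struct TabletWriteResult {
    int total_replicas;   // replicas the tablet has
    int failed_replicas;  // replicas whose write failed
};

// A tablet has lost its majority when more than half of its replicas failed.
inline bool tablet_failed(const TabletWriteResult& t) {
    return 2 * t.failed_replicas > t.total_replicas;
}

// Assumed rule: the load job fails as soon as any single tablet loses its
// majority, regardless of how the other tablets did.
inline bool load_should_fail(const std::vector<TabletWriteResult>& tablets) {
    for (const auto& t : tablets) {
        if (tablet_failed(t)) {
            return true;
        }
    }
    return false;
}
```

With three tablets of three replicas each, two failed writes on one tablet fail the job under this rule, even though only two of the nine replicas overall failed.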
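
The arithmetic behind commit d8ba6e3eb6 (#7262) can be checked in a few lines. In the MySQL wire protocol a packet's payload length is a 3-byte field, so the largest payload of one physical packet is 2^24 - 1 bytes, and a payload of exactly that size must be followed by another (possibly empty) packet. The names `kMaxPhysicalPacketLength` and `packets_needed` below are hypothetical C++ stand-ins for illustration, not the FE's Java code.

```cpp
#include <cstdint>

// Largest payload of one physical MySQL packet: 2^24 - 1 bytes (0xFFFFFF).
// Hypothetical counterpart of the FE constant MAX_PHYSICAL_PACKET_LENGTH.
constexpr std::uint32_t kMaxPhysicalPacketLength = (1u << 24) - 1;

static_assert(kMaxPhysicalPacketLength == 16777215u, "2^24 - 1");
static_assert(kMaxPhysicalPacketLength == 0xFFFFFFu, "three bytes, all ones");

// Number of physical packets needed for a logical payload. A payload that is
// an exact multiple of the maximum still needs a trailing shorter packet, so
// the receiver can tell where the payload ends.
constexpr std::uint64_t packets_needed(std::uint64_t payload_len) {
    return payload_len / kMaxPhysicalPacketLength + 1;
}
```

A sender that splits at 2^24 - 2 instead produces a first packet whose length field does not signal a continuation, which is consistent with the malformed packet error the commit message describes.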

1550 Commits