doris

Author	SHA1	Message	Date
Pxl	be3d203289	[feature][vectorized] support table function explode_numbers() (#8509 )	2022-03-22 11:38:00 +08:00
yiguolei	989e03ddf9	[improvement] Improve sig handler (#8545 ) * Refactor glog's default signal handler Co-authored-by: Zhengguo Yang <780531911@qq.com>	2022-03-22 10:40:31 +08:00
caiconghui	905b9a6289	[fix](lru_cache) fix heap-use-after-free problem for lru cache(#8569 )	2022-03-21 21:23:43 +08:00
Mingyu Chen	04004021b5	[chore] Separate debugging information from BE binaries (#8544 ) Currently, the compiled output of BE mainly consists of two binaries: palo_be and meta_tool, which are both around 1.6G in size. However, the debug information is only needed for debugging purposes, so this change separates the debug info from the binaries. After BE is built, the debug info file will be saved in the `be/lib/debug_info/` dir, and the size of `palo_be` and `meta_tool` decreases to about 100MB. This is optional and disabled by default. To enable it, use: `STRIP_DEBUG_INFO=ON sh build.sh`	2022-03-21 16:33:01 +08:00
Zhengguo Yang	7c1c2b1d17	[chore] fix compile error when use clang as compiler and a be ut problem (#8554 )	2022-03-21 15:38:59 +08:00
yiguolei	337d174c14	[Refactor](schema_change) Remove tablet instances since tablet id is unique between base tablet and new schema change tablet (#8486 )	2022-03-21 12:43:54 +08:00
minghong	c772020db4	[fix] fix bug in WindowFunctionLastData::data, it keeps the first data not the last. (#8536 ) WindowFunctionLastData::add should keep the last value, but current implementation keeps the first one. Obviously, this code is copied from WindowFunctionFirstData::add.	2022-03-21 09:51:56 +08:00
Pxl	fc3ad371c8	[fix](vec) fix regexp_replace get wrong result on clang (#8505 )	2022-03-20 23:11:24 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
ZenoYang	2ec0b81030	[improvement](storage) Low cardinality string optimization in storage layer (#8318 ) Low cardinality string optimization in storage layer	2022-03-20 23:04:25 +08:00
Zhengguo Yang	58a4c70fd4	[fix] fix String type compaction or agg may crash when string is null (#8515 )	2022-03-18 11:27:28 +08:00
morrySnow	4da1718147	[fix] memory leak in ResourceTls (#8517 )	2022-03-18 09:42:19 +08:00
yinzhijian	94991864f5	[fix] Fix bug that __set_ missing for thrift optional fields in be (#8507 )	2022-03-18 09:41:06 +08:00
Zhengguo Yang	035ca5240f	[fix] Fix may coredump when check if all rowset is beta-rowset of a tablet (#8503 ) core dump like ``` * Aborted at 1647468467 (unix time) try "date -d @1647468467" if you are using GNU date * PC: @ 0x5555576940b0 doris::OlapScanNode::start_scan_thread() * SIGSEGV (@0x84) received by PID 39139 (TID 0x7ffee8388700) from PID 132; stack trace: * @ 0x555558926212 google::(anonymous namespace)::FailureSignalHandler() @ 0x7ffff753d400 (unknown) @ 0x5555576940b0 doris::OlapScanNode::start_scan_thread() @ 0x555557696e1b doris::OlapScanNode::start_scan() @ 0x55555769737d doris::OlapScanNode::get_next() @ 0x5555570784f5 doris::PlanFragmentExecutor::get_next_internal() @ 0x55555707d24c doris::PlanFragmentExecutor::open_internal() @ 0x55555707e72f doris::PlanFragmentExecutor::open() @ 0x555556ffab95 doris::FragmentExecState::execute() @ 0x555556fff0ed doris::FragmentMgr::_exec_actual() @ 0x5555570088ec std::_Function_handler<>::_M_invoke() @ 0x55555719a099 doris::ThreadPool::dispatch_thread() @ 0x555557193a8f doris::Thread::supervise_thread() @ 0x7ffff72f2ea5 start_thread @ 0x7ffff76058dd __clone @ 0x0 (unknown) ```	2022-03-18 09:39:13 +08:00
Mingyu Chen	b07b840b76	[fix](load) fix bug that BE may crash when calling `mark_as_failed` (#8501 ) 1. The methods in the IndexChannel are called back in the RpcClosure in the NodeChannel. However, this callback may occur after the whole task is finished (e.g. due to network latency), and by that time the IndexChannel may have been destructed, so we should not call the IndexChannel methods anymore, otherwise the BE will crash. Therefore, we use the `_is_closed` variable and `_closed_lock` to ensure that the RPC callback function will not call the IndexChannel's method after the NodeChannel is closed. 2. Do not add IndexChannel to the ObjectPool, because destructing the IndexChannel may trigger the destruction of the NodeChannel, and the destruction of the NodeChannel may be time-consuming (it waits for the RPC to finish). The ObjectPool holds a SpinLock while destroying its objects, so this may keep the CPU busy.	2022-03-18 09:38:16 +08:00
dataroaring	25cdd0be1a	[refactor] CalcPageLenForRow return void rather than always Status::Ok (#8490 ) Thus we can remove branches depending on CalcPageLenForRow.	2022-03-18 09:34:49 +08:00
Pxl	a8af8d2981	[fix](vectorized) fix core dump on get_json_string and add some ut (#8496 )	2022-03-17 10:08:31 +08:00
Zhengguo Yang	848acec584	[chore](dependency) update Croaring for good performance (#8492 ) update Croaring for good performance, according to RoaringBitmap/CRoaring#320	2022-03-17 10:07:55 +08:00
ZenoYang	b537e06ecd	[improvement](vectorized) Make bloom filter predicate run short-circuit logic (#8484 ) The current BloomFilter runs vectorization predicate evaluate, but `evaluate_vec` interface is not implemented, so the RuntimeFilter does not play a role after it is pushed down to the storage layer. And BF predicate computation cannot be automatically vectorized, thus making BloomFilter run short-circuit logic. For SSB Q2.1, `enable_storage_vectorization = true;` ``` test before impl: - Total: 36s164ms - RowsVectorPredFiltered: 0 - RealRuntimeFilterType: bloomfilter - HasPushDownToEngine: true test after impl: - Total: 2s345ms - RowsVectorPredFiltered: 595.247102M (595247102) - RealRuntimeFilterType: bloomfilter - HasPushDownToEngine: true ```	2022-03-17 10:07:30 +08:00
Pxl	a824c3e489	[feature](vectorized) support lateral view (#8448 )	2022-03-17 10:04:24 +08:00
wangbo	b8e6c3a00c	[fix] fix bitmap wrong result (#8478 ) Fix a bug when query bitmap return wrong result, even the simplest query. Such as ``` CREATE TABLE `pv_bitmap_fix2` ( `dt` int(11) NULL COMMENT "", `page` varchar(10) NULL COMMENT "", `user_id_bitmap` bitmap BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(`dt`, `page`) COMMENT "OLAP" DISTRIBUTED BY HASH(`dt`) BUCKETS 2 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ) Insert any hundreds of rows of data select count(distinct user_id_bitmap) from pv_bitmap_fix2 the result is wrong ``` This is a bug of vectorization of storage layer.	2022-03-16 11:39:41 +08:00
HappenLee	d39c021d71	[fix] min function of not null varchar column get error result (#8479 )	2022-03-16 11:38:55 +08:00
camby	3ba4de0d27	[fix](ut) fix some UT compile or run failed cases (#8489 )	2022-03-16 11:38:35 +08:00
caiconghui	c666eaadfd	[fix] Fix some mistakes for ReadWriteLock in be (#8464 )	2022-03-15 11:45:00 +08:00
zhannngchen	febfe2f09d	[improvement](ut) add unit tests for min/max function, and cleaned up some unused code (#8458 )	2022-03-15 11:43:18 +08:00
HappenLee	41a15ccd45	[fix](vectorized) Agg/Unique not null column outer join coredump (#8461 )	2022-03-14 10:52:17 +08:00
caiconghui	991dc7fc5c	[fix][routine-load] fix bug that routine load cannot cancel task when append_data return error (#8457 )	2022-03-14 10:18:14 +08:00
Kang	e807e8b108	[improvement](memory) fix olap table scan and sink memory usage problem (#8451 ) Due to unlimited queue in OlapScanNode and NodeChannel, memory usage can be very large for reading and writing large table, e.g 'insert into tableB select * from tableA'.	2022-03-13 22:12:15 +08:00
awakeljw	705989d239	[improvement](VHashJoin) add probe timer (#8233 )	2022-03-13 20:54:44 +08:00
HappenLee	2c63fc1d6c	[improvement](vectorized) Support BetweenPredicate enable fold const expr (#8450 )	2022-03-13 09:36:24 +08:00
Mingyu Chen	5f8e948125	[fix] BE crash when reporting tablet (#8453 ) this bug was introduced from #8209	2022-03-12 23:12:52 +08:00
Zhengguo Yang	f3c44bcd75	[chore][fix](librdkafka) disable librdkafka assert and update some thirdparty (#8425 ) 1. comment librdkafka `rd_assert(thrd_is_current(rkb->rkb_thread));` to avoid core dump 2. upgrade arrow to 7.0.0 3. upgrade aws sdk to 1.9 4. upgrade orc to 1.7.2	2022-03-12 22:09:06 +08:00
dataroaring	a467e7a790	[refactor][fix] small fixes and code cleanups related to schema change (#8328 ) For now, usage of RowBlockAllocator::allocate is a little complicated due to its ambiguous return value. Some callers just test the return value while others test both the return value and a non-null pointer. This patch lets it return a success code only when it succeeds, so callers can just test the return value.	2022-03-12 22:05:43 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add consume/release cache, trigger a consume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection; 5. Added Virtual MemTracker, consume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
caiconghui	c86d469baf	[Refactor](storage_engine) Use std::shared_mutex to replace RWMutex (#8387 )	2022-03-11 18:14:24 +08:00
Mingyu Chen	ffddebfd1d	[fix](report) fix bug that tablet may already be deleted when reporting (#8444 ) 1. This bug was introduced by #8209. Error in fe.warn.log: ``` java.lang.IllegalStateException: 560278 at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[spark-dpp-0.15-SNAPSHOT.jar:0.15-SNAPSHOT] at org.apache.doris.catalog.TabletInvertedIndex.getReplica(TabletInvertedIndex.java:462) ~[palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.catalog.Catalog.replayBackendReplicasInfo(Catalog.java:6941) ~[palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:626) [palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2446) [palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:116) [palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:74) [palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:0.15-SNAPSHOT] at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:0.15-SNAPSHOT] ``` Since the reporting of a tablet and the deletion of a tablet are two independent events and are not mutually exclusive, it may happen that the tablet is deleted first and the reporting is done later. 2. Change the tablet report info. Now, the version of a tablet reported from BE is the largest continuous version. E.g., for versions [1,2,3,5,7], the reported version of this tablet will be 3.	2022-03-11 17:24:20 +08:00
Mingyu Chen	a76889b319	[improvement] Avoid print large string in error log (#8436 ) 1. Avoid printing large strings in the error log. If a user loads an unqualified large string, the whole string is saved in the error log, so the error log becomes too big to be shown by using `show load warnings on "url"`. Err: `Got packet bigger than 'max_allowed_packet' bytes` 2. Remove duplicate help doc. Do not allow docs with the same title, otherwise an error is thrown when starting FE: `java.lang.IllegalArgumentException: Multiple entries with same key:`	2022-03-11 17:23:47 +08:00
zhangstar333	e0ef9b8f6c	[refactor](vectorized) to_bitmap(-1) return NULL instead of return parse failed error_message (#8373 )	2022-03-11 17:21:47 +08:00
HappenLee	68dd799796	[improvement](vectorized) Support function tuple is null (#8442 )	2022-03-11 16:54:37 +08:00
yiguolei	7cfcddd8df	[fix] brpc will check required field in proto and need_gen_rollup is moved will throw exception (#8420 )	2022-03-11 00:28:33 +08:00
Pxl	bc923b8c63	[fix](vectorized) core dump on runtime filter insert because of block address change (#8429 )	2022-03-10 19:02:19 +08:00
Xinyi Zou	8eec4bf99d	[feature](thread-local) Add thread local variable ThreadContext (#7234 ) The thread context saves some info about a working thread. 1. thread_id: Current thread id, auto generated. 2. type: The type is an enum value indicating which type of task the current thread is running. For example: QUERY, LOAD, COMPACTION, ... 3. task id: A unique id to identify this task, such as a query id or load job id. Compiling a thread_local variable with gcc11 on lower versions of GLIBC reports an error, see https://github.com/apache/incubator-doris/pull/7911 This is very difficult to solve, so the kudu Class-scoped static thread local implementation was introduced. The above problem is solved by combining Thread-scoped thread local + Class-scoped thread local. See the comments for ThreadContextPtr for details.	2022-03-10 09:05:40 +08:00
Zhengguo Yang	2a433136bc	[fix] fix compile error (#8410 ) * [fix] fix compile error	2022-03-09 18:55:37 +08:00
HappenLee	f4663ad2eb	[improvement](vectorized) Merge block in scanner to speed up query with conjunct (#8395 )	2022-03-09 13:11:18 +08:00
Tanya-W	51103dcf6e	[typo] translate the comments of delete_handler.cpp (#8402 )	2022-03-09 13:08:28 +08:00
yiguolei	d880559214	[refactor] remove old schema change code on BE (#8342 )	2022-03-09 13:05:44 +08:00
yiguolei	0ff7de4157	[refactor] remove agent status (#8273 ) There are 3 error code types in BE: OLAPStatus, AgentStatus, and Status. This is confusing, and they sometimes conflict when writing code. I will try to unify them to Status.	2022-03-09 13:04:50 +08:00
Pxl	10c3712aa1	[fix](vectorized) fix arithmetic calculate get wrong result(#8226 )	2022-03-09 13:03:57 +08:00
Mingyu Chen	826467e116	[fix](replica) handle replica version missing info to avoid -214 error (#8209 ) In the original tablet reporting information, the version missing information is done by combining two pieces of information as follows: 1. the maximum consecutive version number 2. the `version_miss` field The logic of this approach is confusing and inconsistent with the logic of checking for missing versions when querying. After the change, we directly use the version checking logic used in the query, and set `version_miss` to true if a missing version is found and on the FE processing side. Originally, only the bad replica information was synchronized among FEs, but not the version missing information. As a result, the non-master FE is not aware of the missing version information. In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new `BackendReplicasInfo` to record replica reporting information and write both bad and version missing information to metadata so that other FEs can synchronize this information.	2022-03-09 13:03:22 +08:00
Pxl	58e85375ca	[fix](vectorized) fix float to string inaccurate (#8392 )	2022-03-08 18:58:52 +08:00
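
Commit ffddebfd1d above states that a tablet now reports its largest continuous version, giving 3 for versions [1,2,3,5,7]. The following is a minimal sketch of that rule only, not the Doris implementation: the function name `max_continuous_version` is hypothetical, versions are modeled as plain integers rather than rowset version ranges, and the lowest version present is assumed to be the start of the run.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Doris API): returns the end of the gap-free run
// that starts at the smallest version, or -1 if no versions are present.
int64_t max_continuous_version(std::vector<int64_t> versions) {
    if (versions.empty()) {
        return -1;
    }
    std::sort(versions.begin(), versions.end());
    int64_t last = versions.front();
    for (std::size_t i = 1; i < versions.size(); ++i) {
        if (versions[i] == last) {
            continue;  // ignore duplicate entries
        }
        if (versions[i] != last + 1) {
            break;  // first gap ends the continuous run
        }
        last = versions[i];
    }
    return last;
}
```

With the versions from the commit message, the run stops at the gap between 3 and 5, so the sketch returns 3.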

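Commit c772020db4 above describes a `WindowFunctionLastData::add` that kept the first value instead of the last. The difference can be illustrated with two simplified, hypothetical state structs; `FirstValueState` and `LastValueState` are stand-ins invented for this sketch and do not mirror the actual Doris classes.

```cpp
#include <optional>

// Hypothetical stand-in for a "first value" window state:
// only the first value seen is stored, later values are ignored.
template <typename T>
struct FirstValueState {
    std::optional<T> value;
    void add(const T& v) {
        if (!value.has_value()) {
            value = v;
        }
    }
};

// Hypothetical stand-in for a "last value" window state:
// every add overwrites the stored value, so the last one wins.
// Copying the guard from FirstValueState::add here would reproduce
// the kind of bug the commit describes.
template <typename T>
struct LastValueState {
    std::optional<T> value;
    void add(const T& v) { value = v; }
};
```

Feeding the same sequence to both states leaves the first element in `FirstValueState` and the final element in `LastValueState`.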

1845 Commits