doris

Author	SHA1	Message	Date
kangshisen	54aaa8a56a	[doc] update star-schema-benchmark.md (#8565 )	2022-03-22 11:42:10 +08:00
kangshisen	4335c07c35	[doc] update star-schema-benchmark.md (#8564 )	2022-03-22 11:41:45 +08:00
Jibing-Li	9a0a1c693e	[fix] fix NPE in thrift when forwarding stmt to master FE	2022-03-22 11:41:13 +08:00
Pxl	be3d203289	[feature][vectorized] support table function explode_numbers() (#8509 )	2022-03-22 11:38:00 +08:00
yiguolei	989e03ddf9	[improvement] Improve sig handler (#8545 ) * Refactor glog's default signal handler Co-authored-by: Zhengguo Yang <780531911@qq.com>	2022-03-22 10:40:31 +08:00
kangshisen	011985e7e3	fix en broker load (#8566 )	2022-03-21 22:53:51 +08:00
caiconghui	905b9a6289	[fix](lru_cache) fix heap-use-after-free problem for lru cache(#8569 )	2022-03-21 21:23:43 +08:00
Mingyu Chen	04004021b5	[chore] Separate debugging information from BE binaries (#8544 ) Currently, the compiled output of BE mainly consists of two binaries: palo_be and meta_tool, which are both around 1.6G in size. However, the debug information is only needed for debugging purposes, so I separate the debug info from the binaries. After BE is built, the debug info file is saved in the `be/lib/debug_info/` dir, and the sizes of `palo_be` and `meta_tool` decrease to about 100MB. This is optional and disabled by default. To enable it, use: `STRIP_DEBUG_INFO=ON sh build.sh`	2022-03-21 16:33:01 +08:00
Zhengguo Yang	7c1c2b1d17	[chore] fix compile error when use clang as compiler and a be ut problem (#8554 )	2022-03-21 15:38:59 +08:00
yiguolei	337d174c14	[Refactor](schema_change) Remove tablet instances since tablet id is unique between base tablet and new schema change tablet (#8486 )	2022-03-21 12:43:54 +08:00
Zhengguo Yang	f06780249a	fix some fe ut failed (#8547 )	2022-03-21 10:36:06 +08:00
minghong	c772020db4	[fix] fix bug in WindowFunctionLastData::data, it keeps the first data not the last. (#8536 ) WindowFunctionLastData::add should keep the last value, but the current implementation keeps the first one. Apparently, this code was copied from WindowFunctionFirstData::add.	2022-03-21 09:51:56 +08:00
Mingyu Chen	dde50fb2bf	[doc] change http to https in download page (#8546 )	2022-03-20 23:36:17 +08:00
Pxl	fc3ad371c8	[fix](vec) fix regexp_replace get wrong result on clang (#8505 )	2022-03-20 23:11:24 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Dongyang Li	276792daeb	[feature](benchmark) Add TPC-H benchmark tools (#8408 )	2022-03-20 23:06:10 +08:00
ZenoYang	2ec0b81030	[improvement](storage) Low cardinality string optimization in storage layer (#8318 ) Low cardinality string optimization in storage layer	2022-03-20 23:04:25 +08:00
Mingyu Chen	ed47e20eea	[license] Update license for thirdparties (#8537 )	2022-03-19 16:24:27 +08:00
jiafeng.zhang	f91d78bf8d	[doc] fix backup doc (#8529 )	2022-03-19 15:45:45 +08:00
Mingyu Chen	12bd967846	[doc] Fix some typo about spark load and broker load (#8520 ) 1. add hive-bitmap-udf link 2. modify preceding-filter	2022-03-19 15:45:17 +08:00
Mingyu Chen	ef852d6a26	[release] Add download link for flink/spark connector (#8535 ) Add Releases: 1. Flink Connector 1.0.3 2. Spark Connector 1.0.1	2022-03-19 15:44:35 +08:00
Zhengguo Yang	58a4c70fd4	[fix] fix String type compaction or agg may crash when string is null (#8515 )	2022-03-18 11:27:28 +08:00
morrySnow	4da1718147	[fix] memory leak in ResourceTls (#8517 )	2022-03-18 09:42:19 +08:00
jiafeng.zhang	8765759a18	[doc] add flink 1.14 support (#8511 ) flink 1.14 support	2022-03-18 09:41:28 +08:00
yinzhijian	94991864f5	[fix] Fix bug that __set_ missing for thrift optional fields in be (#8507 )	2022-03-18 09:41:06 +08:00
Zhengguo Yang	035ca5240f	[fix] Fix may coredump when check if all rowset is beta-rowset of a tablet (#8503 ) core dump like ``` * Aborted at 1647468467 (unix time) try "date -d @1647468467" if you are using GNU date * PC: @ 0x5555576940b0 doris::OlapScanNode::start_scan_thread() * SIGSEGV (@0x84) received by PID 39139 (TID 0x7ffee8388700) from PID 132; stack trace: * @ 0x555558926212 google::(anonymous namespace)::FailureSignalHandler() @ 0x7ffff753d400 (unknown) @ 0x5555576940b0 doris::OlapScanNode::start_scan_thread() @ 0x555557696e1b doris::OlapScanNode::start_scan() @ 0x55555769737d doris::OlapScanNode::get_next() @ 0x5555570784f5 doris::PlanFragmentExecutor::get_next_internal() @ 0x55555707d24c doris::PlanFragmentExecutor::open_internal() @ 0x55555707e72f doris::PlanFragmentExecutor::open() @ 0x555556ffab95 doris::FragmentExecState::execute() @ 0x555556fff0ed doris::FragmentMgr::_exec_actual() @ 0x5555570088ec std::_Function_handler<>::_M_invoke() @ 0x55555719a099 doris::ThreadPool::dispatch_thread() @ 0x555557193a8f doris::Thread::supervise_thread() @ 0x7ffff72f2ea5 start_thread @ 0x7ffff76058dd __clone @ 0x0 (unknown) ```	2022-03-18 09:39:13 +08:00
Mingyu Chen	b07b840b76	[fix](load) fix bug that BE may crash when calling `mark_as_failed` (#8501 ) 1. The methods in the IndexChannel are called back in the RpcClosure in the NodeChannel. However, this callback may occur after the whole task is finished (e.g. due to network latency), and by that time the IndexChannel may have been destroyed, so we should not call the IndexChannel methods anymore, otherwise the BE will crash. Therefore, we use the `_is_closed` variable and `_closed_lock` to ensure that the RPC callback function will not call the IndexChannel's method after the NodeChannel is closed. 2. Do not add IndexChannel to the ObjectPool, because destroying an IndexChannel may invoke the destructor of NodeChannel, which may be time consuming (it waits for RPCs to finish). The ObjectPool holds a SpinLock while destroying objects, so this may keep the CPU busy.	2022-03-18 09:38:16 +08:00
BrightHewei	ac9acc8e9d	[fix](sample)(cpp) fix the condition of breaking for loop in function (#8497 )	2022-03-18 09:37:48 +08:00
dataroaring	25cdd0be1a	[refactor] CalcPageLenForRow return void rather than always Status::Ok (#8490 ) Thus we can remove branches depending on CalcPageLenForRow.	2022-03-18 09:34:49 +08:00
caiconghui	8470455e0a	[fix](tablet-report) Fix bug that tabletReport function of ReportHandler in fe may throw NullPointerException due to transaction check logic (#8481 )	2022-03-18 09:31:51 +08:00
morrySnow	70fbb3b55c	[test] support running regression test without loading data (#8499 ) Skip loading data in regression tests to avoid loading a large dataset every time.	2022-03-17 10:10:31 +08:00
Pxl	a8af8d2981	[fix](vectorized) fix core dump on get_json_string and add some ut (#8496 )	2022-03-17 10:08:31 +08:00
Zhengguo Yang	848acec584	[chore](dependency) update Croaring for good performance (#8492 ) update Croaring for good performance, according to RoaringBitmap/CRoaring#320	2022-03-17 10:07:55 +08:00
ZenoYang	b537e06ecd	[improvement](vectorized) Make bloom filter predicate run short-circuit logic (#8484 ) The current BloomFilter predicate is evaluated through the vectorized path, but the `evaluate_vec` interface is not implemented, so the RuntimeFilter has no effect after it is pushed down to the storage layer. Because BF predicate computation cannot be automatically vectorized, this change makes BloomFilter run short-circuit logic instead. For SSB Q2.1, `enable_storage_vectorization = true;` ``` test before impl: - Total: 36s164ms - RowsVectorPredFiltered: 0 - RealRuntimeFilterType: bloomfilter - HasPushDownToEngine: true test after impl: - Total: 2s345ms - RowsVectorPredFiltered: 595.247102M (595247102) - RealRuntimeFilterType: bloomfilter - HasPushDownToEngine: true ```	2022-03-17 10:07:30 +08:00
Arthur Yang	30d8089b2f	[fix](partition_cache) Fix Partition Cache NullPointerException bug (#8454 ) Filter the partitions in predicate but not in OlapTable.	2022-03-17 10:04:49 +08:00
Pxl	a824c3e489	[feature](vectorized) support lateral view (#8448 )	2022-03-17 10:04:24 +08:00
dataroaring	aadfbcb9c8	[test] support order qt for sql file and fix exception (#8483 ) We need to order the results of some SQL statements to get stable output.	2022-03-16 17:09:23 +08:00
wangbo	b8e6c3a00c	[fix] fix bitmap wrong result (#8478 ) Fix a bug where bitmap queries return wrong results, even for the simplest query. For example: ``` CREATE TABLE `pv_bitmap_fix2` ( `dt` int(11) NULL COMMENT "", `page` varchar(10) NULL COMMENT "", `user_id_bitmap` bitmap BITMAP_UNION NULL COMMENT "" ) ENGINE=OLAP AGGREGATE KEY(`dt`, `page`) COMMENT "OLAP" DISTRIBUTED BY HASH(`dt`) BUCKETS 2 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" ) Insert any hundreds of rows of data select count(distinct user_id_bitmap) from pv_bitmap_fix2 the result is wrong ``` This is a bug in the vectorization of the storage layer.	2022-03-16 11:39:41 +08:00
HappenLee	d39c021d71	[fix] min function of not null varchar column get error result (#8479 )	2022-03-16 11:38:55 +08:00
camby	3ba4de0d27	[fix](ut) fix some UT compile or run failed cases (#8489 )	2022-03-16 11:38:35 +08:00
Mingyu Chen	2252ff81d7	[fix](dynamic-partition) fix bug that can not set dynamic_partition.replication_allocation property (#8471 )	2022-03-15 11:45:18 +08:00
caiconghui	c666eaadfd	[fix] Fix some mistakes for ReadWriteLock in be (#8464 )	2022-03-15 11:45:00 +08:00
mklzl	30eff9d6e9	[improvement] Update ShowExecutor.java (#8462 ) We have several engines such as mysql, olap, es, hive and so on, so `show engines` should return more details.	2022-03-15 11:44:36 +08:00
dataroaring	c1a195421a	[test] let framework support sql cases and run cases in parallel and random order (#8460 ) We generate groovy files from sql cases and run the generated groovy files. This way, we only need to add sql cases and the framework handles the rest.	2022-03-15 11:44:08 +08:00
zhannngchen	febfe2f09d	[improvement](ut) add unit tests for min/max function, and cleaned up some unused code (#8458 )	2022-03-15 11:43:18 +08:00
Gabriel	7d1d45d6dc	[feature-wip](udf) support java udf in FE (#8437 ) First step to support Java UDF in Doris. After this PR, we can create Java UDFs in Doris. For example, we create a Java UDF with the code below. ``` CREATE FUNCTION test_udf(int) RETURNS int PROPERTIES ( "file"="file:///root/hive-udf-1.0-SNAPSHOT.jar", "symbol"="udf.Main", "type"="JAVA_UDF" ) ``` 1. `file` indicates where the user's file is. 2. `symbol`, for a Java UDF, is the UDF class in this jar. 3. `type` indicates that this function is a Java UDF.	2022-03-15 11:42:39 +08:00
wunan1210	571f0b688d	[improvement] show export support label like (#8202 ) Use `show export where label like 'xxx%'` to list more results.	2022-03-15 11:41:59 +08:00
HappenLee	41a15ccd45	[fix](vectorized) Agg/Unique not null column outer join coredump (#8461 )	2022-03-14 10:52:17 +08:00
caiconghui	991dc7fc5c	[fix][routine-load] fix bug that routine load cannot cancel task when append_data return error (#8457 )	2022-03-14 10:18:14 +08:00
Kang	e807e8b108	[improvement](memory) fix olap table scan and sink memory usage problem (#8451 ) Due to the unlimited queues in OlapScanNode and NodeChannel, memory usage can be very large when reading and writing large tables, e.g. 'insert into tableB select * from tableA'.	2022-03-13 22:12:15 +08:00
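
Note on e807e8b108: the message attributes the memory growth to unlimited queues between the scan and sink sides. The log does not show how the fix works, so the following is only a generic sketch of one common remedy, a bounded queue that makes the producer wait when the consumer falls behind. It is written in Python for brevity and is not Doris code; `run_pipeline` and `max_buffered` are hypothetical names introduced for illustration.

```python
import queue
import threading


def run_pipeline(rows, max_buffered):
    """Copy rows from a producer ("scan") to a consumer ("sink").

    Hypothetical helper, not part of Doris. Returns the rows the sink
    received and the largest queue depth observed by the consumer.
    """
    # A bounded queue: put() blocks once max_buffered items are waiting,
    # so buffered memory is capped no matter how large `rows` is.
    buf = queue.Queue(maxsize=max_buffered)
    done = object()  # sentinel marking the end of the stream
    written = []
    peak = 0

    def scan():
        for row in rows:
            buf.put(row)  # blocks while the queue is full (backpressure)
        buf.put(done)

    producer = threading.Thread(target=scan)
    producer.start()

    while True:
        peak = max(peak, buf.qsize())
        item = buf.get()
        if item is done:
            break
        written.append(item)

    producer.join()
    return written, peak
```

With `max_buffered=4`, the observed depth never exceeds 4 regardless of how many rows are copied, whereas an unbounded queue would grow with the gap between producer and consumer speed.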

4118 Commits