doris

Author	SHA1	Message	Date
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
jacktengg	201cd207f9	[Enhancement][Vectorized] Improve hash table build efficiency (#9250 ) 1. MAP_POPULATE is missing for mmap in Allocator, because macro OS_LINUX is not defined in allocator.h; 2. MAP_POPULATE has no effect for mremap as for mmap, zero-fill enlarged memory range explicitly to pre-fault the pages	2022-04-29 14:26:33 +08:00
ZenoYang	ce7905e983	[fix](vectorized) Query get wrong result when ColumnDict concurrent predicate eval (#9270 )	2022-04-29 11:45:04 +08:00
ZenoYang	2fa19113ab	[fix](profile) Short-circuit and del predicate filter rows are not counted on vectorized exec (#9268 )	2022-04-29 10:45:48 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709 ) (#9280 ) Implement vectorized stream load. Added fe configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
wangbo	48222f1fb0	[fix](storage)bloom filter support ColumnDict (#9167 ) bloom filter support ColumnDict(#9167)	2022-04-28 20:03:26 +08:00
xy720	2ec0b98787	[fix](routine-load) Fix bug that new coming routine load tasks are rejected all the time and report TOO_MANY_TASK error (#9164 ) ``` CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat WITH APPEND PROPERTIES ( "desired_concurrent_number"="2", "max_batch_interval" = "20", "max_batch_rows" = "400000", "max_batch_size" = "314572800", "format" = "json", "max_error_number" = "0" ) FROM KAFKA ( "kafka_broker_list" = "xxxx:xxxx", "kafka_topic" = "nat_nsq", "property.kafka_default_offsets" = "2022-04-19 13:20:00" ); ``` In the create statement example above, the user didn't specify custom partitions, so (1) FE gets all Kafka partitions from the server in the routine load's scheduler. The user set the default offset by datetime, so (2) FE gets the Kafka offset by time from the server in the routine load's scheduler. When 1 succeeds but 2 fails, the progress of this routine load may not contain any partitions and offsets. Nevertheless, since newCurrentKafkaPartition, which is fetched from the Kafka server, may always be equal to currentKafkaPartitions, the wrong progress will never be updated.	2022-04-27 23:21:17 +08:00
Xinyi Zou	26bc462e1c	[feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145 ) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: `731730da85/docs/en/server.md (bthread-local)` 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.	2022-04-27 20:34:02 +08:00
Zhengguo Yang	597115c305	[feature] add `SHOW TABLET STORAGE FORMAT` stmt (#9037 ) Use this stmt to show the tablet storage format in BE; if verbose is set, it shows the storage format of each tablet in detail. e.g. ``` MySQL [(none)]> admin show tablet storage format; +-----------+---------+---------+ \| BackendId \| V1Count \| V2Count \| +-----------+---------+---------+ \| 10002 \| 0 \| 2867 \| +-----------+---------+---------+ 1 row in set (0.003 sec) MySQL [test_query_qa]> admin show tablet storage format verbose; +-----------+----------+---------------+ \| BackendId \| TabletId \| StorageFormat \| +-----------+----------+---------------+ \| 10002 \| 39227 \| V2 \| \| 10002 \| 39221 \| V2 \| \| 10002 \| 39215 \| V2 \| \| 10002 \| 39199 \| V2 \| +-----------+----------+---------------+ 4 rows in set (0.034 sec) ``` Add storage format information to the show full tables statement. ``` MySQL [test_query_qa]> show full tables; +-------------------------+------------+---------------+ \| Tables_in_test_query_qa \| Table_type \| StorageFormat \| +-------------------------+------------+---------------+ \| bigtable \| BASE TABLE \| V2 \| \| test_dup \| BASE TABLE \| V2 \| \| test \| BASE TABLE \| V2 \| \| baseall \| BASE TABLE \| V2 \| \| test_string \| BASE TABLE \| V2 \| +-------------------------+------------+---------------+ 5 rows in set (0.002 sec) ```	2022-04-27 10:53:43 +08:00
zhannngchen	87fc46f84c	update comments in run-be-ut.sh (#9092 )	2022-04-26 12:48:35 +08:00
SleepyBear	47a59c7fe6	[fix](OlapScanner)fix bitmap or hll's OOM when loading too many unqualified data (#9205 )	2022-04-26 10:25:56 +08:00
Pxl	951c2a90eb	[fix](Lateral-View)(Vectorized) core dump on lateral-view with nullable column (#9191 )	2022-04-26 10:24:11 +08:00
Userwhite	555cc0dfce	[fix] fix sequence bug in non-vec mode (#9184 )	2022-04-26 10:15:59 +08:00
Mingyu Chen	7cfebd05fd	[fix](hierarchical-storage) Fix bug that storage medium property change back to SSD (#9158 ) 1. fix bug described in #9159 2. fix a `fill_tuple` bug introduced from #9173	2022-04-26 10:15:19 +08:00
camby	88115ffcb3	[feature-wip](array-type) ArrayFileColumnIterator bug fix (#9114 )	2022-04-26 09:35:46 +08:00
yiguolei	3bdfcde8e8	[Improvement] not print logs to fe.out when fe is running under daemon mode (#9195 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-04-25 18:29:29 +08:00
Gabriel	b81f49b0d3	[BUG] fix compiling bug for java udf (#9161 )	2022-04-25 10:02:01 +08:00
SleepyBear	c3d0fee01b	[fix](broker load) sync the workflow of BrokerScanner to other Scanner to avoid oom (#9173 )	2022-04-25 10:01:42 +08:00
Pxl	2d83167e50	[Feature] [Lateral-View] support outer combinator of table function (#9147 )	2022-04-24 12:09:40 +08:00
pengxiangyu	e157c2c254	[feature-wip](remote-storage) step3: Support remote storage, only for be, add migration_task_v2 (#8806 ) 1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action 2. Change TabletManager::create_tablet() for remote storage 3. Change TabletManager::try_delete_unused_tablet_path() for remote storage	2022-04-22 22:38:10 +08:00
Zhengguo Yang	ae680b4248	[UDF] support RPC udaf part 1: support create RPC udaf in fe (#8510 )	2022-04-21 17:38:58 +08:00
Mingyu Chen	7b3865b524	[fix](ut)(vectorized) fix a potential stack overflow bug and some unit test (#9140 )	2022-04-21 12:17:03 +08:00
Pxl	dda7604e16	[Bug][Storage-vectorized] fix core dump on outer join with not nullable column (#9112 )	2022-04-21 11:02:04 +08:00
camby	a2edc6fd8b	[feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070 ) SQL with JOIN and ARRAY columns calls the function ColumnArray::replicate. In this PR, we implement replicate for the ARRAY type to support SQL like this: `SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`	2022-04-20 14:50:34 +08:00
caiconghui	df3a8545dc	[fix](routine_load) Add retry mechanism for routine load task which encounter Broker transport failure (#9067 )	2022-04-20 14:49:58 +08:00
Adonis Ling	bd126f0679	[improvement] Refactor type info for further optimizations. (#8786 ) ## Design: For now, there are two categories of types in Doris, one is for scalar types (such as int, char, etc.) and the other is for composite types (array, etc.). For the sake of performance, we can cache the type info of scalar types globally (unique objects) due to the limited number of scalar types. For composite types, the type info is normally generated at runtime (we can also use some cache strategy to speed this up). The memory should therefore be reclaimed when we create type info for composite types. There are lots of interfaces to get the type info of a specific type. I reorganized them as described below. 1. `const TypeInfo* get_scalar_type_info(FieldType field_type)` The function is used to get the type info of scalar types. Due to the cache, the caller uses the result WITHOUT considering memory reclamation. 2. `const TypeInfo* get_collection_type_info(FieldType sub_type)` The function is used to get the type info of array types with just ONE depth. Due to the cache, the caller uses the result WITHOUT considering memory reclamation. 3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)` 4. `TypeInfoPtr get_type_info(const TabletColumn* col)` These functions are used to get the type info of BOTH scalar types and composite types. The caller is responsible for managing the returned resources. #### About the new type `TypeInfoPtr` `TypeInfoPtr` is an alias type for `unique_ptr` with a custom deleter. 1. For scalar types, the deleter does nothing. 2. For composite types, the deleter reclaims the memory. By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr: 1. `Field` 2. `ColumnReader` 3. `DefaultValueColumnIterator` Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance. 1. `ScalarColumnWriter` - holds `Field` 1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types. 2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and `BitmapIndexWriter` doesn't support `ArrayType`. 3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types. 2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types). 3. `ColumnVectorBatch` 1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader` 2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader` 3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`	2022-04-20 14:47:29 +08:00
zhannngchen	1b4cd76847	[feature](vectorized)(function) Support min_by/max_by function. (#8623 ) Support min_by/max_by on vectorized engine.	2022-04-20 14:46:19 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
HappenLee	51db4e54c0	[fix](table-function) Fix bug of table function with outer join cause nullptr of tuple (#9041 )	2022-04-18 19:35:26 +08:00
shee	f3dce9a6c1	[fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035 )	2022-04-18 19:35:02 +08:00
Pxl	681f960257	[fix](storage)(vectorized) query get wrong result when read datetime type column (#8872 )	2022-04-18 19:34:06 +08:00
chenlinzhong	afce993ca7	[feature](load)(csv) CSV import and export support header (#8765 ) - Add two new types to stream load and broker load: csv_with_names and csv_with_names_and_types - Add two new types to export: csv_with_names and csv_with_names_and_types	2022-04-18 15:29:18 +08:00
Mingyu Chen	9051ed7c7d	Revert "[Refactor] remove some useless code (#8976 )" (#9074 ) This reverts commit de7dce4df84fcbfbbaf715cbac151e802321f80f. Reverts apache/incubator-doris#8976 This causes the BE ut to fail: sh run-be-ut.sh --run --filter OlapTableSinkTest.* ``` ==62008==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffff36867c0 in thread T0 ```	2022-04-18 12:01:14 +08:00
dataroaring	de7dce4df8	[Refactor] remove some useless code (#8976 )	2022-04-18 09:55:54 +08:00
hongbin	be0ba76dff	[Refactor] Use '#pragma once' to replace '#define' and '#endif' (#9062 )	2022-04-18 09:54:59 +08:00
hongbin	c71ffc01de	[Refactor] Cleanup some unused include (#9063 )	2022-04-18 09:52:31 +08:00
Luwei	d1d834694f	[fix] Fix bug of wrong argument of drop_tablet function (#9031 ) introduced from #8574	2022-04-15 15:19:28 +08:00
Amos Bird	0bf72caf68	[Bug][Vectorized] Fix UB when doing ORDER BY. (#9023 )	2022-04-15 14:02:29 +08:00
Pxl	f7a5ff4f1d	[Enhancement] [Storage Vectorize] optimize BitmapRangeIterator.next_range() (#9013 )	2022-04-15 11:27:03 +08:00
Mingyu Chen	579aee110a	[fix](ut)(compile) Fix BE compile bug and FE unit test (#9027 ) 1. The compile bug is introduced from #8855 2. FE ut bug is introduced from #8848 and #8770	2022-04-14 17:37:41 +08:00
zhangstar333	9ac6d23a44	[Feature]support stddev/variance agg functions to window function (#8962 )	2022-04-14 12:07:26 +08:00
Mingyu Chen	5e95d99925	[fix](load) fix bug of infinite loop in orc scanner (#9007 ) When encountering unqualified data, the orc scanner may not be able to quit correctly.	2022-04-14 11:46:48 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status codes in BE: one is common/Status.h, and the other, in olap/olap_define.h, is called OLAPStatus. OLAPStatus is just an enum type; it is very simple and cannot carry much information. I will unify this code into common/Status.	2022-04-14 11:43:49 +08:00
dataroaring	8765881d8b	[fix](load) wait _send_batch_thread_pool_token rather than shutdown. (#8970 ) We cannot shut down _send_batch_thread_pool_token, because _packet_in_flight has to be cleared eventually. Otherwise a never-ending join on rpc would happen. It is difficult to handle concurrency problems if a flag setter is not guaranteed to run.	2022-04-14 10:05:14 +08:00
camby	943b08bcdf	fix master compile error (#8992 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 11:23:37 +08:00
yiguolei	c872793a23	remove rowset converter since it is useless (#8974 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-04-13 10:40:12 +08:00
Zhengguo Yang	5881e8fdc6	[refactor] use c++ 14 deprecated instead of comment, this detects usage of deprecated var or func at compile time (#8439 )	2022-04-13 10:19:04 +08:00
Zhengguo Yang	290366787c	[refactor] refactor code, replace some file with stl libs (#8759 ) 1. replace ConditionVariables with std::condition_variable 2. replace Mutex with std::mutex 3. replace MonoTime with std::chrono	2022-04-13 09:55:29 +08:00
Pxl	64cf64d1f8	remove unused code and opt int_div (#8966 )	2022-04-13 09:51:01 +08:00
camby	52d18aa83c	permute impl for column array; and codes format (#8949 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 09:47:54 +08:00
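
The `TypeInfoPtr` design described in commit bd126f0679 above (a `unique_ptr` alias whose deleter does nothing for cached scalar types and reclaims memory for composite types) can be sketched roughly as follows. This is a minimal illustration, not the Doris implementation: the `TypeInfo` struct, `TypeInfoDeleter`, `make_scalar_ptr`, `make_array_ptr`, and `example` are hypothetical stand-ins, and `get_scalar_type_info` here takes no arguments, unlike the signature listed in the commit message.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for TypeInfo; the real class is much larger.
struct TypeInfo {
    std::string name;
    bool is_scalar;
};

// Assumed deleter: frees composite type info only; cached scalar
// objects are shared and must never be deleted by a caller.
struct TypeInfoDeleter {
    bool owns = false;
    void operator()(const TypeInfo* p) const {
        if (owns) delete p;
    }
};

using TypeInfoPtr = std::unique_ptr<const TypeInfo, TypeInfoDeleter>;

// Scalar type info is cached as a single global object.
const TypeInfo* get_scalar_type_info() {
    static const TypeInfo int_info{"int", true};
    return &int_info;
}

// Wrap the cached scalar object: the deleter is a no-op.
TypeInfoPtr make_scalar_ptr() {
    return TypeInfoPtr(get_scalar_type_info(), TypeInfoDeleter{false});
}

// Composite type info is created at runtime: the deleter reclaims it.
TypeInfoPtr make_array_ptr(const TypeInfo* item) {
    return TypeInfoPtr(new TypeInfo{"array<" + item->name + ">", false},
                       TypeInfoDeleter{true});
}

// Both kinds are used uniformly through TypeInfoPtr.
void example() {
    TypeInfoPtr scalar = make_scalar_ptr();
    TypeInfoPtr array = make_array_ptr(scalar.get());
    assert(scalar.get() == get_scalar_type_info());
    assert(array->name == "array<int>");
}  // array is freed here; the cached scalar object is left untouched
```

Classes that only borrow type info from an owner can keep using a raw `const TypeInfo*` obtained via `.get()`, which matches the commit's note that only `Field`, `ColumnReader`, and `DefaultValueColumnIterator` need to hold a `TypeInfoPtr`.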


1910 Commits