doris

Author	SHA1	Message	Date
Mingyu Chen	7b3865b524	[fix](ut)(vectorized) fix a potential stack overflow bug and some unit test (#9140 )	2022-04-21 12:17:03 +08:00
Pxl	dda7604e16	[Bug][Storage-vectorized] fix core dump on outer join with not nullable column (#9112 )	2022-04-21 11:02:04 +08:00
camby	a2edc6fd8b	[feature-wip](array-type) replicate impl for ColumnArray to support join with array column (#9070 ) SQL with JOIN and ARRAY columns calls the function ColumnArray::replicate. In this PR, we implement replicate for the ARRAY type, to support SQL like this: `SELECT count(lo_array),count(d_array),SUM(lo_extendedprice*lo_discount) AS REVENUE FROM lineorder, date WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;`	2022-04-20 14:50:34 +08:00
caiconghui	df3a8545dc	[fix](routine_load) Add retry mechanism for routine load task which encounter Broker transport failure (#9067 )	2022-04-20 14:49:58 +08:00
Adonis Ling	bd126f0679	[improvement] Refactor type info for further optimizations. (#8786 ) ## Design: For now, there are two categories of types in Doris, one is for scalar types (such as int, char, etc.) and the other is for composite types (array, etc.). For the sake of performance, we can cache type info of scalar types globally (unique objects) due to the limited number of scalar types. When we consider the composite types, normally, the type info is generated at runtime (we can also use some cache strategy to speed up). The memory thereby should be reclaimed when we create type info for composite types. There are a lot of interfaces to get the type info of a specific type. I reorganized them as described below. 1. `const TypeInfo* get_scalar_type_info(FieldType field_type)` The function is used to get the type info of scalar types. Due to the cache, the caller uses the result WITHOUT considering memory reclamation. 2. `const TypeInfo* get_collection_type_info(FieldType sub_type)` The function is used to get the type info of array types with just ONE depth. Due to the cache, the caller uses the result WITHOUT considering memory reclamation. 3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)` 4. `TypeInfoPtr get_type_info(const TabletColumn* col)` These functions are used to get the type info of BOTH scalar types and composite types. The caller is responsible for managing the resources returned. #### About the new type `TypeInfoPtr` `TypeInfoPtr` is an alias type to `unique_ptr` with a custom deleter. 1. For scalar types, the deleter does nothing. 2. For composite types, the deleter reclaims the memory. By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr: 1. `Field` 2. `ColumnReader` 3. `DefaultValueColumnIterator` Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance. 1. `ScalarColumnWriter` - holds `Field` 1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types. 2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and `BitmapIndexWriter` doesn't support `ArrayType`. 3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types. 2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types). 3. `ColumnVectorBatch` 1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader` 2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader` 3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`	2022-04-20 14:47:29 +08:00
zhannngchen	1b4cd76847	[feature](vectorized)(function) Support min_by/max_by function. (#8623 ) Support min_by/max_by on vectorized engine.	2022-04-20 14:46:19 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
HappenLee	51db4e54c0	[fix](table-function) Fix bug of table function with outer join cause nullptr of tuple (#9041 )	2022-04-18 19:35:26 +08:00
shee	f3dce9a6c1	[fix](planner) fix is-null predicate in where statement cannot be pushed down to the storage layer (#9035 )	2022-04-18 19:35:02 +08:00
Pxl	681f960257	[fix](storage)(vectorized) query get wrong result when read datetime type column (#8872 )	2022-04-18 19:34:06 +08:00
chenlinzhong	afce993ca7	[feature](load)(csv) CSV import and export support header (#8765 ) - Add two new types to stream load and broker load: csv_with_names and csv_with_names_and_types - Add two new types to export: csv_with_names and csv_with_names_and_types	2022-04-18 15:29:18 +08:00
Mingyu Chen	9051ed7c7d	Revert "[Refactor] remove some useless code (#8976 )" (#9074 ) This reverts commit de7dce4df84fcbfbbaf715cbac151e802321f80f. Reverts apache/incubator-doris#8976 This causes the BE unit test to fail: sh run-be-ut.sh --run --filter OlapTableSinkTest.* ``` ==62008==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7ffff36867c0 in thread T0 ```	2022-04-18 12:01:14 +08:00
dataroaring	de7dce4df8	[Refactor] remove some useless code (#8976 )	2022-04-18 09:55:54 +08:00
hongbin	be0ba76dff	[Refactor] Use '#pragma once' to replace '#define' and '#endif' (#9062 )	2022-04-18 09:54:59 +08:00
hongbin	c71ffc01de	[Refactor] Cleanup some unused include (#9063 )	2022-04-18 09:52:31 +08:00
Luwei	d1d834694f	[fix] Fix bug of wrong argument of drop_tablet function (#9031 ) introduced from #8574	2022-04-15 15:19:28 +08:00
Amos Bird	0bf72caf68	[Bug][Vectorized] Fix UB when doing ORDER BY. (#9023 )	2022-04-15 14:02:29 +08:00
Pxl	f7a5ff4f1d	[Enhancement] [Storage Vectorize] optimize BitmapRangeIterator.next_range() (#9013 )	2022-04-15 11:27:03 +08:00
Mingyu Chen	579aee110a	[fix](ut)(compile) Fix BE compile bug and FE unit test (#9027 ) 1. The compile bug is introduced from #8855 2. FE ut bug is introduced from #8848 and #8770	2022-04-14 17:37:41 +08:00
zhangstar333	9ac6d23a44	[Feature]support stddev/variance agg functions to window function (#8962 )	2022-04-14 12:07:26 +08:00
Mingyu Chen	5e95d99925	[fix](load) fix bug of infinite loop in orc scanner (#9007 ) When encountering unqualified data, the orc scanner may not be able to quit correctly.	2022-04-14 11:46:48 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are two status codes in BE: one is common/Status.h, and the other is OLAPStatus in olap/olap_define.h. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify this code into common/Status.	2022-04-14 11:43:49 +08:00
dataroaring	8765881d8b	[fix](load) wait _send_batch_thread_pool_token rather than shutdown. (#8970 ) We cannot shut down _send_batch_thread_pool_token, because _packet_in_flight has to be cleared eventually. Otherwise a never-ending join on rpc would happen. It is difficult to handle concurrency problems if a flag setter is not guaranteed to run.	2022-04-14 10:05:14 +08:00
camby	943b08bcdf	fix master compile error (#8992 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 11:23:37 +08:00
yiguolei	c872793a23	remove rowset converter since it is useless (#8974 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-04-13 10:40:12 +08:00
Zhengguo Yang	5881e8fdc6	[refactor] use c++ 14 deprecated instead of comment, this detects usage of deprecated var or func at compile time (#8439 )	2022-04-13 10:19:04 +08:00
Zhengguo Yang	290366787c	[refactor] refactor code, replace some file with stl libs (#8759 ) 1. replace ConditionVariables with std::condition_variable 2. replace Mutex with std::mutex 3. replace MonoTime with std::chrono	2022-04-13 09:55:29 +08:00
Pxl	64cf64d1f8	remove unused code and opt int_div (#8966 )	2022-04-13 09:51:01 +08:00
camby	52d18aa83c	permute impl for column array; and codes format (#8949 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-04-13 09:47:54 +08:00
dataroaring	0c8ea8ce9f	[Vectorized] Let VAssertRowNumNode handle return value of child->get_next (#8969 )	2022-04-12 19:56:03 +08:00
Zhengguo Yang	5a44eeaf62	[refactor] Unify all unit tests into one binary file (#8958 ) 1. Solve the previous problems that the unit test file size was too large (1.7G+) and the unit test link time was too long 2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins 3. Temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because they re-implement part of the code and affect other tests	2022-04-12 15:30:40 +08:00
Xinyi Zou	66d2f4e1fd	[fix][mem tracker] Fix MemTracker null pointer in vectorized (#8925 ) Fix ThreadMemTrackerMgr::update_tracker null pointer and some details. Issue Number: close #8920	2022-04-12 10:17:10 +08:00
Mingyu Chen	067309c466	[fix](compile) fix compilation bug (#8950 )	2022-04-11 13:12:34 +08:00
Pxl	8a066e2586	[fix](vectorized) core dump on ST_AsText (#8870 )	2022-04-11 09:39:32 +08:00
Mingyu Chen	8158b05ea0	[fix] Fix bug that tablet data size and row num info are failed to report. (#8945 ) Introduced from #8146	2022-04-11 09:38:28 +08:00
Gabriel	7f7172807f	[feature](function)(vectorized) Support all geolocation functions on vectorized engine (#8846 )	2022-04-11 09:36:53 +08:00
Gabriel	0d761f9909	[feature-wip][UDF][DIP-1] Support variable-size input and output for Java UDF (#8678 ) This feature is proposed in DSIP-1. This PR supports variable-length input and output for Java UDF.	2022-04-11 09:36:16 +08:00
zbtzbtzbt	6ed59bb98b	[refactor](code_style) remove useless inline #8933 1.Member functions defined in a class are inline by default (implicitly), and do not need to be added 2.inline is a keyword used for implementation, which has no effect when placed before the function declaration	2022-04-10 18:29:55 +08:00
yiguolei	1fe4ea4c7c	[Refactor-step1] Add OLAPInternalError to status (#8900 )	2022-04-10 00:16:43 +08:00
Lightman	5706679e08	[fix] fix the problem that BE overflows the stack at startup when compiled with TSAN (#8904 ) Currently TSAN builds only work with Clang, not GCC. When compiling with -O0, a stack overflow occurs at startup (issue #8868). A function definition is also reported missing at compile time; the file provided in PR #8665 is required.	2022-04-09 19:17:28 +08:00
HappenLee	ce6b5169c2	[fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition (#8891 )	2022-04-09 19:11:44 +08:00
camby	c5718928df	[feature-wip](array-type) support explode and explode_outer table function (#8766 ) explode(ArrayColumn) desc: > Create a row for each element in the array column. explode_outer(ArrayColumn) desc: > Create a row for each element in the array column. Unlike explode, if the array is null or empty, it returns null. Usage example: 1. create a table with array column, and insert some data; 2. turn on enable_lateral_view and enable_vectorized_engine; ``` set enable_lateral_view = true; set enable_vectorized_engine=true; ``` 3. use explode_outer ``` > select * from array_test; +------+------+--------+ \| k1 \| k2 \| k3 \| +------+------+--------+ \| 3 \| NULL \| NULL \| \| 1 \| 2 \| [1, 2] \| \| 2 \| 3 \| NULL \| \| 4 \| NULL \| [] \| +------+------+--------+ > select k1,explode_column from array_test LATERAL VIEW explode_outer(k3) TempExplodeView as explode_column; +------+----------------+ \| k1 \| explode_column \| +------+----------------+ \| 1 \| 1 \| \| 1 \| 2 \| \| 2 \| NULL \| \| 4 \| NULL \| \| 3 \| NULL \| +------+----------------+ ``` 4. explode usage example. explode returns no rows when the ARRAY is null or empty ``` > select k1,explode_column from array_test LATERAL VIEW explode(k3) TempExplodeView as explode_column; +------+----------------+ \| k1 \| explode_column \| +------+----------------+ \| 1 \| 1 \| \| 1 \| 2 \| +------+----------------+ ```	2022-04-08 12:11:04 +08:00
Mingyu Chen	bd0a3369b7	[fix] check disk capacity before writing data (#8887 ) 1. We forgot to check disk capacity when writing data. 2. TODO: the user specified disk capacity is not used now. We need to find a way to use it. 3. Avoid printing too many compaction logs when there is no suitable version for compaction.	2022-04-08 11:29:49 +08:00
dataroaring	f854f0e83e	remove unreadable char in comment (#8909 )	2022-04-08 09:26:53 +08:00
Pxl	dbbc6549bd	[feature](vectorized) support vexplode_bitmap (#8890 )	2022-04-08 09:20:26 +08:00
Gabriel	3f04220d49	[typo] Fix typo in function.cpp (#8873 )	2022-04-08 09:09:19 +08:00
zbtzbtzbt	0b98d78664	[improvement](hll) Optimize Hyperloglog (#8829 ) At Meituan, PR #6625 was reverted due to an OOM problem. We are now trying to modify the old HyperLogLog; based on PR #8555, we did some work. In our tests, it performs better than the old hll and better than the apache:master hll. Changes summary: - use SIMD max to speed up the heavy function _merge_registers - use phmap::flat_hash_set rather than std::set - replace std::max - other small changes	2022-04-08 09:06:08 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
dataroaring	7fb4b6a6e2	[chore](tsan) add file mremap_fallback for tsan (#8665 )	2022-04-08 09:01:53 +08:00
caiconghui	d51545a952	[fix](ut)(memory-leak) Fix be asan ut failed and hdfs file reader memory leak (#8905 )	2022-04-08 00:07:00 +08:00
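
The `TypeInfoPtr` design described in commit bd126f0679 (#8786) can be sketched roughly as below. This is a minimal illustration and not the Doris implementation: the `TypeInfo` struct, its `is_scalar` flag, `TypeInfoDeleter`, `make_scalar_type_info`, and `make_array_type_info` are all hypothetical stand-ins. The only thing taken from the commit message is the idea of a `unique_ptr` alias whose deleter does nothing for cached scalar type info and frees composite type info.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real TypeInfo class (assumption, not Doris code).
struct TypeInfo {
    bool is_scalar;
    static int live_composites;  // number of heap-allocated composite infos alive
    explicit TypeInfo(bool scalar) : is_scalar(scalar) {
        if (!is_scalar) ++live_composites;
    }
    ~TypeInfo() {
        if (!is_scalar) --live_composites;
    }
};
int TypeInfo::live_composites = 0;

// Custom deleter: a no-op for cached scalar info, a real delete for composite info.
struct TypeInfoDeleter {
    void operator()(const TypeInfo* info) const {
        if (info != nullptr && !info->is_scalar) delete info;
    }
};

using TypeInfoPtr = std::unique_ptr<const TypeInfo, TypeInfoDeleter>;

// Scalar type info lives in a process-wide cache, so callers never free it.
TypeInfoPtr make_scalar_type_info() {
    static const TypeInfo cached(true);
    return TypeInfoPtr(&cached);
}

// Composite type info is built at runtime and owned by the returned pointer.
TypeInfoPtr make_array_type_info() {
    return TypeInfoPtr(new TypeInfo(false));
}
```

With this shape, a class that holds a `TypeInfoPtr` treats both kinds of type info the same way, while classes constructed from it can keep using the raw `const TypeInfo*` obtained through `get()`.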

1982 Commits
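
The difference between `explode` and `explode_outer` described in commit c5718928df (#8766) can be modelled with a small stand-alone sketch. It only mirrors the row semantics shown in the commit message and is not the Doris table-function code: the `Row` and `Exploded` aliases and the `explode_rows` helper are hypothetical, and the output follows input order, which the real engine does not guarantee.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// (k1, k3): k3 is a nullable array column. Hypothetical types for illustration.
using Row = std::pair<int, std::optional<std::vector<int>>>;
// (k1, explode_column): explode_column is nullable.
using Exploded = std::pair<int, std::optional<int>>;

// Emits one output row per array element. For a null or empty array,
// explode (outer == false) emits nothing, while explode_outer (outer == true)
// emits a single row whose explode_column is NULL.
std::vector<Exploded> explode_rows(const std::vector<Row>& rows, bool outer) {
    std::vector<Exploded> out;
    for (const auto& [k1, k3] : rows) {
        if (!k3.has_value() || k3->empty()) {
            if (outer) out.emplace_back(k1, std::nullopt);
            continue;
        }
        for (int v : *k3) out.emplace_back(k1, v);
    }
    return out;
}
```

Fed the four rows of `array_test` from the commit message, this yields two rows for `explode` and five for `explode_outer`, matching the row counts in the example output there.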