doris

Author	SHA1	Message	Date
zuochunwei	9cb9781d86	[chore](storage) add STORAGE_LAYER_VECTORIZED_SWITCH (#8005 ) Testing the vectorized storage layer used to require modifying several pieces of code to enable it, which was tedious. Now you only need to redefine the macro STORAGE_LAYER_VECTORIZED_SWITCH as 1 or 0, which is much more convenient.	2022-02-19 11:47:36 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warnings when compiling with clang (#8069 )	2022-02-19 11:29:02 +08:00
zhangstar333	8892780091	[Vectorized][Feature] support agg functions percentile and percentile_approx (#8066 )	2022-02-18 13:42:24 +08:00
HappenLee	bcde1f265a	[Function][Vectorized] Support least/greatest functions (#8107 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-18 11:57:07 +08:00
HappenLee	68b24d608f	[fix](vectorization) Fix incorrect hash values computed for nullable columns (#8105 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-18 11:20:47 +08:00
yiguolei	d383821fd5	[refactor] Remove unused code in data dir (#8092 )	2022-02-18 11:14:02 +08:00
HappenLee	31399d5876	[Bug][Vec] Fix coredump in the vectorized execution engine when a delete condition is present (#8109 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-18 11:09:05 +08:00
wangbo	b9f0b5565c	[refactor](storage) refactor some interfaces of storage layer column (#8064 ) 1. Format binary plain. 2. Remove batch_set_null_bitmap. 3. Fix segiter return value. 4. Set insert_many_binary_data args.	2022-02-18 10:54:51 +08:00
yinzhijian	936da4f10a	[feature](thread-pool) Support thread pool per disk for scanners (#7994 ) Support a thread pool per disk for scanners, to prevent a few disks with high I/O utilization from degrading the performance of the whole pool. Key points: 1. Each disk has its own thread pool for scanners. 2. Whenever the thread pool of one disk runs out of local work, it can take tasks from the pools of other disks, in round-robin order. Performance testing (test case with a disk under high I/O utilization): the vectorized version is 25% faster than a single thread pool; the non-vectorized version is 8% faster.	2022-02-18 09:40:58 +08:00
zuochunwei	a162f56284	(test) Fix failing unit test VGenericIteratorsTest Co-authored-by: zuochunwei <zuochunwei@meituan.com>	2022-02-17 20:03:07 +08:00
awakeljw	bdd78f20c8	[Vectorized][HashJoin] Eliminate hashjoin branch prediction (#8051 ) Co-authored-by: jewisliu <jewisliu@tencent.com>	2022-02-17 19:00:26 +08:00
Pxl	e0dbf48682	[Vectorized] [AggFunction] Support group_concat (#8086 )	2022-02-17 14:19:07 +08:00
HappenLee	f6e2a4fe16	[Vectorized][Function] Support year/month/week/hour/minute/day/second floor/ceil functions (#8068 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-02-17 14:18:02 +08:00
zhangstar333	f8411f3c6a	[refactor](mysql_table_writer) Split into vectorized and row-mode parts (#8081 )	2022-02-17 11:29:25 +08:00
Mingyu Chen	26289c28b0	[fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072 ) 1. Fix a BE crash caused by the destruction order. (close #8058) 2. Add a new BE config `compaction_task_num_per_fast_disk`, which specifies the maximum number of concurrent compaction tasks on a fast disk (typically SSD), so that more compaction tasks can run at the same time on high-speed disks and data is compacted as soon as possible. 3. Avoid repeatedly selecting unqualified tablets for compaction. 4. Adjust some log levels to reduce the BE log size. 5. Modify some clone logic to handle errors correctly.	2022-02-17 10:52:08 +08:00
Pxl	f06c13a828	[feature](vec)(function) support function `convert_tz()` (#8060 )	2022-02-17 10:51:32 +08:00
HappenLee	bef1b55c1f	[feature][fix](vec)(function) Fix multi-argument functions that take DATETIME not working with DATE arguments, and add function aliases (#8050 ) 1. Support the function aliases mod/fmod and adddate/add_data. 2. Support multi-argument forms of the functions week and yearweek. 3. Fix multi-argument functions that take DATETIME not working with DATE arguments.	2022-02-17 10:49:25 +08:00
spaces-x	53f22bbc14	[fix] fix incorrect serialized_size of TDigest object (#8046 )	2022-02-17 10:47:22 +08:00
yiguolei	d1cb2913c1	[improvement] check simd instructions before start (#8042 ) Sometimes BE is built on a machine that supports SIMD instructions such as AVX2, and the binary is then copied to a machine without AVX2, where it crashes without any error message. This PR checks for the required SIMD instructions and prints an error message during startup.	2022-02-17 10:46:03 +08:00
zhangstar333	0003822da7	[feature](vec) add ColumnHLL to support hll type (#7828 )	2022-02-17 10:44:42 +08:00
Pxl	143c4085ee	[Feature][Vectorized] support aggregate function ndv()/approx_count_distinct() (#8044 )	2022-02-16 14:30:13 +08:00
weizuo93	a6bf8c13eb	[Feature](Transaction) Support two phase commit (2PC) for stream load (#7473 ) Two-phase commit works as follows: during a stream load, once the data is written, a response is returned to the client, but the data is not yet visible and the transaction status is PRECOMMITTED. The data becomes visible only after the client triggers COMMIT. 1. To commit the transaction, the user can invoke: curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://fe_host:http_port/api/{db}/_stream_load_2pc or curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://be_host:webserver_port/api/{db}/_stream_load_2pc 2. To abort the transaction, the user can invoke: curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" http://fe_host:http_port/api/{db}/_stream_load_2pc or curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" http://be_host:webserver_port/api/{db}/_stream_load_2pc	2022-02-16 11:55:04 +08:00
zhangstar333	25d64775d1	[Vectorized][Feature] Support INSERT INTO statements for MySQL external tables (#7979 )	2022-02-15 14:58:58 +08:00
Mingyu Chen	884fddbf33	[fix](compatibility) Fix compatibility issue of PRowBatch and some tablet sink bugs (#8000 ) 1. Set both `tuple_offsets` and `new_tuple_offsets` in PRowBatch for compatibility. 2. Set FE config `repair_slow_replica` to false by default, to avoid impacting the load process after upgrading. E.g., if there are only 2 replicas and one of them has a high version count, that replica will be marked bad after the upgrade, and the load process will stop because only 1 replica is alive. 3. Fix a bug where NodeChannel may be blocked at `close_wait()`: the `add_batch_finish` flag was not set after the last RPC finished. 4. Fix an NPE in RoutineLoadScheduler.	2022-02-15 11:23:19 +08:00
yiguolei	a390b766d4	[Improvement] BE can print logs in the foreground when not running in daemon mode (#8031 )	2022-02-14 09:30:12 +08:00
yiguolei	aea3e4e59b	[refactor] Remove version hash from BE and related test in BE (#8027 )	2022-02-14 09:29:27 +08:00
Pxl	64f71ddae3	[fix](be-ut) fix segmentation fault caused by an unaligned int128 address (#8021 )	2022-02-14 09:29:05 +08:00
Adonis Ling	18e2071278	[fix](be-unit-test) Fix memory problems in agg_test.cpp. (#8019 )	2022-02-14 09:23:40 +08:00
yiguolei	7d7e3a39f5	[refactor] Remove snapshot converter and unused Protobuf Definitions (#8026 ) 1. Remove the snapshot converter. 2. Remove unused protobuf definitions. 3. Convert some macros to const variables.	2022-02-12 16:06:04 +08:00
Pxl	b26e7e3c28	[feature](function)(vec) support locate function (#7988 ) * Support function locate in the vectorized engine. * Add unit tests and fix some bugs.	2022-02-12 16:00:37 +08:00
Pxl	64fb8dab39	[feature] (function)(vec) support pmod function (#7977 )	2022-02-12 16:00:11 +08:00
Zhengguo Yang	7a73645eee	[refactor] remove some unused code (#8022 )	2022-02-12 15:17:28 +08:00
yiguolei	6b9cb49779	[Refactor] Remove the plugin folder in BE, since it is unused, it needs the fPIC flag to build, and all fPIC flags will be removed in the future (#8008 )	2022-02-12 12:28:14 +08:00
Pxl	a4e7c76336	[Enhancement] use std::search to replace custom search (#7999 )	2022-02-11 10:47:58 +08:00
wangyongfeng	690b3b7283	[doc] Translate the Chinese comments (#7982 ) Translate the Chinese comments in /be/src/common/config.h into English.	2022-02-10 15:08:45 +08:00
smallhibiscus	2e27827c73	[doc] Add an HTTP interface response example for obtaining the schema of a specified table (#7955 ) 1. Add an HTTP interface response example in table-schema-action.md. 2. Correct typos in error.md. 3. Modify the content of the code comments in text_converter.hpp.	2022-02-10 15:07:28 +08:00
Zhengguo Yang	5029ef46c9	[fix] fix ltrim returning incorrect results in some cases (#7963 ) According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html, for the built-in functions int __builtin_clz (unsigned int x) and int __builtin_ctz (unsigned int x), the result is undefined if x is 0, and these functions return different values under gcc and clang when x is 0. So the case of 0 is now handled separately.	2022-02-09 13:06:37 +08:00
zuochunwei	db20e1f323	[refactor](storage) VGenericIterator to reuse Schema (#7858 ) 1. Reuse Schema to avoid copying, because cloning a Schema creates many sub-Field objects. 2. Call interfaces provided by Block to reduce the amount of code.	2022-02-09 13:06:03 +08:00
Pxl	0553ce2944	[feature](vectorization) support function topn and remove some unused code (#7793 )	2022-02-09 13:05:31 +08:00
Mingyu Chen	3048ce8a4f	[improvement][refactor](vec) Refactor serde of vec block and use brpc attachment (#7939 ) This PR mainly changes: 1. Change the definition of PBlock. The new PBlock consists of a set of PColumnMeta and a binary buffer. The PColumnMeta records the metadata of all columns in the Block, while the buffer stores the serialized binary data of all columns. 2. Refactor the serialize/deserialize methods of data types. Rewrite `serialize()/deserialize()` of IDataType, and add a new method `get_uncompressed_serialized_bytes()` to get the total length of the uncompressed serialized data of a column. 3. Rewrite the serialize/deserialize methods of Block. Now, when serializing a Block to PBlock, it first gets the total length of the uncompressed serialized data of all columns in the Block, then allocates the memory and writes the serialized data to the buffer. 4. Use brpc attachment to transmit the serialized column data.	2022-02-08 11:11:42 +08:00
HappenLee	ef233701b3	[feature](vec)(load) Support vtablet sink to enable insert into by using vec query engine (#7957 ) Support vtablet sink to enable INSERT INTO queries in the vectorized query engine.	2022-02-08 11:04:09 +08:00
HappenLee	505acae931	[fix](vectorization) ensure memory addresses used in aggregation are properly aligned before use (#7960 )	2022-02-08 10:05:03 +08:00
caoliang-web	8fcae0f0f4	[refactor] Modify the content of code comments (#7950 ) Co-authored-by: caol <caol@shuhaisc.com>	2022-02-08 09:55:46 +08:00
Zhengguo Yang	f8d086d87f	[feature](rpc) (experimental) Support implementing UDFs through the gRPC protocol. (#7519 ) Support implementing UDFs through the gRPC protocol. This brings several benefits: 1. The UDF implementation language is not limited to C++; users can implement UDFs in any language they are familiar with. 2. UDFs are decoupled from Doris: a UDF cannot cause a Doris coredump, UDF computing resources are separated from Doris, and Doris services are not affected. However, RPC UDFs have a fixed overhead, so they are much slower than C++ UDFs, especially when the amount of data is large. Create a function like: ``` CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES ( "SYMBOL"="add_int", "OBJECT_FILE"="127.0.0.1:9999", "TYPE"="RPC" ); ``` The function service needs to implement the `check_fn` and `fn_call` methods. Note: THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!	2022-02-08 09:25:09 +08:00
Adonis Ling	03f5fc2b0b	[chore] Make build setting BUILD_META_TOOL optional. (#7948 )	2022-02-07 16:06:42 +08:00
HappenLee	9eb1d1df27	[fix](vec) fix block mem use-after-free bug in agg table read (#7944 )	2022-02-06 00:34:38 +08:00
HappenLee	51abaa89f3	[fix](vec) Fix some bugs about vec engine (#7884 ) 1. Memory leak in vcollector iter. 2. Slow queries with LIMIT 10 on aggregate tables. 3. Slow queries for SSB q4, q5, q6.	2022-02-03 19:21:17 +08:00
Mingyu Chen	c0e59e59aa	[fix][refactor] fix bugs and refactor some code by lint (#7871 ) 1. Fix some `passedByValue` issues. 2. Fix some `dereferenceBeforeCheck` issues. 3. Fix some `uninitMemberVar` issues. 4. Fix some iterator `eraseDereference` issues. 5. Fix compile issues introduced by #7923 #7905 #7848	2022-02-01 14:31:14 +08:00
Mingyu Chen	82f421a019	[fix](brpc-attachment) Fix bug that may cause BE crash when enabling `transfer_data_by_brpc_attachment` (#7921 ) This PR mainly changes: 1. Fix a bug when `transfer_data_by_brpc_attachment` is enabled. In `data_stream_sender`, a serialized PRowBatch is sent to multiple Channels. If `transfer_data_by_brpc_attachment` was enabled, the data in PRowBatch was mistakenly cleared after the PRowBatch was sent to the first Channel, so the following Channels could not receive the correct data, causing an error. A separate buffer is now used instead of `tuple_data` in PRowBatch to store the serialized data, and it is reused across multiple Channels. 2. Fix a bug where the offset in a serialized row batch may overflow. Use int64 instead of int32 offsets, and for compatibility, add a new field `new_tuple_offsets` in PRowBatch.	2022-02-01 08:51:16 +08:00
zuochunwei	4e783afa7a	[feature] add Generic debug timer for debugging or profiling (#7923 ) Add a group of generic debug timers for profiling or testing. Unlike the specifically named timers, these can be used for any custom purpose.	2022-01-31 22:15:43 +08:00
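
Commit 936da4f10a (#7994) describes per-disk scanner pools where an idle pool takes work from other disks round-robin. The sketch below only models that lookup order (local queue first, then the other disks starting at the next one); the class and method names are hypothetical, and it does not show how Doris binds threads to disks or schedules scanners.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch, not the Doris implementation: one task queue per disk.
// A worker bound to `disk` drains its own queue first; when that queue is
// empty it probes the other disks' queues round-robin, starting at disk + 1.
class PerDiskScanQueues {
public:
    using Task = std::function<void()>;

    explicit PerDiskScanQueues(std::size_t num_disks) : queues_(num_disks) {}

    void submit(std::size_t disk, Task task) {
        std::lock_guard<std::mutex> lock(queues_[disk].mu);
        queues_[disk].tasks.push_back(std::move(task));
    }

    // Returns the next task for a worker of `disk`, or std::nullopt when every queue is empty.
    std::optional<Task> next_task(std::size_t disk) {
        const std::size_t n = queues_.size();
        for (std::size_t k = 0; k < n; ++k) {
            Queue& q = queues_[(disk + k) % n];  // k == 0 is the local queue
            std::lock_guard<std::mutex> lock(q.mu);
            if (!q.tasks.empty()) {
                Task task = std::move(q.tasks.front());
                q.tasks.pop_front();
                return task;
            }
        }
        return std::nullopt;
    }

private:
    struct Queue {
        std::mutex mu;           // guards `tasks`
        std::deque<Task> tasks;  // pending scan tasks for one disk
    };
    std::vector<Queue> queues_;
};
```

Keeping one lock per queue means workers of different disks only contend when one of them is taking work from another disk's queue.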
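
Commit a6bf8c13eb (#7473) describes a stream load transaction that waits in PRECOMMITTED until the client sends `txn_operation:commit` or `txn_operation:abort`. A toy model of those transitions follows. The type and member names are hypothetical, and rejecting a second operation on a finished transaction is an assumption that the commit message does not state.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical model of the 2PC stream load flow, not Doris code.
enum class TxnStatus { PRECOMMITTED, COMMITTED, ABORTED };

struct StreamLoadTxn {
    // After the data is written, the transaction waits in PRECOMMITTED.
    TxnStatus status = TxnStatus::PRECOMMITTED;

    // Loaded data is visible only once the client has committed.
    bool visible() const { return status == TxnStatus::COMMITTED; }

    // `op` mirrors the txn_operation header value: "commit" or "abort".
    // Assumption: a finished transaction cannot be committed or aborted again.
    void apply(const std::string& op) {
        if (status != TxnStatus::PRECOMMITTED) {
            throw std::logic_error("transaction is no longer PRECOMMITTED");
        }
        if (op == "commit") {
            status = TxnStatus::COMMITTED;
        } else if (op == "abort") {
            status = TxnStatus::ABORTED;
        } else {
            throw std::invalid_argument("unknown txn_operation: " + op);
        }
    }
};
```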
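
The ltrim fix in 5029ef46c9 (#7963) hinges on `__builtin_ctz(0)` being undefined. The sketch below shows where that case arises in a chunked scan. It is hypothetical and not the Doris code: a scalar loop builds the "non-space" bitmask that a SIMD version would get from a movemask instruction, and an all-space chunk (mask 0) is handled before the builtin is called. It assumes a GCC or Clang compiler.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical chunked ltrim, not the Doris implementation. For each chunk of up
// to 16 bytes, bit i of `mask` is set when byte i is not a space; the first
// non-space byte is then located with __builtin_ctz (GCC/Clang builtin).
std::string_view ltrim_chunked(std::string_view s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        const std::size_t n = std::min<std::size_t>(16, s.size() - pos);
        unsigned int mask = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (s[pos + i] != ' ') mask |= 1u << i;
        }
        // __builtin_ctz(0) is undefined, so an all-space chunk is handled separately.
        if (mask == 0) {
            pos += n;
            continue;
        }
        return s.substr(pos + static_cast<std::size_t>(__builtin_ctz(mask)));
    }
    return s.substr(s.size());  // empty view: the input was all spaces
}
```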
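
Commit 3048ce8a4f (#7939) serializes a Block by first summing each column's uncompressed serialized size, then allocating once and writing every column into that buffer. The sketch below illustrates that two-pass pattern with a hypothetical fixed-width column. It omits the PColumnMeta metadata, compression, and the brpc attachment, and its byte layout (row count, then raw values) is an assumption for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-width column standing in for a real column type.
struct Int32Column {
    std::vector<int32_t> data;

    // Plays the role of get_uncompressed_serialized_bytes(): the exact number
    // of bytes serialize() will write (a row count plus the raw values).
    std::size_t uncompressed_serialized_bytes() const {
        return sizeof(uint64_t) + data.size() * sizeof(int32_t);
    }

    // Writes the row count followed by the values; returns the advanced pointer.
    char* serialize(char* buf) const {
        const uint64_t rows = data.size();
        std::memcpy(buf, &rows, sizeof(rows));
        buf += sizeof(rows);
        if (!data.empty()) {
            std::memcpy(buf, data.data(), data.size() * sizeof(int32_t));
        }
        return buf + data.size() * sizeof(int32_t);
    }
};

// Two passes: sum the per-column sizes, allocate the buffer once, then let
// each column write into it back to back.
std::string serialize_block(const std::vector<Int32Column>& columns) {
    std::size_t total = 0;
    for (const auto& column : columns) total += column.uncompressed_serialized_bytes();
    std::string buffer(total, '\0');
    char* out = buffer.data();
    for (const auto& column : columns) out = column.serialize(out);
    return buffer;
}
```

Knowing the exact total up front avoids repeated reallocation while the columns are written.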


1729 Commits