doris

Author	SHA1	Message	Date
ZhangYu0123	d4688620e9	[opt](array) optimize array_sortby using qsort instead of bubble sort #18311	2023-04-03 17:10:51 +08:00
Gabriel	368a2f7ace	[Bug](decimal) Fix string to decimal (#18282)	2023-04-03 15:30:48 +08:00
Xin Liao	6677841b7e	[fix](merge-on-write) fix that failed to capture_consistent_rowsets when revise tablet meta (#18283) _timestamped_version_tracker should be modified first, before capture_consistent_rowsets is called, when updating the delete bitmap in revise_tablet_meta.	2023-04-03 13:02:34 +08:00
Liqf	961f5d1bb7	[feature](function) Add St_Angle/St_Azimuth function (#18293) Add St_Angle/St_Azimuth functions: St_Angle: takes three points that represent two intersecting lines and returns the angle between these lines. Point 2 and point 1 represent the first line, and point 2 and point 3 represent the second line. The angle is in radians, in the range [0, 2pi), and is measured clockwise from the first line to the second line. ` mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1)); +----------------------------------------------------------------------+ | st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) | +----------------------------------------------------------------------+ | 4.71238898038469 | +----------------------------------------------------------------------+ 1 row in set (0.04 sec) ` St_Azimuth: takes two points and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured clockwise from the line from point 1 facing true North to the line segment from point 1 to point 2. ` mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0)); +----------------------------------------------------+ | st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) | +----------------------------------------------------+ | 1.5707963267948966 | +----------------------------------------------------+ 1 row in set (0.04 sec) `	2023-04-03 13:01:59 +08:00
Pxl	e77833bfa1	[Bug](materialized-view) fix where clause persistence replay incorrect (#18228) Fix incorrect replay of the persisted where clause.	2023-04-03 12:49:01 +08:00
zhangstar333	94e3472050	[bug](function) fix count equal function return incorrect value (#18200) Fix the count equal function returning an incorrect value.	2023-04-03 11:20:36 +08:00
TengJianPing	7cd8f7c9ba	[fix](grouping) fix coredump of grouping function for outer join (#18292) The results of the grouping and grouping_id functions are never nullable, but an outer join converts the result column to nullable when necessary. This causes a mismatch between the column type and the column object when the grouping and grouping_id functions are executed.	2023-04-03 09:35:31 +08:00
Xin Liao	b66e9f8906	[fix](load) handle null map right in OlapDataConvertor (#18236) The offsets of _nullmap and _value are inconsistent in OlapDataConvertor, so the null flag obtained when calling the get_data_at function is incorrect. When the key column or sequence column has null values, the encoding of the short key index or primary key index may be wrong. This was introduced by #10883 and #10925.	2023-04-03 09:14:05 +08:00
Xinyi Zou	4b914c196a	[fix](expr pushdown) Fix VRuntimeFilterWrapper cannot get children #18289	2023-04-03 09:09:52 +08:00
Yongqiang YANG	419aa4f12a	[fix](thrift_server) do not check started state in ThriftServer::join (#18314) `started` may be set to false when the server thread is stopped.	2023-04-02 19:24:41 +08:00
slothever	97aab138aa	[fix](parquet-reader) reset value idx in bool rle decoder and support iceberg datetime(3) (#18245) 1. Fix the value idx in the bool RLE decoder. 2. Support datetimev2(3) for Iceberg tables. Previously, Hive timestamps were converted to datetimev2(0) by default.	2023-04-01 21:00:01 +08:00
Xinyi Zou	5e7ea5e305	[fix](memory) Fix `bthread_setspecific` log fatal on UBSAN build (#18274)	2023-03-31 19:46:53 +08:00
Mingyu Chen	7e61a85331	[refactor](libhdfs) introduce hadoop libhdfs (#18204) 1. Introduce Hadoop libhdfs. 2. On the Linux x86 platform, use Hadoop libhdfs. 3. On other platforms, use libhdfs3, because Hadoop libhdfs binaries are not currently available for them. Co-authored-by: adonis0147 <adonis0147@gmail.com>	2023-03-31 18:41:39 +08:00
yiguolei	a77921d767	[refactor](typesystem) remove unused rpc common file and using function rpc (#18270) rpc common is a duplicate: all of its methods are included in function rpc, so it is removed. get_field_type is never used, so it is removed as well. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-31 18:13:25 +08:00
Jerry Hu	22a705543b	[fix](string_ref) Incorrect result caused by the improperly comparing of StringRef on macOS with Apple silicon or using non-avx2 #18264 On macOS systems with Apple silicon, the '==' operator of StringRef uses string_compare, which treats a StringRef as a null-terminated C string.	2023-03-31 15:11:11 +08:00
Xin Liao	c3e2269c4c	[fix](merge-on-write) fix that missed rows don't match merged rows for base compaction (#18262)	2023-03-31 15:06:51 +08:00
yiguolei	1027abe0d3	[enhancement](query exec) should print error status when query meet error (#18247) If the BE is under heavy load, the query may fail. The BE then tries to connect to the FE over Thrift, and if the FE is also under heavy load, that Thrift connection fails as well. The status is then overwritten at line 342, and the actual reason for the query failure is lost. The error status should be printed on every update. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-31 14:08:24 +08:00
yongkang.zhong	1c2f95b887	[improve](clickhouse jdbc) support clickhouse jdbc 4.x version (#18258) In version 4.x of the ClickHouse JDBC driver, some UInt types use special Java types, so Doris's ClickHouse JDBC external support was adapted to handle them: ``` com.clickhouse.data.value.UnsignedByte; com.clickhouse.data.value.UnsignedInteger; com.clickhouse.data.value.UnsignedLong; com.clickhouse.data.value.UnsignedShort; ```	2023-03-31 13:40:10 +08:00
gitccl	20b3bdb000	[vectorized](function) support array_first_index function (#18175) mysql> select array_first_index(x->x+1>3, [2, 3, 4]); +-------------------------------------------------------------------+ | array_first_index(array_map([x] -> x(0) + 1 > 3, ARRAY(2, 3, 4))) | +-------------------------------------------------------------------+ | 2 | +-------------------------------------------------------------------+ mysql> select array_first_index(x -> x is null, [null, 1, 2]); +----------------------------------------------------------------------+ | array_first_index(array_map([x] -> x(0) IS NULL, ARRAY(NULL, 1, 2))) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+ mysql> select array_first_index(x->power(x,2)>10, [1, 2, 3, 4]); +---------------------------------------------------------------------------------+ | array_first_index(array_map([x] -> power(x(0), 2.0) > 10.0, ARRAY(1, 2, 3, 4))) | +---------------------------------------------------------------------------------+ | 4 | +---------------------------------------------------------------------------------+	2023-03-31 12:51:29 +08:00
Pxl	307170030c	[Bug](materialized-view) fix core dump when create mv have case different with base table (#18206) Fix a core dump when creating a materialized view whose letter case differs from that of the base table.	2023-03-31 12:32:09 +08:00
zhangstar333	1b2aaab2f2	[vectorized](bug) fix some case in enable fold constant (#17997) Fix some cases when constant folding is enabled.	2023-03-31 11:41:31 +08:00
zclllyybb	f800ba8f4c	[Exec](opt) Optimize function call for const columns (#18212)	2023-03-31 11:36:21 +08:00
lihangyu	35bae25568	[Improve](row store) add more profile info in log for point query and make row column page size more configurable (#18181) Saves about 20% of FE CPU cost for point queries with prepared statements on a table that contains 100 columns.	2023-03-31 10:58:59 +08:00
camby	7d92bf095a	[fix](expr) refactor create_tree_from_thrift to avoid stack overflow (#18214)	2023-03-31 10:38:20 +08:00
Kang	4e1e0ce06d	[bugfix](topn) fix topn optimization wrong result for NULL values (#18121) 1. Add PassNullPredicate to fix wrong topn results for NULL values. 2. Refactor RuntimePredicate to avoid using TCondition. 3. Refactor to use ordering_exprs in FE and vsort_node.	2023-03-31 10:01:34 +08:00
HappenLee	8be43857ef	[feature](executor) Add memory limit for pip_scanner_context (#18238) Co-authored-by: wangbo <506340561@qq.com>	2023-03-31 09:36:57 +08:00
Xinyi Zou	e5793249cd	[opt](hashtable) Modify default filled strategy to 75% (#18242)	2023-03-31 09:28:11 +08:00
lihangyu	e0f6083e73	[refactor](dynamic table) add `get_type_as_tprimitive_type` and `get_type_as_primitive_type` in IDataType to get `PrimitiveType` and `TPrimitiveType` (#18260)	2023-03-31 09:03:06 +08:00
Ashin Gau	d6b0fe9072	[feature](jni) jni table scanner framework (#17960) A framework that reads data from a JNI scanner, which can support data sources from the Java ecosystem (Java API). ## Java Interface A Java scanner should extend `org.apache.doris.jni.JniScanner` and implement the following methods: ``` // Initialize JniScanner public abstract void open() throws IOException; // Close JniScanner and release resources public abstract void close() throws IOException; // Scan data and save as vector table public abstract int getNext() throws IOException; ``` See demo usage in `org.apache.doris.jni.MockJniScanner`. ## C++ Interface A C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. ## Pushed-down predicates A Java scanner can get pushed-down predicates through `org.apache.doris.jni.vec.ScanPredicate`. ## Remaining work: 1. Implement complex nested types. 2. Read a Hudi MOR table as the end-to-end demo.	2023-03-30 23:47:45 +08:00
HappenLee	1d2dbe7898	[Bug][Pipeline] Run clickbench dead lock in pipeline exec engine (#18211) Running ClickBench on the pipeline execution engine may deadlock on some queries.	2023-03-30 21:41:57 +08:00
Mingyu Chen	1050df7076	[fix](fs) fix local file system copy bug (#18243) `copy_dirs` has a bug that causes infinite iteration.	2023-03-30 21:36:07 +08:00
amory	ea41d94582	[Improve](complex-type) Support Count(complexType) (#17868) Support the count function for ARRAY/MAP/STRUCT types.	2023-03-30 15:43:32 +08:00
huanghaibin	e3bd812887	[fix](stream-load) find line delimiter in csv should start with no offset (#18161) When a big file with a multi-byte line delimiter is loaded, some line records may be incomplete because of _output_buf_limit. The incomplete data is moved to the beginning of the output buffer, and more data is read into the output buffer. In this case, the line delimiter search should start with no offset, to avoid a bug that treats two lines as one.	2023-03-30 14:42:34 +08:00
Gabriel	b7af110f61	[Bug](bloomfilter) Fix bloom filter for date type (#18205)	2023-03-30 14:15:06 +08:00
zhangstar333	525f15dddf	[vectorized](function) support array_sortby function (#18071)	2023-03-30 11:07:49 +08:00
TengJianPing	9877143210	[fix](like) fix wrong result of like pattern with backslash (#18039) The query select * from person where address like '%\\\\%'; returns an empty result, but MySQL returns one row. CREATE TABLE `person` ( `id` int(11) NULL, `name` text NULL, `age` int(11) NULL, `class` int(11) NULL, `address` text NULL ) ENGINE=OLAP UNIQUE KEY(`id`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`id`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2", "disable_auto_compaction" = "false" ); insert into person values (10001,'test1',30,2,'test\\\\,xxx'); With added logs: select * from person where address like '%\\\\%'; I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30 I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3 I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2 I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2 I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0 I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1 I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\% I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \% The pattern is matched against LIKE_ENDS_WITH_RE, which is wrong. The query should match strings that contain a backslash anywhere.	2023-03-30 11:05:09 +08:00
yiguolei	a1114d46e8	[refactor](unify type system) remove switch case in histogram helper (#18222)	2023-03-30 10:54:08 +08:00
Lijia Liu	2ee1468576	[improvement](executor) Support task group schedule in pipeline engine (#17615)	2023-03-30 10:49:50 +08:00
Adonis Ling	f9c4542d04	[chore](build) Porting to Clang-16 (#18196) This PR ports the codebase to Clang-16 and upgrades some third-party libraries: 1. Apache BRPC: 1.2.0 -> 1.4.0 (some bugs are fixed, and all patches for 1.2.0 can be removed) 2. Boost: 1.73.0 -> 1.81.0 (porting to Clang-16) 3. libclucene: 2.4.6 -> 2.4.8 (porting to Clang-16)	2023-03-30 10:36:29 +08:00
yiguolei	3094815f8f	[enhancement](profile) add blocks produced profile to track if output block is very small (#18217)	2023-03-30 09:51:03 +08:00
Xinyi Zou	01d012bab7	[fix](memory) Remove page cache regular clear, disabled jemalloc prof by default (#18218) 1. Remove the periodic page cache clear. The page cache is now turned off by default. If the user manually enables the page cache, the user can be assumed to accept its memory usage, and adding a manual clear command for the cache can then be considered. 2. Fix memory GC cancel top memory query. 3. jemalloc prof is not enabled by default.	2023-03-30 09:39:37 +08:00
TengJianPing	3b04d42779	[fix](bitmap) fix bug: orthogonal_bitmap_union_count coredump when arg is nullable (#18182) This query causes a BE coredump: select ORTHOGONAL_BITMAP_UNION_COUNT( cast(null as bitmap)) from t;	2023-03-30 09:31:58 +08:00
yiguolei	21895abfe7	[bugfix](buffercontrolblock) many query becomes very slow in 1.2.3 (#18229) The predicate used in the wait is wrong: it should not check the cancelled state. Example profile: VDataBufferSender (dst_fragment_instance_id=-39f306bf41e3bafb--5dc95f12d4afdcdb): - AppendBatchTime: 7s50ms - ResultRendTime: 7s5ms - TupleConvertTime: 41.829ms - NumSentRows: 38.114K (38114) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-30 08:54:38 +08:00
Xinyi Zou	6964d9f99c	[fix](function) resubmit-fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17907) * Revert "[fix](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420)" This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d. * [fix-resubmit](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420) Bug: with the ECB algorithm, block_encryption_mode did not take effect. It only took effect when an init vector was provided. Fix: 192/256 now supports calculation without an init vector, and for other algorithms an error should be reported when there is no initialization vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: this fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-29 21:13:01 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820) Arena can replace MemPool in most scenarios, except for memory reuse: MemPool supports reusing previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion: below 128M, Arena grows chunks exponentially (by a factor of 2); above 128M, it allocates 128M * n > `size`, where n is the minimum value that satisfies the expression. Below 512K, MemPool grows chunks exponentially (by a factor of 2); above 512K, it separately allocates a chunk of exactly `size` bytes. Once Arena has allocated a chunk larger than 128M, the minimum chunk allocated after that is 128M, which may waste memory. MemPool is similar: once a 512K chunk has been allocated, the minimum chunk of subsequent allocations is 512K. 2. Alignment: MemPool uses 16-byte alignment by default, because the memtable and other places that use int128 require 16-byte alignment; Arena has no default alignment. 3. Memory reuse: Arena only supports `rollback`, which reuses memory of the current chunk, usually the memory requested last time. MemPool supports clear(), after which all chunks can be reused, or ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no free memory, it searches for the chunk with the most free space for the allocation. 4. Realloc: Arena supports reallocating contiguous memory, including from any position within the last allocation. The differences between `alloc_continue` and `realloc` are: (1) `alloc_continue` does not need the old size to be specified; by default it uses old size = head->pos - range_start. (2) `alloc_continue` supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing part of the memory, while `realloc` always allocates completely new memory. MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools. 5. Mem limit check: MemPool checks the mem limit itself, while Arena checks it at the Allocator layer. 6. ASAN support: Arena does some extra work for ASAN. 7. Error handling: MemPool returns the error message of an allocation failure directly through `Status`, while Arena throws an exception. Points Arena could consider: 1. Once a chunk larger than 128M has been allocated, the minimum subsequent chunk is 128M, which seems to waste memory. 2. Support clear for memory reuse. 3. Add a large-allocation list: allocate memory larger than 128M with exactly `size` bytes, to avoid leaving the current chunk not fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Pxl	0c01df6bb2	[Bug](view) fix AES_ENCRYPT have wrong result on view (#18034)	2023-03-29 10:49:39 +08:00
Pxl	664fbffcba	[Enhancement](table-function) optimization for vectorized table function (#17973)	2023-03-29 10:45:00 +08:00
zhannngchen	4d5f93b343	[log](load) print detailed error message when publish failed (#18176) W0326 22:46:21.081120 30803 engine_publish_version_task.cpp:215] failed to publish version. rowset_id=0200000000010b38154fe7bb7ca4314a18698cb6a0efc9a3, tablet_id=63390785, txn_id=43714869 This log line is missing the detailed error message.	2023-03-29 09:14:47 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009) Follow-up to #17586. Main changes in this PR: 1. Remove env/. 2. Remove FileUtils/FilesystemUtils. 3. Move some methods to LocalFileSystem. 4. Remove olap/file_cache. 5. Add an S3 client cache for the S3 file system; in my test, the time to open an S3 file is reduced significantly. 6. Fix a cold/hot separation bug for the S3 fs. This is the last PR of #17764. After this, all IO operations should be in io/fs. In addition to the tests in #17586, I also tested some cases related to fs IO: clone; concurrent queries on local/S3/HDFS; load error log creation and cleanup; disk metrics.	2023-03-29 09:00:52 +08:00
Ashin Gau	a813ad56ad	[fix](multi-catalog) key and value columns of map are normal column type (#18160) PR #17330 changed the column type of key and value from array to normal column, but the ORC and Parquet readers still cast them to array columns, resulting in a cast error.	2023-03-28 23:11:40 +08:00

4174 Commits
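The ST_Angle/ST_Azimuth commit (#18293) defines both functions only through their SQL behaviour. The planar sketch below illustrates the angle conventions. It rests on stated assumptions: points are (x, y) pairs read as (longitude, latitude), +y is treated as north, and the geometry is flat. Doris may compute these functions on a sphere, which the commit does not say, so results for distant points can differ. The helper names are hypothetical. For the two example queries, the sketch yields 3pi/2 and pi/2.

```python
import math

TWO_PI = 2.0 * math.pi


def _normalize(angle: float) -> float:
    """Map an angle in radians into [0, 2*pi)."""
    a = math.fmod(angle, TWO_PI)  # result keeps the sign of angle
    if a < 0:
        a += TWO_PI
    # Adding 2*pi to a tiny negative value can round up to exactly 2*pi.
    return 0.0 if a >= TWO_PI else a


def st_angle_planar(p1, p2, p3) -> float:
    """Clockwise angle from line p2->p1 to line p2->p3 on a flat plane (hypothetical helper)."""
    a1 = math.atan2(p1[1] - p2[1], p1[0] - p2[0])  # counter-clockwise direction of p2->p1
    a3 = math.atan2(p3[1] - p2[1], p3[0] - p2[0])  # counter-clockwise direction of p2->p3
    # Turning clockwise decreases the counter-clockwise angle, hence a1 - a3.
    return _normalize(a1 - a3)


def st_azimuth_planar(p1, p2) -> float:
    """Clockwise angle from north (+y) at p1 to segment p1->p2 (hypothetical helper)."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Swapping the atan2 arguments puts 0 at north and makes angles grow clockwise.
    return _normalize(math.atan2(dx, dy))


print(st_angle_planar((1, 0), (0, 0), (0, 1)))  # about 4.712 (3pi/2)
print(st_azimuth_planar((0, 0), (1, 0)))        # about 1.571 (pi/2)
```

Normalizing into [0, 2pi) lets a clockwise turn of three quarters come out as 3pi/2 instead of -pi/2. This matches the range stated in the commit.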
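The StringRef fix (#18264) notes that `==` on Apple silicon (or without AVX2) went through string_compare, which treats the data as a null-terminated C string. A StringRef is a pointer plus a length, so such a comparison can stop early at an embedded NUL byte. When no NUL follows, it can also read past the referenced bytes. The Python sketch below models only the first effect. Both function names are hypothetical.

```python
def c_string_equal(a: bytes, b: bytes) -> bool:
    """Compare like strcmp(): bytes from the first NUL onward are ignored."""
    return a.split(b"\x00", 1)[0] == b.split(b"\x00", 1)[0]


def length_aware_equal(a: bytes, b: bytes) -> bool:
    """Compare like a (data, size) view: sizes must match, then every byte must match."""
    return len(a) == len(b) and a == b


print(c_string_equal(b"ab\x00x", b"ab\x00y"))      # True: both look like "ab"
print(length_aware_equal(b"ab\x00x", b"ab\x00y"))  # False: the last byte differs
```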
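The array_first_index examples (#18175) return the 1-based position of the first element for which the lambda is true. The sketch below reproduces the three example results. Returning 0 when no element matches is an assumption, because the commit shows only matching cases. SQL NULL semantics, where a predicate that evaluates to NULL is not a match, are not modeled.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def array_first_index(pred: Callable[[Optional[T]], bool], arr: Sequence[Optional[T]]) -> int:
    """Return the 1-based position of the first element satisfying pred, or 0 if none does."""
    for position, value in enumerate(arr, start=1):
        if pred(value):
            return position
    return 0  # assumption: "not found" is reported as 0


print(array_first_index(lambda x: x + 1 > 3, [2, 3, 4]))       # 2
print(array_first_index(lambda x: x is None, [None, 1, 2]))    # 1
print(array_first_index(lambda x: x ** 2 > 10, [1, 2, 3, 4]))  # 4
```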
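The stream-load fix (#18161) describes its bug only in prose. The model below is simplified and hypothetical: it is not Doris's CSV reader, and `split_records` and its `resume_from_zero` flag are invented for illustration. The reader keeps an incomplete record at the front of its buffer and appends the next read. If the delimiter search resumes where the previous scan stopped, a multi-byte delimiter that straddles the refill boundary is skipped, and two records come out as one. Restarting the search at offset 0 finds it.

```python
def split_records(chunks, delim: bytes, resume_from_zero: bool):
    """Split a chunked byte stream into records on a multi-byte delimiter (illustrative model)."""
    records = []
    buf = b""
    scanned = 0  # where the previous unsuccessful search of buf stopped
    for chunk in chunks:
        buf += chunk  # incomplete data stays at the front; new data is appended
        start = 0 if resume_from_zero else scanned
        while True:
            pos = buf.find(delim, start)
            if pos < 0:
                scanned = len(buf)
                break
            records.append(buf[:pos])
            buf = buf[pos + len(delim):]
            start = 0
    if buf:
        records.append(buf)  # trailing record without a delimiter
    return records


chunks = [b"a|", b"|b||c"]  # the first read ends halfway through the "||" delimiter
print(split_records(chunks, b"||", resume_from_zero=False))  # [b'a||b', b'c']
print(split_records(chunks, b"||", resume_from_zero=True))   # [b'a', b'b', b'c']
```

Restarting at 0 re-reads the incomplete record after every refill. Resuming at `max(0, scanned - (len(delim) - 1))` would also catch a straddling delimiter with less rework. The commit describes the restart-at-0 approach.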
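array_sortby (#18071) sorts one array by the values of another, and #18311 replaced its bubble sort with qsort. A common way to implement this is to sort an index permutation by key and then gather the values. That takes O(n log n) comparisons instead of bubble sort's O(n^2). The sketch below assumes ascending order and no NULL keys. It uses Python's stable `sorted` rather than qsort, so tie order is a property of this sketch only, because qsort makes no stability guarantee. Raising on mismatched lengths is also an assumption.

```python
from typing import List, Sequence, TypeVar

V = TypeVar("V")


def array_sortby(values: Sequence[V], keys: Sequence) -> List[V]:
    """Reorder values by ascending keys via an index permutation (illustrative sketch)."""
    if len(values) != len(keys):
        raise ValueError("values and keys must have the same length")
    # Sort positions by their key; sorted() is stable, so equal keys keep input order.
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return [values[i] for i in order]


print(array_sortby(["a", "b", "c"], [3, 2, 1]))  # ['c', 'b', 'a']
```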
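The LIKE fix (#18039) depends on reading the pattern from left to right. After SQL string-literal unescaping, '%\\\\%' becomes the four characters %\\% (the log shows "size: 4"). The escaped backslash there is a literal backslash, so the pattern means "contains a backslash". The LIKE_ENDS_WITH_RE path instead read the trailing \% as an escaped percent sign. The minimal matcher below tokenizes escapes before it interprets % and _. It is an illustration, not Doris's like.cpp. Treating a trailing lone escape character as a literal is an assumption.

```python
def like_match(text: str, pattern: str, escape: str = "\\") -> bool:
    """Return True if text matches the SQL LIKE pattern.

    The pattern is tokenized left to right first, so an escaped escape
    character becomes a literal before any '%' or '_' is interpreted.
    """
    # Tokens: ("any", "") for %, ("one", "") for _, ("lit", ch) for a literal char.
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            tokens.append(("lit", pattern[i + 1]))  # an escaped char is always literal
            i += 2
            continue
        if ch == "%":
            tokens.append(("any", ""))
        elif ch == "_":
            tokens.append(("one", ""))
        else:
            tokens.append(("lit", ch))  # includes a trailing lone escape char (assumption)
        i += 1

    n = len(tokens)
    # dp[j] is True when the text consumed so far matches tokens[:j]; start with empty text.
    dp = [True] + [False] * n
    for j in range(1, n + 1):
        dp[j] = dp[j - 1] and tokens[j - 1][0] == "any"
    for c in text:
        nxt = [False] * (n + 1)
        for j in range(1, n + 1):
            kind, lit = tokens[j - 1]
            if kind == "any":
                # % matches zero characters (nxt[j - 1]) or absorbs c (dp[j]).
                nxt[j] = nxt[j - 1] or dp[j]
            elif kind == "one":
                nxt[j] = dp[j - 1]
            else:
                nxt[j] = dp[j - 1] and lit == c
        dp = nxt
    return dp[n]


# Pattern %\\% (four characters) against the stored value test\\,xxx.
print(like_match("test\\\\,xxx", "%\\\\%"))  # True: the value contains a backslash
print(like_match("50%x", "%\\%"))            # False: %\% means "ends with a literal %"
```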
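The AES fix (#17907) restates the rule from the MySQL reference it links. ECB, the default `aes-128-ecb` mode, needs no initialization vector, while CBC, CFB1, CFB8, CFB128 and OFB all require one. The small validator below makes that rule concrete. It assumes MySQL-style `aes-<key bits>-<mode>` strings and does not cover SM4 modes. `requires_iv` and `validate_iv` are hypothetical helpers, not Doris functions.

```python
from typing import Optional

# Block modes that need an IV, per the MySQL 8.0 reference linked in the commit message.
MODES_REQUIRING_IV = {"cbc", "cfb1", "cfb8", "cfb128", "ofb"}


def requires_iv(block_encryption_mode: str) -> bool:
    """Return True if a mode string such as 'aes-128-cbc' needs an initialization vector."""
    parts = block_encryption_mode.lower().split("-")
    if len(parts) != 3:
        raise ValueError(f"unexpected mode string: {block_encryption_mode!r}")
    mode = parts[2]
    if mode == "ecb":
        return False
    if mode in MODES_REQUIRING_IV:
        return True
    raise ValueError(f"unsupported block mode: {mode!r}")


def validate_iv(block_encryption_mode: str, iv: Optional[bytes]) -> Optional[str]:
    """Return an error message when a mode that needs an IV gets none, else None."""
    if requires_iv(block_encryption_mode) and not iv:
        return f"{block_encryption_mode} requires an initialization vector"
    return None


print(validate_iv("aes-128-ecb", None))  # None: ECB works without an IV
print(validate_iv("aes-256-cbc", None))  # error message: CBC needs an IV
```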
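The MemPool-to-Arena commit (#17820) describes Arena's chunk growth in prose. The sketch below implements that policy under two assumptions: the 128M threshold applies to the previous chunk size, and page rounding and alignment are ignored. `next_chunk_size` is a hypothetical helper, not Arena's actual method. The commit writes the linear phase as 128M * n > `size`. This sketch uses >= instead, so a request of exactly 256M gets a 256M chunk.

```python
MiB = 1024 * 1024
LINEAR_GROWTH_THRESHOLD = 128 * MiB  # the 128M threshold from the commit message


def next_chunk_size(prev_chunk_size: int, requested: int) -> int:
    """Size of the next chunk under the described policy (illustrative, no page rounding)."""
    if prev_chunk_size < LINEAR_GROWTH_THRESHOLD:
        # Exponential phase: double the previous chunk, but never go below the request.
        return max(prev_chunk_size * 2, requested)
    # Linear phase: the smallest positive multiple of 128M that can hold the request.
    n = max(1, -(-requested // LINEAR_GROWTH_THRESHOLD))  # ceiling division
    return n * LINEAR_GROWTH_THRESHOLD


print(next_chunk_size(4096, 100) // 1024)          # 8 (KiB): doubled from 4 KiB
print(next_chunk_size(128 * MiB, 200 * MiB) // MiB)  # 256: two 128M units
print(next_chunk_size(256 * MiB, 1) // MiB)          # 128: the floor the commit calls wasteful
```

The last call shows the waste the commit points out. After a large chunk, even a 1-byte request gets a 128M chunk.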